All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2
@ 2012-07-24 11:03 Paolo Bonzini
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 01/47] qapi: generalize documentation of streaming commands Paolo Bonzini
                   ` (47 more replies)
  0 siblings, 48 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

Hi all, this is the first non-RFC submission of my block job patches
for 1.2.  Everything is there, including multiple in-flight operations
in the mirroring job and new testcases (for all of streaming, mirroring,
hierarchical bitmap).  The tests use blkdebug to test error reporting
for both streaming and mirroring.

This still does not include a persistent dirty bitmap, which will be work
for 1.3.

If you want to tinker with this, everything is available at
git://github.com/bonzini/qemu.git in branch blkmirror-job-1.2.

I know it's a lot of code, I'm sorry for dropping this quite close to
the feature freeze.  Unfortunately, preparing for the Linux merge window
and other non-QEMU tasks have dragged this 1-2 weeks more than I would
have liked.

The patches are organized as follows:

01-12   preparatory work for block job errors, including support for
        pausing and resuming jobs

13-17   introduce block job errors, and add support in block-stream

18-26   preparatory work for block mirroring, including creating new
        new functions out of existing code.

27-34   introduce a simple version of mirroring.  The initial patch
        add the mirroring logic, followed by the ability to switch to
        the destination of migration, to query the target file (for
        example, polling the high-water mark), and to handle errors
        during the job.  All these changes come with testcases.

35-43   These patches introduce the first optimizations, namely supporting
        an arbitrary granularity for the dirty bitmap.  The current default,
        1M, is too coarse to let the job converge quickly and in almost
        real-time.  These patches reimplement the block device dirty bitmap
        to allow efficient iteration, and add cluster copy-on-write logic.
        Cluster copy-on-write is needed because management will want to
        start the copy before the backing file is in place in the destination;
        if mirroring takes care of copy-on-write, BDRV_O_NO_BACKING can be
        used even if the granularity is smaller than the cluster size.

44-47   A second round optimizations, replacing serialized read-write
        operations with multiple asynchronous I/O operations.  The various
        in-flight operations can be of arbitrary size.  The initial copy
        will end up reading large chunks sequentially (10M by default),
        while subsequent passes can mimic more closely the guest's I/O
        patterns.

Compared to v1, the last four patches are entirely new, and so are many
of the testcase changes.  All comments from Eric's review are addressed.
In some cases the patches were modified (reversing if conditions or things
like that) in order to keep later patches simpler.  I also added several
new tracepoints.

Latency is vital to any migration scheme using a dirty bitmap, especially
because completion is entirely asynchronous, so I expect this to be used
either with pretty good storage, or on guests doing relatively little I/O.
I tested this both on my laptop and with moderately high-end SAS disks.

On the SAS disks, time between checkpoints (trace_mirror_before_flush)
on kernel compilation (-j3 to -j12, 4 or 8 vCPUs) is almost always within
1 second, usually much less targeting a local disk.  On hibernation,
which is a worst-case test (sequential I/O happening with no flushes
in between) and failed completely to converge on my lowly laptop hard
disk, a checkpoint was reached every 0.5 to 3 seconds.  When targeting
a local qemu-nbd server performance was similar.  Kernel compilation
showed occasional bumps, but they were fixed in 1.5-7 seconds.

Please review!

Paolo Bonzini (47):
  qapi: generalize documentation of streaming commands
  qerror/block: introduce QERR_BLOCK_JOB_NOT_ACTIVE
  block: move job APIs to separate files
  block: add block_job_query
  block: add support for job pause/resume
  qmp: add block-job-pause and block-job-resume
  qemu-iotests: add test for pausing a streaming operation
  block: rename block_job_complete to block_job_completed
  block: rename BlockErrorAction, BlockQMPEventAction
  block: move BlockdevOnError declaration to QAPI
  block: reorganize io error code
  block: sort BlockDeviceIoStatus errors by severity
  block: introduce block job error
  stream: add on-error argument
  blkdebug: process all set_state rules in the old state
  qemu-iotests: map underscore to dash in QMP argument names
  qemu-iotests: add tests for streaming error handling
  block: live snapshot documentation tweaks
  block: add bdrv_query_info
  block: add bdrv_query_stats
  block: add bdrv_ensure_backing_file
  block: make device optional in BlockInfo
  block: add target info to QMP query-blockjobs command
  block: introduce new dirty bitmap functionality
  block: add block-job-complete
  block: introduce BLOCK_JOB_READY event
  block: introduce mirror job
  qmp: add drive-mirror command
  mirror: support querying target file
  mirror: implement completion
  qemu-iotests: add mirroring test case
  block: forward bdrv_iostatus_reset to block job
  mirror: add support for on-source-error/on-target-error
  qmp: add pull_event function
  qemu-iotests: add testcases for mirroring
    on-source-error/on-target-error
  host-utils: add ffsl and flsl
  add hierarchical bitmap data type and test cases
  block: implement dirty bitmap using HBitmap
  block: make round_to_clusters public
  mirror: perform COW if the cluster size is bigger than the
    granularity
  block: return count of dirty sectors, not chunks
  block: allow customizing the granularity of the dirty bitmap
  mirror: allow customizing the granularity
  mirror: switch mirror_iteration to AIO
  mirror: add buf-size argument to drive-mirror
  mirror: support more than one in-flight AIO operation
  mirror: support arbitrarily-sized iterations

 Makefile.objs                 |    5 +-
 QMP/qmp-events.txt            |   43 +++
 QMP/qmp.py                    |   20 ++
 block-migration.c             |    8 +-
 block.c                       |  486 ++++++++++++------------------
 block.h                       |   37 ++-
 block/Makefile.objs           |    3 +-
 block/blkdebug.c              |   14 +-
 block/mirror.c                |  562 +++++++++++++++++++++++++++++++++++
 block/stream.c                |   33 +-
 block_int.h                   |  192 +++---------
 blockdev.c                    |  257 +++++++++++++---
 blockjob.c                    |  290 ++++++++++++++++++
 blockjob.h                    |  285 ++++++++++++++++++
 hbitmap.c                     |  394 ++++++++++++++++++++++++
 hbitmap.h                     |   51 ++++
 hmp-commands.hx               |   73 ++++-
 hmp.c                         |   65 +++-
 hmp.h                         |    4 +
 host-utils.h                  |   45 +++
 hw/fdc.c                      |    4 +-
 hw/ide/core.c                 |   20 +-
 hw/scsi-disk.c                |   23 +-
 hw/scsi-generic.c             |    4 +-
 hw/virtio-blk.c               |   19 +-
 monitor.c                     |    2 +
 monitor.h                     |    2 +
 qapi-schema.json              |  238 +++++++++++++--
 qemu-tool.c                   |    6 +
 qerror.c                      |   12 +
 qerror.h                      |    9 +
 qmp-commands.hx               |   72 ++++-
 tests/Makefile                |    2 +
 tests/qemu-iotests/030        |  178 ++++++++++-
 tests/qemu-iotests/039        |  661 +++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/group      |    3 +-
 tests/qemu-iotests/iotests.py |   19 +-
 tests/test-hbitmap.c          |  384 ++++++++++++++++++++++++
 trace-events                  |   24 +-
 39 files changed, 3946 insertions(+), 603 deletions(-)
 create mode 100644 block/mirror.c
 create mode 100644 blockjob.c
 create mode 100644 blockjob.h
 create mode 100644 hbitmap.c
 create mode 100644 hbitmap.h
 create mode 100755 tests/qemu-iotests/039
 create mode 100644 tests/test-hbitmap.c

-- 
1.7.10.4

^ permalink raw reply	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 01/47] qapi: generalize documentation of streaming commands
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
@ 2012-07-24 11:03 ` Paolo Bonzini
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 02/47] qerror/block: introduce QERR_BLOCK_JOB_NOT_ACTIVE Paolo Bonzini
                   ` (46 subsequent siblings)
  47 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

Talk about background operations in general, rather than specifically
about streaming.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hmp-commands.hx  |    2 +-
 qapi-schema.json |   17 ++++++++---------
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index eea8b32..9bbc7f7 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -101,7 +101,7 @@ ETEXI
         .name       = "block_job_cancel",
         .args_type  = "device:B",
         .params     = "device",
-        .help       = "stop an active block streaming operation",
+        .help       = "stop an active background block operation",
         .mhandler.cmd = hmp_block_job_cancel,
     },
 
diff --git a/qapi-schema.json b/qapi-schema.json
index bc55ed2..000eb83 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1657,7 +1657,7 @@
 # Returns: Nothing on success
 #          If the job type does not support throttling, NotSupported
 #          If the speed value is invalid, InvalidParameter
-#          If streaming is not active on this device, DeviceNotActive
+#          If no background operation is active on this device, DeviceNotActive
 #
 # Since: 1.1
 ##
@@ -1667,9 +1667,9 @@
 ##
 # @block-job-cancel:
 #
-# Stop an active block streaming operation.
+# Stop an active background block operation.
 #
-# This command returns immediately after marking the active block streaming
+# This command returns immediately after marking the active background block
 # operation for cancellation.  It is an error to call this command if no
 # operation is in progress.
 #
@@ -1677,16 +1677,15 @@
 # BLOCK_JOB_CANCELLED event.  Before that happens the job is still visible when
 # enumerated using query-block-jobs.
 #
-# The image file retains its backing file unless the streaming operation happens
-# to complete just as it is being cancelled.
-#
-# A new block streaming operation can be started at a later time to finish
-# copying all data from the backing file.
+# For streaming, the image file retains its backing file unless the streaming
+# operation happens to complete just as it is being cancelled.  A new streaming
+# operation can be started at a later time to finish copying all data from the
+# backing file.
 #
 # @device: the device name
 #
 # Returns: Nothing on success
-#          If streaming is not active on this device, DeviceNotActive
+#          If no background operation is active on this device, DeviceNotActive
 #          If cancellation already in progress, DeviceInUse
 #
 # Since: 1.1
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 02/47] qerror/block: introduce QERR_BLOCK_JOB_NOT_ACTIVE
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 01/47] qapi: generalize documentation of streaming commands Paolo Bonzini
@ 2012-07-24 11:03 ` Paolo Bonzini
  2012-07-26 15:26   ` Kevin Wolf
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 03/47] block: move job APIs to separate files Paolo Bonzini
                   ` (45 subsequent siblings)
  47 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

The DeviceNotActive error is not a particularly good match, add
a separate one.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 blockdev.c       |    4 ++--
 qapi-schema.json |    5 ++---
 qerror.c         |    4 ++++
 qerror.h         |    3 +++
 4 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 3d75015..9c142ee 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1139,7 +1139,7 @@ void qmp_block_job_set_speed(const char *device, int64_t speed, Error **errp)
     BlockJob *job = find_block_job(device);
 
     if (!job) {
-        error_set(errp, QERR_DEVICE_NOT_ACTIVE, device);
+        error_set(errp, QERR_BLOCK_JOB_NOT_ACTIVE, device);
         return;
     }
 
@@ -1151,7 +1151,7 @@ void qmp_block_job_cancel(const char *device, Error **errp)
     BlockJob *job = find_block_job(device);
 
     if (!job) {
-        error_set(errp, QERR_DEVICE_NOT_ACTIVE, device);
+        error_set(errp, QERR_BLOCK_JOB_NOT_ACTIVE, device);
         return;
     }
 
diff --git a/qapi-schema.json b/qapi-schema.json
index 000eb83..040981e 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1657,7 +1657,7 @@
 # Returns: Nothing on success
 #          If the job type does not support throttling, NotSupported
 #          If the speed value is invalid, InvalidParameter
-#          If no background operation is active on this device, DeviceNotActive
+#          If no background operation is active on this device, BlockJobNotActive
 #
 # Since: 1.1
 ##
@@ -1685,8 +1685,7 @@
 # @device: the device name
 #
 # Returns: Nothing on success
-#          If no background operation is active on this device, DeviceNotActive
-#          If cancellation already in progress, DeviceInUse
+#          If no background operation is active on this device, BlockJobNotActive
 #
 # Since: 1.1
 ##
diff --git a/qerror.c b/qerror.c
index 92c4eff..bc672a5 100644
--- a/qerror.c
+++ b/qerror.c
@@ -60,6 +60,10 @@ static const QErrorStringTable qerror_table[] = {
         .desc      = "Base '%(base)' not found",
     },
     {
+        .error_fmt = QERR_BLOCK_JOB_NOT_ACTIVE,
+        .desc      = "No active block job on device '%(name)'",
+    },
+    {
         .error_fmt = QERR_BLOCK_FORMAT_FEATURE_NOT_SUPPORTED,
         .desc      = "Block format '%(format)' used by device '%(name)' does not support feature '%(feature)'",
     },
diff --git a/qerror.h b/qerror.h
index b4c8758..7cf7d22 100644
--- a/qerror.h
+++ b/qerror.h
@@ -64,6 +64,9 @@ QError *qobject_to_qerror(const QObject *obj);
 #define QERR_BASE_NOT_FOUND \
     "{ 'class': 'BaseNotFound', 'data': { 'base': %s } }"
 
+#define QERR_BLOCK_JOB_NOT_ACTIVE \
+    "{ 'class': 'BlockJobNotActive', 'data': { 'name': %s } }"
+
 #define QERR_BLOCK_FORMAT_FEATURE_NOT_SUPPORTED \
     "{ 'class': 'BlockFormatFeatureNotSupported', 'data': { 'format': %s, 'name': %s, 'feature': %s } }"
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 03/47] block: move job APIs to separate files
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 01/47] qapi: generalize documentation of streaming commands Paolo Bonzini
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 02/47] qerror/block: introduce QERR_BLOCK_JOB_NOT_ACTIVE Paolo Bonzini
@ 2012-07-24 11:03 ` Paolo Bonzini
  2012-07-26 15:50   ` Kevin Wolf
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 04/47] block: add block_job_query Paolo Bonzini
                   ` (44 subsequent siblings)
  47 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Makefile.objs       |    5 +-
 block.c             |  128 +-----------------------------------
 block.h             |    2 +
 block/Makefile.objs |    3 +-
 block/stream.c      |    1 +
 block_int.h         |  153 -------------------------------------------
 blockdev.c          |    1 +
 blockjob.c          |  163 ++++++++++++++++++++++++++++++++++++++++++++++
 blockjob.h          |  181 +++++++++++++++++++++++++++++++++++++++++++++++++++
 9 files changed, 354 insertions(+), 283 deletions(-)
 create mode 100644 blockjob.c
 create mode 100644 blockjob.h

diff --git a/Makefile.objs b/Makefile.objs
index 5ebbcfa..67e9d8d 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -42,7 +42,8 @@ coroutine-obj-$(CONFIG_WIN32) += coroutine-win32.o
 # block-obj-y is code used by both qemu system emulation and qemu-img
 
 block-obj-y = cutils.o iov.o cache-utils.o qemu-option.o module.o async.o
-block-obj-y += nbd.o block.o aio.o aes.o qemu-config.o qemu-progress.o qemu-sockets.o
+block-obj-y += nbd.o block.o blockjob.o aio.o aes.o qemu-config.o
+block-obj-y += qemu-progress.o qemu-sockets.o
 block-obj-y += $(coroutine-obj-y) $(qobject-obj-y) $(version-obj-y)
 block-obj-$(CONFIG_POSIX) += posix-aio-compat.o
 block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
@@ -59,7 +60,7 @@ endif
 # suppress *all* target specific code in case of system emulation, i.e. a
 # single QEMU executable should support all CPUs and machines.
 
-common-obj-y = $(block-obj-y) blockdev.o
+common-obj-y = $(block-obj-y) blockdev.o block/
 common-obj-y += net.o net/
 common-obj-y += qom/
 common-obj-y += readline.o console.o cursor.o
diff --git a/block.c b/block.c
index ce7eb8f..41e329a 100644
--- a/block.c
+++ b/block.c
@@ -26,6 +26,7 @@
 #include "trace.h"
 #include "monitor.h"
 #include "block_int.h"
+#include "blockjob.h"
 #include "module.h"
 #include "qjson.h"
 #include "qemu-coroutine.h"
@@ -4026,130 +4027,3 @@ out:
 
     return ret;
 }
-
-void *block_job_create(const BlockJobType *job_type, BlockDriverState *bs,
-                       int64_t speed, BlockDriverCompletionFunc *cb,
-                       void *opaque, Error **errp)
-{
-    BlockJob *job;
-
-    if (bs->job || bdrv_in_use(bs)) {
-        error_set(errp, QERR_DEVICE_IN_USE, bdrv_get_device_name(bs));
-        return NULL;
-    }
-    bdrv_set_in_use(bs, 1);
-
-    job = g_malloc0(job_type->instance_size);
-    job->job_type      = job_type;
-    job->bs            = bs;
-    job->cb            = cb;
-    job->opaque        = opaque;
-    job->busy          = true;
-    bs->job = job;
-
-    /* Only set speed when necessary to avoid NotSupported error */
-    if (speed != 0) {
-        Error *local_err = NULL;
-
-        block_job_set_speed(job, speed, &local_err);
-        if (error_is_set(&local_err)) {
-            bs->job = NULL;
-            g_free(job);
-            bdrv_set_in_use(bs, 0);
-            error_propagate(errp, local_err);
-            return NULL;
-        }
-    }
-    return job;
-}
-
-void block_job_complete(BlockJob *job, int ret)
-{
-    BlockDriverState *bs = job->bs;
-
-    assert(bs->job == job);
-    job->cb(job->opaque, ret);
-    bs->job = NULL;
-    g_free(job);
-    bdrv_set_in_use(bs, 0);
-}
-
-void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
-{
-    Error *local_err = NULL;
-
-    if (!job->job_type->set_speed) {
-        error_set(errp, QERR_NOT_SUPPORTED);
-        return;
-    }
-    job->job_type->set_speed(job, speed, &local_err);
-    if (error_is_set(&local_err)) {
-        error_propagate(errp, local_err);
-        return;
-    }
-
-    job->speed = speed;
-}
-
-void block_job_cancel(BlockJob *job)
-{
-    job->cancelled = true;
-    if (job->co && !job->busy) {
-        qemu_coroutine_enter(job->co, NULL);
-    }
-}
-
-bool block_job_is_cancelled(BlockJob *job)
-{
-    return job->cancelled;
-}
-
-struct BlockCancelData {
-    BlockJob *job;
-    BlockDriverCompletionFunc *cb;
-    void *opaque;
-    bool cancelled;
-    int ret;
-};
-
-static void block_job_cancel_cb(void *opaque, int ret)
-{
-    struct BlockCancelData *data = opaque;
-
-    data->cancelled = block_job_is_cancelled(data->job);
-    data->ret = ret;
-    data->cb(data->opaque, ret);
-}
-
-int block_job_cancel_sync(BlockJob *job)
-{
-    struct BlockCancelData data;
-    BlockDriverState *bs = job->bs;
-
-    assert(bs->job == job);
-
-    /* Set up our own callback to store the result and chain to
-     * the original callback.
-     */
-    data.job = job;
-    data.cb = job->cb;
-    data.opaque = job->opaque;
-    data.ret = -EINPROGRESS;
-    job->cb = block_job_cancel_cb;
-    job->opaque = &data;
-    block_job_cancel(job);
-    while (data.ret == -EINPROGRESS) {
-        qemu_aio_wait();
-    }
-    return (data.cancelled && data.ret == 0) ? -ECANCELED : data.ret;
-}
-
-void block_job_sleep_ns(BlockJob *job, QEMUClock *clock, int64_t ns)
-{
-    /* Check cancellation *before* setting busy = false, too!  */
-    if (!block_job_is_cancelled(job)) {
-        job->busy = false;
-        co_sleep_ns(clock, ns);
-        job->busy = true;
-    }
-}
diff --git a/block.h b/block.h
index c89590d..9b5fca4 100644
--- a/block.h
+++ b/block.h
@@ -6,9 +6,11 @@
 #include "qemu-option.h"
 #include "qemu-coroutine.h"
 #include "qobject.h"
+#include "qapi-types.h"
 
 /* block.c */
 typedef struct BlockDriver BlockDriver;
+typedef struct BlockJob BlockJob;
 
 typedef struct BlockDriverInfo {
     /* in bytes, 0 if irrelevant */
diff --git a/block/Makefile.objs b/block/Makefile.objs
index b5754d3..c45affc 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -3,9 +3,10 @@ block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-c
 block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
 block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
-block-obj-y += stream.o
 block-obj-$(CONFIG_WIN32) += raw-win32.o
 block-obj-$(CONFIG_POSIX) += raw-posix.o
 block-obj-$(CONFIG_LIBISCSI) += iscsi.o
 block-obj-$(CONFIG_CURL) += curl.o
 block-obj-$(CONFIG_RBD) += rbd.o
+
+common-obj-y += stream.o
diff --git a/block/stream.c b/block/stream.c
index 37c4652..5b801b4 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -13,6 +13,7 @@
 
 #include "trace.h"
 #include "block_int.h"
+#include "blockjob.h"
 #include "qemu/ratelimit.h"
 
 enum {
diff --git a/block_int.h b/block_int.h
index d72317f..d4642fb 100644
--- a/block_int.h
+++ b/block_int.h
@@ -64,73 +64,6 @@ typedef struct BlockIOBaseValue {
     uint64_t ios[2];
 } BlockIOBaseValue;
 
-typedef struct BlockJob BlockJob;
-
-/**
- * BlockJobType:
- *
- * A class type for block job objects.
- */
-typedef struct BlockJobType {
-    /** Derived BlockJob struct size */
-    size_t instance_size;
-
-    /** String describing the operation, part of query-block-jobs QMP API */
-    const char *job_type;
-
-    /** Optional callback for job types that support setting a speed limit */
-    void (*set_speed)(BlockJob *job, int64_t speed, Error **errp);
-} BlockJobType;
-
-/**
- * BlockJob:
- *
- * Long-running operation on a BlockDriverState.
- */
-struct BlockJob {
-    /** The job type, including the job vtable.  */
-    const BlockJobType *job_type;
-
-    /** The block device on which the job is operating.  */
-    BlockDriverState *bs;
-
-    /**
-     * The coroutine that executes the job.  If not NULL, it is
-     * reentered when busy is false and the job is cancelled.
-     */
-    Coroutine *co;
-
-    /**
-     * Set to true if the job should cancel itself.  The flag must
-     * always be tested just before toggling the busy flag from false
-     * to true.  After a job has been cancelled, it should only yield
-     * if #qemu_aio_wait will ("sooner or later") reenter the coroutine.
-     */
-    bool cancelled;
-
-    /**
-     * Set to false by the job while it is in a quiescent state, where
-     * no I/O is pending and the job has yielded on any condition
-     * that is not detected by #qemu_aio_wait, such as a timer.
-     */
-    bool busy;
-
-    /** Offset that is published by the query-block-jobs QMP API */
-    int64_t offset;
-
-    /** Length that is published by the query-block-jobs QMP API */
-    int64_t len;
-
-    /** Speed that was set with @block_job_set_speed.  */
-    int64_t speed;
-
-    /** The completion function that will be called when the job completes.  */
-    BlockDriverCompletionFunc *cb;
-
-    /** The opaque value that is passed to the completion function.  */
-    void *opaque;
-};
-
 struct BlockDriver {
     const char *format_name;
     int instance_size;
@@ -345,92 +278,6 @@ int is_windows_drive(const char *filename);
 #endif
 
 /**
- * block_job_create:
- * @job_type: The class object for the newly-created job.
- * @bs: The block
- * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
- * @cb: Completion function for the job.
- * @opaque: Opaque pointer value passed to @cb.
- * @errp: Error object.
- *
- * Create a new long-running block device job and return it.  The job
- * will call @cb asynchronously when the job completes.  Note that
- * @bs may have been closed at the time the @cb it is called.  If
- * this is the case, the job may be reported as either cancelled or
- * completed.
- *
- * This function is not part of the public job interface; it should be
- * called from a wrapper that is specific to the job type.
- */
-void *block_job_create(const BlockJobType *job_type, BlockDriverState *bs,
-                       int64_t speed, BlockDriverCompletionFunc *cb,
-                       void *opaque, Error **errp);
-
-/**
- * block_job_sleep_ns:
- * @job: The job that calls the function.
- * @clock: The clock to sleep on.
- * @ns: How many nanoseconds to stop for.
- *
- * Put the job to sleep (assuming that it wasn't canceled) for @ns
- * nanoseconds.  Canceling the job will interrupt the wait immediately.
- */
-void block_job_sleep_ns(BlockJob *job, QEMUClock *clock, int64_t ns);
-
-/**
- * block_job_complete:
- * @job: The job being completed.
- * @ret: The status code.
- *
- * Call the completion function that was registered at creation time, and
- * free @job.
- */
-void block_job_complete(BlockJob *job, int ret);
-
-/**
- * block_job_set_speed:
- * @job: The job to set the speed for.
- * @speed: The new value
- * @errp: Error object.
- *
- * Set a rate-limiting parameter for the job; the actual meaning may
- * vary depending on the job type.
- */
-void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp);
-
-/**
- * block_job_cancel:
- * @job: The job to be canceled.
- *
- * Asynchronously cancel the specified job.
- */
-void block_job_cancel(BlockJob *job);
-
-/**
- * block_job_is_cancelled:
- * @job: The job being queried.
- *
- * Returns whether the job is scheduled for cancellation.
- */
-bool block_job_is_cancelled(BlockJob *job);
-
-/**
- * block_job_cancel:
- * @job: The job to be canceled.
- *
- * Asynchronously cancel the job and wait for it to reach a quiescent
- * state.  Note that the completion callback will still be called
- * asynchronously, hence it is *not* valid to call #bdrv_delete
- * immediately after #block_job_cancel_sync.  Users of block jobs
- * will usually protect the BlockDriverState objects with a reference
- * count, should this be a concern.
- *
- * Returns the return value from the job if the job actually completed
- * during the call, or -ECANCELED if it was canceled.
- */
-int block_job_cancel_sync(BlockJob *job);
-
-/**
  * stream_start:
  * @bs: Block device to operate on.
  * @base: Block device that will become the new base, or %NULL to
diff --git a/blockdev.c b/blockdev.c
index 9c142ee..e066f8f 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -9,6 +9,7 @@
 
 #include "blockdev.h"
 #include "hw/block-common.h"
+#include "blockjob.h"
 #include "monitor.h"
 #include "qerror.h"
 #include "qemu-option.h"
diff --git a/blockjob.c b/blockjob.c
new file mode 100644
index 0000000..9737a43
--- /dev/null
+++ b/blockjob.c
@@ -0,0 +1,163 @@
+/*
+ * QEMU System Emulator block driver
+ *
+ * Copyright (c) 2011 IBM Corp.
+ * Copyright (c) 2012 Red Hat, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "config-host.h"
+#include "qemu-common.h"
+#include "trace.h"
+#include "monitor.h"
+#include "block.h"
+#include "blockjob.h"
+#include "block_int.h"
+#include "qjson.h"
+#include "qemu-coroutine.h"
+#include "qmp-commands.h"
+#include "qemu-timer.h"
+
+void *block_job_create(const BlockJobType *job_type, BlockDriverState *bs,
+                       int64_t speed, BlockDriverCompletionFunc *cb,
+                       void *opaque, Error **errp)
+{
+    BlockJob *job;
+
+    if (bs->job || bdrv_in_use(bs)) {
+        error_set(errp, QERR_DEVICE_IN_USE, bdrv_get_device_name(bs));
+        return NULL;
+    }
+    bdrv_set_in_use(bs, 1);
+
+    job = g_malloc0(job_type->instance_size);
+    job->job_type      = job_type;
+    job->bs            = bs;
+    job->cb            = cb;
+    job->opaque        = opaque;
+    job->busy          = true;
+    bs->job = job;
+
+    /* Only set speed when necessary to avoid NotSupported error */
+    if (speed != 0) {
+        Error *local_err = NULL;
+
+        block_job_set_speed(job, speed, &local_err);
+        if (error_is_set(&local_err)) {
+            bs->job = NULL;
+            g_free(job);
+            bdrv_set_in_use(bs, 0);
+            error_propagate(errp, local_err);
+            return NULL;
+        }
+    }
+    return job;
+}
+
+void block_job_complete(BlockJob *job, int ret)
+{
+    BlockDriverState *bs = job->bs;
+
+    assert(bs->job == job);
+    job->cb(job->opaque, ret);
+    bs->job = NULL;
+    g_free(job);
+    bdrv_set_in_use(bs, 0);
+}
+
+void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
+{
+    Error *local_err = NULL;
+
+    if (!job->job_type->set_speed) {
+        error_set(errp, QERR_NOT_SUPPORTED);
+        return;
+    }
+    job->job_type->set_speed(job, speed, &local_err);
+    if (error_is_set(&local_err)) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    job->speed = speed;
+}
+
+void block_job_cancel(BlockJob *job)
+{
+    job->cancelled = true;
+    if (job->co && !job->busy) {
+        qemu_coroutine_enter(job->co, NULL);
+    }
+}
+
+bool block_job_is_cancelled(BlockJob *job)
+{
+    return job->cancelled;
+}
+
+struct BlockCancelData {
+    BlockJob *job;
+    BlockDriverCompletionFunc *cb;
+    void *opaque;
+    bool cancelled;
+    int ret;
+};
+
+static void block_job_cancel_cb(void *opaque, int ret)
+{
+    struct BlockCancelData *data = opaque;
+
+    data->cancelled = block_job_is_cancelled(data->job);
+    data->ret = ret;
+    data->cb(data->opaque, ret);
+}
+
+int block_job_cancel_sync(BlockJob *job)
+{
+    struct BlockCancelData data;
+    BlockDriverState *bs = job->bs;
+
+    assert(bs->job == job);
+
+    /* Set up our own callback to store the result and chain to
+     * the original callback.
+     */
+    data.job = job;
+    data.cb = job->cb;
+    data.opaque = job->opaque;
+    data.ret = -EINPROGRESS;
+    job->cb = block_job_cancel_cb;
+    job->opaque = &data;
+    block_job_cancel(job);
+    while (data.ret == -EINPROGRESS) {
+        qemu_aio_wait();
+    }
+    return (data.cancelled && data.ret == 0) ? -ECANCELED : data.ret;
+}
+
+void block_job_sleep_ns(BlockJob *job, QEMUClock *clock, int64_t ns)
+{
+    /* Check cancellation *before* setting busy = false, too!  */
+    if (!block_job_is_cancelled(job)) {
+        job->busy = false;
+        co_sleep_ns(clock, ns);
+        job->busy = true;
+    }
+}
diff --git a/blockjob.h b/blockjob.h
new file mode 100644
index 0000000..1b4c7be
--- /dev/null
+++ b/blockjob.h
@@ -0,0 +1,181 @@
+/*
+ * Declarations for long-running block device operations
+ *
+ * Copyright (c) 2011 IBM Corp.
+ * Copyright (c) 2012 Red Hat, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+#ifndef BLOCKJOB_H
+#define BLOCKJOB_H 1
+
+#include "block.h"
+
+/**
+ * BlockJobType:
+ *
+ * A class type for block job objects.
+ */
+typedef struct BlockJobType {
+    /** Derived BlockJob struct size */
+    size_t instance_size;
+
+    /** String describing the operation, part of query-block-jobs QMP API */
+    const char *job_type;
+
+    /** Optional callback for job types that support setting a speed limit */
+    void (*set_speed)(BlockJob *job, int64_t speed, Error **errp);
+} BlockJobType;
+
+/**
+ * BlockJob:
+ *
+ * Long-running operation on a BlockDriverState.
+ */
+struct BlockJob {
+    /** The job type, including the job vtable.  */
+    const BlockJobType *job_type;
+
+    /** The block device on which the job is operating.  */
+    BlockDriverState *bs;
+
+    /**
+     * The coroutine that executes the job.  If not NULL, it is
+     * reentered when busy is false and the job is cancelled.
+     */
+    Coroutine *co;
+
+    /**
+     * Set to true if the job should cancel itself.  The flag must
+     * always be tested just before toggling the busy flag from false
+     * to true.  After a job has been cancelled, it should only yield
+     * if #qemu_aio_wait will ("sooner or later") reenter the coroutine.
+     */
+    bool cancelled;
+
+    /**
+     * Set to false by the job while it is in a quiescent state, where
+     * no I/O is pending and the job has yielded on any condition
+     * that is not detected by #qemu_aio_wait, such as a timer.
+     */
+    bool busy;
+
+    /** Offset that is published by the query-block-jobs QMP API */
+    int64_t offset;
+
+    /** Length that is published by the query-block-jobs QMP API */
+    int64_t len;
+
+    /** Speed that was set with @block_job_set_speed.  */
+    int64_t speed;
+
+    /** The completion function that will be called when the job completes.  */
+    BlockDriverCompletionFunc *cb;
+
+    /** The opaque value that is passed to the completion function.  */
+    void *opaque;
+};
+
+/**
+ * block_job_create:
+ * @job_type: The class object for the newly-created job.
+ * @bs: The block
+ * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
+ * @cb: Completion function for the job.
+ * @opaque: Opaque pointer value passed to @cb.
+ * @errp: Error object.
+ *
+ * Create a new long-running block device job and return it.  The job
+ * will call @cb asynchronously when the job completes.  Note that
+ * @bs may have been closed at the time the @cb it is called.  If
+ * this is the case, the job may be reported as either cancelled or
+ * completed.
+ *
+ * This function is not part of the public job interface; it should be
+ * called from a wrapper that is specific to the job type.
+ */
+void *block_job_create(const BlockJobType *job_type, BlockDriverState *bs,
+                       int64_t speed, BlockDriverCompletionFunc *cb,
+                       void *opaque, Error **errp);
+
+/**
+ * block_job_sleep_ns:
+ * @job: The job that calls the function.
+ * @clock: The clock to sleep on.
+ * @ns: How many nanoseconds to stop for.
+ *
+ * Put the job to sleep (assuming that it wasn't canceled) for @ns
+ * nanoseconds.  Canceling the job will interrupt the wait immediately.
+ */
+void block_job_sleep_ns(BlockJob *job, QEMUClock *clock, int64_t ns);
+
+/**
+ * block_job_complete:
+ * @job: The job being completed.
+ * @ret: The status code.
+ *
+ * Call the completion function that was registered at creation time, and
+ * free @job.
+ */
+void block_job_complete(BlockJob *job, int ret);
+
+/**
+ * block_job_set_speed:
+ * @job: The job to set the speed for.
+ * @speed: The new value
+ * @errp: Error object.
+ *
+ * Set a rate-limiting parameter for the job; the actual meaning may
+ * vary depending on the job type.
+ */
+void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp);
+
+/**
+ * block_job_cancel:
+ * @job: The job to be canceled.
+ *
+ * Asynchronously cancel the specified job.
+ */
+void block_job_cancel(BlockJob *job);
+
+/**
+ * block_job_is_cancelled:
+ * @job: The job being queried.
+ *
+ * Returns whether the job is scheduled for cancellation.
+ */
+bool block_job_is_cancelled(BlockJob *job);
+
+/**
+ * block_job_cancel_sync:
+ * @job: The job to be canceled.
+ *
+ * Synchronously cancel the job and wait for it to reach a quiescent
+ * state.  Note that the completion callback will still be called
+ * asynchronously, hence it is *not* valid to call #bdrv_delete
+ * immediately after #block_job_cancel_sync.  Users of block jobs
+ * will usually protect the BlockDriverState objects with a reference
+ * count, should this be a concern.
+ *
+ * Returns the return value from the job if the job actually completed
+ * during the call, or -ECANCELED if it was canceled.
+ */
+int block_job_cancel_sync(BlockJob *job);
+
+#endif
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 04/47] block: add block_job_query
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (2 preceding siblings ...)
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 03/47] block: move job APIs to separate files Paolo Bonzini
@ 2012-07-24 11:03 ` Paolo Bonzini
  2012-07-30 14:47   ` Kevin Wolf
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 05/47] block: add support for job pause/resume Paolo Bonzini
                   ` (43 subsequent siblings)
  47 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

Extract it out of the implementation of query-block-jobs.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 blockdev.c |   15 ++-------------
 blockjob.c |   11 +++++++++++
 blockjob.h |    8 ++++++++
 3 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index e066f8f..dc099f9 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1166,19 +1166,8 @@ static void do_qmp_query_block_jobs_one(void *opaque, BlockDriverState *bs)
     BlockJob *job = bs->job;
 
     if (job) {
-        BlockJobInfoList *elem;
-        BlockJobInfo *info = g_new(BlockJobInfo, 1);
-        *info = (BlockJobInfo){
-            .type   = g_strdup(job->job_type->job_type),
-            .device = g_strdup(bdrv_get_device_name(bs)),
-            .len    = job->len,
-            .offset = job->offset,
-            .speed  = job->speed,
-        };
-
-        elem = g_new0(BlockJobInfoList, 1);
-        elem->value = info;
-
+        BlockJobInfoList *elem = g_new0(BlockJobInfoList, 1);
+        elem->value = block_job_query(bs->job);
         (*prev)->next = elem;
         *prev = elem;
     }
diff --git a/blockjob.c b/blockjob.c
index 9737a43..a947a6e 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -161,3 +161,14 @@ void block_job_sleep_ns(BlockJob *job, QEMUClock *clock, int64_t ns)
         job->busy = true;
     }
 }
+
+BlockJobInfo *block_job_query(BlockJob *job)
+{
+    BlockJobInfo *info = g_new(BlockJobInfo, 1);
+    info->type   = g_strdup(job->job_type->job_type);
+    info->device = g_strdup(bdrv_get_device_name(job->bs));
+    info->len    = job->len;
+    info->offset = job->offset;
+    info->speed  = job->speed;
+    return info;
+}
diff --git a/blockjob.h b/blockjob.h
index 1b4c7be..347a62f 100644
--- a/blockjob.h
+++ b/blockjob.h
@@ -163,6 +163,14 @@ void block_job_cancel(BlockJob *job);
 bool block_job_is_cancelled(BlockJob *job);
 
 /**
+ * block_job_query:
+ * @job: The job to get information about.
+ *
+ * Return information about a job.
+ */
+BlockJobInfo *block_job_query(BlockJob *job);
+
+/**
  * block_job_cancel_sync:
  * @job: The job to be canceled.
  *
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 05/47] block: add support for job pause/resume
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (3 preceding siblings ...)
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 04/47] block: add block_job_query Paolo Bonzini
@ 2012-07-24 11:03 ` Paolo Bonzini
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 06/47] qmp: add block-job-pause and block-job-resume Paolo Bonzini
                   ` (42 subsequent siblings)
  47 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

Job pausing reuses the existing support for cancellable sleeps.  A pause
happens at the next sleeping point and lasts until the coroutine is
re-entered explicitly.  Cancellation was already doing a forced resume,
so implement it explicitly in terms of resume.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 blockdev.c       |    4 ++++
 blockjob.c       |   35 ++++++++++++++++++++++++++++++-----
 blockjob.h       |   30 ++++++++++++++++++++++++++++++
 qapi-schema.json |    5 ++++-
 qerror.c         |    4 ++++
 qerror.h         |    3 +++
 6 files changed, 75 insertions(+), 6 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index dc099f9..33ac7e0 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1155,6 +1155,10 @@ void qmp_block_job_cancel(const char *device, Error **errp)
         error_set(errp, QERR_BLOCK_JOB_NOT_ACTIVE, device);
         return;
     }
+    if (job->paused) {
+        error_set(errp, QERR_BLOCK_JOB_PAUSED, device);
+        return;
+    }
 
     trace_qmp_block_job_cancel(job);
     block_job_cancel(job);
diff --git a/blockjob.c b/blockjob.c
index a947a6e..5d62191 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -99,14 +99,30 @@ void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
     job->speed = speed;
 }
 
-void block_job_cancel(BlockJob *job)
+void block_job_pause(BlockJob *job)
 {
-    job->cancelled = true;
+    job->paused = true;
+}
+
+bool block_job_is_paused(BlockJob *job)
+{
+    return job->paused;
+}
+
+void block_job_resume(BlockJob *job)
+{
+    job->paused = false;
     if (job->co && !job->busy) {
         qemu_coroutine_enter(job->co, NULL);
     }
 }
 
+void block_job_cancel(BlockJob *job)
+{
+    job->cancelled = true;
+    block_job_resume(job);
+}
+
 bool block_job_is_cancelled(BlockJob *job)
 {
     return job->cancelled;
@@ -154,12 +170,20 @@ int block_job_cancel_sync(BlockJob *job)
 
 void block_job_sleep_ns(BlockJob *job, QEMUClock *clock, int64_t ns)
 {
+    assert(job->busy);
+
     /* Check cancellation *before* setting busy = false, too!  */
-    if (!block_job_is_cancelled(job)) {
-        job->busy = false;
+    if (block_job_is_cancelled(job)) {
+        return;
+    }
+
+    job->busy = false;
+    if (block_job_is_paused(job)) {
+        qemu_coroutine_yield();
+    } else {
         co_sleep_ns(clock, ns);
-        job->busy = true;
     }
+    job->busy = true;
 }
 
 BlockJobInfo *block_job_query(BlockJob *job)
@@ -168,6 +192,7 @@ BlockJobInfo *block_job_query(BlockJob *job)
     info->type   = g_strdup(job->job_type->job_type);
     info->device = g_strdup(bdrv_get_device_name(job->bs));
     info->len    = job->len;
+    info->paused = job->paused;
     info->offset = job->offset;
     info->speed  = job->speed;
     return info;
diff --git a/blockjob.h b/blockjob.h
index 347a62f..ccbef07 100644
--- a/blockjob.h
+++ b/blockjob.h
@@ -70,6 +70,12 @@ struct BlockJob {
     bool cancelled;
 
     /**
+     * Set to true if the job is either paused, or will pause itself
+     * as soon as possible (if busy == true).
+     */
+    bool paused;
+
+    /**
      * Set to false by the job while it is in a quiescent state, where
      * no I/O is pending and the job has yielded on any condition
      * that is not detected by #qemu_aio_wait, such as a timer.
@@ -171,6 +177,30 @@ bool block_job_is_cancelled(BlockJob *job);
 BlockJobInfo *block_job_query(BlockJob *job);
 
 /**
+ * block_job_pause:
+ * @job: The job to be paused.
+ *
+ * Asynchronously pause the specified job.
+ */
+void block_job_pause(BlockJob *job);
+
+/**
+ * block_job_resume:
+ * @job: The job to be resumed.
+ *
+ * Resume the specified job.
+ */
+void block_job_resume(BlockJob *job);
+
+/**
+ * block_job_is_paused:
+ * @job: The job being queried.
+ *
+ * Returns whether the job is currently paused.
+ */
+bool block_job_is_paused(BlockJob *job);
+
+/**
  * block_job_cancel_sync:
  * @job: The job to be canceled.
  *
diff --git a/qapi-schema.json b/qapi-schema.json
index 040981e..f52ce3c 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -912,6 +912,8 @@
 #
 # @len: the maximum progress value
 #
+# @paused: whether the job is paused (since 1.2)
+#
 # @offset: the current progress value
 #
 # @speed: the rate limit, bytes per second
@@ -920,7 +922,7 @@
 ##
 { 'type': 'BlockJobInfo',
   'data': {'type': 'str', 'device': 'str', 'len': 'int',
-           'offset': 'int', 'speed': 'int'} }
+           'offset': 'int', 'paused': 'bool', 'speed': 'int'} }
 
 ##
 # @query-block-jobs:
@@ -1686,6 +1688,7 @@
 #
 # Returns: Nothing on success
 #          If no background operation is active on this device, BlockJobNotActive
+#          If the job is currently paused, BlockJobPaused
 #
 # Since: 1.1
 ##
diff --git a/qerror.c b/qerror.c
index bc672a5..72183ec 100644
--- a/qerror.c
+++ b/qerror.c
@@ -64,6 +64,10 @@ static const QErrorStringTable qerror_table[] = {
         .desc      = "No active block job on device '%(name)'",
     },
     {
+        .error_fmt = QERR_BLOCK_JOB_PAUSED,
+        .desc      = "The block job for device '%(name)' is currently paused",
+    },
+    {
         .error_fmt = QERR_BLOCK_FORMAT_FEATURE_NOT_SUPPORTED,
         .desc      = "Block format '%(format)' used by device '%(name)' does not support feature '%(feature)'",
     },
diff --git a/qerror.h b/qerror.h
index 7cf7d22..d1baea0 100644
--- a/qerror.h
+++ b/qerror.h
@@ -67,6 +67,9 @@ QError *qobject_to_qerror(const QObject *obj);
 #define QERR_BLOCK_JOB_NOT_ACTIVE \
     "{ 'class': 'BlockJobNotActive', 'data': { 'name': %s } }"
 
+#define QERR_BLOCK_JOB_PAUSED \
+    "{ 'class': 'BlockJobPaused', 'data': { 'name': %s } }"
+
 #define QERR_BLOCK_FORMAT_FEATURE_NOT_SUPPORTED \
     "{ 'class': 'BlockFormatFeatureNotSupported', 'data': { 'format': %s, 'name': %s, 'feature': %s } }"
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 06/47] qmp: add block-job-pause and block-job-resume
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (4 preceding siblings ...)
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 05/47] block: add support for job pause/resume Paolo Bonzini
@ 2012-07-24 11:03 ` Paolo Bonzini
  2012-08-01  7:42   ` Kevin Wolf
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 07/47] qemu-iotests: add test for pausing a streaming operation Paolo Bonzini
                   ` (41 subsequent siblings)
  47 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

Add QMP commands matching the functionality.

Paused jobs cannot be canceled without first resuming them.  This
ensures that I/O errors are never missed by management.  However, an
optional force argument can be specified to allow that.  Right now,
jobs do not see the difference between a forced and a normal resume.
This can be changed in the future if needed.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 blockdev.c       |   35 +++++++++++++++++++++++++++++++++--
 hmp-commands.hx  |   35 ++++++++++++++++++++++++++++++++---
 hmp.c            |   23 ++++++++++++++++++++++-
 hmp.h            |    2 ++
 qapi-schema.json |   47 +++++++++++++++++++++++++++++++++++++++++++++--
 qmp-commands.hx  |   12 +++++++++++-
 trace-events     |    2 ++
 7 files changed, 147 insertions(+), 9 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 33ac7e0..d29a408 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1147,15 +1147,20 @@ void qmp_block_job_set_speed(const char *device, int64_t speed, Error **errp)
     block_job_set_speed(job, speed, errp);
 }
 
-void qmp_block_job_cancel(const char *device, Error **errp)
+void qmp_block_job_cancel(const char *device,
+                          bool has_force, bool force, Error **errp)
 {
     BlockJob *job = find_block_job(device);
 
+    if (!has_force) {
+        force = false;
+    }
+
     if (!job) {
         error_set(errp, QERR_BLOCK_JOB_NOT_ACTIVE, device);
         return;
     }
-    if (job->paused) {
+    if (job->paused && !force) {
         error_set(errp, QERR_BLOCK_JOB_PAUSED, device);
         return;
     }
@@ -1164,6 +1169,32 @@ void qmp_block_job_cancel(const char *device, Error **errp)
     block_job_cancel(job);
 }
 
+void qmp_block_job_pause(const char *device, Error **errp)
+{
+    BlockJob *job = find_block_job(device);
+
+    if (!job) {
+        error_set(errp, QERR_BLOCK_JOB_NOT_ACTIVE, device);
+        return;
+    }
+
+    trace_qmp_block_job_pause(job);
+    block_job_pause(job);
+}
+
+void qmp_block_job_resume(const char *device, Error **errp)
+{
+    BlockJob *job = find_block_job(device);
+
+    if (!job) {
+        error_set(errp, QERR_BLOCK_JOB_NOT_ACTIVE, device);
+        return;
+    }
+
+    trace_qmp_block_job_resume(job);
+    block_job_resume(job);
+}
+
 static void do_qmp_query_block_jobs_one(void *opaque, BlockDriverState *bs)
 {
     BlockJobInfoList **prev = opaque;
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 9bbc7f7..3c60626 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -99,9 +99,10 @@ ETEXI
 
     {
         .name       = "block_job_cancel",
-        .args_type  = "device:B",
-        .params     = "device",
-        .help       = "stop an active background block operation",
+        .args_type  = "force:-f,device:B",
+        .params     = "[-f] device",
+        .help       = "stop an active background block operation (use -f"
+                      "\n\t\t\t if the operation is currently paused)",
         .mhandler.cmd = hmp_block_job_cancel,
     },
 
@@ -112,6 +113,34 @@ Stop an active block streaming operation.
 ETEXI
 
     {
+        .name       = "block_job_pause",
+        .args_type  = "device:B",
+        .params     = "device",
+        .help       = "pause an active background block operation",
+        .mhandler.cmd = hmp_block_job_pause,
+    },
+
+STEXI
+@item block_job_pause
+@findex block_job_pause
+Pause an active block streaming operation.
+ETEXI
+
+    {
+        .name       = "block_job_resume",
+        .args_type  = "device:B",
+        .params     = "device",
+        .help       = "resume a paused background block operation",
+        .mhandler.cmd = hmp_block_job_resume,
+    },
+
+STEXI
+@item block_job_resume
+@findex block_job_resume
+Resume a paused block streaming operation.
+ETEXI
+
+    {
         .name       = "eject",
         .args_type  = "force:-f,device:B",
         .params     = "[-f] device",
diff --git a/hmp.c b/hmp.c
index 6b72a64..a4c6629 100644
--- a/hmp.c
+++ b/hmp.c
@@ -865,8 +865,29 @@ void hmp_block_job_cancel(Monitor *mon, const QDict *qdict)
 {
     Error *error = NULL;
     const char *device = qdict_get_str(qdict, "device");
+    bool force = qdict_get_try_bool(qdict, "force", 0);
 
-    qmp_block_job_cancel(device, &error);
+    qmp_block_job_cancel(device, true, force, &error);
+
+    hmp_handle_error(mon, &error);
+}
+
+void hmp_block_job_pause(Monitor *mon, const QDict *qdict)
+{
+    Error *error = NULL;
+    const char *device = qdict_get_str(qdict, "device");
+
+    qmp_block_job_pause(device, &error);
+
+    hmp_handle_error(mon, &error);
+}
+
+void hmp_block_job_resume(Monitor *mon, const QDict *qdict)
+{
+    Error *error = NULL;
+    const char *device = qdict_get_str(qdict, "device");
+
+    qmp_block_job_resume(device, &error);
 
     hmp_handle_error(mon, &error);
 }
diff --git a/hmp.h b/hmp.h
index 8d2b0d7..39a71c4 100644
--- a/hmp.h
+++ b/hmp.h
@@ -59,6 +59,8 @@ void hmp_block_set_io_throttle(Monitor *mon, const QDict *qdict);
 void hmp_block_stream(Monitor *mon, const QDict *qdict);
 void hmp_block_job_set_speed(Monitor *mon, const QDict *qdict);
 void hmp_block_job_cancel(Monitor *mon, const QDict *qdict);
+void hmp_block_job_pause(Monitor *mon, const QDict *qdict);
+void hmp_block_job_resume(Monitor *mon, const QDict *qdict);
 void hmp_migrate(Monitor *mon, const QDict *qdict);
 void hmp_device_del(Monitor *mon, const QDict *qdict);
 void hmp_dump_guest_memory(Monitor *mon, const QDict *qdict);
diff --git a/qapi-schema.json b/qapi-schema.json
index f52ce3c..ef7caab 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1686,13 +1686,56 @@
 #
 # @device: the device name
 #
+# @force: #optional whether to allow cancellation of a paused job (default false)
+#
 # Returns: Nothing on success
 #          If no background operation is active on this device, BlockJobNotActive
-#          If the job is currently paused, BlockJobPaused
+#          If the job is paused and force is absent or false, BlockJobPaused
 #
 # Since: 1.1
 ##
-{ 'command': 'block-job-cancel', 'data': { 'device': 'str' } }
+{ 'command': 'block-job-cancel', 'data': { 'device': 'str', '*force': 'bool' } }
+
+##
+# @block-job-pause:
+#
+# Pause an active background block operation.
+#
+# This command returns immediately after marking the active background block
+# operation for pausing.  It is an error to call this command if no
+# operation is in progress.
+#
+# The operation will pause as soon as possible.  No event is emitted when
+# the operation is actually paused.  Cancelling a paused job automatically
+# resumes it.
+#
+# @device: the device name
+#
+# Returns: Nothing on success
+#          If no background operation is active on this device, DeviceNotActive
+#
+# Since: 1.2
+##
+{ 'command': 'block-job-pause', 'data': { 'device': 'str' } }
+
+##
+# @block-job-resume:
+#
+# Resume an active background block operation.
+#
+# This command returns immediately after resuming a paused background block
+# operation for cancellation.  It is an error to call this command if no
+# operation is in progress.
+#
+# @device: the device name
+#
+# Returns: Nothing on success
+#          If no background operation is active on this device, DeviceNotActive
+#          If cancellation already in progress, DeviceInUse
+#
+# Since: 1.2
+##
+{ 'command': 'block-job-resume', 'data': { 'device': 'str' } }
 
 ##
 # @ObjectTypeInfo:
diff --git a/qmp-commands.hx b/qmp-commands.hx
index e3cf3c5..f0c98a1 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -730,10 +730,20 @@ EQMP
 
     {
         .name       = "block-job-cancel",
-        .args_type  = "device:B",
+        .args_type  = "device:B,force:b?",
         .mhandler.cmd_new = qmp_marshal_input_block_job_cancel,
     },
     {
+        .name       = "block-job-pause",
+        .args_type  = "device:B",
+        .mhandler.cmd_new = qmp_marshal_input_block_job_pause,
+    },
+    {
+        .name       = "block-job-resume",
+        .args_type  = "device:B",
+        .mhandler.cmd_new = qmp_marshal_input_block_job_resume,
+    },
+    {
         .name       = "transaction",
         .args_type  = "actions:q",
         .mhandler.cmd_new = qmp_marshal_input_transaction,
diff --git a/trace-events b/trace-events
index 2034353..416b723 100644
--- a/trace-events
+++ b/trace-events
@@ -77,6 +77,8 @@ stream_start(void *bs, void *base, void *s, void *co, void *opaque) "bs %p base
 
 # blockdev.c
 qmp_block_job_cancel(void *job) "job %p"
+qmp_block_job_pause(void *job) "job %p"
+qmp_block_job_resume(void *job) "job %p"
 block_stream_cb(void *bs, void *job, int ret) "bs %p job %p ret %d"
 qmp_block_stream(void *bs, void *job) "bs %p job %p"
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 07/47] qemu-iotests: add test for pausing a streaming operation
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (5 preceding siblings ...)
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 06/47] qmp: add block-job-pause and block-job-resume Paolo Bonzini
@ 2012-07-24 11:03 ` Paolo Bonzini
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 08/47] block: rename block_job_complete to block_job_completed Paolo Bonzini
                   ` (40 subsequent siblings)
  47 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

These check that a paused streaming job does not advance its offset.

Sometimes the new test fails; the map is different between the source
and the destination of the streaming because qemu-io does not always
pack adjacent clusters that have the same allocated/unallocated state.
However, this also happens with the existing test_stream testcase, and
is better fixed in qemu-io.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 tests/qemu-iotests/030 |   40 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 38 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index cc671dd..0163945 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -18,6 +18,7 @@
 # along with this program.  If not, see <http://www.gnu.org/licenses/>.
 #
 
+import time
 import os
 import iotests
 from iotests import qemu_img, qemu_io
@@ -98,6 +99,43 @@ class TestSingleDrive(ImageStreamingTestCase):
                          qemu_io('-c', 'map', test_img),
                          'image file map does not match backing file after streaming')
 
+    def test_stream_pause(self):
+        self.assert_no_active_streams()
+
+        result = self.vm.qmp('block-stream', device='drive0')
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('block-job-pause', device='drive0')
+        self.assert_qmp(result, 'return', {})
+
+        time.sleep(1)
+        result = self.vm.qmp('query-block-jobs')
+        offset = self.dictpath(result, 'return[0]/offset')
+
+        time.sleep(1)
+        result = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(result, 'return[0]/offset', offset)
+
+        result = self.vm.qmp('block-job-resume', device='drive0')
+        self.assert_qmp(result, 'return', {})
+
+        completed = False
+        while not completed:
+            for event in self.vm.get_qmp_events(wait=True):
+                if event['event'] == 'BLOCK_JOB_COMPLETED':
+                    self.assert_qmp(event, 'data/type', 'stream')
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp(event, 'data/offset', self.image_len)
+                    self.assert_qmp(event, 'data/len', self.image_len)
+                    completed = True
+
+        self.assert_no_active_streams()
+        self.vm.shutdown()
+
+        self.assertEqual(qemu_io('-c', 'map', backing_img),
+                         qemu_io('-c', 'map', test_img),
+                         'image file map does not match backing file after streaming')
+
     def test_stream_partial(self):
         self.assert_no_active_streams()
 
@@ -140,8 +178,6 @@ class TestStreamStop(ImageStreamingTestCase):
         os.remove(backing_img)
 
     def test_stream_stop(self):
-        import time
-
         self.assert_no_active_streams()
 
         result = self.vm.qmp('block-stream', device='drive0')
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 08/47] block: rename block_job_complete to block_job_completed
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (6 preceding siblings ...)
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 07/47] qemu-iotests: add test for pausing a streaming operation Paolo Bonzini
@ 2012-07-24 11:03 ` Paolo Bonzini
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 09/47] block: rename BlockErrorAction, BlockQMPEventAction Paolo Bonzini
                   ` (39 subsequent siblings)
  47 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

The imperative will be used for the QMP command.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/stream.c |    4 ++--
 blockjob.c     |    2 +-
 blockjob.h     |    4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index 5b801b4..b3ede44 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -84,7 +84,7 @@ static void coroutine_fn stream_run(void *opaque)
 
     s->common.len = bdrv_getlength(bs);
     if (s->common.len < 0) {
-        block_job_complete(&s->common, s->common.len);
+        block_job_completed(&s->common, s->common.len);
         return;
     }
 
@@ -161,7 +161,7 @@ wait:
     }
 
     qemu_vfree(buf);
-    block_job_complete(&s->common, ret);
+    block_job_completed(&s->common, ret);
 }
 
 static void stream_set_speed(BlockJob *job, int64_t speed, Error **errp)
diff --git a/blockjob.c b/blockjob.c
index 5d62191..a18da3f 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -71,7 +71,7 @@ void *block_job_create(const BlockJobType *job_type, BlockDriverState *bs,
     return job;
 }
 
-void block_job_complete(BlockJob *job, int ret)
+void block_job_completed(BlockJob *job, int ret)
 {
     BlockDriverState *bs = job->bs;
 
diff --git a/blockjob.h b/blockjob.h
index ccbef07..2abbe13 100644
--- a/blockjob.h
+++ b/blockjob.h
@@ -132,14 +132,14 @@ void *block_job_create(const BlockJobType *job_type, BlockDriverState *bs,
 void block_job_sleep_ns(BlockJob *job, QEMUClock *clock, int64_t ns);
 
 /**
- * block_job_complete:
+ * block_job_completed:
  * @job: The job being completed.
  * @ret: The status code.
  *
  * Call the completion function that was registered at creation time, and
  * free @job.
  */
-void block_job_complete(BlockJob *job, int ret);
+void block_job_completed(BlockJob *job, int ret);
 
 /**
  * block_job_set_speed:
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 09/47] block: rename BlockErrorAction, BlockQMPEventAction
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (7 preceding siblings ...)
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 08/47] block: rename block_job_complete to block_job_completed Paolo Bonzini
@ 2012-07-24 11:03 ` Paolo Bonzini
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 10/47] block: move BlockdevOnError declaration to QAPI Paolo Bonzini
                   ` (38 subsequent siblings)
  47 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

We want to remove knowledge of BLOCK_ERR_STOP_ENOSPC from drivers;
drivers should only be told whether to stop/report/ignore the error.
On the other hand, we want to keep using the nicer BlockErrorAction
name in the drivers.  So rename the enums, while leaving aside the
names of the enum values for now.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c         |    8 ++++----
 block.h         |   12 ++++++------
 block_int.h     |    2 +-
 hw/ide/core.c   |    2 +-
 hw/scsi-disk.c  |    2 +-
 hw/virtio-blk.c |    2 +-
 6 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/block.c b/block.c
index 41e329a..7660992 100644
--- a/block.c
+++ b/block.c
@@ -1153,7 +1153,7 @@ void bdrv_set_dev_ops(BlockDriverState *bs, const BlockDevOps *ops,
 }
 
 void bdrv_emit_qmp_error_event(const BlockDriverState *bdrv,
-                               BlockQMPEventAction action, int is_read)
+                               BlockErrorAction action, int is_read)
 {
     QObject *data;
     const char *action_str;
@@ -2135,14 +2135,14 @@ void bdrv_set_io_limits(BlockDriverState *bs,
     bs->io_limits_enabled = bdrv_io_limits_enabled(bs);
 }
 
-void bdrv_set_on_error(BlockDriverState *bs, BlockErrorAction on_read_error,
-                       BlockErrorAction on_write_error)
+void bdrv_set_on_error(BlockDriverState *bs, BlockdevOnError on_read_error,
+                       BlockdevOnError on_write_error)
 {
     bs->on_read_error = on_read_error;
     bs->on_write_error = on_write_error;
 }
 
-BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read)
+BlockdevOnError bdrv_get_on_error(BlockDriverState *bs, int is_read)
 {
     return is_read ? bs->on_read_error : bs->on_write_error;
 }
diff --git a/block.h b/block.h
index 9b5fca4..5cfa137 100644
--- a/block.h
+++ b/block.h
@@ -91,11 +91,11 @@ typedef struct BlockDevOps {
 typedef enum {
     BLOCK_ERR_REPORT, BLOCK_ERR_IGNORE, BLOCK_ERR_STOP_ENOSPC,
     BLOCK_ERR_STOP_ANY
-} BlockErrorAction;
+} BlockdevOnError;
 
 typedef enum {
     BDRV_ACTION_REPORT, BDRV_ACTION_IGNORE, BDRV_ACTION_STOP
-} BlockQMPEventAction;
+} BlockErrorAction;
 
 void bdrv_iostatus_enable(BlockDriverState *bs);
 void bdrv_iostatus_reset(BlockDriverState *bs);
@@ -103,7 +103,7 @@ void bdrv_iostatus_disable(BlockDriverState *bs);
 bool bdrv_iostatus_is_enabled(const BlockDriverState *bs);
 void bdrv_iostatus_set_err(BlockDriverState *bs, int error);
 void bdrv_emit_qmp_error_event(const BlockDriverState *bdrv,
-                               BlockQMPEventAction action, int is_read);
+                               BlockErrorAction action, int is_read);
 void bdrv_info_print(Monitor *mon, const QObject *data);
 void bdrv_info(Monitor *mon, QObject **ret_data);
 void bdrv_stats_print(Monitor *mon, const QObject *data);
@@ -259,9 +259,9 @@ int bdrv_has_zero_init(BlockDriverState *bs);
 int bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
                       int *pnum);
 
-void bdrv_set_on_error(BlockDriverState *bs, BlockErrorAction on_read_error,
-                       BlockErrorAction on_write_error);
-BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read);
+void bdrv_set_on_error(BlockDriverState *bs, BlockdevOnError on_read_error,
+                       BlockdevOnError on_write_error);
+BlockdevOnError bdrv_get_on_error(BlockDriverState *bs, int is_read);
 int bdrv_is_read_only(BlockDriverState *bs);
 int bdrv_is_sg(BlockDriverState *bs);
 int bdrv_enable_write_cache(BlockDriverState *bs);
diff --git a/block_int.h b/block_int.h
index d4642fb..4cc173d 100644
--- a/block_int.h
+++ b/block_int.h
@@ -253,7 +253,7 @@ struct BlockDriverState {
 
     /* NOTE: the following infos are only hints for real hardware
        drivers. They are not used by the block driver */
-    BlockErrorAction on_read_error, on_write_error;
+    BlockdevOnError on_read_error, on_write_error;
     bool iostatus_enabled;
     BlockDeviceIoStatus iostatus;
     char device_name[32];
diff --git a/hw/ide/core.c b/hw/ide/core.c
index d65ef3d..4823e92 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -559,7 +559,7 @@ void ide_dma_error(IDEState *s)
 static int ide_handle_rw_error(IDEState *s, int error, int op)
 {
     int is_read = (op & BM_STATUS_RETRY_READ);
-    BlockErrorAction action = bdrv_get_on_error(s->bs, is_read);
+    BlockdevOnError action = bdrv_get_on_error(s->bs, is_read);
 
     if (action == BLOCK_ERR_IGNORE) {
         bdrv_emit_qmp_error_event(s->bs, BDRV_ACTION_IGNORE, is_read);
diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index 525816c..40804a6 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -392,7 +392,7 @@ static int scsi_handle_rw_error(SCSIDiskReq *r, int error)
 {
     int is_read = (r->req.cmd.xfer == SCSI_XFER_FROM_DEV);
     SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
-    BlockErrorAction action = bdrv_get_on_error(s->qdev.conf.bs, is_read);
+    BlockdevOnError action = bdrv_get_on_error(s->qdev.conf.bs, is_read);
 
     if (action == BLOCK_ERR_IGNORE) {
         bdrv_emit_qmp_error_event(s->qdev.conf.bs, BDRV_ACTION_IGNORE, is_read);
diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index f21757e..a70fbe4 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -66,7 +66,7 @@ static void virtio_blk_req_complete(VirtIOBlockReq *req, int status)
 static int virtio_blk_handle_rw_error(VirtIOBlockReq *req, int error,
     int is_read)
 {
-    BlockErrorAction action = bdrv_get_on_error(req->dev->bs, is_read);
+    BlockdevOnError action = bdrv_get_on_error(req->dev->bs, is_read);
     VirtIOBlock *s = req->dev;
 
     if (action == BLOCK_ERR_IGNORE) {
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 10/47] block: move BlockdevOnError declaration to QAPI
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (8 preceding siblings ...)
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 09/47] block: rename BlockErrorAction, BlockQMPEventAction Paolo Bonzini
@ 2012-07-24 11:03 ` Paolo Bonzini
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 11/47] block: reorganize io error code Paolo Bonzini
                   ` (37 subsequent siblings)
  47 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

This will let block-stream reuse the enum.  The enum values are
renamed accordingly.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c           |    6 +++---
 block.h           |    5 -----
 blockdev.c        |   12 ++++++------
 hw/fdc.c          |    4 ++--
 hw/ide/core.c     |    6 +++---
 hw/scsi-disk.c    |    6 +++---
 hw/scsi-generic.c |    4 ++--
 hw/virtio-blk.c   |    6 +++---
 qapi-schema.json  |   23 +++++++++++++++++++++++
 9 files changed, 45 insertions(+), 27 deletions(-)

diff --git a/block.c b/block.c
index 7660992..5cd3a4b 100644
--- a/block.c
+++ b/block.c
@@ -3829,9 +3829,9 @@ void bdrv_iostatus_enable(BlockDriverState *bs)
 bool bdrv_iostatus_is_enabled(const BlockDriverState *bs)
 {
     return (bs->iostatus_enabled &&
-           (bs->on_write_error == BLOCK_ERR_STOP_ENOSPC ||
-            bs->on_write_error == BLOCK_ERR_STOP_ANY    ||
-            bs->on_read_error == BLOCK_ERR_STOP_ANY));
+           (bs->on_write_error == BLOCKDEV_ON_ERROR_ENOSPC ||
+            bs->on_write_error == BLOCKDEV_ON_ERROR_STOP   ||
+            bs->on_read_error == BLOCKDEV_ON_ERROR_STOP));
 }
 
 void bdrv_iostatus_disable(BlockDriverState *bs)
diff --git a/block.h b/block.h
index 5cfa137..f0466b8 100644
--- a/block.h
+++ b/block.h
@@ -89,11 +89,6 @@ typedef struct BlockDevOps {
 #define BDRV_SECTOR_MASK   ~(BDRV_SECTOR_SIZE - 1)
 
 typedef enum {
-    BLOCK_ERR_REPORT, BLOCK_ERR_IGNORE, BLOCK_ERR_STOP_ENOSPC,
-    BLOCK_ERR_STOP_ANY
-} BlockdevOnError;
-
-typedef enum {
     BDRV_ACTION_REPORT, BDRV_ACTION_IGNORE, BDRV_ACTION_STOP
 } BlockErrorAction;
 
diff --git a/blockdev.c b/blockdev.c
index d29a408..fccbe3d 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -241,13 +241,13 @@ static void drive_put_ref_bh_schedule(DriveInfo *dinfo)
 static int parse_block_error_action(const char *buf, int is_read)
 {
     if (!strcmp(buf, "ignore")) {
-        return BLOCK_ERR_IGNORE;
+        return BLOCKDEV_ON_ERROR_IGNORE;
     } else if (!is_read && !strcmp(buf, "enospc")) {
-        return BLOCK_ERR_STOP_ENOSPC;
+        return BLOCKDEV_ON_ERROR_ENOSPC;
     } else if (!strcmp(buf, "stop")) {
-        return BLOCK_ERR_STOP_ANY;
+        return BLOCKDEV_ON_ERROR_STOP;
     } else if (!strcmp(buf, "report")) {
-        return BLOCK_ERR_REPORT;
+        return BLOCKDEV_ON_ERROR_REPORT;
     } else {
         error_report("'%s' invalid %s error action",
                      buf, is_read ? "read" : "write");
@@ -432,7 +432,7 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
         return NULL;
     }
 
-    on_write_error = BLOCK_ERR_STOP_ENOSPC;
+    on_write_error = BLOCKDEV_ON_ERROR_ENOSPC;
     if ((buf = qemu_opt_get(opts, "werror")) != NULL) {
         if (type != IF_IDE && type != IF_SCSI && type != IF_VIRTIO && type != IF_NONE) {
             error_report("werror is not supported by this bus type");
@@ -445,7 +445,7 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
         }
     }
 
-    on_read_error = BLOCK_ERR_REPORT;
+    on_read_error = BLOCKDEV_ON_ERROR_REPORT;
     if ((buf = qemu_opt_get(opts, "rerror")) != NULL) {
         if (type != IF_IDE && type != IF_VIRTIO && type != IF_SCSI && type != IF_NONE) {
             error_report("rerror is not supported by this bus type");
diff --git a/hw/fdc.c b/hw/fdc.c
index 08830c1..43b0f20 100644
--- a/hw/fdc.c
+++ b/hw/fdc.c
@@ -1994,11 +1994,11 @@ static int fdctrl_connect_drives(FDCtrl *fdctrl)
         drive->fdctrl = fdctrl;
 
         if (drive->bs) {
-            if (bdrv_get_on_error(drive->bs, 0) != BLOCK_ERR_STOP_ENOSPC) {
+            if (bdrv_get_on_error(drive->bs, 0) != BLOCKDEV_ON_ERROR_ENOSPC) {
                 error_report("fdc doesn't support drive option werror");
                 return -1;
             }
-            if (bdrv_get_on_error(drive->bs, 1) != BLOCK_ERR_REPORT) {
+            if (bdrv_get_on_error(drive->bs, 1) != BLOCKDEV_ON_ERROR_REPORT) {
                 error_report("fdc doesn't support drive option rerror");
                 return -1;
             }
diff --git a/hw/ide/core.c b/hw/ide/core.c
index 4823e92..73f830d 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -561,13 +561,13 @@ static int ide_handle_rw_error(IDEState *s, int error, int op)
     int is_read = (op & BM_STATUS_RETRY_READ);
     BlockdevOnError action = bdrv_get_on_error(s->bs, is_read);
 
-    if (action == BLOCK_ERR_IGNORE) {
+    if (action == BLOCKDEV_ON_ERROR_IGNORE) {
         bdrv_emit_qmp_error_event(s->bs, BDRV_ACTION_IGNORE, is_read);
         return 0;
     }
 
-    if ((error == ENOSPC && action == BLOCK_ERR_STOP_ENOSPC)
-            || action == BLOCK_ERR_STOP_ANY) {
+    if ((error == ENOSPC && action == BLOCKDEV_ON_ERROR_ENOSPC)
+            || action == BLOCKDEV_ON_ERROR_STOP) {
         s->bus->dma->ops->set_unit(s->bus->dma, s->unit);
         s->bus->error_status = op;
         bdrv_emit_qmp_error_event(s->bs, BDRV_ACTION_STOP, is_read);
diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index 40804a6..39d0053 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -394,13 +394,13 @@ static int scsi_handle_rw_error(SCSIDiskReq *r, int error)
     SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
     BlockdevOnError action = bdrv_get_on_error(s->qdev.conf.bs, is_read);
 
-    if (action == BLOCK_ERR_IGNORE) {
+    if (action == BLOCKDEV_ON_ERROR_IGNORE) {
         bdrv_emit_qmp_error_event(s->qdev.conf.bs, BDRV_ACTION_IGNORE, is_read);
         return 0;
     }
 
-    if ((error == ENOSPC && action == BLOCK_ERR_STOP_ENOSPC)
-            || action == BLOCK_ERR_STOP_ANY) {
+    if ((error == ENOSPC && action == BLOCKDEV_ON_ERROR_ENOSPC)
+            || action == BLOCKDEV_ON_ERROR_STOP) {
 
         bdrv_emit_qmp_error_event(s->qdev.conf.bs, BDRV_ACTION_STOP, is_read);
         vm_stop(RUN_STATE_IO_ERROR);
diff --git a/hw/scsi-generic.c b/hw/scsi-generic.c
index 8d51060..ba40be6 100644
--- a/hw/scsi-generic.c
+++ b/hw/scsi-generic.c
@@ -400,11 +400,11 @@ static int scsi_generic_initfn(SCSIDevice *s)
         return -1;
     }
 
-    if (bdrv_get_on_error(s->conf.bs, 0) != BLOCK_ERR_STOP_ENOSPC) {
+    if (bdrv_get_on_error(s->conf.bs, 0) != BLOCKDEV_ON_ERROR_ENOSPC) {
         error_report("Device doesn't support drive option werror");
         return -1;
     }
-    if (bdrv_get_on_error(s->conf.bs, 1) != BLOCK_ERR_REPORT) {
+    if (bdrv_get_on_error(s->conf.bs, 1) != BLOCKDEV_ON_ERROR_REPORT) {
         error_report("Device doesn't support drive option rerror");
         return -1;
     }
diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index a70fbe4..095e84c 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -69,13 +69,13 @@ static int virtio_blk_handle_rw_error(VirtIOBlockReq *req, int error,
     BlockdevOnError action = bdrv_get_on_error(req->dev->bs, is_read);
     VirtIOBlock *s = req->dev;
 
-    if (action == BLOCK_ERR_IGNORE) {
+    if (action == BLOCKDEV_ON_ERROR_IGNORE) {
         bdrv_emit_qmp_error_event(s->bs, BDRV_ACTION_IGNORE, is_read);
         return 0;
     }
 
-    if ((error == ENOSPC && action == BLOCK_ERR_STOP_ENOSPC)
-            || action == BLOCK_ERR_STOP_ANY) {
+    if ((error == ENOSPC && action == BLOCKDEV_ON_ERROR_ENOSPC)
+            || action == BLOCKDEV_ON_ERROR_STOP) {
         req->next = s->rq;
         s->rq = req;
         bdrv_emit_qmp_error_event(s->bs, BDRV_ACTION_STOP, is_read);
diff --git a/qapi-schema.json b/qapi-schema.json
index ef7caab..136ce5e 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -902,6 +902,29 @@
 { 'command': 'query-pci', 'returns': ['PciInfo'] }
 
 ##
+# @BlockdevOnError:
+#
+# An enumeration of possible behaviors for errors on I/O operations.
+# The exact meaning depends on whether the I/O was initiated by a guest
+# or by a block job
+#
+# @report: for guest operations, report the error to the guest;
+#          for jobs, cancel the job
+#
+# @ignore: ignore the error, only report a QMP event (BLOCK_IO_ERROR
+#          or BLOCK_JOB_ERROR)
+#
+# @stop: for guest operations, stop the virtual machine;
+#        for jobs, pause the job
+#
+# @enospc: same as @stop on ENOSPC, same as @report otherwise.
+#
+# Since: 1.2
+##
+{ 'enum': 'BlockdevOnError',
+  'data': ['report', 'ignore', 'enospc', 'stop'] }
+
+##
 # @BlockJobInfo:
 #
 # Information about a long-running block device operation.
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 11/47] block: reorganize io error code
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (9 preceding siblings ...)
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 10/47] block: move BlockdevOnError declaration to QAPI Paolo Bonzini
@ 2012-07-24 11:03 ` Paolo Bonzini
  2012-08-01  9:30   ` Kevin Wolf
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 12/47] block: sort BlockDeviceIoStatus errors by severity Paolo Bonzini
                   ` (36 subsequent siblings)
  47 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

Move the common part of IDE/SCSI/virtio error handling to the block
layer.  The new function bdrv_error_action subsumes all three of
bdrv_emit_qmp_error_event, vm_stop, bdrv_iostatus_set_err.

The same scheme will be used for errors in block jobs.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c         |   46 ++++++++++++++++++++++++++++++++++++++--------
 block.h         |    5 +++--
 hw/ide/core.c   |   20 +++++---------------
 hw/scsi-disk.c  |   23 +++++++----------------
 hw/virtio-blk.c |   19 +++++--------------
 qemu-tool.c     |    6 ++++++
 6 files changed, 64 insertions(+), 55 deletions(-)

diff --git a/block.c b/block.c
index 5cd3a4b..333a8fd 100644
--- a/block.c
+++ b/block.c
@@ -29,6 +29,7 @@
 #include "blockjob.h"
 #include "module.h"
 #include "qjson.h"
+#include "sysemu.h"
 #include "qemu-coroutine.h"
 #include "qmp-commands.h"
 #include "qemu-timer.h"
@@ -1152,8 +1153,8 @@ void bdrv_set_dev_ops(BlockDriverState *bs, const BlockDevOps *ops,
     }
 }
 
-void bdrv_emit_qmp_error_event(const BlockDriverState *bdrv,
-                               BlockErrorAction action, int is_read)
+static void bdrv_emit_qmp_error_event(const BlockDriverState *bdrv,
+                                      BlockErrorAction action, int is_read)
 {
     QObject *data;
     const char *action_str;
@@ -2147,6 +2148,39 @@ BlockdevOnError bdrv_get_on_error(BlockDriverState *bs, int is_read)
     return is_read ? bs->on_read_error : bs->on_write_error;
 }
 
+BlockErrorAction bdrv_get_error_action(BlockDriverState *bs, int is_read, int error)
+{
+    BlockdevOnError on_err = is_read ? bs->on_read_error : bs->on_write_error;
+
+    switch (on_err) {
+    case BLOCKDEV_ON_ERROR_ENOSPC:
+        return (error == ENOSPC) ? BDRV_ACTION_STOP : BDRV_ACTION_REPORT;
+    case BLOCKDEV_ON_ERROR_STOP:
+        return BDRV_ACTION_STOP;
+    case BLOCKDEV_ON_ERROR_REPORT:
+        return BDRV_ACTION_REPORT;
+    case BLOCKDEV_ON_ERROR_IGNORE:
+        return BDRV_ACTION_IGNORE;
+    default:
+        abort();
+    }
+}
+
+/* This is done by device models because, while the block layer knows
+ * about the error, it does not know whether an operation comes from
+ * the device or the block layer (from a job, for example).
+ */
+void bdrv_error_action(BlockDriverState *bs, BlockErrorAction action,
+                       int is_read, int error)
+{
+    assert(error >= 0);
+    bdrv_emit_qmp_error_event(bs, action, is_read);
+    if (action == BDRV_ACTION_STOP) {
+        vm_stop(RUN_STATE_IO_ERROR);
+        bdrv_iostatus_set_err(bs, error);
+    }
+}
+
 int bdrv_is_read_only(BlockDriverState *bs)
 {
     return bs->read_only;
@@ -3846,14 +3880,10 @@ void bdrv_iostatus_reset(BlockDriverState *bs)
     }
 }
 
-/* XXX: Today this is set by device models because it makes the implementation
-   quite simple. However, the block layer knows about the error, so it's
-   possible to implement this without device models being involved */
 void bdrv_iostatus_set_err(BlockDriverState *bs, int error)
 {
-    if (bdrv_iostatus_is_enabled(bs) &&
-        bs->iostatus == BLOCK_DEVICE_IO_STATUS_OK) {
-        assert(error >= 0);
+    assert(bdrv_iostatus_is_enabled(bs));
+    if (bs->iostatus == BLOCK_DEVICE_IO_STATUS_OK) {
         bs->iostatus = error == ENOSPC ? BLOCK_DEVICE_IO_STATUS_NOSPACE :
                                          BLOCK_DEVICE_IO_STATUS_FAILED;
     }
diff --git a/block.h b/block.h
index f0466b8..935e04e 100644
--- a/block.h
+++ b/block.h
@@ -97,8 +97,6 @@ void bdrv_iostatus_reset(BlockDriverState *bs);
 void bdrv_iostatus_disable(BlockDriverState *bs);
 bool bdrv_iostatus_is_enabled(const BlockDriverState *bs);
 void bdrv_iostatus_set_err(BlockDriverState *bs, int error);
-void bdrv_emit_qmp_error_event(const BlockDriverState *bdrv,
-                               BlockErrorAction action, int is_read);
 void bdrv_info_print(Monitor *mon, const QObject *data);
 void bdrv_info(Monitor *mon, QObject **ret_data);
 void bdrv_stats_print(Monitor *mon, const QObject *data);
@@ -257,6 +255,9 @@ int bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
 void bdrv_set_on_error(BlockDriverState *bs, BlockdevOnError on_read_error,
                        BlockdevOnError on_write_error);
 BlockdevOnError bdrv_get_on_error(BlockDriverState *bs, int is_read);
+BlockErrorAction bdrv_get_error_action(BlockDriverState *bs, int is_read, int error);
+void bdrv_error_action(BlockDriverState *bs, BlockErrorAction action,
+                       int is_read, int error);
 int bdrv_is_read_only(BlockDriverState *bs);
 int bdrv_is_sg(BlockDriverState *bs);
 int bdrv_enable_write_cache(BlockDriverState *bs);
diff --git a/hw/ide/core.c b/hw/ide/core.c
index 73f830d..cb5ca4b 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -559,31 +559,21 @@ void ide_dma_error(IDEState *s)
 static int ide_handle_rw_error(IDEState *s, int error, int op)
 {
     int is_read = (op & BM_STATUS_RETRY_READ);
-    BlockdevOnError action = bdrv_get_on_error(s->bs, is_read);
+    BlockErrorAction action = bdrv_get_error_action(s->bs, is_read, error);
 
-    if (action == BLOCKDEV_ON_ERROR_IGNORE) {
-        bdrv_emit_qmp_error_event(s->bs, BDRV_ACTION_IGNORE, is_read);
-        return 0;
-    }
-
-    if ((error == ENOSPC && action == BLOCKDEV_ON_ERROR_ENOSPC)
-            || action == BLOCKDEV_ON_ERROR_STOP) {
+    if (action == BDRV_ACTION_STOP) {
         s->bus->dma->ops->set_unit(s->bus->dma, s->unit);
         s->bus->error_status = op;
-        bdrv_emit_qmp_error_event(s->bs, BDRV_ACTION_STOP, is_read);
-        vm_stop(RUN_STATE_IO_ERROR);
-        bdrv_iostatus_set_err(s->bs, error);
-    } else {
+    } else if (action == BDRV_ACTION_REPORT) {
         if (op & BM_STATUS_DMA_RETRY) {
             dma_buf_commit(s);
             ide_dma_error(s);
         } else {
             ide_rw_error(s);
         }
-        bdrv_emit_qmp_error_event(s->bs, BDRV_ACTION_REPORT, is_read);
     }
-
-    return 1;
+    bdrv_error_action(s->bs, action, is_read, error);
+    return action != BDRV_ACTION_IGNORE;
 }
 
 void ide_dma_cb(void *opaque, int ret)
diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index 39d0053..39a2830 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -392,21 +392,9 @@ static int scsi_handle_rw_error(SCSIDiskReq *r, int error)
 {
     int is_read = (r->req.cmd.xfer == SCSI_XFER_FROM_DEV);
     SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
-    BlockdevOnError action = bdrv_get_on_error(s->qdev.conf.bs, is_read);
+    BlockErrorAction action = bdrv_get_error_action(s->qdev.conf.bs, is_read, error);
 
-    if (action == BLOCKDEV_ON_ERROR_IGNORE) {
-        bdrv_emit_qmp_error_event(s->qdev.conf.bs, BDRV_ACTION_IGNORE, is_read);
-        return 0;
-    }
-
-    if ((error == ENOSPC && action == BLOCKDEV_ON_ERROR_ENOSPC)
-            || action == BLOCKDEV_ON_ERROR_STOP) {
-
-        bdrv_emit_qmp_error_event(s->qdev.conf.bs, BDRV_ACTION_STOP, is_read);
-        vm_stop(RUN_STATE_IO_ERROR);
-        bdrv_iostatus_set_err(s->qdev.conf.bs, error);
-        scsi_req_retry(&r->req);
-    } else {
+    if (action == BDRV_ACTION_REPORT) {
         switch (error) {
         case ENOMEDIUM:
             scsi_check_condition(r, SENSE_CODE(NO_MEDIUM));
@@ -421,9 +409,12 @@ static int scsi_handle_rw_error(SCSIDiskReq *r, int error)
             scsi_check_condition(r, SENSE_CODE(IO_ERROR));
             break;
         }
-        bdrv_emit_qmp_error_event(s->qdev.conf.bs, BDRV_ACTION_REPORT, is_read);
     }
-    return 1;
+    bdrv_error_action(s->qdev.conf.bs, action, is_read, error);
+    if (action == BDRV_ACTION_STOP) {
+        scsi_req_retry(&r->req);
+    }
+    return action != BDRV_ACTION_IGNORE;
 }
 
 static void scsi_write_complete(void * opaque, int ret)
diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index 095e84c..d8031ec 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -66,29 +66,20 @@ static void virtio_blk_req_complete(VirtIOBlockReq *req, int status)
 static int virtio_blk_handle_rw_error(VirtIOBlockReq *req, int error,
     int is_read)
 {
-    BlockdevOnError action = bdrv_get_on_error(req->dev->bs, is_read);
+    BlockErrorAction action = bdrv_get_error_action(req->dev->bs, is_read, error);
     VirtIOBlock *s = req->dev;
 
-    if (action == BLOCKDEV_ON_ERROR_IGNORE) {
-        bdrv_emit_qmp_error_event(s->bs, BDRV_ACTION_IGNORE, is_read);
-        return 0;
-    }
-
-    if ((error == ENOSPC && action == BLOCKDEV_ON_ERROR_ENOSPC)
-            || action == BLOCKDEV_ON_ERROR_STOP) {
+    if (action == BDRV_ACTION_STOP) {
         req->next = s->rq;
         s->rq = req;
-        bdrv_emit_qmp_error_event(s->bs, BDRV_ACTION_STOP, is_read);
-        vm_stop(RUN_STATE_IO_ERROR);
-        bdrv_iostatus_set_err(s->bs, error);
-    } else {
+    } else if (action == BDRV_ACTION_REPORT) {
         virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
         bdrv_acct_done(s->bs, &req->acct);
         g_free(req);
-        bdrv_emit_qmp_error_event(s->bs, BDRV_ACTION_REPORT, is_read);
     }
 
-    return 1;
+    bdrv_error_action(s->bs, action, is_read, error);
+    return action != BDRV_ACTION_IGNORE;
 }
 
 static void virtio_blk_rw_complete(void *opaque, int ret)
diff --git a/qemu-tool.c b/qemu-tool.c
index 318c5fc..a59c24c 100644
--- a/qemu-tool.c
+++ b/qemu-tool.c
@@ -19,6 +19,7 @@
 #include "qemu-log.h"
 #include "migration.h"
 #include "main-loop.h"
+#include "sysemu.h"
 #include "qemu_socket.h"
 #include "slirp/libslirp.h"
 
@@ -32,6 +33,11 @@ struct QEMUBH
 
 Monitor *cur_mon;
 
+void vm_stop(RunState state)
+{
+    abort();
+}
+
 int monitor_cur_is_qmp(void)
 {
     return 0;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 12/47] block: sort BlockDeviceIoStatus errors by severity
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (10 preceding siblings ...)
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 11/47] block: reorganize io error code Paolo Bonzini
@ 2012-07-24 11:03 ` Paolo Bonzini
  2012-08-01  9:44   ` Paolo Bonzini
  2012-08-01  9:44   ` Kevin Wolf
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 13/47] block: introduce block job error Paolo Bonzini
                   ` (35 subsequent siblings)
  47 siblings, 2 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

This does not let a "failed" (EIO) status override a "nospace" status.
When several concurrent asynchronous operations fail, management will
always observe the most severe condition.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c          |   11 ++++++++---
 qapi-schema.json |    2 +-
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index 333a8fd..dce07b3 100644
--- a/block.c
+++ b/block.c
@@ -3883,9 +3883,14 @@ void bdrv_iostatus_reset(BlockDriverState *bs)
 void bdrv_iostatus_set_err(BlockDriverState *bs, int error)
 {
     assert(bdrv_iostatus_is_enabled(bs));
-    if (bs->iostatus == BLOCK_DEVICE_IO_STATUS_OK) {
-        bs->iostatus = error == ENOSPC ? BLOCK_DEVICE_IO_STATUS_NOSPACE :
-                                         BLOCK_DEVICE_IO_STATUS_FAILED;
+    BlockDeviceIoStatus new_status =
+        (error == ENOSPC ? BLOCK_DEVICE_IO_STATUS_NOSPACE :
+                           BLOCK_DEVICE_IO_STATUS_FAILED);
+
+    /* iostatus values are sorted from less severe to most severe
+     * (ok, nospace, failed).  */
+    if (bs->iostatus < new_status) {
+        bs->iostatus = new_status;
     }
 }
 
diff --git a/qapi-schema.json b/qapi-schema.json
index 136ce5e..2dee7c3 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -435,7 +435,7 @@
 #
 # Since: 1.0
 ##
-{ 'enum': 'BlockDeviceIoStatus', 'data': [ 'ok', 'failed', 'nospace' ] }
+{ 'enum': 'BlockDeviceIoStatus', 'data': [ 'ok', 'nospace', 'failed' ] }
 
 ##
 # @BlockInfo:
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 13/47] block: introduce block job error
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (11 preceding siblings ...)
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 12/47] block: sort BlockDeviceIoStatus errors by severity Paolo Bonzini
@ 2012-07-24 11:03 ` Paolo Bonzini
  2012-07-25 17:40   ` Eric Blake
  2012-08-01 10:14   ` Kevin Wolf
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 14/47] stream: add on-error argument Paolo Bonzini
                   ` (34 subsequent siblings)
  47 siblings, 2 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

The following behaviors are possible:

'report': The behavior is the same as in 1.1.  An I/O error,
respectively during a read or a write, will complete the job immediately
with an error code.

'ignore': An I/O error, respectively during a read or a write, will be
ignored.  For streaming, the job will complete with an error and the
backing file will be left in place.  For mirroring, the sector will be
marked again as dirty and re-examined later.

'stop': The job will be paused and the job iostatus will be set to
failed or nospace, while the VM will keep running.  This can only be
specified if the block device has rerror=stop and werror=stop or enospc.

'enospc': Behaves as 'stop' for ENOSPC errors, 'report' for others.

In all cases, even for 'report', the I/O error is reported as a QMP
event BLOCK_JOB_ERROR, with the same arguments as BLOCK_IO_ERROR.

It is possible that while stopping the VM a BLOCK_IO_ERROR event will be
reported and will clobber the event from BLOCK_JOB_ERROR, or vice versa.
This is not really avoidable since stopping the VM completes all pending
I/O requests.  In fact, it is already possible now that a series of
BLOCK_IO_ERROR events are reported with rerror=stop, because vm_stop
calls bdrv_drain_all and this can generate further errors.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 QMP/qmp-events.txt |   23 +++++++++++++++++++
 block.c            |    9 ++++----
 block_int.h        |    4 ++++
 blockjob.c         |   62 +++++++++++++++++++++++++++++++++++++++++++++++-----
 blockjob.h         |   17 ++++++++++++++
 monitor.c          |    1 +
 monitor.h          |    1 +
 qapi-schema.json   |    5 ++++-
 8 files changed, 111 insertions(+), 11 deletions(-)

diff --git a/QMP/qmp-events.txt b/QMP/qmp-events.txt
index 9ba7079..e910deb 100644
--- a/QMP/qmp-events.txt
+++ b/QMP/qmp-events.txt
@@ -353,3 +353,26 @@ Example:
 { "event": "BALLOON_CHANGE",
     "data": { "actual": 944766976 },
     "timestamp": { "seconds": 1267020223, "microseconds": 435656 } }
+
+
+BLOCK_JOB_ERROR
+---------------
+
+Emitted when a block job encounters an error.
+
+Data:
+
+- "device": device name (json-string)
+- "operation": I/O operation (json-string, "read" or "write")
+- "action": action that has been taken, it's one of the following (json-string):
+    "ignore": error has been ignored, the job may fail later
+    "report": error will be reported and the job canceled
+    "stop": error caused job to be paused
+
+Example:
+
+{ "event": "BLOCK_JOB_ERROR",
+    "data": { "device": "ide0-hd1",
+              "operation": "write",
+              "action": "stop" },
+    "timestamp": { "seconds": 1265044230, "microseconds": 450486 } }
diff --git a/block.c b/block.c
index dce07b3..44542e5 100644
--- a/block.c
+++ b/block.c
@@ -1153,8 +1153,9 @@ void bdrv_set_dev_ops(BlockDriverState *bs, const BlockDevOps *ops,
     }
 }
 
-static void bdrv_emit_qmp_error_event(const BlockDriverState *bdrv,
-                                      BlockErrorAction action, int is_read)
+void bdrv_emit_qmp_error_event(const BlockDriverState *bdrv,
+                               enum MonitorEvent ev,
+                               BlockErrorAction action, int is_read)
 {
     QObject *data;
     const char *action_str;
@@ -1177,7 +1178,7 @@ static void bdrv_emit_qmp_error_event(const BlockDriverState *bdrv,
                               bdrv->device_name,
                               action_str,
                               is_read ? "read" : "write");
-    monitor_protocol_event(QEVENT_BLOCK_IO_ERROR, data);
+    monitor_protocol_event(ev, data);
 
     qobject_decref(data);
 }
@@ -2174,7 +2175,7 @@ void bdrv_error_action(BlockDriverState *bs, BlockErrorAction action,
                        int is_read, int error)
 {
     assert(error >= 0);
-    bdrv_emit_qmp_error_event(bs, action, is_read);
+    bdrv_emit_qmp_error_event(bs, QEVENT_BLOCK_IO_ERROR, action, is_read);
     if (action == BDRV_ACTION_STOP) {
         vm_stop(RUN_STATE_IO_ERROR);
         bdrv_iostatus_set_err(bs, error);
diff --git a/block_int.h b/block_int.h
index 4cc173d..92c106a 100644
--- a/block_int.h
+++ b/block_int.h
@@ -30,6 +30,7 @@
 #include "qemu-coroutine.h"
 #include "qemu-timer.h"
 #include "qapi-types.h"
+#include "monitor.h"
 
 #define BLOCK_FLAG_ENCRYPT	1
 #define BLOCK_FLAG_COMPAT6	4
@@ -276,6 +277,9 @@ void bdrv_set_io_limits(BlockDriverState *bs,
 #ifdef _WIN32
 int is_windows_drive(const char *filename);
 #endif
+void bdrv_emit_qmp_error_event(const BlockDriverState *bdrv,
+                               enum MonitorEvent ev,
+                               BlockErrorAction action, int is_read);
 
 /**
  * stream_start:
diff --git a/blockjob.c b/blockjob.c
index a18da3f..562e0b5 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -112,6 +112,7 @@ bool block_job_is_paused(BlockJob *job)
 void block_job_resume(BlockJob *job)
 {
     job->paused = false;
+    job->iostatus = BLOCK_DEVICE_IO_STATUS_OK;
     if (job->co && !job->busy) {
         qemu_coroutine_enter(job->co, NULL);
     }
@@ -189,11 +190,60 @@ void block_job_sleep_ns(BlockJob *job, QEMUClock *clock, int64_t ns)
 BlockJobInfo *block_job_query(BlockJob *job)
 {
     BlockJobInfo *info = g_new(BlockJobInfo, 1);
-    info->type   = g_strdup(job->job_type->job_type);
-    info->device = g_strdup(bdrv_get_device_name(job->bs));
-    info->len    = job->len;
-    info->paused = job->paused;
-    info->offset = job->offset;
-    info->speed  = job->speed;
+    info->type      = g_strdup(job->job_type->job_type);
+    info->device    = g_strdup(bdrv_get_device_name(job->bs));
+    info->len       = job->len;
+    info->paused    = job->paused;
+    info->offset    = job->offset;
+    info->speed     = job->speed;
+    info->io_status = job->iostatus;
     return info;
 }
+
+static void block_job_iostatus_set_err(BlockJob *job, int error)
+{
+    BlockDeviceIoStatus new_status =
+        (error == ENOSPC ? BLOCK_DEVICE_IO_STATUS_NOSPACE :
+                           BLOCK_DEVICE_IO_STATUS_FAILED);
+
+    /* iostatus values are sorted from less severe to most severe
+     * (ok, nospace, failed).  */
+    if (job->iostatus < new_status) {
+        job->iostatus = new_status;
+    }
+}
+
+
+BlockErrorAction block_job_error_action(BlockJob *job, BlockDriverState *bs,
+                                        BlockdevOnError on_err,
+                                        int is_read, int error)
+{
+    BlockErrorAction action;
+
+    switch (on_err) {
+    case BLOCKDEV_ON_ERROR_ENOSPC:
+        action = (error == ENOSPC) ? BDRV_ACTION_STOP : BDRV_ACTION_REPORT;
+        break;
+    case BLOCKDEV_ON_ERROR_STOP:
+        action = BDRV_ACTION_STOP;
+        break;
+    case BLOCKDEV_ON_ERROR_REPORT:
+        action = BDRV_ACTION_REPORT;
+        break;
+    case BLOCKDEV_ON_ERROR_IGNORE:
+        action = BDRV_ACTION_IGNORE;
+        break;
+    default:
+        abort();
+    }
+    bdrv_emit_qmp_error_event(job->bs, QEVENT_BLOCK_JOB_ERROR, action, is_read);
+    if (action == BDRV_ACTION_STOP) {
+        block_job_pause(job);
+        if (bs == job->bs) {
+            block_job_iostatus_set_err(job, error);
+        } else {
+            bdrv_iostatus_set_err(bs, error);
+        }
+    }
+    return action;
+}
diff --git a/blockjob.h b/blockjob.h
index 2abbe13..b17ee2e 100644
--- a/blockjob.h
+++ b/blockjob.h
@@ -82,6 +82,9 @@ struct BlockJob {
      */
     bool busy;
 
+    /** Status that is published by the query-block-jobs QMP API */
+    BlockDeviceIoStatus iostatus;
+
     /** Offset that is published by the query-block-jobs QMP API */
     int64_t offset;
 
@@ -216,4 +219,18 @@ bool block_job_is_paused(BlockJob *job);
  */
 int block_job_cancel_sync(BlockJob *job);
 
+/**
+ * block_job_error_action:
+ * @job: The job to signal an error for.
+ * @bs: The block device on which to set an I/O error.
+ * @on_err: The error action setting.
+ * @is_read: Whether the operation was a read.
+ * @error: The error that was reported.
+ *
+ * Report an I/O error for a block job and possibly stop the VM.  Return the
+ * action that was selected based on @on_err and @error.
+ */
+BlockErrorAction block_job_error_action(BlockJob *job, BlockDriverState *bs,
+                                        BlockdevOnError on_err,
+                                        int is_read, int error);
 #endif
diff --git a/monitor.c b/monitor.c
index 49dccfe..19da71d 100644
--- a/monitor.c
+++ b/monitor.c
@@ -454,6 +454,7 @@ static const char *monitor_event_names[] = {
     [QEVENT_SPICE_DISCONNECTED] = "SPICE_DISCONNECTED",
     [QEVENT_BLOCK_JOB_COMPLETED] = "BLOCK_JOB_COMPLETED",
     [QEVENT_BLOCK_JOB_CANCELLED] = "BLOCK_JOB_CANCELLED",
+    [QEVENT_BLOCK_JOB_ERROR] = "BLOCK_JOB_ERROR",
     [QEVENT_DEVICE_TRAY_MOVED] = "DEVICE_TRAY_MOVED",
     [QEVENT_SUSPEND] = "SUSPEND",
     [QEVENT_WAKEUP] = "WAKEUP",
diff --git a/monitor.h b/monitor.h
index 5f4de1b..f806962 100644
--- a/monitor.h
+++ b/monitor.h
@@ -38,6 +38,7 @@ typedef enum MonitorEvent {
     QEVENT_SPICE_DISCONNECTED,
     QEVENT_BLOCK_JOB_COMPLETED,
     QEVENT_BLOCK_JOB_CANCELLED,
+    QEVENT_BLOCK_JOB_ERROR,
     QEVENT_DEVICE_TRAY_MOVED,
     QEVENT_SUSPEND,
     QEVENT_WAKEUP,
diff --git a/qapi-schema.json b/qapi-schema.json
index 2dee7c3..d7191f3 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -941,11 +941,14 @@
 #
 # @speed: the rate limit, bytes per second
 #
+# @io-status: the status of the job (since 1.2)
+#
 # Since: 1.1
 ##
 { 'type': 'BlockJobInfo',
   'data': {'type': 'str', 'device': 'str', 'len': 'int',
-           'offset': 'int', 'paused': 'bool', 'speed': 'int'} }
+           'offset': 'int', 'paused': 'bool', 'speed': 'int',
+           'io-status': 'BlockDeviceIoStatus'} }
 
 ##
 # @query-block-jobs:
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 14/47] stream: add on-error argument
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (12 preceding siblings ...)
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 13/47] block: introduce block job error Paolo Bonzini
@ 2012-07-24 11:03 ` Paolo Bonzini
  2012-07-31 18:40   ` Eric Blake
  2012-08-01 10:29   ` Kevin Wolf
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 15/47] blkdebug: process all set_state rules in the old state Paolo Bonzini
                   ` (33 subsequent siblings)
  47 siblings, 2 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

This patch adds support for error management to streaming.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/stream.c   |   28 +++++++++++++++++++++++++++-
 block_int.h      |    3 ++-
 blockdev.c       |   11 ++++++++---
 hmp.c            |    3 ++-
 qapi-schema.json |   10 ++++++++--
 qmp-commands.hx  |    2 +-
 6 files changed, 48 insertions(+), 9 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index b3ede44..03cae14 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -31,6 +31,7 @@ typedef struct StreamBlockJob {
     BlockJob common;
     RateLimit limit;
     BlockDriverState *base;
+    BlockdevOnError on_error;
     char backing_file_id[1024];
 } StreamBlockJob;
 
@@ -78,6 +79,7 @@ static void coroutine_fn stream_run(void *opaque)
     BlockDriverState *bs = s->common.bs;
     BlockDriverState *base = s->base;
     int64_t sector_num, end;
+    int error = 0;
     int ret = 0;
     int n = 0;
     void *buf;
@@ -136,7 +138,19 @@ wait:
             ret = stream_populate(bs, sector_num, n, buf);
         }
         if (ret < 0) {
-            break;
+            BlockErrorAction action =
+                block_job_error_action(&s->common, s->common.bs, s->on_error,
+                                       true, -ret);
+            if (action == BDRV_ACTION_STOP) {
+                n = 0;
+                continue;
+            }
+            if (error == 0) {
+                error = ret;
+            }
+            if (action == BDRV_ACTION_REPORT) {
+                break;
+            }
         }
         ret = 0;
 
@@ -148,6 +162,9 @@ wait:
         bdrv_disable_copy_on_read(bs);
     }
 
+    /* Do not remove the backing file if an error was there but ignored.  */
+    ret = error;
+
     if (!block_job_is_cancelled(&s->common) && sector_num == end && ret == 0) {
         const char *base_id = NULL, *base_fmt = NULL;
         if (base) {
@@ -183,11 +200,19 @@ static BlockJobType stream_job_type = {
 
 void stream_start(BlockDriverState *bs, BlockDriverState *base,
                   const char *base_id, int64_t speed,
+                  BlockdevOnError on_error,
                   BlockDriverCompletionFunc *cb,
                   void *opaque, Error **errp)
 {
     StreamBlockJob *s;
 
+    if ((on_error == BLOCKDEV_ON_ERROR_STOP ||
+         on_error == BLOCKDEV_ON_ERROR_ENOSPC) &&
+        !bdrv_iostatus_is_enabled(bs)) {
+        error_set(errp, QERR_INVALID_PARAMETER, "on-error");
+        return;
+    }
+
     s = block_job_create(&stream_job_type, bs, speed, cb, opaque, errp);
     if (!s) {
         return;
@@ -198,6 +223,7 @@ void stream_start(BlockDriverState *bs, BlockDriverState *base,
         pstrcpy(s->backing_file_id, sizeof(s->backing_file_id), base_id);
     }
 
+    s->on_error = on_error;
     s->common.co = qemu_coroutine_create(stream_run);
     trace_stream_start(bs, base, s, s->common.co, opaque);
     qemu_coroutine_enter(s->common.co, s);
diff --git a/block_int.h b/block_int.h
index 92c106a..d30ff7f 100644
--- a/block_int.h
+++ b/block_int.h
@@ -289,6 +289,7 @@ void bdrv_emit_qmp_error_event(const BlockDriverState *bdrv,
  * @base_id: The file name that will be written to @bs as the new
  * backing file if the job completes.  Ignored if @base is %NULL.
  * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
+ * @on_error: The action to take upon error.
  * @cb: Completion function for the job.
  * @opaque: Opaque pointer value passed to @cb.
  * @errp: Error object.
@@ -300,7 +301,7 @@ void bdrv_emit_qmp_error_event(const BlockDriverState *bdrv,
  * @base_id in the written image and to @base in the live BlockDriverState.
  */
 void stream_start(BlockDriverState *bs, BlockDriverState *base,
-                  const char *base_id, int64_t speed,
+                  const char *base_id, int64_t speed, BlockdevOnError on_error,
                   BlockDriverCompletionFunc *cb,
                   void *opaque, Error **errp);
 
diff --git a/blockdev.c b/blockdev.c
index fccbe3d..49ade14 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1088,13 +1088,18 @@ static void block_stream_cb(void *opaque, int ret)
 }
 
 void qmp_block_stream(const char *device, bool has_base,
-                      const char *base, bool has_speed,
-                      int64_t speed, Error **errp)
+                      const char *base, bool has_speed, int64_t speed,
+                      bool has_on_error, BlockdevOnError on_error,
+                      Error **errp)
 {
     BlockDriverState *bs;
     BlockDriverState *base_bs = NULL;
     Error *local_err = NULL;
 
+    if (!has_on_error) {
+        on_error = BLOCKDEV_ON_ERROR_REPORT;
+    }
+
     bs = bdrv_find(device);
     if (!bs) {
         error_set(errp, QERR_DEVICE_NOT_FOUND, device);
@@ -1110,7 +1115,7 @@ void qmp_block_stream(const char *device, bool has_base,
     }
 
     stream_start(bs, base_bs, base, has_speed ? speed : 0,
-                 block_stream_cb, bs, &local_err);
+                 on_error, block_stream_cb, bs, &local_err);
     if (error_is_set(&local_err)) {
         error_propagate(errp, local_err);
         return;
diff --git a/hmp.c b/hmp.c
index a4c6629..5d4e6ac 100644
--- a/hmp.c
+++ b/hmp.c
@@ -845,7 +845,8 @@ void hmp_block_stream(Monitor *mon, const QDict *qdict)
     int64_t speed = qdict_get_try_int(qdict, "speed", 0);
 
     qmp_block_stream(device, base != NULL, base,
-                     qdict_haskey(qdict, "speed"), speed, &error);
+                     qdict_haskey(qdict, "speed"), speed,
+                     BLOCKDEV_ON_ERROR_REPORT, true, &error);
 
     hmp_handle_error(mon, &error);
 }
diff --git a/qapi-schema.json b/qapi-schema.json
index d7191f3..6133d90 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1656,17 +1656,23 @@
 #
 # @speed:  #optional the maximum speed, in bytes per second
 #
+# @on-error: #optional the action to take on an error (default report).
+#            'stop' and 'enospc' can only be used if the block device
+#            supports io-status (see BlockInfo).  Since 1.2.
+#
 # Returns: Nothing on success
 #          If streaming is already active on this device, DeviceInUse
 #          If @device does not exist, DeviceNotFound
 #          If image streaming is not supported by this device, NotSupported
 #          If @base does not exist, BaseNotFound
 #          If @speed is invalid, InvalidParameter
+#          If @on_error is not supported, InvalidParameter
 #
 # Since: 1.1
 ##
-{ 'command': 'block-stream', 'data': { 'device': 'str', '*base': 'str',
-                                       '*speed': 'int' } }
+{ 'command': 'block-stream',
+  'data': { 'device': 'str', '*base': 'str', '*speed': 'int',
+            '*on-error': 'BlockdevOnError' } }
 
 ##
 # @block-job-set-speed:
diff --git a/qmp-commands.hx b/qmp-commands.hx
index f0c98a1..7026a4a 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -718,7 +718,7 @@ EQMP
 
     {
         .name       = "block-stream",
-        .args_type  = "device:B,base:s?,speed:o?",
+        .args_type  = "device:B,base:s?,speed:o?,on-error:s?",
         .mhandler.cmd_new = qmp_marshal_input_block_stream,
     },
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 15/47] blkdebug: process all set_state rules in the old state
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (13 preceding siblings ...)
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 14/47] stream: add on-error argument Paolo Bonzini
@ 2012-07-24 11:03 ` Paolo Bonzini
  2012-07-24 20:06   ` Blue Swirl
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 16/47] qemu-iotests: map underscore to dash in QMP argument names Paolo Bonzini
                   ` (32 subsequent siblings)
  47 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

Currently it is impossible to write a blkdebug script that ping-pongs
between two states, because the second set-state rule will use the
state that is set in the first.  If you have

    [set-state]
    event = "..."
    state = "1"
    new_state = "2"

    [set-state]
    event = "..."
    state = "2"
    new_state = "1"

for example the state will remain locked at 1.  This can be fixed
by first processing all rules, and then setting the state.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/blkdebug.c |   14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 59dcea0..0f12145 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -28,6 +28,7 @@
 
 typedef struct BDRVBlkdebugState {
     int state;
+    int new_state;
     QLIST_HEAD(, BlkdebugRule) rules[BLKDBG_EVENT_MAX];
     QSIMPLEQ_HEAD(, BlkdebugRule) active_rules;
 } BDRVBlkdebugState;
@@ -351,6 +352,7 @@ static BlockDriverAIOCB *blkdebug_aio_readv(BlockDriverState *bs,
     BDRVBlkdebugState *s = bs->opaque;
     BlkdebugRule *rule = NULL;
 
+    printf("read %ld\n", sector_num);
     QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
         if (rule->options.inject.sector == -1 ||
             (rule->options.inject.sector >= sector_num &&
@@ -403,12 +405,12 @@ static void blkdebug_close(BlockDriverState *bs)
 }
 
 static bool process_rule(BlockDriverState *bs, struct BlkdebugRule *rule,
-    int old_state, bool injected)
+    bool injected)
 {
     BDRVBlkdebugState *s = bs->opaque;
 
     /* Only process rules for the current state */
-    if (rule->state && rule->state != old_state) {
+    if (rule->state && rule->state != s->state) {
         return injected;
     }
 
@@ -423,7 +425,7 @@ static bool process_rule(BlockDriverState *bs, struct BlkdebugRule *rule,
         break;
 
     case ACTION_SET_STATE:
-        s->state = rule->options.set_state.new_state;
+        s->new_state = rule->options.set_state.new_state;
         break;
     }
     return injected;
@@ -433,15 +435,17 @@ static void blkdebug_debug_event(BlockDriverState *bs, BlkDebugEvent event)
 {
     BDRVBlkdebugState *s = bs->opaque;
     struct BlkdebugRule *rule;
-    int old_state = s->state;
     bool injected;
 
     assert((int)event >= 0 && event < BLKDBG_EVENT_MAX);
 
+    printf("state %d\n", s->state);
     injected = false;
+    s->new_state = s->state;
     QLIST_FOREACH(rule, &s->rules[event], next) {
-        injected = process_rule(bs, rule, old_state, injected);
+        injected = process_rule(bs, rule, injected);
     }
+    s->state = s->new_state;
 }
 
 static int64_t blkdebug_getlength(BlockDriverState *bs)
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 16/47] qemu-iotests: map underscore to dash in QMP argument names
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (14 preceding siblings ...)
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 15/47] blkdebug: process all set_state rules in the old state Paolo Bonzini
@ 2012-07-24 11:03 ` Paolo Bonzini
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 17/47] qemu-iotests: add tests for streaming error handling Paolo Bonzini
                   ` (31 subsequent siblings)
  47 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

iotests.py provides a convenience function that uses Python keyword
arguments to represent QMP command arguments.  However, almost all
QMP commands use dashes for argument names (the sole exception is
block_set_io_throttle), and dashes are not allowed in a keyword
argument name.  Hence provide automatic conversion of underscores
to dashes.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 tests/qemu-iotests/iotests.py |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index e05b1d6..a94ea75 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -19,6 +19,7 @@
 import os
 import re
 import subprocess
+import string
 import unittest
 import sys; sys.path.append(os.path.join(os.path.dirname(__file__), '..', '..', 'QMP'))
 import qmp
@@ -96,9 +97,14 @@ class VM(object):
             os.remove(self._qemu_log_path)
             self._popen = None
 
+    underscore_to_dash = string.maketrans('_', '-')
     def qmp(self, cmd, **args):
         '''Invoke a QMP command and return the result dict'''
-        return self._qmp.cmd(cmd, args=args)
+        qmp_args = dict()
+        for k in args.keys():
+            qmp_args[k.translate(self.underscore_to_dash)] = args[k]
+
+        return self._qmp.cmd(cmd, args=qmp_args)
 
     def get_qmp_events(self, wait=False):
         '''Poll for queued QMP events and return a list of dicts'''
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 17/47] qemu-iotests: add tests for streaming error handling
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (15 preceding siblings ...)
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 16/47] qemu-iotests: map underscore to dash in QMP argument names Paolo Bonzini
@ 2012-07-24 11:03 ` Paolo Bonzini
  2012-08-01 10:43   ` Kevin Wolf
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 18/47] block: live snapshot documentation tweaks Paolo Bonzini
                   ` (30 subsequent siblings)
  47 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

Add a test for each of report/ignore/stop.  The tests use blkdebug
to generate an error in the middle of a script.  The error is
recoverable (once = "on") so that we can test resuming a job after
stopping for an error.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 tests/qemu-iotests/030        |  138 +++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/group      |    2 +-
 tests/qemu-iotests/iotests.py |    7 +++
 3 files changed, 146 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index 0163945..c65bf5e 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -163,6 +163,144 @@ class TestSingleDrive(ImageStreamingTestCase):
         result = self.vm.qmp('block-stream', device='nonexistent')
         self.assert_qmp(result, 'error/class', 'DeviceNotFound')
 
+class TestErrors(ImageStreamingTestCase):
+    image_len = 2 * 1024 * 1024 # MB
+
+    # this should match STREAM_BUFFER_SIZE/512 in block/stream.c
+    STREAM_BUFFER_SIZE = 512 * 1024
+
+    def create_blkdebug_file(self, name, event, errno):
+        file = open(name, 'w')
+        file.write('''
+[inject-error]
+state = "1"
+event = "%s"
+errno = "%d"
+immediately = "off"
+once = "on"
+sector = "%d"
+
+[set-state]
+state = "1"
+event = "%s"
+new_state = "2"
+
+[set-state]
+state = "2"
+event = "%s"
+new_state = "1"
+''' % (event, errno, self.STREAM_BUFFER_SIZE / 512, event, event))
+        file.close()
+
+    def setUp(self):
+        self.blkdebug_file = backing_img + ".blkdebug"
+        self.create_image(backing_img, TestErrors.image_len)
+        self.create_blkdebug_file(self.blkdebug_file, "read_aio", 5)
+        qemu_img('create', '-f', iotests.imgfmt,
+                 '-o', 'backing_file=blkdebug:%s:%s,backing_fmt=raw'
+                       % (self.blkdebug_file, backing_img),
+                 test_img)
+        self.vm = iotests.VM().add_drive(test_img)
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove(test_img)
+        os.remove(backing_img)
+        os.remove(self.blkdebug_file)
+
+    def test_report(self):
+        self.assert_no_active_streams()
+
+        result = self.vm.qmp('block-stream', device='drive0')
+        self.assert_qmp(result, 'return', {})
+
+        completed = False
+        error = False
+        while not completed:
+            for event in self.vm.get_qmp_events(wait=True):
+                if event['event'] == 'BLOCK_JOB_ERROR':
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp(event, 'data/operation', 'read')
+                    error = True
+                elif event['event'] == 'BLOCK_JOB_COMPLETED':
+                    self.assertTrue(error, 'job completed unexpectedly')
+                    self.assert_qmp(event, 'data/type', 'stream')
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp(event, 'data/error', 'Input/output error')
+                    self.assert_qmp(event, 'data/offset', self.STREAM_BUFFER_SIZE)
+                    self.assert_qmp(event, 'data/len', self.image_len)
+                    completed = True
+
+        self.assert_no_active_streams()
+        self.vm.shutdown()
+
+    def test_ignore(self):
+        self.assert_no_active_streams()
+
+        result = self.vm.qmp('block-stream', device='drive0', on_error='ignore')
+        self.assert_qmp(result, 'return', {})
+
+        error = False
+        completed = False
+        while not completed:
+            for event in self.vm.get_qmp_events(wait=True):
+                if event['event'] == 'BLOCK_JOB_ERROR':
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp(event, 'data/operation', 'read')
+                    result = self.vm.qmp('query-block-jobs')
+                    self.assert_qmp(result, 'return[0]/paused', False)
+                    error = True
+                elif event['event'] == 'BLOCK_JOB_COMPLETED':
+                    self.assertTrue(error, 'job completed unexpectedly')
+                    self.assert_qmp(event, 'data/type', 'stream')
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp(event, 'data/error', 'Input/output error')
+                    self.assert_qmp(event, 'data/offset', self.image_len)
+                    self.assert_qmp(event, 'data/len', self.image_len)
+                    completed = True
+
+        self.assert_no_active_streams()
+        self.vm.shutdown()
+
+    def test_stop(self):
+        self.assert_no_active_streams()
+
+        result = self.vm.qmp('block-stream', device='drive0', on_error='stop')
+        self.assert_qmp(result, 'return', {})
+
+        error = False
+        completed = False
+        while not completed:
+            for event in self.vm.get_qmp_events(wait=True):
+                if event['event'] == 'BLOCK_JOB_ERROR':
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp(event, 'data/operation', 'read')
+
+                    result = self.vm.qmp('query-block-jobs')
+                    self.assert_qmp(result, 'return[0]/paused', True)
+                    self.assert_qmp(result, 'return[0]/offset', self.STREAM_BUFFER_SIZE)
+                    self.assert_qmp(result, 'return[0]/io-status', 'failed')
+
+                    result = self.vm.qmp('block-job-resume', device='drive0')
+                    self.assert_qmp(result, 'return', {})
+
+                    result = self.vm.qmp('query-block-jobs')
+                    self.assert_qmp(result, 'return[0]/paused', False)
+                    self.assert_qmp(result, 'return[0]/io-status', 'ok')
+                    error = True
+                elif event['event'] == 'BLOCK_JOB_COMPLETED':
+                    self.assertTrue(error, 'job completed unexpectedly')
+                    self.assert_qmp(event, 'data/type', 'stream')
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp_absent(event, 'data/error')
+                    self.assert_qmp(event, 'data/offset', self.image_len)
+                    self.assert_qmp(event, 'data/len', self.image_len)
+                    completed = True
+
+        self.assert_no_active_streams()
+        self.vm.shutdown()
+
 class TestStreamStop(ImageStreamingTestCase):
     image_len = 8 * 1024 * 1024 * 1024 # GB
 
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 7a2c92b..a569bc9 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -36,7 +36,7 @@
 027 rw auto quick
 028 rw backing auto
 029 rw auto quick
-030 rw auto
+030 rw auto backing
 031 rw auto quick
 032 rw auto
 033 rw auto
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index a94ea75..3c60b2d 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -138,6 +138,13 @@ class QMPTestCase(unittest.TestCase):
                     self.fail('invalid index "%s" in path "%s" in "%s"' % (idx, path, str(d)))
         return d
 
+    def assert_qmp_absent(self, d, path):
+        try:
+            result = self.dictpath(d, path)
+        except AssertionError:
+            return
+        self.fail('path "%s" has value "%s"' % (path, str(result)))
+
     def assert_qmp(self, d, path, value):
         '''Assert that the value for a specific path in a QMP dict matches'''
         result = self.dictpath(d, path)
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 18/47] block: live snapshot documentation tweaks
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (16 preceding siblings ...)
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 17/47] qemu-iotests: add tests for streaming error handling Paolo Bonzini
@ 2012-07-24 11:03 ` Paolo Bonzini
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 19/47] block: add bdrv_query_info Paolo Bonzini
                   ` (29 subsequent siblings)
  47 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 qapi-schema.json |   12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/qapi-schema.json b/qapi-schema.json
index 6133d90..fca1806 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1228,7 +1228,7 @@
 # @format: #optional the format of the snapshot image, default is 'qcow2'.
 #
 # @mode: #optional whether and how QEMU should create a new image, default is
-# 'absolute-paths'.
+#        'absolute-paths'.
 ##
 { 'type': 'BlockdevSnapshot',
   'data': { 'device': 'str', 'snapshot-file': 'str', '*format': 'str',
@@ -1258,9 +1258,9 @@
 #
 # Returns: nothing on success
 #          If @device is not a valid block device, DeviceNotFound
+#          If the block device has no medium inserted, DeviceHasNoMedium
 #          If @device is busy, DeviceInUse will be returned
-#          If @snapshot-file can't be created, OpenFileFailed
-#          If @snapshot-file can't be opened, OpenFileFailed
+#          If @snapshot-file can't be created or opened, OpenFileFailed
 #          If @format is invalid, InvalidBlockFormat
 #
 # Note: The transaction aborts on the first failure.  Therefore, there will
@@ -1286,11 +1286,13 @@
 # @format: #optional the format of the snapshot image, default is 'qcow2'.
 #
 # @mode: #optional whether and how QEMU should create a new image, default is
-# 'absolute-paths'.
+#        'absolute-paths'.
 #
 # Returns: nothing on success
 #          If @device is not a valid block device, DeviceNotFound
-#          If @snapshot-file can't be opened, OpenFileFailed
+#          If the block device has no medium inserted, DeviceHasNoMedium
+#          If @device is busy, DeviceInUse will be returned
+#          If @snapshot-file can't be created or opened, OpenFileFailed
 #          If @format is invalid, InvalidBlockFormat
 #
 # Since 0.14.0
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 19/47] block: add bdrv_query_info
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (17 preceding siblings ...)
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 18/47] block: live snapshot documentation tweaks Paolo Bonzini
@ 2012-07-24 11:03 ` Paolo Bonzini
  2012-09-11 13:07   ` Kevin Wolf
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 20/47] block: add bdrv_query_stats Paolo Bonzini
                   ` (28 subsequent siblings)
  47 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

Extract it out of the implementation of "info block".

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c |  104 +++++++++++++++++++++++++++++++--------------------------------
 block.h |    1 +
 2 files changed, 53 insertions(+), 52 deletions(-)

diff --git a/block.c b/block.c
index 44542e5..591e027 100644
--- a/block.c
+++ b/block.c
@@ -2452,63 +2452,15 @@ int coroutine_fn bdrv_co_is_allocated_above(BlockDriverState *top,
 
 BlockInfoList *qmp_query_block(Error **errp)
 {
-    BlockInfoList *head = NULL, *cur_item = NULL;
+    BlockInfoList *head = NULL, **p_next = &head;
     BlockDriverState *bs;
 
     QTAILQ_FOREACH(bs, &bdrv_states, list) {
         BlockInfoList *info = g_malloc0(sizeof(*info));
+        info->value = bdrv_query_info(bs);
 
-        info->value = g_malloc0(sizeof(*info->value));
-        info->value->device = g_strdup(bs->device_name);
-        info->value->type = g_strdup("unknown");
-        info->value->locked = bdrv_dev_is_medium_locked(bs);
-        info->value->removable = bdrv_dev_has_removable_media(bs);
-
-        if (bdrv_dev_has_removable_media(bs)) {
-            info->value->has_tray_open = true;
-            info->value->tray_open = bdrv_dev_is_tray_open(bs);
-        }
-
-        if (bdrv_iostatus_is_enabled(bs)) {
-            info->value->has_io_status = true;
-            info->value->io_status = bs->iostatus;
-        }
-
-        if (bs->drv) {
-            info->value->has_inserted = true;
-            info->value->inserted = g_malloc0(sizeof(*info->value->inserted));
-            info->value->inserted->file = g_strdup(bs->filename);
-            info->value->inserted->ro = bs->read_only;
-            info->value->inserted->drv = g_strdup(bs->drv->format_name);
-            info->value->inserted->encrypted = bs->encrypted;
-            if (bs->backing_file[0]) {
-                info->value->inserted->has_backing_file = true;
-                info->value->inserted->backing_file = g_strdup(bs->backing_file);
-            }
-
-            if (bs->io_limits_enabled) {
-                info->value->inserted->bps =
-                               bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL];
-                info->value->inserted->bps_rd =
-                               bs->io_limits.bps[BLOCK_IO_LIMIT_READ];
-                info->value->inserted->bps_wr =
-                               bs->io_limits.bps[BLOCK_IO_LIMIT_WRITE];
-                info->value->inserted->iops =
-                               bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL];
-                info->value->inserted->iops_rd =
-                               bs->io_limits.iops[BLOCK_IO_LIMIT_READ];
-                info->value->inserted->iops_wr =
-                               bs->io_limits.iops[BLOCK_IO_LIMIT_WRITE];
-            }
-        }
-
-        /* XXX: waiting for the qapi to support GSList */
-        if (!cur_item) {
-            head = cur_item = info;
-        } else {
-            cur_item->next = info;
-            cur_item = info;
-        }
+        *p_next = info;
+        p_next = &info->next;
     }
 
     return head;
@@ -2545,6 +2497,54 @@ static BlockStats *qmp_query_blockstat(const BlockDriverState *bs, Error **errp)
     return s;
 }
 
+BlockInfo *bdrv_query_info(BlockDriverState *bs)
+{
+    BlockInfo *info = g_malloc0(sizeof(*info));
+    info->device = g_strdup(bs->device_name);
+    info->type = g_strdup("unknown");
+    info->locked = bdrv_dev_is_medium_locked(bs);
+    info->removable = bdrv_dev_has_removable_media(bs);
+
+    if (bdrv_dev_has_removable_media(bs)) {
+        info->has_tray_open = true;
+        info->tray_open = bdrv_dev_is_tray_open(bs);
+    }
+
+    if (bdrv_iostatus_is_enabled(bs)) {
+        info->has_io_status = true;
+        info->io_status = bs->iostatus;
+    }
+
+    if (bs->drv) {
+        info->has_inserted = true;
+        info->inserted = g_malloc0(sizeof(*info->inserted));
+        info->inserted->file = g_strdup(bs->filename);
+        info->inserted->ro = bs->read_only;
+        info->inserted->drv = g_strdup(bs->drv->format_name);
+        info->inserted->encrypted = bs->encrypted;
+        if (bs->backing_file[0]) {
+            info->inserted->has_backing_file = true;
+            info->inserted->backing_file = g_strdup(bs->backing_file);
+        }
+
+        if (bs->io_limits_enabled) {
+            info->inserted->bps =
+                           bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL];
+            info->inserted->bps_rd =
+                           bs->io_limits.bps[BLOCK_IO_LIMIT_READ];
+            info->inserted->bps_wr =
+                           bs->io_limits.bps[BLOCK_IO_LIMIT_WRITE];
+            info->inserted->iops =
+                           bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL];
+            info->inserted->iops_rd =
+                           bs->io_limits.iops[BLOCK_IO_LIMIT_READ];
+            info->inserted->iops_wr =
+                           bs->io_limits.iops[BLOCK_IO_LIMIT_WRITE];
+        }
+    }
+    return info;
+}
+
 BlockStatsList *qmp_query_blockstats(Error **errp)
 {
     BlockStatsList *head = NULL, *cur_item = NULL;
diff --git a/block.h b/block.h
index 935e04e..ba99585 100644
--- a/block.h
+++ b/block.h
@@ -288,6 +288,7 @@ void bdrv_get_backing_filename(BlockDriverState *bs,
                                char *filename, int filename_size);
 void bdrv_get_full_backing_filename(BlockDriverState *bs,
                                     char *dest, size_t sz);
+BlockInfo *bdrv_query_info(BlockDriverState *s);
 int bdrv_can_snapshot(BlockDriverState *bs);
 int bdrv_is_snapshot(BlockDriverState *bs);
 BlockDriverState *bdrv_snapshots(void);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 20/47] block: add bdrv_query_stats
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (18 preceding siblings ...)
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 19/47] block: add bdrv_query_info Paolo Bonzini
@ 2012-07-24 11:03 ` Paolo Bonzini
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 21/47] block: add bdrv_ensure_backing_file Paolo Bonzini
                   ` (27 subsequent siblings)
  47 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

Extract it out of the implementation of "info blockstats".

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c |   17 ++++++-----------
 block.h |    1 +
 2 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/block.c b/block.c
index 591e027..19da114 100644
--- a/block.c
+++ b/block.c
@@ -2467,7 +2467,7 @@ BlockInfoList *qmp_query_block(Error **errp)
 }
 
 /* Consider exposing this as a full fledged QMP command */
-static BlockStats *qmp_query_blockstat(const BlockDriverState *bs, Error **errp)
+BlockStats *bdrv_query_stats(const BlockDriverState *bs)
 {
     BlockStats *s;
 
@@ -2491,7 +2491,7 @@ static BlockStats *qmp_query_blockstat(const BlockDriverState *bs, Error **errp)
 
     if (bs->file) {
         s->has_parent = true;
-        s->parent = qmp_query_blockstat(bs->file, NULL);
+        s->parent = bdrv_query_stats(bs->file);
     }
 
     return s;
@@ -2547,20 +2547,15 @@ BlockInfo *bdrv_query_info(BlockDriverState *bs)
 
 BlockStatsList *qmp_query_blockstats(Error **errp)
 {
-    BlockStatsList *head = NULL, *cur_item = NULL;
+    BlockStatsList *head = NULL, **p_next = &head;
     BlockDriverState *bs;
 
     QTAILQ_FOREACH(bs, &bdrv_states, list) {
         BlockStatsList *info = g_malloc0(sizeof(*info));
-        info->value = qmp_query_blockstat(bs, NULL);
+        info->value = bdrv_query_stats(bs);
 
-        /* XXX: waiting for the qapi to support GSList */
-        if (!cur_item) {
-            head = cur_item = info;
-        } else {
-            cur_item->next = info;
-            cur_item = info;
-        }
+        *p_next = info;
+        p_next = &info->next;
     }
 
     return head;
diff --git a/block.h b/block.h
index ba99585..9bff842 100644
--- a/block.h
+++ b/block.h
@@ -289,6 +289,7 @@ void bdrv_get_backing_filename(BlockDriverState *bs,
 void bdrv_get_full_backing_filename(BlockDriverState *bs,
                                     char *dest, size_t sz);
 BlockInfo *bdrv_query_info(BlockDriverState *s);
+BlockStats *bdrv_query_stats(const BlockDriverState *bs);
 int bdrv_can_snapshot(BlockDriverState *bs);
 int bdrv_is_snapshot(BlockDriverState *bs);
 BlockDriverState *bdrv_snapshots(void);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 21/47] block: add bdrv_ensure_backing_file
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (19 preceding siblings ...)
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 20/47] block: add bdrv_query_stats Paolo Bonzini
@ 2012-07-24 11:03 ` Paolo Bonzini
  2012-09-11 13:32   ` Kevin Wolf
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 22/47] block: make device optional in BlockInfo Paolo Bonzini
                   ` (26 subsequent siblings)
  47 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

Mirroring runs without the backing file so that it can be copied outside
QEMU.  However, we need to add it at the time the job is completed and
QEMU switches to the target.  Factor out the common bits of opening an
image and completing a mirroring operation.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c |   69 ++++++++++++++++++++++++++++++++++++++++-----------------------
 block.h |    1 +
 2 files changed, 45 insertions(+), 25 deletions(-)

diff --git a/block.c b/block.c
index 19da114..002b442 100644
--- a/block.c
+++ b/block.c
@@ -730,6 +730,48 @@ int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags)
     return 0;
 }
 
+int bdrv_ensure_backing_file(BlockDriverState *bs)
+{
+    char backing_filename[PATH_MAX];
+    int back_flags, ret;
+    BlockDriver *back_drv = NULL;
+
+    if (bs->backing_hd != NULL) {
+        return 0;
+    }
+
+    bs->open_flags &= ~BDRV_O_NO_BACKING;
+    if (bs->backing_file[0] == '\0') {
+        return 0;
+    }
+
+    bs->backing_hd = bdrv_new("");
+    bdrv_get_full_backing_filename(bs, backing_filename,
+                                   sizeof(backing_filename));
+
+    if (bs->backing_format[0] != '\0') {
+        back_drv = bdrv_find_format(bs->backing_format);
+    }
+
+    /* backing files always opened read-only */
+    back_flags = bs->open_flags & ~(BDRV_O_RDWR | BDRV_O_SNAPSHOT);
+
+    ret = bdrv_open(bs->backing_hd, backing_filename, back_flags, back_drv);
+    if (ret < 0) {
+        bdrv_close(bs);
+        bdrv_delete(bs->backing_hd);
+        bs->backing_hd = NULL;
+        return ret;
+    }
+    if (bs->is_temporary) {
+        bs->backing_hd->keep_read_only = !(bs->open_flags & BDRV_O_RDWR);
+    } else {
+        /* base images use the same setting as leaf */
+        bs->backing_hd->keep_read_only = bs->keep_read_only;
+    }
+    return 0;
+}
+
 /*
  * Opens a disk image (raw, qcow2, vmdk, ...)
  */
@@ -813,34 +855,11 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
     }
 
     /* If there is a backing file, use it */
-    if ((flags & BDRV_O_NO_BACKING) == 0 && bs->backing_file[0] != '\0') {
-        char backing_filename[PATH_MAX];
-        int back_flags;
-        BlockDriver *back_drv = NULL;
-
-        bs->backing_hd = bdrv_new("");
-        bdrv_get_full_backing_filename(bs, backing_filename,
-                                       sizeof(backing_filename));
-
-        if (bs->backing_format[0] != '\0') {
-            back_drv = bdrv_find_format(bs->backing_format);
-        }
-
-        /* backing files always opened read-only */
-        back_flags =
-            flags & ~(BDRV_O_RDWR | BDRV_O_SNAPSHOT | BDRV_O_NO_BACKING);
-
-        ret = bdrv_open(bs->backing_hd, backing_filename, back_flags, back_drv);
+    if ((flags & BDRV_O_NO_BACKING) == 0) {
+        ret = bdrv_ensure_backing_file(bs);
         if (ret < 0) {
-            bdrv_close(bs);
             return ret;
         }
-        if (bs->is_temporary) {
-            bs->backing_hd->keep_read_only = !(flags & BDRV_O_RDWR);
-        } else {
-            /* base image inherits from "parent" */
-            bs->backing_hd->keep_read_only = bs->keep_read_only;
-        }
     }
 
     if (!bdrv_key_required(bs)) {
diff --git a/block.h b/block.h
index 9bff842..8aeb7a9 100644
--- a/block.h
+++ b/block.h
@@ -122,6 +122,7 @@ void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top);
 void bdrv_delete(BlockDriverState *bs);
 int bdrv_parse_cache_flags(const char *mode, int *flags);
 int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags);
+int bdrv_ensure_backing_file(BlockDriverState *bs);
 int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
               BlockDriver *drv);
 void bdrv_close(BlockDriverState *bs);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 22/47] block: make device optional in BlockInfo
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (20 preceding siblings ...)
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 21/47] block: add bdrv_ensure_backing_file Paolo Bonzini
@ 2012-07-24 11:04 ` Paolo Bonzini
  2012-09-11 13:38   ` Kevin Wolf
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 23/47] block: add target info to QMP query-blockjobs command Paolo Bonzini
                   ` (25 subsequent siblings)
  47 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

Targets of a mirroring operation will not have a device.  Once we have
-blockdev or equivalent, "detached" block devices and non-anonymous
backing files also will not have a device.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 qapi-schema.json |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/qapi-schema.json b/qapi-schema.json
index fca1806..b00d8c6 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -443,7 +443,8 @@
 # Block device information.  This structure describes a virtual device and
 # the backing device associated with it.
 #
-# @device: The device name associated with the virtual device.
+# @device: #optional The device name associated with the virtual device.
+#          Always included in the output of query-block.
 #
 # @type: This field is returned only for compatibility reasons, it should
 #        not be used (always returns 'unknown')
@@ -465,7 +466,7 @@
 # Since:  0.14.0
 ##
 { 'type': 'BlockInfo',
-  'data': {'device': 'str', 'type': 'str', 'removable': 'bool',
+  'data': {'*device': 'str', 'type': 'str', 'removable': 'bool',
            'locked': 'bool', '*inserted': 'BlockDeviceInfo',
            '*tray_open': 'bool', '*io-status': 'BlockDeviceIoStatus'} }
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 23/47] block: add target info to QMP query-blockjobs command
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (21 preceding siblings ...)
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 22/47] block: make device optional in BlockInfo Paolo Bonzini
@ 2012-07-24 11:04 ` Paolo Bonzini
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 24/47] block: introduce new dirty bitmap functionality Paolo Bonzini
                   ` (24 subsequent siblings)
  47 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 blockjob.c       |    3 +++
 blockjob.h       |    6 ++++++
 qapi-schema.json |   21 ++++++++++++++++++++-
 3 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/blockjob.c b/blockjob.c
index 562e0b5..651ee8d 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -197,6 +197,9 @@ BlockJobInfo *block_job_query(BlockJob *job)
     info->offset    = job->offset;
     info->speed     = job->speed;
     info->io_status = job->iostatus;
+    if (job->job_type->query) {
+        job->job_type->query(job, info);
+    }
     return info;
 }
 
diff --git a/blockjob.h b/blockjob.h
index b17ee2e..5e03b5d 100644
--- a/blockjob.h
+++ b/blockjob.h
@@ -41,6 +41,12 @@ typedef struct BlockJobType {
 
     /** Optional callback for job types that support setting a speed limit */
     void (*set_speed)(BlockJob *job, int64_t speed, Error **errp);
+
+    /**
+     * Optional callback for job types that can fill the target member
+     * of BlockJobInfo.
+     */
+    void (*query)(BlockJob *job, BlockJobInfo *info);
 } BlockJobType;
 
 /**
diff --git a/qapi-schema.json b/qapi-schema.json
index b00d8c6..2697220 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -926,6 +926,21 @@
   'data': ['report', 'ignore', 'enospc', 'stop'] }
 
 ##
+# @BlockJobTargetInfo:
+#
+# Information about the target device for a long-running block device
+# operation.
+#
+# @info: information on the target device
+#
+# @stats: statistics about a target device
+#
+# Since: 1.2
+##
+{ 'type': 'BlockJobTargetInfo',
+  'data': {'info': 'BlockInfo', 'stats': 'BlockStats'} }
+
+##
 # @BlockJobInfo:
 #
 # Information about a long-running block device operation.
@@ -944,12 +959,16 @@
 #
 # @io-status: the status of the job (since 1.2)
 #
+# @target: the target device, if applicable to this particular type of
+#          job.
+#
 # Since: 1.1
 ##
 { 'type': 'BlockJobInfo',
   'data': {'type': 'str', 'device': 'str', 'len': 'int',
            'offset': 'int', 'paused': 'bool', 'speed': 'int',
-           'io-status': 'BlockDeviceIoStatus'} }
+           'io-status': 'BlockDeviceIoStatus',
+            '*target': 'BlockJobTargetInfo'} }
 
 ##
 # @query-block-jobs:
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 24/47] block: introduce new dirty bitmap functionality
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (22 preceding siblings ...)
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 23/47] block: add target info to QMP query-blockjobs command Paolo Bonzini
@ 2012-07-24 11:04 ` Paolo Bonzini
  2012-09-11 14:57   ` Kevin Wolf
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 25/47] block: add block-job-complete Paolo Bonzini
                   ` (23 subsequent siblings)
  47 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

Assert that write_compressed is never used with the dirty bitmap.
Setting the bits early is wrong, because a coroutine might concurrently
examine them and copy incomplete data from the source.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c |   51 +++++++++++++++++++++++++++++++++++++++++++++------
 block.h |    5 +++--
 2 files changed, 48 insertions(+), 8 deletions(-)

diff --git a/block.c b/block.c
index 002b442..81c2bc5 100644
--- a/block.c
+++ b/block.c
@@ -2048,7 +2048,7 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
     }
 
     if (bs->dirty_bitmap) {
-        set_dirty_bitmap(bs, sector_num, nb_sectors, 1);
+        bdrv_set_dirty(bs, sector_num, nb_sectors);
     }
 
     if (bs->wr_highest_sector < sector_num + nb_sectors - 1) {
@@ -2607,9 +2607,7 @@ int bdrv_write_compressed(BlockDriverState *bs, int64_t sector_num,
     if (bdrv_check_request(bs, sector_num, nb_sectors))
         return -EIO;
 
-    if (bs->dirty_bitmap) {
-        set_dirty_bitmap(bs, sector_num, nb_sectors, 1);
-    }
+    assert(!bs->dirty_bitmap);
 
     return drv->bdrv_write_compressed(bs, sector_num, buf, nb_sectors);
 }
@@ -3838,13 +3836,54 @@ int bdrv_get_dirty(BlockDriverState *bs, int64_t sector)
 
     if (bs->dirty_bitmap &&
         (sector << BDRV_SECTOR_BITS) < bdrv_getlength(bs)) {
-        return !!(bs->dirty_bitmap[chunk / (sizeof(unsigned long) * 8)] &
-            (1UL << (chunk % (sizeof(unsigned long) * 8))));
+        return !!(bs->dirty_bitmap[chunk / BITS_PER_LONG] &
+            (1UL << (chunk % BITS_PER_LONG)));
     } else {
         return 0;
     }
 }
 
+int64_t bdrv_get_next_dirty(BlockDriverState *bs, int64_t sector)
+{
+    int64_t chunk;
+    int bit, elem;
+
+    /* Avoid an infinite loop.  */
+    assert(bs->dirty_count > 0);
+
+    sector = (sector | (BDRV_SECTORS_PER_DIRTY_CHUNK - 1)) + 1;
+    chunk = sector / (int64_t)BDRV_SECTORS_PER_DIRTY_CHUNK;
+
+    QEMU_BUILD_BUG_ON(sizeof(bs->dirty_bitmap[0]) * 8 != BITS_PER_LONG);
+    elem = chunk / BITS_PER_LONG;
+    bit = chunk % BITS_PER_LONG;
+    for (;;) {
+        if (sector >= bs->total_sectors) {
+            sector = 0;
+            bit = elem = 0;
+        }
+        if (bit == 0 && bs->dirty_bitmap[elem] == 0) {
+            sector += BDRV_SECTORS_PER_DIRTY_CHUNK * BITS_PER_LONG;
+            elem++;
+        } else {
+            if (bs->dirty_bitmap[elem] & (1UL << bit)) {
+                return sector;
+            }
+            sector += BDRV_SECTORS_PER_DIRTY_CHUNK;
+            if (++bit == BITS_PER_LONG) {
+                bit = 0;
+                elem++;
+            }
+        }
+    }
+}
+
+void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
+                    int nr_sectors)
+{
+    set_dirty_bitmap(bs, cur_sector, nr_sectors, 1);
+}
+
 void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector,
                       int nr_sectors)
 {
diff --git a/block.h b/block.h
index 8aeb7a9..e7440f6 100644
--- a/block.h
+++ b/block.h
@@ -328,8 +328,9 @@ void *qemu_blockalign(BlockDriverState *bs, size_t size);
 
 void bdrv_set_dirty_tracking(BlockDriverState *bs, int enable);
 int bdrv_get_dirty(BlockDriverState *bs, int64_t sector);
-void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector,
-                      int nr_sectors);
+void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector, int nr_sectors);
+void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector, int nr_sectors);
+int64_t bdrv_get_next_dirty(BlockDriverState *bs, int64_t sector);
 int64_t bdrv_get_dirty_count(BlockDriverState *bs);
 
 void bdrv_enable_copy_on_read(BlockDriverState *bs);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 25/47] block: add block-job-complete
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (23 preceding siblings ...)
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 24/47] block: introduce new dirty bitmap functionality Paolo Bonzini
@ 2012-07-24 11:04 ` Paolo Bonzini
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 26/47] block: introduce BLOCK_JOB_READY event Paolo Bonzini
                   ` (22 subsequent siblings)
  47 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

While streaming can be dropped as soon as it progressed through the whole
image, mirroring needs to be completed manually for two reasons: 1) so that
management knows exactly when the VM switches to the target; 2) because
for other use cases such as replication, we may leave the operation running
for the whole life of the virtual machine.

Add a new block job command that manually completes background operations.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 blockdev.c       |   13 +++++++++++++
 blockjob.c       |   10 ++++++++++
 blockjob.h       |   15 +++++++++++++++
 hmp-commands.hx  |   17 ++++++++++++++++-
 hmp.c            |   10 ++++++++++
 hmp.h            |    1 +
 qapi-schema.json |   27 +++++++++++++++++++++++++++
 qerror.c         |    4 ++++
 qerror.h         |    3 +++
 qmp-commands.hx  |    5 +++++
 trace-events     |    1 +
 11 files changed, 105 insertions(+), 1 deletion(-)

diff --git a/blockdev.c b/blockdev.c
index 49ade14..7544afc 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1200,6 +1200,19 @@ void qmp_block_job_resume(const char *device, Error **errp)
     block_job_resume(job);
 }
 
+void qmp_block_job_complete(const char *device, Error **errp)
+{
+    BlockJob *job = find_block_job(device);
+
+    if (!job) {
+        error_set(errp, QERR_BLOCK_JOB_NOT_ACTIVE, device);
+        return;
+    }
+
+    trace_qmp_block_job_complete(job);
+    block_job_complete(job, errp);
+}
+
 static void do_qmp_query_block_jobs_one(void *opaque, BlockDriverState *bs)
 {
     BlockJobInfoList **prev = opaque;
diff --git a/blockjob.c b/blockjob.c
index 651ee8d..e5b5e63 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -99,6 +99,16 @@ void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
     job->speed = speed;
 }
 
+void block_job_complete(BlockJob *job, Error **errp)
+{
+    if (job->paused || job->cancelled || !job->job_type->complete) {
+        error_set(errp, QERR_BLOCK_JOB_NOT_READY, job->bs->device_name);
+        return;
+    }
+
+    job->job_type->complete(job, errp);
+}
+
 void block_job_pause(BlockJob *job)
 {
     job->paused = true;
diff --git a/blockjob.h b/blockjob.h
index 5e03b5d..61fdafb 100644
--- a/blockjob.h
+++ b/blockjob.h
@@ -47,6 +47,12 @@ typedef struct BlockJobType {
      * of BlockJobInfo.
      */
     void (*query)(BlockJob *job, BlockJobInfo *info);
+
+    /**
+     * Optional callback for job types whose completion must be triggered
+     * manually.
+     */
+    void (*complete)(BlockJob *job, Error **errp);
 } BlockJobType;
 
 /**
@@ -170,6 +176,15 @@ void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp);
 void block_job_cancel(BlockJob *job);
 
 /**
+ * block_job_complete:
+ * @job: The job to be completed.
+ * @errp: Error object.
+ *
+ * Synchronously complete the specified job.
+ */
+void block_job_complete(BlockJob *job, Error **errp);
+
+/**
  * block_job_is_cancelled:
  * @job: The job being queried.
  *
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 3c60626..7a72122 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -109,7 +109,22 @@ ETEXI
 STEXI
 @item block_job_cancel
 @findex block_job_cancel
-Stop an active block streaming operation.
+Stop an active background block operation (streaming, mirroring).
+ETEXI
+
+    {
+        .name       = "block_job_complete",
+        .args_type  = "device:B",
+        .params     = "device",
+        .help       = "stop an active background block operation",
+        .mhandler.cmd = hmp_block_job_complete,
+    },
+
+STEXI
+@item block_job_complete
+@findex block_job_complete
+Manually trigger completion of an active background block operation.
+For mirroring, this will switch the device to the destination path.
 ETEXI
 
     {
diff --git a/hmp.c b/hmp.c
index 5d4e6ac..206e920 100644
--- a/hmp.c
+++ b/hmp.c
@@ -893,6 +893,16 @@ void hmp_block_job_resume(Monitor *mon, const QDict *qdict)
     hmp_handle_error(mon, &error);
 }
 
+void hmp_block_job_complete(Monitor *mon, const QDict *qdict)
+{
+    Error *error = NULL;
+    const char *device = qdict_get_str(qdict, "device");
+
+    qmp_block_job_complete(device, &error);
+
+    hmp_handle_error(mon, &error);
+}
+
 typedef struct MigrationStatus
 {
     QEMUTimer *timer;
diff --git a/hmp.h b/hmp.h
index 39a71c4..24e551e 100644
--- a/hmp.h
+++ b/hmp.h
@@ -61,6 +61,7 @@ void hmp_block_job_set_speed(Monitor *mon, const QDict *qdict);
 void hmp_block_job_cancel(Monitor *mon, const QDict *qdict);
 void hmp_block_job_pause(Monitor *mon, const QDict *qdict);
 void hmp_block_job_resume(Monitor *mon, const QDict *qdict);
+void hmp_block_job_complete(Monitor *mon, const QDict *qdict);
 void hmp_migrate(Monitor *mon, const QDict *qdict);
 void hmp_device_del(Monitor *mon, const QDict *qdict);
 void hmp_dump_guest_memory(Monitor *mon, const QDict *qdict);
diff --git a/qapi-schema.json b/qapi-schema.json
index 2697220..e081ddf 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1792,6 +1792,33 @@
 { 'command': 'block-job-resume', 'data': { 'device': 'str' } }
 
 ##
+# @block-job-complete:
+#
+# Manually trigger completion of an active background block operation.  This
+# is supported for drive mirroring, where it also switches the device to
+# write to the target path only.
+#
+# This command completes an active background block operation synchronously.
+# The ordering of this command's return with the BLOCK_JOB_COMPLETED event
+# is not defined.  Note that if an I/O error occurs during the processing of
+# this command: 1) the command itself will fail; 2) the error will be processed
+# according to the rerror/werror arguments that were specified when starting
+# the operation.
+#
+# A cancelled or paused job cannot be completed.
+#
+# @device: the device name
+#
+# Returns: Nothing on success
+#          If no background operation is active on this device, BlockJobNotActive
+#          If the operation cannot be completed manually (either in general, or
+#            not at the time the command is invoked), BlockJobNotReady
+#
+# Since: 1.2
+##
+{ 'command': 'block-job-complete', 'data': { 'device': 'str' } }
+
+##
 # @ObjectTypeInfo:
 #
 # This structure describes a search result from @qom-list-types
diff --git a/qerror.c b/qerror.c
index 72183ec..60303a4 100644
--- a/qerror.c
+++ b/qerror.c
@@ -68,6 +68,10 @@ static const QErrorStringTable qerror_table[] = {
         .desc      = "The block job for device '%(name)' is currently paused",
     },
     {
+        .error_fmt = QERR_BLOCK_JOB_NOT_READY,
+        .desc      = "The active block job for device '%(name)' cannot be completed",
+    },
+    {
         .error_fmt = QERR_BLOCK_FORMAT_FEATURE_NOT_SUPPORTED,
         .desc      = "Block format '%(format)' used by device '%(name)' does not support feature '%(feature)'",
     },
diff --git a/qerror.h b/qerror.h
index d1baea0..c15e933 100644
--- a/qerror.h
+++ b/qerror.h
@@ -70,6 +70,9 @@ QError *qobject_to_qerror(const QObject *obj);
 #define QERR_BLOCK_JOB_PAUSED \
     "{ 'class': 'BlockJobPaused', 'data': { 'name': %s } }"
 
+#define QERR_BLOCK_JOB_NOT_READY \
+    "{ 'class': 'BlockJobNotReady', 'data': { 'name': %s } }"
+
 #define QERR_BLOCK_FORMAT_FEATURE_NOT_SUPPORTED \
     "{ 'class': 'BlockFormatFeatureNotSupported', 'data': { 'format': %s, 'name': %s, 'feature': %s } }"
 
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 7026a4a..56f953d 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -744,6 +744,11 @@ EQMP
         .mhandler.cmd_new = qmp_marshal_input_block_job_resume,
     },
     {
+        .name       = "block-job-complete",
+        .args_type  = "device:B",
+        .mhandler.cmd_new = qmp_marshal_input_block_job_complete,
+    },
+    {
         .name       = "transaction",
         .args_type  = "actions:q",
         .mhandler.cmd_new = qmp_marshal_input_transaction,
diff --git a/trace-events b/trace-events
index 416b723..58dfe6c 100644
--- a/trace-events
+++ b/trace-events
@@ -79,6 +79,7 @@ stream_start(void *bs, void *base, void *s, void *co, void *opaque) "bs %p base
 qmp_block_job_cancel(void *job) "job %p"
 qmp_block_job_pause(void *job) "job %p"
 qmp_block_job_resume(void *job) "job %p"
+qmp_block_job_complete(void *job) "job %p"
 block_stream_cb(void *bs, void *job, int ret) "bs %p job %p ret %d"
 qmp_block_stream(void *bs, void *job) "bs %p job %p"
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 26/47] block: introduce BLOCK_JOB_READY event
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (24 preceding siblings ...)
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 25/47] block: add block-job-complete Paolo Bonzini
@ 2012-07-24 11:04 ` Paolo Bonzini
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 27/47] block: introduce mirror job Paolo Bonzini
                   ` (21 subsequent siblings)
  47 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

Even for jobs that need to be manually completed, management may want
to take care itself of the completion, not requiring the user to issue
a command to terminate the job.  In this case we want to avoid that
they poll us continuously, waiting for completion to become available.
Thus, add a new event that signals the phase switch and the availability
of the block-job-complete command.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 QMP/qmp-events.txt |   20 ++++++++++++++++++++
 blockdev.c         |   14 --------------
 blockjob.c         |   21 +++++++++++++++++++++
 blockjob.h         |   16 ++++++++++++++++
 monitor.c          |    1 +
 monitor.h          |    1 +
 qapi-schema.json   |    3 ++-
 7 files changed, 61 insertions(+), 15 deletions(-)

diff --git a/QMP/qmp-events.txt b/QMP/qmp-events.txt
index e910deb..e7b7851 100644
--- a/QMP/qmp-events.txt
+++ b/QMP/qmp-events.txt
@@ -376,3 +376,23 @@ Example:
               "operation": "write",
               "action": "stop" },
     "timestamp": { "seconds": 1265044230, "microseconds": 450486 } }
+
+BLOCK_JOB_READY
+---------------
+
+Emitted when a block job is ready to complete.
+
+Data:
+
+- "device": device name (json-string)
+
+Example:
+
+{ "event": "BLOCK_JOB_READY",
+    "data": { "device": "ide0-hd1",
+              "operation": "write",
+              "action": "stop" },
+    "timestamp": { "seconds": 1265044230, "microseconds": 450486 } }
+
+Note: The "ready to complete" status is always reset by a BLOCK_JOB_ERROR
+event.
diff --git a/blockdev.c b/blockdev.c
index 7544afc..192a9db 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1049,20 +1049,6 @@ void qmp_block_resize(const char *device, int64_t size, Error **errp)
     }
 }
 
-static QObject *qobject_from_block_job(BlockJob *job)
-{
-    return qobject_from_jsonf("{ 'type': %s,"
-                              "'device': %s,"
-                              "'len': %" PRId64 ","
-                              "'offset': %" PRId64 ","
-                              "'speed': %" PRId64 " }",
-                              job->job_type->job_type,
-                              bdrv_get_device_name(job->bs),
-                              job->len,
-                              job->offset,
-                              job->speed);
-}
-
 static void block_stream_cb(void *opaque, int ret)
 {
     BlockDriverState *bs = opaque;
diff --git a/blockjob.c b/blockjob.c
index e5b5e63..b5a4033 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -227,6 +227,27 @@ static void block_job_iostatus_set_err(BlockJob *job, int error)
 }
 
 
+QObject *qobject_from_block_job(BlockJob *job)
+{
+    return qobject_from_jsonf("{ 'type': %s,"
+                              "'device': %s,"
+                              "'len': %" PRId64 ","
+                              "'offset': %" PRId64 ","
+                              "'speed': %" PRId64 " }",
+                              job->job_type->job_type,
+                              bdrv_get_device_name(job->bs),
+                              job->len,
+                              job->offset,
+                              job->speed);
+}
+
+void block_job_ready(BlockJob *job)
+{
+    QObject *data = qobject_from_block_job(job);
+    monitor_protocol_event(QEVENT_BLOCK_JOB_READY, data);
+    qobject_decref(data);
+}
+
 BlockErrorAction block_job_error_action(BlockJob *job, BlockDriverState *bs,
                                         BlockdevOnError on_err,
                                         int is_read, int error)
diff --git a/blockjob.h b/blockjob.h
index 61fdafb..0d29fdf 100644
--- a/blockjob.h
+++ b/blockjob.h
@@ -217,6 +217,22 @@ void block_job_pause(BlockJob *job);
 void block_job_resume(BlockJob *job);
 
 /**
+ * qobject_from_block_job:
+ * @job: The job whose information is requested.
+ *
+ * Return a QDict corresponding to @job's query-block-jobs entry.
+ */
+QObject *qobject_from_block_job(BlockJob *job);
+
+/**
+ * block_job_ready:
+ * @job: The job which is now ready to complete.
+ *
+ * Send a BLOCK_JOB_READY event for the specified job.
+ */
+void block_job_ready(BlockJob *job);
+
+/**
  * block_job_is_paused:
  * @job: The job being queried.
  *
diff --git a/monitor.c b/monitor.c
index 19da71d..5336690 100644
--- a/monitor.c
+++ b/monitor.c
@@ -455,6 +455,7 @@ static const char *monitor_event_names[] = {
     [QEVENT_BLOCK_JOB_COMPLETED] = "BLOCK_JOB_COMPLETED",
     [QEVENT_BLOCK_JOB_CANCELLED] = "BLOCK_JOB_CANCELLED",
     [QEVENT_BLOCK_JOB_ERROR] = "BLOCK_JOB_ERROR",
+    [QEVENT_BLOCK_JOB_READY] = "BLOCK_JOB_READY",
     [QEVENT_DEVICE_TRAY_MOVED] = "DEVICE_TRAY_MOVED",
     [QEVENT_SUSPEND] = "SUSPEND",
     [QEVENT_WAKEUP] = "WAKEUP",
diff --git a/monitor.h b/monitor.h
index f806962..cc71b9b 100644
--- a/monitor.h
+++ b/monitor.h
@@ -39,6 +39,7 @@ typedef enum MonitorEvent {
     QEVENT_BLOCK_JOB_COMPLETED,
     QEVENT_BLOCK_JOB_CANCELLED,
     QEVENT_BLOCK_JOB_ERROR,
+    QEVENT_BLOCK_JOB_READY,
     QEVENT_DEVICE_TRAY_MOVED,
     QEVENT_SUSPEND,
     QEVENT_WAKEUP,
diff --git a/qapi-schema.json b/qapi-schema.json
index e081ddf..a480c8a 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1796,7 +1796,8 @@
 #
 # Manually trigger completion of an active background block operation.  This
 # is supported for drive mirroring, where it also switches the device to
-# write to the target path only.
+# write to the target path only.  The ability to complete is signaled with
+# a BLOCK_JOB_READY event.
 #
 # This command completes an active background block operation synchronously.
 # The ordering of this command's return with the BLOCK_JOB_COMPLETED event
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 27/47] block: introduce mirror job
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (25 preceding siblings ...)
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 26/47] block: introduce BLOCK_JOB_READY event Paolo Bonzini
@ 2012-07-24 11:04 ` Paolo Bonzini
  2012-07-25 23:02   ` Eric Blake
  2012-09-13 12:54   ` Kevin Wolf
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 28/47] qmp: add drive-mirror command Paolo Bonzini
                   ` (20 subsequent siblings)
  47 siblings, 2 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

This patch adds the implementation of a new job that mirrors a disk to
a new image while letting the guest continue using the old image.
The target is treated as a "black box" and data is copied from the
source to the target in the background.  This can be used for several
purposes, including storage migration, continuous replication, and
observation of the guest I/O in an external program.  It is also a
first step in replacing the inefficient block migration code that is
part of QEMU.

The job is possibly never-ending, but it is logically structured into
two phases: 1) copy all data as fast as possible until the target
first gets in sync with the source; 2) keep target in sync and
ensure that reopening to the target gets a correct (full) copy
of the source data.

The second phase is indicated by the progress in "info block-jobs"
reporting the current offset to be equal to the length of the file.
When the job is cancelled in the second phase, QEMU will run the
job until the source is clean and quiescent, then it will report
successful completion of the job.

In other words, the BLOCK_JOB_CANCELLED event means that the target
may _not_ be consistent with a past state of the source; the
BLOCK_JOB_COMPLETED event means that the target is consistent with
a past state of the source.  (Note that it could already happen
that management lost the race against QEMU and got a completion
event instead of cancellation).

It is not yet possible to complete the job and switch over to the target
disk.  The next patches will fix this and add many refinements to the
basic idea introduced here.  These include improved error management,
some tunable knobs and performance optimizations.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/Makefile.objs |    2 +-
 block/mirror.c      |  232 +++++++++++++++++++++++++++++++++++++++++++++++++++
 block_int.h         |   20 +++++
 qapi-schema.json    |   17 ++++
 trace-events        |    7 ++
 5 files changed, 277 insertions(+), 1 deletion(-)
 create mode 100644 block/mirror.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index c45affc..f1a394a 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -9,4 +9,4 @@ block-obj-$(CONFIG_LIBISCSI) += iscsi.o
 block-obj-$(CONFIG_CURL) += curl.o
 block-obj-$(CONFIG_RBD) += rbd.o
 
-common-obj-y += stream.o
+common-obj-y += stream.o mirror.o
diff --git a/block/mirror.c b/block/mirror.c
new file mode 100644
index 0000000..f7d36f9
--- /dev/null
+++ b/block/mirror.c
@@ -0,0 +1,232 @@
+/*
+ * Image mirroring
+ *
+ * Copyright Red Hat, Inc. 2012
+ *
+ * Authors:
+ *  Paolo Bonzini  <pbonzini@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include "trace.h"
+#include "blockjob.h"
+#include "block_int.h"
+#include "qemu/ratelimit.h"
+
+enum {
+    /*
+     * Size of data buffer for populating the image file.  This should be large
+     * enough to process multiple clusters in a single call, so that populating
+     * contiguous regions of the image is efficient.
+     */
+    BLOCK_SIZE = 512 * BDRV_SECTORS_PER_DIRTY_CHUNK, /* in bytes */
+};
+
+#define SLICE_TIME 100000000ULL /* ns */
+
+typedef struct MirrorBlockJob {
+    BlockJob common;
+    RateLimit limit;
+    BlockDriverState *target;
+    MirrorSyncMode mode;
+    int64_t sector_num;
+    uint8_t *buf;
+} MirrorBlockJob;
+
+static int coroutine_fn mirror_iteration(MirrorBlockJob *s)
+{
+    BlockDriverState *source = s->common.bs;
+    BlockDriverState *target = s->target;
+    QEMUIOVector qiov;
+    int ret, nb_sectors;
+    int64_t end;
+    struct iovec iov;
+
+    end = s->common.len >> BDRV_SECTOR_BITS;
+    s->sector_num = bdrv_get_next_dirty(source, s->sector_num);
+    nb_sectors = MIN(BDRV_SECTORS_PER_DIRTY_CHUNK, end - s->sector_num);
+    bdrv_reset_dirty(source, s->sector_num, nb_sectors);
+
+    /* Copy the dirty cluster.  */
+    iov.iov_base = s->buf;
+    iov.iov_len  = nb_sectors * 512;
+    qemu_iovec_init_external(&qiov, &iov, 1);
+
+    trace_mirror_one_iteration(s, s->sector_num, nb_sectors);
+    ret = bdrv_co_readv(source, s->sector_num, nb_sectors, &qiov);
+    if (ret < 0) {
+        return ret;
+    }
+    return bdrv_co_writev(target, s->sector_num, nb_sectors, &qiov);
+}
+
+static void coroutine_fn mirror_run(void *opaque)
+{
+    MirrorBlockJob *s = opaque;
+    BlockDriverState *bs = s->common.bs;
+    int64_t sector_num, end;
+    int ret = 0;
+    int n;
+    bool synced = false;
+
+    if (block_job_is_cancelled(&s->common)) {
+        goto immediate_exit;
+    }
+
+    s->common.len = bdrv_getlength(bs);
+    if (s->common.len < 0) {
+        block_job_completed(&s->common, s->common.len);
+        return;
+    }
+
+    end = s->common.len >> BDRV_SECTOR_BITS;
+    s->buf = qemu_blockalign(bs, BLOCK_SIZE);
+
+    if (s->mode == MIRROR_SYNC_MODE_FULL || s->mode == MIRROR_SYNC_MODE_TOP) {
+        /* First part, loop on the sectors and initialize the dirty bitmap.  */
+        BlockDriverState *base;
+        base = s->mode == MIRROR_SYNC_MODE_FULL ? NULL : bs->backing_hd;
+        for (sector_num = 0; sector_num < end; ) {
+            int64_t next = (sector_num | (BDRV_SECTORS_PER_DIRTY_CHUNK - 1)) + 1;
+            ret = bdrv_co_is_allocated_above(bs, base,
+                                             sector_num, next - sector_num, &n);
+
+            if (ret < 0) {
+                break;
+            } else if (ret == 1) {
+                bdrv_set_dirty(bs, sector_num, n);
+                sector_num = next;
+            } else {
+                sector_num += n;
+            }
+        }
+    }
+
+    if (ret < 0) {
+        goto immediate_exit;
+    }
+
+    s->sector_num = -1;
+    for (;;) {
+        uint64_t delay_ns;
+        int64_t cnt;
+        bool should_complete;
+
+        cnt = bdrv_get_dirty_count(bs);
+        if (cnt != 0) {
+            ret = mirror_iteration(s);
+            if (ret < 0) {
+                break;
+            }
+            cnt = bdrv_get_dirty_count(bs);
+        }
+
+        if (cnt != 0) {
+            should_complete = false;
+        } else {
+            trace_mirror_before_flush(s);
+            bdrv_flush(s->target);
+
+            /* We're out of the streaming phase.  From now on, if the
+             * job is cancelled we will actually complete all pending
+             * I/O and report completion, so that drive-reopen can be
+             * used to pivot to the mirroring target.
+             */
+            synced = true;
+            s->common.offset = end * BDRV_SECTOR_SIZE;
+
+            should_complete = block_job_is_cancelled(&s->common);
+            if (should_complete) {
+                /* The dirty bitmap is not updated while operations are pending.
+                 * If we're about to exit, wait for pending operations before
+                 * calling bdrv_get_dirty_count(bs), or we may exit while the
+                 * source has dirty data to copy!
+                 *
+                 * Note that I/O can be submitted by the guest while
+                 * mirror_iteration runs.
+                 */
+                trace_mirror_before_drain(s, cnt);
+                bdrv_drain_all();
+            }
+            cnt = bdrv_get_dirty_count(bs);
+        }
+
+        ret = 0;
+        trace_mirror_before_sleep(s, cnt, synced);
+        if (!synced) {
+            /* Publish progress */
+            s->common.offset = end * BDRV_SECTOR_SIZE - cnt * BLOCK_SIZE;
+
+            if (s->common.speed) {
+                delay_ns = ratelimit_calculate_delay(&s->limit, BDRV_SECTORS_PER_DIRTY_CHUNK);
+            } else {
+                delay_ns = 0;
+            }
+
+            /* Note that even when no rate limit is applied we need to yield
+             * with no pending I/O here so that qemu_aio_flush() returns.
+             */
+            block_job_sleep_ns(&s->common, rt_clock, delay_ns);
+            if (block_job_is_cancelled(&s->common)) {
+                break;
+            }
+        } else if (!should_complete) {
+            delay_ns = (cnt == 0 ? SLICE_TIME : 0);
+            block_job_sleep_ns(&s->common, rt_clock, delay_ns);
+        } else if (cnt == 0) {
+            /* The two disks are in sync.  Exit and report successful
+             * completion.
+             */
+            assert(QLIST_EMPTY(&bs->tracked_requests));
+            s->common.cancelled = false;
+            break;
+        }
+    }
+
+immediate_exit:
+    g_free(s->buf);
+    bdrv_set_dirty_tracking(bs, false);
+    bdrv_close(s->target);
+    bdrv_delete(s->target);
+    block_job_completed(&s->common, ret);
+}
+
+static void mirror_set_speed(BlockJob *job, int64_t speed, Error **errp)
+{
+    MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
+
+    if (speed < 0) {
+        error_set(errp, QERR_INVALID_PARAMETER, "speed");
+        return;
+    }
+    ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
+}
+
+static BlockJobType mirror_job_type = {
+    .instance_size = sizeof(MirrorBlockJob),
+    .job_type      = "mirror",
+    .set_speed     = mirror_set_speed,
+};
+
+void mirror_start(BlockDriverState *bs, BlockDriverState *target,
+                  int64_t speed, MirrorSyncMode mode,
+                  BlockDriverCompletionFunc *cb,
+                  void *opaque, Error **errp)
+{
+    MirrorBlockJob *s;
+
+    s = block_job_create(&mirror_job_type, bs, speed, cb, opaque, errp);
+    if (!s) {
+        return;
+    }
+
+    s->target = target;
+    s->mode = mode;
+    bdrv_set_dirty_tracking(bs, true);
+    s->common.co = qemu_coroutine_create(mirror_run);
+    trace_mirror_start(bs, s, s->common.co, opaque);
+    qemu_coroutine_enter(s->common.co, s);
+}
diff --git a/block_int.h b/block_int.h
index d30ff7f..714f939 100644
--- a/block_int.h
+++ b/block_int.h
@@ -305,4 +305,24 @@ void stream_start(BlockDriverState *bs, BlockDriverState *base,
                   BlockDriverCompletionFunc *cb,
                   void *opaque, Error **errp);
 
+/**
+ * mirror_start:
+ * @bs: Block device to operate on.
+ * @target: Block device to write to.
+ * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
+ * @mode: Whether to collapse all images in the chain to the target.
+ * @cb: Completion function for the job.
+ * @opaque: Opaque pointer value passed to @cb.
+ * @errp: Error object.
+ *
+ * Start a mirroring operation on @bs.  Clusters that are allocated
+ * in @bs will be written to @bs until the job is canceled or
+ * manually completed.  At the end of a successful mirroring job,
+ * @bs will be switched to read from @target.
+ */
+void mirror_start(BlockDriverState *bs, BlockDriverState *target,
+                  int64_t speed, MirrorSyncMode mode,
+                  BlockDriverCompletionFunc *cb,
+                  void *opaque, Error **errp);
+
 #endif /* BLOCK_INT_H */
diff --git a/qapi-schema.json b/qapi-schema.json
index a480c8a..b7f16c4 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -926,6 +926,23 @@
   'data': ['report', 'ignore', 'enospc', 'stop'] }
 
 ##
+# @MirrorSyncMode:
+#
+# An enumeration of possible behaviors for the initial synchronization
+# phase of storage mirroring.
+#
+# @top: copies data in the topmost image to the destination
+#
+# @full: copies data from all images to the destination
+#
+# @none: only copy data written from now on
+#
+# Since: 1.2
+##
+{ 'enum': 'MirrorSyncMode',
+  'data': ['top', 'full', 'none'] }
+
+##
 # @BlockJobTargetInfo:
 #
 # Information about the target device for a long-running block device
diff --git a/trace-events b/trace-events
index 58dfe6c..1e65c77 100644
--- a/trace-events
+++ b/trace-events
@@ -75,6 +75,13 @@ bdrv_co_do_copy_on_readv(void *bs, int64_t sector_num, int nb_sectors, int64_t c
 stream_one_iteration(void *s, int64_t sector_num, int nb_sectors, int is_allocated) "s %p sector_num %"PRId64" nb_sectors %d is_allocated %d"
 stream_start(void *bs, void *base, void *s, void *co, void *opaque) "bs %p base %p s %p co %p opaque %p"
 
+# block/mirror.c
+mirror_start(void *bs, void *s, void *co, void *opaque) "bs %p s %p co %p opaque %p"
+mirror_before_flush(void *s) "s %p"
+mirror_before_drain(void *s, int64_t cnt) "s %p dirty count %"PRId64
+mirror_before_sleep(void *s, int64_t cnt, int synced) "s %p dirty count %"PRId64" synced %d"
+mirror_one_iteration(void *s, int64_t sector_num, int nb_sectors) "s %p sector_num %"PRId64" nb_sectors %d"
+
 # blockdev.c
 qmp_block_job_cancel(void *job) "job %p"
 qmp_block_job_pause(void *job) "job %p"
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 28/47] qmp: add drive-mirror command
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (26 preceding siblings ...)
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 27/47] block: introduce mirror job Paolo Bonzini
@ 2012-07-24 11:04 ` Paolo Bonzini
  2012-07-26 23:42   ` Eric Blake
                     ` (2 more replies)
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 29/47] mirror: support querying target file Paolo Bonzini
                   ` (19 subsequent siblings)
  47 siblings, 3 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

This adds the monitor commands that start the mirroring job.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 blockdev.c       |  133 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
 hmp-commands.hx  |   21 +++++++++
 hmp.c            |   28 ++++++++++++
 hmp.h            |    1 +
 qapi-schema.json |   35 ++++++++++++++
 qmp-commands.hx  |   42 +++++++++++++++++
 trace-events     |    2 +-
 7 files changed, 258 insertions(+), 4 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 192a9db..4b4574a 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -21,6 +21,8 @@
 #include "trace.h"
 #include "arch_init.h"
 
+static void block_job_cb(void *opaque, int ret);
+
 static QTAILQ_HEAD(drivelist, DriveInfo) drives = QTAILQ_HEAD_INITIALIZER(drives);
 
 static const char *const if_name[IF_COUNT] = {
@@ -825,6 +827,131 @@ exit:
     return;
 }
 
+void qmp_drive_mirror(const char *device, const char *target,
+                      bool has_format, const char *format,
+                      enum MirrorSyncMode sync,
+                      bool has_mode, enum NewImageMode mode,
+                      bool has_speed, int64_t speed, Error **errp)
+{
+    BlockDriverInfo bdi;
+    BlockDriverState *bs;
+    BlockDriverState *source, *target_bs;
+    BlockDriver *proto_drv;
+    BlockDriver *drv = NULL;
+    Error *local_err = NULL;
+    int flags;
+    uint64_t size;
+    int ret;
+
+    if (!has_speed) {
+        speed = 0;
+    }
+    if (!has_mode) {
+        mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
+    }
+
+    bs = bdrv_find(device);
+    if (!bs) {
+        error_set(errp, QERR_DEVICE_NOT_FOUND, device);
+        return;
+    }
+
+    if (!has_format) {
+        format = mode == NEW_IMAGE_MODE_EXISTING ? NULL : bs->drv->format_name;
+    }
+    if (format) {
+        drv = bdrv_find_format(format);
+        if (!drv) {
+            error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
+            return;
+        }
+    }
+
+    if (!bdrv_is_inserted(bs)) {
+        error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, device);
+        return;
+    }
+
+    if (bdrv_in_use(bs)) {
+        error_set(errp, QERR_DEVICE_IN_USE, device);
+        return;
+    }
+
+    flags = bs->open_flags | BDRV_O_RDWR;
+    source = bs->backing_hd;
+    if (!source && sync == MIRROR_SYNC_MODE_TOP) {
+        sync = MIRROR_SYNC_MODE_FULL;
+    }
+
+    proto_drv = bdrv_find_protocol(target);
+    if (!proto_drv) {
+        error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
+        return;
+    }
+
+    if (sync == MIRROR_SYNC_MODE_FULL && mode != NEW_IMAGE_MODE_EXISTING) {
+        /* create new image w/o backing file */
+        assert(format && drv);
+        bdrv_get_geometry(bs, &size);
+        size *= 512;
+        ret = bdrv_img_create(target, format,
+                              NULL, NULL, NULL, size, flags);
+    } else {
+        switch (mode) {
+        case NEW_IMAGE_MODE_EXISTING:
+            ret = 0;
+            break;
+        case NEW_IMAGE_MODE_ABSOLUTE_PATHS:
+            /* create new image with backing file */
+            ret = bdrv_img_create(target, format,
+                                  source->filename,
+                                  source->drv->format_name,
+                                  NULL, -1, flags);
+            break;
+        default:
+            abort();
+        }
+    }
+
+    if (ret) {
+        error_set(errp, QERR_OPEN_FILE_FAILED, target);
+        return;
+    }
+
+    target_bs = bdrv_new("");
+    ret = bdrv_open(target_bs, target, flags | BDRV_O_NO_BACKING, drv);
+
+    if (ret < 0) {
+        bdrv_delete(target_bs);
+        error_set(errp, QERR_OPEN_FILE_FAILED, target);
+        return;
+    }
+
+    /* We need a backing file if we will copy parts of a cluster.  */
+    if (bdrv_get_info(target_bs, &bdi) >= 0 && bdi.cluster_size != 0 &&
+        bdi.cluster_size >= BDRV_SECTORS_PER_DIRTY_CHUNK * 512) {
+        ret = bdrv_ensure_backing_file(target_bs);
+        if (ret < 0) {
+            bdrv_delete(target_bs);
+            error_set(errp, QERR_OPEN_FILE_FAILED, target);
+            return;
+        }
+    }
+
+    mirror_start(bs, target_bs, speed, sync, block_job_cb, bs, &local_err);
+    if (local_err != NULL) {
+        bdrv_delete(target_bs);
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    /* Grab a reference so hotplug does not delete the BlockDriverState from
+     * underneath us.
+     */
+    drive_get_ref(drive_get_by_blockdev(bs));
+}
+
+
 
 static void eject_device(BlockDriverState *bs, int force, Error **errp)
 {
@@ -1049,12 +1176,12 @@ void qmp_block_resize(const char *device, int64_t size, Error **errp)
     }
 }
 
-static void block_stream_cb(void *opaque, int ret)
+static void block_job_cb(void *opaque, int ret)
 {
     BlockDriverState *bs = opaque;
     QObject *obj;
 
-    trace_block_stream_cb(bs, bs->job, ret);
+    trace_block_job_cb(bs, bs->job, ret);
 
     assert(bs->job);
     obj = qobject_from_block_job(bs->job);
@@ -1101,7 +1228,7 @@ void qmp_block_stream(const char *device, bool has_base,
     }
 
     stream_start(bs, base_bs, base, has_speed ? speed : 0,
-                 on_error, block_stream_cb, bs, &local_err);
+                 on_error, block_job_cb, bs, &local_err);
     if (error_is_set(&local_err)) {
         error_propagate(errp, local_err);
         return;
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 7a72122..a6db333 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -973,6 +973,27 @@ Snapshot device, using snapshot file as target if provided
 ETEXI
 
     {
+        .name       = "drive_mirror",
+        .args_type  = "reuse:-n,full:-f,device:B,target:s,format:s?",
+        .params     = "[-n] [-f] device target [format]",
+        .help       = "initiates live storage\n\t\t\t"
+                      "migration for a device. The device's contents are\n\t\t\t"
+                      "copied to the new image file, including data that\n\t\t\t"
+                      "is written after the command is started.\n\t\t\t"
+                      "The -n flag requests QEMU to reuse the image found\n\t\t\t"
+                      "in new-image-file, instead of recreating it from scratch.\n\t\t\t"
+                      "The -f flag requests QEMU to copy the whole disk,\n\t\t\t"
+                      "so that the result does not need a backing file.\n\t\t\t",
+        .mhandler.cmd = hmp_drive_mirror,
+    },
+STEXI
+@item drive_mirror
+@findex drive_mirror
+Start mirroring a block device's writes to a new destination,
+using the specified target.
+ETEXI
+
+    {
         .name       = "drive_add",
         .args_type  = "pci_addr:s,opts:s",
         .params     = "[[<domain>:]<bus>:]<slot>\n"
diff --git a/hmp.c b/hmp.c
index 206e920..1d6a606 100644
--- a/hmp.c
+++ b/hmp.c
@@ -695,6 +695,34 @@ void hmp_block_resize(Monitor *mon, const QDict *qdict)
     hmp_handle_error(mon, &errp);
 }
 
+void hmp_drive_mirror(Monitor *mon, const QDict *qdict)
+{
+    const char *device = qdict_get_str(qdict, "device");
+    const char *filename = qdict_get_str(qdict, "target");
+    const char *format = qdict_get_try_str(qdict, "format");
+    int reuse = qdict_get_try_bool(qdict, "reuse", 0);
+    int full = qdict_get_try_bool(qdict, "full", 0);
+    enum NewImageMode mode;
+    Error *errp = NULL;
+
+    if (!filename) {
+        error_set(&errp, QERR_MISSING_PARAMETER, "target");
+        hmp_handle_error(mon, &errp);
+        return;
+    }
+
+    if (reuse) {
+        mode = NEW_IMAGE_MODE_EXISTING;
+    } else {
+        mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
+    }
+
+    qmp_drive_mirror(device, filename, !!format, format,
+                     full ? MIRROR_SYNC_MODE_FULL : MIRROR_SYNC_MODE_TOP,
+                     true, mode, false, 0, &errp);
+    hmp_handle_error(mon, &errp);
+}
+
 void hmp_snapshot_blkdev(Monitor *mon, const QDict *qdict)
 {
     const char *device = qdict_get_str(qdict, "device");
diff --git a/hmp.h b/hmp.h
index 24e551e..300efc2 100644
--- a/hmp.h
+++ b/hmp.h
@@ -48,6 +48,7 @@ void hmp_block_passwd(Monitor *mon, const QDict *qdict);
 void hmp_balloon(Monitor *mon, const QDict *qdict);
 void hmp_block_resize(Monitor *mon, const QDict *qdict);
 void hmp_snapshot_blkdev(Monitor *mon, const QDict *qdict);
+void hmp_drive_mirror(Monitor *mon, const QDict *qdict);
 void hmp_migrate_cancel(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict);
diff --git a/qapi-schema.json b/qapi-schema.json
index b7f16c4..f5c20ca 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1367,6 +1367,41 @@
   'returns': 'str' }
 
 ##
+# @drive-mirror
+#
+# Start mirroring a block device's writes to a new destination.
+#
+# @device:  the name of the device whose writes should be mirrored.
+#
+# @target: the target of the new image. If the file exists, or if it
+#          is a device, the existing file/device will be used as the new
+#          destination.  If it does not exist, a new file will be created.
+#
+# @format: #optional the format of the new destination, default is to
+#          probe is @mode is 'existing', else the format of the source
+#
+# @mode: #optional whether and how QEMU should create a new image, default is
+#        'absolute-paths'.
+#
+# @speed:  #optional the maximum speed, in bytes per second
+#
+# @sync: what parts of the disk image should be copied to the destination
+#        (all the disk, only the sectors allocated in the topmost image, or
+#        only new I/O).
+#
+# Returns: nothing on success
+#          If @device is not a valid block device, DeviceNotFound
+#          If @target can't be opened, OpenFileFailed
+#          If @format is invalid, InvalidBlockFormat
+#
+# Since 1.2
+##
+{ 'command': 'drive-mirror',
+  'data': { 'device': 'str', 'target': 'str', '*format': 'str',
+            'sync': 'MirrorSyncMode', '*mode': 'NewImageMode',
+            '*speed': 'int' } }
+
+##
 # @migrate_cancel
 #
 # Cancel the current executing migration process.
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 56f953d..7a32bb6 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -837,6 +837,48 @@ Example:
 EQMP
 
     {
+        .name       = "drive-mirror",
+        .args_type  = "sync:s,device:B,target:s,speed:i?,mode:s?,format:s?",
+        .mhandler.cmd_new = qmp_marshal_input_drive_mirror,
+    },
+
+SQMP
+drive-mirror
+------------
+
+Start mirroring a block device's writes to a new destination. target
+specifies the target of the new image. If the file exists, or if it is
+a device, it will be used as the new destination for writes. If does not
+exist, a new file will be created. format specifies the format of the
+mirror image, default is to probe if mode='existing', else the format
+of the source.
+
+Arguments:
+
+- "device": device name to operate on (json-string)
+- "target": name of new image file (json-string)
+- "format": format of new image (json-string, optional)
+- "mode": how an image file should be created into the target
+  file/device (NewImageMode, optional, default 'absolute-paths')
+- "speed": maximum speed of the streaming job, in bytes per second
+  (json-int)
+- "sync": what parts of the disk image should be copied to the destination;
+  possibilities include "full" for all the disk, "top" for only the sectors
+  allocated in the topmost image, or "none" to only replicate new I/O
+  (MirrorSyncMode).
+
+
+Example:
+
+-> { "execute": "drive-mirror", "arguments": { "device": "ide-hd0",
+                                               "target": "/some/place/my-image",
+                                               "sync": "full",
+                                               "format": "qcow2" } }
+<- { "return": {} }
+
+EQMP
+
+    {
         .name       = "balloon",
         .args_type  = "value:M",
         .mhandler.cmd_new = qmp_marshal_input_balloon,
diff --git a/trace-events b/trace-events
index 1e65c77..ac58f3a 100644
--- a/trace-events
+++ b/trace-events
@@ -87,7 +87,7 @@ qmp_block_job_cancel(void *job) "job %p"
 qmp_block_job_pause(void *job) "job %p"
 qmp_block_job_resume(void *job) "job %p"
 qmp_block_job_complete(void *job) "job %p"
-block_stream_cb(void *bs, void *job, int ret) "bs %p job %p ret %d"
+block_job_cb(void *bs, void *job, int ret) "bs %p job %p ret %d"
 qmp_block_stream(void *bs, void *job) "bs %p job %p"
 
 # hw/virtio-blk.c
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 29/47] mirror: support querying target file
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (27 preceding siblings ...)
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 28/47] qmp: add drive-mirror command Paolo Bonzini
@ 2012-07-24 11:04 ` Paolo Bonzini
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 30/47] mirror: implement completion Paolo Bonzini
                   ` (18 subsequent siblings)
  47 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

This lets query-block-jobs retrieve information and statistics on the
mirroring target.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/mirror.c |   11 +++++++++++
 blockjob.c     |    2 +-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/block/mirror.c b/block/mirror.c
index f7d36f9..9c8ebd4 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -205,10 +205,21 @@ static void mirror_set_speed(BlockJob *job, int64_t speed, Error **errp)
     ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
 }
 
+static void mirror_query(BlockJob *job, BlockJobInfo *info)
+{
+    MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
+
+    info->has_target = true;
+    info->target = g_new0(BlockJobTargetInfo, 1);
+    info->target->info = bdrv_query_info(s->target);
+    info->target->stats = bdrv_query_stats(s->target);
+}
+
 static BlockJobType mirror_job_type = {
     .instance_size = sizeof(MirrorBlockJob),
     .job_type      = "mirror",
     .set_speed     = mirror_set_speed,
+    .query         = mirror_query,
 };
 
 void mirror_start(BlockDriverState *bs, BlockDriverState *target,
diff --git a/blockjob.c b/blockjob.c
index b5a4033..42485bf 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -199,7 +199,7 @@ void block_job_sleep_ns(BlockJob *job, QEMUClock *clock, int64_t ns)
 
 BlockJobInfo *block_job_query(BlockJob *job)
 {
-    BlockJobInfo *info = g_new(BlockJobInfo, 1);
+    BlockJobInfo *info = g_new0(BlockJobInfo, 1);
     info->type      = g_strdup(job->job_type->job_type);
     info->device    = g_strdup(bdrv_get_device_name(job->bs));
     info->len       = job->len;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 30/47] mirror: implement completion
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (28 preceding siblings ...)
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 29/47] mirror: support querying target file Paolo Bonzini
@ 2012-07-24 11:04 ` Paolo Bonzini
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 31/47] qemu-iotests: add mirroring test case Paolo Bonzini
                   ` (17 subsequent siblings)
  47 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

Switching to the target of the migration is done mostly asynchronously,
and reported to management via the BLOCK_JOB_COMPLETED event; the only
synchronous phase is opening the backing files.  Note that this can be
done always, even for migration of the full image, because the backing
file structure of the source and target are not in any relationship.
For full migration (aka sync: 'full') qmp_drive_mirror will create the
target disk with no backing file at all, and bdrv_ensure_backing_file
will be a no-op.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/mirror.c |   43 +++++++++++++++++++++++++++++++++++++------
 1 file changed, 37 insertions(+), 6 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 9c8ebd4..4454ef3 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -32,6 +32,8 @@ typedef struct MirrorBlockJob {
     RateLimit limit;
     BlockDriverState *target;
     MirrorSyncMode mode;
+    bool synced;
+    bool complete;
     int64_t sector_num;
     uint8_t *buf;
 } MirrorBlockJob;
@@ -70,7 +72,6 @@ static void coroutine_fn mirror_run(void *opaque)
     int64_t sector_num, end;
     int ret = 0;
     int n;
-    bool synced = false;
 
     if (block_job_is_cancelled(&s->common)) {
         goto immediate_exit;
@@ -135,10 +136,13 @@ static void coroutine_fn mirror_run(void *opaque)
              * I/O and report completion, so that drive-reopen can be
              * used to pivot to the mirroring target.
              */
-            synced = true;
             s->common.offset = end * BDRV_SECTOR_SIZE;
+            if (!s->synced) {
+                block_job_ready(&s->common);
+                s->synced = true;
+            }
 
-            should_complete = block_job_is_cancelled(&s->common);
+            should_complete = block_job_is_cancelled(&s->common) || s->complete;
             if (should_complete) {
                 /* The dirty bitmap is not updated while operations are pending.
                  * If we're about to exit, wait for pending operations before
@@ -155,8 +159,8 @@ static void coroutine_fn mirror_run(void *opaque)
         }
 
         ret = 0;
-        trace_mirror_before_sleep(s, cnt, synced);
-        if (!synced) {
+        trace_mirror_before_sleep(s, cnt, s->synced);
+        if (!s->synced) {
             /* Publish progress */
             s->common.offset = end * BDRV_SECTOR_SIZE - cnt * BLOCK_SIZE;
 
@@ -189,7 +193,11 @@ static void coroutine_fn mirror_run(void *opaque)
 immediate_exit:
     g_free(s->buf);
     bdrv_set_dirty_tracking(bs, false);
-    bdrv_close(s->target);
+    if (s->complete && ret == 0) {
+        bdrv_swap(s->target, s->common.bs);
+    } else {
+        bdrv_close(s->target);
+    }
     bdrv_delete(s->target);
     block_job_completed(&s->common, ret);
 }
@@ -215,11 +223,34 @@ static void mirror_query(BlockJob *job, BlockJobInfo *info)
     info->target->stats = bdrv_query_stats(s->target);
 }
 
+static void mirror_complete(BlockJob *job, Error **errp)
+{
+    MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
+    int ret;
+
+    ret = bdrv_ensure_backing_file(s->target);
+    if (ret < 0) {
+        char backing_filename[PATH_MAX];
+        bdrv_get_full_backing_filename(s->target, backing_filename,
+                                       sizeof(backing_filename));
+        error_set(errp, QERR_OPEN_FILE_FAILED, backing_filename);
+        return;
+    }
+    if (!s->synced) {
+        error_set(errp, QERR_BLOCK_JOB_NOT_READY, job->bs->device_name);
+        return;
+    }
+
+    s->complete = true;
+    block_job_resume(job);
+}
+
 static BlockJobType mirror_job_type = {
     .instance_size = sizeof(MirrorBlockJob),
     .job_type      = "mirror",
     .set_speed     = mirror_set_speed,
     .query         = mirror_query,
+    .complete      = mirror_complete,
 };
 
 void mirror_start(BlockDriverState *bs, BlockDriverState *target,
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 31/47] qemu-iotests: add mirroring test case
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (29 preceding siblings ...)
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 30/47] mirror: implement completion Paolo Bonzini
@ 2012-07-24 11:04 ` Paolo Bonzini
  2012-07-26 23:46   ` Eric Blake
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 32/47] block: forward bdrv_iostatus_reset to block job Paolo Bonzini
                   ` (16 subsequent siblings)
  47 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

The tests are not meant to offer full coverage; in particular, there is
no concurrent I/O going on from the guest.  However, especially in the
case of blkdebug-based tests (introduced later in the series) they do
cover some paths that will usually be skipped by integration tests,
for example tests running on kvm-autotest.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 tests/qemu-iotests/039   |  352 ++++++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/group |    1 +
 2 files changed, 353 insertions(+)
 create mode 100755 tests/qemu-iotests/039

diff --git a/tests/qemu-iotests/039 b/tests/qemu-iotests/039
new file mode 100755
index 0000000..c512f14
--- /dev/null
+++ b/tests/qemu-iotests/039
@@ -0,0 +1,352 @@
+#!/usr/bin/env python
+#
+# Tests for image mirroring.
+#
+# Copyright (C) 2012 IBM Corp.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import time
+import os
+import iotests
+from iotests import qemu_img, qemu_io
+import struct
+
+backing_img = os.path.join(iotests.test_dir, 'backing.img')
+target_backing_img = os.path.join(iotests.test_dir, 'target-backing.img')
+test_img = os.path.join(iotests.test_dir, 'test.img')
+target_img = os.path.join(iotests.test_dir, 'target.img')
+
+class ImageMirroringTestCase(iotests.QMPTestCase):
+    '''Abstract base class for image mirroring test cases'''
+
+    def assert_no_active_mirrors(self):
+        result = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(result, 'return', [])
+
+    def cancel_and_wait(self, drive='drive0', wait_ready=True):
+        '''Cancel a block job and wait for it to finish'''
+        if wait_ready:
+            ready = False
+            while not ready:
+                for event in self.vm.get_qmp_events(wait=True):
+                    if event['event'] == 'BLOCK_JOB_READY':
+                        self.assert_qmp(event, 'data/type', 'mirror')
+                        self.assert_qmp(event, 'data/device', drive)
+                        ready = True
+
+        result = self.vm.qmp('block-job-cancel', device=drive,
+                             force=not wait_ready)
+        self.assert_qmp(result, 'return', {})
+
+        cancelled = False
+        while not cancelled:
+            for event in self.vm.get_qmp_events(wait=True):
+                if event['event'] == 'BLOCK_JOB_COMPLETED' or \
+                   event['event'] == 'BLOCK_JOB_CANCELLED':
+                    self.assert_qmp(event, 'data/type', 'mirror')
+                    self.assert_qmp(event, 'data/device', drive)
+                    if wait_ready:
+                        self.assertEquals(event['event'], 'BLOCK_JOB_COMPLETED')
+                        self.assert_qmp(event, 'data/offset', self.image_len)
+                        self.assert_qmp(event, 'data/len', self.image_len)
+                    cancelled = True
+
+        self.assert_no_active_mirrors()
+
+    def complete_and_wait(self, drive='drive0', wait_ready=True):
+        '''Complete a block job and wait for it to finish'''
+        if wait_ready:
+            ready = False
+            while not ready:
+                for event in self.vm.get_qmp_events(wait=True):
+                    if event['event'] == 'BLOCK_JOB_READY':
+                        self.assert_qmp(event, 'data/type', 'mirror')
+                        self.assert_qmp(event, 'data/device', drive)
+                        ready = True
+
+        result = self.vm.qmp('block-job-complete', device=drive)
+        self.assert_qmp(result, 'return', {})
+
+        completed = False
+        while not completed:
+            for event in self.vm.get_qmp_events(wait=True):
+                if event['event'] == 'BLOCK_JOB_COMPLETED':
+                    self.assert_qmp(event, 'data/type', 'mirror')
+                    self.assert_qmp(event, 'data/device', drive)
+                    self.assert_qmp_absent(event, 'data/error')
+                    self.assert_qmp(event, 'data/offset', self.image_len)
+                    self.assert_qmp(event, 'data/len', self.image_len)
+                    completed = True
+
+        self.assert_no_active_mirrors()
+
+    def create_image(self, name, size):
+        file = open(name, 'w')
+        i = 0
+        while i < size:
+            sector = struct.pack('>l504xl', i / 512, i / 512)
+            file.write(sector)
+            i = i + 512
+        file.close()
+
+    def compare_images(self, img1, img2):
+        try:
+            qemu_img('convert', '-f', iotests.imgfmt, '-O', 'raw', img1, img1 + '.raw')
+            qemu_img('convert', '-f', iotests.imgfmt, '-O', 'raw', img2, img2 + '.raw')
+            file1 = open(img1 + '.raw', 'r')
+            file2 = open(img2 + '.raw', 'r')
+            return file1.read() == file2.read()
+        finally:
+            if file1 is not None:
+                file1.close()
+            if file2 is not None:
+                file2.close()
+            try:
+                os.remove(img1 + '.raw')
+            except OSError:
+                pass
+            try:
+                os.remove(img2 + '.raw')
+            except OSError:
+                pass
+
+class TestSingleDrive(ImageMirroringTestCase):
+    image_len = 1 * 1024 * 1024 # MB
+
+    def setUp(self):
+        self.create_image(backing_img, TestSingleDrive.image_len)
+        qemu_img('create', '-f', iotests.imgfmt, '-o', 'backing_file=%s' % backing_img, test_img)
+        self.vm = iotests.VM().add_drive(test_img)
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove(test_img)
+        os.remove(backing_img)
+        os.remove(target_img)
+
+    def test_complete(self):
+        self.assert_no_active_mirrors()
+
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             target=target_img)
+        self.assert_qmp(result, 'return', {})
+
+        self.complete_and_wait()
+        result = self.vm.qmp('query-block')
+        self.assert_qmp(result, 'return[0]/inserted/file', target_img)
+        self.vm.shutdown()
+        self.assertTrue(self.compare_images(test_img, target_img),
+                        'target image does not match source after mirroring')
+
+    def test_cancel(self):
+        self.assert_no_active_mirrors()
+
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             target=target_img)
+        self.assert_qmp(result, 'return', {})
+
+        self.cancel_and_wait()
+        result = self.vm.qmp('query-block')
+        self.assert_qmp(result, 'return[0]/inserted/file', test_img)
+        self.vm.shutdown()
+        self.assertTrue(self.compare_images(test_img, target_img),
+                        'target image does not match source after mirroring')
+
+    def test_pause(self):
+        self.assert_no_active_mirrors()
+
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             target=target_img)
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('block-job-pause', device='drive0')
+        self.assert_qmp(result, 'return', {})
+
+        time.sleep(1)
+        result = self.vm.qmp('query-block-jobs')
+        offset = self.dictpath(result, 'return[0]/offset')
+
+        time.sleep(1)
+        result = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(result, 'return[0]/offset', offset)
+
+        result = self.vm.qmp('block-job-resume', device='drive0')
+        self.assert_qmp(result, 'return', {})
+
+        self.complete_and_wait()
+        self.vm.shutdown()
+        self.assertTrue(self.compare_images(test_img, target_img),
+                        'target image does not match source after mirroring')
+
+    def test_large_cluster(self):
+        self.assert_no_active_mirrors()
+
+        qemu_img('create', '-f', iotests.imgfmt, '-o', 'cluster_size=%d,backing_file=%s'
+                        % (TestSingleDrive.image_len, mid_img), target_img)
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             mode='existing', target=target_img)
+        self.assert_qmp(result, 'return', {})
+
+        self.complete_and_wait()
+        result = self.vm.qmp('query-block')
+        self.assert_qmp(result, 'return[0]/inserted/file', target_img)
+        self.vm.shutdown()
+        self.assertTrue(self.compare_images(test_img, target_img),
+                        'target image does not match source after mirroring')
+
+    def test_image_not_found(self):
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             mode='existing', target=target_img)
+        self.assert_qmp(result, 'error/class', 'OpenFileFailed')
+
+        # Avoid failure on os.remove
+        qemu_img('create', '-f', iotests.imgfmt, '-o', 'cluster_size=%d,backing_file=%s'
+                        % (TestSingleDrive.image_len, test_img), target_img)
+
+    def test_device_not_found(self):
+        result = self.vm.qmp('drive-mirror', device='nonexistent', sync='full',
+                             target=target_img)
+        self.assert_qmp(result, 'error/class', 'DeviceNotFound')
+
+        # Avoid failure on os.remove
+        qemu_img('create', '-f', iotests.imgfmt, '-o', 'cluster_size=%d,backing_file=%s'
+                        % (TestSingleDrive.image_len, test_img), target_img)
+
+class TestMirrorNoBacking(ImageMirroringTestCase):
+    image_len = 2 * 1024 * 1024 # MB
+
+    def complete_and_wait(self, drive='drive0', wait_ready=True):
+        self.create_image(target_backing_img, TestMirrorNoBacking.image_len)
+        return ImageMirroringTestCase.complete_and_wait(self, drive, wait_ready)
+
+    def compare_images(self, img1, img2):
+        self.create_image(target_backing_img, TestMirrorNoBacking.image_len)
+        return ImageMirroringTestCase.compare_images(self, img1, img2)
+
+    def setUp(self):
+        self.create_image(backing_img, TestMirrorNoBacking.image_len)
+        qemu_img('create', '-f', iotests.imgfmt, '-o', 'backing_file=%s' % backing_img, test_img)
+        self.vm = iotests.VM().add_drive(test_img)
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove(test_img)
+        os.remove(backing_img)
+        os.remove(target_backing_img)
+        os.remove(target_img)
+
+    def test_complete(self):
+        self.assert_no_active_mirrors()
+
+        qemu_img('create', '-f', iotests.imgfmt, '-o', 'backing_file=%s' % backing_img, target_img)
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             mode='existing', target=target_img)
+        self.assert_qmp(result, 'return', {})
+
+        self.complete_and_wait()
+        result = self.vm.qmp('query-block')
+        self.assert_qmp(result, 'return[0]/inserted/file', target_img)
+        self.vm.shutdown()
+        self.assertTrue(self.compare_images(test_img, target_img),
+                        'target image does not match source after mirroring')
+
+    def test_cancel(self):
+        self.assert_no_active_mirrors()
+
+        qemu_img('create', '-f', iotests.imgfmt, '-o', 'backing_file=%s' % backing_img, target_img)
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             mode='existing', target=target_img)
+        self.assert_qmp(result, 'return', {})
+
+        self.cancel_and_wait()
+        result = self.vm.qmp('query-block')
+        self.assert_qmp(result, 'return[0]/inserted/file', test_img)
+        self.vm.shutdown()
+        self.assertTrue(self.compare_images(test_img, target_img),
+                        'target image does not match source after mirroring')
+
+class TestSetSpeed(ImageMirroringTestCase):
+    image_len = 80 * 1024 * 1024 # MB
+
+    def setUp(self):
+        qemu_img('create', backing_img, str(TestSetSpeed.image_len))
+        qemu_img('create', '-f', iotests.imgfmt, '-o', 'backing_file=%s' % backing_img, test_img)
+        self.vm = iotests.VM().add_drive(test_img)
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove(test_img)
+        os.remove(backing_img)
+        os.remove(target_img)
+
+    def test_set_speed(self):
+        self.assert_no_active_mirrors()
+
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             target=target_img)
+        self.assert_qmp(result, 'return', {})
+
+        # Default speed is 0
+        result = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(result, 'return[0]/device', 'drive0')
+        self.assert_qmp(result, 'return[0]/speed', 0)
+
+        result = self.vm.qmp('block-job-set-speed', device='drive0', speed=8 * 1024 * 1024)
+        self.assert_qmp(result, 'return', {})
+
+        # Ensure the speed we set was accepted
+        result = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(result, 'return[0]/device', 'drive0')
+        self.assert_qmp(result, 'return[0]/speed', 8 * 1024 * 1024)
+
+        self.cancel_and_wait()
+
+        # Check setting speed in drive-mirror works
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             target=target_img, speed=4*1024*1024)
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(result, 'return[0]/device', 'drive0')
+        self.assert_qmp(result, 'return[0]/speed', 4 * 1024 * 1024)
+
+        self.cancel_and_wait()
+
+    def test_set_speed_invalid(self):
+        self.assert_no_active_mirrors()
+
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             target=target_img, speed=-1)
+        self.assert_qmp(result, 'error/class', 'InvalidParameter')
+        self.assert_qmp(result, 'error/data/name', 'speed')
+
+        self.assert_no_active_mirrors()
+
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             target=target_img)
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('block-job-set-speed', device='drive0', speed=-1)
+        self.assert_qmp(result, 'error/class', 'InvalidParameter')
+        self.assert_qmp(result, 'error/data/name', 'speed')
+
+        self.cancel_and_wait()
+
+if __name__ == '__main__':
+    iotests.main(supported_fmts=['qcow2', 'qed'])
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index a569bc9..13ed177 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -45,3 +45,4 @@
 036 rw auto quick
 037 rw auto backing
 038 rw auto backing
+039 rw auto backing
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 32/47] block: forward bdrv_iostatus_reset to block job
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (30 preceding siblings ...)
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 31/47] qemu-iotests: add mirroring test case Paolo Bonzini
@ 2012-07-24 11:04 ` Paolo Bonzini
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 33/47] mirror: add support for on-source-error/on-target-error Paolo Bonzini
                   ` (15 subsequent siblings)
  47 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

This will be needed for mirroring, where the source and target have
different iostatuses.  The source iostatus is reported normally
in "query-block", while the target iostatus is accessible via
"query-block-jobs".

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c    |    3 +++
 blockjob.c |    7 +++++++
 blockjob.h |   12 ++++++++++++
 3 files changed, 22 insertions(+)

diff --git a/block.c b/block.c
index 81c2bc5..65d9bed 100644
--- a/block.c
+++ b/block.c
@@ -3931,6 +3931,9 @@ void bdrv_iostatus_reset(BlockDriverState *bs)
 {
     if (bdrv_iostatus_is_enabled(bs)) {
         bs->iostatus = BLOCK_DEVICE_IO_STATUS_OK;
+        if (bs->job) {
+            block_job_iostatus_reset(bs->job);
+        }
     }
 }
 
diff --git a/blockjob.c b/blockjob.c
index 42485bf..1770a02 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -139,6 +139,13 @@ bool block_job_is_cancelled(BlockJob *job)
     return job->cancelled;
 }
 
+void block_job_iostatus_reset(BlockJob *job)
+{
+    if (job->job_type->iostatus_reset) {
+        job->job_type->iostatus_reset(job);
+    }
+}
+
 struct BlockCancelData {
     BlockJob *job;
     BlockDriverCompletionFunc *cb;
diff --git a/blockjob.h b/blockjob.h
index 0d29fdf..a1fae9d 100644
--- a/blockjob.h
+++ b/blockjob.h
@@ -42,6 +42,9 @@ typedef struct BlockJobType {
     /** Optional callback for job types that support setting a speed limit */
     void (*set_speed)(BlockJob *job, int64_t speed, Error **errp);
 
+    /** Optional callback for job types that need to forward I/O status reset */
+    void (*iostatus_reset)(BlockJob *job);
+
     /**
      * Optional callback for job types that can fill the target member
      * of BlockJobInfo.
@@ -257,6 +260,15 @@ bool block_job_is_paused(BlockJob *job);
 int block_job_cancel_sync(BlockJob *job);
 
 /**
+ * block_job_iostatus_reset:
+ * @job: The job whose I/O status should be reset.
+ *
+ * Reset I/O status on BlockDriverState objects used by @job, other
+ * than job->bs.
+ */
+void block_job_iostatus_reset(BlockJob *job);
+
+/**
  * block_job_error_action:
  * @job: The job to signal an error for.
  * @bs: The block device on which to set an I/O error.
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 33/47] mirror: add support for on-source-error/on-target-error
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (31 preceding siblings ...)
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 32/47] block: forward bdrv_iostatus_reset to block job Paolo Bonzini
@ 2012-07-24 11:04 ` Paolo Bonzini
  2012-07-27 15:26   ` Eric Blake
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 34/47] qmp: add pull_event function Paolo Bonzini
                   ` (14 subsequent siblings)
  47 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

Error management is important for mirroring; otherwise, an error on the
target (even something as "innocent" as ENOSPC) requires to start again
with a full copy.  Similar to on_read_error/on_write_error, two separate
knobs are provided for on_source_error (reads) and on_target_error (writes).
The default is 'report' for both.

The 'ignore' policy will leave the sector dirty, so that it will be
retried later.  Thus, it will not cause corruption.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/mirror.c   |   50 +++++++++++++++++++++++++++++++++++++++++++++-----
 block_int.h      |    4 ++++
 blockdev.c       |   14 ++++++++++++--
 hmp.c            |    3 ++-
 qapi-schema.json |   12 +++++++++++-
 qmp-commands.hx  |    8 +++++++-
 6 files changed, 81 insertions(+), 10 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 4454ef3..fb54d27 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -32,13 +32,15 @@ typedef struct MirrorBlockJob {
     RateLimit limit;
     BlockDriverState *target;
     MirrorSyncMode mode;
+    BlockdevOnError on_source_error, on_target_error;
     bool synced;
     bool complete;
     int64_t sector_num;
     uint8_t *buf;
 } MirrorBlockJob;
 
-static int coroutine_fn mirror_iteration(MirrorBlockJob *s)
+static int coroutine_fn mirror_iteration(MirrorBlockJob *s,
+                                         BlockErrorAction *p_action)
 {
     BlockDriverState *source = s->common.bs;
     BlockDriverState *target = s->target;
@@ -60,9 +62,24 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob *s)
     trace_mirror_one_iteration(s, s->sector_num, nb_sectors);
     ret = bdrv_co_readv(source, s->sector_num, nb_sectors, &qiov);
     if (ret < 0) {
-        return ret;
+        *p_action = block_job_error_action(&s->common, source,
+                                           s->on_source_error, true, -ret);
+        s->synced = false;
+        goto fail;
     }
-    return bdrv_co_writev(target, s->sector_num, nb_sectors, &qiov);
+    ret = bdrv_co_writev(target, s->sector_num, nb_sectors, &qiov);
+    if (ret < 0) {
+        *p_action = block_job_error_action(&s->common, target,
+                                           s->on_target_error, false, -ret);
+        s->synced = false;
+        goto fail;
+    }
+    return 0;
+
+fail:
+    /* Try again later.  */
+    bdrv_set_dirty(source, s->sector_num, nb_sectors);
+    return ret;
 }
 
 static void coroutine_fn mirror_run(void *opaque)
@@ -118,8 +135,9 @@ static void coroutine_fn mirror_run(void *opaque)
 
         cnt = bdrv_get_dirty_count(bs);
         if (cnt != 0) {
-            ret = mirror_iteration(s);
-            if (ret < 0) {
+            BlockErrorAction action = BDRV_ACTION_REPORT;
+            ret = mirror_iteration(s, &action);
+            if (ret < 0 && action == BDRV_ACTION_REPORT) {
                 break;
             }
             cnt = bdrv_get_dirty_count(bs);
@@ -193,6 +211,7 @@ static void coroutine_fn mirror_run(void *opaque)
 immediate_exit:
     g_free(s->buf);
     bdrv_set_dirty_tracking(bs, false);
+    bdrv_iostatus_disable(s->target);
     if (s->complete && ret == 0) {
         bdrv_swap(s->target, s->common.bs);
     } else {
@@ -213,6 +232,13 @@ static void mirror_set_speed(BlockJob *job, int64_t speed, Error **errp)
     ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
 }
 
+static void mirror_iostatus_reset(BlockJob *job)
+{
+    MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
+
+    bdrv_iostatus_reset(s->target);
+}
+
 static void mirror_query(BlockJob *job, BlockJobInfo *info)
 {
     MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
@@ -249,25 +275,39 @@ static BlockJobType mirror_job_type = {
     .instance_size = sizeof(MirrorBlockJob),
     .job_type      = "mirror",
     .set_speed     = mirror_set_speed,
+    .iostatus_reset= mirror_iostatus_reset,
     .query         = mirror_query,
     .complete      = mirror_complete,
 };
 
 void mirror_start(BlockDriverState *bs, BlockDriverState *target,
                   int64_t speed, MirrorSyncMode mode,
+                  BlockdevOnError on_source_error,
+                  BlockdevOnError on_target_error,
                   BlockDriverCompletionFunc *cb,
                   void *opaque, Error **errp)
 {
     MirrorBlockJob *s;
 
+    if ((on_source_error == BLOCKDEV_ON_ERROR_STOP ||
+         on_source_error == BLOCKDEV_ON_ERROR_ENOSPC) &&
+        !bdrv_iostatus_is_enabled(bs)) {
+        error_set(errp, QERR_INVALID_PARAMETER, "on-source-error");
+        return;
+    }
+
     s = block_job_create(&mirror_job_type, bs, speed, cb, opaque, errp);
     if (!s) {
         return;
     }
 
+    s->on_source_error = on_source_error;
+    s->on_target_error = on_target_error;
     s->target = target;
     s->mode = mode;
     bdrv_set_dirty_tracking(bs, true);
+    bdrv_set_on_error(s->target, on_target_error, on_target_error);
+    bdrv_iostatus_enable(s->target);
     s->common.co = qemu_coroutine_create(mirror_run);
     trace_mirror_start(bs, s, s->common.co, opaque);
     qemu_coroutine_enter(s->common.co, s);
diff --git a/block_int.h b/block_int.h
index 714f939..0de2fdc 100644
--- a/block_int.h
+++ b/block_int.h
@@ -311,6 +311,8 @@ void stream_start(BlockDriverState *bs, BlockDriverState *base,
  * @target: Block device to write to.
  * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
  * @mode: Whether to collapse all images in the chain to the target.
+ * @on_source_error: The action to take upon error reading from the source.
+ * @on_target_error: The action to take upon error writing to the target.
  * @cb: Completion function for the job.
  * @opaque: Opaque pointer value passed to @cb.
  * @errp: Error object.
@@ -322,6 +324,8 @@ void stream_start(BlockDriverState *bs, BlockDriverState *base,
  */
 void mirror_start(BlockDriverState *bs, BlockDriverState *target,
                   int64_t speed, MirrorSyncMode mode,
+                  BlockdevOnError on_source_error,
+                  BlockdevOnError on_target_error,
                   BlockDriverCompletionFunc *cb,
                   void *opaque, Error **errp);
 
diff --git a/blockdev.c b/blockdev.c
index 4b4574a..eb528cd 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -831,7 +831,10 @@ void qmp_drive_mirror(const char *device, const char *target,
                       bool has_format, const char *format,
                       enum MirrorSyncMode sync,
                       bool has_mode, enum NewImageMode mode,
-                      bool has_speed, int64_t speed, Error **errp)
+                      bool has_speed, int64_t speed,
+                      bool has_on_source_error, BlockdevOnError on_source_error,
+                      bool has_on_target_error, BlockdevOnError on_target_error,
+                      Error **errp)
 {
     BlockDriverInfo bdi;
     BlockDriverState *bs;
@@ -846,6 +849,12 @@ void qmp_drive_mirror(const char *device, const char *target,
     if (!has_speed) {
         speed = 0;
     }
+    if (!has_on_source_error) {
+        on_source_error = BLOCKDEV_ON_ERROR_REPORT;
+    }
+    if (!has_on_target_error) {
+        on_target_error = BLOCKDEV_ON_ERROR_REPORT;
+    }
     if (!has_mode) {
         mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
     }
@@ -938,7 +947,8 @@ void qmp_drive_mirror(const char *device, const char *target,
         }
     }
 
-    mirror_start(bs, target_bs, speed, sync, block_job_cb, bs, &local_err);
+    mirror_start(bs, target_bs, speed, sync, on_source_error, on_target_error,
+                 block_job_cb, bs, &local_err);
     if (local_err != NULL) {
         bdrv_delete(target_bs);
         error_propagate(errp, local_err);
diff --git a/hmp.c b/hmp.c
index 1d6a606..b6bc263 100644
--- a/hmp.c
+++ b/hmp.c
@@ -719,7 +719,8 @@ void hmp_drive_mirror(Monitor *mon, const QDict *qdict)
 
     qmp_drive_mirror(device, filename, !!format, format,
                      full ? MIRROR_SYNC_MODE_FULL : MIRROR_SYNC_MODE_TOP,
-                     true, mode, false, 0, &errp);
+                     true, mode, false, 0,
+                     false, 0, false, 0, &errp);
     hmp_handle_error(mon, &errp);
 }
 
diff --git a/qapi-schema.json b/qapi-schema.json
index f5c20ca..7a97fad 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1389,17 +1389,27 @@
 #        (all the disk, only the sectors allocated in the topmost image, or
 #        only new I/O).
 #
+# @on-source-error: #optional the action to take on an error on the source,
+#                   default 'report'.  'stop' and 'enospc' can only be used
+#                   if the block device supports io-status (see BlockInfo).
+#
+# @on-target-error: #optional the action to take on an error on the target,
+#                   default 'report' (no limitations, since this applies to
+#                   a different block device than @device).
+#
 # Returns: nothing on success
 #          If @device is not a valid block device, DeviceNotFound
 #          If @target can't be opened, OpenFileFailed
 #          If @format is invalid, InvalidBlockFormat
+#          If @on_source_error is not supported, InvalidParameter
 #
 # Since 1.2
 ##
 { 'command': 'drive-mirror',
   'data': { 'device': 'str', 'target': 'str', '*format': 'str',
             'sync': 'MirrorSyncMode', '*mode': 'NewImageMode',
-            '*speed': 'int' } }
+            '*speed': 'int', '*on-source-error': 'BlockdevOnError',
+            '*on-target-error': 'BlockdevOnError' } }
 
 ##
 # @migrate_cancel
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 7a32bb6..5081b01 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -838,7 +838,8 @@ EQMP
 
     {
         .name       = "drive-mirror",
-        .args_type  = "sync:s,device:B,target:s,speed:i?,mode:s?,format:s?",
+        .args_type  = "sync:s,device:B,target:s,speed:i?,mode:s?,format:s?,"
+                      "on-source-error:s?,on-target-error:s?",
         .mhandler.cmd_new = qmp_marshal_input_drive_mirror,
     },
 
@@ -866,6 +867,11 @@ Arguments:
   possibilities include "full" for all the disk, "top" for only the sectors
   allocated in the topmost image, or "none" to only replicate new I/O
   (MirrorSyncMode).
+- "on-source-error": the action to take on an error on the source
+  (BlockdevOnError, default 'report')
+- "on-target-error": the action to take on an error on the target
+  (BlockdevOnError, default 'report')
+
 
 
 Example:
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 34/47] qmp: add pull_event function
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (32 preceding siblings ...)
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 33/47] mirror: add support for on-source-error/on-target-error Paolo Bonzini
@ 2012-07-24 11:04 ` Paolo Bonzini
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 35/47] qemu-iotests: add testcases for mirroring on-source-error/on-target-error Paolo Bonzini
                   ` (13 subsequent siblings)
  47 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha, Luiz Capitulino

This function is unlike get_events, in that it makes it easy to process
one event at a time.  This is useful in the mirroring test cases, where
we want to process just one event (BLOCK_JOB_ERROR) and leave the others
to a helper function.

Cc: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 QMP/qmp.py |   20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/QMP/qmp.py b/QMP/qmp.py
index bd83a21..db1b76a 100644
--- a/QMP/qmp.py
+++ b/QMP/qmp.py
@@ -136,6 +136,26 @@ class QEMUMonitorProtocol:
             raise Exception(ret['error']['desc'])
         return ret['return']
 
+    def pull_event(self, wait=False):
+        """
+        Get and delete the first available QMP event.
+
+        @param wait: block until an event is available (bool)
+        """
+        self.__sock.setblocking(0)
+        try:
+            self.__json_read()
+        except socket.error, err:
+            if err[0] == errno.EAGAIN:
+                # No data available
+                pass
+        self.__sock.setblocking(1)
+        if not self.__events and wait:
+            self.__json_read(only_event=True)
+        event = self.__events[0]
+        del self.__events[0]
+        return event
+
     def get_events(self, wait=False):
         """
         Get a list of available QMP events.
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 35/47] qemu-iotests: add testcases for mirroring on-source-error/on-target-error
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (33 preceding siblings ...)
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 34/47] qmp: add pull_event function Paolo Bonzini
@ 2012-07-24 11:04 ` Paolo Bonzini
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 36/47] host-utils: add ffsl and flsl Paolo Bonzini
                   ` (12 subsequent siblings)
  47 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

The new options are tested with blkdebug on both the source and the
target.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 tests/qemu-iotests/039        |  257 +++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/iotests.py |    4 +
 2 files changed, 261 insertions(+)

diff --git a/tests/qemu-iotests/039 b/tests/qemu-iotests/039
index c512f14..3e17881 100755
--- a/tests/qemu-iotests/039
+++ b/tests/qemu-iotests/039
@@ -280,6 +280,263 @@ class TestMirrorNoBacking(ImageMirroringTestCase):
         self.assertTrue(self.compare_images(test_img, target_img),
                         'target image does not match source after mirroring')
 
+class TestReadErrors(ImageMirroringTestCase):
+    image_len = 2 * 1024 * 1024 # MB
+
+    # this should be a multiple of twice the default granularity
+    # so that we hit this offset first in state 1
+    MIRROR_GRANULARITY = 1024 * 1024
+
+    def create_blkdebug_file(self, name, event, errno):
+        file = open(name, 'w')
+        file.write('''
+[inject-error]
+state = "1"
+event = "%s"
+errno = "%d"
+immediately = "off"
+once = "on"
+sector = "%d"
+
+[set-state]
+state = "1"
+event = "%s"
+new_state = "2"
+
+[set-state]
+state = "2"
+event = "%s"
+new_state = "1"
+''' % (event, errno, self.MIRROR_GRANULARITY / 512, event, event))
+        file.close()
+
+    def setUp(self):
+        self.blkdebug_file = backing_img + ".blkdebug"
+        self.create_image(backing_img, TestReadErrors.image_len)
+        self.create_blkdebug_file(self.blkdebug_file, "read_aio", 5)
+        qemu_img('create', '-f', iotests.imgfmt,
+                 '-o', 'backing_file=blkdebug:%s:%s,backing_fmt=raw'
+                       % (self.blkdebug_file, backing_img),
+                 test_img)
+        self.vm = iotests.VM().add_drive(test_img)
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove(test_img)
+        os.remove(backing_img)
+        os.remove(self.blkdebug_file)
+
+    def test_report_read(self):
+        self.assert_no_active_mirrors()
+
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             target=target_img)
+        self.assert_qmp(result, 'return', {})
+
+        completed = False
+        error = False
+        while not completed:
+            for event in self.vm.get_qmp_events(wait=True):
+                if event['event'] == 'BLOCK_JOB_ERROR':
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp(event, 'data/operation', 'read')
+                    error = True
+                elif event['event'] == 'BLOCK_JOB_READY':
+                    self.assertTrue(False, 'job completed unexpectedly')
+                elif event['event'] == 'BLOCK_JOB_COMPLETED':
+                    self.assertTrue(error, 'job completed unexpectedly')
+                    self.assert_qmp(event, 'data/type', 'mirror')
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp(event, 'data/error', 'Input/output error')
+                    self.assert_qmp(event, 'data/len', self.image_len)
+                    completed = True
+
+        self.assert_no_active_mirrors()
+        self.vm.shutdown()
+
+    def test_ignore_read(self):
+        self.assert_no_active_mirrors()
+
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             target=target_img, on_source_error='ignore')
+        self.assert_qmp(result, 'return', {})
+
+        event = self.vm.get_qmp_event(wait=True)
+        self.assertEquals(event['event'], 'BLOCK_JOB_ERROR')
+        self.assert_qmp(event, 'data/device', 'drive0')
+        self.assert_qmp(event, 'data/operation', 'read')
+        result = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(result, 'return[0]/paused', False)
+        self.complete_and_wait()
+        self.vm.shutdown()
+
+    def test_stop_read(self):
+        self.assert_no_active_mirrors()
+
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             target=target_img, on_source_error='stop')
+        self.assert_qmp(result, 'return', {})
+
+        error = False
+        ready = False
+        while not ready:
+            for event in self.vm.get_qmp_events(wait=True):
+                if event['event'] == 'BLOCK_JOB_ERROR':
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp(event, 'data/operation', 'read')
+
+                    result = self.vm.qmp('query-block-jobs')
+                    self.assert_qmp(result, 'return[0]/paused', True)
+                    self.assert_qmp(result, 'return[0]/io-status', 'failed')
+
+                    result = self.vm.qmp('block-job-resume', device='drive0')
+                    self.assert_qmp(result, 'return', {})
+                    error = True
+                elif event['event'] == 'BLOCK_JOB_READY':
+                    self.assertTrue(error, 'job completed unexpectedly')
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    ready = True
+
+        result = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(result, 'return[0]/paused', False)
+        self.assert_qmp(result, 'return[0]/io-status', 'ok')
+
+        self.complete_and_wait(wait_ready=False)
+        self.assert_no_active_mirrors()
+        self.vm.shutdown()
+
+class TestWriteErrors(ImageMirroringTestCase):
+    image_len = 2 * 1024 * 1024 # MB
+
+    # this should be a multiple of twice the default granularity
+    # so that we hit this offset first in state 1
+    MIRROR_GRANULARITY = 1024 * 1024
+
+    def create_blkdebug_file(self, name, event, errno):
+        file = open(name, 'w')
+        file.write('''
+[inject-error]
+state = "1"
+event = "%s"
+errno = "%d"
+immediately = "off"
+once = "on"
+sector = "%d"
+
+[set-state]
+state = "1"
+event = "%s"
+new_state = "2"
+
+[set-state]
+state = "2"
+event = "%s"
+new_state = "1"
+''' % (event, errno, self.MIRROR_GRANULARITY / 512, event, event))
+        file.close()
+
+    def setUp(self):
+        self.blkdebug_file = target_img + ".blkdebug"
+        self.create_image(backing_img, TestWriteErrors.image_len)
+        self.create_blkdebug_file(self.blkdebug_file, "write_aio", 5)
+        qemu_img('create', '-f', iotests.imgfmt, '-obacking_file=%s' %(backing_img), test_img)
+        self.vm = iotests.VM().add_drive(test_img)
+        self.target_img = 'blkdebug:%s:%s' % (self.blkdebug_file, target_img)
+        qemu_img('create', '-f', iotests.imgfmt, '-osize=%d' %(TestWriteErrors.image_len), target_img)
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove(test_img)
+        os.remove(backing_img)
+        os.remove(self.blkdebug_file)
+
+    def test_report_write(self):
+        self.assert_no_active_mirrors()
+
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             mode='existing', target=self.target_img)
+        self.assert_qmp(result, 'return', {})
+
+        completed = False
+        error = False
+        while not completed:
+            for event in self.vm.get_qmp_events(wait=True):
+                if event['event'] == 'BLOCK_JOB_ERROR':
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp(event, 'data/operation', 'write')
+                    error = True
+                elif event['event'] == 'BLOCK_JOB_READY':
+                    self.assertTrue(False, 'job completed unexpectedly')
+                elif event['event'] == 'BLOCK_JOB_COMPLETED':
+                    self.assertTrue(error, 'job completed unexpectedly')
+                    self.assert_qmp(event, 'data/type', 'mirror')
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp(event, 'data/error', 'Input/output error')
+                    self.assert_qmp(event, 'data/len', self.image_len)
+                    completed = True
+
+        self.assert_no_active_mirrors()
+        self.vm.shutdown()
+
+    def test_ignore_write(self):
+        self.assert_no_active_mirrors()
+
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             mode='existing', target=self.target_img,
+                             on_target_error='ignore')
+        self.assert_qmp(result, 'return', {})
+
+        event = self.vm.get_qmp_event(wait=True)
+        self.assertEquals(event['event'], 'BLOCK_JOB_ERROR')
+        self.assert_qmp(event, 'data/device', 'drive0')
+        self.assert_qmp(event, 'data/operation', 'write')
+        result = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(result, 'return[0]/paused', False)
+        self.complete_and_wait()
+        self.vm.shutdown()
+
+    def test_stop_write(self):
+        self.assert_no_active_mirrors()
+
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             mode='existing', target=self.target_img,
+                             on_target_error='stop')
+        self.assert_qmp(result, 'return', {})
+
+        error = False
+        ready = False
+        while not ready:
+            for event in self.vm.get_qmp_events(wait=True):
+                if event['event'] == 'BLOCK_JOB_ERROR':
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp(event, 'data/operation', 'write')
+
+                    result = self.vm.qmp('query-block-jobs')
+                    self.assert_qmp(result, 'return[0]/paused', True)
+                    self.assert_qmp(result, 'return[0]/io-status', 'ok')
+                    self.assert_qmp(result,
+                                    'return[0]/target/info/io-status', 'failed')
+
+                    result = self.vm.qmp('block-job-resume', device='drive0')
+                    self.assert_qmp(result, 'return', {})
+
+                    result = self.vm.qmp('query-block-jobs')
+                    self.assert_qmp(result, 'return[0]/paused', False)
+                    self.assert_qmp(result, 'return[0]/io-status', 'ok')
+                    self.assert_qmp(result,
+                                    'return[0]/target/info/io-status', 'failed')
+                    error = True
+                elif event['event'] == 'BLOCK_JOB_READY':
+                    self.assertTrue(error, 'job completed unexpectedly')
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    ready = True
+
+        self.complete_and_wait(wait_ready=False)
+        self.assert_no_active_mirrors()
+        self.vm.shutdown()
+
 class TestSetSpeed(ImageMirroringTestCase):
     image_len = 80 * 1024 * 1024 # MB
 
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 3c60b2d..735c674 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -106,6 +106,10 @@ class VM(object):
 
         return self._qmp.cmd(cmd, args=qmp_args)
 
+    def get_qmp_event(self, wait=False):
+        '''Poll for one queued QMP events and return it'''
+        return self._qmp.pull_event(wait=wait)
+
     def get_qmp_events(self, wait=False):
         '''Poll for queued QMP events and return a list of dicts'''
         events = self._qmp.get_events(wait=wait)
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 36/47] host-utils: add ffsl and flsl
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (34 preceding siblings ...)
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 35/47] qemu-iotests: add testcases for mirroring on-source-error/on-target-error Paolo Bonzini
@ 2012-07-24 11:04 ` Paolo Bonzini
  2012-07-27 16:05   ` Eric Blake
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 37/47] add hierarchical bitmap data type and test cases Paolo Bonzini
                   ` (11 subsequent siblings)
  47 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

We can provide fast versions based on the other functions defined
by host-utils.h.  Some care is required on glibc, which provides
ffsl already.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 host-utils.h |   45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/host-utils.h b/host-utils.h
index 821db93..4250eb0 100644
--- a/host-utils.h
+++ b/host-utils.h
@@ -24,6 +24,7 @@
  */
 
 #include "compiler.h"   /* QEMU_GNUC_PREREQ */
+#include <string.h>     /* ffsl */
 
 #if defined(__x86_64__)
 #define __HAVE_FAST_MULU64__
@@ -234,3 +235,47 @@ static inline int ctpop64(uint64_t val)
     return val;
 #endif
 }
+
+/* glibc does not provide an inline version of ffsl, so always define
+ * ours.
+ */
+#ifdef __GLIBC__
+#define ffsl qemu_ffsl
+#endif
+static inline int ffsl(long val)
+{
+    if (!val) {
+        return 0;
+    }
+
+#if QEMU_GNUC_PREREQ(3, 4)
+    return __builtin_ctzl(val) + 1;
+#else
+    if (sizeof(long) == 4) {
+        return ctz32(val) + 1;
+    } else if (sizeof(long) == 8) {
+        return ctz64(val) + 1;
+    } else {
+        abort();
+    }
+#endif
+}
+
+static inline int flsl(long val)
+{
+    if (!val) {
+        return 0;
+    }
+
+#if QEMU_GNUC_PREREQ(3, 4)
+    return __builtin_clzl(val) + 1;
+#else
+    if (sizeof(long) == 4) {
+        return clz32(val) + 1;
+    } else if (sizeof(long) == 8) {
+        return clz64(val) + 1;
+    } else {
+        abort();
+    }
+#endif
+}
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 37/47] add hierarchical bitmap data type and test cases
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (35 preceding siblings ...)
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 36/47] host-utils: add ffsl and flsl Paolo Bonzini
@ 2012-07-24 11:04 ` Paolo Bonzini
  2012-07-28 13:26   ` Eric Blake
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 38/47] block: implement dirty bitmap using HBitmap Paolo Bonzini
                   ` (10 subsequent siblings)
  47 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

HBitmaps provide an array of bits.  The bits are stored as usual in an
array of unsigned longs, but HBitmap is also optimized to provide fast
iteration over set bits; going from one bit to the next is O(logB n)
worst case, with B = sizeof(long) * CHAR_BIT: the result is low enough
that the number of levels is in fact fixed.

In order to do this, it stacks multiple bitmaps with progressively coarser
granularity; in all levels except the last, bit N is set iff the N-th
unsigned long is nonzero in the immediately next level.  When iteration
completes on the last level it can examine the 2nd-last level to quickly
skip entire words, and even do so recursively to skip blocks of 64 words or
powers thereof (32 on 32-bit machines).

Given an index in the bitmap, it can be split in group of bits like
this (for the 64-bit case):

     bits 0-57 => word in the last bitmap     | bits 58-63 => bit in the word
     bits 0-51 => word in the 2nd-last bitmap | bits 52-57 => bit in the word
     bits 0-45 => word in the 3rd-last bitmap | bits 46-51 => bit in the word

So it is easy to move up simply by shifting the index right by
log2(BITS_PER_LONG) bits.  To move down, you shift the index left
similarly, and add the word index within the group.  Iteration uses
ffs (find first set bit) to find the next word to examine; this
operation can be done in constant time in most current architectures.

Setting or clearing a range of m bits on all levels, the work to perform
is O(m + m/W + m/W^2 + ...), which is O(m) like on a regular bitmap.

When iterating on a bitmap, each bit (on any level) is only visited
once.  Hence, the total cost of visiting a bitmap with m bits in it is
the number of bits that are set in all bitmaps.  Unless the bitmap is
extremely sparse, this is also O(m + m/W + m/W^2 + ...), so the amortized
cost of advancing from one bit to the next is usually constant.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hbitmap.c            |  394 ++++++++++++++++++++++++++++++++++++++++++++++++++
 hbitmap.h            |   51 +++++++
 tests/Makefile       |    2 +
 tests/test-hbitmap.c |  384 ++++++++++++++++++++++++++++++++++++++++++++++++
 trace-events         |    5 +
 5 files changed, 836 insertions(+)
 create mode 100644 hbitmap.c
 create mode 100644 hbitmap.h
 create mode 100644 tests/test-hbitmap.c

diff --git a/hbitmap.c b/hbitmap.c
new file mode 100644
index 0000000..cf14751
--- /dev/null
+++ b/hbitmap.c
@@ -0,0 +1,394 @@
+/*
+ * Hierarchical Bitmap Data Type
+ *
+ * Copyright Red Hat, Inc., 2012
+ *
+ * Author: Paolo Bonzini <pbonzini@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "osdep.h"
+#include "hbitmap.h"
+#include "host-utils.h"
+#include "trace.h"
+#include <string.h>
+#include <glib.h>
+#include <assert.h>
+
+/* HBitmaps provides an array of bits.  The bits are stored as usual in an
+ * array of unsigned longs, but HBitmap is also optimized to provide fast
+ * iteration over set bits; going from one bit to the next is O(logB n)
+ * worst case, with B = sizeof(long) * CHAR_BIT: the result is low enough
+ * that the number of levels is in fact fixed.
+ *
+ * In order to do this, it stacks multiple bitmaps with progressively coarser
+ * granularity; in all levels except the last, bit N is set iff the N-th
+ * unsigned long is nonzero in the immediately next level.  When iteration
+ * completes on the last level it can examine the 2nd-last level to quickly
+ * skip entire words, and even do so recursively to skip blocks of 64 words or
+ * powers thereof (32 on 32-bit machines).
+ *
+ * Given an index in the bitmap, it can be split in group of bits like
+ * this (for the 64-bit case):
+ *
+ *   bits 0-57 => word in the last bitmap     | bits 58-63 => bit in the word
+ *   bits 0-51 => word in the 2nd-last bitmap | bits 52-57 => bit in the word
+ *   bits 0-45 => word in the 3rd-last bitmap | bits 46-51 => bit in the word
+ *
+ * So it is easy to move up simply by shifting the index right by
+ * log2(BITS_PER_LONG) bits.  To move down, you shift the index left
+ * similarly, and add the word index within the group.  Iteration uses
+ * ffs (find first set bit) to find the next word to examine; this
+ * operation can be done in constant time in most current architectures.
+ *
+ * Setting or clearing a range of m bits on all levels, the work to perform
+ * is O(m + m/W + m/W^2 + ...), which is O(m) like on a regular bitmap.
+ *
+ * When iterating on a bitmap, each bit (on any level) is only visited
+ * once.  Hence, the total cost of visiting a bitmap with m bits in it is
+ * the number of bits that are set in all bitmaps.  Unless the bitmap is
+ * extremely sparse, this is also O(m + m/W + m/W^2 + ...), so the amortized
+ * cost of advancing from one bit to the next is usually constant (worst case
+ * O(logB n) as in the non-amortized complexity).
+ */
+
+struct HBitmap {
+    /* Number of total bits in the bottom level.  */
+    uint64_t size;
+
+    /* Number of set bits in the bottom level.  */
+    uint64_t count;
+
+    /* A scaling factor.  When setting or resetting bits, the bitmap will
+     * scale bit numbers right by this amount of bits.  When iterating,
+     * the bitmap will scale bit numbers left by this amonut of bits.
+     * Example of operations in a size-16, granularity-1 HBitmap:
+     *
+     *    initial state            00000000
+     *    set(start=0, count=9)    11111000 (iter: 0, 2, 4, 6, 8)
+     *    reset(start=1, count=3)  00111000 (iter: 4, 6, 8)
+     *    set(start=9, count=2)    00111100 (iter: 4, 6, 8, 10)
+     *    reset(start=5, count=5)  00000000
+     */
+    int granularity;
+
+    /* A number of progressively less coarse bitmaps (i.e. level 0 is the
+     * coarsest).  Each bit in level N represents a word in level N+1 that
+     * has a set bit, except the last level where each bit represents the
+     * actual bitmap.
+     */
+    unsigned long *levels[HBITMAP_LEVELS];
+};
+
+static int64_t hbi_next_internal(HBitmapIter *hbi)
+{
+    unsigned long cur = hbi->cur[HBITMAP_LEVELS - 1];
+    size_t pos = hbi->pos;
+
+    if (cur == 0) {
+        HBitmap *hb = hbi->hb;
+        int i = HBITMAP_LEVELS - 1;
+
+        do {
+            cur = hbi->cur[--i];
+            pos >>= BITS_PER_LEVEL;
+        } while (cur == 0);
+
+        /* Check for end of iteration.  We only use up to
+         * BITS_PER_LEVEL bits (actually less) in the level 0 bitmap,
+         * and a sentinel is placed in hbitmap_alloc that ends the
+         * above loop.
+         */
+
+        if (i == 0 && (cur & (BITS_PER_LONG - 1)) == 0) {
+            return -1;
+        }
+        for (; i < HBITMAP_LEVELS - 1; i++) {
+            /* Find least significant set bit in the word, use them
+             * to add back shifted out bits to pos.
+             */
+            pos = (pos << BITS_PER_LEVEL) + ffsl(cur) - 1;
+            hbi->cur[i] = cur & (cur - 1);
+
+            /* Set up next level for iteration.  */
+            cur = hb->levels[i + 1][pos];
+        }
+
+        hbi->pos = pos;
+        hbi->cur[HBITMAP_LEVELS - 1] = cur & (cur - 1);
+    } else {
+        hbi->cur[HBITMAP_LEVELS - 1] &= cur - 1;
+    }
+    return ((uint64_t)pos << BITS_PER_LEVEL) + ffsl(cur) - 1;
+}
+
+static inline int popcountl(unsigned long l)
+{
+    return BITS_PER_LONG == 32 ? ctpop32(l) : ctpop64(l);
+}
+
+static int hbi_count_towards(HBitmapIter *hbi, uint64_t last)
+{
+    uint64_t next = hbi_next_internal(hbi);
+    int n;
+
+    /* Take it easy with the last few bits.  */
+    if (next >= (last & -BITS_PER_LONG)) {
+        return (next > last ? 0 : 1);
+    }
+
+    /* Process one word at a time, hbi_next_internal takes
+     * care of skipping large all-zero blocks.  Sum one to
+     * account for the value that was returned by next.
+     */
+    n = popcountl(hbi->cur[HBITMAP_LEVELS - 1]) + 1;
+    hbi->cur[HBITMAP_LEVELS - 1] = 0;
+    return n;
+}
+
+int64_t hbitmap_iter_next(HBitmapIter *hbi)
+{
+    int64_t next = hbi_next_internal(hbi);
+    trace_hbitmap_iter_next(hbi->hb, hbi, next << hbi->granularity, next);
+
+    return next << hbi->granularity;
+}
+
+void hbitmap_iter_init(HBitmapIter *hbi, HBitmap *hb, uint64_t first)
+{
+    int i, bit;
+    size_t pos;
+
+    hbi->hb = hb;
+    pos = first;
+    for (i = HBITMAP_LEVELS; --i >= 0; ) {
+        bit = pos & (BITS_PER_LONG - 1);
+        pos >>= BITS_PER_LEVEL;
+
+        /* Drop bits representing items before first.  */
+        hbi->cur[i] = hb->levels[i][pos] & ~((1UL << bit) - 1);
+
+        /* We have already added level i+1, so the lowest set bit has
+         * been processed.  Clear it.
+         */
+        if (i != HBITMAP_LEVELS - 1) {
+            hbi->cur[i] &= ~(1UL << bit);
+        }
+    }
+
+    hbi->pos = first >> BITS_PER_LEVEL;
+    hbi->granularity = hb->granularity;
+}
+
+bool hbitmap_empty(HBitmap *hb)
+{
+    return hb->count == 0;
+}
+
+uint64_t hbitmap_count(HBitmap *hb)
+{
+    return hb->count << hb->granularity;
+}
+
+/* Count the number of set bits between start and end, not accounting for
+ * the granularity.
+ */
+static int hb_count_between(HBitmap *hb, uint64_t start, uint64_t end)
+{
+    HBitmapIter hbi;
+    uint64_t count = 0, more;
+
+    hbitmap_iter_init(&hbi, hb, start);
+    do {
+        more = hbi_count_towards(&hbi, end);
+        count += more;
+    } while (more > 0);
+    return count;
+}
+
+/* Setting starts at the last layer and propagates up if an element
+ * changes from zero to non-zero.
+ */
+static inline bool hb_set_elem(unsigned long *elem, uint64_t start, uint64_t end)
+{
+    unsigned long mask;
+    bool changed;
+
+    assert((end & -BITS_PER_LONG) == (start & -BITS_PER_LONG));
+
+    mask = 2UL << (end & (BITS_PER_LONG - 1));
+    mask -= 1UL << (start & (BITS_PER_LONG - 1));
+    changed = (*elem == 0);
+    *elem |= mask;
+    return changed;
+}
+
+/* The recursive workhorse (the depth is limited to HBITMAP_LEVELS)... */
+static void hb_set_between(HBitmap *hb, int level, uint64_t start, uint64_t end)
+{
+    size_t pos = start >> BITS_PER_LEVEL;
+    size_t endpos = end >> BITS_PER_LEVEL;
+    bool changed = false;
+    size_t i;
+
+    i = pos;
+    if (i < endpos) {
+        uint64_t next = (start | (BITS_PER_LONG - 1)) + 1;
+        changed |= hb_set_elem(&hb->levels[level][i], start, next - 1);
+        for (;;) {
+            start = next;
+            next += BITS_PER_LONG;
+            if (++i == endpos) {
+                break;
+            }
+            changed |= (hb->levels[level][i] == 0);
+            hb->levels[level][i] = ~0UL;
+        }
+    }
+    changed |= hb_set_elem(&hb->levels[level][i], start, end);
+
+    /* If there was any change in this layer, we may have to update
+     * the one above.
+     */
+    if (level > 0 && changed) {
+        return hb_set_between(hb, level - 1, pos, endpos);
+    }
+}
+
+void hbitmap_set(HBitmap *hb, uint64_t start, uint64_t count)
+{
+    /* Compute range in the last layer.  */
+    uint64_t last = start + count - 1;
+
+    trace_hbitmap_set(hb, start, count,
+                      start >> hb->granularity, last >> hb->granularity);
+
+    start >>= hb->granularity;
+    last >>= hb->granularity;
+    count = last - start + 1;
+
+    hb->count += count - hb_count_between(hb, start, last);
+    hb_set_between(hb, HBITMAP_LEVELS - 1, start, last);
+}
+
+/* Resetting works the other way round: propagate up if the new
+ * value is zero.
+ */
+static inline bool hb_reset_elem(unsigned long *elem, uint64_t start, uint64_t end)
+{
+    unsigned long mask;
+    bool blanked;
+
+    assert((end & -BITS_PER_LONG) == (start & -BITS_PER_LONG));
+
+    mask = 2UL << (end & (BITS_PER_LONG - 1));
+    mask -= 1UL << (start & (BITS_PER_LONG - 1));
+    blanked = *elem != 0 && ((*elem & ~mask) == 0);
+    *elem &= ~mask;
+    return blanked;
+}
+
+/* The recursive workhorse (the depth is limited to HBITMAP_LEVELS)... */
+static void hb_reset_between(HBitmap *hb, int level, uint64_t start, uint64_t end)
+{
+    size_t pos = start >> BITS_PER_LEVEL;
+    size_t endpos = end >> BITS_PER_LEVEL;
+    bool changed = false;
+    size_t i;
+
+    i = pos;
+    if (i < endpos) {
+        uint64_t next = (start | (BITS_PER_LONG - 1)) + 1;
+
+        /* Here we need a more complex test than when setting bits.  Even if
+         * something was changed, we must not blank bits in the upper level
+         * unless the lower-level word became entirely zero.  So, remove pos
+         * from the upper-level range if bits remain set.
+         */
+        if (hb_reset_elem(&hb->levels[level][i], start, next - 1)) {
+            changed = true;
+        } else {
+            pos++;
+        }
+
+        for (;;) {
+            start = next;
+            next += BITS_PER_LONG;
+            if (++i == endpos) {
+                break;
+            }
+            changed |= (hb->levels[level][i] != 0);
+            hb->levels[level][i] = 0UL;
+        }
+    }
+
+    /* Same as above, this time for endpos.  */
+    if (hb_reset_elem(&hb->levels[level][i], start, end)) {
+        changed = true;
+    } else {
+        endpos--;
+    }
+
+    if (level > 0 && changed) {
+        return hb_reset_between(hb, level - 1, pos, endpos);
+    }
+}
+
+void hbitmap_reset(HBitmap *hb, uint64_t start, uint64_t count)
+{
+    /* Compute range in the last layer.  */
+    uint64_t last = start + count - 1;
+
+    trace_hbitmap_reset(hb, start, count,
+                        start >> hb->granularity, last >> hb->granularity);
+
+    start >>= hb->granularity;
+    last >>= hb->granularity;
+
+    hb->count -= hb_count_between(hb, start, last);
+    hb_reset_between(hb, HBITMAP_LEVELS - 1, start, last);
+}
+
+bool hbitmap_get(HBitmap *hb, uint64_t item)
+{
+    /* Compute position and bit in the last layer.  */
+    uint64_t pos = item >> hb->granularity;
+    unsigned long bit = 1UL << (pos & (BITS_PER_LONG - 1));
+
+    return (hb->levels[HBITMAP_LEVELS - 1][pos >> BITS_PER_LEVEL] & bit) != 0;
+}
+
+void hbitmap_free(HBitmap *hb)
+{
+    int i;
+    for (i = HBITMAP_LEVELS; --i >= 0; ) {
+        g_free(hb->levels[i]);
+    }
+    g_free(hb);
+}
+
+HBitmap *hbitmap_alloc(uint64_t size, int granularity)
+{
+    HBitmap *hb = g_malloc0(sizeof(struct HBitmap));
+    int i;
+
+    assert(granularity >= 0 && granularity < 64);
+    size = (size + (1ULL << granularity) - 1) >> granularity;
+    assert(size <= ((uint64_t)1 << HBITMAP_LOG_MAX_SIZE));
+
+    hb->size = size;
+    hb->granularity = granularity;
+    for (i = HBITMAP_LEVELS; --i >= 0; ) {
+        size = MAX((size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL, 1);
+        hb->levels[i] = g_malloc0(size * sizeof(unsigned long));
+    }
+
+    /* Add a sentinel in the level 0 bitmap.  We only use up to
+     * BITS_PER_LEVEL bits in level 0, so it's safe.
+     */
+    assert(size == 1);
+    hb->levels[0][0] |= 1UL << (BITS_PER_LONG - 1);
+    return hb;
+}
diff --git a/hbitmap.h b/hbitmap.h
new file mode 100644
index 0000000..2b717b0
--- /dev/null
+++ b/hbitmap.h
@@ -0,0 +1,51 @@
+/*
+ * Hierarchical Bitmap Data Type
+ *
+ * Copyright Red Hat, Inc., 2012
+ *
+ * Author: Paolo Bonzini <pbonzini@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#ifndef HBITMAP_H
+#define HBITMAP_H 1
+
+#include <limits.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include "bitops.h"
+
+typedef struct HBitmap HBitmap;
+typedef struct HBitmapIter HBitmapIter;
+
+#define BITS_PER_LEVEL         (BITS_PER_LONG == 32 ? 5 : 6)
+
+/* For 32-bit, the largest that fits in a 4 GiB address space.
+ * For 64-bit, the number of sectors in 1 PiB.  Good luck, in
+ * either case... :)
+ */
+#define HBITMAP_LOG_MAX_SIZE   (BITS_PER_LONG == 32 ? 34 : 41)
+
+/* Leave an extra bit for a sentinel.  */
+#define HBITMAP_LEVELS         ((HBITMAP_LOG_MAX_SIZE / BITS_PER_LEVEL) + 1)
+
+struct HBitmapIter {
+    HBitmap *hb;
+    size_t pos;
+    int granularity;
+    unsigned long cur[HBITMAP_LEVELS];
+};
+
+int64_t hbitmap_iter_next(HBitmapIter *hbi);
+void hbitmap_iter_init(HBitmapIter *hbi, HBitmap *hb, uint64_t first);
+bool hbitmap_empty(HBitmap *hb);
+uint64_t hbitmap_count(HBitmap *hb);
+void hbitmap_set(HBitmap *hb, uint64_t start, uint64_t count);
+void hbitmap_reset(HBitmap *hb, uint64_t start, uint64_t count);
+bool hbitmap_get(HBitmap *hb, uint64_t item);
+void hbitmap_free(HBitmap *hb);
+HBitmap *hbitmap_alloc(uint64_t size, int granularity);
+
+#endif
diff --git a/tests/Makefile b/tests/Makefile
index 9675ba7..7e3bfed 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -15,6 +15,7 @@ check-unit-y += tests/test-string-output-visitor$(EXESUF)
 check-unit-y += tests/test-coroutine$(EXESUF)
 check-unit-y += tests/test-visitor-serialization$(EXESUF)
 check-unit-y += tests/test-iov$(EXESUF)
+check-unit-y += tests/test-hbitmap$(EXESUF)
 
 check-block-$(CONFIG_POSIX) += tests/qemu-iotests-quick.sh
 
@@ -50,6 +51,7 @@ tests/check-qfloat$(EXESUF): tests/check-qfloat.o qfloat.o $(tools-obj-y)
 tests/check-qjson$(EXESUF): tests/check-qjson.o $(qobject-obj-y) $(tools-obj-y)
 tests/test-coroutine$(EXESUF): tests/test-coroutine.o $(coroutine-obj-y) $(tools-obj-y)
 tests/test-iov$(EXESUF): tests/test-iov.o iov.o
+tests/test-hbitmap$(EXESUF): tests/test-hbitmap.o hbitmap.o $(trace-obj-y)
 
 tests/test-qapi-types.c tests/test-qapi-types.h :\
 $(SRC_PATH)/qapi-schema-test.json $(SRC_PATH)/scripts/qapi-types.py
diff --git a/tests/test-hbitmap.c b/tests/test-hbitmap.c
new file mode 100644
index 0000000..8a9b497
--- /dev/null
+++ b/tests/test-hbitmap.c
@@ -0,0 +1,384 @@
+/*
+ * Hierarchical bitmap unit-tests.
+ *
+ * Copyright (C) 2012 Red Hat Inc.
+ *
+ * Author: Paolo Bonzini <pbonzini@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include <glib.h>
+#include <stdarg.h>
+#include "hbitmap.h"
+
+#define LOG_BITS_PER_LONG          (BITS_PER_LONG == 32 ? 5 : 6)
+
+#define L1                         BITS_PER_LONG
+#define L2                         (BITS_PER_LONG * L1)
+#define L3                         (BITS_PER_LONG * L2)
+
+typedef struct TestHBitmapData {
+    HBitmap       *hb;
+    unsigned long *bits;
+    size_t         size;
+    int            granularity;
+} TestHBitmapData;
+
+
+/* Check that the HBitmap and the shadow bitmap contain the same data,
+ * ignoring the same "first" bits.
+ */
+static void hbitmap_test_check(TestHBitmapData *data,
+                               uint64_t first)
+{
+    uint64_t count = 0;
+    size_t pos;
+    int bit;
+    HBitmapIter hbi;
+    int64_t next;
+
+    hbitmap_iter_init(&hbi, data->hb, first);
+
+    for (;;) {
+        next = hbitmap_iter_next(&hbi);
+        if (next < 0) {
+            next = data->size;
+        }
+
+        while (first < next) {
+            pos = first >> LOG_BITS_PER_LONG;
+            bit = first & (BITS_PER_LONG - 1);
+            first++;
+            g_assert_cmpint(data->bits[pos] & (1UL << bit), ==, 0);
+        }
+
+        if (next == data->size) {
+            break;
+        }
+
+        pos = first >> LOG_BITS_PER_LONG;
+        bit = first & (BITS_PER_LONG - 1);
+        first++;
+        count++;
+        g_assert_cmpint(data->bits[pos] & (1UL << bit), !=, 0);
+    }
+
+    if (first == 0) {
+        g_assert_cmpint(count * data->granularity, ==, hbitmap_count(data->hb));
+    }
+}
+
+/* This is provided instead of a test setup function so that the sizes
+   are kept in the test functions (and not in main()) */
+static void hbitmap_test_init(TestHBitmapData *data,
+                              uint64_t size, int granularity)
+{
+    size_t n;
+    data->hb = hbitmap_alloc(size, granularity);
+
+    n = (size + BITS_PER_LONG - 1) / BITS_PER_LONG;
+    if (n == 0) {
+        n = 1;
+    }
+    data->bits = g_new0(unsigned long, n);
+    data->size = size;
+    data->granularity = granularity;
+    hbitmap_test_check(data, 0);
+}
+
+static void hbitmap_test_teardown(TestHBitmapData *data,
+                                  const void *unused)
+{
+    if (data->hb) {
+        hbitmap_free(data->hb);
+        data->hb = NULL;
+    }
+    if (data->bits) {
+        g_free(data->bits);
+        data->bits = NULL;
+    }
+}
+
+/* Set a range in the HBitmap and in the shadow "simple" bitmap.
+ * The two bitmaps are then tested against each other.
+ */
+static void hbitmap_test_set(TestHBitmapData *data,
+                             uint64_t first, uint64_t count)
+{
+    hbitmap_set(data->hb, first, count);
+    while (count-- != 0) {
+        size_t pos = first >> LOG_BITS_PER_LONG;
+        int bit = first & (BITS_PER_LONG - 1);
+        first++;
+
+        data->bits[pos] |= 1UL << bit;
+    }
+
+    if (data->granularity == 0) {
+        hbitmap_test_check(data, 0);
+    }
+}
+
+/* Reset a range in the HBitmap and in the shadow "simple" bitmap.
+ */
+static void hbitmap_test_reset(TestHBitmapData *data,
+                               uint64_t first, uint64_t count)
+{
+    hbitmap_reset(data->hb, first, count);
+    while (count-- != 0) {
+        size_t pos = first >> LOG_BITS_PER_LONG;
+        int bit = first & (BITS_PER_LONG - 1);
+        first++;
+
+        data->bits[pos] &= ~(1UL << bit);
+    }
+
+    if (data->granularity == 0) {
+        hbitmap_test_check(data, 0);
+    }
+}
+
+static void hbitmap_test_check_get(TestHBitmapData *data)
+{
+    uint64_t count = 0;
+    uint64_t i;
+
+    for (i = 0; i < data->size; i++) {
+        size_t pos = i >> LOG_BITS_PER_LONG;
+        int bit = i & (BITS_PER_LONG - 1);
+        unsigned long val = data->bits[pos] & (1UL << bit);
+        count += hbitmap_get(data->hb, i);
+        g_assert_cmpint(hbitmap_get(data->hb, i), ==, val != 0);
+    }
+    g_assert_cmpint(count, ==, hbitmap_count(data->hb));
+}
+
+static void test_hbitmap_zero(TestHBitmapData *data,
+                               const void *unused)
+{
+    hbitmap_test_init(data, 0, 0);
+}
+
+static void test_hbitmap_unaligned(TestHBitmapData *data,
+                                   const void *unused)
+{
+    hbitmap_test_init(data, L3 + 23, 0);
+    hbitmap_test_set(data, 0, 1);
+    hbitmap_test_set(data, L3 + 22, 1);
+}
+
+static void test_hbitmap_iter_empty(TestHBitmapData *data,
+                                    const void *unused)
+{
+    hbitmap_test_init(data, L1, 0);
+}
+
+static void test_hbitmap_iter_partial(TestHBitmapData *data,
+                                      const void *unused)
+{
+    hbitmap_test_init(data, L3, 0);
+    hbitmap_test_set(data, 0, L3);
+    hbitmap_test_check(data, 1);
+    hbitmap_test_check(data, L1 - 1);
+    hbitmap_test_check(data, L1);
+    hbitmap_test_check(data, L1 * 2 - 1);
+    hbitmap_test_check(data, L2 - 1);
+    hbitmap_test_check(data, L2);
+    hbitmap_test_check(data, L2 + 1);
+    hbitmap_test_check(data, L2 + L1);
+    hbitmap_test_check(data, L2 + L1 * 2 - 1);
+    hbitmap_test_check(data, L2 * 2 - 1);
+    hbitmap_test_check(data, L2 * 2);
+    hbitmap_test_check(data, L2 * 2 + 1);
+    hbitmap_test_check(data, L2 * 2 + L1);
+    hbitmap_test_check(data, L2 * 2 + L1 * 2 - 1);
+    hbitmap_test_check(data, L3 / 2);
+}
+
+static void test_hbitmap_iter_past(TestHBitmapData *data,
+                                    const void *unused)
+{
+    hbitmap_test_init(data, L3, 0);
+    hbitmap_test_set(data, 0, L3);
+    hbitmap_test_check(data, L3);
+}
+
+static void test_hbitmap_set_all(TestHBitmapData *data,
+                                 const void *unused)
+{
+    hbitmap_test_init(data, L3, 0);
+    hbitmap_test_set(data, 0, L3);
+}
+
+static void test_hbitmap_get_all(TestHBitmapData *data,
+                                 const void *unused)
+{
+    hbitmap_test_init(data, L3, 0);
+    hbitmap_test_set(data, 0, L3);
+    hbitmap_test_check_get(data);
+}
+
+static void test_hbitmap_get_some(TestHBitmapData *data,
+                                  const void *unused)
+{
+    hbitmap_test_init(data, 2 * L2, 0);
+    hbitmap_test_set(data, 10, 1);
+    hbitmap_test_check_get(data);
+    hbitmap_test_set(data, L1 - 1, 1);
+    hbitmap_test_check_get(data);
+    hbitmap_test_set(data, L1, 1);
+    hbitmap_test_check_get(data);
+    hbitmap_test_set(data, L2 - 1, 1);
+    hbitmap_test_check_get(data);
+    hbitmap_test_set(data, L2, 1);
+    hbitmap_test_check_get(data);
+}
+
+static void test_hbitmap_set_one(TestHBitmapData *data,
+                                 const void *unused)
+{
+    hbitmap_test_init(data, 2 * L2, 0);
+    hbitmap_test_set(data, 10, 1);
+    hbitmap_test_set(data, L1 - 1, 1);
+    hbitmap_test_set(data, L1, 1);
+    hbitmap_test_set(data, L2 - 1, 1);
+    hbitmap_test_set(data, L2, 1);
+}
+
+static void test_hbitmap_set_two_elem(TestHBitmapData *data,
+                                      const void *unused)
+{
+    hbitmap_test_init(data, 2 * L2, 0);
+    hbitmap_test_set(data, L1 - 1, 2);
+    hbitmap_test_set(data, L1 * 2 - 1, 4);
+    hbitmap_test_set(data, L1 * 4, L1 + 1);
+    hbitmap_test_set(data, L1 * 8 - 1, L1 + 1);
+    hbitmap_test_set(data, L2 - 1, 2);
+    hbitmap_test_set(data, L2 + L1 - 1, 8);
+    hbitmap_test_set(data, L2 + L1 * 4, L1 + 1);
+    hbitmap_test_set(data, L2 + L1 * 8 - 1, L1 + 1);
+}
+
+static void test_hbitmap_set(TestHBitmapData *data,
+                             const void *unused)
+{
+    hbitmap_test_init(data, L3 * 2, 0);
+    hbitmap_test_set(data, L1 - 1, L1 + 2);
+    hbitmap_test_set(data, L1 * 3 - 1, L1 + 2);
+    hbitmap_test_set(data, L1 * 5, L1 * 2 + 1);
+    hbitmap_test_set(data, L1 * 8 - 1, L1 * 2 + 1);
+    hbitmap_test_set(data, L2 - 1, L1 + 2);
+    hbitmap_test_set(data, L2 + L1 * 2 - 1, L1 + 2);
+    hbitmap_test_set(data, L2 + L1 * 4, L1 * 2 + 1);
+    hbitmap_test_set(data, L2 + L1 * 7 - 1, L1 * 2 + 1);
+    hbitmap_test_set(data, L2 * 2 - 1, L3 * 2 - L2 * 2);
+}
+
+static void test_hbitmap_set_overlap(TestHBitmapData *data,
+                                     const void *unused)
+{
+    hbitmap_test_init(data, L3 * 2, 0);
+    hbitmap_test_set(data, L1 - 1, L1 + 2);
+    hbitmap_test_set(data, L1 * 2 - 1, L1 * 2 + 2);
+    hbitmap_test_set(data, 0, L1 * 3);
+    hbitmap_test_set(data, L1 * 8 - 1, L2);
+    hbitmap_test_set(data, L2, L1);
+    hbitmap_test_set(data, L2 - L1 - 1, L1 * 8 + 2);
+    hbitmap_test_set(data, L2, L3 - L2 + 1);
+    hbitmap_test_set(data, L3 - L1, L1 * 3);
+    hbitmap_test_set(data, L3 - 1, 3);
+    hbitmap_test_set(data, L3 - 1, L2);
+}
+
+static void test_hbitmap_reset_empty(TestHBitmapData *data,
+                                     const void *unused)
+{
+    hbitmap_test_init(data, L3, 0);
+    hbitmap_test_reset(data, 0, L3);
+}
+
+static void test_hbitmap_reset(TestHBitmapData *data,
+                               const void *unused)
+{
+    hbitmap_test_init(data, L3 * 2, 0);
+    hbitmap_test_set(data, L1 - 1, L1 + 2);
+    hbitmap_test_reset(data, L1 * 2 - 1, L1 * 2 + 2);
+    hbitmap_test_set(data, 0, L1 * 3);
+    hbitmap_test_reset(data, L1 * 8 - 1, L2);
+    hbitmap_test_set(data, L2, L1);
+    hbitmap_test_reset(data, L2 - L1 - 1, L1 * 8 + 2);
+    hbitmap_test_set(data, L2, L3 - L2 + 1);
+    hbitmap_test_reset(data, L3 - L1, L1 * 3);
+    hbitmap_test_set(data, L3 - 1, 3);
+    hbitmap_test_reset(data, L3 - 1, L2);
+    hbitmap_test_set(data, 0, L3 * 2);
+    hbitmap_test_reset(data, 0, L1);
+    hbitmap_test_reset(data, 0, L2);
+    hbitmap_test_reset(data, L3, L3);
+    hbitmap_test_set(data, L3 / 2, L3);
+}
+
+static void test_hbitmap_granularity(TestHBitmapData *data,
+                                     const void *unused)
+{
+    /* Note that hbitmap_test_check has to be invoked manually in this test.  */
+    hbitmap_test_init(data, L1, 1);
+    hbitmap_test_set(data, 0, 1);
+    hbitmap_test_check(data, 0);
+    hbitmap_test_set(data, 2, 1);
+    hbitmap_test_check(data, 0);
+    hbitmap_test_set(data, 0, 3);
+    g_assert_cmpint(hbitmap_count(data->hb), ==, 4);
+    hbitmap_test_reset(data, 0, 1);
+    g_assert_cmpint(hbitmap_count(data->hb), ==, 2);
+}
+
+static void test_hbitmap_iter_granularity(TestHBitmapData *data,
+                                          const void *unused)
+{
+    HBitmapIter hbi;
+
+    /* Note that hbitmap_test_check has to be invoked manually in this test.  */
+    hbitmap_test_init(data, 131072 << 7, 7);
+    hbitmap_iter_init(&hbi, data->hb, 0);
+    g_assert_cmpint(hbitmap_iter_next(&hbi), <, 0);
+    hbitmap_test_set(data, ((L2 + L1 + 1) << 7) + 8, 8);
+    hbitmap_iter_init(&hbi, data->hb, 0);
+    g_assert_cmpint(hbitmap_iter_next(&hbi), ==, (L2 + L1 + 1) << 7);
+}
+
+static void hbitmap_test_add(const char *testpath,
+                                   TestHBitmapData *data,
+                                   void (*test_func)(TestHBitmapData *data, const void *user_data))
+{
+    g_test_add(testpath, TestHBitmapData, data, NULL, test_func,
+               hbitmap_test_teardown);
+}
+
+int main(int argc, char **argv)
+{
+    TestHBitmapData hbitmap_data;
+
+    g_test_init(&argc, &argv, NULL);
+    hbitmap_test_add("/hbitmap/size/0", &hbitmap_data, test_hbitmap_zero);
+    hbitmap_test_add("/hbitmap/size/unaligned", &hbitmap_data, test_hbitmap_unaligned);
+    hbitmap_test_add("/hbitmap/iter/empty", &hbitmap_data, test_hbitmap_iter_empty);
+    hbitmap_test_add("/hbitmap/iter/past", &hbitmap_data, test_hbitmap_iter_past);
+    hbitmap_test_add("/hbitmap/iter/partial", &hbitmap_data, test_hbitmap_iter_partial);
+    hbitmap_test_add("/hbitmap/iter/granularity", &hbitmap_data, test_hbitmap_iter_granularity);
+    hbitmap_test_add("/hbitmap/get/all", &hbitmap_data, test_hbitmap_get_all);
+    hbitmap_test_add("/hbitmap/get/some", &hbitmap_data, test_hbitmap_get_some);
+    hbitmap_test_add("/hbitmap/set/all", &hbitmap_data, test_hbitmap_set_all);
+    hbitmap_test_add("/hbitmap/set/one", &hbitmap_data, test_hbitmap_set_one);
+    hbitmap_test_add("/hbitmap/set/two-elem", &hbitmap_data, test_hbitmap_set_two_elem);
+    hbitmap_test_add("/hbitmap/set/general", &hbitmap_data, test_hbitmap_set);
+    hbitmap_test_add("/hbitmap/set/overlap", &hbitmap_data, test_hbitmap_set_overlap);
+    hbitmap_test_add("/hbitmap/reset/empty", &hbitmap_data, test_hbitmap_reset_empty);
+    hbitmap_test_add("/hbitmap/reset/general", &hbitmap_data, test_hbitmap_reset);
+    hbitmap_test_add("/hbitmap/granularity", &hbitmap_data, test_hbitmap_granularity);
+    g_test_run();
+
+    return 0;
+}
diff --git a/trace-events b/trace-events
index ac58f3a..9313ae7 100644
--- a/trace-events
+++ b/trace-events
@@ -977,3 +977,8 @@ qxl_render_blit_guest_primary_initialized(void) ""
 qxl_render_blit(int32_t stride, int32_t left, int32_t right, int32_t top, int32_t bottom) "stride=%d [%d, %d, %d, %d]"
 qxl_render_guest_primary_resized(int32_t width, int32_t height, int32_t stride, int32_t bytes_pp, int32_t bits_pp) "%dx%d, stride %d, bpp %d, depth %d"
 qxl_render_update_area_done(void *cookie) "%p"
+
+# hbitmap.c
+hbitmap_iter_next(void *hb, void *hbi, int64_t item, int64_t bit) "hb %p hbi %p item %"PRId64" bit %"PRId64
+hbitmap_reset(void *hb, uint64_t start, uint64_t count, uint64_t sbit, uint64_t ebit) "hb %p items %"PRIu64",%"PRIu64" bits %"PRIu64"..%"PRIu64
+hbitmap_set(void *hb, uint64_t start, uint64_t count, uint64_t sbit, uint64_t ebit) "hb %p items %"PRIu64",%"PRIu64" bits %"PRIu64"..%"PRIu64
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 38/47] block: implement dirty bitmap using HBitmap
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (36 preceding siblings ...)
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 37/47] add hierarchical bitmap data type and test cases Paolo Bonzini
@ 2012-07-24 11:04 ` Paolo Bonzini
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 39/47] block: make round_to_clusters public Paolo Bonzini
                   ` (9 subsequent siblings)
  47 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

This starts using HBitmap for the dirty bitmap in the block layer,
and converts mirroring to use an HBitmapIter.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Makefile.objs  |    2 +-
 block.c        |   94 +++++++++-----------------------------------------------
 block.h        |    6 ++--
 block/mirror.c |   12 ++++++--
 block_int.h    |    4 +--
 trace-events   |    1 +
 6 files changed, 33 insertions(+), 86 deletions(-)

diff --git a/Makefile.objs b/Makefile.objs
index 67e9d8d..fa7f6d5 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -43,7 +43,7 @@ coroutine-obj-$(CONFIG_WIN32) += coroutine-win32.o
 
 block-obj-y = cutils.o iov.o cache-utils.o qemu-option.o module.o async.o
 block-obj-y += nbd.o block.o blockjob.o aio.o aes.o qemu-config.o
-block-obj-y += qemu-progress.o qemu-sockets.o
+block-obj-y += qemu-progress.o qemu-sockets.o hbitmap.o
 block-obj-y += $(coroutine-obj-y) $(qobject-obj-y) $(version-obj-y)
 block-obj-$(CONFIG_POSIX) += posix-aio-compat.o
 block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
diff --git a/block.c b/block.c
index 65d9bed..54d0cec 100644
--- a/block.c
+++ b/block.c
@@ -1026,7 +1026,6 @@ static void bdrv_move_feature_fields(BlockDriverState *bs_dest,
     bs_dest->iostatus           = bs_src->iostatus;
 
     /* dirty bitmap */
-    bs_dest->dirty_count        = bs_src->dirty_count;
     bs_dest->dirty_bitmap       = bs_src->dirty_bitmap;
 
     /* job */
@@ -1669,36 +1668,6 @@ int bdrv_read_unthrottled(BlockDriverState *bs, int64_t sector_num,
     return ret;
 }
 
-#define BITS_PER_LONG  (sizeof(unsigned long) * 8)
-
-static void set_dirty_bitmap(BlockDriverState *bs, int64_t sector_num,
-                             int nb_sectors, int dirty)
-{
-    int64_t start, end;
-    unsigned long val, idx, bit;
-
-    start = sector_num / BDRV_SECTORS_PER_DIRTY_CHUNK;
-    end = (sector_num + nb_sectors - 1) / BDRV_SECTORS_PER_DIRTY_CHUNK;
-
-    for (; start <= end; start++) {
-        idx = start / BITS_PER_LONG;
-        bit = start % BITS_PER_LONG;
-        val = bs->dirty_bitmap[idx];
-        if (dirty) {
-            if (!(val & (1UL << bit))) {
-                bs->dirty_count++;
-                val |= 1UL << bit;
-            }
-        } else {
-            if (val & (1UL << bit)) {
-                bs->dirty_count--;
-                val &= ~(1UL << bit);
-            }
-        }
-        bs->dirty_bitmap[idx] = val;
-    }
-}
-
 /* Return < 0 if error. Important errors are:
   -EIO         generic I/O error (may happen for all errors)
   -ENOMEDIUM   No media inserted.
@@ -3813,18 +3782,15 @@ void bdrv_set_dirty_tracking(BlockDriverState *bs, int enable)
 {
     int64_t bitmap_size;
 
-    bs->dirty_count = 0;
     if (enable) {
         if (!bs->dirty_bitmap) {
-            bitmap_size = (bdrv_getlength(bs) >> BDRV_SECTOR_BITS) +
-                    BDRV_SECTORS_PER_DIRTY_CHUNK * BITS_PER_LONG - 1;
-            bitmap_size /= BDRV_SECTORS_PER_DIRTY_CHUNK * BITS_PER_LONG;
-
-            bs->dirty_bitmap = g_new0(unsigned long, bitmap_size);
+            bitmap_size = (bdrv_getlength(bs) >> BDRV_SECTOR_BITS);
+            bs->dirty_bitmap = hbitmap_alloc(bitmap_size,
+                                             BDRV_LOG_SECTORS_PER_DIRTY_CHUNK);
         }
     } else {
         if (bs->dirty_bitmap) {
-            g_free(bs->dirty_bitmap);
+            hbitmap_free(bs->dirty_bitmap);
             bs->dirty_bitmap = NULL;
         }
     }
@@ -3832,67 +3798,37 @@ void bdrv_set_dirty_tracking(BlockDriverState *bs, int enable)
 
 int bdrv_get_dirty(BlockDriverState *bs, int64_t sector)
 {
-    int64_t chunk = sector / (int64_t)BDRV_SECTORS_PER_DIRTY_CHUNK;
-
-    if (bs->dirty_bitmap &&
-        (sector << BDRV_SECTOR_BITS) < bdrv_getlength(bs)) {
-        return !!(bs->dirty_bitmap[chunk / BITS_PER_LONG] &
-            (1UL << (chunk % BITS_PER_LONG)));
+    if (bs->dirty_bitmap) {
+        return hbitmap_get(bs->dirty_bitmap, sector);
     } else {
         return 0;
     }
 }
 
-int64_t bdrv_get_next_dirty(BlockDriverState *bs, int64_t sector)
+void bdrv_dirty_iter_init(BlockDriverState *bs, HBitmapIter *hbi)
 {
-    int64_t chunk;
-    int bit, elem;
-
-    /* Avoid an infinite loop.  */
-    assert(bs->dirty_count > 0);
-
-    sector = (sector | (BDRV_SECTORS_PER_DIRTY_CHUNK - 1)) + 1;
-    chunk = sector / (int64_t)BDRV_SECTORS_PER_DIRTY_CHUNK;
-
-    QEMU_BUILD_BUG_ON(sizeof(bs->dirty_bitmap[0]) * 8 != BITS_PER_LONG);
-    elem = chunk / BITS_PER_LONG;
-    bit = chunk % BITS_PER_LONG;
-    for (;;) {
-        if (sector >= bs->total_sectors) {
-            sector = 0;
-            bit = elem = 0;
-        }
-        if (bit == 0 && bs->dirty_bitmap[elem] == 0) {
-            sector += BDRV_SECTORS_PER_DIRTY_CHUNK * BITS_PER_LONG;
-            elem++;
-        } else {
-            if (bs->dirty_bitmap[elem] & (1UL << bit)) {
-                return sector;
-            }
-            sector += BDRV_SECTORS_PER_DIRTY_CHUNK;
-            if (++bit == BITS_PER_LONG) {
-                bit = 0;
-                elem++;
-            }
-        }
-    }
+    hbitmap_iter_init(hbi, bs->dirty_bitmap, 0);
 }
 
 void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
                     int nr_sectors)
 {
-    set_dirty_bitmap(bs, cur_sector, nr_sectors, 1);
+    hbitmap_set(bs->dirty_bitmap, cur_sector, nr_sectors);
 }
 
 void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector,
                       int nr_sectors)
 {
-    set_dirty_bitmap(bs, cur_sector, nr_sectors, 0);
+    hbitmap_reset(bs->dirty_bitmap, cur_sector, nr_sectors);
 }
 
 int64_t bdrv_get_dirty_count(BlockDriverState *bs)
 {
-    return bs->dirty_count;
+    if (bs->dirty_bitmap) {
+        return hbitmap_count(bs->dirty_bitmap) >> BDRV_LOG_SECTORS_PER_DIRTY_CHUNK;
+    } else {
+        return 0;
+    }
 }
 
 void bdrv_set_in_use(BlockDriverState *bs, int in_use)
diff --git a/block.h b/block.h
index e7440f6..8212a0f 100644
--- a/block.h
+++ b/block.h
@@ -324,13 +324,15 @@ int bdrv_img_create(const char *filename, const char *fmt,
 void bdrv_set_buffer_alignment(BlockDriverState *bs, int align);
 void *qemu_blockalign(BlockDriverState *bs, size_t size);
 
-#define BDRV_SECTORS_PER_DIRTY_CHUNK 2048
+#define BDRV_SECTORS_PER_DIRTY_CHUNK     (1 << BDRV_LOG_SECTORS_PER_DIRTY_CHUNK)
+#define BDRV_LOG_SECTORS_PER_DIRTY_CHUNK 11
 
+struct HBitmapIter;
 void bdrv_set_dirty_tracking(BlockDriverState *bs, int enable);
 int bdrv_get_dirty(BlockDriverState *bs, int64_t sector);
 void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector, int nr_sectors);
 void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector, int nr_sectors);
-int64_t bdrv_get_next_dirty(BlockDriverState *bs, int64_t sector);
+void bdrv_dirty_iter_init(BlockDriverState *bs, struct HBitmapIter *hbi);
 int64_t bdrv_get_dirty_count(BlockDriverState *bs);
 
 void bdrv_enable_copy_on_read(BlockDriverState *bs);
diff --git a/block/mirror.c b/block/mirror.c
index fb54d27..c3340d1 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -36,6 +36,7 @@ typedef struct MirrorBlockJob {
     bool synced;
     bool complete;
     int64_t sector_num;
+    HBitmapIter hbi;
     uint8_t *buf;
 } MirrorBlockJob;
 
@@ -49,8 +50,15 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob *s,
     int64_t end;
     struct iovec iov;
 
+    s->sector_num = hbitmap_iter_next(&s->hbi);
+    if (s->sector_num < 0) {
+        bdrv_dirty_iter_init(source, &s->hbi);
+        s->sector_num = hbitmap_iter_next(&s->hbi);
+        trace_mirror_restart_iter(s, bdrv_get_dirty_count(source));
+        assert(s->sector_num >= 0);
+    }
+
     end = s->common.len >> BDRV_SECTOR_BITS;
-    s->sector_num = bdrv_get_next_dirty(source, s->sector_num);
     nb_sectors = MIN(BDRV_SECTORS_PER_DIRTY_CHUNK, end - s->sector_num);
     bdrv_reset_dirty(source, s->sector_num, nb_sectors);
 
@@ -127,7 +135,7 @@ static void coroutine_fn mirror_run(void *opaque)
         goto immediate_exit;
     }
 
-    s->sector_num = -1;
+    bdrv_dirty_iter_init(bs, &s->hbi);
     for (;;) {
         uint64_t delay_ns;
         int64_t cnt;
diff --git a/block_int.h b/block_int.h
index 0de2fdc..4cb3c5b 100644
--- a/block_int.h
+++ b/block_int.h
@@ -30,6 +30,7 @@
 #include "qemu-coroutine.h"
 #include "qemu-timer.h"
 #include "qapi-types.h"
+#include "hbitmap.h"
 #include "monitor.h"
 
 #define BLOCK_FLAG_ENCRYPT	1
@@ -258,8 +259,7 @@ struct BlockDriverState {
     bool iostatus_enabled;
     BlockDeviceIoStatus iostatus;
     char device_name[32];
-    unsigned long *dirty_bitmap;
-    int64_t dirty_count;
+    HBitmap *dirty_bitmap;
     int in_use; /* users other than guest access, eg. block migration */
     QTAILQ_ENTRY(BlockDriverState) list;
 
diff --git a/trace-events b/trace-events
index 9313ae7..496824c 100644
--- a/trace-events
+++ b/trace-events
@@ -77,6 +77,7 @@ stream_start(void *bs, void *base, void *s, void *co, void *opaque) "bs %p base
 
 # block/mirror.c
 mirror_start(void *bs, void *s, void *co, void *opaque) "bs %p s %p co %p opaque %p"
+mirror_restart_iter(void *s, int64_t cnt) "s %p dirty count %"PRId64
 mirror_before_flush(void *s) "s %p"
 mirror_before_drain(void *s, int64_t cnt) "s %p dirty count %"PRId64
 mirror_before_sleep(void *s, int64_t cnt, int synced) "s %p dirty count %"PRId64" synced %d"
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 39/47] block: make round_to_clusters public
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (37 preceding siblings ...)
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 38/47] block: implement dirty bitmap using HBitmap Paolo Bonzini
@ 2012-07-24 11:04 ` Paolo Bonzini
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 40/47] mirror: perform COW if the cluster size is bigger than the granularity Paolo Bonzini
                   ` (8 subsequent siblings)
  47 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

This function is needed by block/mirror.c in the following patch.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c |   16 ++++++++--------
 block.h |    4 ++++
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/block.c b/block.c
index 54d0cec..c56a500 100644
--- a/block.c
+++ b/block.c
@@ -1450,10 +1450,10 @@ static void tracked_request_begin(BdrvTrackedRequest *req,
 /**
  * Round a region to cluster boundaries
  */
-static void round_to_clusters(BlockDriverState *bs,
-                              int64_t sector_num, int nb_sectors,
-                              int64_t *cluster_sector_num,
-                              int *cluster_nb_sectors)
+void bdrv_round_to_clusters(BlockDriverState *bs,
+                            int64_t sector_num, int nb_sectors,
+                            int64_t *cluster_sector_num,
+                            int *cluster_nb_sectors)
 {
     BlockDriverInfo bdi;
 
@@ -1495,8 +1495,8 @@ static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
      * CoR read and write operations are atomic and guest writes cannot
      * interleave between them.
      */
-    round_to_clusters(bs, sector_num, nb_sectors,
-                      &cluster_sector_num, &cluster_nb_sectors);
+    bdrv_round_to_clusters(bs, sector_num, nb_sectors,
+                           &cluster_sector_num, &cluster_nb_sectors);
 
     do {
         retry = false;
@@ -1819,8 +1819,8 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BlockDriverState *bs,
     /* Cover entire cluster so no additional backing file I/O is required when
      * allocating cluster in the image file.
      */
-    round_to_clusters(bs, sector_num, nb_sectors,
-                      &cluster_sector_num, &cluster_nb_sectors);
+    bdrv_round_to_clusters(bs, sector_num, nb_sectors,
+                           &cluster_sector_num, &cluster_nb_sectors);
 
     trace_bdrv_co_do_copy_on_readv(bs, sector_num, nb_sectors,
                                    cluster_sector_num, cluster_nb_sectors);
diff --git a/block.h b/block.h
index 8212a0f..d0312a0 100644
--- a/block.h
+++ b/block.h
@@ -283,6 +283,10 @@ int bdrv_get_flags(BlockDriverState *bs);
 int bdrv_write_compressed(BlockDriverState *bs, int64_t sector_num,
                           const uint8_t *buf, int nb_sectors);
 int bdrv_get_info(BlockDriverState *bs, BlockDriverInfo *bdi);
+void bdrv_round_to_clusters(BlockDriverState *bs,
+                            int64_t sector_num, int nb_sectors,
+                            int64_t *cluster_sector_num,
+                            int *cluster_nb_sectors);
 
 const char *bdrv_get_encrypted_filename(BlockDriverState *bs);
 void bdrv_get_backing_filename(BlockDriverState *bs,
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 40/47] mirror: perform COW if the cluster size is bigger than the granularity
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (38 preceding siblings ...)
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 39/47] block: make round_to_clusters public Paolo Bonzini
@ 2012-07-24 11:04 ` Paolo Bonzini
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 41/47] block: return count of dirty sectors, not chunks Paolo Bonzini
                   ` (7 subsequent siblings)
  47 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

When mirroring runs, the backing files for the target may not yet be
ready.  However, this means that a copy-on-write operation on the target
would fill the missing sectors with zeros.  Copy-on-write only happens
if the granularity of the dirty bitmap is smaller than the cluster size
(and only for clusters that are allocated in the source after the job
has started copying).  So far, the granularity was fixed to 1MB; to avoid
the problem we detected the situation and required the backing files to
be available in that case only.

However, we want to lower the granularity for efficiency, so we need
a better solution.  The solution is to always copy a whole cluster the
first time it is touched.  The code keeps a bitmap of clusters that
have already been allocated by the mirroring job, and only does "manual"
copy-on-write if the chunk being copied is zero in the bitmap.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/mirror.c         |   60 ++++++++++++++++++++++++++++++++++++++++--------
 blockdev.c             |   13 +----------
 tests/qemu-iotests/039 |   25 ++++++++++++++++++--
 trace-events           |    1 +
 4 files changed, 76 insertions(+), 23 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index c3340d1..6f8ae62 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -15,6 +15,7 @@
 #include "blockjob.h"
 #include "block_int.h"
 #include "qemu/ratelimit.h"
+#include "bitmap.h"
 
 enum {
     /*
@@ -36,6 +37,8 @@ typedef struct MirrorBlockJob {
     bool synced;
     bool complete;
     int64_t sector_num;
+    size_t buf_size;
+    unsigned long *cow_bitmap;
     HBitmapIter hbi;
     uint8_t *buf;
 } MirrorBlockJob;
@@ -47,7 +50,7 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob *s,
     BlockDriverState *target = s->target;
     QEMUIOVector qiov;
     int ret, nb_sectors;
-    int64_t end;
+    int64_t end, sector_num, cluster_num;
     struct iovec iov;
 
     s->sector_num = hbitmap_iter_next(&s->hbi);
@@ -58,24 +61,43 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob *s,
         assert(s->sector_num >= 0);
     }
 
+    /* If we have no backing file yet in the destination, and the cluster size
+     * is very large, we need to do COW ourselves.  The first time a cluster is
+     * copied, copy it entirely.
+     *
+     * Because both BDRV_SECTORS_PER_DIRTY_CHUNK and the cluster size are
+     * powers of two, the number of sectors to copy cannot exceed one cluster.
+     */
+    sector_num = s->sector_num;
+    nb_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
+    cluster_num = sector_num / BDRV_SECTORS_PER_DIRTY_CHUNK;
+    if (s->cow_bitmap && !test_bit(cluster_num, s->cow_bitmap)) {
+        trace_mirror_cow(s, sector_num);
+        bdrv_round_to_clusters(s->target,
+                               sector_num, BDRV_SECTORS_PER_DIRTY_CHUNK,
+                               &sector_num, &nb_sectors);
+        bitmap_set(s->cow_bitmap, sector_num / BDRV_SECTORS_PER_DIRTY_CHUNK,
+                   nb_sectors / BDRV_SECTORS_PER_DIRTY_CHUNK);
+    }
+
     end = s->common.len >> BDRV_SECTOR_BITS;
-    nb_sectors = MIN(BDRV_SECTORS_PER_DIRTY_CHUNK, end - s->sector_num);
-    bdrv_reset_dirty(source, s->sector_num, nb_sectors);
+    nb_sectors = MIN(nb_sectors, end - sector_num);
+    bdrv_reset_dirty(source, sector_num, nb_sectors);
 
     /* Copy the dirty cluster.  */
     iov.iov_base = s->buf;
     iov.iov_len  = nb_sectors * 512;
     qemu_iovec_init_external(&qiov, &iov, 1);
 
-    trace_mirror_one_iteration(s, s->sector_num, nb_sectors);
-    ret = bdrv_co_readv(source, s->sector_num, nb_sectors, &qiov);
+    trace_mirror_one_iteration(s, sector_num, nb_sectors);
+    ret = bdrv_co_readv(source, sector_num, nb_sectors, &qiov);
     if (ret < 0) {
         *p_action = block_job_error_action(&s->common, source,
                                            s->on_source_error, true, -ret);
         s->synced = false;
         goto fail;
     }
-    ret = bdrv_co_writev(target, s->sector_num, nb_sectors, &qiov);
+    ret = bdrv_co_writev(target, sector_num, nb_sectors, &qiov);
     if (ret < 0) {
         *p_action = block_job_error_action(&s->common, target,
                                            s->on_target_error, false, -ret);
@@ -86,7 +108,7 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob *s,
 
 fail:
     /* Try again later.  */
-    bdrv_set_dirty(source, s->sector_num, nb_sectors);
+    bdrv_set_dirty(source, sector_num, nb_sectors);
     return ret;
 }
 
@@ -94,7 +116,9 @@ static void coroutine_fn mirror_run(void *opaque)
 {
     MirrorBlockJob *s = opaque;
     BlockDriverState *bs = s->common.bs;
-    int64_t sector_num, end;
+    int64_t sector_num, end, length;
+    BlockDriverInfo bdi;
+    char backing_filename[1024];
     int ret = 0;
     int n;
 
@@ -108,8 +132,23 @@ static void coroutine_fn mirror_run(void *opaque)
         return;
     }
 
+    /* If we have no backing file yet in the destination, we cannot let
+     * the destination do COW.  Instead, we copy sectors around the
+     * dirty data if needed.  We need a bitmap to do that.
+     */
+    bdrv_get_backing_filename(s->target, backing_filename,
+                              sizeof(backing_filename));
+    if (backing_filename[0] && !s->target->backing_hd) {
+        bdrv_get_info(s->target, &bdi);
+        if (s->buf_size < bdi.cluster_size) {
+            s->buf_size = bdi.cluster_size;
+            length = (bdrv_getlength(bs) + BLOCK_SIZE - 1) / BLOCK_SIZE;
+            s->cow_bitmap = bitmap_new(length);
+        }
+    }
+
     end = s->common.len >> BDRV_SECTOR_BITS;
-    s->buf = qemu_blockalign(bs, BLOCK_SIZE);
+    s->buf = qemu_blockalign(bs, s->buf_size);
 
     if (s->mode == MIRROR_SYNC_MODE_FULL || s->mode == MIRROR_SYNC_MODE_TOP) {
         /* First part, loop on the sectors and initialize the dirty bitmap.  */
@@ -218,6 +257,7 @@ static void coroutine_fn mirror_run(void *opaque)
 
 immediate_exit:
     g_free(s->buf);
+    g_free(s->cow_bitmap);
     bdrv_set_dirty_tracking(bs, false);
     bdrv_iostatus_disable(s->target);
     if (s->complete && ret == 0) {
@@ -313,6 +353,8 @@ void mirror_start(BlockDriverState *bs, BlockDriverState *target,
     s->on_target_error = on_target_error;
     s->target = target;
     s->mode = mode;
+    s->buf_size = BLOCK_SIZE;
+
     bdrv_set_dirty_tracking(bs, true);
     bdrv_set_on_error(s->target, on_target_error, on_target_error);
     bdrv_iostatus_enable(s->target);
diff --git a/blockdev.c b/blockdev.c
index eb528cd..e160610 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -836,7 +836,6 @@ void qmp_drive_mirror(const char *device, const char *target,
                       bool has_on_target_error, BlockdevOnError on_target_error,
                       Error **errp)
 {
-    BlockDriverInfo bdi;
     BlockDriverState *bs;
     BlockDriverState *source, *target_bs;
     BlockDriver *proto_drv;
@@ -927,6 +926,7 @@ void qmp_drive_mirror(const char *device, const char *target,
         return;
     }
 
+    /* Mirroring takes care of copy-on-write using data from the source.  */
     target_bs = bdrv_new("");
     ret = bdrv_open(target_bs, target, flags | BDRV_O_NO_BACKING, drv);
 
@@ -936,17 +936,6 @@ void qmp_drive_mirror(const char *device, const char *target,
         return;
     }
 
-    /* We need a backing file if we will copy parts of a cluster.  */
-    if (bdrv_get_info(target_bs, &bdi) >= 0 && bdi.cluster_size != 0 &&
-        bdi.cluster_size >= BDRV_SECTORS_PER_DIRTY_CHUNK * 512) {
-        ret = bdrv_ensure_backing_file(target_bs);
-        if (ret < 0) {
-            bdrv_delete(target_bs);
-            error_set(errp, QERR_OPEN_FILE_FAILED, target);
-            return;
-        }
-    }
-
     mirror_start(bs, target_bs, speed, sync, on_source_error, on_target_error,
                  block_job_cb, bs, &local_err);
     if (local_err != NULL) {
diff --git a/tests/qemu-iotests/039 b/tests/qemu-iotests/039
index 3e17881..17fa05f 100755
--- a/tests/qemu-iotests/039
+++ b/tests/qemu-iotests/039
@@ -195,8 +195,8 @@ class TestSingleDrive(ImageMirroringTestCase):
     def test_large_cluster(self):
         self.assert_no_active_mirrors()
 
-        qemu_img('create', '-f', iotests.imgfmt, '-o', 'cluster_size=%d,backing_file=%s'
-                        % (TestSingleDrive.image_len, mid_img), target_img)
+        qemu_img('create', '-f', iotests.imgfmt, '-o', 'cluster_size=%d,size=%d'
+                        % (TestSingleDrive.image_len, TestSingleDrive.image_len), target_img)
         result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
                              mode='existing', target=target_img)
         self.assert_qmp(result, 'return', {})
@@ -280,6 +280,27 @@ class TestMirrorNoBacking(ImageMirroringTestCase):
         self.assertTrue(self.compare_images(test_img, target_img),
                         'target image does not match source after mirroring')
 
+    def test_large_cluster(self):
+        self.assert_no_active_mirrors()
+
+        # qemu-img create fails if the image is not there
+        qemu_img('create', '-f', iotests.imgfmt, '-o', 'size=%d'
+                        %(TestMirrorNoBacking.image_len), target_backing_img)
+        qemu_img('create', '-f', iotests.imgfmt, '-o', 'cluster_size=%d,backing_file=%s'
+                        % (TestMirrorNoBacking.image_len, target_backing_img), target_img)
+        os.remove(target_backing_img)
+
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             mode='existing', target=target_img)
+        self.assert_qmp(result, 'return', {})
+
+        self.complete_and_wait()
+        result = self.vm.qmp('query-block')
+        self.assert_qmp(result, 'return[0]/inserted/file', target_img)
+        self.vm.shutdown()
+        self.assertTrue(self.compare_images(test_img, target_img),
+                        'target image does not match source after mirroring')
+
 class TestReadErrors(ImageMirroringTestCase):
     image_len = 2 * 1024 * 1024 # MB
 
diff --git a/trace-events b/trace-events
index 496824c..6b504d8 100644
--- a/trace-events
+++ b/trace-events
@@ -82,6 +82,7 @@ mirror_before_flush(void *s) "s %p"
 mirror_before_drain(void *s, int64_t cnt) "s %p dirty count %"PRId64
 mirror_before_sleep(void *s, int64_t cnt, int synced) "s %p dirty count %"PRId64" synced %d"
 mirror_one_iteration(void *s, int64_t sector_num, int nb_sectors) "s %p sector_num %"PRId64" nb_sectors %d"
+mirror_cow(void *s, int64_t sector_num) "s %p sector_num %"PRId64
 
 # blockdev.c
 qmp_block_job_cancel(void *job) "job %p"
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 41/47] block: return count of dirty sectors, not chunks
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (39 preceding siblings ...)
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 40/47] mirror: perform COW if the cluster size is bigger than the granularity Paolo Bonzini
@ 2012-07-24 11:04 ` Paolo Bonzini
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 42/47] block: allow customizing the granularity of the dirty bitmap Paolo Bonzini
                   ` (6 subsequent siblings)
  47 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

This will make the API easier to use as soon as we'll let clients
customize the granularity.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block-migration.c |    2 +-
 block.c           |    2 +-
 block/mirror.c    |    2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/block-migration.c b/block-migration.c
index b95b4e1..7ff188a 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -485,7 +485,7 @@ static int64_t get_remaining_dirty(void)
         dirty += bdrv_get_dirty_count(bmds->bs);
     }
 
-    return dirty * BLOCK_SIZE;
+    return dirty << BDRV_SECTOR_BITS;
 }
 
 static int is_stage2_completed(void)
diff --git a/block.c b/block.c
index c56a500..f3dd2a7 100644
--- a/block.c
+++ b/block.c
@@ -3825,7 +3825,7 @@ void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector,
 int64_t bdrv_get_dirty_count(BlockDriverState *bs)
 {
     if (bs->dirty_bitmap) {
-        return hbitmap_count(bs->dirty_bitmap) >> BDRV_LOG_SECTORS_PER_DIRTY_CHUNK;
+        return hbitmap_count(bs->dirty_bitmap);
     } else {
         return 0;
     }
diff --git a/block/mirror.c b/block/mirror.c
index 6f8ae62..8d242ef 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -227,7 +227,7 @@ static void coroutine_fn mirror_run(void *opaque)
         trace_mirror_before_sleep(s, cnt, s->synced);
         if (!s->synced) {
             /* Publish progress */
-            s->common.offset = end * BDRV_SECTOR_SIZE - cnt * BLOCK_SIZE;
+            s->common.offset = (end - cnt) * BDRV_SECTOR_SIZE;
 
             if (s->common.speed) {
                 delay_ns = ratelimit_calculate_delay(&s->limit, BDRV_SECTORS_PER_DIRTY_CHUNK);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 42/47] block: allow customizing the granularity of the dirty bitmap
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (40 preceding siblings ...)
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 41/47] block: return count of dirty sectors, not chunks Paolo Bonzini
@ 2012-07-24 11:04 ` Paolo Bonzini
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 43/47] mirror: allow customizing the granularity Paolo Bonzini
                   ` (5 subsequent siblings)
  47 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block-migration.c |    6 ++++--
 block.c           |   14 +++++++-------
 block.h           |    5 +----
 block/mirror.c    |   14 ++++----------
 4 files changed, 16 insertions(+), 23 deletions(-)

diff --git a/block-migration.c b/block-migration.c
index 7ff188a..fa952e5 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -23,7 +23,8 @@
 #include "blockdev.h"
 #include <assert.h>
 
-#define BLOCK_SIZE (BDRV_SECTORS_PER_DIRTY_CHUNK << BDRV_SECTOR_BITS)
+#define BLOCK_SIZE                       (1 << 20)
+#define BDRV_SECTORS_PER_DIRTY_CHUNK     (BLOCK_SIZE >> BDRV_SECTOR_BITS)
 
 #define BLK_MIG_FLAG_DEVICE_BLOCK       0x01
 #define BLK_MIG_FLAG_EOS                0x02
@@ -264,7 +265,8 @@ static void set_dirty_tracking(int enable)
     BlkMigDevState *bmds;
 
     QSIMPLEQ_FOREACH(bmds, &block_mig_state.bmds_list, entry) {
-        bdrv_set_dirty_tracking(bmds->bs, enable);
+        bdrv_set_dirty_tracking(bmds->bs,
+                                enable ? BDRV_SECTORS_PER_DIRTY_CHUNK : 0);
     }
 }
 
diff --git a/block.c b/block.c
index f3dd2a7..972200f 100644
--- a/block.c
+++ b/block.c
@@ -3778,16 +3778,16 @@ void *qemu_blockalign(BlockDriverState *bs, size_t size)
     return qemu_memalign((bs && bs->buffer_alignment) ? bs->buffer_alignment : 512, size);
 }
 
-void bdrv_set_dirty_tracking(BlockDriverState *bs, int enable)
+void bdrv_set_dirty_tracking(BlockDriverState *bs, int granularity)
 {
     int64_t bitmap_size;
 
-    if (enable) {
-        if (!bs->dirty_bitmap) {
-            bitmap_size = (bdrv_getlength(bs) >> BDRV_SECTOR_BITS);
-            bs->dirty_bitmap = hbitmap_alloc(bitmap_size,
-                                             BDRV_LOG_SECTORS_PER_DIRTY_CHUNK);
-        }
+    assert((granularity & (granularity - 1)) == 0);
+
+    if (granularity) {
+        assert(!bs->dirty_bitmap);
+        bitmap_size = (bdrv_getlength(bs) >> BDRV_SECTOR_BITS);
+        bs->dirty_bitmap = hbitmap_alloc(bitmap_size, ffs(granularity) - 1);
     } else {
         if (bs->dirty_bitmap) {
             hbitmap_free(bs->dirty_bitmap);
diff --git a/block.h b/block.h
index d0312a0..3513b24 100644
--- a/block.h
+++ b/block.h
@@ -328,11 +328,8 @@ int bdrv_img_create(const char *filename, const char *fmt,
 void bdrv_set_buffer_alignment(BlockDriverState *bs, int align);
 void *qemu_blockalign(BlockDriverState *bs, size_t size);
 
-#define BDRV_SECTORS_PER_DIRTY_CHUNK     (1 << BDRV_LOG_SECTORS_PER_DIRTY_CHUNK)
-#define BDRV_LOG_SECTORS_PER_DIRTY_CHUNK 11
-
 struct HBitmapIter;
-void bdrv_set_dirty_tracking(BlockDriverState *bs, int enable);
+void bdrv_set_dirty_tracking(BlockDriverState *bs, int granularity);
 int bdrv_get_dirty(BlockDriverState *bs, int64_t sector);
 void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector, int nr_sectors);
 void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector, int nr_sectors);
diff --git a/block/mirror.c b/block/mirror.c
index 8d242ef..48ee963 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -17,14 +17,8 @@
 #include "qemu/ratelimit.h"
 #include "bitmap.h"
 
-enum {
-    /*
-     * Size of data buffer for populating the image file.  This should be large
-     * enough to process multiple clusters in a single call, so that populating
-     * contiguous regions of the image is efficient.
-     */
-    BLOCK_SIZE = 512 * BDRV_SECTORS_PER_DIRTY_CHUNK, /* in bytes */
-};
+#define BLOCK_SIZE                       (1 << 20)
+#define BDRV_SECTORS_PER_DIRTY_CHUNK     (BLOCK_SIZE >> BDRV_SECTOR_BITS)
 
 #define SLICE_TIME 100000000ULL /* ns */
 
@@ -258,7 +252,7 @@ static void coroutine_fn mirror_run(void *opaque)
 immediate_exit:
     g_free(s->buf);
     g_free(s->cow_bitmap);
-    bdrv_set_dirty_tracking(bs, false);
+    bdrv_set_dirty_tracking(bs, 0);
     bdrv_iostatus_disable(s->target);
     if (s->complete && ret == 0) {
         bdrv_swap(s->target, s->common.bs);
@@ -355,7 +349,7 @@ void mirror_start(BlockDriverState *bs, BlockDriverState *target,
     s->mode = mode;
     s->buf_size = BLOCK_SIZE;
 
-    bdrv_set_dirty_tracking(bs, true);
+    bdrv_set_dirty_tracking(bs, BDRV_SECTORS_PER_DIRTY_CHUNK);
     bdrv_set_on_error(s->target, on_target_error, on_target_error);
     bdrv_iostatus_enable(s->target);
     s->common.co = qemu_coroutine_create(mirror_run);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 43/47] mirror: allow customizing the granularity
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (41 preceding siblings ...)
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 42/47] block: allow customizing the granularity of the dirty bitmap Paolo Bonzini
@ 2012-07-24 11:04 ` Paolo Bonzini
  2012-07-28 13:43   ` Eric Blake
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 44/47] mirror: switch mirror_iteration to AIO Paolo Bonzini
                   ` (4 subsequent siblings)
  47 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

The desired granularity may be very different depending on the kind of
operation (e.g. continous replication vs. collapse-to-raw) and whether
the VM is expected to perform lots of I/O while mirroring is in progress.
Allow the user to customize it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/mirror.c   |   38 ++++++++++++++++++++------------------
 block_int.h      |    3 ++-
 blockdev.c       |   15 ++++++++++++++-
 hmp.c            |    2 +-
 qapi-schema.json |    7 ++++++-
 qmp-commands.hx  |    5 ++++-
 6 files changed, 47 insertions(+), 23 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 48ee963..81a600b 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -17,9 +17,6 @@
 #include "qemu/ratelimit.h"
 #include "bitmap.h"
 
-#define BLOCK_SIZE                       (1 << 20)
-#define BDRV_SECTORS_PER_DIRTY_CHUNK     (BLOCK_SIZE >> BDRV_SECTOR_BITS)
-
 #define SLICE_TIME 100000000ULL /* ns */
 
 typedef struct MirrorBlockJob {
@@ -31,6 +28,7 @@ typedef struct MirrorBlockJob {
     bool synced;
     bool complete;
     int64_t sector_num;
+    int64_t granularity;
     size_t buf_size;
     unsigned long *cow_bitmap;
     HBitmapIter hbi;
@@ -43,7 +41,7 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob *s,
     BlockDriverState *source = s->common.bs;
     BlockDriverState *target = s->target;
     QEMUIOVector qiov;
-    int ret, nb_sectors;
+    int ret, nb_sectors, nb_sectors_chunk;
     int64_t end, sector_num, cluster_num;
     struct iovec iov;
 
@@ -59,19 +57,19 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob *s,
      * is very large, we need to do COW ourselves.  The first time a cluster is
      * copied, copy it entirely.
      *
-     * Because both BDRV_SECTORS_PER_DIRTY_CHUNK and the cluster size are
-     * powers of two, the number of sectors to copy cannot exceed one cluster.
+     * Because both the granularity and the cluster size are powers of two, the
+     * number of sectors to copy cannot exceed one cluster.
      */
     sector_num = s->sector_num;
-    nb_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
-    cluster_num = sector_num / BDRV_SECTORS_PER_DIRTY_CHUNK;
+    nb_sectors_chunk = nb_sectors = s->granularity >> BDRV_SECTOR_BITS;
+    cluster_num = sector_num / nb_sectors_chunk;
     if (s->cow_bitmap && !test_bit(cluster_num, s->cow_bitmap)) {
         trace_mirror_cow(s, sector_num);
         bdrv_round_to_clusters(s->target,
-                               sector_num, BDRV_SECTORS_PER_DIRTY_CHUNK,
+                               sector_num, nb_sectors_chunk,
                                &sector_num, &nb_sectors);
-        bitmap_set(s->cow_bitmap, sector_num / BDRV_SECTORS_PER_DIRTY_CHUNK,
-                   nb_sectors / BDRV_SECTORS_PER_DIRTY_CHUNK);
+        bitmap_set(s->cow_bitmap, sector_num / nb_sectors_chunk,
+                   nb_sectors / nb_sectors_chunk);
     }
 
     end = s->common.len >> BDRV_SECTOR_BITS;
@@ -110,7 +108,7 @@ static void coroutine_fn mirror_run(void *opaque)
 {
     MirrorBlockJob *s = opaque;
     BlockDriverState *bs = s->common.bs;
-    int64_t sector_num, end, length;
+    int64_t sector_num, end, nb_sectors_chunk, length;
     BlockDriverInfo bdi;
     char backing_filename[1024];
     int ret = 0;
@@ -136,20 +134,21 @@ static void coroutine_fn mirror_run(void *opaque)
         bdrv_get_info(s->target, &bdi);
         if (s->buf_size < bdi.cluster_size) {
             s->buf_size = bdi.cluster_size;
-            length = (bdrv_getlength(bs) + BLOCK_SIZE - 1) / BLOCK_SIZE;
+            length = (bdrv_getlength(bs) + s->granularity - 1) / s->granularity;
             s->cow_bitmap = bitmap_new(length);
         }
     }
 
     end = s->common.len >> BDRV_SECTOR_BITS;
     s->buf = qemu_blockalign(bs, s->buf_size);
+    nb_sectors_chunk = s->granularity >> BDRV_SECTOR_BITS;
 
     if (s->mode == MIRROR_SYNC_MODE_FULL || s->mode == MIRROR_SYNC_MODE_TOP) {
         /* First part, loop on the sectors and initialize the dirty bitmap.  */
         BlockDriverState *base;
         base = s->mode == MIRROR_SYNC_MODE_FULL ? NULL : bs->backing_hd;
         for (sector_num = 0; sector_num < end; ) {
-            int64_t next = (sector_num | (BDRV_SECTORS_PER_DIRTY_CHUNK - 1)) + 1;
+            int64_t next = (sector_num | (nb_sectors_chunk - 1)) + 1;
             ret = bdrv_co_is_allocated_above(bs, base,
                                              sector_num, next - sector_num, &n);
 
@@ -224,7 +223,7 @@ static void coroutine_fn mirror_run(void *opaque)
             s->common.offset = (end - cnt) * BDRV_SECTOR_SIZE;
 
             if (s->common.speed) {
-                delay_ns = ratelimit_calculate_delay(&s->limit, BDRV_SECTORS_PER_DIRTY_CHUNK);
+                delay_ns = ratelimit_calculate_delay(&s->limit, nb_sectors_chunk);
             } else {
                 delay_ns = 0;
             }
@@ -323,7 +322,7 @@ static BlockJobType mirror_job_type = {
 };
 
 void mirror_start(BlockDriverState *bs, BlockDriverState *target,
-                  int64_t speed, MirrorSyncMode mode,
+                  int64_t speed, int64_t granularity, MirrorSyncMode mode,
                   BlockdevOnError on_source_error,
                   BlockdevOnError on_target_error,
                   BlockDriverCompletionFunc *cb,
@@ -331,6 +330,8 @@ void mirror_start(BlockDriverState *bs, BlockDriverState *target,
 {
     MirrorBlockJob *s;
 
+    assert((granularity & (granularity - 1)) == 0);
+
     if ((on_source_error == BLOCKDEV_ON_ERROR_STOP ||
          on_source_error == BLOCKDEV_ON_ERROR_ENOSPC) &&
         !bdrv_iostatus_is_enabled(bs)) {
@@ -347,9 +348,10 @@ void mirror_start(BlockDriverState *bs, BlockDriverState *target,
     s->on_target_error = on_target_error;
     s->target = target;
     s->mode = mode;
-    s->buf_size = BLOCK_SIZE;
+    s->granularity = granularity;
+    s->buf_size = granularity;
 
-    bdrv_set_dirty_tracking(bs, BDRV_SECTORS_PER_DIRTY_CHUNK);
+    bdrv_set_dirty_tracking(bs, granularity >> BDRV_SECTOR_BITS);
     bdrv_set_on_error(s->target, on_target_error, on_target_error);
     bdrv_iostatus_enable(s->target);
     s->common.co = qemu_coroutine_create(mirror_run);
diff --git a/block_int.h b/block_int.h
index 4cb3c5b..69bb769 100644
--- a/block_int.h
+++ b/block_int.h
@@ -310,6 +310,7 @@ void stream_start(BlockDriverState *bs, BlockDriverState *base,
  * @bs: Block device to operate on.
  * @target: Block device to write to.
  * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
+ * @granularity: The chosen granularity for the dirty bitmap.
  * @mode: Whether to collapse all images in the chain to the target.
  * @on_source_error: The action to take upon error reading from the source.
  * @on_target_error: The action to take upon error writing to the target.
@@ -323,7 +324,7 @@ void stream_start(BlockDriverState *bs, BlockDriverState *base,
  * @bs will be switched to read from @target.
  */
 void mirror_start(BlockDriverState *bs, BlockDriverState *target,
-                  int64_t speed, MirrorSyncMode mode,
+                  int64_t speed, int64_t granularity, MirrorSyncMode mode,
                   BlockdevOnError on_source_error,
                   BlockdevOnError on_target_error,
                   BlockDriverCompletionFunc *cb,
diff --git a/blockdev.c b/blockdev.c
index e160610..13b6217 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -832,6 +832,7 @@ void qmp_drive_mirror(const char *device, const char *target,
                       enum MirrorSyncMode sync,
                       bool has_mode, enum NewImageMode mode,
                       bool has_speed, int64_t speed,
+                      bool has_granularity, int64_t granularity,
                       bool has_on_source_error, BlockdevOnError on_source_error,
                       bool has_on_target_error, BlockdevOnError on_target_error,
                       Error **errp)
@@ -857,6 +858,17 @@ void qmp_drive_mirror(const char *device, const char *target,
     if (!has_mode) {
         mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
     }
+    if (!has_granularity) {
+        granularity = 65536;
+    }
+    if (granularity < 512 || granularity > 1048576 * 64) {
+        error_set(errp, QERR_INVALID_PARAMETER, device);
+        return;
+    }
+    if (granularity & (granularity - 1)) {
+        error_set(errp, QERR_INVALID_PARAMETER, device);
+        return;
+    }
 
     bs = bdrv_find(device);
     if (!bs) {
@@ -936,7 +948,8 @@ void qmp_drive_mirror(const char *device, const char *target,
         return;
     }
 
-    mirror_start(bs, target_bs, speed, sync, on_source_error, on_target_error,
+    mirror_start(bs, target_bs, speed, granularity, sync,
+                 on_source_error, on_target_error,
                  block_job_cb, bs, &local_err);
     if (local_err != NULL) {
         bdrv_delete(target_bs);
diff --git a/hmp.c b/hmp.c
index b6bc263..4f95096 100644
--- a/hmp.c
+++ b/hmp.c
@@ -719,7 +719,7 @@ void hmp_drive_mirror(Monitor *mon, const QDict *qdict)
 
     qmp_drive_mirror(device, filename, !!format, format,
                      full ? MIRROR_SYNC_MODE_FULL : MIRROR_SYNC_MODE_TOP,
-                     true, mode, false, 0,
+                     true, mode, false, 0, false, 0,
                      false, 0, false, 0, &errp);
     hmp_handle_error(mon, &errp);
 }
diff --git a/qapi-schema.json b/qapi-schema.json
index 7a97fad..fb0ccc7 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1389,6 +1389,9 @@
 #        (all the disk, only the sectors allocated in the topmost image, or
 #        only new I/O).
 #
+# @granularity: #optional granularity of the dirty bitmap, default is 64K.
+#               Must be a power of 2 between 512 and 64M.
+#
 # @on-source-error: #optional the action to take on an error on the source,
 #                   default 'report'.  'stop' and 'enospc' can only be used
 #                   if the block device supports io-status (see BlockInfo).
@@ -1401,6 +1404,7 @@
 #          If @device is not a valid block device, DeviceNotFound
 #          If @target can't be opened, OpenFileFailed
 #          If @format is invalid, InvalidBlockFormat
+#          If @granularity is not a power of 2, InvalidParameter
 #          If @on_source_error is not supported, InvalidParameter
 #
 # Since 1.2
@@ -1408,7 +1412,8 @@
 { 'command': 'drive-mirror',
   'data': { 'device': 'str', 'target': 'str', '*format': 'str',
             'sync': 'MirrorSyncMode', '*mode': 'NewImageMode',
-            '*speed': 'int', '*on-source-error': 'BlockdevOnError',
+            '*speed': 'int', '*granularity': 'int',
+            '*on-source-error': 'BlockdevOnError',
             '*on-target-error': 'BlockdevOnError' } }
 
 ##
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 5081b01..89e12f5 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -839,7 +839,8 @@ EQMP
     {
         .name       = "drive-mirror",
         .args_type  = "sync:s,device:B,target:s,speed:i?,mode:s?,format:s?,"
-                      "on-source-error:s?,on-target-error:s?",
+                      "on-source-error:s?,on-target-error:s?,"
+                      "granularity:i?",
         .mhandler.cmd_new = qmp_marshal_input_drive_mirror,
     },
 
@@ -863,6 +864,8 @@ Arguments:
   file/device (NewImageMode, optional, default 'absolute-paths')
 - "speed": maximum speed of the streaming job, in bytes per second
   (json-int)
+- "granularity": granularity of the dirty bitmap (json-int, default 64k,
+  must be a power of two between 512 and 64M.
 - "sync": what parts of the disk image should be copied to the destination;
   possibilities include "full" for all the disk, "top" for only the sectors
   allocated in the topmost image, or "none" to only replicate new I/O
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 44/47] mirror: switch mirror_iteration to AIO
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (42 preceding siblings ...)
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 43/47] mirror: allow customizing the granularity Paolo Bonzini
@ 2012-07-24 11:04 ` Paolo Bonzini
  2012-07-28 13:46   ` Eric Blake
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 45/47] mirror: add buf-size argument to drive-mirror Paolo Bonzini
                   ` (3 subsequent siblings)
  47 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

There is really no change in the behavior of the job here, since there
is still a maximum of one in-flight I/O operation between the source and
the target.  However, this patch already introduces moves the copy logic
from mirror_iteration to AIO callbacks; it also adds the logic to count
in-flight operations, and only complete the job after they have finished.

Some care is required in the error and cancellation cases, in order
to avoid access to dangling pointers (and consequent corruption).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/mirror.c |  161 ++++++++++++++++++++++++++++++++++++++++++--------------
 trace-events   |    2 +
 2 files changed, 123 insertions(+), 40 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 81a600b..971c923 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -17,7 +17,7 @@
 #include "qemu/ratelimit.h"
 #include "bitmap.h"
 
-#define SLICE_TIME 100000000ULL /* ns */
+#define SLICE_TIME    100000000ULL /* ns */
 
 typedef struct MirrorBlockJob {
     BlockJob common;
@@ -33,17 +33,78 @@ typedef struct MirrorBlockJob {
     unsigned long *cow_bitmap;
     HBitmapIter hbi;
     uint8_t *buf;
+
+    int in_flight;
+    int ret;
 } MirrorBlockJob;
 
-static int coroutine_fn mirror_iteration(MirrorBlockJob *s,
-                                         BlockErrorAction *p_action)
+typedef struct MirrorOp {
+    MirrorBlockJob *s;
+    QEMUIOVector qiov;
+    struct iovec iov;
+    int64_t sector_num;
+    int nb_sectors;
+} MirrorOp;
+
+static void mirror_iteration_done(MirrorOp *op)
 {
+    MirrorBlockJob *s = op->s;
+
+    s->in_flight--;
+    trace_mirror_iteration_done(s, op->sector_num, op->nb_sectors);
+    g_slice_free(MirrorOp, op);
+    qemu_coroutine_enter(s->common.co, NULL);
+}
+
+static void mirror_write_complete(void *opaque, int ret)
+{
+    MirrorOp *op = opaque;
+    MirrorBlockJob *s = op->s;
     BlockDriverState *source = s->common.bs;
     BlockDriverState *target = s->target;
-    QEMUIOVector qiov;
-    int ret, nb_sectors, nb_sectors_chunk;
+    BlockErrorAction action;
+
+    if (ret < 0) {
+        bdrv_set_dirty(source, op->sector_num, op->nb_sectors);
+        action = block_job_error_action(&s->common, target, s->on_target_error,
+                                        false, -ret);
+        s->synced = false;
+        if (action == BDRV_ACTION_REPORT && s->ret >= 0) {
+            s->ret = ret;
+        }
+    }
+    mirror_iteration_done(op);
+}
+
+static void mirror_read_complete(void *opaque, int ret)
+{
+    MirrorOp *op = opaque;
+    MirrorBlockJob *s = op->s;
+    BlockDriverState *source = s->common.bs;
+    BlockDriverState *target = s->target;
+    BlockErrorAction action;
+    if (ret < 0) {
+        bdrv_set_dirty(source, op->sector_num, op->nb_sectors);
+        action = block_job_error_action(&s->common, source, s->on_source_error,
+                                        true, -ret);
+        s->synced = false;
+        if (action == BDRV_ACTION_REPORT && s->ret >= 0) {
+            s->ret = ret;
+        }
+
+        mirror_iteration_done(op);
+        return;
+    }
+    bdrv_aio_writev(target, op->sector_num, &op->qiov, op->nb_sectors,
+                    mirror_write_complete, op);
+}
+
+static void coroutine_fn mirror_iteration(MirrorBlockJob *s)
+{
+    BlockDriverState *source = s->common.bs;
+    int nb_sectors, nb_sectors_chunk;
     int64_t end, sector_num, cluster_num;
-    struct iovec iov;
+    MirrorOp *op;
 
     s->sector_num = hbitmap_iter_next(&s->hbi);
     if (s->sector_num < 0) {
@@ -74,34 +135,30 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob *s,
 
     end = s->common.len >> BDRV_SECTOR_BITS;
     nb_sectors = MIN(nb_sectors, end - sector_num);
+
+    /* Allocate a MirrorOp that is used as an AIO callback.  */
+    op = g_slice_new(MirrorOp);
+    op->s = s;
+    op->iov.iov_base = s->buf;
+    op->iov.iov_len  = nb_sectors * 512;
+    op->sector_num = sector_num;
+    op->nb_sectors = nb_sectors;
+    qemu_iovec_init_external(&op->qiov, &op->iov, 1);
+
     bdrv_reset_dirty(source, sector_num, nb_sectors);
 
     /* Copy the dirty cluster.  */
-    iov.iov_base = s->buf;
-    iov.iov_len  = nb_sectors * 512;
-    qemu_iovec_init_external(&qiov, &iov, 1);
-
+    s->in_flight++;
     trace_mirror_one_iteration(s, sector_num, nb_sectors);
-    ret = bdrv_co_readv(source, sector_num, nb_sectors, &qiov);
-    if (ret < 0) {
-        *p_action = block_job_error_action(&s->common, source,
-                                           s->on_source_error, true, -ret);
-        s->synced = false;
-        goto fail;
-    }
-    ret = bdrv_co_writev(target, sector_num, nb_sectors, &qiov);
-    if (ret < 0) {
-        *p_action = block_job_error_action(&s->common, target,
-                                           s->on_target_error, false, -ret);
-        s->synced = false;
-        goto fail;
-    }
-    return 0;
+    bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors,
+                   mirror_read_complete, op);
+}
 
-fail:
-    /* Try again later.  */
-    bdrv_set_dirty(source, sector_num, nb_sectors);
-    return ret;
+static void mirror_drain(MirrorBlockJob *s)
+{
+    while (s->in_flight > 0) {
+        qemu_coroutine_yield();
+    }
 }
 
 static void coroutine_fn mirror_run(void *opaque)
@@ -109,6 +166,7 @@ static void coroutine_fn mirror_run(void *opaque)
     MirrorBlockJob *s = opaque;
     BlockDriverState *bs = s->common.bs;
     int64_t sector_num, end, nb_sectors_chunk, length;
+    uint64_t last_pause_ns;
     BlockDriverInfo bdi;
     char backing_filename[1024];
     int ret = 0;
@@ -168,22 +226,37 @@ static void coroutine_fn mirror_run(void *opaque)
     }
 
     bdrv_dirty_iter_init(bs, &s->hbi);
+    last_pause_ns = qemu_get_clock_ns(rt_clock);
     for (;;) {
         uint64_t delay_ns;
         int64_t cnt;
         bool should_complete;
 
+        if (s->ret < 0) {
+            ret = s->ret;
+            break;
+        }
+
         cnt = bdrv_get_dirty_count(bs);
-        if (cnt != 0) {
-            BlockErrorAction action = BDRV_ACTION_REPORT;
-            ret = mirror_iteration(s, &action);
-            if (ret < 0 && action == BDRV_ACTION_REPORT) {
-                break;
+
+        /* Note that even when no rate limit is applied we need to yield
+         * periodically with no pending I/O so that qemu_aio_flush() returns.
+         * We do so every SLICE_TIME milliseconds, or when there is an error,
+         * or when the source is clean, whichever comes first.
+         */
+        if (qemu_get_clock_ns(rt_clock) - last_pause_ns < SLICE_TIME &&
+            s->common.iostatus == BLOCK_DEVICE_IO_STATUS_OK) {
+            if (s->in_flight > 0) {
+                trace_mirror_yield(s, s->in_flight, cnt);
+                qemu_coroutine_yield();
+                continue;
+            } else if (cnt != 0) {
+                mirror_iteration(s);
+                continue;
             }
-            cnt = bdrv_get_dirty_count(bs);
         }
 
-        if (cnt != 0) {
+        if (s->in_flight > 0 || cnt != 0) {
             should_complete = false;
         } else {
             trace_mirror_before_flush(s);
@@ -228,15 +301,12 @@ static void coroutine_fn mirror_run(void *opaque)
                 delay_ns = 0;
             }
 
-            /* Note that even when no rate limit is applied we need to yield
-             * with no pending I/O here so that qemu_aio_flush() returns.
-             */
             block_job_sleep_ns(&s->common, rt_clock, delay_ns);
             if (block_job_is_cancelled(&s->common)) {
                 break;
             }
         } else if (!should_complete) {
-            delay_ns = (cnt == 0 ? SLICE_TIME : 0);
+            delay_ns = (s->in_flight == 0 && cnt == 0 ? SLICE_TIME : 0);
             block_job_sleep_ns(&s->common, rt_clock, delay_ns);
         } else if (cnt == 0) {
             /* The two disks are in sync.  Exit and report successful
@@ -246,9 +316,20 @@ static void coroutine_fn mirror_run(void *opaque)
             s->common.cancelled = false;
             break;
         }
+        last_pause_ns = qemu_get_clock_ns(rt_clock);
+    }
+
+    if (s->in_flight > 0) {
+        /* We get here only if something went wrong.  Either the job failed,
+         * or it was cancelled prematurely so that we do not guarantee that
+         * the target is a copy of the source.
+         */
+        assert(ret < 0 || (!s->synced && block_job_is_cancelled(&s->common)));
+        mirror_drain(s);
     }
 
 immediate_exit:
+    assert(s->in_flight == 0);
     g_free(s->buf);
     g_free(s->cow_bitmap);
     bdrv_set_dirty_tracking(bs, 0);
diff --git a/trace-events b/trace-events
index 6b504d8..fe20bd7 100644
--- a/trace-events
+++ b/trace-events
@@ -83,6 +83,8 @@ mirror_before_drain(void *s, int64_t cnt) "s %p dirty count %"PRId64
 mirror_before_sleep(void *s, int64_t cnt, int synced) "s %p dirty count %"PRId64" synced %d"
 mirror_one_iteration(void *s, int64_t sector_num, int nb_sectors) "s %p sector_num %"PRId64" nb_sectors %d"
 mirror_cow(void *s, int64_t sector_num) "s %p sector_num %"PRId64
+mirror_iteration_done(void *s, int64_t sector_num, int nb_sectors) "s %p sector_num %"PRId64" nb_sectors %d"
+mirror_yield(void *s, int64_t cnt, int in_flight) "s %p dirty count %"PRId64" in_flight %d"
 
 # blockdev.c
 qmp_block_job_cancel(void *job) "job %p"
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 45/47] mirror: add buf-size argument to drive-mirror
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (43 preceding siblings ...)
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 44/47] mirror: switch mirror_iteration to AIO Paolo Bonzini
@ 2012-07-24 11:04 ` Paolo Bonzini
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 46/47] mirror: support more than one in-flight AIO operation Paolo Bonzini
                   ` (2 subsequent siblings)
  47 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

This makes sense when the next commit starts using the extra buffer space
to perform many I/O operations asynchronously.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/mirror.c         |    6 +++---
 block_int.h            |    5 +++--
 blockdev.c             |    9 ++++++++-
 hmp.c                  |    2 +-
 qapi-schema.json       |    5 ++++-
 qmp-commands.hx        |    4 +++-
 tests/qemu-iotests/039 |   31 +++++++++++++++++++++++++++++++
 7 files changed, 53 insertions(+), 9 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 971c923..475a7e0 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -403,8 +403,8 @@ static BlockJobType mirror_job_type = {
 };
 
 void mirror_start(BlockDriverState *bs, BlockDriverState *target,
-                  int64_t speed, int64_t granularity, MirrorSyncMode mode,
-                  BlockdevOnError on_source_error,
+                  int64_t speed, int64_t granularity, int64_t buf_size,
+                  MirrorSyncMode mode, BlockdevOnError on_source_error,
                   BlockdevOnError on_target_error,
                   BlockDriverCompletionFunc *cb,
                   void *opaque, Error **errp)
@@ -430,7 +430,7 @@ void mirror_start(BlockDriverState *bs, BlockDriverState *target,
     s->target = target;
     s->mode = mode;
     s->granularity = granularity;
-    s->buf_size = granularity;
+    s->buf_size = MAX(buf_size, granularity);
 
     bdrv_set_dirty_tracking(bs, granularity >> BDRV_SECTOR_BITS);
     bdrv_set_on_error(s->target, on_target_error, on_target_error);
diff --git a/block_int.h b/block_int.h
index 69bb769..7f2b7f1 100644
--- a/block_int.h
+++ b/block_int.h
@@ -311,6 +311,7 @@ void stream_start(BlockDriverState *bs, BlockDriverState *base,
  * @target: Block device to write to.
  * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
  * @granularity: The chosen granularity for the dirty bitmap.
+ * @buf_size: The amount of data that can be in flight at one time.
  * @mode: Whether to collapse all images in the chain to the target.
  * @on_source_error: The action to take upon error reading from the source.
  * @on_target_error: The action to take upon error writing to the target.
@@ -324,8 +325,8 @@ void stream_start(BlockDriverState *bs, BlockDriverState *base,
  * @bs will be switched to read from @target.
  */
 void mirror_start(BlockDriverState *bs, BlockDriverState *target,
-                  int64_t speed, int64_t granularity, MirrorSyncMode mode,
-                  BlockdevOnError on_source_error,
+                  int64_t speed, int64_t granularity, int64_t buf_size,
+                  MirrorSyncMode mode, BlockdevOnError on_source_error,
                   BlockdevOnError on_target_error,
                   BlockDriverCompletionFunc *cb,
                   void *opaque, Error **errp);
diff --git a/blockdev.c b/blockdev.c
index 13b6217..d617d02 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -827,12 +827,15 @@ exit:
     return;
 }
 
+#define DEFAULT_MIRROR_BUF_SIZE   (10 << 20)
+
 void qmp_drive_mirror(const char *device, const char *target,
                       bool has_format, const char *format,
                       enum MirrorSyncMode sync,
                       bool has_mode, enum NewImageMode mode,
                       bool has_speed, int64_t speed,
                       bool has_granularity, int64_t granularity,
+                      bool has_buf_size, int64_t buf_size,
                       bool has_on_source_error, BlockdevOnError on_source_error,
                       bool has_on_target_error, BlockdevOnError on_target_error,
                       Error **errp)
@@ -861,6 +864,10 @@ void qmp_drive_mirror(const char *device, const char *target,
     if (!has_granularity) {
         granularity = 65536;
     }
+    if (!has_buf_size) {
+        buf_size = DEFAULT_MIRROR_BUF_SIZE;
+    }
+
     if (granularity < 512 || granularity > 1048576 * 64) {
         error_set(errp, QERR_INVALID_PARAMETER, device);
         return;
@@ -948,7 +955,7 @@ void qmp_drive_mirror(const char *device, const char *target,
         return;
     }
 
-    mirror_start(bs, target_bs, speed, granularity, sync,
+    mirror_start(bs, target_bs, speed, granularity, buf_size, sync,
                  on_source_error, on_target_error,
                  block_job_cb, bs, &local_err);
     if (local_err != NULL) {
diff --git a/hmp.c b/hmp.c
index 4f95096..85dcf7e 100644
--- a/hmp.c
+++ b/hmp.c
@@ -719,7 +719,7 @@ void hmp_drive_mirror(Monitor *mon, const QDict *qdict)
 
     qmp_drive_mirror(device, filename, !!format, format,
                      full ? MIRROR_SYNC_MODE_FULL : MIRROR_SYNC_MODE_TOP,
-                     true, mode, false, 0, false, 0,
+                     true, mode, false, 0, false, 0, false, 0,
                      false, 0, false, 0, &errp);
     hmp_handle_error(mon, &errp);
 }
diff --git a/qapi-schema.json b/qapi-schema.json
index fb0ccc7..8cfcd4d 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1392,6 +1392,9 @@
 # @granularity: #optional granularity of the dirty bitmap, default is 64K.
 #               Must be a power of 2 between 512 and 64M.
 #
+# @buf-size: #optional maximum amount of data in flight from source to
+#            target.
+#
 # @on-source-error: #optional the action to take on an error on the source,
 #                   default 'report'.  'stop' and 'enospc' can only be used
 #                   if the block device supports io-status (see BlockInfo).
@@ -1413,7 +1416,7 @@
   'data': { 'device': 'str', 'target': 'str', '*format': 'str',
             'sync': 'MirrorSyncMode', '*mode': 'NewImageMode',
             '*speed': 'int', '*granularity': 'int',
-            '*on-source-error': 'BlockdevOnError',
+            '*buf-size': 'int', '*on-source-error': 'BlockdevOnError',
             '*on-target-error': 'BlockdevOnError' } }
 
 ##
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 89e12f5..1b9f5af 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -840,7 +840,7 @@ EQMP
         .name       = "drive-mirror",
         .args_type  = "sync:s,device:B,target:s,speed:i?,mode:s?,format:s?,"
                       "on-source-error:s?,on-target-error:s?,"
-                      "granularity:i?",
+                      "buf-size:i?,granularity:i?",
         .mhandler.cmd_new = qmp_marshal_input_drive_mirror,
     },
 
@@ -866,6 +866,8 @@ Arguments:
   (json-int)
 - "granularity": granularity of the dirty bitmap (json-int, default 64k,
   must be a power of two between 512 and 64M.
+- "buf_size": maximum amount of data in flight from source to target
+  (json-int, default 10M)
 - "sync": what parts of the disk image should be copied to the destination;
   possibilities include "full" for all the disk, "top" for only the sectors
   allocated in the topmost image, or "none" to only replicate new I/O
diff --git a/tests/qemu-iotests/039 b/tests/qemu-iotests/039
index 17fa05f..a76766b 100755
--- a/tests/qemu-iotests/039
+++ b/tests/qemu-iotests/039
@@ -192,6 +192,37 @@ class TestSingleDrive(ImageMirroringTestCase):
         self.assertTrue(self.compare_images(test_img, target_img),
                         'target image does not match source after mirroring')
 
+    def test_small_buffer(self):
+        self.assert_no_active_mirrors()
+
+        # A small buffer is rounded up automatically
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             buf_size=4096, target=target_img)
+        self.assert_qmp(result, 'return', {})
+
+        self.complete_and_wait()
+        result = self.vm.qmp('query-block')
+        self.assert_qmp(result, 'return[0]/inserted/file', target_img)
+        self.vm.shutdown()
+        self.assertTrue(self.compare_images(test_img, target_img),
+                        'target image does not match source after mirroring')
+
+    def test_small_buffer2(self):
+        self.assert_no_active_mirrors()
+
+        qemu_img('create', '-f', iotests.imgfmt, '-o', 'cluster_size=%d,size=%d'
+                        % (TestSingleDrive.image_len, TestSingleDrive.image_len), target_img)
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             buf_size=65536, mode='existing', target=target_img)
+        self.assert_qmp(result, 'return', {})
+
+        self.complete_and_wait()
+        result = self.vm.qmp('query-block')
+        self.assert_qmp(result, 'return[0]/inserted/file', target_img)
+        self.vm.shutdown()
+        self.assertTrue(self.compare_images(test_img, target_img),
+                        'target image does not match source after mirroring')
+
     def test_large_cluster(self):
         self.assert_no_active_mirrors()
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 46/47] mirror: support more than one in-flight AIO operation
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (44 preceding siblings ...)
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 45/47] mirror: add buf-size argument to drive-mirror Paolo Bonzini
@ 2012-07-24 11:04 ` Paolo Bonzini
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 47/47] mirror: support arbitrarily-sized iterations Paolo Bonzini
  2012-07-28 13:51 ` [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Eric Blake
  47 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

With AIO support in place, we can start copying more than one chunk
in parallel.  This patch introduces the required infrastructure for
this: the buffer is split into multiple granularity-sized chunks,
and there is a free list to access them.

Because of copy-on-write, a single operation may already require
multiple chunks to be available on the free list.  The next patch
will make this more general, but the logic remains the same overall.

In addition, two different iterations on the HBitmap may want to
copy the same cluster.  We avoid this by keeping a bitmap of in-flight
I/O operations, and blocking until the previous iteration completes.
This should be a relatively rare occurrence, though, and as long as
there is no overlap the next iteration can start before the previous
one finishes.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/mirror.c |  107 ++++++++++++++++++++++++++++++++++++++++++++++++++------
 trace-events   |    4 ++-
 2 files changed, 99 insertions(+), 12 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 475a7e0..93e718f 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -18,6 +18,14 @@
 #include "bitmap.h"
 
 #define SLICE_TIME    100000000ULL /* ns */
+#define MAX_IN_FLIGHT 16
+
+/* The mirroring buffer is a list of granularity-sized chunks.
+ * Free chunks are organized in a list.
+ */
+typedef struct MirrorBuffer {
+    QSIMPLEQ_ENTRY(MirrorBuffer) next;
+} MirrorBuffer;
 
 typedef struct MirrorBlockJob {
     BlockJob common;
@@ -33,7 +41,10 @@ typedef struct MirrorBlockJob {
     unsigned long *cow_bitmap;
     HBitmapIter hbi;
     uint8_t *buf;
+    QSIMPLEQ_HEAD(, MirrorBuffer) buf_free;
+    int buf_free_count;
 
+    unsigned long *in_flight_bitmap;
     int in_flight;
     int ret;
 } MirrorBlockJob;
@@ -41,7 +52,6 @@ typedef struct MirrorBlockJob {
 typedef struct MirrorOp {
     MirrorBlockJob *s;
     QEMUIOVector qiov;
-    struct iovec iov;
     int64_t sector_num;
     int nb_sectors;
 } MirrorOp;
@@ -49,8 +59,22 @@ typedef struct MirrorOp {
 static void mirror_iteration_done(MirrorOp *op)
 {
     MirrorBlockJob *s = op->s;
+    struct iovec *iov;
+    int64_t cluster_num;
+    int i, nb_chunks;
 
     s->in_flight--;
+    iov = op->qiov.iov;
+    for (i = 0; i < op->qiov.niov; i++) {
+        MirrorBuffer *buf = (MirrorBuffer *) iov[i].iov_base;
+        QSIMPLEQ_INSERT_TAIL(&s->buf_free, buf, next);
+        s->buf_free_count++;
+    }
+
+    cluster_num = op->sector_num / s->granularity;
+    nb_chunks = op->nb_sectors / s->granularity;
+    bitmap_clear(s->in_flight_bitmap, cluster_num, nb_chunks);
+
     trace_mirror_iteration_done(s, op->sector_num, op->nb_sectors);
     g_slice_free(MirrorOp, op);
     qemu_coroutine_enter(s->common.co, NULL);
@@ -102,8 +126,8 @@ static void mirror_read_complete(void *opaque, int ret)
 static void coroutine_fn mirror_iteration(MirrorBlockJob *s)
 {
     BlockDriverState *source = s->common.bs;
-    int nb_sectors, nb_sectors_chunk;
-    int64_t end, sector_num, cluster_num;
+    int nb_sectors, nb_sectors_chunk, nb_chunks;
+    int64_t end, sector_num, cluster_num, next_sector, hbitmap_next_sector;
     MirrorOp *op;
 
     s->sector_num = hbitmap_iter_next(&s->hbi);
@@ -114,6 +138,8 @@ static void coroutine_fn mirror_iteration(MirrorBlockJob *s)
         assert(s->sector_num >= 0);
     }
 
+    hbitmap_next_sector = s->sector_num;
+
     /* If we have no backing file yet in the destination, and the cluster size
      * is very large, we need to do COW ourselves.  The first time a cluster is
      * copied, copy it entirely.
@@ -129,21 +155,58 @@ static void coroutine_fn mirror_iteration(MirrorBlockJob *s)
         bdrv_round_to_clusters(s->target,
                                sector_num, nb_sectors_chunk,
                                &sector_num, &nb_sectors);
-        bitmap_set(s->cow_bitmap, sector_num / nb_sectors_chunk,
-                   nb_sectors / nb_sectors_chunk);
+
+        /* The rounding may make us copy sectors before the
+         * first dirty one.
+         */
+        cluster_num = sector_num / nb_sectors_chunk;
+    }
+
+    /* Wait for I/O to this cluster (from a previous iteration) to be done.  */
+    while (test_bit(cluster_num, s->in_flight_bitmap)) {
+        trace_mirror_yield_in_flight(s, sector_num, s->in_flight);
+        qemu_coroutine_yield();
     }
 
     end = s->common.len >> BDRV_SECTOR_BITS;
     nb_sectors = MIN(nb_sectors, end - sector_num);
+    nb_chunks = (nb_sectors + nb_sectors_chunk - 1) / nb_sectors_chunk;
+    while (s->buf_free_count < nb_chunks) {
+        trace_mirror_yield_buf_busy(s, nb_chunks, s->in_flight);
+        qemu_coroutine_yield();
+    }
+
+    /* We have enough free space to copy these sectors.  */
+    if (s->cow_bitmap) {
+        bitmap_set(s->cow_bitmap, cluster_num, nb_chunks);
+    }
 
     /* Allocate a MirrorOp that is used as an AIO callback.  */
     op = g_slice_new(MirrorOp);
     op->s = s;
-    op->iov.iov_base = s->buf;
-    op->iov.iov_len  = nb_sectors * 512;
     op->sector_num = sector_num;
     op->nb_sectors = nb_sectors;
-    qemu_iovec_init_external(&op->qiov, &op->iov, 1);
+
+    /* Now make a QEMUIOVector taking enough granularity-sized chunks
+     * from s->buf_free.
+     */
+    qemu_iovec_init(&op->qiov, nb_chunks);
+    next_sector = sector_num;
+    while (nb_chunks-- > 0) {
+        MirrorBuffer *buf = QSIMPLEQ_FIRST(&s->buf_free);
+        QSIMPLEQ_REMOVE_HEAD(&s->buf_free, next);
+        s->buf_free_count--;
+        qemu_iovec_add(&op->qiov, buf, s->granularity);
+
+        /* Advance the HBitmapIter in parallel, so that we do not examine
+         * the same sector twice.
+         */
+        if (next_sector > hbitmap_next_sector && bdrv_get_dirty(source, next_sector)) {
+            hbitmap_next_sector = hbitmap_iter_next(&s->hbi);
+        }
+
+        next_sector += nb_sectors_chunk;
+    }
 
     bdrv_reset_dirty(source, sector_num, nb_sectors);
 
@@ -154,6 +217,23 @@ static void coroutine_fn mirror_iteration(MirrorBlockJob *s)
                    mirror_read_complete, op);
 }
 
+static void mirror_free_init(MirrorBlockJob *s)
+{
+    int granularity = s->granularity;
+    size_t buf_size = s->buf_size;
+    uint8_t *buf = s->buf;
+
+    assert(s->buf_free_count == 0);
+    QSIMPLEQ_INIT(&s->buf_free);
+    while (buf_size != 0) {
+        MirrorBuffer *cur = (MirrorBuffer *)buf;
+        QSIMPLEQ_INSERT_TAIL(&s->buf_free, cur, next);
+        s->buf_free_count++;
+        buf_size -= granularity;
+        buf += granularity;
+    }
+}
+
 static void mirror_drain(MirrorBlockJob *s)
 {
     while (s->in_flight > 0) {
@@ -182,6 +262,9 @@ static void coroutine_fn mirror_run(void *opaque)
         return;
     }
 
+    length = (bdrv_getlength(bs) + s->granularity - 1) / s->granularity;
+    s->in_flight_bitmap = bitmap_new(length);
+
     /* If we have no backing file yet in the destination, we cannot let
      * the destination do COW.  Instead, we copy sectors around the
      * dirty data if needed.  We need a bitmap to do that.
@@ -192,7 +275,6 @@ static void coroutine_fn mirror_run(void *opaque)
         bdrv_get_info(s->target, &bdi);
         if (s->buf_size < bdi.cluster_size) {
             s->buf_size = bdi.cluster_size;
-            length = (bdrv_getlength(bs) + s->granularity - 1) / s->granularity;
             s->cow_bitmap = bitmap_new(length);
         }
     }
@@ -200,6 +282,7 @@ static void coroutine_fn mirror_run(void *opaque)
     end = s->common.len >> BDRV_SECTOR_BITS;
     s->buf = qemu_blockalign(bs, s->buf_size);
     nb_sectors_chunk = s->granularity >> BDRV_SECTOR_BITS;
+    mirror_free_init(s);
 
     if (s->mode == MIRROR_SYNC_MODE_FULL || s->mode == MIRROR_SYNC_MODE_TOP) {
         /* First part, loop on the sectors and initialize the dirty bitmap.  */
@@ -246,8 +329,9 @@ static void coroutine_fn mirror_run(void *opaque)
          */
         if (qemu_get_clock_ns(rt_clock) - last_pause_ns < SLICE_TIME &&
             s->common.iostatus == BLOCK_DEVICE_IO_STATUS_OK) {
-            if (s->in_flight > 0) {
-                trace_mirror_yield(s, s->in_flight, cnt);
+            if (s->in_flight == MAX_IN_FLIGHT || s->buf_free_count == 0 ||
+                (cnt == 0 && s->in_flight > 0)) {
+                trace_mirror_yield(s, s->in_flight, s->buf_free_count, cnt);
                 qemu_coroutine_yield();
                 continue;
             } else if (cnt != 0) {
@@ -332,6 +416,7 @@ immediate_exit:
     assert(s->in_flight == 0);
     g_free(s->buf);
     g_free(s->cow_bitmap);
+    g_free(s->in_flight_bitmap);
     bdrv_set_dirty_tracking(bs, 0);
     bdrv_iostatus_disable(s->target);
     if (s->complete && ret == 0) {
diff --git a/trace-events b/trace-events
index fe20bd7..7ae11e9 100644
--- a/trace-events
+++ b/trace-events
@@ -84,7 +84,9 @@ mirror_before_sleep(void *s, int64_t cnt, int synced) "s %p dirty count %"PRId64
 mirror_one_iteration(void *s, int64_t sector_num, int nb_sectors) "s %p sector_num %"PRId64" nb_sectors %d"
 mirror_cow(void *s, int64_t sector_num) "s %p sector_num %"PRId64
 mirror_iteration_done(void *s, int64_t sector_num, int nb_sectors) "s %p sector_num %"PRId64" nb_sectors %d"
-mirror_yield(void *s, int64_t cnt, int in_flight) "s %p dirty count %"PRId64" in_flight %d"
+mirror_yield(void *s, int64_t cnt, int buf_free_count, int in_flight) "s %p dirty count %"PRId64" free buffers %d in_flight %d"
+mirror_yield_in_flight(void *s, int64_t sector_num, int in_flight) "s %p sector_num %"PRId64" in_flight %d"
+mirror_yield_buf_busy(void *s, int nb_chunks, int in_flight) "s %p requested chunks %d in_flight %d"
 
 # blockdev.c
 qmp_block_job_cancel(void *job) "job %p"
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [Qemu-devel] [PATCH 47/47] mirror: support arbitrarily-sized iterations
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (45 preceding siblings ...)
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 46/47] mirror: support more than one in-flight AIO operation Paolo Bonzini
@ 2012-07-24 11:04 ` Paolo Bonzini
  2012-07-28 13:51 ` [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Eric Blake
  47 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-24 11:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

Yet another optimization is to extend the mirroring iteration to include more
adjacent dirty blocks.  This limits the number of I/O operations and makes
mirroring efficient even with a small granularity.  Most of the infrastructure
is already in place; we only need to put a loop around the computation of
the origin and sector count of the iteration.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/mirror.c |  100 ++++++++++++++++++++++++++++++++++++++------------------
 trace-events   |    1 +
 2 files changed, 69 insertions(+), 32 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 93e718f..87d97eb 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -127,7 +127,7 @@ static void coroutine_fn mirror_iteration(MirrorBlockJob *s)
 {
     BlockDriverState *source = s->common.bs;
     int nb_sectors, nb_sectors_chunk, nb_chunks;
-    int64_t end, sector_num, cluster_num, next_sector, hbitmap_next_sector;
+    int64_t end, sector_num, next_cluster, next_sector, hbitmap_next_sector;
     MirrorOp *op;
 
     s->sector_num = hbitmap_iter_next(&s->hbi);
@@ -139,47 +139,83 @@ static void coroutine_fn mirror_iteration(MirrorBlockJob *s)
     }
 
     hbitmap_next_sector = s->sector_num;
+    sector_num = s->sector_num;
+    nb_sectors_chunk = s->granularity >> BDRV_SECTOR_BITS;
+    end = s->common.len >> BDRV_SECTOR_BITS;
 
-    /* If we have no backing file yet in the destination, and the cluster size
-     * is very large, we need to do COW ourselves.  The first time a cluster is
-     * copied, copy it entirely.
+    /* Extend the QEMUIOVector to include all adjacent blocks that will
+     * be copied in this operation.
+     *
+     * We have to do this if we have no backing file yet in the destination,
+     * and the cluster size is very large.  Then we need to do COW ourselves.
+     * The first time a cluster is copied, copy it entirely.  Note that,
+     * because both the granularity and the cluster size are powers of two,
+     * the number of sectors to copy cannot exceed one cluster.
      *
-     * Because both the granularity and the cluster size are powers of two, the
-     * number of sectors to copy cannot exceed one cluster.
+     * We also want to extend the QEMUIOVector to include more adjacent
+     * dirty blocks if possible, to limit the number of I/O operations and
+     * run efficiently even with a small granularity.
      */
-    sector_num = s->sector_num;
-    nb_sectors_chunk = nb_sectors = s->granularity >> BDRV_SECTOR_BITS;
-    cluster_num = sector_num / nb_sectors_chunk;
-    if (s->cow_bitmap && !test_bit(cluster_num, s->cow_bitmap)) {
-        trace_mirror_cow(s, sector_num);
-        bdrv_round_to_clusters(s->target,
-                               sector_num, nb_sectors_chunk,
-                               &sector_num, &nb_sectors);
-
-        /* The rounding may make us copy sectors before the
-         * first dirty one.
-         */
-        cluster_num = sector_num / nb_sectors_chunk;
-    }
+    nb_chunks = 0;
+    nb_sectors = 0;
+    next_sector = sector_num;
+    next_cluster = sector_num / nb_sectors_chunk;
 
     /* Wait for I/O to this cluster (from a previous iteration) to be done.  */
-    while (test_bit(cluster_num, s->in_flight_bitmap)) {
+    while (test_bit(next_cluster, s->in_flight_bitmap)) {
         trace_mirror_yield_in_flight(s, sector_num, s->in_flight);
         qemu_coroutine_yield();
     }
 
-    end = s->common.len >> BDRV_SECTOR_BITS;
-    nb_sectors = MIN(nb_sectors, end - sector_num);
-    nb_chunks = (nb_sectors + nb_sectors_chunk - 1) / nb_sectors_chunk;
-    while (s->buf_free_count < nb_chunks) {
-        trace_mirror_yield_buf_busy(s, nb_chunks, s->in_flight);
-        qemu_coroutine_yield();
-    }
+    do {
+        int added_sectors, added_chunks;
 
-    /* We have enough free space to copy these sectors.  */
-    if (s->cow_bitmap) {
-        bitmap_set(s->cow_bitmap, cluster_num, nb_chunks);
-    }
+        if (!bdrv_get_dirty(source, next_sector) ||
+            test_bit(next_cluster, s->in_flight_bitmap)) {
+            assert(nb_sectors > 0);
+            break;
+        }
+
+        added_sectors = nb_sectors_chunk;
+        if (s->cow_bitmap && !test_bit(next_cluster, s->cow_bitmap)) {
+            bdrv_round_to_clusters(s->target,
+                                   next_sector, added_sectors,
+                                   &next_sector, &added_sectors);
+
+            /* On the first iteration, the rounding may make us copy
+             * sectors before the first dirty one.
+             */
+            if (next_sector < sector_num) {
+                assert(nb_sectors == 0);
+                sector_num = next_sector;
+                next_cluster = next_sector / nb_sectors_chunk;
+            }
+        }
+
+        added_sectors = MIN(added_sectors, end - (sector_num + nb_sectors));
+        added_chunks = (added_sectors + nb_sectors_chunk - 1) / nb_sectors_chunk;
+
+        /* When doing COW, it may happen that there are not enough free
+         * buffers to copy a full cluster.  Wait if that is the case.
+         */
+        while (nb_chunks == 0 && s->buf_free_count < added_chunks) {
+            trace_mirror_yield_buf_busy(s, nb_chunks, s->in_flight);
+            qemu_coroutine_yield();
+        }
+        if (s->buf_free_count < nb_chunks + added_chunks) {
+            trace_mirror_break_buf_busy(s, nb_chunks, s->in_flight);
+            break;
+        }
+
+        /* We have enough free space to copy these sectors.  */
+        if (s->cow_bitmap) {
+            bitmap_set(s->cow_bitmap, next_cluster, added_chunks);
+        }
+        nb_sectors += added_sectors;
+        nb_chunks += added_chunks;
+        next_sector += added_sectors;
+        next_cluster += added_chunks;
+    } while (next_sector < end);
 
     /* Allocate a MirrorOp that is used as an AIO callback.  */
     op = g_slice_new(MirrorOp);
diff --git a/trace-events b/trace-events
index 7ae11e9..cd387fa 100644
--- a/trace-events
+++ b/trace-events
@@ -87,6 +87,7 @@ mirror_iteration_done(void *s, int64_t sector_num, int nb_sectors) "s %p sector_
 mirror_yield(void *s, int64_t cnt, int buf_free_count, int in_flight) "s %p dirty count %"PRId64" free buffers %d in_flight %d"
 mirror_yield_in_flight(void *s, int64_t sector_num, int in_flight) "s %p sector_num %"PRId64" in_flight %d"
 mirror_yield_buf_busy(void *s, int nb_chunks, int in_flight) "s %p requested chunks %d in_flight %d"
+mirror_break_buf_busy(void *s, int nb_chunks, int in_flight) "s %p requested chunks %d in_flight %d"
 
 # blockdev.c
 qmp_block_job_cancel(void *job) "job %p"
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 15/47] blkdebug: process all set_state rules in the old state
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 15/47] blkdebug: process all set_state rules in the old state Paolo Bonzini
@ 2012-07-24 20:06   ` Blue Swirl
  0 siblings, 0 replies; 136+ messages in thread
From: Blue Swirl @ 2012-07-24 20:06 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, jcody, eblake, qemu-devel, stefanha

On Tue, Jul 24, 2012 at 11:03 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Currently it is impossible to write a blkdebug script that ping-pongs
> between two states, because the second set-state rule will use the
> state that is set in the first.  If you have
>
>     [set-state]
>     event = "..."
>     state = "1"
>     new_state = "2"
>
>     [set-state]
>     event = "..."
>     state = "2"
>     new_state = "1"
>
> for example the state will remain locked at 1.  This can be fixed
> by first processing all rules, and then setting the state.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  block/blkdebug.c |   14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/block/blkdebug.c b/block/blkdebug.c
> index 59dcea0..0f12145 100644
> --- a/block/blkdebug.c
> +++ b/block/blkdebug.c
> @@ -28,6 +28,7 @@
>
>  typedef struct BDRVBlkdebugState {
>      int state;
> +    int new_state;
>      QLIST_HEAD(, BlkdebugRule) rules[BLKDBG_EVENT_MAX];
>      QSIMPLEQ_HEAD(, BlkdebugRule) active_rules;
>  } BDRVBlkdebugState;
> @@ -351,6 +352,7 @@ static BlockDriverAIOCB *blkdebug_aio_readv(BlockDriverState *bs,
>      BDRVBlkdebugState *s = bs->opaque;
>      BlkdebugRule *rule = NULL;
>
> +    printf("read %ld\n", sector_num);

Leftover debugging?

>      QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
>          if (rule->options.inject.sector == -1 ||
>              (rule->options.inject.sector >= sector_num &&
> @@ -403,12 +405,12 @@ static void blkdebug_close(BlockDriverState *bs)
>  }
>
>  static bool process_rule(BlockDriverState *bs, struct BlkdebugRule *rule,
> -    int old_state, bool injected)
> +    bool injected)
>  {
>      BDRVBlkdebugState *s = bs->opaque;
>
>      /* Only process rules for the current state */
> -    if (rule->state && rule->state != old_state) {
> +    if (rule->state && rule->state != s->state) {
>          return injected;
>      }
>
> @@ -423,7 +425,7 @@ static bool process_rule(BlockDriverState *bs, struct BlkdebugRule *rule,
>          break;
>
>      case ACTION_SET_STATE:
> -        s->state = rule->options.set_state.new_state;
> +        s->new_state = rule->options.set_state.new_state;
>          break;
>      }
>      return injected;
> @@ -433,15 +435,17 @@ static void blkdebug_debug_event(BlockDriverState *bs, BlkDebugEvent event)
>  {
>      BDRVBlkdebugState *s = bs->opaque;
>      struct BlkdebugRule *rule;
> -    int old_state = s->state;
>      bool injected;
>
>      assert((int)event >= 0 && event < BLKDBG_EVENT_MAX);
>
> +    printf("state %d\n", s->state);

Here too?

>      injected = false;
> +    s->new_state = s->state;
>      QLIST_FOREACH(rule, &s->rules[event], next) {
> -        injected = process_rule(bs, rule, old_state, injected);
> +        injected = process_rule(bs, rule, injected);
>      }
> +    s->state = s->new_state;
>  }
>
>  static int64_t blkdebug_getlength(BlockDriverState *bs)
> --
> 1.7.10.4
>
>
>

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 13/47] block: introduce block job error
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 13/47] block: introduce block job error Paolo Bonzini
@ 2012-07-25 17:40   ` Eric Blake
  2012-08-01 10:14   ` Kevin Wolf
  1 sibling, 0 replies; 136+ messages in thread
From: Eric Blake @ 2012-07-25 17:40 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 889 bytes --]

On 07/24/2012 05:03 AM, Paolo Bonzini wrote:
> The following behaviors are possible:
> 
> 'report': The behavior is the same as in 1.1.  An I/O error,
> respectively during a read or a write, will complete the job immediately
> with an error code.
> 
> 'ignore': An I/O error, respectively during a read or a write, will be
> ignored.  For streaming, the job will complete with an error and the
> backing file will be left in place.  For mirroring, the sector will be
> marked again as dirty and re-examined later.

You actually documented these early in qapi-schema.json in patch 10/47,
but accurate bisection of docs is not a show-stopper.

I've reviewed through this point in the series, and haven't found
anything that triggered any complaints from my end.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 27/47] block: introduce mirror job
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 27/47] block: introduce mirror job Paolo Bonzini
@ 2012-07-25 23:02   ` Eric Blake
  2012-09-13 12:54   ` Kevin Wolf
  1 sibling, 0 replies; 136+ messages in thread
From: Eric Blake @ 2012-07-25 23:02 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 2127 bytes --]

On 07/24/2012 05:04 AM, Paolo Bonzini wrote:
> This patch adds the implementation of a new job that mirrors a disk to
> a new image while letting the guest continue using the old image.
> The target is treated as a "black box" and data is copied from the
> source to the target in the background.  This can be used for several
> purposes, including storage migration, continuous replication, and
> observation of the guest I/O in an external program.  It is also a
> first step in replacing the inefficient block migration code that is
> part of QEMU.
> 
> The job is possibly never-ending, but it is logically structured into

> +++ b/block/mirror.c

> +
> +            /* We're out of the streaming phase.  From now on, if the
> +             * job is cancelled we will actually complete all pending
> +             * I/O and report completion, so that drive-reopen can be
> +             * used to pivot to the mirroring target.
> +             */

Stale comment - isn't it now 'block-job-complete' instead of 'drive-reopen'?

> +++ b/block_int.h
> @@ -305,4 +305,24 @@ void stream_start(BlockDriverState *bs, BlockDriverState *base,
>                    BlockDriverCompletionFunc *cb,
>                    void *opaque, Error **errp);
>  
> +/**
> + * mirror_start:
> + * @bs: Block device to operate on.
> + * @target: Block device to write to.
> + * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
> + * @mode: Whether to collapse all images in the chain to the target.
> + * @cb: Completion function for the job.
> + * @opaque: Opaque pointer value passed to @cb.
> + * @errp: Error object.
> + *
> + * Start a mirroring operation on @bs.  Clusters that are allocated
> + * in @bs will be written to @bs until the job is canceled or

I've messed you up - you've got 'canceled' and 'cancelled' in different
comments within the same patch :)

I've now reviewed the series up to this point, with no findings on the
patches where I did not reply.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 02/47] qerror/block: introduce QERR_BLOCK_JOB_NOT_ACTIVE
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 02/47] qerror/block: introduce QERR_BLOCK_JOB_NOT_ACTIVE Paolo Bonzini
@ 2012-07-26 15:26   ` Kevin Wolf
  2012-07-26 15:41     ` Paolo Bonzini
  0 siblings, 1 reply; 136+ messages in thread
From: Kevin Wolf @ 2012-07-26 15:26 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, Luiz Capitulino, eblake, qemu-devel, stefanha

Am 24.07.2012 13:03, schrieb Paolo Bonzini:
> The DeviceNotActive error is not a particularly good match, add
> a separate one.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Luiz, what do you think about this one? It seems to contradict the idea
of having only few error classes and free form error descriptions.

What's the error class that we should really have here? A general
QERR_NOT_ACTIVE?

Kevin


> ---
>  blockdev.c       |    4 ++--
>  qapi-schema.json |    5 ++---
>  qerror.c         |    4 ++++
>  qerror.h         |    3 +++
>  4 files changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 3d75015..9c142ee 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -1139,7 +1139,7 @@ void qmp_block_job_set_speed(const char *device, int64_t speed, Error **errp)
>      BlockJob *job = find_block_job(device);
>  
>      if (!job) {
> -        error_set(errp, QERR_DEVICE_NOT_ACTIVE, device);
> +        error_set(errp, QERR_BLOCK_JOB_NOT_ACTIVE, device);
>          return;
>      }
>  
> @@ -1151,7 +1151,7 @@ void qmp_block_job_cancel(const char *device, Error **errp)
>      BlockJob *job = find_block_job(device);
>  
>      if (!job) {
> -        error_set(errp, QERR_DEVICE_NOT_ACTIVE, device);
> +        error_set(errp, QERR_BLOCK_JOB_NOT_ACTIVE, device);
>          return;
>      }
>  
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 000eb83..040981e 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -1657,7 +1657,7 @@
>  # Returns: Nothing on success
>  #          If the job type does not support throttling, NotSupported
>  #          If the speed value is invalid, InvalidParameter
> -#          If no background operation is active on this device, DeviceNotActive
> +#          If no background operation is active on this device, BlockJobNotActive
>  #
>  # Since: 1.1
>  ##
> @@ -1685,8 +1685,7 @@
>  # @device: the device name
>  #
>  # Returns: Nothing on success
> -#          If no background operation is active on this device, DeviceNotActive
> -#          If cancellation already in progress, DeviceInUse
> +#          If no background operation is active on this device, BlockJobNotActive
>  #
>  # Since: 1.1
>  ##
> diff --git a/qerror.c b/qerror.c
> index 92c4eff..bc672a5 100644
> --- a/qerror.c
> +++ b/qerror.c
> @@ -60,6 +60,10 @@ static const QErrorStringTable qerror_table[] = {
>          .desc      = "Base '%(base)' not found",
>      },
>      {
> +        .error_fmt = QERR_BLOCK_JOB_NOT_ACTIVE,
> +        .desc      = "No active block job on device '%(name)'",
> +    },
> +    {
>          .error_fmt = QERR_BLOCK_FORMAT_FEATURE_NOT_SUPPORTED,
>          .desc      = "Block format '%(format)' used by device '%(name)' does not support feature '%(feature)'",
>      },
> diff --git a/qerror.h b/qerror.h
> index b4c8758..7cf7d22 100644
> --- a/qerror.h
> +++ b/qerror.h
> @@ -64,6 +64,9 @@ QError *qobject_to_qerror(const QObject *obj);
>  #define QERR_BASE_NOT_FOUND \
>      "{ 'class': 'BaseNotFound', 'data': { 'base': %s } }"
>  
> +#define QERR_BLOCK_JOB_NOT_ACTIVE \
> +    "{ 'class': 'BlockJobNotActive', 'data': { 'name': %s } }"
> +
>  #define QERR_BLOCK_FORMAT_FEATURE_NOT_SUPPORTED \
>      "{ 'class': 'BlockFormatFeatureNotSupported', 'data': { 'format': %s, 'name': %s, 'feature': %s } }"
>  
> 

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 02/47] qerror/block: introduce QERR_BLOCK_JOB_NOT_ACTIVE
  2012-07-26 15:26   ` Kevin Wolf
@ 2012-07-26 15:41     ` Paolo Bonzini
  2012-07-26 16:49       ` Luiz Capitulino
  0 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-26 15:41 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, Luiz Capitulino, eblake, qemu-devel, stefanha

Il 26/07/2012 17:26, Kevin Wolf ha scritto:
>> The DeviceNotActive error is not a particularly good match, add
>> > a separate one.
>> > 
>> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Luiz, what do you think about this one? It seems to contradict the idea
> of having only few error classes and free form error descriptions.

I agree, but that's what we have to live with for now...

> What's the error class that we should really have here? A general
> QERR_NOT_ACTIVE?

See my proposal here:
http://lists.nongnu.org/archive/html/qemu-devel/2012-07/msg00061.html
(totally ignored ;)).

This would be QERR_INVALID_STATE (quoting from that message:
"InvalidStateError is generally caused by the interaction with other
commands, could be fixed by sending some commands and retrying").

Paolo

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 03/47] block: move job APIs to separate files
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 03/47] block: move job APIs to separate files Paolo Bonzini
@ 2012-07-26 15:50   ` Kevin Wolf
  0 siblings, 0 replies; 136+ messages in thread
From: Kevin Wolf @ 2012-07-26 15:50 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, eblake, qemu-devel, stefanha

Am 24.07.2012 13:03, schrieb Paolo Bonzini:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

The commit message is not only short, but it even lies. This is not pure
code motion.

I didn't really review in detail what the Makefile changes do besides
including a new blockjob.o, but I expect that either it is explained in
the commit message or preferably the not absolutely required changes to
make it build are moved into a separate patch.

> -/**
> - * block_job_cancel:
> - * @job: The job to be canceled.
> - *
> - * Asynchronously cancel the job and wait for it to reach a quiescent
> - * state.  Note that the completion callback will still be called
> - * asynchronously, hence it is *not* valid to call #bdrv_delete
> - * immediately after #block_job_cancel_sync.  Users of block jobs
> - * will usually protect the BlockDriverState objects with a reference
> - * count, should this be a concern.
> - *
> - * Returns the return value from the job if the job actually completed
> - * during the call, or -ECANCELED if it was canceled.
> - */
> -int block_job_cancel_sync(BlockJob *job);

> +/**
> + * block_job_cancel_sync:
> + * @job: The job to be canceled.
> + *
> + * Synchronously cancel the job and wait for it to reach a quiescent
> + * state.  Note that the completion callback will still be called
> + * asynchronously, hence it is *not* valid to call #bdrv_delete
> + * immediately after #block_job_cancel_sync.  Users of block jobs
> + * will usually protect the BlockDriverState objects with a reference
> + * count, should this be a concern.
> + *
> + * Returns the return value from the job if the job actually completed
> + * during the call, or -ECANCELED if it was canceled.
> + */
> +int block_job_cancel_sync(BlockJob *job);

This is _NOT_ the same. Please do not hide such changes, as harmless as
they might be, in code motion patches!

Actually, I was almost going to trust you on that and not do the diff
before I decided to have the extra check here. I'll have to spend more
time for review and be extra careful for your series now after this bad
surprise.

I think "Synchronously cancel..." is not what the comment means because
then "and wait for it" doesn't make any sense any more. As I understand
it, it means that AIO requests continue to run during
block_job_cancel_sync(). Not sure if this inner working really matters
for the caller, but if it doesn't then it should be properly reworded in
a separate patch instead of silently changing a word in a code motion patch.

Kevin

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 02/47] qerror/block: introduce QERR_BLOCK_JOB_NOT_ACTIVE
  2012-07-26 15:41     ` Paolo Bonzini
@ 2012-07-26 16:49       ` Luiz Capitulino
  2012-07-26 16:59         ` Paolo Bonzini
  0 siblings, 1 reply; 136+ messages in thread
From: Luiz Capitulino @ 2012-07-26 16:49 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Kevin Wolf, jcody, eblake, qemu-devel, stefanha

On Thu, 26 Jul 2012 17:41:01 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> Il 26/07/2012 17:26, Kevin Wolf ha scritto:
> >> The DeviceNotActive error is not a particularly good match, add
> >> > a separate one.
> >> > 
> >> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > Luiz, what do you think about this one? It seems to contradict the idea
> > of having only few error classes and free form error descriptions.
> 
> I agree, but that's what we have to live with for now...

Why don't you add QERR_INVALID_STATE then?

> 
> > What's the error class that we should really have here? A general
> > QERR_NOT_ACTIVE?
> 
> See my proposal here:
> http://lists.nongnu.org/archive/html/qemu-devel/2012-07/msg00061.html
> (totally ignored ;)).
> 
> This would be QERR_INVALID_STATE (quoting from that message:
> "InvalidStateError is generally caused by the interaction with other
> commands, could be fixed by sending some commands and retrying").
> 
> Paolo
> 

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 02/47] qerror/block: introduce QERR_BLOCK_JOB_NOT_ACTIVE
  2012-07-26 16:49       ` Luiz Capitulino
@ 2012-07-26 16:59         ` Paolo Bonzini
  2012-07-26 17:02           ` Luiz Capitulino
  0 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-26 16:59 UTC (permalink / raw)
  To: Luiz Capitulino; +Cc: Kevin Wolf, jcody, eblake, qemu-devel, stefanha

Il 26/07/2012 18:49, Luiz Capitulino ha scritto:
> On Thu, 26 Jul 2012 17:41:01 +0200
> Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
>> Il 26/07/2012 17:26, Kevin Wolf ha scritto:
>>>> The DeviceNotActive error is not a particularly good match, add
>>>>> a separate one.
>>>>>
>>>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>> Luiz, what do you think about this one? It seems to contradict the idea
>>> of having only few error classes and free form error descriptions.
>>
>> I agree, but that's what we have to live with for now...
> 
> Why don't you add QERR_INVALID_STATE then?

Because I do want a meaningful error message.

Paolo

>>
>>> What's the error class that we should really have here? A general
>>> QERR_NOT_ACTIVE?
>>
>> See my proposal here:
>> http://lists.nongnu.org/archive/html/qemu-devel/2012-07/msg00061.html
>> (totally ignored ;)).
>>
>> This would be QERR_INVALID_STATE (quoting from that message:
>> "InvalidStateError is generally caused by the interaction with other
>> commands, could be fixed by sending some commands and retrying").
>>
>> Paolo
>>
> 

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 02/47] qerror/block: introduce QERR_BLOCK_JOB_NOT_ACTIVE
  2012-07-26 16:59         ` Paolo Bonzini
@ 2012-07-26 17:02           ` Luiz Capitulino
  0 siblings, 0 replies; 136+ messages in thread
From: Luiz Capitulino @ 2012-07-26 17:02 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Kevin Wolf, jcody, eblake, qemu-devel, stefanha

On Thu, 26 Jul 2012 18:59:30 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> Il 26/07/2012 18:49, Luiz Capitulino ha scritto:
> > On Thu, 26 Jul 2012 17:41:01 +0200
> > Paolo Bonzini <pbonzini@redhat.com> wrote:
> > 
> >> Il 26/07/2012 17:26, Kevin Wolf ha scritto:
> >>>> The DeviceNotActive error is not a particularly good match, add
> >>>>> a separate one.
> >>>>>
> >>>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> >>> Luiz, what do you think about this one? It seems to contradict the idea
> >>> of having only few error classes and free form error descriptions.
> >>
> >> I agree, but that's what we have to live with for now...
> > 
> > Why don't you add QERR_INVALID_STATE then?
> 
> Because I do want a meaningful error message.

It's fine with me adding QERR_BLOCK_JOB_NOT_ACTIVE then, as you've said,
it's what we have today.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 28/47] qmp: add drive-mirror command
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 28/47] qmp: add drive-mirror command Paolo Bonzini
@ 2012-07-26 23:42   ` Eric Blake
  2012-07-27  7:04     ` Paolo Bonzini
  2012-07-31  9:26   ` Kevin Wolf
  2012-09-13 13:15   ` Kevin Wolf
  2 siblings, 1 reply; 136+ messages in thread
From: Eric Blake @ 2012-07-26 23:42 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 4328 bytes --]

On 07/24/2012 05:04 AM, Paolo Bonzini wrote:
> This adds the monitor commands that start the mirroring job.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  blockdev.c       |  133 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
>  hmp-commands.hx  |   21 +++++++++
>  hmp.c            |   28 ++++++++++++
>  hmp.h            |    1 +
>  qapi-schema.json |   35 ++++++++++++++
>  qmp-commands.hx  |   42 +++++++++++++++++
>  trace-events     |    2 +-
>  7 files changed, 258 insertions(+), 4 deletions(-)

This command only allows mirroring from the top or the complete disk,
but no intermediate points.  That is, given:
base <- sn1 <- sn2
I can mirror sn2 in isolation, or the entire chain including base, but I
cannot mirror exactly sn1+sn2.  Instead, I would have to do a mirror of
sn2 then follow up with a partial block-stream to rebase that mirror on
top of base.  I guess this is okay, but it feels a bit limited to have
to do it in two steps instead of one.

[Hmm, your later patches introduce the ability to have a persistent
mapping file; perhaps this can be exploited to have the user pre-create
a mapping file that shows the portions allocated by sn1+sn2 as the
starting point, but I'm not there in the patch series yet to know if
this is the case.]

> +++ b/qapi-schema.json
> @@ -1367,6 +1367,41 @@
>    'returns': 'str' }
>  
>  ##
> +# @drive-mirror
> +#
> +# Start mirroring a block device's writes to a new destination.
> +#
> +# @device:  the name of the device whose writes should be mirrored.
> +#
> +# @target: the target of the new image. If the file exists, or if it
> +#          is a device, the existing file/device will be used as the new
> +#          destination.  If it does not exist, a new file will be created.
> +#
> +# @format: #optional the format of the new destination, default is to
> +#          probe is @mode is 'existing', else the format of the source

s/probe is/probe if/

> +#
> +# @mode: #optional whether and how QEMU should create a new image, default is
> +#        'absolute-paths'.
> +#
> +# @speed:  #optional the maximum speed, in bytes per second
> +#
> +# @sync: what parts of the disk image should be copied to the destination
> +#        (all the disk, only the sectors allocated in the topmost image, or
> +#        only new I/O).

Is there any reason 'sync' is listed last here, but before 'mode' in the
JSON?

> +#
> +# Returns: nothing on success
> +#          If @device is not a valid block device, DeviceNotFound
> +#          If @target can't be opened, OpenFileFailed
> +#          If @format is invalid, InvalidBlockFormat
> +#
> +# Since 1.2
> +##
> +{ 'command': 'drive-mirror',
> +  'data': { 'device': 'str', 'target': 'str', '*format': 'str',
> +            'sync': 'MirrorSyncMode', '*mode': 'NewImageMode',
> +            '*speed': 'int' } }
> +

> +drive-mirror
> +------------
> +
> +Start mirroring a block device's writes to a new destination. target
> +specifies the target of the new image. If the file exists, or if it is
> +a device, it will be used as the new destination for writes. If does not
> +exist, a new file will be created. format specifies the format of the
> +mirror image, default is to probe if mode='existing', else the format
> +of the source.
> +
> +Arguments:
> +
> +- "device": device name to operate on (json-string)
> +- "target": name of new image file (json-string)
> +- "format": format of new image (json-string, optional)
> +- "mode": how an image file should be created into the target
> +  file/device (NewImageMode, optional, default 'absolute-paths')

Should we call out all the possible 'NewImageMode' strings here,...

> +- "speed": maximum speed of the streaming job, in bytes per second
> +  (json-int)

Mention that this is optional.

> +- "sync": what parts of the disk image should be copied to the destination;
> +  possibilities include "full" for all the disk, "top" for only the sectors
> +  allocated in the topmost image, or "none" to only replicate new I/O
> +  (MirrorSyncMode).

...given that we did the same for all the possible MirrorSyncMode strings?

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 31/47] qemu-iotests: add mirroring test case
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 31/47] qemu-iotests: add mirroring test case Paolo Bonzini
@ 2012-07-26 23:46   ` Eric Blake
  2012-07-27  7:04     ` Paolo Bonzini
  0 siblings, 1 reply; 136+ messages in thread
From: Eric Blake @ 2012-07-26 23:46 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 1188 bytes --]

On 07/24/2012 05:04 AM, Paolo Bonzini wrote:
> The tests are not meant to offer full coverage; in particular, there is
> no concurrent I/O going on from the guest.  However, especially in the
> case of blkdebug-based tests (introduced later in the series) they do
> cover some paths that will usually be skipped by integration tests,
> for example tests running on kvm-autotest.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  tests/qemu-iotests/039   |  352 ++++++++++++++++++++++++++++++++++++++++++++++
>  tests/qemu-iotests/group |    1 +
>  2 files changed, 353 insertions(+)
>  create mode 100755 tests/qemu-iotests/039
> 
> diff --git a/tests/qemu-iotests/039 b/tests/qemu-iotests/039
> new file mode 100755
> index 0000000..c512f14
> --- /dev/null
> +++ b/tests/qemu-iotests/039
> @@ -0,0 +1,352 @@
> +#!/usr/bin/env python
> +#
> +# Tests for image mirroring.
> +#
> +# Copyright (C) 2012 IBM Corp.

Did you write this, or IBM?  Or should it be joint copyright, because
you are adding to framework borrowed from elsewhere?

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 28/47] qmp: add drive-mirror command
  2012-07-26 23:42   ` Eric Blake
@ 2012-07-27  7:04     ` Paolo Bonzini
  0 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-27  7:04 UTC (permalink / raw)
  To: Eric Blake; +Cc: kwolf, jcody, qemu-devel, stefanha

Il 27/07/2012 01:42, Eric Blake ha scritto:
> On 07/24/2012 05:04 AM, Paolo Bonzini wrote:
>> This adds the monitor commands that start the mirroring job.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  blockdev.c       |  133 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
>>  hmp-commands.hx  |   21 +++++++++
>>  hmp.c            |   28 ++++++++++++
>>  hmp.h            |    1 +
>>  qapi-schema.json |   35 ++++++++++++++
>>  qmp-commands.hx  |   42 +++++++++++++++++
>>  trace-events     |    2 +-
>>  7 files changed, 258 insertions(+), 4 deletions(-)
> 
> This command only allows mirroring from the top or the complete disk,
> but no intermediate points.  That is, given:
> base <- sn1 <- sn2
> I can mirror sn2 in isolation, or the entire chain including base, but I
> cannot mirror exactly sn1+sn2.  Instead, I would have to do a mirror of
> sn2 then follow up with a partial block-stream to rebase that mirror on
> top of base.  I guess this is okay, but it feels a bit limited to have
> to do it in two steps instead of one.

I couldn't think of a use case for this:

- preparing for a failing disk requires to keep all snapshots
unmodified, so mirroring only the top

- migration with non-shared storage requires to copy the whole disk
contents (unless you want to do part of the copy outside QEMU)

- collapsing a "tower" of images to raw for performance requires a copy
of the whole disk contents

And we need the sync parameter anyway (for 'none' and in the future for
the dirty bitmap), so I preferred to keep the API simple.

> [Hmm, your later patches introduce the ability to have a persistent
> mapping file; perhaps this can be exploited to have the user pre-create
> a mapping file that shows the portions allocated by sn1+sn2 as the
> starting point, but I'm not there in the patch series yet to know if
> this is the case.]

These are not in the posted patches (and in fact have not been developed
yet :)).

>> +++ b/qapi-schema.json
>> @@ -1367,6 +1367,41 @@
>>    'returns': 'str' }
>>  
>>  ##
>> +# @drive-mirror
>> +#
>> +# Start mirroring a block device's writes to a new destination.
>> +#
>> +# @device:  the name of the device whose writes should be mirrored.
>> +#
>> +# @target: the target of the new image. If the file exists, or if it
>> +#          is a device, the existing file/device will be used as the new
>> +#          destination.  If it does not exist, a new file will be created.
>> +#
>> +# @format: #optional the format of the new destination, default is to
>> +#          probe is @mode is 'existing', else the format of the source
> 
> s/probe is/probe if/
> 
>> +#
>> +# @mode: #optional whether and how QEMU should create a new image, default is
>> +#        'absolute-paths'.
>> +#
>> +# @speed:  #optional the maximum speed, in bytes per second
>> +#
>> +# @sync: what parts of the disk image should be copied to the destination
>> +#        (all the disk, only the sectors allocated in the topmost image, or
>> +#        only new I/O).
> 
> Is there any reason 'sync' is listed last here, but before 'mode' in the
> JSON?

No, no reason.

>> +#
>> +# Returns: nothing on success
>> +#          If @device is not a valid block device, DeviceNotFound
>> +#          If @target can't be opened, OpenFileFailed
>> +#          If @format is invalid, InvalidBlockFormat
>> +#
>> +# Since 1.2
>> +##
>> +{ 'command': 'drive-mirror',
>> +  'data': { 'device': 'str', 'target': 'str', '*format': 'str',
>> +            'sync': 'MirrorSyncMode', '*mode': 'NewImageMode',
>> +            '*speed': 'int' } }
>> +
> 
>> +drive-mirror
>> +------------
>> +
>> +Start mirroring a block device's writes to a new destination. target
>> +specifies the target of the new image. If the file exists, or if it is
>> +a device, it will be used as the new destination for writes. If does not
>> +exist, a new file will be created. format specifies the format of the
>> +mirror image, default is to probe if mode='existing', else the format
>> +of the source.
>> +
>> +Arguments:
>> +
>> +- "device": device name to operate on (json-string)
>> +- "target": name of new image file (json-string)
>> +- "format": format of new image (json-string, optional)
>> +- "mode": how an image file should be created into the target
>> +  file/device (NewImageMode, optional, default 'absolute-paths')
> 
> Should we call out all the possible 'NewImageMode' strings here,...
> 
>> +- "speed": maximum speed of the streaming job, in bytes per second
>> +  (json-int)
> 
> Mention that this is optional.
> 
>> +- "sync": what parts of the disk image should be copied to the destination;
>> +  possibilities include "full" for all the disk, "top" for only the sectors
>> +  allocated in the topmost image, or "none" to only replicate new I/O
>> +  (MirrorSyncMode).
> 
> ...given that we did the same for all the possible MirrorSyncMode strings?

We could.  The difference between the two is that NewImageMode is used
also in other commands.  In the end, I'm not even sure of the usefulness
of this documentation since the .json schema is much better and could be
converted to a format with hyperlinks.

Paolo

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 31/47] qemu-iotests: add mirroring test case
  2012-07-26 23:46   ` Eric Blake
@ 2012-07-27  7:04     ` Paolo Bonzini
  0 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-27  7:04 UTC (permalink / raw)
  To: Eric Blake; +Cc: kwolf, jcody, qemu-devel, stefanha

Il 27/07/2012 01:46, Eric Blake ha scritto:
> On 07/24/2012 05:04 AM, Paolo Bonzini wrote:
>> The tests are not meant to offer full coverage; in particular, there is
>> no concurrent I/O going on from the guest.  However, especially in the
>> case of blkdebug-based tests (introduced later in the series) they do
>> cover some paths that will usually be skipped by integration tests,
>> for example tests running on kvm-autotest.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  tests/qemu-iotests/039   |  352 ++++++++++++++++++++++++++++++++++++++++++++++
>>  tests/qemu-iotests/group |    1 +
>>  2 files changed, 353 insertions(+)
>>  create mode 100755 tests/qemu-iotests/039
>>
>> diff --git a/tests/qemu-iotests/039 b/tests/qemu-iotests/039
>> new file mode 100755
>> index 0000000..c512f14
>> --- /dev/null
>> +++ b/tests/qemu-iotests/039
>> @@ -0,0 +1,352 @@
>> +#!/usr/bin/env python
>> +#
>> +# Tests for image mirroring.
>> +#
>> +# Copyright (C) 2012 IBM Corp.
> 
> Did you write this, or IBM?  Or should it be joint copyright, because
> you are adding to framework borrowed from elsewhere?

The latter.

Paolo

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 33/47] mirror: add support for on-source-error/on-target-error
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 33/47] mirror: add support for on-source-error/on-target-error Paolo Bonzini
@ 2012-07-27 15:26   ` Eric Blake
  2012-07-30 13:29     ` Paolo Bonzini
  0 siblings, 1 reply; 136+ messages in thread
From: Eric Blake @ 2012-07-27 15:26 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 1792 bytes --]

On 07/24/2012 05:04 AM, Paolo Bonzini wrote:
> Error management is important for mirroring; otherwise, an error on the
> target (even something as "innocent" as ENOSPC) requires to start again
> with a full copy.  Similar to on_read_error/on_write_error, two separate
> knobs are provided for on_source_error (reads) and on_target_error (writes).
> The default is 'report' for both.
> 
> The 'ignore' policy will leave the sector dirty, so that it will be
> retried later.  Thus, it will not cause corruption.

How frequently will the dirty sector be retried when the policy is
'ignore'?  Are we going to be causing a denial-of-service by repeatedly
retrying the sector until the user does something about the error event?

> +++ b/qapi-schema.json
> @@ -1389,17 +1389,27 @@
>  #        (all the disk, only the sectors allocated in the topmost image, or
>  #        only new I/O).
>  #
> +# @on-source-error: #optional the action to take on an error on the source,
> +#                   default 'report'.  'stop' and 'enospc' can only be used
> +#                   if the block device supports io-status (see BlockInfo).
> +#
> +# @on-target-error: #optional the action to take on an error on the target,
> +#                   default 'report' (no limitations, since this applies to
> +#                   a different block device than @device).
> +#
>  # Returns: nothing on success
>  #          If @device is not a valid block device, DeviceNotFound
>  #          If @target can't be opened, OpenFileFailed
>  #          If @format is invalid, InvalidBlockFormat
> +#          If @on_source_error is not supported, InvalidParameter

s/_/-/g

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 36/47] host-utils: add ffsl and flsl
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 36/47] host-utils: add ffsl and flsl Paolo Bonzini
@ 2012-07-27 16:05   ` Eric Blake
  2012-07-30 13:30     ` Paolo Bonzini
  0 siblings, 1 reply; 136+ messages in thread
From: Eric Blake @ 2012-07-27 16:05 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 911 bytes --]

On 07/24/2012 05:04 AM, Paolo Bonzini wrote:
> We can provide fast versions based on the other functions defined
> by host-utils.h.  Some care is required on glibc, which provides
> ffsl already.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  host-utils.h |   45 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 45 insertions(+)

> +#ifdef __GLIBC__
> +#define ffsl qemu_ffsl
> +#endif
> +static inline int ffsl(long val)

ffsl() makes sense in comparison to the standardized ffs() (why POSIX
doesn't specify one is beyond me).

> +
> +static inline int flsl(long val)

But what good is flsl (I'm assuming you mean find-last-set, or the
most-significant set bit), especially since there is no standardized
fls() and no fls() in host-utils.h?

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 37/47] add hierarchical bitmap data type and test cases
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 37/47] add hierarchical bitmap data type and test cases Paolo Bonzini
@ 2012-07-28 13:26   ` Eric Blake
  2012-07-30 13:39     ` Paolo Bonzini
  0 siblings, 1 reply; 136+ messages in thread
From: Eric Blake @ 2012-07-28 13:26 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 6570 bytes --]

On 07/24/2012 05:04 AM, Paolo Bonzini wrote:
> HBitmaps provide an array of bits.  The bits are stored as usual in an
> array of unsigned longs, but HBitmap is also optimized to provide fast
> iteration over set bits; going from one bit to the next is O(logB n)
> worst case, with B = sizeof(long) * CHAR_BIT: the result is low enough
> that the number of levels is in fact fixed.
> 

> +++ b/hbitmap.c
> + * So it is easy to move up simply by shifting the index right by
> + * log2(BITS_PER_LONG) bits.  To move down, you shift the index left
> + * similarly, and add the word index within the group.  Iteration uses
> + * ffs (find first set bit) to find the next word to examine; this
> + * operation can be done in constant time in most current architectures.

Technically, ffs and friends can be done in constant time in ALL
architectures.  There are some well-known bit-twiddling algorithms for
computing it in straightline C code with no conditionals (and therefore
in constant time, just with a larger constant than on platforms that
lack the builtin single instruction).  But no need to tweak this
documentation on a technicality :)

> +
> +struct HBitmap {
> +    /* Number of total bits in the bottom level.  */
> +    uint64_t size;
> +
> +    /* Number of set bits in the bottom level.  */
> +    uint64_t count;
> +
> +    /* A scaling factor.  When setting or resetting bits, the bitmap will
> +     * scale bit numbers right by this amount of bits.  When iterating,
> +     * the bitmap will scale bit numbers left by this amonut of bits.

s/amonut/amount/

> +     * Example of operations in a size-16, granularity-1 HBitmap:
> +     *
> +     *    initial state            00000000
> +     *    set(start=0, count=9)    11111000 (iter: 0, 2, 4, 6, 8)
> +     *    reset(start=1, count=3)  00111000 (iter: 4, 6, 8)
> +     *    set(start=9, count=2)    00111100 (iter: 4, 6, 8, 10)
> +     *    reset(start=5, count=5)  00000000

Thanks.  That helps me understand tremendously over the previous version
of this patch.  Basically, you are stating that rather than storing 16
bits, you compress and only store information on 8 pages, and each
operation on a range of bits first rounds the bits to determine which
page they land in then affect the entire page; while iteration only
visits the first bit of each page.  A granularity of 1 means pages are
1<<1 or every 2 bit numbers.  Typical values of granularity will
probably then be 0 (every bit is important) or 12 (operating on 4k
pages, where touching any byte in the page is sufficient to track that
entire page as affected in the bitmap).

> +     */
> +    int granularity;
> +
> +    /* A number of progressively less coarse bitmaps (i.e. level 0 is the
> +     * coarsest).  Each bit in level N represents a word in level N+1 that
> +     * has a set bit, except the last level where each bit represents the
> +     * actual bitmap.
> +     */
> +    unsigned long *levels[HBITMAP_LEVELS];

That is, even allocating a 1-bit bitmap will still allocate
HBITMAP_LEVELS arrays, rather than trying to dynamically optimize and
reduce the number of levels necessary to hold the requested size.  Fair
enough.

> +
> +static int hbi_count_towards(HBitmapIter *hbi, uint64_t last)
> +{
> +    uint64_t next = hbi_next_internal(hbi);
> +    int n;
> +
> +    /* Take it easy with the last few bits.  */
> +    if (next >= (last & -BITS_PER_LONG)) {
> +        return (next > last ? 0 : 1);

You could write this as:
 return next <= last;
but probably not worth the obfuscation.

> +void hbitmap_iter_init(HBitmapIter *hbi, HBitmap *hb, uint64_t first)
> +{
> +    int i, bit;
> +    size_t pos;
> +
> +    hbi->hb = hb;
> +    pos = first;
> +    for (i = HBITMAP_LEVELS; --i >= 0; ) {
> +        bit = pos & (BITS_PER_LONG - 1);
> +        pos >>= BITS_PER_LEVEL;
> +
> +        /* Drop bits representing items before first.  */
> +        hbi->cur[i] = hb->levels[i][pos] & ~((1UL << bit) - 1);
> +
> +        /* We have already added level i+1, so the lowest set bit has
> +         * been processed.  Clear it.
> +         */
> +        if (i != HBITMAP_LEVELS - 1) {
> +            hbi->cur[i] &= ~(1UL << bit);
> +        }
> +    }
> +
> +    hbi->pos = first >> BITS_PER_LEVEL;
> +    hbi->granularity = hb->granularity;

Do we really need hbi->granularity, or can the code get by with
hbi->hb->granularity?

> +HBitmap *hbitmap_alloc(uint64_t size, int granularity)
> +{
> +    HBitmap *hb = g_malloc0(sizeof(struct HBitmap));
> +    int i;
> +
> +    assert(granularity >= 0 && granularity < 64);

Shouldn't this be granularity < BITS_PER_LONG?

> +++ b/hbitmap.h

> +#define BITS_PER_LEVEL         (BITS_PER_LONG == 32 ? 5 : 6)
> +
> +/* For 32-bit, the largest that fits in a 4 GiB address space.
> + * For 64-bit, the number of sectors in 1 PiB.  Good luck, in
> + * either case... :)
> + */
> +#define HBITMAP_LOG_MAX_SIZE   (BITS_PER_LONG == 32 ? 34 : 41)
> +
> +/* Leave an extra bit for a sentinel.  */
> +#define HBITMAP_LEVELS         ((HBITMAP_LOG_MAX_SIZE / BITS_PER_LEVEL) + 1)

Interesting that this picks 7 levels for both 32-bit and 64-bit long
(hmm, that's why you capped HBITMAP_LOG_MAX_SIZE to the number of
sectors in 1 PiB, rather than covering the entire address space :)

> +
> +struct HBitmapIter {
> +    HBitmap *hb;
> +    size_t pos;
> +    int granularity;

Again, do you really need granularity here?

> +    unsigned long cur[HBITMAP_LEVELS];
> +};
> +

I did a much closer read of the code this time around, and I'm happy
that your implementation looks sound by inspection as well as has an
accompanying testsuite.

> +++ b/tests/test-hbitmap.c

> +static void test_hbitmap_iter_granularity(TestHBitmapData *data,
> +                                          const void *unused)
> +{
> +    HBitmapIter hbi;
> +
> +    /* Note that hbitmap_test_check has to be invoked manually in this test.  */
> +    hbitmap_test_init(data, 131072 << 7, 7);
> +    hbitmap_iter_init(&hbi, data->hb, 0);
> +    g_assert_cmpint(hbitmap_iter_next(&hbi), <, 0);
> +    hbitmap_test_set(data, ((L2 + L1 + 1) << 7) + 8, 8);
> +    hbitmap_iter_init(&hbi, data->hb, 0);
> +    g_assert_cmpint(hbitmap_iter_next(&hbi), ==, (L2 + L1 + 1) << 7);

Misleading comment, since you didn't call hbitmap_test_check.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 43/47] mirror: allow customizing the granularity
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 43/47] mirror: allow customizing the granularity Paolo Bonzini
@ 2012-07-28 13:43   ` Eric Blake
  2012-07-30 13:40     ` Paolo Bonzini
  0 siblings, 1 reply; 136+ messages in thread
From: Eric Blake @ 2012-07-28 13:43 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 1332 bytes --]

On 07/24/2012 05:04 AM, Paolo Bonzini wrote:
> The desired granularity may be very different depending on the kind of
> operation (e.g. continous replication vs. collapse-to-raw) and whether

s/continous/continuous/

> the VM is expected to perform lots of I/O while mirroring is in progress.
> Allow the user to customize it.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---

> @@ -857,6 +858,17 @@ void qmp_drive_mirror(const char *device, const char *target,
>      if (!has_mode) {
>          mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
>      }
> +    if (!has_granularity) {
> +        granularity = 65536;
> +    }
> +    if (granularity < 512 || granularity > 1048576 * 64) {
> +        error_set(errp, QERR_INVALID_PARAMETER, device);
> +        return;
> +    }
> +    if (granularity & (granularity - 1)) {
> +        error_set(errp, QERR_INVALID_PARAMETER, device);
> +        return;

In the XBZLRE migration series, we decided to round the users input down
to a power of two instead of reject it.  Should we do that here?  Also,
there is already an explicit QERR_PROPERY_VALUE_NOT_POWER_OF_2 that
might fit better here (depending on how the error cleanup series goes).

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 44/47] mirror: switch mirror_iteration to AIO
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 44/47] mirror: switch mirror_iteration to AIO Paolo Bonzini
@ 2012-07-28 13:46   ` Eric Blake
  2012-07-30 13:41     ` Paolo Bonzini
  0 siblings, 1 reply; 136+ messages in thread
From: Eric Blake @ 2012-07-28 13:46 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 1364 bytes --]

On 07/24/2012 05:04 AM, Paolo Bonzini wrote:
> There is really no change in the behavior of the job here, since there
> is still a maximum of one in-flight I/O operation between the source and
> the target.  However, this patch already introduces moves the copy logic

grammar: 'already introduces moves' is awkward, but I'm not sure what
you meant.

> from mirror_iteration to AIO callbacks; it also adds the logic to count
> in-flight operations, and only complete the job after they have finished.

s/complete/completes/

> 
> Some care is required in the error and cancellation cases, in order
> to avoid access to dangling pointers (and consequent corruption).
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  block/mirror.c |  161 ++++++++++++++++++++++++++++++++++++++++++--------------
>  trace-events   |    2 +
>  2 files changed, 123 insertions(+), 40 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 81a600b..971c923 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -17,7 +17,7 @@
>  #include "qemu/ratelimit.h"
>  #include "bitmap.h"
>  
> -#define SLICE_TIME 100000000ULL /* ns */
> +#define SLICE_TIME    100000000ULL /* ns */

Why the spurious respacing?

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2
  2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
                   ` (46 preceding siblings ...)
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 47/47] mirror: support arbitrarily-sized iterations Paolo Bonzini
@ 2012-07-28 13:51 ` Eric Blake
  47 siblings, 0 replies; 136+ messages in thread
From: Eric Blake @ 2012-07-28 13:51 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 1095 bytes --]

On 07/24/2012 05:03 AM, Paolo Bonzini wrote:
> Hi all, this is the first non-RFC submission of my block job patches
> for 1.2.  Everything is there, including multiple in-flight operations
> in the mirroring job and new testcases (for all of streaming, mirroring,
> hierarchical bitmap).  The tests use blkdebug to test error reporting
> for both streaming and mirroring.

Overall, the series looks pretty nice by code inspection.

> 
> This still does not include a persistent dirty bitmap, which will be work
> for 1.3.
> 
> If you want to tinker with this, everything is available at
> git://github.com/bonzini/qemu.git in branch blkmirror-job-1.2.

Yep, now I'll have to start playing with this in parallel with tweaking
my libvirt patches to the slight interface changes, to see if anything
else jumps out at me.  I already know that I'll have to add some new
libvirt commands to tweak parameters such as granularity and buf-size,
but that can be an add-on.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 33/47] mirror: add support for on-source-error/on-target-error
  2012-07-27 15:26   ` Eric Blake
@ 2012-07-30 13:29     ` Paolo Bonzini
  0 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-30 13:29 UTC (permalink / raw)
  To: Eric Blake; +Cc: kwolf, jcody, qemu-devel, stefanha

Il 27/07/2012 17:26, Eric Blake ha scritto:
> On 07/24/2012 05:04 AM, Paolo Bonzini wrote:
>> Error management is important for mirroring; otherwise, an error on the
>> target (even something as "innocent" as ENOSPC) requires to start again
>> with a full copy.  Similar to on_read_error/on_write_error, two separate
>> knobs are provided for on_source_error (reads) and on_target_error (writes).
>> The default is 'report' for both.
>>
>> The 'ignore' policy will leave the sector dirty, so that it will be
>> retried later.  Thus, it will not cause corruption.
> 
> How frequently will the dirty sector be retried when the policy is
> 'ignore'?  Are we going to be causing a denial-of-service by repeatedly
> retrying the sector until the user does something about the error event?

A lot.  It is definitely the "I know what I'm doing" kind of option.
The usecase I had in mind was a network backend that had some kind of
timeout-and-reconnect option, so that you know all failing operations
will take at least a few seconds.

Paolo

>> +++ b/qapi-schema.json
>> @@ -1389,17 +1389,27 @@
>>  #        (all the disk, only the sectors allocated in the topmost image, or
>>  #        only new I/O).
>>  #
>> +# @on-source-error: #optional the action to take on an error on the source,
>> +#                   default 'report'.  'stop' and 'enospc' can only be used
>> +#                   if the block device supports io-status (see BlockInfo).
>> +#
>> +# @on-target-error: #optional the action to take on an error on the target,
>> +#                   default 'report' (no limitations, since this applies to
>> +#                   a different block device than @device).
>> +#
>>  # Returns: nothing on success
>>  #          If @device is not a valid block device, DeviceNotFound
>>  #          If @target can't be opened, OpenFileFailed
>>  #          If @format is invalid, InvalidBlockFormat
>> +#          If @on_source_error is not supported, InvalidParameter
> 
> s/_/-/g
> 

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 36/47] host-utils: add ffsl and flsl
  2012-07-27 16:05   ` Eric Blake
@ 2012-07-30 13:30     ` Paolo Bonzini
  0 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-30 13:30 UTC (permalink / raw)
  To: Eric Blake; +Cc: kwolf, jcody, qemu-devel, stefanha

Il 27/07/2012 18:05, Eric Blake ha scritto:
>> > +static inline int flsl(long val)
> But what good is flsl (I'm assuming you mean find-last-set, or the
> most-significant set bit), especially since there is no standardized
> fls() and no fls() in host-utils.h?

No idea why I thought that fls existed.

Paolo

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 37/47] add hierarchical bitmap data type and test cases
  2012-07-28 13:26   ` Eric Blake
@ 2012-07-30 13:39     ` Paolo Bonzini
  2012-07-30 14:18       ` Paolo Bonzini
  0 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-30 13:39 UTC (permalink / raw)
  To: Eric Blake; +Cc: kwolf, jcody, qemu-devel, stefanha

Il 28/07/2012 15:26, Eric Blake ha scritto:
> On 07/24/2012 05:04 AM, Paolo Bonzini wrote:
>> HBitmaps provide an array of bits.  The bits are stored as usual in an
>> array of unsigned longs, but HBitmap is also optimized to provide fast
>> iteration over set bits; going from one bit to the next is O(logB n)
>> worst case, with B = sizeof(long) * CHAR_BIT: the result is low enough
>> that the number of levels is in fact fixed.
>>
> 
>> +++ b/hbitmap.c
>> + * So it is easy to move up simply by shifting the index right by
>> + * log2(BITS_PER_LONG) bits.  To move down, you shift the index left
>> + * similarly, and add the word index within the group.  Iteration uses
>> + * ffs (find first set bit) to find the next word to examine; this
>> + * operation can be done in constant time in most current architectures.
> 
> Technically, ffs and friends can be done in constant time in ALL
> architectures.  There are some well-known bit-twiddling algorithms for
> computing it in straightline C code with no conditionals (and therefore
> in constant time, just with a larger constant than on platforms that
> lack the builtin single instruction). 

Technically, those are O(log2 BITS_PER_LONG). :)

>> +
>> +struct HBitmap {
>> +    /* Number of total bits in the bottom level.  */
>> +    uint64_t size;
>> +
>> +    /* Number of set bits in the bottom level.  */
>> +    uint64_t count;
>> +
>> +    /* A scaling factor.  When setting or resetting bits, the bitmap will
>> +     * scale bit numbers right by this amount of bits.  When iterating,
>> +     * the bitmap will scale bit numbers left by this amonut of bits.
> 
> s/amonut/amount/
> 
>> +     * Example of operations in a size-16, granularity-1 HBitmap:
>> +     *
>> +     *    initial state            00000000
>> +     *    set(start=0, count=9)    11111000 (iter: 0, 2, 4, 6, 8)
>> +     *    reset(start=1, count=3)  00111000 (iter: 4, 6, 8)
>> +     *    set(start=9, count=2)    00111100 (iter: 4, 6, 8, 10)
>> +     *    reset(start=5, count=5)  00000000
> 
> Thanks.  That helps me understand tremendously over the previous version
> of this patch.  Basically, you are stating that rather than storing 16
> bits, you compress and only store information on 8 pages, and each
> operation on a range of bits first rounds the bits to determine which
> page they land in then affect the entire page; while iteration only
> visits the first bit of each page.  A granularity of 1 means pages are
> 1<<1 or every 2 bit numbers.

Yes.

> Typical values of granularity will
> probably then be 0 (every bit is important) or 12 (operating on 4k
> pages, where touching any byte in the page is sufficient to track that
> entire page as affected in the bitmap).

In our case, bit indices are sector numbers, so a common value of the
granularity will be for example 7=16-9 (for a 64k-coarse dirty bitmap).

>> +     */
>> +    int granularity;
>> +
>> +    /* A number of progressively less coarse bitmaps (i.e. level 0 is the
>> +     * coarsest).  Each bit in level N represents a word in level N+1 that
>> +     * has a set bit, except the last level where each bit represents the
>> +     * actual bitmap.
>> +     */
>> +    unsigned long *levels[HBITMAP_LEVELS];
> 
> That is, even allocating a 1-bit bitmap will still allocate
> HBITMAP_LEVELS arrays, rather than trying to dynamically optimize and
> reduce the number of levels necessary to hold the requested size.  Fair
> enough.

Also because the HBitmapIter would become even more complex than it
already is.

>> +
>> +static int hbi_count_towards(HBitmapIter *hbi, uint64_t last)
>> +{
>> +    uint64_t next = hbi_next_internal(hbi);
>> +    int n;
>> +
>> +    /* Take it easy with the last few bits.  */
>> +    if (next >= (last & -BITS_PER_LONG)) {
>> +        return (next > last ? 0 : 1);
> 
> You could write this as:
>  return next <= last;
> but probably not worth the obfuscation.

You probably guessed why I wrote it that way.  It's at the top of the
function, and such a return may give the wrong idea that the function
returns a boolean.

>> +void hbitmap_iter_init(HBitmapIter *hbi, HBitmap *hb, uint64_t first)
>> +{
>> +    int i, bit;
>> +    size_t pos;
>> +
>> +    hbi->hb = hb;
>> +    pos = first;
>> +    for (i = HBITMAP_LEVELS; --i >= 0; ) {
>> +        bit = pos & (BITS_PER_LONG - 1);
>> +        pos >>= BITS_PER_LEVEL;
>> +
>> +        /* Drop bits representing items before first.  */
>> +        hbi->cur[i] = hb->levels[i][pos] & ~((1UL << bit) - 1);
>> +
>> +        /* We have already added level i+1, so the lowest set bit has
>> +         * been processed.  Clear it.
>> +         */
>> +        if (i != HBITMAP_LEVELS - 1) {
>> +            hbi->cur[i] &= ~(1UL << bit);
>> +        }
>> +    }
>> +
>> +    hbi->pos = first >> BITS_PER_LEVEL;
>> +    hbi->granularity = hb->granularity;
> 
> Do we really need hbi->granularity, or can the code get by with
> hbi->hb->granularity?

It's probably prematurely optimized---this way the fast path does not
need to access hb at all.  But it's also a little bit helpful in gdb,
and it is practically free, so I'd rather leave it.

>> +HBitmap *hbitmap_alloc(uint64_t size, int granularity)
>> +{
>> +    HBitmap *hb = g_malloc0(sizeof(struct HBitmap));
>> +    int i;
>> +
>> +    assert(granularity >= 0 && granularity < 64);
> 
> Shouldn't this be granularity < BITS_PER_LONG?

Yep, thanks.

>> +++ b/hbitmap.h
> 
>> +#define BITS_PER_LEVEL         (BITS_PER_LONG == 32 ? 5 : 6)
>> +
>> +/* For 32-bit, the largest that fits in a 4 GiB address space.
>> + * For 64-bit, the number of sectors in 1 PiB.  Good luck, in
>> + * either case... :)
>> + */
>> +#define HBITMAP_LOG_MAX_SIZE   (BITS_PER_LONG == 32 ? 34 : 41)
>> +
>> +/* Leave an extra bit for a sentinel.  */
>> +#define HBITMAP_LEVELS         ((HBITMAP_LOG_MAX_SIZE / BITS_PER_LEVEL) + 1)
> 
> Interesting that this picks 7 levels for both 32-bit and 64-bit long
> (hmm, that's why you capped HBITMAP_LOG_MAX_SIZE to the number of
> sectors in 1 PiB, rather than covering the entire address space :)
> 
>> +
>> +struct HBitmapIter {
>> +    HBitmap *hb;
>> +    size_t pos;
>> +    int granularity;
> 
> Again, do you really need granularity here?
> 
>> +    unsigned long cur[HBITMAP_LEVELS];
>> +};
>> +
> 
> I did a much closer read of the code this time around, and I'm happy
> that your implementation looks sound by inspection as well as has an
> accompanying testsuite.

Yeah, no way to be confident about this without a testsuite.

Thanks for the review!

Paolo

>> +++ b/tests/test-hbitmap.c
> 
>> +static void test_hbitmap_iter_granularity(TestHBitmapData *data,
>> +                                          const void *unused)
>> +{
>> +    HBitmapIter hbi;
>> +
>> +    /* Note that hbitmap_test_check has to be invoked manually in this test.  */
>> +    hbitmap_test_init(data, 131072 << 7, 7);
>> +    hbitmap_iter_init(&hbi, data->hb, 0);
>> +    g_assert_cmpint(hbitmap_iter_next(&hbi), <, 0);
>> +    hbitmap_test_set(data, ((L2 + L1 + 1) << 7) + 8, 8);
>> +    hbitmap_iter_init(&hbi, data->hb, 0);
>> +    g_assert_cmpint(hbitmap_iter_next(&hbi), ==, (L2 + L1 + 1) << 7);
> 
> Misleading comment, since you didn't call hbitmap_test_check.
> 

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 43/47] mirror: allow customizing the granularity
  2012-07-28 13:43   ` Eric Blake
@ 2012-07-30 13:40     ` Paolo Bonzini
  2012-07-30 13:53       ` Eric Blake
  0 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-30 13:40 UTC (permalink / raw)
  To: Eric Blake; +Cc: kwolf, jcody, qemu-devel, stefanha

Il 28/07/2012 15:43, Eric Blake ha scritto:
>> > +    if (granularity < 512 || granularity > 1048576 * 64) {
>> > +        error_set(errp, QERR_INVALID_PARAMETER, device);
>> > +        return;
>> > +    }
>> > +    if (granularity & (granularity - 1)) {
>> > +        error_set(errp, QERR_INVALID_PARAMETER, device);
>> > +        return;
> In the XBZLRE migration series, we decided to round the users input down
> to a power of two instead of reject it.  Should we do that here?

I can certainly do that, but do you have a pointer to the discussion so
that I can understand the rationale?

> Also,
> there is already an explicit QERR_PROPERY_VALUE_NOT_POWER_OF_2 that
> might fit better here (depending on how the error cleanup series goes).

It's not a property value though.

Paolo

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 44/47] mirror: switch mirror_iteration to AIO
  2012-07-28 13:46   ` Eric Blake
@ 2012-07-30 13:41     ` Paolo Bonzini
  0 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-30 13:41 UTC (permalink / raw)
  To: Eric Blake; +Cc: kwolf, jcody, qemu-devel, stefanha

Il 28/07/2012 15:46, Eric Blake ha scritto:
> On 07/24/2012 05:04 AM, Paolo Bonzini wrote:
>> There is really no change in the behavior of the job here, since there
>> is still a maximum of one in-flight I/O operation between the source and
>> the target.  However, this patch already introduces moves the copy logic
> 
> grammar: 'already introduces moves' is awkward, but I'm not sure what
> you meant.
> 
>> from mirror_iteration to AIO callbacks; it also adds the logic to count
>> in-flight operations, and only complete the job after they have finished.
> 
> s/complete/completes/

Wow, I'm embarrassed...

>>
>> Some care is required in the error and cancellation cases, in order
>> to avoid access to dangling pointers (and consequent corruption).
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  block/mirror.c |  161 ++++++++++++++++++++++++++++++++++++++++++--------------
>>  trace-events   |    2 +
>>  2 files changed, 123 insertions(+), 40 deletions(-)
>>
>> diff --git a/block/mirror.c b/block/mirror.c
>> index 81a600b..971c923 100644
>> --- a/block/mirror.c
>> +++ b/block/mirror.c
>> @@ -17,7 +17,7 @@
>>  #include "qemu/ratelimit.h"
>>  #include "bitmap.h"
>>  
>> -#define SLICE_TIME 100000000ULL /* ns */
>> +#define SLICE_TIME    100000000ULL /* ns */
> 
> Why the spurious respacing?

This patch was split from the one that introduces MAX_IN_FLIGHT.

 #define SLICE_TIME    100000000ULL /* ns */
+#define MAX_IN_FLIGHT 16

so the respacing belongs there.

Paolo

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 43/47] mirror: allow customizing the granularity
  2012-07-30 13:40     ` Paolo Bonzini
@ 2012-07-30 13:53       ` Eric Blake
  2012-07-30 14:03         ` Paolo Bonzini
  0 siblings, 1 reply; 136+ messages in thread
From: Eric Blake @ 2012-07-30 13:53 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 843 bytes --]

On 07/30/2012 07:40 AM, Paolo Bonzini wrote:
> Il 28/07/2012 15:43, Eric Blake ha scritto:
>>>> +    if (granularity < 512 || granularity > 1048576 * 64) {
>>>> +        error_set(errp, QERR_INVALID_PARAMETER, device);
>>>> +        return;
>>>> +    }
>>>> +    if (granularity & (granularity - 1)) {
>>>> +        error_set(errp, QERR_INVALID_PARAMETER, device);
>>>> +        return;
>> In the XBZLRE migration series, we decided to round the users input down
>> to a power of two instead of reject it.  Should we do that here?
> 
> I can certainly do that, but do you have a pointer to the discussion so
> that I can understand the rationale?

https://lists.gnu.org/archive/html/qemu-devel/2012-05/msg02421.html

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 43/47] mirror: allow customizing the granularity
  2012-07-30 13:53       ` Eric Blake
@ 2012-07-30 14:03         ` Paolo Bonzini
  0 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-30 14:03 UTC (permalink / raw)
  To: Eric Blake; +Cc: kwolf, jcody, qemu-devel, stefanha

Il 30/07/2012 15:53, Eric Blake ha scritto:
> On 07/30/2012 07:40 AM, Paolo Bonzini wrote:
>> Il 28/07/2012 15:43, Eric Blake ha scritto:
>>>>> +    if (granularity < 512 || granularity > 1048576 * 64) {
>>>>> +        error_set(errp, QERR_INVALID_PARAMETER, device);
>>>>> +        return;
>>>>> +    }
>>>>> +    if (granularity & (granularity - 1)) {
>>>>> +        error_set(errp, QERR_INVALID_PARAMETER, device);
>>>>> +        return;
>>> In the XBZLRE migration series, we decided to round the users input down
>>> to a power of two instead of reject it.  Should we do that here?
>>
>> I can certainly do that, but do you have a pointer to the discussion so
>> that I can understand the rationale?
> 
> https://lists.gnu.org/archive/html/qemu-devel/2012-05/msg02421.html

Hmm, a buffer size (the cache size in the case of XBZRLE) is different
however.  There is no reason in principle why it needs to be a power of
two (except if you know how the hash table is implemented, or something
like that).  For example, the buf_size argument in this series does
indeed support a non-power-of-two size.

Requesting a granularity to be a power-of-two shouldn't be surprising to
anybody who has an idea of what a bit shift is and how it is used...

Paolo

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 37/47] add hierarchical bitmap data type and test cases
  2012-07-30 13:39     ` Paolo Bonzini
@ 2012-07-30 14:18       ` Paolo Bonzini
  0 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-30 14:18 UTC (permalink / raw)
  Cc: kwolf, jcody, Eric Blake, qemu-devel, stefanha

Il 30/07/2012 15:39, Paolo Bonzini ha scritto:
>>> +HBitmap *hbitmap_alloc(uint64_t size, int granularity)
>>> >> +{
>>> >> +    HBitmap *hb = g_malloc0(sizeof(struct HBitmap));
>>> >> +    int i;
>>> >> +
>>> >> +    assert(granularity >= 0 && granularity < 64);
>> > 
>> > Shouldn't this be granularity < BITS_PER_LONG?
> Yep, thanks.
> 

Actually, no.  granularity is always applied to int64_t/uint64_t values.

Paolo

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 04/47] block: add block_job_query
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 04/47] block: add block_job_query Paolo Bonzini
@ 2012-07-30 14:47   ` Kevin Wolf
  2012-07-30 15:05     ` Paolo Bonzini
  0 siblings, 1 reply; 136+ messages in thread
From: Kevin Wolf @ 2012-07-30 14:47 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, eblake, qemu-devel, stefanha

Am 24.07.2012 13:03, schrieb Paolo Bonzini:
> Extract it out of the implementation of query-block-jobs.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  blockdev.c |   15 ++-------------
>  blockjob.c |   11 +++++++++++
>  blockjob.h |    8 ++++++++
>  3 files changed, 21 insertions(+), 13 deletions(-)
> 
> diff --git a/blockdev.c b/blockdev.c
> index e066f8f..dc099f9 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -1166,19 +1166,8 @@ static void do_qmp_query_block_jobs_one(void *opaque, BlockDriverState *bs)
>      BlockJob *job = bs->job;
>  
>      if (job) {
> -        BlockJobInfoList *elem;
> -        BlockJobInfo *info = g_new(BlockJobInfo, 1);
> -        *info = (BlockJobInfo){
> -            .type   = g_strdup(job->job_type->job_type),
> -            .device = g_strdup(bdrv_get_device_name(bs)),
> -            .len    = job->len,
> -            .offset = job->offset,
> -            .speed  = job->speed,
> -        };
> -
> -        elem = g_new0(BlockJobInfoList, 1);
> -        elem->value = info;
> -
> +        BlockJobInfoList *elem = g_new0(BlockJobInfoList, 1);
> +        elem->value = block_job_query(bs->job);
>          (*prev)->next = elem;
>          *prev = elem;
>      }
> diff --git a/blockjob.c b/blockjob.c
> index 9737a43..a947a6e 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -161,3 +161,14 @@ void block_job_sleep_ns(BlockJob *job, QEMUClock *clock, int64_t ns)
>          job->busy = true;
>      }
>  }
> +
> +BlockJobInfo *block_job_query(BlockJob *job)
> +{
> +    BlockJobInfo *info = g_new(BlockJobInfo, 1);
> +    info->type   = g_strdup(job->job_type->job_type);
> +    info->device = g_strdup(bdrv_get_device_name(job->bs));
> +    info->len    = job->len;
> +    info->offset = job->offset;
> +    info->speed  = job->speed;
> +    return info;
> +}

Why did you convert the initialisation to separate statement? If you
really want to do this, I think using g_new0 would be safer now, but I
actually like compound literals better.

Kevin

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 04/47] block: add block_job_query
  2012-07-30 14:47   ` Kevin Wolf
@ 2012-07-30 15:05     ` Paolo Bonzini
  2012-07-31  8:47       ` Kevin Wolf
  0 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-30 15:05 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, eblake, qemu-devel, stefanha

Il 30/07/2012 16:47, Kevin Wolf ha scritto:
>> > +BlockJobInfo *block_job_query(BlockJob *job)
>> > +{
>> > +    BlockJobInfo *info = g_new(BlockJobInfo, 1);
>> > +    info->type   = g_strdup(job->job_type->job_type);
>> > +    info->device = g_strdup(bdrv_get_device_name(job->bs));
>> > +    info->len    = job->len;
>> > +    info->offset = job->offset;
>> > +    info->speed  = job->speed;
>> > +    return info;
>> > +}
> Why did you convert the initialisation to separate statement? If you
> really want to do this, I think using g_new0 would be safer now, but I
> actually like compound literals better.

Later on I will have some more initialization beyond the list of fields,
so I preferred an explicit list.  I can change it back if you prefer.

Paolo

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 04/47] block: add block_job_query
  2012-07-30 15:05     ` Paolo Bonzini
@ 2012-07-31  8:47       ` Kevin Wolf
  2012-07-31  8:50         ` Paolo Bonzini
  0 siblings, 1 reply; 136+ messages in thread
From: Kevin Wolf @ 2012-07-31  8:47 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, eblake, qemu-devel, stefanha

Am 30.07.2012 17:05, schrieb Paolo Bonzini:
> Il 30/07/2012 16:47, Kevin Wolf ha scritto:
>>>> +BlockJobInfo *block_job_query(BlockJob *job)
>>>> +{
>>>> +    BlockJobInfo *info = g_new(BlockJobInfo, 1);
>>>> +    info->type   = g_strdup(job->job_type->job_type);
>>>> +    info->device = g_strdup(bdrv_get_device_name(job->bs));
>>>> +    info->len    = job->len;
>>>> +    info->offset = job->offset;
>>>> +    info->speed  = job->speed;
>>>> +    return info;
>>>> +}
>> Why did you convert the initialisation to separate statement? If you
>> really want to do this, I think using g_new0 would be safer now, but I
>> actually like compound literals better.
> 
> Later on I will have some more initialization beyond the list of fields,
> so I preferred an explicit list.  I can change it back if you prefer.

What I'm really interested in is having zero-initialisation for any not
explicitly initialised fields, just to be on the safe side. You can do
that with g_new0() or with compound literals, that's a matter of taste.
My taste happens to prefer the latter, but I won't criticise a patch
based on taste as long as it's doing the same thing functionally.

Kevin

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 04/47] block: add block_job_query
  2012-07-31  8:47       ` Kevin Wolf
@ 2012-07-31  8:50         ` Paolo Bonzini
  2012-08-02 19:28           ` Jeff Cody
  0 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-31  8:50 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, eblake, qemu-devel, stefanha

Il 31/07/2012 10:47, Kevin Wolf ha scritto:
>>> >> Why did you convert the initialisation to separate statement? If you
>>> >> really want to do this, I think using g_new0 would be safer now, but I
>>> >> actually like compound literals better.
>> > 
>> > Later on I will have some more initialization beyond the list of fields,
>> > so I preferred an explicit list.  I can change it back if you prefer.
> What I'm really interested in is having zero-initialisation for any not
> explicitly initialised fields, just to be on the safe side. You can do
> that with g_new0() or with compound literals, that's a matter of taste.

Yes, and in fact I even have a change to g_new0 later in the series.
I'll squash that change in this patch.

Paolo

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 28/47] qmp: add drive-mirror command
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 28/47] qmp: add drive-mirror command Paolo Bonzini
  2012-07-26 23:42   ` Eric Blake
@ 2012-07-31  9:26   ` Kevin Wolf
  2012-07-31  9:33     ` Paolo Bonzini
  2012-09-13 13:15   ` Kevin Wolf
  2 siblings, 1 reply; 136+ messages in thread
From: Kevin Wolf @ 2012-07-31  9:26 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, eblake, qemu-devel, stefanha

Am 24.07.2012 13:04, schrieb Paolo Bonzini:
> This adds the monitor commands that start the mirroring job.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

[ Moving the discussion upstream ]

> Why make all of it inaccessible?  Everything except target device access
> does have a stable API.  The target device access can be delayed to 1.3,
> together with the much-needed QMP schema introspection.

I'm not even sure about the QMP mirror command itself.

I don't really like it, it does too many things at once: It can create
the target image file, it opens the target and it actually starts the
mirroring. It's rather bad at the first two steps, because it doesn't
take any options. This means that it can't create qcow2v3 images, for
example. Or you can't mirror into a backup with cache=unsafe while
running your real VM on cache=writethrough.

Having an all-in-one mirror command is a nice feature for HMP, but for
QMP it's more like a design problem.

Now I see you have called it drive-mirror, so that kind of implies that
it's not the final blockdev-mirror but just a QMP version of a command
primarily designed for HMP. As such this restricted functionality may be
acceptable, but it's not like everything is already perfect and there's
no room for discussion.

Kevin

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 28/47] qmp: add drive-mirror command
  2012-07-31  9:26   ` Kevin Wolf
@ 2012-07-31  9:33     ` Paolo Bonzini
  2012-07-31  9:46       ` Kevin Wolf
  0 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-31  9:33 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, eblake, qemu-devel, stefanha

Il 31/07/2012 11:26, Kevin Wolf ha scritto:
> Am 24.07.2012 13:04, schrieb Paolo Bonzini:
>> This adds the monitor commands that start the mirroring job.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> 
> [ Moving the discussion upstream ]
> 
>> Why make all of it inaccessible?  Everything except target device access
>> does have a stable API.  The target device access can be delayed to 1.3,
>> together with the much-needed QMP schema introspection.
> 
> I'm not even sure about the QMP mirror command itself.
> 
> I don't really like it, it does too many things at once: It can create
> the target image file, it opens the target and it actually starts the
> mirroring. It's rather bad at the first two steps, because it doesn't
> take any options. This means that it can't create qcow2v3 images, for
> example. Or you can't mirror into a backup with cache=unsafe while
> running your real VM on cache=writethrough.

Yes, though this can be worked around with mode: 'existing'.

> Having an all-in-one mirror command is a nice feature for HMP, but for
> QMP it's more like a design problem.
> 
> Now I see you have called it drive-mirror

I thought this was your idea. :)

> , so that kind of implies that
> it's not the final blockdev-mirror but just a QMP version of a command
> primarily designed for HMP. As such this restricted functionality may be
> acceptable, but it's not like everything is already perfect and there's
> no room for discussion.

We keep going back to the same point that we do not have -blockdev, but
it's becoming a bit frustrating to always rehash this same point...

Paolo

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 28/47] qmp: add drive-mirror command
  2012-07-31  9:33     ` Paolo Bonzini
@ 2012-07-31  9:46       ` Kevin Wolf
  2012-07-31 10:02         ` Paolo Bonzini
  0 siblings, 1 reply; 136+ messages in thread
From: Kevin Wolf @ 2012-07-31  9:46 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, eblake, qemu-devel, stefanha

Am 31.07.2012 11:33, schrieb Paolo Bonzini:
> Il 31/07/2012 11:26, Kevin Wolf ha scritto:
>> Am 24.07.2012 13:04, schrieb Paolo Bonzini:
>>> This adds the monitor commands that start the mirroring job.
>>>
>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>
>> [ Moving the discussion upstream ]
>>
>>> Why make all of it inaccessible?  Everything except target device access
>>> does have a stable API.  The target device access can be delayed to 1.3,
>>> together with the much-needed QMP schema introspection.
>>
>> I'm not even sure about the QMP mirror command itself.
>>
>> I don't really like it, it does too many things at once: It can create
>> the target image file, it opens the target and it actually starts the
>> mirroring. It's rather bad at the first two steps, because it doesn't
>> take any options. This means that it can't create qcow2v3 images, for
>> example. Or you can't mirror into a backup with cache=unsafe while
>> running your real VM on cache=writethrough.
> 
> Yes, though this can be worked around with mode: 'existing'.

True. Only the problem with image creation, though, not the one with
bdrv_open() flags, right?

>> Having an all-in-one mirror command is a nice feature for HMP, but for
>> QMP it's more like a design problem.
>>
>> Now I see you have called it drive-mirror
> 
> I thought this was your idea. :)

Hm, then probably we discussed similar things before. :-)

>> , so that kind of implies that
>> it's not the final blockdev-mirror but just a QMP version of a command
>> primarily designed for HMP. As such this restricted functionality may be
>> acceptable, but it's not like everything is already perfect and there's
>> no room for discussion.
> 
> We keep going back to the same point that we do not have -blockdev, but
> it's becoming a bit frustrating to always rehash this same point...

The question is whether we need it at all. We do have a drive_add
if=none, and for creating a mirror target that should actually be enough.

Kevin

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 28/47] qmp: add drive-mirror command
  2012-07-31  9:46       ` Kevin Wolf
@ 2012-07-31 10:02         ` Paolo Bonzini
  2012-07-31 10:25           ` Kevin Wolf
  0 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-31 10:02 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, eblake, qemu-devel, stefanha

Il 31/07/2012 11:46, Kevin Wolf ha scritto:
>>> I'm not even sure about the QMP mirror command itself.
>>>
>>> I don't really like it, it does too many things at once: It can create
>>> the target image file, it opens the target and it actually starts the
>>> mirroring. It's rather bad at the first two steps, because it doesn't
>>> take any options. This means that it can't create qcow2v3 images, for
>>> example. Or you can't mirror into a backup with cache=unsafe while
>>> running your real VM on cache=writethrough.
>>
>> Yes, though this can be worked around with mode: 'existing'.
> 
> True. Only the problem with image creation, though, not the one with
> bdrv_open() flags, right?

Yeah, but do you really care about for example io=threads vs. io=native?
 The only interesting one is cache=unsafe; the mirror should enable
writeback caching on the target (bdrv_swap will disable it if needed;
I'll change this in the next submission), so cache=writethrough vs.
writeback doesn't matter.

>>> Having an all-in-one mirror command is a nice feature for HMP, but for
>>> QMP it's more like a design problem.
>>>
>>> Now I see you have called it drive-mirror
>>
>> I thought this was your idea. :)
> 
> Hm, then probably we discussed similar things before. :-)
> 
>>> , so that kind of implies that
>>> it's not the final blockdev-mirror but just a QMP version of a command
>>> primarily designed for HMP. As such this restricted functionality may be
>>> acceptable, but it's not like everything is already perfect and there's
>>> no room for discussion.
>>
>> We keep going back to the same point that we do not have -blockdev, but
>> it's becoming a bit frustrating to always rehash this same point...
> 
> The question is whether we need it at all. We do have a drive_add
> if=none, and for creating a mirror target that should actually be enough.

But not for creating images.  That would require qemu-img invocation.

If you're okay with always using an existing image in the QMP case (and
moving image creation to the HMP implementation), we can do it.  But I'm
not sure I like it, I think it's excessive in the other direction.

Paolo

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 28/47] qmp: add drive-mirror command
  2012-07-31 10:02         ` Paolo Bonzini
@ 2012-07-31 10:25           ` Kevin Wolf
  2012-07-31 10:51             ` Paolo Bonzini
  0 siblings, 1 reply; 136+ messages in thread
From: Kevin Wolf @ 2012-07-31 10:25 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, eblake, qemu-devel, stefanha

Am 31.07.2012 12:02, schrieb Paolo Bonzini:
> Il 31/07/2012 11:46, Kevin Wolf ha scritto:
>>>> I'm not even sure about the QMP mirror command itself.
>>>>
>>>> I don't really like it, it does too many things at once: It can create
>>>> the target image file, it opens the target and it actually starts the
>>>> mirroring. It's rather bad at the first two steps, because it doesn't
>>>> take any options. This means that it can't create qcow2v3 images, for
>>>> example. Or you can't mirror into a backup with cache=unsafe while
>>>> running your real VM on cache=writethrough.
>>>
>>> Yes, though this can be worked around with mode: 'existing'.
>>
>> True. Only the problem with image creation, though, not the one with
>> bdrv_open() flags, right?
> 
> Yeah, but do you really care about for example io=threads vs. io=native?
>  The only interesting one is cache=unsafe; the mirror should enable
> writeback caching on the target (bdrv_swap will disable it if needed;
> I'll change this in the next submission), so cache=writethrough vs.
> writeback doesn't matter.

Can we really make it writeback unconditionally? For a passive mirror it
probably doesn't make a difference, but what happens when the user stops
the mirroring and switches to the target? Will it stay writeback?

The same goes for aio=native/threads. Probably not interesting for the
mirror, but well afterwards.

Another interesting thing is I/O throttling. The mirror currently
implements rate limiting itself, but is there really a reason why we
can't reuse regular I/O throttling on the target?

>>>> Having an all-in-one mirror command is a nice feature for HMP, but for
>>>> QMP it's more like a design problem.
>>>>
>>>> Now I see you have called it drive-mirror
>>>
>>> I thought this was your idea. :)
>>
>> Hm, then probably we discussed similar things before. :-)
>>
>>>> , so that kind of implies that
>>>> it's not the final blockdev-mirror but just a QMP version of a command
>>>> primarily designed for HMP. As such this restricted functionality may be
>>>> acceptable, but it's not like everything is already perfect and there's
>>>> no room for discussion.
>>>
>>> We keep going back to the same point that we do not have -blockdev, but
>>> it's becoming a bit frustrating to always rehash this same point...
>>
>> The question is whether we need it at all. We do have a drive_add
>> if=none, and for creating a mirror target that should actually be enough.
> 
> But not for creating images.  That would require qemu-img invocation.

Yeah, either qemu-img or another monitor command. I believe that in
practice libvirt will do this anyway if this is the only way to specify
image creation options.

> If you're okay with always using an existing image in the QMP case (and
> moving image creation to the HMP implementation), we can do it.  But I'm
> not sure I like it, I think it's excessive in the other direction.

If you think it's helpful, we could make it optional and have a mode
'blockdev' where you don't specify a file name but a blockdev name. But
this is an approach that feels a bit HMPish...

Kevin

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 28/47] qmp: add drive-mirror command
  2012-07-31 10:25           ` Kevin Wolf
@ 2012-07-31 10:51             ` Paolo Bonzini
  2012-07-31 11:13               ` Kevin Wolf
  0 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-31 10:51 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, eblake, qemu-devel, stefanha

Il 31/07/2012 12:25, Kevin Wolf ha scritto:
>> Yeah, but do you really care about for example io=threads vs. io=native?
>>  The only interesting one is cache=unsafe; the mirror should enable
>> writeback caching on the target (bdrv_swap will disable it if needed;
>> I'll change this in the next submission), so cache=writethrough vs.
>> writeback doesn't matter.
> 
> Can we really make it writeback unconditionally? For a passive mirror it
> probably doesn't make a difference, but what happens when the user stops
> the mirroring and switches to the target? Will it stay writeback?

bdrv_swap takes care of it just fine.

> The same goes for aio=native/threads. Probably not interesting for the
> mirror, but well afterwards.

Actually it is interesting for the mirror.  Passive mirroring can only
benefit from lower latency.

But yes, bdrv_swap would not copy this one.  Right now we always use the
same aio method as the source (at worst it is ignored by the protocol),
so it is not a problem.

> Another interesting thing is I/O throttling. The mirror currently
> implements rate limiting itself, but is there really a reason why we
> can't reuse regular I/O throttling on the target?

I thought about it, but I'm worried of what happens when I/O throttling
kicks in, and how it interacts with pause/resume/cancel.

>>>>> Having an all-in-one mirror command is a nice feature for HMP, but for
>>>>> QMP it's more like a design problem.
>>>>>
>>>>> Now I see you have called it drive-mirror
>>>>
>>>> I thought this was your idea. :)
>>>
>>> Hm, then probably we discussed similar things before. :-)
>>>
>>>>> , so that kind of implies that
>>>>> it's not the final blockdev-mirror but just a QMP version of a command
>>>>> primarily designed for HMP. As such this restricted functionality may be
>>>>> acceptable, but it's not like everything is already perfect and there's
>>>>> no room for discussion.
>>>>
>>>> We keep going back to the same point that we do not have -blockdev, but
>>>> it's becoming a bit frustrating to always rehash this same point...
>>>
>>> The question is whether we need it at all. We do have a drive_add
>>> if=none, and for creating a mirror target that should actually be enough.
>>
>> But not for creating images.  That would require qemu-img invocation.
> 
> Yeah, either qemu-img or another monitor command. I believe that in
> practice libvirt will do this anyway if this is the only way to specify
> image creation options.

Playing devil's advocate because you've almost convinced me, we have the
same problem for blockdev-snapshot-sync.  Now drive-mirror is a bit
different because you can use it to "reshape" an image to something
else, but the same could be done with snapshot + streaming in many cases.

>> If you're okay with always using an existing image in the QMP case (and
>> moving image creation to the HMP implementation), we can do it.  But I'm
>> not sure I like it, I think it's excessive in the other direction.
> 
> If you think it's helpful, we could make it optional and have a mode
> 'blockdev' where you don't specify a file name but a blockdev name. But
> this is an approach that feels a bit HMPish...

I think having a few limited knobs for image creation make some sense
(not all QMP clients need to be as sophisticated as libvirt), but that's
actually an interesting idea (as it is in general to piggyback on
drive_add).

Still, it leaves something to be desired.  It's not that it feels
HMP-ish, it's that it's overloading target a bit too much.  I would
prefer to keep drive-mirror for simple clients, and have a separate
blockdev-mirror that must have a blockdev target.  But doing the same
with blockdev-snapshot-sync will always look like duct-tape, because the
blockdev name is already taken. :(  Man, sometimes it feels like we're
not getting one thing right.

Paolo

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 28/47] qmp: add drive-mirror command
  2012-07-31 10:51             ` Paolo Bonzini
@ 2012-07-31 11:13               ` Kevin Wolf
  2012-07-31 11:25                 ` Paolo Bonzini
  0 siblings, 1 reply; 136+ messages in thread
From: Kevin Wolf @ 2012-07-31 11:13 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, eblake, qemu-devel, stefanha

Am 31.07.2012 12:51, schrieb Paolo Bonzini:
> Il 31/07/2012 12:25, Kevin Wolf ha scritto:
>>> Yeah, but do you really care about for example io=threads vs. io=native?
>>>  The only interesting one is cache=unsafe; the mirror should enable
>>> writeback caching on the target (bdrv_swap will disable it if needed;
>>> I'll change this in the next submission), so cache=writethrough vs.
>>> writeback doesn't matter.
>>
>> Can we really make it writeback unconditionally? For a passive mirror it
>> probably doesn't make a difference, but what happens when the user stops
>> the mirroring and switches to the target? Will it stay writeback?
> 
> bdrv_swap takes care of it just fine.

Ah, the switch uses bdrv_swap? Then that one is fine indeed.

>> The same goes for aio=native/threads. Probably not interesting for the
>> mirror, but well afterwards.
> 
> Actually it is interesting for the mirror.  Passive mirroring can only
> benefit from lower latency.
> 
> But yes, bdrv_swap would not copy this one.  Right now we always use the
> same aio method as the source (at worst it is ignored by the protocol),
> so it is not a problem.

Fair enough, though there may be cases where you'd really want to
switch, like migrating from a block device to a file on NFS.

>> Another interesting thing is I/O throttling. The mirror currently
>> implements rate limiting itself, but is there really a reason why we
>> can't reuse regular I/O throttling on the target?
> 
> I thought about it, but I'm worried of what happens when I/O throttling
> kicks in, and how it interacts with pause/resume/cancel.

bdrv_co_write won't return until the request has been throttled, so it
should be mostly transparent. The effect that it could have is that
pausing the mirror could take a little bit longer to complete (though
not much, as there is only one mirror request at the same time). But
iirc, pausing a block job was async anyway.

Any other aspect I'm missing?

>>>>>> Having an all-in-one mirror command is a nice feature for HMP, but for
>>>>>> QMP it's more like a design problem.
>>>>>>
>>>>>> Now I see you have called it drive-mirror
>>>>>
>>>>> I thought this was your idea. :)
>>>>
>>>> Hm, then probably we discussed similar things before. :-)
>>>>
>>>>>> , so that kind of implies that
>>>>>> it's not the final blockdev-mirror but just a QMP version of a command
>>>>>> primarily designed for HMP. As such this restricted functionality may be
>>>>>> acceptable, but it's not like everything is already perfect and there's
>>>>>> no room for discussion.
>>>>>
>>>>> We keep going back to the same point that we do not have -blockdev, but
>>>>> it's becoming a bit frustrating to always rehash this same point...
>>>>
>>>> The question is whether we need it at all. We do have a drive_add
>>>> if=none, and for creating a mirror target that should actually be enough.
>>>
>>> But not for creating images.  That would require qemu-img invocation.
>>
>> Yeah, either qemu-img or another monitor command. I believe that in
>> practice libvirt will do this anyway if this is the only way to specify
>> image creation options.
> 
> Playing devil's advocate because you've almost convinced me, we have the
> same problem for blockdev-snapshot-sync.  Now drive-mirror is a bit
> different because you can use it to "reshape" an image to something
> else, but the same could be done with snapshot + streaming in many cases.

Yes, blockdev-snapshot-sync is more or less the same case. We were aware
from the beginning that it's not the right command, but apparently
didn't think of drive_add.

>>> If you're okay with always using an existing image in the QMP case (and
>>> moving image creation to the HMP implementation), we can do it.  But I'm
>>> not sure I like it, I think it's excessive in the other direction.
>>
>> If you think it's helpful, we could make it optional and have a mode
>> 'blockdev' where you don't specify a file name but a blockdev name. But
>> this is an approach that feels a bit HMPish...
> 
> I think having a few limited knobs for image creation make some sense
> (not all QMP clients need to be as sophisticated as libvirt), but that's
> actually an interesting idea (as it is in general to piggyback on
> drive_add).
> 
> Still, it leaves something to be desired.  It's not that it feels
> HMP-ish, it's that it's overloading target a bit too much.  I would
> prefer to keep drive-mirror for simple clients, and have a separate
> blockdev-mirror that must have a blockdev target.  But doing the same
> with blockdev-snapshot-sync will always look like duct-tape, because the
> blockdev name is already taken. :(  Man, sometimes it feels like we're
> not getting one thing right.

blockdev-snapshot isn't taken yet. However, having the two side by side
would imply that blockdev-snapshot is async, which I believe is
currently not the most urgent of our concerns...

Or actually, it might not even matter any more, because the thing that
really takes some time is creating and opening the image. Once you have
the blockdev, there's no point in making things async any more.

Kevin

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 28/47] qmp: add drive-mirror command
  2012-07-31 11:13               ` Kevin Wolf
@ 2012-07-31 11:25                 ` Paolo Bonzini
  2012-07-31 12:17                   ` Kevin Wolf
  0 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-31 11:25 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, eblake, qemu-devel, stefanha

Il 31/07/2012 13:13, Kevin Wolf ha scritto:
> Am 31.07.2012 12:51, schrieb Paolo Bonzini:
>> Il 31/07/2012 12:25, Kevin Wolf ha scritto:
>>>> Yeah, but do you really care about for example io=threads vs. io=native?
>>>>  The only interesting one is cache=unsafe; the mirror should enable
>>>> writeback caching on the target (bdrv_swap will disable it if needed;
>>>> I'll change this in the next submission), so cache=writethrough vs.
>>>> writeback doesn't matter.
>>>
>>> Can we really make it writeback unconditionally? For a passive mirror it
>>> probably doesn't make a difference, but what happens when the user stops
>>> the mirroring and switches to the target? Will it stay writeback?
>>
>> bdrv_swap takes care of it just fine.
> 
> Ah, the switch uses bdrv_swap? Then that one is fine indeed.
> 
>>> The same goes for aio=native/threads. Probably not interesting for the
>>> mirror, but well afterwards.
>>
>> Actually it is interesting for the mirror.  Passive mirroring can only
>> benefit from lower latency.
>>
>> But yes, bdrv_swap would not copy this one.  Right now we always use the
>> same aio method as the source (at worst it is ignored by the protocol),
>> so it is not a problem.
> 
> Fair enough, though there may be cases where you'd really want to
> switch, like migrating from a block device to a file on NFS.
> 
>>> Another interesting thing is I/O throttling. The mirror currently
>>> implements rate limiting itself, but is there really a reason why we
>>> can't reuse regular I/O throttling on the target?
>>
>> I thought about it, but I'm worried of what happens when I/O throttling
>> kicks in, and how it interacts with pause/resume/cancel.
> 
> bdrv_co_write won't return until the request has been throttled, so it
> should be mostly transparent.

At the end of this series I use bdrv_aio_readv/writev.

> The effect that it could have is that
> pausing the mirror could take a little bit longer to complete (though
> not much, as there is only one mirror request at the same time).

Not anymore. :)

> But iirc, pausing a block job was async anyway.

Yes, it is, and job->busy nicely abstracts the hairy parts.

> Any other aspect I'm missing?

No, that should be ok.  Though I'm not sure if it's so useful to apply
throttling on the target.  It's more useful to throttle the source
(making writes slower than reads will help the job's convergence) and
copy at full steam to the target.

>>>> If you're okay with always using an existing image in the QMP case (and
>>>> moving image creation to the HMP implementation), we can do it.  But I'm
>>>> not sure I like it, I think it's excessive in the other direction.
>>>
>>> If you think it's helpful, we could make it optional and have a mode
>>> 'blockdev' where you don't specify a file name but a blockdev name. But
>>> this is an approach that feels a bit HMPish...
>>
>> I think having a few limited knobs for image creation make some sense
>> (not all QMP clients need to be as sophisticated as libvirt), but that's
>> actually an interesting idea (as it is in general to piggyback on
>> drive_add).
>>
>> Still, it leaves something to be desired.  It's not that it feels
>> HMP-ish, it's that it's overloading target a bit too much.  I would
>> prefer to keep drive-mirror for simple clients, and have a separate
>> blockdev-mirror that must have a blockdev target.  But doing the same
>> with blockdev-snapshot-sync will always look like duct-tape, because the
>> blockdev name is already taken. :(  Man, sometimes it feels like we're
>> not getting one thing right.
> 
> blockdev-snapshot isn't taken yet. However, having the two side by side
> would imply that blockdev-snapshot is async, which I believe is
> currently not the most urgent of our concerns...
> 
> Or actually, it might not even matter any more, because the thing that
> really takes some time is creating and opening the image. Once you have
> the blockdev, there's no point in making things async any more.

Right, blockdev-snapshot would really be just a bdrv_append operation.
/me smiles. :)  So let's keep drive-mirror as is, and later add
blockdev-mirror.

Paolo

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 28/47] qmp: add drive-mirror command
  2012-07-31 11:25                 ` Paolo Bonzini
@ 2012-07-31 12:17                   ` Kevin Wolf
  2012-07-31 12:52                     ` Paolo Bonzini
  0 siblings, 1 reply; 136+ messages in thread
From: Kevin Wolf @ 2012-07-31 12:17 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, eblake, qemu-devel, stefanha

Am 31.07.2012 13:25, schrieb Paolo Bonzini:
> Il 31/07/2012 13:13, Kevin Wolf ha scritto:
>> Am 31.07.2012 12:51, schrieb Paolo Bonzini:
>>> Il 31/07/2012 12:25, Kevin Wolf ha scritto:
>>>> Another interesting thing is I/O throttling. The mirror currently
>>>> implements rate limiting itself, but is there really a reason why we
>>>> can't reuse regular I/O throttling on the target?
>>>
>>> I thought about it, but I'm worried of what happens when I/O throttling
>>> kicks in, and how it interacts with pause/resume/cancel.
>>
>> bdrv_co_write won't return until the request has been throttled, so it
>> should be mostly transparent.
> 
> At the end of this series I use bdrv_aio_readv/writev.
> 
>> The effect that it could have is that
>> pausing the mirror could take a little bit longer to complete (though
>> not much, as there is only one mirror request at the same time).
> 
> Not anymore. :)

Hm, I see. Makes it a bit more involved, but then the logic should be
almost the same as you already need to complete the mirror.

>> But iirc, pausing a block job was async anyway.
> 
> Yes, it is, and job->busy nicely abstracts the hairy parts.
> 
>> Any other aspect I'm missing?
> 
> No, that should be ok.  Though I'm not sure if it's so useful to apply
> throttling on the target.  It's more useful to throttle the source
> (making writes slower than reads will help the job's convergence) and
> copy at full steam to the target.

But doesn't the rate limiting of the mirror already throttle the target?
Which isn't too bad, because I think at least in the initial phase
you'll want to have both source and target throttled (later the target
is automatically throttled to the level of the source, except for bitmap
granularity artefacts).

>>>>> If you're okay with always using an existing image in the QMP case (and
>>>>> moving image creation to the HMP implementation), we can do it.  But I'm
>>>>> not sure I like it, I think it's excessive in the other direction.
>>>>
>>>> If you think it's helpful, we could make it optional and have a mode
>>>> 'blockdev' where you don't specify a file name but a blockdev name. But
>>>> this is an approach that feels a bit HMPish...
>>>
>>> I think having a few limited knobs for image creation make some sense
>>> (not all QMP clients need to be as sophisticated as libvirt), but that's
>>> actually an interesting idea (as it is in general to piggyback on
>>> drive_add).
>>>
>>> Still, it leaves something to be desired.  It's not that it feels
>>> HMP-ish, it's that it's overloading target a bit too much.  I would
>>> prefer to keep drive-mirror for simple clients, and have a separate
>>> blockdev-mirror that must have a blockdev target.  But doing the same
>>> with blockdev-snapshot-sync will always look like duct-tape, because the
>>> blockdev name is already taken. :(  Man, sometimes it feels like we're
>>> not getting one thing right.
>>
>> blockdev-snapshot isn't taken yet. However, having the two side by side
>> would imply that blockdev-snapshot is async, which I believe is
>> currently not the most urgent of our concerns...
>>
>> Or actually, it might not even matter any more, because the thing that
>> really takes some time is creating and opening the image. Once you have
>> the blockdev, there's no point in making things async any more.
> 
> Right, blockdev-snapshot would really be just a bdrv_append operation.
> /me smiles. :)  So let's keep drive-mirror as is, and later add
> blockdev-mirror.

Ok, that's fair enough.

Kevin

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 28/47] qmp: add drive-mirror command
  2012-07-31 12:17                   ` Kevin Wolf
@ 2012-07-31 12:52                     ` Paolo Bonzini
  0 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-07-31 12:52 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, eblake, qemu-devel, stefanha

Il 31/07/2012 14:17, Kevin Wolf ha scritto:
>> No, that should be ok.  Though I'm not sure if it's so useful to apply
>> throttling on the target.  It's more useful to throttle the source
>> (making writes slower than reads will help the job's convergence) and
>> copy at full steam to the target.
> 
> But doesn't the rate limiting of the mirror already throttle the target?

Of course whatever you throttle (any of job, source, target) will have
an effect on the other two as well.

IMO, the target is perhaps the least useful to throttle.  It is more
interesting to play with the source, because that's guest visible.
Slowing down the target, while letting the guest run at full speed is
unlikely to help convergence of the job.

On the other hand, the job and target speeds are really duplicates of
each other, so the job speed is really just as useless.

So it sounds like removing the job speed is a good idea.  If needed,
libvirt can implement it later with a named block device for the target
+ I/O throttling.

> Which isn't too bad, because I think at least in the initial phase
> you'll want to have both source and target throttled (later the target
> is automatically throttled to the level of the source, except for bitmap
> granularity artefacts).

The target is always throttled to the level of the source and vice
versa.  The target can never be written faster than you read the source;
and slowing down the target will keep buffers busy so you cannot read
more from the source.

Paolo

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 14/47] stream: add on-error argument
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 14/47] stream: add on-error argument Paolo Bonzini
@ 2012-07-31 18:40   ` Eric Blake
  2012-08-01 10:29   ` Kevin Wolf
  1 sibling, 0 replies; 136+ messages in thread
From: Eric Blake @ 2012-07-31 18:40 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 803 bytes --]

On 07/24/2012 05:03 AM, Paolo Bonzini wrote:
> This patch adds support for error management to streaming.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---

> +++ b/qapi-schema.json
> @@ -1656,17 +1656,23 @@
>  #
>  # @speed:  #optional the maximum speed, in bytes per second
>  #
> +# @on-error: #optional the action to take on an error (default report).
> +#            'stop' and 'enospc' can only be used if the block device
> +#            supports io-status (see BlockInfo).  Since 1.2.

>  #          If @speed is invalid, InvalidParameter
> +#          If @on_error is not supported, InvalidParameter

s/on_error/on-error/ to match the rest of the docs

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 06/47] qmp: add block-job-pause and block-job-resume
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 06/47] qmp: add block-job-pause and block-job-resume Paolo Bonzini
@ 2012-08-01  7:42   ` Kevin Wolf
  0 siblings, 0 replies; 136+ messages in thread
From: Kevin Wolf @ 2012-08-01  7:42 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, eblake, qemu-devel, stefanha

Am 24.07.2012 13:03, schrieb Paolo Bonzini:
> Add QMP commands matching the functionality.
> 
> Paused jobs cannot be canceled without first resuming them.  This
> ensures that I/O errors are never missed by management.  However, an
> optional force argument can be specified to allow that.  Right now,
> jobs do not see the difference between a forced and a normal resume.
> This can be changed in the future if needed.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

> +##
> +# @block-job-pause:
> +#
> +# Pause an active background block operation.
> +#
> +# This command returns immediately after marking the active background block
> +# operation for pausing.  It is an error to call this command if no
> +# operation is in progress.
> +#
> +# The operation will pause as soon as possible.  No event is emitted when
> +# the operation is actually paused.  Cancelling a paused job automatically
> +# resumes it.
> +#
> +# @device: the device name
> +#
> +# Returns: Nothing on success
> +#          If no background operation is active on this device, DeviceNotActive
> +#
> +# Since: 1.2
> +##
> +{ 'command': 'block-job-pause', 'data': { 'device': 'str' } }
> +
> +##
> +# @block-job-resume:
> +#
> +# Resume an active background block operation.
> +#
> +# This command returns immediately after resuming a paused background block
> +# operation for cancellation.  It is an error to call this command if no
> +# operation is in progress.
> +#
> +# @device: the device name
> +#
> +# Returns: Nothing on success
> +#          If no background operation is active on this device, DeviceNotActive
> +#          If cancellation already in progress, DeviceInUse
> +#
> +# Since: 1.2
> +##
> +{ 'command': 'block-job-resume', 'data': { 'device': 'str' } }

We should document what happens when you pause an already paused job, or
resume a running one. Currently, a second pause is completely ignored,
and a second resume is mostly ignored as well, but can shorten the
rate-limiting delay (the latter isn't worth to be documented, but if we
should be clear about whether it's allowed)

Kevin

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 11/47] block: reorganize io error code
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 11/47] block: reorganize io error code Paolo Bonzini
@ 2012-08-01  9:30   ` Kevin Wolf
  2012-08-01  9:46     ` Paolo Bonzini
  0 siblings, 1 reply; 136+ messages in thread
From: Kevin Wolf @ 2012-08-01  9:30 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, eblake, qemu-devel, stefanha

Am 24.07.2012 13:03, schrieb Paolo Bonzini:
> Move the common part of IDE/SCSI/virtio error handling to the block
> layer.  The new function bdrv_error_action subsumes all three of
> bdrv_emit_qmp_error_event, vm_stop, bdrv_iostatus_set_err.
> 
> The same scheme will be used for errors in block jobs.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  block.c         |   46 ++++++++++++++++++++++++++++++++++++++--------
>  block.h         |    5 +++--
>  hw/ide/core.c   |   20 +++++---------------
>  hw/scsi-disk.c  |   23 +++++++----------------
>  hw/virtio-blk.c |   19 +++++--------------
>  qemu-tool.c     |    6 ++++++
>  6 files changed, 64 insertions(+), 55 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 5cd3a4b..333a8fd 100644
> --- a/block.c
> +++ b/block.c
> @@ -29,6 +29,7 @@
>  #include "blockjob.h"
>  #include "module.h"
>  #include "qjson.h"
> +#include "sysemu.h"
>  #include "qemu-coroutine.h"
>  #include "qmp-commands.h"
>  #include "qemu-timer.h"
> @@ -1152,8 +1153,8 @@ void bdrv_set_dev_ops(BlockDriverState *bs, const BlockDevOps *ops,
>      }
>  }
>  
> -void bdrv_emit_qmp_error_event(const BlockDriverState *bdrv,
> -                               BlockErrorAction action, int is_read)
> +static void bdrv_emit_qmp_error_event(const BlockDriverState *bdrv,
> +                                      BlockErrorAction action, int is_read)
>  {
>      QObject *data;
>      const char *action_str;
> @@ -2147,6 +2148,39 @@ BlockdevOnError bdrv_get_on_error(BlockDriverState *bs, int is_read)
>      return is_read ? bs->on_read_error : bs->on_write_error;
>  }
>  
> +BlockErrorAction bdrv_get_error_action(BlockDriverState *bs, int is_read, int error)

Maybe bool is_read?

Kevin

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 12/47] block: sort BlockDeviceIoStatus errors by severity
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 12/47] block: sort BlockDeviceIoStatus errors by severity Paolo Bonzini
@ 2012-08-01  9:44   ` Paolo Bonzini
  2012-08-01  9:44   ` Kevin Wolf
  1 sibling, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-08-01  9:44 UTC (permalink / raw)
  Cc: kwolf, jcody, eblake, qemu-devel, stefanha

Il 24/07/2012 13:03, Paolo Bonzini ha scritto:
> This does not let a "failed" (EIO) status override a "nospace" status.
> When several concurrent asynchronous operations fail, management will
> always observe the most severe condition.

Patch dropped; Kevin noted on IRC that if you have both errors you need to
take separate actions to fix them.  So when one is fixed the requeued
request will always fail and it's not important which error is signaled
first.

Paolo

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 12/47] block: sort BlockDeviceIoStatus errors by severity
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 12/47] block: sort BlockDeviceIoStatus errors by severity Paolo Bonzini
  2012-08-01  9:44   ` Paolo Bonzini
@ 2012-08-01  9:44   ` Kevin Wolf
  1 sibling, 0 replies; 136+ messages in thread
From: Kevin Wolf @ 2012-08-01  9:44 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, eblake, qemu-devel, stefanha

Am 24.07.2012 13:03, schrieb Paolo Bonzini:
> This does not let a "failed" (EIO) status override a "nospace" status.
> When several concurrent asynchronous operations fail, management will
> always observe the most severe condition.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

As discussed in IRC, the commit message doesn't agree with the patch.
Probably the patch isn't needed at all because resuming after fixing the
first error will always give you the second one, and it doesn't really
matter which one you fix first.

Kevin

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 11/47] block: reorganize io error code
  2012-08-01  9:30   ` Kevin Wolf
@ 2012-08-01  9:46     ` Paolo Bonzini
  0 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-08-01  9:46 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, eblake, qemu-devel, stefanha

Il 01/08/2012 11:30, Kevin Wolf ha scritto:
> Maybe bool is_read?

I was mimicking existing code, but since we're touching this part it's
worth cleaning it up.  I'll add a patch before this one to convert the
existing occurrences of int is_read, and then use bool here too.

Paolo

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 13/47] block: introduce block job error
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 13/47] block: introduce block job error Paolo Bonzini
  2012-07-25 17:40   ` Eric Blake
@ 2012-08-01 10:14   ` Kevin Wolf
  2012-08-01 11:17     ` Paolo Bonzini
  1 sibling, 1 reply; 136+ messages in thread
From: Kevin Wolf @ 2012-08-01 10:14 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, eblake, qemu-devel, stefanha

Am 24.07.2012 13:03, schrieb Paolo Bonzini:
> The following behaviors are possible:
> 
> 'report': The behavior is the same as in 1.1.  An I/O error,
> respectively during a read or a write, will complete the job immediately
> with an error code.
> 
> 'ignore': An I/O error, respectively during a read or a write, will be
> ignored.  For streaming, the job will complete with an error and the
> backing file will be left in place.  For mirroring, the sector will be
> marked again as dirty and re-examined later.
> 
> 'stop': The job will be paused and the job iostatus will be set to
> failed or nospace, while the VM will keep running.  This can only be
> specified if the block device has rerror=stop and werror=stop or enospc.
> 
> 'enospc': Behaves as 'stop' for ENOSPC errors, 'report' for others.
> 
> In all cases, even for 'report', the I/O error is reported as a QMP
> event BLOCK_JOB_ERROR, with the same arguments as BLOCK_IO_ERROR.
> 
> It is possible that while stopping the VM a BLOCK_IO_ERROR event will be
> reported and will clobber the event from BLOCK_JOB_ERROR, or vice versa.
> This is not really avoidable since stopping the VM completes all pending
> I/O requests.  In fact, it is already possible now that a series of
> BLOCK_IO_ERROR events are reported with rerror=stop, because vm_stop
> calls bdrv_drain_all and this can generate further errors.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

If we want to switch to named target block devices later, it would
probably make sense to use the io_status of that block device rather
than adding it to the job.

Maybe what results is a duplication that can be tolerated, though.

> +BlockErrorAction block_job_error_action(BlockJob *job, BlockDriverState *bs,
> +                                        BlockdevOnError on_err,
> +                                        int is_read, int error)
> +{
> +    BlockErrorAction action;
> +
> +    switch (on_err) {
> +    case BLOCKDEV_ON_ERROR_ENOSPC:
> +        action = (error == ENOSPC) ? BDRV_ACTION_STOP : BDRV_ACTION_REPORT;
> +        break;
> +    case BLOCKDEV_ON_ERROR_STOP:
> +        action = BDRV_ACTION_STOP;
> +        break;
> +    case BLOCKDEV_ON_ERROR_REPORT:
> +        action = BDRV_ACTION_REPORT;
> +        break;
> +    case BLOCKDEV_ON_ERROR_IGNORE:
> +        action = BDRV_ACTION_IGNORE;
> +        break;
> +    default:
> +        abort();
> +    }
> +    bdrv_emit_qmp_error_event(job->bs, QEVENT_BLOCK_JOB_ERROR, action, is_read);
> +    if (action == BDRV_ACTION_STOP) {
> +        block_job_pause(job);
> +        if (bs == job->bs) {
> +            block_job_iostatus_set_err(job, error);
> +        } else {
> +            bdrv_iostatus_set_err(bs, error);
> +        }

However, so that everything just falls into place once we make the
target block device visible, I'd make the bdrv_iostatus_set_err() call
unconditional then.

Kevin

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 14/47] stream: add on-error argument
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 14/47] stream: add on-error argument Paolo Bonzini
  2012-07-31 18:40   ` Eric Blake
@ 2012-08-01 10:29   ` Kevin Wolf
  2012-08-01 11:11     ` Paolo Bonzini
  1 sibling, 1 reply; 136+ messages in thread
From: Kevin Wolf @ 2012-08-01 10:29 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, eblake, qemu-devel, stefanha

Am 24.07.2012 13:03, schrieb Paolo Bonzini:
> This patch adds support for error management to streaming.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  block/stream.c   |   28 +++++++++++++++++++++++++++-
>  block_int.h      |    3 ++-
>  blockdev.c       |   11 ++++++++---
>  hmp.c            |    3 ++-
>  qapi-schema.json |   10 ++++++++--
>  qmp-commands.hx  |    2 +-
>  6 files changed, 48 insertions(+), 9 deletions(-)
> 
> diff --git a/block/stream.c b/block/stream.c
> index b3ede44..03cae14 100644
> --- a/block/stream.c
> +++ b/block/stream.c
> @@ -31,6 +31,7 @@ typedef struct StreamBlockJob {
>      BlockJob common;
>      RateLimit limit;
>      BlockDriverState *base;
> +    BlockdevOnError on_error;
>      char backing_file_id[1024];
>  } StreamBlockJob;
>  
> @@ -78,6 +79,7 @@ static void coroutine_fn stream_run(void *opaque)
>      BlockDriverState *bs = s->common.bs;
>      BlockDriverState *base = s->base;
>      int64_t sector_num, end;
> +    int error = 0;
>      int ret = 0;
>      int n = 0;
>      void *buf;
> @@ -136,7 +138,19 @@ wait:
>              ret = stream_populate(bs, sector_num, n, buf);
>          }
>          if (ret < 0) {
> -            break;
> +            BlockErrorAction action =
> +                block_job_error_action(&s->common, s->common.bs, s->on_error,
> +                                       true, -ret);
> +            if (action == BDRV_ACTION_STOP) {
> +                n = 0;
> +                continue;
> +            }
> +            if (error == 0) {
> +                error = ret;
> +            }

I'm not sure we should return an error at the end of the job for
BDRV_ACTION_IGNORE. If it's used for guest block devices, there's no
sign of an error either (except the guest reading garbage, of course).
But it's hard to discuss a feature for which there is no clear use case
anyway... Maybe it's something you could use to rescue what's still
accessible on a corrupted image or so.

In any case we should document it somewhere. Your commit message in
patch 13 probably belongs somewhere in the docs.

>  void stream_start(BlockDriverState *bs, BlockDriverState *base,
>                    const char *base_id, int64_t speed,
> +                  BlockdevOnError on_error,
>                    BlockDriverCompletionFunc *cb,
>                    void *opaque, Error **errp)
>  {
>      StreamBlockJob *s;
>  
> +    if ((on_error == BLOCKDEV_ON_ERROR_STOP ||
> +         on_error == BLOCKDEV_ON_ERROR_ENOSPC) &&
> +        !bdrv_iostatus_is_enabled(bs)) {
> +        error_set(errp, QERR_INVALID_PARAMETER, "on-error");
> +        return;
> +    }

Hm, this is an interesting one. bdrv_iostatus_is_enabled() returns true
for a block device that is (or was once) attached to virtio-blk, IDE or
scsi-disk. Which made sense so far because only these devices would
actually set the status.

Now with block jobs, we have other places that can set the status. And
we have images that don't belong to any device, but can still get errors
(mirror target). Maybe it would make sense to just enable the iostatus
here instead of failing?

Kevin

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 17/47] qemu-iotests: add tests for streaming error handling
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 17/47] qemu-iotests: add tests for streaming error handling Paolo Bonzini
@ 2012-08-01 10:43   ` Kevin Wolf
  2012-08-01 11:09     ` Paolo Bonzini
  0 siblings, 1 reply; 136+ messages in thread
From: Kevin Wolf @ 2012-08-01 10:43 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, eblake, qemu-devel, stefanha

Am 24.07.2012 13:03, schrieb Paolo Bonzini:
> Add a test for each of report/ignore/stop.  The tests use blkdebug
> to generate an error in the middle of a script.  The error is
> recoverable (once = "on") so that we can test resuming a job after
> stopping for an error.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  tests/qemu-iotests/030        |  138 +++++++++++++++++++++++++++++++++++++++++
>  tests/qemu-iotests/group      |    2 +-
>  tests/qemu-iotests/iotests.py |    7 +++
>  3 files changed, 146 insertions(+), 1 deletion(-)
> 
> diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
> index 0163945..c65bf5e 100755
> --- a/tests/qemu-iotests/030
> +++ b/tests/qemu-iotests/030
> @@ -163,6 +163,144 @@ class TestSingleDrive(ImageStreamingTestCase):
>          result = self.vm.qmp('block-stream', device='nonexistent')
>          self.assert_qmp(result, 'error/class', 'DeviceNotFound')
>  
> +class TestErrors(ImageStreamingTestCase):
> +    image_len = 2 * 1024 * 1024 # MB
> +
> +    # this should match STREAM_BUFFER_SIZE/512 in block/stream.c
> +    STREAM_BUFFER_SIZE = 512 * 1024
> +
> +    def create_blkdebug_file(self, name, event, errno):
> +        file = open(name, 'w')
> +        file.write('''
> +[inject-error]
> +state = "1"
> +event = "%s"
> +errno = "%d"
> +immediately = "off"
> +once = "on"
> +sector = "%d"
> +
> +[set-state]
> +state = "1"
> +event = "%s"
> +new_state = "2"
> +
> +[set-state]
> +state = "2"
> +event = "%s"
> +new_state = "1"
> +''' % (event, errno, self.STREAM_BUFFER_SIZE / 512, event, event))
> +        file.close()
> +
> +    def setUp(self):
> +        self.blkdebug_file = backing_img + ".blkdebug"
> +        self.create_image(backing_img, TestErrors.image_len)
> +        self.create_blkdebug_file(self.blkdebug_file, "read_aio", 5)
> +        qemu_img('create', '-f', iotests.imgfmt,
> +                 '-o', 'backing_file=blkdebug:%s:%s,backing_fmt=raw'
> +                       % (self.blkdebug_file, backing_img),
> +                 test_img)
> +        self.vm = iotests.VM().add_drive(test_img)
> +        self.vm.launch()
> +
> +    def tearDown(self):
> +        self.vm.shutdown()
> +        os.remove(test_img)
> +        os.remove(backing_img)
> +        os.remove(self.blkdebug_file)
> +
> +    def test_report(self):
> +        self.assert_no_active_streams()
> +
> +        result = self.vm.qmp('block-stream', device='drive0')
> +        self.assert_qmp(result, 'return', {})
> +
> +        completed = False
> +        error = False
> +        while not completed:
> +            for event in self.vm.get_qmp_events(wait=True):
> +                if event['event'] == 'BLOCK_JOB_ERROR':
> +                    self.assert_qmp(event, 'data/device', 'drive0')
> +                    self.assert_qmp(event, 'data/operation', 'read')

data/action should be asserted as well (same in the other tests).

What about adding an enospc test as well, once with EIO and once with
ENOSPC?

Kevin

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 17/47] qemu-iotests: add tests for streaming error handling
  2012-08-01 10:43   ` Kevin Wolf
@ 2012-08-01 11:09     ` Paolo Bonzini
  0 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-08-01 11:09 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, eblake, qemu-devel, stefanha

Il 01/08/2012 12:43, Kevin Wolf ha scritto:
>> > +
>> > +    def test_report(self):
>> > +        self.assert_no_active_streams()
>> > +
>> > +        result = self.vm.qmp('block-stream', device='drive0')
>> > +        self.assert_qmp(result, 'return', {})
>> > +
>> > +        completed = False
>> > +        error = False
>> > +        while not completed:
>> > +            for event in self.vm.get_qmp_events(wait=True):
>> > +                if event['event'] == 'BLOCK_JOB_ERROR':
>> > +                    self.assert_qmp(event, 'data/device', 'drive0')
>> > +                    self.assert_qmp(event, 'data/operation', 'read')
> data/action should be asserted as well (same in the other tests).
> 
> What about adding an enospc test as well, once with EIO and once with
> ENOSPC?

Ok.

Paolo

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 14/47] stream: add on-error argument
  2012-08-01 10:29   ` Kevin Wolf
@ 2012-08-01 11:11     ` Paolo Bonzini
  2012-08-01 11:45       ` Kevin Wolf
  0 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-08-01 11:11 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, eblake, qemu-devel, stefanha

Il 01/08/2012 12:29, Kevin Wolf ha scritto:
>> > +    if ((on_error == BLOCKDEV_ON_ERROR_STOP ||
>> > +         on_error == BLOCKDEV_ON_ERROR_ENOSPC) &&
>> > +        !bdrv_iostatus_is_enabled(bs)) {
>> > +        error_set(errp, QERR_INVALID_PARAMETER, "on-error");
>> > +        return;
>> > +    }
> Hm, this is an interesting one. bdrv_iostatus_is_enabled() returns true
> for a block device that is (or was once) attached to virtio-blk, IDE or
> scsi-disk. Which made sense so far because only these devices would
> actually set the status.
> 
> Now with block jobs, we have other places that can set the status. And
> we have images that don't belong to any device, but can still get errors
> (mirror target). Maybe it would make sense to just enable the iostatus
> here instead of failing?

I'm not sure what would happen, so I preferred to be safe.

The right solution would be "support iostatus in sd and friends, and
drop bdrv_iostatus_is_enabled altogether", of course...

Paolo

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 13/47] block: introduce block job error
  2012-08-01 10:14   ` Kevin Wolf
@ 2012-08-01 11:17     ` Paolo Bonzini
  2012-08-01 11:49       ` Kevin Wolf
  0 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-08-01 11:17 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, eblake, qemu-devel, stefanha

Il 01/08/2012 12:14, Kevin Wolf ha scritto:
>> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> If we want to switch to named target block devices later, it would
> probably make sense to use the io_status of that block device rather
> than adding it to the job.
> 
> Maybe what results is a duplication that can be tolerated, though.

We are probably thinking of two opposite implementations.

You are thinking:

- errors in streaming, or in the mirroring source go to the block device

- errors in the mirroring target go to the block job

What I implemented is:

- errors in streaming, or in the mirroring source go to the block job

- errors in the mirroring target go to the target block device (which as
of this series could be inspected with query-block-jobs).


The reason is that an error in streaming or in the mirroring source does
not stop the VM.  A hypothetical management that polled for errors with
"info block" would see a mismatch between the error state ("failed") and
the VM RunState ("running").

So this is already ready for making the target block device visible.

The real question is: if I remove the possibility to inspect the (so far
anonymous) target device with query-block-jobs, how do you read the
status of the target device?...

Paolo

>> +    }
>> +    bdrv_emit_qmp_error_event(job->bs, QEVENT_BLOCK_JOB_ERROR, action, is_read);
>> +    if (action == BDRV_ACTION_STOP) {
>> +        block_job_pause(job);
>> +        if (bs == job->bs) {
>> +            block_job_iostatus_set_err(job, error);
>> +        } else {
>> +            bdrv_iostatus_set_err(bs, error);
>> +        }
> 
> However, so that everything just falls into place once we make the
> target block device visible, I'd make the bdrv_iostatus_set_err() call
> unconditional then.


Paolo

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 14/47] stream: add on-error argument
  2012-08-01 11:11     ` Paolo Bonzini
@ 2012-08-01 11:45       ` Kevin Wolf
  0 siblings, 0 replies; 136+ messages in thread
From: Kevin Wolf @ 2012-08-01 11:45 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, eblake, qemu-devel, stefanha

Am 01.08.2012 13:11, schrieb Paolo Bonzini:
> Il 01/08/2012 12:29, Kevin Wolf ha scritto:
>>>> +    if ((on_error == BLOCKDEV_ON_ERROR_STOP ||
>>>> +         on_error == BLOCKDEV_ON_ERROR_ENOSPC) &&
>>>> +        !bdrv_iostatus_is_enabled(bs)) {
>>>> +        error_set(errp, QERR_INVALID_PARAMETER, "on-error");
>>>> +        return;
>>>> +    }
>> Hm, this is an interesting one. bdrv_iostatus_is_enabled() returns true
>> for a block device that is (or was once) attached to virtio-blk, IDE or
>> scsi-disk. Which made sense so far because only these devices would
>> actually set the status.
>>
>> Now with block jobs, we have other places that can set the status. And
>> we have images that don't belong to any device, but can still get errors
>> (mirror target). Maybe it would make sense to just enable the iostatus
>> here instead of failing?
> 
> I'm not sure what would happen, so I preferred to be safe.
> 
> The right solution would be "support iostatus in sd and friends, and
> drop bdrv_iostatus_is_enabled altogether", of course...

Support it _for_ sd and friends, but support it _in_ the block layer.
What's missing before this can happen is a VMState for the block layer
so that requeued requests can be migrated. Breaks migration
compatibility, obviously.

Kevin

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 13/47] block: introduce block job error
  2012-08-01 11:17     ` Paolo Bonzini
@ 2012-08-01 11:49       ` Kevin Wolf
  2012-08-01 12:09         ` Paolo Bonzini
  0 siblings, 1 reply; 136+ messages in thread
From: Kevin Wolf @ 2012-08-01 11:49 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, eblake, qemu-devel, stefanha

Am 01.08.2012 13:17, schrieb Paolo Bonzini:
> Il 01/08/2012 12:14, Kevin Wolf ha scritto:
>>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> If we want to switch to named target block devices later, it would
>> probably make sense to use the io_status of that block device rather
>> than adding it to the job.
>>
>> Maybe what results is a duplication that can be tolerated, though.
> 
> We are probably thinking of two opposite implementations.
> 
> You are thinking:
> 
> - errors in streaming, or in the mirroring source go to the block device
> 
> - errors in the mirroring target go to the block job
> 
> What I implemented is:
> 
> - errors in streaming, or in the mirroring source go to the block job
> 
> - errors in the mirroring target go to the target block device (which as
> of this series could be inspected with query-block-jobs).

Ah, yes, I misunderstood.

> The reason is that an error in streaming or in the mirroring source does
> not stop the VM.  A hypothetical management that polled for errors with
> "info block" would see a mismatch between the error state ("failed") and
> the VM RunState ("running").
> 
> So this is already ready for making the target block device visible.
> 
> The real question is: if I remove the possibility to inspect the (so far
> anonymous) target device with query-block-jobs, how do you read the
> status of the target device?...

You don't? :-)

Maybe we should use named block devices from the beginning.

Kevin

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 13/47] block: introduce block job error
  2012-08-01 11:49       ` Kevin Wolf
@ 2012-08-01 12:09         ` Paolo Bonzini
  2012-08-01 12:23           ` Kevin Wolf
  0 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-08-01 12:09 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, eblake, qemu-devel, stefanha

Il 01/08/2012 13:49, Kevin Wolf ha scritto:
> > The real question is: if I remove the possibility to inspect the (so far
> > anonymous) target device with query-block-jobs, how do you read the
> > status of the target device?...
> 
> You don't? :-)

That's a possibility. :)

You can just report it in the block job.  It's more consistent with the
fact that you got a BLOCK_JOB_ERROR and not a BLOCK_IO_ERROR.  So I
would do:

+    bdrv_emit_qmp_error_event(job->bs, QEVENT_BLOCK_JOB_ERROR,
+                              action, is_read);
+    if (action == BDRV_ACTION_STOP) {
+        block_job_pause(job);
+        block_job_iostatus_set_err(job, error);
+        if (bs != job->bs) {
+            bdrv_iostatus_set_err(bs, error);
+        }
+    }

where the bdrv_iostatus_set_err is mostly to "prepare for the future"
usage of named block devices.

As you said for ENOSPC vs. EIO, management must be ready to retry
multiple times, if it has only the final state at its disposal.

On the other hand, if you see the exact sequence of BLOCK_IO_ERROR vs.
BLOCK_JOB_ERROR you know exactly how the error happened and you can fix it.

> Maybe we should use named block devices from the beginning.

Hmm, but I'm a bit wary of introducing such a big change now.  We know
what it makes nicer, but we don't know of anything irremediably broken
without them, and we haven't thought enough of any warts it introduces.

Paolo

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 13/47] block: introduce block job error
  2012-08-01 12:09         ` Paolo Bonzini
@ 2012-08-01 12:23           ` Kevin Wolf
  2012-08-01 12:30             ` Paolo Bonzini
  0 siblings, 1 reply; 136+ messages in thread
From: Kevin Wolf @ 2012-08-01 12:23 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, eblake, qemu-devel, stefanha

Am 01.08.2012 14:09, schrieb Paolo Bonzini:
> Il 01/08/2012 13:49, Kevin Wolf ha scritto:
>>> The real question is: if I remove the possibility to inspect the (so far
>>> anonymous) target device with query-block-jobs, how do you read the
>>> status of the target device?...
>>
>> You don't? :-)
> 
> That's a possibility. :)
> 
> You can just report it in the block job.  It's more consistent with the
> fact that you got a BLOCK_JOB_ERROR and not a BLOCK_IO_ERROR.  So I
> would do:
> 
> +    bdrv_emit_qmp_error_event(job->bs, QEVENT_BLOCK_JOB_ERROR,
> +                              action, is_read);

Isn't job->bs the source?

Also, if you miss the event (e.g. libvirt crashed), can you still tell
which bs caused the error? Do we need another BlockJobInfo field for the
name (*cough* ;-)) of the failed bs?

If I understand it right, error reporting on the source and on the
target would be completely symmetrical then, which I think is a good thing.

job->bs makes one image special, which isn't really useful for a generic
interface. So we should keep the fact that it exists internal and get
rid of it sometime.

> +    if (action == BDRV_ACTION_STOP) {
> +        block_job_pause(job);
> +        block_job_iostatus_set_err(job, error);
> +        if (bs != job->bs) {
> +            bdrv_iostatus_set_err(bs, error);
> +        }
> +    }
> 
> where the bdrv_iostatus_set_err is mostly to "prepare for the future"
> usage of named block devices.

Again, I'd make it unconditional to get symmetric behaviour.

> As you said for ENOSPC vs. EIO, management must be ready to retry
> multiple times, if it has only the final state at its disposal.
> 
> On the other hand, if you see the exact sequence of BLOCK_IO_ERROR vs.
> BLOCK_JOB_ERROR you know exactly how the error happened and you can fix it.
> 
>> Maybe we should use named block devices from the beginning.
> 
> Hmm, but I'm a bit wary of introducing such a big change now.  We know
> what it makes nicer, but we don't know of anything irremediably broken
> without them, and we haven't thought enough of any warts it introduces.

On the one hand, I can understand your concerns, but on the other hand,
introducing an API now and then making such a big change afterwards
scares me much more.

Kevin

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 13/47] block: introduce block job error
  2012-08-01 12:23           ` Kevin Wolf
@ 2012-08-01 12:30             ` Paolo Bonzini
  2012-08-01 13:09               ` Kevin Wolf
  0 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-08-01 12:30 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, eblake, qemu-devel, stefanha

Il 01/08/2012 14:23, Kevin Wolf ha scritto:
> Am 01.08.2012 14:09, schrieb Paolo Bonzini:
>> Il 01/08/2012 13:49, Kevin Wolf ha scritto:
>>>> The real question is: if I remove the possibility to inspect the (so far
>>>> anonymous) target device with query-block-jobs, how do you read the
>>>> status of the target device?...
>>>
>>> You don't? :-)
>>
>> That's a possibility. :)
>>
>> You can just report it in the block job.  It's more consistent with the
>> fact that you got a BLOCK_JOB_ERROR and not a BLOCK_IO_ERROR.  So I
>> would do:
>>
>> +    bdrv_emit_qmp_error_event(job->bs, QEVENT_BLOCK_JOB_ERROR,
>> +                              action, is_read);
> 
> Isn't job->bs the source?

It's not about source vs. target, it's about what block device the _job_
is attached too.  The source is always going to be special because you
must do "block-job-resume source".

is_read tells you where the error was.

> Also, if you miss the event (e.g. libvirt crashed), can you still tell
> which bs caused the error?

No, but how is it different from "can you still tell which bs in the
chain caused the error"?  Which you cannot tell anyway even from the
event parameters we have had so far.

> Do we need another BlockJobInfo field for the
> name (*cough* ;-)) of the failed bs?
> 
> If I understand it right, error reporting on the source and on the
> target would be completely symmetrical then, which I think is a good thing.

Yeah, it would, but job->bs _is_ special anyway.

> job->bs makes one image special, which isn't really useful for a generic
> interface. So we should keep the fact that it exists internal and get
> rid of it sometime.
> 
>> +    if (action == BDRV_ACTION_STOP) {
>> +        block_job_pause(job);
>> +        block_job_iostatus_set_err(job, error);
>> +        if (bs != job->bs) {
>> +            bdrv_iostatus_set_err(bs, error);
>> +        }
>> +    }
>>
>> where the bdrv_iostatus_set_err is mostly to "prepare for the future"
>> usage of named block devices.
> 
> Again, I'd make it unconditional to get symmetric behaviour.

Not possible, because existing clients may expect "iostatus:
{nospace,failed}" to imply a runstate of "not running (i/o error)".

>> Hmm, but I'm a bit wary of introducing such a big change now.  We know
>> what it makes nicer, but we don't know of anything irremediably broken
>> without them, and we haven't thought enough of any warts it introduces.
> 
> On the one hand, I can understand your concerns, but on the other hand,
> introducing an API now and then making such a big change afterwards
> scares me much more.

One example of the doubts I have: is iostatus a BlockBackend or a
BlockDriverState thing?  If it a BlockBackend thing, does the target
device have an iostatus at all?

Paolo

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 13/47] block: introduce block job error
  2012-08-01 12:30             ` Paolo Bonzini
@ 2012-08-01 13:09               ` Kevin Wolf
  2012-08-01 13:21                 ` Paolo Bonzini
  0 siblings, 1 reply; 136+ messages in thread
From: Kevin Wolf @ 2012-08-01 13:09 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, eblake, qemu-devel, stefanha

Am 01.08.2012 14:30, schrieb Paolo Bonzini:
> Il 01/08/2012 14:23, Kevin Wolf ha scritto:
>> Am 01.08.2012 14:09, schrieb Paolo Bonzini:
>>> Il 01/08/2012 13:49, Kevin Wolf ha scritto:
>>>>> The real question is: if I remove the possibility to inspect the (so far
>>>>> anonymous) target device with query-block-jobs, how do you read the
>>>>> status of the target device?...
>>>>
>>>> You don't? :-)
>>>
>>> That's a possibility. :)
>>>
>>> You can just report it in the block job.  It's more consistent with the
>>> fact that you got a BLOCK_JOB_ERROR and not a BLOCK_IO_ERROR.  So I
>>> would do:
>>>
>>> +    bdrv_emit_qmp_error_event(job->bs, QEVENT_BLOCK_JOB_ERROR,
>>> +                              action, is_read);
>>
>> Isn't job->bs the source?
> 
> It's not about source vs. target, it's about what block device the _job_
> is attached too.  The source is always going to be special because you
> must do "block-job-resume source".

This whole concept of a block job being attached to a single bs is
flawed. Because in the general case it isn't. And probably it should
have been a BackgroundJob rather than a BlockJob from the beginning,
because I'm sure there are use cases outside the block layer as well.

We really manage to get every single point messed up. :-(

> is_read tells you where the error was.

Rather indirect, but yes, for the mirror it works.

>> Also, if you miss the event (e.g. libvirt crashed), can you still tell
>> which bs caused the error?
> 
> No, but how is it different from "can you still tell which bs in the
> chain caused the error"?  Which you cannot tell anyway even from the
> event parameters we have had so far.

It isn't different. We really should report the exact BDS. In practice,
management tools care about ENOSPC which is always the top-level BDS,
and call the admin for help in other cases. The admin can try manually
which file is at fault, but I suppose he would be very glad to be told
by the error message which file it is.

>> Do we need another BlockJobInfo field for the
>> name (*cough* ;-)) of the failed bs?
>>
>> If I understand it right, error reporting on the source and on the
>> target would be completely symmetrical then, which I think is a good thing.
> 
> Yeah, it would, but job->bs _is_ special anyway.
> 
>> job->bs makes one image special, which isn't really useful for a generic
>> interface. So we should keep the fact that it exists internal and get
>> rid of it sometime.
>>
>>> +    if (action == BDRV_ACTION_STOP) {
>>> +        block_job_pause(job);
>>> +        block_job_iostatus_set_err(job, error);
>>> +        if (bs != job->bs) {
>>> +            bdrv_iostatus_set_err(bs, error);
>>> +        }
>>> +    }
>>>
>>> where the bdrv_iostatus_set_err is mostly to "prepare for the future"
>>> usage of named block devices.
>>
>> Again, I'd make it unconditional to get symmetric behaviour.
> 
> Not possible, because existing clients may expect "iostatus:
> {nospace,failed}" to imply a runstate of "not running (i/o error)".

Did we make such guarantees? Does libvirt actually make use of it?

>>> Hmm, but I'm a bit wary of introducing such a big change now.  We know
>>> what it makes nicer, but we don't know of anything irremediably broken
>>> without them, and we haven't thought enough of any warts it introduces.
>>
>> On the one hand, I can understand your concerns, but on the other hand,
>> introducing an API now and then making such a big change afterwards
>> scares me much more.
> 
> One example of the doubts I have: is iostatus a BlockBackend or a
> BlockDriverState thing?  If it a BlockBackend thing, does the target
> device have an iostatus at all?

I think it's better to have it in BlockDriverState, but in my
imagination the target is a BlockBackend anyway.

Kevin

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 13/47] block: introduce block job error
  2012-08-01 13:09               ` Kevin Wolf
@ 2012-08-01 13:21                 ` Paolo Bonzini
  2012-08-01 14:01                   ` Kevin Wolf
  0 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-08-01 13:21 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, eblake, qemu-devel, stefanha

Il 01/08/2012 15:09, Kevin Wolf ha scritto:
>> It's not about source vs. target, it's about what block device the _job_
>> is attached too.  The source is always going to be special because you
>> must do "block-job-resume source".
> 
> This whole concept of a block job being attached to a single bs is
> flawed. Because in the general case it isn't. And probably it should
> have been a BackgroundJob rather than a BlockJob from the beginning,
> because I'm sure there are use cases outside the block layer as well.
> 
> We really manage to get every single point messed up. :-(

The balance between overengineering and messing up is unfortunately very
difficult to strike.  Yes, migration is a BackgroundJob in itself.  But
how far would we want to go specializing the monitor interface for
BackgroundJob subclasses, considering after all this is C?

If we had a tiny C core and everything above it written in {dynamic
language of the day} (I vote for Smalltalk, FYI) perhaps we could be
more cavalier in creating new classes and management abstractions, but
keeping it simple has its advantages.

No matter how much we messed things up, so far we decently managed to
refactor our way out (and screw up even more the next thing).

>> is_read tells you where the error was.
> 
> Rather indirect, but yes, for the mirror it works.
> 
>>> Also, if you miss the event (e.g. libvirt crashed), can you still tell
>>> which bs caused the error?
>>
>> No, but how is it different from "can you still tell which bs in the
>> chain caused the error"?  Which you cannot tell anyway even from the
>> event parameters we have had so far.
> 
> It isn't different. We really should report the exact BDS. In practice,
> management tools care about ENOSPC which is always the top-level BDS,
> and call the admin for help in other cases. The admin can try manually
> which file is at fault, but I suppose he would be very glad to be told
> by the error message which file it is.

Yes, and we can add it in the future as an additional argument to the
event and of query-{block,block-jobs} output.  At which point we are ok
with having the target iostatus in BlockJob.

>>> Do we need another BlockJobInfo field for the
>>> name (*cough* ;-)) of the failed bs?
>>>
>>> If I understand it right, error reporting on the source and on the
>>> target would be completely symmetrical then, which I think is a good thing.
>>
>> Yeah, it would, but job->bs _is_ special anyway.
>>
>>> job->bs makes one image special, which isn't really useful for a generic
>>> interface. So we should keep the fact that it exists internal and get
>>> rid of it sometime.
>>>
>>>> +    if (action == BDRV_ACTION_STOP) {
>>>> +        block_job_pause(job);
>>>> +        block_job_iostatus_set_err(job, error);
>>>> +        if (bs != job->bs) {
>>>> +            bdrv_iostatus_set_err(bs, error);
>>>> +        }
>>>> +    }
>>>>
>>>> where the bdrv_iostatus_set_err is mostly to "prepare for the future"
>>>> usage of named block devices.
>>>
>>> Again, I'd make it unconditional to get symmetric behaviour.
>>
>> Not possible, because existing clients may expect "iostatus:
>> {nospace,failed}" to imply a runstate of "not running (i/o error)".
> 
> Did we make such guarantees? Does libvirt actually make use of it?

I'm not sure libvirt relies on it, but I think it's a reasonable
expectation.

>>>> Hmm, but I'm a bit wary of introducing such a big change now.  We know
>>>> what it makes nicer, but we don't know of anything irremediably broken
>>>> without them, and we haven't thought enough of any warts it introduces.
>>>
>>> On the one hand, I can understand your concerns, but on the other hand,
>>> introducing an API now and then making such a big change afterwards
>>> scares me much more.
>>
>> One example of the doubts I have: is iostatus a BlockBackend or a
>> BlockDriverState thing?  If it a BlockBackend thing, does the target
>> device have an iostatus at all?
> 
> I think it's better to have it in BlockDriverState, but in my
> imagination the target is a BlockBackend anyway.

Great, 0/2. :)  My rhetorical questions didn't have the outcome I hoped for.

Paolo

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 13/47] block: introduce block job error
  2012-08-01 13:21                 ` Paolo Bonzini
@ 2012-08-01 14:01                   ` Kevin Wolf
  2012-08-01 14:34                     ` Paolo Bonzini
  0 siblings, 1 reply; 136+ messages in thread
From: Kevin Wolf @ 2012-08-01 14:01 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, eblake, qemu-devel, stefanha

Am 01.08.2012 15:21, schrieb Paolo Bonzini:
> Il 01/08/2012 15:09, Kevin Wolf ha scritto:
>>> It's not about source vs. target, it's about what block device the _job_
>>> is attached too.  The source is always going to be special because you
>>> must do "block-job-resume source".
>>
>> This whole concept of a block job being attached to a single bs is
>> flawed. Because in the general case it isn't. And probably it should
>> have been a BackgroundJob rather than a BlockJob from the beginning,
>> because I'm sure there are use cases outside the block layer as well.
>>
>> We really manage to get every single point messed up. :-(
> 
> The balance between overengineering and messing up is unfortunately very
> difficult to strike.  Yes, migration is a BackgroundJob in itself.  But
> how far would we want to go specializing the monitor interface for
> BackgroundJob subclasses, considering after all this is C?
> 
> If we had a tiny C core and everything above it written in {dynamic
> language of the day} (I vote for Smalltalk, FYI) perhaps we could be
> more cavalier in creating new classes and management abstractions, but
> keeping it simple has its advantages.
> 
> No matter how much we messed things up, so far we decently managed to
> refactor our way out (and screw up even more the next thing).

Well, the thing that disturbs me is having a BlockJob attached to a
single block device. Jobs can and do operate on multiple images and
should probably affect both BDSs the same. For example, bs->in_use
should probably be set for both (Haven't checked in detail yet, do we
have bugs here? Patch 27 doesn't call bdrv_set_in_use at least. It
starts to matter when we get named targets.)

So my wish was to get rid of the "main bs" in job->bs and move all
needed BDSs to the subclass. Which, I then noticed, leaves you with a
mere BackgroundJob.

>>> is_read tells you where the error was.
>>
>> Rather indirect, but yes, for the mirror it works.
>>
>>>> Also, if you miss the event (e.g. libvirt crashed), can you still tell
>>>> which bs caused the error?
>>>
>>> No, but how is it different from "can you still tell which bs in the
>>> chain caused the error"?  Which you cannot tell anyway even from the
>>> event parameters we have had so far.
>>
>> It isn't different. We really should report the exact BDS. In practice,
>> management tools care about ENOSPC which is always the top-level BDS,
>> and call the admin for help in other cases. The admin can try manually
>> which file is at fault, but I suppose he would be very glad to be told
>> by the error message which file it is.
> 
> Yes, and we can add it in the future as an additional argument to the
> event and of query-{block,block-jobs} output.  At which point we are ok
> with having the target iostatus in BlockJob.

Yup, target iostatus along with an ID of the BDS (QOM path? Or will all
of them be named eventually?)

>>>> Do we need another BlockJobInfo field for the
>>>> name (*cough* ;-)) of the failed bs?
>>>>
>>>> If I understand it right, error reporting on the source and on the
>>>> target would be completely symmetrical then, which I think is a good thing.
>>>
>>> Yeah, it would, but job->bs _is_ special anyway.
>>>
>>>> job->bs makes one image special, which isn't really useful for a generic
>>>> interface. So we should keep the fact that it exists internal and get
>>>> rid of it sometime.
>>>>
>>>>> +    if (action == BDRV_ACTION_STOP) {
>>>>> +        block_job_pause(job);
>>>>> +        block_job_iostatus_set_err(job, error);
>>>>> +        if (bs != job->bs) {
>>>>> +            bdrv_iostatus_set_err(bs, error);
>>>>> +        }
>>>>> +    }
>>>>>
>>>>> where the bdrv_iostatus_set_err is mostly to "prepare for the future"
>>>>> usage of named block devices.
>>>>
>>>> Again, I'd make it unconditional to get symmetric behaviour.
>>>
>>> Not possible, because existing clients may expect "iostatus:
>>> {nospace,failed}" to imply a runstate of "not running (i/o error)".
>>
>> Did we make such guarantees? Does libvirt actually make use of it?
> 
> I'm not sure libvirt relies on it, but I think it's a reasonable
> expectation.

Possibly, but I don't think anyone should rely on it. You first get the
event, or notice that the VM is stopped, and then check query-block.
Doing it the other way round and then inferring the VM state from
query-block sounds highly unlikely.

>>>>> Hmm, but I'm a bit wary of introducing such a big change now.  We know
>>>>> what it makes nicer, but we don't know of anything irremediably broken
>>>>> without them, and we haven't thought enough of any warts it introduces.
>>>>
>>>> On the one hand, I can understand your concerns, but on the other hand,
>>>> introducing an API now and then making such a big change afterwards
>>>> scares me much more.
>>>
>>> One example of the doubts I have: is iostatus a BlockBackend or a
>>> BlockDriverState thing?  If it a BlockBackend thing, does the target
>>> device have an iostatus at all?
>>
>> I think it's better to have it in BlockDriverState, but in my
>> imagination the target is a BlockBackend anyway.
> 
> Great, 0/2. :)  My rhetorical questions didn't have the outcome I hoped for.

Heh. :-)

I was thinking that when you do it for backing files, you automatically
have to use the BDS, but the iostatus/pointer model works as well, so
yeah, maybe BB is better.

But why wouldn't you use a BlockBackend for the target?

Kevin

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 13/47] block: introduce block job error
  2012-08-01 14:01                   ` Kevin Wolf
@ 2012-08-01 14:34                     ` Paolo Bonzini
  2012-08-01 14:59                       ` Kevin Wolf
  0 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-08-01 14:34 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, eblake, qemu-devel, stefanha

Il 01/08/2012 16:01, Kevin Wolf ha scritto:
> Well, the thing that disturbs me is having a BlockJob attached to a
> single block device. Jobs can and do operate on multiple images and
> should probably affect both BDSs the same. For example, bs->in_use
> should probably be set for both (Haven't checked in detail yet, do we
> have bugs here? Patch 27 doesn't call bdrv_set_in_use at least. It
> starts to matter when we get named targets.)

Makes sense, but what does bs->in_use mean if being a target is "just
another way" to use a named block device?  A target cannot be attached
to a guest disk, but a source that is used by the guest can become part
of a block job.  So, you could block in_use devices from being used in
the drive property.  But you need a separate mechanism to mark devices
as used by the guest... in_use does not cut it.

Stuff like this is what makes me nervous about naming the target
devices.  Name something, and people start using it in ways you haven't
anticipated.  It also makes me wonder if the target is indeed a
BlockBackend or there is a common superclass hiding.

>>>>>> +    if (action == BDRV_ACTION_STOP) {
>>>>>> +        block_job_pause(job);
>>>>>> +        block_job_iostatus_set_err(job, error);
>>>>>> +        if (bs != job->bs) {
>>>>>> +            bdrv_iostatus_set_err(bs, error);
>>>>>> +        }
>>>>>> +    }
>>>>>>
>>>>>> where the bdrv_iostatus_set_err is mostly to "prepare for the future"
>>>>>> usage of named block devices.
>>>>>
>>>>> Again, I'd make it unconditional to get symmetric behaviour.
>>>>
>>>> Not possible, because existing clients may expect "iostatus:
>>>> {nospace,failed}" to imply a runstate of "not running (i/o error)".
>>>
>>> Did we make such guarantees? Does libvirt actually make use of it?
>>
>> I'm not sure libvirt relies on it, but I think it's a reasonable
>> expectation.
> 
> Possibly, but I don't think anyone should rely on it. You first get the
> event, or notice that the VM is stopped, and then check query-block.
> Doing it the other way round and then inferring the VM state from
> query-block sounds highly unlikely.

Yeah, makes sense.  The other thing is that the iostatus is very much
tied to the running state.  A stop+cont happening for unrelated reasons
would clear the iostatus for example.

>>>>>> Hmm, but I'm a bit wary of introducing such a big change now.  We know
>>>>>> what it makes nicer, but we don't know of anything irremediably broken
>>>>>> without them, and we haven't thought enough of any warts it introduces.
>>>>>
>>>>> On the one hand, I can understand your concerns, but on the other hand,
>>>>> introducing an API now and then making such a big change afterwards
>>>>> scares me much more.
>>>>
>>>> One example of the doubts I have: is iostatus a BlockBackend or a
>>>> BlockDriverState thing?  If it a BlockBackend thing, does the target
>>>> device have an iostatus at all?
>>>
>>> I think it's better to have it in BlockDriverState, but in my
>>> imagination the target is a BlockBackend anyway.
>>
>> Great, 0/2. :)  My rhetorical questions didn't have the outcome I hoped for.
> 
> Heh. :-)
> 
> I was thinking that when you do it for backing files, you automatically
> have to use the BDS, but the iostatus/pointer model works as well, so
> yeah, maybe BB is better.
> 
> But why wouldn't you use a BlockBackend for the target?

Probably just because at the moment I'm not sure how much of BDS would
percolate there.  In_use is tricky, of course, but perhaps we can or
need to get rid of in_use altogether.

I'm not sure of iostatus because we currently do not have a way to reset
it.  "cont" and "block-job-resume" do that, but only implicitly and
that's too tied to the current usage of iostatus.  If the target device
is not a BB we solve the problem of "cont" resetting its iostatus...

Paolo

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 13/47] block: introduce block job error
  2012-08-01 14:34                     ` Paolo Bonzini
@ 2012-08-01 14:59                       ` Kevin Wolf
  2012-08-01 15:15                         ` Paolo Bonzini
  0 siblings, 1 reply; 136+ messages in thread
From: Kevin Wolf @ 2012-08-01 14:59 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, eblake, qemu-devel, stefanha

Am 01.08.2012 16:34, schrieb Paolo Bonzini:
> Il 01/08/2012 16:01, Kevin Wolf ha scritto:
>> Well, the thing that disturbs me is having a BlockJob attached to a
>> single block device. Jobs can and do operate on multiple images and
>> should probably affect both BDSs the same. For example, bs->in_use
>> should probably be set for both (Haven't checked in detail yet, do we
>> have bugs here? Patch 27 doesn't call bdrv_set_in_use at least. It
>> starts to matter when we get named targets.)
> 
> Makes sense, but what does bs->in_use mean if being a target is "just
> another way" to use a named block device?  A target cannot be attached
> to a guest disk

For the mirror, probably yes. Live commit writes to a BDS that is part
of a BlockBackend used by a guest. So it depends on the exact job type
which involved images can be used in which other cases.

Maybe we should first get an overview of which uses exclude which other
uses of a BDS. Because honestly, at this point I have no idea how to
model it correctly.

> but a source that is used by the guest can become part
> of a block job.  So, you could block in_use devices from being used in
> the drive property.  But you need a separate mechanism to mark devices
> as used by the guest... in_use does not cut it.
> 
> Stuff like this is what makes me nervous about naming the target
> devices.  Name something, and people start using it in ways you haven't
> anticipated.  It also makes me wonder if the target is indeed a
> BlockBackend or there is a common superclass hiding.

Hm...Don't know, what would the difference be?

>>>>>>> +    if (action == BDRV_ACTION_STOP) {
>>>>>>> +        block_job_pause(job);
>>>>>>> +        block_job_iostatus_set_err(job, error);
>>>>>>> +        if (bs != job->bs) {
>>>>>>> +            bdrv_iostatus_set_err(bs, error);
>>>>>>> +        }
>>>>>>> +    }
>>>>>>>
>>>>>>> where the bdrv_iostatus_set_err is mostly to "prepare for the future"
>>>>>>> usage of named block devices.
>>>>>>
>>>>>> Again, I'd make it unconditional to get symmetric behaviour.
>>>>>
>>>>> Not possible, because existing clients may expect "iostatus:
>>>>> {nospace,failed}" to imply a runstate of "not running (i/o error)".
>>>>
>>>> Did we make such guarantees? Does libvirt actually make use of it?
>>>
>>> I'm not sure libvirt relies on it, but I think it's a reasonable
>>> expectation.
>>
>> Possibly, but I don't think anyone should rely on it. You first get the
>> event, or notice that the VM is stopped, and then check query-block.
>> Doing it the other way round and then inferring the VM state from
>> query-block sounds highly unlikely.
> 
> Yeah, makes sense.  The other thing is that the iostatus is very much
> tied to the running state.  A stop+cont happening for unrelated reasons
> would clear the iostatus for example.

Yes, clearing it automatically may be convenient in the common case, but
is probably not exactly the best idea we've ever had.

>>>>>>> Hmm, but I'm a bit wary of introducing such a big change now.  We know
>>>>>>> what it makes nicer, but we don't know of anything irremediably broken
>>>>>>> without them, and we haven't thought enough of any warts it introduces.
>>>>>>
>>>>>> On the one hand, I can understand your concerns, but on the other hand,
>>>>>> introducing an API now and then making such a big change afterwards
>>>>>> scares me much more.
>>>>>
>>>>> One example of the doubts I have: is iostatus a BlockBackend or a
>>>>> BlockDriverState thing?  If it a BlockBackend thing, does the target
>>>>> device have an iostatus at all?
>>>>
>>>> I think it's better to have it in BlockDriverState, but in my
>>>> imagination the target is a BlockBackend anyway.
>>>
>>> Great, 0/2. :)  My rhetorical questions didn't have the outcome I hoped for.
>>
>> Heh. :-)
>>
>> I was thinking that when you do it for backing files, you automatically
>> have to use the BDS, but the iostatus/pointer model works as well, so
>> yeah, maybe BB is better.
>>
>> But why wouldn't you use a BlockBackend for the target?
> 
> Probably just because at the moment I'm not sure how much of BDS would
> percolate there.  In_use is tricky, of course, but perhaps we can or
> need to get rid of in_use altogether.

in_use has never been properly designed, it is a band-aid hack that has
spread more or more. Can you explain its exact semantics? The best I can
come up with is something like: "Someone does something with the device
and thought that doing one specific other thing, that happens to use
in_use as well, at the same time might be a bad idea."

I'm pretty sure that in_use forbids more cases than is really necessary,
and probably many other cases that should be forbidden are missing.

> I'm not sure of iostatus because we currently do not have a way to reset
> it.  "cont" and "block-job-resume" do that, but only implicitly and
> that's too tied to the current usage of iostatus.  If the target device
> is not a BB we solve the problem of "cont" resetting its iostatus...

This is an awful reason. :-)

Block jobs aren't really different from guests in that respect. Maybe
the BB needs a second iostatus field that must explicitly be reset, and
the old one keeps doing the stupid thing for compatibility's sake.

Kevin

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 13/47] block: introduce block job error
  2012-08-01 14:59                       ` Kevin Wolf
@ 2012-08-01 15:15                         ` Paolo Bonzini
  2012-08-06  9:29                           ` Kevin Wolf
  0 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-08-01 15:15 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, eblake, qemu-devel, stefanha

Il 01/08/2012 16:59, Kevin Wolf ha scritto:
> Am 01.08.2012 16:34, schrieb Paolo Bonzini:
>> Il 01/08/2012 16:01, Kevin Wolf ha scritto:
>>> Well, the thing that disturbs me is having a BlockJob attached to a
>>> single block device. Jobs can and do operate on multiple images and
>>> should probably affect both BDSs the same. For example, bs->in_use
>>> should probably be set for both (Haven't checked in detail yet, do we
>>> have bugs here? Patch 27 doesn't call bdrv_set_in_use at least. It
>>> starts to matter when we get named targets.)
>>
>> Makes sense, but what does bs->in_use mean if being a target is "just
>> another way" to use a named block device?  A target cannot be attached
>> to a guest disk
> 
> For the mirror, probably yes. Live commit writes to a BDS that is part
> of a BlockBackend used by a guest. So it depends on the exact job type
> which involved images can be used in which other cases.

Yes, but then it makes sense (at least initially) to just block things
more than needed.  In this respect, in_use is doing its job decently;
there is always time to relax things later.

> Maybe we should first get an overview of which uses exclude which other
> uses of a BDS. Because honestly, at this point I have no idea how to
> model it correctly.
> 
>> but a source that is used by the guest can become part
>> of a block job.  So, you could block in_use devices from being used in
>> the drive property.  But you need a separate mechanism to mark devices
>> as used by the guest... in_use does not cut it.
>>
>> Stuff like this is what makes me nervous about naming the target
>> devices.  Name something, and people start using it in ways you haven't
>> anticipated.  It also makes me wonder if the target is indeed a
>> BlockBackend or there is a common superclass hiding.
> 
> Hm...Don't know, what would the difference be?

What blocks what, for example (for which as you said we need a matrix in
order to model it correctly).  Or how the iostatus is reset if the
target has a iostatus at all; if not, that would be another difference.

>>>>>> One example of the doubts I have: is iostatus a BlockBackend or a
>>>>>> BlockDriverState thing?  If it a BlockBackend thing, does the target
>>>>>> device have an iostatus at all?
>>>>>
>>>>> I think it's better to have it in BlockDriverState, but in my
>>>>> imagination the target is a BlockBackend anyway.
>>>>
>>>> Great, 0/2. :)  My rhetorical questions didn't have the outcome I hoped for.
>>>
>>> Heh. :-)
>>>
>>> I was thinking that when you do it for backing files, you automatically
>>> have to use the BDS, but the iostatus/pointer model works as well, so
>>> yeah, maybe BB is better.
>>>
>>> But why wouldn't you use a BlockBackend for the target?
>>
>> Probably just because at the moment I'm not sure how much of BDS would
>> percolate there.  In_use is tricky, of course, but perhaps we can or
>> need to get rid of in_use altogether.
> 
> in_use has never been properly designed, it is a band-aid hack that has
> spread more or more. Can you explain its exact semantics? The best I can
> come up with is something like: "Someone does something with the device
> and thought that doing one specific other thing, that happens to use
> in_use as well, at the same time might be a bad idea."

Cool things you can do;
don't ask for two at a time
or QEMU eats your disk.

> I'm pretty sure that in_use forbids more cases than is really necessary,
> and probably many other cases that should be forbidden are missing.

So far we fared pretty well on the latter point though.

>> I'm not sure of iostatus because we currently do not have a way to reset
>> it.  "cont" and "block-job-resume" do that, but only implicitly and
>> that's too tied to the current usage of iostatus.  If the target device
>> is not a BB we solve the problem of "cont" resetting its iostatus...
> 
> This is an awful reason. :-)

It is, but it has the merit of decoupling new stuff from "not so
brilliant" old stuff.

> Block jobs aren't really different from guests in that respect. Maybe
> the BB needs a second iostatus field that must explicitly be reset, and
> the old one keeps doing the stupid thing for compatibility's sake.

Or the iostatus for the target can just reside in the BlockJob... :)

As much as I hate to invoke shortcuts, management may proceed without
human help only in the ENOSPC case, and ENOSPC can only happens on the
target.  Humans usually look at dmesg to find the source.

Paolo

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 04/47] block: add block_job_query
  2012-07-31  8:50         ` Paolo Bonzini
@ 2012-08-02 19:28           ` Jeff Cody
  0 siblings, 0 replies; 136+ messages in thread
From: Jeff Cody @ 2012-08-02 19:28 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Kevin Wolf, eblake, qemu-devel, stefanha

On 07/31/2012 04:50 AM, Paolo Bonzini wrote:
> Il 31/07/2012 10:47, Kevin Wolf ha scritto:
>>>>>> Why did you convert the initialisation to separate statement? If you
>>>>>> really want to do this, I think using g_new0 would be safer now, but I
>>>>>> actually like compound literals better.
>>>>
>>>> Later on I will have some more initialization beyond the list of fields,
>>>> so I preferred an explicit list.  I can change it back if you prefer.
>> What I'm really interested in is having zero-initialisation for any not
>> explicitly initialised fields, just to be on the safe side. You can do
>> that with g_new0() or with compound literals, that's a matter of taste.
> 
> Yes, and in fact I even have a change to g_new0 later in the series.
> I'll squash that change in this patch.
> 
> Paolo
> 
> 

+1 on this...  interestingly, I just ran into an issue with this patch
while testing block-job-query on my commit patches - I got a segfault on
the command query, because has_target was not initialized to 0, and so
the target ptr was invalid.  Changing it to g_new0() fixes it.


Jeff

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 13/47] block: introduce block job error
  2012-08-01 15:15                         ` Paolo Bonzini
@ 2012-08-06  9:29                           ` Kevin Wolf
  2012-08-06  9:44                             ` Paolo Bonzini
  0 siblings, 1 reply; 136+ messages in thread
From: Kevin Wolf @ 2012-08-06  9:29 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, eblake, qemu-devel, stefanha

Am 01.08.2012 17:15, schrieb Paolo Bonzini:
> Il 01/08/2012 16:59, Kevin Wolf ha scritto:
>> Block jobs aren't really different from guests in that respect. Maybe
>> the BB needs a second iostatus field that must explicitly be reset, and
>> the old one keeps doing the stupid thing for compatibility's sake.
> 
> Or the iostatus for the target can just reside in the BlockJob... :)

That wouldn't fix the problem in more than a single instance...

> As much as I hate to invoke shortcuts, management may proceed without
> human help only in the ENOSPC case, and ENOSPC can only happens on the
> target.  Humans usually look at dmesg to find the source.

dmesg doesn't contain information about corrupted qcow2 images, Sheepdog
error codes from the server, etc.

Kevin

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 13/47] block: introduce block job error
  2012-08-06  9:29                           ` Kevin Wolf
@ 2012-08-06  9:44                             ` Paolo Bonzini
  2012-08-06 10:45                               ` Kevin Wolf
  0 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-08-06  9:44 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, eblake, qemu-devel, stefanha

Il 06/08/2012 11:29, Kevin Wolf ha scritto:
>>> >> Block jobs aren't really different from guests in that respect. Maybe
>>> >> the BB needs a second iostatus field that must explicitly be reset, and
>>> >> the old one keeps doing the stupid thing for compatibility's sake.
>> > 
>> > Or the iostatus for the target can just reside in the BlockJob... :)
> That wouldn't fix the problem in more than a single instance...

Even if you have problems in more than one device, you can still fix
them one at a time.

I think that we're fine with the current information.  In the long term
we will add the failing blockdev name to the blockjob iostatus.

>> > As much as I hate to invoke shortcuts, management may proceed without
>> > human help only in the ENOSPC case, and ENOSPC can only happens on the
>> > target.  Humans usually look at dmesg to find the source.
> dmesg doesn't contain information about corrupted qcow2 images, Sheepdog
> error codes from the server, etc.

True, guest dmesg doesn't help for block jobs.  (But network glitches
could be in dmesg or other sources of monitoring information).

But for block jobs your margins are small because you cannot take the VM
offline.  So if you get an EIO you can just do three things: first,
retry and see if it goes away (transient glitch); second, throw away the
target and see if it goes away; third, raise white flag and ask the VM
admin to cooperate, because the problem is likely in the source.

Paolo

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 13/47] block: introduce block job error
  2012-08-06  9:44                             ` Paolo Bonzini
@ 2012-08-06 10:45                               ` Kevin Wolf
  2012-08-06 10:58                                 ` Paolo Bonzini
  0 siblings, 1 reply; 136+ messages in thread
From: Kevin Wolf @ 2012-08-06 10:45 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, eblake, qemu-devel, stefanha

Am 06.08.2012 11:44, schrieb Paolo Bonzini:
> Il 06/08/2012 11:29, Kevin Wolf ha scritto:
>>>>>> Block jobs aren't really different from guests in that respect. Maybe
>>>>>> the BB needs a second iostatus field that must explicitly be reset, and
>>>>>> the old one keeps doing the stupid thing for compatibility's sake.
>>>>
>>>> Or the iostatus for the target can just reside in the BlockJob... :)
>> That wouldn't fix the problem in more than a single instance...
> 
> Even if you have problems in more than one device, you can still fix
> them one at a time.
> 
> I think that we're fine with the current information.  In the long term
> we will add the failing blockdev name to the blockjob iostatus.

I think you misunderstood. What I was trying to say is that with the
same reasoning we'd need a field that doesn't automatically reset its
status on 'cont' not only for block jobs, but also for regular guest disks.

If you try fixing the problem by adding a field in BlockJob, it may well
be fixed for block jobs, but you still need to add it in the generic
place later so that regular disks are covered as well.

>>>> As much as I hate to invoke shortcuts, management may proceed without
>>>> human help only in the ENOSPC case, and ENOSPC can only happens on the
>>>> target.  Humans usually look at dmesg to find the source.
>> dmesg doesn't contain information about corrupted qcow2 images, Sheepdog
>> error codes from the server, etc.
> 
> True, guest dmesg doesn't help for block jobs.  (But network glitches
> could be in dmesg or other sources of monitoring information).

Guest or host dmesg? I thought we're talking about the host.

> But for block jobs your margins are small because you cannot take the VM
> offline.  So if you get an EIO you can just do three things: first,
> retry and see if it goes away (transient glitch); second, throw away the
> target and see if it goes away; third, raise white flag and ask the VM
> admin to cooperate, because the problem is likely in the source.

Yes, and the admin wants to know what happened, i.e. an accurate iostatus.

Kevin

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 13/47] block: introduce block job error
  2012-08-06 10:45                               ` Kevin Wolf
@ 2012-08-06 10:58                                 ` Paolo Bonzini
  0 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-08-06 10:58 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, eblake, qemu-devel, stefanha

Il 06/08/2012 12:45, Kevin Wolf ha scritto:
>> In the long term
>> we will add the failing blockdev name to the blockjob iostatus.
> I think you misunderstood. What I was trying to say is that with the
> same reasoning we'd need a field that doesn't automatically reset its
> status on 'cont' not only for block jobs, but also for regular guest disks.

I'm not sure why.  There cannot be changes to the guest-triggered
iostatus after you do a stop.

On the other hand, a block job still runs while the guest is stopped
(and we cannot change this because it'd be backwards-incompatible), so
you can have the following (> means command, < means event):

   > stop                 (no iostatus)
   < BLOCK_JOB_ERROR      (iostatus=failed on source)
   > cont                 (no iostatus)
     libvirtd restarts
   > query-block          (no iostatus)

Compare this with guest-initiated I/O:

   > stop                 (no iostatus)
   < BLOCK_IO_ERROR       (iostatus=failed)
   > cont                 (no iostatus)
     libvirtd restarts
     QEMU retries I/O, fails
   < BLOCK_IO_ERROR       (iostatus=failed)
   > query-block          (iostatus=failed)

> If you try fixing the problem by adding a field in BlockJob, it may well
> be fixed for block jobs, but you still need to add it in the generic
> place later so that regular disks are covered as well.

Regular disks are covered because each entry in query-block has its own
iostatus.  The only problematic case is now if two different backing
files fail for unrelated reasons.

Paolo

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 19/47] block: add bdrv_query_info
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 19/47] block: add bdrv_query_info Paolo Bonzini
@ 2012-09-11 13:07   ` Kevin Wolf
  2012-09-11 13:12     ` Paolo Bonzini
  0 siblings, 1 reply; 136+ messages in thread
From: Kevin Wolf @ 2012-09-11 13:07 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, eblake, qemu-devel, stefanha

Am 24.07.2012 13:03, schrieb Paolo Bonzini:
> Extract it out of the implementation of "info block".
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  block.c |  104 +++++++++++++++++++++++++++++++--------------------------------
>  block.h |    1 +
>  2 files changed, 53 insertions(+), 52 deletions(-)

The refactoring looks correct, but why do you put the function between
qmp_query_blockstat and qmp_query_blockstats? If you put it next to
qmp_query_block, both functions pairs stay closer together.

Kevin

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 19/47] block: add bdrv_query_info
  2012-09-11 13:07   ` Kevin Wolf
@ 2012-09-11 13:12     ` Paolo Bonzini
  0 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-09-11 13:12 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, eblake, qemu-devel, stefanha



----- Messaggio originale -----
> Da: "Kevin Wolf" <kwolf@redhat.com>
> A: "Paolo Bonzini" <pbonzini@redhat.com>
> Cc: qemu-devel@nongnu.org, eblake@redhat.com, jcody@redhat.com, stefanha@linux.vnet.ibm.com
> Inviato: Martedì, 11 settembre 2012 15:07:05
> Oggetto: Re: [PATCH 19/47] block: add bdrv_query_info
> 
> Am 24.07.2012 13:03, schrieb Paolo Bonzini:
> > Extract it out of the implementation of "info block".
> > 
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> >  block.c |  104
> >  +++++++++++++++++++++++++++++++--------------------------------
> >  block.h |    1 +
> >  2 files changed, 53 insertions(+), 52 deletions(-)
> 
> The refactoring looks correct, but why do you put the function between
> qmp_query_blockstat and qmp_query_blockstats? If you put it next to
> qmp_query_block, both functions pairs stay closer together.

Ok, will do.

Paolo

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 21/47] block: add bdrv_ensure_backing_file
  2012-07-24 11:03 ` [Qemu-devel] [PATCH 21/47] block: add bdrv_ensure_backing_file Paolo Bonzini
@ 2012-09-11 13:32   ` Kevin Wolf
  2012-09-11 13:46     ` Paolo Bonzini
  0 siblings, 1 reply; 136+ messages in thread
From: Kevin Wolf @ 2012-09-11 13:32 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, eblake, qemu-devel, stefanha

Am 24.07.2012 13:03, schrieb Paolo Bonzini:
> Mirroring runs without the backing file so that it can be copied outside
> QEMU.  However, we need to add it at the time the job is completed and
> QEMU switches to the target.  Factor out the common bits of opening an
> image and completing a mirroring operation.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Once again, combining code motion and code changes in one patch makes it
harder to review.

bdrv_ensure_backing_file() isn't a good name, after reading only the
subject line I had no idea what this function might do. It's still not
entirely clear to me what the different to a bdrv_open_backing_file()
is, except that it doesn't do anything if a backing file is already
open. In which cases do we rely on this behaviour?

> ---
>  block.c |   69 ++++++++++++++++++++++++++++++++++++++++-----------------------
>  block.h |    1 +
>  2 files changed, 45 insertions(+), 25 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 19da114..002b442 100644
> --- a/block.c
> +++ b/block.c
> @@ -730,6 +730,48 @@ int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags)
>      return 0;
>  }
>  
> +int bdrv_ensure_backing_file(BlockDriverState *bs)
> +{
> +    char backing_filename[PATH_MAX];
> +    int back_flags, ret;
> +    BlockDriver *back_drv = NULL;
> +
> +    if (bs->backing_hd != NULL) {
> +        return 0;
> +    }
> +
> +    bs->open_flags &= ~BDRV_O_NO_BACKING;

This doesn't do anything in this patch because the function is never
called with BDRV_O_NO_BACKING set. Is it in preparation for a second user?

> +    if (bs->backing_file[0] == '\0') {
> +        return 0;
> +    }
> +
> +    bs->backing_hd = bdrv_new("");
> +    bdrv_get_full_backing_filename(bs, backing_filename,
> +                                   sizeof(backing_filename));
> +
> +    if (bs->backing_format[0] != '\0') {
> +        back_drv = bdrv_find_format(bs->backing_format);
> +    }
> +
> +    /* backing files always opened read-only */
> +    back_flags = bs->open_flags & ~(BDRV_O_RDWR | BDRV_O_SNAPSHOT);
> +
> +    ret = bdrv_open(bs->backing_hd, backing_filename, back_flags, back_drv);
> +    if (ret < 0) {
> +        bdrv_close(bs);

I don't like this because it makes the invalid assumption that the
caller has just opened bs and wants to close it if opening the backing
file fails. I think this is part of the error handling that belongs in
the caller: It opened bs, so it is responsible for closing it in error
cases.

> +        bdrv_delete(bs->backing_hd);

This is a bug fix of its own, should be a separate patch.

> +        bs->backing_hd = NULL;
> +        return ret;
> +    }
> +    if (bs->is_temporary) {
> +        bs->backing_hd->keep_read_only = !(bs->open_flags & BDRV_O_RDWR);
> +    } else {
> +        /* base images use the same setting as leaf */

Huh, "parent" turned into "leaf"?

> +        bs->backing_hd->keep_read_only = bs->keep_read_only;
> +    }
> +    return 0;
> +}

Kevin

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 22/47] block: make device optional in BlockInfo
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 22/47] block: make device optional in BlockInfo Paolo Bonzini
@ 2012-09-11 13:38   ` Kevin Wolf
  2012-09-11 13:49     ` Paolo Bonzini
  0 siblings, 1 reply; 136+ messages in thread
From: Kevin Wolf @ 2012-09-11 13:38 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, eblake, qemu-devel, stefanha

Am 24.07.2012 13:04, schrieb Paolo Bonzini:
> Targets of a mirroring operation will not have a device.  Once we have
> -blockdev or equivalent, "detached" block devices and non-anonymous
> backing files also will not have a device.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  qapi-schema.json |    5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/qapi-schema.json b/qapi-schema.json
> index fca1806..b00d8c6 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -443,7 +443,8 @@
>  # Block device information.  This structure describes a virtual device and
>  # the backing device associated with it.
>  #
> -# @device: The device name associated with the virtual device.
> +# @device: #optional The device name associated with the virtual device.
> +#          Always included in the output of query-block.
>  #
>  # @type: This field is returned only for compatibility reasons, it should
>  #        not be used (always returns 'unknown')
> @@ -465,7 +466,7 @@
>  # Since:  0.14.0
>  ##
>  { 'type': 'BlockInfo',
> -  'data': {'device': 'str', 'type': 'str', 'removable': 'bool',
> +  'data': {'*device': 'str', 'type': 'str', 'removable': 'bool',
>             'locked': 'bool', '*inserted': 'BlockDeviceInfo',
>             '*tray_open': 'bool', '*io-status': 'BlockDeviceIoStatus'} }

Is this really a compatible change? That 'device' is basically the
unique key by which block device are identified doesn't exactly make
feel more comfortable about the change.

Of course, not making it optional means that basically we need to go the
way of referencing the block device in query-block-jobs immediately
instead of thinking about it later. You know that I preferred this from
the start, and this change is just another detail that makes me think
it's the right thing to do.

Kevin

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 21/47] block: add bdrv_ensure_backing_file
  2012-09-11 13:32   ` Kevin Wolf
@ 2012-09-11 13:46     ` Paolo Bonzini
  2012-09-11 13:58       ` Kevin Wolf
  0 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-09-11 13:46 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, eblake, qemu-devel, stefanha


> Once again, combining code motion and code changes in one patch makes
> it harder to review.

bdrv_ensure_backing_file() is a new standalone function that happens to be
usable in bdrv_open as well.  But I can separate the changes/fixes to a
separate patch.

In particular it is can be used after a file has been opened with
BDRV_O_NO_BACKING (at which point the flag does not reflect reality
anymore, hence the removal of the flag).

> bdrv_ensure_backing_file() isn't a good name, after reading only the
> subject line I had no idea what this function might do. It's still
> not entirely clear to me what the different to a bdrv_open_backing_file()
> is, except that it doesn't do anything if a backing file is already
> open. In which cases do we rely on this behaviour?

We open the mirroring target with BDRV_O_NO_BACKING usually, but require
the backing file if the cluster size is larger than the dirty block
granularity.  Later, COW is done in the mirror job, so this is not
needed anymore at the end of the series.
 
> > ---
> >  block.c |   69
> >  ++++++++++++++++++++++++++++++++++++++++-----------------------
> >  block.h |    1 +
> >  2 files changed, 45 insertions(+), 25 deletions(-)
> > 
> > diff --git a/block.c b/block.c
> > index 19da114..002b442 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -730,6 +730,48 @@ int bdrv_file_open(BlockDriverState **pbs,
> > const char *filename, int flags)
> >      return 0;
> >  }
> >  
> > +int bdrv_ensure_backing_file(BlockDriverState *bs)
> > +{
> > +    char backing_filename[PATH_MAX];
> > +    int back_flags, ret;
> > +    BlockDriver *back_drv = NULL;
> > +
> > +    if (bs->backing_hd != NULL) {
> > +        return 0;
> > +    }
> > +
> > +    bs->open_flags &= ~BDRV_O_NO_BACKING;
> 
> This doesn't do anything in this patch because the function is never
> called with BDRV_O_NO_BACKING set. Is it in preparation for a second
> user?

Yes, see above.

> > +    if (bs->backing_file[0] == '\0') {
> > +        return 0;
> > +    }
> > +
> > +    bs->backing_hd = bdrv_new("");
> > +    bdrv_get_full_backing_filename(bs, backing_filename,
> > +                                   sizeof(backing_filename));
> > +
> > +    if (bs->backing_format[0] != '\0') {
> > +        back_drv = bdrv_find_format(bs->backing_format);
> > +    }
> > +
> > +    /* backing files always opened read-only */
> > +    back_flags = bs->open_flags & ~(BDRV_O_RDWR | BDRV_O_SNAPSHOT);
> > +
> > +    ret = bdrv_open(bs->backing_hd, backing_filename, back_flags,
> > back_drv);
> > +    if (ret < 0) {
> > +        bdrv_close(bs);
> 
> I don't like this because it makes the invalid assumption that the
> caller has just opened bs and wants to close it if opening the
> backing file fails. I think this is part of the error handling that belongs
> in the caller: It opened bs, so it is responsible for closing it in
> error cases.

It's a bug, it should have closed bs->backing_hd.

> > +        bdrv_delete(bs->backing_hd);
> 
> This is a bug fix of its own, should be a separate patch.

Ok.

> > +        bs->backing_hd = NULL;
> > +        return ret;
> > +    }
> > +    if (bs->is_temporary) {
> > +        bs->backing_hd->keep_read_only = !(bs->open_flags &
> > BDRV_O_RDWR);
> > +    } else {
> > +        /* base images use the same setting as leaf */
> 
> Huh, "parent" turned into "leaf"?

Will move this to a separate patch, too.

Paolo

> > +        bs->backing_hd->keep_read_only = bs->keep_read_only;
> > +    }
> > +    return 0;
> > +}
> 
> Kevin
> 

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 22/47] block: make device optional in BlockInfo
  2012-09-11 13:38   ` Kevin Wolf
@ 2012-09-11 13:49     ` Paolo Bonzini
  2012-09-11 14:02       ` Kevin Wolf
  0 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-09-11 13:49 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, eblake, qemu-devel, stefanha



----- Messaggio originale -----
> Da: "Kevin Wolf" <kwolf@redhat.com>
> A: "Paolo Bonzini" <pbonzini@redhat.com>
> Cc: qemu-devel@nongnu.org, eblake@redhat.com, jcody@redhat.com, stefanha@linux.vnet.ibm.com
> Inviato: Martedì, 11 settembre 2012 15:38:33
> Oggetto: Re: [PATCH 22/47] block: make device optional in BlockInfo
> 
> Am 24.07.2012 13:04, schrieb Paolo Bonzini:
> > Targets of a mirroring operation will not have a device.  Once we
> > have
> > -blockdev or equivalent, "detached" block devices and non-anonymous
> > backing files also will not have a device.
> > 
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> >  qapi-schema.json |    5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/qapi-schema.json b/qapi-schema.json
> > index fca1806..b00d8c6 100644
> > --- a/qapi-schema.json
> > +++ b/qapi-schema.json
> > @@ -443,7 +443,8 @@
> >  # Block device information.  This structure describes a virtual
> >  device and
> >  # the backing device associated with it.
> >  #
> > -# @device: The device name associated with the virtual device.
> > +# @device: #optional The device name associated with the virtual
> > device.
> > +#          Always included in the output of query-block.
> >  #
> >  # @type: This field is returned only for compatibility reasons, it
> >  should
> >  #        not be used (always returns 'unknown')
> > @@ -465,7 +466,7 @@
> >  # Since:  0.14.0
> >  ##
> >  { 'type': 'BlockInfo',
> > -  'data': {'device': 'str', 'type': 'str', 'removable': 'bool',
> > +  'data': {'*device': 'str', 'type': 'str', 'removable': 'bool',
> >             'locked': 'bool', '*inserted': 'BlockDeviceInfo',
> >             '*tray_open': 'bool', '*io-status':
> >             'BlockDeviceIoStatus'} }
> 
> Is this really a compatible change? That 'device' is basically the
> unique key by which block device are identified doesn't exactly make
> feel more comfortable about the change.

As long as query-block ensures that the field is present---yes.

> Of course, not making it optional means that basically we need to go
> the way of referencing the block device in query-block-jobs immediately
> instead of thinking about it later. You know that I preferred this
> from the start, and this change is just another detail that makes me think
> it's the right thing to do.

Indeed; this patch is not anymore in the current version of the series,
after your comments from July/August.

Paolo

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 21/47] block: add bdrv_ensure_backing_file
  2012-09-11 13:46     ` Paolo Bonzini
@ 2012-09-11 13:58       ` Kevin Wolf
  2012-09-11 14:10         ` Paolo Bonzini
  0 siblings, 1 reply; 136+ messages in thread
From: Kevin Wolf @ 2012-09-11 13:58 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, eblake, qemu-devel, stefanha

Am 11.09.2012 15:46, schrieb Paolo Bonzini:
> 
>> Once again, combining code motion and code changes in one patch makes
>> it harder to review.
> 
> bdrv_ensure_backing_file() is a new standalone function that happens to be
> usable in bdrv_open as well.  But I can separate the changes/fixes to a
> separate patch.
> 
> In particular it is can be used after a file has been opened with
> BDRV_O_NO_BACKING (at which point the flag does not reflect reality
> anymore, hence the removal of the flag).

Yes, that's what I figured eventually. Maybe some documentation for the
function couldn't hurt.

>> bdrv_ensure_backing_file() isn't a good name, after reading only the
>> subject line I had no idea what this function might do. It's still
>> not entirely clear to me what the different to a bdrv_open_backing_file()
>> is, except that it doesn't do anything if a backing file is already
>> open. In which cases do we rely on this behaviour?
> 
> We open the mirroring target with BDRV_O_NO_BACKING usually, but require
> the backing file if the cluster size is larger than the dirty block
> granularity.  Later, COW is done in the mirror job, so this is not
> needed anymore at the end of the series.

Can we then put a /* FIXME */ comment there and revert that behaviour at
the end of the series? Then we can call it bdrv_open_backing_file() and
it's meaning becomes more obvious.

>>> +    if (bs->backing_file[0] == '\0') {
>>> +        return 0;
>>> +    }
>>> +
>>> +    bs->backing_hd = bdrv_new("");
>>> +    bdrv_get_full_backing_filename(bs, backing_filename,
>>> +                                   sizeof(backing_filename));
>>> +
>>> +    if (bs->backing_format[0] != '\0') {
>>> +        back_drv = bdrv_find_format(bs->backing_format);
>>> +    }
>>> +
>>> +    /* backing files always opened read-only */
>>> +    back_flags = bs->open_flags & ~(BDRV_O_RDWR | BDRV_O_SNAPSHOT);
>>> +
>>> +    ret = bdrv_open(bs->backing_hd, backing_filename, back_flags,
>>> back_drv);
>>> +    if (ret < 0) {
>>> +        bdrv_close(bs);
>>
>> I don't like this because it makes the invalid assumption that the
>> caller has just opened bs and wants to close it if opening the
>> backing file fails. I think this is part of the error handling that belongs
>> in the caller: It opened bs, so it is responsible for closing it in
>> error cases.
> 
> It's a bug, it should have closed bs->backing_hd.

Are you sure? You removed the bdrv_close(bs) in the caller, so that it's
missing there would be a second bug.

An explicit bdrv_close(bs->backing_hd) isn't required here, it is
implicitly called in bdrv_delete(bs->backing_hd).

>>> +        bdrv_delete(bs->backing_hd);
>>
>> This is a bug fix of its own, should be a separate patch.
> 
> Ok.
> 
>>> +        bs->backing_hd = NULL;
>>> +        return ret;
>>> +    }
>>> +    if (bs->is_temporary) {
>>> +        bs->backing_hd->keep_read_only = !(bs->open_flags &
>>> BDRV_O_RDWR);
>>> +    } else {
>>> +        /* base images use the same setting as leaf */
>>
>> Huh, "parent" turned into "leaf"?
> 
> Will move this to a separate patch, too.

I don't even understand what you're trying to say in this comment. The
only tree that I can think of (so something like leaves exists) is built
by bs->file and bs->backing_hd. But in this case, the top-level image,
from which the flags are taken, is the root and not a leaf?

Kevin

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 22/47] block: make device optional in BlockInfo
  2012-09-11 13:49     ` Paolo Bonzini
@ 2012-09-11 14:02       ` Kevin Wolf
  2012-09-11 14:14         ` Paolo Bonzini
  0 siblings, 1 reply; 136+ messages in thread
From: Kevin Wolf @ 2012-09-11 14:02 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, eblake, qemu-devel, stefanha

Am 11.09.2012 15:49, schrieb Paolo Bonzini:
> 
> 
> ----- Messaggio originale -----
>> Da: "Kevin Wolf" <kwolf@redhat.com>
>> A: "Paolo Bonzini" <pbonzini@redhat.com>
>> Cc: qemu-devel@nongnu.org, eblake@redhat.com, jcody@redhat.com, stefanha@linux.vnet.ibm.com
>> Inviato: Martedì, 11 settembre 2012 15:38:33
>> Oggetto: Re: [PATCH 22/47] block: make device optional in BlockInfo
>>
>> Am 24.07.2012 13:04, schrieb Paolo Bonzini:
>>> Targets of a mirroring operation will not have a device.  Once we
>>> have
>>> -blockdev or equivalent, "detached" block devices and non-anonymous
>>> backing files also will not have a device.
>>>
>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>> ---
>>>  qapi-schema.json |    5 +++--
>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/qapi-schema.json b/qapi-schema.json
>>> index fca1806..b00d8c6 100644
>>> --- a/qapi-schema.json
>>> +++ b/qapi-schema.json
>>> @@ -443,7 +443,8 @@
>>>  # Block device information.  This structure describes a virtual
>>>  device and
>>>  # the backing device associated with it.
>>>  #
>>> -# @device: The device name associated with the virtual device.
>>> +# @device: #optional The device name associated with the virtual
>>> device.
>>> +#          Always included in the output of query-block.
>>>  #
>>>  # @type: This field is returned only for compatibility reasons, it
>>>  should
>>>  #        not be used (always returns 'unknown')
>>> @@ -465,7 +466,7 @@
>>>  # Since:  0.14.0
>>>  ##
>>>  { 'type': 'BlockInfo',
>>> -  'data': {'device': 'str', 'type': 'str', 'removable': 'bool',
>>> +  'data': {'*device': 'str', 'type': 'str', 'removable': 'bool',
>>>             'locked': 'bool', '*inserted': 'BlockDeviceInfo',
>>>             '*tray_open': 'bool', '*io-status':
>>>             'BlockDeviceIoStatus'} }
>>
>> Is this really a compatible change? That 'device' is basically the
>> unique key by which block device are identified doesn't exactly make
>> feel more comfortable about the change.
> 
> As long as query-block ensures that the field is present---yes.
> 
>> Of course, not making it optional means that basically we need to go
>> the way of referencing the block device in query-block-jobs immediately
>> instead of thinking about it later. You know that I preferred this
>> from the start, and this change is just another detail that makes me think
>> it's the right thing to do.
> 
> Indeed; this patch is not anymore in the current version of the series,
> after your comments from July/August.

Wait, am I reviewing the wrong version of the series? :-/

Did you post a newer one?

Kevin

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 21/47] block: add bdrv_ensure_backing_file
  2012-09-11 13:58       ` Kevin Wolf
@ 2012-09-11 14:10         ` Paolo Bonzini
  2012-09-11 15:38           ` Kevin Wolf
  0 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-09-11 14:10 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, eblake, qemu-devel, stefanha



----- Messaggio originale -----
> Da: "Kevin Wolf" <kwolf@redhat.com>
> A: "Paolo Bonzini" <pbonzini@redhat.com>
> Cc: qemu-devel@nongnu.org, eblake@redhat.com, jcody@redhat.com, stefanha@linux.vnet.ibm.com
> Inviato: Martedì, 11 settembre 2012 15:58:38
> Oggetto: Re: [PATCH 21/47] block: add bdrv_ensure_backing_file
> 
> Am 11.09.2012 15:46, schrieb Paolo Bonzini:
> > 
> >> Once again, combining code motion and code changes in one patch
> >> makes
> >> it harder to review.
> > 
> > bdrv_ensure_backing_file() is a new standalone function that
> > happens to be
> > usable in bdrv_open as well.  But I can separate the changes/fixes
> > to a
> > separate patch.
> > 
> > In particular it is can be used after a file has been opened with
> > BDRV_O_NO_BACKING (at which point the flag does not reflect reality
> > anymore, hence the removal of the flag).
> 
> Yes, that's what I figured eventually. Maybe some documentation for
> the function couldn't hurt.
> 
> >> bdrv_ensure_backing_file() isn't a good name, after reading only
> >> the
> >> subject line I had no idea what this function might do. It's still
> >> not entirely clear to me what the different to a
> >> bdrv_open_backing_file()
> >> is, except that it doesn't do anything if a backing file is
> >> already
> >> open. In which cases do we rely on this behaviour?
> > 
> > We open the mirroring target with BDRV_O_NO_BACKING usually, but
> > require
> > the backing file if the cluster size is larger than the dirty block
> > granularity.  Later, COW is done in the mirror job, so this is not
> > needed anymore at the end of the series.
> 
> Can we then put a /* FIXME */ comment there and revert that behaviour
> at the end of the series? Then we can call it bdrv_open_backing_file()
> and it's meaning becomes more obvious.

Actually, now that my machine finished upgrading and I can look at the
source code, we do use the functionality even at the end of this series.
If you call block-job-complete to complete mirroring, bdrv_ensure_backing_file()
is called.  But block-job-complete can be called multiple times, because
completion is entirely asynchronous.  I can check bs->backing_hd in the
completion callback, but I think it's less clean (there is no reason in
principle why block/mirror.c should include block_int.h, and adding a
function just to use it once seems not worth).

I would like to add documentation to all functions in block.h in 1.3,
I can start from this function if that would mean keeping it as is...

> >>> +    if (bs->backing_file[0] == '\0') {
> >>> +        return 0;
> >>> +    }
> >>> +
> >>> +    bs->backing_hd = bdrv_new("");
> >>> +    bdrv_get_full_backing_filename(bs, backing_filename,
> >>> +                                   sizeof(backing_filename));
> >>> +
> >>> +    if (bs->backing_format[0] != '\0') {
> >>> +        back_drv = bdrv_find_format(bs->backing_format);
> >>> +    }
> >>> +
> >>> +    /* backing files always opened read-only */
> >>> +    back_flags = bs->open_flags & ~(BDRV_O_RDWR |
> >>> BDRV_O_SNAPSHOT);
> >>> +
> >>> +    ret = bdrv_open(bs->backing_hd, backing_filename,
> >>> back_flags,
> >>> back_drv);
> >>> +    if (ret < 0) {
> >>> +        bdrv_close(bs);
> >>
> >> I don't like this because it makes the invalid assumption that the
> >> caller has just opened bs and wants to close it if opening the
> >> backing file fails. I think this is part of the error handling
> >> that belongs
> >> in the caller: It opened bs, so it is responsible for closing it
> >> in
> >> error cases.
> > 
> > It's a bug, it should have closed bs->backing_hd.
> 
> Are you sure? You removed the bdrv_close(bs) in the caller, so that
> it's missing there would be a second bug.
> An explicit bdrv_close(bs->backing_hd) isn't required here, it is
> implicitly called in bdrv_delete(bs->backing_hd).

True.  But likely my mental process was to add the bdrv_close(bs) here
thinking that it would match the bdrv_delete below.  Note that
bdrv_close(bs) already does delete bs->backing_hd.

> >>> +        bdrv_delete(bs->backing_hd);
> >>
> >> This is a bug fix of its own, should be a separate patch.
> > 
> > Ok.
> > 
> >>> +        bs->backing_hd = NULL;
> >>> +        return ret;
> >>> +    }
> >>> +    if (bs->is_temporary) {
> >>> +        bs->backing_hd->keep_read_only = !(bs->open_flags &
> >>> BDRV_O_RDWR);
> >>> +    } else {
> >>> +        /* base images use the same setting as leaf */
> >>
> >> Huh, "parent" turned into "leaf"?
> > 
> > Will move this to a separate patch, too.
> 
> I don't even understand what you're trying to say in this comment.

Well, I couldn't understand the original comment either. :)  To me,
base image and parent is a synonym...

The images form a tree (snapshots being nodes and with each node
having a parent pointer); what we open is a path from root to leaf,
so the top-level image is a leaf.

Paolo

> The
> only tree that I can think of (so something like leaves exists) is
> built by bs->file and bs->backing_hd. But in this case, the top-level
> image, from which the flags are taken, is the root and not a leaf?
> 
> Kevin
> 

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 22/47] block: make device optional in BlockInfo
  2012-09-11 14:02       ` Kevin Wolf
@ 2012-09-11 14:14         ` Paolo Bonzini
  0 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-09-11 14:14 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, eblake, qemu-devel, stefanha


> > Indeed; this patch is not anymore in the current version of the
> > series,
> > after your comments from July/August.
> 
> Wait, am I reviewing the wrong version of the series? :-/
> 
> Did you post a newer one?

No, not yet.  I was going to follow up this week.  Almost all the changes
are in the patches you already reviewed in July/August (and the changes
match your review).  So you can go on, just skip these three:

   add hierarchical bitmap data type and test cases (Laszlo reviewed
       this one offlist and I have some changes from him; better docs too)
   block: add target info to QMP query-blockjobs command
   mirror: support querying target file

Paolo

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 24/47] block: introduce new dirty bitmap functionality
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 24/47] block: introduce new dirty bitmap functionality Paolo Bonzini
@ 2012-09-11 14:57   ` Kevin Wolf
  2012-09-11 16:17     ` Paolo Bonzini
  0 siblings, 1 reply; 136+ messages in thread
From: Kevin Wolf @ 2012-09-11 14:57 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, eblake, qemu-devel, stefanha

Am 24.07.2012 13:04, schrieb Paolo Bonzini:
> Assert that write_compressed is never used with the dirty bitmap.
> Setting the bits early is wrong, because a coroutine might concurrently
> examine them and copy incomplete data from the source.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Looks good, but I think it should be documented at least that
bdrv_get_next_dirty() hangs when it's called and there is no dirty block.

Kevin

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 21/47] block: add bdrv_ensure_backing_file
  2012-09-11 14:10         ` Paolo Bonzini
@ 2012-09-11 15:38           ` Kevin Wolf
  0 siblings, 0 replies; 136+ messages in thread
From: Kevin Wolf @ 2012-09-11 15:38 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, eblake, qemu-devel, stefanha

Am 11.09.2012 16:10, schrieb Paolo Bonzini:
> 
> 
> ----- Messaggio originale -----
>> Da: "Kevin Wolf" <kwolf@redhat.com>
>> A: "Paolo Bonzini" <pbonzini@redhat.com>
>> Cc: qemu-devel@nongnu.org, eblake@redhat.com, jcody@redhat.com, stefanha@linux.vnet.ibm.com
>> Inviato: Martedì, 11 settembre 2012 15:58:38
>> Oggetto: Re: [PATCH 21/47] block: add bdrv_ensure_backing_file
>>
>> Am 11.09.2012 15:46, schrieb Paolo Bonzini:
>>>
>>>> Once again, combining code motion and code changes in one patch
>>>> makes
>>>> it harder to review.
>>>
>>> bdrv_ensure_backing_file() is a new standalone function that
>>> happens to be
>>> usable in bdrv_open as well.  But I can separate the changes/fixes
>>> to a
>>> separate patch.
>>>
>>> In particular it is can be used after a file has been opened with
>>> BDRV_O_NO_BACKING (at which point the flag does not reflect reality
>>> anymore, hence the removal of the flag).
>>
>> Yes, that's what I figured eventually. Maybe some documentation for
>> the function couldn't hurt.
>>
>>>> bdrv_ensure_backing_file() isn't a good name, after reading only
>>>> the
>>>> subject line I had no idea what this function might do. It's still
>>>> not entirely clear to me what the different to a
>>>> bdrv_open_backing_file()
>>>> is, except that it doesn't do anything if a backing file is
>>>> already
>>>> open. In which cases do we rely on this behaviour?
>>>
>>> We open the mirroring target with BDRV_O_NO_BACKING usually, but
>>> require
>>> the backing file if the cluster size is larger than the dirty block
>>> granularity.  Later, COW is done in the mirror job, so this is not
>>> needed anymore at the end of the series.
>>
>> Can we then put a /* FIXME */ comment there and revert that behaviour
>> at the end of the series? Then we can call it bdrv_open_backing_file()
>> and it's meaning becomes more obvious.
> 
> Actually, now that my machine finished upgrading and I can look at the
> source code, we do use the functionality even at the end of this series.
> If you call block-job-complete to complete mirroring, bdrv_ensure_backing_file()
> is called.  But block-job-complete can be called multiple times, because
> completion is entirely asynchronous.  I can check bs->backing_hd in the
> completion callback, but I think it's less clean (there is no reason in
> principle why block/mirror.c should include block_int.h, and adding a
> function just to use it once seems not worth).
> 
> I would like to add documentation to all functions in block.h in 1.3,
> I can start from this function if that would mean keeping it as is...

I'm not against keeping the logic, just adding documentation to new
functions would be good; even more so if their semantics isn't obvious.

I think I'd consider renaming the function anyway.

>>>>> +    if (bs->backing_file[0] == '\0') {
>>>>> +        return 0;
>>>>> +    }
>>>>> +
>>>>> +    bs->backing_hd = bdrv_new("");
>>>>> +    bdrv_get_full_backing_filename(bs, backing_filename,
>>>>> +                                   sizeof(backing_filename));
>>>>> +
>>>>> +    if (bs->backing_format[0] != '\0') {
>>>>> +        back_drv = bdrv_find_format(bs->backing_format);
>>>>> +    }
>>>>> +
>>>>> +    /* backing files always opened read-only */
>>>>> +    back_flags = bs->open_flags & ~(BDRV_O_RDWR |
>>>>> BDRV_O_SNAPSHOT);
>>>>> +
>>>>> +    ret = bdrv_open(bs->backing_hd, backing_filename,
>>>>> back_flags,
>>>>> back_drv);
>>>>> +    if (ret < 0) {
>>>>> +        bdrv_close(bs);
>>>>
>>>> I don't like this because it makes the invalid assumption that the
>>>> caller has just opened bs and wants to close it if opening the
>>>> backing file fails. I think this is part of the error handling
>>>> that belongs
>>>> in the caller: It opened bs, so it is responsible for closing it
>>>> in
>>>> error cases.
>>>
>>> It's a bug, it should have closed bs->backing_hd.
>>
>> Are you sure? You removed the bdrv_close(bs) in the caller, so that
>> it's missing there would be a second bug.
>> An explicit bdrv_close(bs->backing_hd) isn't required here, it is
>> implicitly called in bdrv_delete(bs->backing_hd).
> 
> True.  But likely my mental process was to add the bdrv_close(bs) here
> thinking that it would match the bdrv_delete below.  Note that
> bdrv_close(bs) already does delete bs->backing_hd.

Whatever, as long as the bug gets fixed... ;-)

>>>>> +        bdrv_delete(bs->backing_hd);
>>>>
>>>> This is a bug fix of its own, should be a separate patch.
>>>
>>> Ok.
>>>
>>>>> +        bs->backing_hd = NULL;
>>>>> +        return ret;
>>>>> +    }
>>>>> +    if (bs->is_temporary) {
>>>>> +        bs->backing_hd->keep_read_only = !(bs->open_flags &
>>>>> BDRV_O_RDWR);
>>>>> +    } else {
>>>>> +        /* base images use the same setting as leaf */
>>>>
>>>> Huh, "parent" turned into "leaf"?
>>>
>>> Will move this to a separate patch, too.
>>
>> I don't even understand what you're trying to say in this comment.
> 
> Well, I couldn't understand the original comment either. :)  To me,
> base image and parent is a synonym...
> 
> The images form a tree (snapshots being nodes and with each node
> having a parent pointer); what we open is a path from root to leaf,
> so the top-level image is a leaf.

We definitely need to get the terminology clarified...

Yes, you're right, the images files on the disk form a tree and your
terminology matches this. But in a running qemu, we have a tree of
BlockDriverStates (as I mentioned bs->file and bs->backing_hd create it,
along with driver specific links), and unfortunately it's direction is
exactly the opposite. This is what the terminology of the original
comment was based on and what I thought of. Both approaches are right in
a way.

I had a similar confusion when talking to Jeff recently, and there we
decided to avoid talking about children and parents at all. Maybe this
is the best in order to avoid misunderstandings. For the image files on
the disk "backing file/overlay" are good alternatives; I'm not sure if
we have any for the BDS tree.

Kevin

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 24/47] block: introduce new dirty bitmap functionality
  2012-09-11 14:57   ` Kevin Wolf
@ 2012-09-11 16:17     ` Paolo Bonzini
  0 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-09-11 16:17 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, eblake, qemu-devel, stefanha

Il 11/09/2012 16:57, Kevin Wolf ha scritto:
>> > Assert that write_compressed is never used with the dirty bitmap.
>> > Setting the bits early is wrong, because a coroutine might concurrently
>> > examine them and copy incomplete data from the source.
>> > 
>> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Looks good, but I think it should be documented at least that
> bdrv_get_next_dirty() hangs when it's called and there is no dirty block.

Since bdrv_get_next_dirty disappears soon, I'll just add an assertion.

Paolo

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 27/47] block: introduce mirror job
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 27/47] block: introduce mirror job Paolo Bonzini
  2012-07-25 23:02   ` Eric Blake
@ 2012-09-13 12:54   ` Kevin Wolf
  2012-09-13 14:07     ` Paolo Bonzini
  1 sibling, 1 reply; 136+ messages in thread
From: Kevin Wolf @ 2012-09-13 12:54 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, eblake, qemu-devel, stefanha

Am 24.07.2012 13:04, schrieb Paolo Bonzini:
> This patch adds the implementation of a new job that mirrors a disk to
> a new image while letting the guest continue using the old image.
> The target is treated as a "black box" and data is copied from the
> source to the target in the background.  This can be used for several
> purposes, including storage migration, continuous replication, and
> observation of the guest I/O in an external program.  It is also a
> first step in replacing the inefficient block migration code that is
> part of QEMU.
> 
> The job is possibly never-ending, but it is logically structured into
> two phases: 1) copy all data as fast as possible until the target
> first gets in sync with the source; 2) keep target in sync and
> ensure that reopening to the target gets a correct (full) copy
> of the source data.
> 
> The second phase is indicated by the progress in "info block-jobs"
> reporting the current offset to be equal to the length of the file.
> When the job is cancelled in the second phase, QEMU will run the
> job until the source is clean and quiescent, then it will report
> successful completion of the job.
> 
> In other words, the BLOCK_JOB_CANCELLED event means that the target
> may _not_ be consistent with a past state of the source; the
> BLOCK_JOB_COMPLETED event means that the target is consistent with
> a past state of the source.  (Note that it could already happen
> that management lost the race against QEMU and got a completion
> event instead of cancellation).
> 
> It is not yet possible to complete the job and switch over to the target
> disk.  The next patches will fix this and add many refinements to the
> basic idea introduced here.  These include improved error management,
> some tunable knobs and performance optimizations.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  block/Makefile.objs |    2 +-
>  block/mirror.c      |  232 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  block_int.h         |   20 +++++
>  qapi-schema.json    |   17 ++++
>  trace-events        |    7 ++
>  5 files changed, 277 insertions(+), 1 deletion(-)
>  create mode 100644 block/mirror.c
> 
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index c45affc..f1a394a 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -9,4 +9,4 @@ block-obj-$(CONFIG_LIBISCSI) += iscsi.o
>  block-obj-$(CONFIG_CURL) += curl.o
>  block-obj-$(CONFIG_RBD) += rbd.o
>  
> -common-obj-y += stream.o
> +common-obj-y += stream.o mirror.o
> diff --git a/block/mirror.c b/block/mirror.c
> new file mode 100644
> index 0000000..f7d36f9
> --- /dev/null
> +++ b/block/mirror.c
> @@ -0,0 +1,232 @@
> +/*
> + * Image mirroring
> + *
> + * Copyright Red Hat, Inc. 2012
> + *
> + * Authors:
> + *  Paolo Bonzini  <pbonzini@redhat.com>
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
> + * See the COPYING.LIB file in the top-level directory.
> + *
> + */
> +
> +#include "trace.h"
> +#include "blockjob.h"
> +#include "block_int.h"
> +#include "qemu/ratelimit.h"
> +
> +enum {
> +    /*
> +     * Size of data buffer for populating the image file.  This should be large
> +     * enough to process multiple clusters in a single call, so that populating
> +     * contiguous regions of the image is efficient.
> +     */
> +    BLOCK_SIZE = 512 * BDRV_SECTORS_PER_DIRTY_CHUNK, /* in bytes */
> +};
> +
> +#define SLICE_TIME 100000000ULL /* ns */
> +
> +typedef struct MirrorBlockJob {
> +    BlockJob common;
> +    RateLimit limit;
> +    BlockDriverState *target;
> +    MirrorSyncMode mode;
> +    int64_t sector_num;
> +    uint8_t *buf;
> +} MirrorBlockJob;
> +
> +static int coroutine_fn mirror_iteration(MirrorBlockJob *s)
> +{
> +    BlockDriverState *source = s->common.bs;
> +    BlockDriverState *target = s->target;
> +    QEMUIOVector qiov;
> +    int ret, nb_sectors;
> +    int64_t end;
> +    struct iovec iov;
> +
> +    end = s->common.len >> BDRV_SECTOR_BITS;
> +    s->sector_num = bdrv_get_next_dirty(source, s->sector_num);
> +    nb_sectors = MIN(BDRV_SECTORS_PER_DIRTY_CHUNK, end - s->sector_num);
> +    bdrv_reset_dirty(source, s->sector_num, nb_sectors);
> +
> +    /* Copy the dirty cluster.  */
> +    iov.iov_base = s->buf;
> +    iov.iov_len  = nb_sectors * 512;
> +    qemu_iovec_init_external(&qiov, &iov, 1);
> +
> +    trace_mirror_one_iteration(s, s->sector_num, nb_sectors);
> +    ret = bdrv_co_readv(source, s->sector_num, nb_sectors, &qiov);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    return bdrv_co_writev(target, s->sector_num, nb_sectors, &qiov);
> +}
> +
> +static void coroutine_fn mirror_run(void *opaque)
> +{
> +    MirrorBlockJob *s = opaque;
> +    BlockDriverState *bs = s->common.bs;
> +    int64_t sector_num, end;
> +    int ret = 0;
> +    int n;
> +    bool synced = false;
> +
> +    if (block_job_is_cancelled(&s->common)) {
> +        goto immediate_exit;
> +    }
> +
> +    s->common.len = bdrv_getlength(bs);
> +    if (s->common.len < 0) {
> +        block_job_completed(&s->common, s->common.len);
> +        return;
> +    }
> +
> +    end = s->common.len >> BDRV_SECTOR_BITS;
> +    s->buf = qemu_blockalign(bs, BLOCK_SIZE);
> +
> +    if (s->mode == MIRROR_SYNC_MODE_FULL || s->mode == MIRROR_SYNC_MODE_TOP) {

I think this is the common case, so s->mode != MIRROR_SYNC_MODE_NONE
might describe it better?

> +        /* First part, loop on the sectors and initialize the dirty bitmap.  */
> +        BlockDriverState *base;
> +        base = s->mode == MIRROR_SYNC_MODE_FULL ? NULL : bs->backing_hd;
> +        for (sector_num = 0; sector_num < end; ) {
> +            int64_t next = (sector_num | (BDRV_SECTORS_PER_DIRTY_CHUNK - 1)) + 1;
> +            ret = bdrv_co_is_allocated_above(bs, base,
> +                                             sector_num, next - sector_num, &n);
> +
> +            if (ret < 0) {
> +                break;
> +            } else if (ret == 1) {
> +                bdrv_set_dirty(bs, sector_num, n);
> +                sector_num = next;
> +            } else {
> +                sector_num += n;
> +            }

Maybe it would be worth checking for n == 0 and returning an error in
that case. One example where this happens is when asking for the
allocation status after EOF. It shouldn't happen as long as
bdrv_truncate() is forbidden while the job runs, but an extra check
rarely hurts.

> +        }
> +    }
> +
> +    if (ret < 0) {
> +        goto immediate_exit;
> +    }

Why not do that directly instead of having a break; first just to get here?

> +
> +    s->sector_num = -1;
> +    for (;;) {
> +        uint64_t delay_ns;
> +        int64_t cnt;
> +        bool should_complete;
> +
> +        cnt = bdrv_get_dirty_count(bs);
> +        if (cnt != 0) {
> +            ret = mirror_iteration(s);
> +            if (ret < 0) {
> +                break;

goto immediate_exit? It's the same now, but code after the loop may be
added in the future.

> +            }
> +            cnt = bdrv_get_dirty_count(bs);
> +        }
> +
> +        if (cnt != 0) {
> +            should_complete = false;
> +        } else {
> +            trace_mirror_before_flush(s);
> +            bdrv_flush(s->target);

No error handling?

Kevin

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 28/47] qmp: add drive-mirror command
  2012-07-24 11:04 ` [Qemu-devel] [PATCH 28/47] qmp: add drive-mirror command Paolo Bonzini
  2012-07-26 23:42   ` Eric Blake
  2012-07-31  9:26   ` Kevin Wolf
@ 2012-09-13 13:15   ` Kevin Wolf
  2012-09-13 13:24     ` Paolo Bonzini
  2 siblings, 1 reply; 136+ messages in thread
From: Kevin Wolf @ 2012-09-13 13:15 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, eblake, qemu-devel, stefanha

Am 24.07.2012 13:04, schrieb Paolo Bonzini:
> This adds the monitor commands that start the mirroring job.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  blockdev.c       |  133 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
>  hmp-commands.hx  |   21 +++++++++
>  hmp.c            |   28 ++++++++++++
>  hmp.h            |    1 +
>  qapi-schema.json |   35 ++++++++++++++
>  qmp-commands.hx  |   42 +++++++++++++++++
>  trace-events     |    2 +-
>  7 files changed, 258 insertions(+), 4 deletions(-)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 192a9db..4b4574a 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -21,6 +21,8 @@
>  #include "trace.h"
>  #include "arch_init.h"
>  
> +static void block_job_cb(void *opaque, int ret);
> +
>  static QTAILQ_HEAD(drivelist, DriveInfo) drives = QTAILQ_HEAD_INITIALIZER(drives);
>  
>  static const char *const if_name[IF_COUNT] = {
> @@ -825,6 +827,131 @@ exit:
>      return;
>  }
>  
> +void qmp_drive_mirror(const char *device, const char *target,
> +                      bool has_format, const char *format,
> +                      enum MirrorSyncMode sync,
> +                      bool has_mode, enum NewImageMode mode,
> +                      bool has_speed, int64_t speed, Error **errp)
> +{
> +    BlockDriverInfo bdi;
> +    BlockDriverState *bs;
> +    BlockDriverState *source, *target_bs;
> +    BlockDriver *proto_drv;
> +    BlockDriver *drv = NULL;
> +    Error *local_err = NULL;
> +    int flags;
> +    uint64_t size;
> +    int ret;
> +
> +    if (!has_speed) {
> +        speed = 0;
> +    }
> +    if (!has_mode) {
> +        mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
> +    }
> +
> +    bs = bdrv_find(device);
> +    if (!bs) {
> +        error_set(errp, QERR_DEVICE_NOT_FOUND, device);
> +        return;
> +    }
> +
> +    if (!has_format) {
> +        format = mode == NEW_IMAGE_MODE_EXISTING ? NULL : bs->drv->format_name;
> +    }
> +    if (format) {
> +        drv = bdrv_find_format(format);
> +        if (!drv) {
> +            error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
> +            return;
> +        }
> +    }
> +
> +    if (!bdrv_is_inserted(bs)) {
> +        error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, device);
> +        return;
> +    }
> +
> +    if (bdrv_in_use(bs)) {
> +        error_set(errp, QERR_DEVICE_IN_USE, device);
> +        return;
> +    }
> +
> +    flags = bs->open_flags | BDRV_O_RDWR;

Do we take care to make the image read-only again after completion?

Kevin

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 28/47] qmp: add drive-mirror command
  2012-09-13 13:15   ` Kevin Wolf
@ 2012-09-13 13:24     ` Paolo Bonzini
  2012-09-13 13:26       ` Kevin Wolf
  0 siblings, 1 reply; 136+ messages in thread
From: Paolo Bonzini @ 2012-09-13 13:24 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, eblake, qemu-devel, stefanha

Il 13/09/2012 15:15, Kevin Wolf ha scritto:
>> > +    flags = bs->open_flags | BDRV_O_RDWR;
> Do we take care to make the image read-only again after completion?

Not at the file-descriptor level, but bs->read_only is indeed restored
to "true" via bdrv_swap.

Doing it on the file descriptor is possible with Jeff's bdrv_reopen patches.

Paolo

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 28/47] qmp: add drive-mirror command
  2012-09-13 13:24     ` Paolo Bonzini
@ 2012-09-13 13:26       ` Kevin Wolf
  2012-09-13 13:38         ` Paolo Bonzini
  0 siblings, 1 reply; 136+ messages in thread
From: Kevin Wolf @ 2012-09-13 13:26 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, eblake, qemu-devel, stefanha

Am 13.09.2012 15:24, schrieb Paolo Bonzini:
> Il 13/09/2012 15:15, Kevin Wolf ha scritto:
>>>> +    flags = bs->open_flags | BDRV_O_RDWR;
>> Do we take care to make the image read-only again after completion?
> 
> Not at the file-descriptor level, but bs->read_only is indeed restored
> to "true" via bdrv_swap.
> 
> Doing it on the file descriptor is possible with Jeff's bdrv_reopen patches.

Ah, right, obviously you can't do it before Jeff's patches are in. But
yes, this is what I meant.

Kevin

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 28/47] qmp: add drive-mirror command
  2012-09-13 13:26       ` Kevin Wolf
@ 2012-09-13 13:38         ` Paolo Bonzini
  0 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-09-13 13:38 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, eblake, qemu-devel, stefanha

Il 13/09/2012 15:26, Kevin Wolf ha scritto:
>>>>> +    flags = bs->open_flags | BDRV_O_RDWR;
>>> >> Do we take care to make the image read-only again after completion?
>> > 
>> > Not at the file-descriptor level, but bs->read_only is indeed restored
>> > to "true" via bdrv_swap.
>> > 
>> > Doing it on the file descriptor is possible with Jeff's bdrv_reopen patches.
> Ah, right, obviously you can't do it before Jeff's patches are in. But
> yes, this is what I meant.

Still, the guest won't be able to issue writes after switching to the
destination, if it couldn't do so before.

Paolo

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [Qemu-devel] [PATCH 27/47] block: introduce mirror job
  2012-09-13 12:54   ` Kevin Wolf
@ 2012-09-13 14:07     ` Paolo Bonzini
  0 siblings, 0 replies; 136+ messages in thread
From: Paolo Bonzini @ 2012-09-13 14:07 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, eblake, qemu-devel, stefanha

Il 13/09/2012 14:54, Kevin Wolf ha scritto:
>> > +            ret = bdrv_co_is_allocated_above(bs, base,
>> > +                                             sector_num, next - sector_num, &n);
>> > +
>> > +            if (ret < 0) {
>> > +                break;
>> > +            } else if (ret == 1) {
>> > +                bdrv_set_dirty(bs, sector_num, n);
>> > +                sector_num = next;
>> > +            } else {
>> > +                sector_num += n;
>> > +            }
> Maybe it would be worth checking for n == 0 and returning an error in
> that case. One example where this happens is when asking for the
> allocation status after EOF. It shouldn't happen as long as
> bdrv_truncate() is forbidden while the job runs, but an extra check
> rarely hurts.

This is just an initialization loop, so I'll add an assertion instead.

>> > +        }
>> > +    }
>> > +
>> > +    if (ret < 0) {
>> > +        goto immediate_exit;
>> > +    }
> Why not do that directly instead of having a break; first just to get here?

Good idea.

>> > +
>> > +    s->sector_num = -1;
>> > +    for (;;) {
>> > +        uint64_t delay_ns;
>> > +        int64_t cnt;
>> > +        bool should_complete;
>> > +
>> > +        cnt = bdrv_get_dirty_count(bs);
>> > +        if (cnt != 0) {
>> > +            ret = mirror_iteration(s);
>> > +            if (ret < 0) {
>> > +                break;
> goto immediate_exit? It's the same now, but code after the loop may be
> added in the future.

That's why there is a break. :)  There will be code added later before
immediate_exit.  But it is just an if statement that will never be true
if mirroring hasn't started ye, so it can also go after.  I'll change to
goto and move that if statement after the label.

>> > +            }
>> > +            cnt = bdrv_get_dirty_count(bs);
>> > +        }
>> > +
>> > +        if (cnt != 0) {
>> > +            should_complete = false;
>> > +        } else {
>> > +            trace_mirror_before_flush(s);
>> > +            bdrv_flush(s->target);
> No error handling?

Will add.

Paolo

^ permalink raw reply	[flat|nested] 136+ messages in thread

end of thread, other threads:[~2012-09-13 14:08 UTC | newest]

Thread overview: 136+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-24 11:03 [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Paolo Bonzini
2012-07-24 11:03 ` [Qemu-devel] [PATCH 01/47] qapi: generalize documentation of streaming commands Paolo Bonzini
2012-07-24 11:03 ` [Qemu-devel] [PATCH 02/47] qerror/block: introduce QERR_BLOCK_JOB_NOT_ACTIVE Paolo Bonzini
2012-07-26 15:26   ` Kevin Wolf
2012-07-26 15:41     ` Paolo Bonzini
2012-07-26 16:49       ` Luiz Capitulino
2012-07-26 16:59         ` Paolo Bonzini
2012-07-26 17:02           ` Luiz Capitulino
2012-07-24 11:03 ` [Qemu-devel] [PATCH 03/47] block: move job APIs to separate files Paolo Bonzini
2012-07-26 15:50   ` Kevin Wolf
2012-07-24 11:03 ` [Qemu-devel] [PATCH 04/47] block: add block_job_query Paolo Bonzini
2012-07-30 14:47   ` Kevin Wolf
2012-07-30 15:05     ` Paolo Bonzini
2012-07-31  8:47       ` Kevin Wolf
2012-07-31  8:50         ` Paolo Bonzini
2012-08-02 19:28           ` Jeff Cody
2012-07-24 11:03 ` [Qemu-devel] [PATCH 05/47] block: add support for job pause/resume Paolo Bonzini
2012-07-24 11:03 ` [Qemu-devel] [PATCH 06/47] qmp: add block-job-pause and block-job-resume Paolo Bonzini
2012-08-01  7:42   ` Kevin Wolf
2012-07-24 11:03 ` [Qemu-devel] [PATCH 07/47] qemu-iotests: add test for pausing a streaming operation Paolo Bonzini
2012-07-24 11:03 ` [Qemu-devel] [PATCH 08/47] block: rename block_job_complete to block_job_completed Paolo Bonzini
2012-07-24 11:03 ` [Qemu-devel] [PATCH 09/47] block: rename BlockErrorAction, BlockQMPEventAction Paolo Bonzini
2012-07-24 11:03 ` [Qemu-devel] [PATCH 10/47] block: move BlockdevOnError declaration to QAPI Paolo Bonzini
2012-07-24 11:03 ` [Qemu-devel] [PATCH 11/47] block: reorganize io error code Paolo Bonzini
2012-08-01  9:30   ` Kevin Wolf
2012-08-01  9:46     ` Paolo Bonzini
2012-07-24 11:03 ` [Qemu-devel] [PATCH 12/47] block: sort BlockDeviceIoStatus errors by severity Paolo Bonzini
2012-08-01  9:44   ` Paolo Bonzini
2012-08-01  9:44   ` Kevin Wolf
2012-07-24 11:03 ` [Qemu-devel] [PATCH 13/47] block: introduce block job error Paolo Bonzini
2012-07-25 17:40   ` Eric Blake
2012-08-01 10:14   ` Kevin Wolf
2012-08-01 11:17     ` Paolo Bonzini
2012-08-01 11:49       ` Kevin Wolf
2012-08-01 12:09         ` Paolo Bonzini
2012-08-01 12:23           ` Kevin Wolf
2012-08-01 12:30             ` Paolo Bonzini
2012-08-01 13:09               ` Kevin Wolf
2012-08-01 13:21                 ` Paolo Bonzini
2012-08-01 14:01                   ` Kevin Wolf
2012-08-01 14:34                     ` Paolo Bonzini
2012-08-01 14:59                       ` Kevin Wolf
2012-08-01 15:15                         ` Paolo Bonzini
2012-08-06  9:29                           ` Kevin Wolf
2012-08-06  9:44                             ` Paolo Bonzini
2012-08-06 10:45                               ` Kevin Wolf
2012-08-06 10:58                                 ` Paolo Bonzini
2012-07-24 11:03 ` [Qemu-devel] [PATCH 14/47] stream: add on-error argument Paolo Bonzini
2012-07-31 18:40   ` Eric Blake
2012-08-01 10:29   ` Kevin Wolf
2012-08-01 11:11     ` Paolo Bonzini
2012-08-01 11:45       ` Kevin Wolf
2012-07-24 11:03 ` [Qemu-devel] [PATCH 15/47] blkdebug: process all set_state rules in the old state Paolo Bonzini
2012-07-24 20:06   ` Blue Swirl
2012-07-24 11:03 ` [Qemu-devel] [PATCH 16/47] qemu-iotests: map underscore to dash in QMP argument names Paolo Bonzini
2012-07-24 11:03 ` [Qemu-devel] [PATCH 17/47] qemu-iotests: add tests for streaming error handling Paolo Bonzini
2012-08-01 10:43   ` Kevin Wolf
2012-08-01 11:09     ` Paolo Bonzini
2012-07-24 11:03 ` [Qemu-devel] [PATCH 18/47] block: live snapshot documentation tweaks Paolo Bonzini
2012-07-24 11:03 ` [Qemu-devel] [PATCH 19/47] block: add bdrv_query_info Paolo Bonzini
2012-09-11 13:07   ` Kevin Wolf
2012-09-11 13:12     ` Paolo Bonzini
2012-07-24 11:03 ` [Qemu-devel] [PATCH 20/47] block: add bdrv_query_stats Paolo Bonzini
2012-07-24 11:03 ` [Qemu-devel] [PATCH 21/47] block: add bdrv_ensure_backing_file Paolo Bonzini
2012-09-11 13:32   ` Kevin Wolf
2012-09-11 13:46     ` Paolo Bonzini
2012-09-11 13:58       ` Kevin Wolf
2012-09-11 14:10         ` Paolo Bonzini
2012-09-11 15:38           ` Kevin Wolf
2012-07-24 11:04 ` [Qemu-devel] [PATCH 22/47] block: make device optional in BlockInfo Paolo Bonzini
2012-09-11 13:38   ` Kevin Wolf
2012-09-11 13:49     ` Paolo Bonzini
2012-09-11 14:02       ` Kevin Wolf
2012-09-11 14:14         ` Paolo Bonzini
2012-07-24 11:04 ` [Qemu-devel] [PATCH 23/47] block: add target info to QMP query-blockjobs command Paolo Bonzini
2012-07-24 11:04 ` [Qemu-devel] [PATCH 24/47] block: introduce new dirty bitmap functionality Paolo Bonzini
2012-09-11 14:57   ` Kevin Wolf
2012-09-11 16:17     ` Paolo Bonzini
2012-07-24 11:04 ` [Qemu-devel] [PATCH 25/47] block: add block-job-complete Paolo Bonzini
2012-07-24 11:04 ` [Qemu-devel] [PATCH 26/47] block: introduce BLOCK_JOB_READY event Paolo Bonzini
2012-07-24 11:04 ` [Qemu-devel] [PATCH 27/47] block: introduce mirror job Paolo Bonzini
2012-07-25 23:02   ` Eric Blake
2012-09-13 12:54   ` Kevin Wolf
2012-09-13 14:07     ` Paolo Bonzini
2012-07-24 11:04 ` [Qemu-devel] [PATCH 28/47] qmp: add drive-mirror command Paolo Bonzini
2012-07-26 23:42   ` Eric Blake
2012-07-27  7:04     ` Paolo Bonzini
2012-07-31  9:26   ` Kevin Wolf
2012-07-31  9:33     ` Paolo Bonzini
2012-07-31  9:46       ` Kevin Wolf
2012-07-31 10:02         ` Paolo Bonzini
2012-07-31 10:25           ` Kevin Wolf
2012-07-31 10:51             ` Paolo Bonzini
2012-07-31 11:13               ` Kevin Wolf
2012-07-31 11:25                 ` Paolo Bonzini
2012-07-31 12:17                   ` Kevin Wolf
2012-07-31 12:52                     ` Paolo Bonzini
2012-09-13 13:15   ` Kevin Wolf
2012-09-13 13:24     ` Paolo Bonzini
2012-09-13 13:26       ` Kevin Wolf
2012-09-13 13:38         ` Paolo Bonzini
2012-07-24 11:04 ` [Qemu-devel] [PATCH 29/47] mirror: support querying target file Paolo Bonzini
2012-07-24 11:04 ` [Qemu-devel] [PATCH 30/47] mirror: implement completion Paolo Bonzini
2012-07-24 11:04 ` [Qemu-devel] [PATCH 31/47] qemu-iotests: add mirroring test case Paolo Bonzini
2012-07-26 23:46   ` Eric Blake
2012-07-27  7:04     ` Paolo Bonzini
2012-07-24 11:04 ` [Qemu-devel] [PATCH 32/47] block: forward bdrv_iostatus_reset to block job Paolo Bonzini
2012-07-24 11:04 ` [Qemu-devel] [PATCH 33/47] mirror: add support for on-source-error/on-target-error Paolo Bonzini
2012-07-27 15:26   ` Eric Blake
2012-07-30 13:29     ` Paolo Bonzini
2012-07-24 11:04 ` [Qemu-devel] [PATCH 34/47] qmp: add pull_event function Paolo Bonzini
2012-07-24 11:04 ` [Qemu-devel] [PATCH 35/47] qemu-iotests: add testcases for mirroring on-source-error/on-target-error Paolo Bonzini
2012-07-24 11:04 ` [Qemu-devel] [PATCH 36/47] host-utils: add ffsl and flsl Paolo Bonzini
2012-07-27 16:05   ` Eric Blake
2012-07-30 13:30     ` Paolo Bonzini
2012-07-24 11:04 ` [Qemu-devel] [PATCH 37/47] add hierarchical bitmap data type and test cases Paolo Bonzini
2012-07-28 13:26   ` Eric Blake
2012-07-30 13:39     ` Paolo Bonzini
2012-07-30 14:18       ` Paolo Bonzini
2012-07-24 11:04 ` [Qemu-devel] [PATCH 38/47] block: implement dirty bitmap using HBitmap Paolo Bonzini
2012-07-24 11:04 ` [Qemu-devel] [PATCH 39/47] block: make round_to_clusters public Paolo Bonzini
2012-07-24 11:04 ` [Qemu-devel] [PATCH 40/47] mirror: perform COW if the cluster size is bigger than the granularity Paolo Bonzini
2012-07-24 11:04 ` [Qemu-devel] [PATCH 41/47] block: return count of dirty sectors, not chunks Paolo Bonzini
2012-07-24 11:04 ` [Qemu-devel] [PATCH 42/47] block: allow customizing the granularity of the dirty bitmap Paolo Bonzini
2012-07-24 11:04 ` [Qemu-devel] [PATCH 43/47] mirror: allow customizing the granularity Paolo Bonzini
2012-07-28 13:43   ` Eric Blake
2012-07-30 13:40     ` Paolo Bonzini
2012-07-30 13:53       ` Eric Blake
2012-07-30 14:03         ` Paolo Bonzini
2012-07-24 11:04 ` [Qemu-devel] [PATCH 44/47] mirror: switch mirror_iteration to AIO Paolo Bonzini
2012-07-28 13:46   ` Eric Blake
2012-07-30 13:41     ` Paolo Bonzini
2012-07-24 11:04 ` [Qemu-devel] [PATCH 45/47] mirror: add buf-size argument to drive-mirror Paolo Bonzini
2012-07-24 11:04 ` [Qemu-devel] [PATCH 46/47] mirror: support more than one in-flight AIO operation Paolo Bonzini
2012-07-24 11:04 ` [Qemu-devel] [PATCH 47/47] mirror: support arbitrarily-sized iterations Paolo Bonzini
2012-07-28 13:51 ` [Qemu-devel] [PATCH 00/47] Block job improvements for 1.2 Eric Blake

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.