All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [RFC v2 PATCH 0/6] Live block commit
@ 2012-08-30 18:47 Jeff Cody
  2012-08-30 18:47 ` [Qemu-devel] [RFC v2 PATCH 1/6] block: add support functions for live commit, to find and delete images Jeff Cody
                   ` (6 more replies)
  0 siblings, 7 replies; 26+ messages in thread
From: Jeff Cody @ 2012-08-30 18:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, stefanha, eblake, supriyak

Live block commit.

I originally had intended for this RFC series to include the more
complicated case of a live commit of the active layer, but removed
it for this commit in the hopes of making it into the soft feature
freeze for 1.2, so this series is the simpler case.

This series adds the basic case, of a live commit between two
images below the active layer, e.g.:

[base] <--- [snp-1] <--- [snp-2] <--- [snp-3] <--- [active]

can be collapsed down via commit, into:

[base] <--- [active]

or,

[base] <--- [snp-1] <--- [active],

[base] <--- [snp-3] <--- [active],

etc..

TODO: * qemu-io tests (in progress)
      * 'stage-2' of live commit functionality, to be able to push down the
        active layer. This structured something like mirroring, to allow for
        convergence.

Changes from the RFC v1 series:

* This patch series is not on top of Paolo's blk mirror series yet, to make it
  easier to apply independently if desired.  This means some of what was in the
  previous RFC series is not in this one (BlockdevOnError, for instance), but
  that can be easily added in once Paolo's series are in.

* This patches series is dependent on the reopen() series with transactional
  reopen.

* The target release for this series is 1.3

* Found some mistakes in the reopen calls

* Dropped the BlockdevOnError argument (for now), will add in if rebasing on
  top of Paolo's series.

* Used the new qerror system


Jeff Cody (6):
  1/6   block: add support functions for live commit, to find and delete
        images.
  2/6   block: add live block commit functionality
  3/6   blockdev: rename block_stream_cb to a generic block_job_cb
  4/6   qerror: new error for live block commit, QERR_TOP_NOT_FOUND
  5/6   block: helper function, to find the base image of a chain
  6/6   QAPI: add command for live block commit, 'block-commit'

 block.c             | 158 ++++++++++++++++++++++++++++++++++++++++
 block.h             |   6 +-
 block/Makefile.objs |   1 +
 block/commit.c      | 202 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 block_int.h         |  19 +++++
 blockdev.c          |  91 ++++++++++++++++++++++-
 qapi-schema.json    |  30 ++++++++
 qerror.h            |   3 +
 qmp-commands.hx     |   6 ++
 trace-events        |   4 +-
 10 files changed, 515 insertions(+), 5 deletions(-)
 create mode 100644 block/commit.c

-- 
1.7.11.2

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Qemu-devel] [RFC v2 PATCH 1/6] block: add support functions for live commit, to find and delete images.
  2012-08-30 18:47 [Qemu-devel] [RFC v2 PATCH 0/6] Live block commit Jeff Cody
@ 2012-08-30 18:47 ` Jeff Cody
  2012-09-06 13:23   ` Kevin Wolf
  2012-08-30 18:47 ` [Qemu-devel] [RFC v2 PATCH 2/6] block: add live block commit functionality Jeff Cody
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 26+ messages in thread
From: Jeff Cody @ 2012-08-30 18:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, stefanha, eblake, supriyak

Add bdrv_find_child(), and bdrv_delete_intermediate().

bdrv_find_child():  given 'bs' and the active (topmost) BDS of an image chain,
                    find the image that is the immediate top of 'bs'

bdrv_delete_intermediate():
                    Given 3 BDS (active, top, base), delete images above
                    base up to and including top, and set base to be the
                    parent of top's child node.

                    E.g., this converts:

                    bottom <- base <- intermediate <- top <- active

                    to

                    bottom <- base <- active

                    where top == active is permitted, although active
                    will not be deleted.

Signed-off-by: Jeff Cody <jcody@redhat.com>
---
 block.c | 142 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 block.h |   5 ++-
 2 files changed, 146 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 9470319..11e275c 100644
--- a/block.c
+++ b/block.c
@@ -1752,6 +1752,148 @@ int bdrv_change_backing_file(BlockDriverState *bs,
     return ret;
 }
 
+/* Finds the image layer immediately to the 'top' of bs.
+ *
+ * active is the current topmost image.
+ */
+BlockDriverState *bdrv_find_child(BlockDriverState *active,
+                                  BlockDriverState *bs)
+{
+    BlockDriverState *child = NULL;
+    BlockDriverState *intermediate;
+
+    /* if the active bs layer is the same as the new top, then there
+     * is no image above the top, so it will be returned as the child
+     */
+    if (active == bs) {
+        child = active;
+    } else {
+        intermediate = active;
+        while (intermediate->backing_hd) {
+            if (intermediate->backing_hd == bs) {
+                child = intermediate;
+                break;
+            }
+            intermediate = intermediate->backing_hd;
+        }
+    }
+
+    return child;
+}
+
+typedef struct BlkIntermediateStates {
+    BlockDriverState *bs;
+    QSIMPLEQ_ENTRY(BlkIntermediateStates) entry;
+} BlkIntermediateStates;
+
+
+/* deletes images above 'base' up to and including 'top', and sets the image
+ * above 'top' to have base as its backing file.
+ *
+ * E.g., this will convert the following chain:
+ * bottom <- base <- intermediate <- top <- active
+ *
+ * to
+ *
+ * bottom <- base <- active
+ *
+ * It is allowed for bottom==base, in which case it converts:
+ *
+ * base <- intermediate <- top <- active
+ *
+ * to
+ *
+ * base <- active
+ *
+ * It is also allowed for top==active, except in that case active is not
+ * deleted:
+ *
+ * base <- intermediate <- top
+ *
+ * becomes
+ *
+ * base <- top
+ */
+int bdrv_delete_intermediate(BlockDriverState *active, BlockDriverState *top,
+                             BlockDriverState *base)
+{
+    BlockDriverState *intermediate;
+    BlockDriverState *base_bs = NULL;
+    BlockDriverState *new_top_bs = NULL;
+    BlkIntermediateStates *intermediate_state, *next;
+    int ret = -1;
+
+    QSIMPLEQ_HEAD(states_to_delete, BlkIntermediateStates) states_to_delete;
+    QSIMPLEQ_INIT(&states_to_delete);
+
+    if (!top->drv || !base->drv) {
+        goto exit;
+    }
+
+    new_top_bs = bdrv_find_child(active, top);
+
+    /* special case of new_top_bs->backing_hd already pointing to base - nothing
+     * to do, no intermediate images
+     */
+    if (new_top_bs->backing_hd == base) {
+        ret = 0;
+        goto exit;
+    }
+
+    if (new_top_bs == NULL) {
+        /* we could not find the image above 'top', this is an error */
+        goto exit;
+    }
+
+    /* if the active and top image passed in are the same, then we
+     * can't delete the active, so we start one below
+     */
+    intermediate = (active == top) ? active->backing_hd : top;
+
+    /* now we will go down through the list, and add each BDS we find
+     * into our deletion queue, until we hit the 'base'
+     */
+    while (intermediate) {
+        intermediate_state = g_malloc0(sizeof(BlkIntermediateStates));
+        intermediate_state->bs = intermediate;
+        QSIMPLEQ_INSERT_TAIL(&states_to_delete, intermediate_state, entry);
+
+        if (intermediate->backing_hd == base) {
+            base_bs = intermediate->backing_hd;
+            break;
+        }
+        intermediate = intermediate->backing_hd;
+    }
+    if (base_bs == NULL) {
+        /* something went wrong, we did not end at the base. safely
+         * unravel everything, and exit with error */
+        goto exit;
+    }
+
+    /* success - we can delete the intermediate states, and link top->base */
+    ret = bdrv_change_backing_file(new_top_bs, base_bs->filename,
+                                   base_bs->drv ? base_bs->drv->format_name : "");
+    if (ret) {
+        goto exit;
+    }
+    new_top_bs->backing_hd = base_bs;
+
+
+    QSIMPLEQ_FOREACH_SAFE(intermediate_state, &states_to_delete, entry, next) {
+        /* so that bdrv_close() does not recursively close the chain */
+        intermediate_state->bs->backing_hd = NULL;
+        bdrv_delete(intermediate_state->bs);
+    }
+    ret = 0;
+
+exit:
+    QSIMPLEQ_FOREACH_SAFE(intermediate_state, &states_to_delete, entry, next) {
+        g_free(intermediate_state);
+    }
+    return ret;
+}
+
+
 static int bdrv_check_byte_request(BlockDriverState *bs, int64_t offset,
                                    size_t size)
 {
diff --git a/block.h b/block.h
index db812b1..ee76869 100644
--- a/block.h
+++ b/block.h
@@ -201,7 +201,10 @@ int bdrv_commit_all(void);
 int bdrv_change_backing_file(BlockDriverState *bs,
     const char *backing_file, const char *backing_fmt);
 void bdrv_register(BlockDriver *bdrv);
-
+int bdrv_delete_intermediate(BlockDriverState *active, BlockDriverState *top,
+                             BlockDriverState *base);
+BlockDriverState *bdrv_find_child(BlockDriverState *active,
+                                  BlockDriverState *bs);
 
 typedef struct BdrvCheckResult {
     int corruptions;
-- 
1.7.11.2

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [RFC v2 PATCH 2/6] block: add live block commit functionality
  2012-08-30 18:47 [Qemu-devel] [RFC v2 PATCH 0/6] Live block commit Jeff Cody
  2012-08-30 18:47 ` [Qemu-devel] [RFC v2 PATCH 1/6] block: add support functions for live commit, to find and delete images Jeff Cody
@ 2012-08-30 18:47 ` Jeff Cody
  2012-09-06 14:00   ` Kevin Wolf
  2012-08-30 18:47 ` [Qemu-devel] [RFC v2 PATCH 3/6] blockdev: rename block_stream_cb to a generic block_job_cb Jeff Cody
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 26+ messages in thread
From: Jeff Cody @ 2012-08-30 18:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, stefanha, eblake, supriyak

This adds the live commit coroutine.  This iteration focuses on the
commit only below the active layer, and not the active layer itself.

The behaviour is similar to block streaming; the sectors are walked
through, and anything that exists above 'base' is committed back down
into base.  At the end, intermediate images are deleted, and the
chain stitched together.  Images are restored to their original open
flags upon completion.

Signed-off-by: Jeff Cody <jcody@redhat.com>
---
 block/Makefile.objs |   1 +
 block/commit.c      | 202 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 block_int.h         |  19 +++++
 trace-events        |   2 +
 4 files changed, 224 insertions(+)
 create mode 100644 block/commit.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index b5754d3..4a136b8 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -4,6 +4,7 @@ block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
 block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
 block-obj-y += stream.o
+block-obj-y += commit.o
 block-obj-$(CONFIG_WIN32) += raw-win32.o
 block-obj-$(CONFIG_POSIX) += raw-posix.o
 block-obj-$(CONFIG_LIBISCSI) += iscsi.o
diff --git a/block/commit.c b/block/commit.c
new file mode 100644
index 0000000..bd3d882
--- /dev/null
+++ b/block/commit.c
@@ -0,0 +1,202 @@
+/*
+ * Live block commit
+ *
+ * Copyright Red Hat, Inc. 2012
+ *
+ * Authors:
+ *  Jeff Cody   <jcody@redhat.com>
+ *  Based on stream.c by Stefan Hajnoczi
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include "trace.h"
+#include "block_int.h"
+#include "qemu/ratelimit.h"
+
+enum {
+    /*
+     * Size of data buffer for populating the image file.  This should be large
+     * enough to process multiple clusters in a single call, so that populating
+     * contiguous regions of the image is efficient.
+     */
+    COMMIT_BUFFER_SIZE = 512 * 1024, /* in bytes */
+};
+
+#define SLICE_TIME 100000000ULL /* ns */
+
+typedef struct CommitBlockJob {
+    BlockJob common;
+    RateLimit limit;
+    BlockDriverState *active;
+    BlockDriverState *top;
+    BlockDriverState *base;
+    BlockErrorAction on_error;
+    int base_flags;
+    int top_flags;
+} CommitBlockJob;
+
+static int coroutine_fn commit_populate(BlockDriverState *bs,
+                                        BlockDriverState *base,
+                                        int64_t sector_num, int nb_sectors,
+                                        void *buf)
+{
+    if (bdrv_read(bs, sector_num, buf, nb_sectors)) {
+        return -EIO;
+    }
+    if (bdrv_write(base, sector_num, buf, nb_sectors)) {
+        return -EIO;
+    }
+    return 0;
+}
+
+static void coroutine_fn commit_run(void *opaque)
+{
+    CommitBlockJob *s = opaque;
+    BlockDriverState *active = s->active;
+    BlockDriverState *top = s->top;
+    BlockDriverState *base = s->base;
+    BlockDriverState *top_child = NULL;
+    int64_t sector_num, end;
+    int error = 0;
+    int ret = 0;
+    int n = 0;
+    void *buf;
+    int bytes_written = 0;
+
+    s->common.len = bdrv_getlength(top);
+    if (s->common.len < 0) {
+        block_job_complete(&s->common, s->common.len);
+        return;
+    }
+
+    top_child = bdrv_find_child(active, top);
+
+    end = s->common.len >> BDRV_SECTOR_BITS;
+    buf = qemu_blockalign(top, COMMIT_BUFFER_SIZE);
+
+    for (sector_num = 0; sector_num < end; sector_num += n) {
+        uint64_t delay_ns = 0;
+        bool copy;
+
+wait:
+        /* Note that even when no rate limit is applied we need to yield
+         * with no pending I/O here so that qemu_aio_flush() returns.
+         */
+        block_job_sleep_ns(&s->common, rt_clock, delay_ns);
+        if (block_job_is_cancelled(&s->common)) {
+            break;
+        }
+        /* Copy if allocated above the base */
+        ret = bdrv_co_is_allocated_above(top, base, sector_num,
+                                         COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE,
+                                         &n);
+        copy = (ret == 1);
+        trace_commit_one_iteration(s, sector_num, n, ret);
+        if (ret >= 0 && copy) {
+            if (s->common.speed) {
+                delay_ns = ratelimit_calculate_delay(&s->limit, n);
+                if (delay_ns > 0) {
+                    goto wait;
+                }
+            }
+            ret = commit_populate(top, base, sector_num, n, buf);
+            bytes_written += n * BDRV_SECTOR_SIZE;
+        }
+        if (ret < 0) {
+            if (s->on_error == BLOCK_ERR_STOP_ANY ||
+                s->on_error == BLOCK_ERR_STOP_ENOSPC) {
+                n = 0;
+                continue;
+            }
+            if (error == 0) {
+                error = ret;
+            }
+            if (s->on_error == BLOCK_ERR_REPORT) {
+                break;
+            }
+        }
+        ret = 0;
+
+        /* Publish progress */
+        s->common.offset += n * BDRV_SECTOR_SIZE;
+    }
+
+    if (!block_job_is_cancelled(&s->common) && sector_num == end && ret == 0) {
+        /* success */
+        if (bdrv_delete_intermediate(active, top, base)) {
+            /* something went wrong! */
+            /* TODO:add error reporting here */
+        }
+    }
+
+    /* restore base open flags here if appropriate (e.g., change the base back
+     * to r/o). These reopens do not need to be atomic, since we won't abort
+     * even on failure here */
+
+    if (s->base_flags != bdrv_get_flags(base)) {
+        bdrv_reopen(base, s->base_flags, NULL);
+    }
+    if (s->top_flags != bdrv_get_flags(top_child)) {
+        bdrv_reopen(top_child, s->top_flags, NULL);
+    }
+
+    qemu_vfree(buf);
+    block_job_complete(&s->common, ret);
+}
+
+static void commit_set_speed(BlockJob *job, int64_t speed, Error **errp)
+{
+    CommitBlockJob *s = container_of(job, CommitBlockJob, common);
+
+    if (speed < 0) {
+        error_set(errp, QERR_INVALID_PARAMETER, "speed");
+        return;
+    }
+    ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
+}
+
+static BlockJobType commit_job_type = {
+    .instance_size = sizeof(CommitBlockJob),
+    .job_type      = "commit",
+    .set_speed     = commit_set_speed,
+};
+
+void commit_start(BlockDriverState *bs, BlockDriverState *base,
+                 BlockDriverState *top, int64_t speed,
+                 BlockErrorAction on_error, BlockDriverCompletionFunc *cb,
+                 void *opaque, int orig_base_flags, int orig_top_flags,
+                 Error **errp)
+{
+    CommitBlockJob *s;
+
+    if ((on_error == BLOCK_ERR_STOP_ANY ||
+         on_error == BLOCK_ERR_STOP_ENOSPC) &&
+        !bdrv_iostatus_is_enabled(bs)) {
+        error_set(errp, QERR_INVALID_PARAMETER_COMBINATION);
+        return;
+    }
+
+    s = block_job_create(&commit_job_type, bs, speed, cb, opaque, errp);
+    if (!s) {
+        return;
+    }
+
+    s->base   = base;
+    s->top    = top;
+    s->active = bs;
+
+    s->base_flags = orig_base_flags;
+    s->top_flags  = orig_top_flags;
+
+    s->on_error = on_error;
+    s->common.co = qemu_coroutine_create(commit_run);
+
+    trace_commit_start(bs, base, top, s, s->common.co, opaque,
+                       orig_base_flags, orig_top_flags);
+    qemu_coroutine_enter(s->common.co, s);
+
+    return;
+}
diff --git a/block_int.h b/block_int.h
index 7a4e226..5c936ee 100644
--- a/block_int.h
+++ b/block_int.h
@@ -469,4 +469,23 @@ void stream_start(BlockDriverState *bs, BlockDriverState *base,
                   BlockDriverCompletionFunc *cb,
                   void *opaque, Error **errp);
 
+/**
+ * commit_start:
+ * @bs: Top Block device
+ * @base: Block device that will be written into, and become the new top
+ * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
+ * @on_error: The action to take upon error.
+ * @cb: Completion function for the job.
+ * @opaque: Opaque pointer value passed to @cb.
+ * @orig_base_flags: The original open flags for the base image
+ * @orig_top_flags: The original open flags for the top image
+ * @errp: Error object.
+ *
+ */
+void commit_start(BlockDriverState *bs, BlockDriverState *base,
+                 BlockDriverState *top, int64_t speed,
+                 BlockErrorAction on_error, BlockDriverCompletionFunc *cb,
+                 void *opaque, int orig_base_flags, int orig_top_flags,
+                 Error **errp);
+
 #endif /* BLOCK_INT_H */
diff --git a/trace-events b/trace-events
index 04b0723..9eb8f10 100644
--- a/trace-events
+++ b/trace-events
@@ -74,6 +74,8 @@ bdrv_co_do_copy_on_readv(void *bs, int64_t sector_num, int nb_sectors, int64_t c
 # block/stream.c
 stream_one_iteration(void *s, int64_t sector_num, int nb_sectors, int is_allocated) "s %p sector_num %"PRId64" nb_sectors %d is_allocated %d"
 stream_start(void *bs, void *base, void *s, void *co, void *opaque) "bs %p base %p s %p co %p opaque %p"
+commit_one_iteration(void *s, int64_t sector_num, int nb_sectors, int is_allocated) "s %p sector_num %"PRId64" nb_sectors %d is_allocated %d"
+commit_start(void *bs, void *base, void *top, void *s, void *co, void *opaque, int base_flags, int top_flags) "bs %p base %p top %p s %p co %p opaque %p base_flags %d top_flags %d"
 
 # blockdev.c
 qmp_block_job_cancel(void *job) "job %p"
-- 
1.7.11.2

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [RFC v2 PATCH 3/6] blockdev: rename block_stream_cb to a generic block_job_cb
  2012-08-30 18:47 [Qemu-devel] [RFC v2 PATCH 0/6] Live block commit Jeff Cody
  2012-08-30 18:47 ` [Qemu-devel] [RFC v2 PATCH 1/6] block: add support functions for live commit, to find and delete images Jeff Cody
  2012-08-30 18:47 ` [Qemu-devel] [RFC v2 PATCH 2/6] block: add live block commit functionality Jeff Cody
@ 2012-08-30 18:47 ` Jeff Cody
  2012-09-07 16:27   ` Paolo Bonzini
  2012-08-30 18:47 ` [Qemu-devel] [RFC v2 PATCH 4/6] qerror: new error for live block commit, QERR_TOP_NOT_FOUND Jeff Cody
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 26+ messages in thread
From: Jeff Cody @ 2012-08-30 18:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, stefanha, eblake, supriyak


Signed-off-by: Jeff Cody <jcody@redhat.com>
---
 blockdev.c   | 8 +++++---
 trace-events | 2 +-
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 7c83baa..68d65fb 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -53,6 +53,8 @@ static const int if_max_devs[IF_COUNT] = {
     [IF_SCSI] = 7,
 };
 
+static void block_job_cb(void *opaque, int ret);
+
 /*
  * We automatically delete the drive when a device using it gets
  * unplugged.  Questionable feature, but we can't just drop it.
@@ -1063,12 +1065,12 @@ static QObject *qobject_from_block_job(BlockJob *job)
                               job->speed);
 }
 
-static void block_stream_cb(void *opaque, int ret)
+static void block_job_cb(void *opaque, int ret)
 {
     BlockDriverState *bs = opaque;
     QObject *obj;
 
-    trace_block_stream_cb(bs, bs->job, ret);
+    trace_block_job_cb(bs, bs->job, ret);
 
     assert(bs->job);
     obj = qobject_from_block_job(bs->job);
@@ -1110,7 +1112,7 @@ void qmp_block_stream(const char *device, bool has_base,
     }
 
     stream_start(bs, base_bs, base, has_speed ? speed : 0,
-                 block_stream_cb, bs, &local_err);
+                 block_job_cb, bs, &local_err);
     if (error_is_set(&local_err)) {
         error_propagate(errp, local_err);
         return;
diff --git a/trace-events b/trace-events
index 9eb8f10..8d7a8d3 100644
--- a/trace-events
+++ b/trace-events
@@ -79,7 +79,7 @@ commit_start(void *bs, void *base, void *top, void *s, void *co, void *opaque, i
 
 # blockdev.c
 qmp_block_job_cancel(void *job) "job %p"
-block_stream_cb(void *bs, void *job, int ret) "bs %p job %p ret %d"
+block_job_cb(void *bs, void *job, int ret) "bs %p job %p ret %d"
 qmp_block_stream(void *bs, void *job) "bs %p job %p"
 
 # hw/virtio-blk.c
-- 
1.7.11.2

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [RFC v2 PATCH 4/6] qerror: new error for live block commit, QERR_TOP_NOT_FOUND
  2012-08-30 18:47 [Qemu-devel] [RFC v2 PATCH 0/6] Live block commit Jeff Cody
                   ` (2 preceding siblings ...)
  2012-08-30 18:47 ` [Qemu-devel] [RFC v2 PATCH 3/6] blockdev: rename block_stream_cb to a generic block_job_cb Jeff Cody
@ 2012-08-30 18:47 ` Jeff Cody
  2012-08-30 22:55   ` Eric Blake
  2012-08-30 18:47 ` [Qemu-devel] [RFC v2 PATCH 5/6] block: helper function, to find the base image of a chain Jeff Cody
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 26+ messages in thread
From: Jeff Cody @ 2012-08-30 18:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, stefanha, eblake, supriyak


Signed-off-by: Jeff Cody <jcody@redhat.com>
---
 qerror.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/qerror.h b/qerror.h
index d0a76a4..7396184 100644
--- a/qerror.h
+++ b/qerror.h
@@ -219,6 +219,9 @@ void assert_no_error(Error *err);
 #define QERR_TOO_MANY_FILES \
     ERROR_CLASS_GENERIC_ERROR, "Too many open files"
 
+#define QERR_TOP_NOT_FOUND \
+    ERROR_CLASS_GENERIC_ERROR, "Top image file %s not found"
+
 #define QERR_UNDEFINED_ERROR \
     ERROR_CLASS_GENERIC_ERROR, "An undefined error has occurred"
 
-- 
1.7.11.2

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [RFC v2 PATCH 5/6] block: helper function, to find the base image of a chain
  2012-08-30 18:47 [Qemu-devel] [RFC v2 PATCH 0/6] Live block commit Jeff Cody
                   ` (3 preceding siblings ...)
  2012-08-30 18:47 ` [Qemu-devel] [RFC v2 PATCH 4/6] qerror: new error for live block commit, QERR_TOP_NOT_FOUND Jeff Cody
@ 2012-08-30 18:47 ` Jeff Cody
  2012-08-30 18:47 ` [Qemu-devel] [RFC v2 PATCH 6/6] QAPI: add command for live block commit, 'block-commit' Jeff Cody
  2012-08-30 21:31 ` [Qemu-devel] [RFC v2 PATCH 0/6] Live block commit Jeff Cody
  6 siblings, 0 replies; 26+ messages in thread
From: Jeff Cody @ 2012-08-30 18:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, stefanha, eblake, supriyak

This is a simple helper function, that will return the base image
of a given image chain.

Signed-off-by: Jeff Cody <jcody@redhat.com>
---
 block.c | 16 ++++++++++++++++
 block.h |  1 +
 2 files changed, 17 insertions(+)

diff --git a/block.c b/block.c
index 11e275c..5f58600 100644
--- a/block.c
+++ b/block.c
@@ -3137,6 +3137,22 @@ int bdrv_get_backing_file_depth(BlockDriverState *bs)
     return 1 + bdrv_get_backing_file_depth(bs->backing_hd);
 }
 
+BlockDriverState *bdrv_find_base(BlockDriverState *bs)
+{
+    BlockDriverState *curr_bs = NULL;
+
+    if (!bs) {
+        return NULL;
+    }
+
+    curr_bs = bs;
+
+    while (curr_bs->backing_hd) {
+        curr_bs = curr_bs->backing_hd;
+    }
+    return curr_bs;
+}
+
 #define NB_SUFFIXES 4
 
 char *get_human_readable_size(char *buf, int buf_size, int64_t size)
diff --git a/block.h b/block.h
index ee76869..376cc50 100644
--- a/block.h
+++ b/block.h
@@ -201,6 +201,7 @@ int bdrv_commit_all(void);
 int bdrv_change_backing_file(BlockDriverState *bs,
     const char *backing_file, const char *backing_fmt);
 void bdrv_register(BlockDriver *bdrv);
+BlockDriverState *bdrv_find_base(BlockDriverState *bs);
 int bdrv_delete_intermediate(BlockDriverState *active, BlockDriverState *top,
                              BlockDriverState *base);
 BlockDriverState *bdrv_find_child(BlockDriverState *active,
-- 
1.7.11.2

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [RFC v2 PATCH 6/6] QAPI: add command for live block commit, 'block-commit'
  2012-08-30 18:47 [Qemu-devel] [RFC v2 PATCH 0/6] Live block commit Jeff Cody
                   ` (4 preceding siblings ...)
  2012-08-30 18:47 ` [Qemu-devel] [RFC v2 PATCH 5/6] block: helper function, to find the base image of a chain Jeff Cody
@ 2012-08-30 18:47 ` Jeff Cody
  2012-08-30 23:06   ` Eric Blake
  2012-09-06 14:29   ` Kevin Wolf
  2012-08-30 21:31 ` [Qemu-devel] [RFC v2 PATCH 0/6] Live block commit Jeff Cody
  6 siblings, 2 replies; 26+ messages in thread
From: Jeff Cody @ 2012-08-30 18:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, stefanha, eblake, supriyak

The command for live block commit is added, which has the following
arguments:

device: the block device to perform the commit on (mandatory)
base:   the base image to commit into; optional (if not specified,
        it is the underlying original image)
top:    the top image of the commit - all data from inside top down
        to base will be committed into base. optional (if not specified,
        it is the active image) - see note below
speed:  maximum speed, in bytes/sec

note: eventually this will support merging down the active layer,
      but that code is not yet complete.  If the active layer is passed
      in currently as top, or top is left to the default, then the error
      QERR_TOP_NOT_FOUND will be returned.

The is done as a block job, so upon completion a BLOCK_JOB_COMPLETED will
be emitted.

Signed-off-by: Jeff Cody <jcody@redhat.com>
---
 blockdev.c       | 83 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 qapi-schema.json | 30 ++++++++++++++++++++
 qmp-commands.hx  |  6 ++++
 3 files changed, 119 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 68d65fb..e0d6ca0 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -827,6 +827,89 @@ exit:
     return;
 }
 
+void qmp_block_commit(const char *device,
+                      bool has_base, const char *base,
+                      bool has_top, const char *top,
+                      bool has_speed, int64_t speed,
+                      Error **errp)
+{
+    BlockDriverState *bs;
+    BlockDriverState *base_bs, *top_bs, *child_bs;
+    Error *local_err = NULL;
+    int orig_base_flags, orig_top_flags;
+    BlockReopenQueue *reopen_queue = NULL;
+    /* This will be part of the QMP command, if/when the
+     * BlockdevOnError change for blkmirror makes it in
+     */
+    BlockErrorAction on_error = BLOCK_ERR_REPORT;
+
+    /* drain all i/o before commits */
+    bdrv_drain_all();
+
+    bs = bdrv_find(device);
+    if (!bs) {
+        error_set(errp, QERR_DEVICE_NOT_FOUND, device);
+        return;
+    }
+    if (base && has_base) {
+        base_bs = bdrv_find_backing_image(bs, base);
+    } else {
+        base_bs = bdrv_find_base(bs);
+    }
+
+    if (base_bs == NULL) {
+        error_set(errp, QERR_BASE_NOT_FOUND, "NULL");
+        return;
+    }
+
+    if (top && has_top) {
+        /* if we want to allow the active layer,
+         * use 'bdrv_find_image()' here */
+        top_bs = bdrv_find_backing_image(bs, top);
+        if (top_bs == NULL) {
+            error_set(errp, QERR_TOP_NOT_FOUND, top);
+            return;
+        }
+    } else {
+        /* we will eventually default to the top layer,i.e. top_bs = bs */
+        error_set(errp, QERR_TOP_NOT_FOUND, top);
+        return;
+    }
+
+    child_bs = bdrv_find_child(bs, top_bs);
+
+    orig_base_flags = bdrv_get_flags(base_bs);  /* what we are writing into   */
+    orig_top_flags = bdrv_get_flags(child_bs);  /* to change the backing file */
+
+    /* convert base_bs to r/w, if necessary */
+    if (!(orig_base_flags & BDRV_O_RDWR)) {
+        reopen_queue = bdrv_reopen_queue(reopen_queue, base_bs,
+                                         orig_base_flags | BDRV_O_RDWR);
+    }
+    if (!(orig_top_flags & BDRV_O_RDWR)) {
+        reopen_queue = bdrv_reopen_queue(reopen_queue, base_bs,
+                                         orig_base_flags | BDRV_O_RDWR);
+    }
+    if (reopen_queue) {
+        bdrv_reopen_multiple(reopen_queue, &local_err);
+        if (local_err != NULL) {
+            error_propagate(errp, local_err);
+            return;
+        }
+    }
+
+    commit_start(bs, base_bs, top_bs, speed, on_error,
+                 block_job_cb, bs, orig_base_flags, orig_top_flags,
+                 &local_err);
+    if (local_err != NULL) {
+        error_propagate(errp, local_err);
+        return;
+    }
+    /* Grab a reference so hotplug does not delete the BlockDriverState from
+     * underneath us.
+     */
+    drive_get_ref(drive_get_by_blockdev(bs));
+}
 
 static void eject_device(BlockDriverState *bs, int force, Error **errp)
 {
diff --git a/qapi-schema.json b/qapi-schema.json
index bd8ad74..45feda6 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1401,6 +1401,36 @@
   'returns': 'str' }
 
 ##
+# @block-commit
+#
+# Live commit of data from child image nodes into parent nodes - i.e.,
+# writes data between 'top' and 'base' into 'base'.
+#
+# @device:  the name of the device
+#
+# @base:   #optional The parent image of the device to write data into.
+#                    If not specified, this is the original parent image.
+#
+# @top:    #optional The child image, above which data will not be committed
+#                    down.  If not specified, this is the active layer.
+#
+# @speed:  #optional the maximum speed, in bytes per second
+#
+# Returns: Nothing on success
+#          If commit or stream is already active on this device, DeviceInUse
+#          If @device does not exist, DeviceNotFound
+#          If image commit is not supported by this device, NotSupported
+#          If @base does not exist, BaseNotFound
+#          If @top does not exist, TopNotFound
+#          If @speed is invalid, InvalidParameter
+#
+# Since: 1.2
+#
+##
+{ 'command': 'block-commit',
+  'data': { 'device': 'str', '*base': 'str', '*top': 'str',
+            '*speed': 'int' } }
+
 # @migrate_cancel
 #
 # Cancel the current executing migration process.
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 3745a21..9292877 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -767,6 +767,12 @@ EQMP
     },
 
     {
+        .name       = "block-commit",
+        .args_type  = "device:B,base:s?,top:s?,speed:o?",
+        .mhandler.cmd_new = qmp_marshal_input_block_commit,
+    },
+
+    {
         .name       = "block-job-set-speed",
         .args_type  = "device:B,speed:o",
         .mhandler.cmd_new = qmp_marshal_input_block_job_set_speed,
-- 
1.7.11.2

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 0/6] Live block commit
  2012-08-30 18:47 [Qemu-devel] [RFC v2 PATCH 0/6] Live block commit Jeff Cody
                   ` (5 preceding siblings ...)
  2012-08-30 18:47 ` [Qemu-devel] [RFC v2 PATCH 6/6] QAPI: add command for live block commit, 'block-commit' Jeff Cody
@ 2012-08-30 21:31 ` Jeff Cody
  6 siblings, 0 replies; 26+ messages in thread
From: Jeff Cody @ 2012-08-30 21:31 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, eblake, supriyak, stefanha

On 08/30/2012 02:47 PM, Jeff Cody wrote:
> Live block commit.
> 
> I originally had intended for this RFC series to include the more
> complicated case of a live commit of the active layer, but removed
> it for this commit in the hopes of making it into the soft feature
> freeze for 1.2, so this series is the simpler case.
> 
> This series adds the basic case, of a live commit between two
> images below the active layer, e.g.:
> 
> [base] <--- [snp-1] <--- [snp-2] <--- [snp-3] <--- [active]
> 
> can be collapsed down via commit, into:
> 
> [base] <--- [active]
> 
> or,
> 
> [base] <--- [snp-1] <--- [active],
> 
> [base] <--- [snp-3] <--- [active],
> 
> etc..
> 
> TODO: * qemu-io tests (in progress)
>       * 'stage-2' of live commit functionality, to be able to push down the
>         active layer. This structured something like mirroring, to allow for
>         convergence.
> 
> Changes from the RFC v1 series:
> 
> * This patch series is not on top of Paolo's blk mirror series yet, to make it
>   easier to apply independently if desired.  This means some of what was in the
>   previous RFC series is not in this one (BlockdevOnError, for instance), but
>   that can be easily added in once Paolo's series are in.
> 
> * This patches series is dependent on the reopen() series with transactional
>   reopen.
> 
> * The target release for this series is 1.3
> 
> * Found some mistakes in the reopen calls
> 
> * Dropped the BlockdevOnError argument (for now), will add in if rebasing on
>   top of Paolo's series.
> 
> * Used the new qerror system
> 
> 

I meant to add this to my cover letter, but forgot; if anyone wants to play around
with this, you can find it on github:

git://github.com/codyprime/qemu-kvm-jtc.git  (branch jtc-live-commit-1.3)




> Jeff Cody (6):
>   1/6   block: add support functions for live commit, to find and delete
>         images.
>   2/6   block: add live block commit functionality
>   3/6   blockdev: rename block_stream_cb to a generic block_job_cb
>   4/6   qerror: new error for live block commit, QERR_TOP_NOT_FOUND
>   5/6   block: helper function, to find the base image of a chain
>   6/6   QAPI: add command for live block commit, 'block-commit'
> 
>  block.c             | 158 ++++++++++++++++++++++++++++++++++++++++
>  block.h             |   6 +-
>  block/Makefile.objs |   1 +
>  block/commit.c      | 202 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  block_int.h         |  19 +++++
>  blockdev.c          |  91 ++++++++++++++++++++++-
>  qapi-schema.json    |  30 ++++++++
>  qerror.h            |   3 +
>  qmp-commands.hx     |   6 ++
>  trace-events        |   4 +-
>  10 files changed, 515 insertions(+), 5 deletions(-)
>  create mode 100644 block/commit.c
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 4/6] qerror: new error for live block commit, QERR_TOP_NOT_FOUND
  2012-08-30 18:47 ` [Qemu-devel] [RFC v2 PATCH 4/6] qerror: new error for live block commit, QERR_TOP_NOT_FOUND Jeff Cody
@ 2012-08-30 22:55   ` Eric Blake
  2012-08-31 14:42     ` Jeff Cody
  0 siblings, 1 reply; 26+ messages in thread
From: Eric Blake @ 2012-08-30 22:55 UTC (permalink / raw)
  To: Jeff Cody; +Cc: kwolf, pbonzini, stefanha, qemu-devel, supriyak

[-- Attachment #1: Type: text/plain, Size: 775 bytes --]

On 08/30/2012 11:47 AM, Jeff Cody wrote:
> Signed-off-by: Jeff Cody <jcody@redhat.com>
> ---
>  qerror.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/qerror.h b/qerror.h
> index d0a76a4..7396184 100644
> --- a/qerror.h
> +++ b/qerror.h
> @@ -219,6 +219,9 @@ void assert_no_error(Error *err);
>  #define QERR_TOO_MANY_FILES \
>      ERROR_CLASS_GENERIC_ERROR, "Too many open files"
>  
> +#define QERR_TOP_NOT_FOUND \
> +    ERROR_CLASS_GENERIC_ERROR, "Top image file %s not found"
> +

Are you sure this patch is needed, or should you be using error_setg()?

 http://lists.gnu.org/archive/html/qemu-devel/2012-08/msg04980.html

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 617 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 6/6] QAPI: add command for live block commit, 'block-commit'
  2012-08-30 18:47 ` [Qemu-devel] [RFC v2 PATCH 6/6] QAPI: add command for live block commit, 'block-commit' Jeff Cody
@ 2012-08-30 23:06   ` Eric Blake
  2012-08-31 14:42     ` Jeff Cody
  2012-09-06 14:29   ` Kevin Wolf
  1 sibling, 1 reply; 26+ messages in thread
From: Eric Blake @ 2012-08-30 23:06 UTC (permalink / raw)
  To: Jeff Cody; +Cc: kwolf, pbonzini, stefanha, qemu-devel, supriyak

[-- Attachment #1: Type: text/plain, Size: 2572 bytes --]

On 08/30/2012 11:47 AM, Jeff Cody wrote:
> The command for live block commit is added, which has the following
> arguments:
> 
> device: the block device to perform the commit on (mandatory)
> base:   the base image to commit into; optional (if not specified,
>         it is the underlying original image)
> top:    the top image of the commit - all data from inside top down
>         to base will be committed into base. optional (if not specified,
>         it is the active image) - see note below
> speed:  maximum speed, in bytes/sec
> 
> note: eventually this will support merging down the active layer,
>       but that code is not yet complete.  If the active layer is passed
>       in currently as top, or top is left to the default, then the error
>       QERR_TOP_NOT_FOUND will be returned.
> 
> The is done as a block job, so upon completion a BLOCK_JOB_COMPLETED will
> be emitted.

Likewise, the job can be canceled, and it is possible to track progress
of the job or change the speed on the fly, using existing block job
commands.

Will the BLOCK_JOB_COMPLETED event have a new category listing that it
was a commit job instead of a stream job that completed?  That is, I
think this patch is incomplete, and that you also need to modify
QMP/qmp-events.txt to modify the 'type' field of
BLOCK_JOB_{CANCELLED,COMPLETED} to distinguish this new sub-type of event.

>  
>  ##
> +# @block-commit
> +#
> +# Live commit of data from child image nodes into parent nodes - i.e.,
> +# writes data between 'top' and 'base' into 'base'.
> +#
> +# @device:  the name of the device
> +#
> +# @base:   #optional The parent image of the device to write data into.
> +#                    If not specified, this is the original parent image.
> +#
> +# @top:    #optional The child image, above which data will not be committed
> +#                    down.  If not specified, this is the active layer.
> +#
> +# @speed:  #optional the maximum speed, in bytes per second
> +#
> +# Returns: Nothing on success
> +#          If commit or stream is already active on this device, DeviceInUse
> +#          If @device does not exist, DeviceNotFound
> +#          If image commit is not supported by this device, NotSupported
> +#          If @base does not exist, BaseNotFound
> +#          If @top does not exist, TopNotFound
> +#          If @speed is invalid, InvalidParameter
> +#
> +# Since: 1.2

1.3

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 617 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 4/6] qerror: new error for live block commit, QERR_TOP_NOT_FOUND
  2012-08-30 22:55   ` Eric Blake
@ 2012-08-31 14:42     ` Jeff Cody
  0 siblings, 0 replies; 26+ messages in thread
From: Jeff Cody @ 2012-08-31 14:42 UTC (permalink / raw)
  To: Eric Blake; +Cc: kwolf, pbonzini, stefanha, qemu-devel, supriyak

On 08/30/2012 06:55 PM, Eric Blake wrote:
> On 08/30/2012 11:47 AM, Jeff Cody wrote:
>> Signed-off-by: Jeff Cody <jcody@redhat.com>
>> ---
>>  qerror.h | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/qerror.h b/qerror.h
>> index d0a76a4..7396184 100644
>> --- a/qerror.h
>> +++ b/qerror.h
>> @@ -219,6 +219,9 @@ void assert_no_error(Error *err);
>>  #define QERR_TOO_MANY_FILES \
>>      ERROR_CLASS_GENERIC_ERROR, "Too many open files"
>>  
>> +#define QERR_TOP_NOT_FOUND \
>> +    ERROR_CLASS_GENERIC_ERROR, "Top image file %s not found"
>> +
> 
> Are you sure this patch is needed, or should you be using error_setg()?
> 
>  http://lists.gnu.org/archive/html/qemu-devel/2012-08/msg04980.html
> 

I think you are right, error_setg() would be the best course - although
I based these patches only on what was in qemu/master currently.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 6/6] QAPI: add command for live block commit, 'block-commit'
  2012-08-30 23:06   ` Eric Blake
@ 2012-08-31 14:42     ` Jeff Cody
  0 siblings, 0 replies; 26+ messages in thread
From: Jeff Cody @ 2012-08-31 14:42 UTC (permalink / raw)
  To: Eric Blake; +Cc: kwolf, pbonzini, stefanha, qemu-devel, supriyak

On 08/30/2012 07:06 PM, Eric Blake wrote:
> On 08/30/2012 11:47 AM, Jeff Cody wrote:
>> The command for live block commit is added, which has the following
>> arguments:
>>
>> device: the block device to perform the commit on (mandatory)
>> base:   the base image to commit into; optional (if not specified,
>>         it is the underlying original image)
>> top:    the top image of the commit - all data from inside top down
>>         to base will be committed into base. optional (if not specified,
>>         it is the active image) - see note below
>> speed:  maximum speed, in bytes/sec
>>
>> note: eventually this will support merging down the active layer,
>>       but that code is not yet complete.  If the active layer is passed
>>       in currently as top, or top is left to the default, then the error
>>       QERR_TOP_NOT_FOUND will be returned.
>>
>> The is done as a block job, so upon completion a BLOCK_JOB_COMPLETED will
>> be emitted.
> 
> Likewise, the job can be canceled, and it is possible to track progress
> of the job or change the speed on the fly, using existing block job
> commands.

Correct.

> 
> Will the BLOCK_JOB_COMPLETED event have a new category listing that it
> was a commit job instead of a stream job that completed?  That is, I
> think this patch is incomplete, and that you also need to modify
> QMP/qmp-events.txt to modify the 'type' field of
> BLOCK_JOB_{CANCELLED,COMPLETED} to distinguish this new sub-type of event.
>

You are right, I should add the sub-type info into QMP/qmp-events.txt; the
type is 'commit'.

Here is what the BLOCK_JOB_COMPLETED event looks like for commit:

{"timestamp": {"seconds": 1346423249, "microseconds": 551295},
  "event": "BLOCK_JOB_COMPLETED",
  "data": {"device": "virtio0", "len": 6442450944, "offset": 6442450944,
            "speed": 140732751976032, "type": "commit"}}


>>  
>>  ##
>> +# @block-commit
>> +#
>> +# Live commit of data from child image nodes into parent nodes - i.e.,
>> +# writes data between 'top' and 'base' into 'base'.
>> +#
>> +# @device:  the name of the device
>> +#
>> +# @base:   #optional The parent image of the device to write data into.
>> +#                    If not specified, this is the original parent image.
>> +#
>> +# @top:    #optional The child image, above which data will not be committed
>> +#                    down.  If not specified, this is the active layer.
>> +#
>> +# @speed:  #optional the maximum speed, in bytes per second
>> +#
>> +# Returns: Nothing on success
>> +#          If commit or stream is already active on this device, DeviceInUse
>> +#          If @device does not exist, DeviceNotFound
>> +#          If image commit is not supported by this device, NotSupported
>> +#          If @base does not exist, BaseNotFound
>> +#          If @top does not exist, TopNotFound
>> +#          If @speed is invalid, InvalidParameter
>> +#
>> +# Since: 1.2
> 
> 1.3
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 1/6] block: add support functions for live commit, to find and delete images.
  2012-08-30 18:47 ` [Qemu-devel] [RFC v2 PATCH 1/6] block: add support functions for live commit, to find and delete images Jeff Cody
@ 2012-09-06 13:23   ` Kevin Wolf
  2012-09-06 14:58     ` Eric Blake
  2012-09-06 14:59     ` Jeff Cody
  0 siblings, 2 replies; 26+ messages in thread
From: Kevin Wolf @ 2012-09-06 13:23 UTC (permalink / raw)
  To: Jeff Cody; +Cc: supriyak, pbonzini, eblake, qemu-devel, stefanha

Am 30.08.2012 20:47, schrieb Jeff Cody:
> Add bdrv_find_child(), and bdrv_delete_intermediate().
> 
> bdrv_find_child():  given 'bs' and the active (topmost) BDS of an image chain,
>                     find the image that is the immediate top of 'bs'
> 
> bdrv_delete_intermediate():
>                     Given 3 BDS (active, top, base), delete images above
>                     base up to and including top, and set base to be the
>                     parent of top's child node.
> 
>                     E.g., this converts:
> 
>                     bottom <- base <- intermediate <- top <- active
> 
>                     to
> 
>                     bottom <- base <- active
> 
>                     where top == active is permitted, although active
>                     will not be deleted.
> 
> Signed-off-by: Jeff Cody <jcody@redhat.com>

At first, when just reading the function name, I thought this would
actually delete the image file. Of course, it only removes it from the
backing file chain, but leaves the image file around. I don't have a
good suggestion, but if someone has a better name, I think we should
change it.

> ---
>  block.c | 142 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  block.h |   5 ++-
>  2 files changed, 146 insertions(+), 1 deletion(-)
> 
> diff --git a/block.c b/block.c
> index 9470319..11e275c 100644
> --- a/block.c
> +++ b/block.c
> @@ -1752,6 +1752,148 @@ int bdrv_change_backing_file(BlockDriverState *bs,
>      return ret;
>  }
>  
> +/* Finds the image layer immediately to the 'top' of bs.
> + *
> + * active is the current topmost image.
> + */

Most other function header comments in block.c have the /* on its own
line, so it would be nice to stay consistent.

Without looking at the code yet, I'm not quite sure if I understand this
correctly. Could use an example:

  base <- a <- b <- c <- d

bdrv_find_child(d, a) would return b? Why is it called bdrv_find_child
then, b is neither a (direct) child of d nor a.

> +BlockDriverState *bdrv_find_child(BlockDriverState *active,
> +                                  BlockDriverState *bs)
> +{
> +    BlockDriverState *child = NULL;
> +    BlockDriverState *intermediate;
> +
> +    /* if the active bs layer is the same as the new top, then there
> +     * is no image above the top, so it will be returned as the child
> +     */

"new top"?

And should this be mentioned in the header comment? Not sure how
generally useful this behaviour is. If it's only the right thing for the
only caller, maybe returning NULL and handling the case in the caller
would be better.

> +    if (active == bs) {
> +        child = active;
> +    } else {
> +        intermediate = active;
> +        while (intermediate->backing_hd) {
> +            if (intermediate->backing_hd == bs) {
> +                child = intermediate;
> +                break;
> +            }
> +            intermediate = intermediate->backing_hd;
> +        }
> +    }
> +
> +    return child;
> +}

So it returns NULL when bs isn't in the backing file chain of active.
Probably worth mentioning in the comment as well.

> +
> +typedef struct BlkIntermediateStates {
> +    BlockDriverState *bs;
> +    QSIMPLEQ_ENTRY(BlkIntermediateStates) entry;
> +} BlkIntermediateStates;
> +
> +
> +/* deletes images above 'base' up to and including 'top', and sets the image
> + * above 'top' to have base as its backing file.
> + *
> + * E.g., this will convert the following chain:
> + * bottom <- base <- intermediate <- top <- active
> + *
> + * to
> + *
> + * bottom <- base <- active
> + *
> + * It is allowed for bottom==base, in which case it converts:
> + *
> + * base <- intermediate <- top <- active
> + *
> + * to
> + *
> + * base <- active
> + *
> + * It is also allowed for top==active, except in that case active is not
> + * deleted:

Hm, makes the interface inconsistent. Shouldn't you be using top ==
intermediate and it would work without any special casing?

> + *
> + * base <- intermediate <- top
> + *
> + * becomes
> + *
> + * base <- top
> + */
> +int bdrv_delete_intermediate(BlockDriverState *active, BlockDriverState *top,
> +                             BlockDriverState *base)
> +{
> +    BlockDriverState *intermediate;
> +    BlockDriverState *base_bs = NULL;
> +    BlockDriverState *new_top_bs = NULL;
> +    BlkIntermediateStates *intermediate_state, *next;
> +    int ret = -1;
> +
> +    QSIMPLEQ_HEAD(states_to_delete, BlkIntermediateStates) states_to_delete;
> +    QSIMPLEQ_INIT(&states_to_delete);
> +
> +    if (!top->drv || !base->drv) {
> +        goto exit;
> +    }
> +
> +    new_top_bs = bdrv_find_child(active, top);

new_top_bs is the first BDS that should not be removed from the chain.

Why do we pass top and then search for new_top instead of passing
new_top directly?

> +
> +    /* special case of new_top_bs->backing_hd already pointing to base - nothing
> +     * to do, no intermediate images
> +     */
> +    if (new_top_bs->backing_hd == base) {
> +        ret = 0;
> +        goto exit;
> +    }
> +
> +    if (new_top_bs == NULL) {

Never reached, we segfault before. (How about unit tests for this
function, in the style of tests/check-q*.c?)

> +        /* we could not find the image above 'top', this is an error */
> +        goto exit;
> +    }
> +
> +    /* if the active and top image passed in are the same, then we
> +     * can't delete the active, so we start one below
> +     */
> +    intermediate = (active == top) ? active->backing_hd : top;

Aha. So intermediate is used to undo the special case. Now we're always
on the last image to be deleted.

This is equivalent to an unconditional new_top_bs->backing_hd.

> +
> +    /* now we will go down through the list, and add each BDS we find
> +     * into our deletion queue, until we hit the 'base'
> +     */
> +    while (intermediate) {
> +        intermediate_state = g_malloc0(sizeof(BlkIntermediateStates));
> +        intermediate_state->bs = intermediate;
> +        QSIMPLEQ_INSERT_TAIL(&states_to_delete, intermediate_state, entry);
> +
> +        if (intermediate->backing_hd == base) {
> +            base_bs = intermediate->backing_hd;
> +            break;
> +        }
> +        intermediate = intermediate->backing_hd;
> +    }
> +    if (base_bs == NULL) {
> +        /* something went wrong, we did not end at the base. safely
> +         * unravel everything, and exit with error */
> +        goto exit;
> +    }
> +
> +    /* success - we can delete the intermediate states, and link top->base */
> +    ret = bdrv_change_backing_file(new_top_bs, base_bs->filename,
> +                                   base_bs->drv ? base_bs->drv->format_name : "");

Requires that new_top_bs is opened r/w. Definitely worth documenting.

Kevin

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 2/6] block: add live block commit functionality
  2012-08-30 18:47 ` [Qemu-devel] [RFC v2 PATCH 2/6] block: add live block commit functionality Jeff Cody
@ 2012-09-06 14:00   ` Kevin Wolf
  2012-09-06 20:37     ` Jeff Cody
  0 siblings, 1 reply; 26+ messages in thread
From: Kevin Wolf @ 2012-09-06 14:00 UTC (permalink / raw)
  To: Jeff Cody; +Cc: supriyak, pbonzini, eblake, qemu-devel, stefanha

Am 30.08.2012 20:47, schrieb Jeff Cody:
> This adds the live commit coroutine.  This iteration focuses on the
> commit only below the active layer, and not the active layer itself.
> 
> The behaviour is similar to block streaming; the sectors are walked
> through, and anything that exists above 'base' is committed back down
> into base.  At the end, intermediate images are deleted, and the
> chain stitched together.  Images are restored to their original open
> flags upon completion.
> 
> Signed-off-by: Jeff Cody <jcody@redhat.com>
> ---
>  block/Makefile.objs |   1 +
>  block/commit.c      | 202 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  block_int.h         |  19 +++++
>  trace-events        |   2 +
>  4 files changed, 224 insertions(+)
>  create mode 100644 block/commit.c
> 
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index b5754d3..4a136b8 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -4,6 +4,7 @@ block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>  block-obj-y += qed-check.o
>  block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
>  block-obj-y += stream.o
> +block-obj-y += commit.o
>  block-obj-$(CONFIG_WIN32) += raw-win32.o
>  block-obj-$(CONFIG_POSIX) += raw-posix.o
>  block-obj-$(CONFIG_LIBISCSI) += iscsi.o
> diff --git a/block/commit.c b/block/commit.c
> new file mode 100644
> index 0000000..bd3d882
> --- /dev/null
> +++ b/block/commit.c
> @@ -0,0 +1,202 @@
> +/*
> + * Live block commit
> + *
> + * Copyright Red Hat, Inc. 2012
> + *
> + * Authors:
> + *  Jeff Cody   <jcody@redhat.com>
> + *  Based on stream.c by Stefan Hajnoczi
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
> + * See the COPYING.LIB file in the top-level directory.
> + *
> + */
> +
> +#include "trace.h"
> +#include "block_int.h"
> +#include "qemu/ratelimit.h"
> +
> +enum {
> +    /*
> +     * Size of data buffer for populating the image file.  This should be large
> +     * enough to process multiple clusters in a single call, so that populating
> +     * contiguous regions of the image is efficient.
> +     */
> +    COMMIT_BUFFER_SIZE = 512 * 1024, /* in bytes */
> +};
> +
> +#define SLICE_TIME 100000000ULL /* ns */
> +
> +typedef struct CommitBlockJob {
> +    BlockJob common;
> +    RateLimit limit;
> +    BlockDriverState *active;
> +    BlockDriverState *top;
> +    BlockDriverState *base;
> +    BlockErrorAction on_error;
> +    int base_flags;
> +    int top_flags;
> +} CommitBlockJob;
> +
> +static int coroutine_fn commit_populate(BlockDriverState *bs,
> +                                        BlockDriverState *base,
> +                                        int64_t sector_num, int nb_sectors,
> +                                        void *buf)
> +{
> +    if (bdrv_read(bs, sector_num, buf, nb_sectors)) {
> +        return -EIO;
> +    }
> +    if (bdrv_write(base, sector_num, buf, nb_sectors)) {
> +        return -EIO;
> +    }

Don't throw error codes away.

What should we do with backing files that are smaller than the image to
commit? In this version, data is copied up to the size of the backing
file, and then we get -EIO from bdrv_co_do_writev().

> +    return 0;
> +}
> +
> +static void coroutine_fn commit_run(void *opaque)
> +{
> +    CommitBlockJob *s = opaque;
> +    BlockDriverState *active = s->active;
> +    BlockDriverState *top = s->top;
> +    BlockDriverState *base = s->base;
> +    BlockDriverState *top_child = NULL;
> +    int64_t sector_num, end;
> +    int error = 0;
> +    int ret = 0;
> +    int n = 0;
> +    void *buf;
> +    int bytes_written = 0;
> +
> +    s->common.len = bdrv_getlength(top);
> +    if (s->common.len < 0) {
> +        block_job_complete(&s->common, s->common.len);
> +        return;
> +    }
> +
> +    top_child = bdrv_find_child(active, top);
> +
> +    end = s->common.len >> BDRV_SECTOR_BITS;
> +    buf = qemu_blockalign(top, COMMIT_BUFFER_SIZE);
> +
> +    for (sector_num = 0; sector_num < end; sector_num += n) {
> +        uint64_t delay_ns = 0;
> +        bool copy;
> +
> +wait:
> +        /* Note that even when no rate limit is applied we need to yield
> +         * with no pending I/O here so that qemu_aio_flush() returns.
> +         */
> +        block_job_sleep_ns(&s->common, rt_clock, delay_ns);
> +        if (block_job_is_cancelled(&s->common)) {
> +            break;
> +        }
> +        /* Copy if allocated above the base */
> +        ret = bdrv_co_is_allocated_above(top, base, sector_num,
> +                                         COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE,
> +                                         &n);
> +        copy = (ret == 1);
> +        trace_commit_one_iteration(s, sector_num, n, ret);
> +        if (ret >= 0 && copy) {

If ret == 1, it's automatically >= 0...

By the way, I'm not sure if the interface is 0/1/-errno or
0/positive/-errno.

> +            if (s->common.speed) {
> +                delay_ns = ratelimit_calculate_delay(&s->limit, n);
> +                if (delay_ns > 0) {
> +                    goto wait;
> +                }
> +            }
> +            ret = commit_populate(top, base, sector_num, n, buf);
> +            bytes_written += n * BDRV_SECTOR_SIZE;
> +        }
> +        if (ret < 0) {
> +            if (s->on_error == BLOCK_ERR_STOP_ANY ||
> +                s->on_error == BLOCK_ERR_STOP_ENOSPC) {

Shouldn't there be a check for ret == -ENOSPC then...? And does this
error handling do anything useful if you can't pause the job? Wouldn't
it retry all the time?

> +                n = 0;
> +                continue;
> +            }
> +            if (error == 0) {
> +                error = ret;
> +            }
> +            if (s->on_error == BLOCK_ERR_REPORT) {
> +                break;
> +            }
> +        }
> +        ret = 0;
> +
> +        /* Publish progress */
> +        s->common.offset += n * BDRV_SECTOR_SIZE;
> +    }
> +
> +    if (!block_job_is_cancelled(&s->common) && sector_num == end && ret == 0) {
> +        /* success */
> +        if (bdrv_delete_intermediate(active, top, base)) {
> +            /* something went wrong! */
> +            /* TODO:add error reporting here */

Indeed :-)

> +        }
> +    }
> +
> +    /* restore base open flags here if appropriate (e.g., change the base back
> +     * to r/o). These reopens do not need to be atomic, since we won't abort
> +     * even on failure here */
> +
> +    if (s->base_flags != bdrv_get_flags(base)) {
> +        bdrv_reopen(base, s->base_flags, NULL);
> +    }
> +    if (s->top_flags != bdrv_get_flags(top_child)) {
> +        bdrv_reopen(top_child, s->top_flags, NULL);
> +    }
> +
> +    qemu_vfree(buf);
> +    block_job_complete(&s->common, ret);
> +}
> +
> +static void commit_set_speed(BlockJob *job, int64_t speed, Error **errp)
> +{
> +    CommitBlockJob *s = container_of(job, CommitBlockJob, common);
> +
> +    if (speed < 0) {
> +        error_set(errp, QERR_INVALID_PARAMETER, "speed");
> +        return;
> +    }
> +    ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
> +}
> +
> +static BlockJobType commit_job_type = {
> +    .instance_size = sizeof(CommitBlockJob),
> +    .job_type      = "commit",
> +    .set_speed     = commit_set_speed,
> +};
> +
> +void commit_start(BlockDriverState *bs, BlockDriverState *base,
> +                 BlockDriverState *top, int64_t speed,
> +                 BlockErrorAction on_error, BlockDriverCompletionFunc *cb,
> +                 void *opaque, int orig_base_flags, int orig_top_flags,
> +                 Error **errp)
> +{
> +    CommitBlockJob *s;
> +
> +    if ((on_error == BLOCK_ERR_STOP_ANY ||
> +         on_error == BLOCK_ERR_STOP_ENOSPC) &&
> +        !bdrv_iostatus_is_enabled(bs)) {
> +        error_set(errp, QERR_INVALID_PARAMETER_COMBINATION);
> +        return;
> +    }
> +
> +    s = block_job_create(&commit_job_type, bs, speed, cb, opaque, errp);
> +    if (!s) {
> +        return;
> +    }
> +
> +    s->base   = base;
> +    s->top    = top;
> +    s->active = bs;
> +
> +    s->base_flags = orig_base_flags;
> +    s->top_flags  = orig_top_flags;

So it's the caller who is expected to reopen r/w and then pass the
original flags in? Can't we do both of it here?

> +
> +    s->on_error = on_error;
> +    s->common.co = qemu_coroutine_create(commit_run);
> +
> +    trace_commit_start(bs, base, top, s, s->common.co, opaque,
> +                       orig_base_flags, orig_top_flags);
> +    qemu_coroutine_enter(s->common.co, s);
> +
> +    return;

Unnecessary return.

Kevin

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 6/6] QAPI: add command for live block commit, 'block-commit'
  2012-08-30 18:47 ` [Qemu-devel] [RFC v2 PATCH 6/6] QAPI: add command for live block commit, 'block-commit' Jeff Cody
  2012-08-30 23:06   ` Eric Blake
@ 2012-09-06 14:29   ` Kevin Wolf
  2012-09-06 21:00     ` Jeff Cody
  1 sibling, 1 reply; 26+ messages in thread
From: Kevin Wolf @ 2012-09-06 14:29 UTC (permalink / raw)
  To: Jeff Cody; +Cc: supriyak, pbonzini, eblake, qemu-devel, stefanha

Am 30.08.2012 20:47, schrieb Jeff Cody:
> The command for live block commit is added, which has the following
> arguments:
> 
> device: the block device to perform the commit on (mandatory)
> base:   the base image to commit into; optional (if not specified,
>         it is the underlying original image)
> top:    the top image of the commit - all data from inside top down
>         to base will be committed into base. optional (if not specified,
>         it is the active image) - see note below
> speed:  maximum speed, in bytes/sec
> 
> note: eventually this will support merging down the active layer,
>       but that code is not yet complete.  If the active layer is passed
>       in currently as top, or top is left to the default, then the error
>       QERR_TOP_NOT_FOUND will be returned.
> 
> The is done as a block job, so upon completion a BLOCK_JOB_COMPLETED will
> be emitted.
> 
> Signed-off-by: Jeff Cody <jcody@redhat.com>
> ---
>  blockdev.c       | 83 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  qapi-schema.json | 30 ++++++++++++++++++++
>  qmp-commands.hx  |  6 ++++
>  3 files changed, 119 insertions(+)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 68d65fb..e0d6ca0 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -827,6 +827,89 @@ exit:
>      return;
>  }
>  
> +void qmp_block_commit(const char *device,
> +                      bool has_base, const char *base,
> +                      bool has_top, const char *top,
> +                      bool has_speed, int64_t speed,
> +                      Error **errp)
> +{
> +    BlockDriverState *bs;
> +    BlockDriverState *base_bs, *top_bs, *child_bs;
> +    Error *local_err = NULL;
> +    int orig_base_flags, orig_top_flags;
> +    BlockReopenQueue *reopen_queue = NULL;
> +    /* This will be part of the QMP command, if/when the
> +     * BlockdevOnError change for blkmirror makes it in
> +     */
> +    BlockErrorAction on_error = BLOCK_ERR_REPORT;
> +
> +    /* drain all i/o before commits */
> +    bdrv_drain_all();
> +
> +    bs = bdrv_find(device);
> +    if (!bs) {
> +        error_set(errp, QERR_DEVICE_NOT_FOUND, device);
> +        return;
> +    }
> +    if (base && has_base) {
> +        base_bs = bdrv_find_backing_image(bs, base);
> +    } else {
> +        base_bs = bdrv_find_base(bs);
> +    }
> +
> +    if (base_bs == NULL) {
> +        error_set(errp, QERR_BASE_NOT_FOUND, "NULL");

Shouldn't it be base rather than "NULL"?

> +        return;
> +    }
> +
> +    if (top && has_top) {
> +        /* if we want to allow the active layer,
> +         * use 'bdrv_find_image()' here */
> +        top_bs = bdrv_find_backing_image(bs, top);
> +        if (top_bs == NULL) {
> +            error_set(errp, QERR_TOP_NOT_FOUND, top);
> +            return;
> +        }
> +    } else {
> +        /* we will eventually default to the top layer,i.e. top_bs = bs */
> +        error_set(errp, QERR_TOP_NOT_FOUND, top);
> +        return;
> +    }
> +
> +    child_bs = bdrv_find_child(bs, top_bs);
> +
> +    orig_base_flags = bdrv_get_flags(base_bs);  /* what we are writing into   */
> +    orig_top_flags = bdrv_get_flags(child_bs);  /* to change the backing file */
> +
> +    /* convert base_bs to r/w, if necessary */
> +    if (!(orig_base_flags & BDRV_O_RDWR)) {
> +        reopen_queue = bdrv_reopen_queue(reopen_queue, base_bs,
> +                                         orig_base_flags | BDRV_O_RDWR);
> +    }
> +    if (!(orig_top_flags & BDRV_O_RDWR)) {
> +        reopen_queue = bdrv_reopen_queue(reopen_queue, base_bs,
> +                                         orig_base_flags | BDRV_O_RDWR);

I think you mean child_bs and orig_top_flags. (Why isn't it called
orig_child_flags?)

> +    }
> +    if (reopen_queue) {
> +        bdrv_reopen_multiple(reopen_queue, &local_err);
> +        if (local_err != NULL) {
> +            error_propagate(errp, local_err);
> +            return;
> +        }
> +    }
> +
> +    commit_start(bs, base_bs, top_bs, speed, on_error,
> +                 block_job_cb, bs, orig_base_flags, orig_top_flags,
> +                 &local_err);
> +    if (local_err != NULL) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +    /* Grab a reference so hotplug does not delete the BlockDriverState from
> +     * underneath us.
> +     */
> +    drive_get_ref(drive_get_by_blockdev(bs));
> +}
>  
>  static void eject_device(BlockDriverState *bs, int force, Error **errp)
>  {
> diff --git a/qapi-schema.json b/qapi-schema.json
> index bd8ad74..45feda6 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -1401,6 +1401,36 @@
>    'returns': 'str' }
>  
>  ##
> +# @block-commit
> +#
> +# Live commit of data from child image nodes into parent nodes - i.e.,
> +# writes data between 'top' and 'base' into 'base'.
> +#
> +# @device:  the name of the device
> +#
> +# @base:   #optional The parent image of the device to write data into.
> +#                    If not specified, this is the original parent image.

"The file name of the parent image", actually.

Which I'm afraid means that this isn't an API for the long term, but
there's little we can do about it now...

> +#
> +# @top:    #optional The child image, above which data will not be committed
> +#                    down.  If not specified, this is the active layer.

Again, the file name.

Kevin

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 1/6] block: add support functions for live commit, to find and delete images.
  2012-09-06 13:23   ` Kevin Wolf
@ 2012-09-06 14:58     ` Eric Blake
  2012-09-06 14:59     ` Jeff Cody
  1 sibling, 0 replies; 26+ messages in thread
From: Eric Blake @ 2012-09-06 14:58 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: supriyak, pbonzini, Jeff Cody, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 795 bytes --]

On 09/06/2012 07:23 AM, Kevin Wolf wrote:
> Am 30.08.2012 20:47, schrieb Jeff Cody:
>> Add bdrv_find_child(), and bdrv_delete_intermediate().
>>
>> bdrv_find_child():  given 'bs' and the active (topmost) BDS of an image chain,
>>                     find the image that is the immediate top of 'bs'
>>
>> bdrv_delete_intermediate():

> At first, when just reading the function name, I thought this would
> actually delete the image file. Of course, it only removes it from the
> backing file chain, but leaves the image file around. I don't have a
> good suggestion, but if someone has a better name, I think we should
> change it.

Maybe bdrv_unchain_intermediate()?

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 617 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 1/6] block: add support functions for live commit, to find and delete images.
  2012-09-06 13:23   ` Kevin Wolf
  2012-09-06 14:58     ` Eric Blake
@ 2012-09-06 14:59     ` Jeff Cody
  2012-09-07 10:19       ` Kevin Wolf
  1 sibling, 1 reply; 26+ messages in thread
From: Jeff Cody @ 2012-09-06 14:59 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: supriyak, pbonzini, eblake, qemu-devel, stefanha

On 09/06/2012 09:23 AM, Kevin Wolf wrote:
> Am 30.08.2012 20:47, schrieb Jeff Cody:
>> Add bdrv_find_child(), and bdrv_delete_intermediate().
>>
>> bdrv_find_child():  given 'bs' and the active (topmost) BDS of an image chain,
>>                     find the image that is the immediate top of 'bs'
>>
>> bdrv_delete_intermediate():
>>                     Given 3 BDS (active, top, base), delete images above
>>                     base up to and including top, and set base to be the
>>                     parent of top's child node.
>>
>>                     E.g., this converts:
>>
>>                     bottom <- base <- intermediate <- top <- active
>>
>>                     to
>>
>>                     bottom <- base <- active
>>
>>                     where top == active is permitted, although active
>>                     will not be deleted.
>>
>> Signed-off-by: Jeff Cody <jcody@redhat.com>
> 
> At first, when just reading the function name, I thought this would
> actually delete the image file. Of course, it only removes it from the
> backing file chain, but leaves the image file around. I don't have a
> good suggestion, but if someone has a better name, I think we should
> change it.

Hmm, the naming seems consistent with bdrv_delete(), which does not
actually delete the image files either (and, that is essentially what
this does... calls bdrv_delete(), on the intermediate images).

However, here are some other name proposals:

   *  bdrv_disconnect_intermediate()
   *  bdrv_drop_intermediate()
   *  bdrv_shorten_chain()

> 
>> ---
>>  block.c | 142 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  block.h |   5 ++-
>>  2 files changed, 146 insertions(+), 1 deletion(-)
>>
>> diff --git a/block.c b/block.c
>> index 9470319..11e275c 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -1752,6 +1752,148 @@ int bdrv_change_backing_file(BlockDriverState *bs,
>>      return ret;
>>  }
>>  
>> +/* Finds the image layer immediately to the 'top' of bs.
>> + *
>> + * active is the current topmost image.
>> + */
> 
> Most other function header comments in block.c have the /* on its own
> line, so it would be nice to stay consistent.
> 
> Without looking at the code yet, I'm not quite sure if I understand this
> correctly. Could use an example:
> 
>   base <- a <- b <- c <- d
> 
> bdrv_find_child(d, a) would return b? Why is it called bdrv_find_child
> then, b is neither a (direct) child of d nor a.
> 

For completeness: as we discussed on IRC, this will be renamed to
bdrv_find_overlay(), avoiding the parent/child terminology, which can be
a bit ambiguous.


>> +BlockDriverState *bdrv_find_child(BlockDriverState *active,
>> +                                  BlockDriverState *bs)
>> +{
>> +    BlockDriverState *child = NULL;
>> +    BlockDriverState *intermediate;
>> +
>> +    /* if the active bs layer is the same as the new top, then there
>> +     * is no image above the top, so it will be returned as the child
>> +     */
> 
> "new top"?
> 
> And should this be mentioned in the header comment? Not sure how
> generally useful this behaviour is. If it's only the right thing for the
> only caller, maybe returning NULL and handling the case in the caller
> would be better.
> 

Yes, it should definitely be mentioned in the comment header. I
initially was going to say that I don't think NULL is appropriate to
return here, but on more reflection I think you correct.  We shouldn't
return non-NULL if it is not actually the overlay to bs.  The caller can
check on a NULL return, and determine what to do at that point.


>> +    if (active == bs) {
>> +        child = active;
>> +    } else {
>> +        intermediate = active;
>> +        while (intermediate->backing_hd) {
>> +            if (intermediate->backing_hd == bs) {
>> +                child = intermediate;
>> +                break;
>> +            }
>> +            intermediate = intermediate->backing_hd;
>> +        }
>> +    }
>> +
>> +    return child;
>> +}
> 
> So it returns NULL when bs isn't in the backing file chain of active.
> Probably worth mentioning in the comment as well.

Agreed.

> 
>> +
>> +typedef struct BlkIntermediateStates {
>> +    BlockDriverState *bs;
>> +    QSIMPLEQ_ENTRY(BlkIntermediateStates) entry;
>> +} BlkIntermediateStates;
>> +
>> +
>> +/* deletes images above 'base' up to and including 'top', and sets the image
>> + * above 'top' to have base as its backing file.
>> + *
>> + * E.g., this will convert the following chain:
>> + * bottom <- base <- intermediate <- top <- active
>> + *
>> + * to
>> + *
>> + * bottom <- base <- active
>> + *
>> + * It is allowed for bottom==base, in which case it converts:
>> + *
>> + * base <- intermediate <- top <- active
>> + *
>> + * to
>> + *
>> + * base <- active
>> + *
>> + * It is also allowed for top==active, except in that case active is not
>> + * deleted:
> 
> Hm, makes the interface inconsistent. Shouldn't you be using top ==
> intermediate and it would work without any special casing?
>

To remain consistent, maybe we should define it as an error if
top==active, and return error in that case?  The caller can be
responsible for checking for that - if the caller wants to merge down
the active layer, there are additional steps to be taken anyway.

>> + *
>> + * base <- intermediate <- top
>> + *
>> + * becomes
>> + *
>> + * base <- top
>> + */
>> +int bdrv_delete_intermediate(BlockDriverState *active, BlockDriverState *top,
>> +                             BlockDriverState *base)
>> +{
>> +    BlockDriverState *intermediate;
>> +    BlockDriverState *base_bs = NULL;
>> +    BlockDriverState *new_top_bs = NULL;
>> +    BlkIntermediateStates *intermediate_state, *next;
>> +    int ret = -1;
>> +
>> +    QSIMPLEQ_HEAD(states_to_delete, BlkIntermediateStates) states_to_delete;
>> +    QSIMPLEQ_INIT(&states_to_delete);
>> +
>> +    if (!top->drv || !base->drv) {
>> +        goto exit;
>> +    }
>> +
>> +    new_top_bs = bdrv_find_child(active, top);
> 
> new_top_bs is the first BDS that should not be removed from the chain.
> 
> Why do we pass top and then search for new_top instead of passing
> new_top directly?
> 

(Even though we talked about this on IRC, including it here for
completeness):

Because the block commit API specifies the image to merge down, so we
need to find the image overlay on top of that, to adjust the backing
file correctly.  And if we change the semantics of the block commit API
to mean the image above the image to merge down, then that makes it
inconsistent when specifying merging down the active layer later on.

>> +
>> +    /* special case of new_top_bs->backing_hd already pointing to base - nothing
>> +     * to do, no intermediate images
>> +     */
>> +    if (new_top_bs->backing_hd == base) {
>> +        ret = 0;
>> +        goto exit;
>> +    }
>> +
>> +    if (new_top_bs == NULL) {
> 
> Never reached, we segfault before. (How about unit tests for this
> function, in the style of tests/check-q*.c?)
> 

Ugh, thanks - obviously this should be above the previous check.  And, I
agree on the unit test - I will add that in tests/check-*.


>> +        /* we could not find the image above 'top', this is an error */
>> +        goto exit;
>> +    }
>> +
>> +    /* if the active and top image passed in are the same, then we
>> +     * can't delete the active, so we start one below
>> +     */
>> +    intermediate = (active == top) ? active->backing_hd : top;
> 
> Aha. So intermediate is used to undo the special case. Now we're always
> on the last image to be deleted.
> 
> This is equivalent to an unconditional new_top_bs->backing_hd.
> 
>> +
>> +    /* now we will go down through the list, and add each BDS we find
>> +     * into our deletion queue, until we hit the 'base'
>> +     */
>> +    while (intermediate) {
>> +        intermediate_state = g_malloc0(sizeof(BlkIntermediateStates));
>> +        intermediate_state->bs = intermediate;
>> +        QSIMPLEQ_INSERT_TAIL(&states_to_delete, intermediate_state, entry);
>> +
>> +        if (intermediate->backing_hd == base) {
>> +            base_bs = intermediate->backing_hd;
>> +            break;
>> +        }
>> +        intermediate = intermediate->backing_hd;
>> +    }
>> +    if (base_bs == NULL) {
>> +        /* something went wrong, we did not end at the base. safely
>> +         * unravel everything, and exit with error */
>> +        goto exit;
>> +    }
>> +
>> +    /* success - we can delete the intermediate states, and link top->base */
>> +    ret = bdrv_change_backing_file(new_top_bs, base_bs->filename,
>> +                                   base_bs->drv ? base_bs->drv->format_name : "");
> 
> Requires that new_top_bs is opened r/w. Definitely worth documenting.
>

Agreed.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 2/6] block: add live block commit functionality
  2012-09-06 14:00   ` Kevin Wolf
@ 2012-09-06 20:37     ` Jeff Cody
  2012-09-06 21:16       ` Eric Blake
  0 siblings, 1 reply; 26+ messages in thread
From: Jeff Cody @ 2012-09-06 20:37 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: stefanha, supriyak, eblake, qemu-devel, pbonzini

On 09/06/2012 10:00 AM, Kevin Wolf wrote:
> Am 30.08.2012 20:47, schrieb Jeff Cody:
>> This adds the live commit coroutine.  This iteration focuses on the
>> commit only below the active layer, and not the active layer itself.
>>
>> The behaviour is similar to block streaming; the sectors are walked
>> through, and anything that exists above 'base' is committed back down
>> into base.  At the end, intermediate images are deleted, and the
>> chain stitched together.  Images are restored to their original open
>> flags upon completion.
>>
>> Signed-off-by: Jeff Cody <jcody@redhat.com>
>> ---
>>  block/Makefile.objs |   1 +
>>  block/commit.c      | 202 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  block_int.h         |  19 +++++
>>  trace-events        |   2 +
>>  4 files changed, 224 insertions(+)
>>  create mode 100644 block/commit.c
>>
>> diff --git a/block/Makefile.objs b/block/Makefile.objs
>> index b5754d3..4a136b8 100644
>> --- a/block/Makefile.objs
>> +++ b/block/Makefile.objs
>> @@ -4,6 +4,7 @@ block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>>  block-obj-y += qed-check.o
>>  block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
>>  block-obj-y += stream.o
>> +block-obj-y += commit.o
>>  block-obj-$(CONFIG_WIN32) += raw-win32.o
>>  block-obj-$(CONFIG_POSIX) += raw-posix.o
>>  block-obj-$(CONFIG_LIBISCSI) += iscsi.o
>> diff --git a/block/commit.c b/block/commit.c
>> new file mode 100644
>> index 0000000..bd3d882
>> --- /dev/null
>> +++ b/block/commit.c
>> @@ -0,0 +1,202 @@
>> +/*
>> + * Live block commit
>> + *
>> + * Copyright Red Hat, Inc. 2012
>> + *
>> + * Authors:
>> + *  Jeff Cody   <jcody@redhat.com>
>> + *  Based on stream.c by Stefan Hajnoczi
>> + *
>> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
>> + * See the COPYING.LIB file in the top-level directory.
>> + *
>> + */
>> +
>> +#include "trace.h"
>> +#include "block_int.h"
>> +#include "qemu/ratelimit.h"
>> +
>> +enum {
>> +    /*
>> +     * Size of data buffer for populating the image file.  This should be large
>> +     * enough to process multiple clusters in a single call, so that populating
>> +     * contiguous regions of the image is efficient.
>> +     */
>> +    COMMIT_BUFFER_SIZE = 512 * 1024, /* in bytes */
>> +};
>> +
>> +#define SLICE_TIME 100000000ULL /* ns */
>> +
>> +typedef struct CommitBlockJob {
>> +    BlockJob common;
>> +    RateLimit limit;
>> +    BlockDriverState *active;
>> +    BlockDriverState *top;
>> +    BlockDriverState *base;
>> +    BlockErrorAction on_error;
>> +    int base_flags;
>> +    int top_flags;
>> +} CommitBlockJob;
>> +
>> +static int coroutine_fn commit_populate(BlockDriverState *bs,
>> +                                        BlockDriverState *base,
>> +                                        int64_t sector_num, int nb_sectors,
>> +                                        void *buf)
>> +{
>> +    if (bdrv_read(bs, sector_num, buf, nb_sectors)) {
>> +        return -EIO;
>> +    }
>> +    if (bdrv_write(base, sector_num, buf, nb_sectors)) {
>> +        return -EIO;
>> +    }
> 
> Don't throw error codes away.
> 

OK, I'll pass the return from bdrv_read/write to the caller.

> What should we do with backing files that are smaller than the image to
> commit? In this version, data is copied up to the size of the backing
> file, and then we get -EIO from bdrv_co_do_writev().
> 

We could leave it like that, and let it receive -EIO in that case.
Alternatively, we could try and determine before the commit if the data
will fit in the base, and return -ENOSPC if not.

>> +    return 0;
>> +}
>> +
>> +static void coroutine_fn commit_run(void *opaque)
>> +{
>> +    CommitBlockJob *s = opaque;
>> +    BlockDriverState *active = s->active;
>> +    BlockDriverState *top = s->top;
>> +    BlockDriverState *base = s->base;
>> +    BlockDriverState *top_child = NULL;
>> +    int64_t sector_num, end;
>> +    int error = 0;
>> +    int ret = 0;
>> +    int n = 0;
>> +    void *buf;
>> +    int bytes_written = 0;
>> +
>> +    s->common.len = bdrv_getlength(top);
>> +    if (s->common.len < 0) {
>> +        block_job_complete(&s->common, s->common.len);
>> +        return;
>> +    }
>> +
>> +    top_child = bdrv_find_child(active, top);
>> +
>> +    end = s->common.len >> BDRV_SECTOR_BITS;
>> +    buf = qemu_blockalign(top, COMMIT_BUFFER_SIZE);
>> +
>> +    for (sector_num = 0; sector_num < end; sector_num += n) {
>> +        uint64_t delay_ns = 0;
>> +        bool copy;
>> +
>> +wait:
>> +        /* Note that even when no rate limit is applied we need to yield
>> +         * with no pending I/O here so that qemu_aio_flush() returns.
>> +         */
>> +        block_job_sleep_ns(&s->common, rt_clock, delay_ns);
>> +        if (block_job_is_cancelled(&s->common)) {
>> +            break;
>> +        }
>> +        /* Copy if allocated above the base */
>> +        ret = bdrv_co_is_allocated_above(top, base, sector_num,
>> +                                         COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE,
>> +                                         &n);
>> +        copy = (ret == 1);
>> +        trace_commit_one_iteration(s, sector_num, n, ret);
>> +        if (ret >= 0 && copy) {
> 
> If ret == 1, it's automatically >= 0...
> 
> By the way, I'm not sure if the interface is 0/1/-errno or
> 0/positive/-errno.
> 

The header for bdrv_co_is_allocated() states that it returns true if
it is allocated, and false otherwise.  However, the function actually
returns true if allocated, false if not, and something < 0 on failure.

But either way, you are right, I can just use 'copy' here.

>> +            if (s->common.speed) {
>> +                delay_ns = ratelimit_calculate_delay(&s->limit, n);
>> +                if (delay_ns > 0) {
>> +                    goto wait;
>> +                }
>> +            }
>> +            ret = commit_populate(top, base, sector_num, n, buf);
>> +            bytes_written += n * BDRV_SECTOR_SIZE;
>> +        }
>> +        if (ret < 0) {
>> +            if (s->on_error == BLOCK_ERR_STOP_ANY ||
>> +                s->on_error == BLOCK_ERR_STOP_ENOSPC) {
> 
> Shouldn't there be a check for ret == -ENOSPC then...? And does this
> error handling do anything useful if you can't pause the job? Wouldn't
> it retry all the time?
> 

We should check for ret == -ENOSPC, yes.  On the error handling, we
should probably just stop completely here, and return error.

>> +                n = 0;
>> +                continue;
>> +            }
>> +            if (error == 0) {
>> +                error = ret;
>> +            }
>> +            if (s->on_error == BLOCK_ERR_REPORT) {
>> +                break;
>> +            }
>> +        }
>> +        ret = 0;
>> +
>> +        /* Publish progress */
>> +        s->common.offset += n * BDRV_SECTOR_SIZE;
>> +    }
>> +
>> +    if (!block_job_is_cancelled(&s->common) && sector_num == end && ret == 0) {
>> +        /* success */
>> +        if (bdrv_delete_intermediate(active, top, base)) {
>> +            /* something went wrong! */
>> +            /* TODO:add error reporting here */
> 
> Indeed :-)
> 
>> +        }
>> +    }
>> +
>> +    /* restore base open flags here if appropriate (e.g., change the base back
>> +     * to r/o). These reopens do not need to be atomic, since we won't abort
>> +     * even on failure here */
>> +
>> +    if (s->base_flags != bdrv_get_flags(base)) {
>> +        bdrv_reopen(base, s->base_flags, NULL);
>> +    }
>> +    if (s->top_flags != bdrv_get_flags(top_child)) {
>> +        bdrv_reopen(top_child, s->top_flags, NULL);
>> +    }
>> +
>> +    qemu_vfree(buf);
>> +    block_job_complete(&s->common, ret);
>> +}
>> +
>> +static void commit_set_speed(BlockJob *job, int64_t speed, Error **errp)
>> +{
>> +    CommitBlockJob *s = container_of(job, CommitBlockJob, common);
>> +
>> +    if (speed < 0) {
>> +        error_set(errp, QERR_INVALID_PARAMETER, "speed");
>> +        return;
>> +    }
>> +    ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
>> +}
>> +
>> +static BlockJobType commit_job_type = {
>> +    .instance_size = sizeof(CommitBlockJob),
>> +    .job_type      = "commit",
>> +    .set_speed     = commit_set_speed,
>> +};
>> +
>> +void commit_start(BlockDriverState *bs, BlockDriverState *base,
>> +                 BlockDriverState *top, int64_t speed,
>> +                 BlockErrorAction on_error, BlockDriverCompletionFunc *cb,
>> +                 void *opaque, int orig_base_flags, int orig_top_flags,
>> +                 Error **errp)
>> +{
>> +    CommitBlockJob *s;
>> +
>> +    if ((on_error == BLOCK_ERR_STOP_ANY ||
>> +         on_error == BLOCK_ERR_STOP_ENOSPC) &&
>> +        !bdrv_iostatus_is_enabled(bs)) {
>> +        error_set(errp, QERR_INVALID_PARAMETER_COMBINATION);
>> +        return;
>> +    }
>> +
>> +    s = block_job_create(&commit_job_type, bs, speed, cb, opaque, errp);
>> +    if (!s) {
>> +        return;
>> +    }
>> +
>> +    s->base   = base;
>> +    s->top    = top;
>> +    s->active = bs;
>> +
>> +    s->base_flags = orig_base_flags;
>> +    s->top_flags  = orig_top_flags;
> 
> So it's the caller who is expected to reopen r/w and then pass the
> original flags in? Can't we do both of it here?
> 

Yes, that would be cleaner - I'll move the reopen() to commit_start().


>> +
>> +    s->on_error = on_error;
>> +    s->common.co = qemu_coroutine_create(commit_run);
>> +
>> +    trace_commit_start(bs, base, top, s, s->common.co, opaque,
>> +                       orig_base_flags, orig_top_flags);
>> +    qemu_coroutine_enter(s->common.co, s);
>> +
>> +    return;
> 
> Unnecessary return.
> 
> Kevin
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 6/6] QAPI: add command for live block commit, 'block-commit'
  2012-09-06 14:29   ` Kevin Wolf
@ 2012-09-06 21:00     ` Jeff Cody
  0 siblings, 0 replies; 26+ messages in thread
From: Jeff Cody @ 2012-09-06 21:00 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: supriyak, pbonzini, eblake, qemu-devel, stefanha

On 09/06/2012 10:29 AM, Kevin Wolf wrote:
> Am 30.08.2012 20:47, schrieb Jeff Cody:
>> The command for live block commit is added, which has the following
>> arguments:
>>
>> device: the block device to perform the commit on (mandatory)
>> base:   the base image to commit into; optional (if not specified,
>>         it is the underlying original image)
>> top:    the top image of the commit - all data from inside top down
>>         to base will be committed into base. optional (if not specified,
>>         it is the active image) - see note below
>> speed:  maximum speed, in bytes/sec
>>
>> note: eventually this will support merging down the active layer,
>>       but that code is not yet complete.  If the active layer is passed
>>       in currently as top, or top is left to the default, then the error
>>       QERR_TOP_NOT_FOUND will be returned.
>>
>> The is done as a block job, so upon completion a BLOCK_JOB_COMPLETED will
>> be emitted.
>>
>> Signed-off-by: Jeff Cody <jcody@redhat.com>
>> ---
>>  blockdev.c       | 83 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  qapi-schema.json | 30 ++++++++++++++++++++
>>  qmp-commands.hx  |  6 ++++
>>  3 files changed, 119 insertions(+)
>>
>> diff --git a/blockdev.c b/blockdev.c
>> index 68d65fb..e0d6ca0 100644
>> --- a/blockdev.c
>> +++ b/blockdev.c
>> @@ -827,6 +827,89 @@ exit:
>>      return;
>>  }
>>  
>> +void qmp_block_commit(const char *device,
>> +                      bool has_base, const char *base,
>> +                      bool has_top, const char *top,
>> +                      bool has_speed, int64_t speed,
>> +                      Error **errp)
>> +{
>> +    BlockDriverState *bs;
>> +    BlockDriverState *base_bs, *top_bs, *child_bs;
>> +    Error *local_err = NULL;
>> +    int orig_base_flags, orig_top_flags;
>> +    BlockReopenQueue *reopen_queue = NULL;
>> +    /* This will be part of the QMP command, if/when the
>> +     * BlockdevOnError change for blkmirror makes it in
>> +     */
>> +    BlockErrorAction on_error = BLOCK_ERR_REPORT;
>> +
>> +    /* drain all i/o before commits */
>> +    bdrv_drain_all();
>> +
>> +    bs = bdrv_find(device);
>> +    if (!bs) {
>> +        error_set(errp, QERR_DEVICE_NOT_FOUND, device);
>> +        return;
>> +    }
>> +    if (base && has_base) {
>> +        base_bs = bdrv_find_backing_image(bs, base);
>> +    } else {
>> +        base_bs = bdrv_find_base(bs);
>> +    }
>> +
>> +    if (base_bs == NULL) {
>> +        error_set(errp, QERR_BASE_NOT_FOUND, "NULL");
> 
> Shouldn't it be base rather than "NULL"?
>

Yes

>> +        return;
>> +    }
>> +
>> +    if (top && has_top) {
>> +        /* if we want to allow the active layer,
>> +         * use 'bdrv_find_image()' here */
>> +        top_bs = bdrv_find_backing_image(bs, top);
>> +        if (top_bs == NULL) {
>> +            error_set(errp, QERR_TOP_NOT_FOUND, top);
>> +            return;
>> +        }
>> +    } else {
>> +        /* we will eventually default to the top layer,i.e. top_bs = bs */
>> +        error_set(errp, QERR_TOP_NOT_FOUND, top);
>> +        return;
>> +    }
>> +
>> +    child_bs = bdrv_find_child(bs, top_bs);
>> +
>> +    orig_base_flags = bdrv_get_flags(base_bs);  /* what we are writing into   */
>> +    orig_top_flags = bdrv_get_flags(child_bs);  /* to change the backing file */
>> +
>> +    /* convert base_bs to r/w, if necessary */
>> +    if (!(orig_base_flags & BDRV_O_RDWR)) {
>> +        reopen_queue = bdrv_reopen_queue(reopen_queue, base_bs,
>> +                                         orig_base_flags | BDRV_O_RDWR);
>> +    }
>> +    if (!(orig_top_flags & BDRV_O_RDWR)) {
>> +        reopen_queue = bdrv_reopen_queue(reopen_queue, base_bs,
>> +                                         orig_base_flags | BDRV_O_RDWR);
> 
> I think you mean child_bs and orig_top_flags. (Why isn't it called
> orig_child_flags?)
> 

Yes, thanks - and (per your comments on the previous patch), I'll move
the reopen bits into commit_start().

>> +    }
>> +    if (reopen_queue) {
>> +        bdrv_reopen_multiple(reopen_queue, &local_err);
>> +        if (local_err != NULL) {
>> +            error_propagate(errp, local_err);
>> +            return;
>> +        }
>> +    }
>> +
>> +    commit_start(bs, base_bs, top_bs, speed, on_error,
>> +                 block_job_cb, bs, orig_base_flags, orig_top_flags,
>> +                 &local_err);
>> +    if (local_err != NULL) {
>> +        error_propagate(errp, local_err);
>> +        return;
>> +    }
>> +    /* Grab a reference so hotplug does not delete the BlockDriverState from
>> +     * underneath us.
>> +     */
>> +    drive_get_ref(drive_get_by_blockdev(bs));
>> +}
>>  
>>  static void eject_device(BlockDriverState *bs, int force, Error **errp)
>>  {
>> diff --git a/qapi-schema.json b/qapi-schema.json
>> index bd8ad74..45feda6 100644
>> --- a/qapi-schema.json
>> +++ b/qapi-schema.json
>> @@ -1401,6 +1401,36 @@
>>    'returns': 'str' }
>>  
>>  ##
>> +# @block-commit
>> +#
>> +# Live commit of data from child image nodes into parent nodes - i.e.,
>> +# writes data between 'top' and 'base' into 'base'.
>> +#
>> +# @device:  the name of the device
>> +#
>> +# @base:   #optional The parent image of the device to write data into.
>> +#                    If not specified, this is the original parent image.
> 
> "The file name of the parent image", actually.
> 
> Which I'm afraid means that this isn't an API for the long term, but
> there's little we can do about it now...
> 
>> +#
>> +# @top:    #optional The child image, above which data will not be committed
>> +#                    down.  If not specified, this is the active layer.
> 
> Again, the file name.
> 
> Kevin
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 2/6] block: add live block commit functionality
  2012-09-06 20:37     ` Jeff Cody
@ 2012-09-06 21:16       ` Eric Blake
  2012-09-07 15:56         ` Jeff Cody
  0 siblings, 1 reply; 26+ messages in thread
From: Eric Blake @ 2012-09-06 21:16 UTC (permalink / raw)
  To: jcody; +Cc: Kevin Wolf, supriyak, stefanha, qemu-devel, pbonzini

[-- Attachment #1: Type: text/plain, Size: 1308 bytes --]

On 09/06/2012 02:37 PM, Jeff Cody wrote:
> On 09/06/2012 10:00 AM, Kevin Wolf wrote:
>> Am 30.08.2012 20:47, schrieb Jeff Cody:
>>> This adds the live commit coroutine.  This iteration focuses on the
>>> commit only below the active layer, and not the active layer itself.
>>>
>>> The behaviour is similar to block streaming; the sectors are walked
>>> through, and anything that exists above 'base' is committed back down
>>> into base.  At the end, intermediate images are deleted, and the
>>> chain stitched together.  Images are restored to their original open
>>> flags upon completion.
>>>

> 
>> What should we do with backing files that are smaller than the image to
>> commit? In this version, data is copied up to the size of the backing
>> file, and then we get -EIO from bdrv_co_do_writev().
>>
> 
> We could leave it like that, and let it receive -EIO in that case.
> Alternatively, we could try and determine before the commit if the data
> will fit in the base, and return -ENOSPC if not.

Neither sounds appealing.  Why can't we first try to resize the base to
the new data size being committed, and only fall back to -ENOSPC or -EIO
if the resize fails?

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 617 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 1/6] block: add support functions for live commit, to find and delete images.
  2012-09-06 14:59     ` Jeff Cody
@ 2012-09-07 10:19       ` Kevin Wolf
  2012-09-07 15:17         ` Jeff Cody
  0 siblings, 1 reply; 26+ messages in thread
From: Kevin Wolf @ 2012-09-07 10:19 UTC (permalink / raw)
  To: jcody; +Cc: supriyak, pbonzini, eblake, qemu-devel, stefanha

Am 06.09.2012 16:59, schrieb Jeff Cody:
> On 09/06/2012 09:23 AM, Kevin Wolf wrote:
>> Am 30.08.2012 20:47, schrieb Jeff Cody:
>>> Add bdrv_find_child(), and bdrv_delete_intermediate().
>>>
>>> bdrv_find_child():  given 'bs' and the active (topmost) BDS of an image chain,
>>>                     find the image that is the immediate top of 'bs'
>>>
>>> bdrv_delete_intermediate():
>>>                     Given 3 BDS (active, top, base), delete images above
>>>                     base up to and including top, and set base to be the
>>>                     parent of top's child node.
>>>
>>>                     E.g., this converts:
>>>
>>>                     bottom <- base <- intermediate <- top <- active
>>>
>>>                     to
>>>
>>>                     bottom <- base <- active
>>>
>>>                     where top == active is permitted, although active
>>>                     will not be deleted.
>>>
>>> Signed-off-by: Jeff Cody <jcody@redhat.com>
>>
>> At first, when just reading the function name, I thought this would
>> actually delete the image file. Of course, it only removes it from the
>> backing file chain, but leaves the image file around. I don't have a
>> good suggestion, but if someone has a better name, I think we should
>> change it.
> 
> Hmm, the naming seems consistent with bdrv_delete(), which does not
> actually delete the image files either (and, that is essentially what
> this does... calls bdrv_delete(), on the intermediate images).
> 
> However, here are some other name proposals:
> 
>    *  bdrv_disconnect_intermediate()
>    *  bdrv_drop_intermediate()
>    *  bdrv_shorten_chain()

bdrv_drop_intermediate() sounds good to me.

>>
>>> +
>>> +typedef struct BlkIntermediateStates {
>>> +    BlockDriverState *bs;
>>> +    QSIMPLEQ_ENTRY(BlkIntermediateStates) entry;
>>> +} BlkIntermediateStates;
>>> +
>>> +
>>> +/* deletes images above 'base' up to and including 'top', and sets the image
>>> + * above 'top' to have base as its backing file.
>>> + *
>>> + * E.g., this will convert the following chain:
>>> + * bottom <- base <- intermediate <- top <- active
>>> + *
>>> + * to
>>> + *
>>> + * bottom <- base <- active
>>> + *
>>> + * It is allowed for bottom==base, in which case it converts:
>>> + *
>>> + * base <- intermediate <- top <- active
>>> + *
>>> + * to
>>> + *
>>> + * base <- active
>>> + *
>>> + * It is also allowed for top==active, except in that case active is not
>>> + * deleted:
>>
>> Hm, makes the interface inconsistent. Shouldn't you be using top ==
>> intermediate and it would work without any special casing?
>>
> 
> To remain consistent, maybe we should define it as an error if
> top==active, and return error in that case?  The caller can be
> responsible for checking for that - if the caller wants to merge down
> the active layer, there are additional steps to be taken anyway.

Yes, why not.

And we can always revisit when implementing the additional functionality.

>>> +        /* we could not find the image above 'top', this is an error */
>>> +        goto exit;
>>> +    }
>>> +
>>> +    /* if the active and top image passed in are the same, then we
>>> +     * can't delete the active, so we start one below
>>> +     */
>>> +    intermediate = (active == top) ? active->backing_hd : top;
>>
>> Aha. So intermediate is used to undo the special case. Now we're always
>> on the last image to be deleted.
>>
>> This is equivalent to an unconditional new_top_bs->backing_hd.

How about changing this to use the simpler unconditional version?

Kevin

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 1/6] block: add support functions for live commit, to find and delete images.
  2012-09-07 10:19       ` Kevin Wolf
@ 2012-09-07 15:17         ` Jeff Cody
  0 siblings, 0 replies; 26+ messages in thread
From: Jeff Cody @ 2012-09-07 15:17 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: stefanha, supriyak, eblake, qemu-devel, pbonzini

On 09/07/2012 06:19 AM, Kevin Wolf wrote:
> Am 06.09.2012 16:59, schrieb Jeff Cody:
>> On 09/06/2012 09:23 AM, Kevin Wolf wrote:
>>> Am 30.08.2012 20:47, schrieb Jeff Cody:
>>>> Add bdrv_find_child(), and bdrv_delete_intermediate().
>>>>
>>>> bdrv_find_child():  given 'bs' and the active (topmost) BDS of an image chain,
>>>>                     find the image that is the immediate top of 'bs'
>>>>
>>>> bdrv_delete_intermediate():
>>>>                     Given 3 BDS (active, top, base), delete images above
>>>>                     base up to and including top, and set base to be the
>>>>                     parent of top's child node.
>>>>
>>>>                     E.g., this converts:
>>>>
>>>>                     bottom <- base <- intermediate <- top <- active
>>>>
>>>>                     to
>>>>
>>>>                     bottom <- base <- active
>>>>
>>>>                     where top == active is permitted, although active
>>>>                     will not be deleted.
>>>>
>>>> Signed-off-by: Jeff Cody <jcody@redhat.com>
>>>
>>> At first, when just reading the function name, I thought this would
>>> actually delete the image file. Of course, it only removes it from the
>>> backing file chain, but leaves the image file around. I don't have a
>>> good suggestion, but if someone has a better name, I think we should
>>> change it.
>>
>> Hmm, the naming seems consistent with bdrv_delete(), which does not
>> actually delete the image files either (and, that is essentially what
>> this does... calls bdrv_delete(), on the intermediate images).
>>
>> However, here are some other name proposals:
>>
>>    *  bdrv_disconnect_intermediate()
>>    *  bdrv_drop_intermediate()
>>    *  bdrv_shorten_chain()
> 
> bdrv_drop_intermediate() sounds good to me.
> 
>>>
>>>> +
>>>> +typedef struct BlkIntermediateStates {
>>>> +    BlockDriverState *bs;
>>>> +    QSIMPLEQ_ENTRY(BlkIntermediateStates) entry;
>>>> +} BlkIntermediateStates;
>>>> +
>>>> +
>>>> +/* deletes images above 'base' up to and including 'top', and sets the image
>>>> + * above 'top' to have base as its backing file.
>>>> + *
>>>> + * E.g., this will convert the following chain:
>>>> + * bottom <- base <- intermediate <- top <- active
>>>> + *
>>>> + * to
>>>> + *
>>>> + * bottom <- base <- active
>>>> + *
>>>> + * It is allowed for bottom==base, in which case it converts:
>>>> + *
>>>> + * base <- intermediate <- top <- active
>>>> + *
>>>> + * to
>>>> + *
>>>> + * base <- active
>>>> + *
>>>> + * It is also allowed for top==active, except in that case active is not
>>>> + * deleted:
>>>
>>> Hm, makes the interface inconsistent. Shouldn't you be using top ==
>>> intermediate and it would work without any special casing?
>>>
>>
>> To remain consistent, maybe we should define it as an error if
>> top==active, and return error in that case?  The caller can be
>> responsible for checking for that - if the caller wants to merge down
>> the active layer, there are additional steps to be taken anyway.
> 
> Yes, why not.
> 
> And we can always revisit when implementing the additional functionality.
> 
>>>> +        /* we could not find the image above 'top', this is an error */
>>>> +        goto exit;
>>>> +    }
>>>> +
>>>> +    /* if the active and top image passed in are the same, then we
>>>> +     * can't delete the active, so we start one below
>>>> +     */
>>>> +    intermediate = (active == top) ? active->backing_hd : top;
>>>
>>> Aha. So intermediate is used to undo the special case. Now we're always
>>> on the last image to be deleted.
>>>
>>> This is equivalent to an unconditional new_top_bs->backing_hd.
> 
> How about changing this to use the simpler unconditional version?

Sure - since active == top is now an error, there is no reason for the
more complicated logic.  And at this point, the statement
(new_top_bs->backing_hd == top) should always be true.

> 
> Kevin
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 2/6] block: add live block commit functionality
  2012-09-06 21:16       ` Eric Blake
@ 2012-09-07 15:56         ` Jeff Cody
  0 siblings, 0 replies; 26+ messages in thread
From: Jeff Cody @ 2012-09-07 15:56 UTC (permalink / raw)
  To: Eric Blake; +Cc: Kevin Wolf, supriyak, pbonzini, qemu-devel, stefanha

On 09/06/2012 05:16 PM, Eric Blake wrote:
> On 09/06/2012 02:37 PM, Jeff Cody wrote:
>> On 09/06/2012 10:00 AM, Kevin Wolf wrote:
>>> Am 30.08.2012 20:47, schrieb Jeff Cody:
>>>> This adds the live commit coroutine.  This iteration focuses on the
>>>> commit only below the active layer, and not the active layer itself.
>>>>
>>>> The behaviour is similar to block streaming; the sectors are walked
>>>> through, and anything that exists above 'base' is committed back down
>>>> into base.  At the end, intermediate images are deleted, and the
>>>> chain stitched together.  Images are restored to their original open
>>>> flags upon completion.
>>>>
> 
>>
>>> What should we do with backing files that are smaller than the image to
>>> commit? In this version, data is copied up to the size of the backing
>>> file, and then we get -EIO from bdrv_co_do_writev().
>>>
>>
>> We could leave it like that, and let it receive -EIO in that case.
>> Alternatively, we could try and determine before the commit if the data
>> will fit in the base, and return -ENOSPC if not.
> 
> Neither sounds appealing.  Why can't we first try to resize the base to
> the new data size being committed, and only fall back to -ENOSPC or -EIO
> if the resize fails?
> 

OK - we will attempt to resize the base, and return the appropriate
error on failure or if unsupported for the format.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 3/6] blockdev: rename block_stream_cb to a generic block_job_cb
  2012-08-30 18:47 ` [Qemu-devel] [RFC v2 PATCH 3/6] blockdev: rename block_stream_cb to a generic block_job_cb Jeff Cody
@ 2012-09-07 16:27   ` Paolo Bonzini
  2012-09-07 17:04     ` Jeff Cody
  0 siblings, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2012-09-07 16:27 UTC (permalink / raw)
  To: Jeff Cody; +Cc: kwolf, supriyak, eblake, qemu-devel, stefanha

Il 30/08/2012 20:47, Jeff Cody ha scritto:
> @@ -53,6 +53,8 @@ static const int if_max_devs[IF_COUNT] = {
>      [IF_SCSI] = 7,
>  };
>  
> +static void block_job_cb(void *opaque, int ret);
> +
>  /*
>   * We automatically delete the drive when a device using it gets
>   * unplugged.  Questionable feature, but we can't just drop it.

Can you avoid the forward declaration?

Paolo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 3/6] blockdev: rename block_stream_cb to a generic block_job_cb
  2012-09-07 16:27   ` Paolo Bonzini
@ 2012-09-07 17:04     ` Jeff Cody
  2012-09-07 17:09       ` Paolo Bonzini
  0 siblings, 1 reply; 26+ messages in thread
From: Jeff Cody @ 2012-09-07 17:04 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, supriyak, eblake, qemu-devel, stefanha

On 09/07/2012 12:27 PM, Paolo Bonzini wrote:
> Il 30/08/2012 20:47, Jeff Cody ha scritto:
>> @@ -53,6 +53,8 @@ static const int if_max_devs[IF_COUNT] = {
>>      [IF_SCSI] = 7,
>>  };
>>  
>> +static void block_job_cb(void *opaque, int ret);
>> +
>>  /*
>>   * We automatically delete the drive when a device using it gets
>>   * unplugged.  Questionable feature, but we can't just drop it.
> 
> Can you avoid the forward declaration?
> 
> Paolo
> 

Yes, sure - honestly, I added this patch in, but I assumed that the
similar patch of yours to support mirroring would go in first, making
this patch moot.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 3/6] blockdev: rename block_stream_cb to a generic block_job_cb
  2012-09-07 17:04     ` Jeff Cody
@ 2012-09-07 17:09       ` Paolo Bonzini
  0 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2012-09-07 17:09 UTC (permalink / raw)
  To: jcody; +Cc: kwolf, supriyak, eblake, qemu-devel, stefanha

Il 07/09/2012 19:04, Jeff Cody ha scritto:
> On 09/07/2012 12:27 PM, Paolo Bonzini wrote:
>> Il 30/08/2012 20:47, Jeff Cody ha scritto:
>>> @@ -53,6 +53,8 @@ static const int if_max_devs[IF_COUNT] = {
>>>      [IF_SCSI] = 7,
>>>  };
>>>  
>>> +static void block_job_cb(void *opaque, int ret);
>>> +
>>>  /*
>>>   * We automatically delete the drive when a device using it gets
>>>   * unplugged.  Questionable feature, but we can't just drop it.
>>
>> Can you avoid the forward declaration?
>>
>> Paolo
>>
> 
> Yes, sure - honestly, I added this patch in, but I assumed that the
> similar patch of yours to support mirroring would go in first, making
> this patch moot.

I now took this patch of yours in my tree (minus the forward
declaration), so...

Paolo

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2012-09-07 17:09 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-30 18:47 [Qemu-devel] [RFC v2 PATCH 0/6] Live block commit Jeff Cody
2012-08-30 18:47 ` [Qemu-devel] [RFC v2 PATCH 1/6] block: add support functions for live commit, to find and delete images Jeff Cody
2012-09-06 13:23   ` Kevin Wolf
2012-09-06 14:58     ` Eric Blake
2012-09-06 14:59     ` Jeff Cody
2012-09-07 10:19       ` Kevin Wolf
2012-09-07 15:17         ` Jeff Cody
2012-08-30 18:47 ` [Qemu-devel] [RFC v2 PATCH 2/6] block: add live block commit functionality Jeff Cody
2012-09-06 14:00   ` Kevin Wolf
2012-09-06 20:37     ` Jeff Cody
2012-09-06 21:16       ` Eric Blake
2012-09-07 15:56         ` Jeff Cody
2012-08-30 18:47 ` [Qemu-devel] [RFC v2 PATCH 3/6] blockdev: rename block_stream_cb to a generic block_job_cb Jeff Cody
2012-09-07 16:27   ` Paolo Bonzini
2012-09-07 17:04     ` Jeff Cody
2012-09-07 17:09       ` Paolo Bonzini
2012-08-30 18:47 ` [Qemu-devel] [RFC v2 PATCH 4/6] qerror: new error for live block commit, QERR_TOP_NOT_FOUND Jeff Cody
2012-08-30 22:55   ` Eric Blake
2012-08-31 14:42     ` Jeff Cody
2012-08-30 18:47 ` [Qemu-devel] [RFC v2 PATCH 5/6] block: helper function, to find the base image of a chain Jeff Cody
2012-08-30 18:47 ` [Qemu-devel] [RFC v2 PATCH 6/6] QAPI: add command for live block commit, 'block-commit' Jeff Cody
2012-08-30 23:06   ` Eric Blake
2012-08-31 14:42     ` Jeff Cody
2012-09-06 14:29   ` Kevin Wolf
2012-09-06 21:00     ` Jeff Cody
2012-08-30 21:31 ` [Qemu-devel] [RFC v2 PATCH 0/6] Live block commit Jeff Cody

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.