* [Qemu-devel] [PATCH v2 0/2] Group Live Snapshots
@ 2012-02-26 19:31 Jeff Cody
  2012-02-26 19:31 ` [Qemu-devel] [PATCH v2 1/2] qapi: Introduce blockdev-group-snapshot-sync command Jeff Cody
  2012-02-26 19:31 ` [Qemu-devel] [PATCH v2 2/2] QMP: Add qmp command for blockdev-group-snapshot-sync Jeff Cody
  0 siblings, 2 replies; 13+ messages in thread
From: Jeff Cody @ 2012-02-26 19:31 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, eblake, armbru, lcapitulino

This patchset adds the ability to take a snapshot of a group of devices,
rather than each device individually.  Upon failure of any snapshot, all
snapshots taken by the command will be abandoned, and the appropriate failure
code returned.
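
For example (device names here are hypothetical), a consistent snapshot of
two drives can be requested in a single command, using the QMP syntax added
in patch 2/2:

-> { "execute": "blockdev-group-snapshot-sync", "arguments":
     { "devlist": [ { "device": "virtio0",
                      "snapshot-file": "/tmp/vda-snap.qcow2" },
                    { "device": "virtio1",
                      "snapshot-file": "/tmp/vdb-snap.qcow2" } ] } }
<- { "return": {} }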

This differs from v1 in that:
    * The QAPI input mechanism for JSON-arrays of qdict items is now used
      correctly, and there is no modification of the existing monitor code.
      This drops the original patch 1 from v1.
    * Rather than use bdrv_close() and bdrv_open() to pivot the snapshot,
      the fields of the BlockDriverState are manipulated so that there are
      no irrecoverable failure points in the snapshot process. This is based
      on a suggestion by Kevin Wolf.
    * The qapi & block code was broken out into patch 1/2, and the QMP command
      placed in patch 2/2.
    * Since there are no irrecoverable error points, there is no need for
      a command to return a list of failures.  There is at most one failure
      to report, which is the first failure encountered. In light of that,
      patch 3 from v1 was dropped.

Some things for careful review:

    In patch 1/2, in the new bdrv_append() function:
        * Are all of the relevant fields preserved in the top bs?
        * Conversely, are any of the fields being preserved that should not be?
        * Are there race condition concerns at the end of bdrv_append(), at the
          line '*bs_top = tmp;', which replaces the contents of the current top
          bs?


Jeff Cody (2):
  qapi: Introduce blockdev-group-snapshot-sync command
  QMP: Add qmp command for blockdev-group-snapshot-sync

 block.c          |   47 ++++++++++++++++++++
 block.h          |    1 +
 blockdev.c       |  128 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 qapi-schema.json |   38 ++++++++++++++++
 qmp-commands.hx  |   39 ++++++++++++++++
 5 files changed, 253 insertions(+), 0 deletions(-)

-- 
1.7.9.rc2.1.g69204


* [Qemu-devel] [PATCH v2 1/2] qapi: Introduce blockdev-group-snapshot-sync command
  2012-02-26 19:31 [Qemu-devel] [PATCH v2 0/2] Group Live Snapshots Jeff Cody
@ 2012-02-26 19:31 ` Jeff Cody
  2012-02-27 11:13   ` Stefan Hajnoczi
  2012-02-27 15:46   ` Kevin Wolf
  2012-02-26 19:31 ` [Qemu-devel] [PATCH v2 2/2] QMP: Add qmp command for blockdev-group-snapshot-sync Jeff Cody
  1 sibling, 2 replies; 13+ messages in thread
From: Jeff Cody @ 2012-02-26 19:31 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, eblake, armbru, lcapitulino

This is a QAPI/QMP only command to take a snapshot of a group of
devices. This is similar to the blockdev-snapshot-sync command, except
blockdev-group-snapshot-sync accepts a list of devices, filenames, and
formats.

An attempt is made to keep the group snapshot atomic: if the
creation or opening of any of the new snapshots fails, then all of
the new snapshots are abandoned, and the name of the snapshot image
that failed is returned.  The failure case should not interrupt
any operations.

Rather than use bdrv_close() along with a subsequent bdrv_open() to
perform the pivot, the original image is never closed and the new
image is placed 'in front' of the original image via manipulation
of the BlockDriverState fields.  Thus, once the new snapshot image
has been successfully created, there are no more failure points
before pivoting to the new snapshot.

This allows the group of disks to remain consistent with each other,
even across snapshot failures.
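
To illustrate the swap (a minimal, standalone sketch with made-up types;
this is not QEMU code): the contents of the two structs trade places, so
every existing pointer to the top bs stays valid while what it refers to
changes:

    #include <stdio.h>

    /* stand-in for BlockDriverState; fields invented for the sketch */
    typedef struct Img {
        const char *name;
        struct Img *backing;
    } Img;

    int main(void)
    {
        Img top  = { "base.img", NULL };    /* the live image, bs_top */
        Img snap = { "snap.qcow2", NULL };  /* new snapshot image, bs_new */
        Img *guest_view = &top;             /* devices keep pointing here */

        /* the bdrv_append() idea: swap contents, not pointers */
        Img tmp = snap;
        tmp.backing = &snap;    /* after the swap, &snap holds the old top */
        snap = top;
        top  = tmp;

        /* guest_view is untouched, yet now sees the snapshot on top */
        printf("%s -> %s\n", guest_view->name, guest_view->backing->name);
        return 0;               /* prints: snap.qcow2 -> base.img */
    }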

Signed-off-by: Jeff Cody <jcody@redhat.com>
---
 block.c          |   47 ++++++++++++++++++++
 block.h          |    1 +
 blockdev.c       |  128 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 qapi-schema.json |   38 ++++++++++++++++
 4 files changed, 214 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index 3621d11..0045ab1 100644
--- a/block.c
+++ b/block.c
@@ -880,6 +880,53 @@ void bdrv_make_anon(BlockDriverState *bs)
     bs->device_name[0] = '\0';
 }
 
+/*
+ * Add new bs contents at the top of an image chain while the chain is live,
+ * while keeping required fields on the top layer.
+ *
+ * It is assumed that bs_new already points to an existing image,
+ * with the correct backing filename of top->backing_file
+ */
+void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top)
+{
+    BlockDriverState tmp;
+
+    /* the new bs must not be in bdrv_states */
+    bdrv_make_anon(bs_new);
+
+    tmp = *bs_new;
+    tmp.backing_hd = bs_new;
+
+    /* there are some fields that need to stay on the top layer: */
+
+    /* dev info */
+    tmp.dev_ops          = bs_top->dev_ops;
+    tmp.dev_opaque       = bs_top->dev_opaque;
+    tmp.dev              = bs_top->dev;
+    tmp.buffer_alignment = bs_top->buffer_alignment;
+    tmp.copy_on_read     = bs_top->copy_on_read;
+
+    /* i/o timing parameters */
+    tmp.slice_time        = bs_top->slice_time;
+    tmp.slice_start       = bs_top->slice_start;
+    tmp.slice_end         = bs_top->slice_end;
+    tmp.io_limits         = bs_top->io_limits;
+    tmp.io_base           = bs_top->io_base;
+    tmp.throttled_reqs    = bs_top->throttled_reqs;
+    tmp.block_timer       = bs_top->block_timer;
+    tmp.io_limits_enabled = bs_top->io_limits_enabled;
+
+    /* keep the same entry in bdrv_states */
+    pstrcpy(tmp.device_name, sizeof(tmp.device_name), bs_top->device_name);
+    tmp.list = bs_top->list;
+
+    /* swap contents of the fixed new bs and the current top */
+    *bs_new = *bs_top;
+    *bs_top = tmp;
+
+    bdrv_detach_dev(bs_new, bs_new->dev);
+}
+
 void bdrv_delete(BlockDriverState *bs)
 {
     assert(!bs->dev);
diff --git a/block.h b/block.h
index cae289b..190a780 100644
--- a/block.h
+++ b/block.h
@@ -114,6 +114,7 @@ int bdrv_create(BlockDriver *drv, const char* filename,
 int bdrv_create_file(const char* filename, QEMUOptionParameter *options);
 BlockDriverState *bdrv_new(const char *device_name);
 void bdrv_make_anon(BlockDriverState *bs);
+void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top);
 void bdrv_delete(BlockDriverState *bs);
 int bdrv_parse_cache_flags(const char *mode, int *flags);
 int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags);
diff --git a/blockdev.c b/blockdev.c
index 05e7c5e..560f7e8 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -714,6 +714,134 @@ void qmp_blockdev_snapshot_sync(const char *device, const char *snapshot_file,
     }
 }
 
+
+/* New and old BlockDriverState structs for group snapshots */
+typedef struct BlkGroupSnapshotStates {
+    BlockDriverState *old_bs;
+    BlockDriverState *new_bs;
+    bool is_open;
+    QSIMPLEQ_ENTRY(BlkGroupSnapshotStates) entry;
+} BlkGroupSnapshotStates;
+
+/*
+ * 'Atomic' group snapshots.  The snapshots are taken as a set, and if any fail
+ *  then we do not pivot any of the devices in the group, and abandon the
+ *  snapshots
+ */
+void qmp_blockdev_group_snapshot_sync(SnapshotDevList *dev_list,
+                                                   Error **errp)
+{
+    int ret = 0;
+    SnapshotDevList *dev_entry = dev_list;
+    SnapshotDev *dev_info = NULL;
+    BlkGroupSnapshotStates *states;
+    BlockDriver *proto_drv;
+    BlockDriver *drv;
+    int flags;
+    const char *format;
+    const char *snapshot_file;
+
+    QSIMPLEQ_HEAD(snap_bdrv_states, BlkGroupSnapshotStates) snap_bdrv_states;
+    QSIMPLEQ_INIT(&snap_bdrv_states);
+
+    /* We don't do anything in this loop that commits us to the snapshot */
+    while (NULL != dev_entry) {
+        dev_info = dev_entry->value;
+        dev_entry = dev_entry->next;
+
+        states = g_malloc0(sizeof(BlkGroupSnapshotStates));
+        states->is_open = false;
+
+        states->old_bs = bdrv_find(dev_info->device);
+
+        if (!states->old_bs) {
+            error_set(errp, QERR_DEVICE_NOT_FOUND, dev_info->device);
+            goto fail;
+        }
+
+        if (bdrv_in_use(states->old_bs)) {
+            error_set(errp, QERR_DEVICE_IN_USE, dev_info->device);
+            goto fail;
+        }
+
+        snapshot_file = dev_info->snapshot_file;
+
+        flags = states->old_bs->open_flags;
+
+        if (!dev_info->has_format) {
+            format = "qcow2";
+        } else {
+            format = dev_info->format;
+        }
+
+        drv = bdrv_find_format(format);
+        if (!drv) {
+            error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
+            goto fail;
+        }
+
+        proto_drv = bdrv_find_protocol(snapshot_file);
+        if (!proto_drv) {
+            error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
+            goto fail;
+        }
+
+        /* create new image w/backing file */
+        ret = bdrv_img_create(snapshot_file, format,
+                              states->old_bs->filename,
+                              drv->format_name, NULL, -1, flags);
+        if (ret) {
+            error_set(errp, QERR_OPEN_FILE_FAILED, snapshot_file);
+            goto fail;
+        }
+
+        /* We will manually add the backing_hd field to the bs later */
+        states->new_bs = bdrv_new("");
+        ret = bdrv_open(states->new_bs, snapshot_file,
+                        flags | BDRV_O_NO_BACKING, drv);
+        states->is_open = true;
+
+        if (ret != 0) {
+            error_set(errp, QERR_OPEN_FILE_FAILED, snapshot_file);
+            goto close_and_fail;
+        }
+        QSIMPLEQ_INSERT_TAIL(&snap_bdrv_states, states, entry);
+    }
+
+    /*
+     * Now we will drain, flush, & pivot everything - we are committed at this
+     * point.
+     */
+    bdrv_drain_all();
+    QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) {
+        bdrv_flush(states->old_bs);
+
+        /* This removes our old bs from the bdrv_states, and adds the new bs */
+        bdrv_append(states->new_bs, states->old_bs);
+    }
+
+    /* success */
+    goto exit;
+
+close_and_fail:
+    /*
+    * failure, and it is all-or-none; abandon each new bs, and keep using
+    * the original bs for all images
+    */
+    QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) {
+        if (states->is_open) {
+             bdrv_close(states->new_bs);
+        }
+    }
+fail:
+exit:
+    QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) {
+        g_free(states);
+    }
+    return;
+}
+
+
 static void eject_device(BlockDriverState *bs, int force, Error **errp)
 {
     if (bdrv_in_use(bs)) {
diff --git a/qapi-schema.json b/qapi-schema.json
index 80debe6..a0cab8f 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1107,6 +1107,44 @@
 { 'command': 'block_resize', 'data': { 'device': 'str', 'size': 'int' }}
 
 ##
+# @SnapshotDev
+#
+# @device:  the name of the device to generate the snapshot from.
+#
+# @snapshot-file: the target of the new image. A new file will be created.
+#
+# @format: #optional the format of the snapshot image, default is 'qcow2'.
+##
+{ 'type': 'SnapshotDev',
+  'data': {'device': 'str', 'snapshot-file': 'str', '*format': 'str' } }
+
+##
+# @blockdev-group-snapshot-sync
+#
+# Generates a synchronous snapshot of a group of one or more block devices,
+# as atomically as possible.  If the snapshot of any device in the group
+# fails, then the entire group snapshot will be abandoned and the
+# appropriate error returned.
+#
+#  List of:
+#  @SnapshotDev: information needed for the device snapshot
+#
+# Returns: nothing on success
+#          If @device is not a valid block device, DeviceNotFound
+#          If @device is busy, DeviceInUse will be returned
+#          If @snapshot-file can't be created, OpenFileFailed
+#          If @snapshot-file can't be opened, OpenFileFailed
+#          If @format is invalid, InvalidBlockFormat
+#
+# Note: The group snapshot attempt returns failure on the first snapshot
+# device failure.  Therefore, there will be only one device or snapshot file
+# returned in an error condition, and subsequent devices will not have been
+# attempted.
+##
+{ 'command': 'blockdev-group-snapshot-sync',
+  'data': { 'devlist': [ 'SnapshotDev' ] } }
+
+##
 # @blockdev-snapshot-sync
 #
 # Generates a synchronous snapshot of a block device.
-- 
1.7.9.rc2.1.g69204


* [Qemu-devel] [PATCH v2 2/2] QMP: Add qmp command for blockdev-group-snapshot-sync
  2012-02-26 19:31 [Qemu-devel] [PATCH v2 0/2] Group Live Snapshots Jeff Cody
  2012-02-26 19:31 ` [Qemu-devel] [PATCH v2 1/2] qapi: Introduce blockdev-group-snapshot-sync command Jeff Cody
@ 2012-02-26 19:31 ` Jeff Cody
  1 sibling, 0 replies; 13+ messages in thread
From: Jeff Cody @ 2012-02-26 19:31 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, eblake, armbru, lcapitulino

This adds the QMP command for blockdev-group-snapshot-sync.  It
takes an array as input for the devlist argument.  The array
consists of the following elements:

    + device:        device to snapshot. e.g. "ide-hd0", "virtio0"
    + snapshot-file: path & file for the snapshot image. e.g. "/tmp/file.img"
    + format:        snapshot format. e.g., "qcow2". Optional

There is no HMP equivalent for the command.

Signed-off-by: Jeff Cody <jcody@redhat.com>
---
 qmp-commands.hx |   39 +++++++++++++++++++++++++++++++++++++++
 1 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/qmp-commands.hx b/qmp-commands.hx
index bd6b641..365489c 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -665,6 +665,45 @@ EQMP
         .args_type  = "device:B",
         .mhandler.cmd_new = qmp_marshal_input_block_job_cancel,
     },
+    {
+        .name       = "blockdev-group-snapshot-sync",
+        .args_type  = "devlist:O",
+        .params  = "device:B,snapshot-file:s,format:s?",
+        .mhandler.cmd_new = qmp_marshal_input_blockdev_group_snapshot_sync,
+    },
+
+SQMP
+blockdev-group-snapshot-sync
+----------------------------
+
+Synchronous snapshot of one or more block devices.  A JSON array is
+accepted as input, containing the device and snapshot file information for
+each device in the group.  The default format, if not specified, is qcow2.
+
+If there is any failure creating or opening a new snapshot, all snapshots
+for the group are abandoned, and the original disks, as they were before
+the snapshot attempt, remain in use.
+
+
+Arguments:
+
+devlist array:
+    - "device": device name to snapshot (json-string)
+    - "snapshot-file": name of new image file (json-string)
+    - "format": format of new image (json-string, optional)
+
+Example:
+
+-> { "execute": "blockdev-group-snapshot-sync", "arguments":
+                      { "devlist": [{ "device": "ide-hd0",
+                                      "snapshot-file": "/some/place/my-image",
+                                      "format": "qcow2" },
+                                    { "device": "ide-hd1",
+                                      "snapshot-file": "/some/place/my-image2",
+                                      "format": "qcow2" }] } }
+<- { "return": {} }
+
+EQMP
 
     {
         .name       = "blockdev-snapshot-sync",
-- 
1.7.9.rc2.1.g69204


* Re: [Qemu-devel] [PATCH v2 1/2] qapi: Introduce blockdev-group-snapshot-sync command
  2012-02-26 19:31 ` [Qemu-devel] [PATCH v2 1/2] qapi: Introduce blockdev-group-snapshot-sync command Jeff Cody
@ 2012-02-27 11:13   ` Stefan Hajnoczi
  2012-02-27 14:26     ` Jeff Cody
  2012-02-27 15:46   ` Kevin Wolf
  1 sibling, 1 reply; 13+ messages in thread
From: Stefan Hajnoczi @ 2012-02-27 11:13 UTC (permalink / raw)
  To: Jeff Cody; +Cc: kwolf, armbru, qemu-devel, lcapitulino, pbonzini, eblake

On Sun, Feb 26, 2012 at 7:31 PM, Jeff Cody <jcody@redhat.com> wrote:

Do you have automated tests for this feature?

> +/*
> + * Add new bs contents at the top of an image chain while the chain is live,
> + * while keeping required fields on the top layer.

Please also document the swap behavior.  It's pretty important for the
caller to realize that once this function returns, their
BlockDriverState arguments will have been swapped.

> + * It is assumed that bs_new already points to an existing image,
> + * with the correct backing filename of top->backing_file

Not sure what this means.  Isn't bs_new going to use bs_top as its
backing file?  Why "top->backing_file"?

> +void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top)
> +{
> +    BlockDriverState tmp;
> +
> +    /* the new bs must not be in bdrv_states */
> +    bdrv_make_anon(bs_new);
> +
> +    tmp = *bs_new;
> +    tmp.backing_hd = bs_new;

This is tricky; it would be nice to comment that we will swap bs_new and
bs_top later, therefore we need a pointer to bs_new here even though
it doesn't make sense yet.

> +
> +    /* there are some fields that need to stay on the top layer: */
> +
> +    /* dev info */
> +    tmp.dev_ops          = bs_top->dev_ops;
> +    tmp.dev_opaque       = bs_top->dev_opaque;
> +    tmp.dev              = bs_top->dev;
> +    tmp.buffer_alignment = bs_top->buffer_alignment;
> +    tmp.copy_on_read     = bs_top->copy_on_read;
> +
> +    /* i/o timing parameters */
> +    tmp.slice_time        = bs_top->slice_time;
> +    tmp.slice_start       = bs_top->slice_start;
> +    tmp.slice_end         = bs_top->slice_end;
> +    tmp.io_limits         = bs_top->io_limits;
> +    tmp.io_base           = bs_top->io_base;
> +    tmp.throttled_reqs    = bs_top->throttled_reqs;
> +    tmp.block_timer       = bs_top->block_timer;
> +    tmp.io_limits_enabled = bs_top->io_limits_enabled;

Please add a comment into BlockDriverState to warn that changes to
fields may require updates to this function too!

> +        /* We will manually add the backing_hd field to the bs later */
> +        states->new_bs = bdrv_new("");
> +        ret = bdrv_open(states->new_bs, snapshot_file,
> +                        flags | BDRV_O_NO_BACKING, drv);
> +        states->is_open = true;

What is the point of is_open?  If we failed then new_bs is actually
not open here.

I think you can remove this field and just do the following in close_and_fail:

if (states->new_bs) {
    bdrv_delete(states->new_bs);
}

(BTW I think close_and_fail is currently leaking new_bs because it
only closes it and does not delete it!)


* Re: [Qemu-devel] [PATCH v2 1/2] qapi: Introduce blockdev-group-snapshot-sync command
  2012-02-27 11:13   ` Stefan Hajnoczi
@ 2012-02-27 14:26     ` Jeff Cody
  2012-02-27 14:40       ` Stefan Hajnoczi
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff Cody @ 2012-02-27 14:26 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kwolf, qemu-devel, armbru, lcapitulino, pbonzini, eblake

On 02/27/2012 06:13 AM, Stefan Hajnoczi wrote:
> On Sun, Feb 26, 2012 at 7:31 PM, Jeff Cody <jcody@redhat.com> wrote:
> 
> Do you have automated tests for this feature?
> 

No, not yet.  The testing has been manual.

>> +/*
>> + * Add new bs contents at the top of an image chain while the chain is live,
>> + * while keeping required fields on the top layer.
> 
> Please also document the swap behavior.  It's pretty important for the
> caller to realize that once this function returns, their
> BlockDriverState arguments will have been swapped.

Good point.  How about this:

/*
 * Add new bs contents at the top of an image chain while the chain is
 * live, while keeping required fields on the top layer.
 *
 * This will modify the BlockDriverState fields, and swap contents
 * between bs_new and bs_top.  Both bs_new and bs_top are modified.
 *
 * A new image file is not created.  It is assumed that bs_new already
 * points to an existing image, with the correct backing filename of
 * bs_top->backing_file.
 */


> 
>> + * It is assumed that bs_new already points to an existing image,
>> + * with the correct backing filename of top->backing_file
> 
> Not sure what this means.  Isn't bs_new going to use bs_top as its
> backing file?  Why "top->backing_file"?

Sorry, that should have been 'bs_top->backing_file'. The image file is
not created by this function.  I added some more explanation, and
corrected that typo, in the above comment block.  Let me know if you
think it still needs more clarification.


> 
>> +void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top)
>> +{
>> +    BlockDriverState tmp;
>> +
>> +    /* the new bs must not be in bdrv_states */
>> +    bdrv_make_anon(bs_new);
>> +
>> +    tmp = *bs_new;
>> +    tmp.backing_hd = bs_new;
> 
> This is tricky; it would be nice to comment that we will swap bs_new and
> bs_top later, therefore we need a pointer to bs_new here even though
> it doesn't make sense yet.

OK, thanks. I will add some clarifying comments - you are right, it is a
bit tricky and counter-intuitive.  It may also be clearer if this is
done closer to the actual swap ('*bs_top = tmp').  I will move it down
to above the swap.

> 
>> +
>> +    /* there are some fields that need to stay on the top layer: */
>> +
>> +    /* dev info */
>> +    tmp.dev_ops          = bs_top->dev_ops;
>> +    tmp.dev_opaque       = bs_top->dev_opaque;
>> +    tmp.dev              = bs_top->dev;
>> +    tmp.buffer_alignment = bs_top->buffer_alignment;
>> +    tmp.copy_on_read     = bs_top->copy_on_read;
>> +
>> +    /* i/o timing parameters */
>> +    tmp.slice_time        = bs_top->slice_time;
>> +    tmp.slice_start       = bs_top->slice_start;
>> +    tmp.slice_end         = bs_top->slice_end;
>> +    tmp.io_limits         = bs_top->io_limits;
>> +    tmp.io_base           = bs_top->io_base;
>> +    tmp.throttled_reqs    = bs_top->throttled_reqs;
>> +    tmp.block_timer       = bs_top->block_timer;
>> +    tmp.io_limits_enabled = bs_top->io_limits_enabled;
> 
> Please add a comment into BlockDriverState to warn that changes to
> fields may require updates to this function too!

OK; I will add a blanket warning at the top of BlockDriverState to
inspect bdrv_append() when adding new fields, to see whether they need
to be handled there as well.
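
Something along these lines, perhaps (exact wording TBD):

    /*
     * Note: if you add fields to BlockDriverState, you may also need to
     * update bdrv_append(), which copies or preserves several fields of
     * the top layer when pivoting to a new image.
     */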

> 
>> +        /* We will manually add the backing_hd field to the bs later */
>> +        states->new_bs = bdrv_new("");
>> +        ret = bdrv_open(states->new_bs, snapshot_file,
>> +                        flags | BDRV_O_NO_BACKING, drv);
>> +        states->is_open = true;
> 
> What is the point of is_open?  If we failed then new_bs is actually
> not open here.

Previous entries may have been opened. You are right, however, in that
it is not needed at all. We can rely on new_bs being non-NULL.  I will
remove the 'is_open' references.

> 
> I think you can remove this field and just do the following in close_and_fail:
> 
> if (states->new_bs) {
>     bdrv_delete(states->new_bs);
> }
> 
> (BTW I think close_and_fail is currently leaking new_bs because it
> only closes it and does not delete it!)
> 

You are right. That should be bdrv_delete(), and not bdrv_close().  I
will fix that, and also rename close_and_fail to delete_and_fail for
good measure.

Thanks,
Jeff


* Re: [Qemu-devel] [PATCH v2 1/2] qapi: Introduce blockdev-group-snapshot-sync command
  2012-02-27 14:26     ` Jeff Cody
@ 2012-02-27 14:40       ` Stefan Hajnoczi
  2012-02-27 15:23         ` Jeff Cody
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Hajnoczi @ 2012-02-27 14:40 UTC (permalink / raw)
  To: jcody; +Cc: kwolf, qemu-devel, armbru, lcapitulino, pbonzini, eblake

On Mon, Feb 27, 2012 at 2:26 PM, Jeff Cody <jcody@redhat.com> wrote:
> On 02/27/2012 06:13 AM, Stefan Hajnoczi wrote:
>> On Sun, Feb 26, 2012 at 7:31 PM, Jeff Cody <jcody@redhat.com> wrote:
>>
>> Do you have automated tests for this feature?
>>
>
> No, not yet.  The testing has been manual.

For image streaming I used the Python unittest framework along with
QMP/qmp.py to create tests.  I am going to submit it as a qemu-iotest.
We really need something along the lines of a harness with QMP
support so that these block layer features can be tested.  I will CC
you on the email.

>
>>> +/*
>>> + * Add new bs contents at the top of an image chain while the chain is live,
>>> + * while keeping required fields on the top layer.
>>
>> Please also document the swap behavior.  It's pretty important for the
>> caller to realize that once this function returns, their
>> BlockDriverState arguments will have been swapped.
>
> Good point.  How about this:
>
> /*
>  * Add new bs contents at the top of an image chain while the chain is
>  * live, while keeping required fields on the top layer.
>  *
>  * This will modify the BlockDriverState fields, and swap contents
>  * between bs_new and bs_top.  Both bs_new and bs_top are modified.

Looks good.

>>> + * It is assumed that bs_new already points to an existing image,
>>> + * with the correct backing filename of top->backing_file
>>
>> Not sure what this means.  Isn't bs_new going to use bs_top as its
>> backing file?  Why "top->backing_file"?
>
> Sorry, that should have been 'bs_top->backing_file'. The image file is
> not created by this function.  I added some more explanation, and
> corrected that typo, in the above comment block.  Let me know if you
> think it still needs more clarification.

I still don't follow.  Old bs_top's image file itself becomes the
backing file, not bs_top->backing_file.  Perhaps I'm misinterpreting
because of how swap changes bs_top and bs_new, but I'm reading it from
the perspective of the caller when they pass in bs_top.

The rest looks good.

Stefan


* Re: [Qemu-devel] [PATCH v2 1/2] qapi: Introduce blockdev-group-snapshot-sync command
  2012-02-27 14:40       ` Stefan Hajnoczi
@ 2012-02-27 15:23         ` Jeff Cody
  2012-02-27 15:31           ` Stefan Hajnoczi
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff Cody @ 2012-02-27 15:23 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kwolf, qemu-devel, armbru, lcapitulino, pbonzini, eblake

On 02/27/2012 09:40 AM, Stefan Hajnoczi wrote:
> On Mon, Feb 27, 2012 at 2:26 PM, Jeff Cody <jcody@redhat.com> wrote:
>> On 02/27/2012 06:13 AM, Stefan Hajnoczi wrote:
>>> On Sun, Feb 26, 2012 at 7:31 PM, Jeff Cody <jcody@redhat.com> wrote:
>>>
>>> Do you have automated tests for this feature?
>>>
>>
>> No, not yet.  The testing has been manual.
> 
> For image streaming I used the Python unittest framework along with
> QMP/qmp.py to create tests.  I am going to submit it as a qemu-iotest.
> We really need something along the lines of a harness with QMP
> support so that these block layer features can be tested.  I will CC
> you on the email.

Excellent, thanks.

> 
>>
>>>> +/*
>>>> + * Add new bs contents at the top of an image chain while the chain is live,
>>>> + * while keeping required fields on the top layer.
>>>
>>> Please also document the swap behavior.  It's pretty important for the
>>> caller to realize that once this function returns, their
>>> BlockDriverState arguments will have been swapped.
>>
>> Good point.  How about this:
>>
>> /*
>>  * Add new bs contents at the top of an image chain while the chain is
>>  * live, while keeping required fields on the top layer.
>>  *
>>  * This will modify the BlockDriverState fields, and swap contents
>>  * between bs_new and bs_top.  Both bs_new and bs_top are modified.
> 
> Looks good.
> 
>>>> + * It is assumed that bs_new already points to an existing image,
>>>> + * with the correct backing filename of top->backing_file
>>>
>>> Not sure what this means.  Isn't bs_new going to use bs_top as its
>>> backing file?  Why "top->backing_file"?
>>
>> Sorry, that should have been 'bs_top->backing_file'. The image file is
>> not created by this function.  I added some more explanation, and
>> corrected that typo, in the above comment block.  Let me know if you
>> think it still needs more clarification.
> 
> I still don't follow.  Old bs_top's image file itself becomes the
> backing file, not bs_top->backing_file.  Perhaps I'm misinterpreting
> because of how swap changes bs_top and bs_new, but I'm reading it from
> the perspective of the caller when they pass in bs_top.
> 

Maybe it would be better to just replace that part of the comment with
something that says "This function does not create the image file".

The function bdrv_append() will neither create the image file nor set
(in the case of qcow2) the extended headers of the file to have the
backing filename.  It is only concerned with the bs contents.

For example, in group snapshots in blockdev.c, the image file is first
created, and then we do the swap:

        /* create new image w/backing file */
        ret = bdrv_img_create(snapshot_file, format,
                              states->old_bs->filename,
                              drv->format_name, NULL, -1, flags);

        ...

        ret = bdrv_open(states->new_bs, snapshot_file,
                        flags | BDRV_O_NO_BACKING, drv);

        ...

        bdrv_append(states->new_bs, states->old_bs);




> The rest looks good.
> 
> Stefan


* Re: [Qemu-devel] [PATCH v2 1/2] qapi: Introduce blockdev-group-snapshot-sync command
  2012-02-27 15:23         ` Jeff Cody
@ 2012-02-27 15:31           ` Stefan Hajnoczi
  0 siblings, 0 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2012-02-27 15:31 UTC (permalink / raw)
  To: jcody; +Cc: kwolf, qemu-devel, armbru, lcapitulino, pbonzini, eblake

On Mon, Feb 27, 2012 at 3:23 PM, Jeff Cody <jcody@redhat.com> wrote:
> On 02/27/2012 09:40 AM, Stefan Hajnoczi wrote:
>> On Mon, Feb 27, 2012 at 2:26 PM, Jeff Cody <jcody@redhat.com> wrote:
>>> On 02/27/2012 06:13 AM, Stefan Hajnoczi wrote:
>>>> On Sun, Feb 26, 2012 at 7:31 PM, Jeff Cody <jcody@redhat.com> wrote:
>>>>> + * It is assumed that bs_new already points to an existing image,
>>>>> + * with the correct backing filename of top->backing_file
>>>>
>>>> Not sure what this means.  Isn't bs_new going to use bs_top as its
>>>> backing file?  Why "top->backing_file"?
>>>
>>> Sorry, that should have been 'bs_top->backing_file'. The image file is
>>> not created by this function.  I added some more explanation, and
>>> corrected that typo, in the above comment block.  Let me know if you
>>> think it still needs more clarification.
>>
>> I still don't follow.  Old bs_top's image file itself becomes the
>> backing file, not bs_top->backing_file.  Perhaps I'm misinterpreting
>> because of how swap changes bs_top and bs_new, but I'm reading it from
>> the perspective of the caller when they pass in bs_top.
>>
>
> Maybe it would be better to just replace that part of the comment with
> something that says "This function does not create the image file".
>
> The function bdrv_append() will neither create the image file nor set
> (in the case of qcow2) the extended headers of the file to have the
> backing filename.  It is only concerned with the bs contents.

Makes sense.

Stefan


* Re: [Qemu-devel] [PATCH v2 1/2] qapi: Introduce blockdev-group-snapshot-sync command
  2012-02-26 19:31 ` [Qemu-devel] [PATCH v2 1/2] qapi: Introduce blockdev-group-snapshot-sync command Jeff Cody
  2012-02-27 11:13   ` Stefan Hajnoczi
@ 2012-02-27 15:46   ` Kevin Wolf
  2012-02-27 16:51     ` Paolo Bonzini
  2012-02-27 17:02     ` Jeff Cody
  1 sibling, 2 replies; 13+ messages in thread
From: Kevin Wolf @ 2012-02-27 15:46 UTC (permalink / raw)
  To: Jeff Cody; +Cc: pbonzini, eblake, qemu-devel, armbru, lcapitulino

Am 26.02.2012 20:31, schrieb Jeff Cody:
> This is a QAPI/QMP only command to take a snapshot of a group of
> devices. This is similar to the blockdev-snapshot-sync command, except
> blockdev-group-snapshot-sync accepts a list of devices, filenames, and
> formats.
> 
> An attempt is made to keep the group snapshot atomic: if the
> creation or opening of any of the new snapshots fails, then all of
> the new snapshots are abandoned, and the name of the snapshot image
> that failed is returned.  The failure case should not interrupt
> any operations.
> 
> Rather than use bdrv_close() along with a subsequent bdrv_open() to
> perform the pivot, the original image is never closed and the new
> image is placed 'in front' of the original image via manipulation
> of the BlockDriverState fields.  Thus, once the new snapshot image
> has been successfully created, there are no more failure points
> before pivoting to the new snapshot.
> 
> This allows the group of disks to remain consistent with each other,
> even across snapshot failures.
> 
> Signed-off-by: Jeff Cody <jcody@redhat.com>
> ---
>  block.c          |   47 ++++++++++++++++++++
>  block.h          |    1 +
>  blockdev.c       |  128 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  qapi-schema.json |   38 ++++++++++++++++
>  4 files changed, 214 insertions(+), 0 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 3621d11..0045ab1 100644
> --- a/block.c
> +++ b/block.c
> @@ -880,6 +880,53 @@ void bdrv_make_anon(BlockDriverState *bs)
>      bs->device_name[0] = '\0';
>  }
>  
> +/*
> + * Add new bs contents at the top of an image chain while the chain is live,
> + * while keeping required fields on the top layer.
> + *
> + * It is assumed that bs_new already points to an existing image,
> + * with the correct backing filename of top->backing_file
> + */
> +void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top)
> +{
> +    BlockDriverState tmp;
> +
> +    /* the new bs must not be in bdrv_states */
> +    bdrv_make_anon(bs_new);
> +
> +    tmp = *bs_new;
> +    tmp.backing_hd = bs_new;
> +
> +    /* there are some fields that need to stay on the top layer: */
> +
> +    /* dev info */
> +    tmp.dev_ops          = bs_top->dev_ops;
> +    tmp.dev_opaque       = bs_top->dev_opaque;
> +    tmp.dev              = bs_top->dev;
> +    tmp.buffer_alignment = bs_top->buffer_alignment;
> +    tmp.copy_on_read     = bs_top->copy_on_read;
> +
> +    /* i/o timing parameters */
> +    tmp.slice_time        = bs_top->slice_time;
> +    tmp.slice_start       = bs_top->slice_start;
> +    tmp.slice_end         = bs_top->slice_end;
> +    tmp.io_limits         = bs_top->io_limits;
> +    tmp.io_base           = bs_top->io_base;
> +    tmp.throttled_reqs    = bs_top->throttled_reqs;
> +    tmp.block_timer       = bs_top->block_timer;
> +    tmp.io_limits_enabled = bs_top->io_limits_enabled;

What about iostatus_enabled/iostatus? Possibly on_read/write_error and
geometry, too. (Maybe the correct answer is that you're doing the right
thing, but I wanted to bring it up)

> +
> +    /* keep the same entry in bdrv_states */
> +    pstrcpy(tmp.device_name, sizeof(tmp.device_name), bs_top->device_name);
> +    tmp.list = bs_top->list;
> +
> +    /* swap contents of the fixed new bs and the current top */
> +    *bs_new = *bs_top;
> +    *bs_top = tmp;
> +
> +    bdrv_detach_dev(bs_new, bs_new->dev);
> +}

The last line would actually deserve a comment /* clear the copied
fields in the new backing file */, which makes clear that there are
probably some more fields missing in this section.

> +
>  void bdrv_delete(BlockDriverState *bs)
>  {
>      assert(!bs->dev);
> diff --git a/block.h b/block.h
> index cae289b..190a780 100644
> --- a/block.h
> +++ b/block.h
> @@ -114,6 +114,7 @@ int bdrv_create(BlockDriver *drv, const char* filename,
>  int bdrv_create_file(const char* filename, QEMUOptionParameter *options);
>  BlockDriverState *bdrv_new(const char *device_name);
>  void bdrv_make_anon(BlockDriverState *bs);
> +void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top);
>  void bdrv_delete(BlockDriverState *bs);
>  int bdrv_parse_cache_flags(const char *mode, int *flags);
>  int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags);
> diff --git a/blockdev.c b/blockdev.c
> index 05e7c5e..560f7e8 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -714,6 +714,134 @@ void qmp_blockdev_snapshot_sync(const char *device, const char *snapshot_file,
>      }
>  }
>  
> +
> +/* New and old BlockDriverState structs for group snapshots */
> +typedef struct BlkGroupSnapshotStates {
> +    BlockDriverState *old_bs;
> +    BlockDriverState *new_bs;
> +    bool is_open;
> +    QSIMPLEQ_ENTRY(BlkGroupSnapshotStates) entry;
> +} BlkGroupSnapshotStates;
> +
> +/*
> + * 'Atomic' group snapshots.  The snapshots are taken as a set, and if any fail
> + *  then we do not pivot any of the devices in the group, and abandon the
> + *  snapshots
> + */
> +void qmp_blockdev_group_snapshot_sync(SnapshotDevList *dev_list,
> +                                                   Error **errp)

Too much indentation?

> +{
> +    int ret = 0;
> +    SnapshotDevList *dev_entry = dev_list;
> +    SnapshotDev *dev_info = NULL;
> +    BlkGroupSnapshotStates *states;
> +    BlockDriver *proto_drv;
> +    BlockDriver *drv;
> +    int flags;
> +    const char *format;
> +    const char *snapshot_file;
> +
> +    QSIMPLEQ_HEAD(snap_bdrv_states, BlkGroupSnapshotStates) snap_bdrv_states;
> +    QSIMPLEQ_INIT(&snap_bdrv_states);
> +
> +    /* We don't do anything in this loop that commits us to the snapshot */
> +    while (NULL != dev_entry) {
> +        dev_info = dev_entry->value;
> +        dev_entry = dev_entry->next;
> +
> +        states = g_malloc0(sizeof(BlkGroupSnapshotStates));
> +        states->is_open = false;
> +
> +        states->old_bs = bdrv_find(dev_info->device);
> +
> +        if (!states->old_bs) {
> +            error_set(errp, QERR_DEVICE_NOT_FOUND, dev_info->device);
> +            goto fail;
> +        }
> +
> +        if (bdrv_in_use(states->old_bs)) {
> +            error_set(errp, QERR_DEVICE_IN_USE, dev_info->device);
> +            goto fail;
> +        }
> +
> +        snapshot_file = dev_info->snapshot_file;
> +
> +        flags = states->old_bs->open_flags;
> +
> +        if (!dev_info->has_format) {
> +            format = "qcow2";
> +        } else {
> +            format = dev_info->format;
> +        }
> +
> +        drv = bdrv_find_format(format);
> +        if (!drv) {
> +            error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
> +            goto fail;
> +        }
> +
> +        proto_drv = bdrv_find_protocol(snapshot_file);
> +        if (!proto_drv) {
> +            error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
> +            goto fail;
> +        }
> +
> +        /* create new image w/backing file */
> +        ret = bdrv_img_create(snapshot_file, format,
> +                              states->old_bs->filename,
> +                              drv->format_name, NULL, -1, flags);
> +        if (ret) {
> +            error_set(errp, QERR_OPEN_FILE_FAILED, snapshot_file);
> +            goto fail;
> +        }
> +
> +        /* We will manually add the backing_hd field to the bs later */
> +        states->new_bs = bdrv_new("");
> +        ret = bdrv_open(states->new_bs, snapshot_file,
> +                        flags | BDRV_O_NO_BACKING, drv);
> +        states->is_open = true;
> +
> +        if (ret != 0) {
> +            error_set(errp, QERR_OPEN_FILE_FAILED, snapshot_file);
> +            goto close_and_fail;
> +        }
> +        QSIMPLEQ_INSERT_TAIL(&snap_bdrv_states, states, entry);

Inserting it only here means that all the error cases above leak states.

> +    }
> +
> +    /*
> +     * Now we will drain, flush, & pivot everything - we are committed at this
> +     * point.
> +     */
> +    bdrv_drain_all();

I would feel more comfortable if we could do the bdrv_drain_all() at the
very beginning of the function. Not that I know of a specific scenario
that would go wrong, but you have a nested main loop here that could do
more or less anything.

> +    QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) {
> +        bdrv_flush(states->old_bs);

This can return an error which must be checked. And of course, we must
do it before committing to the snapshot (but after bdrv_drain_all).

> +        /* This removes our old bs from the bdrv_states, and adds the new bs */
> +        bdrv_append(states->new_bs, states->old_bs);
> +    }
> +
> +    /* success */
> +    goto exit;
> +
> +close_and_fail:
> +    /*
> +    * failure, and it is all-or-none; abandon each new bs, and keep using
> +    * the original bs for all images
> +    */
> +    QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) {
> +        if (states->is_open) {
> +             bdrv_close(states->new_bs);

bdrv_delete, as Stefan already mentioned.

> +        }
> +    }
> +fail:
> +exit:
> +    QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) {
> +        g_free(states);
> +    }
> +    return;
> +}

Kevin


* Re: [Qemu-devel] [PATCH v2 1/2] qapi: Introduce blockdev-group-snapshot-sync command
  2012-02-27 15:46   ` Kevin Wolf
@ 2012-02-27 16:51     ` Paolo Bonzini
  2012-02-27 17:02     ` Jeff Cody
  1 sibling, 0 replies; 13+ messages in thread
From: Paolo Bonzini @ 2012-02-27 16:51 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Jeff Cody, lcapitulino, eblake, qemu-devel, armbru

On 02/27/2012 04:46 PM, Kevin Wolf wrote:
> > +    /*
> > +     * Now we will drain, flush, & pivot everything - we are committed at this
> > +     * point.
> > +     */
> > +    bdrv_drain_all();
> 
> I would feel more comfortable if we could do the bdrv_drain_all() at the
> very beginning of the function. Not that I know of a specific scenario
> that would go wrong, but you have a nested main loop here that could do
> more or less anything.

The same could be said of a bdrv_read or a bdrv_write done by bdrv_open, no?

Paolo


* Re: [Qemu-devel] [PATCH v2 1/2] qapi: Introduce blockdev-group-snapshot-sync command
  2012-02-27 15:46   ` Kevin Wolf
  2012-02-27 16:51     ` Paolo Bonzini
@ 2012-02-27 17:02     ` Jeff Cody
  2012-02-27 17:22       ` Kevin Wolf
  1 sibling, 1 reply; 13+ messages in thread
From: Jeff Cody @ 2012-02-27 17:02 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: pbonzini, eblake, qemu-devel, armbru, lcapitulino

On 02/27/2012 10:46 AM, Kevin Wolf wrote:
> Am 26.02.2012 20:31, schrieb Jeff Cody:
>> This is a QAPI/QMP only command to take a snapshot of a group of
>> devices. This is similar to the blockdev-snapshot-sync command, except
>> blockdev-group-snapshot-sync accepts a list of devices, filenames, and
>> formats.
>>
>> An attempt is made to keep the group snapshot atomic: if the
>> creation or opening of any of the new snapshots fails, then all of
>> the new snapshots are abandoned, and the name of the snapshot image
>> that failed is returned.  The failure case should not interrupt
>> any operations.
>>
>> Rather than use bdrv_close() along with a subsequent bdrv_open() to
>> perform the pivot, the original image is never closed and the new
>> image is placed 'in front' of the original image via manipulation
>> of the BlockDriverState fields.  Thus, once the new snapshot image
>> has been successfully created, there are no more failure points
>> before pivoting to the new snapshot.
>>
>> This allows the group of disks to remain consistent with each other,
>> even across snapshot failures.
>>
>> Signed-off-by: Jeff Cody <jcody@redhat.com>
>> ---
>>  block.c          |   47 ++++++++++++++++++++
>>  block.h          |    1 +
>>  blockdev.c       |  128 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  qapi-schema.json |   38 ++++++++++++++++
>>  4 files changed, 214 insertions(+), 0 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index 3621d11..0045ab1 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -880,6 +880,53 @@ void bdrv_make_anon(BlockDriverState *bs)
>>      bs->device_name[0] = '\0';
>>  }
>>  
>> +/*
>> + * Add new bs contents at the top of an image chain while the chain is live,
>> + * while keeping required fields on the top layer.
>> + *
>> + * It is assumed that bs_new already points to an existing image,
>> + * with the correct backing filename of top->backing_file
>> + */
>> +void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top)
>> +{
>> +    BlockDriverState tmp;
>> +
>> +    /* the new bs must not be in bdrv_states */
>> +    bdrv_make_anon(bs_new);
>> +
>> +    tmp = *bs_new;
>> +    tmp.backing_hd = bs_new;
>> +
>> +    /* there are some fields that need to stay on the top layer: */
>> +
>> +    /* dev info */
>> +    tmp.dev_ops          = bs_top->dev_ops;
>> +    tmp.dev_opaque       = bs_top->dev_opaque;
>> +    tmp.dev              = bs_top->dev;
>> +    tmp.buffer_alignment = bs_top->buffer_alignment;
>> +    tmp.copy_on_read     = bs_top->copy_on_read;
>> +
>> +    /* i/o timing parameters */
>> +    tmp.slice_time        = bs_top->slice_time;
>> +    tmp.slice_start       = bs_top->slice_start;
>> +    tmp.slice_end         = bs_top->slice_end;
>> +    tmp.io_limits         = bs_top->io_limits;
>> +    tmp.io_base           = bs_top->io_base;
>> +    tmp.throttled_reqs    = bs_top->throttled_reqs;
>> +    tmp.block_timer       = bs_top->block_timer;
>> +    tmp.io_limits_enabled = bs_top->io_limits_enabled;
> 
> What about iostatus_enabled/iostatus? Possibly on_read/write_error and
> geometry, too. (Maybe the correct answer is that you're doing the right
> thing, but I wanted to bring it up)

I think you are right, those should probably be preserved as well; I
don't know if it hurts to not preserve them, but since these are all set
from drive_init, I think it makes sense to keep them on the top layer.
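
i.e., roughly the following, assuming the field names currently in
BlockDriverState:

    /* error handling and status, also kept on the top layer */
    tmp.on_read_error    = bs_top->on_read_error;
    tmp.on_write_error   = bs_top->on_write_error;
    tmp.iostatus_enabled = bs_top->iostatus_enabled;
    tmp.iostatus         = bs_top->iostatus;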

> 
>> +
>> +    /* keep the same entry in bdrv_states */
>> +    pstrcpy(tmp.device_name, sizeof(tmp.device_name), bs_top->device_name);
>> +    tmp.list = bs_top->list;
>> +
>> +    /* swap contents of the fixed new bs and the current top */
>> +    *bs_new = *bs_top;
>> +    *bs_top = tmp;
>> +
>> +    bdrv_detach_dev(bs_new, bs_new->dev);
>> +}
> 
> The last line would actually deserve a comment /* clear the copied
> fields in the new backing file */, which makes clear that there are
> probably some more fields missing in this section.

OK, added.

> 
>> +
>>  void bdrv_delete(BlockDriverState *bs)
>>  {
>>      assert(!bs->dev);
>> diff --git a/block.h b/block.h
>> index cae289b..190a780 100644
>> --- a/block.h
>> +++ b/block.h
>> @@ -114,6 +114,7 @@ int bdrv_create(BlockDriver *drv, const char* filename,
>>  int bdrv_create_file(const char* filename, QEMUOptionParameter *options);
>>  BlockDriverState *bdrv_new(const char *device_name);
>>  void bdrv_make_anon(BlockDriverState *bs);
>> +void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top);
>>  void bdrv_delete(BlockDriverState *bs);
>>  int bdrv_parse_cache_flags(const char *mode, int *flags);
>>  int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags);
>> diff --git a/blockdev.c b/blockdev.c
>> index 05e7c5e..560f7e8 100644
>> --- a/blockdev.c
>> +++ b/blockdev.c
>> @@ -714,6 +714,134 @@ void qmp_blockdev_snapshot_sync(const char *device, const char *snapshot_file,
>>      }
>>  }
>>  
>> +
>> +/* New and old BlockDriverState structs for group snapshots */
>> +typedef struct BlkGroupSnapshotStates {
>> +    BlockDriverState *old_bs;
>> +    BlockDriverState *new_bs;
>> +    bool is_open;
>> +    QSIMPLEQ_ENTRY(BlkGroupSnapshotStates) entry;
>> +} BlkGroupSnapshotStates;
>> +
>> +/*
>> + * 'Atomic' group snapshots.  The snapshots are taken as a set, and if any fail
>> + *  then we do not pivot any of the devices in the group, and abandon the
>> + *  snapshots
>> + */
>> +void qmp_blockdev_group_snapshot_sync(SnapshotDevList *dev_list,
>> +                                                   Error **errp)
> 
> Too much indentation?

Sure :)  Fixed.

> 
>> +{
>> +    int ret = 0;
>> +    SnapshotDevList *dev_entry = dev_list;
>> +    SnapshotDev *dev_info = NULL;
>> +    BlkGroupSnapshotStates *states;
>> +    BlockDriver *proto_drv;
>> +    BlockDriver *drv;
>> +    int flags;
>> +    const char *format;
>> +    const char *snapshot_file;
>> +
>> +    QSIMPLEQ_HEAD(snap_bdrv_states, BlkGroupSnapshotStates) snap_bdrv_states;
>> +    QSIMPLEQ_INIT(&snap_bdrv_states);
>> +
>> +    /* We don't do anything in this loop that commits us to the snapshot */
>> +    while (NULL != dev_entry) {
>> +        dev_info = dev_entry->value;
>> +        dev_entry = dev_entry->next;
>> +
>> +        states = g_malloc0(sizeof(BlkGroupSnapshotStates));
>> +        states->is_open = false;
>> +
>> +        states->old_bs = bdrv_find(dev_info->device);
>> +
>> +        if (!states->old_bs) {
>> +            error_set(errp, QERR_DEVICE_NOT_FOUND, dev_info->device);
>> +            goto fail;
>> +        }
>> +
>> +        if (bdrv_in_use(states->old_bs)) {
>> +            error_set(errp, QERR_DEVICE_IN_USE, dev_info->device);
>> +            goto fail;
>> +        }
>> +
>> +        snapshot_file = dev_info->snapshot_file;
>> +
>> +        flags = states->old_bs->open_flags;
>> +
>> +        if (!dev_info->has_format) {
>> +            format = "qcow2";
>> +        } else {
>> +            format = dev_info->format;
>> +        }
>> +
>> +        drv = bdrv_find_format(format);
>> +        if (!drv) {
>> +            error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
>> +            goto fail;
>> +        }
>> +
>> +        proto_drv = bdrv_find_protocol(snapshot_file);
>> +        if (!proto_drv) {
>> +            error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
>> +            goto fail;
>> +        }
>> +
>> +        /* create new image w/backing file */
>> +        ret = bdrv_img_create(snapshot_file, format,
>> +                              states->old_bs->filename,
>> +                              drv->format_name, NULL, -1, flags);
>> +        if (ret) {
>> +            error_set(errp, QERR_OPEN_FILE_FAILED, snapshot_file);
>> +            goto fail;
>> +        }
>> +
>> +        /* We will manually add the backing_hd field to the bs later */
>> +        states->new_bs = bdrv_new("");
>> +        ret = bdrv_open(states->new_bs, snapshot_file,
>> +                        flags | BDRV_O_NO_BACKING, drv);
>> +        states->is_open = true;
>> +
>> +        if (ret != 0) {
>> +            error_set(errp, QERR_OPEN_FILE_FAILED, snapshot_file);
>> +            goto close_and_fail;
>> +        }
>> +        QSIMPLEQ_INSERT_TAIL(&snap_bdrv_states, states, entry);
> 
> Inserting it only here means that all the error cases above leak states.

Yes. Will move it up to immediately after the bdrv_open().

> 
>> +    }
>> +
>> +    /*
>> +     * Now we will drain, flush, & pivot everything - we are committed at this
>> +     * point.
>> +     */
>> +    bdrv_drain_all();
> 
> I would feel more comfortable if we could do the bdrv_drain_all() at the
> very beginning of the function. Not that I know of a specific scenario
> that would go wrong, but you have a nested main loop here that could do
> more or less anything.

I can move it up to the beginning if desired...  My thought was that it
was best to drain closer to the point of commit.


> 
>> +    QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) {
>> +        bdrv_flush(states->old_bs);
> 
> This can return an error which must be checked. And of course, we must
> do it before committing to the snapshot (but after bdrv_drain_all).

Hmm. If the flush returns an error, do we abandon at this point? Perhaps it
would be best to loop through and flush each device first, and if no
flushes fail, _then_ loop through and perform the bdrv_append(). Once we
are calling bdrv_append(), we want no possible failure points.
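
Something like this (a rough, untested sketch; the exact error to report
is TBD):

    /* first pass: flush all devices, abandoning the snapshots on error */
    QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) {
        ret = bdrv_flush(states->old_bs);
        if (ret < 0) {
            error_set(errp, QERR_IO_ERROR);
            goto delete_and_fail;
        }
    }

    /* second pass: pivot everything; nothing in this loop can fail */
    QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) {
        bdrv_append(states->new_bs, states->old_bs);
    }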

> 
>> +        /* This removes our old bs from the bdrv_states, and adds the new bs */
>> +        bdrv_append(states->new_bs, states->old_bs);
>> +    }
>> +
>> +    /* success */
>> +    goto exit;
>> +
>> +close_and_fail:
>> +    /*
>> +    * failure, and it is all-or-none; abandon each new bs, and keep using
>> +    * the original bs for all images
>> +    */
>> +    QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) {
>> +        if (states->is_open) {
>> +             bdrv_close(states->new_bs);
> 
> bdrv_delete, as Stefan already mentioned.
> 
>> +        }
>> +    }
>> +fail:
>> +exit:
>> +    QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) {
>> +        g_free(states);
>> +    }
>> +    return;
>> +}
> 
> Kevin


* Re: [Qemu-devel] [PATCH v2 1/2] qapi: Introduce blockdev-group-snapshot-sync command
  2012-02-27 17:02     ` Jeff Cody
@ 2012-02-27 17:22       ` Kevin Wolf
  2012-02-27 17:31         ` Jeff Cody
  0 siblings, 1 reply; 13+ messages in thread
From: Kevin Wolf @ 2012-02-27 17:22 UTC (permalink / raw)
  To: jcody; +Cc: pbonzini, eblake, qemu-devel, armbru, lcapitulino

Am 27.02.2012 18:02, schrieb Jeff Cody:
>>> +
>>> +    /* keep the same entry in bdrv_states */
>>> +    pstrcpy(tmp.device_name, sizeof(tmp.device_name), bs_top->device_name);
>>> +    tmp.list = bs_top->list;
>>> +
>>> +    /* swap contents of the fixed new bs and the current top */
>>> +    *bs_new = *bs_top;
>>> +    *bs_top = tmp;
>>> +
>>> +    bdrv_detach_dev(bs_new, bs_new->dev);
>>> +}
>>
>> The last line would actually deserve a comment /* clear the copied
>> fields in the new backing file */, which makes clear that there are
>> probably some more fields missing in this section.
> 
> OK, added.

Only the comment or also clearing other fields? For some of them it's
very obvious that they need to be cleared (copy on read, I/O throttling).

>>> +    }
>>> +
>>> +    /*
>>> +     * Now we will drain, flush, & pivot everything - we are committed at this
>>> +     * point.
>>> +     */
>>> +    bdrv_drain_all();
>>
>> I would feel more comfortable if we could do the bdrv_drain_all() at the
>> very beginning of the function. Not that I know of a specific scenario
>> that would go wrong, but you have a nested main loop here that could do
>> more or less anything.
> 
> I can move it up to the beginning if desired...  My thought was that it
> was best to drain closer to the point of commit.

As long as we don't create new AIO requests here, drained is drained.

But anyway, I'm not requesting a change here, it was just a feeling.

>>
>>> +    QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) {
>>> +        bdrv_flush(states->old_bs);
>>
>> This can return an error which must be checked. And of course, we must
>> do it before committing to the snapshot (but after bdrv_drain_all).
> 
> Hmm. If the flush returns an error, do we abandon at this point? Perhaps it
> would be best to loop through and flush each device first, and if no
> flushes fail, _then_ loop through and perform the bdrv_append(). Once we
> are calling bdrv_append(), we want no possible failure points.

Yes, this is what I meant. Sorry for the somewhat vague wording.

Kevin


* Re: [Qemu-devel] [PATCH v2 1/2] qapi: Introduce blockdev-group-snapshot-sync command
  2012-02-27 17:22       ` Kevin Wolf
@ 2012-02-27 17:31         ` Jeff Cody
  0 siblings, 0 replies; 13+ messages in thread
From: Jeff Cody @ 2012-02-27 17:31 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: pbonzini, lcapitulino, eblake, qemu-devel, armbru

On 02/27/2012 12:22 PM, Kevin Wolf wrote:
> Am 27.02.2012 18:02, schrieb Jeff Cody:
>>>> +
>>>> +    /* keep the same entry in bdrv_states */
>>>> +    pstrcpy(tmp.device_name, sizeof(tmp.device_name), bs_top->device_name);
>>>> +    tmp.list = bs_top->list;
>>>> +
>>>> +    /* swap contents of the fixed new bs and the current top */
>>>> +    *bs_new = *bs_top;
>>>> +    *bs_top = tmp;
>>>> +
>>>> +    bdrv_detach_dev(bs_new, bs_new->dev);
>>>> +}
>>>
>>> The last line would actually deserve a comment /* clear the copied
>>> fields in the new backing file */, which makes clear that there are
>>> probably some more fields missing in this section.
>>
>> OK, added.
> 
> Only the comment or also clearing other fields? For some of them it's
> very obvious that they need to be cleared (copy on read, I/O throttling).

Sorry; yes to clearing the other fields as well.  I don't know if it
hurts to leave them not cleared, but logically it makes sense to clear
them out.
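
For example (a sketch; the exact set of fields is TBD):

    /* the old top is now a backing file: clear what no longer applies */
    bs_new->copy_on_read      = 0;
    bs_new->io_limits_enabled = false;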

> 
>>>> +    }
>>>> +
>>>> +    /*
>>>> +     * Now we will drain, flush, & pivot everything - we are committed at this
>>>> +     * point.
>>>> +     */
>>>> +    bdrv_drain_all();
>>>
>>> I would feel more comfortable if we could do the bdrv_drain_all() at the
>>> very beginning of the function. Not that I know of a specific scenario
>>> that would go wrong, but you have a nested main loop here that could do
>>> more or less anything.
>>
>> I can move it up to the beginning if desired...  My thought was that it
>> was best to drain closer to the point of commit.
> 
> As long as we don't create new AIO requests here, drained is drained.
> 
> But anyway, I'm not requesting a change here, it was just a feeling.
> 
>>>
>>>> +    QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) {
>>>> +        bdrv_flush(states->old_bs);
>>>
>>> This can return an error which must be checked. And of course, we must
>>> do it before committing to the snapshot (but after bdrv_drain_all).
>>
>> Hmm. If the flush returns an error, do we abandon at this point? Perhaps it
>> would be best to loop through and flush each device first, and if no
>> flushes fail, _then_ loop through and perform the bdrv_append(). Once we
>> are calling bdrv_append(), we want no possible failure points.
> 
> Yes, this is what I meant. Sorry for the somewhat vague wording.
> 
> Kevin
> 
