All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v4 00/10] Mirrored block writes
@ 2012-03-06 17:55 Paolo Bonzini
  2012-03-06 17:55 ` [Qemu-devel] [PATCH v4 01/10] use QSIMPLEQ_FOREACH_SAFE when freeing list elements Paolo Bonzini
                   ` (10 more replies)
  0 siblings, 11 replies; 21+ messages in thread
From: Paolo Bonzini @ 2012-03-06 17:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, fsimonce, eblake, stefanha, lcapitulino

v4 includes Federico's drive-reopen (patch 10) command, fixes another
small bug in Jeff's code (patch 2), and tweaks the union handling for
older compilers.

v3 tested with the following scenarios, v4 only d/e:

a) mirror only

1) create base.qcow2 and starat QEMU with it

2) Execute the following QMP command

{ "execute": "qmp_capabilities" }
{ "execute": "transaction", "arguments":
  {'actions': [
    { 'type': 'drive-mirror', 'data' :
      { 'device': 'ide0-hd0', 'target': '/home/pbonzini/mirror.qcow2' } } ] } }
{ "execute": "cont" }

3) hibernate the guest (this requires an IDE disk and -cpu kvm64,-kvmclock)

4) restart the guest with mirror.qcow2


b) Same as (a) with drive-mirror command.


c) streaming to new image

1) start QEMU with an existing image test.img

2) execute the following HMP commands

drive_mirror -s ide0-hd0 /home/pbonzini/mirror.qed qed
block_stream ide0-hd0

3) shut down the guest (sorry, this took a looong time and I forgot to
hibernate here)

4) move away the base image

5) restart the guest with mirror.qed


d) atomic snapshot+mirror (QMP only):

1) start QEMU with an existing image test.img

2) Execute the following QMP command

{ "execute": "qmp_capabilities" }
{ "execute": "transaction", "arguments":
  {'actions': [
    { 'type': 'blockdev-snapshot-sync', 'data' :
      { 'device': 'ide0-hd0', 'snapshot-file': '/home/pbonzini/base.qcow2' } },
    { 'type': 'drive-mirror', 'data' :
      { 'device': 'ide0-hd0', 'target': '/home/pbonzini/mirror.qcow2' } } ] } }
{ "execute": "cont" }

3) hibernate the guest (this requires an IDE disk and -cpu kvm64,-kvmclock)

4) check that mirror.qcow2 has test.img as the base

5) restart the guest with base.qcow2

6) restart the guest with mirror.qcow2


e) simple snapshot with snapshot_blkdev and snapshot_blkdev -n.

v3->v4:
	avoid literals with anonymous unions in them, add "void *data" to
	all unions (Mark Wu); add drive-reopen (Federico); new patch to fix
	use-after-free in group snapshots; patch splitting nits (Eric).

v2->v3:
        replace reuse argument with NewImageMode enum and mode argument;
        renamed all commands and actions; implemented blockdev-snapshot-sync
        in terms of group snasphots; added new drive-mirror command and
        HMP equivalent.

Federico Simoncelli (1):
  Add the drive-reopen command

Marcelo Tosatti (1):
  Add blkmirror block driver

Paolo Bonzini (8):
  use QSIMPLEQ_FOREACH_SAFE when freeing list elements
  fix format name for backing file
  qapi: complete implementation of unions
  rename blockdev-group-snapshot-sync
  add mode field to blockdev-snapshot-sync transaction item
  qmp: convert blockdev-snapshot-sync to a wrapper around transactions
  add mirroring to transaction
  add drive-mirror command and HMP equivalent

 Makefile.objs             |    2 +-
 block/blkmirror.c         |  239 +++++++++++++++++++++++++++++++++++++++++++++
 blockdev.c                |  202 +++++++++++++++++++++++++++-----------
 hmp-commands.hx           |   47 ++++++++-
 hmp.c                     |   45 ++++++++-
 hmp.h                     |    2 +
 qapi-schema-test.json     |   10 ++
 qapi-schema.json          |  143 +++++++++++++++++++++++----
 qmp-commands.hx           |  135 +++++++++++++++++++++-----
 scripts/qapi-types.py     |    6 +
 scripts/qapi-visit.py     |   31 ++++++-
 test-qmp-input-visitor.c  |   18 ++++
 test-qmp-output-visitor.c |   34 +++++++
 13 files changed, 804 insertions(+), 110 deletions(-)
 create mode 100644 block/blkmirror.c

-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH v4 01/10] use QSIMPLEQ_FOREACH_SAFE when freeing list elements
  2012-03-06 17:55 [Qemu-devel] [PATCH v4 00/10] Mirrored block writes Paolo Bonzini
@ 2012-03-06 17:55 ` Paolo Bonzini
  2012-03-06 17:55 ` [Qemu-devel] [PATCH v4 02/10] fix format name for backing file Paolo Bonzini
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2012-03-06 17:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, fsimonce, eblake, stefanha, lcapitulino

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 blockdev.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index d78aa51..acdc72a 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -733,7 +733,7 @@ void qmp_blockdev_group_snapshot_sync(SnapshotDevList *dev_list,
     int ret = 0;
     SnapshotDevList *dev_entry = dev_list;
     SnapshotDev *dev_info = NULL;
-    BlkGroupSnapshotStates *states;
+    BlkGroupSnapshotStates *states, *next;
     BlockDriver *proto_drv;
     BlockDriver *drv;
     int flags;
@@ -838,7 +838,7 @@ delete_and_fail:
         }
     }
 exit:
-    QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) {
+    QSIMPLEQ_FOREACH_SAFE(states, &snap_bdrv_states, entry, next) {
         g_free(states);
     }
     return;
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH v4 02/10] fix format name for backing file
  2012-03-06 17:55 [Qemu-devel] [PATCH v4 00/10] Mirrored block writes Paolo Bonzini
  2012-03-06 17:55 ` [Qemu-devel] [PATCH v4 01/10] use QSIMPLEQ_FOREACH_SAFE when freeing list elements Paolo Bonzini
@ 2012-03-06 17:55 ` Paolo Bonzini
  2012-03-06 17:55 ` [Qemu-devel] [PATCH v4 03/10] qapi: complete implementation of unions Paolo Bonzini
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2012-03-06 17:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, fsimonce, eblake, stefanha, lcapitulino

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 blockdev.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index acdc72a..6ed7d61 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -800,7 +800,8 @@ void qmp_blockdev_group_snapshot_sync(SnapshotDevList *dev_list,
         /* create new image w/backing file */
         ret = bdrv_img_create(snapshot_file, format,
                               states->old_bs->filename,
-                              drv->format_name, NULL, -1, flags);
+                              states->old_bs->drv->format_name,
+                              NULL, -1, flags);
         if (ret) {
             error_set(errp, QERR_OPEN_FILE_FAILED, snapshot_file);
             goto delete_and_fail;
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH v4 03/10] qapi: complete implementation of unions
  2012-03-06 17:55 [Qemu-devel] [PATCH v4 00/10] Mirrored block writes Paolo Bonzini
  2012-03-06 17:55 ` [Qemu-devel] [PATCH v4 01/10] use QSIMPLEQ_FOREACH_SAFE when freeing list elements Paolo Bonzini
  2012-03-06 17:55 ` [Qemu-devel] [PATCH v4 02/10] fix format name for backing file Paolo Bonzini
@ 2012-03-06 17:55 ` Paolo Bonzini
  2012-03-06 17:55 ` [Qemu-devel] [PATCH v4 04/10] rename blockdev-group-snapshot-sync Paolo Bonzini
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2012-03-06 17:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, fsimonce, eblake, stefanha, lcapitulino

Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 qapi-schema-test.json     |   10 ++++++++++
 scripts/qapi-types.py     |    6 ++++++
 scripts/qapi-visit.py     |   31 ++++++++++++++++++++++++++++++-
 test-qmp-input-visitor.c  |   18 ++++++++++++++++++
 test-qmp-output-visitor.c |   34 ++++++++++++++++++++++++++++++++++
 5 files changed, 98 insertions(+), 1 deletions(-)

diff --git a/qapi-schema-test.json b/qapi-schema-test.json
index 2b38919..8c7f9f7 100644
--- a/qapi-schema-test.json
+++ b/qapi-schema-test.json
@@ -22,6 +22,16 @@
                        'dict2': { 'userdef1': 'UserDefOne', 'string2': 'str' },
                        '*dict3': { 'userdef2': 'UserDefOne', 'string3': 'str' } } } }
 
+# for testing unions
+{ 'type': 'UserDefA',
+  'data': { 'boolean': 'bool' } }
+
+{ 'type': 'UserDefB',
+  'data': { 'integer': 'int' } }
+
+{ 'union': 'UserDefUnion',
+  'data': { 'a' : 'UserDefA', 'b' : 'UserDefB' } }
+
 # testing commands
 { 'command': 'user_def_cmd', 'data': {} }
 { 'command': 'user_def_cmd1', 'data': {'ud1a': 'UserDefOne'} }
diff --git a/scripts/qapi-types.py b/scripts/qapi-types.py
index b56225b..727fb77 100644
--- a/scripts/qapi-types.py
+++ b/scripts/qapi-types.py
@@ -117,6 +117,7 @@ struct %(name)s
 {
     %(name)sKind kind;
     union {
+        void *data;
 ''',
                 name=name)
 
@@ -269,6 +270,7 @@ for expr in exprs:
     elif expr.has_key('union'):
         ret += generate_fwd_struct(expr['union'], expr['data']) + "\n"
         ret += generate_enum('%sKind' % expr['union'], expr['data'].keys())
+        fdef.write(generate_enum_lookup('%sKind' % expr['union'], expr['data'].keys()))
     else:
         continue
     fdecl.write(ret)
@@ -283,6 +285,10 @@ for expr in exprs:
         fdef.write(generate_type_cleanup(expr['type']) + "\n")
     elif expr.has_key('union'):
         ret += generate_union(expr['union'], expr['data'])
+        ret += generate_type_cleanup_decl(expr['union'] + "List")
+        fdef.write(generate_type_cleanup(expr['union'] + "List") + "\n")
+        ret += generate_type_cleanup_decl(expr['union'])
+        fdef.write(generate_type_cleanup(expr['union']) + "\n")
     else:
         continue
     fdecl.write(ret)
diff --git a/scripts/qapi-visit.py b/scripts/qapi-visit.py
index 5160d83..54117d4 100644
--- a/scripts/qapi-visit.py
+++ b/scripts/qapi-visit.py
@@ -110,10 +110,38 @@ def generate_visit_union(name, members):
 
 void visit_type_%(name)s(Visitor *m, %(name)s ** obj, const char *name, Error **errp)
 {
-}
+    Error *err = NULL;
+
+    visit_start_struct(m, (void **)obj, "%(name)s", name, sizeof(%(name)s), &err);
+    visit_type_%(name)sKind(m, &(*obj)->kind, "type", &err);
+    if (err) {
+        error_propagate(errp, err);
+        goto end;
+    }
+    switch ((*obj)->kind) {
 ''',
                  name=name)
 
+    for key in members:
+        ret += mcgen('''
+    case %(abbrev)s_KIND_%(enum)s:
+        visit_type_%(c_type)s(m, &(*obj)->%(c_name)s, "data", errp);
+        break;
+''',
+                abbrev = de_camel_case(name).upper(),
+                enum = de_camel_case(key).upper(),
+                c_type=members[key],
+                c_name=c_var(key))
+
+    ret += mcgen('''
+    default:
+        abort();
+    }
+end:
+    visit_end_struct(m, errp);
+}
+''')
+
     return ret
 
 def generate_declaration(name, members, genlist=True):
@@ -242,6 +270,7 @@ for expr in exprs:
         fdecl.write(ret)
     elif expr.has_key('union'):
         ret = generate_visit_union(expr['union'], expr['data'])
+        ret += generate_visit_list(expr['union'], expr['data'])
         fdef.write(ret)
 
         ret = generate_decl_enum('%sKind' % expr['union'], expr['data'].keys())
diff --git a/test-qmp-input-visitor.c b/test-qmp-input-visitor.c
index 926db5c..1996e49 100644
--- a/test-qmp-input-visitor.c
+++ b/test-qmp-input-visitor.c
@@ -234,6 +234,22 @@ static void test_visitor_in_list(TestInputVisitorData *data,
     qapi_free_UserDefOneList(head);
 }
 
+static void test_visitor_in_union(TestInputVisitorData *data,
+                                  const void *unused)
+{
+    Visitor *v;
+    Error *err = NULL;
+    UserDefUnion *tmp;
+
+    v = visitor_input_test_init(data, "{ 'type': 'b', 'data' : { 'integer': 42 } }");
+
+    visit_type_UserDefUnion(v, &tmp, NULL, &err);
+    g_assert(err == NULL);
+    g_assert_cmpint(tmp->kind, ==, USER_DEF_UNION_KIND_B);
+    g_assert_cmpint(tmp->b->integer, ==, 42);
+    qapi_free_UserDefUnion(tmp);
+}
+
 static void input_visitor_test_add(const char *testpath,
                                    TestInputVisitorData *data,
                                    void (*test_func)(TestInputVisitorData *data, const void *user_data))
@@ -264,6 +280,8 @@ int main(int argc, char **argv)
                             &in_visitor_data, test_visitor_in_struct_nested);
     input_visitor_test_add("/visitor/input/list",
                             &in_visitor_data, test_visitor_in_list);
+    input_visitor_test_add("/visitor/input/union",
+                            &in_visitor_data, test_visitor_in_union);
 
     g_test_run();
 
diff --git a/test-qmp-output-visitor.c b/test-qmp-output-visitor.c
index c94c208..78e5f2d 100644
--- a/test-qmp-output-visitor.c
+++ b/test-qmp-output-visitor.c
@@ -380,6 +380,38 @@ static void test_visitor_out_list_qapi_free(TestOutputVisitorData *data,
     qapi_free_UserDefNestedList(head);
 }
 
+static void test_visitor_out_union(TestOutputVisitorData *data,
+                                   const void *unused)
+{
+    QObject *arg, *qvalue;
+    QDict *qdict, *value;
+
+    Error *err = NULL;
+
+    UserDefUnion *tmp = g_malloc0(sizeof(UserDefUnion));
+    tmp->kind = USER_DEF_UNION_KIND_A;
+    tmp->a = g_malloc0(sizeof(UserDefA));
+    tmp->a->boolean = true;
+
+    visit_type_UserDefUnion(data->ov, &tmp, NULL, &err);
+    g_assert(err == NULL);
+    arg = qmp_output_get_qobject(data->qov);
+
+    g_assert(qobject_type(arg) == QTYPE_QDICT);
+    qdict = qobject_to_qdict(arg);
+
+    g_assert_cmpstr(qdict_get_str(qdict, "type"), ==, "a");
+
+    qvalue = qdict_get(qdict, "data");
+    g_assert(data != NULL);
+    g_assert(qobject_type(qvalue) == QTYPE_QDICT);
+    value = qobject_to_qdict(qvalue);
+    g_assert_cmpint(qdict_get_bool(value, "boolean"), ==, true);
+
+    qapi_free_UserDefUnion(tmp);
+    QDECREF(qdict);
+}
+
 static void output_visitor_test_add(const char *testpath,
                                     TestOutputVisitorData *data,
                                     void (*test_func)(TestOutputVisitorData *data, const void *user_data))
@@ -416,6 +448,8 @@ int main(int argc, char **argv)
                             &out_visitor_data, test_visitor_out_list);
     output_visitor_test_add("/visitor/output/list-qapi-free",
                             &out_visitor_data, test_visitor_out_list_qapi_free);
+    output_visitor_test_add("/visitor/output/union",
+                            &out_visitor_data, test_visitor_out_union);
 
     g_test_run();
 
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH v4 04/10] rename blockdev-group-snapshot-sync
  2012-03-06 17:55 [Qemu-devel] [PATCH v4 00/10] Mirrored block writes Paolo Bonzini
                   ` (2 preceding siblings ...)
  2012-03-06 17:55 ` [Qemu-devel] [PATCH v4 03/10] qapi: complete implementation of unions Paolo Bonzini
@ 2012-03-06 17:55 ` Paolo Bonzini
  2012-03-06 17:55 ` [Qemu-devel] [PATCH v4 05/10] add mode field to blockdev-snapshot-sync transaction item Paolo Bonzini
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2012-03-06 17:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, fsimonce, eblake, stefanha, lcapitulino

We will add other kinds of operation.  Prepare for this by adjusting
the schema.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 blockdev.c       |   78 ++++++++++++++++++++++++++++-------------------------
 qapi-schema.json |   42 ++++++++++++++++++----------
 qmp-commands.hx  |   52 +++++++++++++++++++----------------
 3 files changed, 96 insertions(+), 76 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 6ed7d61..df544a3 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -716,31 +716,24 @@ void qmp_blockdev_snapshot_sync(const char *device, const char *snapshot_file,
 
 
 /* New and old BlockDriverState structs for group snapshots */
-typedef struct BlkGroupSnapshotStates {
+typedef struct BlkTransactionStates {
     BlockDriverState *old_bs;
     BlockDriverState *new_bs;
-    QSIMPLEQ_ENTRY(BlkGroupSnapshotStates) entry;
-} BlkGroupSnapshotStates;
+    QSIMPLEQ_ENTRY(BlkTransactionStates) entry;
+} BlkTransactionStates;
 
 /*
  * 'Atomic' group snapshots.  The snapshots are taken as a set, and if any fail
  *  then we do not pivot any of the devices in the group, and abandon the
  *  snapshots
  */
-void qmp_blockdev_group_snapshot_sync(SnapshotDevList *dev_list,
-                                      Error **errp)
+void qmp_transaction(BlockdevActionList *dev_list, Error **errp)
 {
     int ret = 0;
-    SnapshotDevList *dev_entry = dev_list;
-    SnapshotDev *dev_info = NULL;
-    BlkGroupSnapshotStates *states, *next;
-    BlockDriver *proto_drv;
-    BlockDriver *drv;
-    int flags;
-    const char *format;
-    const char *snapshot_file;
+    BlockdevActionList *dev_entry = dev_list;
+    BlkTransactionStates *states, *next;
 
-    QSIMPLEQ_HEAD(snap_bdrv_states, BlkGroupSnapshotStates) snap_bdrv_states;
+    QSIMPLEQ_HEAD(snap_bdrv_states, BlkTransactionStates) snap_bdrv_states;
     QSIMPLEQ_INIT(&snap_bdrv_states);
 
     /* drain all i/o before any snapshots */
@@ -748,21 +741,46 @@ void qmp_blockdev_group_snapshot_sync(SnapshotDevList *dev_list,
 
     /* We don't do anything in this loop that commits us to the snapshot */
     while (NULL != dev_entry) {
+        BlockdevAction *dev_info = NULL;
+        BlockDriver *proto_drv;
+        BlockDriver *drv;
+        int flags;
+        const char *device;
+        const char *format = "qcow2";
+        const char *new_image_file = NULL;
+
         dev_info = dev_entry->value;
         dev_entry = dev_entry->next;
 
-        states = g_malloc0(sizeof(BlkGroupSnapshotStates));
+        states = g_malloc0(sizeof(BlkTransactionStates));
         QSIMPLEQ_INSERT_TAIL(&snap_bdrv_states, states, entry);
 
-        states->old_bs = bdrv_find(dev_info->device);
+        switch (dev_info->kind) {
+        case BLOCKDEV_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC:
+            device = dev_info->blockdev_snapshot_sync->device;
+            if (dev_info->blockdev_snapshot_sync->has_format) {
+                format = dev_info->blockdev_snapshot_sync->format;
+            }
+            new_image_file = dev_info->blockdev_snapshot_sync->snapshot_file;
+            break;
+        default:
+            abort();
+        }
+
+        drv = bdrv_find_format(format);
+        if (!drv) {
+            error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
+            goto delete_and_fail;
+        }
 
+        states->old_bs = bdrv_find(device);
         if (!states->old_bs) {
-            error_set(errp, QERR_DEVICE_NOT_FOUND, dev_info->device);
+            error_set(errp, QERR_DEVICE_NOT_FOUND, device);
             goto delete_and_fail;
         }
 
         if (bdrv_in_use(states->old_bs)) {
-            error_set(errp, QERR_DEVICE_IN_USE, dev_info->device);
+            error_set(errp, QERR_DEVICE_IN_USE, device);
             goto delete_and_fail;
         }
 
@@ -775,44 +793,30 @@ void qmp_blockdev_group_snapshot_sync(SnapshotDevList *dev_list,
             }
         }
 
-        snapshot_file = dev_info->snapshot_file;
-
         flags = states->old_bs->open_flags;
 
-        if (!dev_info->has_format) {
-            format = "qcow2";
-        } else {
-            format = dev_info->format;
-        }
-
-        drv = bdrv_find_format(format);
-        if (!drv) {
-            error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
-            goto delete_and_fail;
-        }
-
-        proto_drv = bdrv_find_protocol(snapshot_file);
+        proto_drv = bdrv_find_protocol(new_image_file);
         if (!proto_drv) {
             error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
             goto delete_and_fail;
         }
 
         /* create new image w/backing file */
-        ret = bdrv_img_create(snapshot_file, format,
+        ret = bdrv_img_create(new_image_file, format,
                               states->old_bs->filename,
                               states->old_bs->drv->format_name,
                               NULL, -1, flags);
         if (ret) {
-            error_set(errp, QERR_OPEN_FILE_FAILED, snapshot_file);
+            error_set(errp, QERR_OPEN_FILE_FAILED, new_image_file);
             goto delete_and_fail;
         }
 
         /* We will manually add the backing_hd field to the bs later */
         states->new_bs = bdrv_new("");
-        ret = bdrv_open(states->new_bs, snapshot_file,
+        ret = bdrv_open(states->new_bs, new_image_file,
                         flags | BDRV_O_NO_BACKING, drv);
         if (ret != 0) {
-            error_set(errp, QERR_OPEN_FILE_FAILED, snapshot_file);
+            error_set(errp, QERR_OPEN_FILE_FAILED, new_image_file);
             goto delete_and_fail;
         }
     }
diff --git a/qapi-schema.json b/qapi-schema.json
index 5f293c4..1d1ffa6 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1118,7 +1118,7 @@
 { 'command': 'block_resize', 'data': { 'device': 'str', 'size': 'int' }}
 
 ##
-# @SnapshotDev
+# @BlockdevSnapshot
 #
 # @device:  the name of the device to generate the snapshot from.
 #
@@ -1126,19 +1126,30 @@
 #
 # @format: #optional the format of the snapshot image, default is 'qcow2'.
 ##
-{ 'type': 'SnapshotDev',
-  'data': {'device': 'str', 'snapshot-file': 'str', '*format': 'str' } }
+{ 'type': 'BlockdevSnapshot',
+  'data': { 'device': 'str', 'snapshot-file': 'str', '*format': 'str' } }
+
+##
+# @BlockdevAction
+#
+# A discriminated record of operations that can be performed with
+# @transaction.
+##
+{ 'union': 'BlockdevAction',
+  'data': {
+       'blockdev-snapshot-sync': 'BlockdevSnapshot',
+   } }
 
 ##
-# @blockdev-group-snapshot-sync
+# @transaction
 #
-# Generates a synchronous snapshot of a group of one or more block devices,
-# as atomically as possible.  If the snapshot of any device in the group
-# fails, then the entire group snapshot will be abandoned and the
-# appropriate error returned.
+# Atomically operate on a group of one or more block devices.  If
+# any operation fails, then the entire set of actions will be
+# abandoned and the appropriate error returned.  The only operation
+# supported is currently blockdev-snapshot-sync.
 #
 #  List of:
-#  @SnapshotDev: information needed for the device snapshot
+#  @BlockdevAction: information needed for the device snapshot
 #
 # Returns: nothing on success
 #          If @device is not a valid block device, DeviceNotFound
@@ -1147,13 +1158,14 @@
 #          If @snapshot-file can't be opened, OpenFileFailed
 #          If @format is invalid, InvalidBlockFormat
 #
-# Note: The group snapshot attempt returns failure on the first snapshot
-# device failure.  Therefore, there will be only one device or snapshot file
-# returned in an error condition, and subsequent devices will not have been
-# attempted.
+# Note: The transaction aborts on the first failure.  Therefore, there will
+# be only one device or snapshot file returned in an error condition, and
+# subsequent actions will not have been attempted.
+#
+# Since 1.1
 ##
-{ 'command': 'blockdev-group-snapshot-sync',
-  'data': { 'devlist': [ 'SnapshotDev' ] } }
+{ 'command': 'transaction',
+  'data': { 'actions': [ 'BlockdevAction' ] } }
 
 ##
 # @blockdev-snapshot-sync
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 0c9bfac..fb4f1df 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -687,41 +687,45 @@ EQMP
         .mhandler.cmd_new = qmp_marshal_input_block_job_cancel,
     },
     {
-        .name       = "blockdev-group-snapshot-sync",
-        .args_type  = "devlist:O",
-        .params  = "device:B,snapshot-file:s,format:s?",
-        .mhandler.cmd_new = qmp_marshal_input_blockdev_group_snapshot_sync,
+        .name       = "transaction",
+        .args_type  = "actions:O",
+        .mhandler.cmd_new = qmp_marshal_input_transaction,
     },
 
 SQMP
-blockdev-group-snapshot-sync
-----------------------
-
-Synchronous snapshot of one or more block devices.  A list array input
-is accepted, that contains the device and snapshot file information for
-each device in group. The default format, if not specified, is qcow2.
+transaction
+-----------
 
-If there is any failure creating or opening a new snapshot, all snapshots
-for the group are abandoned, and the original disks pre-snapshot attempt
-are used.
+Atomically operate on one or more block devices.  The only supported
+operation for now is snapshotting.  If there is any failure performing
+any of the operations, all snapshots for the group are abandoned, and
+the original disks pre-snapshot attempt are used.
 
+A list of dictionaries is accepted, that contains the actions to be performed.
+For snapshots this is the device, the file to use for the new snapshot,
+and the format.  The default format, if not specified, is qcow2.
 
 Arguments:
 
-devlist array:
-    - "device": device name to snapshot (json-string)
-    - "snapshot-file": name of new image file (json-string)
-    - "format": format of new image (json-string, optional)
+actions array:
+    - "type": the operation to perform.  The only supported
+      value is "blockdev-snapshot-sync". (json-string)
+    - "data": a dictionary.  The contents depend on the value
+      of "type".  When "type" is "blockdev-snapshot-sync":
+      - "device": device name to snapshot (json-string)
+      - "snapshot-file": name of new image file (json-string)
+      - "format": format of new image (json-string, optional)
 
 Example:
 
--> { "execute": "blockdev-group-snapshot-sync", "arguments":
-                      { "devlist": [{ "device": "ide-hd0",
-                                      "snapshot-file": "/some/place/my-image",
-                                      "format": "qcow2" },
-                                    { "device": "ide-hd1",
-                                      "snapshot-file": "/some/place/my-image2",
-                                      "format": "qcow2" }] } }
+-> { "execute": "transaction",
+     "arguments": { "actions": [
+         { 'type': 'blockdev-snapshot-sync', 'data' : { "device": "ide-hd0",
+                                         "snapshot-file": "/some/place/my-image",
+                                         "format": "qcow2" } },
+         { 'type': 'blockdev-snapshot-sync', 'data' : { "device": "ide-hd1",
+                                         "snapshot-file": "/some/place/my-image2",
+                                         "format": "qcow2" } } ] } }
 <- { "return": {} }
 
 EQMP
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH v4 05/10] add mode field to blockdev-snapshot-sync transaction item
  2012-03-06 17:55 [Qemu-devel] [PATCH v4 00/10] Mirrored block writes Paolo Bonzini
                   ` (3 preceding siblings ...)
  2012-03-06 17:55 ` [Qemu-devel] [PATCH v4 04/10] rename blockdev-group-snapshot-sync Paolo Bonzini
@ 2012-03-06 17:55 ` Paolo Bonzini
  2012-03-06 17:55 ` [Qemu-devel] [PATCH v4 06/10] qmp: convert blockdev-snapshot-sync to a wrapper around transactions Paolo Bonzini
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2012-03-06 17:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, fsimonce, eblake, stefanha, lcapitulino

The mode field lets a management application create the snapshot
destination outside QEMU.

Right now, the only modes are "existing" and "absolute-paths".  Mirroring
introduces "no-backing-file".  In the future "relative-paths" could be
implemented too.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 blockdev.c       |   25 ++++++++++++++++---------
 qapi-schema.json |   21 ++++++++++++++++++++-
 qmp-commands.hx  |   10 ++++++++++
 3 files changed, 46 insertions(+), 10 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index df544a3..cb5bf03 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -745,9 +745,10 @@ void qmp_transaction(BlockdevActionList *dev_list, Error **errp)
         BlockDriver *proto_drv;
         BlockDriver *drv;
         int flags;
+        enum NewImageMode mode;
+        const char *new_image_file;
         const char *device;
         const char *format = "qcow2";
-        const char *new_image_file = NULL;
 
         dev_info = dev_entry->value;
         dev_entry = dev_entry->next;
@@ -758,10 +759,14 @@ void qmp_transaction(BlockdevActionList *dev_list, Error **errp)
         switch (dev_info->kind) {
         case BLOCKDEV_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC:
             device = dev_info->blockdev_snapshot_sync->device;
+            if (!dev_info->blockdev_snapshot_sync->has_mode) {
+                dev_info->blockdev_snapshot_sync->mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
+            }
+            new_image_file = dev_info->blockdev_snapshot_sync->snapshot_file;
             if (dev_info->blockdev_snapshot_sync->has_format) {
                 format = dev_info->blockdev_snapshot_sync->format;
             }
-            new_image_file = dev_info->blockdev_snapshot_sync->snapshot_file;
+            mode = dev_info->blockdev_snapshot_sync->mode;
             break;
         default:
             abort();
@@ -802,13 +807,15 @@ void qmp_transaction(BlockdevActionList *dev_list, Error **errp)
         }
 
         /* create new image w/backing file */
-        ret = bdrv_img_create(new_image_file, format,
-                              states->old_bs->filename,
-                              states->old_bs->drv->format_name,
-                              NULL, -1, flags);
-        if (ret) {
-            error_set(errp, QERR_OPEN_FILE_FAILED, new_image_file);
-            goto delete_and_fail;
+        if (mode != NEW_IMAGE_MODE_EXISTING) {
+            ret = bdrv_img_create(new_image_file, format,
+                                  states->old_bs->filename,
+                                  states->old_bs->drv->format_name,
+                                  NULL, -1, flags);
+            if (ret) {
+                error_set(errp, QERR_OPEN_FILE_FAILED, new_image_file);
+                goto delete_and_fail;
+            }
         }
 
         /* We will manually add the backing_hd field to the bs later */
diff --git a/qapi-schema.json b/qapi-schema.json
index 1d1ffa6..b062d01 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1118,6 +1118,22 @@
 { 'command': 'block_resize', 'data': { 'device': 'str', 'size': 'int' }}
 
 ##
+# @NewImageMode
+#
+# An enumeration that tells QEMU how to set the backing file path in
+# a new image file.
+#
+# @existing: QEMU should look for an existing image file.
+#
+# @absolute-paths: QEMU should create a new image with absolute paths
+# for the backing file.
+#
+# Since: 1.1
+##
+{ 'enum': 'NewImageMode'
+  'data': [ 'existing', 'absolute-paths' ] }
+
+##
 # @BlockdevSnapshot
 #
 # @device:  the name of the device to generate the snapshot from.
@@ -1127,7 +1145,8 @@
 # @format: #optional the format of the snapshot image, default is 'qcow2'.
 ##
 { 'type': 'BlockdevSnapshot',
-  'data': { 'device': 'str', 'snapshot-file': 'str', '*format': 'str' } }
+  'data': { 'device': 'str', 'snapshot-file': 'str', '*format': 'str',
+            '*mode': 'NewImageMode' } }
 
 ##
 # @BlockdevAction
diff --git a/qmp-commands.hx b/qmp-commands.hx
index fb4f1df..7c03b62 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -705,6 +705,13 @@ A list of dictionaries is accepted, that contains the actions to be performed.
 For snapshots this is the device, the file to use for the new snapshot,
 and the format.  The default format, if not specified, is qcow2.
 
+Each new snapshot defaults to being created by QEMU (wiping any
+contents if the file already exists), but it is also possible to reuse
+an externally-created file.  In the latter case, you should ensure that
+the new image file has the same contents as the current one; QEMU cannot
+perform any meaningful check.  Typically this is achieved by using the
+current image file as the backing file for the new image.
+
 Arguments:
 
 actions array:
@@ -715,6 +722,8 @@ actions array:
       - "device": device name to snapshot (json-string)
       - "snapshot-file": name of new image file (json-string)
       - "format": format of new image (json-string, optional)
+      - "mode": whether and how QEMU should create the snapshot file
+        (NewImageMode, optional, default "absolute-paths")
 
 Example:
 
@@ -725,6 +734,7 @@ Example:
                                          "format": "qcow2" } },
          { 'type': 'blockdev-snapshot-sync', 'data' : { "device": "ide-hd1",
                                          "snapshot-file": "/some/place/my-image2",
+                                         "mode": "existing",
                                          "format": "qcow2" } } ] } }
 <- { "return": {} }
 
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH v4 06/10] qmp: convert blockdev-snapshot-sync to a wrapper around transactions
  2012-03-06 17:55 [Qemu-devel] [PATCH v4 00/10] Mirrored block writes Paolo Bonzini
                   ` (4 preceding siblings ...)
  2012-03-06 17:55 ` [Qemu-devel] [PATCH v4 05/10] add mode field to blockdev-snapshot-sync transaction item Paolo Bonzini
@ 2012-03-06 17:55 ` Paolo Bonzini
  2012-03-06 17:56 ` [Qemu-devel] [PATCH v4 07/10] Add blkmirror block driver Paolo Bonzini
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2012-03-06 17:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, fsimonce, eblake, stefanha, lcapitulino

Simplify the blockdev-snapshot-sync code and gain failsafe operation
by turning it into a wrapper around the new transaction command.  A new
option is also added matching "mode".

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 blockdev.c       |   85 ++++++++++++++---------------------------------------
 hmp-commands.hx  |    9 ++++--
 hmp.c            |    6 +++-
 qapi-schema.json |   15 +++++----
 qmp-commands.hx  |    2 +
 5 files changed, 44 insertions(+), 73 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index cb5bf03..067cc10 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -646,72 +646,33 @@ void do_commit(Monitor *mon, const QDict *qdict)
     }
 }
 
+static void blockdev_do_action(int kind, void *data, Error **errp)
+{
+    BlockdevAction action;
+    BlockdevActionList list;
+
+    action.kind = kind;
+    action.data = data;
+    list.value = &action;
+    list.next = NULL;
+    qmp_transaction(&list, errp);
+}
+
 void qmp_blockdev_snapshot_sync(const char *device, const char *snapshot_file,
                                 bool has_format, const char *format,
+                                bool has_mode, enum NewImageMode mode,
                                 Error **errp)
 {
-    BlockDriverState *bs;
-    BlockDriver *drv, *old_drv, *proto_drv;
-    int ret = 0;
-    int flags;
-    char old_filename[1024];
-
-    bs = bdrv_find(device);
-    if (!bs) {
-        error_set(errp, QERR_DEVICE_NOT_FOUND, device);
-        return;
-    }
-    if (bdrv_in_use(bs)) {
-        error_set(errp, QERR_DEVICE_IN_USE, device);
-        return;
-    }
-
-    pstrcpy(old_filename, sizeof(old_filename), bs->filename);
-
-    old_drv = bs->drv;
-    flags = bs->open_flags;
-
-    if (!has_format) {
-        format = "qcow2";
-    }
-
-    drv = bdrv_find_format(format);
-    if (!drv) {
-        error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
-        return;
-    }
-
-    proto_drv = bdrv_find_protocol(snapshot_file);
-    if (!proto_drv) {
-        error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
-        return;
-    }
-
-    ret = bdrv_img_create(snapshot_file, format, bs->filename,
-                          bs->drv->format_name, NULL, -1, flags);
-    if (ret) {
-        error_set(errp, QERR_UNDEFINED_ERROR);
-        return;
-    }
-
-    bdrv_drain_all();
-    bdrv_flush(bs);
-
-    bdrv_close(bs);
-    ret = bdrv_open(bs, snapshot_file, flags, drv);
-    /*
-     * If reopening the image file we just created fails, fall back
-     * and try to re-open the original image. If that fails too, we
-     * are in serious trouble.
-     */
-    if (ret != 0) {
-        ret = bdrv_open(bs, old_filename, flags, old_drv);
-        if (ret != 0) {
-            error_set(errp, QERR_OPEN_FILE_FAILED, old_filename);
-        } else {
-            error_set(errp, QERR_OPEN_FILE_FAILED, snapshot_file);
-        }
-    }
+    BlockdevSnapshot snapshot = {
+        .device = (char *) device,
+        .snapshot_file = (char *) snapshot_file,
+        .has_format = has_format,
+        .format = (char *) format,
+        .has_mode = has_mode,
+        .mode = mode,
+    };
+    blockdev_do_action(BLOCKDEV_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC, &snapshot,
+                       errp);
 }
 
 
diff --git a/hmp-commands.hx b/hmp-commands.hx
index ed88877..6980214 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -882,14 +882,17 @@ ETEXI
 
     {
         .name       = "snapshot_blkdev",
-        .args_type  = "device:B,snapshot-file:s?,format:s?",
-        .params     = "device [new-image-file] [format]",
+        .args_type  = "reuse:-n,device:B,snapshot-file:s?,format:s?",
+        .params     = "[-n] device [new-image-file] [format]",
         .help       = "initiates a live snapshot\n\t\t\t"
                       "of device. If a new image file is specified, the\n\t\t\t"
                       "new image file will become the new root image.\n\t\t\t"
                       "If format is specified, the snapshot file will\n\t\t\t"
                       "be created in that format. Otherwise the\n\t\t\t"
-                      "snapshot will be internal! (currently unsupported)",
+                      "snapshot will be internal! (currently unsupported).\n\t\t\t"
+                      "The default format is qcow2.  The -n flag requests QEMU\n\t\t\t"
+                      "to reuse the image found in new-image-file, instead of\n\t\t\t"
+                      "recreating it from scratch.",
         .mhandler.cmd = hmp_snapshot_blkdev,
     },
 
diff --git a/hmp.c b/hmp.c
index 3a54455..290c43d 100644
--- a/hmp.c
+++ b/hmp.c
@@ -692,6 +692,8 @@ void hmp_snapshot_blkdev(Monitor *mon, const QDict *qdict)
     const char *device = qdict_get_str(qdict, "device");
     const char *filename = qdict_get_try_str(qdict, "snapshot-file");
     const char *format = qdict_get_try_str(qdict, "format");
+    int reuse = qdict_get_try_bool(qdict, "reuse", 0);
+    enum NewImageMode mode;
     Error *errp = NULL;
 
     if (!filename) {
@@ -702,7 +704,9 @@ void hmp_snapshot_blkdev(Monitor *mon, const QDict *qdict)
         return;
     }
 
-    qmp_blockdev_snapshot_sync(device, filename, !!format, format, &errp);
+    mode = reuse ? NEW_IMAGE_MODE_EXISTING : NEW_IMAGE_MODE_ABSOLUTE_PATHS;
+    qmp_blockdev_snapshot_sync(device, filename, !!format, format,
+                               true, mode, &errp);
     hmp_handle_error(mon, &errp);
 }
 
diff --git a/qapi-schema.json b/qapi-schema.json
index b062d01..0d24240 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1143,6 +1143,9 @@
 # @snapshot-file: the target of the new image. A new file will be created.
 #
 # @format: #optional the format of the snapshot image, default is 'qcow2'.
+#
+# @mode: #optional whether and how QEMU should create a new image, default is
+# 'absolute-paths'.
 ##
 { 'type': 'BlockdevSnapshot',
   'data': { 'device': 'str', 'snapshot-file': 'str', '*format': 'str',
@@ -1199,21 +1202,19 @@
 #
 # @format: #optional the format of the snapshot image, default is 'qcow2'.
 #
+# @mode: #optional whether and how QEMU should create a new image, default is
+# 'absolute-paths'.
+#
 # Returns: nothing on success
 #          If @device is not a valid block device, DeviceNotFound
 #          If @snapshot-file can't be opened, OpenFileFailed
 #          If @format is invalid, InvalidBlockFormat
 #
-# Notes: One of the last steps taken by this command is to close the current
-#        image being used by @device and open the @snapshot-file one. If that
-#        fails, the command will try to reopen the original image file. If
-#        that also fails OpenFileFailed will be returned and the guest may get
-#        unexpected errors.
-#
 # Since 0.14.0
 ##
 { 'command': 'blockdev-snapshot-sync',
-  'data': { 'device': 'str', 'snapshot-file': 'str', '*format': 'str' } }
+  'data': { 'device': 'str', 'snapshot-file': 'str', '*format': 'str',
+            '*mode': 'NewImageMode'} }
 
 ##
 # @human-monitor-command:
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 7c03b62..dfe8a5b 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -760,6 +760,8 @@ Arguments:
 
 - "device": device name to snapshot (json-string)
 - "snapshot-file": name of new image file (json-string)
+- "mode": whether and how QEMU should create the snapshot file
+  (NewImageMode, optional, default "absolute-paths")
 - "format": format of new image (json-string, optional)
 
 Example:
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH v4 07/10] Add blkmirror block driver
  2012-03-06 17:55 [Qemu-devel] [PATCH v4 00/10] Mirrored block writes Paolo Bonzini
                   ` (5 preceding siblings ...)
  2012-03-06 17:55 ` [Qemu-devel] [PATCH v4 06/10] qmp: convert blockdev-snapshot-sync to a wrapper around transactions Paolo Bonzini
@ 2012-03-06 17:56 ` Paolo Bonzini
  2012-03-06 17:56 ` [Qemu-devel] [PATCH v4 08/10] add mirroring to transaction Paolo Bonzini
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2012-03-06 17:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, stefanha, Marcelo Tosatti, lcapitulino, fsimonce, eblake

From: Marcelo Tosatti <mtosatti@redhat.com>

Mirrored writes are used by live block copy.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Federico Simoncelli <fsimonce@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Makefile.objs     |    2 +-
 block/blkmirror.c |  239 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 240 insertions(+), 1 deletions(-)
 create mode 100644 block/blkmirror.c

diff --git a/Makefile.objs b/Makefile.objs
index 808de6a..982f44b 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -74,7 +74,7 @@ fsdev-obj-$(CONFIG_VIRTFS) += $(addprefix fsdev/, $(fsdev-nested-y))
 # suppress *all* target specific code in case of system emulation, i.e. a
 # single QEMU executable should support all CPUs and machines.
 
-common-obj-y = $(block-obj-y) blockdev.o
+common-obj-y = $(block-obj-y) blockdev.o block/blkmirror.o
 common-obj-y += $(net-obj-y)
 common-obj-y += $(qobject-obj-y)
 common-obj-$(CONFIG_LINUX) += $(fsdev-obj-$(CONFIG_LINUX))
diff --git a/block/blkmirror.c b/block/blkmirror.c
new file mode 100644
index 0000000..0ee2ca6
--- /dev/null
+++ b/block/blkmirror.c
@@ -0,0 +1,239 @@
+/*
+ * Block driver for mirrored writes.
+ *
+ * Copyright (C) 2011-2012 Red Hat, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include <stdarg.h>
+#include "block_int.h"
+
+typedef struct DupAIOCB DupAIOCB;
+
+typedef struct SingleAIOCB {
+    BlockDriverAIOCB *aiocb;
+    DupAIOCB *parent;
+} SingleAIOCB;
+
+struct DupAIOCB {
+    BlockDriverAIOCB common;
+    int count;
+    int canceled;
+
+    BlockDriverCompletionFunc *cb;
+    SingleAIOCB aios[2];
+    int ret;
+};
+
+/* Valid blkmirror filenames look like blkmirror:format:path/to/target.
+ *
+ * The driver is not intended for general usage.  It expects bdrv_append
+ * to tack it onto an existing image, which is used as the primary
+ * source and hardcoded to be the backing file for the target.
+ */
+static int blkmirror_open(BlockDriverState *bs, const char *filename, int flags)
+{
+    int ret, n;
+    const char *filename2;
+    char *format;
+    BlockDriver *drv;
+
+    /* Parse the blkmirror: prefix */
+    if (strncmp(filename, "blkmirror:", strlen("blkmirror:"))) {
+        return -EINVAL;
+    }
+    filename += strlen("blkmirror:");
+
+    /* The source image filename is added by bdrv_append.  We only need
+     * to parse and open the destination image and format.  */
+    n = strcspn(filename, ":");
+    if (filename[n] == 0) {
+        format = NULL;
+        filename2 = filename;
+    } else {
+        format = g_strdup(filename);
+        format[n] = 0;
+        filename2 = format + n + 1;
+    }
+
+    drv = bdrv_find_whitelisted_format(format);
+    if (!drv) {
+        ret = -ENOENT;
+        goto out;
+    }
+
+    bs->file = bdrv_new("");
+    if (bs->file == NULL) {
+        ret = -ENOMEM;
+        goto out;
+    }
+
+    /* If we crash, we cannot assume that the destination is a
+     * valid mirror and we have to start over.  So speed up things
+     * by effectively operating on the destination in cache=unsafe
+     * mode.
+     */
+    ret = bdrv_open(bs->file, filename2,
+                    flags | BDRV_O_NO_BACKING | BDRV_O_NO_FLUSH | BDRV_O_CACHE_WB,
+                    drv);
+    if (ret < 0) {
+        goto out;
+    }
+
+out:
+    g_free(format);
+    return ret;
+}
+
+static void blkmirror_close(BlockDriverState *bs)
+{
+    bs->file->backing_hd = NULL;
+
+    /* backing_hd and file closed by the caller.  */
+}
+
+static coroutine_fn int blkmirror_co_flush(BlockDriverState *bs)
+{
+    return bdrv_co_flush(bs->backing_hd);
+}
+
+static int64_t blkmirror_getlength(BlockDriverState *bs)
+{
+    return bdrv_getlength(bs->file);
+}
+
+static int coroutine_fn blkmirror_co_is_allocated(BlockDriverState *bs,
+                                                  int64_t sector_num,
+                                                  int nb_sectors, int *pnum)
+{
+    return bdrv_is_allocated(bs->file, sector_num, nb_sectors, pnum);
+}
+
+static BlockDriverAIOCB *blkmirror_aio_readv(BlockDriverState *bs,
+                                             int64_t sector_num,
+                                             QEMUIOVector *qiov, int nb_sectors,
+                                             BlockDriverCompletionFunc *cb,
+                                             void *opaque)
+{
+    return bdrv_aio_readv(bs->backing_hd, sector_num, qiov, nb_sectors,
+                          cb, opaque);
+}
+
+static void dup_aio_cancel(BlockDriverAIOCB *acb)
+{
+    DupAIOCB *dcb = container_of(acb, DupAIOCB, common);
+    int i;
+
+    dcb->canceled = true;
+    for (i = 0 ; i < 2; i++) {
+        if (dcb->aios[i].aiocb) {
+            bdrv_aio_cancel(dcb->aios[i].aiocb);
+        }
+    }
+    qemu_aio_release(dcb);
+}
+
+static AIOPool dup_aio_pool = {
+    .aiocb_size         = sizeof(DupAIOCB),
+    .cancel             = dup_aio_cancel,
+};
+
+static void blkmirror_aio_cb(void *opaque, int ret)
+{
+    SingleAIOCB *scb = opaque;
+    DupAIOCB *dcb = scb->parent;
+
+    scb->aiocb = NULL;
+
+    assert(dcb->count > 0);
+    if (ret < 0 && dcb->ret == 0) {
+        dcb->ret = ret;
+    }
+    if (--dcb->count == 0) {
+        dcb->common.cb(dcb->common.opaque, dcb->ret);
+        if (!dcb->canceled) {
+            qemu_aio_release(dcb);
+        }
+    }
+}
+
+static DupAIOCB *dup_aio_get(BlockDriverState *bs,
+                             BlockDriverCompletionFunc *cb,
+                             void *opaque)
+{
+    DupAIOCB *dcb = qemu_aio_get(&dup_aio_pool, bs, cb, opaque);
+
+    dcb->canceled = false;
+    dcb->aios[0].parent = dcb;
+    dcb->aios[1].parent = dcb;
+    dcb->count = 2;
+    dcb->ret = 0;
+    return dcb;
+}
+
+static BlockDriverAIOCB *blkmirror_aio_writev(BlockDriverState *bs,
+                                              int64_t sector_num,
+                                              QEMUIOVector *qiov,
+                                              int nb_sectors,
+                                              BlockDriverCompletionFunc *cb,
+                                              void *opaque)
+{
+    DupAIOCB *dcb = dup_aio_get(bs, cb, opaque);
+
+    /* bs->backing_hd is set after initialization.  */
+    bs->file->backing_hd = bs->backing_hd;
+
+    dcb->aios[0].aiocb = bdrv_aio_writev(bs->backing_hd, sector_num, qiov,
+                                         nb_sectors, blkmirror_aio_cb,
+                                         &dcb->aios[0]);
+    dcb->aios[1].aiocb = bdrv_aio_writev(bs->file, sector_num, qiov,
+                                         nb_sectors, blkmirror_aio_cb,
+                                         &dcb->aios[1]);
+
+    return &dcb->common;
+}
+
+static BlockDriverAIOCB *blkmirror_aio_discard(BlockDriverState *bs,
+                                               int64_t sector_num,
+                                               int nb_sectors,
+                                               BlockDriverCompletionFunc *cb,
+                                               void *opaque)
+{
+    DupAIOCB *dcb = dup_aio_get(bs, cb, opaque);
+
+    dcb->aios[0].aiocb = bdrv_aio_discard(bs->backing_hd, sector_num,
+                                          nb_sectors, blkmirror_aio_cb,
+                                          &dcb->aios[0]);
+    dcb->aios[1].aiocb = bdrv_aio_discard(bs->file, sector_num,
+                                          nb_sectors, blkmirror_aio_cb,
+                                          &dcb->aios[1]);
+
+    return &dcb->common;
+}
+
+
+static BlockDriver bdrv_blkmirror = {
+    .format_name        = "blkmirror",
+    .protocol_name      = "blkmirror",
+    .instance_size      = 0,
+
+    .bdrv_getlength     = blkmirror_getlength,
+
+    .bdrv_file_open     = blkmirror_open,
+    .bdrv_close         = blkmirror_close,
+    .bdrv_co_flush_to_disk = blkmirror_co_flush,
+    .bdrv_co_is_allocated = blkmirror_co_is_allocated,
+
+    .bdrv_aio_readv      = blkmirror_aio_readv,
+    .bdrv_aio_writev     = blkmirror_aio_writev,
+    .bdrv_aio_discard    = blkmirror_aio_discard,
+};
+
+static void bdrv_blkmirror_init(void)
+{
+    bdrv_register(&bdrv_blkmirror);
+}
+
+block_init(bdrv_blkmirror_init);
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH v4 08/10] add mirroring to transaction
  2012-03-06 17:55 [Qemu-devel] [PATCH v4 00/10] Mirrored block writes Paolo Bonzini
                   ` (6 preceding siblings ...)
  2012-03-06 17:56 ` [Qemu-devel] [PATCH v4 07/10] Add blkmirror block driver Paolo Bonzini
@ 2012-03-06 17:56 ` Paolo Bonzini
  2012-03-06 17:56 ` [Qemu-devel] [PATCH v4 09/10] add drive-mirror command and HMP equivalent Paolo Bonzini
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2012-03-06 17:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, fsimonce, eblake, stefanha, lcapitulino

With it comes a new image creation mode, "no-backing-file", that can
be used to stream an image so that the destination does not need the
original image's backing file(s).

Both bdrv_append and blkmirror will set the backing_hd on the target,
even if the image is created without one, so that both streaming and
copy-on-write work properly (at least with qcow2 or qed, not raw).

Streaming mode works with the following gotchas:

- streaming will rewrite every bit of the source image;

- zero writes are not supported by the blkmirror driver, hence both
  the source and the destination image will grow to full size.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 blockdev.c       |   53 +++++++++++++++++++++++++++++++++++++++++++----------
 qapi-schema.json |   22 ++++++++++++++++++++--
 qmp-commands.hx  |   16 +++++++++++++---
 3 files changed, 76 insertions(+), 15 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 067cc10..9d3a362 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -693,6 +693,7 @@ void qmp_transaction(BlockdevActionList *dev_list, Error **errp)
     int ret = 0;
     BlockdevActionList *dev_entry = dev_list;
     BlkTransactionStates *states, *next;
+    char *new_source = NULL;
 
     QSIMPLEQ_HEAD(snap_bdrv_states, BlkTransactionStates) snap_bdrv_states;
     QSIMPLEQ_INIT(&snap_bdrv_states);
@@ -704,7 +705,8 @@ void qmp_transaction(BlockdevActionList *dev_list, Error **errp)
     while (NULL != dev_entry) {
         BlockdevAction *dev_info = NULL;
         BlockDriver *proto_drv;
-        BlockDriver *drv;
+        BlockDriver *target_drv;
+        BlockDriver *drv = NULL;
         int flags;
         enum NewImageMode mode;
         const char *new_image_file;
@@ -728,16 +730,36 @@ void qmp_transaction(BlockdevActionList *dev_list, Error **errp)
                 format = dev_info->blockdev_snapshot_sync->format;
             }
             mode = dev_info->blockdev_snapshot_sync->mode;
+            new_source = g_strdup(new_image_file);
             break;
+
+        case BLOCKDEV_ACTION_KIND_DRIVE_MIRROR:
+            device = dev_info->drive_mirror->device;
+            drv = bdrv_find_format("blkmirror");
+            if (!dev_info->drive_mirror->has_mode) {
+                dev_info->drive_mirror->mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
+            }
+            new_image_file = dev_info->drive_mirror->target;
+            if (dev_info->drive_mirror->has_format) {
+                format = dev_info->drive_mirror->format;
+            }
+            mode = dev_info->drive_mirror->mode;
+            new_source = g_strdup_printf("blkmirror:%s:%s", format,
+                                         dev_info->drive_mirror->target);
+            break;
+
         default:
             abort();
         }
 
-        drv = bdrv_find_format(format);
-        if (!drv) {
+        target_drv = bdrv_find_format(format);
+        if (!target_drv) {
             error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
             goto delete_and_fail;
         }
+        if (!drv) {
+            drv = target_drv;
+        }
 
         states->old_bs = bdrv_find(device);
         if (!states->old_bs) {
@@ -761,7 +783,7 @@ void qmp_transaction(BlockdevActionList *dev_list, Error **errp)
 
         flags = states->old_bs->open_flags;
 
-        proto_drv = bdrv_find_protocol(new_image_file);
+        proto_drv = bdrv_find_protocol(new_source);
         if (!proto_drv) {
             error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
             goto delete_and_fail;
@@ -769,10 +791,18 @@ void qmp_transaction(BlockdevActionList *dev_list, Error **errp)
 
         /* create new image w/backing file */
         if (mode != NEW_IMAGE_MODE_EXISTING) {
-            ret = bdrv_img_create(new_image_file, format,
-                                  states->old_bs->filename,
-                                  states->old_bs->drv->format_name,
-                                  NULL, -1, flags);
+            if (mode == NEW_IMAGE_MODE_ABSOLUTE_PATHS) {
+                ret = bdrv_img_create(new_image_file, format,
+                                      states->old_bs->filename,
+                                      states->old_bs->drv->format_name,
+                                      NULL, -1, flags);
+            } else {
+                uint64_t size;
+                bdrv_get_geometry(states->old_bs, &size);
+                size *= 512;
+                ret = bdrv_img_create(new_image_file, format,
+                                      NULL, NULL, NULL, size, flags);
+            }
             if (ret) {
                 error_set(errp, QERR_OPEN_FILE_FAILED, new_image_file);
                 goto delete_and_fail;
@@ -781,12 +811,14 @@ void qmp_transaction(BlockdevActionList *dev_list, Error **errp)
 
         /* We will manually add the backing_hd field to the bs later */
         states->new_bs = bdrv_new("");
-        ret = bdrv_open(states->new_bs, new_image_file,
+        ret = bdrv_open(states->new_bs, new_source,
                         flags | BDRV_O_NO_BACKING, drv);
         if (ret != 0) {
-            error_set(errp, QERR_OPEN_FILE_FAILED, new_image_file);
+            error_set(errp, QERR_OPEN_FILE_FAILED, new_source);
             goto delete_and_fail;
         }
+        g_free(new_source);
+        new_source = NULL;
     }
 
 
@@ -814,6 +846,7 @@ exit:
     QSIMPLEQ_FOREACH_SAFE(states, &snap_bdrv_states, entry, next) {
         g_free(states);
     }
+    g_free(new_source);
     return;
 }
 
diff --git a/qapi-schema.json b/qapi-schema.json
index 0d24240..75ac978 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1133,10 +1133,12 @@
 # @absolute-paths: QEMU should create a new image with absolute paths
 # for the backing file.
 #
+# @no-backing-file: QEMU should create a new image with no backing file.
+#
 # Since: 1.1
 ##
 { 'enum': 'NewImageMode'
-  'data': [ 'existing', 'absolute-paths' ] }
+  'data': [ 'existing', 'absolute-paths', 'no-backing-file' ] }
 
 ##
 # @BlockdevSnapshot
@@ -1152,6 +1152,23 @@
             '*mode': 'NewImageMode' } }
 
 ##
+# @BlockdevMirror
+#
+# @device:  the name of the device to start mirroring.
+#
+# @target: the image that will start receiving writes for @device. A new
+#          file will be created unless @mode is "existing".
+#
+# @format: #optional the format of the target image, default is 'qcow2'.
+#
+# @mode: #optional whether and how QEMU should create a new image, default is
+# 'absolute-paths'.
+##
+{ 'type': 'BlockdevMirror',
+  'data': { 'device': 'str', 'target': 'str', '*format': 'str',
+            '*mode': 'NewImageMode' } }
+
+##
 # @BlockdevAction
 #
 # A discriminated record of operations that can be performed with
@@ -1159,6 +1176,7 @@
 { 'union': 'BlockdevAction',
   'data': {
        'blockdev-snapshot-sync': 'BlockdevSnapshot',
+       'drive-mirror': 'BlockdevMirror',
    } }
 
 ##
diff --git a/qmp-commands.hx b/qmp-commands.hx
index dfe8a5b..0438632 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -696,12 +696,16 @@ SQMP
 transaction
 -----------
 
-Atomically operate on one or more block devices.  The only supported
-operation for now is snapshotting.  If there is any failure performing
+Atomically operate on one or more block devices, snapshotting them
+or enabling mirrored writes.  If there is any failure performing
 any of the operations, all snapshots for the group are abandoned, and
 the original disks pre-snapshot attempt are used.
 
+Mirrored writes keep the previous image file open, and start writing
+data also to the new file specified in the command.
+
 A list of dictionaries is accepted, that contains the actions to be performed.
+
 For snapshots this is the device, the file to use for the new snapshot,
 and the format.  The default format, if not specified, is qcow2.
 
@@ -716,7 +720,7 @@ Arguments:
 
 actions array:
     - "type": the operation to perform.  The only supported
-      value is "blockdev-snapshot-sync". (json-string)
+      values are "blockdev-snapshot-sync" and "mirror". (json-string)
     - "data": a dictionary.  The contents depend on the value
       of "type".  When "type" is "blockdev-snapshot-sync":
       - "device": device name to snapshot (json-string)
@@ -724,6 +728,12 @@ actions array:
       - "format": format of new image (json-string, optional)
       - "mode": whether and how QEMU should create the snapshot file
         (NewImageMode, optional, default "absolute-paths")
+      When "type" is "drive-mirror":
+      - "device": device name to snapshot (json-string)
+      - "target": name of destination image file (json-string)
+      - "format": format of new image (json-string, optional)
+      - "mode": how QEMU should look for an existing image file
+        (NewImageMode, optional, default "absolute-paths")
 
 Example:
 
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH v4 09/10] add drive-mirror command and HMP equivalent
  2012-03-06 17:55 [Qemu-devel] [PATCH v4 00/10] Mirrored block writes Paolo Bonzini
                   ` (7 preceding siblings ...)
  2012-03-06 17:56 ` [Qemu-devel] [PATCH v4 08/10] add mirroring to transaction Paolo Bonzini
@ 2012-03-06 17:56 ` Paolo Bonzini
  2012-03-06 17:56 ` [Qemu-devel] [PATCH v4 10/10] Add the drive-reopen command Paolo Bonzini
  2012-03-09 15:36 ` [Qemu-devel] [PATCH v4 00/10] Mirrored block writes Kevin Wolf
  10 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2012-03-06 17:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, fsimonce, eblake, stefanha, lcapitulino

Since QMP transactions are supposed to take (pseudo) QMP commands,
add a standalone drive-mirror command.

The corresponding HMP command does not provide atomic snapshot+mirror,
but it can still be used together with image streaming.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 blockdev.c       |   15 +++++++++++++++
 hmp-commands.hx  |   22 ++++++++++++++++++++++
 hmp.c            |   28 ++++++++++++++++++++++++++++
 hmp.h            |    1 +
 qapi-schema.json |   27 +++++++++++++++++++++++++++
 qmp-commands.hx  |   33 +++++++++++++++++++++++++++++++++
 6 files changed, 126 insertions(+), 0 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 9d3a362..2958419 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -675,6 +675,21 @@ void qmp_blockdev_snapshot_sync(const char *device, const char *snapshot_file,
                        errp);
 }
 
+void qmp_drive_mirror(const char *device, const char *target,
+                      bool has_format, const char *format,
+                      bool has_mode, enum NewImageMode mode, Error **errp)
+{
+    BlockdevMirror mirror = {
+        .device = (char *) device,
+        .target = (char *) target,
+        .has_format = has_format,
+        .format = (char *) format,
+        .has_mode = has_mode,
+        .mode = mode,
+    };
+    blockdev_do_action(BLOCKDEV_ACTION_KIND_DRIVE_MIRROR, &mirror, errp);
+}
+
 
 /* New and old BlockDriverState structs for group snapshots */
 typedef struct BlkTransactionStates {
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 6980214..9c49c33 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -903,6 +903,28 @@ Snapshot device, using snapshot file as target if provided
 ETEXI
 
     {
+        .name       = "drive_mirror",
+        .args_type  = "reuse:-n,no-backing:-s,device:B,target:s,format:s?",
+        .params     = "[-n] [-s] device [new-image-file] [format]",
+        .help       = "initiates live storage\n\t\t\t"
+                      "migration for a device. New writes are mirrored to the\n\t\t\t"
+                      "specified new image file, and the block_stream\n\t\t\t"
+                      "command can then be started.\n\t\t\t"
+                      "The default format is qcow2.  The -n flag requests QEMU\n\t\t\t"
+                      "to reuse the image found in new-image-file, instead of\n\t\t\t"
+                      "recreating it from scratch.  The -s flag requests QEMU\n\t\t\t"
+                      "to create a root image, that does not have the current\n\t\t\t"
+                      "image as the backing file.",
+        .mhandler.cmd = hmp_drive_mirror,
+    },
+STEXI
+@item drive_mirror
+@findex drive_mirror
+Start mirroring a block device's writes to a new destination,
+using the specified target.
+ETEXI
+
+    {
         .name       = "drive_add",
         .args_type  = "pci_addr:s,opts:s",
         .params     = "[[<domain>:]<bus>:]<slot>\n"
diff --git a/hmp.c b/hmp.c
index 290c43d..e706db9 100644
--- a/hmp.c
+++ b/hmp.c
@@ -687,6 +687,34 @@ void hmp_block_resize(Monitor *mon, const QDict *qdict)
     hmp_handle_error(mon, &errp);
 }
 
+void hmp_drive_mirror(Monitor *mon, const QDict *qdict)
+{
+    const char *device = qdict_get_str(qdict, "device");
+    const char *filename = qdict_get_try_str(qdict, "target");
+    const char *format = qdict_get_try_str(qdict, "format");
+    int reuse = qdict_get_try_bool(qdict, "reuse", 0);
+    int no_backing = qdict_get_try_bool(qdict, "no-backing", 0);
+    enum NewImageMode mode;
+    Error *errp = NULL;
+
+    if (!filename) {
+        error_set(&errp, QERR_MISSING_PARAMETER, "target");
+        hmp_handle_error(mon, &errp);
+        return;
+    }
+
+    if (reuse) {
+        mode = NEW_IMAGE_MODE_EXISTING;
+    } else if (no_backing) {
+        mode = NEW_IMAGE_MODE_NO_BACKING_FILE;
+    } else {
+        mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
+    }
+
+    qmp_drive_mirror(device, filename, !!format, format, true, mode, &errp);
+    hmp_handle_error(mon, &errp);
+}
+
 void hmp_snapshot_blkdev(Monitor *mon, const QDict *qdict)
 {
     const char *device = qdict_get_str(qdict, "device");
diff --git a/hmp.h b/hmp.h
index 5409464..5352f00 100644
--- a/hmp.h
+++ b/hmp.h
@@ -48,6 +48,7 @@ void hmp_block_passwd(Monitor *mon, const QDict *qdict);
 void hmp_balloon(Monitor *mon, const QDict *qdict);
 void hmp_block_resize(Monitor *mon, const QDict *qdict);
 void hmp_snapshot_blkdev(Monitor *mon, const QDict *qdict);
+void hmp_drive_mirror(Monitor *mon, const QDict *qdict);
 void hmp_migrate_cancel(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict);
diff --git a/qapi-schema.json b/qapi-schema.json
index 75ac978..4cebf78 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1263,6 +1263,33 @@
   'returns': 'str' } 
 
 ##
+# @drive-mirror
+#
+# Start mirroring a block device's writes to a new destination.
+#
+# @device:  the name of the device whose writes should be mirrored.
+#
+# @target: the target of the new image. If the file exists, or if it
+#          is a device, the existing file/device will be used as the new
+#          destination.  If it does not exist, a new file will be created.
+#
+# @format: #optional the format of the new destination, default is 'qcow2'.
+#
+# @mode: #optional whether and how QEMU should create a new image, default is
+# 'absolute-paths'.
+#
+# Returns: nothing on success
+#          If @device is not a valid block device, DeviceNotFound
+#          If @target can't be opened, OpenFileFailed
+#          If @format is invalid, InvalidBlockFormat
+#
+# Since 1.1
+##
+{ 'command': 'drive-mirror',
+  'data': { 'device': 'str', 'target': 'str', '*format': 'str',
+            '*mode': 'NewImageMode'} }
+
+##
 # @migrate_cancel
 #
 # Cancel the current executing migration process.
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 0438632..122ebe7 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -785,6 +785,39 @@ Example:
 EQMP
 
     {
+        .name       = "drive-mirror",
+        .args_type  = "device:B,snapshot-file:s,mode:s?,format:s?",
+        .mhandler.cmd_new = qmp_marshal_input_drive_mirror,
+    },
+
+SQMP
+drive-mirror
+------------
+
+Start mirroring a block device's writes to a new destination. target
+specifies the target of the new image. If the file exists, or if it is
+a device, it will be used as the new destination for writes. If does not
+exist, a new file will be created. format specifies the format of the
+mirror image, default is qcow2.
+
+Arguments:
+
+- "device": device name to operate on (json-string)
+- "target": name of new image file (json-string)
+- "format": format of new image (json-string, optional)
+- "mode": how an image file should be created into the target
+  file/device (NewImageMode, optional, default 'absolute-paths')
+
+Example:
+
+-> { "execute": "drive-mirror", "arguments": { "device": "ide-hd0",
+                                               "target": "/some/place/my-image",
+                                               "format": "qcow2" } }
+<- { "return": {} }
+
+EQMP
+
+    {
         .name       = "balloon",
         .args_type  = "value:M",
         .mhandler.cmd_new = qmp_marshal_input_balloon,
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH v4 10/10] Add the drive-reopen command
  2012-03-06 17:55 [Qemu-devel] [PATCH v4 00/10] Mirrored block writes Paolo Bonzini
                   ` (8 preceding siblings ...)
  2012-03-06 17:56 ` [Qemu-devel] [PATCH v4 09/10] add drive-mirror command and HMP equivalent Paolo Bonzini
@ 2012-03-06 17:56 ` Paolo Bonzini
  2012-03-13 20:48   ` Eric Blake
  2012-03-09 15:36 ` [Qemu-devel] [PATCH v4 00/10] Mirrored block writes Kevin Wolf
  10 siblings, 1 reply; 21+ messages in thread
From: Paolo Bonzini @ 2012-03-06 17:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, fsimonce, eblake, stefanha, lcapitulino

From: Federico Simoncelli <fsimonce@redhat.com>

Signed-off-by: Federico Simoncelli <fsimonce@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 blockdev.c       |   63 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hmp-commands.hx  |   16 +++++++++++++
 hmp.c            |   11 +++++++++
 hmp.h            |    1 +
 qapi-schema.json |   22 ++++++++++++++++++
 qmp-commands.hx  |   30 +++++++++++++++++++++++++
 6 files changed, 143 insertions(+), 0 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 2958419..df8ce14 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -646,6 +646,69 @@ void do_commit(Monitor *mon, const QDict *qdict)
     }
 }
 
+static void change_blockdev_image(const char *device, const char *new_image_file,
+                                  const char *format, Error **errp)
+{
+    BlockDriverState *bs;
+    BlockDriver *drv, *old_drv, *proto_drv;
+    int ret = 0;
+    int flags;
+    char old_filename[1024];
+
+    bs = bdrv_find(device);
+    if (!bs) {
+        error_set(errp, QERR_DEVICE_NOT_FOUND, device);
+        return;
+    }
+    if (bdrv_in_use(bs)) {
+        error_set(errp, QERR_DEVICE_IN_USE, device);
+        return;
+    }
+
+    pstrcpy(old_filename, sizeof(old_filename), bs->filename);
+
+    old_drv = bs->drv;
+    flags = bs->open_flags;
+
+    drv = bdrv_find_format(format);
+    if (!drv) {
+        error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
+        return;
+    }
+
+    proto_drv = bdrv_find_protocol(new_image_file);
+    if (!proto_drv) {
+        error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
+        return;
+    }
+
+    bdrv_drain_all();
+    bdrv_flush(bs);
+
+    bdrv_close(bs);
+    ret = bdrv_open(bs, new_image_file, flags, drv);
+    /*
+     * If reopening the image file we just created fails, fall back
+     * and try to re-open the original image. If that fails too, we
+     * are in serious trouble.
+     */
+    if (ret != 0) {
+        ret = bdrv_open(bs, old_filename, flags, old_drv);
+        if (ret != 0) {
+            error_set(errp, QERR_OPEN_FILE_FAILED, old_filename);
+        } else {
+            error_set(errp, QERR_OPEN_FILE_FAILED, new_image_file);
+        }
+    }
+}
+
+void qmp_drive_reopen(const char *device, const char *new_image_file,
+                      bool has_format, const char *format, Error **errp)
+{
+    change_blockdev_image(device, new_image_file,
+                          has_format ? format : "qcow2", errp);
+}
+
 static void blockdev_do_action(int kind, void *data, Error **errp)
 {
     BlockdevAction action;
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 9c49c33..4afde71 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -925,6 +925,22 @@ using the specified target.
 ETEXI
 
     {
+        .name       = "drive_reopen",
+        .args_type  = "device:B,new-image-file:s,format:s?",
+        .params     = "device new-image-file [format]",
+        .help       = "Assigns a new image file to a device.\n\t\t\t"
+                      "The image will be opened using the format\n\t\t\t"
+                      "specified or 'qcow2' by default.\n\t\t\t",
+        .mhandler.cmd = hmp_drive_reopen,
+    },
+
+STEXI
+@item drive_reopen
+@findex drive_reopen
+Assigns a new image file to a device.
+ETEXI
+
+    {
         .name       = "drive_add",
         .args_type  = "pci_addr:s,opts:s",
         .params     = "[[<domain>:]<bus>:]<slot>\n"
diff --git a/hmp.c b/hmp.c
index e706db9..ec41d83 100644
--- a/hmp.c
+++ b/hmp.c
@@ -738,6 +738,17 @@ void hmp_snapshot_blkdev(Monitor *mon, const QDict *qdict)
     hmp_handle_error(mon, &errp);
 }
 
+void hmp_drive_reopen(Monitor *mon, const QDict *qdict)
+{
+    const char *device = qdict_get_str(qdict, "device");
+    const char *filename = qdict_get_str(qdict, "new-image-file");
+    const char *format = qdict_get_try_str(qdict, "format");
+    Error *errp = NULL;
+
+    qmp_drive_reopen(device, filename, !!format, format, &errp);
+    hmp_handle_error(mon, &errp);
+}
+
 void hmp_migrate_cancel(Monitor *mon, const QDict *qdict)
 {
     qmp_migrate_cancel(NULL);
diff --git a/hmp.h b/hmp.h
index 5352f00..648a84f 100644
--- a/hmp.h
+++ b/hmp.h
@@ -49,6 +49,7 @@ void hmp_balloon(Monitor *mon, const QDict *qdict);
 void hmp_block_resize(Monitor *mon, const QDict *qdict);
 void hmp_snapshot_blkdev(Monitor *mon, const QDict *qdict);
 void hmp_drive_mirror(Monitor *mon, const QDict *qdict);
+void hmp_drive_reopen(Monitor *mon, const QDict *qdict);
 void hmp_migrate_cancel(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict);
diff --git a/qapi-schema.json b/qapi-schema.json
index 4cebf78..21eb256 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1235,6 +1235,28 @@
             '*mode': 'NewImageMode'} }
 
 ##
+# @drive-reopen
+#
+# Assigns a new image file to a device.
+#
+# @device: the name of the device for which we are changing the image file.
+#
+# @new-image-file: the target of the new image. If the file doesn't exists the
+#                  command will fail.
+#
+# @format: #optional the format of the new image, default is 'qcow2'.
+#
+# Returns: nothing on success
+#          If @device is not a valid block device, DeviceNotFound
+#          If @new-image-file can't be opened, OpenFileFailed
+#          If @format is invalid, InvalidBlockFormat
+#
+# Since 1.1
+##
+{ 'command': 'drive-reopen',
+  'data': { 'device': 'str', 'new-image-file': 'str', '*format': 'str' } }
+
+##
 # @human-monitor-command:
 #
 # Execute a command on the human monitor and return the output.
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 122ebe7..7fc30c2 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -818,6 +818,36 @@ Example:
 EQMP
 
     {
+        .name       = "drive-reopen",
+        .args_type  = "device:B,new-image-file:s,format:s?",
+        .mhandler.cmd_new = qmp_marshal_input_drive_reopen,
+    },
+
+SQMP
+drive-reopen
+------------
+
+Assigns a new image file to a device. Except extremely rare cases where the
+guest is expecting the drive to change its content, the new image should
+contain the same data of the current one.  One use case is to terminate
+a drive-mirror command.
+
+Arguments:
+
+- "device": device name to operate on (json-string)
+- "new-image-file": name of new image file (json-string)
+- "format": format of new image (json-string, optional)
+
+Example:
+
+-> { "execute": "drive-reopen", "arguments": {"device": "ide-hd0",
+                                    "new-image-file": "/some/place/my-image",
+                                    "format": "qcow2" } }
+<- { "return": {} }
+
+EQMP
+
+    {
         .name       = "balloon",
         .args_type  = "value:M",
         .mhandler.cmd_new = qmp_marshal_input_balloon,
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v4 00/10] Mirrored block writes
  2012-03-06 17:55 [Qemu-devel] [PATCH v4 00/10] Mirrored block writes Paolo Bonzini
                   ` (9 preceding siblings ...)
  2012-03-06 17:56 ` [Qemu-devel] [PATCH v4 10/10] Add the drive-reopen command Paolo Bonzini
@ 2012-03-09 15:36 ` Kevin Wolf
  10 siblings, 0 replies; 21+ messages in thread
From: Kevin Wolf @ 2012-03-09 15:36 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: fsimonce, eblake, qemu-devel, stefanha, lcapitulino

Am 06.03.2012 18:55, schrieb Paolo Bonzini:
> v4 includes Federico's drive-reopen (patch 10) command, fixes another
> small bug in Jeff's code (patch 2), and tweaks the union handling for
> older compilers.
> 
> v3 tested with the following scenarios, v4 only d/e:
> 
> a) mirror only
> 
> 1) create base.qcow2 and starat QEMU with it
> 
> 2) Execute the following QMP command
> 
> { "execute": "qmp_capabilities" }
> { "execute": "transaction", "arguments":
>   {'actions': [
>     { 'type': 'drive-mirror', 'data' :
>       { 'device': 'ide0-hd0', 'target': '/home/pbonzini/mirror.qcow2' } } ] } }
> { "execute": "cont" }
> 
> 3) hibernate the guest (this requires an IDE disk and -cpu kvm64,-kvmclock)
> 
> 4) restart the guest with mirror.qcow2
> 
> 
> b) Same as (a) with drive-mirror command.
> 
> 
> c) streaming to new image
> 
> 1) start QEMU with an existing image test.img
> 
> 2) execute the following HMP commands
> 
> drive_mirror -s ide0-hd0 /home/pbonzini/mirror.qed qed
> block_stream ide0-hd0
> 
> 3) shut down the guest (sorry, this took a looong time and I forgot to
> hibernate here)
> 
> 4) move away the base image
> 
> 5) restart the guest with mirror.qed
> 
> 
> d) atomic snapshot+mirror (QMP only):
> 
> 1) start QEMU with an existing image test.img
> 
> 2) Execute the following QMP command
> 
> { "execute": "qmp_capabilities" }
> { "execute": "transaction", "arguments":
>   {'actions': [
>     { 'type': 'blockdev-snapshot-sync', 'data' :
>       { 'device': 'ide0-hd0', 'snapshot-file': '/home/pbonzini/base.qcow2' } },
>     { 'type': 'drive-mirror', 'data' :
>       { 'device': 'ide0-hd0', 'target': '/home/pbonzini/mirror.qcow2' } } ] } }
> { "execute": "cont" }
> 
> 3) hibernate the guest (this requires an IDE disk and -cpu kvm64,-kvmclock)
> 
> 4) check that mirror.qcow2 has test.img as the base
> 
> 5) restart the guest with base.qcow2
> 
> 6) restart the guest with mirror.qcow2
> 
> 
> e) simple snapshot with snapshot_blkdev and snapshot_blkdev -n.
> 
> v3->v4:
> 	avoid literals with anonymous unions in them, add "void *data" to
> 	all unions (Mark Wu); add drive-reopen (Federico); new patch to fix
> 	use-after-free in group snapshots; patch splitting nits (Eric).
> 
> v2->v3:
>         replace reuse argument with NewImageMode enum and mode argument;
>         renamed all commands and actions; implemented blockdev-snapshot-sync
>         in terms of group snasphots; added new drive-mirror command and
>         HMP equivalent.
> 
> Federico Simoncelli (1):
>   Add the drive-reopen command
> 
> Marcelo Tosatti (1):
>   Add blkmirror block driver
> 
> Paolo Bonzini (8):
>   use QSIMPLEQ_FOREACH_SAFE when freeing list elements
>   fix format name for backing file
>   qapi: complete implementation of unions
>   rename blockdev-group-snapshot-sync
>   add mode field to blockdev-snapshot-sync transaction item
>   qmp: convert blockdev-snapshot-sync to a wrapper around transactions
>   add mirroring to transaction
>   add drive-mirror command and HMP equivalent
> 
>  Makefile.objs             |    2 +-
>  block/blkmirror.c         |  239 +++++++++++++++++++++++++++++++++++++++++++++
>  blockdev.c                |  202 +++++++++++++++++++++++++++-----------
>  hmp-commands.hx           |   47 ++++++++-
>  hmp.c                     |   45 ++++++++-
>  hmp.h                     |    2 +
>  qapi-schema-test.json     |   10 ++
>  qapi-schema.json          |  143 +++++++++++++++++++++++----
>  qmp-commands.hx           |  135 +++++++++++++++++++++-----
>  scripts/qapi-types.py     |    6 +
>  scripts/qapi-visit.py     |   31 ++++++-
>  test-qmp-input-visitor.c  |   18 ++++
>  test-qmp-output-visitor.c |   34 +++++++
>  13 files changed, 804 insertions(+), 110 deletions(-)
>  create mode 100644 block/blkmirror.c

Thanks, applied patches 1-6 for now. I'll review the mirroring part later.

Kevin

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v4 10/10] Add the drive-reopen command
  2012-03-06 17:56 ` [Qemu-devel] [PATCH v4 10/10] Add the drive-reopen command Paolo Bonzini
@ 2012-03-13 20:48   ` Eric Blake
  2012-03-14  0:14     ` Federico Simoncelli
  2012-03-14  9:17     ` Kevin Wolf
  0 siblings, 2 replies; 21+ messages in thread
From: Eric Blake @ 2012-03-13 20:48 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, fsimonce, qemu-devel, stefanha, lcapitulino

[-- Attachment #1: Type: text/plain, Size: 2955 bytes --]

On 03/06/2012 10:56 AM, Paolo Bonzini wrote:
> From: Federico Simoncelli <fsimonce@redhat.com>
> 
> Signed-off-by: Federico Simoncelli <fsimonce@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

>  ##
> +# @drive-reopen
> +#
> +# Assigns a new image file to a device.
> +#
> +# @device: the name of the device for which we are changing the image file.
> +#
> +# @new-image-file: the target of the new image. If the file doesn't exists the
> +#                  command will fail.
> +#
> +# @format: #optional the format of the new image, default is 'qcow2'.
> +#
> +# Returns: nothing on success
> +#          If @device is not a valid block device, DeviceNotFound
> +#          If @new-image-file can't be opened, OpenFileFailed
> +#          If @format is invalid, InvalidBlockFormat
> +#
> +# Since 1.1
> +##
> +{ 'command': 'drive-reopen',
> +  'data': { 'device': 'str', 'new-image-file': 'str', '*format': 'str' } }

I still think we need a 'drive-reopen' action included in 'transaction',
as an 11/10 on this series.  For disk migration, it is true that you can
migrate one disk at a time, and therefore only need to reopen one disk
at a time, to get the guarantee that for a single disk image, the
current state of that image will be guaranteed to be consistent using
only one storage domain.

But since the API allows the creation of two mirrors in one command, I'm
worried that someone will try to start a mirror on two disks at once,
but then be stuck doing two separate 'drive-reopen' commands.  If the
first succeeds but the second fails, you have now stranded the qemu
process across two storage domains, which is exactly what we were trying
to avoid in the first place by inventing transactions.  That is, even if
all disks are individually consistent in a single domain, the act of
migrating then reopening one disk at a time means you will have a window
where disk 1 and disk 2 are opened on different storage domains.

Besides, I'm planning on implementing libvirt support for the
'drive-reopen' command by adding a flag to virDomainSnapshotDelete
(basically, the presence of the flag states that for all mirrored disks
in a given snapshot, libvirt will then issue a drive-reopen that pivots
qemu over to the mirror, and finally delete the snapshot now that
mirroring is no longer needed).  With separate commands, if drive-reopen
on disk 1 succeeds, then drive-reopen on disk 2 fails, I can attempt a
rollback by doing another drive-reopen on disk 1; but the rollback will
be incomplete since I have lost the ability to reopen the mirroring.  I
would much rather issue a 'transaction' with multiple reopen commands,
and knowing that either all disks were reopened and the mirrors
discarded, or that none were reopened and the mirroring remains intact.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v4 10/10] Add the drive-reopen command
  2012-03-13 20:48   ` Eric Blake
@ 2012-03-14  0:14     ` Federico Simoncelli
  2012-03-14  9:34       ` Kevin Wolf
  2012-03-14  9:17     ` Kevin Wolf
  1 sibling, 1 reply; 21+ messages in thread
From: Federico Simoncelli @ 2012-03-14  0:14 UTC (permalink / raw)
  To: Eric Blake; +Cc: kwolf, Paolo Bonzini, lcapitulino, qemu-devel, stefanha

----- Original Message -----
> From: "Eric Blake" <eblake@redhat.com>
> To: "Paolo Bonzini" <pbonzini@redhat.com>
> Cc: kwolf@redhat.com, fsimonce@redhat.com, qemu-devel@nongnu.org, stefanha@linux.vnet.ibm.com, lcapitulino@redhat.com
> Sent: Tuesday, March 13, 2012 9:48:10 PM
> Subject: Re: [Qemu-devel] [PATCH v4 10/10] Add the drive-reopen command
> 
> On 03/06/2012 10:56 AM, Paolo Bonzini wrote:
> > From: Federico Simoncelli <fsimonce@redhat.com>
> > 
> > Signed-off-by: Federico Simoncelli <fsimonce@redhat.com>
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> 
> >  ##
> > +# @drive-reopen
> > +#
> > +# Assigns a new image file to a device.
> > +#
> > +# @device: the name of the device for which we are changing the
> > image file.
> > +#
> > +# @new-image-file: the target of the new image. If the file
> > doesn't exists the
> > +#                  command will fail.
> > +#
> > +# @format: #optional the format of the new image, default is
> > 'qcow2'.
> > +#
> > +# Returns: nothing on success
> > +#          If @device is not a valid block device, DeviceNotFound
> > +#          If @new-image-file can't be opened, OpenFileFailed
> > +#          If @format is invalid, InvalidBlockFormat
> > +#
> > +# Since 1.1
> > +##
> > +{ 'command': 'drive-reopen',
> > +  'data': { 'device': 'str', 'new-image-file': 'str', '*format':
> > 'str' } }
> 
> I still think we need a 'drive-reopen' action included in
> 'transaction',
> as an 11/10 on this series.  For disk migration, it is true that you
> can
> migrate one disk at a time, and therefore only need to reopen one
> disk
> at a time, to get the guarantee that for a single disk image, the
> current state of that image will be guaranteed to be consistent using
> only one storage domain.

I'm not sure if this was already addressed on this mailing list but
the main problem is that as general rule a qcow file cannot be opened
in r/w mode twice. I believe there was only one exception to that
with the live migration and it generated several issues.

That said, reopen is really hard to be implemented as a transaction
without breaking that rule. For example in the blkmirror case you'd
need to open the destination image in r/w while the mirroring is in
action (already having the same image in r/w mode).

There are several solutions here but they are either really hard to
implement or non definitive. For example:

* We could try to implement the reopen command for each special case,
  eg: blkmirror, reopening the same image, etc... and in such cases
  reusing the same bs that we already have. The downside is that this
  command will be coupled with all this special cases.

* We could use the transaction APIs without actually making it
  transaction (if we fail in the middle we can't rollback). The only
  advantage of this is that we'd provide a consistent API to libvirt
  and we would postpone the problem to the future. Anyway I strongly
  discourage this as it's completely unsafe and it's going to break
  the transaction semantic. Moreover it's a solution that relies too
  much on the hope of finding something appropriate in the future.

* We could leave it as it is, a distinct command that is not part of
  the transaction and that it's closing the old image before opening
  the new one.

> But since the API allows the creation of two mirrors in one command,
> I'm
> worried that someone will try to start a mirror on two disks at once,
> but then be stuck doing two separate 'drive-reopen' commands.  If the
> first succeeds but the second fails, you have now stranded the qemu
> process across two storage domains, which is exactly what we were
> trying
> to avoid in the first place by inventing transactions.  That is, even
> if
> all disks are individually consistent in a single domain, the act of
> migrating then reopening one disk at a time means you will have a
> window
> where disk 1 and disk 2 are opened on different storage domains.

This is not completely correct, the main intent was to not spread one
image chain across two storage domains (making it incomplete if one of
them was missing). In the next oVirt release a VM can have different
disks on different storage domains, so this wouldn't be a special case
but just a normal situation.

-- 
Federico

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v4 10/10] Add the drive-reopen command
  2012-03-13 20:48   ` Eric Blake
  2012-03-14  0:14     ` Federico Simoncelli
@ 2012-03-14  9:17     ` Kevin Wolf
  2012-03-14  9:19       ` Paolo Bonzini
  1 sibling, 1 reply; 21+ messages in thread
From: Kevin Wolf @ 2012-03-14  9:17 UTC (permalink / raw)
  To: Eric Blake; +Cc: Paolo Bonzini, fsimonce, qemu-devel, stefanha, lcapitulino

Am 13.03.2012 21:48, schrieb Eric Blake:
> On 03/06/2012 10:56 AM, Paolo Bonzini wrote:
>> From: Federico Simoncelli <fsimonce@redhat.com>
>>
>> Signed-off-by: Federico Simoncelli <fsimonce@redhat.com>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> 
>>  ##
>> +# @drive-reopen
>> +#
>> +# Assigns a new image file to a device.
>> +#
>> +# @device: the name of the device for which we are changing the image file.
>> +#
>> +# @new-image-file: the target of the new image. If the file doesn't exists the
>> +#                  command will fail.
>> +#
>> +# @format: #optional the format of the new image, default is 'qcow2'.
>> +#
>> +# Returns: nothing on success
>> +#          If @device is not a valid block device, DeviceNotFound
>> +#          If @new-image-file can't be opened, OpenFileFailed
>> +#          If @format is invalid, InvalidBlockFormat
>> +#
>> +# Since 1.1
>> +##
>> +{ 'command': 'drive-reopen',
>> +  'data': { 'device': 'str', 'new-image-file': 'str', '*format': 'str' } }
> 
> I still think we need a 'drive-reopen' action included in 'transaction',
> as an 11/10 on this series.

If we want to do this,  it needs to be the same patch, as we couple the
transaction actions with top-level commands as long as there is no other
way to discover the possible actions. And it probably makes more sense
anyway, because the top-level command would be just a thin wrapper
around the transactional one.

Only problem is that just moving the code there doesn't make it suitable
for a transaction and doing an all-or-nothing drive-reopen isn't quite
trivial.

Kevin

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v4 10/10] Add the drive-reopen command
  2012-03-14  9:17     ` Kevin Wolf
@ 2012-03-14  9:19       ` Paolo Bonzini
  2012-03-14  9:35         ` Kevin Wolf
  0 siblings, 1 reply; 21+ messages in thread
From: Paolo Bonzini @ 2012-03-14  9:19 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: fsimonce, Eric Blake, qemu-devel, stefanha, lcapitulino

Il 14/03/2012 10:17, Kevin Wolf ha scritto:
> If we want to do this,  it needs to be the same patch, as we couple the
> transaction actions with top-level commands as long as there is no other
> way to discover the possible actions. And it probably makes more sense
> anyway, because the top-level command would be just a thin wrapper
> around the transactional one.
> 
> Only problem is that just moving the code there doesn't make it suitable
> for a transaction and doing an all-or-nothing drive-reopen isn't quite
> trivial.

We can also add a "transactionable" item to query-commands.  If we do it
after 1.1, in absence of it only blockdev-snapshot-sync and drive-mirror
are transactionable.  Otherwise we can do it now.

Paolo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v4 10/10] Add the drive-reopen command
  2012-03-14  0:14     ` Federico Simoncelli
@ 2012-03-14  9:34       ` Kevin Wolf
  2012-03-14 13:11         ` Eric Blake
  2012-03-14 13:29         ` Federico Simoncelli
  0 siblings, 2 replies; 21+ messages in thread
From: Kevin Wolf @ 2012-03-14  9:34 UTC (permalink / raw)
  To: Federico Simoncelli
  Cc: stefanha, Markus Armbruster, qemu-devel, lcapitulino,
	Paolo Bonzini, Eric Blake

Am 14.03.2012 01:14, schrieb Federico Simoncelli:
> ----- Original Message -----
>> From: "Eric Blake" <eblake@redhat.com>
>> To: "Paolo Bonzini" <pbonzini@redhat.com>
>> Cc: kwolf@redhat.com, fsimonce@redhat.com, qemu-devel@nongnu.org, stefanha@linux.vnet.ibm.com, lcapitulino@redhat.com
>> Sent: Tuesday, March 13, 2012 9:48:10 PM
>> Subject: Re: [Qemu-devel] [PATCH v4 10/10] Add the drive-reopen command
>>
>> On 03/06/2012 10:56 AM, Paolo Bonzini wrote:
>>> From: Federico Simoncelli <fsimonce@redhat.com>
>>>
>>> Signed-off-by: Federico Simoncelli <fsimonce@redhat.com>
>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>
>>>  ##
>>> +# @drive-reopen
>>> +#
>>> +# Assigns a new image file to a device.
>>> +#
>>> +# @device: the name of the device for which we are changing the
>>> image file.
>>> +#
>>> +# @new-image-file: the target of the new image. If the file
>>> doesn't exists the
>>> +#                  command will fail.
>>> +#
>>> +# @format: #optional the format of the new image, default is
>>> 'qcow2'.
>>> +#
>>> +# Returns: nothing on success
>>> +#          If @device is not a valid block device, DeviceNotFound
>>> +#          If @new-image-file can't be opened, OpenFileFailed
>>> +#          If @format is invalid, InvalidBlockFormat
>>> +#
>>> +# Since 1.1
>>> +##
>>> +{ 'command': 'drive-reopen',
>>> +  'data': { 'device': 'str', 'new-image-file': 'str', '*format':
>>> 'str' } }
>>
>> I still think we need a 'drive-reopen' action included in
>> 'transaction',
>> as an 11/10 on this series.  For disk migration, it is true that you
>> can
>> migrate one disk at a time, and therefore only need to reopen one
>> disk
>> at a time, to get the guarantee that for a single disk image, the
>> current state of that image will be guaranteed to be consistent using
>> only one storage domain.
> 
> I'm not sure if this was already addressed on this mailing list but
> the main problem is that as general rule a qcow file cannot be opened
> in r/w mode twice. I believe there was only one exception to that
> with the live migration and it generated several issues.

In fact the same is true for any image. There are just some special
cases that happen to work anyway, but using them is not more than a hack.

> That said, reopen is really hard to be implemented as a transaction
> without breaking that rule. For example in the blkmirror case you'd
> need to open the destination image in r/w while the mirroring is in
> action (already having the same image in r/w mode).
> 
> There are several solutions here but they are either really hard to
> implement or non definitive. For example:
> 
> * We could try to implement the reopen command for each special case,
>   eg: blkmirror, reopening the same image, etc... and in such cases
>   reusing the same bs that we already have. The downside is that this
>   command will be coupled with all this special cases.

The problem we're trying to solve is that we have a graph of open
BlockDriverStates (connected by bs->file, backing file relations etc.)
and we want to transform it into a different graph of open BDSs
atomically, reusing zero, one or many of the existing BDSs, and possibly
with changed properties (cache mode, read-only, etc.)

What we can have reasonably easily (there are patches floating around,
they just need to be completed), is a bdrv_reopen() that changes flags
on one given BDS, without changing the file it's backed by. This is
already broken up into prepare/commit/abort stages as we need it to
reopen VMDK's split images safely.

In theory this should be enough to build the new graph by opening yet
unused BDSs, preparing the reopen of reused ones and only if all of that
was successful, committing the bdrv_reopen and changing the relations
between the nodes. I hope that at the same time it's clear that this
isn't exactly trivial to implement.

> * We could use the transaction APIs without actually making it
>   transaction (if we fail in the middle we can't rollback). The only
>   advantage of this is that we'd provide a consistent API to libvirt
>   and we would postpone the problem to the future. Anyway I strongly
>   discourage this as it's completely unsafe and it's going to break
>   the transaction semantic. Moreover it's a solution that relies too
>   much on the hope of finding something appropriate in the future.

This is not an option. Advertising transactional behaviour and not
implementing it is just plain wrong.

> * We could leave it as it is, a distinct command that is not part of
>   the transaction and that it's closing the old image before opening
>   the new one.

Yes, this would be the short-term preliminary solution. I would tend to
leave it to downstreams to implement it as an extension, though.

> This is not completely correct, the main intent was to not spread one
> image chain across two storage domains (making it incomplete if one of
> them was missing). In the next oVirt release a VM can have different
> disks on different storage domains, so this wouldn't be a special case
> but just a normal situation.

The problem with this kind of argument is that we're not developing only
for oVirt, but need to look for what makes sense for any management tool
(or even just direct users of qemu).

Kevin

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v4 10/10] Add the drive-reopen command
  2012-03-14  9:19       ` Paolo Bonzini
@ 2012-03-14  9:35         ` Kevin Wolf
  0 siblings, 0 replies; 21+ messages in thread
From: Kevin Wolf @ 2012-03-14  9:35 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: fsimonce, Eric Blake, qemu-devel, stefanha, lcapitulino

Am 14.03.2012 10:19, schrieb Paolo Bonzini:
> Il 14/03/2012 10:17, Kevin Wolf ha scritto:
>> If we want to do this,  it needs to be the same patch, as we couple the
>> transaction actions with top-level commands as long as there is no other
>> way to discover the possible actions. And it probably makes more sense
>> anyway, because the top-level command would be just a thin wrapper
>> around the transactional one.
>>
>> Only problem is that just moving the code there doesn't make it suitable
>> for a transaction and doing an all-or-nothing drive-reopen isn't quite
>> trivial.
> 
> We can also add a "transactionable" item to query-commands.  If we do it
> after 1.1, in absence of it only blockdev-snapshot-sync and drive-mirror
> are transactionable.  Otherwise we can do it now.

Sounds reasonable. If we want to do it, I would prefer to do it right now.

Kevin

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v4 10/10] Add the drive-reopen command
  2012-03-14  9:34       ` Kevin Wolf
@ 2012-03-14 13:11         ` Eric Blake
  2012-03-14 14:30           ` Kevin Wolf
  2012-03-14 13:29         ` Federico Simoncelli
  1 sibling, 1 reply; 21+ messages in thread
From: Eric Blake @ 2012-03-14 13:11 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: stefanha, Markus Armbruster, qemu-devel, lcapitulino,
	Federico Simoncelli, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 4591 bytes --]

On 03/14/2012 03:34 AM, Kevin Wolf wrote:

>> That said, reopen is really hard to be implemented as a transaction
>> without breaking that rule. For example in the blkmirror case you'd
>> need to open the destination image in r/w while the mirroring is in
>> action (already having the same image in r/w mode).
>>
>> There are several solutions here but they are either really hard to
>> implement or non definitive. For example:
>>
>> * We could try to implement the reopen command for each special case,
>>   eg: blkmirror, reopening the same image, etc... and in such cases
>>   reusing the same bs that we already have. The downside is that this
>>   command will be coupled with all this special cases.
> 
> The problem we're trying to solve is that we have a graph of open
> BlockDriverStates (connected by bs->file, backing file relations etc.)
> and we want to transform it into a different graph of open BDSs
> atomically, reusing zero, one or many of the existing BDSs, and possibly
> with changed properties (cache mode, read-only, etc.)
> 
> What we can have reasonably easily (there are patches floating around,
> they just need to be completed), is a bdrv_reopen() that changes flags
> on one given BDS, without changing the file it's backed by. This is
> already broken up into prepare/commit/abort stages as we need it to
> reopen VMDK's split images safely.
> 
> In theory this should be enough to build the new graph by opening yet
> unused BDSs, preparing the reopen of reused ones and only if all of that
> was successful, committing the bdrv_reopen and changing the relations
> between the nodes. I hope that at the same time it's clear that this
> isn't exactly trivial to implement.

Agreed that 1) it looks implementable, and 2) it does not look trivial.

> 
>> * We could use the transaction APIs without actually making it
>>   transaction (if we fail in the middle we can't rollback). The only
>>   advantage of this is that we'd provide a consistent API to libvirt
>>   and we would postpone the problem to the future. Anyway I strongly
>>   discourage this as it's completely unsafe and it's going to break
>>   the transaction semantic. Moreover it's a solution that relies too
>>   much on the hope of finding something appropriate in the future.
> 
> This is not an option. Advertising transactional behaviour and not
> implementing it is just plain wrong.

Absolutely concur.  From libvirt's perspective, I will be assuming that:

if 'transaction' exists, then it supports 'blockdev-snapshot-sync' and
'drive-mirror' (assuming that these are the only two commands that are
in 'transaction' in qemu 1.1), but nothing else is safe to use without a
further probe.  Then, if down the road, we go through the pain of making
'drive-reopen' transactionable, then there will be some new
introspection command (one idea was an optional argument to
query-commands which limits output to just the commands that can also
occur in a 'transaction'), and libvirt will have to use that
introspection before using the additional features.

> 
>> * We could leave it as it is, a distinct command that is not part of
>>   the transaction and that it's closing the old image before opening
>>   the new one.
> 
> Yes, this would be the short-term preliminary solution. I would tend to
> leave it to downstreams to implement it as an extension, though.

Correct - I'm fine with libvirt using the direct, non-atomic,
'drive-reopen' for the immediate needs of oVirt while we work on a more
complete solution for down the road.

> 
>> This is not completely correct, the main intent was to not spread one
>> image chain across two storage domains (making it incomplete if one of
>> them was missing). In the next oVirt release a VM can have different
>> disks on different storage domains, so this wouldn't be a special case
>> but just a normal situation.
> 
> The problem with this kind of argument is that we're not developing only
> for oVirt, but need to look for what makes sense for any management tool
> (or even just direct users of qemu).

Indeed - that's why I still think it's dangerous to not have an atomic
'drive-reopen', even if oVirt can be coded to work in spite of an
initial implementation being non-atomic.  But there's a difference
between wish-lists (atomic reopen) and practicality (what can we
implement now), and I'm not going to insist on the impossible :)

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v4 10/10] Add the drive-reopen command
  2012-03-14  9:34       ` Kevin Wolf
  2012-03-14 13:11         ` Eric Blake
@ 2012-03-14 13:29         ` Federico Simoncelli
  1 sibling, 0 replies; 21+ messages in thread
From: Federico Simoncelli @ 2012-03-14 13:29 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: stefanha, Markus Armbruster, qemu-devel, lcapitulino,
	Paolo Bonzini, Eric Blake

----- Original Message -----
> From: "Kevin Wolf" <kwolf@redhat.com>
> To: "Federico Simoncelli" <fsimonce@redhat.com>
> Cc: "Eric Blake" <eblake@redhat.com>, qemu-devel@nongnu.org, stefanha@linux.vnet.ibm.com, lcapitulino@redhat.com,
> "Paolo Bonzini" <pbonzini@redhat.com>, "Markus Armbruster" <armbru@redhat.com>
> Sent: Wednesday, March 14, 2012 10:34:08 AM
> Subject: Re: [Qemu-devel] [PATCH v4 10/10] Add the drive-reopen command
> 
> > * We could leave it as it is, a distinct command that is not part
> > of
> >   the transaction and that it's closing the old image before
> >   opening
> >   the new one.
> 
> Yes, this would be the short-term preliminary solution. I would tend
> to
> leave it to downstreams to implement it as an extension, though.
> 
> > This is not completely correct, the main intent was to not spread
> > one image chain across two storage domains (making it incomplete if one
> > of them was missing). In the next oVirt release a VM can have
> > different disks on different storage domains, so this wouldn't be a special
> > case but just a normal situation.
> 
> The problem with this kind of argument is that we're not developing
> only for oVirt, but need to look for what makes sense for any management
> tool (or even just direct users of qemu).

That is a general rule, and it's perfectly agreeable, but it has nothing to
do with what I was saying. Do you know any management tool or any reason to
forbid to a VM to have different disks on different storages?
In fact qemu-kvm doesn't even know where the images are coming from (what
particular iscsi connection or what particular NFS server).

What I was asking was a tool (mirroring) to avoid the split of an image
chain between multiple storages, and not a tool to avoid the ability to
have full image chains on different storages.

There is nothing oVirt specific here, it came into play only when I said that
for us is not a problem (for once :-).

-- 
Federico

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v4 10/10] Add the drive-reopen command
  2012-03-14 13:11         ` Eric Blake
@ 2012-03-14 14:30           ` Kevin Wolf
  0 siblings, 0 replies; 21+ messages in thread
From: Kevin Wolf @ 2012-03-14 14:30 UTC (permalink / raw)
  To: Eric Blake
  Cc: stefanha, Markus Armbruster, qemu-devel, lcapitulino,
	Federico Simoncelli, Paolo Bonzini

Am 14.03.2012 14:11, schrieb Eric Blake:
>>> * We could use the transaction APIs without actually making it
>>>   transaction (if we fail in the middle we can't rollback). The only
>>>   advantage of this is that we'd provide a consistent API to libvirt
>>>   and we would postpone the problem to the future. Anyway I strongly
>>>   discourage this as it's completely unsafe and it's going to break
>>>   the transaction semantic. Moreover it's a solution that relies too
>>>   much on the hope of finding something appropriate in the future.
>>
>> This is not an option. Advertising transactional behaviour and not
>> implementing it is just plain wrong.
> 
> Absolutely concur.  From libvirt's perspective, I will be assuming that:
> 
> if 'transaction' exists, then it supports 'blockdev-snapshot-sync' and
> 'drive-mirror' (assuming that these are the only two commands that are
> in 'transaction' in qemu 1.1), but nothing else is safe to use without a
> further probe.

I don't think it's good to conclude alone from the presence of
'transaction' that subcommands exist. Current master does have
'transaction' (and the 'blockdev-snapshot-sync' action), but not yet
'drive-mirror'.

>  Then, if down the road, we go through the pain of making
> 'drive-reopen' transactionable, then there will be some new
> introspection command (one idea was an optional argument to
> query-commands which limits output to just the commands that can also
> occur in a 'transaction'), and libvirt will have to use that
> introspection before using the additional features.

We can have the introspection command now. Either a new option like you
suggest, or even simpler, just add a 'transactionable': 'bool' to the
existing CommandInfo.

>>> * We could leave it as it is, a distinct command that is not part of
>>>   the transaction and that it's closing the old image before opening
>>>   the new one.
>>
>> Yes, this would be the short-term preliminary solution. I would tend to
>> leave it to downstreams to implement it as an extension, though.
> 
> Correct - I'm fine with libvirt using the direct, non-atomic,
> 'drive-reopen' for the immediate needs of oVirt while we work on a more
> complete solution for down the road.

One important difference that probably matters even now is that the
proposed drive-reopen can fail in the middle, where the old BDS is
closed, but the new one couldn't be opened. In this case the disk will
be lost. libvirt must handle this case in some way.

>>> This is not completely correct, the main intent was to not spread one
>>> image chain across two storage domains (making it incomplete if one of
>>> them was missing). In the next oVirt release a VM can have different
>>> disks on different storage domains, so this wouldn't be a special case
>>> but just a normal situation.
>>
>> The problem with this kind of argument is that we're not developing only
>> for oVirt, but need to look for what makes sense for any management tool
>> (or even just direct users of qemu).
> 
> Indeed - that's why I still think it's dangerous to not have an atomic
> 'drive-reopen', even if oVirt can be coded to work in spite of an
> initial implementation being non-atomic.  But there's a difference
> between wish-lists (atomic reopen) and practicality (what can we
> implement now), and I'm not going to insist on the impossible :)

Well, at least upstream you always have the option to reject a feature
instead of taking something broken. And I'm still not quite sure whether
to use this option with part of this series or not.

Kevin

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2012-03-14 14:27 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-06 17:55 [Qemu-devel] [PATCH v4 00/10] Mirrored block writes Paolo Bonzini
2012-03-06 17:55 ` [Qemu-devel] [PATCH v4 01/10] use QSIMPLEQ_FOREACH_SAFE when freeing list elements Paolo Bonzini
2012-03-06 17:55 ` [Qemu-devel] [PATCH v4 02/10] fix format name for backing file Paolo Bonzini
2012-03-06 17:55 ` [Qemu-devel] [PATCH v4 03/10] qapi: complete implementation of unions Paolo Bonzini
2012-03-06 17:55 ` [Qemu-devel] [PATCH v4 04/10] rename blockdev-group-snapshot-sync Paolo Bonzini
2012-03-06 17:55 ` [Qemu-devel] [PATCH v4 05/10] add mode field to blockdev-snapshot-sync transaction item Paolo Bonzini
2012-03-06 17:55 ` [Qemu-devel] [PATCH v4 06/10] qmp: convert blockdev-snapshot-sync to a wrapper around transactions Paolo Bonzini
2012-03-06 17:56 ` [Qemu-devel] [PATCH v4 07/10] Add blkmirror block driver Paolo Bonzini
2012-03-06 17:56 ` [Qemu-devel] [PATCH v4 08/10] add mirroring to transaction Paolo Bonzini
2012-03-06 17:56 ` [Qemu-devel] [PATCH v4 09/10] add drive-mirror command and HMP equivalent Paolo Bonzini
2012-03-06 17:56 ` [Qemu-devel] [PATCH v4 10/10] Add the drive-reopen command Paolo Bonzini
2012-03-13 20:48   ` Eric Blake
2012-03-14  0:14     ` Federico Simoncelli
2012-03-14  9:34       ` Kevin Wolf
2012-03-14 13:11         ` Eric Blake
2012-03-14 14:30           ` Kevin Wolf
2012-03-14 13:29         ` Federico Simoncelli
2012-03-14  9:17     ` Kevin Wolf
2012-03-14  9:19       ` Paolo Bonzini
2012-03-14  9:35         ` Kevin Wolf
2012-03-09 15:36 ` [Qemu-devel] [PATCH v4 00/10] Mirrored block writes Kevin Wolf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.