All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v5 0/4] qapi: child add/delete support
@ 2015-09-22  7:44 Wen Congyang
  2015-09-22  7:44 ` [Qemu-devel] [PATCH v5 1/4] Add new block driver interface to add/delete a BDS's child Wen Congyang
                   ` (5 more replies)
  0 siblings, 6 replies; 45+ messages in thread
From: Wen Congyang @ 2015-09-22  7:44 UTC (permalink / raw)
  To: qemu devel, Eric Blake, Markus Armbruster, Alberto Garcia,
	Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Yang Hongyang

If quorum's child is broken, we can use mirror job to replace it.
But sometimes, the user only need to remove the broken child, and
add it later when the problem is fixed.

It is based on the following patch:
http://lists.nongnu.org/archive/html/qemu-devel/2015-09/msg04579.html

ChangLog:
v5:
1. Address Eric Blake's comments
v4:
1. drop nbd driver's implementation. We can use human-monitor-command
   to do it.
2. Rename the command name.
v3:
1. Don't open BDS in bdrv_add_child(). Use the existing BDS which is
   created by the QMP command blockdev-add.
2. The driver NBD can support filename, path, host:port now.
v2:
1. Use bdrv_get_device_or_node_name() instead of new function
   bdrv_get_id_or_node_name()
2. Update the error message
3. Update the documents in block-core.json

Wen Congyang (4):
  Add new block driver interface to add/delete a BDS's child
  quorum: implement bdrv_add_child() and bdrv_del_child()
  qmp: add monitor command to add/remove a child
  hmp: add monitor command to add/remove a child

 block.c                   | 56 ++++++++++++++++++++++++++++++++++--
 block/quorum.c            | 72 +++++++++++++++++++++++++++++++++++++++++++++--
 blockdev.c                | 48 +++++++++++++++++++++++++++++++
 hmp-commands.hx           | 28 ++++++++++++++++++
 hmp.c                     | 20 +++++++++++++
 hmp.h                     |  2 ++
 include/block/block.h     |  8 ++++++
 include/block/block_int.h |  5 ++++
 qapi/block-core.json      | 34 ++++++++++++++++++++++
 qmp-commands.hx           | 61 +++++++++++++++++++++++++++++++++++++++
 10 files changed, 329 insertions(+), 5 deletions(-)

-- 
2.4.3

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Qemu-devel] [PATCH v5 1/4] Add new block driver interface to add/delete a BDS's child
  2015-09-22  7:44 [Qemu-devel] [PATCH v5 0/4] qapi: child add/delete support Wen Congyang
@ 2015-09-22  7:44 ` Wen Congyang
  2015-10-07 13:35   ` Alberto Garcia
                     ` (2 more replies)
  2015-09-22  7:44 ` [Qemu-devel] [PATCH v5 2/4] quorum: implement bdrv_add_child() and bdrv_del_child() Wen Congyang
                   ` (4 subsequent siblings)
  5 siblings, 3 replies; 45+ messages in thread
From: Wen Congyang @ 2015-09-22  7:44 UTC (permalink / raw)
  To: qemu devel, Eric Blake, Markus Armbruster, Alberto Garcia,
	Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Yang Hongyang, zhanghailiang

In some cases, we want to take a quorum child offline, and take
another child online.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 block.c                   | 50 +++++++++++++++++++++++++++++++++++++++++++++++
 include/block/block.h     |  5 +++++
 include/block/block_int.h |  5 +++++
 3 files changed, 60 insertions(+)

diff --git a/block.c b/block.c
index e815d73..1b25e43 100644
--- a/block.c
+++ b/block.c
@@ -4265,3 +4265,53 @@ BlockAcctStats *bdrv_get_stats(BlockDriverState *bs)
 {
     return &bs->stats;
 }
+
+/*
+ * Hot add/remove a BDS's child. So the user can take a child offline when
+ * it is broken and take a new child online
+ */
+void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
+                    Error **errp)
+{
+
+    if (!parent_bs->drv || !parent_bs->drv->bdrv_add_child) {
+        error_setg(errp, "The BDS %s doesn't support adding a child",
+                   bdrv_get_device_or_node_name(parent_bs));
+        return;
+    }
+
+    if (!QLIST_EMPTY(&child_bs->parents)) {
+        error_setg(errp, "The BDS %s already has parent",
+                   child_bs->node_name);
+        return;
+    }
+
+    parent_bs->drv->bdrv_add_child(parent_bs, child_bs, errp);
+}
+
+void bdrv_del_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
+                    Error **errp)
+{
+    BdrvChild *child;
+
+    if (!parent_bs->drv || !parent_bs->drv->bdrv_del_child) {
+        error_setg(errp, "The BDS %s doesn't support removing a child",
+                   bdrv_get_device_or_node_name(parent_bs));
+        return;
+    }
+
+    QLIST_FOREACH(child, &parent_bs->children, next) {
+        if (child->bs == child_bs) {
+            break;
+        }
+    }
+
+    if (!child) {
+        error_setg(errp, "BDS %s is not a child of %s",
+                   bdrv_get_device_or_node_name(child_bs),
+                   bdrv_get_device_or_node_name(parent_bs));
+        return;
+    }
+
+    parent_bs->drv->bdrv_del_child(parent_bs, child_bs, errp);
+}
diff --git a/include/block/block.h b/include/block/block.h
index ef67353..665c56f 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -616,4 +616,9 @@ void bdrv_flush_io_queue(BlockDriverState *bs);
 
 BlockAcctStats *bdrv_get_stats(BlockDriverState *bs);
 
+void bdrv_add_child(BlockDriverState *parent, BlockDriverState *child,
+                    Error **errp);
+void bdrv_del_child(BlockDriverState *parent, BlockDriverState *child,
+                    Error **errp);
+
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 2f2c47b..64cbc55 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -288,6 +288,11 @@ struct BlockDriver {
      */
     int (*bdrv_probe_geometry)(BlockDriverState *bs, HDGeometry *geo);
 
+    void (*bdrv_add_child)(BlockDriverState *parent, BlockDriverState *child,
+                           Error **errp);
+    void (*bdrv_del_child)(BlockDriverState *parent, BlockDriverState *child,
+                           Error **errp);
+
     QLIST_ENTRY(BlockDriver) list;
 };
 
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [Qemu-devel] [PATCH v5 2/4] quorum: implement bdrv_add_child() and bdrv_del_child()
  2015-09-22  7:44 [Qemu-devel] [PATCH v5 0/4] qapi: child add/delete support Wen Congyang
  2015-09-22  7:44 ` [Qemu-devel] [PATCH v5 1/4] Add new block driver interface to add/delete a BDS's child Wen Congyang
@ 2015-09-22  7:44 ` Wen Congyang
  2015-10-07 14:12   ` Alberto Garcia
  2015-10-07 18:51   ` [Qemu-devel] [Qemu-block] " Max Reitz
  2015-09-22  7:44 ` [Qemu-devel] [PATCH v5 3/4] qmp: add monitor command to add/remove a child Wen Congyang
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 45+ messages in thread
From: Wen Congyang @ 2015-09-22  7:44 UTC (permalink / raw)
  To: qemu devel, Eric Blake, Markus Armbruster, Alberto Garcia,
	Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Yang Hongyang, zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 block.c               |  6 ++---
 block/quorum.c        | 72 +++++++++++++++++++++++++++++++++++++++++++++++++--
 include/block/block.h |  3 +++
 3 files changed, 76 insertions(+), 5 deletions(-)

diff --git a/block.c b/block.c
index 1b25e43..01f6d69 100644
--- a/block.c
+++ b/block.c
@@ -1079,9 +1079,9 @@ static int bdrv_fill_options(QDict **options, const char **pfilename,
     return 0;
 }
 
-static BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
-                                    BlockDriverState *child_bs,
-                                    const BdrvChildRole *child_role)
+BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
+                             BlockDriverState *child_bs,
+                             const BdrvChildRole *child_role)
 {
     BdrvChild *child = g_new(BdrvChild, 1);
     *child = (BdrvChild) {
diff --git a/block/quorum.c b/block/quorum.c
index 8fe53b4..111a57b 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -66,6 +66,9 @@ typedef struct QuorumVotes {
 typedef struct BDRVQuorumState {
     BlockDriverState **bs; /* children BlockDriverStates */
     int num_children;      /* children count */
+    int max_children;      /* The maximum children count, we need to reallocate
+                            * bs if num_children grows larger than maximum.
+                            */
     int threshold;         /* if less than threshold children reads gave the
                             * same result a quorum error occurs.
                             */
@@ -874,9 +877,9 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
         ret = -EINVAL;
         goto exit;
     }
-    if (s->num_children < 2) {
+    if (s->num_children < 1) {
         error_setg(&local_err,
-                   "Number of provided children must be greater than 1");
+                   "Number of provided children must be 1 or more");
         ret = -EINVAL;
         goto exit;
     }
@@ -925,6 +928,7 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
     /* allocate the children BlockDriverState array */
     s->bs = g_new0(BlockDriverState *, s->num_children);
     opened = g_new0(bool, s->num_children);
+    s->max_children = s->num_children;
 
     for (i = 0; i < s->num_children; i++) {
         char indexstr[32];
@@ -995,6 +999,67 @@ static void quorum_attach_aio_context(BlockDriverState *bs,
     }
 }
 
+static void quorum_add_child(BlockDriverState *bs, BlockDriverState *child_bs,
+                             Error **errp)
+{
+    BDRVQuorumState *s = bs->opaque;
+
+    bdrv_drain(bs);
+
+    if (s->num_children == s->max_children) {
+        if (s->max_children >= INT_MAX) {
+            error_setg(errp, "Too many children");
+            return;
+        }
+
+        s->bs = g_renew(BlockDriverState *, s->bs, s->max_children + 1);
+        s->bs[s->num_children] = NULL;
+        s->max_children++;
+    }
+
+    bdrv_ref(child_bs);
+    bdrv_attach_child(bs, child_bs, &child_format);
+    s->bs[s->num_children++] = child_bs;
+}
+
+static void quorum_del_child(BlockDriverState *bs, BlockDriverState *child_bs,
+                             Error **errp)
+{
+    BDRVQuorumState *s = bs->opaque;
+    BdrvChild *child;
+    int i;
+
+    for (i = 0; i < s->num_children; i++) {
+        if (s->bs[i] == child_bs) {
+            break;
+        }
+    }
+
+    QLIST_FOREACH(child, &bs->children, next) {
+        if (child->bs == child_bs) {
+            break;
+        }
+    }
+
+    /* we have checked it in bdrv_del_child() */
+    assert(i < s->num_children && child);
+
+    if (s->num_children <= s->threshold) {
+        error_setg(errp,
+            "The number of children cannot be lower than the vote threshold %d",
+            s->threshold);
+        return;
+    }
+
+    bdrv_drain(bs);
+    /* We can safely remove this child now */
+    memmove(&s->bs[i], &s->bs[i + 1],
+            (s->num_children - i - 1) * sizeof(void *));
+    s->num_children--;
+    s->bs[s->num_children] = NULL;
+    bdrv_unref_child(bs, child);
+}
+
 static void quorum_refresh_filename(BlockDriverState *bs)
 {
     BDRVQuorumState *s = bs->opaque;
@@ -1049,6 +1114,9 @@ static BlockDriver bdrv_quorum = {
     .bdrv_detach_aio_context            = quorum_detach_aio_context,
     .bdrv_attach_aio_context            = quorum_attach_aio_context,
 
+    .bdrv_add_child                     = quorum_add_child,
+    .bdrv_del_child                     = quorum_del_child,
+
     .is_filter                          = true,
     .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
 };
diff --git a/include/block/block.h b/include/block/block.h
index 665c56f..bd97399 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -514,6 +514,9 @@ void bdrv_disable_copy_on_read(BlockDriverState *bs);
 void bdrv_ref(BlockDriverState *bs);
 void bdrv_unref(BlockDriverState *bs);
 void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child);
+BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
+                             BlockDriverState *child_bs,
+                             const BdrvChildRole *child_role);
 
 bool bdrv_op_is_blocked(BlockDriverState *bs, BlockOpType op, Error **errp);
 void bdrv_op_block(BlockDriverState *bs, BlockOpType op, Error *reason);
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [Qemu-devel] [PATCH v5 3/4] qmp: add monitor command to add/remove a child
  2015-09-22  7:44 [Qemu-devel] [PATCH v5 0/4] qapi: child add/delete support Wen Congyang
  2015-09-22  7:44 ` [Qemu-devel] [PATCH v5 1/4] Add new block driver interface to add/delete a BDS's child Wen Congyang
  2015-09-22  7:44 ` [Qemu-devel] [PATCH v5 2/4] quorum: implement bdrv_add_child() and bdrv_del_child() Wen Congyang
@ 2015-09-22  7:44 ` Wen Congyang
  2015-10-07 14:33   ` Alberto Garcia
  2015-10-07 19:42   ` [Qemu-devel] [Qemu-block] " Max Reitz
  2015-09-22  7:44 ` [Qemu-devel] [PATCH v5 4/4] hmp: " Wen Congyang
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 45+ messages in thread
From: Wen Congyang @ 2015-09-22  7:44 UTC (permalink / raw)
  To: qemu devel, Eric Blake, Markus Armbruster, Alberto Garcia,
	Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Yang Hongyang, zhanghailiang

The new QMP command name is x-blockdev-child-add, and x-blockdev-child-del.
It justs for adding/removing quorum's child now, and don't support all
kinds of children, nor all block drivers. So it is experimental now.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 blockdev.c           | 48 +++++++++++++++++++++++++++++++++++++++++
 qapi/block-core.json | 34 +++++++++++++++++++++++++++++
 qmp-commands.hx      | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 143 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 32b04b4..8da0ffb 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3086,6 +3086,54 @@ fail:
     qmp_output_visitor_cleanup(ov);
 }
 
+void qmp_x_blockdev_child_add(const char *parent, const char *child,
+                              Error **errp)
+{
+    BlockDriverState *parent_bs, *child_bs;
+    Error *local_err = NULL;
+
+    parent_bs = bdrv_lookup_bs(parent, parent, &local_err);
+    if (!parent_bs) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    child_bs = bdrv_find_node(child);
+    if (!child_bs) {
+        error_setg(errp, "Node '%s' not found", child);
+        return;
+    }
+
+    bdrv_add_child(parent_bs, child_bs, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+    }
+}
+
+void qmp_x_blockdev_child_del(const char *parent, const char *child,
+                              Error **errp)
+{
+    BlockDriverState *parent_bs, *child_bs;
+    Error *local_err = NULL;
+
+    parent_bs = bdrv_lookup_bs(parent, parent, &local_err);
+    if (!parent_bs) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    child_bs = bdrv_find_node(child);
+    if (!child_bs) {
+        error_setg(errp, "Node '%s' not found", child);
+        return;
+    }
+
+    bdrv_del_child(parent_bs, child_bs, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+    }
+}
+
 BlockJobInfoList *qmp_query_block_jobs(Error **errp)
 {
     BlockJobInfoList *head = NULL, **p_next = &head;
diff --git a/qapi/block-core.json b/qapi/block-core.json
index bb2189e..9418f05 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2114,3 +2114,37 @@
 ##
 { 'command': 'block-set-write-threshold',
   'data': { 'node-name': 'str', 'write-threshold': 'uint64' } }
+
+##
+# @x-blockdev-child-add
+#
+# Add a new child to the parent BDS. Currently only the Quorum driver
+# implements this feature. This is useful to fix a broken quorum child.
+#
+# @parent: graph node name or id which the child will be added to.
+#
+# @child: graph node name that will be added.
+#
+# Note: this command is experimental, and not a stable API.
+#
+# Since: 2.5
+##
+{ 'command': 'x-blockdev-child-add',
+  'data' : { 'parent': 'str', 'child': 'str' } }
+
+##
+# @x-blockdev-child-del
+#
+# Remove a child from the parent BDS. Currently only the Quorum driver
+# implements this feature. This is useful to fix a broken quorum child.
+# Note, you can't remove a child if it would bring the quorum below its
+# threshold.
+#
+# @parent: graph node name or id from which the child will removed.
+#
+# @child: graph node name that will be removed.
+#
+# Since: 2.5
+##
+{ 'command': 'x-blockdev-child-del',
+  'data' : { 'parent': 'str', 'child': 'str' } }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 66f0300..36e75b1 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -3916,6 +3916,67 @@ Example (2):
 EQMP
 
     {
+        .name       = "x-blockdev-child-add",
+        .args_type  = "parent:B,child:B",
+        .mhandler.cmd_new = qmp_marshal_x_blockdev_child_add,
+    },
+
+SQMP
+x-blockdev-child-add
+------------
+
+Add a child to a quorum node.
+
+Arguments:
+
+- "parent": the quorum's id or node name
+- "child": the child node-name which will be added
+
+Note: this command is experimental, and not a stable API. It doesn't
+support all kinds of children, nor all block drivers.
+
+Example:
+
+-> { "execute": blockdev-add",
+    "arguments": { "options": { "driver": "raw",
+                                "node-name": "new_node",
+                                "id": "test_new_node",
+                                "file": { "driver": "file",
+                                          "filename": "test.raw" } } } }
+<- { "return": {} }
+-> { "execute": "x-blockdev-child-add",
+    "arguments": { "parent": "disk1", "child": "new_node" } }
+<- { "return": {} }
+
+EQMP
+
+    {
+        .name        = "x-blockdev-child-del",
+        .args_type   = "parent:B,child:B",
+        .mhandler.cmd_new = qmp_marshal_x_blockdev_child_del,
+    },
+
+SQMP
+x-blockdev-child-del
+------------
+
+Delete a child from a quorum node. It can be used to remove a broken
+quorum child.
+
+Arguments:
+
+- "parent": the quorum's id or node name
+- "child": the child node-name which will be removed
+
+Example:
+
+-> { "execute": "x-blockdev-child-del",
+    "arguments": { "parent": "disk1", "child": "new_node" } }
+<- { "return": {} }
+
+EQMP
+
+    {
         .name       = "query-named-block-nodes",
         .args_type  = "",
         .mhandler.cmd_new = qmp_marshal_query_named_block_nodes,
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [Qemu-devel] [PATCH v5 4/4] hmp: add monitor command to add/remove a child
  2015-09-22  7:44 [Qemu-devel] [PATCH v5 0/4] qapi: child add/delete support Wen Congyang
                   ` (2 preceding siblings ...)
  2015-09-22  7:44 ` [Qemu-devel] [PATCH v5 3/4] qmp: add monitor command to add/remove a child Wen Congyang
@ 2015-09-22  7:44 ` Wen Congyang
  2015-10-07 14:38   ` Alberto Garcia
  2015-09-22 11:15 ` [Qemu-devel] [PATCH v5 0/4] qapi: child add/delete support Dr. David Alan Gilbert
  2015-10-07  6:40 ` Wen Congyang
  5 siblings, 1 reply; 45+ messages in thread
From: Wen Congyang @ 2015-09-22  7:44 UTC (permalink / raw)
  To: qemu devel, Eric Blake, Markus Armbruster, Alberto Garcia,
	Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Luiz Capitulino, Gonglei, Yang Hongyang,
	zhanghailiang

The new command is blockdev_child_add and blockdev_child_del.
It does the same thing as the QMP command x-blockdev-child-add
and x-blockdev-child-del.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
---
 hmp-commands.hx | 28 ++++++++++++++++++++++++++++
 hmp.c           | 20 ++++++++++++++++++++
 hmp.h           |  2 ++
 3 files changed, 50 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index a511004..57adb7d 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -193,6 +193,34 @@ actions (drive options rerror, werror).
 ETEXI
 
     {
+        .name       = "blockdev_child_add",
+        .args_type  = "id:B,child:B",
+        .params     = "parent child",
+        .help       = "add a child to a BDS",
+        .mhandler.cmd = hmp_blockdev_child_add,
+    },
+
+STEXI
+@item blockdev_child_add @var{parent} @var{child}
+@findex blockdev_child_add
+Add a child to the block device.
+ETEXI
+
+    {
+        .name       = "blockdev_child_del",
+        .args_type  = "id:B,child:B",
+        .params     = "parent child",
+        .help       = "remove a child from a BDS",
+        .mhandler.cmd = hmp_blockdev_child_del,
+    },
+
+STEXI
+@item blockdev_child_del @var{parent} @var{child}
+@findex blockdev_child_del
+Remove a child from the parent device.
+ETEXI
+
+    {
         .name       = "change",
         .args_type  = "device:B,target:F,arg:s?",
         .params     = "device filename [format]",
diff --git a/hmp.c b/hmp.c
index 3f807b7..1dc1bed 100644
--- a/hmp.c
+++ b/hmp.c
@@ -2325,3 +2325,23 @@ void hmp_rocker_of_dpa_groups(Monitor *mon, const QDict *qdict)
 
     qapi_free_RockerOfDpaGroupList(list);
 }
+
+void hmp_blockdev_child_add(Monitor *mon, const QDict *qdict)
+{
+    const char *id = qdict_get_str(qdict, "id");
+    const char *child_id = qdict_get_str(qdict, "child");
+    Error *local_err = NULL;
+
+    qmp_x_blockdev_child_add(id, child_id, &local_err);
+    hmp_handle_error(mon, &local_err);
+}
+
+void hmp_blockdev_child_del(Monitor *mon, const QDict *qdict)
+{
+    const char *id = qdict_get_str(qdict, "id");
+    const char *child_id = qdict_get_str(qdict, "child");
+    Error *local_err = NULL;
+
+    qmp_x_blockdev_child_del(id, child_id, &local_err);
+    hmp_handle_error(mon, &local_err);
+}
diff --git a/hmp.h b/hmp.h
index 81656c3..3033648 100644
--- a/hmp.h
+++ b/hmp.h
@@ -130,5 +130,7 @@ void hmp_rocker(Monitor *mon, const QDict *qdict);
 void hmp_rocker_ports(Monitor *mon, const QDict *qdict);
 void hmp_rocker_of_dpa_flows(Monitor *mon, const QDict *qdict);
 void hmp_rocker_of_dpa_groups(Monitor *mon, const QDict *qdict);
+void hmp_blockdev_child_add(Monitor *mon, const QDict *qdict);
+void hmp_blockdev_child_del(Monitor *mon, const QDict *qdict);
 
 #endif
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [PATCH v5 0/4] qapi: child add/delete support
  2015-09-22  7:44 [Qemu-devel] [PATCH v5 0/4] qapi: child add/delete support Wen Congyang
                   ` (3 preceding siblings ...)
  2015-09-22  7:44 ` [Qemu-devel] [PATCH v5 4/4] hmp: " Wen Congyang
@ 2015-09-22 11:15 ` Dr. David Alan Gilbert
  2015-09-23  1:08   ` Wen Congyang
  2015-10-07  6:40 ` Wen Congyang
  5 siblings, 1 reply; 45+ messages in thread
From: Dr. David Alan Gilbert @ 2015-09-22 11:15 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Alberto Garcia, qemu block, Jiang Yunhong,
	Dong Eddie, qemu devel, Markus Armbruster, Stefan Hajnoczi,
	Yang Hongyang

* Wen Congyang (wency@cn.fujitsu.com) wrote:
> If quorum's child is broken, we can use mirror job to replace it.
> But sometimes, the user only need to remove the broken child, and
> add it later when the problem is fixed.
> 

Hi,
  Two questions:
    1) Do you have an example of a pair of add/remove commands that work
      together? (I'm not quite sure I understand where the ID for the remove
      comes from).

    2) If the child has failed and is not responding to block operations
       at all (e.g a networking failure to an nbd device which may take minutes
       to time out); how do you recover - flush or drain on the devices
       hang at that point.

(I was trying to test recovery from a failed secondary using the July COLO
release; but the primary gets stuck in bdrv_drain or bdrv_flush if I kill
the secondary in the right way).

Dave


> It is based on the following patch:
> http://lists.nongnu.org/archive/html/qemu-devel/2015-09/msg04579.html
> 
> ChangLog:
> v5:
> 1. Address Eric Blake's comments
> v4:
> 1. drop nbd driver's implementation. We can use human-monitor-command
>    to do it.
> 2. Rename the command name.
> v3:
> 1. Don't open BDS in bdrv_add_child(). Use the existing BDS which is
>    created by the QMP command blockdev-add.
> 2. The driver NBD can support filename, path, host:port now.
> v2:
> 1. Use bdrv_get_device_or_node_name() instead of new function
>    bdrv_get_id_or_node_name()
> 2. Update the error message
> 3. Update the documents in block-core.json
> 
> Wen Congyang (4):
>   Add new block driver interface to add/delete a BDS's child
>   quorum: implement bdrv_add_child() and bdrv_del_child()
>   qmp: add monitor command to add/remove a child
>   hmp: add monitor command to add/remove a child
> 
>  block.c                   | 56 ++++++++++++++++++++++++++++++++++--
>  block/quorum.c            | 72 +++++++++++++++++++++++++++++++++++++++++++++--
>  blockdev.c                | 48 +++++++++++++++++++++++++++++++
>  hmp-commands.hx           | 28 ++++++++++++++++++
>  hmp.c                     | 20 +++++++++++++
>  hmp.h                     |  2 ++
>  include/block/block.h     |  8 ++++++
>  include/block/block_int.h |  5 ++++
>  qapi/block-core.json      | 34 ++++++++++++++++++++++
>  qmp-commands.hx           | 61 +++++++++++++++++++++++++++++++++++++++
>  10 files changed, 329 insertions(+), 5 deletions(-)
> 
> -- 
> 2.4.3
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [PATCH v5 0/4] qapi: child add/delete support
  2015-09-22 11:15 ` [Qemu-devel] [PATCH v5 0/4] qapi: child add/delete support Dr. David Alan Gilbert
@ 2015-09-23  1:08   ` Wen Congyang
  2015-09-23  9:21     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 45+ messages in thread
From: Wen Congyang @ 2015-09-23  1:08 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Kevin Wolf, Alberto Garcia, qemu block, Jiang Yunhong,
	Dong Eddie, qemu devel, Markus Armbruster, Stefan Hajnoczi,
	Yang Hongyang

On 09/22/2015 07:15 PM, Dr. David Alan Gilbert wrote:
> * Wen Congyang (wency@cn.fujitsu.com) wrote:
>> If quorum's child is broken, we can use mirror job to replace it.
>> But sometimes, the user only need to remove the broken child, and
>> add it later when the problem is fixed.
>>
> 
> Hi,
>   Two questions:
>     1) Do you have an example of a pair of add/remove commands that work
>       together? (I'm not quite sure I understand where the ID for the remove
>       comes from).

The command line:
-drive if=virtio,id=disk1,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0.file.filename=/data/images/kvm/suse/suse11_3.img,children.0.driver=raw

And the QMP monitor command:
{'execute':'blockdev-add', 'arguments':{'options':{'driver': 'raw', 'node-name': 'test1', 'file': {'driver': 'file', 'filename': '/dev/null'}, 'id': 'test11' }  } }
{'execute': 'human-monitor-command', 'arguments': {'command-line': 'drive_add buddy driver=nbd,host=192.168.3.1,port=8889,export=colo-disk1,node-name=test2,if=none'}}
{'execute':'x-blockdev-child-add', 'arguments':{'parent': 'disk1', 'child': 'test1' } }
{'execute':'x-blockdev-child-add', 'arguments':{'parent': 'disk1', 'child': 'test2' } }
{'execute': 'x-blockdev-child-del', 'arguments': {'parent': 'disk1', 'child': 'test1' } }
{'execute': 'x-blockdev-child-del', 'arguments': {'parent': 'disk1', 'child': 'test2' } }

Note: the qmp monitor command doesn't support nbd now, and I use the hmp command to add a BDS.

> 
>     2) If the child has failed and is not responding to block operations
>        at all (e.g a networking failure to an nbd device which may take minutes
>        to time out); how do you recover - flush or drain on the devices
>        hang at that point.

If the network fails, the kernel doesn't notify the application...

> 
> (I was trying to test recovery from a failed secondary using the July COLO
> release; but the primary gets stuck in bdrv_drain or bdrv_flush if I kill
> the secondary in the right way).

IIRC, if the qemu is killed, the connection is closed at the same time. bdrv_drain()
or bdrv_flush() should not get stuck.

Thanks
Wen Congyang

> 
> Dave
> 
> 
>> It is based on the following patch:
>> http://lists.nongnu.org/archive/html/qemu-devel/2015-09/msg04579.html
>>
>> ChangLog:
>> v5:
>> 1. Address Eric Blake's comments
>> v4:
>> 1. drop nbd driver's implementation. We can use human-monitor-command
>>    to do it.
>> 2. Rename the command name.
>> v3:
>> 1. Don't open BDS in bdrv_add_child(). Use the existing BDS which is
>>    created by the QMP command blockdev-add.
>> 2. The driver NBD can support filename, path, host:port now.
>> v2:
>> 1. Use bdrv_get_device_or_node_name() instead of new function
>>    bdrv_get_id_or_node_name()
>> 2. Update the error message
>> 3. Update the documents in block-core.json
>>
>> Wen Congyang (4):
>>   Add new block driver interface to add/delete a BDS's child
>>   quorum: implement bdrv_add_child() and bdrv_del_child()
>>   qmp: add monitor command to add/remove a child
>>   hmp: add monitor command to add/remove a child
>>
>>  block.c                   | 56 ++++++++++++++++++++++++++++++++++--
>>  block/quorum.c            | 72 +++++++++++++++++++++++++++++++++++++++++++++--
>>  blockdev.c                | 48 +++++++++++++++++++++++++++++++
>>  hmp-commands.hx           | 28 ++++++++++++++++++
>>  hmp.c                     | 20 +++++++++++++
>>  hmp.h                     |  2 ++
>>  include/block/block.h     |  8 ++++++
>>  include/block/block_int.h |  5 ++++
>>  qapi/block-core.json      | 34 ++++++++++++++++++++++
>>  qmp-commands.hx           | 61 +++++++++++++++++++++++++++++++++++++++
>>  10 files changed, 329 insertions(+), 5 deletions(-)
>>
>> -- 
>> 2.4.3
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> .
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [PATCH v5 0/4] qapi: child add/delete support
  2015-09-23  1:08   ` Wen Congyang
@ 2015-09-23  9:21     ` Dr. David Alan Gilbert
  2015-09-23  9:30       ` Wen Congyang
  0 siblings, 1 reply; 45+ messages in thread
From: Dr. David Alan Gilbert @ 2015-09-23  9:21 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Alberto Garcia, qemu block, Jiang Yunhong,
	Dong Eddie, qemu devel, Markus Armbruster, Stefan Hajnoczi,
	Yang Hongyang

* Wen Congyang (wency@cn.fujitsu.com) wrote:
> On 09/22/2015 07:15 PM, Dr. David Alan Gilbert wrote:
> > * Wen Congyang (wency@cn.fujitsu.com) wrote:
> >> If quorum's child is broken, we can use mirror job to replace it.
> >> But sometimes, the user only need to remove the broken child, and
> >> add it later when the problem is fixed.
> >>
> > 
> > Hi,
> >   Two questions:
> >     1) Do you have an example of a pair of add/remove commands that work
> >       together? (I'm not quite sure I understand where the ID for the remove
> >       comes from).
> 
> The command line:
> -drive if=virtio,id=disk1,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0.file.filename=/data/images/kvm/suse/suse11_3.img,children.0.driver=raw
> 
> And the QMP monitor command:
> {'execute':'blockdev-add', 'arguments':{'options':{'driver': 'raw', 'node-name': 'test1', 'file': {'driver': 'file', 'filename': '/dev/null'}, 'id': 'test11' }  } }
> {'execute': 'human-monitor-command', 'arguments': {'command-line': 'drive_add buddy driver=nbd,host=192.168.3.1,port=8889,export=colo-disk1,node-name=test2,if=none'}}
> {'execute':'x-blockdev-child-add', 'arguments':{'parent': 'disk1', 'child': 'test1' } }
> {'execute':'x-blockdev-child-add', 'arguments':{'parent': 'disk1', 'child': 'test2' } }
> {'execute': 'x-blockdev-child-del', 'arguments': {'parent': 'disk1', 'child': 'test1' } }
> {'execute': 'x-blockdev-child-del', 'arguments': {'parent': 'disk1', 'child': 'test2' } }
> 
> Note: the qmp monitor command doesn't support nbd now, and I use the hmp command to add a BDS.

Thank you; OK I see the format has changed quite a bit from the older version; this version
is a lot nicer.

> >     2) If the child has failed and is not responding to block operations
> >        at all (e.g a networking failure to an nbd device which may take minutes
> >        to time out); how do you recover - flush or drain on the devices
> >        hang at that point.
> 
> If the network fails, the kernel doesn't notify the application...
> 
> > 
> > (I was trying to test recovery from a failed secondary using the July COLO
> > release; but the primary gets stuck in bdrv_drain or bdrv_flush if I kill
> > the secondary in the right way).
> 
> IIRC, if the qemu is killed, the connection is closed at the same time. bdrv_drain()
> or bdrv_flush() should not get stuck.

I use kill -SIGSTOP to the secondary qemu so I think that behaves like the network fails,
or if the secondary host just failed completely.  You do need some way to recover from the
NBD server dieing like that.

It sounds like we need some way to be able to remove a blockdev that's failed like that;
Paolo suggested the 'disk deadline' series could be used to time something like that
out eventually, but maybe you need something that allows you to remove
a child more forcibly.

Dave

> 
> Thanks
> Wen Congyang
> 
> > 
> > Dave
> > 
> > 
> >> It is based on the following patch:
> >> http://lists.nongnu.org/archive/html/qemu-devel/2015-09/msg04579.html
> >>
> >> ChangLog:
> >> v5:
> >> 1. Address Eric Blake's comments
> >> v4:
> >> 1. drop nbd driver's implementation. We can use human-monitor-command
> >>    to do it.
> >> 2. Rename the command name.
> >> v3:
> >> 1. Don't open BDS in bdrv_add_child(). Use the existing BDS which is
> >>    created by the QMP command blockdev-add.
> >> 2. The driver NBD can support filename, path, host:port now.
> >> v2:
> >> 1. Use bdrv_get_device_or_node_name() instead of new function
> >>    bdrv_get_id_or_node_name()
> >> 2. Update the error message
> >> 3. Update the documents in block-core.json
> >>
> >> Wen Congyang (4):
> >>   Add new block driver interface to add/delete a BDS's child
> >>   quorum: implement bdrv_add_child() and bdrv_del_child()
> >>   qmp: add monitor command to add/remove a child
> >>   hmp: add monitor command to add/remove a child
> >>
> >>  block.c                   | 56 ++++++++++++++++++++++++++++++++++--
> >>  block/quorum.c            | 72 +++++++++++++++++++++++++++++++++++++++++++++--
> >>  blockdev.c                | 48 +++++++++++++++++++++++++++++++
> >>  hmp-commands.hx           | 28 ++++++++++++++++++
> >>  hmp.c                     | 20 +++++++++++++
> >>  hmp.h                     |  2 ++
> >>  include/block/block.h     |  8 ++++++
> >>  include/block/block_int.h |  5 ++++
> >>  qapi/block-core.json      | 34 ++++++++++++++++++++++
> >>  qmp-commands.hx           | 61 +++++++++++++++++++++++++++++++++++++++
> >>  10 files changed, 329 insertions(+), 5 deletions(-)
> >>
> >> -- 
> >> 2.4.3
> >>
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > .
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [PATCH v5 0/4] qapi: child add/delete support
  2015-09-23  9:21     ` Dr. David Alan Gilbert
@ 2015-09-23  9:30       ` Wen Congyang
  0 siblings, 0 replies; 45+ messages in thread
From: Wen Congyang @ 2015-09-23  9:30 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Kevin Wolf, Alberto Garcia, qemu block, Jiang Yunhong,
	Dong Eddie, qemu devel, Markus Armbruster, Stefan Hajnoczi,
	Yang Hongyang

On 09/23/2015 05:21 PM, Dr. David Alan Gilbert wrote:
> * Wen Congyang (wency@cn.fujitsu.com) wrote:
>> On 09/22/2015 07:15 PM, Dr. David Alan Gilbert wrote:
>>> * Wen Congyang (wency@cn.fujitsu.com) wrote:
>>>> If quorum's child is broken, we can use mirror job to replace it.
>>>> But sometimes, the user only need to remove the broken child, and
>>>> add it later when the problem is fixed.
>>>>
>>>
>>> Hi,
>>>   Two questions:
>>>     1) Do you have an example of a pair of add/remove commands that work
>>>       together? (I'm not quite sure I understand where the ID for the remove
>>>       comes from).
>>
>> The command line:
>> -drive if=virtio,id=disk1,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0.file.filename=/data/images/kvm/suse/suse11_3.img,children.0.driver=raw
>>
>> And the QMP monitor command:
>> {'execute':'blockdev-add', 'arguments':{'options':{'driver': 'raw', 'node-name': 'test1', 'file': {'driver': 'file', 'filename': '/dev/null'}, 'id': 'test11' }  } }
>> {'execute': 'human-monitor-command', 'arguments': {'command-line': 'drive_add buddy driver=nbd,host=192.168.3.1,port=8889,export=colo-disk1,node-name=test2,if=none'}}
>> {'execute':'x-blockdev-child-add', 'arguments':{'parent': 'disk1', 'child': 'test1' } }
>> {'execute':'x-blockdev-child-add', 'arguments':{'parent': 'disk1', 'child': 'test2' } }
>> {'execute': 'x-blockdev-child-del', 'arguments': {'parent': 'disk1', 'child': 'test1' } }
>> {'execute': 'x-blockdev-child-del', 'arguments': {'parent': 'disk1', 'child': 'test2' } }
>>
>> Note: the qmp monitor command doesn't support nbd now, and I use the hmp command to add a BDS.
> 
> Thank you; OK I see the format has changed quite a bit from the older version; this version
> is a lot nicer.
> 
>>>     2) If the child has failed and is not responding to block operations
>>>        at all (e.g a networking failure to an nbd device which may take minutes
>>>        to time out); how do you recover - flush or drain on the devices
>>>        hang at that point.
>>
>> If the network fails, the kernel doesn't notify the application...
>>
>>>
>>> (I was trying to test recovery from a failed secondary using the July COLO
>>> release; but the primary gets stuck in bdrv_drain or bdrv_flush if I kill
>>> the secondary in the right way).
>>
>> IIRC, if the qemu is killed, the connection is closed at the same time. bdrv_drain()
>> or bdrv_flush() should not get stuck.
> 
> I use kill -SIGSTOP to the secondary qemu so I think that behaves like the network fails,
> or if the secondary host just failed completely.  You do need some way to recover from the
> NBD server dieing like that.

You use SIGSTOP, so there is no error in the connection, and the nbd client will wait the
reply. bdrv_drain() will never end in this case.

> 
> It sounds like we need some way to be able to remove a blockdev that's failed like that;
> Paolo suggested the 'disk deadline' series could be used to time something like that
> out eventually, but maybe you need something that allows you to remove
> a child more forcibly.

Yes, but quorum will wait bdrv_co_write() return. It is very hard to implement it now...


I guess 'disk deadline' can fix these two problems.

Thanks
Wen Congyang

> 
> Dave
> 
>>
>> Thanks
>> Wen Congyang
>>
>>>
>>> Dave
>>>
>>>
>>>> It is based on the following patch:
>>>> http://lists.nongnu.org/archive/html/qemu-devel/2015-09/msg04579.html
>>>>
>>>> ChangLog:
>>>> v5:
>>>> 1. Address Eric Blake's comments
>>>> v4:
>>>> 1. drop nbd driver's implementation. We can use human-monitor-command
>>>>    to do it.
>>>> 2. Rename the command name.
>>>> v3:
>>>> 1. Don't open BDS in bdrv_add_child(). Use the existing BDS which is
>>>>    created by the QMP command blockdev-add.
>>>> 2. The driver NBD can support filename, path, host:port now.
>>>> v2:
>>>> 1. Use bdrv_get_device_or_node_name() instead of new function
>>>>    bdrv_get_id_or_node_name()
>>>> 2. Update the error message
>>>> 3. Update the documents in block-core.json
>>>>
>>>> Wen Congyang (4):
>>>>   Add new block driver interface to add/delete a BDS's child
>>>>   quorum: implement bdrv_add_child() and bdrv_del_child()
>>>>   qmp: add monitor command to add/remove a child
>>>>   hmp: add monitor command to add/remove a child
>>>>
>>>>  block.c                   | 56 ++++++++++++++++++++++++++++++++++--
>>>>  block/quorum.c            | 72 +++++++++++++++++++++++++++++++++++++++++++++--
>>>>  blockdev.c                | 48 +++++++++++++++++++++++++++++++
>>>>  hmp-commands.hx           | 28 ++++++++++++++++++
>>>>  hmp.c                     | 20 +++++++++++++
>>>>  hmp.h                     |  2 ++
>>>>  include/block/block.h     |  8 ++++++
>>>>  include/block/block_int.h |  5 ++++
>>>>  qapi/block-core.json      | 34 ++++++++++++++++++++++
>>>>  qmp-commands.hx           | 61 +++++++++++++++++++++++++++++++++++++++
>>>>  10 files changed, 329 insertions(+), 5 deletions(-)
>>>>
>>>> -- 
>>>> 2.4.3
>>>>
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>> .
>>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> .
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [PATCH v5 0/4] qapi: child add/delete support
  2015-09-22  7:44 [Qemu-devel] [PATCH v5 0/4] qapi: child add/delete support Wen Congyang
                   ` (4 preceding siblings ...)
  2015-09-22 11:15 ` [Qemu-devel] [PATCH v5 0/4] qapi: child add/delete support Dr. David Alan Gilbert
@ 2015-10-07  6:40 ` Wen Congyang
  5 siblings, 0 replies; 45+ messages in thread
From: Wen Congyang @ 2015-10-07  6:40 UTC (permalink / raw)
  To: qemu devel, Eric Blake, Markus Armbruster, Alberto Garcia,
	Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Yang Hongyang

Ping...

On 09/22/2015 03:44 PM, Wen Congyang wrote:
> If quorum's child is broken, we can use mirror job to replace it.
> But sometimes, the user only need to remove the broken child, and
> add it later when the problem is fixed.
> 
> It is based on the following patch:
> http://lists.nongnu.org/archive/html/qemu-devel/2015-09/msg04579.html
> 
> ChangLog:
> v5:
> 1. Address Eric Blake's comments
> v4:
> 1. drop nbd driver's implementation. We can use human-monitor-command
>    to do it.
> 2. Rename the command name.
> v3:
> 1. Don't open BDS in bdrv_add_child(). Use the existing BDS which is
>    created by the QMP command blockdev-add.
> 2. The driver NBD can support filename, path, host:port now.
> v2:
> 1. Use bdrv_get_device_or_node_name() instead of new function
>    bdrv_get_id_or_node_name()
> 2. Update the error message
> 3. Update the documents in block-core.json
> 
> Wen Congyang (4):
>   Add new block driver interface to add/delete a BDS's child
>   quorum: implement bdrv_add_child() and bdrv_del_child()
>   qmp: add monitor command to add/remove a child
>   hmp: add monitor command to add/remove a child
> 
>  block.c                   | 56 ++++++++++++++++++++++++++++++++++--
>  block/quorum.c            | 72 +++++++++++++++++++++++++++++++++++++++++++++--
>  blockdev.c                | 48 +++++++++++++++++++++++++++++++
>  hmp-commands.hx           | 28 ++++++++++++++++++
>  hmp.c                     | 20 +++++++++++++
>  hmp.h                     |  2 ++
>  include/block/block.h     |  8 ++++++
>  include/block/block_int.h |  5 ++++
>  qapi/block-core.json      | 34 ++++++++++++++++++++++
>  qmp-commands.hx           | 61 +++++++++++++++++++++++++++++++++++++++
>  10 files changed, 329 insertions(+), 5 deletions(-)
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [PATCH v5 1/4] Add new block driver interface to add/delete a BDS's child
  2015-09-22  7:44 ` [Qemu-devel] [PATCH v5 1/4] Add new block driver interface to add/delete a BDS's child Wen Congyang
@ 2015-10-07 13:35   ` Alberto Garcia
  2015-10-08  2:05     ` Wen Congyang
  2015-10-07 18:33   ` [Qemu-devel] [Qemu-block] " Max Reitz
  2015-10-07 19:00   ` [Qemu-devel] " Dr. David Alan Gilbert
  2 siblings, 1 reply; 45+ messages in thread
From: Alberto Garcia @ 2015-10-07 13:35 UTC (permalink / raw)
  To: Wen Congyang, qemu devel, Eric Blake, Markus Armbruster, Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Yang Hongyang, zhanghailiang

On Tue 22 Sep 2015 09:44:19 AM CEST, Wen Congyang <wency@cn.fujitsu.com> wrote:
> In some cases, we want to take a quorum child offline, and take
> another child online.
>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>

> +void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
> +                    Error **errp)
> +{
> +
> +    if (!parent_bs->drv || !parent_bs->drv->bdrv_add_child) {
> +        error_setg(errp, "The BDS %s doesn't support adding a child",
> +                   bdrv_get_device_or_node_name(parent_bs));
> +        return;
> +    }
> +
> +    if (!QLIST_EMPTY(&child_bs->parents)) {
> +        error_setg(errp, "The BDS %s already has parent",
> +                   child_bs->node_name);

I think there's one 'a' missing:

  "The BDS %s already has a parent".

I also don't think we should use "BDS" in error messages, that's an
acronym for the name of a C data type, not something that the user is
supposed to know about.

I suggest using 'Node' instead.

Otherwise the patch looks good to me, thanks!

Berto

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/4] quorum: implement bdrv_add_child() and bdrv_del_child()
  2015-09-22  7:44 ` [Qemu-devel] [PATCH v5 2/4] quorum: implement bdrv_add_child() and bdrv_del_child() Wen Congyang
@ 2015-10-07 14:12   ` Alberto Garcia
  2015-10-08  2:10     ` Wen Congyang
  2015-10-07 18:51   ` [Qemu-devel] [Qemu-block] " Max Reitz
  1 sibling, 1 reply; 45+ messages in thread
From: Alberto Garcia @ 2015-10-07 14:12 UTC (permalink / raw)
  To: Wen Congyang, qemu devel, Eric Blake, Markus Armbruster, Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Yang Hongyang, zhanghailiang

On Tue 22 Sep 2015 09:44:20 AM CEST, Wen Congyang wrote:

> +++ b/block/quorum.c
> @@ -66,6 +66,9 @@ typedef struct QuorumVotes {
>  typedef struct BDRVQuorumState {
>      BlockDriverState **bs; /* children BlockDriverStates */
>      int num_children;      /* children count */
> +    int max_children;      /* The maximum children count, we need to reallocate
> +                            * bs if num_children grows larger than maximum.
> +                            */
>      int threshold;         /* if less than threshold children reads gave the
>                              * same result a quorum error occurs.
>                              */

As you announce in the cover letter of this series, your code depends on
the parents list patch written by Kevin here:

http://lists.nongnu.org/archive/html/qemu-devel/2015-09/msg04579.html

As you might be aware, and as part of the same series by Kevin,
BDRVQuorumState will no longer hold a list of BlockDriverState but a
list of BdrvChild instead:

https://lists.nongnu.org/archive/html/qemu-devel/2015-09/msg04571.html

> +static void quorum_add_child(BlockDriverState *bs, BlockDriverState *child_bs,
> +                             Error **errp)
> +{
> +    BDRVQuorumState *s = bs->opaque;
> +
> +    bdrv_drain(bs);
> +
> +    if (s->num_children == s->max_children) {
> +        if (s->max_children >= INT_MAX) {
> +            error_setg(errp, "Too many children");
> +            return;
> +        }

max_children can never be greater than INT_MAX. Use == instead.

> +        s->bs = g_renew(BlockDriverState *, s->bs, s->max_children + 1);
> +        s->bs[s->num_children] = NULL;

No need to set the pointer to NULL here, and you are anyway setting the
pointer to the new child a few lines afterwards.

> +        s->max_children++;
> +    }
> +
> +    bdrv_ref(child_bs);
> +    bdrv_attach_child(bs, child_bs, &child_format);
> +    s->bs[s->num_children++] = child_bs;
> +}
> +
> +static void quorum_del_child(BlockDriverState *bs, BlockDriverState *child_bs,
> +                             Error **errp)
> +{
> +    BDRVQuorumState *s = bs->opaque;
> +    BdrvChild *child;
> +    int i;
> +
> +    for (i = 0; i < s->num_children; i++) {
> +        if (s->bs[i] == child_bs) {
> +            break;
> +        }
> +    }
> +
> +    QLIST_FOREACH(child, &bs->children, next) {
> +        if (child->bs == child_bs) {
> +            break;
> +        }
> +    }
> +
> +    /* we have checked it in bdrv_del_child() */
> +    assert(i < s->num_children && child);
> +
> +    if (s->num_children <= s->threshold) {
> +        error_setg(errp,
> +            "The number of children cannot be lower than the vote threshold %d",
> +            s->threshold);
> +        return;
> +    }
> +
> +    bdrv_drain(bs);
> +    /* We can safely remove this child now */
> +    memmove(&s->bs[i], &s->bs[i + 1],
> +            (s->num_children - i - 1) * sizeof(void *));
> +    s->num_children--;
> +    s->bs[s->num_children] = NULL;

Same here, no one will check or use s->bs[s->num_children] so there's no
need to make it NULL.

Apart from the issue of using only part of Kevin's series, the rest are
minor things.

Thanks and sorry for the late review!

Berto

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [PATCH v5 3/4] qmp: add monitor command to add/remove a child
  2015-09-22  7:44 ` [Qemu-devel] [PATCH v5 3/4] qmp: add monitor command to add/remove a child Wen Congyang
@ 2015-10-07 14:33   ` Alberto Garcia
  2015-10-07 19:42   ` [Qemu-devel] [Qemu-block] " Max Reitz
  1 sibling, 0 replies; 45+ messages in thread
From: Alberto Garcia @ 2015-10-07 14:33 UTC (permalink / raw)
  To: Wen Congyang, qemu devel, Eric Blake, Markus Armbruster, Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Yang Hongyang, zhanghailiang

On Tue 22 Sep 2015 09:44:21 AM CEST, Wen Congyang wrote:
> The new QMP command name is x-blockdev-child-add, and x-blockdev-child-del.
> It justs for adding/removing quorum's child now, and don't support all
> kinds of children, nor all block drivers. So it is experimental now.

Better "So it is experimental for now". Otherwise it suggests that it
was not experimental before but now it is.

> +void qmp_x_blockdev_child_add(const char *parent, const char *child,
> +                              Error **errp)
> +{
> +    BlockDriverState *parent_bs, *child_bs;
> +    Error *local_err = NULL;
> +
> +    parent_bs = bdrv_lookup_bs(parent, parent, &local_err);
> +    if (!parent_bs) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }

I think this is fine as it is now, but since you are checking the value
of parent_bs to see if bdrv_lookup_bs() succeeded you can use errp
directly instead and skip the error_propagate() call.

> +void qmp_x_blockdev_child_del(const char *parent, const char *child,
> +                              Error **errp)
> +{
> +    BlockDriverState *parent_bs, *child_bs;
> +    Error *local_err = NULL;
> +
> +    parent_bs = bdrv_lookup_bs(parent, parent, &local_err);
> +    if (!parent_bs) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }

Same here.

> +##
> +# @x-blockdev-child-add
> +#
> +# Add a new child to the parent BDS. Currently only the Quorum driver
> +# implements this feature. This is useful to fix a broken quorum
> child.

Again, I would not use 'BDS' here. "Add a new child to an existing block
device", or something like that.

> +# @parent: graph node name or id which the child will be added to.
> +#
> +# @child: graph node name that will be added.
> +#
> +# Note: this command is experimental, and not a stable API.

"[..] and does not have a stable API".

> +x-blockdev-child-add
> +------------
> +
> +Add a child to a quorum node.
> +
> +Arguments:
> +
> +- "parent": the quorum's id or node name
> +- "child": the child node-name which will be added
> +

I would use the same kind of description than in the json file. "Add a
child to an existing block device. Currently only Quorum supports this
feature". Same for the 'x-blockdev-child-del' documentation.

Berto

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [PATCH v5 4/4] hmp: add monitor command to add/remove a child
  2015-09-22  7:44 ` [Qemu-devel] [PATCH v5 4/4] hmp: " Wen Congyang
@ 2015-10-07 14:38   ` Alberto Garcia
  0 siblings, 0 replies; 45+ messages in thread
From: Alberto Garcia @ 2015-10-07 14:38 UTC (permalink / raw)
  To: Wen Congyang, qemu devel, Eric Blake, Markus Armbruster, Stefan Hajnoczi
  Cc: Kevin Wolf, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Luiz Capitulino, Gonglei, Yang Hongyang,
	zhanghailiang

On Tue 22 Sep 2015 09:44:22 AM CEST, Wen Congyang wrote:

> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -193,6 +193,34 @@ actions (drive options rerror, werror).
>  ETEXI
>  
>      {
> +        .name       = "blockdev_child_add",
> +        .args_type  = "id:B,child:B",
> +        .params     = "parent child",
> +        .help       = "add a child to a BDS",

Once again, I would not use 'BDS' in the documentation.

> +        .mhandler.cmd = hmp_blockdev_child_add,
> +    },
> +
> +STEXI
> +@item blockdev_child_add @var{parent} @var{child}
> +@findex blockdev_child_add
> +Add a child to the block device.

"Add a child to a block device."

Thanks!

Berto

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v5 1/4] Add new block driver interface to add/delete a BDS's child
  2015-09-22  7:44 ` [Qemu-devel] [PATCH v5 1/4] Add new block driver interface to add/delete a BDS's child Wen Congyang
  2015-10-07 13:35   ` Alberto Garcia
@ 2015-10-07 18:33   ` Max Reitz
  2015-10-08  2:06     ` Wen Congyang
  2015-10-07 19:00   ` [Qemu-devel] " Dr. David Alan Gilbert
  2 siblings, 1 reply; 45+ messages in thread
From: Max Reitz @ 2015-10-07 18:33 UTC (permalink / raw)
  To: Wen Congyang, qemu devel, Eric Blake, Markus Armbruster,
	Alberto Garcia, Stefan Hajnoczi
  Cc: Kevin Wolf, zhanghailiang, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Yang Hongyang

[-- Attachment #1: Type: text/plain, Size: 3832 bytes --]

On 22.09.2015 09:44, Wen Congyang wrote:
> In some cases, we want to take a quorum child offline, and take
> another child online.
> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>
> ---
>  block.c                   | 50 +++++++++++++++++++++++++++++++++++++++++++++++
>  include/block/block.h     |  5 +++++
>  include/block/block_int.h |  5 +++++
>  3 files changed, 60 insertions(+)
> 
> diff --git a/block.c b/block.c
> index e815d73..1b25e43 100644
> --- a/block.c
> +++ b/block.c
> @@ -4265,3 +4265,53 @@ BlockAcctStats *bdrv_get_stats(BlockDriverState *bs)
>  {
>      return &bs->stats;
>  }
> +
> +/*
> + * Hot add/remove a BDS's child. So the user can take a child offline when
> + * it is broken and take a new child online
> + */
> +void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
> +                    Error **errp)
> +{
> +
> +    if (!parent_bs->drv || !parent_bs->drv->bdrv_add_child) {
> +        error_setg(errp, "The BDS %s doesn't support adding a child",
> +                   bdrv_get_device_or_node_name(parent_bs));
> +        return;
> +    }
> +
> +    if (!QLIST_EMPTY(&child_bs->parents)) {
> +        error_setg(errp, "The BDS %s already has parent",
> +                   child_bs->node_name);
> +        return;
> +    }
> +
> +    parent_bs->drv->bdrv_add_child(parent_bs, child_bs, errp);
> +}
> +
> +void bdrv_del_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
> +                    Error **errp)
> +{
> +    BdrvChild *child;
> +
> +    if (!parent_bs->drv || !parent_bs->drv->bdrv_del_child) {
> +        error_setg(errp, "The BDS %s doesn't support removing a child",
> +                   bdrv_get_device_or_node_name(parent_bs));
> +        return;
> +    }
> +
> +    QLIST_FOREACH(child, &parent_bs->children, next) {
> +        if (child->bs == child_bs) {
> +            break;
> +        }
> +    }
> +
> +    if (!child) {
> +        error_setg(errp, "BDS %s is not a child of %s",
> +                   bdrv_get_device_or_node_name(child_bs),
> +                   bdrv_get_device_or_node_name(parent_bs));
> +        return;
> +    }
> +
> +    parent_bs->drv->bdrv_del_child(parent_bs, child_bs, errp);

How about we make this (BlockDriver.bdrv_del_child()) take a BdrvChild
instead of a BDS? We could even make bdrv_del_child() as a whole take a
BdrvChild parameter, but I don't suppose that would help much.

Max

> +}
> diff --git a/include/block/block.h b/include/block/block.h
> index ef67353..665c56f 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -616,4 +616,9 @@ void bdrv_flush_io_queue(BlockDriverState *bs);
>  
>  BlockAcctStats *bdrv_get_stats(BlockDriverState *bs);
>  
> +void bdrv_add_child(BlockDriverState *parent, BlockDriverState *child,
> +                    Error **errp);
> +void bdrv_del_child(BlockDriverState *parent, BlockDriverState *child,
> +                    Error **errp);
> +
>  #endif
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 2f2c47b..64cbc55 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -288,6 +288,11 @@ struct BlockDriver {
>       */
>      int (*bdrv_probe_geometry)(BlockDriverState *bs, HDGeometry *geo);
>  
> +    void (*bdrv_add_child)(BlockDriverState *parent, BlockDriverState *child,
> +                           Error **errp);
> +    void (*bdrv_del_child)(BlockDriverState *parent, BlockDriverState *child,
> +                           Error **errp);
> +
>      QLIST_ENTRY(BlockDriver) list;
>  };
>  
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v5 2/4] quorum: implement bdrv_add_child() and bdrv_del_child()
  2015-09-22  7:44 ` [Qemu-devel] [PATCH v5 2/4] quorum: implement bdrv_add_child() and bdrv_del_child() Wen Congyang
  2015-10-07 14:12   ` Alberto Garcia
@ 2015-10-07 18:51   ` Max Reitz
  2015-10-08  8:12     ` Alberto Garcia
  1 sibling, 1 reply; 45+ messages in thread
From: Max Reitz @ 2015-10-07 18:51 UTC (permalink / raw)
  To: Wen Congyang, qemu devel, Eric Blake, Markus Armbruster,
	Alberto Garcia, Stefan Hajnoczi
  Cc: Kevin Wolf, zhanghailiang, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Yang Hongyang

[-- Attachment #1: Type: text/plain, Size: 7322 bytes --]

On 22.09.2015 09:44, Wen Congyang wrote:
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> ---
>  block.c               |  6 ++---
>  block/quorum.c        | 72 +++++++++++++++++++++++++++++++++++++++++++++++++--
>  include/block/block.h |  3 +++
>  3 files changed, 76 insertions(+), 5 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 1b25e43..01f6d69 100644
> --- a/block.c
> +++ b/block.c
> @@ -1079,9 +1079,9 @@ static int bdrv_fill_options(QDict **options, const char **pfilename,
>      return 0;
>  }
>  
> -static BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
> -                                    BlockDriverState *child_bs,
> -                                    const BdrvChildRole *child_role)
> +BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
> +                             BlockDriverState *child_bs,
> +                             const BdrvChildRole *child_role)
>  {
>      BdrvChild *child = g_new(BdrvChild, 1);
>      *child = (BdrvChild) {
> diff --git a/block/quorum.c b/block/quorum.c
> index 8fe53b4..111a57b 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -66,6 +66,9 @@ typedef struct QuorumVotes {
>  typedef struct BDRVQuorumState {
>      BlockDriverState **bs; /* children BlockDriverStates */
>      int num_children;      /* children count */
> +    int max_children;      /* The maximum children count, we need to reallocate
> +                            * bs if num_children grows larger than maximum.
> +                            */
>      int threshold;         /* if less than threshold children reads gave the
>                              * same result a quorum error occurs.
>                              */
> @@ -874,9 +877,9 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
>          ret = -EINVAL;
>          goto exit;
>      }
> -    if (s->num_children < 2) {
> +    if (s->num_children < 1) {
>          error_setg(&local_err,
> -                   "Number of provided children must be greater than 1");
> +                   "Number of provided children must be 1 or more");
>          ret = -EINVAL;
>          goto exit;
>      }
> @@ -925,6 +928,7 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
>      /* allocate the children BlockDriverState array */
>      s->bs = g_new0(BlockDriverState *, s->num_children);
>      opened = g_new0(bool, s->num_children);
> +    s->max_children = s->num_children;
>  
>      for (i = 0; i < s->num_children; i++) {
>          char indexstr[32];
> @@ -995,6 +999,67 @@ static void quorum_attach_aio_context(BlockDriverState *bs,
>      }
>  }
>  
> +static void quorum_add_child(BlockDriverState *bs, BlockDriverState *child_bs,
> +                             Error **errp)
> +{
> +    BDRVQuorumState *s = bs->opaque;
> +
> +    bdrv_drain(bs);
> +
> +    if (s->num_children == s->max_children) {
> +        if (s->max_children >= INT_MAX) {

Opposing Berto (:-)), I like >= even if the > part is actually
impossible. I myself like to use constructs such as:

for (i = 0; i < n; i++) {
    /* ... */
}
if (i >= n) {
    /* ... */
}

Even though @i can never exceed @n after the loop. This is because my
way of thinking is "What if it could exceed @n? Then I'd like to take
this branch as well." The same applies here. s->max_children can never
exceed INT_MAX, but if it could, we'd want that to be an error, too.

I do find myself in discussions about that from time to time, especially
since others prefer (for the example above):

for (i = 0; i < n; i++) { /* ... */ }
assert(i <= n);
if (i == n) { /* ... */ }

Or the like.

> +            error_setg(errp, "Too many children");
> +            return;
> +        }
> +
> +        s->bs = g_renew(BlockDriverState *, s->bs, s->max_children + 1);
> +        s->bs[s->num_children] = NULL;
> +        s->max_children++;
> +    }

Just a suggestion, please feel free to ignore it completely:

You can drop the s->max_children field and just always call g_renew()
with s->num_children + 1 as the @count parameter. There shouldn't be any
(visible) performance penalty, but it would simplify the code.

> +
> +    bdrv_ref(child_bs);
> +    bdrv_attach_child(bs, child_bs, &child_format);
> +    s->bs[s->num_children++] = child_bs;
> +}
> +
> +static void quorum_del_child(BlockDriverState *bs, BlockDriverState *child_bs,
> +                             Error **errp)
> +{
> +    BDRVQuorumState *s = bs->opaque;
> +    BdrvChild *child;
> +    int i;
> +
> +    for (i = 0; i < s->num_children; i++) {
> +        if (s->bs[i] == child_bs) {
> +            break;
> +        }
> +    }
> +
> +    QLIST_FOREACH(child, &bs->children, next) {
> +        if (child->bs == child_bs) {
> +            break;
> +        }
> +    }
> +
> +    /* we have checked it in bdrv_del_child() */
> +    assert(i < s->num_children && child);
> +
> +    if (s->num_children <= s->threshold) {
> +        error_setg(errp,
> +            "The number of children cannot be lower than the vote threshold %d",
> +            s->threshold);
> +        return;
> +    }
> +
> +    bdrv_drain(bs);
> +    /* We can safely remove this child now */
> +    memmove(&s->bs[i], &s->bs[i + 1],
> +            (s->num_children - i - 1) * sizeof(void *));

I'd use a matching type here, i.e. sizeof(BlockDriverState *) before
Kevin's bdrv_swap() series or sizeof(BdrvChild *) afterwards.

> +    s->num_children--;
> +    s->bs[s->num_children] = NULL;

In case you do decide to drop s->max_children, you could replace this by
a call to g_renew on s->bs (or s->children).

In any case, as Berto said, this line is unnecessary.

Max

> +    bdrv_unref_child(bs, child);
> +}
> +
>  static void quorum_refresh_filename(BlockDriverState *bs)
>  {
>      BDRVQuorumState *s = bs->opaque;
> @@ -1049,6 +1114,9 @@ static BlockDriver bdrv_quorum = {
>      .bdrv_detach_aio_context            = quorum_detach_aio_context,
>      .bdrv_attach_aio_context            = quorum_attach_aio_context,
>  
> +    .bdrv_add_child                     = quorum_add_child,
> +    .bdrv_del_child                     = quorum_del_child,
> +
>      .is_filter                          = true,
>      .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
>  };
> diff --git a/include/block/block.h b/include/block/block.h
> index 665c56f..bd97399 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -514,6 +514,9 @@ void bdrv_disable_copy_on_read(BlockDriverState *bs);
>  void bdrv_ref(BlockDriverState *bs);
>  void bdrv_unref(BlockDriverState *bs);
>  void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child);
> +BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
> +                             BlockDriverState *child_bs,
> +                             const BdrvChildRole *child_role);
>  
>  bool bdrv_op_is_blocked(BlockDriverState *bs, BlockOpType op, Error **errp);
>  void bdrv_op_block(BlockDriverState *bs, BlockOpType op, Error *reason);
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [PATCH v5 1/4] Add new block driver interface to add/delete a BDS's child
  2015-09-22  7:44 ` [Qemu-devel] [PATCH v5 1/4] Add new block driver interface to add/delete a BDS's child Wen Congyang
  2015-10-07 13:35   ` Alberto Garcia
  2015-10-07 18:33   ` [Qemu-devel] [Qemu-block] " Max Reitz
@ 2015-10-07 19:00   ` Dr. David Alan Gilbert
  2015-10-08  2:03     ` Wen Congyang
  2 siblings, 1 reply; 45+ messages in thread
From: Dr. David Alan Gilbert @ 2015-10-07 19:00 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Alberto Garcia, zhanghailiang, qemu block,
	Jiang Yunhong, Dong Eddie, qemu devel, Markus Armbruster,
	Gonglei, Stefan Hajnoczi, Yang Hongyang

* Wen Congyang (wency@cn.fujitsu.com) wrote:
> In some cases, we want to take a quorum child offline, and take
> another child online.

Hi,
  Have you checked the output of 'info block' after adding/deleting a child?
I'm using one of your older worlds (from a few months ago) and I found I had
to add a 

    bdrv_refresh_filename(bs);

to get the output of 'info block' to show the new child.
I don't see it in this version.

Dave


> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>
> ---
>  block.c                   | 50 +++++++++++++++++++++++++++++++++++++++++++++++
>  include/block/block.h     |  5 +++++
>  include/block/block_int.h |  5 +++++
>  3 files changed, 60 insertions(+)
> 
> diff --git a/block.c b/block.c
> index e815d73..1b25e43 100644
> --- a/block.c
> +++ b/block.c
> @@ -4265,3 +4265,53 @@ BlockAcctStats *bdrv_get_stats(BlockDriverState *bs)
>  {
>      return &bs->stats;
>  }
> +
> +/*
> + * Hot add/remove a BDS's child. So the user can take a child offline when
> + * it is broken and take a new child online
> + */
> +void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
> +                    Error **errp)
> +{
> +
> +    if (!parent_bs->drv || !parent_bs->drv->bdrv_add_child) {
> +        error_setg(errp, "The BDS %s doesn't support adding a child",
> +                   bdrv_get_device_or_node_name(parent_bs));
> +        return;
> +    }
> +
> +    if (!QLIST_EMPTY(&child_bs->parents)) {
> +        error_setg(errp, "The BDS %s already has parent",
> +                   child_bs->node_name);
> +        return;
> +    }
> +
> +    parent_bs->drv->bdrv_add_child(parent_bs, child_bs, errp);
> +}
> +
> +void bdrv_del_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
> +                    Error **errp)
> +{
> +    BdrvChild *child;
> +
> +    if (!parent_bs->drv || !parent_bs->drv->bdrv_del_child) {
> +        error_setg(errp, "The BDS %s doesn't support removing a child",
> +                   bdrv_get_device_or_node_name(parent_bs));
> +        return;
> +    }
> +
> +    QLIST_FOREACH(child, &parent_bs->children, next) {
> +        if (child->bs == child_bs) {
> +            break;
> +        }
> +    }
> +
> +    if (!child) {
> +        error_setg(errp, "BDS %s is not a child of %s",
> +                   bdrv_get_device_or_node_name(child_bs),
> +                   bdrv_get_device_or_node_name(parent_bs));
> +        return;
> +    }
> +
> +    parent_bs->drv->bdrv_del_child(parent_bs, child_bs, errp);
> +}
> diff --git a/include/block/block.h b/include/block/block.h
> index ef67353..665c56f 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -616,4 +616,9 @@ void bdrv_flush_io_queue(BlockDriverState *bs);
>  
>  BlockAcctStats *bdrv_get_stats(BlockDriverState *bs);
>  
> +void bdrv_add_child(BlockDriverState *parent, BlockDriverState *child,
> +                    Error **errp);
> +void bdrv_del_child(BlockDriverState *parent, BlockDriverState *child,
> +                    Error **errp);
> +
>  #endif
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 2f2c47b..64cbc55 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -288,6 +288,11 @@ struct BlockDriver {
>       */
>      int (*bdrv_probe_geometry)(BlockDriverState *bs, HDGeometry *geo);
>  
> +    void (*bdrv_add_child)(BlockDriverState *parent, BlockDriverState *child,
> +                           Error **errp);
> +    void (*bdrv_del_child)(BlockDriverState *parent, BlockDriverState *child,
> +                           Error **errp);
> +
>      QLIST_ENTRY(BlockDriver) list;
>  };
>  
> -- 
> 2.4.3
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v5 3/4] qmp: add monitor command to add/remove a child
  2015-09-22  7:44 ` [Qemu-devel] [PATCH v5 3/4] qmp: add monitor command to add/remove a child Wen Congyang
  2015-10-07 14:33   ` Alberto Garcia
@ 2015-10-07 19:42   ` Max Reitz
  2015-10-08  6:15     ` Markus Armbruster
  2015-10-12 16:27     ` Max Reitz
  1 sibling, 2 replies; 45+ messages in thread
From: Max Reitz @ 2015-10-07 19:42 UTC (permalink / raw)
  To: Wen Congyang, qemu devel, Eric Blake, Markus Armbruster,
	Alberto Garcia, Stefan Hajnoczi
  Cc: Kevin Wolf, zhanghailiang, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Yang Hongyang

[-- Attachment #1: Type: text/plain, Size: 8469 bytes --]

On 22.09.2015 09:44, Wen Congyang wrote:
> The new QMP command name is x-blockdev-child-add, and x-blockdev-child-del.
> It justs for adding/removing quorum's child now, and don't support all
> kinds of children,

It does support all kinds of children for quorum, doesn't it?

>                    nor all block drivers. So it is experimental now.

Well, that is not really a reason why we would have to make it
experimental. For instance, blockdev-add (although some might argue it
actually is experimental...) doesn't support all block drivers either.

The reason I am hesitant of adding an experimental QMP interface that is
actually visible to the user (compare x-image in blkverify and blkdebug,
which are not documented and not to be used by the user) is twofold:

(1) At some point we have to say "OK, this is good enough now" and make
    it stable. What would that point be? Who can guarantee that we
    wouldn't want to make any interface changes after that point? Would
    we actually remember to revisit this function once in a while and
    consider making it stable?

(2) While marking things experimental *should* keep people from using it
    in their tools, nobody can guarantee that it *does* keep them from
    doing so. So we may find ourselves in the situation of having to
    keep a compatibility interface for an experimental feature...

For the second point, you should also consider how useful this feature
is to management tools. Just being able to remove and attach children
from a quorum node seems very useful on its own. I don't see why we
should wait for having support for other block drivers; also, for most
block drivers there is no meaningful way of adding or removing children
as nicely as that is possible for quorum.

E.g. you may have a block filter in the future where you want to
exchange its child BDS. This exchange should be an atomic operation, so
we cannot use this interface there anyway. For quorum, such an exchange
does not need to be atomic, since you can just add the new child first
and remove the old one afterwards.

So maybe in the future we get some block driver other than quorum for
which adding and removing children (as opposed to atomically exchanging
them) makes sense, but for now I can only see quorum. Therefore, that
this works for quorum only is in my opinion not a reason to make it
experimental. I think we actually want to keep it that way.

So the question would then be: What ways can you imagine to change this
interface, which would necessitate an incompatible change, therefore
calling for an experimental interface?

(My point is that with such an experimental interface, management tools
cannot use it, even though it'd be a very nice functionality to have)

> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> ---
>  blockdev.c           | 48 +++++++++++++++++++++++++++++++++++++++++
>  qapi/block-core.json | 34 +++++++++++++++++++++++++++++
>  qmp-commands.hx      | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 143 insertions(+)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 32b04b4..8da0ffb 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -3086,6 +3086,54 @@ fail:
>      qmp_output_visitor_cleanup(ov);
>  }
>  
> +void qmp_x_blockdev_child_add(const char *parent, const char *child,
> +                              Error **errp)
> +{
> +    BlockDriverState *parent_bs, *child_bs;
> +    Error *local_err = NULL;
> +
> +    parent_bs = bdrv_lookup_bs(parent, parent, &local_err);
> +    if (!parent_bs) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
> +    child_bs = bdrv_find_node(child);
> +    if (!child_bs) {
> +        error_setg(errp, "Node '%s' not found", child);
> +        return;
> +    }
> +
> +    bdrv_add_child(parent_bs, child_bs, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +    }

You can just pass errp to bdrv_add_child().

> +}
> +
> +void qmp_x_blockdev_child_del(const char *parent, const char *child,
> +                              Error **errp)
> +{
> +    BlockDriverState *parent_bs, *child_bs;
> +    Error *local_err = NULL;
> +
> +    parent_bs = bdrv_lookup_bs(parent, parent, &local_err);
> +    if (!parent_bs) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
> +    child_bs = bdrv_find_node(child);
> +    if (!child_bs) {
> +        error_setg(errp, "Node '%s' not found", child);
> +        return;
> +    }
> +
> +    bdrv_del_child(parent_bs, child_bs, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +    }

Same here.

Max

> +}
> +
>  BlockJobInfoList *qmp_query_block_jobs(Error **errp)
>  {
>      BlockJobInfoList *head = NULL, **p_next = &head;
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index bb2189e..9418f05 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2114,3 +2114,37 @@
>  ##
>  { 'command': 'block-set-write-threshold',
>    'data': { 'node-name': 'str', 'write-threshold': 'uint64' } }
> +
> +##
> +# @x-blockdev-child-add
> +#
> +# Add a new child to the parent BDS. Currently only the Quorum driver
> +# implements this feature. This is useful to fix a broken quorum child.
> +#
> +# @parent: graph node name or id which the child will be added to.
> +#
> +# @child: graph node name that will be added.
> +#
> +# Note: this command is experimental, and not a stable API.
> +#
> +# Since: 2.5
> +##
> +{ 'command': 'x-blockdev-child-add',
> +  'data' : { 'parent': 'str', 'child': 'str' } }
> +
> +##
> +# @x-blockdev-child-del
> +#
> +# Remove a child from the parent BDS. Currently only the Quorum driver
> +# implements this feature. This is useful to fix a broken quorum child.
> +# Note, you can't remove a child if it would bring the quorum below its
> +# threshold.
> +#
> +# @parent: graph node name or id from which the child will removed.
> +#
> +# @child: graph node name that will be removed.
> +#
> +# Since: 2.5
> +##
> +{ 'command': 'x-blockdev-child-del',
> +  'data' : { 'parent': 'str', 'child': 'str' } }
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index 66f0300..36e75b1 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -3916,6 +3916,67 @@ Example (2):
>  EQMP
>  
>      {
> +        .name       = "x-blockdev-child-add",
> +        .args_type  = "parent:B,child:B",
> +        .mhandler.cmd_new = qmp_marshal_x_blockdev_child_add,
> +    },
> +
> +SQMP
> +x-blockdev-child-add
> +------------
> +
> +Add a child to a quorum node.
> +
> +Arguments:
> +
> +- "parent": the quorum's id or node name
> +- "child": the child node-name which will be added
> +
> +Note: this command is experimental, and not a stable API. It doesn't
> +support all kinds of children, nor all block drivers.
> +
> +Example:
> +
> +-> { "execute": blockdev-add",
> +    "arguments": { "options": { "driver": "raw",
> +                                "node-name": "new_node",
> +                                "id": "test_new_node",
> +                                "file": { "driver": "file",
> +                                          "filename": "test.raw" } } } }
> +<- { "return": {} }
> +-> { "execute": "x-blockdev-child-add",
> +    "arguments": { "parent": "disk1", "child": "new_node" } }
> +<- { "return": {} }
> +
> +EQMP
> +
> +    {
> +        .name        = "x-blockdev-child-del",
> +        .args_type   = "parent:B,child:B",
> +        .mhandler.cmd_new = qmp_marshal_x_blockdev_child_del,
> +    },
> +
> +SQMP
> +x-blockdev-child-del
> +------------
> +
> +Delete a child from a quorum node. It can be used to remove a broken
> +quorum child.
> +
> +Arguments:
> +
> +- "parent": the quorum's id or node name
> +- "child": the child node-name which will be removed
> +
> +Example:
> +
> +-> { "execute": "x-blockdev-child-del",
> +    "arguments": { "parent": "disk1", "child": "new_node" } }
> +<- { "return": {} }
> +
> +EQMP
> +
> +    {
>          .name       = "query-named-block-nodes",
>          .args_type  = "",
>          .mhandler.cmd_new = qmp_marshal_query_named_block_nodes,
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [PATCH v5 1/4] Add new block driver interface to add/delete a BDS's child
  2015-10-07 19:00   ` [Qemu-devel] " Dr. David Alan Gilbert
@ 2015-10-08  2:03     ` Wen Congyang
  2015-10-08 18:44       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 45+ messages in thread
From: Wen Congyang @ 2015-10-08  2:03 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Kevin Wolf, Alberto Garcia, zhanghailiang, qemu block,
	Jiang Yunhong, Dong Eddie, qemu devel, Markus Armbruster,
	Gonglei, Stefan Hajnoczi, Yang Hongyang

On 10/08/2015 03:00 AM, Dr. David Alan Gilbert wrote:
> * Wen Congyang (wency@cn.fujitsu.com) wrote:
>> In some cases, we want to take a quorum child offline, and take
>> another child online.
> 
> Hi,
>   Have you checked the output of 'info block' after adding/deleting a child?
> I'm using one of your older worlds (from a few months ago) and I found I had
> to add a 
> 
>     bdrv_refresh_filename(bs);
> 
> to get the output of 'info block' to show the new child.
> I don't see it in this version.

Max sent a patch series to drop BDS.filename, so I don't call bdrv_refresh_filename()
here. If the BDS is not the top BDS, 'info block' still shows the wrong child.

Thanks
Wen Congyang

> 
> Dave
> 
> 
>>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> Reviewed-by: Eric Blake <eblake@redhat.com>
>> ---
>>  block.c                   | 50 +++++++++++++++++++++++++++++++++++++++++++++++
>>  include/block/block.h     |  5 +++++
>>  include/block/block_int.h |  5 +++++
>>  3 files changed, 60 insertions(+)
>>
>> diff --git a/block.c b/block.c
>> index e815d73..1b25e43 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -4265,3 +4265,53 @@ BlockAcctStats *bdrv_get_stats(BlockDriverState *bs)
>>  {
>>      return &bs->stats;
>>  }
>> +
>> +/*
>> + * Hot add/remove a BDS's child. So the user can take a child offline when
>> + * it is broken and take a new child online
>> + */
>> +void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
>> +                    Error **errp)
>> +{
>> +
>> +    if (!parent_bs->drv || !parent_bs->drv->bdrv_add_child) {
>> +        error_setg(errp, "The BDS %s doesn't support adding a child",
>> +                   bdrv_get_device_or_node_name(parent_bs));
>> +        return;
>> +    }
>> +
>> +    if (!QLIST_EMPTY(&child_bs->parents)) {
>> +        error_setg(errp, "The BDS %s already has parent",
>> +                   child_bs->node_name);
>> +        return;
>> +    }
>> +
>> +    parent_bs->drv->bdrv_add_child(parent_bs, child_bs, errp);
>> +}
>> +
>> +void bdrv_del_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
>> +                    Error **errp)
>> +{
>> +    BdrvChild *child;
>> +
>> +    if (!parent_bs->drv || !parent_bs->drv->bdrv_del_child) {
>> +        error_setg(errp, "The BDS %s doesn't support removing a child",
>> +                   bdrv_get_device_or_node_name(parent_bs));
>> +        return;
>> +    }
>> +
>> +    QLIST_FOREACH(child, &parent_bs->children, next) {
>> +        if (child->bs == child_bs) {
>> +            break;
>> +        }
>> +    }
>> +
>> +    if (!child) {
>> +        error_setg(errp, "BDS %s is not a child of %s",
>> +                   bdrv_get_device_or_node_name(child_bs),
>> +                   bdrv_get_device_or_node_name(parent_bs));
>> +        return;
>> +    }
>> +
>> +    parent_bs->drv->bdrv_del_child(parent_bs, child_bs, errp);
>> +}
>> diff --git a/include/block/block.h b/include/block/block.h
>> index ef67353..665c56f 100644
>> --- a/include/block/block.h
>> +++ b/include/block/block.h
>> @@ -616,4 +616,9 @@ void bdrv_flush_io_queue(BlockDriverState *bs);
>>  
>>  BlockAcctStats *bdrv_get_stats(BlockDriverState *bs);
>>  
>> +void bdrv_add_child(BlockDriverState *parent, BlockDriverState *child,
>> +                    Error **errp);
>> +void bdrv_del_child(BlockDriverState *parent, BlockDriverState *child,
>> +                    Error **errp);
>> +
>>  #endif
>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>> index 2f2c47b..64cbc55 100644
>> --- a/include/block/block_int.h
>> +++ b/include/block/block_int.h
>> @@ -288,6 +288,11 @@ struct BlockDriver {
>>       */
>>      int (*bdrv_probe_geometry)(BlockDriverState *bs, HDGeometry *geo);
>>  
>> +    void (*bdrv_add_child)(BlockDriverState *parent, BlockDriverState *child,
>> +                           Error **errp);
>> +    void (*bdrv_del_child)(BlockDriverState *parent, BlockDriverState *child,
>> +                           Error **errp);
>> +
>>      QLIST_ENTRY(BlockDriver) list;
>>  };
>>  
>> -- 
>> 2.4.3
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> .
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [PATCH v5 1/4] Add new block driver interface to add/delete a BDS's child
  2015-10-07 13:35   ` Alberto Garcia
@ 2015-10-08  2:05     ` Wen Congyang
  0 siblings, 0 replies; 45+ messages in thread
From: Wen Congyang @ 2015-10-08  2:05 UTC (permalink / raw)
  To: Alberto Garcia, qemu devel, Eric Blake, Markus Armbruster,
	Stefan Hajnoczi
  Cc: Kevin Wolf, zhanghailiang, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Yang Hongyang

On 10/07/2015 09:35 PM, Alberto Garcia wrote:
> On Tue 22 Sep 2015 09:44:19 AM CEST, Wen Congyang <wency@cn.fujitsu.com> wrote:
>> In some cases, we want to take a quorum child offline, and take
>> another child online.
>>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> Reviewed-by: Eric Blake <eblake@redhat.com>
> 
>> +void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
>> +                    Error **errp)
>> +{
>> +
>> +    if (!parent_bs->drv || !parent_bs->drv->bdrv_add_child) {
>> +        error_setg(errp, "The BDS %s doesn't support adding a child",
>> +                   bdrv_get_device_or_node_name(parent_bs));
>> +        return;
>> +    }
>> +
>> +    if (!QLIST_EMPTY(&child_bs->parents)) {
>> +        error_setg(errp, "The BDS %s already has parent",
>> +                   child_bs->node_name);
> 
> I think there's one 'a' missing:
> 
>   "The BDS %s already has a parent".
> 
> I also don't think we should use "BDS" in error messages, that's an
> acronym for the name of a C data type, not something that the user is
> supposed to know about.
> 
> I suggest using 'Node' instead.
> 
> Otherwise the patch looks good to me, thanks!

OK, I will fix it in the next version

Thanks
Wen Congyang

> 
> Berto
> .
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v5 1/4] Add new block driver interface to add/delete a BDS's child
  2015-10-07 18:33   ` [Qemu-devel] [Qemu-block] " Max Reitz
@ 2015-10-08  2:06     ` Wen Congyang
  0 siblings, 0 replies; 45+ messages in thread
From: Wen Congyang @ 2015-10-08  2:06 UTC (permalink / raw)
  To: Max Reitz, qemu devel, Eric Blake, Markus Armbruster,
	Alberto Garcia, Stefan Hajnoczi
  Cc: Kevin Wolf, zhanghailiang, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Yang Hongyang

On 10/08/2015 02:33 AM, Max Reitz wrote:
> On 22.09.2015 09:44, Wen Congyang wrote:
>> In some cases, we want to take a quorum child offline, and take
>> another child online.
>>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> Reviewed-by: Eric Blake <eblake@redhat.com>
>> ---
>>  block.c                   | 50 +++++++++++++++++++++++++++++++++++++++++++++++
>>  include/block/block.h     |  5 +++++
>>  include/block/block_int.h |  5 +++++
>>  3 files changed, 60 insertions(+)
>>
>> diff --git a/block.c b/block.c
>> index e815d73..1b25e43 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -4265,3 +4265,53 @@ BlockAcctStats *bdrv_get_stats(BlockDriverState *bs)
>>  {
>>      return &bs->stats;
>>  }
>> +
>> +/*
>> + * Hot add/remove a BDS's child. So the user can take a child offline when
>> + * it is broken and take a new child online
>> + */
>> +void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
>> +                    Error **errp)
>> +{
>> +
>> +    if (!parent_bs->drv || !parent_bs->drv->bdrv_add_child) {
>> +        error_setg(errp, "The BDS %s doesn't support adding a child",
>> +                   bdrv_get_device_or_node_name(parent_bs));
>> +        return;
>> +    }
>> +
>> +    if (!QLIST_EMPTY(&child_bs->parents)) {
>> +        error_setg(errp, "The BDS %s already has parent",
>> +                   child_bs->node_name);
>> +        return;
>> +    }
>> +
>> +    parent_bs->drv->bdrv_add_child(parent_bs, child_bs, errp);
>> +}
>> +
>> +void bdrv_del_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
>> +                    Error **errp)
>> +{
>> +    BdrvChild *child;
>> +
>> +    if (!parent_bs->drv || !parent_bs->drv->bdrv_del_child) {
>> +        error_setg(errp, "The BDS %s doesn't support removing a child",
>> +                   bdrv_get_device_or_node_name(parent_bs));
>> +        return;
>> +    }
>> +
>> +    QLIST_FOREACH(child, &parent_bs->children, next) {
>> +        if (child->bs == child_bs) {
>> +            break;
>> +        }
>> +    }
>> +
>> +    if (!child) {
>> +        error_setg(errp, "BDS %s is not a child of %s",
>> +                   bdrv_get_device_or_node_name(child_bs),
>> +                   bdrv_get_device_or_node_name(parent_bs));
>> +        return;
>> +    }
>> +
>> +    parent_bs->drv->bdrv_del_child(parent_bs, child_bs, errp);
> 
> How about we make this (BlockDriver.bdrv_del_child()) take a BdrvChild
> instead of a BDS? We could even make bdrv_del_child() as a whole take a
> BdrvChild parameter, but I don't suppose that would help much.

bdrv_add_child() takes a BDS, so I use BDS here.

Thanks
Wen Congyang

> 
> Max
> 
>> +}
>> diff --git a/include/block/block.h b/include/block/block.h
>> index ef67353..665c56f 100644
>> --- a/include/block/block.h
>> +++ b/include/block/block.h
>> @@ -616,4 +616,9 @@ void bdrv_flush_io_queue(BlockDriverState *bs);
>>  
>>  BlockAcctStats *bdrv_get_stats(BlockDriverState *bs);
>>  
>> +void bdrv_add_child(BlockDriverState *parent, BlockDriverState *child,
>> +                    Error **errp);
>> +void bdrv_del_child(BlockDriverState *parent, BlockDriverState *child,
>> +                    Error **errp);
>> +
>>  #endif
>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>> index 2f2c47b..64cbc55 100644
>> --- a/include/block/block_int.h
>> +++ b/include/block/block_int.h
>> @@ -288,6 +288,11 @@ struct BlockDriver {
>>       */
>>      int (*bdrv_probe_geometry)(BlockDriverState *bs, HDGeometry *geo);
>>  
>> +    void (*bdrv_add_child)(BlockDriverState *parent, BlockDriverState *child,
>> +                           Error **errp);
>> +    void (*bdrv_del_child)(BlockDriverState *parent, BlockDriverState *child,
>> +                           Error **errp);
>> +
>>      QLIST_ENTRY(BlockDriver) list;
>>  };
>>  
>>
> 
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/4] quorum: implement bdrv_add_child() and bdrv_del_child()
  2015-10-07 14:12   ` Alberto Garcia
@ 2015-10-08  2:10     ` Wen Congyang
  0 siblings, 0 replies; 45+ messages in thread
From: Wen Congyang @ 2015-10-08  2:10 UTC (permalink / raw)
  To: Alberto Garcia, qemu devel, Eric Blake, Markus Armbruster,
	Stefan Hajnoczi
  Cc: Kevin Wolf, zhanghailiang, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Yang Hongyang

On 10/07/2015 10:12 PM, Alberto Garcia wrote:
> On Tue 22 Sep 2015 09:44:20 AM CEST, Wen Congyang wrote:
> 
>> +++ b/block/quorum.c
>> @@ -66,6 +66,9 @@ typedef struct QuorumVotes {
>>  typedef struct BDRVQuorumState {
>>      BlockDriverState **bs; /* children BlockDriverStates */
>>      int num_children;      /* children count */
>> +    int max_children;      /* The maximum children count, we need to reallocate
>> +                            * bs if num_children grows larger than maximum.
>> +                            */
>>      int threshold;         /* if less than threshold children reads gave the
>>                              * same result a quorum error occurs.
>>                              */
> 
> As you announce in the cover letter of this series, your code depends on
> the parents list patch written by Kevin here:
> 
> http://lists.nongnu.org/archive/html/qemu-devel/2015-09/msg04579.html
> 
> As you might be aware, and as part of the same series by Kevin,
> BDRVQuorumState will no longer hold a list of BlockDriverState but a
> list of BdrvChild instead:
> 
> https://lists.nongnu.org/archive/html/qemu-devel/2015-09/msg04571.html

I notice that, and I only one patch from Kevin now. I will fix it in the
next version.

> 
>> +static void quorum_add_child(BlockDriverState *bs, BlockDriverState *child_bs,
>> +                             Error **errp)
>> +{
>> +    BDRVQuorumState *s = bs->opaque;
>> +
>> +    bdrv_drain(bs);
>> +
>> +    if (s->num_children == s->max_children) {
>> +        if (s->max_children >= INT_MAX) {
>> +            error_setg(errp, "Too many children");
>> +            return;
>> +        }
> 
> max_children can never be greater than INT_MAX. Use == instead.
> 
>> +        s->bs = g_renew(BlockDriverState *, s->bs, s->max_children + 1);
>> +        s->bs[s->num_children] = NULL;
> 
> No need to set the pointer to NULL here, and you are anyway setting the
> pointer to the new child a few lines afterwards.

Yes, I will remove it in the next version.

> 
>> +        s->max_children++;
>> +    }
>> +
>> +    bdrv_ref(child_bs);
>> +    bdrv_attach_child(bs, child_bs, &child_format);
>> +    s->bs[s->num_children++] = child_bs;
>> +}
>> +
>> +static void quorum_del_child(BlockDriverState *bs, BlockDriverState *child_bs,
>> +                             Error **errp)
>> +{
>> +    BDRVQuorumState *s = bs->opaque;
>> +    BdrvChild *child;
>> +    int i;
>> +
>> +    for (i = 0; i < s->num_children; i++) {
>> +        if (s->bs[i] == child_bs) {
>> +            break;
>> +        }
>> +    }
>> +
>> +    QLIST_FOREACH(child, &bs->children, next) {
>> +        if (child->bs == child_bs) {
>> +            break;
>> +        }
>> +    }
>> +
>> +    /* we have checked it in bdrv_del_child() */
>> +    assert(i < s->num_children && child);
>> +
>> +    if (s->num_children <= s->threshold) {
>> +        error_setg(errp,
>> +            "The number of children cannot be lower than the vote threshold %d",
>> +            s->threshold);
>> +        return;
>> +    }
>> +
>> +    bdrv_drain(bs);
>> +    /* We can safely remove this child now */
>> +    memmove(&s->bs[i], &s->bs[i + 1],
>> +            (s->num_children - i - 1) * sizeof(void *));
>> +    s->num_children--;
>> +    s->bs[s->num_children] = NULL;
> 
> Same here, no one will check or use s->bs[s->num_children] so there's no
> need to make it NULL.
> 
> Apart from the issue of using only part of Kevin's series, the rest are
> minor things.

I will fix it in the next version.

> 
> Thanks and sorry for the late review!

Thanks for your review

Wen Congyang

> 
> Berto
> .
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v5 3/4] qmp: add monitor command to add/remove a child
  2015-10-07 19:42   ` [Qemu-devel] [Qemu-block] " Max Reitz
@ 2015-10-08  6:15     ` Markus Armbruster
  2015-10-08  8:29       ` Alberto Garcia
                         ` (2 more replies)
  2015-10-12 16:27     ` Max Reitz
  1 sibling, 3 replies; 45+ messages in thread
From: Markus Armbruster @ 2015-10-08  6:15 UTC (permalink / raw)
  To: Max Reitz
  Cc: Kevin Wolf, Yang Hongyang, Alberto Garcia, zhanghailiang,
	Jiang Yunhong, Dong Eddie, qemu devel, Dr. David Alan Gilbert,
	Gonglei, Stefan Hajnoczi, qemu block

Max Reitz <mreitz@redhat.com> writes:

> On 22.09.2015 09:44, Wen Congyang wrote:
>> The new QMP command name is x-blockdev-child-add, and x-blockdev-child-del.
>> It justs for adding/removing quorum's child now, and don't support all
>> kinds of children,
>
> It does support all kinds of children for quorum, doesn't it?
>
>>                    nor all block drivers. So it is experimental now.
>
> Well, that is not really a reason why we would have to make it
> experimental. For instance, blockdev-add (although some might argue it
> actually is experimental...) doesn't support all block drivers either.

Yup, and not calling it x-blockdev-add until it's done was a mistake.
People tried using it, then found its current limitations the painful
way.  Not nice.

> The reason I am hesitant of adding an experimental QMP interface that is
> actually visible to the user (compare x-image in blkverify and blkdebug,
> which are not documented and not to be used by the user) is twofold:
>
> (1) At some point we have to say "OK, this is good enough now" and make
>     it stable. What would that point be? Who can guarantee that we
>     wouldn't want to make any interface changes after that point?

Nobody can, just like for any other interface.  So?

The x- prefix enables work spanning multiple releases.  Until the
feature is complete, we have a hard time seeing the whole picture, and
therefore the risk of interface mistakes is higher than normal.  Once
it's complete, we drop the x-.

>                                                                   Would
>     we actually remember to revisit this function once in a while and
>     consider making it stable?

Has that been a problem in the past?

If the feature is of any use, there's always been mounting pressure to
finish the job and drop the x-.  If it's of no use, not dropping the x-
would do no harm.  The opposite, actually.

> (2) While marking things experimental *should* keep people from using it
>     in their tools, nobody can guarantee that it *does* keep them from
>     doing so. So we may find ourselves in the situation of having to
>     keep a compatibility interface for an experimental feature...

You can't force people not to do stupid things, but you *can* improve
their chances at avoiding them.  Clearly marking experimental stuff
improves chances.

> For the second point, you should also consider how useful this feature
> is to management tools. Just being able to remove and attach children
> from a quorum node seems very useful on its own. I don't see why we
> should wait for having support for other block drivers; also, for most
> block drivers there is no meaningful way of adding or removing children
> as nicely as that is possible for quorum.

Okay, this is an argument I might be able to buy.

> E.g. you may have a block filter in the future where you want to
> exchange its child BDS. This exchange should be an atomic operation, so
> we cannot use this interface there anyway. For quorum, such an exchange
> does not need to be atomic, since you can just add the new child first
> and remove the old one afterwards.
>
> So maybe in the future we get some block driver other than quorum for
> which adding and removing children (as opposed to atomically exchanging
> them) makes sense, but for now I can only see quorum. Therefore, that
> this works for quorum only is in my opinion not a reason to make it
> experimental. I think we actually want to keep it that way.

Are you telling us the existing interface is insufficiently general?
That the general interface neeeds to support atomic replacement?

If yes, why isn't the general interface is what we should do for quorum?
Delete is atomic replacement by nothing, add is atomic replacement of
nothing.

> So the question would then be: What ways can you imagine to change this
> interface, which would necessitate an incompatible change, therefore
> calling for an experimental interface?

Yes, that's an important question.

Another important question is whether this is the interface we want.

A secondary question is whether the incompleteness of the implementation
demands an x- to warn users.

> (My point is that with such an experimental interface, management tools
> cannot use it, even though it'd be a very nice functionality to have)

How much work is it to finish the job?  Unless it's a substantial chunk,
debating x- is much ado about nothing: just finish the job already :)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v5 2/4] quorum: implement bdrv_add_child() and bdrv_del_child()
  2015-10-07 18:51   ` [Qemu-devel] [Qemu-block] " Max Reitz
@ 2015-10-08  8:12     ` Alberto Garcia
  2015-10-09 15:51       ` Max Reitz
  0 siblings, 1 reply; 45+ messages in thread
From: Alberto Garcia @ 2015-10-08  8:12 UTC (permalink / raw)
  To: Max Reitz, Wen Congyang, qemu devel, Eric Blake,
	Markus Armbruster, Stefan Hajnoczi
  Cc: Kevin Wolf, zhanghailiang, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Yang Hongyang

On Wed 07 Oct 2015 08:51:21 PM CEST, Max Reitz <mreitz@redhat.com> wrote:
>> +    if (s->num_children == s->max_children) {
>> +        if (s->max_children >= INT_MAX) {
>
> Opposing Berto (:-)), I like >= even if the > part is actually
> impossible. I myself like to use constructs such as:
>
> for (i = 0; i < n; i++) {
>     /* ... */
> }
> if (i >= n) {
>     /* ... */
> }
>
> Even though @i can never exceed @n after the loop. This is because my
> way of thinking is "What if it could exceed @n? Then I'd like to take
> this branch as well." The same applies here. s->max_children can never
> exceed INT_MAX, but if it could, we'd want that to be an error, too.

If s->max_children (and therefore s->num_children) could exceed the
upper limit then we would probably want to assert on that, because it
means that there's something seriously broken. The purpose of this code
is to make sure that the upper limit is... well, an upper limit :-) If
that invariant no longer holds then I don't think we want a simple "Too
many children" error.

>> +        s->bs = g_renew(BlockDriverState *, s->bs, s->max_children + 1);
>> +        s->bs[s->num_children] = NULL;
>> +        s->max_children++;
>> +    }
>
> Just a suggestion, please feel free to ignore it completely:
>
> You can drop the s->max_children field and just always call g_renew()
> with s->num_children + 1 as the @count parameter. There shouldn't be
> any (visible) performance penalty, but it would simplify the code.

If s->num_children has decreased since the previous g_renew() call
(because the user called quorum_del_child()) that could actually reduce
the array size. Nothing would break though, at worst it would just be a
bit counter-intuitive :-)

https://www.gnu.org/software/libc/manual/html_node/Changing-Block-Size.html

Berto

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v5 3/4] qmp: add monitor command to add/remove a child
  2015-10-08  6:15     ` Markus Armbruster
@ 2015-10-08  8:29       ` Alberto Garcia
  2015-10-08 10:03         ` Kevin Wolf
  2015-10-09 16:14         ` Max Reitz
  2015-10-08 11:02       ` [Qemu-devel] Dynamic reconfiguration (was: qmp: add monitor command to add/remove a child) Kevin Wolf
  2015-10-09 16:13       ` [Qemu-devel] [Qemu-block] [PATCH v5 3/4] qmp: add monitor command to add/remove a child Max Reitz
  2 siblings, 2 replies; 45+ messages in thread
From: Alberto Garcia @ 2015-10-08  8:29 UTC (permalink / raw)
  To: Markus Armbruster, Max Reitz
  Cc: Kevin Wolf, Yang Hongyang, zhanghailiang, Jiang Yunhong,
	Dong Eddie, qemu devel, Dr. David Alan Gilbert, Gonglei,
	Stefan Hajnoczi, qemu block

On Thu 08 Oct 2015 08:15:25 AM CEST, Markus Armbruster wrote:
>> For the second point, you should also consider how useful this
>> feature is to management tools. Just being able to remove and attach
>> children from a quorum node seems very useful on its own. I don't see
>> why we should wait for having support for other block drivers; also,
>> for most block drivers there is no meaningful way of adding or
>> removing children as nicely as that is possible for quorum.
>
> Okay, this is an argument I might be able to buy.

Note that if we want to make this interface stable there's one use case
missing: there's currently no way to change the vote threshold.

This is maybe not so important for the COLO use case, but for the
general case of adding and removing children from a quorum node having
the possibility to change the threshold makes a lot of sense.

That would probably require a its own API ('quorum-set-threshold' or
something like that) so I don't think it has an effect on these
child-add and child-del commands, but I wanted to mention it here anyway
in case someone sees something that I'm overlooking.

Berto

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v5 3/4] qmp: add monitor command to add/remove a child
  2015-10-08  8:29       ` Alberto Garcia
@ 2015-10-08 10:03         ` Kevin Wolf
  2015-10-08 10:13           ` Alberto Garcia
  2015-10-09 16:14         ` Max Reitz
  1 sibling, 1 reply; 45+ messages in thread
From: Kevin Wolf @ 2015-10-08 10:03 UTC (permalink / raw)
  To: Alberto Garcia
  Cc: qemu block, Yang Hongyang, zhanghailiang, Jiang Yunhong,
	Dong Eddie, qemu devel, Markus Armbruster, Gonglei,
	Stefan Hajnoczi, Max Reitz, Dr. David Alan Gilbert

Am 08.10.2015 um 10:29 hat Alberto Garcia geschrieben:
> On Thu 08 Oct 2015 08:15:25 AM CEST, Markus Armbruster wrote:
> >> For the second point, you should also consider how useful this
> >> feature is to management tools. Just being able to remove and attach
> >> children from a quorum node seems very useful on its own. I don't see
> >> why we should wait for having support for other block drivers; also,
> >> for most block drivers there is no meaningful way of adding or
> >> removing children as nicely as that is possible for quorum.
> >
> > Okay, this is an argument I might be able to buy.
> 
> Note that if we want to make this interface stable there's one use case
> missing: there's currently no way to change the vote threshold.
> 
> This is maybe not so important for the COLO use case, but for the
> general case of adding and removing children from a quorum node having
> the possibility to change the threshold makes a lot of sense.
> 
> That would probably require a its own API ('quorum-set-threshold' or
> something like that) so I don't think it has an effect on these
> child-add and child-del commands, but I wanted to mention it here anyway
> in case someone sees something that I'm overlooking.

The right way to change the voting threshold is bdrv_reopen(). Adding
support to quorum for changing the threshold this way should be trivial.
The hard part is how to expose it in QMP, which is why I've only added
it to qemu-io so far (which happens to be accessible from HMP).

If you need this functionality, let's start a new email thread and
discuss the right QMP interface for blockdev-reopen (or blockdev-
set-options or whatever you would call it).

Kevin

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v5 3/4] qmp: add monitor command to add/remove a child
  2015-10-08 10:03         ` Kevin Wolf
@ 2015-10-08 10:13           ` Alberto Garcia
  0 siblings, 0 replies; 45+ messages in thread
From: Alberto Garcia @ 2015-10-08 10:13 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu block, Yang Hongyang, zhanghailiang, Jiang Yunhong,
	Dong Eddie, qemu devel, Markus Armbruster, Gonglei,
	Stefan Hajnoczi, Max Reitz, Dr. David Alan Gilbert

On Thu 08 Oct 2015 12:03:45 PM CEST, Kevin Wolf wrote:
>> Note that if we want to make this interface stable there's one use
>> case missing: there's currently no way to change the vote threshold.
>> 
> The right way to change the voting threshold is bdrv_reopen(). Adding
> support to quorum for changing the threshold this way should be
> trivial.

Ah, and that would mean implementing quorum_reopen_*. Works for me.

> The hard part is how to expose it in QMP, which is why I've only added
> it to qemu-io so far (which happens to be accessible from HMP).
>
> If you need this functionality, let's start a new email thread and
> discuss the right QMP interface for blockdev-reopen (or blockdev-
> set-options or whatever you would call it).

Ok, sounds good. Thanks!

Berto

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Qemu-devel] Dynamic reconfiguration (was: qmp: add monitor command to add/remove a child)
  2015-10-08  6:15     ` Markus Armbruster
  2015-10-08  8:29       ` Alberto Garcia
@ 2015-10-08 11:02       ` Kevin Wolf
  2015-10-08 11:10         ` [Qemu-devel] [Qemu-block] " Kevin Wolf
  2015-10-09 16:13       ` [Qemu-devel] [Qemu-block] [PATCH v5 3/4] qmp: add monitor command to add/remove a child Max Reitz
  2 siblings, 1 reply; 45+ messages in thread
From: Kevin Wolf @ 2015-10-08 11:02 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Yang Hongyang, Alberto Garcia, zhanghailiang, Jiang Yunhong,
	Dong Eddie, qemu devel, Max Reitz, Gonglei, Stefan Hajnoczi,
	qemu block, Dr. David Alan Gilbert

Am 08.10.2015 um 08:15 hat Markus Armbruster geschrieben:
> Max Reitz <mreitz@redhat.com> writes:
> > E.g. you may have a block filter in the future where you want to
> > exchange its child BDS. This exchange should be an atomic operation, so
> > we cannot use this interface there anyway. For quorum, such an exchange
> > does not need to be atomic, since you can just add the new child first
> > and remove the old one afterwards.
> >
> > So maybe in the future we get some block driver other than quorum for
> > which adding and removing children (as opposed to atomically exchanging
> > them) makes sense, but for now I can only see quorum. Therefore, that
> > this works for quorum only is in my opinion not a reason to make it
> > experimental. I think we actually want to keep it that way.
> 
> Are you telling us the existing interface is insufficiently general?
> That the general interface neeeds to support atomic replacement?
> 
> If yes, why isn't the general interface is what we should do for quorum?
> Delete is atomic replacement by nothing, add is atomic replacement of
> nothing.

The general thing is what we used to call "dynamic reconfiguration". If
we want a single command to enable it (which would be great if we
could), we need to some more design work first. Atomic replacement might
be the operation we're looking for, but we have to be sure.

So far we haven't thought about dynamic reconfiguation enough that we
would know the right solution, but enough that we know it's hard. That
would be an argument for me that makes adding an x-* command now
acceptable. On the other hand, the fact that we want a single command in
the end makes me want to keep it experimental.


What types of dynamic reconfiguration do we need to support? I'll start
with a small list, feel free to extend it:

* Replace a child node with another node. This works pretty much
  everywhere in the tree - including the root, i.e. BlockBackend!
  Working just on BDSes doesn't seem to be enough.

* Adding a child to a node that can take additional children (e.g.
  quorum can take an arbitrary number; but also COW image formats have
  an option child, so you could add a backing file to a node originally
  opened with BDRV_O_NO_BACKING)

  Same as atomically replacing nothing by a node.

* Removing an optional child from a node that remains valid with that
  child removed. The same examples apply.

  Same as atomically replacing a child by nothing.

* Add a new node between two existing nodes. An example is taking a live
  snapshot, which inserts a new node between the BlockBackend and the
  first BDS. Or it could be used to insert a filter somewhere in the
  graph.

  Same as creating the new node pointing to node B (or dynamically
  adding it) and then atomically replacing the reference of node A that
  pointed to B with a reference to the new node.

* Add a new node between multiple existing nodes. This is one of the
  examples we always used to discuss with dynamic reconfiguration:

      base <- sn1 <- sn2 <--+-- virtio-blk
                            |
                            +-- NBD server

  Adding a filter could result in this:

        base <- sn1 <- sn2 <- throttling <--+-- virtio-blk
                                          |
                                          +-- NBD server

  Or this:

        base <- sn1 <- sn2 <--+-- throttling <- virtio-blk
                              |
                              +-- NBD server

  Or this:

        base <- sn1 <- sn2 <--+-- virtio-blk
                              |
                              +-- throttling <- NBD server

  All of these are different kinds of "adding a filter", and all of
  them are valid operations that a user could want to perform.

  Case 2 and 3 are really just "add a new node between two existing
  nodes", as explained above.

  Case 1 is new: We still create the throttling node so that it already
  points to sn2, but now we have to atomically change the children of
  two things (the BlockBackends of virtio-blk and the NBD server). Not a
  problem per se because we can just do that, but it raises the question
  whether the atomic replacement operation needs to be transactionable.

* Remove a node between two (or more) other nodes.

  Same as atomically replacing a child by a grandchild. For more than
  two involved nodes, again a transactional version might be needed.

So at the first sight, this operation seems to work as the general
interface for dynamic reconfiguration.


One problem we discussed earlier that I'm not sure whether it's related
is filter nodes inserted automatically once we change e.g. the I/O
throttling QMP commands to add a throttling filter BDS to the graph.

If the user creates nodes A and B, but sets throttling options, they
might end up with a chain like this:

    A <- throttling <- B

Now imagine that they want to add another filter between A and B, let's
say blkdebug. They would need to know that they have to insert the new
node between A and throttling or B and throttling, but not between A and
B. If they tried to insert it between A and B, the algorithm above says
that they would let blkdebug point to A, and replace B's child with
blkdebug, the resulting tree wouldn't be throttled any more!

    A <- blkdebug <- B

         throttling (ref = 0 -> delete)

Even if they knew that they have to consider the throttling node, it
currently wouldn't have a node-name, and with Jeff's autogenerated names
it wouldn't be predictable.

Maybe the dynamic reconfiguration interface does need to be a bit
cleverer.


Anyway, after writing all of this, I'm almost convinced now that an
experimental interface is the right thing to do in this patch series.

Kevin

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [Qemu-block] Dynamic reconfiguration (was: qmp: add monitor command to add/remove a child)
  2015-10-08 11:02       ` [Qemu-devel] Dynamic reconfiguration (was: qmp: add monitor command to add/remove a child) Kevin Wolf
@ 2015-10-08 11:10         ` Kevin Wolf
  2015-10-21  8:27           ` [Qemu-devel] [Qemu-block] Dynamic reconfiguration Markus Armbruster
  0 siblings, 1 reply; 45+ messages in thread
From: Kevin Wolf @ 2015-10-08 11:10 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: qemu block, zhanghailiang, Jiang Yunhong, Dong Eddie, qemu devel,
	Max Reitz, Gonglei, Stefan Hajnoczi, Yang Hongyang,
	Dr. David Alan Gilbert

Am 08.10.2015 um 13:02 hat Kevin Wolf geschrieben:
> Am 08.10.2015 um 08:15 hat Markus Armbruster geschrieben:
> > Max Reitz <mreitz@redhat.com> writes:
> > > E.g. you may have a block filter in the future where you want to
> > > exchange its child BDS. This exchange should be an atomic operation, so
> > > we cannot use this interface there anyway. For quorum, such an exchange
> > > does not need to be atomic, since you can just add the new child first
> > > and remove the old one afterwards.
> > >
> > > So maybe in the future we get some block driver other than quorum for
> > > which adding and removing children (as opposed to atomically exchanging
> > > them) makes sense, but for now I can only see quorum. Therefore, that
> > > this works for quorum only is in my opinion not a reason to make it
> > > experimental. I think we actually want to keep it that way.
> > 
> > Are you telling us the existing interface is insufficiently general?
> > That the general interface neeeds to support atomic replacement?
> > 
> > If yes, why isn't the general interface is what we should do for quorum?
> > Delete is atomic replacement by nothing, add is atomic replacement of
> > nothing.
> 
> The general thing is what we used to call "dynamic reconfiguration". If
> we want a single command to enable it (which would be great if we
> could), we need to some more design work first. Atomic replacement might
> be the operation we're looking for, but we have to be sure.
> 
> So far we haven't thought about dynamic reconfiguation enough that we
> would know the right solution, but enough that we know it's hard. That
> would be an argument for me that makes adding an x-* command now
> acceptable. On the other hand, the fact that we want a single command in
> the end makes me want to keep it experimental.
> 
> 
> What types of dynamic reconfiguration do we need to support? I'll start
> with a small list, feel free to extend it:
> 
> * Replace a child node with another node. This works pretty much
>   everywhere in the tree - including the root, i.e. BlockBackend!
>   Working just on BDSes doesn't seem to be enough.
> 
> * Adding a child to a node that can take additional children (e.g.
>   quorum can take an arbitrary number; but also COW image formats have
>   an option child, so you could add a backing file to a node originally
>   opened with BDRV_O_NO_BACKING)
> 
>   Same as atomically replacing nothing by a node.
> 
> * Removing an optional child from a node that remains valid with that
>   child removed. The same examples apply.
> 
>   Same as atomically replacing a child by nothing.
> 
> * Add a new node between two existing nodes. An example is taking a live
>   snapshot, which inserts a new node between the BlockBackend and the
>   first BDS. Or it could be used to insert a filter somewhere in the
>   graph.
> 
>   Same as creating the new node pointing to node B (or dynamically
>   adding it) and then atomically replacing the reference of node A that
>   pointed to B with a reference to the new node.
> 
> * Add a new node between multiple existing nodes. This is one of the
>   examples we always used to discuss with dynamic reconfiguration:
> 
>       base <- sn1 <- sn2 <--+-- virtio-blk
>                             |
>                             +-- NBD server
> 
>   Adding a filter could result in this:
> 
>         base <- sn1 <- sn2 <- throttling <--+-- virtio-blk
>                                           |
>                                           +-- NBD server
> 
>   Or this:
> 
>         base <- sn1 <- sn2 <--+-- throttling <- virtio-blk
>                               |
>                               +-- NBD server
> 
>   Or this:
> 
>         base <- sn1 <- sn2 <--+-- virtio-blk
>                               |
>                               +-- throttling <- NBD server
> 
>   All of these are different kinds of "adding a filter", and all of
>   them are valid operations that a user could want to perform.
> 
>   Case 2 and 3 are really just "add a new node between two existing
>   nodes", as explained above.
> 
>   Case 1 is new: We still create the throttling node so that it already
>   points to sn2, but now we have to atomically change the children of
>   two things (the BlockBackends of virtio-blk and the NBD server). Not a
>   problem per se because we can just do that, but it raises the question
>   whether the atomic replacement operation needs to be transactionable.
> 
> * Remove a node between two (or more) other nodes.
> 
>   Same as atomically replacing a child by a grandchild. For more than
>   two involved nodes, again a transactional version might be needed.
> 
> So at the first sight, this operation seems to work as the general
> interface for dynamic reconfiguration.

And actually, after re-reading the other branch of this email thread
where I replied to Berto that he wants to use bdrv_reopen() to change
the voting threshold for quorum, it occurred to me that bdrv_reopen()
should possibly also be this atomatic replacement function.

After all, the references to other nodes are blockdev-add options, and
if we implement bdrv_reopen() fully so that it can change any options
that can be given to blockdev-add, that means that we must be able to
replace children.

Kevin

> One problem we discussed earlier that I'm not sure whether it's related
> is filter nodes inserted automatically once we change e.g. the I/O
> throttling QMP commands to add a throttling filter BDS to the graph.
> 
> If the user creates nodes A and B, but sets throttling options, they
> might end up with a chain like this:
> 
>     A <- throttling <- B
> 
> Now imagine that they want to add another filter between A and B, let's
> say blkdebug. They would need to know that they have to insert the new
> node between A and throttling or B and throttling, but not between A and
> B. If they tried to insert it between A and B, the algorithm above says
> that they would let blkdebug point to A, and replace B's child with
> blkdebug, the resulting tree wouldn't be throttled any more!
> 
>     A <- blkdebug <- B
> 
>          throttling (ref = 0 -> delete)
> 
> Even if they knew that they have to consider the throttling node, it
> currently wouldn't have a node-name, and with Jeff's autogenerated names
> it wouldn't be predictable.
> 
> Maybe the dynamic reconfiguration interface does need to be a bit
> cleverer.
> 
> 
> Anyway, after writing all of this, I'm almost convinced now that an
> experimental interface is the right thing to do in this patch series.
> 
> Kevin
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [PATCH v5 1/4] Add new block driver interface to add/delete a BDS's child
  2015-10-08  2:03     ` Wen Congyang
@ 2015-10-08 18:44       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 45+ messages in thread
From: Dr. David Alan Gilbert @ 2015-10-08 18:44 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Alberto Garcia, zhanghailiang, qemu block,
	Jiang Yunhong, Dong Eddie, qemu devel, Markus Armbruster,
	Gonglei, Stefan Hajnoczi, Yang Hongyang

* Wen Congyang (wency@cn.fujitsu.com) wrote:
> On 10/08/2015 03:00 AM, Dr. David Alan Gilbert wrote:
> > * Wen Congyang (wency@cn.fujitsu.com) wrote:
> >> In some cases, we want to take a quorum child offline, and take
> >> another child online.
> > 
> > Hi,
> >   Have you checked the output of 'info block' after adding/deleting a child?
> > I'm using one of your older worlds (from a few months ago) and I found I had
> > to add a 
> > 
> >     bdrv_refresh_filename(bs);
> > 
> > to get the output of 'info block' to show the new child.
> > I don't see it in this version.
> 
> Max sent a patch series to drop BDS.filename, so I don't call bdrv_refresh_filename()
> here. If the BDS is not the top BDS, 'info block' still shows the wrong child.

Ah, so I guess you can just implement the exact_filename() function and that
will get it right?

Dave
> 
> Thanks
> Wen Congyang
> 
> > 
> > Dave
> > 
> > 
> >>
> >> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> >> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> >> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> >> Reviewed-by: Eric Blake <eblake@redhat.com>
> >> ---
> >>  block.c                   | 50 +++++++++++++++++++++++++++++++++++++++++++++++
> >>  include/block/block.h     |  5 +++++
> >>  include/block/block_int.h |  5 +++++
> >>  3 files changed, 60 insertions(+)
> >>
> >> diff --git a/block.c b/block.c
> >> index e815d73..1b25e43 100644
> >> --- a/block.c
> >> +++ b/block.c
> >> @@ -4265,3 +4265,53 @@ BlockAcctStats *bdrv_get_stats(BlockDriverState *bs)
> >>  {
> >>      return &bs->stats;
> >>  }
> >> +
> >> +/*
> >> + * Hot add/remove a BDS's child. So the user can take a child offline when
> >> + * it is broken and take a new child online
> >> + */
> >> +void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
> >> +                    Error **errp)
> >> +{
> >> +
> >> +    if (!parent_bs->drv || !parent_bs->drv->bdrv_add_child) {
> >> +        error_setg(errp, "The BDS %s doesn't support adding a child",
> >> +                   bdrv_get_device_or_node_name(parent_bs));
> >> +        return;
> >> +    }
> >> +
> >> +    if (!QLIST_EMPTY(&child_bs->parents)) {
> >> +        error_setg(errp, "The BDS %s already has parent",
> >> +                   child_bs->node_name);
> >> +        return;
> >> +    }
> >> +
> >> +    parent_bs->drv->bdrv_add_child(parent_bs, child_bs, errp);
> >> +}
> >> +
> >> +void bdrv_del_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
> >> +                    Error **errp)
> >> +{
> >> +    BdrvChild *child;
> >> +
> >> +    if (!parent_bs->drv || !parent_bs->drv->bdrv_del_child) {
> >> +        error_setg(errp, "The BDS %s doesn't support removing a child",
> >> +                   bdrv_get_device_or_node_name(parent_bs));
> >> +        return;
> >> +    }
> >> +
> >> +    QLIST_FOREACH(child, &parent_bs->children, next) {
> >> +        if (child->bs == child_bs) {
> >> +            break;
> >> +        }
> >> +    }
> >> +
> >> +    if (!child) {
> >> +        error_setg(errp, "BDS %s is not a child of %s",
> >> +                   bdrv_get_device_or_node_name(child_bs),
> >> +                   bdrv_get_device_or_node_name(parent_bs));
> >> +        return;
> >> +    }
> >> +
> >> +    parent_bs->drv->bdrv_del_child(parent_bs, child_bs, errp);
> >> +}
> >> diff --git a/include/block/block.h b/include/block/block.h
> >> index ef67353..665c56f 100644
> >> --- a/include/block/block.h
> >> +++ b/include/block/block.h
> >> @@ -616,4 +616,9 @@ void bdrv_flush_io_queue(BlockDriverState *bs);
> >>  
> >>  BlockAcctStats *bdrv_get_stats(BlockDriverState *bs);
> >>  
> >> +void bdrv_add_child(BlockDriverState *parent, BlockDriverState *child,
> >> +                    Error **errp);
> >> +void bdrv_del_child(BlockDriverState *parent, BlockDriverState *child,
> >> +                    Error **errp);
> >> +
> >>  #endif
> >> diff --git a/include/block/block_int.h b/include/block/block_int.h
> >> index 2f2c47b..64cbc55 100644
> >> --- a/include/block/block_int.h
> >> +++ b/include/block/block_int.h
> >> @@ -288,6 +288,11 @@ struct BlockDriver {
> >>       */
> >>      int (*bdrv_probe_geometry)(BlockDriverState *bs, HDGeometry *geo);
> >>  
> >> +    void (*bdrv_add_child)(BlockDriverState *parent, BlockDriverState *child,
> >> +                           Error **errp);
> >> +    void (*bdrv_del_child)(BlockDriverState *parent, BlockDriverState *child,
> >> +                           Error **errp);
> >> +
> >>      QLIST_ENTRY(BlockDriver) list;
> >>  };
> >>  
> >> -- 
> >> 2.4.3
> >>
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > .
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v5 2/4] quorum: implement bdrv_add_child() and bdrv_del_child()
  2015-10-08  8:12     ` Alberto Garcia
@ 2015-10-09 15:51       ` Max Reitz
  2015-10-12 11:56         ` Alberto Garcia
  0 siblings, 1 reply; 45+ messages in thread
From: Max Reitz @ 2015-10-09 15:51 UTC (permalink / raw)
  To: Alberto Garcia, Wen Congyang, qemu devel, Eric Blake,
	Markus Armbruster, Stefan Hajnoczi
  Cc: Kevin Wolf, zhanghailiang, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Yang Hongyang

[-- Attachment #1: Type: text/plain, Size: 2306 bytes --]

On 08.10.2015 10:12, Alberto Garcia wrote:
> On Wed 07 Oct 2015 08:51:21 PM CEST, Max Reitz <mreitz@redhat.com> wrote:
>>> +    if (s->num_children == s->max_children) {
>>> +        if (s->max_children >= INT_MAX) {
>>
>> Opposing Berto (:-)), I like >= even if the > part is actually
>> impossible. I myself like to use constructs such as:
>>
>> for (i = 0; i < n; i++) {
>>     /* ... */
>> }
>> if (i >= n) {
>>     /* ... */
>> }
>>
>> Even though @i can never exceed @n after the loop. This is because my
>> way of thinking is "What if it could exceed @n? Then I'd like to take
>> this branch as well." The same applies here. s->max_children can never
>> exceed INT_MAX, but if it could, we'd want that to be an error, too.
> 
> If s->max_children (and therefore s->num_children) could exceed the
> upper limit then we would probably want to assert on that, because it
> means that there's something seriously broken. The purpose of this code
> is to make sure that the upper limit is... well, an upper limit :-) If
> that invariant no longer holds then I don't think we want a simple "Too
> many children" error.
> 
>>> +        s->bs = g_renew(BlockDriverState *, s->bs, s->max_children + 1);
>>> +        s->bs[s->num_children] = NULL;
>>> +        s->max_children++;
>>> +    }
>>
>> Just a suggestion, please feel free to ignore it completely:
>>
>> You can drop the s->max_children field and just always call g_renew()
>> with s->num_children + 1 as the @count parameter. There shouldn't be
>> any (visible) performance penalty, but it would simplify the code.
> 
> If s->num_children has decreased since the previous g_renew() call
> (because the user called quorum_del_child()) that could actually reduce
> the array size.

Yes, it could. And that would be just fine. ;-)

We'd just keep the array exactly as big as it needs to be. I find that
pretty intuitive. It's just counter-intuitive if you think one should
never use realloc() for reducing the size of a buffer (and I know I
myself tend to write my code thinking that).

Max

>                 Nothing would break though, at worst it would just be a
> bit counter-intuitive :-)
> 
> https://www.gnu.org/software/libc/manual/html_node/Changing-Block-Size.html
> 
> Berto
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v5 3/4] qmp: add monitor command to add/remove a child
  2015-10-08  6:15     ` Markus Armbruster
  2015-10-08  8:29       ` Alberto Garcia
  2015-10-08 11:02       ` [Qemu-devel] Dynamic reconfiguration (was: qmp: add monitor command to add/remove a child) Kevin Wolf
@ 2015-10-09 16:13       ` Max Reitz
  2015-10-09 16:42         ` Dr. David Alan Gilbert
  2015-10-12  7:56         ` Markus Armbruster
  2 siblings, 2 replies; 45+ messages in thread
From: Max Reitz @ 2015-10-09 16:13 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Kevin Wolf, Yang Hongyang, Alberto Garcia, zhanghailiang,
	Jiang Yunhong, Dong Eddie, qemu devel, Dr. David Alan Gilbert,
	Gonglei, Stefan Hajnoczi, qemu block

[-- Attachment #1: Type: text/plain, Size: 7835 bytes --]

On 08.10.2015 08:15, Markus Armbruster wrote:
> Max Reitz <mreitz@redhat.com> writes:
> 
>> On 22.09.2015 09:44, Wen Congyang wrote:
>>> The new QMP command name is x-blockdev-child-add, and x-blockdev-child-del.
>>> It justs for adding/removing quorum's child now, and don't support all
>>> kinds of children,
>>
>> It does support all kinds of children for quorum, doesn't it?
>>
>>>                    nor all block drivers. So it is experimental now.
>>
>> Well, that is not really a reason why we would have to make it
>> experimental. For instance, blockdev-add (although some might argue it
>> actually is experimental...) doesn't support all block drivers either.
> 
> Yup, and not calling it x-blockdev-add until it's done was a mistake.
> People tried using it, then found its current limitations the painful
> way.  Not nice.

I knew I should have written s/some might/Markus does/. ;-)

>> The reason I am hesitant of adding an experimental QMP interface that is
>> actually visible to the user (compare x-image in blkverify and blkdebug,
>> which are not documented and not to be used by the user) is twofold:
>>
>> (1) At some point we have to say "OK, this is good enough now" and make
>>     it stable. What would that point be? Who can guarantee that we
>>     wouldn't want to make any interface changes after that point?
> 
> Nobody can, just like for any other interface.  So?

The main question is "what would that point be". As I can see you're
arguing that that point would be "once people want to use it", but I'm
arguing that people want to use it today or we wouldn't need this
interface at all.

I'm against adding external experimental interface because having
external interface indicates that someone wants to use them, but making
them experimental indicates that nobody should use them.

This interface is added for the COLO series. The documentation added in
patch 5 there explains usage of COLO with x-child-add. I don't think
that should be there, because it's experimental. But why have an
external interface if nobody should use it anyway?

> The x- prefix enables work spanning multiple releases.  Until the
> feature is complete, we have a hard time seeing the whole picture, and
> therefore the risk of interface mistakes is higher than normal.  Once
> it's complete, we drop the x-.

I'm arguing the feature is complete as far as what it's supposed to do goes.

>>                                                                   Would
>>     we actually remember to revisit this function once in a while and
>>     consider making it stable?
> 
> Has that been a problem in the past?

I don't know, because I never witnessed an external experimental
interface, but I haven't been closely involved with qemu for too long.

> If the feature is of any use, there's always been mounting pressure to
> finish the job and drop the x-.  If it's of no use, not dropping the x-
> would do no harm.  The opposite, actually.

This is of use, however.

I do see your point, but I'm still arguing that I don't see why we need
an external interface then. But COLO needs an external interface (which
management tools should be able to use!) so there's that.

>> (2) While marking things experimental *should* keep people from using it
>>     in their tools, nobody can guarantee that it *does* keep them from
>>     doing so. So we may find ourselves in the situation of having to
>>     keep a compatibility interface for an experimental feature...
> 
> You can't force people not to do stupid things, but you *can* improve
> their chances at avoiding them.  Clearly marking experimental stuff
> improves chances.

OK, right.

>> For the second point, you should also consider how useful this feature
>> is to management tools. Just being able to remove and attach children
>> from a quorum node seems very useful on its own. I don't see why we
>> should wait for having support for other block drivers; also, for most
>> block drivers there is no meaningful way of adding or removing children
>> as nicely as that is possible for quorum.
> 
> Okay, this is an argument I might be able to buy.
> 
>> E.g. you may have a block filter in the future where you want to
>> exchange its child BDS. This exchange should be an atomic operation, so
>> we cannot use this interface there anyway. For quorum, such an exchange
>> does not need to be atomic, since you can just add the new child first
>> and remove the old one afterwards.
>>
>> So maybe in the future we get some block driver other than quorum for
>> which adding and removing children (as opposed to atomically exchanging
>> them) makes sense, but for now I can only see quorum. Therefore, that
>> this works for quorum only is in my opinion not a reason to make it
>> experimental. I think we actually want to keep it that way.
> 
> Are you telling us the existing interface is insufficiently general?
> That the general interface neeeds to support atomic replacement?

The general interface for all block drivers and situations (as described
by Kevin), yes. The interface for adding/removing children from quorum
in regards to what COLO needs, no.

> If yes, why isn't the general interface is what we should do for quorum?
> Delete is atomic replacement by nothing, add is atomic replacement of
> nothing.

Yes, but I personally don't have a problem with macro functions. I don't
see the harm in implementing blockdev-child-{add,del} now the way they
are and later replacing them by calls to the general function.

But why should we do that?

Some people really want COLO in better sooner than later. Telling them
to wait for until the block layer is as nice as we want it to be is not
really an option I'd be willing to accept. I don't see why we should
delay the COLO work just so that we don't have these two (macro) functions.

The alternative would be to keep it experimental and then tell every
COLO user to use the general function once it's available. But since
COLO users are probably mostly management tools, that seems much more
difficult to me than to just keep these two macro functions as legacy in
qemu.

>> So the question would then be: What ways can you imagine to change this
>> interface, which would necessitate an incompatible change, therefore
>> calling for an experimental interface?
> 
> Yes, that's an important question.
> 
> Another important question is whether this is the interface we want.

I can see why this is an important question to you, but to me it is not
so much. As I argued above, I'm not opposed to adding interface that are
actually not what we want, but that are what users want, and that are
easy to implement with the interface that we want. It's what I call a
“macro function”.

> A secondary question is whether the incompleteness of the implementation
> demands an x- to warn users.

We don't want to shoehorn generality into these two functions, but add a
new one anyway. We might want to add optional parameters later on (e.g.
changing the threshold), but that would be a compatible change.

>> (My point is that with such an experimental interface, management tools
>> cannot use it, even though it'd be a very nice functionality to have)
> 
> How much work is it to finish the job?  Unless it's a substantial chunk,
> debating x- is much ado about nothing: just finish the job already :)

We have proven in the past to need a significant amount of time for even
for non-substantial chunks. Often, non-substantial chunks turn out to be
substantial when actually tackling them.

It basically comes down to whether the COLO authors are willing to wait
for us to do the job. And frankly, if I were them, I wouldn't be.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v5 3/4] qmp: add monitor command to add/remove a child
  2015-10-08  8:29       ` Alberto Garcia
  2015-10-08 10:03         ` Kevin Wolf
@ 2015-10-09 16:14         ` Max Reitz
  1 sibling, 0 replies; 45+ messages in thread
From: Max Reitz @ 2015-10-09 16:14 UTC (permalink / raw)
  To: Alberto Garcia, Markus Armbruster
  Cc: Kevin Wolf, Yang Hongyang, zhanghailiang, Jiang Yunhong,
	Dong Eddie, qemu devel, Dr. David Alan Gilbert, Gonglei,
	Stefan Hajnoczi, qemu block

[-- Attachment #1: Type: text/plain, Size: 1517 bytes --]

On 08.10.2015 10:29, Alberto Garcia wrote:
> On Thu 08 Oct 2015 08:15:25 AM CEST, Markus Armbruster wrote:
>>> For the second point, you should also consider how useful this
>>> feature is to management tools. Just being able to remove and attach
>>> children from a quorum node seems very useful on its own. I don't see
>>> why we should wait for having support for other block drivers; also,
>>> for most block drivers there is no meaningful way of adding or
>>> removing children as nicely as that is possible for quorum.
>>
>> Okay, this is an argument I might be able to buy.
> 
> Note that if we want to make this interface stable there's one use case
> missing: there's currently no way to change the vote threshold.

Besides what Kevin said: If you add a new function, that would be
independent from these two functions. If want to add it as an optional
parameter to blockdev-add-child so the change is done atomically, that
wouldn't be an incompatible interface change either.

Max

> This is maybe not so important for the COLO use case, but for the
> general case of adding and removing children from a quorum node having
> the possibility to change the threshold makes a lot of sense.
> 
> That would probably require a its own API ('quorum-set-threshold' or
> something like that) so I don't think it has an effect on these
> child-add and child-del commands, but I wanted to mention it here anyway
> in case someone sees something that I'm overlooking.
> 
> Berto
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v5 3/4] qmp: add monitor command to add/remove a child
  2015-10-09 16:13       ` [Qemu-devel] [Qemu-block] [PATCH v5 3/4] qmp: add monitor command to add/remove a child Max Reitz
@ 2015-10-09 16:42         ` Dr. David Alan Gilbert
  2015-10-09 18:24           ` Max Reitz
  2015-10-12  7:58           ` Markus Armbruster
  2015-10-12  7:56         ` Markus Armbruster
  1 sibling, 2 replies; 45+ messages in thread
From: Dr. David Alan Gilbert @ 2015-10-09 16:42 UTC (permalink / raw)
  To: Max Reitz
  Cc: Kevin Wolf, Yang Hongyang, Alberto Garcia, zhanghailiang,
	Jiang Yunhong, Dong Eddie, Markus Armbruster, qemu devel,
	Gonglei, Stefan Hajnoczi, qemu block

* Max Reitz (mreitz@redhat.com) wrote:
> On 08.10.2015 08:15, Markus Armbruster wrote:
> > Max Reitz <mreitz@redhat.com> writes:
> > 
> >> On 22.09.2015 09:44, Wen Congyang wrote:
> >>> The new QMP command name is x-blockdev-child-add, and x-blockdev-child-del.
> >>> It justs for adding/removing quorum's child now, and don't support all
> >>> kinds of children,
> >>
> >> It does support all kinds of children for quorum, doesn't it?
> >>
> >>>                    nor all block drivers. So it is experimental now.
> >>
> >> Well, that is not really a reason why we would have to make it
> >> experimental. For instance, blockdev-add (although some might argue it
> >> actually is experimental...) doesn't support all block drivers either.
> > 
> > Yup, and not calling it x-blockdev-add until it's done was a mistake.
> > People tried using it, then found its current limitations the painful
> > way.  Not nice.
> 
> I knew I should have written s/some might/Markus does/. ;-)
> 
> >> The reason I am hesitant of adding an experimental QMP interface that is
> >> actually visible to the user (compare x-image in blkverify and blkdebug,
> >> which are not documented and not to be used by the user) is twofold:
> >>
> >> (1) At some point we have to say "OK, this is good enough now" and make
> >>     it stable. What would that point be? Who can guarantee that we
> >>     wouldn't want to make any interface changes after that point?
> > 
> > Nobody can, just like for any other interface.  So?
> 
> The main question is "what would that point be". As I can see you're
> arguing that that point would be "once people want to use it", but I'm
> arguing that people want to use it today or we wouldn't need this
> interface at all.
> 
> I'm against adding external experimental interface because having
> external interface indicates that someone wants to use them, but making
> them experimental indicates that nobody should use them.
> 
> This interface is added for the COLO series. The documentation added in
> patch 5 there explains usage of COLO with x-child-add. I don't think
> that should be there, because it's experimental. But why have an
> external interface if nobody should use it anyway?

Because it lets people move forward; the COLO series is pretty huge, there
already seem to be side discussions spawning off about dynamic reconfiguration
of stuff, who knows how long those will take to pan out.
Adding the experimental stuff makes it easier for people to try and
get some feedback on.
If everyone turns out to love it then it only takes a trivial patch to promote
it; if people actually realise there is a better interface then it's
no problem to change it either - x- doesn't stop any one using it, but it
does remove their right to moan if it changes.

Dave

> 
> > The x- prefix enables work spanning multiple releases.  Until the
> > feature is complete, we have a hard time seeing the whole picture, and
> > therefore the risk of interface mistakes is higher than normal.  Once
> > it's complete, we drop the x-.
> 
> I'm arguing the feature is complete as far as what it's supposed to do goes.
> 
> >>                                                                   Would
> >>     we actually remember to revisit this function once in a while and
> >>     consider making it stable?
> > 
> > Has that been a problem in the past?
> 
> I don't know, because I never witnessed an external experimental
> interface, but I haven't been closely involved with qemu for too long.
> 
> > If the feature is of any use, there's always been mounting pressure to
> > finish the job and drop the x-.  If it's of no use, not dropping the x-
> > would do no harm.  The opposite, actually.
> 
> This is of use, however.
> 
> I do see your point, but I'm still arguing that I don't see why we need
> an external interface then. But COLO needs an external interface (which
> management tools should be able to use!) so there's that.
> 
> >> (2) While marking things experimental *should* keep people from using it
> >>     in their tools, nobody can guarantee that it *does* keep them from
> >>     doing so. So we may find ourselves in the situation of having to
> >>     keep a compatibility interface for an experimental feature...
> > 
> > You can't force people not to do stupid things, but you *can* improve
> > their chances at avoiding them.  Clearly marking experimental stuff
> > improves chances.
> 
> OK, right.
> 
> >> For the second point, you should also consider how useful this feature
> >> is to management tools. Just being able to remove and attach children
> >> from a quorum node seems very useful on its own. I don't see why we
> >> should wait for having support for other block drivers; also, for most
> >> block drivers there is no meaningful way of adding or removing children
> >> as nicely as that is possible for quorum.
> > 
> > Okay, this is an argument I might be able to buy.
> > 
> >> E.g. you may have a block filter in the future where you want to
> >> exchange its child BDS. This exchange should be an atomic operation, so
> >> we cannot use this interface there anyway. For quorum, such an exchange
> >> does not need to be atomic, since you can just add the new child first
> >> and remove the old one afterwards.
> >>
> >> So maybe in the future we get some block driver other than quorum for
> >> which adding and removing children (as opposed to atomically exchanging
> >> them) makes sense, but for now I can only see quorum. Therefore, that
> >> this works for quorum only is in my opinion not a reason to make it
> >> experimental. I think we actually want to keep it that way.
> > 
> > Are you telling us the existing interface is insufficiently general?
> > That the general interface neeeds to support atomic replacement?
> 
> The general interface for all block drivers and situations (as described
> by Kevin), yes. The interface for adding/removing children from quorum
> in regards to what COLO needs, no.
> 
> > If yes, why isn't the general interface is what we should do for quorum?
> > Delete is atomic replacement by nothing, add is atomic replacement of
> > nothing.
> 
> Yes, but I personally don't have a problem with macro functions. I don't
> see the harm in implementing blockdev-child-{add,del} now the way they
> are and later replacing them by calls to the general function.
> 
> But why should we do that?
> 
> Some people really want COLO in better sooner than later. Telling them
> to wait for until the block layer is as nice as we want it to be is not
> really an option I'd be willing to accept. I don't see why we should
> delay the COLO work just so that we don't have these two (macro) functions.
> 
> The alternative would be to keep it experimental and then tell every
> COLO user to use the general function once it's available. But since
> COLO users are probably mostly management tools, that seems much more
> difficult to me than to just keep these two macro functions as legacy in
> qemu.
> 
> >> So the question would then be: What ways can you imagine to change this
> >> interface, which would necessitate an incompatible change, therefore
> >> calling for an experimental interface?
> > 
> > Yes, that's an important question.
> > 
> > Another important question is whether this is the interface we want.
> 
> I can see why this is an important question to you, but to me it is not
> so much. As I argued above, I'm not opposed to adding interface that are
> actually not what we want, but that are what users want, and that are
> easy to implement with the interface that we want. It's what I call a
> “macro function”.
> 
> > A secondary question is whether the incompleteness of the implementation
> > demands an x- to warn users.
> 
> We don't want to shoehorn generality into these two functions, but add a
> new one anyway. We might want to add optional parameters later on (e.g.
> changing the threshold), but that would be a compatible change.
> 
> >> (My point is that with such an experimental interface, management tools
> >> cannot use it, even though it'd be a very nice functionality to have)
> > 
> > How much work is it to finish the job?  Unless it's a substantial chunk,
> > debating x- is much ado about nothing: just finish the job already :)
> 
> We have proven in the past to need a significant amount of time for even
> for non-substantial chunks. Often, non-substantial chunks turn out to be
> substantial when actually tackling them.
> 
> It basically comes down to whether the COLO authors are willing to wait
> for us to do the job. And frankly, if I were them, I wouldn't be.
> 
> Max
> 


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v5 3/4] qmp: add monitor command to add/remove a child
  2015-10-09 16:42         ` Dr. David Alan Gilbert
@ 2015-10-09 18:24           ` Max Reitz
  2015-10-12  8:07             ` Dr. David Alan Gilbert
  2015-10-12  8:18             ` Kevin Wolf
  2015-10-12  7:58           ` Markus Armbruster
  1 sibling, 2 replies; 45+ messages in thread
From: Max Reitz @ 2015-10-09 18:24 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Kevin Wolf, Yang Hongyang, Alberto Garcia, zhanghailiang,
	Jiang Yunhong, Dong Eddie, Markus Armbruster, qemu devel,
	Gonglei, Stefan Hajnoczi, qemu block

[-- Attachment #1: Type: text/plain, Size: 4684 bytes --]

On 09.10.2015 18:42, Dr. David Alan Gilbert wrote:
> * Max Reitz (mreitz@redhat.com) wrote:
>> On 08.10.2015 08:15, Markus Armbruster wrote:
>>> Max Reitz <mreitz@redhat.com> writes:
>>>
>>>> On 22.09.2015 09:44, Wen Congyang wrote:
>>>>> The new QMP command name is x-blockdev-child-add, and x-blockdev-child-del.
>>>>> It justs for adding/removing quorum's child now, and don't support all
>>>>> kinds of children,
>>>>
>>>> It does support all kinds of children for quorum, doesn't it?
>>>>
>>>>>                    nor all block drivers. So it is experimental now.
>>>>
>>>> Well, that is not really a reason why we would have to make it
>>>> experimental. For instance, blockdev-add (although some might argue it
>>>> actually is experimental...) doesn't support all block drivers either.
>>>
>>> Yup, and not calling it x-blockdev-add until it's done was a mistake.
>>> People tried using it, then found its current limitations the painful
>>> way.  Not nice.
>>
>> I knew I should have written s/some might/Markus does/. ;-)
>>
>>>> The reason I am hesitant of adding an experimental QMP interface that is
>>>> actually visible to the user (compare x-image in blkverify and blkdebug,
>>>> which are not documented and not to be used by the user) is twofold:
>>>>
>>>> (1) At some point we have to say "OK, this is good enough now" and make
>>>>     it stable. What would that point be? Who can guarantee that we
>>>>     wouldn't want to make any interface changes after that point?
>>>
>>> Nobody can, just like for any other interface.  So?
>>
>> The main question is "what would that point be". As I can see you're
>> arguing that that point would be "once people want to use it", but I'm
>> arguing that people want to use it today or we wouldn't need this
>> interface at all.
>>
>> I'm against adding external experimental interface because having
>> external interface indicates that someone wants to use them, but making
>> them experimental indicates that nobody should use them.
>>
>> This interface is added for the COLO series. The documentation added in
>> patch 5 there explains usage of COLO with x-child-add. I don't think
>> that should be there, because it's experimental. But why have an
>> external interface if nobody should use it anyway?
> 
> Because it lets people move forward; the COLO series is pretty huge, there
> already seem to be side discussions spawning off about dynamic reconfiguration
> of stuff, who knows how long those will take to pan out.

Yes, and my point is that with these functions
(blockdev-child-{add,del}) the result of that side discussion doesn't
matter.

> Adding the experimental stuff makes it easier for people to try and
> get some feedback on.

The thing is, I cannot imagine any feedback that would necessitate an
incompatible change. “I want to change quorum's options while
adding/removing children” can easily be accomplished with an additional
optional parameter.

But I do know that we want to keep things experimental exactly because
there can be feedback which I cannot imagine right now.

> If everyone turns out to love it then it only takes a trivial patch to promote
> it; if people actually realise there is a better interface then it's
> no problem to change it either - x- doesn't stop any one using it,

But it should, shouldn't it? No management tool should be using an x-
command, as far as I know. And these are functions which are clearly
designed for management tools.

If management tools are indeed free to use x- functions, then I'm
completely fine with making these experimental for now. It's just that
it looks to me like “Hey, look, we have these two new functions you can
use!” and then, two versions later we remove them because we have a
general reconfiguration option, and we'll say “It's your own fault for
using experimental functions” if someone complains. That sounds
hypocritical to me, but I'm probably being to “legal” here.

(i.e. it's more like “Hey, look, two new cool functions! But don't use
them.” which sounds like a contradiction to me, whereas it actually
means “Feel free to use them but don't blame us”)

tl;dr: May management tools use x- functions? And is it actually
conceivable for them to do so? If so, my whole argument becomes moot, so
let's make these functions x-.

Mainly I'd like to know about some example where we had an x- function
in the past. Markus seemed to imply that was the case.

Max

>                                                                    but it
> does remove their right to moan if it changes.
> 
> Dave



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v5 3/4] qmp: add monitor command to add/remove a child
  2015-10-09 16:13       ` [Qemu-devel] [Qemu-block] [PATCH v5 3/4] qmp: add monitor command to add/remove a child Max Reitz
  2015-10-09 16:42         ` Dr. David Alan Gilbert
@ 2015-10-12  7:56         ` Markus Armbruster
  1 sibling, 0 replies; 45+ messages in thread
From: Markus Armbruster @ 2015-10-12  7:56 UTC (permalink / raw)
  To: Max Reitz
  Cc: Kevin Wolf, Alberto Garcia, zhanghailiang, qemu block,
	Jiang Yunhong, Dong Eddie, qemu devel, Dr. David Alan Gilbert,
	Gonglei, Stefan Hajnoczi, Yang Hongyang

Max Reitz <mreitz@redhat.com> writes:

> On 08.10.2015 08:15, Markus Armbruster wrote:
>> Max Reitz <mreitz@redhat.com> writes:
>> 
>>> On 22.09.2015 09:44, Wen Congyang wrote:
>>>> The new QMP command name is x-blockdev-child-add, and x-blockdev-child-del.
>>>> It justs for adding/removing quorum's child now, and don't support all
>>>> kinds of children,
>>>
>>> It does support all kinds of children for quorum, doesn't it?
>>>
>>>>                    nor all block drivers. So it is experimental now.
>>>
>>> Well, that is not really a reason why we would have to make it
>>> experimental. For instance, blockdev-add (although some might argue it
>>> actually is experimental...) doesn't support all block drivers either.
>> 
>> Yup, and not calling it x-blockdev-add until it's done was a mistake.
>> People tried using it, then found its current limitations the painful
>> way.  Not nice.
>
> I knew I should have written s/some might/Markus does/. ;-)

:)

>>> The reason I am hesitant of adding an experimental QMP interface that is
>>> actually visible to the user (compare x-image in blkverify and blkdebug,
>>> which are not documented and not to be used by the user) is twofold:
>>>
>>> (1) At some point we have to say "OK, this is good enough now" and make
>>>     it stable. What would that point be? Who can guarantee that we
>>>     wouldn't want to make any interface changes after that point?
>> 
>> Nobody can, just like for any other interface.  So?
>
> The main question is "what would that point be". As I can see you're
> arguing that that point would be "once people want to use it", but I'm
> arguing that people want to use it today or we wouldn't need this
> interface at all.
>
> I'm against adding external experimental interface because having
> external interface indicates that someone wants to use them, but making
> them experimental indicates that nobody should use them.

Make that "nobody should use them in anger just yet."

They can and should be used to develop stuff.  Developing non-trivial
interfaces without actual users is risky.  Sometimes, you can't see
shortcomings in an interface until you try to use it.  Successful actual
use can build confidence the experimental interface is in fact ready to
be cast in stone.

> This interface is added for the COLO series. The documentation added in
> patch 5 there explains usage of COLO with x-child-add. I don't think
> that should be there, because it's experimental. But why have an
> external interface if nobody should use it anyway?
>
>> The x- prefix enables work spanning multiple releases.  Until the
>> feature is complete, we have a hard time seeing the whole picture, and
>> therefore the risk of interface mistakes is higher than normal.  Once
>> it's complete, we drop the x-.
>
> I'm arguing the feature is complete as far as what it's supposed to do goes.

When you say "the feature is complete", you're arguing this specific
interface is ready.  When you say you're "against adding external
experimental interface", you're arguing proper use of x-.  Let's try to
keep the discussion of principles separate from the discussion of the
specific instance.

On the former: maybe the interface is ready, but I can't judge offhand.
All I can do is ask questions.

On the latter: I emphatically disagree with the idea that experimental
interfaces are to be avoided because "someone wants to use them".

>>>                                                                   Would
>>>     we actually remember to revisit this function once in a while and
>>>     consider making it stable?
>> 
>> Has that been a problem in the past?
>
> I don't know, because I never witnessed an external experimental
> interface, but I haven't been closely involved with qemu for too long.

QMP itself started experimental, and was declared stable after fairly
heated discussion.

I think we've been dropping x- prefixes pretty routinely.  A quick,
superficial search finds commit 41310c6 (x-rdma) and commit 467b3f3
(x-iothread).

[...]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v5 3/4] qmp: add monitor command to add/remove a child
  2015-10-09 16:42         ` Dr. David Alan Gilbert
  2015-10-09 18:24           ` Max Reitz
@ 2015-10-12  7:58           ` Markus Armbruster
  1 sibling, 0 replies; 45+ messages in thread
From: Markus Armbruster @ 2015-10-12  7:58 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Kevin Wolf, Alberto Garcia, zhanghailiang, qemu block,
	Jiang Yunhong, Dong Eddie, qemu devel, Max Reitz, Gonglei,
	Stefan Hajnoczi, Yang Hongyang

"Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:

> * Max Reitz (mreitz@redhat.com) wrote:
>> On 08.10.2015 08:15, Markus Armbruster wrote:
>> > Max Reitz <mreitz@redhat.com> writes:
>> > 
>> >> On 22.09.2015 09:44, Wen Congyang wrote:
>> >>> The new QMP command name is x-blockdev-child-add, and
>> >>> x-blockdev-child-del.
>> >>> It justs for adding/removing quorum's child now, and don't support all
>> >>> kinds of children,
>> >>
>> >> It does support all kinds of children for quorum, doesn't it?
>> >>
>> >>>                    nor all block drivers. So it is experimental now.
>> >>
>> >> Well, that is not really a reason why we would have to make it
>> >> experimental. For instance, blockdev-add (although some might argue it
>> >> actually is experimental...) doesn't support all block drivers either.
>> > 
>> > Yup, and not calling it x-blockdev-add until it's done was a mistake.
>> > People tried using it, then found its current limitations the painful
>> > way.  Not nice.
>> 
>> I knew I should have written s/some might/Markus does/. ;-)
>> 
>> >> The reason I am hesitant of adding an experimental QMP interface that is
>> >> actually visible to the user (compare x-image in blkverify and blkdebug,
>> >> which are not documented and not to be used by the user) is twofold:
>> >>
>> >> (1) At some point we have to say "OK, this is good enough now" and make
>> >>     it stable. What would that point be? Who can guarantee that we
>> >>     wouldn't want to make any interface changes after that point?
>> > 
>> > Nobody can, just like for any other interface.  So?
>> 
>> The main question is "what would that point be". As I can see you're
>> arguing that that point would be "once people want to use it", but I'm
>> arguing that people want to use it today or we wouldn't need this
>> interface at all.
>> 
>> I'm against adding external experimental interface because having
>> external interface indicates that someone wants to use them, but making
>> them experimental indicates that nobody should use them.
>> 
>> This interface is added for the COLO series. The documentation added in
>> patch 5 there explains usage of COLO with x-child-add. I don't think
>> that should be there, because it's experimental. But why have an
>> external interface if nobody should use it anyway?
>
> Because it lets people move forward; the COLO series is pretty huge, there
> already seem to be side discussions spawning off about dynamic reconfiguration
> of stuff, who knows how long those will take to pan out.
> Adding the experimental stuff makes it easier for people to try and
> get some feedback on.
> If everyone turns out to love it then it only takes a trivial patch to promote
> it; if people actually realise there is a better interface then it's
> no problem to change it either - x- doesn't stop any one using it, but it
> does remove their right to moan if it changes.

Exactly.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v5 3/4] qmp: add monitor command to add/remove a child
  2015-10-09 18:24           ` Max Reitz
@ 2015-10-12  8:07             ` Dr. David Alan Gilbert
  2015-10-12  8:18             ` Kevin Wolf
  1 sibling, 0 replies; 45+ messages in thread
From: Dr. David Alan Gilbert @ 2015-10-12  8:07 UTC (permalink / raw)
  To: Max Reitz
  Cc: Kevin Wolf, Alberto Garcia, zhanghailiang, qemu block,
	Jiang Yunhong, Dong Eddie, Markus Armbruster, qemu devel,
	Gonglei, Stefan Hajnoczi, Yang Hongyang

* Max Reitz (mreitz@redhat.com) wrote:
> On 09.10.2015 18:42, Dr. David Alan Gilbert wrote:
> > * Max Reitz (mreitz@redhat.com) wrote:
> >> On 08.10.2015 08:15, Markus Armbruster wrote:
> >>> Max Reitz <mreitz@redhat.com> writes:
> >>>
> >>>> On 22.09.2015 09:44, Wen Congyang wrote:
> >>>>> The new QMP command name is x-blockdev-child-add, and x-blockdev-child-del.
> >>>>> It justs for adding/removing quorum's child now, and don't support all
> >>>>> kinds of children,
> >>>>
> >>>> It does support all kinds of children for quorum, doesn't it?
> >>>>
> >>>>>                    nor all block drivers. So it is experimental now.
> >>>>
> >>>> Well, that is not really a reason why we would have to make it
> >>>> experimental. For instance, blockdev-add (although some might argue it
> >>>> actually is experimental...) doesn't support all block drivers either.
> >>>
> >>> Yup, and not calling it x-blockdev-add until it's done was a mistake.
> >>> People tried using it, then found its current limitations the painful
> >>> way.  Not nice.
> >>
> >> I knew I should have written s/some might/Markus does/. ;-)
> >>
> >>>> The reason I am hesitant of adding an experimental QMP interface that is
> >>>> actually visible to the user (compare x-image in blkverify and blkdebug,
> >>>> which are not documented and not to be used by the user) is twofold:
> >>>>
> >>>> (1) At some point we have to say "OK, this is good enough now" and make
> >>>>     it stable. What would that point be? Who can guarantee that we
> >>>>     wouldn't want to make any interface changes after that point?
> >>>
> >>> Nobody can, just like for any other interface.  So?
> >>
> >> The main question is "what would that point be". As I can see you're
> >> arguing that that point would be "once people want to use it", but I'm
> >> arguing that people want to use it today or we wouldn't need this
> >> interface at all.
> >>
> >> I'm against adding external experimental interface because having
> >> external interface indicates that someone wants to use them, but making
> >> them experimental indicates that nobody should use them.
> >>
> >> This interface is added for the COLO series. The documentation added in
> >> patch 5 there explains usage of COLO with x-child-add. I don't think
> >> that should be there, because it's experimental. But why have an
> >> external interface if nobody should use it anyway?
> > 
> > Because it lets people move forward; the COLO series is pretty huge, there
> > already seem to be side discussions spawning off about dynamic reconfiguration
> > of stuff, who knows how long those will take to pan out.
> 
> Yes, and my point is that with these functions
> (blockdev-child-{add,del}) the result of that side discussion doesn't
> matter.
> 
> > Adding the experimental stuff makes it easier for people to try and
> > get some feedback on.
> 
> The thing is, I cannot imagine any feedback that would necessitate an
> incompatible change. “I want to change quorum's options while
> adding/removing children” can easily be accomplished with an additional
> optional parameter.
> 
> But I do know that we want to keep things experimental exactly because
> there can be feedback which I cannot imagine right now.
> 
> > If everyone turns out to love it then it only takes a trivial patch to promote
> > it; if people actually realise there is a better interface then it's
> > no problem to change it either - x- doesn't stop any one using it,
> 
> But it should, shouldn't it? No management tool should be using an x-
> command, as far as I know. And these are functions which are clearly
> designed for management tools.
> 
> If management tools are indeed free to use x- functions, then I'm
> completely fine with making these experimental for now. It's just that
> it looks to me like “Hey, look, we have these two new functions you can
> use!” and then, two versions later we remove them because we have a
> general reconfiguration option, and we'll say “It's your own fault for
> using experimental functions” if someone complains. That sounds
> hypocritical to me, but I'm probably being to “legal” here.
>
> (i.e. it's more like “Hey, look, two new cool functions! But don't use
> them.” which sounds like a contradiction to me, whereas it actually
> means “Feel free to use them but don't blame us”)
> 
> tl;dr: May management tools use x- functions? And is it actually
> conceivable for them to do so? If so, my whole argument becomes moot, so
> let's make these functions x-.

My guess is the libvirt guys wont take the code to drive the x- methods;
but it still makes it easier if someone wants to try this stuff out, they
wont need to apply 2/3 sets of COLO code and then any management tools.

> Mainly I'd like to know about some example where we had an x- function
> in the past. Markus seemed to imply that was the case.

The RDMA code used to have x- for migration protocol and some of the
capabilities; we've recently added Jason Herne's cpu throttling with
similar x- flags (1626fee3bdbb295d5e8aff800f7621357bb376d6),
and input-send-event got moved into the x- world (df5b2adb7398d71016ee469f71e52075ed95e04e)
which is much worse than it starting out there.

Dave

> 
> Max
> 
> >                                                                    but it
> > does remove their right to moan if it changes.
> > 
> > Dave
> 
> 


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v5 3/4] qmp: add monitor command to add/remove a child
  2015-10-09 18:24           ` Max Reitz
  2015-10-12  8:07             ` Dr. David Alan Gilbert
@ 2015-10-12  8:18             ` Kevin Wolf
  1 sibling, 0 replies; 45+ messages in thread
From: Kevin Wolf @ 2015-10-12  8:18 UTC (permalink / raw)
  To: Max Reitz
  Cc: Yang Hongyang, Alberto Garcia, zhanghailiang, Markus Armbruster,
	Jiang Yunhong, Dong Eddie, qemu devel, Dr. David Alan Gilbert,
	Gonglei, Stefan Hajnoczi, qemu block

[-- Attachment #1: Type: text/plain, Size: 5490 bytes --]

Am 09.10.2015 um 20:24 hat Max Reitz geschrieben:
> On 09.10.2015 18:42, Dr. David Alan Gilbert wrote:
> > * Max Reitz (mreitz@redhat.com) wrote:
> >> On 08.10.2015 08:15, Markus Armbruster wrote:
> >>> Max Reitz <mreitz@redhat.com> writes:
> >>>
> >>>> On 22.09.2015 09:44, Wen Congyang wrote:
> >>>>> The new QMP command name is x-blockdev-child-add, and x-blockdev-child-del.
> >>>>> It justs for adding/removing quorum's child now, and don't support all
> >>>>> kinds of children,
> >>>>
> >>>> It does support all kinds of children for quorum, doesn't it?
> >>>>
> >>>>>                    nor all block drivers. So it is experimental now.
> >>>>
> >>>> Well, that is not really a reason why we would have to make it
> >>>> experimental. For instance, blockdev-add (although some might argue it
> >>>> actually is experimental...) doesn't support all block drivers either.
> >>>
> >>> Yup, and not calling it x-blockdev-add until it's done was a mistake.
> >>> People tried using it, then found its current limitations the painful
> >>> way.  Not nice.
> >>
> >> I knew I should have written s/some might/Markus does/. ;-)
> >>
> >>>> The reason I am hesitant of adding an experimental QMP interface that is
> >>>> actually visible to the user (compare x-image in blkverify and blkdebug,
> >>>> which are not documented and not to be used by the user) is twofold:
> >>>>
> >>>> (1) At some point we have to say "OK, this is good enough now" and make
> >>>>     it stable. What would that point be? Who can guarantee that we
> >>>>     wouldn't want to make any interface changes after that point?
> >>>
> >>> Nobody can, just like for any other interface.  So?
> >>
> >> The main question is "what would that point be". As I can see you're
> >> arguing that that point would be "once people want to use it", but I'm
> >> arguing that people want to use it today or we wouldn't need this
> >> interface at all.
> >>
> >> I'm against adding external experimental interface because having
> >> external interface indicates that someone wants to use them, but making
> >> them experimental indicates that nobody should use them.
> >>
> >> This interface is added for the COLO series. The documentation added in
> >> patch 5 there explains usage of COLO with x-child-add. I don't think
> >> that should be there, because it's experimental. But why have an
> >> external interface if nobody should use it anyway?
> > 
> > Because it lets people move forward; the COLO series is pretty huge, there
> > already seem to be side discussions spawning off about dynamic reconfiguration
> > of stuff, who knows how long those will take to pan out.
> 
> Yes, and my point is that with these functions
> (blockdev-child-{add,del}) the result of that side discussion doesn't
> matter.
> 
> > Adding the experimental stuff makes it easier for people to try and
> > get some feedback on.
> 
> The thing is, I cannot imagine any feedback that would necessitate an
> incompatible change. “I want to change quorum's options while
> adding/removing children” can easily be accomplished with an additional
> optional parameter.
> 
> But I do know that we want to keep things experimental exactly because
> there can be feedback which I cannot imagine right now.
> 
> > If everyone turns out to love it then it only takes a trivial patch to promote
> > it; if people actually realise there is a better interface then it's
> > no problem to change it either - x- doesn't stop any one using it,
> 
> But it should, shouldn't it? No management tool should be using an x-
> command, as far as I know. And these are functions which are clearly
> designed for management tools.

It should stop people from using it in production, but it shouldn't stop
them from using it for development and testing.

We know that child-add/del is probably not the interface that we want to
have in the end (and I would like to avoid accumulating tons of
compatibility commands once we have what we want).

If the COLO people say that they need an experimental command in order
to make progress, that's fine with me. I think we'll all agree that
while 'blockdev-add' can't reasonably be used in production yet, without
it we couldn't have made much of the progress in the block layer that we
made in the past year. If COLO people are in the same situation, let's
give them what they need, without setting an unwanted interface in
stone.

> If management tools are indeed free to use x- functions, then I'm
> completely fine with making these experimental for now. It's just that
> it looks to me like “Hey, look, we have these two new functions you can
> use!” and then, two versions later we remove them because we have a
> general reconfiguration option, and we'll say “It's your own fault for
> using experimental functions” if someone complains. That sounds
> hypocritical to me, but I'm probably being to “legal” here.

Experimental features in management tools (e.g. in some feature branch)
can use them, they just can't rely on it keeping working.

> (i.e. it's more like “Hey, look, two new cool functions! But don't use
> them.” which sounds like a contradiction to me, whereas it actually
> means “Feel free to use them but don't blame us”)

"Hey, look, two new functions for an incomplete feature! Use them if you
want to help with development, but otherwise stay clear of them."

Kevin

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v5 2/4] quorum: implement bdrv_add_child() and bdrv_del_child()
  2015-10-09 15:51       ` Max Reitz
@ 2015-10-12 11:56         ` Alberto Garcia
  0 siblings, 0 replies; 45+ messages in thread
From: Alberto Garcia @ 2015-10-12 11:56 UTC (permalink / raw)
  To: Max Reitz, Wen Congyang, qemu devel, Eric Blake,
	Markus Armbruster, Stefan Hajnoczi
  Cc: Kevin Wolf, zhanghailiang, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Yang Hongyang

On Fri 09 Oct 2015 05:51:55 PM CEST, Max Reitz <mreitz@redhat.com> wrote:
>>>> +        s->bs = g_renew(BlockDriverState *, s->bs, s->max_children + 1);
>>>> +        s->bs[s->num_children] = NULL;
>>>> +        s->max_children++;
>>>> +    }
>>>
>>> Just a suggestion, please feel free to ignore it completely:
>>>
>>> You can drop the s->max_children field and just always call g_renew()
>>> with s->num_children + 1 as the @count parameter. There shouldn't be
>>> any (visible) performance penalty, but it would simplify the code.
>> 
>> If s->num_children has decreased since the previous g_renew() call
>> (because the user called quorum_del_child()) that could actually reduce
>> the array size.
>
> Yes, it could. And that would be just fine. ;-)
>
> We'd just keep the array exactly as big as it needs to be. I find that
> pretty intuitive. It's just counter-intuitive if you think one should
> never use realloc() for reducing the size of a buffer (and I know I
> myself tend to write my code thinking that).

If the goal is to keep the array exactly as big as it needs to be then
we should use g_renew() in quorum_del_child()...

Anyway we're digressing :-) this array is one pointer per Quorum child,
so the amount of memory we're talking about here is probably negligible.
I'm fine with any solution.

Berto

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v5 3/4] qmp: add monitor command to add/remove a child
  2015-10-07 19:42   ` [Qemu-devel] [Qemu-block] " Max Reitz
  2015-10-08  6:15     ` Markus Armbruster
@ 2015-10-12 16:27     ` Max Reitz
  1 sibling, 0 replies; 45+ messages in thread
From: Max Reitz @ 2015-10-12 16:27 UTC (permalink / raw)
  To: Wen Congyang, qemu devel, Eric Blake, Markus Armbruster,
	Alberto Garcia, Stefan Hajnoczi
  Cc: Kevin Wolf, zhanghailiang, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Yang Hongyang

[-- Attachment #1: Type: text/plain, Size: 979 bytes --]

On 07.10.2015 21:42, Max Reitz wrote:
> On 22.09.2015 09:44, Wen Congyang wrote:
>> The new QMP command name is x-blockdev-child-add, and x-blockdev-child-del.
>> It justs for adding/removing quorum's child now, and don't support all
>> kinds of children,
> 
> It does support all kinds of children for quorum, doesn't it?
> 
>>                    nor all block drivers. So it is experimental now.
> 
> Well, that is not really a reason why we would have to make it
> experimental. For instance, blockdev-add (although some might argue it
> actually is experimental...) doesn't support all block drivers either.

OK, after a rather long discussion, my opinion has changed. Adding them
as experimental interfaces is good (although the reason noted here is
not exactly what I feel is the reason that came out in the discussion).

Thanks to everyone who argued with me! I took a good chunk of your time,
and I'll have you know that I'm grateful for it.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [Qemu-block] Dynamic reconfiguration
  2015-10-08 11:10         ` [Qemu-devel] [Qemu-block] " Kevin Wolf
@ 2015-10-21  8:27           ` Markus Armbruster
  2015-10-26  2:04             ` Wen Congyang
  0 siblings, 1 reply; 45+ messages in thread
From: Markus Armbruster @ 2015-10-21  8:27 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: zhanghailiang, qemu block, Jiang Yunhong, Dong Eddie, qemu devel,
	Max Reitz, Gonglei, Stefan Hajnoczi, Yang Hongyang,
	Dr. David Alan Gilbert

Sorry for my slow reply.

Kevin Wolf <kwolf@redhat.com> writes:

> Am 08.10.2015 um 13:02 hat Kevin Wolf geschrieben:
>> Am 08.10.2015 um 08:15 hat Markus Armbruster geschrieben:
>> > Max Reitz <mreitz@redhat.com> writes:
>> > > E.g. you may have a block filter in the future where you want to
>> > > exchange its child BDS. This exchange should be an atomic operation, so
>> > > we cannot use this interface there anyway. For quorum, such an exchange
>> > > does not need to be atomic, since you can just add the new child first
>> > > and remove the old one afterwards.
>> > >
>> > > So maybe in the future we get some block driver other than quorum for
>> > > which adding and removing children (as opposed to atomically exchanging
>> > > them) makes sense, but for now I can only see quorum. Therefore, that
>> > > this works for quorum only is in my opinion not a reason to make it
>> > > experimental. I think we actually want to keep it that way.
>> > 
>> > Are you telling us the existing interface is insufficiently general?
>> > That the general interface neeeds to support atomic replacement?
>> > 
>> > If yes, why isn't the general interface is what we should do for quorum?
>> > Delete is atomic replacement by nothing, add is atomic replacement of
>> > nothing.
>> 
>> The general thing is what we used to call "dynamic reconfiguration". If
>> we want a single command to enable it (which would be great if we
>> could), we need to some more design work first. Atomic replacement might
>> be the operation we're looking for, but we have to be sure.
>> 
>> So far we haven't thought about dynamic reconfiguation enough that we
>> would know the right solution, but enough that we know it's hard.

Yes.

>>                                                                   That
>> would be an argument for me that makes adding an x-* command now
>> acceptable. On the other hand, the fact that we want a single command in
>> the end makes me want to keep it experimental.

We have a bit more time to think until the release calcifies interfaces.

>> What types of dynamic reconfiguration do we need to support? I'll start
>> with a small list, feel free to extend it:
>> 
>> * Replace a child node with another node. This works pretty much
>>   everywhere in the tree - including the root, i.e. BlockBackend!
>>   Working just on BDSes doesn't seem to be enough.

Two ways to replace a node:

1. Operation on a node, i.e. replace node O by node N *everywhere*.

2. Operation on an edge, i.e. replace an edge to node O by an edge to
   node N.

Less abstract:

1. Node replacement takes a BDS *O.  It either replaces in place
   (something we've worked hard to get rid of), or needs to find all
   variables containing the value O.

2. Edge replacement takes a BDS **O.  Naturally, whoever contains that
   thing may need to be notified, so the actual interface may well be a
   bit more involved.

>> * Adding a child to a node that can take additional children (e.g.
>>   quorum can take an arbitrary number; but also COW image formats have
>>   an option child, so you could add a backing file to a node originally
>>   opened with BDRV_O_NO_BACKING)
>> 
>>   Same as atomically replacing nothing by a node.

Yes.

>> * Removing an optional child from a node that remains valid with that
>>   child removed. The same examples apply.
>> 
>>   Same as atomically replacing a child by nothing.

Yes.

>> * Add a new node between two existing nodes. An example is taking a live
>>   snapshot, which inserts a new node between the BlockBackend and the
>>   first BDS. Or it could be used to insert a filter somewhere in the
>>   graph.
>> 
>>   Same as creating the new node pointing to node B (or dynamically
>>   adding it) and then atomically replacing the reference of node A that
>>   pointed to B with a reference to the new node.

Footnote: between creation of the new node and the atomic replace, the
old node has another parent.  Requires support for a BDS to have
multiple parents, but that's a given, I think.

>> * Add a new node between multiple existing nodes. This is one of the
>>   examples we always used to discuss with dynamic reconfiguration:
>> 
>>       base <- sn1 <- sn2 <--+-- virtio-blk
>>                             |
>>                             +-- NBD server
>> 
>>   Adding a filter could result in this:
>> 
>>         base <- sn1 <- sn2 <- throttling <--+-- virtio-blk
>>                                           |
>>                                           +-- NBD server
>> 
>>   Or this:
>> 
>>         base <- sn1 <- sn2 <--+-- throttling <- virtio-blk
>>                               |
>>                               +-- NBD server
>> 
>>   Or this:
>> 
>>         base <- sn1 <- sn2 <--+-- virtio-blk
>>                               |
>>                               +-- throttling <- NBD server
>> 
>>   All of these are different kinds of "adding a filter", and all of
>>   them are valid operations that a user could want to perform.
>> 
>>   Case 2 and 3 are really just "add a new node between two existing
>>   nodes", as explained above.

Yes.

>>   Case 1 is new: We still create the throttling node so that it already
>>   points to sn2, but now we have to atomically change the children of
>>   two things (the BlockBackends of virtio-blk and the NBD server). Not a
>>   problem per se because we can just do that, but it raises the question
>>   whether the atomic replacement operation needs to be transactionable.

Where "transactionable" means you can bind it together with other
transactionable operations, and the resulting transaction either
succeeds or fails completely.  Right?

In increasing flexibility order:

a. atomic single replacement, not transactionable

b. atomic multiple replacements, not transactionable

c. single replacement, transactionable

In your throttling example, having to insert the throttle in two
separate operations (either of which can fail) seems tolerable.

Can we think of an example where a similar multi-replacement should
either succeed or fail completely?  I.e. one that needs at least b.

Can we think of an example where we need c?

>> * Remove a node between two (or more) other nodes.
>> 
>>   Same as atomically replacing a child by a grandchild. For more than
>>   two involved nodes, again a transactional version might be needed.
>> 
>> So at the first sight, this operation seems to work as the general
>> interface for dynamic reconfiguration.

Yes, we seem to be onto something here.  Finally!

> And actually, after re-reading the other branch of this email thread
> where I replied to Berto that he wants to use bdrv_reopen() to change
> the voting threshold for quorum, it occurred to me that bdrv_reopen()
> should possibly also be this atomatic replacement function.
>
> After all, the references to other nodes are blockdev-add options, and
> if we implement bdrv_reopen() fully so that it can change any options
> that can be given to blockdev-add, that means that we must be able to
> replace children.

You got a point.  Do you think implementing it will be hard?

>
> Kevin
>
>> One problem we discussed earlier that I'm not sure whether it's related
>> is filter nodes inserted automatically once we change e.g. the I/O
>> throttling QMP commands to add a throttling filter BDS to the graph.
>> 
>> If the user creates nodes A and B, but sets throttling options, they
>> might end up with a chain like this:
>> 
>>     A <- throttling <- B

Is "sets throttling options" == "uses the (then) legacy interface for
throttling instead of the more powerful 'insert throttling node'
interface"?

>> Now imagine that they want to add another filter between A and B, let's
>> say blkdebug. They would need to know that they have to insert the new
>> node between A and throttling or B and throttling, but not between A and
>> B. If they tried to insert it between A and B, the algorithm above says
>> that they would let blkdebug point to A, and replace B's child with
>> blkdebug, the resulting tree wouldn't be throttled any more!
>> 
>>     A <- blkdebug <- B
>> 
>>          throttling (ref = 0 -> delete)

Can we phrase the operation differently?  Instead of "insert between A
and B (silently replacing everything that is now between A and B)",
say one of

1a. Replace node A by A <- blkdebug

1b. Replace node B by blkdebug <- B

2a. Replace edge A <- B by <- blkdebug <-
    Impossible in the current state, because there is no such edge.

2b. Replace edge A <- by <- blkdebug <-

2c. Replace edge <- B by <- blkdebug <-

See also node vs. edge replacement above.

>> Even if they knew that they have to consider the throttling node, it
>> currently wouldn't have a node-name, and with Jeff's autogenerated names
>> it wouldn't be predictable.
>> 
>> Maybe the dynamic reconfiguration interface does need to be a bit
>> cleverer.
>> 
>> 
>> Anyway, after writing all of this, I'm almost convinced now that an
>> experimental interface is the right thing to do in this patch series.

Let's do experimental now, and reconsider before the release.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [Qemu-block] Dynamic reconfiguration
  2015-10-21  8:27           ` [Qemu-devel] [Qemu-block] Dynamic reconfiguration Markus Armbruster
@ 2015-10-26  2:04             ` Wen Congyang
  2015-10-26  7:24               ` Markus Armbruster
  0 siblings, 1 reply; 45+ messages in thread
From: Wen Congyang @ 2015-10-26  2:04 UTC (permalink / raw)
  To: Markus Armbruster, Kevin Wolf
  Cc: zhanghailiang, qemu block, Jiang Yunhong, Dong Eddie, qemu devel,
	Max Reitz, Gonglei, Stefan Hajnoczi, Yang Hongyang,
	Dr. David Alan Gilbert

On 10/21/2015 04:27 PM, Markus Armbruster wrote:
> Sorry for my slow reply.
> 
> Kevin Wolf <kwolf@redhat.com> writes:
> 
>> Am 08.10.2015 um 13:02 hat Kevin Wolf geschrieben:
>>> Am 08.10.2015 um 08:15 hat Markus Armbruster geschrieben:
>>>> Max Reitz <mreitz@redhat.com> writes:
>>>>> E.g. you may have a block filter in the future where you want to
>>>>> exchange its child BDS. This exchange should be an atomic operation, so
>>>>> we cannot use this interface there anyway. For quorum, such an exchange
>>>>> does not need to be atomic, since you can just add the new child first
>>>>> and remove the old one afterwards.
>>>>>
>>>>> So maybe in the future we get some block driver other than quorum for
>>>>> which adding and removing children (as opposed to atomically exchanging
>>>>> them) makes sense, but for now I can only see quorum. Therefore, that
>>>>> this works for quorum only is in my opinion not a reason to make it
>>>>> experimental. I think we actually want to keep it that way.
>>>>
>>>> Are you telling us the existing interface is insufficiently general?
>>>> That the general interface neeeds to support atomic replacement?
>>>>
>>>> If yes, why isn't the general interface is what we should do for quorum?
>>>> Delete is atomic replacement by nothing, add is atomic replacement of
>>>> nothing.
>>>
>>> The general thing is what we used to call "dynamic reconfiguration". If
>>> we want a single command to enable it (which would be great if we
>>> could), we need to some more design work first. Atomic replacement might
>>> be the operation we're looking for, but we have to be sure.
>>>
>>> So far we haven't thought about dynamic reconfiguation enough that we
>>> would know the right solution, but enough that we know it's hard.
> 
> Yes.
> 
>>>                                                                   That
>>> would be an argument for me that makes adding an x-* command now
>>> acceptable. On the other hand, the fact that we want a single command in
>>> the end makes me want to keep it experimental.
> 
> We have a bit more time to think until the release calcifies interfaces.
> 
>>> What types of dynamic reconfiguration do we need to support? I'll start
>>> with a small list, feel free to extend it:
>>>
>>> * Replace a child node with another node. This works pretty much
>>>   everywhere in the tree - including the root, i.e. BlockBackend!
>>>   Working just on BDSes doesn't seem to be enough.
> 
> Two ways to replace a node:
> 
> 1. Operation on a node, i.e. replace node O by node N *everywhere*.
> 
> 2. Operation on an edge, i.e. replace an edge to node O by an edge to
>    node N.
> 
> Less abstract:
> 
> 1. Node replacement takes a BDS *O.  It either replaces in place
>    (something we've worked hard to get rid of), or needs to find all
>    variables containing the value O.
> 
> 2. Edge replacement takes a BDS **O.  Naturally, whoever contains that
>    thing may need to be notified, so the actual interface may well be a
>    bit more involved.
> 
>>> * Adding a child to a node that can take additional children (e.g.
>>>   quorum can take an arbitrary number; but also COW image formats have
>>>   an option child, so you could add a backing file to a node originally
>>>   opened with BDRV_O_NO_BACKING)
>>>
>>>   Same as atomically replacing nothing by a node.
> 
> Yes.
> 
>>> * Removing an optional child from a node that remains valid with that
>>>   child removed. The same examples apply.
>>>
>>>   Same as atomically replacing a child by nothing.
> 
> Yes.
> 
>>> * Add a new node between two existing nodes. An example is taking a live
>>>   snapshot, which inserts a new node between the BlockBackend and the
>>>   first BDS. Or it could be used to insert a filter somewhere in the
>>>   graph.
>>>
>>>   Same as creating the new node pointing to node B (or dynamically
>>>   adding it) and then atomically replacing the reference of node A that
>>>   pointed to B with a reference to the new node.
> 
> Footnote: between creation of the new node and the atomic replace, the
> old node has another parent.  Requires support for a BDS to have
> multiple parents, but that's a given, I think.
> 
>>> * Add a new node between multiple existing nodes. This is one of the
>>>   examples we always used to discuss with dynamic reconfiguration:
>>>
>>>       base <- sn1 <- sn2 <--+-- virtio-blk
>>>                             |
>>>                             +-- NBD server
>>>
>>>   Adding a filter could result in this:
>>>
>>>         base <- sn1 <- sn2 <- throttling <--+-- virtio-blk
>>>                                           |
>>>                                           +-- NBD server
>>>
>>>   Or this:
>>>
>>>         base <- sn1 <- sn2 <--+-- throttling <- virtio-blk
>>>                               |
>>>                               +-- NBD server
>>>
>>>   Or this:
>>>
>>>         base <- sn1 <- sn2 <--+-- virtio-blk
>>>                               |
>>>                               +-- throttling <- NBD server
>>>
>>>   All of these are different kinds of "adding a filter", and all of
>>>   them are valid operations that a user could want to perform.
>>>
>>>   Case 2 and 3 are really just "add a new node between two existing
>>>   nodes", as explained above.
> 
> Yes.
> 
>>>   Case 1 is new: We still create the throttling node so that it already
>>>   points to sn2, but now we have to atomically change the children of
>>>   two things (the BlockBackends of virtio-blk and the NBD server). Not a
>>>   problem per se because we can just do that, but it raises the question
>>>   whether the atomic replacement operation needs to be transactionable.
> 
> Where "transactionable" means you can bind it together with other
> transactionable operations, and the resulting transaction either
> succeeds or fails completely.  Right?
> 
> In increasing flexibility order:
> 
> a. atomic single replacement, not transactionable
> 
> b. atomic multiple replacements, not transactionable
> 
> c. single replacement, transactionable
> 
> In your throttling example, having to insert the throttle in two
> separate operations (either of which can fail) seems tolerable.
> 
> Can we think of an example where a similar multi-replacement should
> either succeed or fail completely?  I.e. one that needs at least b.
> 
> Can we think of an example where we need c?
> 
>>> * Remove a node between two (or more) other nodes.
>>>
>>>   Same as atomically replacing a child by a grandchild. For more than
>>>   two involved nodes, again a transactional version might be needed.
>>>
>>> So at the first sight, this operation seems to work as the general
>>> interface for dynamic reconfiguration.
> 
> Yes, we seem to be onto something here.  Finally!
> 
>> And actually, after re-reading the other branch of this email thread
>> where I replied to Berto that he wants to use bdrv_reopen() to change
>> the voting threshold for quorum, it occurred to me that bdrv_reopen()
>> should possibly also be this atomatic replacement function.
>>
>> After all, the references to other nodes are blockdev-add options, and
>> if we implement bdrv_reopen() fully so that it can change any options
>> that can be given to blockdev-add, that means that we must be able to
>> replace children.
> 
> You got a point.  Do you think implementing it will be hard?
> 
>>
>> Kevin
>>
>>> One problem we discussed earlier that I'm not sure whether it's related
>>> is filter nodes inserted automatically once we change e.g. the I/O
>>> throttling QMP commands to add a throttling filter BDS to the graph.
>>>
>>> If the user creates nodes A and B, but sets throttling options, they
>>> might end up with a chain like this:
>>>
>>>     A <- throttling <- B
> 
> Is "sets throttling options" == "uses the (then) legacy interface for
> throttling instead of the more powerful 'insert throttling node'
> interface"?
> 
>>> Now imagine that they want to add another filter between A and B, let's
>>> say blkdebug. They would need to know that they have to insert the new
>>> node between A and throttling or B and throttling, but not between A and
>>> B. If they tried to insert it between A and B, the algorithm above says
>>> that they would let blkdebug point to A, and replace B's child with
>>> blkdebug, the resulting tree wouldn't be throttled any more!
>>>
>>>     A <- blkdebug <- B
>>>
>>>          throttling (ref = 0 -> delete)
> 
> Can we phrase the operation differently?  Instead of "insert between A
> and B (silently replacing everything that is now between A and B)",
> say one of
> 
> 1a. Replace node A by A <- blkdebug
> 
> 1b. Replace node B by blkdebug <- B
> 
> 2a. Replace edge A <- B by <- blkdebug <-
>     Impossible in the current state, because there is no such edge.

What does 'edge' mean?

Thanks
Wen Congyang

> 
> 2b. Replace edge A <- by <- blkdebug <-
> 
> 2c. Replace edge <- B by <- blkdebug <-
> 
> See also node vs. edge replacement above.
> 
>>> Even if they knew that they have to consider the throttling node, it
>>> currently wouldn't have a node-name, and with Jeff's autogenerated names
>>> it wouldn't be predictable.
>>>
>>> Maybe the dynamic reconfiguration interface does need to be a bit
>>> cleverer.
>>>
>>>
>>> Anyway, after writing all of this, I'm almost convinced now that an
>>> experimental interface is the right thing to do in this patch series.
> 
> Let's do experimental now, and reconsider before the release.
> 
> .
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [Qemu-block] Dynamic reconfiguration
  2015-10-26  2:04             ` Wen Congyang
@ 2015-10-26  7:24               ` Markus Armbruster
  2015-10-26  7:25                 ` Wen Congyang
  0 siblings, 1 reply; 45+ messages in thread
From: Markus Armbruster @ 2015-10-26  7:24 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, zhanghailiang, qemu block, Jiang Yunhong, Dong Eddie,
	qemu devel, Max Reitz, Gonglei, Stefan Hajnoczi, Yang Hongyang,
	Dr. David Alan Gilbert

Wen Congyang <wency@cn.fujitsu.com> writes:

> On 10/21/2015 04:27 PM, Markus Armbruster wrote:
[...]
>> Can we phrase the operation differently?  Instead of "insert between A
>> and B (silently replacing everything that is now between A and B)",
>> say one of
>> 
>> 1a. Replace node A by A <- blkdebug
>> 
>> 1b. Replace node B by blkdebug <- B
>> 
>> 2a. Replace edge A <- B by <- blkdebug <-
>>     Impossible in the current state, because there is no such edge.
>
> What does 'edge' mean?

It's graph terminology: the BB and BDS serve as the graph's nodes
(a.k.a. vertices), and the pointers connecting them serve as the graph's
edges.

[...]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [Qemu-block] Dynamic reconfiguration
  2015-10-26  7:24               ` Markus Armbruster
@ 2015-10-26  7:25                 ` Wen Congyang
  0 siblings, 0 replies; 45+ messages in thread
From: Wen Congyang @ 2015-10-26  7:25 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Kevin Wolf, zhanghailiang, qemu block, Jiang Yunhong, Dong Eddie,
	qemu devel, Max Reitz, Gonglei, Stefan Hajnoczi, Yang Hongyang,
	Dr. David Alan Gilbert

On 10/26/2015 03:24 PM, Markus Armbruster wrote:
> Wen Congyang <wency@cn.fujitsu.com> writes:
> 
>> On 10/21/2015 04:27 PM, Markus Armbruster wrote:
> [...]
>>> Can we phrase the operation differently?  Instead of "insert between A
>>> and B (silently replacing everything that is now between A and B)",
>>> say one of
>>>
>>> 1a. Replace node A by A <- blkdebug
>>>
>>> 1b. Replace node B by blkdebug <- B
>>>
>>> 2a. Replace edge A <- B by <- blkdebug <-
>>>     Impossible in the current state, because there is no such edge.
>>
>> What does 'edge' mean?
> 
> It's graph terminology: the BB and BDS serve as the graph's nodes
> (a.k.a. vertices), and the pointers connecting them serve as the graph's
> edges.

Thanks

Wen Congyang

> 
> [...]
> .
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2015-10-26  7:27 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-22  7:44 [Qemu-devel] [PATCH v5 0/4] qapi: child add/delete support Wen Congyang
2015-09-22  7:44 ` [Qemu-devel] [PATCH v5 1/4] Add new block driver interface to add/delete a BDS's child Wen Congyang
2015-10-07 13:35   ` Alberto Garcia
2015-10-08  2:05     ` Wen Congyang
2015-10-07 18:33   ` [Qemu-devel] [Qemu-block] " Max Reitz
2015-10-08  2:06     ` Wen Congyang
2015-10-07 19:00   ` [Qemu-devel] " Dr. David Alan Gilbert
2015-10-08  2:03     ` Wen Congyang
2015-10-08 18:44       ` Dr. David Alan Gilbert
2015-09-22  7:44 ` [Qemu-devel] [PATCH v5 2/4] quorum: implement bdrv_add_child() and bdrv_del_child() Wen Congyang
2015-10-07 14:12   ` Alberto Garcia
2015-10-08  2:10     ` Wen Congyang
2015-10-07 18:51   ` [Qemu-devel] [Qemu-block] " Max Reitz
2015-10-08  8:12     ` Alberto Garcia
2015-10-09 15:51       ` Max Reitz
2015-10-12 11:56         ` Alberto Garcia
2015-09-22  7:44 ` [Qemu-devel] [PATCH v5 3/4] qmp: add monitor command to add/remove a child Wen Congyang
2015-10-07 14:33   ` Alberto Garcia
2015-10-07 19:42   ` [Qemu-devel] [Qemu-block] " Max Reitz
2015-10-08  6:15     ` Markus Armbruster
2015-10-08  8:29       ` Alberto Garcia
2015-10-08 10:03         ` Kevin Wolf
2015-10-08 10:13           ` Alberto Garcia
2015-10-09 16:14         ` Max Reitz
2015-10-08 11:02       ` [Qemu-devel] Dynamic reconfiguration (was: qmp: add monitor command to add/remove a child) Kevin Wolf
2015-10-08 11:10         ` [Qemu-devel] [Qemu-block] " Kevin Wolf
2015-10-21  8:27           ` [Qemu-devel] [Qemu-block] Dynamic reconfiguration Markus Armbruster
2015-10-26  2:04             ` Wen Congyang
2015-10-26  7:24               ` Markus Armbruster
2015-10-26  7:25                 ` Wen Congyang
2015-10-09 16:13       ` [Qemu-devel] [Qemu-block] [PATCH v5 3/4] qmp: add monitor command to add/remove a child Max Reitz
2015-10-09 16:42         ` Dr. David Alan Gilbert
2015-10-09 18:24           ` Max Reitz
2015-10-12  8:07             ` Dr. David Alan Gilbert
2015-10-12  8:18             ` Kevin Wolf
2015-10-12  7:58           ` Markus Armbruster
2015-10-12  7:56         ` Markus Armbruster
2015-10-12 16:27     ` Max Reitz
2015-09-22  7:44 ` [Qemu-devel] [PATCH v5 4/4] hmp: " Wen Congyang
2015-10-07 14:38   ` Alberto Garcia
2015-09-22 11:15 ` [Qemu-devel] [PATCH v5 0/4] qapi: child add/delete support Dr. David Alan Gilbert
2015-09-23  1:08   ` Wen Congyang
2015-09-23  9:21     ` Dr. David Alan Gilbert
2015-09-23  9:30       ` Wen Congyang
2015-10-07  6:40 ` Wen Congyang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.