All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 00/10] Quorum disk image corruption resiliency
@ 2012-08-07 13:44 Benoît Canet
  2012-08-07 13:44 ` [Qemu-devel] [RFC V2 01/10] quorum: Create quorum.c, add QuorumSingleAIOCB and QuorumAIOCB Benoît Canet
                   ` (10 more replies)
  0 siblings, 11 replies; 33+ messages in thread
From: Benoît Canet @ 2012-08-07 13:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, stefanha, blauwirbel, anthony, pbonzini, eblake, afaerber,
	Benoît Canet

This patchset create a block driver implementing a quorum using three qemu disk
images. Writes are mirrored on the three files.
For the reading part the three files are read at the same time and a vote is
done to determine which is the majority qiov version. It then return this
majority version to the upper layers.
When three differents versions of the data are returned by the lower layer the
quorum is broken and the read return -EIO.

The goal of this patchset is to be turned in a QEMU block filter living just
above raw-*.c and below qcow2/qed when the required infrastructure will be done.

Main use of this feature will be people using NFS appliances which can be
subjected to bitflip errors.

usage: -drive file=quorum:image1.raw:image2.raw:image3.raw,if=virtio,cache=none

in v2:

eblake: fix typos
        squash two first commits

afärber: Modify the Makefile on first commit

bcanet: move function prototype of quorum.c one patch down

open question:
    Would the ability to do quorum on 2n+1 images like Blue Swirl suggested be
    popular ?

Benoît Canet (10):
  quorum: Create quorum.c, add QuorumSingleAIOCB and QuorumAIOCB.
  quorum: Create BDRVQuorumState and BlkDriver and do init.
  quorum: Add quorum_open().
  quorum: Add quorum_close().
  quorum: Add quorum_getlength().
  quorum: Add quorum_aio_writev and its dependencies.
  blkverify: Make blkverify_iovec_clone() and blkverify_iovec_compare()
    public
  quorum: Add quorum_co_flush().
  quorum: Add quorum_aio_readv.
  quorum: Add quorum mechanism.

 block/Makefile.objs |    1 +
 block/blkverify.c   |    8 +-
 block/quorum.c      |  390 +++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 397 insertions(+), 2 deletions(-)
 create mode 100644 block/quorum.c

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Qemu-devel] [RFC V2 01/10] quorum: Create quorum.c, add QuorumSingleAIOCB and QuorumAIOCB.
  2012-08-07 13:44 [Qemu-devel] [PATCH 00/10] Quorum disk image corruption resiliency Benoît Canet
@ 2012-08-07 13:44 ` Benoît Canet
  2012-08-07 20:24   ` Blue Swirl
  2012-08-07 13:44 ` [Qemu-devel] [RFC V2 02/10] quorum: Create BDRVQuorumState and BlkDriver and do init Benoît Canet
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 33+ messages in thread
From: Benoît Canet @ 2012-08-07 13:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, stefanha, blauwirbel, anthony, pbonzini, eblake, afaerber,
	Benoît Canet

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/Makefile.objs |    1 +
 block/quorum.c      |   44 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+)
 create mode 100644 block/quorum.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index b5754d3..66af6dc 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -4,6 +4,7 @@ block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
 block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
 block-obj-y += stream.o
+block-obj-y += quorum.o
 block-obj-$(CONFIG_WIN32) += raw-win32.o
 block-obj-$(CONFIG_POSIX) += raw-posix.o
 block-obj-$(CONFIG_LIBISCSI) += iscsi.o
diff --git a/block/quorum.c b/block/quorum.c
new file mode 100644
index 0000000..046b183
--- /dev/null
+++ b/block/quorum.c
@@ -0,0 +1,44 @@
+/*
+ * Quorum Block filter
+ *
+ * Copyright (C) Nodalink, SARL. 2012
+ *
+ * Author:
+ *   Benoît Canet <benoit.canet@irqsave.net>
+ *
+ * Based on the design and code of blkverify.c (Copyright (C) 2010 IBM, Corp)
+ * and blkmirror.c (Copyright (C) 2011 Red Hat, Inc).
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "block_int.h"
+
+typedef struct QuorumAIOCB QuorumAIOCB;
+
+typedef struct QuorumSingleAIOCB {
+    BlockDriverAIOCB *aiocb;
+    char *buf;
+    int ret;
+    QuorumAIOCB *parent;
+} QuorumSingleAIOCB;
+
+struct QuorumAIOCB {
+    BlockDriverAIOCB common;
+    QEMUBH *bh;
+
+    /* Request metadata */
+    int64_t sector_num;
+    int nb_sectors;
+
+    QEMUIOVector *qiov;         /* calling readv IOV */
+
+    QuorumSingleAIOCB aios[3];   /* individual AIOs */
+    QEMUIOVector qiovs[3];      /* individual IOVs */
+    int count;                  /* number of completed AIOCB */
+    bool *finished;             /* completion signal for cancel */
+
+    void (*vote)(QuorumAIOCB *acb);
+    int vote_ret;
+};
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [Qemu-devel] [RFC V2 02/10] quorum: Create BDRVQuorumState and BlkDriver and do init.
  2012-08-07 13:44 [Qemu-devel] [PATCH 00/10] Quorum disk image corruption resiliency Benoît Canet
  2012-08-07 13:44 ` [Qemu-devel] [RFC V2 01/10] quorum: Create quorum.c, add QuorumSingleAIOCB and QuorumAIOCB Benoît Canet
@ 2012-08-07 13:44 ` Benoît Canet
  2012-08-07 13:44 ` [Qemu-devel] [RFC V2 03/10] quorum: Add quorum_open() Benoît Canet
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 33+ messages in thread
From: Benoît Canet @ 2012-08-07 13:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, stefanha, blauwirbel, anthony, pbonzini, eblake, afaerber,
	Benoît Canet

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/quorum.c |   18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index 046b183..e0405b6 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -15,6 +15,10 @@
 
 #include "block_int.h"
 
+typedef struct {
+    BlockDriverState * bs[3];
+} BDRVQuorumState;
+
 typedef struct QuorumAIOCB QuorumAIOCB;
 
 typedef struct QuorumSingleAIOCB {
@@ -42,3 +46,17 @@ struct QuorumAIOCB {
     void (*vote)(QuorumAIOCB *acb);
     int vote_ret;
 };
+
+static BlockDriver bdrv_quorum = {
+    .format_name        = "quorum",
+    .protocol_name      = "quorum",
+
+    .instance_size      = sizeof(BDRVQuorumState),
+};
+
+static void bdrv_quorum_init(void)
+{
+    bdrv_register(&bdrv_quorum);
+}
+
+block_init(bdrv_quorum_init);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [Qemu-devel] [RFC V2 03/10] quorum: Add quorum_open().
  2012-08-07 13:44 [Qemu-devel] [PATCH 00/10] Quorum disk image corruption resiliency Benoît Canet
  2012-08-07 13:44 ` [Qemu-devel] [RFC V2 01/10] quorum: Create quorum.c, add QuorumSingleAIOCB and QuorumAIOCB Benoît Canet
  2012-08-07 13:44 ` [Qemu-devel] [RFC V2 02/10] quorum: Create BDRVQuorumState and BlkDriver and do init Benoît Canet
@ 2012-08-07 13:44 ` Benoît Canet
  2012-08-07 20:30   ` Blue Swirl
  2012-08-08 15:04   ` Stefan Hajnoczi
  2012-08-07 13:44 ` [Qemu-devel] [RFC V2 04/10] quorum: Add quorum_close() Benoît Canet
                   ` (7 subsequent siblings)
  10 siblings, 2 replies; 33+ messages in thread
From: Benoît Canet @ 2012-08-07 13:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, stefanha, blauwirbel, anthony, pbonzini, eblake, afaerber,
	Benoît Canet

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/quorum.c |   62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index e0405b6..de58ab8 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -47,11 +47,73 @@ struct QuorumAIOCB {
     int vote_ret;
 };
 
+/* Valid quorum filenames look like
+ * quorum:path/to/a_image:path/to/b_image:path/to/c_image
+ */
+static int quorum_open(BlockDriverState *bs, const char *filename, int flags)
+{
+    BDRVQuorumState *s = bs->opaque;
+    int ret, i;
+    char *a, *b, *c, *filenames[3];
+
+    /* Parse the quorum: prefix */
+    if (strncmp(filename, "quorum:", strlen("quorum:"))) {
+        return -EINVAL;
+    }
+    a = g_strdup(filename + strlen("quorum:"));
+
+    /* Find separators */
+    b = strchr(a, ':');
+    if (b == NULL) {
+        return -EINVAL;
+    }
+
+    c = strrchr(a, ':');
+    if (c == NULL) {
+        return -EINVAL;
+    }
+
+    /* Check that filename contains two separate ':' */
+    if (b == c) {
+        return -EINVAL;
+    }
+
+    /* Split string */
+    *b = '\0';
+    *c = '\0';
+
+    filenames[0] = a;
+    filenames[1] = b + 1;
+    filenames[2] = c + 1;
+
+    /* Open files */
+    for (i = 0; i <= 2; i++) {
+        s->bs[i] = bdrv_new("");
+        ret = bdrv_open(s->bs[i], filenames[i], flags, NULL);
+        if (ret < 0) {
+            goto error_exit;
+        }
+    }
+
+    goto clean_exit;
+
+error_exit:
+    for (; i >= 0; i--) {
+        bdrv_delete(s->bs[i]);
+        s->bs[i] = NULL;
+    }
+clean_exit:
+    g_free(a);
+    return ret;
+}
+
 static BlockDriver bdrv_quorum = {
     .format_name        = "quorum",
     .protocol_name      = "quorum",
 
     .instance_size      = sizeof(BDRVQuorumState),
+
+    .bdrv_file_open     = quorum_open,
 };
 
 static void bdrv_quorum_init(void)
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [Qemu-devel] [RFC V2 04/10] quorum: Add quorum_close().
  2012-08-07 13:44 [Qemu-devel] [PATCH 00/10] Quorum disk image corruption resiliency Benoît Canet
                   ` (2 preceding siblings ...)
  2012-08-07 13:44 ` [Qemu-devel] [RFC V2 03/10] quorum: Add quorum_open() Benoît Canet
@ 2012-08-07 13:44 ` Benoît Canet
  2012-08-07 20:30   ` Blue Swirl
  2012-08-07 13:44 ` [Qemu-devel] [RFC V2 05/10] quorum: Add quorum_getlength() Benoît Canet
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 33+ messages in thread
From: Benoît Canet @ 2012-08-07 13:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, stefanha, blauwirbel, anthony, pbonzini, eblake, afaerber,
	Benoît Canet

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/quorum.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index de58ab8..9da0432 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -107,6 +107,17 @@ clean_exit:
     return ret;
 }
 
+static void quorum_close(BlockDriverState *bs)
+{
+    BDRVQuorumState *s = bs->opaque;
+    int i;
+
+    /* Ensure writes reach stable storage */
+    for (i = 0; i <= 2; i++) {
+        bdrv_flush(s->bs[i]);
+    }
+}
+
 static BlockDriver bdrv_quorum = {
     .format_name        = "quorum",
     .protocol_name      = "quorum",
@@ -114,6 +125,7 @@ static BlockDriver bdrv_quorum = {
     .instance_size      = sizeof(BDRVQuorumState),
 
     .bdrv_file_open     = quorum_open,
+    .bdrv_close         = quorum_close,
 };
 
 static void bdrv_quorum_init(void)
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [Qemu-devel] [RFC V2 05/10] quorum: Add quorum_getlength().
  2012-08-07 13:44 [Qemu-devel] [PATCH 00/10] Quorum disk image corruption resiliency Benoît Canet
                   ` (3 preceding siblings ...)
  2012-08-07 13:44 ` [Qemu-devel] [RFC V2 04/10] quorum: Add quorum_close() Benoît Canet
@ 2012-08-07 13:44 ` Benoît Canet
  2012-08-08 15:30   ` Stefan Hajnoczi
  2012-08-07 13:44 ` [Qemu-devel] [RFC V2 06/10] quorum: Add quorum_aio_writev and its dependencies Benoît Canet
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 33+ messages in thread
From: Benoît Canet @ 2012-08-07 13:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, stefanha, blauwirbel, anthony, pbonzini, eblake, afaerber,
	Benoît Canet

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/quorum.c |   17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index 9da0432..5cd7083 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -118,12 +118,29 @@ static void quorum_close(BlockDriverState *bs)
     }
 }
 
+static int64_t quorum_getlength(BlockDriverState *bs)
+{
+    BDRVQuorumState *s = bs->opaque;
+    int i;
+    int64_t ret;
+
+    /* return the length of the first available quorum file */
+    for (i = 0, ret = bdrv_getlength(s->bs[i]);
+         ret == -ENOMEDIUM && i <= 2;
+         i++, ret = bdrv_getlength(s->bs[i])) {
+    }
+
+    return ret;
+}
+
 static BlockDriver bdrv_quorum = {
     .format_name        = "quorum",
     .protocol_name      = "quorum",
 
     .instance_size      = sizeof(BDRVQuorumState),
 
+    .bdrv_getlength     = quorum_getlength,
+
     .bdrv_file_open     = quorum_open,
     .bdrv_close         = quorum_close,
 };
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [Qemu-devel] [RFC V2 06/10] quorum: Add quorum_aio_writev and its dependencies.
  2012-08-07 13:44 [Qemu-devel] [PATCH 00/10] Quorum disk image corruption resiliency Benoît Canet
                   ` (4 preceding siblings ...)
  2012-08-07 13:44 ` [Qemu-devel] [RFC V2 05/10] quorum: Add quorum_getlength() Benoît Canet
@ 2012-08-07 13:44 ` Benoît Canet
  2012-08-08 15:37   ` Stefan Hajnoczi
  2012-08-07 13:44 ` [Qemu-devel] [RFC V2 07/10] blkverify: Make blkverify_iovec_clone() and blkverify_iovec_compare() public Benoît Canet
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 33+ messages in thread
From: Benoît Canet @ 2012-08-07 13:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, stefanha, blauwirbel, anthony, pbonzini, eblake, afaerber,
	Benoît Canet

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/quorum.c |  110 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 110 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index 5cd7083..e6d2274 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -133,6 +133,114 @@ static int64_t quorum_getlength(BlockDriverState *bs)
     return ret;
 }
 
+static void quorum_aio_cancel(BlockDriverAIOCB *blockacb)
+{
+    QuorumAIOCB *acb = container_of(blockacb, QuorumAIOCB, common);
+    bool finished = false;
+
+    /* Wait for the request to finish */
+    acb->finished = &finished;
+    while (!finished) {
+        qemu_aio_wait();
+    }
+}
+
+static AIOPool quorum_aio_pool = {
+    .aiocb_size         = sizeof(QuorumAIOCB),
+    .cancel             = quorum_aio_cancel,
+};
+
+static int quorum_check_ret(QuorumAIOCB *acb)
+{
+    int i, j;
+
+    for (i = 0, j = 0; i <= 2; i++) {
+        if (acb->aios[0].ret) {
+            j++;
+        }
+    }
+
+    if (j > 1) {
+        return -EIO;
+    }
+
+    return 0;
+}
+
+static void quorum_aio_bh(void *opaque)
+{
+    QuorumAIOCB *acb = opaque;
+
+    qemu_bh_delete(acb->bh);
+    acb->common.cb(acb->common.opaque, quorum_check_ret(acb));
+    if (acb->finished) {
+        *acb->finished = true;
+    }
+    qemu_aio_release(acb);
+}
+
+static QuorumAIOCB *quorum_aio_get(BlockDriverState *bs,
+                                 QEMUIOVector *qiov,
+                                 int64_t sector_num,
+                                 int nb_sectors,
+                                 BlockDriverCompletionFunc *cb,
+                                 void *opaque)
+{
+    QuorumAIOCB *acb = qemu_aio_get(&quorum_aio_pool, bs, cb, opaque);
+    int i;
+
+    acb->qiov = qiov;
+    acb->bh = NULL;
+    acb->count = 0;
+    acb->sector_num = sector_num;
+    acb->nb_sectors = nb_sectors;
+    acb->vote = NULL;
+    acb->vote_ret = 0;
+
+    for (i = 0; i <= 2; i++) {
+        acb->aios[i].buf = NULL;
+        acb->aios[i].ret = 0;
+        acb->aios[i].parent = acb;
+    }
+
+    return acb;
+}
+
+static void quorum_aio_cb(void *opaque, int ret)
+{
+    QuorumSingleAIOCB *sacb = opaque;
+    QuorumAIOCB *acb = sacb->parent;
+
+    sacb->ret = ret;
+    acb->count++;
+    assert(acb->count <= 3);
+    if (acb->count == 3) {
+        acb->bh = qemu_bh_new(quorum_aio_bh, acb);
+        qemu_bh_schedule(acb->bh);
+    }
+}
+
+static BlockDriverAIOCB *quorum_aio_writev(BlockDriverState *bs,
+                                          int64_t sector_num,
+                                          QEMUIOVector *qiov,
+                                          int nb_sectors,
+                                          BlockDriverCompletionFunc *cb,
+                                          void *opaque)
+{
+    BDRVQuorumState *s = bs->opaque;
+    QuorumAIOCB *acb = quorum_aio_get(bs, qiov, sector_num, nb_sectors,
+                                    cb, opaque);
+    int i;
+
+    for (i = 0; i <= 2; i++) {
+        acb->aios[i].aiocb = bdrv_aio_writev(s->bs[i], sector_num, qiov,
+                                             nb_sectors, &quorum_aio_cb,
+                                             &acb->aios[i]);
+    }
+
+    return &acb->common;
+}
+
 static BlockDriver bdrv_quorum = {
     .format_name        = "quorum",
     .protocol_name      = "quorum",
@@ -143,6 +251,8 @@ static BlockDriver bdrv_quorum = {
 
     .bdrv_file_open     = quorum_open,
     .bdrv_close         = quorum_close,
+
+    .bdrv_aio_writev    = quorum_aio_writev,
 };
 
 static void bdrv_quorum_init(void)
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [Qemu-devel] [RFC V2 07/10] blkverify: Make blkverify_iovec_clone() and blkverify_iovec_compare() public
  2012-08-07 13:44 [Qemu-devel] [PATCH 00/10] Quorum disk image corruption resiliency Benoît Canet
                   ` (5 preceding siblings ...)
  2012-08-07 13:44 ` [Qemu-devel] [RFC V2 06/10] quorum: Add quorum_aio_writev and its dependencies Benoît Canet
@ 2012-08-07 13:44 ` Benoît Canet
  2012-08-08 15:38   ` Stefan Hajnoczi
  2012-08-07 13:44 ` [Qemu-devel] [RFC V2 08/10] quorum: Add quorum_co_flush() Benoît Canet
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 33+ messages in thread
From: Benoît Canet @ 2012-08-07 13:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, stefanha, blauwirbel, anthony, pbonzini, eblake, afaerber,
	Benoît Canet

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/blkverify.c |    8 ++++++--
 block/quorum.c    |    4 ++++
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/block/blkverify.c b/block/blkverify.c
index 9d5f1ec..9e15081 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -11,6 +11,10 @@
 #include "qemu_socket.h" /* for EINPROGRESS on Windows */
 #include "block_int.h"
 
+ssize_t blkverify_iovec_compare(QEMUIOVector *a, QEMUIOVector *b);
+void blkverify_iovec_clone(QEMUIOVector *dest, const QEMUIOVector *src,
+                           void *buf);
+
 typedef struct {
     BlockDriverState *test_file;
 } BDRVBlkverifyState;
@@ -130,7 +134,7 @@ static int64_t blkverify_getlength(BlockDriverState *bs)
  * @b:          I/O vector
  * @ret:        Offset to first mismatching byte or -1 if match
  */
-static ssize_t blkverify_iovec_compare(QEMUIOVector *a, QEMUIOVector *b)
+ssize_t blkverify_iovec_compare(QEMUIOVector *a, QEMUIOVector *b)
 {
     int i;
     ssize_t offset = 0;
@@ -190,7 +194,7 @@ static int sortelem_cmp_src_index(const void *a, const void *b)
  * The relative relationships of overlapping iovecs are preserved.  This is
  * necessary to ensure identical semantics in the cloned I/O vector.
  */
-static void blkverify_iovec_clone(QEMUIOVector *dest, const QEMUIOVector *src,
+void blkverify_iovec_clone(QEMUIOVector *dest, const QEMUIOVector *src,
                                   void *buf)
 {
     IOVectorSortElem sortelems[src->niov];
diff --git a/block/quorum.c b/block/quorum.c
index e6d2274..003fc3f 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -15,6 +15,10 @@
 
 #include "block_int.h"
 
+ssize_t blkverify_iovec_compare(QEMUIOVector *a, QEMUIOVector *b);
+void blkverify_iovec_clone(QEMUIOVector *dest, const QEMUIOVector *src,
+                           void *buf);
+
 typedef struct {
     BlockDriverState * bs[3];
 } BDRVQuorumState;
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [Qemu-devel] [RFC V2 08/10] quorum: Add quorum_co_flush().
  2012-08-07 13:44 [Qemu-devel] [PATCH 00/10] Quorum disk image corruption resiliency Benoît Canet
                   ` (6 preceding siblings ...)
  2012-08-07 13:44 ` [Qemu-devel] [RFC V2 07/10] blkverify: Make blkverify_iovec_clone() and blkverify_iovec_compare() public Benoît Canet
@ 2012-08-07 13:44 ` Benoît Canet
  2012-08-07 13:44 ` [Qemu-devel] [RFC V2 09/10] quorum: Add quorum_aio_readv Benoît Canet
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 33+ messages in thread
From: Benoît Canet @ 2012-08-07 13:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, stefanha, blauwirbel, anthony, pbonzini, eblake, afaerber,
	Benoît Canet

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/quorum.c |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index 003fc3f..2df3ae6 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -245,6 +245,21 @@ static BlockDriverAIOCB *quorum_aio_writev(BlockDriverState *bs,
     return &acb->common;
 }
 
+static coroutine_fn int quorum_co_flush(BlockDriverState *bs)
+{
+    BDRVQuorumState *s = bs->opaque;
+    int i, ret;
+
+    for (i = 0; i <= 2; i++) {
+        ret = bdrv_co_flush(s->bs[i]);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
+    return ret;
+}
+
 static BlockDriver bdrv_quorum = {
     .format_name        = "quorum",
     .protocol_name      = "quorum",
@@ -255,6 +270,7 @@ static BlockDriver bdrv_quorum = {
 
     .bdrv_file_open     = quorum_open,
     .bdrv_close         = quorum_close,
+    .bdrv_co_flush_to_disk = quorum_co_flush,
 
     .bdrv_aio_writev    = quorum_aio_writev,
 };
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [Qemu-devel] [RFC V2 09/10] quorum: Add quorum_aio_readv.
  2012-08-07 13:44 [Qemu-devel] [PATCH 00/10] Quorum disk image corruption resiliency Benoît Canet
                   ` (7 preceding siblings ...)
  2012-08-07 13:44 ` [Qemu-devel] [RFC V2 08/10] quorum: Add quorum_co_flush() Benoît Canet
@ 2012-08-07 13:44 ` Benoît Canet
  2012-08-08 15:44   ` Stefan Hajnoczi
  2012-08-07 13:44 ` [Qemu-devel] [RFC V2 10/10] quorum: Add quorum mechanism Benoît Canet
  2012-08-08 14:55 ` [Qemu-devel] [PATCH 00/10] Quorum disk image corruption resiliency Stefan Hajnoczi
  10 siblings, 1 reply; 33+ messages in thread
From: Benoît Canet @ 2012-08-07 13:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, stefanha, blauwirbel, anthony, pbonzini, eblake, afaerber,
	Benoît Canet

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/quorum.c |   35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index 2df3ae6..13804c1 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -174,6 +174,14 @@ static int quorum_check_ret(QuorumAIOCB *acb)
 static void quorum_aio_bh(void *opaque)
 {
     QuorumAIOCB *acb = opaque;
+    int i;
+
+    for (i = 0; i <= 2; i++) {
+        if (acb->aios[i].buf) {
+            g_free(acb->aios[i].buf);
+            acb->aios[i].buf = NULL;
+        }
+    }
 
     qemu_bh_delete(acb->bh);
     acb->common.cb(acb->common.opaque, quorum_check_ret(acb));
@@ -224,6 +232,32 @@ static void quorum_aio_cb(void *opaque, int ret)
     }
 }
 
+static BlockDriverAIOCB *quorum_aio_readv(BlockDriverState *bs,
+                                         int64_t sector_num,
+                                         QEMUIOVector *qiov,
+                                         int nb_sectors,
+                                         BlockDriverCompletionFunc *cb,
+                                         void *opaque)
+{
+    BDRVQuorumState *s = bs->opaque;
+    QuorumAIOCB *acb = quorum_aio_get(bs, qiov, sector_num,
+                                      nb_sectors, cb, opaque);
+    int i;
+
+    for (i = 0; i <= 2; i++) {
+        acb->aios[i].buf = qemu_blockalign(bs->file, qiov->size);
+        qemu_iovec_init(&acb->qiovs[i], qiov->niov);
+        blkverify_iovec_clone(&acb->qiovs[i], qiov, acb->aios[i].buf);
+    }
+
+    for (i = 0; i <= 2; i++) {
+        bdrv_aio_readv(s->bs[i], sector_num, qiov, nb_sectors,
+                       quorum_aio_cb, &acb->aios[i]);
+    }
+
+    return &acb->common;
+}
+
 static BlockDriverAIOCB *quorum_aio_writev(BlockDriverState *bs,
                                           int64_t sector_num,
                                           QEMUIOVector *qiov,
@@ -272,6 +306,7 @@ static BlockDriver bdrv_quorum = {
     .bdrv_close         = quorum_close,
     .bdrv_co_flush_to_disk = quorum_co_flush,
 
+    .bdrv_aio_readv     = quorum_aio_readv,
     .bdrv_aio_writev    = quorum_aio_writev,
 };
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [Qemu-devel] [RFC V2 10/10] quorum: Add quorum mechanism.
  2012-08-07 13:44 [Qemu-devel] [PATCH 00/10] Quorum disk image corruption resiliency Benoît Canet
                   ` (8 preceding siblings ...)
  2012-08-07 13:44 ` [Qemu-devel] [RFC V2 09/10] quorum: Add quorum_aio_readv Benoît Canet
@ 2012-08-07 13:44 ` Benoît Canet
  2012-08-08 15:54   ` Stefan Hajnoczi
  2012-08-08 14:55 ` [Qemu-devel] [PATCH 00/10] Quorum disk image corruption resiliency Stefan Hajnoczi
  10 siblings, 1 reply; 33+ messages in thread
From: Benoît Canet @ 2012-08-07 13:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, stefanha, blauwirbel, anthony, pbonzini, eblake, afaerber,
	Benoît Canet

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/quorum.c |   84 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 78 insertions(+), 6 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index 13804c1..5914141 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -174,7 +174,7 @@ static int quorum_check_ret(QuorumAIOCB *acb)
 static void quorum_aio_bh(void *opaque)
 {
     QuorumAIOCB *acb = opaque;
-    int i;
+    int i, ret;
 
     for (i = 0; i <= 2; i++) {
         if (acb->aios[i].buf) {
@@ -184,7 +184,12 @@ static void quorum_aio_bh(void *opaque)
     }
 
     qemu_bh_delete(acb->bh);
-    acb->common.cb(acb->common.opaque, quorum_check_ret(acb));
+    if (acb->vote_ret) {
+        ret = acb->vote_ret;
+    } else {
+        ret = quorum_check_ret(acb);
+    }
+    acb->common.cb(acb->common.opaque, ret);
     if (acb->finished) {
         *acb->finished = true;
     }
@@ -226,10 +231,75 @@ static void quorum_aio_cb(void *opaque, int ret)
     sacb->ret = ret;
     acb->count++;
     assert(acb->count <= 3);
-    if (acb->count == 3) {
-        acb->bh = qemu_bh_new(quorum_aio_bh, acb);
-        qemu_bh_schedule(acb->bh);
+    if (acb->count < 3) {
+        return;
     }
+
+    /* Do the quorum */
+    if (acb->vote) {
+        acb->vote(acb);
+    }
+
+    acb->bh = qemu_bh_new(quorum_aio_bh, acb);
+    qemu_bh_schedule(acb->bh);
+}
+
+static void quorum_print_bad(QuorumAIOCB *acb, const char *filename)
+{
+    fprintf(stderr, "quorum: corrected error in quorum file %s: sector_num=%"
+            PRId64 " nb_sectors=%i\n", filename, acb->sector_num,
+            acb->nb_sectors);
+}
+
+static void quorum_print_failure(QuorumAIOCB *acb)
+{
+    fprintf(stderr, "quorum: failure sector_num=%" PRId64 " nb_sectors=%i\n",
+            acb->sector_num, acb->nb_sectors);
+}
+
+static void quorum_copy_qiov(QEMUIOVector *dest, QEMUIOVector *source)
+{
+    int i;
+    for (i = 0; i < source->niov; i++) {
+        memcpy(dest->iov[i].iov_base,
+               source->iov[i].iov_base,
+               source->iov[i].iov_len);
+        dest->iov[i].iov_len = source->iov[i].iov_len;
+    }
+    dest->niov = source->niov;
+    dest->nalloc = source->nalloc;
+    dest->size = source->size;
+}
+
+static void quorum_vote(QuorumAIOCB *acb)
+{
+    ssize_t a_b, b_c, a_c;
+    a_b = blkverify_iovec_compare(&acb->qiovs[0], &acb->qiovs[1]);
+    b_c = blkverify_iovec_compare(&acb->qiovs[1], &acb->qiovs[2]);
+
+    /* Three vector identical -> quorum */
+    if (a_b == b_c && a_b == -1) {
+        quorum_copy_qiov(acb->qiov, &acb->qiovs[0]); /*clone a */
+        return;
+    }
+    if (a_b == -1) {
+        quorum_print_bad(acb, "C");
+        quorum_copy_qiov(acb->qiov, &acb->qiovs[0]); /*clone a */
+        return;
+    }
+    if (b_c == -1) {
+        quorum_print_bad(acb, "A");
+        quorum_copy_qiov(acb->qiov, &acb->qiovs[1]); /*clone b */
+        return;
+    }
+    a_c = blkverify_iovec_compare(&acb->qiovs[0], &acb->qiovs[2]);
+    if (a_c == -1) {
+        quorum_print_bad(acb, "B");
+        quorum_copy_qiov(acb->qiov, &acb->qiovs[0]); /*clone a */
+        return;
+    }
+    quorum_print_failure(acb);
+    acb->vote_ret = -EIO;
 }
 
 static BlockDriverAIOCB *quorum_aio_readv(BlockDriverState *bs,
@@ -244,6 +314,8 @@ static BlockDriverAIOCB *quorum_aio_readv(BlockDriverState *bs,
                                       nb_sectors, cb, opaque);
     int i;
 
+    acb->vote = quorum_vote;
+
     for (i = 0; i <= 2; i++) {
         acb->aios[i].buf = qemu_blockalign(bs->file, qiov->size);
         qemu_iovec_init(&acb->qiovs[i], qiov->niov);
@@ -251,7 +323,7 @@ static BlockDriverAIOCB *quorum_aio_readv(BlockDriverState *bs,
     }
 
     for (i = 0; i <= 2; i++) {
-        bdrv_aio_readv(s->bs[i], sector_num, qiov, nb_sectors,
+        bdrv_aio_readv(s->bs[i], sector_num, &acb->qiovs[i], nb_sectors,
                        quorum_aio_cb, &acb->aios[i]);
     }
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [RFC V2 01/10] quorum: Create quorum.c, add QuorumSingleAIOCB and QuorumAIOCB.
  2012-08-07 13:44 ` [Qemu-devel] [RFC V2 01/10] quorum: Create quorum.c, add QuorumSingleAIOCB and QuorumAIOCB Benoît Canet
@ 2012-08-07 20:24   ` Blue Swirl
  0 siblings, 0 replies; 33+ messages in thread
From: Blue Swirl @ 2012-08-07 20:24 UTC (permalink / raw)
  To: Benoît Canet
  Cc: kwolf, stefanha, qemu-devel, anthony, pbonzini, eblake, afaerber,
	Benoît Canet

On Tue, Aug 7, 2012 at 1:44 PM, Benoît Canet <benoit.canet@gmail.com> wrote:
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
>  block/Makefile.objs |    1 +
>  block/quorum.c      |   44 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 45 insertions(+)
>  create mode 100644 block/quorum.c
>
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index b5754d3..66af6dc 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -4,6 +4,7 @@ block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>  block-obj-y += qed-check.o
>  block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
>  block-obj-y += stream.o
> +block-obj-y += quorum.o
>  block-obj-$(CONFIG_WIN32) += raw-win32.o
>  block-obj-$(CONFIG_POSIX) += raw-posix.o
>  block-obj-$(CONFIG_LIBISCSI) += iscsi.o
> diff --git a/block/quorum.c b/block/quorum.c
> new file mode 100644
> index 0000000..046b183
> --- /dev/null
> +++ b/block/quorum.c
> @@ -0,0 +1,44 @@
> +/*
> + * Quorum Block filter
> + *
> + * Copyright (C) Nodalink, SARL. 2012
> + *
> + * Author:
> + *   Benoît Canet <benoit.canet@irqsave.net>
> + *
> + * Based on the design and code of blkverify.c (Copyright (C) 2010 IBM, Corp)
> + * and blkmirror.c (Copyright (C) 2011 Red Hat, Inc).
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "block_int.h"
> +
> +typedef struct QuorumAIOCB QuorumAIOCB;
> +
> +typedef struct QuorumSingleAIOCB {
> +    BlockDriverAIOCB *aiocb;
> +    char *buf;

uint8_t *buf?

> +    int ret;
> +    QuorumAIOCB *parent;
> +} QuorumSingleAIOCB;
> +
> +struct QuorumAIOCB {
> +    BlockDriverAIOCB common;
> +    QEMUBH *bh;
> +
> +    /* Request metadata */
> +    int64_t sector_num;
> +    int nb_sectors;
> +
> +    QEMUIOVector *qiov;         /* calling readv IOV */
> +
> +    QuorumSingleAIOCB aios[3];   /* individual AIOs */
> +    QEMUIOVector qiovs[3];      /* individual IOVs */
> +    int count;                  /* number of completed AIOCB */
> +    bool *finished;             /* completion signal for cancel */
> +
> +    void (*vote)(QuorumAIOCB *acb);
> +    int vote_ret;
> +};
> --
> 1.7.9.5
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [RFC V2 03/10] quorum: Add quorum_open().
  2012-08-07 13:44 ` [Qemu-devel] [RFC V2 03/10] quorum: Add quorum_open() Benoît Canet
@ 2012-08-07 20:30   ` Blue Swirl
  2012-08-07 20:40     ` Eric Blake
  2012-08-10 17:48     ` Benoît Canet
  2012-08-08 15:04   ` Stefan Hajnoczi
  1 sibling, 2 replies; 33+ messages in thread
From: Blue Swirl @ 2012-08-07 20:30 UTC (permalink / raw)
  To: Benoît Canet
  Cc: kwolf, stefanha, qemu-devel, anthony, pbonzini, eblake, afaerber,
	Benoît Canet

On Tue, Aug 7, 2012 at 1:44 PM, Benoît Canet <benoit.canet@gmail.com> wrote:
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
>  block/quorum.c |   62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 62 insertions(+)
>
> diff --git a/block/quorum.c b/block/quorum.c
> index e0405b6..de58ab8 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -47,11 +47,73 @@ struct QuorumAIOCB {
>      int vote_ret;
>  };
>
> +/* Valid quorum filenames look like
> + * quorum:path/to/a_image:path/to/b_image:path/to/c_image

This syntax would mean that stacking for example curl or other network
paths would not be possible. How about comma as separator?

> + */
> +static int quorum_open(BlockDriverState *bs, const char *filename, int flags)
> +{
> +    BDRVQuorumState *s = bs->opaque;
> +    int ret, i;
> +    char *a, *b, *c, *filenames[3];
> +
> +    /* Parse the quorum: prefix */
> +    if (strncmp(filename, "quorum:", strlen("quorum:"))) {
> +        return -EINVAL;
> +    }
> +    a = g_strdup(filename + strlen("quorum:"));
> +
> +    /* Find separators */
> +    b = strchr(a, ':');
> +    if (b == NULL) {
> +        return -EINVAL;
> +    }
> +
> +    c = strrchr(a, ':');
> +    if (c == NULL) {
> +        return -EINVAL;
> +    }
> +
> +    /* Check that filename contains two separate ':' */
> +    if (b == c) {
> +        return -EINVAL;
> +    }
> +
> +    /* Split string */
> +    *b = '\0';
> +    *c = '\0';
> +
> +    filenames[0] = a;
> +    filenames[1] = b + 1;
> +    filenames[2] = c + 1;
> +
> +    /* Open files */
> +    for (i = 0; i <= 2; i++) {
> +        s->bs[i] = bdrv_new("");
> +        ret = bdrv_open(s->bs[i], filenames[i], flags, NULL);
> +        if (ret < 0) {
> +            goto error_exit;

Successfully opening two out of three should be enough, but maybe it
does not make much sense.

> +        }
> +    }
> +
> +    goto clean_exit;
> +
> +error_exit:
> +    for (; i >= 0; i--) {
> +        bdrv_delete(s->bs[i]);

bdrv_close() instead?

> +        s->bs[i] = NULL;
> +    }
> +clean_exit:
> +    g_free(a);
> +    return ret;
> +}
> +
>  static BlockDriver bdrv_quorum = {
>      .format_name        = "quorum",
>      .protocol_name      = "quorum",
>
>      .instance_size      = sizeof(BDRVQuorumState),
> +
> +    .bdrv_file_open     = quorum_open,
>  };
>
>  static void bdrv_quorum_init(void)
> --
> 1.7.9.5
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [RFC V2 04/10] quorum: Add quorum_close().
  2012-08-07 13:44 ` [Qemu-devel] [RFC V2 04/10] quorum: Add quorum_close() Benoît Canet
@ 2012-08-07 20:30   ` Blue Swirl
  2012-08-08 15:06     ` Stefan Hajnoczi
  0 siblings, 1 reply; 33+ messages in thread
From: Blue Swirl @ 2012-08-07 20:30 UTC (permalink / raw)
  To: Benoît Canet
  Cc: kwolf, stefanha, qemu-devel, anthony, pbonzini, eblake, afaerber,
	Benoît Canet

On Tue, Aug 7, 2012 at 1:44 PM, Benoît Canet <benoit.canet@gmail.com> wrote:
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
>  block/quorum.c |   12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/block/quorum.c b/block/quorum.c
> index de58ab8..9da0432 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -107,6 +107,17 @@ clean_exit:
>      return ret;
>  }
>
> +static void quorum_close(BlockDriverState *bs)
> +{
> +    BDRVQuorumState *s = bs->opaque;
> +    int i;
> +
> +    /* Ensure writes reach stable storage */
> +    for (i = 0; i <= 2; i++) {
> +        bdrv_flush(s->bs[i]);

bdrv_close()

> +    }
> +}
> +
>  static BlockDriver bdrv_quorum = {
>      .format_name        = "quorum",
>      .protocol_name      = "quorum",
> @@ -114,6 +125,7 @@ static BlockDriver bdrv_quorum = {
>      .instance_size      = sizeof(BDRVQuorumState),
>
>      .bdrv_file_open     = quorum_open,
> +    .bdrv_close         = quorum_close,
>  };
>
>  static void bdrv_quorum_init(void)
> --
> 1.7.9.5
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [RFC V2 03/10] quorum: Add quorum_open().
  2012-08-07 20:30   ` Blue Swirl
@ 2012-08-07 20:40     ` Eric Blake
  2012-08-10 17:48     ` Benoît Canet
  1 sibling, 0 replies; 33+ messages in thread
From: Eric Blake @ 2012-08-07 20:40 UTC (permalink / raw)
  To: Blue Swirl
  Cc: kwolf, Benoît Canet, stefanha, qemu-devel, anthony,
	pbonzini, afaerber, Benoît Canet

[-- Attachment #1: Type: text/plain, Size: 1240 bytes --]

On 08/07/2012 02:30 PM, Blue Swirl wrote:
> On Tue, Aug 7, 2012 at 1:44 PM, Benoît Canet <benoit.canet@gmail.com> wrote:
>> Signed-off-by: Benoit Canet <benoit@irqsave.net>
>> ---
>>  block/quorum.c |   62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 62 insertions(+)
>>
>> diff --git a/block/quorum.c b/block/quorum.c
>> index e0405b6..de58ab8 100644
>> --- a/block/quorum.c
>> +++ b/block/quorum.c
>> @@ -47,11 +47,73 @@ struct QuorumAIOCB {
>>      int vote_ret;
>>  };
>>
>> +/* Valid quorum filenames look like
>> + * quorum:path/to/a_image:path/to/b_image:path/to/c_image
> 
> This syntax would mean that stacking for example curl or other network
> paths would not be possible. How about comma as separator?

Also, what escaping mechanism is in place for allowing a file containing
the same character as the separator (whether you end up with : or , as
the separator)?  If you support a larger quorum (whether always n/(2n-1)
or whether fully configurable n/m), rather than hard-coded 2/3, then you
also need a way to specify how many quorum members will follow.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH 00/10] Quorum disk image corruption resiliency
  2012-08-07 13:44 [Qemu-devel] [PATCH 00/10] Quorum disk image corruption resiliency Benoît Canet
                   ` (9 preceding siblings ...)
  2012-08-07 13:44 ` [Qemu-devel] [RFC V2 10/10] quorum: Add quorum mechanism Benoît Canet
@ 2012-08-08 14:55 ` Stefan Hajnoczi
  2012-08-10 17:45   ` Benoît Canet
  10 siblings, 1 reply; 33+ messages in thread
From: Stefan Hajnoczi @ 2012-08-08 14:55 UTC (permalink / raw)
  To: Benoît Canet
  Cc: kwolf, stefanha, qemu-devel, blauwirbel, anthony, pbonzini,
	eblake, afaerber, Benoît Canet

On Tue, Aug 7, 2012 at 2:44 PM, Benoît Canet <benoit.canet@gmail.com> wrote:
> This patchset create a block driver implementing a quorum using three qemu disk
> images. Writes are mirrored on the three files.
> For the reading part the three files are read at the same time and a vote is
> done to determine which is the majority qiov version. It then return this
> majority version to the upper layers.
> When three differents versions of the data are returned by the lower layer the
> quorum is broken and the read return -EIO.

If you make the quorum setting configurable, then this can replace
blkverify.  n=2 is blkverify, n=3 is your current patch series, n/m is
also possible where n=number of mirrors and m=threshold needed to
achieve quorum.

Stefan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [RFC V2 03/10] quorum: Add quorum_open().
  2012-08-07 13:44 ` [Qemu-devel] [RFC V2 03/10] quorum: Add quorum_open() Benoît Canet
  2012-08-07 20:30   ` Blue Swirl
@ 2012-08-08 15:04   ` Stefan Hajnoczi
  1 sibling, 0 replies; 33+ messages in thread
From: Stefan Hajnoczi @ 2012-08-08 15:04 UTC (permalink / raw)
  To: Benoît Canet
  Cc: kwolf, stefanha, qemu-devel, blauwirbel, anthony, pbonzini,
	eblake, afaerber, Benoît Canet

On Tue, Aug 7, 2012 at 2:44 PM, Benoît Canet <benoit.canet@gmail.com> wrote:
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
>  block/quorum.c |   62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 62 insertions(+)
>
> diff --git a/block/quorum.c b/block/quorum.c
> index e0405b6..de58ab8 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -47,11 +47,73 @@ struct QuorumAIOCB {
>      int vote_ret;
>  };
>
> +/* Valid quorum filenames look like
> + * quorum:path/to/a_image:path/to/b_image:path/to/c_image
> + */
> +static int quorum_open(BlockDriverState *bs, const char *filename, int flags)
> +{
> +    BDRVQuorumState *s = bs->opaque;
> +    int ret, i;
> +    char *a, *b, *c, *filenames[3];
> +
> +    /* Parse the quorum: prefix */
> +    if (strncmp(filename, "quorum:", strlen("quorum:"))) {
> +        return -EINVAL;
> +    }
> +    a = g_strdup(filename + strlen("quorum:"));
> +
> +    /* Find separators */
> +    b = strchr(a, ':');
> +    if (b == NULL) {
> +        return -EINVAL;

Leaks a.  Same for returns below.

>  static BlockDriver bdrv_quorum = {
>      .format_name        = "quorum",
>      .protocol_name      = "quorum",
>
>      .instance_size      = sizeof(BDRVQuorumState),
> +
> +    .bdrv_file_open     = quorum_open,

Please squash quorum_close() into this commit so the g_strdup()
filesnames[] aren't leaked on close.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [RFC V2 04/10] quorum: Add quorum_close().
  2012-08-07 20:30   ` Blue Swirl
@ 2012-08-08 15:06     ` Stefan Hajnoczi
  0 siblings, 0 replies; 33+ messages in thread
From: Stefan Hajnoczi @ 2012-08-08 15:06 UTC (permalink / raw)
  To: Blue Swirl
  Cc: kwolf, Benoît Canet, stefanha, qemu-devel, anthony,
	pbonzini, eblake, afaerber, Benoît Canet

On Tue, Aug 7, 2012 at 9:30 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
> On Tue, Aug 7, 2012 at 1:44 PM, Benoît Canet <benoit.canet@gmail.com> wrote:
>> +static void quorum_close(BlockDriverState *bs)
>> +{
>> +    BDRVQuorumState *s = bs->opaque;
>> +    int i;
>> +
>> +    /* Ensure writes reach stable storage */
>> +    for (i = 0; i <= 2; i++) {
>> +        bdrv_flush(s->bs[i]);
>
> bdrv_close()

bdrv_delete() instead of close since we did bdrv_new() to allocate them.

We also need to g_free() the filenames.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [RFC V2 05/10] quorum: Add quorum_getlength().
  2012-08-07 13:44 ` [Qemu-devel] [RFC V2 05/10] quorum: Add quorum_getlength() Benoît Canet
@ 2012-08-08 15:30   ` Stefan Hajnoczi
  2012-08-09  9:07     ` Benoît Canet
  0 siblings, 1 reply; 33+ messages in thread
From: Stefan Hajnoczi @ 2012-08-08 15:30 UTC (permalink / raw)
  To: Benoît Canet
  Cc: kwolf, stefanha, qemu-devel, blauwirbel, anthony, pbonzini,
	eblake, afaerber, Benoît Canet

On Tue, Aug 7, 2012 at 2:44 PM, Benoît Canet <benoit.canet@gmail.com> wrote:
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
>  block/quorum.c |   17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>
> diff --git a/block/quorum.c b/block/quorum.c
> index 9da0432..5cd7083 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -118,12 +118,29 @@ static void quorum_close(BlockDriverState *bs)
>      }
>  }
>
> +static int64_t quorum_getlength(BlockDriverState *bs)
> +{
> +    BDRVQuorumState *s = bs->opaque;
> +    int i;
> +    int64_t ret;
> +
> +    /* return the length of the first available quorum file */
> +    for (i = 0, ret = bdrv_getlength(s->bs[i]);
> +         ret == -ENOMEDIUM && i <= 2;
> +         i++, ret = bdrv_getlength(s->bs[i])) {
> +    }

Why is -ENOMEDIUM an expected error value?

IMO a for loop with a body or a do while loop would make this easier to read.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [RFC V2 06/10] quorum: Add quorum_aio_writev and its dependencies.
  2012-08-07 13:44 ` [Qemu-devel] [RFC V2 06/10] quorum: Add quorum_aio_writev and its dependencies Benoît Canet
@ 2012-08-08 15:37   ` Stefan Hajnoczi
  2012-08-09  9:24     ` Benoît Canet
  0 siblings, 1 reply; 33+ messages in thread
From: Stefan Hajnoczi @ 2012-08-08 15:37 UTC (permalink / raw)
  To: Benoît Canet
  Cc: kwolf, stefanha, qemu-devel, blauwirbel, anthony, pbonzini,
	eblake, afaerber, Benoît Canet

On Tue, Aug 7, 2012 at 2:44 PM, Benoît Canet <benoit.canet@gmail.com> wrote:
> +static int quorum_check_ret(QuorumAIOCB *acb)
> +{
> +    int i, j;
> +
> +    for (i = 0, j = 0; i <= 2; i++) {
> +        if (acb->aios[0].ret) {
> +            j++;
> +        }
> +    }
> +
> +    if (j > 1) {
> +        return -EIO;
> +    }
> +
> +    return 0;
> +}

Simpler version just scans the return values (also I think
acb->aios[0].ret should be acb->aios[i].ret):

static int quorum_check_ret(QuorumAIOCB *acb)
{
    int i;
    for (i = 0; i <= 2; i++) {
        if (acb->aios[i].ret) {
            return -EIO; /* or acb->aios[i].ret */
        }
    }
    return 0;
}

> +
> +static void quorum_aio_bh(void *opaque)
> +{
> +    QuorumAIOCB *acb = opaque;
> +
> +    qemu_bh_delete(acb->bh);
> +    acb->common.cb(acb->common.opaque, quorum_check_ret(acb));
> +    if (acb->finished) {
> +        *acb->finished = true;
> +    }
> +    qemu_aio_release(acb);
> +}
> +
> +static QuorumAIOCB *quorum_aio_get(BlockDriverState *bs,
> +                                 QEMUIOVector *qiov,
> +                                 int64_t sector_num,
> +                                 int nb_sectors,
> +                                 BlockDriverCompletionFunc *cb,
> +                                 void *opaque)
> +{
> +    QuorumAIOCB *acb = qemu_aio_get(&quorum_aio_pool, bs, cb, opaque);
> +    int i;
> +
> +    acb->qiov = qiov;
> +    acb->bh = NULL;
> +    acb->count = 0;
> +    acb->sector_num = sector_num;
> +    acb->nb_sectors = nb_sectors;
> +    acb->vote = NULL;
> +    acb->vote_ret = 0;

acb->finished = NULL;

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [RFC V2 07/10] blkverify: Make blkverify_iovec_clone() and blkverify_iovec_compare() public
  2012-08-07 13:44 ` [Qemu-devel] [RFC V2 07/10] blkverify: Make blkverify_iovec_clone() and blkverify_iovec_compare() public Benoît Canet
@ 2012-08-08 15:38   ` Stefan Hajnoczi
  0 siblings, 0 replies; 33+ messages in thread
From: Stefan Hajnoczi @ 2012-08-08 15:38 UTC (permalink / raw)
  To: Benoît Canet
  Cc: kwolf, stefanha, qemu-devel, blauwirbel, anthony, pbonzini,
	eblake, afaerber, Benoît Canet

On Tue, Aug 7, 2012 at 2:44 PM, Benoît Canet <benoit.canet@gmail.com> wrote:
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
>  block/blkverify.c |    8 ++++++--
>  block/quorum.c    |    4 ++++
>  2 files changed, 10 insertions(+), 2 deletions(-)

Perhaps these should be in cutils.c with the other iovec functions.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [RFC V2 09/10] quorum: Add quorum_aio_readv.
  2012-08-07 13:44 ` [Qemu-devel] [RFC V2 09/10] quorum: Add quorum_aio_readv Benoît Canet
@ 2012-08-08 15:44   ` Stefan Hajnoczi
  2012-08-08 15:46     ` Stefan Hajnoczi
  0 siblings, 1 reply; 33+ messages in thread
From: Stefan Hajnoczi @ 2012-08-08 15:44 UTC (permalink / raw)
  To: Benoît Canet
  Cc: kwolf, stefanha, qemu-devel, blauwirbel, anthony, pbonzini,
	eblake, afaerber, Benoît Canet

On Tue, Aug 7, 2012 at 2:44 PM, Benoît Canet <benoit.canet@gmail.com> wrote:
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
>  block/quorum.c |   35 +++++++++++++++++++++++++++++++++++
>  1 file changed, 35 insertions(+)
>
> diff --git a/block/quorum.c b/block/quorum.c
> index 2df3ae6..13804c1 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -174,6 +174,14 @@ static int quorum_check_ret(QuorumAIOCB *acb)
>  static void quorum_aio_bh(void *opaque)
>  {
>      QuorumAIOCB *acb = opaque;
> +    int i;
> +
> +    for (i = 0; i <= 2; i++) {
> +        if (acb->aios[i].buf) {
> +            g_free(acb->aios[i].buf);
> +            acb->aios[i].buf = NULL;
> +        }

qemu_blockalign() buffers must be freed using qemu_vfree().  Also
qemu_vfree(NULL) is a nop so there no need for if (acb->aios[i].buf),
it can be done unconditionally.

> +    }
>
>      qemu_bh_delete(acb->bh);
>      acb->common.cb(acb->common.opaque, quorum_check_ret(acb));
> @@ -224,6 +232,32 @@ static void quorum_aio_cb(void *opaque, int ret)
>      }
>  }
>
> +static BlockDriverAIOCB *quorum_aio_readv(BlockDriverState *bs,
> +                                         int64_t sector_num,
> +                                         QEMUIOVector *qiov,
> +                                         int nb_sectors,
> +                                         BlockDriverCompletionFunc *cb,
> +                                         void *opaque)
> +{
> +    BDRVQuorumState *s = bs->opaque;
> +    QuorumAIOCB *acb = quorum_aio_get(bs, qiov, sector_num,
> +                                      nb_sectors, cb, opaque);
> +    int i;
> +
> +    for (i = 0; i <= 2; i++) {
> +        acb->aios[i].buf = qemu_blockalign(bs->file, qiov->size);
> +        qemu_iovec_init(&acb->qiovs[i], qiov->niov);
> +        blkverify_iovec_clone(&acb->qiovs[i], qiov, acb->aios[i].buf);
> +    }
> +
> +    for (i = 0; i <= 2; i++) {
> +        bdrv_aio_readv(s->bs[i], sector_num, qiov, nb_sectors,
> +                       quorum_aio_cb, &acb->aios[i]);

acb->qiovs[i] instead of qiov.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [RFC V2 09/10] quorum: Add quorum_aio_readv.
  2012-08-08 15:44   ` Stefan Hajnoczi
@ 2012-08-08 15:46     ` Stefan Hajnoczi
  0 siblings, 0 replies; 33+ messages in thread
From: Stefan Hajnoczi @ 2012-08-08 15:46 UTC (permalink / raw)
  To: Benoît Canet
  Cc: kwolf, stefanha, qemu-devel, blauwirbel, anthony, pbonzini,
	eblake, afaerber, Benoît Canet

On Wed, Aug 8, 2012 at 4:44 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Tue, Aug 7, 2012 at 2:44 PM, Benoît Canet <benoit.canet@gmail.com> wrote:
>> @@ -224,6 +232,32 @@ static void quorum_aio_cb(void *opaque, int ret)
>>      }
>>  }
>>
>> +static BlockDriverAIOCB *quorum_aio_readv(BlockDriverState *bs,
>> +                                         int64_t sector_num,
>> +                                         QEMUIOVector *qiov,
>> +                                         int nb_sectors,
>> +                                         BlockDriverCompletionFunc *cb,
>> +                                         void *opaque)
>> +{
>> +    BDRVQuorumState *s = bs->opaque;
>> +    QuorumAIOCB *acb = quorum_aio_get(bs, qiov, sector_num,
>> +                                      nb_sectors, cb, opaque);
>> +    int i;
>> +
>> +    for (i = 0; i <= 2; i++) {
>> +        acb->aios[i].buf = qemu_blockalign(bs->file, qiov->size);
>> +        qemu_iovec_init(&acb->qiovs[i], qiov->niov);
>> +        blkverify_iovec_clone(&acb->qiovs[i], qiov, acb->aios[i].buf);
>> +    }
>> +
>> +    for (i = 0; i <= 2; i++) {
>> +        bdrv_aio_readv(s->bs[i], sector_num, qiov, nb_sectors,
>> +                       quorum_aio_cb, &acb->aios[i]);
>
> acb->qiovs[i] instead of qiov.

Ah, I now see this was intentional because voting hasn't been added yet.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [RFC V2 10/10] quorum: Add quorum mechanism.
  2012-08-07 13:44 ` [Qemu-devel] [RFC V2 10/10] quorum: Add quorum mechanism Benoît Canet
@ 2012-08-08 15:54   ` Stefan Hajnoczi
  0 siblings, 0 replies; 33+ messages in thread
From: Stefan Hajnoczi @ 2012-08-08 15:54 UTC (permalink / raw)
  To: Benoît Canet
  Cc: kwolf, stefanha, qemu-devel, blauwirbel, anthony, pbonzini,
	eblake, afaerber, Benoît Canet

On Tue, Aug 7, 2012 at 2:44 PM, Benoît Canet <benoit.canet@gmail.com> wrote:
> +static void quorum_copy_qiov(QEMUIOVector *dest, QEMUIOVector *source)
> +{
> +    int i;
> +    for (i = 0; i < source->niov; i++) {
> +        memcpy(dest->iov[i].iov_base,
> +               source->iov[i].iov_base,
> +               source->iov[i].iov_len);
> +        dest->iov[i].iov_len = source->iov[i].iov_len;
> +    }
> +    dest->niov = source->niov;
> +    dest->nalloc = source->nalloc;
> +    dest->size = source->size;

dest and source must be compatible.  Their element lengths must be identical.

Therefore I suggest dropping the assignments and replacing them with
assert(3) calls that remind us that we know they are compatible.

> +}
> +
> +static void quorum_vote(QuorumAIOCB *acb)
> +{
> +    ssize_t a_b, b_c, a_c;
> +    a_b = blkverify_iovec_compare(&acb->qiovs[0], &acb->qiovs[1]);
> +    b_c = blkverify_iovec_compare(&acb->qiovs[1], &acb->qiovs[2]);
> +
> +    /* Three vector identical -> quorum */
> +    if (a_b == b_c && a_b == -1) {
> +        quorum_copy_qiov(acb->qiov, &acb->qiovs[0]); /*clone a */
> +        return;
> +    }
> +    if (a_b == -1) {
> +        quorum_print_bad(acb, "C");
> +        quorum_copy_qiov(acb->qiov, &acb->qiovs[0]); /*clone a */
> +        return;
> +    }
> +    if (b_c == -1) {
> +        quorum_print_bad(acb, "A");
> +        quorum_copy_qiov(acb->qiov, &acb->qiovs[1]); /*clone b */
> +        return;
> +    }
> +    a_c = blkverify_iovec_compare(&acb->qiovs[0], &acb->qiovs[2]);
> +    if (a_c == -1) {
> +        quorum_print_bad(acb, "B");
> +        quorum_copy_qiov(acb->qiov, &acb->qiovs[0]); /*clone a */
> +        return;
> +    }
> +    quorum_print_failure(acb);
> +    acb->vote_ret = -EIO;
>  }

In the common case comparison will succeed so we could use acb->qiov
as acb->qiovs[0] (a's qiov).  In that case we wouldn't need to copy
the data.  If you feel this will complicate things you could leave a
comment so someone can add it in the future, if necessary.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [RFC V2 05/10] quorum: Add quorum_getlength().
  2012-08-08 15:30   ` Stefan Hajnoczi
@ 2012-08-09  9:07     ` Benoît Canet
  2012-08-09  9:14       ` Stefan Hajnoczi
  0 siblings, 1 reply; 33+ messages in thread
From: Benoît Canet @ 2012-08-09  9:07 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: kwolf, Benoît Canet, stefanha, qemu-devel, blauwirbel,
	anthony, pbonzini, eblake, afaerber

> > +static int64_t quorum_getlength(BlockDriverState *bs)
> > +{
> > +    BDRVQuorumState *s = bs->opaque;
> > +    int i;
> > +    int64_t ret;
> > +
> > +    /* return the length of the first available quorum file */
> > +    for (i = 0, ret = bdrv_getlength(s->bs[i]);
> > +         ret == -ENOMEDIUM && i <= 2;
> > +         i++, ret = bdrv_getlength(s->bs[i])) {
> > +    }
> 
> Why is -ENOMEDIUM an expected error value?

I put the -ENOMEDIUM test because of the following piece of code.
/**                                                                                                                                    
 * Length of a file in bytes. Return < 0 if error or unknown.                                                                          
 */                                                                                                                                    
int64_t bdrv_getlength(BlockDriverState *bs)                                                                                           
{                                                                                                                                      
    BlockDriver *drv = bs->drv;                                                                                                        
    if (!drv)                                                                                                                          
        return -ENOMEDIUM;   

Still I am not sure it's needed. What is your stance on this ?

> 
> IMO a for loop with a body or a do while loop would make this easier to read.
> 
Ok

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [RFC V2 05/10] quorum: Add quorum_getlength().
  2012-08-09  9:07     ` Benoît Canet
@ 2012-08-09  9:14       ` Stefan Hajnoczi
  0 siblings, 0 replies; 33+ messages in thread
From: Stefan Hajnoczi @ 2012-08-09  9:14 UTC (permalink / raw)
  To: Benoît Canet
  Cc: kwolf, Benoît Canet, stefanha, qemu-devel, blauwirbel,
	anthony, pbonzini, eblake, afaerber

On Thu, Aug 9, 2012 at 10:07 AM, Benoît Canet <benoit.canet@irqsave.net> wrote:
>> > +static int64_t quorum_getlength(BlockDriverState *bs)
>> > +{
>> > +    BDRVQuorumState *s = bs->opaque;
>> > +    int i;
>> > +    int64_t ret;
>> > +
>> > +    /* return the length of the first available quorum file */
>> > +    for (i = 0, ret = bdrv_getlength(s->bs[i]);
>> > +         ret == -ENOMEDIUM && i <= 2;
>> > +         i++, ret = bdrv_getlength(s->bs[i])) {
>> > +    }
>>
>> Why is -ENOMEDIUM an expected error value?
>
> I put the -ENOMEDIUM test because of the following piece of code.
> /**
>  * Length of a file in bytes. Return < 0 if error or unknown.
>  */
> int64_t bdrv_getlength(BlockDriverState *bs)
> {
>     BlockDriver *drv = bs->drv;
>     if (!drv)
>         return -ENOMEDIUM;
>
> Still I am not sure it's needed. What is your stance on this ?

A BlockDriverState has more than just block filter or image format
state, it also has some guest-visible state unfortunately.  The
BlockDriverState attached to the guest's storage controller (e.g. IDE
CD-ROM) can be closed by blockdev.c:eject_device() and left as an
empty BlockDriverState with ->drv == NULL.

I think we don't need to worry about this in a block filter like
quorum because all child BlockDriverStates will not be ejected.

Stefan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [RFC V2 06/10] quorum: Add quorum_aio_writev and its dependencies.
  2012-08-08 15:37   ` Stefan Hajnoczi
@ 2012-08-09  9:24     ` Benoît Canet
  2012-08-09  9:35       ` Stefan Hajnoczi
  0 siblings, 1 reply; 33+ messages in thread
From: Benoît Canet @ 2012-08-09  9:24 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: kwolf, Benoît Canet, stefanha, qemu-devel, blauwirbel,
	anthony, pbonzini, eblake, afaerber

Le Wednesday 08 Aug 2012 à 16:37:13 (+0100), Stefan Hajnoczi a écrit :
> On Tue, Aug 7, 2012 at 2:44 PM, Benoît Canet <benoit.canet@gmail.com> wrote:
> > +static int quorum_check_ret(QuorumAIOCB *acb)
> > +{
> > +    int i, j;
> > +
> > +    for (i = 0, j = 0; i <= 2; i++) {
> > +        if (acb->aios[0].ret) {
> > +            j++;
> > +        }
> > +    }
> > +
> > +    if (j > 1) {
> > +        return -EIO;
> > +    }
> > +
> > +    return 0;
> > +}
> 
> Simpler version just scans the return values (also I think
> acb->aios[0].ret should be acb->aios[i].ret):
> 
> static int quorum_check_ret(QuorumAIOCB *acb)
> {
>     int i;
>     for (i = 0; i <= 2; i++) {
>         if (acb->aios[i].ret) {
>             return -EIO; /* or acb->aios[i].ret */
>         }
>     }
>     return 0;
> }

I am wondering what is the best code to return.
There is some potential case like a filer containing a particular
image (or a image on the network) going down where the user probably
don't want to get an -EIO.

The 
if (j > 1) {
    return -EIO;
}
part was about detecting an error count greater dans the threshold (2).

> 
> > +
> > +static void quorum_aio_bh(void *opaque)
> > +{
> > +    QuorumAIOCB *acb = opaque;
> > +
> > +    qemu_bh_delete(acb->bh);
> > +    acb->common.cb(acb->common.opaque, quorum_check_ret(acb));
> > +    if (acb->finished) {
> > +        *acb->finished = true;
> > +    }
> > +    qemu_aio_release(acb);
> > +}
> > +
> > +static QuorumAIOCB *quorum_aio_get(BlockDriverState *bs,
> > +                                 QEMUIOVector *qiov,
> > +                                 int64_t sector_num,
> > +                                 int nb_sectors,
> > +                                 BlockDriverCompletionFunc *cb,
> > +                                 void *opaque)
> > +{
> > +    QuorumAIOCB *acb = qemu_aio_get(&quorum_aio_pool, bs, cb, opaque);
> > +    int i;
> > +
> > +    acb->qiov = qiov;
> > +    acb->bh = NULL;
> > +    acb->count = 0;
> > +    acb->sector_num = sector_num;
> > +    acb->nb_sectors = nb_sectors;
> > +    acb->vote = NULL;
> > +    acb->vote_ret = 0;
> 
> acb->finished = NULL;
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [RFC V2 06/10] quorum: Add quorum_aio_writev and its dependencies.
  2012-08-09  9:24     ` Benoît Canet
@ 2012-08-09  9:35       ` Stefan Hajnoczi
  0 siblings, 0 replies; 33+ messages in thread
From: Stefan Hajnoczi @ 2012-08-09  9:35 UTC (permalink / raw)
  To: Benoît Canet
  Cc: kwolf, Benoît Canet, stefanha, qemu-devel, blauwirbel,
	anthony, pbonzini, eblake, afaerber

On Thu, Aug 9, 2012 at 10:24 AM, Benoît Canet <benoit.canet@irqsave.net> wrote:
> Le Wednesday 08 Aug 2012 ŕ 16:37:13 (+0100), Stefan Hajnoczi a écrit :
>> On Tue, Aug 7, 2012 at 2:44 PM, Benoît Canet <benoit.canet@gmail.com> wrote:
>> > +static int quorum_check_ret(QuorumAIOCB *acb)
>> > +{
>> > +    int i, j;
>> > +
>> > +    for (i = 0, j = 0; i <= 2; i++) {
>> > +        if (acb->aios[0].ret) {
>> > +            j++;
>> > +        }
>> > +    }
>> > +
>> > +    if (j > 1) {
>> > +        return -EIO;
>> > +    }
>> > +
>> > +    return 0;
>> > +}
>>
>> Simpler version just scans the return values (also I think
>> acb->aios[0].ret should be acb->aios[i].ret):
>>
>> static int quorum_check_ret(QuorumAIOCB *acb)
>> {
>>     int i;
>>     for (i = 0; i <= 2; i++) {
>>         if (acb->aios[i].ret) {
>>             return -EIO; /* or acb->aios[i].ret */
>>         }
>>     }
>>     return 0;
>> }
>
> I am wondering what is the best code to return.
> There is some potential case like a filer containing a particular
> image (or a image on the network) going down where the user probably
> don't want to get an -EIO.
>
> The
> if (j > 1) {
>     return -EIO;
> }
> part was about detecting an error count greater dans the threshold (2).

Yeah, this starts to get into policy and depends on the user.  The
best we can do is to make it configurable and choose a sane default.

Stefan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH 00/10] Quorum disk image corruption resiliency
  2012-08-08 14:55 ` [Qemu-devel] [PATCH 00/10] Quorum disk image corruption resiliency Stefan Hajnoczi
@ 2012-08-10 17:45   ` Benoît Canet
  0 siblings, 0 replies; 33+ messages in thread
From: Benoît Canet @ 2012-08-10 17:45 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: kwolf, Benoît Canet, stefanha, qemu-devel, blauwirbel,
	anthony, pbonzini, eblake, afaerber

Le Wednesday 08 Aug 2012 à 15:55:58 (+0100), Stefan Hajnoczi a écrit :
> On Tue, Aug 7, 2012 at 2:44 PM, Benoît Canet <benoit.canet@gmail.com> wrote:
> > This patchset create a block driver implementing a quorum using three qemu disk
> > images. Writes are mirrored on the three files.
> > For the reading part the three files are read at the same time and a vote is
> > done to determine which is the majority qiov version. It then return this
> > majority version to the upper layers.
> > When three differents versions of the data are returned by the lower layer the
> > quorum is broken and the read return -EIO.
> 
> If you make the quorum setting configurable, then this can replace
> blkverify.  n=2 is blkverify, n=3 is your current patch series, n/m is
> also possible where n=number of mirrors and m=threshold needed to
> achieve quorum.

I am working on making the quorum settings configurable.
I will post the new patchset next week.

Benoît

> 
> Stefan
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [RFC V2 03/10] quorum: Add quorum_open().
  2012-08-07 20:30   ` Blue Swirl
  2012-08-07 20:40     ` Eric Blake
@ 2012-08-10 17:48     ` Benoît Canet
  2012-08-13  7:41       ` Kevin Wolf
  1 sibling, 1 reply; 33+ messages in thread
From: Benoît Canet @ 2012-08-10 17:48 UTC (permalink / raw)
  To: Blue Swirl
  Cc: kwolf, Benoît Canet, stefanha, qemu-devel, anthony,
	pbonzini, eblake, afaerber

Le Tuesday 07 Aug 2012 à 20:30:09 (+0000), Blue Swirl a écrit :
> On Tue, Aug 7, 2012 at 1:44 PM, Benoît Canet <benoit.canet@gmail.com> wrote:
> > Signed-off-by: Benoit Canet <benoit@irqsave.net>
> > ---
> >  block/quorum.c |   62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 62 insertions(+)
> >
> > diff --git a/block/quorum.c b/block/quorum.c
> > index e0405b6..de58ab8 100644
> > --- a/block/quorum.c
> > +++ b/block/quorum.c
> > @@ -47,11 +47,73 @@ struct QuorumAIOCB {
> >      int vote_ret;
> >  };
> >
> > +/* Valid quorum filenames look like
> > + * quorum:path/to/a_image:path/to/b_image:path/to/c_image
> 
> This syntax would mean that stacking for example curl or other network
> paths would not be possible. How about comma as separator?

I just tried comma but it fail because the qemu command line parsing
catch it believing that the string next to the coma is another "file="
like options.

Is there any other separator we can use ?

Benoît

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [RFC V2 03/10] quorum: Add quorum_open().
  2012-08-10 17:48     ` Benoît Canet
@ 2012-08-13  7:41       ` Kevin Wolf
  2012-08-14 16:56         ` Benoît Canet
  0 siblings, 1 reply; 33+ messages in thread
From: Kevin Wolf @ 2012-08-13  7:41 UTC (permalink / raw)
  To: Benoît Canet
  Cc: Benoît Canet, stefanha, qemu-devel, Blue Swirl, anthony,
	pbonzini, eblake, afaerber

Am 10.08.2012 19:48, schrieb Benoît Canet:
> Le Tuesday 07 Aug 2012 à 20:30:09 (+0000), Blue Swirl a écrit :
>> On Tue, Aug 7, 2012 at 1:44 PM, Benoît Canet <benoit.canet@gmail.com> wrote:
>>> Signed-off-by: Benoit Canet <benoit@irqsave.net>
>>> ---
>>>  block/quorum.c |   62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 62 insertions(+)
>>>
>>> diff --git a/block/quorum.c b/block/quorum.c
>>> index e0405b6..de58ab8 100644
>>> --- a/block/quorum.c
>>> +++ b/block/quorum.c
>>> @@ -47,11 +47,73 @@ struct QuorumAIOCB {
>>>      int vote_ret;
>>>  };
>>>
>>> +/* Valid quorum filenames look like
>>> + * quorum:path/to/a_image:path/to/b_image:path/to/c_image
>>
>> This syntax would mean that stacking for example curl or other network
>> paths would not be possible. How about comma as separator?
> 
> I just tried comma but it fail because the qemu command line parsing
> catch it believing that the string next to the coma is another "file="
> like options.
> 
> Is there any other separator we can use ?

I would ignore that for now, or you can introduce your own escaping of
colons. The real solution is, once again, -blockdev.

Kevin

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [RFC V2 03/10] quorum: Add quorum_open().
  2012-08-13  7:41       ` Kevin Wolf
@ 2012-08-14 16:56         ` Benoît Canet
  2012-08-15 10:40           ` Kevin Wolf
  0 siblings, 1 reply; 33+ messages in thread
From: Benoît Canet @ 2012-08-14 16:56 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Benoît Canet, Benoît Canet, stefanha, qemu-devel,
	Blue Swirl, anthony, pbonzini, eblake, afaerber

Le Monday 13 Aug 2012 à 09:41:00 (+0200), Kevin Wolf a écrit :
> Am 10.08.2012 19:48, schrieb Benoît Canet:
> > Le Tuesday 07 Aug 2012 à 20:30:09 (+0000), Blue Swirl a écrit :
> >> On Tue, Aug 7, 2012 at 1:44 PM, Benoît Canet <benoit.canet@gmail.com> wrote:
> >>> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> >>> ---
> >>>  block/quorum.c |   62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>  1 file changed, 62 insertions(+)
> >>>
> >>> diff --git a/block/quorum.c b/block/quorum.c
> >>> index e0405b6..de58ab8 100644
> >>> --- a/block/quorum.c
> >>> +++ b/block/quorum.c
> >>> @@ -47,11 +47,73 @@ struct QuorumAIOCB {
> >>>      int vote_ret;
> >>>  };
> >>>
> >>> +/* Valid quorum filenames look like
> >>> + * quorum:path/to/a_image:path/to/b_image:path/to/c_image
> >>
> >> This syntax would mean that stacking for example curl or other network
> >> paths would not be possible. How about comma as separator?
> > 
> > I just tried comma but it fail because the qemu command line parsing
> > catch it believing that the string next to the coma is another "file="
> > like options.
> > 
> > Is there any other separator we can use ?
> 
> I would ignore that for now, or you can introduce your own escaping of
> colons. The real solution is, once again, -blockdev.

Is -blockdev related to BlockBackend ?

Benoît

> 
> Kevin
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [RFC V2 03/10] quorum: Add quorum_open().
  2012-08-14 16:56         ` Benoît Canet
@ 2012-08-15 10:40           ` Kevin Wolf
  0 siblings, 0 replies; 33+ messages in thread
From: Kevin Wolf @ 2012-08-15 10:40 UTC (permalink / raw)
  To: Benoît Canet
  Cc: Benoît Canet, stefanha, qemu-devel, Blue Swirl, anthony,
	pbonzini, eblake, afaerber

Am 14.08.2012 18:56, schrieb Benoît Canet:
> Le Monday 13 Aug 2012 à 09:41:00 (+0200), Kevin Wolf a écrit :
>> Am 10.08.2012 19:48, schrieb Benoît Canet:
>>> Le Tuesday 07 Aug 2012 à 20:30:09 (+0000), Blue Swirl a écrit :
>>>> On Tue, Aug 7, 2012 at 1:44 PM, Benoît Canet <benoit.canet@gmail.com> wrote:
>>>>> Signed-off-by: Benoit Canet <benoit@irqsave.net>
>>>>> ---
>>>>>  block/quorum.c |   62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>  1 file changed, 62 insertions(+)
>>>>>
>>>>> diff --git a/block/quorum.c b/block/quorum.c
>>>>> index e0405b6..de58ab8 100644
>>>>> --- a/block/quorum.c
>>>>> +++ b/block/quorum.c
>>>>> @@ -47,11 +47,73 @@ struct QuorumAIOCB {
>>>>>      int vote_ret;
>>>>>  };
>>>>>
>>>>> +/* Valid quorum filenames look like
>>>>> + * quorum:path/to/a_image:path/to/b_image:path/to/c_image
>>>>
>>>> This syntax would mean that stacking for example curl or other network
>>>> paths would not be possible. How about comma as separator?
>>>
>>> I just tried comma but it fail because the qemu command line parsing
>>> catch it believing that the string next to the coma is another "file="
>>> like options.
>>>
>>> Is there any other separator we can use ?
>>
>> I would ignore that for now, or you can introduce your own escaping of
>> colons. The real solution is, once again, -blockdev.
> 
> Is -blockdev related to BlockBackend ?

BlockBackend is a prerequisite for it.

One of the goals with -blockdev is to have not only a filename for
bdrv_open from which you can parse options, but that you can get
structured and driver-specific options.

Kevin

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2012-08-15 10:49 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-07 13:44 [Qemu-devel] [PATCH 00/10] Quorum disk image corruption resiliency Benoît Canet
2012-08-07 13:44 ` [Qemu-devel] [RFC V2 01/10] quorum: Create quorum.c, add QuorumSingleAIOCB and QuorumAIOCB Benoît Canet
2012-08-07 20:24   ` Blue Swirl
2012-08-07 13:44 ` [Qemu-devel] [RFC V2 02/10] quorum: Create BDRVQuorumState and BlkDriver and do init Benoît Canet
2012-08-07 13:44 ` [Qemu-devel] [RFC V2 03/10] quorum: Add quorum_open() Benoît Canet
2012-08-07 20:30   ` Blue Swirl
2012-08-07 20:40     ` Eric Blake
2012-08-10 17:48     ` Benoît Canet
2012-08-13  7:41       ` Kevin Wolf
2012-08-14 16:56         ` Benoît Canet
2012-08-15 10:40           ` Kevin Wolf
2012-08-08 15:04   ` Stefan Hajnoczi
2012-08-07 13:44 ` [Qemu-devel] [RFC V2 04/10] quorum: Add quorum_close() Benoît Canet
2012-08-07 20:30   ` Blue Swirl
2012-08-08 15:06     ` Stefan Hajnoczi
2012-08-07 13:44 ` [Qemu-devel] [RFC V2 05/10] quorum: Add quorum_getlength() Benoît Canet
2012-08-08 15:30   ` Stefan Hajnoczi
2012-08-09  9:07     ` Benoît Canet
2012-08-09  9:14       ` Stefan Hajnoczi
2012-08-07 13:44 ` [Qemu-devel] [RFC V2 06/10] quorum: Add quorum_aio_writev and its dependencies Benoît Canet
2012-08-08 15:37   ` Stefan Hajnoczi
2012-08-09  9:24     ` Benoît Canet
2012-08-09  9:35       ` Stefan Hajnoczi
2012-08-07 13:44 ` [Qemu-devel] [RFC V2 07/10] blkverify: Make blkverify_iovec_clone() and blkverify_iovec_compare() public Benoît Canet
2012-08-08 15:38   ` Stefan Hajnoczi
2012-08-07 13:44 ` [Qemu-devel] [RFC V2 08/10] quorum: Add quorum_co_flush() Benoît Canet
2012-08-07 13:44 ` [Qemu-devel] [RFC V2 09/10] quorum: Add quorum_aio_readv Benoît Canet
2012-08-08 15:44   ` Stefan Hajnoczi
2012-08-08 15:46     ` Stefan Hajnoczi
2012-08-07 13:44 ` [Qemu-devel] [RFC V2 10/10] quorum: Add quorum mechanism Benoît Canet
2012-08-08 15:54   ` Stefan Hajnoczi
2012-08-08 14:55 ` [Qemu-devel] [PATCH 00/10] Quorum disk image corruption resiliency Stefan Hajnoczi
2012-08-10 17:45   ` Benoît Canet

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.