All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH V10 00/13] Quorum block driver
@ 2014-01-28 16:52 Benoît Canet
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 01/13] quorum: Create quorum.c, add QuorumSingleAIOCB and QuorumAIOCB Benoît Canet
                   ` (13 more replies)
  0 siblings, 14 replies; 36+ messages in thread
From: Benoît Canet @ 2014-01-28 16:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha, mreitz

Here is the new version of the QUORUM block driver.
It now use bdrv_image_open for using QMP references and support snapshotting
via the bs node-name infrastructure.

I think the series is feature complete.

It applies on top of Max bdrv_openv2 branch.

in v10
    reference support thanks with Kevin help [Benoît]
    snapshot support [Benoît]
    "operation it performs on its children" [Max]
    s/hold/holds/ [Max]
    s/occur/occurs/ [Max]
    Change callback cancelation [Max]
    Make quorum compilation optional in order not to depend on gnutls [Max]

Benoît Canet (13):
  quorum: Create quorum.c, add QuorumSingleAIOCB and QuorumAIOCB.
  quorum: Create BDRVQuorumState and BlkDriver and do init.
  quorum: Add quorum_aio_writev and its dependencies.
  blkverify: Extract qemu_iovec_clone() and qemu_iovec_compare() from
    blkverify.
  quorum: Add quorum_aio_readv.
  quorum: Add quorum mechanism.
  quorum: Add quorum_getlength().
  quorum: Add quorum_invalidate_cache().
  quorum: Add quorum_co_get_block_status.
  quorum: Add quorum_co_flush().
  quorum: Implement recursive .bdrv_recurse_is_first_non_filter in
    quorum.
  quorum: Add quorum_open() and quorum_close().
  quorum: Add unit test.

 block/Makefile.objs        |    1 +
 block/blkverify.c          |  108 +----
 block/quorum.c             | 1036 ++++++++++++++++++++++++++++++++++++++++++++
 configure                  |   36 ++
 docs/qmp/qmp-events.txt    |   33 ++
 include/monitor/monitor.h  |    2 +
 include/qemu-common.h      |    2 +
 monitor.c                  |    2 +
 qapi-schema.json           |   21 +-
 tests/qemu-iotests/075     |   85 ++++
 tests/qemu-iotests/075.out |   27 ++
 tests/qemu-iotests/group   |    1 +
 util/iov.c                 |  103 +++++
 13 files changed, 1350 insertions(+), 107 deletions(-)
 create mode 100644 block/quorum.c
 create mode 100644 tests/qemu-iotests/075
 create mode 100644 tests/qemu-iotests/075.out

--
1.8.3.2

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH V10 01/13] quorum: Create quorum.c, add QuorumSingleAIOCB and QuorumAIOCB.
  2014-01-28 16:52 [Qemu-devel] [PATCH V10 00/13] Quorum block driver Benoît Canet
@ 2014-01-28 16:52 ` Benoît Canet
  2014-01-31 21:10   ` Max Reitz
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 02/13] quorum: Create BDRVQuorumState and BlkDriver and do init Benoît Canet
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Benoît Canet @ 2014-01-28 16:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha, mreitz

From: Benoît Canet <benoit@irqsave.net>

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/Makefile.objs |  1 +
 block/quorum.c      | 54 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+)
 create mode 100644 block/quorum.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 4e8c91e..a2650b9 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -3,6 +3,7 @@ block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-c
 block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
 block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o vhdx-log.o
+block-obj-y += quorum.o
 block-obj-y += parallels.o blkdebug.o blkverify.o
 block-obj-y += snapshot.o qapi.o
 block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
diff --git a/block/quorum.c b/block/quorum.c
new file mode 100644
index 0000000..1695f04
--- /dev/null
+++ b/block/quorum.c
@@ -0,0 +1,54 @@
+/*
+ * Quorum Block filter
+ *
+ * Copyright (C) 2012-2013 Nodalink, SARL.
+ *
+ * Author:
+ *   Benoît Canet <benoit.canet@irqsave.net>
+ *
+ * Based on the design and code of blkverify.c (Copyright (C) 2010 IBM, Corp)
+ * and blkmirror.c (Copyright (C) 2011 Red Hat, Inc).
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "block/block_int.h"
+
+typedef struct QuorumAIOCB QuorumAIOCB;
+
+/* Quorum will create one instance of the following structure per operation it
+ * performs on its children.
+ * So for each read/write operation coming from the upper layer there will be
+ * $children_count QuorumSingleAIOCB.
+ */
+typedef struct QuorumSingleAIOCB {
+    BlockDriverAIOCB *aiocb;
+    QEMUIOVector qiov;
+    uint8_t *buf;
+    int ret;
+    QuorumAIOCB *parent;
+} QuorumSingleAIOCB;
+
+/* Quorum will use the following structure to track progress of each read/write
+ * operation received by the upper layer.
+ * This structure hold pointers to the QuorumSingleAIOCB structures instances
+ * used to do operations on each children and track overall progress.
+ */
+struct QuorumAIOCB {
+    BlockDriverAIOCB common;
+
+    /* Request metadata */
+    uint64_t sector_num;
+    int nb_sectors;
+
+    QEMUIOVector *qiov;         /* calling IOV */
+
+    QuorumSingleAIOCB *aios;    /* individual AIOs */
+    int count;                  /* number of completed AIOCB */
+    int success_count;          /* number of successfully completed AIOCB */
+    bool *finished;             /* completion signal for cancel */
+
+    bool is_read;
+    int vote_ret;
+};
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH V10 02/13] quorum: Create BDRVQuorumState and BlkDriver and do init.
  2014-01-28 16:52 [Qemu-devel] [PATCH V10 00/13] Quorum block driver Benoît Canet
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 01/13] quorum: Create quorum.c, add QuorumSingleAIOCB and QuorumAIOCB Benoît Canet
@ 2014-01-28 16:52 ` Benoît Canet
  2014-01-31 21:11   ` Max Reitz
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 03/13] quorum: Add quorum_aio_writev and its dependencies Benoît Canet
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Benoît Canet @ 2014-01-28 16:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha, mreitz

From: Benoît Canet <benoit@irqsave.net>

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/quorum.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index 1695f04..0242378 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -15,6 +15,16 @@
 
 #include "block/block_int.h"
 
+/* the following structure holds the state of one quorum instance */
+typedef struct {
+    BlockDriverState **bs; /* children BlockDriverStates */
+    int total;             /* children count */
+    int threshold;         /* if less than threshold children reads gave the
+                            * same result a quorum error occurs.
+                            */
+    bool is_blkverify;     /* true if the driver is in blkverify mode */
+} BDRVQuorumState;
+
 typedef struct QuorumAIOCB QuorumAIOCB;
 
 /* Quorum will create one instance of the following structure per operation it
@@ -37,6 +47,7 @@ typedef struct QuorumSingleAIOCB {
  */
 struct QuorumAIOCB {
     BlockDriverAIOCB common;
+    BDRVQuorumState *bqs;
 
     /* Request metadata */
     uint64_t sector_num;
@@ -52,3 +63,17 @@ struct QuorumAIOCB {
     bool is_read;
     int vote_ret;
 };
+
+static BlockDriver bdrv_quorum = {
+    .format_name        = "quorum",
+    .protocol_name      = "quorum",
+
+    .instance_size      = sizeof(BDRVQuorumState),
+};
+
+static void bdrv_quorum_init(void)
+{
+    bdrv_register(&bdrv_quorum);
+}
+
+block_init(bdrv_quorum_init);
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH V10 03/13] quorum: Add quorum_aio_writev and its dependencies.
  2014-01-28 16:52 [Qemu-devel] [PATCH V10 00/13] Quorum block driver Benoît Canet
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 01/13] quorum: Create quorum.c, add QuorumSingleAIOCB and QuorumAIOCB Benoît Canet
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 02/13] quorum: Create BDRVQuorumState and BlkDriver and do init Benoît Canet
@ 2014-01-28 16:52 ` Benoît Canet
  2014-01-31 21:21   ` Max Reitz
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 04/13] blkverify: Extract qemu_iovec_clone() and qemu_iovec_compare() from blkverify Benoît Canet
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Benoît Canet @ 2014-01-28 16:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha, mreitz

From: Benoît Canet <benoit@irqsave.net>

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/quorum.c | 123 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 123 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index 0242378..85992ab 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -64,11 +64,134 @@ struct QuorumAIOCB {
     int vote_ret;
 };
 
+static void quorum_aio_cancel(BlockDriverAIOCB *blockacb)
+{
+    QuorumAIOCB *acb = container_of(blockacb, QuorumAIOCB, common);
+    BDRVQuorumState *s = acb->bqs;
+    int i;
+
+    /* cancel all callback */
+    for (i = 0; i < s->total; i++) {
+        bdrv_aio_cancel(acb->aios[i].aiocb);
+    }
+}
+
+static AIOCBInfo quorum_aiocb_info = {
+    .aiocb_size         = sizeof(QuorumAIOCB),
+    .cancel             = quorum_aio_cancel,
+};
+
+/* return the first error code get by each individual callbacks */
+static int quorum_get_first_error(QuorumAIOCB *acb)
+{
+    BDRVQuorumState *s = acb->bqs;
+    int i, ret = 0;
+
+    for (i = 0; i < s->total; i++) {
+        ret = acb->aios[i].ret;
+        if (ret) {
+            return ret;
+        }
+    }
+
+    /* should not pass here */
+    assert(false);
+}
+
+static void quorum_aio_finalize(QuorumAIOCB *acb)
+{
+    BDRVQuorumState *s = acb->bqs;
+    int ret;
+
+    ret = s->threshold <= acb->success_count ? 0 : quorum_get_first_error(acb);
+
+    acb->common.cb(acb->common.opaque, ret);
+    if (acb->finished) {
+        *acb->finished = true;
+    }
+    g_free(acb->aios);
+    qemu_aio_release(acb);
+}
+
+static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
+                                   BlockDriverState *bs,
+                                   QEMUIOVector *qiov,
+                                   uint64_t sector_num,
+                                   int nb_sectors,
+                                   BlockDriverCompletionFunc *cb,
+                                   void *opaque)
+{
+    QuorumAIOCB *acb = qemu_aio_get(&quorum_aiocb_info, bs, cb, opaque);
+    int i;
+
+    acb->bqs = s;
+    acb->sector_num = sector_num;
+    acb->nb_sectors = nb_sectors;
+    acb->qiov = qiov;
+    acb->aios = g_new0(QuorumSingleAIOCB, s->total);
+    acb->count = 0;
+    acb->success_count = 0;
+    acb->finished = NULL;
+    acb->is_read = false;
+    acb->vote_ret = 0;
+
+    for (i = 0; i < s->total; i++) {
+        acb->aios[i].buf = NULL;
+        acb->aios[i].ret = 0;
+        acb->aios[i].parent = acb;
+    }
+
+    return acb;
+}
+
+static void quorum_aio_cb(void *opaque, int ret)
+{
+    QuorumSingleAIOCB *sacb = opaque;
+    QuorumAIOCB *acb = sacb->parent;
+    BDRVQuorumState *s = acb->bqs;
+
+    sacb->ret = ret;
+    acb->count++;
+    if (ret == 0) {
+        acb->success_count++;
+    }
+    assert(acb->count <= s->total);
+    assert(acb->success_count <= s->total);
+    if (acb->count < s->total) {
+        return;
+    }
+
+    quorum_aio_finalize(acb);
+}
+
+static BlockDriverAIOCB *quorum_aio_writev(BlockDriverState *bs,
+                                          int64_t sector_num,
+                                          QEMUIOVector *qiov,
+                                          int nb_sectors,
+                                          BlockDriverCompletionFunc *cb,
+                                          void *opaque)
+{
+    BDRVQuorumState *s = bs->opaque;
+    QuorumAIOCB *acb = quorum_aio_get(s, bs, qiov, sector_num, nb_sectors,
+                                      cb, opaque);
+    int i;
+
+    for (i = 0; i < s->total; i++) {
+        acb->aios[i].aiocb = bdrv_aio_writev(s->bs[i], sector_num, qiov,
+                                             nb_sectors, &quorum_aio_cb,
+                                             &acb->aios[i]);
+    }
+
+    return &acb->common;
+}
+
 static BlockDriver bdrv_quorum = {
     .format_name        = "quorum",
     .protocol_name      = "quorum",
 
     .instance_size      = sizeof(BDRVQuorumState),
+
+    .bdrv_aio_writev    = quorum_aio_writev,
 };
 
 static void bdrv_quorum_init(void)
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH V10 04/13] blkverify: Extract qemu_iovec_clone() and qemu_iovec_compare() from blkverify.
  2014-01-28 16:52 [Qemu-devel] [PATCH V10 00/13] Quorum block driver Benoît Canet
                   ` (2 preceding siblings ...)
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 03/13] quorum: Add quorum_aio_writev and its dependencies Benoît Canet
@ 2014-01-28 16:52 ` Benoît Canet
  2014-01-31 21:25   ` Max Reitz
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 05/13] quorum: Add quorum_aio_readv Benoît Canet
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Benoît Canet @ 2014-01-28 16:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha, mreitz

From: Benoît Canet <benoit@irqsave.net>

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/blkverify.c     | 108 +-------------------------------------------------
 include/qemu-common.h |   2 +
 util/iov.c            | 103 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 107 insertions(+), 106 deletions(-)

diff --git a/block/blkverify.c b/block/blkverify.c
index fe94b59..91b638d 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -171,110 +171,6 @@ static int64_t blkverify_getlength(BlockDriverState *bs)
     return bdrv_getlength(s->test_file);
 }
 
-/**
- * Check that I/O vector contents are identical
- *
- * @a:          I/O vector
- * @b:          I/O vector
- * @ret:        Offset to first mismatching byte or -1 if match
- */
-static ssize_t blkverify_iovec_compare(QEMUIOVector *a, QEMUIOVector *b)
-{
-    int i;
-    ssize_t offset = 0;
-
-    assert(a->niov == b->niov);
-    for (i = 0; i < a->niov; i++) {
-        size_t len = 0;
-        uint8_t *p = (uint8_t *)a->iov[i].iov_base;
-        uint8_t *q = (uint8_t *)b->iov[i].iov_base;
-
-        assert(a->iov[i].iov_len == b->iov[i].iov_len);
-        while (len < a->iov[i].iov_len && *p++ == *q++) {
-            len++;
-        }
-
-        offset += len;
-
-        if (len != a->iov[i].iov_len) {
-            return offset;
-        }
-    }
-    return -1;
-}
-
-typedef struct {
-    int src_index;
-    struct iovec *src_iov;
-    void *dest_base;
-} IOVectorSortElem;
-
-static int sortelem_cmp_src_base(const void *a, const void *b)
-{
-    const IOVectorSortElem *elem_a = a;
-    const IOVectorSortElem *elem_b = b;
-
-    /* Don't overflow */
-    if (elem_a->src_iov->iov_base < elem_b->src_iov->iov_base) {
-        return -1;
-    } else if (elem_a->src_iov->iov_base > elem_b->src_iov->iov_base) {
-        return 1;
-    } else {
-        return 0;
-    }
-}
-
-static int sortelem_cmp_src_index(const void *a, const void *b)
-{
-    const IOVectorSortElem *elem_a = a;
-    const IOVectorSortElem *elem_b = b;
-
-    return elem_a->src_index - elem_b->src_index;
-}
-
-/**
- * Copy contents of I/O vector
- *
- * The relative relationships of overlapping iovecs are preserved.  This is
- * necessary to ensure identical semantics in the cloned I/O vector.
- */
-static void blkverify_iovec_clone(QEMUIOVector *dest, const QEMUIOVector *src,
-                                  void *buf)
-{
-    IOVectorSortElem sortelems[src->niov];
-    void *last_end;
-    int i;
-
-    /* Sort by source iovecs by base address */
-    for (i = 0; i < src->niov; i++) {
-        sortelems[i].src_index = i;
-        sortelems[i].src_iov = &src->iov[i];
-    }
-    qsort(sortelems, src->niov, sizeof(sortelems[0]), sortelem_cmp_src_base);
-
-    /* Allocate buffer space taking into account overlapping iovecs */
-    last_end = NULL;
-    for (i = 0; i < src->niov; i++) {
-        struct iovec *cur = sortelems[i].src_iov;
-        ptrdiff_t rewind = 0;
-
-        /* Detect overlap */
-        if (last_end && last_end > cur->iov_base) {
-            rewind = last_end - cur->iov_base;
-        }
-
-        sortelems[i].dest_base = buf - rewind;
-        buf += cur->iov_len - MIN(rewind, cur->iov_len);
-        last_end = MAX(cur->iov_base + cur->iov_len, last_end);
-    }
-
-    /* Sort by source iovec index and build destination iovec */
-    qsort(sortelems, src->niov, sizeof(sortelems[0]), sortelem_cmp_src_index);
-    for (i = 0; i < src->niov; i++) {
-        qemu_iovec_add(dest, sortelems[i].dest_base, src->iov[i].iov_len);
-    }
-}
-
 static BlkverifyAIOCB *blkverify_aio_get(BlockDriverState *bs, bool is_write,
                                          int64_t sector_num, QEMUIOVector *qiov,
                                          int nb_sectors,
@@ -338,7 +234,7 @@ static void blkverify_aio_cb(void *opaque, int ret)
 
 static void blkverify_verify_readv(BlkverifyAIOCB *acb)
 {
-    ssize_t offset = blkverify_iovec_compare(acb->qiov, &acb->raw_qiov);
+    ssize_t offset = qemu_iovec_compare(acb->qiov, &acb->raw_qiov);
     if (offset != -1) {
         blkverify_err(acb, "contents mismatch in sector %" PRId64,
                       acb->sector_num + (int64_t)(offset / BDRV_SECTOR_SIZE));
@@ -356,7 +252,7 @@ static BlockDriverAIOCB *blkverify_aio_readv(BlockDriverState *bs,
     acb->verify = blkverify_verify_readv;
     acb->buf = qemu_blockalign(bs->file, qiov->size);
     qemu_iovec_init(&acb->raw_qiov, acb->qiov->niov);
-    blkverify_iovec_clone(&acb->raw_qiov, qiov, acb->buf);
+    qemu_iovec_clone(&acb->raw_qiov, qiov, acb->buf);
 
     bdrv_aio_readv(s->test_file, sector_num, qiov, nb_sectors,
                    blkverify_aio_cb, acb);
diff --git a/include/qemu-common.h b/include/qemu-common.h
index 5054836..d70783e 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -346,6 +346,8 @@ size_t qemu_iovec_from_buf(QEMUIOVector *qiov, size_t offset,
                            const void *buf, size_t bytes);
 size_t qemu_iovec_memset(QEMUIOVector *qiov, size_t offset,
                          int fillc, size_t bytes);
+ssize_t qemu_iovec_compare(QEMUIOVector *a, QEMUIOVector *b);
+void qemu_iovec_clone(QEMUIOVector *dest, const QEMUIOVector *src, void *buf);
 
 bool buffer_is_zero(const void *buf, size_t len);
 
diff --git a/util/iov.c b/util/iov.c
index bb46c04..f869fb9 100644
--- a/util/iov.c
+++ b/util/iov.c
@@ -378,6 +378,109 @@ size_t qemu_iovec_memset(QEMUIOVector *qiov, size_t offset,
     return iov_memset(qiov->iov, qiov->niov, offset, fillc, bytes);
 }
 
+/**
+ * Check that I/O vector contents are identical
+ *
+ * @a:          I/O vector
+ * @b:          I/O vector
+ * @ret:        Offset to first mismatching byte or -1 if match
+ */
+ssize_t qemu_iovec_compare(QEMUIOVector *a, QEMUIOVector *b)
+{
+    int i;
+    ssize_t offset = 0;
+
+    assert(a->niov == b->niov);
+    for (i = 0; i < a->niov; i++) {
+        size_t len = 0;
+        uint8_t *p = (uint8_t *)a->iov[i].iov_base;
+        uint8_t *q = (uint8_t *)b->iov[i].iov_base;
+
+        assert(a->iov[i].iov_len == b->iov[i].iov_len);
+        while (len < a->iov[i].iov_len && *p++ == *q++) {
+            len++;
+        }
+
+        offset += len;
+
+        if (len != a->iov[i].iov_len) {
+            return offset;
+        }
+    }
+    return -1;
+}
+
+typedef struct {
+    int src_index;
+    struct iovec *src_iov;
+    void *dest_base;
+} IOVectorSortElem;
+
+static int sortelem_cmp_src_base(const void *a, const void *b)
+{
+    const IOVectorSortElem *elem_a = a;
+    const IOVectorSortElem *elem_b = b;
+
+    /* Don't overflow */
+    if (elem_a->src_iov->iov_base < elem_b->src_iov->iov_base) {
+        return -1;
+    } else if (elem_a->src_iov->iov_base > elem_b->src_iov->iov_base) {
+        return 1;
+    } else {
+        return 0;
+    }
+}
+
+static int sortelem_cmp_src_index(const void *a, const void *b)
+{
+    const IOVectorSortElem *elem_a = a;
+    const IOVectorSortElem *elem_b = b;
+
+    return elem_a->src_index - elem_b->src_index;
+}
+
+/**
+ * Copy contents of I/O vector
+ *
+ * The relative relationships of overlapping iovecs are preserved.  This is
+ * necessary to ensure identical semantics in the cloned I/O vector.
+ */
+void qemu_iovec_clone(QEMUIOVector *dest, const QEMUIOVector *src, void *buf)
+{
+    IOVectorSortElem sortelems[src->niov];
+    void *last_end;
+    int i;
+
+    /* Sort by source iovecs by base address */
+    for (i = 0; i < src->niov; i++) {
+        sortelems[i].src_index = i;
+        sortelems[i].src_iov = &src->iov[i];
+    }
+    qsort(sortelems, src->niov, sizeof(sortelems[0]), sortelem_cmp_src_base);
+
+    /* Allocate buffer space taking into account overlapping iovecs */
+    last_end = NULL;
+    for (i = 0; i < src->niov; i++) {
+        struct iovec *cur = sortelems[i].src_iov;
+        ptrdiff_t rewind = 0;
+
+        /* Detect overlap */
+        if (last_end && last_end > cur->iov_base) {
+            rewind = last_end - cur->iov_base;
+        }
+
+        sortelems[i].dest_base = buf - rewind;
+        buf += cur->iov_len - MIN(rewind, cur->iov_len);
+        last_end = MAX(cur->iov_base + cur->iov_len, last_end);
+    }
+
+    /* Sort by source iovec index and build destination iovec */
+    qsort(sortelems, src->niov, sizeof(sortelems[0]), sortelem_cmp_src_index);
+    for (i = 0; i < src->niov; i++) {
+        qemu_iovec_add(dest, sortelems[i].dest_base, src->iov[i].iov_len);
+    }
+}
+
 size_t iov_discard_front(struct iovec **iov, unsigned int *iov_cnt,
                          size_t bytes)
 {
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH V10 05/13] quorum: Add quorum_aio_readv.
  2014-01-28 16:52 [Qemu-devel] [PATCH V10 00/13] Quorum block driver Benoît Canet
                   ` (3 preceding siblings ...)
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 04/13] blkverify: Extract qemu_iovec_clone() and qemu_iovec_compare() from blkverify Benoît Canet
@ 2014-01-28 16:52 ` Benoît Canet
  2014-01-31 21:47   ` Max Reitz
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 06/13] quorum: Add quorum mechanism Benoît Canet
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Benoît Canet @ 2014-01-28 16:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha, mreitz

From: Benoît Canet <benoit@irqsave.net>

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/quorum.c | 40 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/block/quorum.c b/block/quorum.c
index 85992ab..5bf37b3 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -101,14 +101,23 @@ static int quorum_get_first_error(QuorumAIOCB *acb)
 static void quorum_aio_finalize(QuorumAIOCB *acb)
 {
     BDRVQuorumState *s = acb->bqs;
-    int ret;
+    int i, ret;
 
     ret = s->threshold <= acb->success_count ? 0 : quorum_get_first_error(acb);
 
+    for (i = 0; i < s->total; i++) {
+        qemu_vfree(acb->aios[i].buf);
+        acb->aios[i].buf = NULL;
+        acb->aios[i].ret = 0;
+    }
+
     acb->common.cb(acb->common.opaque, ret);
     if (acb->finished) {
         *acb->finished = true;
     }
+    for (i = 0; acb->is_read && i < s->total; i++) {
+        qemu_iovec_destroy(&acb->aios[i].qiov);
+    }
     g_free(acb->aios);
     qemu_aio_release(acb);
 }
@@ -164,6 +173,34 @@ static void quorum_aio_cb(void *opaque, int ret)
     quorum_aio_finalize(acb);
 }
 
+static BlockDriverAIOCB *quorum_aio_readv(BlockDriverState *bs,
+                                         int64_t sector_num,
+                                         QEMUIOVector *qiov,
+                                         int nb_sectors,
+                                         BlockDriverCompletionFunc *cb,
+                                         void *opaque)
+{
+    BDRVQuorumState *s = bs->opaque;
+    QuorumAIOCB *acb = quorum_aio_get(s, bs, qiov, sector_num,
+                                      nb_sectors, cb, opaque);
+    int i;
+
+    acb->is_read = true;
+
+    for (i = 0; i < s->total; i++) {
+        acb->aios[i].buf = qemu_blockalign(bs->file, qiov->size);
+        qemu_iovec_init(&acb->aios[i].qiov, qiov->niov);
+        qemu_iovec_clone(&acb->aios[i].qiov, qiov, acb->aios[i].buf);
+    }
+
+    for (i = 0; i < s->total; i++) {
+        bdrv_aio_readv(s->bs[i], sector_num, qiov, nb_sectors,
+                       quorum_aio_cb, &acb->aios[i]);
+    }
+
+    return &acb->common;
+}
+
 static BlockDriverAIOCB *quorum_aio_writev(BlockDriverState *bs,
                                           int64_t sector_num,
                                           QEMUIOVector *qiov,
@@ -191,6 +228,7 @@ static BlockDriver bdrv_quorum = {
 
     .instance_size      = sizeof(BDRVQuorumState),
 
+    .bdrv_aio_readv     = quorum_aio_readv,
     .bdrv_aio_writev    = quorum_aio_writev,
 };
 
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH V10 06/13] quorum: Add quorum mechanism.
  2014-01-28 16:52 [Qemu-devel] [PATCH V10 00/13] Quorum block driver Benoît Canet
                   ` (4 preceding siblings ...)
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 05/13] quorum: Add quorum_aio_readv Benoît Canet
@ 2014-01-28 16:52 ` Benoît Canet
  2014-01-31 23:04   ` Max Reitz
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 07/13] quorum: Add quorum_getlength() Benoît Canet
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Benoît Canet @ 2014-01-28 16:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha, mreitz

From: Benoît Canet <benoit@irqsave.net>

Use gnutls's SHA-256 to compare versions.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/Makefile.objs       |   2 +-
 block/quorum.c            | 333 +++++++++++++++++++++++++++++++++++++++++++++-
 configure                 |  36 +++++
 docs/qmp/qmp-events.txt   |  33 +++++
 include/monitor/monitor.h |   2 +
 monitor.c                 |   2 +
 6 files changed, 406 insertions(+), 2 deletions(-)

diff --git a/block/Makefile.objs b/block/Makefile.objs
index a2650b9..4ca9d43 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -3,7 +3,7 @@ block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-c
 block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
 block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o vhdx-log.o
-block-obj-y += quorum.o
+block-obj-$(CONFIG_QUORUM) += quorum.o
 block-obj-y += parallels.o blkdebug.o blkverify.o
 block-obj-y += snapshot.o qapi.o
 block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
diff --git a/block/quorum.c b/block/quorum.c
index 5bf37b3..c319719 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -13,7 +13,43 @@
  * See the COPYING file in the top-level directory.
  */
 
+#include <gnutls/gnutls.h>
+#include <gnutls/crypto.h>
 #include "block/block_int.h"
+#include "qapi/qmp/qjson.h"
+
+#define HASH_LENGTH 32
+
+/* This union holds a vote hash value */
+typedef union QuorumVoteValue {
+    char h[HASH_LENGTH];       /* SHA-256 hash */
+    int64_t l;                 /* simpler 64 bits hash */
+} QuorumVoteValue;
+
+/* A vote item */
+typedef struct QuorumVoteItem {
+    int index;
+    QLIST_ENTRY(QuorumVoteItem) next;
+} QuorumVoteItem;
+
+/* this structure is a vote version. A version is the set of votes sharing the
+ * same vote value.
+ * The set of votes will be tracked with the items field and its cardinality is
+ * vote_count.
+ */
+typedef struct QuorumVoteVersion {
+    QuorumVoteValue value;
+    int index;
+    int vote_count;
+    QLIST_HEAD(, QuorumVoteItem) items;
+    QLIST_ENTRY(QuorumVoteVersion) next;
+} QuorumVoteVersion;
+
+/* this structure holds a group of vote versions together */
+typedef struct QuorumVotes {
+    QLIST_HEAD(, QuorumVoteVersion) vote_list;
+    int (*compare)(QuorumVoteValue *a, QuorumVoteValue *b);
+} QuorumVotes;
 
 /* the following structure holds the state of one quorum instance */
 typedef struct {
@@ -60,10 +96,14 @@ struct QuorumAIOCB {
     int success_count;          /* number of successfully completed AIOCB */
     bool *finished;             /* completion signal for cancel */
 
+    QuorumVotes votes;
+
     bool is_read;
     int vote_ret;
 };
 
+static void quorum_vote(QuorumAIOCB *acb);
+
 static void quorum_aio_cancel(BlockDriverAIOCB *blockacb)
 {
     QuorumAIOCB *acb = container_of(blockacb, QuorumAIOCB, common);
@@ -111,6 +151,10 @@ static void quorum_aio_finalize(QuorumAIOCB *acb)
         acb->aios[i].ret = 0;
     }
 
+    if (acb->vote_ret) {
+        ret = acb->vote_ret;
+    }
+
     acb->common.cb(acb->common.opaque, ret);
     if (acb->finished) {
         *acb->finished = true;
@@ -122,6 +166,11 @@ static void quorum_aio_finalize(QuorumAIOCB *acb)
     qemu_aio_release(acb);
 }
 
+static int quorum_sha256_compare(QuorumVoteValue *a, QuorumVoteValue *b)
+{
+    return memcmp(a->h, b->h, HASH_LENGTH);
+}
+
 static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
                                    BlockDriverState *bs,
                                    QEMUIOVector *qiov,
@@ -141,6 +190,7 @@ static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
     acb->count = 0;
     acb->success_count = 0;
     acb->finished = NULL;
+    acb->votes.compare = quorum_sha256_compare;
     acb->is_read = false;
     acb->vote_ret = 0;
 
@@ -170,9 +220,290 @@ static void quorum_aio_cb(void *opaque, int ret)
         return;
     }
 
+    /* Do the vote on read */
+    if (acb->is_read) {
+        quorum_vote(acb);
+    }
+
     quorum_aio_finalize(acb);
 }
 
+static void quorum_report_bad(QuorumAIOCB *acb, char *node_name)
+{
+    QObject *data;
+    data = qobject_from_jsonf("{ 'node-name': \"%s\""
+                              ", 'sector-num': %" PRId64
+                              ", 'sectors-count': %i }",
+                              node_name,
+                              acb->sector_num,
+                              acb->nb_sectors);
+    monitor_protocol_event(QEVENT_QUORUM_REPORT_BAD, data);
+    qobject_decref(data);
+}
+
+static void quorum_report_failure(QuorumAIOCB *acb)
+{
+    QObject *data;
+    data = qobject_from_jsonf("{ 'sector-num': %" PRId64
+                              ", 'sectors-count': %i }",
+                              acb->sector_num,
+                              acb->nb_sectors);
+    monitor_protocol_event(QEVENT_QUORUM_FAILURE, data);
+    qobject_decref(data);
+}
+
+static void quorum_report_bad_versions(BDRVQuorumState *s,
+                                       QuorumAIOCB *acb,
+                                       QuorumVoteValue *value)
+{
+    QuorumVoteVersion *version;
+    QuorumVoteItem *item;
+
+    QLIST_FOREACH(version, &acb->votes.vote_list, next) {
+        if (!acb->votes.compare(&version->value, value)) {
+            continue;
+        }
+        QLIST_FOREACH(item, &version->items, next) {
+            quorum_report_bad(acb, s->bs[item->index]->node_name);
+        }
+    }
+}
+
+static void quorum_copy_qiov(QEMUIOVector *dest, QEMUIOVector *source)
+{
+    int i;
+    assert(dest->niov == source->niov);
+    assert(dest->size == source->size);
+    for (i = 0; i < source->niov; i++) {
+        assert(dest->iov[i].iov_len == source->iov[i].iov_len);
+        memcpy(dest->iov[i].iov_base,
+               source->iov[i].iov_base,
+               source->iov[i].iov_len);
+    }
+}
+
+static void quorum_count_vote(QuorumVotes *votes,
+                              QuorumVoteValue *value,
+                              int index)
+{
+    QuorumVoteVersion *v = NULL, *version = NULL;
+    QuorumVoteItem *item;
+
+    /* look if we have something with this hash */
+    QLIST_FOREACH(v, &votes->vote_list, next) {
+        if (!votes->compare(&v->value, value)) {
+            version = v;
+            break;
+        }
+    }
+
+    /* It's a version not yet in the list add it */
+    if (!version) {
+        version = g_new0(QuorumVoteVersion, 1);
+        QLIST_INIT(&version->items);
+        memcpy(&version->value, value, sizeof(version->value));
+        version->index = index;
+        version->vote_count = 0;
+        QLIST_INSERT_HEAD(&votes->vote_list, version, next);
+    }
+
+    version->vote_count++;
+
+    item = g_new0(QuorumVoteItem, 1);
+    item->index = index;
+    QLIST_INSERT_HEAD(&version->items, item, next);
+}
+
+static void quorum_free_vote_list(QuorumVotes *votes)
+{
+    QuorumVoteVersion *version, *next_version;
+    QuorumVoteItem *item, *next_item;
+
+    QLIST_FOREACH_SAFE(version, &votes->vote_list, next, next_version) {
+        QLIST_REMOVE(version, next);
+        QLIST_FOREACH_SAFE(item, &version->items, next, next_item) {
+            QLIST_REMOVE(item, next);
+            g_free(item);
+        }
+        g_free(version);
+    }
+}
+
+static int quorum_compute_hash(QuorumAIOCB *acb, int i, QuorumVoteValue *hash)
+{
+    int j, ret;
+    gnutls_hash_hd_t dig;
+    QEMUIOVector *qiov = &acb->aios[i].qiov;
+
+    ret = gnutls_hash_init(&dig, GNUTLS_DIG_SHA256);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    for (j = 0; j < qiov->niov; j++) {
+        ret = gnutls_hash(dig, qiov->iov[j].iov_base, qiov->iov[j].iov_len);
+        if (ret < 0) {
+            break;
+        }
+    }
+
+    gnutls_hash_deinit(dig, (void *) hash);
+    return ret;
+}
+
+static QuorumVoteVersion *quorum_get_vote_winner(QuorumVotes *votes)
+{
+    int i = 0;
+    QuorumVoteVersion *candidate, *winner = NULL;
+
+    QLIST_FOREACH(candidate, &votes->vote_list, next) {
+        if (candidate->vote_count > i) {
+            i = candidate->vote_count;
+            winner = candidate;
+        }
+    }
+
+    return winner;
+}
+
+static bool quorum_iovec_compare(QEMUIOVector *a, QEMUIOVector *b)
+{
+    int i;
+    int result;
+
+    assert(a->niov == b->niov);
+    for (i = 0; i < a->niov; i++) {
+        assert(a->iov[i].iov_len == b->iov[i].iov_len);
+        result = memcmp(a->iov[i].iov_base,
+                        b->iov[i].iov_base,
+                        a->iov[i].iov_len);
+        if (result) {
+            return false;
+        }
+    }
+
+    return true;
+}
+
+static void GCC_FMT_ATTR(2, 3) quorum_err(QuorumAIOCB *acb,
+                                          const char *fmt, ...)
+{
+    va_list ap;
+
+    va_start(ap, fmt);
+    fprintf(stderr, "quorum: sector_num=%" PRId64 " nb_sectors=%d ",
+            acb->sector_num, acb->nb_sectors);
+    vfprintf(stderr, fmt, ap);
+    fprintf(stderr, "\n");
+    va_end(ap);
+    exit(1);
+}
+
+static bool quorum_compare(QuorumAIOCB *acb,
+                           QEMUIOVector *a,
+                           QEMUIOVector *b)
+{
+    BDRVQuorumState *s = acb->bqs;
+    ssize_t offset;
+
+    /* This driver will replace blkverify in this particular case */
+    if (s->is_blkverify) {
+        offset = qemu_iovec_compare(a, b);
+        if (offset != -1) {
+            quorum_err(acb, "contents mismatch in sector %" PRId64,
+                       acb->sector_num +
+                       (uint64_t)(offset / BDRV_SECTOR_SIZE));
+        }
+        return true;
+    }
+
+    return quorum_iovec_compare(a, b);
+}
+
+static void quorum_vote(QuorumAIOCB *acb)
+{
+    bool quorum = false;
+    int i, j, ret;
+    QuorumVoteValue hash;
+    BDRVQuorumState *s = acb->bqs;
+    QuorumVoteVersion *winner;
+
+    QLIST_INIT(&acb->votes.vote_list);
+
+    /* if we don't get any successful read use the first error code */
+    if (!acb->success_count) {
+        acb->vote_ret = acb->aios[0].ret;
+        return;
+    }
+
+    /* get the index of the first successful read (we are sure to find one) */
+    for (i = 0; i < s->total; i++) {
+        if (!acb->aios[i].ret) {
+            break;
+        }
+    }
+
+    /* compare this read with all other successful read looking for quorum */
+    for (j = i + 1; j < s->total; j++) {
+        if (acb->aios[j].ret) {
+            continue;
+        }
+        quorum = quorum_compare(acb, &acb->aios[i].qiov, &acb->aios[j].qiov);
+        if (!quorum) {
+            break;
+       }
+    }
+
+    /* Every successful read agrees and their count is higher or equal threshold
+     * -> Quorum
+     */
+    if (quorum && acb->success_count >= s->threshold) {
+        quorum_copy_qiov(acb->qiov, &acb->aios[i].qiov);
+        return;
+    }
+
+    /* compute hashs for each successful read, also store indexes */
+    for (i = 0; i < s->total; i++) {
+        if (acb->aios[i].ret) {
+            continue;
+        }
+        ret = quorum_compute_hash(acb, i, &hash);
+        /* if ever the hash computation failed */
+        if (ret < 0) {
+            acb->vote_ret = ret;
+            goto free_exit;
+        }
+        quorum_count_vote(&acb->votes, &hash, i);
+    }
+
+    /* vote to select the most represented version */
+    winner = quorum_get_vote_winner(&acb->votes);
+    /* every vote version are differents -> error */
+    if (!winner) {
+        quorum_report_failure(acb);
+        acb->vote_ret = -EIO;
+        goto free_exit;
+    }
+
+    /* if the winner count is smaller than threshold the read fails */
+    if (winner->vote_count < s->threshold) {
+        quorum_report_failure(acb);
+        acb->vote_ret = -EIO;
+        goto free_exit;
+    }
+
+    /* we have a winner: copy it */
+    quorum_copy_qiov(acb->qiov, &acb->aios[winner->index].qiov);
+
+    /* some versions are bad print them */
+    quorum_report_bad_versions(s, acb, &winner->value);
+
+free_exit:
+    /* free lists */
+    quorum_free_vote_list(&acb->votes);
+}
+
 static BlockDriverAIOCB *quorum_aio_readv(BlockDriverState *bs,
                                          int64_t sector_num,
                                          QEMUIOVector *qiov,
@@ -194,7 +525,7 @@ static BlockDriverAIOCB *quorum_aio_readv(BlockDriverState *bs,
     }
 
     for (i = 0; i < s->total; i++) {
-        bdrv_aio_readv(s->bs[i], sector_num, qiov, nb_sectors,
+        bdrv_aio_readv(s->bs[i], sector_num, &acb->aios[i].qiov, nb_sectors,
                        quorum_aio_cb, &acb->aios[i]);
     }
 
diff --git a/configure b/configure
index b472694..5a68738 100755
--- a/configure
+++ b/configure
@@ -263,6 +263,7 @@ gtkabi="2.0"
 tpm="no"
 libssh2=""
 vhdx=""
+quorum="no"
 
 # parse CC options first
 for opt do
@@ -1000,6 +1001,10 @@ for opt do
   ;;
   --disable-vhdx) vhdx="no"
   ;;
+  --disable-quorum) quorum="no"
+  ;;
+  --enable-quorum) quorum="yes"
+  ;;
   *) echo "ERROR: unknown option $opt"; show_help="yes"
   ;;
   esac
@@ -1254,6 +1259,8 @@ Advanced options (experts only):
   --enable-libssh2         enable ssh block device support
   --disable-vhdx           disables support for the Microsoft VHDX image format
   --enable-vhdx            enable support for the Microsoft VHDX image format
+  --disable-quorum         disable quorum block filter support
+  --enable-quorum          enable quorum block filter support
 
 NOTE: The object files are built at the place where configure is launched
 EOF
@@ -1923,6 +1930,30 @@ EOF
 fi
 
 ##########################################
+# Quorum probe (check for gnutls)
+if test "$quorum" != "no" ; then
+cat > $TMPC <<EOF
+#include <gnutls/gnutls.h>
+#include <gnutls/crypto.h>
+int main(void) {char data[4096], digest[32];
+gnutls_hash_fast(GNUTLS_DIG_SHA256, data, 4096, digest);
+return 0;
+}
+EOF
+quorum_tls_cflags=`$pkg_config --cflags gnutls 2> /dev/null`
+quorum_tls_libs=`$pkg_config --libs gnutls 2> /dev/null`
+if compile_prog "$quorum_tls_cflags" "$quorum_tls_libs" ; then
+  qcow_tls=yes
+  libs_softmmu="$quorum_tls_libs $libs_softmmu"
+  libs_tools="$quorum_tls_libs $libs_softmmu"
+  QEMU_CFLAGS="$QEMU_CFLAGS $quorum_tls_cflags"
+else
+  echo "gnutls > 2.10.0 required to compile Quorum"
+  exit 1
+fi
+fi
+
+##########################################
 # VNC SASL detection
 if test "$vnc" = "yes" -a "$vnc_sasl" != "no" ; then
   cat > $TMPC <<EOF
@@ -3843,6 +3874,7 @@ echo "libssh2 support   $libssh2"
 echo "TPM passthrough   $tpm_passthrough"
 echo "QOM debugging     $qom_cast_debug"
 echo "vhdx              $vhdx"
+echo "Quorum            $quorum"
 
 if test "$sdl_too_old" = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -4241,6 +4273,10 @@ if test "$libssh2" = "yes" ; then
   echo "CONFIG_LIBSSH2=y" >> $config_host_mak
 fi
 
+if test "$quorum" = "yes" ; then
+  echo "CONFIG_QUORUM=y" >> $config_host_mak
+fi
+
 if test "$virtio_blk_data_plane" = "yes" ; then
   echo 'CONFIG_VIRTIO_BLK_DATA_PLANE=$(CONFIG_VIRTIO)' >> $config_host_mak
 fi
diff --git a/docs/qmp/qmp-events.txt b/docs/qmp/qmp-events.txt
index 6b87e97..1cbdde7 100644
--- a/docs/qmp/qmp-events.txt
+++ b/docs/qmp/qmp-events.txt
@@ -500,3 +500,36 @@ Example:
 
 Note: If action is "reset", "shutdown", or "pause" the WATCHDOG event is
 followed respectively by the RESET, SHUTDOWN, or STOP events.
+
+QUORUM_FAILURE
+--------------
+
+Emitted by the Quorum block driver when it fail to establish a quorum.
+
+Data:
+
+- "sector-num":   Number of the first sector of the failed read operation.
+- "sector-count": Failed read operation sector count.
+
+Example:
+
+{ "event": "QUORUM_FAILURE",
+     "data": { "sector-num": 345435, "sector-count": 5 },
+     "timestamp": { "seconds": 1344522075, "microseconds": 745528 } }
+
+QUORUM_REPORT_BAD
+-----------------
+
+Emitted to report a corruption of a Quorum file.
+
+Data:
+
+- "node-name":  The graph node name of the block driver state.
+- "sector-num":   Number of the first sector of the failed read operation.
+- "sector-count": Failed read operation sector count.
+
+Example:
+
+{ "event": "QUORUM_REPORT_BAD",
+     "data": { "node-name": "1.raw", "sector-num": 345435, "sector-count": 5 },
+     "timestamp": { "seconds": 1344522075, "microseconds": 745528 } }
diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
index 7e5f752..a49ea11 100644
--- a/include/monitor/monitor.h
+++ b/include/monitor/monitor.h
@@ -49,6 +49,8 @@ typedef enum MonitorEvent {
     QEVENT_SPICE_MIGRATE_COMPLETED,
     QEVENT_GUEST_PANICKED,
     QEVENT_BLOCK_IMAGE_CORRUPTED,
+    QEVENT_QUORUM_FAILURE,
+    QEVENT_QUORUM_REPORT_BAD,
 
     /* Add to 'monitor_event_names' array in monitor.c when
      * defining new events here */
diff --git a/monitor.c b/monitor.c
index 80456fb..f8f6aea 100644
--- a/monitor.c
+++ b/monitor.c
@@ -507,6 +507,8 @@ static const char *monitor_event_names[] = {
     [QEVENT_SPICE_MIGRATE_COMPLETED] = "SPICE_MIGRATE_COMPLETED",
     [QEVENT_GUEST_PANICKED] = "GUEST_PANICKED",
     [QEVENT_BLOCK_IMAGE_CORRUPTED] = "BLOCK_IMAGE_CORRUPTED",
+    [QEVENT_QUORUM_FAILURE] = "QUORUM_FAILURE",
+    [QEVENT_QUORUM_REPORT_BAD] = "QUORUM_REPORT_BAD",
 };
 QEMU_BUILD_BUG_ON(ARRAY_SIZE(monitor_event_names) != QEVENT_MAX)
 
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH V10 07/13] quorum: Add quorum_getlength().
  2014-01-28 16:52 [Qemu-devel] [PATCH V10 00/13] Quorum block driver Benoît Canet
                   ` (5 preceding siblings ...)
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 06/13] quorum: Add quorum mechanism Benoît Canet
@ 2014-01-28 16:52 ` Benoît Canet
  2014-02-02 21:25   ` Max Reitz
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 08/13] quorum: Add quorum_invalidate_cache() Benoît Canet
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Benoît Canet @ 2014-01-28 16:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha, mreitz

From: Benoît Canet <benoit@irqsave.net>

Check that every bs file returns the same length.
Otherwize, return -EIO to disable the quorum and
avoid length discrepancy.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/quorum.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index c319719..a8a8492 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -553,12 +553,38 @@ static BlockDriverAIOCB *quorum_aio_writev(BlockDriverState *bs,
     return &acb->common;
 }
 
+static int64_t quorum_getlength(BlockDriverState *bs)
+{
+    BDRVQuorumState *s = bs->opaque;
+    int64_t result;
+    int i;
+
+    /* check that all file have the same length */
+    result = bdrv_getlength(s->bs[0]);
+    if (result < 0) {
+        return result;
+    }
+    for (i = 1; i < s->total; i++) {
+        int64_t value = bdrv_getlength(s->bs[i]);
+        if (value < 0) {
+            return value;
+        }
+        if (value != result) {
+            return -EIO;
+        }
+    }
+
+    return result;
+}
+
 static BlockDriver bdrv_quorum = {
     .format_name        = "quorum",
     .protocol_name      = "quorum",
 
     .instance_size      = sizeof(BDRVQuorumState),
 
+    .bdrv_getlength     = quorum_getlength,
+
     .bdrv_aio_readv     = quorum_aio_readv,
     .bdrv_aio_writev    = quorum_aio_writev,
 };
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH V10 08/13] quorum: Add quorum_invalidate_cache().
  2014-01-28 16:52 [Qemu-devel] [PATCH V10 00/13] Quorum block driver Benoît Canet
                   ` (6 preceding siblings ...)
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 07/13] quorum: Add quorum_getlength() Benoît Canet
@ 2014-01-28 16:52 ` Benoît Canet
  2014-02-02 21:27   ` Max Reitz
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 09/13] quorum: Add quorum_co_get_block_status Benoît Canet
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Benoît Canet @ 2014-01-28 16:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha, mreitz

From: Benoît Canet <benoit@irqsave.net>

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/quorum.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index a8a8492..a47cd33 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -577,6 +577,16 @@ static int64_t quorum_getlength(BlockDriverState *bs)
     return result;
 }
 
+static void quorum_invalidate_cache(BlockDriverState *bs)
+{
+    BDRVQuorumState *s = bs->opaque;
+    int i;
+
+    for (i = 0; i < s->total; i++) {
+        bdrv_invalidate_cache(s->bs[i]);
+    }
+}
+
 static BlockDriver bdrv_quorum = {
     .format_name        = "quorum",
     .protocol_name      = "quorum",
@@ -587,6 +597,7 @@ static BlockDriver bdrv_quorum = {
 
     .bdrv_aio_readv     = quorum_aio_readv,
     .bdrv_aio_writev    = quorum_aio_writev,
+    .bdrv_invalidate_cache = quorum_invalidate_cache,
 };
 
 static void bdrv_quorum_init(void)
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH V10 09/13] quorum: Add quorum_co_get_block_status.
  2014-01-28 16:52 [Qemu-devel] [PATCH V10 00/13] Quorum block driver Benoît Canet
                   ` (7 preceding siblings ...)
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 08/13] quorum: Add quorum_invalidate_cache() Benoît Canet
@ 2014-01-28 16:52 ` Benoît Canet
  2014-02-02 21:44   ` Max Reitz
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 10/13] quorum: Add quorum_co_flush() Benoît Canet
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Benoît Canet @ 2014-01-28 16:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha, mreitz

From: Benoît Canet <benoit@irqsave.net>

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/quorum.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index a47cd33..9b0718b 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -171,6 +171,22 @@ static int quorum_sha256_compare(QuorumVoteValue *a, QuorumVoteValue *b)
     return memcmp(a->h, b->h, HASH_LENGTH);
 }
 
+static int quorum_64bits_compare(QuorumVoteValue *a, QuorumVoteValue *b)
+{
+    int64_t i = a->l;
+    int64_t j = b->l;
+
+    if (i < j) {
+        return -1;
+    }
+
+    if (i > j) {
+        return 1;
+    }
+
+    return 0;
+}
+
 static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
                                    BlockDriverState *bs,
                                    QEMUIOVector *qiov,
@@ -587,6 +603,56 @@ static void quorum_invalidate_cache(BlockDriverState *bs)
     }
 }
 
+static int64_t coroutine_fn quorum_co_get_block_status(BlockDriverState *bs,
+                                                       int64_t sector_num,
+                                                       int nb_sectors,
+                                                       int *pnum)
+{
+    BDRVQuorumState *s = bs->opaque;
+    QuorumVoteVersion *winner = NULL;
+    QuorumVotes result_votes, num_votes;
+    QuorumVoteValue result_value, num_value;
+    int i, num;
+    int64_t result = 0;
+
+    QLIST_INIT(&result_votes.vote_list);
+    QLIST_INIT(&num_votes.vote_list);
+    result_votes.compare = quorum_64bits_compare;
+    num_votes.compare = quorum_64bits_compare;
+
+    for (i = 0; i < s->total; i++) {
+        result = bdrv_get_block_status(s->bs[i], sector_num, nb_sectors, &num);
+        /* skip failed requests */
+        if (result < 0) {
+            continue;
+        }
+        result_value.l = result & BDRV_BLOCK_DATA;
+        num_value.l = num;
+        quorum_count_vote(&result_votes, &result_value, i);
+        quorum_count_vote(&num_votes, &num_value, i);
+    }
+
+    winner = quorum_get_vote_winner(&result_votes);
+    result = winner->value.l;
+    if (winner->vote_count < s->threshold) {
+        result = -ERANGE;
+        goto free_exit;
+    }
+
+    winner = quorum_get_vote_winner(&num_votes);
+    if (winner->vote_count < s->threshold) {
+        result = -ERANGE;
+        goto free_exit;
+    }
+    *pnum = winner->value.l;
+
+free_exit:
+    quorum_free_vote_list(&result_votes);
+    quorum_free_vote_list(&num_votes);
+
+    return result;
+}
+
 static BlockDriver bdrv_quorum = {
     .format_name        = "quorum",
     .protocol_name      = "quorum",
@@ -598,6 +664,7 @@ static BlockDriver bdrv_quorum = {
     .bdrv_aio_readv     = quorum_aio_readv,
     .bdrv_aio_writev    = quorum_aio_writev,
     .bdrv_invalidate_cache = quorum_invalidate_cache,
+    .bdrv_co_get_block_status = quorum_co_get_block_status,
 };
 
 static void bdrv_quorum_init(void)
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH V10 10/13] quorum: Add quorum_co_flush().
  2014-01-28 16:52 [Qemu-devel] [PATCH V10 00/13] Quorum block driver Benoît Canet
                   ` (8 preceding siblings ...)
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 09/13] quorum: Add quorum_co_get_block_status Benoît Canet
@ 2014-01-28 16:52 ` Benoît Canet
  2014-02-02 22:02   ` Max Reitz
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 11/13] quorum: Implement recursive .bdrv_recurse_is_first_non_filter in quorum Benoît Canet
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Benoît Canet @ 2014-01-28 16:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha, mreitz

From: Benoît Canet <benoit@irqsave.net>

Makes a vote to select error if any.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/quorum.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index 9b0718b..1b84b07 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -653,12 +653,46 @@ free_exit:
     return result;
 }
 
+static coroutine_fn int quorum_co_flush(BlockDriverState *bs)
+{
+    BDRVQuorumState *s = bs->opaque;
+    QuorumVoteVersion *winner = NULL;
+    QuorumVotes error_votes;
+    QuorumVoteValue result_value;
+    int i;
+    int result = 0;
+    bool error = false;
+
+    QLIST_INIT(&error_votes.vote_list);
+    error_votes.compare = quorum_64bits_compare;
+
+    for (i = 0; i < s->total; i++) {
+        result = bdrv_co_flush(s->bs[i]);
+        if (result) {
+            error = true;
+            result_value.l = result;
+            quorum_count_vote(&error_votes, &result_value, i);
+        }
+    }
+
+    if (error) {
+        winner = quorum_get_vote_winner(&error_votes);
+        result = winner->value.l;
+    }
+
+    quorum_free_vote_list(&error_votes);
+
+    return result;
+}
+
 static BlockDriver bdrv_quorum = {
     .format_name        = "quorum",
     .protocol_name      = "quorum",
 
     .instance_size      = sizeof(BDRVQuorumState),
 
+    .bdrv_co_flush_to_disk = quorum_co_flush,
+
     .bdrv_getlength     = quorum_getlength,
 
     .bdrv_aio_readv     = quorum_aio_readv,
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH V10 11/13] quorum: Implement recursive .bdrv_recurse_is_first_non_filter in quorum.
  2014-01-28 16:52 [Qemu-devel] [PATCH V10 00/13] Quorum block driver Benoît Canet
                   ` (9 preceding siblings ...)
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 10/13] quorum: Add quorum_co_flush() Benoît Canet
@ 2014-01-28 16:52 ` Benoît Canet
  2014-02-02 22:31   ` Max Reitz
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 12/13] quorum: Add quorum_open() and quorum_close() Benoît Canet
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Benoît Canet @ 2014-01-28 16:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha, mreitz

From: Benoît Canet <benoit@irqsave.net>

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/quorum.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index 1b84b07..e7b2090 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -685,6 +685,23 @@ static coroutine_fn int quorum_co_flush(BlockDriverState *bs)
     return result;
 }
 
+static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
+                                               BlockDriverState *candidate)
+{
+    BDRVQuorumState *s = bs->opaque;
+    int i;
+
+    for (i = 0; i < s->total; i++) {
+        bool perm = bdrv_recurse_is_first_non_filter(s->bs[i],
+                                                     candidate);
+        if (perm) {
+            return true;
+        }
+    }
+
+    return false;
+}
+
 static BlockDriver bdrv_quorum = {
     .format_name        = "quorum",
     .protocol_name      = "quorum",
@@ -699,6 +716,8 @@ static BlockDriver bdrv_quorum = {
     .bdrv_aio_writev    = quorum_aio_writev,
     .bdrv_invalidate_cache = quorum_invalidate_cache,
     .bdrv_co_get_block_status = quorum_co_get_block_status,
+
+    .bdrv_recurse_is_first_non_filter = quorum_recurse_is_first_non_filter,
 };
 
 static void bdrv_quorum_init(void)
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH V10 12/13] quorum: Add quorum_open() and quorum_close().
  2014-01-28 16:52 [Qemu-devel] [PATCH V10 00/13] Quorum block driver Benoît Canet
                   ` (10 preceding siblings ...)
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 11/13] quorum: Implement recursive .bdrv_recurse_is_first_non_filter in quorum Benoît Canet
@ 2014-01-28 16:52 ` Benoît Canet
  2014-02-02 23:09   ` Max Reitz
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 13/13] quorum: Add unit test Benoît Canet
  2014-01-31 20:40 ` [Qemu-devel] [PATCH V10 00/13] Quorum block driver Max Reitz
  13 siblings, 1 reply; 36+ messages in thread
From: Benoît Canet @ 2014-01-28 16:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha, mreitz

From: Benoît Canet <benoit@irqsave.net>

Example of command line:
-drive if=virtio,file.driver=quorum,\
file.children.0.file.filename=1.raw,\
file.children.0.node-name=1.raw,\
file.children.0.driver=raw,\
file.children.1.file.filename=2.raw,\
file.children.1.node-name=2.raw,\
file.children.1.driver=raw,\
file.children.2.file.filename=3.raw,\
file.children.2.node-name=3.raw,\
file.children.2.driver=raw,\
file.vote_threshold=2

file.blkverify=on with file.vote_threshold=2 and two files can be passed to
emulated blkverify.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/quorum.c   | 308 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 qapi-schema.json |  21 +++-
 2 files changed, 328 insertions(+), 1 deletion(-)

diff --git a/block/quorum.c b/block/quorum.c
index e7b2090..0c0d630 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -17,8 +17,12 @@
 #include <gnutls/crypto.h>
 #include "block/block_int.h"
 #include "qapi/qmp/qjson.h"
+#include "qapi/qmp/types.h"
+#include "qemu-common.h"
 
 #define HASH_LENGTH 32
+#define KEY_PREFIX "children."
+#define KEY_FILENAME_SUFFIX ".file.filename"
 
 /* This union holds a vote hash value */
 typedef union QuorumVoteValue {
@@ -702,12 +706,316 @@ static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
     return false;
 }
 
+static int quorum_match_key(const char *key,
+                            char **key_prefix)
+{
+    const char *start;
+    char *next;
+    unsigned long long idx;
+    int ret;
+
+    *key_prefix = NULL;
+
+    /* the following code is a translation of the following pseudo code:
+     *       match = key.match('(^children\.(\d+)\.)$suffix)
+     *       if not match:
+     *           return -1;
+     *       key_prefix = match.outer_match()
+     *       idx = match.inner_match()
+     *
+     * note: we also match the .file suffix to avoid futher checkings
+     */
+
+    /* this key is not a child */
+    if (strncmp(key, KEY_PREFIX, strlen(KEY_PREFIX))) {
+        return -1;
+    }
+
+    /* take first char after prefix */
+    start = key + strlen(KEY_PREFIX);
+
+    /* if the string end here -> scan fail */
+    if (start[0] == '\0') {
+        return -1;
+    }
+
+    /* try to retrieve the index */
+    ret = parse_uint(start, &idx, &next, 10);
+
+    /* no int found -> scan fail */
+    if (ret < 0) {
+        return -1;
+    }
+
+    /* we are taking a reference via QMP */
+    if (next - key == strlen(key)) {
+        *key_prefix = g_strdup(key);
+        return idx;
+    }
+
+    /* match the suffix to avoid matching the same idx
+     * multiple times and be required to do more checks later
+     */
+    if (strncmp(next, KEY_FILENAME_SUFFIX, strlen(KEY_FILENAME_SUFFIX))) {
+        return -1;
+    }
+
+    /* do not include '.' */
+    int len = next - key;
+    *key_prefix = g_strndup(key, len);
+
+    return idx;
+}
+
+static QDict *quorum_get_children_idx(const QDict *options)
+{
+    const QDictEntry *ent;
+    QDict *result;
+    char *key_prefix;
+    int idx;
+
+    result = qdict_new();
+
+    for (ent = qdict_first(options); ent; ent = qdict_next(options, ent)) {
+        const char *key = qdict_entry_key(ent);
+        idx = quorum_match_key(key,
+                               &key_prefix);
+
+        /* if the result zero or positive we got a key */
+        if (idx < 0) {
+            continue;
+        }
+
+        qdict_put(result, key_prefix, qint_from_int(idx));
+    }
+
+    return result;
+}
+
+static int quorum_fill_validation_array(bool *array,
+                                        const QDict *dict,
+                                        int total,
+                                        Error **errp)
+{
+    const QDictEntry *ent;
+
+    /* fill the checking array with children indexes */
+    for (ent = qdict_first(dict); ent; ent = qdict_next(dict, ent)) {
+        const char *key = qdict_entry_key(ent);
+        int idx = qdict_get_int(dict, key);
+
+        if (idx < 0 || idx >= total) {
+            error_setg(errp,
+                "Children index must be between 0 and children count -1");
+            return -ERANGE;
+        }
+
+        array[idx] = true;
+    }
+
+    return 0;
+}
+
+static int quorum_valid_indexes(const bool *array, int total, Error **errp)
+{
+    int i;
+
+    for (i = 0; i < total; i++) {
+        if (array[i] == true) {
+            continue;
+        }
+
+        error_setg(errp,
+            "All child indexes between 0 and children count -1  must be "
+            " used");
+        return -ERANGE;
+    }
+
+    return 0;
+}
+
+static int quorum_valid_children_indexes(const QDict *dict,
+                                         int total,
+                                         Error **errp)
+{
+    bool *array;
+    int ret = 0;;
+
+    /* allocate indexes checking array and put false in it */
+    array = g_new0(bool, total);
+
+    ret = quorum_fill_validation_array(array, dict, total, errp);
+    if (ret < 0) {
+        goto free_exit;
+    }
+
+    ret = quorum_valid_indexes(array, total, errp);
+free_exit:
+    g_free(array);
+    return ret;
+}
+
+static int quorum_valid_threshold(int threshold,
+                                  int total,
+                                  Error **errp)
+{
+
+    if (threshold < 1) {
+        error_set(errp, QERR_INVALID_PARAMETER_VALUE,
+                  "vote-threshold", "value >= 1");
+        return -ERANGE;
+    }
+
+    if (threshold > total) {
+        error_setg(errp, "threshold <= children count must be true");
+        return -ERANGE;
+    }
+
+    return 0;
+}
+
+static int quorum_open(BlockDriverState *bs,
+                       QDict *options,
+                       int flags,
+                       Error **errp)
+{
+    BDRVQuorumState *s = bs->opaque;
+    Error *local_err = NULL;
+    const QDictEntry *ent;
+    QDict *idx_dict;
+    bool *opened;
+    const char *value;
+    char *next;
+    int i;
+    int ret = 0;
+
+    /* get a dict of children indexes for validation */
+    idx_dict = quorum_get_children_idx(options);
+
+    /* count how many different children indexes are present and validate */
+    s->total = qdict_size(idx_dict);
+    if (s->total < 2) {
+        error_setg(&local_err,
+                   "Number of provided children must be greater than 1");
+        ret = -EINVAL;
+        goto exit;
+    }
+
+    /* validate that the set of index is coherent */
+    ret = quorum_valid_children_indexes(idx_dict, s->total, &local_err);
+    if (ret < 0) {
+        goto exit;
+    }
+
+    ret = qdict_get_try_int(options, "vote-threshold", -1);
+    /* from QMP */
+    if (ret != -1) {
+        qdict_del(options, "vote-threshold");
+        s->threshold = ret;
+    /* from command line */
+    } else {
+        /* retrieve the threshold option from the command line */
+        value = qdict_get_try_str(options, "vote_threshold");
+        if (!value) {
+            error_setg(&local_err,
+                       "vote_threshold must be provided");
+            ret = -EINVAL;
+            goto exit;
+        }
+        qdict_del(options, "vote_threshold");
+
+        ret = parse_uint(value, (unsigned long long *) &s->threshold, &next, 10);
+
+        /* no int found -> scan fail */
+        if (ret < 0) {
+            error_setg(&local_err,
+                       "invalid voter_threshold specified");
+            ret = -EINVAL;
+            goto exit;
+        }
+    }
+
+    /* and validate it againts s->total */
+    ret = quorum_valid_threshold(s->threshold, s->total, &local_err);
+    if (ret < 0) {
+        goto exit;
+    }
+
+    /* is the driver in blkverify mode */
+    value = qdict_get_try_str(options, "blkverify");
+    if (value && !strcmp(value, "on")  &&
+        s->total == 2 && s->threshold == 2) {
+        s->is_blkverify = true;
+    }
+    qdict_del(options, "blkverify");
+
+    /* allocate the children BlockDriverState array */
+    s->bs = g_new0(BlockDriverState *, s->total);
+    opened = g_new0(bool, s->total);
+
+    /* open children bs */
+    for (ent = qdict_first(idx_dict);
+         ent; ent = qdict_next(idx_dict, ent)) {
+        const char *key = qdict_entry_key(ent);
+        int idx = qdict_get_int(idx_dict, key);
+        ret = bdrv_open_image(&s->bs[idx],
+                              NULL,
+                              options,
+                              key,
+                              flags,
+                              false,
+                              &local_err);
+        if (ret < 0) {
+            goto close_exit;
+        }
+        opened[idx] = true;
+    }
+
+    g_free(opened);
+    goto exit;
+
+close_exit:
+    /* cleanup on error */
+    for (i = 0; i < s->total; i++) {
+        if (!opened[i]) {
+            continue;
+        }
+        bdrv_close(s->bs[i]);
+    }
+    g_free(s->bs);
+    g_free(opened);
+exit:
+    /* propagate error */
+    if (error_is_set(&local_err)) {
+        error_propagate(errp, local_err);
+    }
+    return ret;
+}
+
+static void quorum_close(BlockDriverState *bs)
+{
+    BDRVQuorumState *s = bs->opaque;
+    int i;
+
+    for (i = 0; i < s->total; i++) {
+        /* Ensure writes reach stable storage */
+        bdrv_flush(s->bs[i]);
+        /* close manually because we'll free s->bs */
+        bdrv_close(s->bs[i]);
+    }
+
+    g_free(s->bs);
+}
+
 static BlockDriver bdrv_quorum = {
     .format_name        = "quorum",
     .protocol_name      = "quorum",
 
     .instance_size      = sizeof(BDRVQuorumState),
 
+    .bdrv_file_open     = quorum_open,
+    .bdrv_close         = quorum_close,
+
     .bdrv_co_flush_to_disk = quorum_co_flush,
 
     .bdrv_getlength     = quorum_getlength,
diff --git a/qapi-schema.json b/qapi-schema.json
index 05ced9d..903a3a0 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -4352,6 +4352,24 @@
             'raw': 'BlockdevRef' } }
 
 ##
+# @BlockdevOptionsQuorum
+#
+# Driver specific block device options for Quorum
+#
+# @blkverify:      #optional true if the driver must print content mismatch
+#
+# @children:       the children block device to use
+#
+# @vote_threshold: the vote limit under which a read will fail
+#
+# Since: 2.0
+##
+{ 'type': 'BlockdevOptionsQuorum',
+  'data': { '*blkverify': 'bool',
+            'children': [ 'BlockdevRef' ],
+            'vote-threshold': 'int' } }
+
+##
 # @BlockdevOptions
 #
 # Options for creating a block device.
@@ -4389,7 +4407,8 @@
       'vdi':        'BlockdevOptionsGenericFormat',
       'vhdx':       'BlockdevOptionsGenericFormat',
       'vmdk':       'BlockdevOptionsGenericCOWFormat',
-      'vpc':        'BlockdevOptionsGenericFormat'
+      'vpc':        'BlockdevOptionsGenericFormat',
+      'quorum':     'BlockdevOptionsQuorum'
   } }
 
 ##
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH V10 13/13] quorum: Add unit test.
  2014-01-28 16:52 [Qemu-devel] [PATCH V10 00/13] Quorum block driver Benoît Canet
                   ` (11 preceding siblings ...)
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 12/13] quorum: Add quorum_open() and quorum_close() Benoît Canet
@ 2014-01-28 16:52 ` Benoît Canet
  2014-02-02 23:19   ` Max Reitz
  2014-01-31 20:40 ` [Qemu-devel] [PATCH V10 00/13] Quorum block driver Max Reitz
  13 siblings, 1 reply; 36+ messages in thread
From: Benoît Canet @ 2014-01-28 16:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, Benoit Canet, stefanha, mreitz

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 tests/qemu-iotests/075     | 85 ++++++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/075.out | 27 +++++++++++++++
 tests/qemu-iotests/group   |  1 +
 3 files changed, 113 insertions(+)
 create mode 100644 tests/qemu-iotests/075
 create mode 100644 tests/qemu-iotests/075.out

diff --git a/tests/qemu-iotests/075 b/tests/qemu-iotests/075
new file mode 100644
index 0000000..f8179a1
--- /dev/null
+++ b/tests/qemu-iotests/075
@@ -0,0 +1,85 @@
+#!/bin/bash
+#
+# Test Quorum block driver
+#
+# Copyright (C) 2013 Nodalink, SARL.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=benoit@irqsave.net
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+
+_cleanup()
+{
+    rm -rf $TEST_DIR/1.raw
+    rm -rf $TEST_DIR/2.raw
+    rm -rf $TEST_DIR/3.raw
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt raw
+_supported_proto generic
+_supported_os Linux
+
+quorum="file.driver=quorum,file.children.0.file.filename=$TEST_DIR/1.raw"
+quorum="$quorum,file.children.1.file.filename=$TEST_DIR/2.raw"
+quorum="$quorum,file.children.2.file.filename=$TEST_DIR/3.raw,file.vote_threshold=2"
+
+echo
+echo "== creating quorum files =="
+
+size=10M
+
+TEST_IMG="$TEST_DIR/1.raw" _make_test_img $size
+TEST_IMG="$TEST_DIR/2.raw" _make_test_img $size
+TEST_IMG="$TEST_DIR/3.raw" _make_test_img $size
+
+echo
+echo "== writing images =="
+
+$QEMU_IO -c "open -o $quorum" -c "write -P 0x32 0 $size" | _filter_qemu_io
+
+echo
+echo "== checking quorum write =="
+
+$QEMU_IO -c "read -P 0x32 0 $size" "$TEST_DIR/1.raw" | _filter_qemu_io
+$QEMU_IO -c "read -P 0x32 0 $size" "$TEST_DIR/2.raw" | _filter_qemu_io
+$QEMU_IO -c "read -P 0x32 0 $size" "$TEST_DIR/3.raw" | _filter_qemu_io
+
+echo
+echo "== corrupting image =="
+
+$QEMU_IO -c "write -P 0x42 0 $size" "$TEST_DIR/2.raw" | _filter_qemu_io
+
+echo
+echo "== checking quorum correction =="
+
+$QEMU_IO -c "open -o $quorum" -c "read -P 0x32 0 $size" | _filter_qemu_io
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/075.out b/tests/qemu-iotests/075.out
new file mode 100644
index 0000000..17e73ed
--- /dev/null
+++ b/tests/qemu-iotests/075.out
@@ -0,0 +1,27 @@
+QA output created by 075
+
+== creating quorum files ==
+Formatting 'TEST_DIR/1.IMGFMT', fmt=IMGFMT size=10485760 
+Formatting 'TEST_DIR/2.IMGFMT', fmt=IMGFMT size=10485760 
+Formatting 'TEST_DIR/3.IMGFMT', fmt=IMGFMT size=10485760 
+
+== writing images ==
+wrote 10485760/10485760 bytes at offset 0
+10 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== checking quorum write ==
+read 10485760/10485760 bytes at offset 0
+10 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 10485760/10485760 bytes at offset 0
+10 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 10485760/10485760 bytes at offset 0
+10 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== corrupting image ==
+wrote 10485760/10485760 bytes at offset 0
+10 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== checking quorum correction ==
+read 10485760/10485760 bytes at offset 0
+10 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 03c762f..c697500 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -81,4 +81,5 @@
 072 rw auto
 073 rw auto
 074 rw auto
+075 rw auto
 077 rw auto
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH V10 00/13] Quorum block driver
  2014-01-28 16:52 [Qemu-devel] [PATCH V10 00/13] Quorum block driver Benoît Canet
                   ` (12 preceding siblings ...)
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 13/13] quorum: Add unit test Benoît Canet
@ 2014-01-31 20:40 ` Max Reitz
  13 siblings, 0 replies; 36+ messages in thread
From: Max Reitz @ 2014-01-31 20:40 UTC (permalink / raw)
  To: Benoît Canet, qemu-devel; +Cc: kwolf, stefanha

On 28.01.2014 17:52, Benoît Canet wrote:
> Here is the new version of the QUORUM block driver.
> It now use bdrv_image_open for using QMP references and support snapshotting
> via the bs node-name infrastructure.
>
> I think the series is feature complete.
>
> It applies on top of Max bdrv_openv2 branch.

Well, considering that v2 is still subject to change (since I haven't 
yet sent it and will amend it according to Kevin's comments), this may 
be… Interesting. ;-)

I'll keep this version of the branch Benoît probably refers to under 
git://github.com/XanClic/qemu.git:bdrv_open-v1.5 – please don't use 
bdrv_open-v2, as that branch is subject to change.

Max

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH V10 01/13] quorum: Create quorum.c, add QuorumSingleAIOCB and QuorumAIOCB.
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 01/13] quorum: Create quorum.c, add QuorumSingleAIOCB and QuorumAIOCB Benoît Canet
@ 2014-01-31 21:10   ` Max Reitz
  0 siblings, 0 replies; 36+ messages in thread
From: Max Reitz @ 2014-01-31 21:10 UTC (permalink / raw)
  To: Benoît Canet, qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

On 28.01.2014 17:52, Benoît Canet wrote:
> From: Benoît Canet <benoit@irqsave.net>
>
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
>   block/Makefile.objs |  1 +
>   block/quorum.c      | 54 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 55 insertions(+)
>   create mode 100644 block/quorum.c

Reviewed-by: Max Reitz <mreitz@redhat.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH V10 02/13] quorum: Create BDRVQuorumState and BlkDriver and do init.
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 02/13] quorum: Create BDRVQuorumState and BlkDriver and do init Benoît Canet
@ 2014-01-31 21:11   ` Max Reitz
  0 siblings, 0 replies; 36+ messages in thread
From: Max Reitz @ 2014-01-31 21:11 UTC (permalink / raw)
  To: Benoît Canet, qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

On 28.01.2014 17:52, Benoît Canet wrote:
> From: Benoît Canet <benoit@irqsave.net>
>
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
>   block/quorum.c | 25 +++++++++++++++++++++++++
>   1 file changed, 25 insertions(+)

Reviewed-by: Max Reitz <mreitz@redhat.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH V10 03/13] quorum: Add quorum_aio_writev and its dependencies.
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 03/13] quorum: Add quorum_aio_writev and its dependencies Benoît Canet
@ 2014-01-31 21:21   ` Max Reitz
  0 siblings, 0 replies; 36+ messages in thread
From: Max Reitz @ 2014-01-31 21:21 UTC (permalink / raw)
  To: Benoît Canet, qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

On 28.01.2014 17:52, Benoît Canet wrote:
> From: Benoît Canet <benoit@irqsave.net>
>
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
>   block/quorum.c | 123 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 123 insertions(+)

Reviewed-by: Max Reitz <mreitz@redhat.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH V10 04/13] blkverify: Extract qemu_iovec_clone() and qemu_iovec_compare() from blkverify.
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 04/13] blkverify: Extract qemu_iovec_clone() and qemu_iovec_compare() from blkverify Benoît Canet
@ 2014-01-31 21:25   ` Max Reitz
  0 siblings, 0 replies; 36+ messages in thread
From: Max Reitz @ 2014-01-31 21:25 UTC (permalink / raw)
  To: Benoît Canet, qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

On 28.01.2014 17:52, Benoît Canet wrote:
> From: Benoît Canet <benoit@irqsave.net>
>
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
>   block/blkverify.c     | 108 +-------------------------------------------------
>   include/qemu-common.h |   2 +
>   util/iov.c            | 103 +++++++++++++++++++++++++++++++++++++++++++++++
>   3 files changed, 107 insertions(+), 106 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH V10 05/13] quorum: Add quorum_aio_readv.
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 05/13] quorum: Add quorum_aio_readv Benoît Canet
@ 2014-01-31 21:47   ` Max Reitz
  0 siblings, 0 replies; 36+ messages in thread
From: Max Reitz @ 2014-01-31 21:47 UTC (permalink / raw)
  To: Benoît Canet, qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

On 28.01.2014 17:52, Benoît Canet wrote:
> From: Benoît Canet <benoit@irqsave.net>
>
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
>   block/quorum.c | 40 +++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 39 insertions(+), 1 deletion(-)
>
> diff --git a/block/quorum.c b/block/quorum.c
> index 85992ab..5bf37b3 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -101,14 +101,23 @@ static int quorum_get_first_error(QuorumAIOCB *acb)
>   static void quorum_aio_finalize(QuorumAIOCB *acb)
>   {
>       BDRVQuorumState *s = acb->bqs;
> -    int ret;
> +    int i, ret;
>   
>       ret = s->threshold <= acb->success_count ? 0 : quorum_get_first_error(acb);
>   
> +    for (i = 0; i < s->total; i++) {
> +        qemu_vfree(acb->aios[i].buf);
> +        acb->aios[i].buf = NULL;
> +        acb->aios[i].ret = 0;
> +    }

With “fields” I mean these as well (they are not set for write 
operations). ;-)

But as I've said before, it doesn't actually make a difference whether 
this is just executed for reads or also for writes, where these fields 
are NULL and 0 already anyway, so:

Reviewed-by: Max Reitz <mreitz@redhat.com>

> +
>       acb->common.cb(acb->common.opaque, ret);
>       if (acb->finished) {
>           *acb->finished = true;
>       }
> +    for (i = 0; acb->is_read && i < s->total; i++) {

This is a pretty elegant way to avoid an “if” around the loop, I have to 
admit. *g*

Max

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH V10 06/13] quorum: Add quorum mechanism.
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 06/13] quorum: Add quorum mechanism Benoît Canet
@ 2014-01-31 23:04   ` Max Reitz
  2014-02-03 11:44     ` Benoît Canet
  0 siblings, 1 reply; 36+ messages in thread
From: Max Reitz @ 2014-01-31 23:04 UTC (permalink / raw)
  To: Benoît Canet, qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

On 28.01.2014 17:52, Benoît Canet wrote:
> From: Benoît Canet <benoit@irqsave.net>
>
> Use gnutls's SHA-256 to compare versions.
>
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
>   block/Makefile.objs       |   2 +-
>   block/quorum.c            | 333 +++++++++++++++++++++++++++++++++++++++++++++-
>   configure                 |  36 +++++
>   docs/qmp/qmp-events.txt   |  33 +++++
>   include/monitor/monitor.h |   2 +
>   monitor.c                 |   2 +
>   6 files changed, 406 insertions(+), 2 deletions(-)
>
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index a2650b9..4ca9d43 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -3,7 +3,7 @@ block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-c
>   block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>   block-obj-y += qed-check.o
>   block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o vhdx-log.o
> -block-obj-y += quorum.o
> +block-obj-$(CONFIG_QUORUM) += quorum.o
>   block-obj-y += parallels.o blkdebug.o blkverify.o
>   block-obj-y += snapshot.o qapi.o
>   block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
> diff --git a/block/quorum.c b/block/quorum.c
> index 5bf37b3..c319719 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -13,7 +13,43 @@
>    * See the COPYING file in the top-level directory.
>    */
>   
> +#include <gnutls/gnutls.h>
> +#include <gnutls/crypto.h>
>   #include "block/block_int.h"
> +#include "qapi/qmp/qjson.h"
> +
> +#define HASH_LENGTH 32
> +
> +/* This union holds a vote hash value */
> +typedef union QuorumVoteValue {
> +    char h[HASH_LENGTH];       /* SHA-256 hash */
> +    int64_t l;                 /* simpler 64 bits hash */
> +} QuorumVoteValue;
> +
> +/* A vote item */
> +typedef struct QuorumVoteItem {
> +    int index;
> +    QLIST_ENTRY(QuorumVoteItem) next;
> +} QuorumVoteItem;
> +
> +/* this structure is a vote version. A version is the set of votes sharing the
> + * same vote value.
> + * The set of votes will be tracked with the items field and its cardinality is
> + * vote_count.
> + */
> +typedef struct QuorumVoteVersion {
> +    QuorumVoteValue value;
> +    int index;
> +    int vote_count;
> +    QLIST_HEAD(, QuorumVoteItem) items;
> +    QLIST_ENTRY(QuorumVoteVersion) next;
> +} QuorumVoteVersion;
> +
> +/* this structure holds a group of vote versions together */
> +typedef struct QuorumVotes {
> +    QLIST_HEAD(, QuorumVoteVersion) vote_list;
> +    int (*compare)(QuorumVoteValue *a, QuorumVoteValue *b);
> +} QuorumVotes;
>   
>   /* the following structure holds the state of one quorum instance */
>   typedef struct {
> @@ -60,10 +96,14 @@ struct QuorumAIOCB {
>       int success_count;          /* number of successfully completed AIOCB */
>       bool *finished;             /* completion signal for cancel */
>   
> +    QuorumVotes votes;
> +
>       bool is_read;
>       int vote_ret;
>   };
>   
> +static void quorum_vote(QuorumAIOCB *acb);
> +
>   static void quorum_aio_cancel(BlockDriverAIOCB *blockacb)
>   {
>       QuorumAIOCB *acb = container_of(blockacb, QuorumAIOCB, common);
> @@ -111,6 +151,10 @@ static void quorum_aio_finalize(QuorumAIOCB *acb)
>           acb->aios[i].ret = 0;
>       }
>   
> +    if (acb->vote_ret) {
> +        ret = acb->vote_ret;
> +    }

Hm, this makes the vote_ret take precedence over other errors returned 
by the children. If they differ (and both are not 0), we can choose 
between returning either (vote_ret or “real” errors reported by the 
child devices). Both are errors, so I don't see any natural precedence. 
However, generally, vote_ret will contain a more generic error code 
(i.e., -EIO), thus I could imagine the other error code reported by the 
child device to be more appropriate, as it might contain more useful 
information.

But, well, on the other had, Quorum is designed for hiding errors 
reported by a minority of child devices; therefore, hiding such errors 
in this case as well is probably just consequent.

I'll leave this up to you; I'm fine with it as it is and I'd be fine if 
(for whatever reason) you were to change it. :-)

> +
>       acb->common.cb(acb->common.opaque, ret);
>       if (acb->finished) {
>           *acb->finished = true;
> @@ -122,6 +166,11 @@ static void quorum_aio_finalize(QuorumAIOCB *acb)
>       qemu_aio_release(acb);
>   }
>   
> +static int quorum_sha256_compare(QuorumVoteValue *a, QuorumVoteValue *b)
> +{
> +    return memcmp(a->h, b->h, HASH_LENGTH);
> +}
> +
>   static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
>                                      BlockDriverState *bs,
>                                      QEMUIOVector *qiov,
> @@ -141,6 +190,7 @@ static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
>       acb->count = 0;
>       acb->success_count = 0;
>       acb->finished = NULL;
> +    acb->votes.compare = quorum_sha256_compare;
>       acb->is_read = false;
>       acb->vote_ret = 0;
>   
> @@ -170,9 +220,290 @@ static void quorum_aio_cb(void *opaque, int ret)
>           return;
>       }
>   
> +    /* Do the vote on read */
> +    if (acb->is_read) {
> +        quorum_vote(acb);
> +    }
> +
>       quorum_aio_finalize(acb);
>   }
>   
> +static void quorum_report_bad(QuorumAIOCB *acb, char *node_name)
> +{
> +    QObject *data;
> +    data = qobject_from_jsonf("{ 'node-name': \"%s\""
> +                              ", 'sector-num': %" PRId64
> +                              ", 'sectors-count': %i }",
> +                              node_name,
> +                              acb->sector_num,
> +                              acb->nb_sectors);
> +    monitor_protocol_event(QEVENT_QUORUM_REPORT_BAD, data);
> +    qobject_decref(data);
> +}
> +
> +static void quorum_report_failure(QuorumAIOCB *acb)
> +{
> +    QObject *data;
> +    data = qobject_from_jsonf("{ 'sector-num': %" PRId64
> +                              ", 'sectors-count': %i }",
> +                              acb->sector_num,
> +                              acb->nb_sectors);
> +    monitor_protocol_event(QEVENT_QUORUM_FAILURE, data);
> +    qobject_decref(data);
> +}
> +
> +static void quorum_report_bad_versions(BDRVQuorumState *s,
> +                                       QuorumAIOCB *acb,
> +                                       QuorumVoteValue *value)
> +{
> +    QuorumVoteVersion *version;
> +    QuorumVoteItem *item;
> +
> +    QLIST_FOREACH(version, &acb->votes.vote_list, next) {
> +        if (!acb->votes.compare(&version->value, value)) {
> +            continue;
> +        }
> +        QLIST_FOREACH(item, &version->items, next) {
> +            quorum_report_bad(acb, s->bs[item->index]->node_name);
> +        }
> +    }
> +}
> +
> +static void quorum_copy_qiov(QEMUIOVector *dest, QEMUIOVector *source)
> +{
> +    int i;
> +    assert(dest->niov == source->niov);
> +    assert(dest->size == source->size);
> +    for (i = 0; i < source->niov; i++) {
> +        assert(dest->iov[i].iov_len == source->iov[i].iov_len);
> +        memcpy(dest->iov[i].iov_base,
> +               source->iov[i].iov_base,
> +               source->iov[i].iov_len);
> +    }
> +}
> +
> +static void quorum_count_vote(QuorumVotes *votes,
> +                              QuorumVoteValue *value,
> +                              int index)
> +{
> +    QuorumVoteVersion *v = NULL, *version = NULL;
> +    QuorumVoteItem *item;
> +
> +    /* look if we have something with this hash */
> +    QLIST_FOREACH(v, &votes->vote_list, next) {
> +        if (!votes->compare(&v->value, value)) {
> +            version = v;
> +            break;
> +        }
> +    }
> +
> +    /* It's a version not yet in the list add it */
> +    if (!version) {
> +        version = g_new0(QuorumVoteVersion, 1);
> +        QLIST_INIT(&version->items);
> +        memcpy(&version->value, value, sizeof(version->value));
> +        version->index = index;
> +        version->vote_count = 0;
> +        QLIST_INSERT_HEAD(&votes->vote_list, version, next);
> +    }
> +
> +    version->vote_count++;
> +
> +    item = g_new0(QuorumVoteItem, 1);
> +    item->index = index;
> +    QLIST_INSERT_HEAD(&version->items, item, next);
> +}
> +
> +static void quorum_free_vote_list(QuorumVotes *votes)
> +{
> +    QuorumVoteVersion *version, *next_version;
> +    QuorumVoteItem *item, *next_item;
> +
> +    QLIST_FOREACH_SAFE(version, &votes->vote_list, next, next_version) {
> +        QLIST_REMOVE(version, next);
> +        QLIST_FOREACH_SAFE(item, &version->items, next, next_item) {
> +            QLIST_REMOVE(item, next);
> +            g_free(item);
> +        }
> +        g_free(version);
> +    }
> +}
> +
> +static int quorum_compute_hash(QuorumAIOCB *acb, int i, QuorumVoteValue *hash)
> +{
> +    int j, ret;
> +    gnutls_hash_hd_t dig;
> +    QEMUIOVector *qiov = &acb->aios[i].qiov;
> +
> +    ret = gnutls_hash_init(&dig, GNUTLS_DIG_SHA256);
> +
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    for (j = 0; j < qiov->niov; j++) {
> +        ret = gnutls_hash(dig, qiov->iov[j].iov_base, qiov->iov[j].iov_len);
> +        if (ret < 0) {
> +            break;
> +        }
> +    }
> +
> +    gnutls_hash_deinit(dig, (void *) hash);
> +    return ret;
> +}
> +
> +static QuorumVoteVersion *quorum_get_vote_winner(QuorumVotes *votes)
> +{
> +    int i = 0;
> +    QuorumVoteVersion *candidate, *winner = NULL;
> +
> +    QLIST_FOREACH(candidate, &votes->vote_list, next) {
> +        if (candidate->vote_count > i) {
> +            i = candidate->vote_count;
> +            winner = candidate;
> +        }
> +    }
> +
> +    return winner;
> +}
> +
> +static bool quorum_iovec_compare(QEMUIOVector *a, QEMUIOVector *b)
> +{
> +    int i;
> +    int result;
> +
> +    assert(a->niov == b->niov);
> +    for (i = 0; i < a->niov; i++) {
> +        assert(a->iov[i].iov_len == b->iov[i].iov_len);
> +        result = memcmp(a->iov[i].iov_base,
> +                        b->iov[i].iov_base,
> +                        a->iov[i].iov_len);
> +        if (result) {
> +            return false;
> +        }
> +    }
> +
> +    return true;
> +}
> +
> +static void GCC_FMT_ATTR(2, 3) quorum_err(QuorumAIOCB *acb,
> +                                          const char *fmt, ...)
> +{
> +    va_list ap;
> +
> +    va_start(ap, fmt);
> +    fprintf(stderr, "quorum: sector_num=%" PRId64 " nb_sectors=%d ",
> +            acb->sector_num, acb->nb_sectors);
> +    vfprintf(stderr, fmt, ap);
> +    fprintf(stderr, "\n");
> +    va_end(ap);
> +    exit(1);
> +}
> +
> +static bool quorum_compare(QuorumAIOCB *acb,
> +                           QEMUIOVector *a,
> +                           QEMUIOVector *b)
> +{
> +    BDRVQuorumState *s = acb->bqs;
> +    ssize_t offset;
> +
> +    /* This driver will replace blkverify in this particular case */
> +    if (s->is_blkverify) {
> +        offset = qemu_iovec_compare(a, b);
> +        if (offset != -1) {
> +            quorum_err(acb, "contents mismatch in sector %" PRId64,
> +                       acb->sector_num +
> +                       (uint64_t)(offset / BDRV_SECTOR_SIZE));
> +        }
> +        return true;
> +    }
> +
> +    return quorum_iovec_compare(a, b);
> +}
> +
> +static void quorum_vote(QuorumAIOCB *acb)
> +{
> +    bool quorum = false;
> +    int i, j, ret;
> +    QuorumVoteValue hash;
> +    BDRVQuorumState *s = acb->bqs;
> +    QuorumVoteVersion *winner;
> +
> +    QLIST_INIT(&acb->votes.vote_list);
> +
> +    /* if we don't get any successful read use the first error code */
> +    if (!acb->success_count) {
> +        acb->vote_ret = acb->aios[0].ret;
> +        return;
> +    }
> +
> +    /* get the index of the first successful read (we are sure to find one) */
> +    for (i = 0; i < s->total; i++) {
> +        if (!acb->aios[i].ret) {
> +            break;
> +        }
> +    }
> +
> +    /* compare this read with all other successful read looking for quorum */
> +    for (j = i + 1; j < s->total; j++) {
> +        if (acb->aios[j].ret) {
> +            continue;
> +        }
> +        quorum = quorum_compare(acb, &acb->aios[i].qiov, &acb->aios[j].qiov);
> +        if (!quorum) {
> +            break;
> +       }
> +    }
> +
> +    /* Every successful read agrees and their count is higher or equal threshold
> +     * -> Quorum
> +     */
> +    if (quorum && acb->success_count >= s->threshold) {
> +        quorum_copy_qiov(acb->qiov, &acb->aios[i].qiov);
> +        return;
> +    }

Well, for quorum && acb->success_count < s->threshold, calculating and 
comparing hashes doesn't help at all (since the vote winner will only be 
able to get up to acb->success_count votes anyway). We should probably 
catch this case in the first if condition in this function (change it to 
“if (acb->success_count < s->threshold)”).

> +
> +    /* compute hashs for each successful read, also store indexes */
> +    for (i = 0; i < s->total; i++) {
> +        if (acb->aios[i].ret) {
> +            continue;
> +        }
> +        ret = quorum_compute_hash(acb, i, &hash);
> +        /* if ever the hash computation failed */
> +        if (ret < 0) {
> +            acb->vote_ret = ret;
> +            goto free_exit;
> +        }
> +        quorum_count_vote(&acb->votes, &hash, i);
> +    }
> +
> +    /* vote to select the most represented version */
> +    winner = quorum_get_vote_winner(&acb->votes);
> +    /* every vote version are differents -> error */
> +    if (!winner) {
> +        quorum_report_failure(acb);
> +        acb->vote_ret = -EIO;
> +        goto free_exit;
> +    }
> +
> +    /* if the winner count is smaller than threshold the read fails */
> +    if (winner->vote_count < s->threshold) {
> +        quorum_report_failure(acb);
> +        acb->vote_ret = -EIO;
> +        goto free_exit;
> +    }
> +
> +    /* we have a winner: copy it */
> +    quorum_copy_qiov(acb->qiov, &acb->aios[winner->index].qiov);
> +
> +    /* some versions are bad print them */
> +    quorum_report_bad_versions(s, acb, &winner->value);
> +
> +free_exit:
> +    /* free lists */
> +    quorum_free_vote_list(&acb->votes);
> +}
> +
>   static BlockDriverAIOCB *quorum_aio_readv(BlockDriverState *bs,
>                                            int64_t sector_num,
>                                            QEMUIOVector *qiov,
> @@ -194,7 +525,7 @@ static BlockDriverAIOCB *quorum_aio_readv(BlockDriverState *bs,
>       }
>   
>       for (i = 0; i < s->total; i++) {
> -        bdrv_aio_readv(s->bs[i], sector_num, qiov, nb_sectors,
> +        bdrv_aio_readv(s->bs[i], sector_num, &acb->aios[i].qiov, nb_sectors,
>                          quorum_aio_cb, &acb->aios[i]);
>       }
>   
> diff --git a/configure b/configure
> index b472694..5a68738 100755
> --- a/configure
> +++ b/configure
> @@ -263,6 +263,7 @@ gtkabi="2.0"
>   tpm="no"
>   libssh2=""
>   vhdx=""
> +quorum="no"
>   
>   # parse CC options first
>   for opt do
> @@ -1000,6 +1001,10 @@ for opt do
>     ;;
>     --disable-vhdx) vhdx="no"
>     ;;
> +  --disable-quorum) quorum="no"
> +  ;;
> +  --enable-quorum) quorum="yes"
> +  ;;
>     *) echo "ERROR: unknown option $opt"; show_help="yes"
>     ;;
>     esac
> @@ -1254,6 +1259,8 @@ Advanced options (experts only):
>     --enable-libssh2         enable ssh block device support
>     --disable-vhdx           disables support for the Microsoft VHDX image format
>     --enable-vhdx            enable support for the Microsoft VHDX image format
> +  --disable-quorum         disable quorum block filter support
> +  --enable-quorum          enable quorum block filter support
>   
>   NOTE: The object files are built at the place where configure is launched
>   EOF
> @@ -1923,6 +1930,30 @@ EOF
>   fi
>   
>   ##########################################
> +# Quorum probe (check for gnutls)
> +if test "$quorum" != "no" ; then
> +cat > $TMPC <<EOF
> +#include <gnutls/gnutls.h>
> +#include <gnutls/crypto.h>
> +int main(void) {char data[4096], digest[32];
> +gnutls_hash_fast(GNUTLS_DIG_SHA256, data, 4096, digest);
> +return 0;
> +}
> +EOF
> +quorum_tls_cflags=`$pkg_config --cflags gnutls 2> /dev/null`
> +quorum_tls_libs=`$pkg_config --libs gnutls 2> /dev/null`
> +if compile_prog "$quorum_tls_cflags" "$quorum_tls_libs" ; then
> +  qcow_tls=yes
> +  libs_softmmu="$quorum_tls_libs $libs_softmmu"
> +  libs_tools="$quorum_tls_libs $libs_softmmu"
> +  QEMU_CFLAGS="$QEMU_CFLAGS $quorum_tls_cflags"
> +else
> +  echo "gnutls > 2.10.0 required to compile Quorum"
> +  exit 1
> +fi
> +fi
> +
> +##########################################
>   # VNC SASL detection
>   if test "$vnc" = "yes" -a "$vnc_sasl" != "no" ; then
>     cat > $TMPC <<EOF
> @@ -3843,6 +3874,7 @@ echo "libssh2 support   $libssh2"
>   echo "TPM passthrough   $tpm_passthrough"
>   echo "QOM debugging     $qom_cast_debug"
>   echo "vhdx              $vhdx"
> +echo "Quorum            $quorum"
>   
>   if test "$sdl_too_old" = "yes"; then
>   echo "-> Your SDL version is too old - please upgrade to have SDL support"
> @@ -4241,6 +4273,10 @@ if test "$libssh2" = "yes" ; then
>     echo "CONFIG_LIBSSH2=y" >> $config_host_mak
>   fi
>   
> +if test "$quorum" = "yes" ; then
> +  echo "CONFIG_QUORUM=y" >> $config_host_mak
> +fi
> +
>   if test "$virtio_blk_data_plane" = "yes" ; then
>     echo 'CONFIG_VIRTIO_BLK_DATA_PLANE=$(CONFIG_VIRTIO)' >> $config_host_mak
>   fi
> diff --git a/docs/qmp/qmp-events.txt b/docs/qmp/qmp-events.txt
> index 6b87e97..1cbdde7 100644
> --- a/docs/qmp/qmp-events.txt
> +++ b/docs/qmp/qmp-events.txt
> @@ -500,3 +500,36 @@ Example:
>   
>   Note: If action is "reset", "shutdown", or "pause" the WATCHDOG event is
>   followed respectively by the RESET, SHUTDOWN, or STOP events.
> +
> +QUORUM_FAILURE
> +--------------
> +
> +Emitted by the Quorum block driver when it fail to establish a quorum.

Probably rather “if” than “when”; also, “fails”.

> +
> +Data:
> +
> +- "sector-num":   Number of the first sector of the failed read operation.
> +- "sector-count": Failed read operation sector count.
> +
> +Example:
> +
> +{ "event": "QUORUM_FAILURE",
> +     "data": { "sector-num": 345435, "sector-count": 5 },
> +     "timestamp": { "seconds": 1344522075, "microseconds": 745528 } }
> +
> +QUORUM_REPORT_BAD
> +-----------------
> +
> +Emitted to report a corruption of a Quorum file.
> +
> +Data:
> +
> +- "node-name":  The graph node name of the block driver state.

The description is not aligned with the others, this is probably due to 
the change from child-index to node-name…?

Max

> +- "sector-num":   Number of the first sector of the failed read operation.
> +- "sector-count": Failed read operation sector count.
> +
> +Example:
> +
> +{ "event": "QUORUM_REPORT_BAD",
> +     "data": { "node-name": "1.raw", "sector-num": 345435, "sector-count": 5 },
> +     "timestamp": { "seconds": 1344522075, "microseconds": 745528 } }
> diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
> index 7e5f752..a49ea11 100644
> --- a/include/monitor/monitor.h
> +++ b/include/monitor/monitor.h
> @@ -49,6 +49,8 @@ typedef enum MonitorEvent {
>       QEVENT_SPICE_MIGRATE_COMPLETED,
>       QEVENT_GUEST_PANICKED,
>       QEVENT_BLOCK_IMAGE_CORRUPTED,
> +    QEVENT_QUORUM_FAILURE,
> +    QEVENT_QUORUM_REPORT_BAD,
>   
>       /* Add to 'monitor_event_names' array in monitor.c when
>        * defining new events here */
> diff --git a/monitor.c b/monitor.c
> index 80456fb..f8f6aea 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -507,6 +507,8 @@ static const char *monitor_event_names[] = {
>       [QEVENT_SPICE_MIGRATE_COMPLETED] = "SPICE_MIGRATE_COMPLETED",
>       [QEVENT_GUEST_PANICKED] = "GUEST_PANICKED",
>       [QEVENT_BLOCK_IMAGE_CORRUPTED] = "BLOCK_IMAGE_CORRUPTED",
> +    [QEVENT_QUORUM_FAILURE] = "QUORUM_FAILURE",
> +    [QEVENT_QUORUM_REPORT_BAD] = "QUORUM_REPORT_BAD",
>   };
>   QEMU_BUILD_BUG_ON(ARRAY_SIZE(monitor_event_names) != QEVENT_MAX)
>   

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH V10 07/13] quorum: Add quorum_getlength().
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 07/13] quorum: Add quorum_getlength() Benoît Canet
@ 2014-02-02 21:25   ` Max Reitz
  0 siblings, 0 replies; 36+ messages in thread
From: Max Reitz @ 2014-02-02 21:25 UTC (permalink / raw)
  To: Benoît Canet, qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

On 28.01.2014 17:52, Benoît Canet wrote:
> From: Benoît Canet <benoit@irqsave.net>
>
> Check that every bs file returns the same length.
> Otherwize, return -EIO to disable the quorum and

*Otherwise

> avoid length discrepancy.
>
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
>   block/quorum.c | 26 ++++++++++++++++++++++++++
>   1 file changed, 26 insertions(+)

Aside from that:

Reviewed-by: Max Reitz <mreitz@redhat.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH V10 08/13] quorum: Add quorum_invalidate_cache().
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 08/13] quorum: Add quorum_invalidate_cache() Benoît Canet
@ 2014-02-02 21:27   ` Max Reitz
  0 siblings, 0 replies; 36+ messages in thread
From: Max Reitz @ 2014-02-02 21:27 UTC (permalink / raw)
  To: Benoît Canet, qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

On 28.01.2014 17:52, Benoît Canet wrote:
> From: Benoît Canet <benoit@irqsave.net>
>
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
>   block/quorum.c | 11 +++++++++++
>   1 file changed, 11 insertions(+)

Reviewed-by: Max Reitz <mreitz@redhat.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH V10 09/13] quorum: Add quorum_co_get_block_status.
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 09/13] quorum: Add quorum_co_get_block_status Benoît Canet
@ 2014-02-02 21:44   ` Max Reitz
  2014-02-03 11:47     ` Benoît Canet
  0 siblings, 1 reply; 36+ messages in thread
From: Max Reitz @ 2014-02-02 21:44 UTC (permalink / raw)
  To: Benoît Canet, qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

On 28.01.2014 17:52, Benoît Canet wrote:
> From: Benoît Canet <benoit@irqsave.net>
>
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
>   block/quorum.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 67 insertions(+)
>
> diff --git a/block/quorum.c b/block/quorum.c
> index a47cd33..9b0718b 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -171,6 +171,22 @@ static int quorum_sha256_compare(QuorumVoteValue *a, QuorumVoteValue *b)
>       return memcmp(a->h, b->h, HASH_LENGTH);
>   }
>   
> +static int quorum_64bits_compare(QuorumVoteValue *a, QuorumVoteValue *b)
> +{
> +    int64_t i = a->l;
> +    int64_t j = b->l;
> +
> +    if (i < j) {
> +        return -1;
> +    }
> +
> +    if (i > j) {
> +        return 1;
> +    }
> +
> +    return 0;
> +}
> +
>   static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
>                                      BlockDriverState *bs,
>                                      QEMUIOVector *qiov,
> @@ -587,6 +603,56 @@ static void quorum_invalidate_cache(BlockDriverState *bs)
>       }
>   }
>   
> +static int64_t coroutine_fn quorum_co_get_block_status(BlockDriverState *bs,
> +                                                       int64_t sector_num,
> +                                                       int nb_sectors,
> +                                                       int *pnum)
> +{
> +    BDRVQuorumState *s = bs->opaque;
> +    QuorumVoteVersion *winner = NULL;
> +    QuorumVotes result_votes, num_votes;
> +    QuorumVoteValue result_value, num_value;
> +    int i, num;
> +    int64_t result = 0;
> +
> +    QLIST_INIT(&result_votes.vote_list);
> +    QLIST_INIT(&num_votes.vote_list);
> +    result_votes.compare = quorum_64bits_compare;
> +    num_votes.compare = quorum_64bits_compare;
> +
> +    for (i = 0; i < s->total; i++) {
> +        result = bdrv_get_block_status(s->bs[i], sector_num, nb_sectors, &num);
> +        /* skip failed requests */
> +        if (result < 0) {
> +            continue;
> +        }
> +        result_value.l = result & BDRV_BLOCK_DATA;
> +        num_value.l = num;
> +        quorum_count_vote(&result_votes, &result_value, i);
> +        quorum_count_vote(&num_votes, &num_value, i);
> +    }
> +
> +    winner = quorum_get_vote_winner(&result_votes);
> +    result = winner->value.l;

Below, you're reading the winning value after checking whether it's 
corresponding votes exceeded the threshold. It doesn't matter in the 
end, but for the sake of uniformity, I'd do it the same way here (i.e., 
move this statement below the if block).

> +    if (winner->vote_count < s->threshold) {
> +        result = -ERANGE;

Is there any specific reason why you're returning -ERANGE here and -EIO 
everywhere else (even in quorum_getlength())?

Max

> +        goto free_exit;
> +    }
> +
> +    winner = quorum_get_vote_winner(&num_votes);
> +    if (winner->vote_count < s->threshold) {
> +        result = -ERANGE;
> +        goto free_exit;
> +    }
> +    *pnum = winner->value.l;
> +
> +free_exit:
> +    quorum_free_vote_list(&result_votes);
> +    quorum_free_vote_list(&num_votes);
> +
> +    return result;
> +}
> +
>   static BlockDriver bdrv_quorum = {
>       .format_name        = "quorum",
>       .protocol_name      = "quorum",
> @@ -598,6 +664,7 @@ static BlockDriver bdrv_quorum = {
>       .bdrv_aio_readv     = quorum_aio_readv,
>       .bdrv_aio_writev    = quorum_aio_writev,
>       .bdrv_invalidate_cache = quorum_invalidate_cache,
> +    .bdrv_co_get_block_status = quorum_co_get_block_status,
>   };
>   
>   static void bdrv_quorum_init(void)

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH V10 10/13] quorum: Add quorum_co_flush().
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 10/13] quorum: Add quorum_co_flush() Benoît Canet
@ 2014-02-02 22:02   ` Max Reitz
  2014-02-03 11:57     ` Benoît Canet
  0 siblings, 1 reply; 36+ messages in thread
From: Max Reitz @ 2014-02-02 22:02 UTC (permalink / raw)
  To: Benoît Canet, qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

On 28.01.2014 17:52, Benoît Canet wrote:
> From: Benoît Canet <benoit@irqsave.net>
>
> Makes a vote to select error if any.
>
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
>   block/quorum.c | 34 ++++++++++++++++++++++++++++++++++
>   1 file changed, 34 insertions(+)
>
> diff --git a/block/quorum.c b/block/quorum.c
> index 9b0718b..1b84b07 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -653,12 +653,46 @@ free_exit:
>       return result;
>   }
>   
> +static coroutine_fn int quorum_co_flush(BlockDriverState *bs)
> +{
> +    BDRVQuorumState *s = bs->opaque;
> +    QuorumVoteVersion *winner = NULL;
> +    QuorumVotes error_votes;
> +    QuorumVoteValue result_value;
> +    int i;
> +    int result = 0;
> +    bool error = false;
> +
> +    QLIST_INIT(&error_votes.vote_list);
> +    error_votes.compare = quorum_64bits_compare;
> +
> +    for (i = 0; i < s->total; i++) {
> +        result = bdrv_co_flush(s->bs[i]);
> +        if (result) {
> +            error = true;
> +            result_value.l = result;
> +            quorum_count_vote(&error_votes, &result_value, i);
> +        }
> +    }
> +
> +    if (error) {
> +        winner = quorum_get_vote_winner(&error_votes);
> +        result = winner->value.l;
> +    }
> +
> +    quorum_free_vote_list(&error_votes);
> +
> +    return result;
> +}
> +
>   static BlockDriver bdrv_quorum = {
>       .format_name        = "quorum",
>       .protocol_name      = "quorum",
>   
>       .instance_size      = sizeof(BDRVQuorumState),
>   
> +    .bdrv_co_flush_to_disk = quorum_co_flush,
> +
>       .bdrv_getlength     = quorum_getlength,
>   
>       .bdrv_aio_readv     = quorum_aio_readv,

So, my general opinion on this patch (for reads/writes we don't vote on 
the error code either; so why here?) hasn't changed, but well, I 
definitely don't oppose it.

Another problem, however: If any error occurs, this function will return 
an error as well. Is that intended? If an error on a read/write 
operation occurs but there are still enough successful reads/writes to 
reach quorum, no error is returned. Is there a reason why this should be 
different for flush?

Max

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH V10 11/13] quorum: Implement recursive .bdrv_recurse_is_first_non_filter in quorum.
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 11/13] quorum: Implement recursive .bdrv_recurse_is_first_non_filter in quorum Benoît Canet
@ 2014-02-02 22:31   ` Max Reitz
  0 siblings, 0 replies; 36+ messages in thread
From: Max Reitz @ 2014-02-02 22:31 UTC (permalink / raw)
  To: Benoît Canet, qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

On 28.01.2014 17:52, Benoît Canet wrote:
> From: Benoît Canet <benoit@irqsave.net>
>
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
>   block/quorum.c | 19 +++++++++++++++++++
>   1 file changed, 19 insertions(+)

Reviewed-by: Max Reitz <mreitz@redhat.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH V10 12/13] quorum: Add quorum_open() and quorum_close().
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 12/13] quorum: Add quorum_open() and quorum_close() Benoît Canet
@ 2014-02-02 23:09   ` Max Reitz
  2014-02-03 12:21     ` Benoît Canet
  2014-02-03 12:43     ` Benoît Canet
  0 siblings, 2 replies; 36+ messages in thread
From: Max Reitz @ 2014-02-02 23:09 UTC (permalink / raw)
  To: Benoît Canet, qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

On 28.01.2014 17:52, Benoît Canet wrote:
> From: Benoît Canet <benoit@irqsave.net>
>
> Example of command line:
> -drive if=virtio,file.driver=quorum,\
> file.children.0.file.filename=1.raw,\
> file.children.0.node-name=1.raw,\
> file.children.0.driver=raw,\
> file.children.1.file.filename=2.raw,\
> file.children.1.node-name=2.raw,\
> file.children.1.driver=raw,\
> file.children.2.file.filename=3.raw,\
> file.children.2.node-name=3.raw,\
> file.children.2.driver=raw,\
> file.vote_threshold=2
>
> file.blkverify=on with file.vote_threshold=2 and two files can be passed to
> emulated blkverify.
>
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
>   block/quorum.c   | 308 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   qapi-schema.json |  21 +++-
>   2 files changed, 328 insertions(+), 1 deletion(-)
>
> diff --git a/block/quorum.c b/block/quorum.c
> index e7b2090..0c0d630 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -17,8 +17,12 @@
>   #include <gnutls/crypto.h>
>   #include "block/block_int.h"
>   #include "qapi/qmp/qjson.h"
> +#include "qapi/qmp/types.h"
> +#include "qemu-common.h"
>   
>   #define HASH_LENGTH 32
> +#define KEY_PREFIX "children."
> +#define KEY_FILENAME_SUFFIX ".file.filename"
>   
>   /* This union holds a vote hash value */
>   typedef union QuorumVoteValue {
> @@ -702,12 +706,316 @@ static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
>       return false;
>   }

Perhaps you could make use of qdict_extract_subqdict() and 
qdict_array_split() functions here...? That is, 
qdict_extract_subqdict(..., "children.") and then qdict_array_split() on 
the result?

> +static int quorum_match_key(const char *key,
> +                            char **key_prefix)
> +{
> +    const char *start;
> +    char *next;
> +    unsigned long long idx;
> +    int ret;
> +
> +    *key_prefix = NULL;
> +
> +    /* the following code is a translation of the following pseudo code:
> +     *       match = key.match('(^children\.(\d+)\.)$suffix)
> +     *       if not match:
> +     *           return -1;
> +     *       key_prefix = match.outer_match()
> +     *       idx = match.inner_match()
> +     *
> +     * note: we also match the .file suffix to avoid futher checkings
> +     */
> +
> +    /* this key is not a child */
> +    if (strncmp(key, KEY_PREFIX, strlen(KEY_PREFIX))) {
> +        return -1;
> +    }
> +
> +    /* take first char after prefix */
> +    start = key + strlen(KEY_PREFIX);
> +
> +    /* if the string end here -> scan fail */
> +    if (start[0] == '\0') {
> +        return -1;
> +    }
> +
> +    /* try to retrieve the index */
> +    ret = parse_uint(start, &idx, &next, 10);
> +
> +    /* no int found -> scan fail */
> +    if (ret < 0) {
> +        return -1;
> +    }
> +
> +    /* we are taking a reference via QMP */
> +    if (next - key == strlen(key)) {

"if (!*next)" would probably accomplish the same (or "if next[0] == 
'\0')" to keep your coding style) without having to call strlen().

> +        *key_prefix = g_strdup(key);
> +        return idx;
> +    }
> +
> +    /* match the suffix to avoid matching the same idx
> +     * multiple times and be required to do more checks later
> +     */
> +    if (strncmp(next, KEY_FILENAME_SUFFIX, strlen(KEY_FILENAME_SUFFIX))) {

As stated in my review from October, you may use sizeof() to avoid strlen().

> +        return -1;
> +    }
> +
> +    /* do not include '.' */
> +    int len = next - key;
> +    *key_prefix = g_strndup(key, len);
> +
> +    return idx;
> +}
> +
> +static QDict *quorum_get_children_idx(const QDict *options)
> +{
> +    const QDictEntry *ent;
> +    QDict *result;
> +    char *key_prefix;
> +    int idx;
> +
> +    result = qdict_new();
> +
> +    for (ent = qdict_first(options); ent; ent = qdict_next(options, ent)) {
> +        const char *key = qdict_entry_key(ent);
> +        idx = quorum_match_key(key,
> +                               &key_prefix);
> +
> +        /* if the result zero or positive we got a key */
> +        if (idx < 0) {
> +            continue;
> +        }
> +
> +        qdict_put(result, key_prefix, qint_from_int(idx));
> +    }
> +
> +    return result;
> +}
> +
> +static int quorum_fill_validation_array(bool *array,
> +                                        const QDict *dict,
> +                                        int total,
> +                                        Error **errp)
> +{
> +    const QDictEntry *ent;
> +
> +    /* fill the checking array with children indexes */
> +    for (ent = qdict_first(dict); ent; ent = qdict_next(dict, ent)) {
> +        const char *key = qdict_entry_key(ent);
> +        int idx = qdict_get_int(dict, key);
> +
> +        if (idx < 0 || idx >= total) {
> +            error_setg(errp,
> +                "Children index must be between 0 and children count -1");

Quote from October: I'd prefer "- 1" instead of "-1" (or even "Child 
index must be an unsigned integer smaller than the children count").

> +            return -ERANGE;
> +        }
> +
> +        array[idx] = true;
> +    }
> +
> +    return 0;
> +}
> +
> +static int quorum_valid_indexes(const bool *array, int total, Error **errp)
> +{
> +    int i;
> +
> +    for (i = 0; i < total; i++) {
> +        if (array[i] == true) {
> +            continue;
> +        }
> +
> +        error_setg(errp,
> +            "All child indexes between 0 and children count -1  must be "
> +            " used");

Again, I'd prefer "- 1"; then, there are two spaces to the left of 
"must" and two spaces after "be".

> +        return -ERANGE;
> +    }
> +
> +    return 0;
> +}
> +
> +static int quorum_valid_children_indexes(const QDict *dict,
> +                                         int total,
> +                                         Error **errp)
> +{
> +    bool *array;
> +    int ret = 0;;
> +
> +    /* allocate indexes checking array and put false in it */
> +    array = g_new0(bool, total);
> +
> +    ret = quorum_fill_validation_array(array, dict, total, errp);
> +    if (ret < 0) {
> +        goto free_exit;
> +    }
> +
> +    ret = quorum_valid_indexes(array, total, errp);
> +free_exit:
> +    g_free(array);
> +    return ret;
> +}
> +
> +static int quorum_valid_threshold(int threshold,
> +                                  int total,
> +                                  Error **errp)
> +{
> +
> +    if (threshold < 1) {
> +        error_set(errp, QERR_INVALID_PARAMETER_VALUE,
> +                  "vote-threshold", "value >= 1");
> +        return -ERANGE;
> +    }
> +
> +    if (threshold > total) {
> +        error_setg(errp, "threshold <= children count must be true");

Quote from October: Well, while this might technically be true, I'd 
rather go for "threshold may not exceed children count" instead. ;-)

> +        return -ERANGE;
> +    }
> +
> +    return 0;
> +}
> +
> +static int quorum_open(BlockDriverState *bs,
> +                       QDict *options,
> +                       int flags,
> +                       Error **errp)
> +{
> +    BDRVQuorumState *s = bs->opaque;
> +    Error *local_err = NULL;
> +    const QDictEntry *ent;
> +    QDict *idx_dict;
> +    bool *opened;
> +    const char *value;
> +    char *next;
> +    int i;
> +    int ret = 0;
> +
> +    /* get a dict of children indexes for validation */
> +    idx_dict = quorum_get_children_idx(options);
> +
> +    /* count how many different children indexes are present and validate */
> +    s->total = qdict_size(idx_dict);
> +    if (s->total < 2) {
> +        error_setg(&local_err,
> +                   "Number of provided children must be greater than 1");
> +        ret = -EINVAL;
> +        goto exit;
> +    }
> +
> +    /* validate that the set of index is coherent */
> +    ret = quorum_valid_children_indexes(idx_dict, s->total, &local_err);
> +    if (ret < 0) {
> +        goto exit;
> +    }
> +
> +    ret = qdict_get_try_int(options, "vote-threshold", -1);
> +    /* from QMP */
> +    if (ret != -1) {
> +        qdict_del(options, "vote-threshold");
> +        s->threshold = ret;
> +    /* from command line */
> +    } else {
> +        /* retrieve the threshold option from the command line */
> +        value = qdict_get_try_str(options, "vote_threshold");
> +        if (!value) {
> +            error_setg(&local_err,
> +                       "vote_threshold must be provided");
> +            ret = -EINVAL;
> +            goto exit;
> +        }
> +        qdict_del(options, "vote_threshold");
> +
> +        ret = parse_uint(value, (unsigned long long *) &s->threshold, &next, 10);

I don't like this cast at all...

> +
> +        /* no int found -> scan fail */
> +        if (ret < 0) {
> +            error_setg(&local_err,
> +                       "invalid voter_threshold specified");

*vote_threshold

> +            ret = -EINVAL;
> +            goto exit;
> +        }
> +    }
> +
> +    /* and validate it againts s->total */

Still *against

> +    ret = quorum_valid_threshold(s->threshold, s->total, &local_err);
> +    if (ret < 0) {
> +        goto exit;
> +    }
> +
> +    /* is the driver in blkverify mode */
> +    value = qdict_get_try_str(options, "blkverify");
> +    if (value && !strcmp(value, "on")  &&
> +        s->total == 2 && s->threshold == 2) {
> +        s->is_blkverify = true;
> +    }

Quote from October: If the user specifies anything different from "on" 
or if he does and s->total or s->threshold is not 2, quorum will 
silently work in non-blkverify mode without telling the user. So I'd 
emit a message here if blkverify has been specified and its value is 
different from "off" but s->is_blkverify remains false (i.e., “else if 
(value && strcmp(value, "off")) { fprintf(stderr, "[Some warning]"); }”).

> +    qdict_del(options, "blkverify");
> +
> +    /* allocate the children BlockDriverState array */
> +    s->bs = g_new0(BlockDriverState *, s->total);
> +    opened = g_new0(bool, s->total);
> +
> +    /* open children bs */
> +    for (ent = qdict_first(idx_dict);
> +         ent; ent = qdict_next(idx_dict, ent)) {

This condition fits on a single line.

> +        const char *key = qdict_entry_key(ent);
> +        int idx = qdict_get_int(idx_dict, key);
> +        ret = bdrv_open_image(&s->bs[idx],
> +                              NULL,
> +                              options,
> +                              key,
> +                              flags,
> +                              false,
> +                              &local_err);

This takes two, but definitely not seven.

> +        if (ret < 0) {
> +            goto close_exit;
> +        }
> +        opened[idx] = true;
> +    }
> +
> +    g_free(opened);
> +    goto exit;
> +
> +close_exit:
> +    /* cleanup on error */
> +    for (i = 0; i < s->total; i++) {
> +        if (!opened[i]) {
> +            continue;
> +        }
> +        bdrv_close(s->bs[i]);

You'll probably want bdrv_unref() here.

> +    }
> +    g_free(s->bs);
> +    g_free(opened);
> +exit:
> +    /* propagate error */
> +    if (error_is_set(&local_err)) {
> +        error_propagate(errp, local_err);
> +    }
> +    return ret;
> +}
> +
> +static void quorum_close(BlockDriverState *bs)
> +{
> +    BDRVQuorumState *s = bs->opaque;
> +    int i;
> +
> +    for (i = 0; i < s->total; i++) {
> +        /* Ensure writes reach stable storage */
> +        bdrv_flush(s->bs[i]);
> +        /* close manually because we'll free s->bs */
> +        bdrv_close(s->bs[i]);

Again, bdrv_unref().

> +    }
> +
> +    g_free(s->bs);
> +}
> +
>   static BlockDriver bdrv_quorum = {
>       .format_name        = "quorum",
>       .protocol_name      = "quorum",
>   
>       .instance_size      = sizeof(BDRVQuorumState),
>   
> +    .bdrv_file_open     = quorum_open,
> +    .bdrv_close         = quorum_close,
> +
>       .bdrv_co_flush_to_disk = quorum_co_flush,
>   
>       .bdrv_getlength     = quorum_getlength,
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 05ced9d..903a3a0 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -4352,6 +4352,24 @@
>               'raw': 'BlockdevRef' } }
>   
>   ##
> +# @BlockdevOptionsQuorum
> +#
> +# Driver specific block device options for Quorum
> +#
> +# @blkverify:      #optional true if the driver must print content mismatch
> +#
> +# @children:       the children block device to use
> +#
> +# @vote_threshold: the vote limit under which a read will fail
> +#
> +# Since: 2.0
> +##
> +{ 'type': 'BlockdevOptionsQuorum',
> +  'data': { '*blkverify': 'bool',
> +            'children': [ 'BlockdevRef' ],
> +            'vote-threshold': 'int' } }
> +
> +##
>   # @BlockdevOptions
>   #
>   # Options for creating a block device.
> @@ -4389,7 +4407,8 @@
>         'vdi':        'BlockdevOptionsGenericFormat',
>         'vhdx':       'BlockdevOptionsGenericFormat',
>         'vmdk':       'BlockdevOptionsGenericCOWFormat',
> -      'vpc':        'BlockdevOptionsGenericFormat'
> +      'vpc':        'BlockdevOptionsGenericFormat',
> +      'quorum':     'BlockdevOptionsQuorum'
>     } }
>   
>   ##

As I've said before, I'd really like it if you could abandon all those 
functions for parsing the incoming options QDict in favor of 
qdict_extract_subqdict() and qdict_array_split() which will most likely 
do the job just fine; and, in fact, I have written qdict_array_split() 
with Quorum in mind. ;-)

Max

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH V10 13/13] quorum: Add unit test.
  2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 13/13] quorum: Add unit test Benoît Canet
@ 2014-02-02 23:19   ` Max Reitz
  0 siblings, 0 replies; 36+ messages in thread
From: Max Reitz @ 2014-02-02 23:19 UTC (permalink / raw)
  To: Benoît Canet, qemu-devel; +Cc: kwolf, Benoit Canet, stefanha

On 28.01.2014 17:52, Benoît Canet wrote:
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
>   tests/qemu-iotests/075     | 85 ++++++++++++++++++++++++++++++++++++++++++++++
>   tests/qemu-iotests/075.out | 27 +++++++++++++++
>   tests/qemu-iotests/group   |  1 +
>   3 files changed, 113 insertions(+)
>   create mode 100644 tests/qemu-iotests/075
>   create mode 100644 tests/qemu-iotests/075.out

How about adding a test case where quorum could not be attained (i.e., 
corrupt another image differently)? Perhaps we could also have some 
tests with blkdebug underneath it for injecting I/O errors, but I'm 
personally fine without.

Max

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH V10 06/13] quorum: Add quorum mechanism.
  2014-01-31 23:04   ` Max Reitz
@ 2014-02-03 11:44     ` Benoît Canet
  2014-02-03 19:03       ` Max Reitz
  0 siblings, 1 reply; 36+ messages in thread
From: Benoît Canet @ 2014-02-03 11:44 UTC (permalink / raw)
  To: Max Reitz; +Cc: Benoît Canet, kwolf, qemu-devel, stefanha

Le Saturday 01 Feb 2014 à 00:04:01 (+0100), Max Reitz a écrit :
> On 28.01.2014 17:52, Benoît Canet wrote:
> >From: Benoît Canet <benoit@irqsave.net>
> >
> >Use gnutls's SHA-256 to compare versions.
> >
> >Signed-off-by: Benoit Canet <benoit@irqsave.net>
> >---
> >  block/Makefile.objs       |   2 +-
> >  block/quorum.c            | 333 +++++++++++++++++++++++++++++++++++++++++++++-
> >  configure                 |  36 +++++
> >  docs/qmp/qmp-events.txt   |  33 +++++
> >  include/monitor/monitor.h |   2 +
> >  monitor.c                 |   2 +
> >  6 files changed, 406 insertions(+), 2 deletions(-)
> >
> >diff --git a/block/Makefile.objs b/block/Makefile.objs
> >index a2650b9..4ca9d43 100644
> >--- a/block/Makefile.objs
> >+++ b/block/Makefile.objs
> >@@ -3,7 +3,7 @@ block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-c
> >  block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
> >  block-obj-y += qed-check.o
> >  block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o vhdx-log.o
> >-block-obj-y += quorum.o
> >+block-obj-$(CONFIG_QUORUM) += quorum.o
> >  block-obj-y += parallels.o blkdebug.o blkverify.o
> >  block-obj-y += snapshot.o qapi.o
> >  block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
> >diff --git a/block/quorum.c b/block/quorum.c
> >index 5bf37b3..c319719 100644
> >--- a/block/quorum.c
> >+++ b/block/quorum.c
> >@@ -13,7 +13,43 @@
> >   * See the COPYING file in the top-level directory.
> >   */
> >+#include <gnutls/gnutls.h>
> >+#include <gnutls/crypto.h>
> >  #include "block/block_int.h"
> >+#include "qapi/qmp/qjson.h"
> >+
> >+#define HASH_LENGTH 32
> >+
> >+/* This union holds a vote hash value */
> >+typedef union QuorumVoteValue {
> >+    char h[HASH_LENGTH];       /* SHA-256 hash */
> >+    int64_t l;                 /* simpler 64 bits hash */
> >+} QuorumVoteValue;
> >+
> >+/* A vote item */
> >+typedef struct QuorumVoteItem {
> >+    int index;
> >+    QLIST_ENTRY(QuorumVoteItem) next;
> >+} QuorumVoteItem;
> >+
> >+/* this structure is a vote version. A version is the set of votes sharing the
> >+ * same vote value.
> >+ * The set of votes will be tracked with the items field and its cardinality is
> >+ * vote_count.
> >+ */
> >+typedef struct QuorumVoteVersion {
> >+    QuorumVoteValue value;
> >+    int index;
> >+    int vote_count;
> >+    QLIST_HEAD(, QuorumVoteItem) items;
> >+    QLIST_ENTRY(QuorumVoteVersion) next;
> >+} QuorumVoteVersion;
> >+
> >+/* this structure holds a group of vote versions together */
> >+typedef struct QuorumVotes {
> >+    QLIST_HEAD(, QuorumVoteVersion) vote_list;
> >+    int (*compare)(QuorumVoteValue *a, QuorumVoteValue *b);
> >+} QuorumVotes;
> >  /* the following structure holds the state of one quorum instance */
> >  typedef struct {
> >@@ -60,10 +96,14 @@ struct QuorumAIOCB {
> >      int success_count;          /* number of successfully completed AIOCB */
> >      bool *finished;             /* completion signal for cancel */
> >+    QuorumVotes votes;
> >+
> >      bool is_read;
> >      int vote_ret;
> >  };
> >+static void quorum_vote(QuorumAIOCB *acb);
> >+
> >  static void quorum_aio_cancel(BlockDriverAIOCB *blockacb)
> >  {
> >      QuorumAIOCB *acb = container_of(blockacb, QuorumAIOCB, common);
> >@@ -111,6 +151,10 @@ static void quorum_aio_finalize(QuorumAIOCB *acb)
> >          acb->aios[i].ret = 0;
> >      }
> >+    if (acb->vote_ret) {
> >+        ret = acb->vote_ret;
> >+    }
> 
> Hm, this makes the vote_ret take precedence over other errors
> returned by the children. If they differ (and both are not 0), we
> can choose between returning either (vote_ret or “real” errors
> reported by the child devices). Both are errors, so I don't see any
> natural precedence. However, generally, vote_ret will contain a more
> generic error code (i.e., -EIO), thus I could imagine the other
> error code reported by the child device to be more appropriate, as
> it might contain more useful information.
> 
> But, well, on the other had, Quorum is designed for hiding errors
> reported by a minority of child devices; therefore, hiding such
> errors in this case as well is probably just consequent.

Yes the general idea of quorum is to hide errors as long as the majority is reached.
For the case where too many children errors occured to be able to do a vote
there is:
    ret = s->threshold <= acb->success_count ? 0 : quorum_get_first_error(acb); 
Kevin asked me do return the first error instead of -EIO.

> 
> I'll leave this up to you; I'm fine with it as it is and I'd be fine
> if (for whatever reason) you were to change it. :-)
> 
> >+
> >      acb->common.cb(acb->common.opaque, ret);
> >      if (acb->finished) {
> >          *acb->finished = true;
> >@@ -122,6 +166,11 @@ static void quorum_aio_finalize(QuorumAIOCB *acb)
> >      qemu_aio_release(acb);
> >  }
> >+static int quorum_sha256_compare(QuorumVoteValue *a, QuorumVoteValue *b)
> >+{
> >+    return memcmp(a->h, b->h, HASH_LENGTH);
> >+}
> >+
> >  static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
> >                                     BlockDriverState *bs,
> >                                     QEMUIOVector *qiov,
> >@@ -141,6 +190,7 @@ static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
> >      acb->count = 0;
> >      acb->success_count = 0;
> >      acb->finished = NULL;
> >+    acb->votes.compare = quorum_sha256_compare;
> >      acb->is_read = false;
> >      acb->vote_ret = 0;
> >@@ -170,9 +220,290 @@ static void quorum_aio_cb(void *opaque, int ret)
> >          return;
> >      }
> >+    /* Do the vote on read */
> >+    if (acb->is_read) {
> >+        quorum_vote(acb);
> >+    }
> >+
> >      quorum_aio_finalize(acb);
> >  }
> >+static void quorum_report_bad(QuorumAIOCB *acb, char *node_name)
> >+{
> >+    QObject *data;
> >+    data = qobject_from_jsonf("{ 'node-name': \"%s\""
> >+                              ", 'sector-num': %" PRId64
> >+                              ", 'sectors-count': %i }",
> >+                              node_name,
> >+                              acb->sector_num,
> >+                              acb->nb_sectors);
> >+    monitor_protocol_event(QEVENT_QUORUM_REPORT_BAD, data);
> >+    qobject_decref(data);
> >+}
> >+
> >+static void quorum_report_failure(QuorumAIOCB *acb)
> >+{
> >+    QObject *data;
> >+    data = qobject_from_jsonf("{ 'sector-num': %" PRId64
> >+                              ", 'sectors-count': %i }",
> >+                              acb->sector_num,
> >+                              acb->nb_sectors);
> >+    monitor_protocol_event(QEVENT_QUORUM_FAILURE, data);
> >+    qobject_decref(data);
> >+}
> >+
> >+static void quorum_report_bad_versions(BDRVQuorumState *s,
> >+                                       QuorumAIOCB *acb,
> >+                                       QuorumVoteValue *value)
> >+{
> >+    QuorumVoteVersion *version;
> >+    QuorumVoteItem *item;
> >+
> >+    QLIST_FOREACH(version, &acb->votes.vote_list, next) {
> >+        if (!acb->votes.compare(&version->value, value)) {
> >+            continue;
> >+        }
> >+        QLIST_FOREACH(item, &version->items, next) {
> >+            quorum_report_bad(acb, s->bs[item->index]->node_name);
> >+        }
> >+    }
> >+}
> >+
> >+static void quorum_copy_qiov(QEMUIOVector *dest, QEMUIOVector *source)
> >+{
> >+    int i;
> >+    assert(dest->niov == source->niov);
> >+    assert(dest->size == source->size);
> >+    for (i = 0; i < source->niov; i++) {
> >+        assert(dest->iov[i].iov_len == source->iov[i].iov_len);
> >+        memcpy(dest->iov[i].iov_base,
> >+               source->iov[i].iov_base,
> >+               source->iov[i].iov_len);
> >+    }
> >+}
> >+
> >+static void quorum_count_vote(QuorumVotes *votes,
> >+                              QuorumVoteValue *value,
> >+                              int index)
> >+{
> >+    QuorumVoteVersion *v = NULL, *version = NULL;
> >+    QuorumVoteItem *item;
> >+
> >+    /* look if we have something with this hash */
> >+    QLIST_FOREACH(v, &votes->vote_list, next) {
> >+        if (!votes->compare(&v->value, value)) {
> >+            version = v;
> >+            break;
> >+        }
> >+    }
> >+
> >+    /* It's a version not yet in the list add it */
> >+    if (!version) {
> >+        version = g_new0(QuorumVoteVersion, 1);
> >+        QLIST_INIT(&version->items);
> >+        memcpy(&version->value, value, sizeof(version->value));
> >+        version->index = index;
> >+        version->vote_count = 0;
> >+        QLIST_INSERT_HEAD(&votes->vote_list, version, next);
> >+    }
> >+
> >+    version->vote_count++;
> >+
> >+    item = g_new0(QuorumVoteItem, 1);
> >+    item->index = index;
> >+    QLIST_INSERT_HEAD(&version->items, item, next);
> >+}
> >+
> >+static void quorum_free_vote_list(QuorumVotes *votes)
> >+{
> >+    QuorumVoteVersion *version, *next_version;
> >+    QuorumVoteItem *item, *next_item;
> >+
> >+    QLIST_FOREACH_SAFE(version, &votes->vote_list, next, next_version) {
> >+        QLIST_REMOVE(version, next);
> >+        QLIST_FOREACH_SAFE(item, &version->items, next, next_item) {
> >+            QLIST_REMOVE(item, next);
> >+            g_free(item);
> >+        }
> >+        g_free(version);
> >+    }
> >+}
> >+
> >+static int quorum_compute_hash(QuorumAIOCB *acb, int i, QuorumVoteValue *hash)
> >+{
> >+    int j, ret;
> >+    gnutls_hash_hd_t dig;
> >+    QEMUIOVector *qiov = &acb->aios[i].qiov;
> >+
> >+    ret = gnutls_hash_init(&dig, GNUTLS_DIG_SHA256);
> >+
> >+    if (ret < 0) {
> >+        return ret;
> >+    }
> >+
> >+    for (j = 0; j < qiov->niov; j++) {
> >+        ret = gnutls_hash(dig, qiov->iov[j].iov_base, qiov->iov[j].iov_len);
> >+        if (ret < 0) {
> >+            break;
> >+        }
> >+    }
> >+
> >+    gnutls_hash_deinit(dig, (void *) hash);
> >+    return ret;
> >+}
> >+
> >+static QuorumVoteVersion *quorum_get_vote_winner(QuorumVotes *votes)
> >+{
> >+    int i = 0;
> >+    QuorumVoteVersion *candidate, *winner = NULL;
> >+
> >+    QLIST_FOREACH(candidate, &votes->vote_list, next) {
> >+        if (candidate->vote_count > i) {
> >+            i = candidate->vote_count;
> >+            winner = candidate;
> >+        }
> >+    }
> >+
> >+    return winner;
> >+}
> >+
> >+static bool quorum_iovec_compare(QEMUIOVector *a, QEMUIOVector *b)
> >+{
> >+    int i;
> >+    int result;
> >+
> >+    assert(a->niov == b->niov);
> >+    for (i = 0; i < a->niov; i++) {
> >+        assert(a->iov[i].iov_len == b->iov[i].iov_len);
> >+        result = memcmp(a->iov[i].iov_base,
> >+                        b->iov[i].iov_base,
> >+                        a->iov[i].iov_len);
> >+        if (result) {
> >+            return false;
> >+        }
> >+    }
> >+
> >+    return true;
> >+}
> >+
> >+static void GCC_FMT_ATTR(2, 3) quorum_err(QuorumAIOCB *acb,
> >+                                          const char *fmt, ...)
> >+{
> >+    va_list ap;
> >+
> >+    va_start(ap, fmt);
> >+    fprintf(stderr, "quorum: sector_num=%" PRId64 " nb_sectors=%d ",
> >+            acb->sector_num, acb->nb_sectors);
> >+    vfprintf(stderr, fmt, ap);
> >+    fprintf(stderr, "\n");
> >+    va_end(ap);
> >+    exit(1);
> >+}
> >+
> >+static bool quorum_compare(QuorumAIOCB *acb,
> >+                           QEMUIOVector *a,
> >+                           QEMUIOVector *b)
> >+{
> >+    BDRVQuorumState *s = acb->bqs;
> >+    ssize_t offset;
> >+
> >+    /* This driver will replace blkverify in this particular case */
> >+    if (s->is_blkverify) {
> >+        offset = qemu_iovec_compare(a, b);
> >+        if (offset != -1) {
> >+            quorum_err(acb, "contents mismatch in sector %" PRId64,
> >+                       acb->sector_num +
> >+                       (uint64_t)(offset / BDRV_SECTOR_SIZE));
> >+        }
> >+        return true;
> >+    }
> >+
> >+    return quorum_iovec_compare(a, b);
> >+}
> >+
> >+static void quorum_vote(QuorumAIOCB *acb)
> >+{
> >+    bool quorum = false;
> >+    int i, j, ret;
> >+    QuorumVoteValue hash;
> >+    BDRVQuorumState *s = acb->bqs;
> >+    QuorumVoteVersion *winner;
> >+
> >+    QLIST_INIT(&acb->votes.vote_list);
> >+
> >+    /* if we don't get any successful read use the first error code */
> >+    if (!acb->success_count) {
> >+        acb->vote_ret = acb->aios[0].ret;
> >+        return;
> >+    }
> >+
> >+    /* get the index of the first successful read (we are sure to find one) */
> >+    for (i = 0; i < s->total; i++) {
> >+        if (!acb->aios[i].ret) {
> >+            break;
> >+        }
> >+    }
> >+
> >+    /* compare this read with all other successful read looking for quorum */
> >+    for (j = i + 1; j < s->total; j++) {
> >+        if (acb->aios[j].ret) {
> >+            continue;
> >+        }
> >+        quorum = quorum_compare(acb, &acb->aios[i].qiov, &acb->aios[j].qiov);
> >+        if (!quorum) {
> >+            break;
> >+       }
> >+    }
> >+
> >+    /* Every successful read agrees and their count is higher or equal threshold
> >+     * -> Quorum
> >+     */
> >+    if (quorum && acb->success_count >= s->threshold) {
> >+        quorum_copy_qiov(acb->qiov, &acb->aios[i].qiov);
> >+        return;
> >+    }
> 
> Well, for quorum && acb->success_count < s->threshold, calculating
> and comparing hashes doesn't help at all (since the vote winner will
> only be able to get up to acb->success_count votes anyway). We
> should probably catch this case in the first if condition in this
> function (change it to “if (acb->success_count < s->threshold)”).
> 
> >+
> >+    /* compute hashs for each successful read, also store indexes */
> >+    for (i = 0; i < s->total; i++) {
> >+        if (acb->aios[i].ret) {
> >+            continue;
> >+        }
> >+        ret = quorum_compute_hash(acb, i, &hash);
> >+        /* if ever the hash computation failed */
> >+        if (ret < 0) {
> >+            acb->vote_ret = ret;
> >+            goto free_exit;
> >+        }
> >+        quorum_count_vote(&acb->votes, &hash, i);
> >+    }
> >+
> >+    /* vote to select the most represented version */
> >+    winner = quorum_get_vote_winner(&acb->votes);
> >+    /* every vote version are differents -> error */
> >+    if (!winner) {
> >+        quorum_report_failure(acb);
> >+        acb->vote_ret = -EIO;
> >+        goto free_exit;
> >+    }
> >+
> >+    /* if the winner count is smaller than threshold the read fails */
> >+    if (winner->vote_count < s->threshold) {
> >+        quorum_report_failure(acb);
> >+        acb->vote_ret = -EIO;
> >+        goto free_exit;
> >+    }
> >+
> >+    /* we have a winner: copy it */
> >+    quorum_copy_qiov(acb->qiov, &acb->aios[winner->index].qiov);
> >+
> >+    /* some versions are bad print them */
> >+    quorum_report_bad_versions(s, acb, &winner->value);
> >+
> >+free_exit:
> >+    /* free lists */
> >+    quorum_free_vote_list(&acb->votes);
> >+}
> >+
> >  static BlockDriverAIOCB *quorum_aio_readv(BlockDriverState *bs,
> >                                           int64_t sector_num,
> >                                           QEMUIOVector *qiov,
> >@@ -194,7 +525,7 @@ static BlockDriverAIOCB *quorum_aio_readv(BlockDriverState *bs,
> >      }
> >      for (i = 0; i < s->total; i++) {
> >-        bdrv_aio_readv(s->bs[i], sector_num, qiov, nb_sectors,
> >+        bdrv_aio_readv(s->bs[i], sector_num, &acb->aios[i].qiov, nb_sectors,
> >                         quorum_aio_cb, &acb->aios[i]);
> >      }
> >diff --git a/configure b/configure
> >index b472694..5a68738 100755
> >--- a/configure
> >+++ b/configure
> >@@ -263,6 +263,7 @@ gtkabi="2.0"
> >  tpm="no"
> >  libssh2=""
> >  vhdx=""
> >+quorum="no"
> >  # parse CC options first
> >  for opt do
> >@@ -1000,6 +1001,10 @@ for opt do
> >    ;;
> >    --disable-vhdx) vhdx="no"
> >    ;;
> >+  --disable-quorum) quorum="no"
> >+  ;;
> >+  --enable-quorum) quorum="yes"
> >+  ;;
> >    *) echo "ERROR: unknown option $opt"; show_help="yes"
> >    ;;
> >    esac
> >@@ -1254,6 +1259,8 @@ Advanced options (experts only):
> >    --enable-libssh2         enable ssh block device support
> >    --disable-vhdx           disables support for the Microsoft VHDX image format
> >    --enable-vhdx            enable support for the Microsoft VHDX image format
> >+  --disable-quorum         disable quorum block filter support
> >+  --enable-quorum          enable quorum block filter support
> >  NOTE: The object files are built at the place where configure is launched
> >  EOF
> >@@ -1923,6 +1930,30 @@ EOF
> >  fi
> >  ##########################################
> >+# Quorum probe (check for gnutls)
> >+if test "$quorum" != "no" ; then
> >+cat > $TMPC <<EOF
> >+#include <gnutls/gnutls.h>
> >+#include <gnutls/crypto.h>
> >+int main(void) {char data[4096], digest[32];
> >+gnutls_hash_fast(GNUTLS_DIG_SHA256, data, 4096, digest);
> >+return 0;
> >+}
> >+EOF
> >+quorum_tls_cflags=`$pkg_config --cflags gnutls 2> /dev/null`
> >+quorum_tls_libs=`$pkg_config --libs gnutls 2> /dev/null`
> >+if compile_prog "$quorum_tls_cflags" "$quorum_tls_libs" ; then
> >+  qcow_tls=yes
> >+  libs_softmmu="$quorum_tls_libs $libs_softmmu"
> >+  libs_tools="$quorum_tls_libs $libs_softmmu"
> >+  QEMU_CFLAGS="$QEMU_CFLAGS $quorum_tls_cflags"
> >+else
> >+  echo "gnutls > 2.10.0 required to compile Quorum"
> >+  exit 1
> >+fi
> >+fi
> >+
> >+##########################################
> >  # VNC SASL detection
> >  if test "$vnc" = "yes" -a "$vnc_sasl" != "no" ; then
> >    cat > $TMPC <<EOF
> >@@ -3843,6 +3874,7 @@ echo "libssh2 support   $libssh2"
> >  echo "TPM passthrough   $tpm_passthrough"
> >  echo "QOM debugging     $qom_cast_debug"
> >  echo "vhdx              $vhdx"
> >+echo "Quorum            $quorum"
> >  if test "$sdl_too_old" = "yes"; then
> >  echo "-> Your SDL version is too old - please upgrade to have SDL support"
> >@@ -4241,6 +4273,10 @@ if test "$libssh2" = "yes" ; then
> >    echo "CONFIG_LIBSSH2=y" >> $config_host_mak
> >  fi
> >+if test "$quorum" = "yes" ; then
> >+  echo "CONFIG_QUORUM=y" >> $config_host_mak
> >+fi
> >+
> >  if test "$virtio_blk_data_plane" = "yes" ; then
> >    echo 'CONFIG_VIRTIO_BLK_DATA_PLANE=$(CONFIG_VIRTIO)' >> $config_host_mak
> >  fi
> >diff --git a/docs/qmp/qmp-events.txt b/docs/qmp/qmp-events.txt
> >index 6b87e97..1cbdde7 100644
> >--- a/docs/qmp/qmp-events.txt
> >+++ b/docs/qmp/qmp-events.txt
> >@@ -500,3 +500,36 @@ Example:
> >  Note: If action is "reset", "shutdown", or "pause" the WATCHDOG event is
> >  followed respectively by the RESET, SHUTDOWN, or STOP events.
> >+
> >+QUORUM_FAILURE
> >+--------------
> >+
> >+Emitted by the Quorum block driver when it fail to establish a quorum.
> 
> Probably rather “if” than “when”; also, “fails”.
> 
> >+
> >+Data:
> >+
> >+- "sector-num":   Number of the first sector of the failed read operation.
> >+- "sector-count": Failed read operation sector count.
> >+
> >+Example:
> >+
> >+{ "event": "QUORUM_FAILURE",
> >+     "data": { "sector-num": 345435, "sector-count": 5 },
> >+     "timestamp": { "seconds": 1344522075, "microseconds": 745528 } }
> >+
> >+QUORUM_REPORT_BAD
> >+-----------------
> >+
> >+Emitted to report a corruption of a Quorum file.
> >+
> >+Data:
> >+
> >+- "node-name":  The graph node name of the block driver state.
> 
> The description is not aligned with the others, this is probably due
> to the change from child-index to node-name…?
> 
> Max
> 
> >+- "sector-num":   Number of the first sector of the failed read operation.
> >+- "sector-count": Failed read operation sector count.
> >+
> >+Example:
> >+
> >+{ "event": "QUORUM_REPORT_BAD",
> >+     "data": { "node-name": "1.raw", "sector-num": 345435, "sector-count": 5 },
> >+     "timestamp": { "seconds": 1344522075, "microseconds": 745528 } }
> >diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
> >index 7e5f752..a49ea11 100644
> >--- a/include/monitor/monitor.h
> >+++ b/include/monitor/monitor.h
> >@@ -49,6 +49,8 @@ typedef enum MonitorEvent {
> >      QEVENT_SPICE_MIGRATE_COMPLETED,
> >      QEVENT_GUEST_PANICKED,
> >      QEVENT_BLOCK_IMAGE_CORRUPTED,
> >+    QEVENT_QUORUM_FAILURE,
> >+    QEVENT_QUORUM_REPORT_BAD,
> >      /* Add to 'monitor_event_names' array in monitor.c when
> >       * defining new events here */
> >diff --git a/monitor.c b/monitor.c
> >index 80456fb..f8f6aea 100644
> >--- a/monitor.c
> >+++ b/monitor.c
> >@@ -507,6 +507,8 @@ static const char *monitor_event_names[] = {
> >      [QEVENT_SPICE_MIGRATE_COMPLETED] = "SPICE_MIGRATE_COMPLETED",
> >      [QEVENT_GUEST_PANICKED] = "GUEST_PANICKED",
> >      [QEVENT_BLOCK_IMAGE_CORRUPTED] = "BLOCK_IMAGE_CORRUPTED",
> >+    [QEVENT_QUORUM_FAILURE] = "QUORUM_FAILURE",
> >+    [QEVENT_QUORUM_REPORT_BAD] = "QUORUM_REPORT_BAD",
> >  };
> >  QEMU_BUILD_BUG_ON(ARRAY_SIZE(monitor_event_names) != QEVENT_MAX)
> 
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH V10 09/13] quorum: Add quorum_co_get_block_status.
  2014-02-02 21:44   ` Max Reitz
@ 2014-02-03 11:47     ` Benoît Canet
  0 siblings, 0 replies; 36+ messages in thread
From: Benoît Canet @ 2014-02-03 11:47 UTC (permalink / raw)
  To: Max Reitz; +Cc: Benoît Canet, kwolf, qemu-devel, stefanha

Le Sunday 02 Feb 2014 à 22:44:07 (+0100), Max Reitz a écrit :
> On 28.01.2014 17:52, Benoît Canet wrote:
> >From: Benoît Canet <benoit@irqsave.net>
> >
> >Signed-off-by: Benoit Canet <benoit@irqsave.net>
> >---
> >  block/quorum.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 67 insertions(+)
> >
> >diff --git a/block/quorum.c b/block/quorum.c
> >index a47cd33..9b0718b 100644
> >--- a/block/quorum.c
> >+++ b/block/quorum.c
> >@@ -171,6 +171,22 @@ static int quorum_sha256_compare(QuorumVoteValue *a, QuorumVoteValue *b)
> >      return memcmp(a->h, b->h, HASH_LENGTH);
> >  }
> >+static int quorum_64bits_compare(QuorumVoteValue *a, QuorumVoteValue *b)
> >+{
> >+    int64_t i = a->l;
> >+    int64_t j = b->l;
> >+
> >+    if (i < j) {
> >+        return -1;
> >+    }
> >+
> >+    if (i > j) {
> >+        return 1;
> >+    }
> >+
> >+    return 0;
> >+}
> >+
> >  static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
> >                                     BlockDriverState *bs,
> >                                     QEMUIOVector *qiov,
> >@@ -587,6 +603,56 @@ static void quorum_invalidate_cache(BlockDriverState *bs)
> >      }
> >  }
> >+static int64_t coroutine_fn quorum_co_get_block_status(BlockDriverState *bs,
> >+                                                       int64_t sector_num,
> >+                                                       int nb_sectors,
> >+                                                       int *pnum)
> >+{
> >+    BDRVQuorumState *s = bs->opaque;
> >+    QuorumVoteVersion *winner = NULL;
> >+    QuorumVotes result_votes, num_votes;
> >+    QuorumVoteValue result_value, num_value;
> >+    int i, num;
> >+    int64_t result = 0;
> >+
> >+    QLIST_INIT(&result_votes.vote_list);
> >+    QLIST_INIT(&num_votes.vote_list);
> >+    result_votes.compare = quorum_64bits_compare;
> >+    num_votes.compare = quorum_64bits_compare;
> >+
> >+    for (i = 0; i < s->total; i++) {
> >+        result = bdrv_get_block_status(s->bs[i], sector_num, nb_sectors, &num);
> >+        /* skip failed requests */
> >+        if (result < 0) {
> >+            continue;
> >+        }
> >+        result_value.l = result & BDRV_BLOCK_DATA;
> >+        num_value.l = num;
> >+        quorum_count_vote(&result_votes, &result_value, i);
> >+        quorum_count_vote(&num_votes, &num_value, i);
> >+    }
> >+
> >+    winner = quorum_get_vote_winner(&result_votes);
> >+    result = winner->value.l;
> 
> Below, you're reading the winning value after checking whether it's
> corresponding votes exceeded the threshold. It doesn't matter in the
> end, but for the sake of uniformity, I'd do it the same way here
> (i.e., move this statement below the if block).
> 
> >+    if (winner->vote_count < s->threshold) {
> >+        result = -ERANGE;
> 
> Is there any specific reason why you're returning -ERANGE here and
> -EIO everywhere else (even in quorum_getlength())?

I probably though "It's a comparison so it return -ERANGE".
A bad reason so :(

> 
> Max
> 
> >+        goto free_exit;
> >+    }
> >+
> >+    winner = quorum_get_vote_winner(&num_votes);
> >+    if (winner->vote_count < s->threshold) {
> >+        result = -ERANGE;
> >+        goto free_exit;
> >+    }
> >+    *pnum = winner->value.l;
> >+
> >+free_exit:
> >+    quorum_free_vote_list(&result_votes);
> >+    quorum_free_vote_list(&num_votes);
> >+
> >+    return result;
> >+}
> >+
> >  static BlockDriver bdrv_quorum = {
> >      .format_name        = "quorum",
> >      .protocol_name      = "quorum",
> >@@ -598,6 +664,7 @@ static BlockDriver bdrv_quorum = {
> >      .bdrv_aio_readv     = quorum_aio_readv,
> >      .bdrv_aio_writev    = quorum_aio_writev,
> >      .bdrv_invalidate_cache = quorum_invalidate_cache,
> >+    .bdrv_co_get_block_status = quorum_co_get_block_status,
> >  };
> >  static void bdrv_quorum_init(void)
> 
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH V10 10/13] quorum: Add quorum_co_flush().
  2014-02-02 22:02   ` Max Reitz
@ 2014-02-03 11:57     ` Benoît Canet
  2014-02-03 19:04       ` Max Reitz
  0 siblings, 1 reply; 36+ messages in thread
From: Benoît Canet @ 2014-02-03 11:57 UTC (permalink / raw)
  To: Max Reitz; +Cc: Benoît Canet, kwolf, qemu-devel, stefanha

Le Sunday 02 Feb 2014 à 23:02:57 (+0100), Max Reitz a écrit :
> On 28.01.2014 17:52, Benoît Canet wrote:
> >From: Benoît Canet <benoit@irqsave.net>
> >
> >Makes a vote to select error if any.
> >
> >Signed-off-by: Benoit Canet <benoit@irqsave.net>
> >---
> >  block/quorum.c | 34 ++++++++++++++++++++++++++++++++++
> >  1 file changed, 34 insertions(+)
> >
> >diff --git a/block/quorum.c b/block/quorum.c
> >index 9b0718b..1b84b07 100644
> >--- a/block/quorum.c
> >+++ b/block/quorum.c
> >@@ -653,12 +653,46 @@ free_exit:
> >      return result;
> >  }
> >+static coroutine_fn int quorum_co_flush(BlockDriverState *bs)
> >+{
> >+    BDRVQuorumState *s = bs->opaque;
> >+    QuorumVoteVersion *winner = NULL;
> >+    QuorumVotes error_votes;
> >+    QuorumVoteValue result_value;
> >+    int i;
> >+    int result = 0;
> >+    bool error = false;
> >+
> >+    QLIST_INIT(&error_votes.vote_list);
> >+    error_votes.compare = quorum_64bits_compare;
> >+
> >+    for (i = 0; i < s->total; i++) {
> >+        result = bdrv_co_flush(s->bs[i]);
> >+        if (result) {
> >+            error = true;
> >+            result_value.l = result;
> >+            quorum_count_vote(&error_votes, &result_value, i);
> >+        }
> >+    }
> >+
> >+    if (error) {
> >+        winner = quorum_get_vote_winner(&error_votes);
> >+        result = winner->value.l;
> >+    }
> >+
> >+    quorum_free_vote_list(&error_votes);
> >+
> >+    return result;
> >+}
> >+
> >  static BlockDriver bdrv_quorum = {
> >      .format_name        = "quorum",
> >      .protocol_name      = "quorum",
> >      .instance_size      = sizeof(BDRVQuorumState),
> >+    .bdrv_co_flush_to_disk = quorum_co_flush,
> >+
> >      .bdrv_getlength     = quorum_getlength,
> >      .bdrv_aio_readv     = quorum_aio_readv,
> 
> So, my general opinion on this patch (for reads/writes we don't vote
> on the error code either; so why here?) hasn't changed, but well, I
> definitely don't oppose it.
For the reads/writes Kevin asked me to return the first error.
For the flush function someone else (probably Eric) asked me to do voting.

> 
> Another problem, however: If any error occurs, this function will
> return an error as well. Is that intended? If an error on a
> read/write operation occurs but there are still enough successful
> reads/writes to reach quorum, no error is returned. Is there a
> reason why this should be different for flush?
If an error occurs multiple times and is able to establish quorum this winning
error will be returned.
Else 0 will be returned due to the vote result if the error is in minority.
So the behavior is similar as long quorum is reached.

> 
> Max

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH V10 12/13] quorum: Add quorum_open() and quorum_close().
  2014-02-02 23:09   ` Max Reitz
@ 2014-02-03 12:21     ` Benoît Canet
  2014-02-03 12:49       ` Benoît Canet
  2014-02-03 12:43     ` Benoît Canet
  1 sibling, 1 reply; 36+ messages in thread
From: Benoît Canet @ 2014-02-03 12:21 UTC (permalink / raw)
  To: Max Reitz; +Cc: Benoît Canet, kwolf, qemu-devel, stefanha

Le Monday 03 Feb 2014 à 00:09:28 (+0100), Max Reitz a écrit :
> On 28.01.2014 17:52, Benoît Canet wrote:
> >From: Benoît Canet <benoit@irqsave.net>
> >
> >Example of command line:
> >-drive if=virtio,file.driver=quorum,\
> >file.children.0.file.filename=1.raw,\
> >file.children.0.node-name=1.raw,\
> >file.children.0.driver=raw,\
> >file.children.1.file.filename=2.raw,\
> >file.children.1.node-name=2.raw,\
> >file.children.1.driver=raw,\
> >file.children.2.file.filename=3.raw,\
> >file.children.2.node-name=3.raw,\
> >file.children.2.driver=raw,\
> >file.vote_threshold=2
> >
> >file.blkverify=on with file.vote_threshold=2 and two files can be passed to
> >emulated blkverify.
> >
> >Signed-off-by: Benoit Canet <benoit@irqsave.net>
> >---
> >  block/quorum.c   | 308 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  qapi-schema.json |  21 +++-
> >  2 files changed, 328 insertions(+), 1 deletion(-)
> >
> >diff --git a/block/quorum.c b/block/quorum.c
> >index e7b2090..0c0d630 100644
> >--- a/block/quorum.c
> >+++ b/block/quorum.c
> >@@ -17,8 +17,12 @@
> >  #include <gnutls/crypto.h>
> >  #include "block/block_int.h"
> >  #include "qapi/qmp/qjson.h"
> >+#include "qapi/qmp/types.h"
> >+#include "qemu-common.h"
> >  #define HASH_LENGTH 32
> >+#define KEY_PREFIX "children."
> >+#define KEY_FILENAME_SUFFIX ".file.filename"
> >  /* This union holds a vote hash value */
> >  typedef union QuorumVoteValue {
> >@@ -702,12 +706,316 @@ static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
> >      return false;
> >  }
> 
> Perhaps you could make use of qdict_extract_subqdict() and
> qdict_array_split() functions here...? That is,
> qdict_extract_subqdict(..., "children.") and then
> qdict_array_split() on the result?
> 
> >+static int quorum_match_key(const char *key,
> >+                            char **key_prefix)
> >+{
> >+    const char *start;
> >+    char *next;
> >+    unsigned long long idx;
> >+    int ret;
> >+
> >+    *key_prefix = NULL;
> >+
> >+    /* the following code is a translation of the following pseudo code:
> >+     *       match = key.match('(^children\.(\d+)\.)$suffix)
> >+     *       if not match:
> >+     *           return -1;
> >+     *       key_prefix = match.outer_match()
> >+     *       idx = match.inner_match()
> >+     *
> >+     * note: we also match the .file suffix to avoid futher checkings
> >+     */
> >+
> >+    /* this key is not a child */
> >+    if (strncmp(key, KEY_PREFIX, strlen(KEY_PREFIX))) {
> >+        return -1;
> >+    }
> >+
> >+    /* take first char after prefix */
> >+    start = key + strlen(KEY_PREFIX);
> >+
> >+    /* if the string end here -> scan fail */
> >+    if (start[0] == '\0') {
> >+        return -1;
> >+    }
> >+
> >+    /* try to retrieve the index */
> >+    ret = parse_uint(start, &idx, &next, 10);
> >+
> >+    /* no int found -> scan fail */
> >+    if (ret < 0) {
> >+        return -1;
> >+    }
> >+
> >+    /* we are taking a reference via QMP */
> >+    if (next - key == strlen(key)) {
> 
> "if (!*next)" would probably accomplish the same (or "if next[0] ==
> '\0')" to keep your coding style) without having to call strlen().
> 
> >+        *key_prefix = g_strdup(key);
> >+        return idx;
> >+    }
> >+
> >+    /* match the suffix to avoid matching the same idx
> >+     * multiple times and be required to do more checks later
> >+     */
> >+    if (strncmp(next, KEY_FILENAME_SUFFIX, strlen(KEY_FILENAME_SUFFIX))) {
> 
> As stated in my review from October, you may use sizeof() to avoid strlen().
> 
> >+        return -1;
> >+    }
> >+
> >+    /* do not include '.' */
> >+    int len = next - key;
> >+    *key_prefix = g_strndup(key, len);
> >+
> >+    return idx;
> >+}
> >+
> >+static QDict *quorum_get_children_idx(const QDict *options)
> >+{
> >+    const QDictEntry *ent;
> >+    QDict *result;
> >+    char *key_prefix;
> >+    int idx;
> >+
> >+    result = qdict_new();
> >+
> >+    for (ent = qdict_first(options); ent; ent = qdict_next(options, ent)) {
> >+        const char *key = qdict_entry_key(ent);
> >+        idx = quorum_match_key(key,
> >+                               &key_prefix);
> >+
> >+        /* if the result zero or positive we got a key */
> >+        if (idx < 0) {
> >+            continue;
> >+        }
> >+
> >+        qdict_put(result, key_prefix, qint_from_int(idx));
> >+    }
> >+
> >+    return result;
> >+}
> >+
> >+static int quorum_fill_validation_array(bool *array,
> >+                                        const QDict *dict,
> >+                                        int total,
> >+                                        Error **errp)
> >+{
> >+    const QDictEntry *ent;
> >+
> >+    /* fill the checking array with children indexes */
> >+    for (ent = qdict_first(dict); ent; ent = qdict_next(dict, ent)) {
> >+        const char *key = qdict_entry_key(ent);
> >+        int idx = qdict_get_int(dict, key);
> >+
> >+        if (idx < 0 || idx >= total) {
> >+            error_setg(errp,
> >+                "Children index must be between 0 and children count -1");
> 
> Quote from October: I'd prefer "- 1" instead of "-1" (or even "Child
> index must be an unsigned integer smaller than the children count").
> 
> >+            return -ERANGE;
> >+        }
> >+
> >+        array[idx] = true;
> >+    }
> >+
> >+    return 0;
> >+}
> >+
> >+static int quorum_valid_indexes(const bool *array, int total, Error **errp)
> >+{
> >+    int i;
> >+
> >+    for (i = 0; i < total; i++) {
> >+        if (array[i] == true) {
> >+            continue;
> >+        }
> >+
> >+        error_setg(errp,
> >+            "All child indexes between 0 and children count -1  must be "
> >+            " used");
> 
> Again, I'd prefer "- 1"; then, there are two spaces to the left of
> "must" and two spaces after "be".
> 
> >+        return -ERANGE;
> >+    }
> >+
> >+    return 0;
> >+}
> >+
> >+static int quorum_valid_children_indexes(const QDict *dict,
> >+                                         int total,
> >+                                         Error **errp)
> >+{
> >+    bool *array;
> >+    int ret = 0;;
> >+
> >+    /* allocate indexes checking array and put false in it */
> >+    array = g_new0(bool, total);
> >+
> >+    ret = quorum_fill_validation_array(array, dict, total, errp);
> >+    if (ret < 0) {
> >+        goto free_exit;
> >+    }
> >+
> >+    ret = quorum_valid_indexes(array, total, errp);
> >+free_exit:
> >+    g_free(array);
> >+    return ret;
> >+}
> >+
> >+static int quorum_valid_threshold(int threshold,
> >+                                  int total,
> >+                                  Error **errp)
> >+{
> >+
> >+    if (threshold < 1) {
> >+        error_set(errp, QERR_INVALID_PARAMETER_VALUE,
> >+                  "vote-threshold", "value >= 1");
> >+        return -ERANGE;
> >+    }
> >+
> >+    if (threshold > total) {
> >+        error_setg(errp, "threshold <= children count must be true");
> 
> Quote from October: Well, while this might technically be true, I'd
> rather go for "threshold may not exceed children count" instead. ;-)
> 
> >+        return -ERANGE;
> >+    }
> >+
> >+    return 0;
> >+}
> >+
> >+static int quorum_open(BlockDriverState *bs,
> >+                       QDict *options,
> >+                       int flags,
> >+                       Error **errp)
> >+{
> >+    BDRVQuorumState *s = bs->opaque;
> >+    Error *local_err = NULL;
> >+    const QDictEntry *ent;
> >+    QDict *idx_dict;
> >+    bool *opened;
> >+    const char *value;
> >+    char *next;
> >+    int i;
> >+    int ret = 0;
> >+
> >+    /* get a dict of children indexes for validation */
> >+    idx_dict = quorum_get_children_idx(options);
> >+
> >+    /* count how many different children indexes are present and validate */
> >+    s->total = qdict_size(idx_dict);
> >+    if (s->total < 2) {
> >+        error_setg(&local_err,
> >+                   "Number of provided children must be greater than 1");
> >+        ret = -EINVAL;
> >+        goto exit;
> >+    }
> >+
> >+    /* validate that the set of index is coherent */
> >+    ret = quorum_valid_children_indexes(idx_dict, s->total, &local_err);
> >+    if (ret < 0) {
> >+        goto exit;
> >+    }
> >+
> >+    ret = qdict_get_try_int(options, "vote-threshold", -1);
> >+    /* from QMP */
> >+    if (ret != -1) {
> >+        qdict_del(options, "vote-threshold");
> >+        s->threshold = ret;
> >+    /* from command line */
> >+    } else {
> >+        /* retrieve the threshold option from the command line */
> >+        value = qdict_get_try_str(options, "vote_threshold");
> >+        if (!value) {
> >+            error_setg(&local_err,
> >+                       "vote_threshold must be provided");
> >+            ret = -EINVAL;
> >+            goto exit;
> >+        }
> >+        qdict_del(options, "vote_threshold");
> >+
> >+        ret = parse_uint(value, (unsigned long long *) &s->threshold, &next, 10);
> 
> I don't like this cast at all...
> 
> >+
> >+        /* no int found -> scan fail */
> >+        if (ret < 0) {
> >+            error_setg(&local_err,
> >+                       "invalid voter_threshold specified");
> 
> *vote_threshold
> 
> >+            ret = -EINVAL;
> >+            goto exit;
> >+        }
> >+    }
> >+
> >+    /* and validate it againts s->total */
> 
> Still *against
> 
> >+    ret = quorum_valid_threshold(s->threshold, s->total, &local_err);
> >+    if (ret < 0) {
> >+        goto exit;
> >+    }
> >+
> >+    /* is the driver in blkverify mode */
> >+    value = qdict_get_try_str(options, "blkverify");
> >+    if (value && !strcmp(value, "on")  &&
> >+        s->total == 2 && s->threshold == 2) {
> >+        s->is_blkverify = true;
> >+    }
> 
> Quote from October: If the user specifies anything different from
> "on" or if he does and s->total or s->threshold is not 2, quorum
> will silently work in non-blkverify mode without telling the user.
> So I'd emit a message here if blkverify has been specified and its
> value is different from "off" but s->is_blkverify remains false
> (i.e., “else if (value && strcmp(value, "off")) { fprintf(stderr,
> "[Some warning]"); }”).
> 
> >+    qdict_del(options, "blkverify");
> >+
> >+    /* allocate the children BlockDriverState array */
> >+    s->bs = g_new0(BlockDriverState *, s->total);
> >+    opened = g_new0(bool, s->total);
> >+
> >+    /* open children bs */
> >+    for (ent = qdict_first(idx_dict);
> >+         ent; ent = qdict_next(idx_dict, ent)) {
> 
> This condition fits on a single line.
> 
> >+        const char *key = qdict_entry_key(ent);
> >+        int idx = qdict_get_int(idx_dict, key);
> >+        ret = bdrv_open_image(&s->bs[idx],
> >+                              NULL,
> >+                              options,
> >+                              key,
> >+                              flags,
> >+                              false,
> >+                              &local_err);
> 
> This takes two, but definitely not seven.
> 
> >+        if (ret < 0) {
> >+            goto close_exit;
> >+        }
> >+        opened[idx] = true;
> >+    }
> >+
> >+    g_free(opened);
> >+    goto exit;
> >+
> >+close_exit:
> >+    /* cleanup on error */
> >+    for (i = 0; i < s->total; i++) {
> >+        if (!opened[i]) {
> >+            continue;
> >+        }
> >+        bdrv_close(s->bs[i]);
> 
> You'll probably want bdrv_unref() here.

I don't think so because to simplify memory management s->bs is an array
containing all the allocated bs and is freed at once.
So should bdrv_unref still be used ?

> 
> >+    }
> >+    g_free(s->bs);
> >+    g_free(opened);
> >+exit:
> >+    /* propagate error */
> >+    if (error_is_set(&local_err)) {
> >+        error_propagate(errp, local_err);
> >+    }
> >+    return ret;
> >+}
> >+
> >+static void quorum_close(BlockDriverState *bs)
> >+{
> >+    BDRVQuorumState *s = bs->opaque;
> >+    int i;
> >+
> >+    for (i = 0; i < s->total; i++) {
> >+        /* Ensure writes reach stable storage */
> >+        bdrv_flush(s->bs[i]);
> >+        /* close manually because we'll free s->bs */
> >+        bdrv_close(s->bs[i]);
> 
> Again, bdrv_unref().
> 
> >+    }
> >+
> >+    g_free(s->bs);
> >+}
> >+
> >  static BlockDriver bdrv_quorum = {
> >      .format_name        = "quorum",
> >      .protocol_name      = "quorum",
> >      .instance_size      = sizeof(BDRVQuorumState),
> >+    .bdrv_file_open     = quorum_open,
> >+    .bdrv_close         = quorum_close,
> >+
> >      .bdrv_co_flush_to_disk = quorum_co_flush,
> >      .bdrv_getlength     = quorum_getlength,
> >diff --git a/qapi-schema.json b/qapi-schema.json
> >index 05ced9d..903a3a0 100644
> >--- a/qapi-schema.json
> >+++ b/qapi-schema.json
> >@@ -4352,6 +4352,24 @@
> >              'raw': 'BlockdevRef' } }
> >  ##
> >+# @BlockdevOptionsQuorum
> >+#
> >+# Driver specific block device options for Quorum
> >+#
> >+# @blkverify:      #optional true if the driver must print content mismatch
> >+#
> >+# @children:       the children block device to use
> >+#
> >+# @vote_threshold: the vote limit under which a read will fail
> >+#
> >+# Since: 2.0
> >+##
> >+{ 'type': 'BlockdevOptionsQuorum',
> >+  'data': { '*blkverify': 'bool',
> >+            'children': [ 'BlockdevRef' ],
> >+            'vote-threshold': 'int' } }
> >+
> >+##
> >  # @BlockdevOptions
> >  #
> >  # Options for creating a block device.
> >@@ -4389,7 +4407,8 @@
> >        'vdi':        'BlockdevOptionsGenericFormat',
> >        'vhdx':       'BlockdevOptionsGenericFormat',
> >        'vmdk':       'BlockdevOptionsGenericCOWFormat',
> >-      'vpc':        'BlockdevOptionsGenericFormat'
> >+      'vpc':        'BlockdevOptionsGenericFormat',
> >+      'quorum':     'BlockdevOptionsQuorum'
> >    } }
> >  ##
> 
> As I've said before, I'd really like it if you could abandon all
> those functions for parsing the incoming options QDict in favor of
> qdict_extract_subqdict() and qdict_array_split() which will most
> likely do the job just fine; and, in fact, I have written
> qdict_array_split() with Quorum in mind. ;-)
> 
> Max
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH V10 12/13] quorum: Add quorum_open() and quorum_close().
  2014-02-02 23:09   ` Max Reitz
  2014-02-03 12:21     ` Benoît Canet
@ 2014-02-03 12:43     ` Benoît Canet
  1 sibling, 0 replies; 36+ messages in thread
From: Benoît Canet @ 2014-02-03 12:43 UTC (permalink / raw)
  To: Max Reitz; +Cc: Benoît Canet, kwolf, qemu-devel, stefanha

Le Monday 03 Feb 2014 à 00:09:28 (+0100), Max Reitz a écrit :
> On 28.01.2014 17:52, Benoît Canet wrote:
> >From: Benoît Canet <benoit@irqsave.net>
> >
> >Example of command line:
> >-drive if=virtio,file.driver=quorum,\
> >file.children.0.file.filename=1.raw,\
> >file.children.0.node-name=1.raw,\
> >file.children.0.driver=raw,\
> >file.children.1.file.filename=2.raw,\
> >file.children.1.node-name=2.raw,\
> >file.children.1.driver=raw,\
> >file.children.2.file.filename=3.raw,\
> >file.children.2.node-name=3.raw,\
> >file.children.2.driver=raw,\
> >file.vote_threshold=2
> >
> >file.blkverify=on with file.vote_threshold=2 and two files can be passed to
> >emulated blkverify.
> >
> >Signed-off-by: Benoit Canet <benoit@irqsave.net>
> >---
> >  block/quorum.c   | 308 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  qapi-schema.json |  21 +++-
> >  2 files changed, 328 insertions(+), 1 deletion(-)
> >
> >diff --git a/block/quorum.c b/block/quorum.c
> >index e7b2090..0c0d630 100644
> >--- a/block/quorum.c
> >+++ b/block/quorum.c
> >@@ -17,8 +17,12 @@
> >  #include <gnutls/crypto.h>
> >  #include "block/block_int.h"
> >  #include "qapi/qmp/qjson.h"
> >+#include "qapi/qmp/types.h"
> >+#include "qemu-common.h"
> >  #define HASH_LENGTH 32
> >+#define KEY_PREFIX "children."
> >+#define KEY_FILENAME_SUFFIX ".file.filename"
> >  /* This union holds a vote hash value */
> >  typedef union QuorumVoteValue {
> >@@ -702,12 +706,316 @@ static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
> >      return false;
> >  }
> 

As a general stance sorry to have missed this mail of review from october.
It was not done on purpose.

Best regards

Benoît

> Perhaps you could make use of qdict_extract_subqdict() and
> qdict_array_split() functions here...? That is,
> qdict_extract_subqdict(..., "children.") and then
> qdict_array_split() on the result?
> 
> >+static int quorum_match_key(const char *key,
> >+                            char **key_prefix)
> >+{
> >+    const char *start;
> >+    char *next;
> >+    unsigned long long idx;
> >+    int ret;
> >+
> >+    *key_prefix = NULL;
> >+
> >+    /* the following code is a translation of the following pseudo code:
> >+     *       match = key.match('(^children\.(\d+)\.)$suffix)
> >+     *       if not match:
> >+     *           return -1;
> >+     *       key_prefix = match.outer_match()
> >+     *       idx = match.inner_match()
> >+     *
> >+     * note: we also match the .file suffix to avoid futher checkings
> >+     */
> >+
> >+    /* this key is not a child */
> >+    if (strncmp(key, KEY_PREFIX, strlen(KEY_PREFIX))) {
> >+        return -1;
> >+    }
> >+
> >+    /* take first char after prefix */
> >+    start = key + strlen(KEY_PREFIX);
> >+
> >+    /* if the string end here -> scan fail */
> >+    if (start[0] == '\0') {
> >+        return -1;
> >+    }
> >+
> >+    /* try to retrieve the index */
> >+    ret = parse_uint(start, &idx, &next, 10);
> >+
> >+    /* no int found -> scan fail */
> >+    if (ret < 0) {
> >+        return -1;
> >+    }
> >+
> >+    /* we are taking a reference via QMP */
> >+    if (next - key == strlen(key)) {
> 
> "if (!*next)" would probably accomplish the same (or "if next[0] ==
> '\0')" to keep your coding style) without having to call strlen().
> 
> >+        *key_prefix = g_strdup(key);
> >+        return idx;
> >+    }
> >+
> >+    /* match the suffix to avoid matching the same idx
> >+     * multiple times and be required to do more checks later
> >+     */
> >+    if (strncmp(next, KEY_FILENAME_SUFFIX, strlen(KEY_FILENAME_SUFFIX))) {
> 
> As stated in my review from October, you may use sizeof() to avoid strlen().
> 
> >+        return -1;
> >+    }
> >+
> >+    /* do not include '.' */
> >+    int len = next - key;
> >+    *key_prefix = g_strndup(key, len);
> >+
> >+    return idx;
> >+}
> >+
> >+static QDict *quorum_get_children_idx(const QDict *options)
> >+{
> >+    const QDictEntry *ent;
> >+    QDict *result;
> >+    char *key_prefix;
> >+    int idx;
> >+
> >+    result = qdict_new();
> >+
> >+    for (ent = qdict_first(options); ent; ent = qdict_next(options, ent)) {
> >+        const char *key = qdict_entry_key(ent);
> >+        idx = quorum_match_key(key,
> >+                               &key_prefix);
> >+
> >+        /* if the result zero or positive we got a key */
> >+        if (idx < 0) {
> >+            continue;
> >+        }
> >+
> >+        qdict_put(result, key_prefix, qint_from_int(idx));
> >+    }
> >+
> >+    return result;
> >+}
> >+
> >+static int quorum_fill_validation_array(bool *array,
> >+                                        const QDict *dict,
> >+                                        int total,
> >+                                        Error **errp)
> >+{
> >+    const QDictEntry *ent;
> >+
> >+    /* fill the checking array with children indexes */
> >+    for (ent = qdict_first(dict); ent; ent = qdict_next(dict, ent)) {
> >+        const char *key = qdict_entry_key(ent);
> >+        int idx = qdict_get_int(dict, key);
> >+
> >+        if (idx < 0 || idx >= total) {
> >+            error_setg(errp,
> >+                "Children index must be between 0 and children count -1");
> 
> Quote from October: I'd prefer "- 1" instead of "-1" (or even "Child
> index must be an unsigned integer smaller than the children count").
> 
> >+            return -ERANGE;
> >+        }
> >+
> >+        array[idx] = true;
> >+    }
> >+
> >+    return 0;
> >+}
> >+
> >+static int quorum_valid_indexes(const bool *array, int total, Error **errp)
> >+{
> >+    int i;
> >+
> >+    for (i = 0; i < total; i++) {
> >+        if (array[i] == true) {
> >+            continue;
> >+        }
> >+
> >+        error_setg(errp,
> >+            "All child indexes between 0 and children count -1  must be "
> >+            " used");
> 
> Again, I'd prefer "- 1"; then, there are two spaces to the left of
> "must" and two spaces after "be".
> 
> >+        return -ERANGE;
> >+    }
> >+
> >+    return 0;
> >+}
> >+
> >+static int quorum_valid_children_indexes(const QDict *dict,
> >+                                         int total,
> >+                                         Error **errp)
> >+{
> >+    bool *array;
> >+    int ret = 0;;
> >+
> >+    /* allocate indexes checking array and put false in it */
> >+    array = g_new0(bool, total);
> >+
> >+    ret = quorum_fill_validation_array(array, dict, total, errp);
> >+    if (ret < 0) {
> >+        goto free_exit;
> >+    }
> >+
> >+    ret = quorum_valid_indexes(array, total, errp);
> >+free_exit:
> >+    g_free(array);
> >+    return ret;
> >+}
> >+
> >+static int quorum_valid_threshold(int threshold,
> >+                                  int total,
> >+                                  Error **errp)
> >+{
> >+
> >+    if (threshold < 1) {
> >+        error_set(errp, QERR_INVALID_PARAMETER_VALUE,
> >+                  "vote-threshold", "value >= 1");
> >+        return -ERANGE;
> >+    }
> >+
> >+    if (threshold > total) {
> >+        error_setg(errp, "threshold <= children count must be true");
> 
> Quote from October: Well, while this might technically be true, I'd
> rather go for "threshold may not exceed children count" instead. ;-)
> 
> >+        return -ERANGE;
> >+    }
> >+
> >+    return 0;
> >+}
> >+
> >+static int quorum_open(BlockDriverState *bs,
> >+                       QDict *options,
> >+                       int flags,
> >+                       Error **errp)
> >+{
> >+    BDRVQuorumState *s = bs->opaque;
> >+    Error *local_err = NULL;
> >+    const QDictEntry *ent;
> >+    QDict *idx_dict;
> >+    bool *opened;
> >+    const char *value;
> >+    char *next;
> >+    int i;
> >+    int ret = 0;
> >+
> >+    /* get a dict of children indexes for validation */
> >+    idx_dict = quorum_get_children_idx(options);
> >+
> >+    /* count how many different children indexes are present and validate */
> >+    s->total = qdict_size(idx_dict);
> >+    if (s->total < 2) {
> >+        error_setg(&local_err,
> >+                   "Number of provided children must be greater than 1");
> >+        ret = -EINVAL;
> >+        goto exit;
> >+    }
> >+
> >+    /* validate that the set of index is coherent */
> >+    ret = quorum_valid_children_indexes(idx_dict, s->total, &local_err);
> >+    if (ret < 0) {
> >+        goto exit;
> >+    }
> >+
> >+    ret = qdict_get_try_int(options, "vote-threshold", -1);
> >+    /* from QMP */
> >+    if (ret != -1) {
> >+        qdict_del(options, "vote-threshold");
> >+        s->threshold = ret;
> >+    /* from command line */
> >+    } else {
> >+        /* retrieve the threshold option from the command line */
> >+        value = qdict_get_try_str(options, "vote_threshold");
> >+        if (!value) {
> >+            error_setg(&local_err,
> >+                       "vote_threshold must be provided");
> >+            ret = -EINVAL;
> >+            goto exit;
> >+        }
> >+        qdict_del(options, "vote_threshold");
> >+
> >+        ret = parse_uint(value, (unsigned long long *) &s->threshold, &next, 10);
> 
> I don't like this cast at all...
> 
> >+
> >+        /* no int found -> scan fail */
> >+        if (ret < 0) {
> >+            error_setg(&local_err,
> >+                       "invalid voter_threshold specified");
> 
> *vote_threshold
> 
> >+            ret = -EINVAL;
> >+            goto exit;
> >+        }
> >+    }
> >+
> >+    /* and validate it againts s->total */
> 
> Still *against
> 
> >+    ret = quorum_valid_threshold(s->threshold, s->total, &local_err);
> >+    if (ret < 0) {
> >+        goto exit;
> >+    }
> >+
> >+    /* is the driver in blkverify mode */
> >+    value = qdict_get_try_str(options, "blkverify");
> >+    if (value && !strcmp(value, "on")  &&
> >+        s->total == 2 && s->threshold == 2) {
> >+        s->is_blkverify = true;
> >+    }
> 
> Quote from October: If the user specifies anything different from
> "on" or if he does and s->total or s->threshold is not 2, quorum
> will silently work in non-blkverify mode without telling the user.
> So I'd emit a message here if blkverify has been specified and its
> value is different from "off" but s->is_blkverify remains false
> (i.e., “else if (value && strcmp(value, "off")) { fprintf(stderr,
> "[Some warning]"); }”).
> 
> >+    qdict_del(options, "blkverify");
> >+
> >+    /* allocate the children BlockDriverState array */
> >+    s->bs = g_new0(BlockDriverState *, s->total);
> >+    opened = g_new0(bool, s->total);
> >+
> >+    /* open children bs */
> >+    for (ent = qdict_first(idx_dict);
> >+         ent; ent = qdict_next(idx_dict, ent)) {
> 
> This condition fits on a single line.
> 
> >+        const char *key = qdict_entry_key(ent);
> >+        int idx = qdict_get_int(idx_dict, key);
> >+        ret = bdrv_open_image(&s->bs[idx],
> >+                              NULL,
> >+                              options,
> >+                              key,
> >+                              flags,
> >+                              false,
> >+                              &local_err);
> 
> This takes two, but definitely not seven.
> 
> >+        if (ret < 0) {
> >+            goto close_exit;
> >+        }
> >+        opened[idx] = true;
> >+    }
> >+
> >+    g_free(opened);
> >+    goto exit;
> >+
> >+close_exit:
> >+    /* cleanup on error */
> >+    for (i = 0; i < s->total; i++) {
> >+        if (!opened[i]) {
> >+            continue;
> >+        }
> >+        bdrv_close(s->bs[i]);
> 
> You'll probably want bdrv_unref() here.
> 
> >+    }
> >+    g_free(s->bs);
> >+    g_free(opened);
> >+exit:
> >+    /* propagate error */
> >+    if (error_is_set(&local_err)) {
> >+        error_propagate(errp, local_err);
> >+    }
> >+    return ret;
> >+}
> >+
> >+static void quorum_close(BlockDriverState *bs)
> >+{
> >+    BDRVQuorumState *s = bs->opaque;
> >+    int i;
> >+
> >+    for (i = 0; i < s->total; i++) {
> >+        /* Ensure writes reach stable storage */
> >+        bdrv_flush(s->bs[i]);
> >+        /* close manually because we'll free s->bs */
> >+        bdrv_close(s->bs[i]);
> 
> Again, bdrv_unref().
> 
> >+    }
> >+
> >+    g_free(s->bs);
> >+}
> >+
> >  static BlockDriver bdrv_quorum = {
> >      .format_name        = "quorum",
> >      .protocol_name      = "quorum",
> >      .instance_size      = sizeof(BDRVQuorumState),
> >+    .bdrv_file_open     = quorum_open,
> >+    .bdrv_close         = quorum_close,
> >+
> >      .bdrv_co_flush_to_disk = quorum_co_flush,
> >      .bdrv_getlength     = quorum_getlength,
> >diff --git a/qapi-schema.json b/qapi-schema.json
> >index 05ced9d..903a3a0 100644
> >--- a/qapi-schema.json
> >+++ b/qapi-schema.json
> >@@ -4352,6 +4352,24 @@
> >              'raw': 'BlockdevRef' } }
> >  ##
> >+# @BlockdevOptionsQuorum
> >+#
> >+# Driver specific block device options for Quorum
> >+#
> >+# @blkverify:      #optional true if the driver must print content mismatch
> >+#
> >+# @children:       the children block device to use
> >+#
> >+# @vote_threshold: the vote limit under which a read will fail
> >+#
> >+# Since: 2.0
> >+##
> >+{ 'type': 'BlockdevOptionsQuorum',
> >+  'data': { '*blkverify': 'bool',
> >+            'children': [ 'BlockdevRef' ],
> >+            'vote-threshold': 'int' } }
> >+
> >+##
> >  # @BlockdevOptions
> >  #
> >  # Options for creating a block device.
> >@@ -4389,7 +4407,8 @@
> >        'vdi':        'BlockdevOptionsGenericFormat',
> >        'vhdx':       'BlockdevOptionsGenericFormat',
> >        'vmdk':       'BlockdevOptionsGenericCOWFormat',
> >-      'vpc':        'BlockdevOptionsGenericFormat'
> >+      'vpc':        'BlockdevOptionsGenericFormat',
> >+      'quorum':     'BlockdevOptionsQuorum'
> >    } }
> >  ##
> 
> As I've said before, I'd really like it if you could abandon all
> those functions for parsing the incoming options QDict in favor of
> qdict_extract_subqdict() and qdict_array_split() which will most
> likely do the job just fine; and, in fact, I have written
> qdict_array_split() with Quorum in mind. ;-)
> 
> Max
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH V10 12/13] quorum: Add quorum_open() and quorum_close().
  2014-02-03 12:21     ` Benoît Canet
@ 2014-02-03 12:49       ` Benoît Canet
  0 siblings, 0 replies; 36+ messages in thread
From: Benoît Canet @ 2014-02-03 12:49 UTC (permalink / raw)
  To: Benoît Canet; +Cc: kwolf, qemu-devel, stefanha, Max Reitz

Le Monday 03 Feb 2014 à 13:21:18 (+0100), Benoît Canet a écrit :
> Le Monday 03 Feb 2014 à 00:09:28 (+0100), Max Reitz a écrit :
> > On 28.01.2014 17:52, Benoît Canet wrote:
> > >From: Benoît Canet <benoit@irqsave.net>
> > >
> > >Example of command line:
> > >-drive if=virtio,file.driver=quorum,\
> > >file.children.0.file.filename=1.raw,\
> > >file.children.0.node-name=1.raw,\
> > >file.children.0.driver=raw,\
> > >file.children.1.file.filename=2.raw,\
> > >file.children.1.node-name=2.raw,\
> > >file.children.1.driver=raw,\
> > >file.children.2.file.filename=3.raw,\
> > >file.children.2.node-name=3.raw,\
> > >file.children.2.driver=raw,\
> > >file.vote_threshold=2
> > >
> > >file.blkverify=on with file.vote_threshold=2 and two files can be passed to
> > >emulated blkverify.
> > >
> > >Signed-off-by: Benoit Canet <benoit@irqsave.net>
> > >---
> > >  block/quorum.c   | 308 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  qapi-schema.json |  21 +++-
> > >  2 files changed, 328 insertions(+), 1 deletion(-)
> > >
> > >diff --git a/block/quorum.c b/block/quorum.c
> > >index e7b2090..0c0d630 100644
> > >--- a/block/quorum.c
> > >+++ b/block/quorum.c
> > >@@ -17,8 +17,12 @@
> > >  #include <gnutls/crypto.h>
> > >  #include "block/block_int.h"
> > >  #include "qapi/qmp/qjson.h"
> > >+#include "qapi/qmp/types.h"
> > >+#include "qemu-common.h"
> > >  #define HASH_LENGTH 32
> > >+#define KEY_PREFIX "children."
> > >+#define KEY_FILENAME_SUFFIX ".file.filename"
> > >  /* This union holds a vote hash value */
> > >  typedef union QuorumVoteValue {
> > >@@ -702,12 +706,316 @@ static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
> > >      return false;
> > >  }
> > 
> > Perhaps you could make use of qdict_extract_subqdict() and
> > qdict_array_split() functions here...? That is,
> > qdict_extract_subqdict(..., "children.") and then
> > qdict_array_split() on the result?
> > 
> > >+static int quorum_match_key(const char *key,
> > >+                            char **key_prefix)
> > >+{
> > >+    const char *start;
> > >+    char *next;
> > >+    unsigned long long idx;
> > >+    int ret;
> > >+
> > >+    *key_prefix = NULL;
> > >+
> > >+    /* the following code is a translation of the following pseudo code:
> > >+     *       match = key.match('(^children\.(\d+)\.)$suffix)
> > >+     *       if not match:
> > >+     *           return -1;
> > >+     *       key_prefix = match.outer_match()
> > >+     *       idx = match.inner_match()
> > >+     *
> > >+     * note: we also match the .file suffix to avoid futher checkings
> > >+     */
> > >+
> > >+    /* this key is not a child */
> > >+    if (strncmp(key, KEY_PREFIX, strlen(KEY_PREFIX))) {
> > >+        return -1;
> > >+    }
> > >+
> > >+    /* take first char after prefix */
> > >+    start = key + strlen(KEY_PREFIX);
> > >+
> > >+    /* if the string end here -> scan fail */
> > >+    if (start[0] == '\0') {
> > >+        return -1;
> > >+    }
> > >+
> > >+    /* try to retrieve the index */
> > >+    ret = parse_uint(start, &idx, &next, 10);
> > >+
> > >+    /* no int found -> scan fail */
> > >+    if (ret < 0) {
> > >+        return -1;
> > >+    }
> > >+
> > >+    /* we are taking a reference via QMP */
> > >+    if (next - key == strlen(key)) {
> > 
> > "if (!*next)" would probably accomplish the same (or "if next[0] ==
> > '\0')" to keep your coding style) without having to call strlen().
> > 
> > >+        *key_prefix = g_strdup(key);
> > >+        return idx;
> > >+    }
> > >+
> > >+    /* match the suffix to avoid matching the same idx
> > >+     * multiple times and be required to do more checks later
> > >+     */
> > >+    if (strncmp(next, KEY_FILENAME_SUFFIX, strlen(KEY_FILENAME_SUFFIX))) {
> > 
> > As stated in my review from October, you may use sizeof() to avoid strlen().
> > 
> > >+        return -1;
> > >+    }
> > >+
> > >+    /* do not include '.' */
> > >+    int len = next - key;
> > >+    *key_prefix = g_strndup(key, len);
> > >+
> > >+    return idx;
> > >+}
> > >+
> > >+static QDict *quorum_get_children_idx(const QDict *options)
> > >+{
> > >+    const QDictEntry *ent;
> > >+    QDict *result;
> > >+    char *key_prefix;
> > >+    int idx;
> > >+
> > >+    result = qdict_new();
> > >+
> > >+    for (ent = qdict_first(options); ent; ent = qdict_next(options, ent)) {
> > >+        const char *key = qdict_entry_key(ent);
> > >+        idx = quorum_match_key(key,
> > >+                               &key_prefix);
> > >+
> > >+        /* if the result zero or positive we got a key */
> > >+        if (idx < 0) {
> > >+            continue;
> > >+        }
> > >+
> > >+        qdict_put(result, key_prefix, qint_from_int(idx));
> > >+    }
> > >+
> > >+    return result;
> > >+}
> > >+
> > >+static int quorum_fill_validation_array(bool *array,
> > >+                                        const QDict *dict,
> > >+                                        int total,
> > >+                                        Error **errp)
> > >+{
> > >+    const QDictEntry *ent;
> > >+
> > >+    /* fill the checking array with children indexes */
> > >+    for (ent = qdict_first(dict); ent; ent = qdict_next(dict, ent)) {
> > >+        const char *key = qdict_entry_key(ent);
> > >+        int idx = qdict_get_int(dict, key);
> > >+
> > >+        if (idx < 0 || idx >= total) {
> > >+            error_setg(errp,
> > >+                "Children index must be between 0 and children count -1");
> > 
> > Quote from October: I'd prefer "- 1" instead of "-1" (or even "Child
> > index must be an unsigned integer smaller than the children count").
> > 
> > >+            return -ERANGE;
> > >+        }
> > >+
> > >+        array[idx] = true;
> > >+    }
> > >+
> > >+    return 0;
> > >+}
> > >+
> > >+static int quorum_valid_indexes(const bool *array, int total, Error **errp)
> > >+{
> > >+    int i;
> > >+
> > >+    for (i = 0; i < total; i++) {
> > >+        if (array[i] == true) {
> > >+            continue;
> > >+        }
> > >+
> > >+        error_setg(errp,
> > >+            "All child indexes between 0 and children count -1  must be "
> > >+            " used");
> > 
> > Again, I'd prefer "- 1"; then, there are two spaces to the left of
> > "must" and two spaces after "be".
> > 
> > >+        return -ERANGE;
> > >+    }
> > >+
> > >+    return 0;
> > >+}
> > >+
> > >+static int quorum_valid_children_indexes(const QDict *dict,
> > >+                                         int total,
> > >+                                         Error **errp)
> > >+{
> > >+    bool *array;
> > >+    int ret = 0;;
> > >+
> > >+    /* allocate indexes checking array and put false in it */
> > >+    array = g_new0(bool, total);
> > >+
> > >+    ret = quorum_fill_validation_array(array, dict, total, errp);
> > >+    if (ret < 0) {
> > >+        goto free_exit;
> > >+    }
> > >+
> > >+    ret = quorum_valid_indexes(array, total, errp);
> > >+free_exit:
> > >+    g_free(array);
> > >+    return ret;
> > >+}
> > >+
> > >+static int quorum_valid_threshold(int threshold,
> > >+                                  int total,
> > >+                                  Error **errp)
> > >+{
> > >+
> > >+    if (threshold < 1) {
> > >+        error_set(errp, QERR_INVALID_PARAMETER_VALUE,
> > >+                  "vote-threshold", "value >= 1");
> > >+        return -ERANGE;
> > >+    }
> > >+
> > >+    if (threshold > total) {
> > >+        error_setg(errp, "threshold <= children count must be true");
> > 
> > Quote from October: Well, while this might technically be true, I'd
> > rather go for "threshold may not exceed children count" instead. ;-)
> > 
> > >+        return -ERANGE;
> > >+    }
> > >+
> > >+    return 0;
> > >+}
> > >+
> > >+static int quorum_open(BlockDriverState *bs,
> > >+                       QDict *options,
> > >+                       int flags,
> > >+                       Error **errp)
> > >+{
> > >+    BDRVQuorumState *s = bs->opaque;
> > >+    Error *local_err = NULL;
> > >+    const QDictEntry *ent;
> > >+    QDict *idx_dict;
> > >+    bool *opened;
> > >+    const char *value;
> > >+    char *next;
> > >+    int i;
> > >+    int ret = 0;
> > >+
> > >+    /* get a dict of children indexes for validation */
> > >+    idx_dict = quorum_get_children_idx(options);
> > >+
> > >+    /* count how many different children indexes are present and validate */
> > >+    s->total = qdict_size(idx_dict);
> > >+    if (s->total < 2) {
> > >+        error_setg(&local_err,
> > >+                   "Number of provided children must be greater than 1");
> > >+        ret = -EINVAL;
> > >+        goto exit;
> > >+    }
> > >+
> > >+    /* validate that the set of index is coherent */
> > >+    ret = quorum_valid_children_indexes(idx_dict, s->total, &local_err);
> > >+    if (ret < 0) {
> > >+        goto exit;
> > >+    }
> > >+
> > >+    ret = qdict_get_try_int(options, "vote-threshold", -1);
> > >+    /* from QMP */
> > >+    if (ret != -1) {
> > >+        qdict_del(options, "vote-threshold");
> > >+        s->threshold = ret;
> > >+    /* from command line */
> > >+    } else {
> > >+        /* retrieve the threshold option from the command line */
> > >+        value = qdict_get_try_str(options, "vote_threshold");
> > >+        if (!value) {
> > >+            error_setg(&local_err,
> > >+                       "vote_threshold must be provided");
> > >+            ret = -EINVAL;
> > >+            goto exit;
> > >+        }
> > >+        qdict_del(options, "vote_threshold");
> > >+
> > >+        ret = parse_uint(value, (unsigned long long *) &s->threshold, &next, 10);
> > 
> > I don't like this cast at all...
> > 
> > >+
> > >+        /* no int found -> scan fail */
> > >+        if (ret < 0) {
> > >+            error_setg(&local_err,
> > >+                       "invalid voter_threshold specified");
> > 
> > *vote_threshold
> > 
> > >+            ret = -EINVAL;
> > >+            goto exit;
> > >+        }
> > >+    }
> > >+
> > >+    /* and validate it againts s->total */
> > 
> > Still *against
> > 
> > >+    ret = quorum_valid_threshold(s->threshold, s->total, &local_err);
> > >+    if (ret < 0) {
> > >+        goto exit;
> > >+    }
> > >+
> > >+    /* is the driver in blkverify mode */
> > >+    value = qdict_get_try_str(options, "blkverify");
> > >+    if (value && !strcmp(value, "on")  &&
> > >+        s->total == 2 && s->threshold == 2) {
> > >+        s->is_blkverify = true;
> > >+    }
> > 
> > Quote from October: If the user specifies anything different from
> > "on" or if he does and s->total or s->threshold is not 2, quorum
> > will silently work in non-blkverify mode without telling the user.
> > So I'd emit a message here if blkverify has been specified and its
> > value is different from "off" but s->is_blkverify remains false
> > (i.e., “else if (value && strcmp(value, "off")) { fprintf(stderr,
> > "[Some warning]"); }”).
> > 
> > >+    qdict_del(options, "blkverify");
> > >+
> > >+    /* allocate the children BlockDriverState array */
> > >+    s->bs = g_new0(BlockDriverState *, s->total);
> > >+    opened = g_new0(bool, s->total);
> > >+
> > >+    /* open children bs */
> > >+    for (ent = qdict_first(idx_dict);
> > >+         ent; ent = qdict_next(idx_dict, ent)) {
> > 
> > This condition fits on a single line.
> > 
> > >+        const char *key = qdict_entry_key(ent);
> > >+        int idx = qdict_get_int(idx_dict, key);
> > >+        ret = bdrv_open_image(&s->bs[idx],
> > >+                              NULL,
> > >+                              options,
> > >+                              key,
> > >+                              flags,
> > >+                              false,
> > >+                              &local_err);
> > 
> > This takes two, but definitely not seven.
> > 
> > >+        if (ret < 0) {
> > >+            goto close_exit;
> > >+        }
> > >+        opened[idx] = true;
> > >+    }
> > >+
> > >+    g_free(opened);
> > >+    goto exit;
> > >+
> > >+close_exit:
> > >+    /* cleanup on error */
> > >+    for (i = 0; i < s->total; i++) {
> > >+        if (!opened[i]) {
> > >+            continue;
> > >+        }
> > >+        bdrv_close(s->bs[i]);
> > 
> > You'll probably want bdrv_unref() here.
> 
> I don't think so because to simplify memory management s->bs is an array
> containing all the allocated bs and is freed at once.
> So should bdrv_unref still be used ?
I double checked you are right :)

> 
> > 
> > >+    }
> > >+    g_free(s->bs);
> > >+    g_free(opened);
> > >+exit:
> > >+    /* propagate error */
> > >+    if (error_is_set(&local_err)) {
> > >+        error_propagate(errp, local_err);
> > >+    }
> > >+    return ret;
> > >+}
> > >+
> > >+static void quorum_close(BlockDriverState *bs)
> > >+{
> > >+    BDRVQuorumState *s = bs->opaque;
> > >+    int i;
> > >+
> > >+    for (i = 0; i < s->total; i++) {
> > >+        /* Ensure writes reach stable storage */
> > >+        bdrv_flush(s->bs[i]);
> > >+        /* close manually because we'll free s->bs */
> > >+        bdrv_close(s->bs[i]);
> > 
> > Again, bdrv_unref().
> > 
> > >+    }
> > >+
> > >+    g_free(s->bs);
> > >+}
> > >+
> > >  static BlockDriver bdrv_quorum = {
> > >      .format_name        = "quorum",
> > >      .protocol_name      = "quorum",
> > >      .instance_size      = sizeof(BDRVQuorumState),
> > >+    .bdrv_file_open     = quorum_open,
> > >+    .bdrv_close         = quorum_close,
> > >+
> > >      .bdrv_co_flush_to_disk = quorum_co_flush,
> > >      .bdrv_getlength     = quorum_getlength,
> > >diff --git a/qapi-schema.json b/qapi-schema.json
> > >index 05ced9d..903a3a0 100644
> > >--- a/qapi-schema.json
> > >+++ b/qapi-schema.json
> > >@@ -4352,6 +4352,24 @@
> > >              'raw': 'BlockdevRef' } }
> > >  ##
> > >+# @BlockdevOptionsQuorum
> > >+#
> > >+# Driver specific block device options for Quorum
> > >+#
> > >+# @blkverify:      #optional true if the driver must print content mismatch
> > >+#
> > >+# @children:       the children block device to use
> > >+#
> > >+# @vote_threshold: the vote limit under which a read will fail
> > >+#
> > >+# Since: 2.0
> > >+##
> > >+{ 'type': 'BlockdevOptionsQuorum',
> > >+  'data': { '*blkverify': 'bool',
> > >+            'children': [ 'BlockdevRef' ],
> > >+            'vote-threshold': 'int' } }
> > >+
> > >+##
> > >  # @BlockdevOptions
> > >  #
> > >  # Options for creating a block device.
> > >@@ -4389,7 +4407,8 @@
> > >        'vdi':        'BlockdevOptionsGenericFormat',
> > >        'vhdx':       'BlockdevOptionsGenericFormat',
> > >        'vmdk':       'BlockdevOptionsGenericCOWFormat',
> > >-      'vpc':        'BlockdevOptionsGenericFormat'
> > >+      'vpc':        'BlockdevOptionsGenericFormat',
> > >+      'quorum':     'BlockdevOptionsQuorum'
> > >    } }
> > >  ##
> > 
> > As I've said before, I'd really like it if you could abandon all
> > those functions for parsing the incoming options QDict in favor of
> > qdict_extract_subqdict() and qdict_array_split() which will most
> > likely do the job just fine; and, in fact, I have written
> > qdict_array_split() with Quorum in mind. ;-)
> > 
> > Max
> > 
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH V10 06/13] quorum: Add quorum mechanism.
  2014-02-03 11:44     ` Benoît Canet
@ 2014-02-03 19:03       ` Max Reitz
  0 siblings, 0 replies; 36+ messages in thread
From: Max Reitz @ 2014-02-03 19:03 UTC (permalink / raw)
  To: Benoît Canet; +Cc: kwolf, qemu-devel, stefanha

On 03.02.2014 12:44, Benoît Canet wrote:
> Le Saturday 01 Feb 2014 à 00:04:01 (+0100), Max Reitz a écrit :
>> On 28.01.2014 17:52, Benoît Canet wrote:
>>> From: Benoît Canet <benoit@irqsave.net>
>>>
>>> Use gnutls's SHA-256 to compare versions.
>>>
>>> Signed-off-by: Benoit Canet <benoit@irqsave.net>
>>> ---
>>>   block/Makefile.objs       |   2 +-
>>>   block/quorum.c            | 333 +++++++++++++++++++++++++++++++++++++++++++++-
>>>   configure                 |  36 +++++
>>>   docs/qmp/qmp-events.txt   |  33 +++++
>>>   include/monitor/monitor.h |   2 +
>>>   monitor.c                 |   2 +
>>>   6 files changed, 406 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/block/Makefile.objs b/block/Makefile.objs
>>> index a2650b9..4ca9d43 100644
>>> --- a/block/Makefile.objs
>>> +++ b/block/Makefile.objs
>>> @@ -3,7 +3,7 @@ block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-c
>>>   block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>>>   block-obj-y += qed-check.o
>>>   block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o vhdx-log.o
>>> -block-obj-y += quorum.o
>>> +block-obj-$(CONFIG_QUORUM) += quorum.o
>>>   block-obj-y += parallels.o blkdebug.o blkverify.o
>>>   block-obj-y += snapshot.o qapi.o
>>>   block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
>>> diff --git a/block/quorum.c b/block/quorum.c
>>> index 5bf37b3..c319719 100644
>>> --- a/block/quorum.c
>>> +++ b/block/quorum.c
>>> @@ -13,7 +13,43 @@
>>>    * See the COPYING file in the top-level directory.
>>>    */
>>> +#include <gnutls/gnutls.h>
>>> +#include <gnutls/crypto.h>
>>>   #include "block/block_int.h"
>>> +#include "qapi/qmp/qjson.h"
>>> +
>>> +#define HASH_LENGTH 32
>>> +
>>> +/* This union holds a vote hash value */
>>> +typedef union QuorumVoteValue {
>>> +    char h[HASH_LENGTH];       /* SHA-256 hash */
>>> +    int64_t l;                 /* simpler 64 bits hash */
>>> +} QuorumVoteValue;
>>> +
>>> +/* A vote item */
>>> +typedef struct QuorumVoteItem {
>>> +    int index;
>>> +    QLIST_ENTRY(QuorumVoteItem) next;
>>> +} QuorumVoteItem;
>>> +
>>> +/* this structure is a vote version. A version is the set of votes sharing the
>>> + * same vote value.
>>> + * The set of votes will be tracked with the items field and its cardinality is
>>> + * vote_count.
>>> + */
>>> +typedef struct QuorumVoteVersion {
>>> +    QuorumVoteValue value;
>>> +    int index;
>>> +    int vote_count;
>>> +    QLIST_HEAD(, QuorumVoteItem) items;
>>> +    QLIST_ENTRY(QuorumVoteVersion) next;
>>> +} QuorumVoteVersion;
>>> +
>>> +/* this structure holds a group of vote versions together */
>>> +typedef struct QuorumVotes {
>>> +    QLIST_HEAD(, QuorumVoteVersion) vote_list;
>>> +    int (*compare)(QuorumVoteValue *a, QuorumVoteValue *b);
>>> +} QuorumVotes;
>>>   /* the following structure holds the state of one quorum instance */
>>>   typedef struct {
>>> @@ -60,10 +96,14 @@ struct QuorumAIOCB {
>>>       int success_count;          /* number of successfully completed AIOCB */
>>>       bool *finished;             /* completion signal for cancel */
>>> +    QuorumVotes votes;
>>> +
>>>       bool is_read;
>>>       int vote_ret;
>>>   };
>>> +static void quorum_vote(QuorumAIOCB *acb);
>>> +
>>>   static void quorum_aio_cancel(BlockDriverAIOCB *blockacb)
>>>   {
>>>       QuorumAIOCB *acb = container_of(blockacb, QuorumAIOCB, common);
>>> @@ -111,6 +151,10 @@ static void quorum_aio_finalize(QuorumAIOCB *acb)
>>>           acb->aios[i].ret = 0;
>>>       }
>>> +    if (acb->vote_ret) {
>>> +        ret = acb->vote_ret;
>>> +    }
>> Hm, this makes the vote_ret take precedence over other errors
>> returned by the children. If they differ (and both are not 0), we
>> can choose between returning either (vote_ret or “real” errors
>> reported by the child devices). Both are errors, so I don't see any
>> natural precedence. However, generally, vote_ret will contain a more
>> generic error code (i.e., -EIO), thus I could imagine the other
>> error code reported by the child device to be more appropriate, as
>> it might contain more useful information.
>>
>> But, well, on the other had, Quorum is designed for hiding errors
>> reported by a minority of child devices; therefore, hiding such
>> errors in this case as well is probably just consequent.
> Yes the general idea of quorum is to hide errors as long as the majority is reached.
> For the case where too many children errors occured to be able to do a vote
> there is:
>      ret = s->threshold <= acb->success_count ? 0 : quorum_get_first_error(acb);
> Kevin asked me do return the first error instead of -EIO.

That's right, but due to this condition here, that first error will be 
overwritten with acb->vote_ret (which will be -EIO, most probably).

Max

>> I'll leave this up to you; I'm fine with it as it is and I'd be fine
>> if (for whatever reason) you were to change it. :-)
>>
>>> +
>>>       acb->common.cb(acb->common.opaque, ret);
>>>       if (acb->finished) {
>>>           *acb->finished = true;
>>> @@ -122,6 +166,11 @@ static void quorum_aio_finalize(QuorumAIOCB *acb)
>>>       qemu_aio_release(acb);
>>>   }
>>> +static int quorum_sha256_compare(QuorumVoteValue *a, QuorumVoteValue *b)
>>> +{
>>> +    return memcmp(a->h, b->h, HASH_LENGTH);
>>> +}
>>> +
>>>   static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
>>>                                      BlockDriverState *bs,
>>>                                      QEMUIOVector *qiov,
>>> @@ -141,6 +190,7 @@ static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
>>>       acb->count = 0;
>>>       acb->success_count = 0;
>>>       acb->finished = NULL;
>>> +    acb->votes.compare = quorum_sha256_compare;
>>>       acb->is_read = false;
>>>       acb->vote_ret = 0;
>>> @@ -170,9 +220,290 @@ static void quorum_aio_cb(void *opaque, int ret)
>>>           return;
>>>       }
>>> +    /* Do the vote on read */
>>> +    if (acb->is_read) {
>>> +        quorum_vote(acb);
>>> +    }
>>> +
>>>       quorum_aio_finalize(acb);
>>>   }
>>> +static void quorum_report_bad(QuorumAIOCB *acb, char *node_name)
>>> +{
>>> +    QObject *data;
>>> +    data = qobject_from_jsonf("{ 'node-name': \"%s\""
>>> +                              ", 'sector-num': %" PRId64
>>> +                              ", 'sectors-count': %i }",
>>> +                              node_name,
>>> +                              acb->sector_num,
>>> +                              acb->nb_sectors);
>>> +    monitor_protocol_event(QEVENT_QUORUM_REPORT_BAD, data);
>>> +    qobject_decref(data);
>>> +}
>>> +
>>> +static void quorum_report_failure(QuorumAIOCB *acb)
>>> +{
>>> +    QObject *data;
>>> +    data = qobject_from_jsonf("{ 'sector-num': %" PRId64
>>> +                              ", 'sectors-count': %i }",
>>> +                              acb->sector_num,
>>> +                              acb->nb_sectors);
>>> +    monitor_protocol_event(QEVENT_QUORUM_FAILURE, data);
>>> +    qobject_decref(data);
>>> +}
>>> +
>>> +static void quorum_report_bad_versions(BDRVQuorumState *s,
>>> +                                       QuorumAIOCB *acb,
>>> +                                       QuorumVoteValue *value)
>>> +{
>>> +    QuorumVoteVersion *version;
>>> +    QuorumVoteItem *item;
>>> +
>>> +    QLIST_FOREACH(version, &acb->votes.vote_list, next) {
>>> +        if (!acb->votes.compare(&version->value, value)) {
>>> +            continue;
>>> +        }
>>> +        QLIST_FOREACH(item, &version->items, next) {
>>> +            quorum_report_bad(acb, s->bs[item->index]->node_name);
>>> +        }
>>> +    }
>>> +}
>>> +
>>> +static void quorum_copy_qiov(QEMUIOVector *dest, QEMUIOVector *source)
>>> +{
>>> +    int i;
>>> +    assert(dest->niov == source->niov);
>>> +    assert(dest->size == source->size);
>>> +    for (i = 0; i < source->niov; i++) {
>>> +        assert(dest->iov[i].iov_len == source->iov[i].iov_len);
>>> +        memcpy(dest->iov[i].iov_base,
>>> +               source->iov[i].iov_base,
>>> +               source->iov[i].iov_len);
>>> +    }
>>> +}
>>> +
>>> +static void quorum_count_vote(QuorumVotes *votes,
>>> +                              QuorumVoteValue *value,
>>> +                              int index)
>>> +{
>>> +    QuorumVoteVersion *v = NULL, *version = NULL;
>>> +    QuorumVoteItem *item;
>>> +
>>> +    /* look if we have something with this hash */
>>> +    QLIST_FOREACH(v, &votes->vote_list, next) {
>>> +        if (!votes->compare(&v->value, value)) {
>>> +            version = v;
>>> +            break;
>>> +        }
>>> +    }
>>> +
>>> +    /* It's a version not yet in the list add it */
>>> +    if (!version) {
>>> +        version = g_new0(QuorumVoteVersion, 1);
>>> +        QLIST_INIT(&version->items);
>>> +        memcpy(&version->value, value, sizeof(version->value));
>>> +        version->index = index;
>>> +        version->vote_count = 0;
>>> +        QLIST_INSERT_HEAD(&votes->vote_list, version, next);
>>> +    }
>>> +
>>> +    version->vote_count++;
>>> +
>>> +    item = g_new0(QuorumVoteItem, 1);
>>> +    item->index = index;
>>> +    QLIST_INSERT_HEAD(&version->items, item, next);
>>> +}
>>> +
>>> +static void quorum_free_vote_list(QuorumVotes *votes)
>>> +{
>>> +    QuorumVoteVersion *version, *next_version;
>>> +    QuorumVoteItem *item, *next_item;
>>> +
>>> +    QLIST_FOREACH_SAFE(version, &votes->vote_list, next, next_version) {
>>> +        QLIST_REMOVE(version, next);
>>> +        QLIST_FOREACH_SAFE(item, &version->items, next, next_item) {
>>> +            QLIST_REMOVE(item, next);
>>> +            g_free(item);
>>> +        }
>>> +        g_free(version);
>>> +    }
>>> +}
>>> +
>>> +static int quorum_compute_hash(QuorumAIOCB *acb, int i, QuorumVoteValue *hash)
>>> +{
>>> +    int j, ret;
>>> +    gnutls_hash_hd_t dig;
>>> +    QEMUIOVector *qiov = &acb->aios[i].qiov;
>>> +
>>> +    ret = gnutls_hash_init(&dig, GNUTLS_DIG_SHA256);
>>> +
>>> +    if (ret < 0) {
>>> +        return ret;
>>> +    }
>>> +
>>> +    for (j = 0; j < qiov->niov; j++) {
>>> +        ret = gnutls_hash(dig, qiov->iov[j].iov_base, qiov->iov[j].iov_len);
>>> +        if (ret < 0) {
>>> +            break;
>>> +        }
>>> +    }
>>> +
>>> +    gnutls_hash_deinit(dig, (void *) hash);
>>> +    return ret;
>>> +}
>>> +
>>> +static QuorumVoteVersion *quorum_get_vote_winner(QuorumVotes *votes)
>>> +{
>>> +    int i = 0;
>>> +    QuorumVoteVersion *candidate, *winner = NULL;
>>> +
>>> +    QLIST_FOREACH(candidate, &votes->vote_list, next) {
>>> +        if (candidate->vote_count > i) {
>>> +            i = candidate->vote_count;
>>> +            winner = candidate;
>>> +        }
>>> +    }
>>> +
>>> +    return winner;
>>> +}
>>> +
>>> +static bool quorum_iovec_compare(QEMUIOVector *a, QEMUIOVector *b)
>>> +{
>>> +    int i;
>>> +    int result;
>>> +
>>> +    assert(a->niov == b->niov);
>>> +    for (i = 0; i < a->niov; i++) {
>>> +        assert(a->iov[i].iov_len == b->iov[i].iov_len);
>>> +        result = memcmp(a->iov[i].iov_base,
>>> +                        b->iov[i].iov_base,
>>> +                        a->iov[i].iov_len);
>>> +        if (result) {
>>> +            return false;
>>> +        }
>>> +    }
>>> +
>>> +    return true;
>>> +}
>>> +
>>> +static void GCC_FMT_ATTR(2, 3) quorum_err(QuorumAIOCB *acb,
>>> +                                          const char *fmt, ...)
>>> +{
>>> +    va_list ap;
>>> +
>>> +    va_start(ap, fmt);
>>> +    fprintf(stderr, "quorum: sector_num=%" PRId64 " nb_sectors=%d ",
>>> +            acb->sector_num, acb->nb_sectors);
>>> +    vfprintf(stderr, fmt, ap);
>>> +    fprintf(stderr, "\n");
>>> +    va_end(ap);
>>> +    exit(1);
>>> +}
>>> +
>>> +static bool quorum_compare(QuorumAIOCB *acb,
>>> +                           QEMUIOVector *a,
>>> +                           QEMUIOVector *b)
>>> +{
>>> +    BDRVQuorumState *s = acb->bqs;
>>> +    ssize_t offset;
>>> +
>>> +    /* This driver will replace blkverify in this particular case */
>>> +    if (s->is_blkverify) {
>>> +        offset = qemu_iovec_compare(a, b);
>>> +        if (offset != -1) {
>>> +            quorum_err(acb, "contents mismatch in sector %" PRId64,
>>> +                       acb->sector_num +
>>> +                       (uint64_t)(offset / BDRV_SECTOR_SIZE));
>>> +        }
>>> +        return true;
>>> +    }
>>> +
>>> +    return quorum_iovec_compare(a, b);
>>> +}
>>> +
>>> +static void quorum_vote(QuorumAIOCB *acb)
>>> +{
>>> +    bool quorum = false;
>>> +    int i, j, ret;
>>> +    QuorumVoteValue hash;
>>> +    BDRVQuorumState *s = acb->bqs;
>>> +    QuorumVoteVersion *winner;
>>> +
>>> +    QLIST_INIT(&acb->votes.vote_list);
>>> +
>>> +    /* if we don't get any successful read use the first error code */
>>> +    if (!acb->success_count) {
>>> +        acb->vote_ret = acb->aios[0].ret;
>>> +        return;
>>> +    }
>>> +
>>> +    /* get the index of the first successful read (we are sure to find one) */
>>> +    for (i = 0; i < s->total; i++) {
>>> +        if (!acb->aios[i].ret) {
>>> +            break;
>>> +        }
>>> +    }
>>> +
>>> +    /* compare this read with all other successful read looking for quorum */
>>> +    for (j = i + 1; j < s->total; j++) {
>>> +        if (acb->aios[j].ret) {
>>> +            continue;
>>> +        }
>>> +        quorum = quorum_compare(acb, &acb->aios[i].qiov, &acb->aios[j].qiov);
>>> +        if (!quorum) {
>>> +            break;
>>> +       }
>>> +    }
>>> +
>>> +    /* Every successful read agrees and their count is higher or equal threshold
>>> +     * -> Quorum
>>> +     */
>>> +    if (quorum && acb->success_count >= s->threshold) {
>>> +        quorum_copy_qiov(acb->qiov, &acb->aios[i].qiov);
>>> +        return;
>>> +    }
>> Well, for quorum && acb->success_count < s->threshold, calculating
>> and comparing hashes doesn't help at all (since the vote winner will
>> only be able to get up to acb->success_count votes anyway). We
>> should probably catch this case in the first if condition in this
>> function (change it to “if (acb->success_count < s->threshold)”).
>>
>>> +
>>> +    /* compute hashs for each successful read, also store indexes */
>>> +    for (i = 0; i < s->total; i++) {
>>> +        if (acb->aios[i].ret) {
>>> +            continue;
>>> +        }
>>> +        ret = quorum_compute_hash(acb, i, &hash);
>>> +        /* if ever the hash computation failed */
>>> +        if (ret < 0) {
>>> +            acb->vote_ret = ret;
>>> +            goto free_exit;
>>> +        }
>>> +        quorum_count_vote(&acb->votes, &hash, i);
>>> +    }
>>> +
>>> +    /* vote to select the most represented version */
>>> +    winner = quorum_get_vote_winner(&acb->votes);
>>> +    /* every vote version are differents -> error */
>>> +    if (!winner) {
>>> +        quorum_report_failure(acb);
>>> +        acb->vote_ret = -EIO;
>>> +        goto free_exit;
>>> +    }
>>> +
>>> +    /* if the winner count is smaller than threshold the read fails */
>>> +    if (winner->vote_count < s->threshold) {
>>> +        quorum_report_failure(acb);
>>> +        acb->vote_ret = -EIO;
>>> +        goto free_exit;
>>> +    }
>>> +
>>> +    /* we have a winner: copy it */
>>> +    quorum_copy_qiov(acb->qiov, &acb->aios[winner->index].qiov);
>>> +
>>> +    /* some versions are bad print them */
>>> +    quorum_report_bad_versions(s, acb, &winner->value);
>>> +
>>> +free_exit:
>>> +    /* free lists */
>>> +    quorum_free_vote_list(&acb->votes);
>>> +}
>>> +
>>>   static BlockDriverAIOCB *quorum_aio_readv(BlockDriverState *bs,
>>>                                            int64_t sector_num,
>>>                                            QEMUIOVector *qiov,
>>> @@ -194,7 +525,7 @@ static BlockDriverAIOCB *quorum_aio_readv(BlockDriverState *bs,
>>>       }
>>>       for (i = 0; i < s->total; i++) {
>>> -        bdrv_aio_readv(s->bs[i], sector_num, qiov, nb_sectors,
>>> +        bdrv_aio_readv(s->bs[i], sector_num, &acb->aios[i].qiov, nb_sectors,
>>>                          quorum_aio_cb, &acb->aios[i]);
>>>       }
>>> diff --git a/configure b/configure
>>> index b472694..5a68738 100755
>>> --- a/configure
>>> +++ b/configure
>>> @@ -263,6 +263,7 @@ gtkabi="2.0"
>>>   tpm="no"
>>>   libssh2=""
>>>   vhdx=""
>>> +quorum="no"
>>>   # parse CC options first
>>>   for opt do
>>> @@ -1000,6 +1001,10 @@ for opt do
>>>     ;;
>>>     --disable-vhdx) vhdx="no"
>>>     ;;
>>> +  --disable-quorum) quorum="no"
>>> +  ;;
>>> +  --enable-quorum) quorum="yes"
>>> +  ;;
>>>     *) echo "ERROR: unknown option $opt"; show_help="yes"
>>>     ;;
>>>     esac
>>> @@ -1254,6 +1259,8 @@ Advanced options (experts only):
>>>     --enable-libssh2         enable ssh block device support
>>>     --disable-vhdx           disables support for the Microsoft VHDX image format
>>>     --enable-vhdx            enable support for the Microsoft VHDX image format
>>> +  --disable-quorum         disable quorum block filter support
>>> +  --enable-quorum          enable quorum block filter support
>>>   NOTE: The object files are built at the place where configure is launched
>>>   EOF
>>> @@ -1923,6 +1930,30 @@ EOF
>>>   fi
>>>   ##########################################
>>> +# Quorum probe (check for gnutls)
>>> +if test "$quorum" != "no" ; then
>>> +cat > $TMPC <<EOF
>>> +#include <gnutls/gnutls.h>
>>> +#include <gnutls/crypto.h>
>>> +int main(void) {char data[4096], digest[32];
>>> +gnutls_hash_fast(GNUTLS_DIG_SHA256, data, 4096, digest);
>>> +return 0;
>>> +}
>>> +EOF
>>> +quorum_tls_cflags=`$pkg_config --cflags gnutls 2> /dev/null`
>>> +quorum_tls_libs=`$pkg_config --libs gnutls 2> /dev/null`
>>> +if compile_prog "$quorum_tls_cflags" "$quorum_tls_libs" ; then
>>> +  qcow_tls=yes
>>> +  libs_softmmu="$quorum_tls_libs $libs_softmmu"
>>> +  libs_tools="$quorum_tls_libs $libs_softmmu"
>>> +  QEMU_CFLAGS="$QEMU_CFLAGS $quorum_tls_cflags"
>>> +else
>>> +  echo "gnutls > 2.10.0 required to compile Quorum"
>>> +  exit 1
>>> +fi
>>> +fi
>>> +
>>> +##########################################
>>>   # VNC SASL detection
>>>   if test "$vnc" = "yes" -a "$vnc_sasl" != "no" ; then
>>>     cat > $TMPC <<EOF
>>> @@ -3843,6 +3874,7 @@ echo "libssh2 support   $libssh2"
>>>   echo "TPM passthrough   $tpm_passthrough"
>>>   echo "QOM debugging     $qom_cast_debug"
>>>   echo "vhdx              $vhdx"
>>> +echo "Quorum            $quorum"
>>>   if test "$sdl_too_old" = "yes"; then
>>>   echo "-> Your SDL version is too old - please upgrade to have SDL support"
>>> @@ -4241,6 +4273,10 @@ if test "$libssh2" = "yes" ; then
>>>     echo "CONFIG_LIBSSH2=y" >> $config_host_mak
>>>   fi
>>> +if test "$quorum" = "yes" ; then
>>> +  echo "CONFIG_QUORUM=y" >> $config_host_mak
>>> +fi
>>> +
>>>   if test "$virtio_blk_data_plane" = "yes" ; then
>>>     echo 'CONFIG_VIRTIO_BLK_DATA_PLANE=$(CONFIG_VIRTIO)' >> $config_host_mak
>>>   fi
>>> diff --git a/docs/qmp/qmp-events.txt b/docs/qmp/qmp-events.txt
>>> index 6b87e97..1cbdde7 100644
>>> --- a/docs/qmp/qmp-events.txt
>>> +++ b/docs/qmp/qmp-events.txt
>>> @@ -500,3 +500,36 @@ Example:
>>>   Note: If action is "reset", "shutdown", or "pause" the WATCHDOG event is
>>>   followed respectively by the RESET, SHUTDOWN, or STOP events.
>>> +
>>> +QUORUM_FAILURE
>>> +--------------
>>> +
>>> +Emitted by the Quorum block driver when it fail to establish a quorum.
>> Probably rather “if” than “when”; also, “fails”.
>>
>>> +
>>> +Data:
>>> +
>>> +- "sector-num":   Number of the first sector of the failed read operation.
>>> +- "sector-count": Failed read operation sector count.
>>> +
>>> +Example:
>>> +
>>> +{ "event": "QUORUM_FAILURE",
>>> +     "data": { "sector-num": 345435, "sector-count": 5 },
>>> +     "timestamp": { "seconds": 1344522075, "microseconds": 745528 } }
>>> +
>>> +QUORUM_REPORT_BAD
>>> +-----------------
>>> +
>>> +Emitted to report a corruption of a Quorum file.
>>> +
>>> +Data:
>>> +
>>> +- "node-name":  The graph node name of the block driver state.
>> The description is not aligned with the others, this is probably due
>> to the change from child-index to node-name…?
>>
>> Max
>>
>>> +- "sector-num":   Number of the first sector of the failed read operation.
>>> +- "sector-count": Failed read operation sector count.
>>> +
>>> +Example:
>>> +
>>> +{ "event": "QUORUM_REPORT_BAD",
>>> +     "data": { "node-name": "1.raw", "sector-num": 345435, "sector-count": 5 },
>>> +     "timestamp": { "seconds": 1344522075, "microseconds": 745528 } }
>>> diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
>>> index 7e5f752..a49ea11 100644
>>> --- a/include/monitor/monitor.h
>>> +++ b/include/monitor/monitor.h
>>> @@ -49,6 +49,8 @@ typedef enum MonitorEvent {
>>>       QEVENT_SPICE_MIGRATE_COMPLETED,
>>>       QEVENT_GUEST_PANICKED,
>>>       QEVENT_BLOCK_IMAGE_CORRUPTED,
>>> +    QEVENT_QUORUM_FAILURE,
>>> +    QEVENT_QUORUM_REPORT_BAD,
>>>       /* Add to 'monitor_event_names' array in monitor.c when
>>>        * defining new events here */
>>> diff --git a/monitor.c b/monitor.c
>>> index 80456fb..f8f6aea 100644
>>> --- a/monitor.c
>>> +++ b/monitor.c
>>> @@ -507,6 +507,8 @@ static const char *monitor_event_names[] = {
>>>       [QEVENT_SPICE_MIGRATE_COMPLETED] = "SPICE_MIGRATE_COMPLETED",
>>>       [QEVENT_GUEST_PANICKED] = "GUEST_PANICKED",
>>>       [QEVENT_BLOCK_IMAGE_CORRUPTED] = "BLOCK_IMAGE_CORRUPTED",
>>> +    [QEVENT_QUORUM_FAILURE] = "QUORUM_FAILURE",
>>> +    [QEVENT_QUORUM_REPORT_BAD] = "QUORUM_REPORT_BAD",
>>>   };
>>>   QEMU_BUILD_BUG_ON(ARRAY_SIZE(monitor_event_names) != QEVENT_MAX)
>>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH V10 10/13] quorum: Add quorum_co_flush().
  2014-02-03 11:57     ` Benoît Canet
@ 2014-02-03 19:04       ` Max Reitz
  0 siblings, 0 replies; 36+ messages in thread
From: Max Reitz @ 2014-02-03 19:04 UTC (permalink / raw)
  To: Benoît Canet; +Cc: kwolf, qemu-devel, stefanha

On 03.02.2014 12:57, Benoît Canet wrote:
> Le Sunday 02 Feb 2014 à 23:02:57 (+0100), Max Reitz a écrit :
>> On 28.01.2014 17:52, Benoît Canet wrote:
>>> From: Benoît Canet <benoit@irqsave.net>
>>>
>>> Makes a vote to select error if any.
>>>
>>> Signed-off-by: Benoit Canet <benoit@irqsave.net>
>>> ---
>>>   block/quorum.c | 34 ++++++++++++++++++++++++++++++++++
>>>   1 file changed, 34 insertions(+)
>>>
>>> diff --git a/block/quorum.c b/block/quorum.c
>>> index 9b0718b..1b84b07 100644
>>> --- a/block/quorum.c
>>> +++ b/block/quorum.c
>>> @@ -653,12 +653,46 @@ free_exit:
>>>       return result;
>>>   }
>>> +static coroutine_fn int quorum_co_flush(BlockDriverState *bs)
>>> +{
>>> +    BDRVQuorumState *s = bs->opaque;
>>> +    QuorumVoteVersion *winner = NULL;
>>> +    QuorumVotes error_votes;
>>> +    QuorumVoteValue result_value;
>>> +    int i;
>>> +    int result = 0;
>>> +    bool error = false;
>>> +
>>> +    QLIST_INIT(&error_votes.vote_list);
>>> +    error_votes.compare = quorum_64bits_compare;
>>> +
>>> +    for (i = 0; i < s->total; i++) {
>>> +        result = bdrv_co_flush(s->bs[i]);
>>> +        if (result) {
>>> +            error = true;
>>> +            result_value.l = result;
>>> +            quorum_count_vote(&error_votes, &result_value, i);
>>> +        }
>>> +    }
>>> +
>>> +    if (error) {
>>> +        winner = quorum_get_vote_winner(&error_votes);
>>> +        result = winner->value.l;
>>> +    }
>>> +
>>> +    quorum_free_vote_list(&error_votes);
>>> +
>>> +    return result;
>>> +}
>>> +
>>>   static BlockDriver bdrv_quorum = {
>>>       .format_name        = "quorum",
>>>       .protocol_name      = "quorum",
>>>       .instance_size      = sizeof(BDRVQuorumState),
>>> +    .bdrv_co_flush_to_disk = quorum_co_flush,
>>> +
>>>       .bdrv_getlength     = quorum_getlength,
>>>       .bdrv_aio_readv     = quorum_aio_readv,
>> So, my general opinion on this patch (for reads/writes we don't vote
>> on the error code either; so why here?) hasn't changed, but well, I
>> definitely don't oppose it.
> For the reads/writes Kevin asked me to return the first error.
> For the flush function someone else (probably Eric) asked me to do voting.
>
>> Another problem, however: If any error occurs, this function will
>> return an error as well. Is that intended? If an error on a
>> read/write operation occurs but there are still enough successful
>> reads/writes to reach quorum, no error is returned. Is there a
>> reason why this should be different for flush?
> If an error occurs multiple times and is able to establish quorum this winning
> error will be returned.
> Else 0 will be returned due to the vote result if the error is in minority.
> So the behavior is similar as long quorum is reached.

You're only counting results which are not 0. Therefore, 0 will never 
reach majority.

Max

>> Max

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2014-02-03 19:02 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-28 16:52 [Qemu-devel] [PATCH V10 00/13] Quorum block driver Benoît Canet
2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 01/13] quorum: Create quorum.c, add QuorumSingleAIOCB and QuorumAIOCB Benoît Canet
2014-01-31 21:10   ` Max Reitz
2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 02/13] quorum: Create BDRVQuorumState and BlkDriver and do init Benoît Canet
2014-01-31 21:11   ` Max Reitz
2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 03/13] quorum: Add quorum_aio_writev and its dependencies Benoît Canet
2014-01-31 21:21   ` Max Reitz
2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 04/13] blkverify: Extract qemu_iovec_clone() and qemu_iovec_compare() from blkverify Benoît Canet
2014-01-31 21:25   ` Max Reitz
2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 05/13] quorum: Add quorum_aio_readv Benoît Canet
2014-01-31 21:47   ` Max Reitz
2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 06/13] quorum: Add quorum mechanism Benoît Canet
2014-01-31 23:04   ` Max Reitz
2014-02-03 11:44     ` Benoît Canet
2014-02-03 19:03       ` Max Reitz
2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 07/13] quorum: Add quorum_getlength() Benoît Canet
2014-02-02 21:25   ` Max Reitz
2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 08/13] quorum: Add quorum_invalidate_cache() Benoît Canet
2014-02-02 21:27   ` Max Reitz
2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 09/13] quorum: Add quorum_co_get_block_status Benoît Canet
2014-02-02 21:44   ` Max Reitz
2014-02-03 11:47     ` Benoît Canet
2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 10/13] quorum: Add quorum_co_flush() Benoît Canet
2014-02-02 22:02   ` Max Reitz
2014-02-03 11:57     ` Benoît Canet
2014-02-03 19:04       ` Max Reitz
2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 11/13] quorum: Implement recursive .bdrv_recurse_is_first_non_filter in quorum Benoît Canet
2014-02-02 22:31   ` Max Reitz
2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 12/13] quorum: Add quorum_open() and quorum_close() Benoît Canet
2014-02-02 23:09   ` Max Reitz
2014-02-03 12:21     ` Benoît Canet
2014-02-03 12:49       ` Benoît Canet
2014-02-03 12:43     ` Benoît Canet
2014-01-28 16:52 ` [Qemu-devel] [PATCH V10 13/13] quorum: Add unit test Benoît Canet
2014-02-02 23:19   ` Max Reitz
2014-01-31 20:40 ` [Qemu-devel] [PATCH V10 00/13] Quorum block driver Max Reitz

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.