* [Qemu-devel] [RFC V7 00/11] Quorum block filter
@ 2013-01-18 17:30 Benoît Canet
  2013-01-18 17:30 ` [Qemu-devel] [RFC V7 01/11] quorum: Create quorum.c, add QuorumSingleAIOCB and QuorumAIOCB Benoît Canet
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: Benoît Canet @ 2013-01-18 17:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

This patchset is rebased on top of "cutils: unsigned int parsing functions"
by "Eduardo Habkost".

This patchset creates a block driver implementing a quorum over $total QEMU disk
images. Writes are mirrored to all $total files.
For reads, the $total files are read at the same time and a vote is held to
determine whether one version of the qiov is present $threshold or more times.
That majority version is then returned to the upper layers.
When fewer than $threshold versions of the data are returned by the lower layer,
the quorum is broken and the read returns -EIO.
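
The read-side vote described above can be sketched as follows. This is only an
illustration under stated assumptions, not the code from the series: the helper
name quorum_vote_sketch() is hypothetical, replicas are plain byte buffers
rather than QEMUIOVector objects, and versions are compared with memcmp()
instead of a hash.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patchset.
 * Returns the index of the first replica whose content is shared by at
 * least `threshold` replicas, or -1 when no version reaches the threshold
 * (the case where the driver would report -EIO).
 */
static int quorum_vote_sketch(const unsigned char *const *bufs, size_t total,
                              size_t len, size_t threshold)
{
    size_t i, j;

    assert(threshold >= 1 && threshold <= total);

    for (i = 0; i < total; i++) {
        size_t votes = 0;

        /* Count every replica (including i itself) that matches replica i */
        for (j = 0; j < total; j++) {
            if (memcmp(bufs[i], bufs[j], len) == 0) {
                votes++;
            }
        }
        if (votes >= threshold) {
            return (int)i;
        }
    }
    return -1;
}
```

With three replicas and a threshold of 2, one corrupted replica is outvoted by
the two that agree; with a threshold of 3 the same input breaks the quorum.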

The goal is for this patchset to become a QEMU block filter living just
above raw-*.c and below qcow2/qed once the required infrastructure is in place.

The main users of this feature will be people using NFS appliances, which can
be subject to bitflip errors.

This patchset can be used to replace blkverify and the out-of-tree blkmirror.

usage: -drive
file=quorum:threshold/total:image_1.raw:...:image_total.raw,if=virtio,cache=none
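
As a rough illustration of how the "threshold/total:" part of this syntax can
be validated, here is a minimal sketch. It assumes the "quorum:" prefix has
already been stripped, uses the standard strtoull() as a stand-in for QEMU's
parse_uint(), and the name quorum_parse_header_sketch() is hypothetical. The
range checks mirror the ones in quorum_open(): threshold >= 1, total >= 2 and
threshold <= total.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for the header parsing done in quorum_open().
 * Reads "threshold/total:" from `spec` and stores a pointer to the first
 * filename in *rest. Returns 0 on success or a negative errno value.
 */
static int quorum_parse_header_sketch(const char *spec,
                                      unsigned long long *threshold,
                                      unsigned long long *total,
                                      const char **rest)
{
    char *end;

    assert(spec && threshold && total && rest);

    /* strtoull() accepts leading whitespace and signs; reject them here */
    if (!isdigit((unsigned char)spec[0])) {
        return -EINVAL;
    }
    errno = 0;
    *threshold = strtoull(spec, &end, 10);
    if (errno != 0 || *end != '/') {
        return -EINVAL;
    }

    if (!isdigit((unsigned char)end[1])) {
        return -EINVAL;
    }
    errno = 0;
    *total = strtoull(end + 1, &end, 10);
    if (errno != 0 || *end != ':') {
        return -EINVAL;
    }

    if (*threshold < 1 || *total < 2 || *threshold > *total) {
        return -ERANGE;
    }

    *rest = end + 1;
    return 0;
}
```

For the input "2/3:a.raw:b.raw:c.raw" this yields threshold 2, total 3 and a
rest pointer at "a.raw:b.raw:c.raw".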

in this version:
    parse total and threshold with parse_uint [Eric]
    return proper qerrors in quorum_open [Eric]
    Use sha256 for comparing blocks [Eric]
    Update the rest of the voting function to match the new approach [Benoît]

V6:
    fix commit message of "quorum: Add quorum_open() and quorum_close()." [Eric]
    return error after a vote in quorum_co_flush [Eric]
    Fix bitrot caused by headers and structures renaming [Benoît]
    initialize finished to NULL to prevent crash [Benoît]
    convert internal quorum code to uint64_t instead of int64_t [Benoît]

V5:

Eric Blake: revert back separator to ":"
            rewrite quorum_getlength

Benoît Canet: use memcmp to compare iovec except for the blkverify case
              use strstart to parse argument in open


Benoît Canet (11):
  quorum: Create quorum.c, add QuorumSingleAIOCB and QuorumAIOCB.
  quorum: Create BDRVQuorumState and BlkDriver and do init.
  quorum: Add quorum_open() and quorum_close().
  quorum: Add quorum_aio_writev and its dependencies.
  blkverify: Extract qemu_iovec_clone() and qemu_iovec_compare() from
    blkverify.
  quorum: Add quorum_aio_readv.
  quorum: Add quorum mechanism.
  quorum: Add quorum_getlength().
  quorum: Add quorum_invalidate_cache().
  quorum: Add quorum_co_is_allocated.
  quorum: Add quorum_co_flush().

 block/Makefile.objs   |    1 +
 block/blkverify.c     |  108 +------
 block/quorum.c        |  789 +++++++++++++++++++++++++++++++++++++++++++++++++
 configure             |   22 ++
 include/qemu-common.h |    2 +
 util/iov.c            |  103 +++++++
 6 files changed, 919 insertions(+), 106 deletions(-)
 create mode 100644 block/quorum.c

-- 
1.7.10.4


* [Qemu-devel] [RFC V7 01/11] quorum: Create quorum.c, add QuorumSingleAIOCB and QuorumAIOCB.
  2013-01-18 17:30 [Qemu-devel] [RFC V7 00/11] Quorum block filter Benoît Canet
@ 2013-01-18 17:30 ` Benoît Canet
  2013-01-18 17:30 ` [Qemu-devel] [RFC V7 02/11] quorum: Create BDRVQuorumState and BlkDriver and do init Benoît Canet
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Benoît Canet @ 2013-01-18 17:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/Makefile.objs |    1 +
 block/quorum.c      |   45 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)
 create mode 100644 block/quorum.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index c067f38..4143e34 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -2,6 +2,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat
 block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
 block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
+block-obj-y += quorum.o
 block-obj-y += parallels.o blkdebug.o blkverify.o
 block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
 block-obj-$(CONFIG_POSIX) += raw-posix.o
diff --git a/block/quorum.c b/block/quorum.c
new file mode 100644
index 0000000..ce094a1
--- /dev/null
+++ b/block/quorum.c
@@ -0,0 +1,45 @@
+/*
+ * Quorum Block filter
+ *
+ * Copyright (C) 2012-2013 Nodalink, SARL.
+ *
+ * Author:
+ *   Benoît Canet <benoit.canet@irqsave.net>
+ *
+ * Based on the design and code of blkverify.c (Copyright (C) 2010 IBM, Corp)
+ * and blkmirror.c (Copyright (C) 2011 Red Hat, Inc).
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "block/block_int.h"
+
+typedef struct QuorumAIOCB QuorumAIOCB;
+
+typedef struct QuorumSingleAIOCB {
+    BlockDriverAIOCB *aiocb;
+    uint8_t *buf;
+    int ret;
+    QuorumAIOCB *parent;
+} QuorumSingleAIOCB;
+
+struct QuorumAIOCB {
+    BlockDriverAIOCB common;
+    QEMUBH *bh;
+
+    /* Request metadata */
+    uint64_t sector_num;
+    int nb_sectors;
+
+    QEMUIOVector *qiov;         /* calling readv IOV */
+
+    QuorumSingleAIOCB *aios;    /* individual AIOs */
+    QEMUIOVector *qiovs;        /* individual IOVs */
+    int count;                  /* number of completed AIOCB */
+    int success_count;          /* number of successfully completed AIOCB */
+    bool *finished;             /* completion signal for cancel */
+
+    void (*vote)(QuorumAIOCB *acb);
+    int vote_ret;
+};
-- 
1.7.10.4


* [Qemu-devel] [RFC V7 02/11] quorum: Create BDRVQuorumState and BlkDriver and do init.
  2013-01-18 17:30 [Qemu-devel] [RFC V7 00/11] Quorum block filter Benoît Canet
  2013-01-18 17:30 ` [Qemu-devel] [RFC V7 01/11] quorum: Create quorum.c, add QuorumSingleAIOCB and QuorumAIOCB Benoît Canet
@ 2013-01-18 17:30 ` Benoît Canet
  2013-01-18 17:30 ` [Qemu-devel] [RFC V7 03/11] quorum: Add quorum_open() and quorum_close() Benoît Canet
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Benoît Canet @ 2013-01-18 17:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/quorum.c |   22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index ce094a1..0524b63 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -15,6 +15,13 @@
 
 #include "block/block_int.h"
 
+typedef struct {
+    BlockDriverState **bs;
+    unsigned long long threshold;
+    unsigned long long total;
+    char **filenames;
+} BDRVQuorumState;
+
 typedef struct QuorumAIOCB QuorumAIOCB;
 
 typedef struct QuorumSingleAIOCB {
@@ -26,6 +33,7 @@ typedef struct QuorumSingleAIOCB {
 
 struct QuorumAIOCB {
     BlockDriverAIOCB common;
+    BDRVQuorumState *bqs;
     QEMUBH *bh;
 
     /* Request metadata */
@@ -43,3 +51,17 @@ struct QuorumAIOCB {
     void (*vote)(QuorumAIOCB *acb);
     int vote_ret;
 };
+
+static BlockDriver bdrv_quorum = {
+    .format_name        = "quorum",
+    .protocol_name      = "quorum",
+
+    .instance_size      = sizeof(BDRVQuorumState),
+};
+
+static void bdrv_quorum_init(void)
+{
+    bdrv_register(&bdrv_quorum);
+}
+
+block_init(bdrv_quorum_init);
-- 
1.7.10.4


* [Qemu-devel] [RFC V7 03/11] quorum: Add quorum_open() and quorum_close().
  2013-01-18 17:30 [Qemu-devel] [RFC V7 00/11] Quorum block filter Benoît Canet
  2013-01-18 17:30 ` [Qemu-devel] [RFC V7 01/11] quorum: Create quorum.c, add QuorumSingleAIOCB and QuorumAIOCB Benoît Canet
  2013-01-18 17:30 ` [Qemu-devel] [RFC V7 02/11] quorum: Create BDRVQuorumState and BlkDriver and do init Benoît Canet
@ 2013-01-18 17:30 ` Benoît Canet
  2013-01-18 17:30 ` [Qemu-devel] [RFC V7 04/11] quorum: Add quorum_aio_writev and its dependencies Benoît Canet
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Benoît Canet @ 2013-01-18 17:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Valid quorum resources look like
quorum:threshold/total:path/to/image_1: ... :path/to/image_total

':' is used as a separator
'\' is the escape character for filenames containing ':'
'\' escapes itself
',' must be escaped with ','

On the command line, for the quorum files "img:test.raw", "img2,raw"
and "img3.raw", the invocation looks like:

-drive file=quorum:2/3:img\\:test.raw:img2,,raw:img3.raw
(note the double \\ and the double ,,)
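
The separator and escaping rules above can be sketched as a small in-place
splitter. This is an illustration only: quorum_split_names_sketch() is a
hypothetical name, and the sketch assumes that the shell and the QEMU option
parser have already reduced "\\" to "\" and ",," to "," before the string
reaches the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patchset.
 * Splits `names` in place on unescaped ':' and stores up to `max` filename
 * pointers in `out`. A backslash makes the next character literal.
 * Returns the number of filenames found, or -1 if there are more than `max`.
 */
static int quorum_split_names_sketch(char *names, char **out, size_t max)
{
    size_t count = 0;
    char *src = names;
    char *dst = names;   /* dst never runs ahead of src */

    assert(names && out && max > 0);

    out[count++] = dst;
    while (*src) {
        if (*src == '\\' && src[1] != '\0') {
            /* Drop the backslash and keep the escaped character as-is */
            src++;
            *dst++ = *src++;
        } else if (*src == ':') {
            /* Unescaped separator: terminate this name, start the next */
            if (count == max) {
                return -1;
            }
            *dst++ = '\0';
            src++;
            out[count++] = dst;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return (int)count;
}
```

Applied to the string img\:test.raw:img2,raw:img3.raw, the sketch produces the
three filenames "img:test.raw", "img2,raw" and "img3.raw".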

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/quorum.c |  155 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 155 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index 0524b63..e157eb1 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -52,11 +52,166 @@ struct QuorumAIOCB {
     int vote_ret;
 };
 
+static int quorum_parse_uint_step_next(const char *start,
+                                       const char *name,
+                                       const char separator,
+                                       unsigned long long *value,
+                                       char **next)
+{
+    int ret;
+    if (start[0] == '\0') {
+        qerror_report(QERR_MISSING_PARAMETER, name);
+        return -EINVAL;
+    }
+    ret = parse_uint(start, value, next, 10);
+    if (ret < 0) {
+        qerror_report(QERR_INVALID_PARAMETER_TYPE, name, "int");
+        return ret;
+    }
+    if (**next != separator) {
+        qerror_report(ERROR_CLASS_GENERIC_ERROR,
+                      "%c separator required after %s",
+                      separator, name);
+        return -EINVAL;
+    }
+    *next += 1;
+    return 0;
+}
+
+/* Valid quorum resources look like
+ * quorum:threshold/total:path/to/image_1: ... :path/to/image_total
+ *
+ * ':' is used as a separator
+ * '\' is the escaping character for filename containing ':'
+ */
+static int quorum_open(BlockDriverState *bs, const char *filename, int flags)
+{
+    BDRVQuorumState *s = bs->opaque;
+    int i, j, k, len, ret = 0;
+    char *a, *b, *names;
+    const char *start;
+    bool escape;
+
+    /* Parse the quorum: prefix */
+    if (!strstart(filename, "quorum:", &start)) {
+        return -EINVAL;
+    }
+
+    /* Get threshold */
+    ret = quorum_parse_uint_step_next(start, "threshold", '/',
+                                      &s->threshold, &a);
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* Get total */
+    ret = quorum_parse_uint_step_next(a, "total", ':', &s->total, &b);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (s->threshold < 1) {
+        qerror_report(QERR_INVALID_PARAMETER_VALUE, "threshold", "value >= 1");
+        return -ERANGE;
+    }
+
+    if (s->total < 2) {
+        qerror_report(QERR_INVALID_PARAMETER_VALUE, "total", "value >= 2");
+        return -ERANGE;
+    }
+
+    if (s->threshold > s->total) {
+        qerror_report(ERROR_CLASS_GENERIC_ERROR,
+                      "threshold <= total must be true");
+        return -ERANGE;
+    }
+
+    s->bs = g_malloc0(sizeof(BlockDriverState *) * s->total);
+    /* Two allocations for all filenames: simpler to free */
+    s->filenames = g_malloc0(sizeof(char *) * s->total);
+    names = g_strdup(b);
+
+    /* Get the filenames pointers */
+    escape = false;
+    s->filenames[0] = names;
+    len = strlen(names);
+    for (i = j = k = 0; i < len && j < s->total; i++) {
+        /* separation between two files */
+        if (!escape && names[i] == ':') {
+            char *prev = s->filenames[j];
+            prev[k] = '\0';
+            s->filenames[++j] = prev + k + 1;
+            k = 0;
+            continue;
+        }
+
+        escape = !escape && names[i] == '\\';
+
+        /* if we are not escaping copy */
+        if (!escape) {
+            s->filenames[j][k++] = names[i];
+        }
+    }
+    /* terminate last string */
+    s->filenames[j][k] = '\0';
+
+    if ((j + 1) != s->total) {
+        qerror_report(ERROR_CLASS_GENERIC_ERROR,
+                      "Number of provided file must be equal to total");
+        ret = -EINVAL;
+        goto free_exit;
+    }
+
+    /* Open files */
+    for (i = 0; i < s->total; i++) {
+        s->bs[i] = bdrv_new("");
+        ret = bdrv_open(s->bs[i], s->filenames[i], flags, NULL);
+        if (ret < 0) {
+            goto error_exit;
+        }
+    }
+
+    goto exit;
+
+error_exit:
+    for (; i >= 0; i--) {
+        bdrv_delete(s->bs[i]);
+        s->bs[i] = NULL;
+    }
+free_exit:
+    g_free(s->filenames[0]);
+    g_free(s->filenames);
+    s->filenames = NULL;
+    g_free(s->bs);
+exit:
+    return ret;
+}
+
+static void quorum_close(BlockDriverState *bs)
+{
+    BDRVQuorumState *s = bs->opaque;
+    int i;
+
+    for (i = 0; i < s->total; i++) {
+        /* Ensure writes reach stable storage */
+        bdrv_flush(s->bs[i]);
+        bdrv_delete(s->bs[i]);
+    }
+
+    g_free(s->filenames[0]);
+    g_free(s->filenames);
+    s->filenames = NULL;
+    g_free(s->bs);
+}
+
 static BlockDriver bdrv_quorum = {
     .format_name        = "quorum",
     .protocol_name      = "quorum",
 
     .instance_size      = sizeof(BDRVQuorumState),
+
+    .bdrv_file_open     = quorum_open,
+    .bdrv_close         = quorum_close,
 };
 
 static void bdrv_quorum_init(void)
-- 
1.7.10.4


* [Qemu-devel] [RFC V7 04/11] quorum: Add quorum_aio_writev and its dependencies.
  2013-01-18 17:30 [Qemu-devel] [RFC V7 00/11] Quorum block filter Benoît Canet
                   ` (2 preceding siblings ...)
  2013-01-18 17:30 ` [Qemu-devel] [RFC V7 03/11] quorum: Add quorum_open() and quorum_close() Benoît Canet
@ 2013-01-18 17:30 ` Benoît Canet
  2013-01-18 17:30 ` [Qemu-devel] [RFC V7 05/11] blkverify: Extract qemu_iovec_clone() and qemu_iovec_compare() from blkverify Benoît Canet
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Benoît Canet @ 2013-01-18 17:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/quorum.c |  113 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 113 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index e157eb1..71ae9ce 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -204,6 +204,117 @@ static void quorum_close(BlockDriverState *bs)
     g_free(s->bs);
 }
 
+static void quorum_aio_cancel(BlockDriverAIOCB *blockacb)
+{
+    QuorumAIOCB *acb = container_of(blockacb, QuorumAIOCB, common);
+    bool finished = false;
+
+    /* Wait for the request to finish */
+    acb->finished = &finished;
+    while (!finished) {
+        qemu_aio_wait();
+    }
+}
+
+static AIOCBInfo quorum_aiocb_info = {
+    .aiocb_size         = sizeof(QuorumAIOCB),
+    .cancel             = quorum_aio_cancel,
+};
+
+static void quorum_aio_bh(void *opaque)
+{
+    QuorumAIOCB *acb = opaque;
+    BDRVQuorumState *s = acb->bqs;
+    int ret;
+
+    ret = s->threshold <= acb->success_count ? 0 : -EIO;
+
+    qemu_bh_delete(acb->bh);
+    acb->common.cb(acb->common.opaque, ret);
+    if (acb->finished) {
+        *acb->finished = true;
+    }
+    g_free(acb->aios);
+    g_free(acb->qiovs);
+    qemu_aio_release(acb);
+}
+
+static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
+                                   BlockDriverState *bs,
+                                   QEMUIOVector *qiov,
+                                   uint64_t sector_num,
+                                   int nb_sectors,
+                                   BlockDriverCompletionFunc *cb,
+                                   void *opaque)
+{
+    QuorumAIOCB *acb = qemu_aio_get(&quorum_aiocb_info, bs, cb, opaque);
+    int i;
+
+    acb->aios = g_new0(QuorumSingleAIOCB, s->total);
+    acb->qiovs = g_new0(QEMUIOVector, s->total);
+
+    acb->bqs = s;
+    acb->qiov = qiov;
+    acb->bh = NULL;
+    acb->count = 0;
+    acb->success_count = 0;
+    acb->sector_num = sector_num;
+    acb->nb_sectors = nb_sectors;
+    acb->vote = NULL;
+    acb->vote_ret = 0;
+    acb->finished = NULL;
+
+    for (i = 0; i < s->total; i++) {
+        acb->aios[i].buf = NULL;
+        acb->aios[i].ret = 0;
+        acb->aios[i].parent = acb;
+    }
+
+    return acb;
+}
+
+static void quorum_aio_cb(void *opaque, int ret)
+{
+    QuorumSingleAIOCB *sacb = opaque;
+    QuorumAIOCB *acb = sacb->parent;
+    BDRVQuorumState *s = acb->bqs;
+
+    sacb->ret = ret;
+    acb->count++;
+    if (ret == 0) {
+        acb->success_count++;
+    }
+    assert(acb->count <= s->total);
+    assert(acb->success_count <= s->total);
+    if (acb->count < s->total) {
+        return;
+    }
+
+    acb->bh = qemu_bh_new(quorum_aio_bh, acb);
+    qemu_bh_schedule(acb->bh);
+}
+
+static BlockDriverAIOCB *quorum_aio_writev(BlockDriverState *bs,
+                                          int64_t sector_num,
+                                          QEMUIOVector *qiov,
+                                          int nb_sectors,
+                                          BlockDriverCompletionFunc *cb,
+                                          void *opaque)
+{
+    BDRVQuorumState *s = bs->opaque;
+    QuorumAIOCB *acb = quorum_aio_get(s, bs, qiov, sector_num, nb_sectors,
+                                      cb, opaque);
+    int i;
+
+    for (i = 0; i < s->total; i++) {
+        acb->aios[i].aiocb = bdrv_aio_writev(s->bs[i], sector_num, qiov,
+                                             nb_sectors, &quorum_aio_cb,
+                                             &acb->aios[i]);
+    }
+
+    return &acb->common;
+}
+
 static BlockDriver bdrv_quorum = {
     .format_name        = "quorum",
     .protocol_name      = "quorum",
@@ -212,6 +323,8 @@ static BlockDriver bdrv_quorum = {
 
     .bdrv_file_open     = quorum_open,
     .bdrv_close         = quorum_close,
+
+    .bdrv_aio_writev    = quorum_aio_writev,
 };
 
 static void bdrv_quorum_init(void)
-- 
1.7.10.4


* [Qemu-devel] [RFC V7 05/11] blkverify: Extract qemu_iovec_clone() and qemu_iovec_compare() from blkverify.
  2013-01-18 17:30 [Qemu-devel] [RFC V7 00/11] Quorum block filter Benoît Canet
                   ` (3 preceding siblings ...)
  2013-01-18 17:30 ` [Qemu-devel] [RFC V7 04/11] quorum: Add quorum_aio_writev and its dependencies Benoît Canet
@ 2013-01-18 17:30 ` Benoît Canet
  2013-01-18 17:30 ` [Qemu-devel] [RFC V7 06/11] quorum: Add quorum_aio_readv Benoît Canet
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Benoît Canet @ 2013-01-18 17:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/blkverify.c     |  108 +------------------------------------------------
 include/qemu-common.h |    2 +
 util/iov.c            |  103 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 107 insertions(+), 106 deletions(-)

diff --git a/block/blkverify.c b/block/blkverify.c
index a7dd459..8c65425 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -123,110 +123,6 @@ static int64_t blkverify_getlength(BlockDriverState *bs)
     return bdrv_getlength(s->test_file);
 }
 
-/**
- * Check that I/O vector contents are identical
- *
- * @a:          I/O vector
- * @b:          I/O vector
- * @ret:        Offset to first mismatching byte or -1 if match
- */
-static ssize_t blkverify_iovec_compare(QEMUIOVector *a, QEMUIOVector *b)
-{
-    int i;
-    ssize_t offset = 0;
-
-    assert(a->niov == b->niov);
-    for (i = 0; i < a->niov; i++) {
-        size_t len = 0;
-        uint8_t *p = (uint8_t *)a->iov[i].iov_base;
-        uint8_t *q = (uint8_t *)b->iov[i].iov_base;
-
-        assert(a->iov[i].iov_len == b->iov[i].iov_len);
-        while (len < a->iov[i].iov_len && *p++ == *q++) {
-            len++;
-        }
-
-        offset += len;
-
-        if (len != a->iov[i].iov_len) {
-            return offset;
-        }
-    }
-    return -1;
-}
-
-typedef struct {
-    int src_index;
-    struct iovec *src_iov;
-    void *dest_base;
-} IOVectorSortElem;
-
-static int sortelem_cmp_src_base(const void *a, const void *b)
-{
-    const IOVectorSortElem *elem_a = a;
-    const IOVectorSortElem *elem_b = b;
-
-    /* Don't overflow */
-    if (elem_a->src_iov->iov_base < elem_b->src_iov->iov_base) {
-        return -1;
-    } else if (elem_a->src_iov->iov_base > elem_b->src_iov->iov_base) {
-        return 1;
-    } else {
-        return 0;
-    }
-}
-
-static int sortelem_cmp_src_index(const void *a, const void *b)
-{
-    const IOVectorSortElem *elem_a = a;
-    const IOVectorSortElem *elem_b = b;
-
-    return elem_a->src_index - elem_b->src_index;
-}
-
-/**
- * Copy contents of I/O vector
- *
- * The relative relationships of overlapping iovecs are preserved.  This is
- * necessary to ensure identical semantics in the cloned I/O vector.
- */
-static void blkverify_iovec_clone(QEMUIOVector *dest, const QEMUIOVector *src,
-                                  void *buf)
-{
-    IOVectorSortElem sortelems[src->niov];
-    void *last_end;
-    int i;
-
-    /* Sort by source iovecs by base address */
-    for (i = 0; i < src->niov; i++) {
-        sortelems[i].src_index = i;
-        sortelems[i].src_iov = &src->iov[i];
-    }
-    qsort(sortelems, src->niov, sizeof(sortelems[0]), sortelem_cmp_src_base);
-
-    /* Allocate buffer space taking into account overlapping iovecs */
-    last_end = NULL;
-    for (i = 0; i < src->niov; i++) {
-        struct iovec *cur = sortelems[i].src_iov;
-        ptrdiff_t rewind = 0;
-
-        /* Detect overlap */
-        if (last_end && last_end > cur->iov_base) {
-            rewind = last_end - cur->iov_base;
-        }
-
-        sortelems[i].dest_base = buf - rewind;
-        buf += cur->iov_len - MIN(rewind, cur->iov_len);
-        last_end = MAX(cur->iov_base + cur->iov_len, last_end);
-    }
-
-    /* Sort by source iovec index and build destination iovec */
-    qsort(sortelems, src->niov, sizeof(sortelems[0]), sortelem_cmp_src_index);
-    for (i = 0; i < src->niov; i++) {
-        qemu_iovec_add(dest, sortelems[i].dest_base, src->iov[i].iov_len);
-    }
-}
-
 static BlkverifyAIOCB *blkverify_aio_get(BlockDriverState *bs, bool is_write,
                                          int64_t sector_num, QEMUIOVector *qiov,
                                          int nb_sectors,
@@ -290,7 +186,7 @@ static void blkverify_aio_cb(void *opaque, int ret)
 
 static void blkverify_verify_readv(BlkverifyAIOCB *acb)
 {
-    ssize_t offset = blkverify_iovec_compare(acb->qiov, &acb->raw_qiov);
+    ssize_t offset = qemu_iovec_compare(acb->qiov, &acb->raw_qiov);
     if (offset != -1) {
         blkverify_err(acb, "contents mismatch in sector %" PRId64,
                       acb->sector_num + (int64_t)(offset / BDRV_SECTOR_SIZE));
@@ -308,7 +204,7 @@ static BlockDriverAIOCB *blkverify_aio_readv(BlockDriverState *bs,
     acb->verify = blkverify_verify_readv;
     acb->buf = qemu_blockalign(bs->file, qiov->size);
     qemu_iovec_init(&acb->raw_qiov, acb->qiov->niov);
-    blkverify_iovec_clone(&acb->raw_qiov, qiov, acb->buf);
+    qemu_iovec_clone(&acb->raw_qiov, qiov, acb->buf);
 
     bdrv_aio_readv(s->test_file, sector_num, qiov, nb_sectors,
                    blkverify_aio_cb, acb);
diff --git a/include/qemu-common.h b/include/qemu-common.h
index ca464bb..13ce13a 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -342,6 +342,8 @@ size_t qemu_iovec_from_buf(QEMUIOVector *qiov, size_t offset,
                            const void *buf, size_t bytes);
 size_t qemu_iovec_memset(QEMUIOVector *qiov, size_t offset,
                          int fillc, size_t bytes);
+ssize_t qemu_iovec_compare(QEMUIOVector *a, QEMUIOVector *b);
+void qemu_iovec_clone(QEMUIOVector *dest, const QEMUIOVector *src, void *buf);
 
 bool buffer_is_zero(const void *buf, size_t len);
 
diff --git a/util/iov.c b/util/iov.c
index c0f5c56..bed2c22 100644
--- a/util/iov.c
+++ b/util/iov.c
@@ -370,6 +370,109 @@ size_t qemu_iovec_memset(QEMUIOVector *qiov, size_t offset,
     return iov_memset(qiov->iov, qiov->niov, offset, fillc, bytes);
 }
 
+/**
+ * Check that I/O vector contents are identical
+ *
+ * @a:          I/O vector
+ * @b:          I/O vector
+ * @ret:        Offset to first mismatching byte or -1 if match
+ */
+ssize_t qemu_iovec_compare(QEMUIOVector *a, QEMUIOVector *b)
+{
+    int i;
+    ssize_t offset = 0;
+
+    assert(a->niov == b->niov);
+    for (i = 0; i < a->niov; i++) {
+        size_t len = 0;
+        uint8_t *p = (uint8_t *)a->iov[i].iov_base;
+        uint8_t *q = (uint8_t *)b->iov[i].iov_base;
+
+        assert(a->iov[i].iov_len == b->iov[i].iov_len);
+        while (len < a->iov[i].iov_len && *p++ == *q++) {
+            len++;
+        }
+
+        offset += len;
+
+        if (len != a->iov[i].iov_len) {
+            return offset;
+        }
+    }
+    return -1;
+}
+
+typedef struct {
+    int src_index;
+    struct iovec *src_iov;
+    void *dest_base;
+} IOVectorSortElem;
+
+static int sortelem_cmp_src_base(const void *a, const void *b)
+{
+    const IOVectorSortElem *elem_a = a;
+    const IOVectorSortElem *elem_b = b;
+
+    /* Don't overflow */
+    if (elem_a->src_iov->iov_base < elem_b->src_iov->iov_base) {
+        return -1;
+    } else if (elem_a->src_iov->iov_base > elem_b->src_iov->iov_base) {
+        return 1;
+    } else {
+        return 0;
+    }
+}
+
+static int sortelem_cmp_src_index(const void *a, const void *b)
+{
+    const IOVectorSortElem *elem_a = a;
+    const IOVectorSortElem *elem_b = b;
+
+    return elem_a->src_index - elem_b->src_index;
+}
+
+/**
+ * Copy contents of I/O vector
+ *
+ * The relative relationships of overlapping iovecs are preserved.  This is
+ * necessary to ensure identical semantics in the cloned I/O vector.
+ */
+void qemu_iovec_clone(QEMUIOVector *dest, const QEMUIOVector *src, void *buf)
+{
+    IOVectorSortElem sortelems[src->niov];
+    void *last_end;
+    int i;
+
+    /* Sort by source iovecs by base address */
+    for (i = 0; i < src->niov; i++) {
+        sortelems[i].src_index = i;
+        sortelems[i].src_iov = &src->iov[i];
+    }
+    qsort(sortelems, src->niov, sizeof(sortelems[0]), sortelem_cmp_src_base);
+
+    /* Allocate buffer space taking into account overlapping iovecs */
+    last_end = NULL;
+    for (i = 0; i < src->niov; i++) {
+        struct iovec *cur = sortelems[i].src_iov;
+        ptrdiff_t rewind = 0;
+
+        /* Detect overlap */
+        if (last_end && last_end > cur->iov_base) {
+            rewind = last_end - cur->iov_base;
+        }
+
+        sortelems[i].dest_base = buf - rewind;
+        buf += cur->iov_len - MIN(rewind, cur->iov_len);
+        last_end = MAX(cur->iov_base + cur->iov_len, last_end);
+    }
+
+    /* Sort by source iovec index and build destination iovec */
+    qsort(sortelems, src->niov, sizeof(sortelems[0]), sortelem_cmp_src_index);
+    for (i = 0; i < src->niov; i++) {
+        qemu_iovec_add(dest, sortelems[i].dest_base, src->iov[i].iov_len);
+    }
+}
+
 size_t iov_discard_front(struct iovec **iov, unsigned int *iov_cnt,
                          size_t bytes)
 {
-- 
1.7.10.4


* [Qemu-devel] [RFC V7 06/11] quorum: Add quorum_aio_readv.
  2013-01-18 17:30 [Qemu-devel] [RFC V7 00/11] Quorum block filter Benoît Canet
                   ` (4 preceding siblings ...)
  2013-01-18 17:30 ` [Qemu-devel] [RFC V7 05/11] blkverify: Extract qemu_iovec_clone() and qemu_iovec_compare() from blkverify Benoît Canet
@ 2013-01-18 17:30 ` Benoît Canet
  2013-01-18 17:30 ` [Qemu-devel] [RFC V7 07/11] quorum: Add quorum mechanism Benoît Canet
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Benoît Canet @ 2013-01-18 17:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/quorum.c |   38 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 37 insertions(+), 1 deletion(-)

diff --git a/block/quorum.c b/block/quorum.c
index 71ae9ce..7194809 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -225,15 +225,24 @@ static void quorum_aio_bh(void *opaque)
 {
     QuorumAIOCB *acb = opaque;
     BDRVQuorumState *s = acb->bqs;
-    int ret;
+    int i, ret;
 
     ret = s->threshold <= acb->success_count ? 0 : -EIO;
 
+    for (i = 0; i < s->total; i++) {
+        qemu_vfree(acb->aios[i].buf);
+        acb->aios[i].buf = NULL;
+        acb->aios[i].ret = 0;
+    }
+
     qemu_bh_delete(acb->bh);
     acb->common.cb(acb->common.opaque, ret);
     if (acb->finished) {
         *acb->finished = true;
     }
+    for (i = 0; i < s->total; i++) {
+        qemu_iovec_destroy(&acb->qiovs[i]);
+    }
     g_free(acb->aios);
     g_free(acb->qiovs);
     qemu_aio_release(acb);
@@ -294,6 +303,32 @@ static void quorum_aio_cb(void *opaque, int ret)
     qemu_bh_schedule(acb->bh);
 }
 
+static BlockDriverAIOCB *quorum_aio_readv(BlockDriverState *bs,
+                                         int64_t sector_num,
+                                         QEMUIOVector *qiov,
+                                         int nb_sectors,
+                                         BlockDriverCompletionFunc *cb,
+                                         void *opaque)
+{
+    BDRVQuorumState *s = bs->opaque;
+    QuorumAIOCB *acb = quorum_aio_get(s, bs, qiov, sector_num,
+                                      nb_sectors, cb, opaque);
+    int i;
+
+    for (i = 0; i < s->total; i++) {
+        acb->aios[i].buf = qemu_blockalign(bs->file, qiov->size);
+        qemu_iovec_init(&acb->qiovs[i], qiov->niov);
+        qemu_iovec_clone(&acb->qiovs[i], qiov, acb->aios[i].buf);
+    }
+
+    for (i = 0; i < s->total; i++) {
+        bdrv_aio_readv(s->bs[i], sector_num, &acb->qiovs[i], nb_sectors,
+                       quorum_aio_cb, &acb->aios[i]);
+    }
+
+    return &acb->common;
+}
+
 static BlockDriverAIOCB *quorum_aio_writev(BlockDriverState *bs,
                                           int64_t sector_num,
                                           QEMUIOVector *qiov,
@@ -324,6 +359,7 @@ static BlockDriver bdrv_quorum = {
     .bdrv_file_open     = quorum_open,
     .bdrv_close         = quorum_close,
 
+    .bdrv_aio_readv     = quorum_aio_readv,
     .bdrv_aio_writev    = quorum_aio_writev,
 };
 
-- 
1.7.10.4


* [Qemu-devel] [RFC V7 07/11] quorum: Add quorum mechanism.
  2013-01-18 17:30 [Qemu-devel] [RFC V7 00/11] Quorum block filter Benoît Canet
                   ` (5 preceding siblings ...)
  2013-01-18 17:30 ` [Qemu-devel] [RFC V7 06/11] quorum: Add quorum_aio_readv Benoît Canet
@ 2013-01-18 17:30 ` Benoît Canet
  2013-01-18 17:30 ` [Qemu-devel] [RFC V7 08/11] quorum: Add quorum_getlength() Benoît Canet
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Benoît Canet @ 2013-01-18 17:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Use gnutls's SHA-256 to compare versions.
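
To illustrate what comparing versions by digest means, here is a minimal
sketch of a vote tally keyed on fixed-size hash values. It is not the code
from this patch: the digests are assumed to have been computed elsewhere (for
example with a SHA-256 routine), a bounded array replaces the QLIST-based
version list, and the names VersionSketch, MAX_REPLICAS and
quorum_tally_sketch() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HASH_LENGTH 32      /* SHA-256 digest size in bytes */
#define MAX_REPLICAS 16     /* arbitrary bound chosen for this sketch */

typedef struct {
    unsigned char h[HASH_LENGTH];   /* digest identifying this version */
    int first_index;                /* first replica that produced it */
    int vote_count;                 /* number of replicas sharing it */
} VersionSketch;

/*
 * Hypothetical helper, not part of the patchset.
 * Groups replicas by digest and returns the index of a replica holding the
 * most frequent version; the number of votes it got is stored in *votes.
 */
static int quorum_tally_sketch(const unsigned char digests[][HASH_LENGTH],
                               int total, int *votes)
{
    VersionSketch versions[MAX_REPLICAS];
    int nb_versions = 0, best = 0, i, v;

    assert(total >= 1 && total <= MAX_REPLICAS && votes);

    for (i = 0; i < total; i++) {
        /* Look for an already known version with the same digest */
        for (v = 0; v < nb_versions; v++) {
            if (memcmp(versions[v].h, digests[i], HASH_LENGTH) == 0) {
                break;
            }
        }
        /* Unknown digest: register a new version */
        if (v == nb_versions) {
            memcpy(versions[v].h, digests[i], HASH_LENGTH);
            versions[v].first_index = i;
            versions[v].vote_count = 0;
            nb_versions++;
        }
        versions[v].vote_count++;
        if (versions[v].vote_count > versions[best].vote_count) {
            best = v;
        }
    }

    *votes = versions[best].vote_count;
    return versions[best].first_index;
}
```

The caller would then compare the returned vote count against the threshold
to decide whether the quorum holds.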

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/quorum.c |  303 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 configure      |   22 ++++
 2 files changed, 324 insertions(+), 1 deletion(-)
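
For readers following the voting logic in this patch, the tally can be
sketched outside QEMU as follows. This is a simplified illustration under
stated assumptions, not the patch's code: toy_digest (64-bit FNV-1a) stands
in for the gnutls SHA-256 call, and toy_vote, MAX_REPLICAS and the ok[]
array are hypothetical names. It also skips the fast path where all
successful reads already agree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REPLICAS 8   /* hypothetical upper bound for this sketch */

/* Stand-in digest (64-bit FNV-1a); the patch uses SHA-256 from gnutls. */
static uint64_t toy_digest(const unsigned char *buf, size_t len)
{
    uint64_t h = 14695981039346656037ULL;
    size_t i;

    for (i = 0; i < len; i++) {
        h ^= buf[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/*
 * Return the index of a replica holding the most represented version,
 * or -1 when that version has fewer than `threshold` votes.
 * ok[i] is nonzero when the read of replica i succeeded.
 */
static int toy_vote(const unsigned char *const *bufs, const int *ok,
                    size_t len, int total, int threshold)
{
    uint64_t digest[MAX_REPLICAS];
    int i, j, best = -1, best_count = 0;

    assert(total <= MAX_REPLICAS);

    /* hash every successful read once */
    for (i = 0; i < total; i++) {
        if (ok[i]) {
            digest[i] = toy_digest(bufs[i], len);
        }
    }

    /* count, for each successful read, how many reads share its digest */
    for (i = 0; i < total; i++) {
        int count = 0;

        if (!ok[i]) {
            continue;
        }
        for (j = 0; j < total; j++) {
            if (ok[j] && digest[j] == digest[i]) {
                count++;
            }
        }
        if (count > best_count) {
            best_count = count;
            best = i;
        }
    }

    return best_count >= threshold ? best : -1;
}
```

With three replicas and a threshold of 2, toy_vote returns the index of one
of the two agreeing replicas; with a threshold of 3 it returns -1, which
corresponds to the -EIO case in quorum_vote.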

diff --git a/block/quorum.c b/block/quorum.c
index 7194809..e2b5208 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -13,8 +13,30 @@
  * See the COPYING file in the top-level directory.
  */
 
+#include <gnutls/gnutls.h>
+#include <gnutls/crypto.h>
 #include "block/block_int.h"
 
+#define HASH_LENGTH 32
+
+typedef union QuorumVoteValue {
+    char h[HASH_LENGTH];       /* SHA-256 hash */
+    unsigned long l;  /* simpler hash */
+} QuorumVoteValue;
+
+typedef struct QuorumVoteItem {
+    int index;
+    QLIST_ENTRY(QuorumVoteItem) next;
+} QuorumVoteItem;
+
+typedef struct QuorumVoteVersion {
+    QuorumVoteValue value;
+    int index;
+    int vote_count;
+    QLIST_HEAD(, QuorumVoteItem) items;
+    QLIST_ENTRY(QuorumVoteVersion) next;
+} QuorumVoteVersion;
+
 typedef struct {
     BlockDriverState **bs;
     unsigned long long threshold;
@@ -31,6 +53,11 @@ typedef struct QuorumSingleAIOCB {
     QuorumAIOCB *parent;
 } QuorumSingleAIOCB;
 
+typedef struct QuorumVotes {
+    QLIST_HEAD(, QuorumVoteVersion) vote_list;
+    int (*compare)(QuorumVoteValue *a, QuorumVoteValue *b);
+} QuorumVotes;
+
 struct QuorumAIOCB {
     BlockDriverAIOCB common;
     BDRVQuorumState *bqs;
@@ -48,6 +75,8 @@ struct QuorumAIOCB {
     int success_count;          /* number of successfully completed AIOCB */
     bool *finished;             /* completion signal for cancel */
 
+    QuorumVotes votes;
+
     void (*vote)(QuorumAIOCB *acb);
     int vote_ret;
 };
@@ -236,6 +265,11 @@ static void quorum_aio_bh(void *opaque)
     }
 
     qemu_bh_delete(acb->bh);
+
+    if (acb->vote_ret) {
+        ret = acb->vote_ret;
+    }
+
     acb->common.cb(acb->common.opaque, ret);
     if (acb->finished) {
         *acb->finished = true;
@@ -248,6 +282,11 @@ static void quorum_aio_bh(void *opaque)
     qemu_aio_release(acb);
 }
 
+static int quorum_sha256_compare(QuorumVoteValue *a, QuorumVoteValue *b)
+{
+    return memcmp(a, b, HASH_LENGTH);
+}
+
 static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
                                    BlockDriverState *bs,
                                    QEMUIOVector *qiov,
@@ -272,6 +311,8 @@ static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
     acb->vote = NULL;
     acb->vote_ret = 0;
     acb->finished = NULL;
+    acb->votes.compare = quorum_sha256_compare;
+    QLIST_INIT(&acb->votes.vote_list);
 
     for (i = 0; i < s->total; i++) {
         acb->aios[i].buf = NULL;
@@ -299,10 +340,268 @@ static void quorum_aio_cb(void *opaque, int ret)
         return;
     }
 
+    /* Do the vote */
+    if (acb->vote) {
+        acb->vote(acb);
+    }
+
     acb->bh = qemu_bh_new(quorum_aio_bh, acb);
     qemu_bh_schedule(acb->bh);
 }
 
+static void quorum_print_bad(QuorumAIOCB *acb, const char *filename)
+{
+    fprintf(stderr, "quorum: corrected error in quorum file %s: sector_num=%"
+            PRId64 " nb_sectors=%i\n", filename, acb->sector_num,
+            acb->nb_sectors);
+}
+
+static void quorum_print_failure(QuorumAIOCB *acb)
+{
+    fprintf(stderr, "quorum: failure sector_num=%" PRId64 " nb_sectors=%i\n",
+            acb->sector_num, acb->nb_sectors);
+}
+
+static void quorum_print_bad_versions(QuorumAIOCB *acb,
+                                      QuorumVoteValue *value)
+{
+    QuorumVoteVersion *version;
+    QuorumVoteItem *item;
+    BDRVQuorumState *s = acb->bqs;
+
+    QLIST_FOREACH(version, &acb->votes.vote_list, next) {
+        if (!acb->votes.compare(&version->value, value)) {
+            continue;
+        }
+        QLIST_FOREACH(item, &version->items, next) {
+            quorum_print_bad(acb, s->filenames[item->index]);
+        }
+    }
+}
+
+static void quorum_copy_qiov(QEMUIOVector *dest, QEMUIOVector *source)
+{
+    int i;
+    assert(dest->niov == source->niov);
+    assert(dest->size == source->size);
+    for (i = 0; i < source->niov; i++) {
+        assert(dest->iov[i].iov_len == source->iov[i].iov_len);
+        memcpy(dest->iov[i].iov_base,
+               source->iov[i].iov_base,
+               source->iov[i].iov_len);
+    }
+}
+
+static void quorum_count_vote(QuorumVotes *votes,
+                              QuorumVoteValue *value,
+                              int index)
+{
+    QuorumVoteVersion *v = NULL, *version = NULL;
+    QuorumVoteItem *item;
+
+    /* look if we have something with this hash */
+    QLIST_FOREACH(v, &votes->vote_list, next) {
+        if (!votes->compare(&v->value, value)) {
+            version = v;
+            break;
+        }
+    }
+
+    /* It's a version not yet in the list: add it */
+    if (!version) {
+        version = g_new0(QuorumVoteVersion, 1);
+        QLIST_INIT(&version->items);
+        memcpy(&version->value, value, sizeof(version->value));
+        version->index = index;
+        version->vote_count = 0;
+        QLIST_INSERT_HEAD(&votes->vote_list, version, next);
+    }
+
+    version->vote_count++;
+
+    item = g_new0(QuorumVoteItem, 1);
+    item->index = index;
+    QLIST_INSERT_HEAD(&version->items, item, next);
+}
+
+static void quorum_free_vote_list(QuorumVotes *votes)
+{
+    QuorumVoteVersion *version, *next_version;
+    QuorumVoteItem *item, *next_item;
+
+    QLIST_FOREACH_SAFE(version, &votes->vote_list, next, next_version) {
+        QLIST_REMOVE(version, next);
+        QLIST_FOREACH_SAFE(item, &version->items, next, next_item) {
+            QLIST_REMOVE(item, next);
+            g_free(item);
+        }
+        g_free(version);
+    }
+}
+
+static int quorum_compute_hash(QuorumAIOCB *acb, int i, QuorumVoteValue *hash)
+{
+    int j, ret;
+    gnutls_hash_hd_t dig;
+    QEMUIOVector *qiov = &acb->qiovs[i];
+
+    ret = gnutls_hash_init(&dig, GNUTLS_DIG_SHA256);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    for (j = 0; j < qiov->niov; j++) {
+        ret = gnutls_hash(dig, qiov->iov[j].iov_base, qiov->iov[j].iov_len);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
+    gnutls_hash_deinit(dig, (void *) hash);
+
+    return 0;
+}
+
+static QuorumVoteVersion *quorum_get_vote_winner(QuorumVotes *votes)
+{
+    int i = 0;
+    QuorumVoteVersion *candidate, *winner = NULL;
+
+    QLIST_FOREACH(candidate, &votes->vote_list, next) {
+        if (candidate->vote_count > i) {
+            i = candidate->vote_count;
+            winner = candidate;
+        }
+    }
+
+    return winner;
+}
+
+static bool quorum_iovec_compare(QEMUIOVector *a, QEMUIOVector *b)
+{
+    int i;
+    int result;
+
+    assert(a->niov == b->niov);
+    for (i = 0; i < a->niov; i++) {
+        assert(a->iov[i].iov_len == b->iov[i].iov_len);
+        result = memcmp(a->iov[i].iov_base,
+                        b->iov[i].iov_base,
+                        a->iov[i].iov_len);
+        if (result) {
+            return false;
+        }
+    }
+
+    return true;
+}
+
+static void GCC_FMT_ATTR(2, 3) quorum_err(QuorumAIOCB *acb,
+                                          const char *fmt, ...)
+{
+    va_list ap;
+
+    va_start(ap, fmt);
+    fprintf(stderr, "quorum: sector_num=%" PRId64 " nb_sectors=%d ",
+            acb->sector_num, acb->nb_sectors);
+    vfprintf(stderr, fmt, ap);
+    fprintf(stderr, "\n");
+    va_end(ap);
+    exit(1);
+}
+
+static bool quorum_compare(QuorumAIOCB *acb,
+                           QEMUIOVector *a,
+                           QEMUIOVector *b)
+{
+    BDRVQuorumState *s = acb->bqs;
+    bool blkverify = false;
+    ssize_t offset;
+
+    if (s->total == 2 && s->threshold == 2) {
+        blkverify = true;
+    }
+
+    if (blkverify) {
+        offset = qemu_iovec_compare(a, b);
+        if (offset != -1) {
+            quorum_err(acb, "contents mismatch in sector %" PRId64,
+                       acb->sector_num +
+                       (uint64_t)(offset / BDRV_SECTOR_SIZE));
+        }
+        return true;
+    }
+
+    return quorum_iovec_compare(a, b);
+}
+
+
+static void quorum_vote(QuorumAIOCB *acb)
+{
+    bool quorum = true;
+    int i, j, ret;
+    QuorumVoteValue hash;
+    BDRVQuorumState *s = acb->bqs;
+    QuorumVoteVersion *winner;
+
+    /* get the index of the first successful read */
+    for (i = 0; i < s->total; i++) {
+        if (!acb->aios[i].ret) {
+            break;
+        }
+    }
+
+    /* compare this read with all other successful reads looking for quorum */
+    for (j = i + 1; j < s->total; j++) {
+        if (acb->aios[j].ret) {
+            continue;
+        }
+        quorum = quorum_compare(acb, &acb->qiovs[i], &acb->qiovs[j]);
+        if (!quorum) {
+            break;
+        }
+    }
+
+    /* Every successful read agrees -> Quorum */
+    if (quorum) {
+        quorum_copy_qiov(acb->qiov, &acb->qiovs[i]);
+        return;
+    }
+
+    /* compute hashes for each successful read, also store indexes */
+    for (i = 0; i < s->total; i++) {
+        if (acb->aios[i].ret) {
+            continue;
+        }
+        ret = quorum_compute_hash(acb, i, &hash);
+        assert(ret == 0);
+        quorum_count_vote(&acb->votes, &hash, i);
+    }
+
+    /* vote to select the most represented version */
+    winner = quorum_get_vote_winner(&acb->votes);
+    assert(winner != NULL);
+
+    /* if the winner count is smaller than threshold the read fails */
+    if (winner->vote_count < s->threshold) {
+        quorum_print_failure(acb);
+        acb->vote_ret = -EIO;
+        fprintf(stderr, "quorum: vote result inferior to threshold\n");
+        goto free_exit;
+    }
+
+    /* we have a winner: copy it */
+    quorum_copy_qiov(acb->qiov, &acb->qiovs[winner->index]);
+
+    /* some versions are bad: print them */
+    quorum_print_bad_versions(acb, &winner->value);
+
+free_exit:
+    /* free lists */
+    quorum_free_vote_list(&acb->votes);
+}
+
 static BlockDriverAIOCB *quorum_aio_readv(BlockDriverState *bs,
                                          int64_t sector_num,
                                          QEMUIOVector *qiov,
@@ -315,6 +614,8 @@ static BlockDriverAIOCB *quorum_aio_readv(BlockDriverState *bs,
                                       nb_sectors, cb, opaque);
     int i;
 
+    acb->vote = quorum_vote;
+
     for (i = 0; i < s->total; i++) {
         acb->aios[i].buf = qemu_blockalign(bs->file, qiov->size);
         qemu_iovec_init(&acb->qiovs[i], qiov->niov);
@@ -322,7 +623,7 @@ static BlockDriverAIOCB *quorum_aio_readv(BlockDriverState *bs,
     }
 
     for (i = 0; i < s->total; i++) {
-        bdrv_aio_readv(s->bs[i], sector_num, qiov, nb_sectors,
+        bdrv_aio_readv(s->bs[i], sector_num, &acb->qiovs[i], nb_sectors,
                        quorum_aio_cb, &acb->aios[i]);
     }
 
diff --git a/configure b/configure
index 4ebb60d..0832d26 100755
--- a/configure
+++ b/configure
@@ -1733,6 +1733,28 @@ EOF
 fi
 
 ##########################################
+# Quorum gnutls detection
+cat > $TMPC <<EOF
+#include <gnutls/gnutls.h>
+#include <gnutls/crypto.h>
+int main(void) {char data[4096], digest[32];
+gnutls_hash_fast(GNUTLS_DIG_SHA256, data, 4096, digest);
+return 0;
+}
+EOF
+qcow_tls_cflags=`$pkg_config --cflags gnutls 2> /dev/null`
+qcow_tls_libs=`$pkg_config --libs gnutls 2> /dev/null`
+if compile_prog "$qcow_tls_cflags" "$qcow_tls_libs" ; then
+  qcow_tls=yes
+  libs_softmmu="$qcow_tls_libs $libs_softmmu"
+  libs_tools="$qcow_tls_libs $libs_tools"
+  QEMU_CFLAGS="$QEMU_CFLAGS $qcow_tls_cflags"
+else
+  echo "gnutls > 2.10.0 required to compile QEMU"
+  exit 1
+fi
+
+##########################################
 # VNC SASL detection
 if test "$vnc" = "yes" -a "$vnc_sasl" != "no" ; then
   cat > $TMPC <<EOF
-- 
1.7.10.4


* [Qemu-devel] [RFC V7 08/11] quorum: Add quorum_getlength().
  2013-01-18 17:30 [Qemu-devel] [RFC V7 00/11] Quorum block filter Benoît Canet
                   ` (6 preceding siblings ...)
  2013-01-18 17:30 ` [Qemu-devel] [RFC V7 07/11] quorum: Add quorum mechanism Benoît Canet
@ 2013-01-18 17:30 ` Benoît Canet
  2013-01-18 17:30 ` [Qemu-devel] [RFC V7 09/11] quorum: Add quorum_invalidate_cache() Benoît Canet
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Benoît Canet @ 2013-01-18 17:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Check that every bs file returns the same length.
If not, return -EIO to disable the quorum and
avoid a length discrepancy.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/quorum.c |   20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
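
The length check described above can be sketched as a standalone function.
This is an illustration only: common_length is a hypothetical helper working
on an array of already collected lengths. It assumes each entry is a byte
count on success or a negative errno value on failure, and, unlike the patch,
it passes a child's own error through instead of folding it into -EIO.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * lengths[i] is a byte count on success or a negative errno value on
 * failure (assumed convention for this sketch).
 */
static int64_t common_length(const int64_t *lengths, size_t n)
{
    size_t i;

    assert(lengths != NULL);

    if (n == 0) {
        return -EINVAL;
    }

    /* propagate the first child error as-is (assumption, see lead-in) */
    for (i = 0; i < n; i++) {
        if (lengths[i] < 0) {
            return lengths[i];
        }
    }

    /* every child must report the same length as the first one */
    for (i = 1; i < n; i++) {
        if (lengths[i] != lengths[0]) {
            return -EIO;
        }
    }

    return lengths[0];
}
```

Three children reporting 1024 bytes yield 1024; if one of them reports 2048,
the result is -EIO.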

diff --git a/block/quorum.c b/block/quorum.c
index e2b5208..a63a84f 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -651,12 +651,32 @@ static BlockDriverAIOCB *quorum_aio_writev(BlockDriverState *bs,
     return &acb->common;
 }
 
+static int64_t quorum_getlength(BlockDriverState *bs)
+{
+    BDRVQuorumState *s = bs->opaque;
+    int64_t result;
+    int i;
+
+    /* check that every file has the same length */
+    result = bdrv_getlength(s->bs[0]);
+    for (i = 1; i < s->total; i++) {
+        int64_t value = bdrv_getlength(s->bs[i]);
+        if (value != result) {
+            return -EIO;
+        }
+    }
+
+    return result;
+}
+
 static BlockDriver bdrv_quorum = {
     .format_name        = "quorum",
     .protocol_name      = "quorum",
 
     .instance_size      = sizeof(BDRVQuorumState),
 
+    .bdrv_getlength     = quorum_getlength,
+
     .bdrv_file_open     = quorum_open,
     .bdrv_close         = quorum_close,
 
-- 
1.7.10.4


* [Qemu-devel] [RFC V7 09/11] quorum: Add quorum_invalidate_cache().
  2013-01-18 17:30 [Qemu-devel] [RFC V7 00/11] Quorum block filter Benoît Canet
                   ` (7 preceding siblings ...)
  2013-01-18 17:30 ` [Qemu-devel] [RFC V7 08/11] quorum: Add quorum_getlength() Benoît Canet
@ 2013-01-18 17:30 ` Benoît Canet
  2013-01-18 17:30 ` [Qemu-devel] [RFC V7 10/11] quorum: Add quorum_co_is_allocated Benoît Canet
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Benoît Canet @ 2013-01-18 17:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/quorum.c |   11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index a63a84f..5cafb40 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -669,6 +669,16 @@ static int64_t quorum_getlength(BlockDriverState *bs)
     return result;
 }
 
+static void quorum_invalidate_cache(BlockDriverState *bs)
+{
+    BDRVQuorumState *s = bs->opaque;
+    int i;
+
+    for (i = 0; i < s->total; i++) {
+        bdrv_invalidate_cache(s->bs[i]);
+    }
+}
+
 static BlockDriver bdrv_quorum = {
     .format_name        = "quorum",
     .protocol_name      = "quorum",
@@ -682,6 +692,7 @@ static BlockDriver bdrv_quorum = {
 
     .bdrv_aio_readv     = quorum_aio_readv,
     .bdrv_aio_writev    = quorum_aio_writev,
+    .bdrv_invalidate_cache = quorum_invalidate_cache,
 };
 
 static void bdrv_quorum_init(void)
-- 
1.7.10.4


* [Qemu-devel] [RFC V7 10/11] quorum: Add quorum_co_is_allocated.
  2013-01-18 17:30 [Qemu-devel] [RFC V7 00/11] Quorum block filter Benoît Canet
                   ` (8 preceding siblings ...)
  2013-01-18 17:30 ` [Qemu-devel] [RFC V7 09/11] quorum: Add quorum_invalidate_cache() Benoît Canet
@ 2013-01-18 17:30 ` Benoît Canet
  2013-01-18 17:30 ` [Qemu-devel] [RFC V7 11/11] quorum: Add quorum_co_flush() Benoît Canet
  2013-01-21 13:02 ` [Qemu-devel] [RFC V7 00/11] Quorum block filter Zhi Yong Wu
  11 siblings, 0 replies; 13+ messages in thread
From: Benoît Canet @ 2013-01-18 17:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/quorum.c |   53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index 5cafb40..8cbf66f 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -287,6 +287,22 @@ static int quorum_sha256_compare(QuorumVoteValue *a, QuorumVoteValue *b)
     return memcmp(a, b, HASH_LENGTH);
 }
 
+static int quorum_long_compare(QuorumVoteValue *a, QuorumVoteValue *b)
+{
+    unsigned long i = a->l;
+    unsigned long j = b->l;
+
+    if (i < j) {
+        return -1;
+    }
+
+    if (i > j) {
+        return 1;
+    }
+
+    return 0;
+}
+
 static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
                                    BlockDriverState *bs,
                                    QEMUIOVector *qiov,
@@ -679,6 +695,42 @@ static void quorum_invalidate_cache(BlockDriverState *bs)
     }
 }
 
+static int coroutine_fn quorum_co_is_allocated(BlockDriverState *bs,
+                                               int64_t sector_num,
+                                               int nb_sectors,
+                                               int *pnum)
+{
+    BDRVQuorumState *s = bs->opaque;
+    QuorumVoteVersion *winner = NULL;
+    QuorumVotes result_votes, num_votes;
+    QuorumVoteValue result_value, num_value;
+    int i, result = 0, num;
+
+    QLIST_INIT(&result_votes.vote_list);
+    QLIST_INIT(&num_votes.vote_list);
+    result_votes.compare = quorum_long_compare;
+    num_votes.compare = quorum_long_compare;
+
+    for (i = 0; i < s->total; i++) {
+        result = bdrv_co_is_allocated(s->bs[i], sector_num, nb_sectors, &num);
+        result_value.l = result;
+        num_value.l = num;
+        quorum_count_vote(&result_votes, &result_value, i);
+        quorum_count_vote(&num_votes, &num_value, i);
+    }
+
+    winner = quorum_get_vote_winner(&result_votes);
+    result = winner->value.l;
+
+    winner = quorum_get_vote_winner(&num_votes);
+    *pnum = winner->value.l;
+
+    quorum_free_vote_list(&result_votes);
+    quorum_free_vote_list(&num_votes);
+
+    return result;
+}
+
 static BlockDriver bdrv_quorum = {
     .format_name        = "quorum",
     .protocol_name      = "quorum",
@@ -693,6 +745,7 @@ static BlockDriver bdrv_quorum = {
     .bdrv_aio_readv     = quorum_aio_readv,
     .bdrv_aio_writev    = quorum_aio_writev,
     .bdrv_invalidate_cache = quorum_invalidate_cache,
+    .bdrv_co_is_allocated  = quorum_co_is_allocated,
 };
 
 static void bdrv_quorum_init(void)
-- 
1.7.10.4


* [Qemu-devel] [RFC V7 11/11] quorum: Add quorum_co_flush().
  2013-01-18 17:30 [Qemu-devel] [RFC V7 00/11] Quorum block filter Benoît Canet
                   ` (9 preceding siblings ...)
  2013-01-18 17:30 ` [Qemu-devel] [RFC V7 10/11] quorum: Add quorum_co_is_allocated Benoît Canet
@ 2013-01-18 17:30 ` Benoît Canet
  2013-01-21 13:02 ` [Qemu-devel] [RFC V7 00/11] Quorum block filter Zhi Yong Wu
  11 siblings, 0 replies; 13+ messages in thread
From: Benoît Canet @ 2013-01-18 17:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Hold a vote to select the error to return, if any.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/quorum.c |   33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)
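
The error vote described above can be sketched on a plain array of return
codes. This is a minimal illustration, not the patch's code: flush_vote is a
hypothetical name, results[] is assumed to hold 0 on success or a negative
errno value, and breaking ties in favour of the first error seen is an
arbitrary choice made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Return 0 when every flush succeeded, otherwise the error code
 * reported by the largest number of children.
 */
static int flush_vote(const int *results, size_t n)
{
    size_t i, j;
    size_t winner_count = 0;
    int winner = 0;

    assert(results != NULL || n == 0);

    for (i = 0; i < n; i++) {
        size_t count = 0;

        if (results[i] == 0) {
            continue;           /* successes do not take part in the vote */
        }
        for (j = 0; j < n; j++) {
            if (results[j] == results[i]) {
                count++;
            }
        }
        if (count > winner_count) {
            winner_count = count;
            winner = results[i];
        }
    }

    return winner;
}
```

Under this rule a single failing child is enough to fail the flush, because
only errors are counted and the threshold is not consulted.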

diff --git a/block/quorum.c b/block/quorum.c
index 8cbf66f..0f4f634 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -731,6 +731,38 @@ static int coroutine_fn quorum_co_is_allocated(BlockDriverState *bs,
     return result;
 }
 
+static coroutine_fn int quorum_co_flush(BlockDriverState *bs)
+{
+    BDRVQuorumState *s = bs->opaque;
+    QuorumVoteVersion *winner = NULL;
+    QuorumVotes error_votes;
+    QuorumVoteValue result_value;
+    int i;
+    int result = 0;
+    bool error = false;
+
+    QLIST_INIT(&error_votes.vote_list);
+    error_votes.compare = quorum_long_compare;
+
+    for (i = 0; i < s->total; i++) {
+        result = bdrv_co_flush(s->bs[i]);
+        if (result) {
+            error = true;
+            result_value.l = result;
+            quorum_count_vote(&error_votes, &result_value, i);
+        }
+    }
+
+    if (error) {
+        winner = quorum_get_vote_winner(&error_votes);
+        result = winner->value.l;
+    }
+
+    quorum_free_vote_list(&error_votes);
+
+    return result;
+}
+
 static BlockDriver bdrv_quorum = {
     .format_name        = "quorum",
     .protocol_name      = "quorum",
@@ -741,6 +773,7 @@ static BlockDriver bdrv_quorum = {
 
     .bdrv_file_open     = quorum_open,
     .bdrv_close         = quorum_close,
+    .bdrv_co_flush_to_disk = quorum_co_flush,
 
     .bdrv_aio_readv     = quorum_aio_readv,
     .bdrv_aio_writev    = quorum_aio_writev,
-- 
1.7.10.4


* Re: [Qemu-devel] [RFC V7 00/11] Quorum block filter
  2013-01-18 17:30 [Qemu-devel] [RFC V7 00/11] Quorum block filter Benoît Canet
                   ` (10 preceding siblings ...)
  2013-01-18 17:30 ` [Qemu-devel] [RFC V7 11/11] quorum: Add quorum_co_flush() Benoît Canet
@ 2013-01-21 13:02 ` Zhi Yong Wu
  11 siblings, 0 replies; 13+ messages in thread
From: Zhi Yong Wu @ 2013-01-21 13:02 UTC (permalink / raw)
  To: Benoît Canet; +Cc: qemu-devel

On Sat, Jan 19, 2013 at 1:30 AM, Benoît Canet <benoit@irqsave.net> wrote:
> This patchset is rebased on top of "cutils: unsigned int parsing functions"
> by "Eduardo Habkost".
>
> This patchset create a block driver implementing a quorum using total qemu disk
> images. Writes are mirrored on the $total files.
> For the reading part the $total files are read at the same time and a vote is
> done to determine if a qiov version is present $threshold or more times. It then
> return this majority version to the upper layers.
> When i < $threshold versions of the data are returned by the lower layer the
> quorum is broken and the read return -EIO.
>
> The goal of this patchset is to be turned in a QEMU block filter living just
> above raw-*.c and below qcow2/qed when the required infrastructure will be done.
>
> Main use of this feature will be people using NFS appliances which can be
> subjected to bitflip errors.
>
> This patchset can be used to replace blkverify and the out of tree blkmirror.
>
> usage: -drive
> file=quorum:threshold/total:image_1.raw:...:image_total.raw,if=virtio,cache=none
I don't know if the following case can be handled correctly.
For example, quorum:2/3:image1.raw:image2.raw:image3.raw
Let us assume that some data in image2.raw and image3.raw gets
corrupted in the same way, so that the two images are still identical
to each other, while image1.raw is not corrupted. In this case, how
will your vote method know which image is corrupted and which is not?
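
To make the scenario concrete, here is a minimal sketch. It is not code from
the series: agree_count and SECTOR are hypothetical names, and a 4-byte
"sector" stands in for real data. It only counts how many replicas hold the
same bytes as a given one, which is the information a byte-wise vote has.

```c
#include <assert.h>
#include <string.h>

#define SECTOR 4   /* toy sector size for this sketch */

/* Count how many of the n replicas hold the same bytes as replica i. */
static int agree_count(unsigned char r[][SECTOR], int n, int i)
{
    int j, count = 0;

    assert(i >= 0 && i < n);

    for (j = 0; j < n; j++) {
        if (memcmp(r[i], r[j], SECTOR) == 0) {
            count++;
        }
    }
    return count;
}
```

If image1 holds the intact bytes and image2 and image3 hold the same
corrupted bytes, agree_count gives 1 for image1 and 2 for image2. With a
threshold of 2 the corrupted version would reach quorum, as far as can be
told from the voting code in patch 07.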

>
> in this version:
>     parse total and threshold with parse_uint [Eric]
>     return proper qerrors in quorum_open [Eric]
>     Use sha256 for comparing blocks [Eric]
>     Update the rest of the voting function to the new way of doing [Benoît]
>
> V6:
>     fix commit message of "quorum: Add quorum_open() and quorum_close()." [Eric]
>     return error after a vote in quorum_co_flush [Eric]
>     Fix bitrot caused by headers and structures renaming [Benoît]
>     initialize finished to NULL to prevent crash [Benoît]
>     convert internal quorum code to uint64_t instead of int64_t [Benoît]
>
> V5:
>
> Eric Blake: revert back separator to ":"
>             rewrite quorum_getlength
>
> Benoît Canet: use memcmp to compare iovec excepted for the blkverify case
>               use strstart to parse argument in open
>
>
> Benoît Canet (11):
>   quorum: Create quorum.c, add QuorumSingleAIOCB and QuorumAIOCB.
>   quorum: Create BDRVQuorumState and BlkDriver and do init.
>   quorum: Add quorum_open() and quorum_close().
>   quorum: Add quorum_aio_writev and its dependencies.
>   blkverify: Extract qemu_iovec_clone() and qemu_iovec_compare() from
>     blkverify.
>   quorum: Add quorum_aio_readv.
>   quorum: Add quorum mechanism.
>   quorum: Add quorum_getlength().
>   quorum: Add quorum_invalidate_cache().
>   quorum: Add quorum_co_is_allocated.
>   quorum: Add quorum_co_flush().
>
>  block/Makefile.objs   |    1 +
>  block/blkverify.c     |  108 +------
>  block/quorum.c        |  789 +++++++++++++++++++++++++++++++++++++++++++++++++
>  configure             |   22 ++
>  include/qemu-common.h |    2 +
>  util/iov.c            |  103 +++++++
>  6 files changed, 919 insertions(+), 106 deletions(-)
>  create mode 100644 block/quorum.c
>
> --
> 1.7.10.4
>
>



-- 
Regards,

Zhi Yong Wu

