All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [RFC PATCH 0/8] quorum: Implement .bdrv_co_preadv/pwritev()
@ 2016-11-10 17:19 Kevin Wolf
  2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 1/8] coroutine: Introduce qemu_coroutine_enter_if_inactive() Kevin Wolf
                   ` (9 more replies)
  0 siblings, 10 replies; 35+ messages in thread
From: Kevin Wolf @ 2016-11-10 17:19 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, berto, qemu-devel

This is part of a series that I'm working on and that aims to remove the
bdrv_aio_*() emulation in io.c. blkdebug and blkverify were easy, but for
quorum I need a few more patches, so I'm sending this out as an RFC while I
continue work on the rest (QED, and then possibly some polishing).

After the series, quorum doesn't only use the preferred interfaces, but the
code becomes a little easier to follow and byte granularity requests are
supported. There is probably still some potential for additional cleanups in a
follow-up series, but this should already be enough to be worth merging.

Kevin Wolf (8):
  coroutine: Introduce qemu_coroutine_enter_if_inactive()
  quorum: Remove s from quorum_aio_get() arguments
  quorum: Implement .bdrv_co_readv/writev
  quorum: Do cleanup in caller coroutine
  quorum: Inline quorum_aio_cb()
  quorum: Avoid bdrv_aio_writev() for rewrites
  quorum: Implement .bdrv_co_preadv/pwritev()
  quorum: Inline quorum_fifo_aio_cb()

 block/quorum.c           | 373 +++++++++++++++++++++++++----------------------
 include/qemu/coroutine.h |   6 +
 util/qemu-coroutine.c    |   8 +
 3 files changed, 211 insertions(+), 176 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [Qemu-devel] [RFC PATCH 1/8] coroutine: Introduce qemu_coroutine_enter_if_inactive()
  2016-11-10 17:19 [Qemu-devel] [RFC PATCH 0/8] quorum: Implement .bdrv_co_preadv/pwritev() Kevin Wolf
@ 2016-11-10 17:19 ` Kevin Wolf
  2016-11-10 23:49   ` Eric Blake
  2016-11-17  9:30   ` Alberto Garcia
  2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 2/8] quorum: Remove s from quorum_aio_get() arguments Kevin Wolf
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 35+ messages in thread
From: Kevin Wolf @ 2016-11-10 17:19 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, berto, qemu-devel

In the context of asynchronous work, if we have a worker coroutine that
didn't yield, the parent coroutine cannot be reentered because it hasn't
yielded yet. In this case we don't even have to reenter the parent
because it will see that the work is already done and won't even yield.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/qemu/coroutine.h | 6 ++++++
 util/qemu-coroutine.c    | 8 ++++++++
 2 files changed, 14 insertions(+)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index e6a60d5..12584ed 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -71,6 +71,12 @@ Coroutine *qemu_coroutine_create(CoroutineEntry *entry, void *opaque);
 void qemu_coroutine_enter(Coroutine *coroutine);
 
 /**
+ * Transfer control to a coroutine if it's not active (i.e. part of the call
+ * stack of the running coroutine). Otherwise, do nothing.
+ */
+void qemu_coroutine_enter_if_inactive(Coroutine *co);
+
+/**
  * Transfer control back to a coroutine's caller
  *
  * This function does not return until the coroutine is re-entered using
diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
index 737bffa..a06befe 100644
--- a/util/qemu-coroutine.c
+++ b/util/qemu-coroutine.c
@@ -19,6 +19,7 @@
 #include "qemu/atomic.h"
 #include "qemu/coroutine.h"
 #include "qemu/coroutine_int.h"
+#include "block/aio.h"
 
 enum {
     POOL_BATCH_SIZE = 64,
@@ -131,6 +132,13 @@ void qemu_coroutine_enter(Coroutine *co)
     }
 }
 
+void qemu_coroutine_enter_if_inactive(Coroutine *co)
+{
+    if (!qemu_coroutine_entered(co)) {
+        qemu_coroutine_enter(co);
+    }
+}
+
 void coroutine_fn qemu_coroutine_yield(void)
 {
     Coroutine *self = qemu_coroutine_self();
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [RFC PATCH 2/8] quorum: Remove s from quorum_aio_get() arguments
  2016-11-10 17:19 [Qemu-devel] [RFC PATCH 0/8] quorum: Implement .bdrv_co_preadv/pwritev() Kevin Wolf
  2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 1/8] coroutine: Introduce qemu_coroutine_enter_if_inactive() Kevin Wolf
@ 2016-11-10 17:19 ` Kevin Wolf
  2016-11-10 23:52   ` Eric Blake
                     ` (2 more replies)
  2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 3/8] quorum: Implement .bdrv_co_readv/writev Kevin Wolf
                   ` (7 subsequent siblings)
  9 siblings, 3 replies; 35+ messages in thread
From: Kevin Wolf @ 2016-11-10 17:19 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, berto, qemu-devel

There is no point in passing the value of bs->opaque in order to
overwrite it with itself.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/quorum.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index d122299..dfa9fd3 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -171,18 +171,17 @@ static bool quorum_64bits_compare(QuorumVoteValue *a, QuorumVoteValue *b)
     return a->l == b->l;
 }
 
-static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
-                                   BlockDriverState *bs,
+static QuorumAIOCB *quorum_aio_get(BlockDriverState *bs,
                                    QEMUIOVector *qiov,
                                    uint64_t sector_num,
                                    int nb_sectors,
                                    BlockCompletionFunc *cb,
                                    void *opaque)
 {
+    BDRVQuorumState *s = bs->opaque;
     QuorumAIOCB *acb = qemu_aio_get(&quorum_aiocb_info, bs, cb, opaque);
     int i;
 
-    acb->common.bs->opaque = s;
     acb->sector_num = sector_num;
     acb->nb_sectors = nb_sectors;
     acb->qiov = qiov;
@@ -691,7 +690,7 @@ static BlockAIOCB *quorum_aio_readv(BlockDriverState *bs,
                                     void *opaque)
 {
     BDRVQuorumState *s = bs->opaque;
-    QuorumAIOCB *acb = quorum_aio_get(s, bs, qiov, sector_num,
+    QuorumAIOCB *acb = quorum_aio_get(bs, qiov, sector_num,
                                       nb_sectors, cb, opaque);
     acb->is_read = true;
     acb->children_read = 0;
@@ -711,7 +710,7 @@ static BlockAIOCB *quorum_aio_writev(BlockDriverState *bs,
                                      void *opaque)
 {
     BDRVQuorumState *s = bs->opaque;
-    QuorumAIOCB *acb = quorum_aio_get(s, bs, qiov, sector_num, nb_sectors,
+    QuorumAIOCB *acb = quorum_aio_get(bs, qiov, sector_num, nb_sectors,
                                       cb, opaque);
     int i;
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [RFC PATCH 3/8] quorum: Implement .bdrv_co_readv/writev
  2016-11-10 17:19 [Qemu-devel] [RFC PATCH 0/8] quorum: Implement .bdrv_co_preadv/pwritev() Kevin Wolf
  2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 1/8] coroutine: Introduce qemu_coroutine_enter_if_inactive() Kevin Wolf
  2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 2/8] quorum: Remove s from quorum_aio_get() arguments Kevin Wolf
@ 2016-11-10 17:19 ` Kevin Wolf
  2016-11-11  1:56   ` Eric Blake
  2016-11-16 15:57   ` Alberto Garcia
  2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 4/8] quorum: Do cleanup in caller coroutine Kevin Wolf
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 35+ messages in thread
From: Kevin Wolf @ 2016-11-10 17:19 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, berto, qemu-devel

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/quorum.c | 194 ++++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 117 insertions(+), 77 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index dfa9fd3..3cb579e 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -97,7 +97,7 @@ typedef struct QuorumAIOCB QuorumAIOCB;
  * $children_count QuorumChildRequest.
  */
 typedef struct QuorumChildRequest {
-    BlockAIOCB *aiocb;
+    BlockDriverState *bs;
     QEMUIOVector qiov;
     uint8_t *buf;
     int ret;
@@ -110,7 +110,8 @@ typedef struct QuorumChildRequest {
  * used to do operations on each children and track overall progress.
  */
 struct QuorumAIOCB {
-    BlockAIOCB common;
+    BlockDriverState *bs;
+    Coroutine *co;
 
     /* Request metadata */
     uint64_t sector_num;
@@ -129,36 +130,18 @@ struct QuorumAIOCB {
     QuorumVotes votes;
 
     bool is_read;
+    bool has_completed;
     int vote_ret;
     int children_read;          /* how many children have been read from */
 };
 
 static bool quorum_vote(QuorumAIOCB *acb);
 
-static void quorum_aio_cancel(BlockAIOCB *blockacb)
-{
-    QuorumAIOCB *acb = container_of(blockacb, QuorumAIOCB, common);
-    BDRVQuorumState *s = acb->common.bs->opaque;
-    int i;
-
-    /* cancel all callbacks */
-    for (i = 0; i < s->num_children; i++) {
-        if (acb->qcrs[i].aiocb) {
-            bdrv_aio_cancel_async(acb->qcrs[i].aiocb);
-        }
-    }
-}
-
-static AIOCBInfo quorum_aiocb_info = {
-    .aiocb_size         = sizeof(QuorumAIOCB),
-    .cancel_async       = quorum_aio_cancel,
-};
-
 static void quorum_aio_finalize(QuorumAIOCB *acb)
 {
-    acb->common.cb(acb->common.opaque, acb->vote_ret);
+    acb->has_completed = true;
     g_free(acb->qcrs);
-    qemu_aio_unref(acb);
+    qemu_coroutine_enter_if_inactive(acb->co);
 }
 
 static bool quorum_sha256_compare(QuorumVoteValue *a, QuorumVoteValue *b)
@@ -174,14 +157,14 @@ static bool quorum_64bits_compare(QuorumVoteValue *a, QuorumVoteValue *b)
 static QuorumAIOCB *quorum_aio_get(BlockDriverState *bs,
                                    QEMUIOVector *qiov,
                                    uint64_t sector_num,
-                                   int nb_sectors,
-                                   BlockCompletionFunc *cb,
-                                   void *opaque)
+                                   int nb_sectors)
 {
     BDRVQuorumState *s = bs->opaque;
-    QuorumAIOCB *acb = qemu_aio_get(&quorum_aiocb_info, bs, cb, opaque);
+    QuorumAIOCB *acb = g_new(QuorumAIOCB, 1);
     int i;
 
+    acb->co = qemu_coroutine_self();
+    acb->bs = bs;
     acb->sector_num = sector_num;
     acb->nb_sectors = nb_sectors;
     acb->qiov = qiov;
@@ -191,6 +174,7 @@ static QuorumAIOCB *quorum_aio_get(BlockDriverState *bs,
     acb->rewrite_count = 0;
     acb->votes.compare = quorum_sha256_compare;
     QLIST_INIT(&acb->votes.vote_list);
+    acb->has_completed = false;
     acb->is_read = false;
     acb->vote_ret = 0;
 
@@ -217,7 +201,7 @@ static void quorum_report_bad(QuorumOpType type, uint64_t sector_num,
 
 static void quorum_report_failure(QuorumAIOCB *acb)
 {
-    const char *reference = bdrv_get_device_or_node_name(acb->common.bs);
+    const char *reference = bdrv_get_device_or_node_name(acb->bs);
     qapi_event_send_quorum_failure(reference, acb->sector_num,
                                    acb->nb_sectors, &error_abort);
 }
@@ -226,7 +210,7 @@ static int quorum_vote_error(QuorumAIOCB *acb);
 
 static bool quorum_has_too_much_io_failed(QuorumAIOCB *acb)
 {
-    BDRVQuorumState *s = acb->common.bs->opaque;
+    BDRVQuorumState *s = acb->bs->opaque;
 
     if (acb->success_count < s->threshold) {
         acb->vote_ret = quorum_vote_error(acb);
@@ -252,7 +236,7 @@ static void quorum_rewrite_aio_cb(void *opaque, int ret)
     quorum_aio_finalize(acb);
 }
 
-static BlockAIOCB *read_fifo_child(QuorumAIOCB *acb);
+static int read_fifo_child(QuorumAIOCB *acb);
 
 static void quorum_copy_qiov(QEMUIOVector *dest, QEMUIOVector *source)
 {
@@ -272,14 +256,14 @@ static void quorum_report_bad_acb(QuorumChildRequest *sacb, int ret)
     QuorumAIOCB *acb = sacb->parent;
     QuorumOpType type = acb->is_read ? QUORUM_OP_TYPE_READ : QUORUM_OP_TYPE_WRITE;
     quorum_report_bad(type, acb->sector_num, acb->nb_sectors,
-                      sacb->aiocb->bs->node_name, ret);
+                      sacb->bs->node_name, ret);
 }
 
-static void quorum_fifo_aio_cb(void *opaque, int ret)
+static int quorum_fifo_aio_cb(void *opaque, int ret)
 {
     QuorumChildRequest *sacb = opaque;
     QuorumAIOCB *acb = sacb->parent;
-    BDRVQuorumState *s = acb->common.bs->opaque;
+    BDRVQuorumState *s = acb->bs->opaque;
 
     assert(acb->is_read && s->read_pattern == QUORUM_READ_PATTERN_FIFO);
 
@@ -288,8 +272,7 @@ static void quorum_fifo_aio_cb(void *opaque, int ret)
 
         /* We try to read next child in FIFO order if we fail to read */
         if (acb->children_read < s->num_children) {
-            read_fifo_child(acb);
-            return;
+            return read_fifo_child(acb);
         }
     }
 
@@ -297,13 +280,14 @@ static void quorum_fifo_aio_cb(void *opaque, int ret)
 
     /* FIXME: rewrite failed children if acb->children_read > 1? */
     quorum_aio_finalize(acb);
+    return ret;
 }
 
 static void quorum_aio_cb(void *opaque, int ret)
 {
     QuorumChildRequest *sacb = opaque;
     QuorumAIOCB *acb = sacb->parent;
-    BDRVQuorumState *s = acb->common.bs->opaque;
+    BDRVQuorumState *s = acb->bs->opaque;
     bool rewrite = false;
     int i;
 
@@ -518,7 +502,7 @@ static bool quorum_compare(QuorumAIOCB *acb,
                            QEMUIOVector *a,
                            QEMUIOVector *b)
 {
-    BDRVQuorumState *s = acb->common.bs->opaque;
+    BDRVQuorumState *s = acb->bs->opaque;
     ssize_t offset;
 
     /* This driver will replace blkverify in this particular case */
@@ -538,7 +522,7 @@ static bool quorum_compare(QuorumAIOCB *acb,
 /* Do a vote to get the error code */
 static int quorum_vote_error(QuorumAIOCB *acb)
 {
-    BDRVQuorumState *s = acb->common.bs->opaque;
+    BDRVQuorumState *s = acb->bs->opaque;
     QuorumVoteVersion *winner = NULL;
     QuorumVotes error_votes;
     QuorumVoteValue result_value;
@@ -573,7 +557,7 @@ static bool quorum_vote(QuorumAIOCB *acb)
     bool rewrite = false;
     int i, j, ret;
     QuorumVoteValue hash;
-    BDRVQuorumState *s = acb->common.bs->opaque;
+    BDRVQuorumState *s = acb->bs->opaque;
     QuorumVoteVersion *winner;
 
     if (quorum_has_too_much_io_failed(acb)) {
@@ -649,10 +633,31 @@ free_exit:
     return rewrite;
 }
 
-static BlockAIOCB *read_quorum_children(QuorumAIOCB *acb)
-{
-    BDRVQuorumState *s = acb->common.bs->opaque;
+typedef struct QuorumCo {
+    QuorumAIOCB *acb;
     int i;
+} QuorumCo;
+
+static void read_quorum_children_entry(void *opaque)
+{
+    QuorumCo *co = opaque;
+    QuorumAIOCB *acb = co->acb;
+    BDRVQuorumState *s = acb->bs->opaque;
+    int i = co->i;
+    int ret;
+    co = NULL; /* Not valid after the first yield */
+
+    acb->qcrs[i].bs = s->children[i]->bs;
+    ret = bdrv_co_preadv(s->children[i], acb->sector_num * BDRV_SECTOR_SIZE,
+                         acb->nb_sectors * BDRV_SECTOR_SIZE,
+                         &acb->qcrs[i].qiov, 0);
+    quorum_aio_cb(&acb->qcrs[i], ret);
+}
+
+static int read_quorum_children(QuorumAIOCB *acb)
+{
+    BDRVQuorumState *s = acb->bs->opaque;
+    int i, ret;
 
     acb->children_read = s->num_children;
     for (i = 0; i < s->num_children; i++) {
@@ -662,65 +667,100 @@ static BlockAIOCB *read_quorum_children(QuorumAIOCB *acb)
     }
 
     for (i = 0; i < s->num_children; i++) {
-        acb->qcrs[i].aiocb = bdrv_aio_readv(s->children[i], acb->sector_num,
-                                            &acb->qcrs[i].qiov, acb->nb_sectors,
-                                            quorum_aio_cb, &acb->qcrs[i]);
+        Coroutine *co;
+        QuorumCo data = {
+            .acb = acb,
+            .i = i,
+        };
+
+        co = qemu_coroutine_create(read_quorum_children_entry, &data);
+        qemu_coroutine_enter(co);
     }
 
-    return &acb->common;
+    if (!acb->has_completed) {
+        qemu_coroutine_yield();
+    }
+
+    ret = acb->vote_ret;
+
+    return ret;
 }
 
-static BlockAIOCB *read_fifo_child(QuorumAIOCB *acb)
+static int read_fifo_child(QuorumAIOCB *acb)
 {
-    BDRVQuorumState *s = acb->common.bs->opaque;
+    BDRVQuorumState *s = acb->bs->opaque;
     int n = acb->children_read++;
+    int ret;
 
-    acb->qcrs[n].aiocb = bdrv_aio_readv(s->children[n], acb->sector_num,
-                                        acb->qiov, acb->nb_sectors,
-                                        quorum_fifo_aio_cb, &acb->qcrs[n]);
+    acb->qcrs[n].bs = s->children[n]->bs;
+    ret = bdrv_co_preadv(s->children[n], acb->sector_num * BDRV_SECTOR_SIZE,
+                         acb->nb_sectors * BDRV_SECTOR_SIZE, acb->qiov, 0);
+    ret = quorum_fifo_aio_cb(&acb->qcrs[n], ret);
 
-    return &acb->common;
+    return ret;
 }
 
-static BlockAIOCB *quorum_aio_readv(BlockDriverState *bs,
-                                    int64_t sector_num,
-                                    QEMUIOVector *qiov,
-                                    int nb_sectors,
-                                    BlockCompletionFunc *cb,
-                                    void *opaque)
+static int quorum_co_readv(BlockDriverState *bs,
+                           int64_t sector_num, int nb_sectors,
+                           QEMUIOVector *qiov)
 {
     BDRVQuorumState *s = bs->opaque;
-    QuorumAIOCB *acb = quorum_aio_get(bs, qiov, sector_num,
-                                      nb_sectors, cb, opaque);
+    QuorumAIOCB *acb = quorum_aio_get(bs, qiov, sector_num, nb_sectors);
+    int ret;
+
     acb->is_read = true;
     acb->children_read = 0;
 
     if (s->read_pattern == QUORUM_READ_PATTERN_QUORUM) {
-        return read_quorum_children(acb);
+        ret = read_quorum_children(acb);
+    } else {
+        ret = read_fifo_child(acb);
     }
+    g_free(acb);
+    return ret;
+}
 
-    return read_fifo_child(acb);
+static void write_quorum_entry(void *opaque)
+{
+    QuorumCo *co = opaque;
+    QuorumAIOCB *acb = co->acb;
+    BDRVQuorumState *s = acb->bs->opaque;
+    int i = co->i;
+    int ret;
+    co = NULL; /* Not valid after the first yield */
+
+    acb->qcrs[i].bs = s->children[i]->bs;
+    ret = bdrv_co_pwritev(s->children[i], acb->sector_num * BDRV_SECTOR_SIZE,
+                          acb->nb_sectors * BDRV_SECTOR_SIZE, acb->qiov, 0);
+    quorum_aio_cb(&acb->qcrs[i], ret);
 }
 
-static BlockAIOCB *quorum_aio_writev(BlockDriverState *bs,
-                                     int64_t sector_num,
-                                     QEMUIOVector *qiov,
-                                     int nb_sectors,
-                                     BlockCompletionFunc *cb,
-                                     void *opaque)
+static int quorum_co_writev(BlockDriverState *bs,
+                            int64_t sector_num, int nb_sectors,
+                            QEMUIOVector *qiov)
 {
     BDRVQuorumState *s = bs->opaque;
-    QuorumAIOCB *acb = quorum_aio_get(bs, qiov, sector_num, nb_sectors,
-                                      cb, opaque);
-    int i;
+    QuorumAIOCB *acb = quorum_aio_get(bs, qiov, sector_num, nb_sectors);
+    int i, ret;
 
     for (i = 0; i < s->num_children; i++) {
-        acb->qcrs[i].aiocb = bdrv_aio_writev(s->children[i], sector_num,
-                                             qiov, nb_sectors, &quorum_aio_cb,
-                                             &acb->qcrs[i]);
+        Coroutine *co;
+        QuorumCo data = {
+            .acb = acb,
+            .i = i,
+        };
+
+        co = qemu_coroutine_create(write_quorum_entry, &data);
+        qemu_coroutine_enter(co);
     }
 
-    return &acb->common;
+    if (!acb->has_completed) {
+        qemu_coroutine_yield();
+    }
+
+    ret = acb->vote_ret;
+
+    return ret;
 }
 
 static int64_t quorum_getlength(BlockDriverState *bs)
@@ -1097,8 +1137,8 @@ static BlockDriver bdrv_quorum = {
 
     .bdrv_getlength                     = quorum_getlength,
 
-    .bdrv_aio_readv                     = quorum_aio_readv,
-    .bdrv_aio_writev                    = quorum_aio_writev,
+    .bdrv_co_readv                      = quorum_co_readv,
+    .bdrv_co_writev                     = quorum_co_writev,
 
     .bdrv_add_child                     = quorum_add_child,
     .bdrv_del_child                     = quorum_del_child,
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [RFC PATCH 4/8] quorum: Do cleanup in caller coroutine
  2016-11-10 17:19 [Qemu-devel] [RFC PATCH 0/8] quorum: Implement .bdrv_co_preadv/pwritev() Kevin Wolf
                   ` (2 preceding siblings ...)
  2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 3/8] quorum: Implement .bdrv_co_readv/writev Kevin Wolf
@ 2016-11-10 17:19 ` Kevin Wolf
  2016-11-11  2:18   ` Eric Blake
  2016-11-17 10:04   ` Alberto Garcia
  2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 5/8] quorum: Inline quorum_aio_cb() Kevin Wolf
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 35+ messages in thread
From: Kevin Wolf @ 2016-11-10 17:19 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, berto, qemu-devel

Instead of calling quorum_aio_finalize() deeply nested in what used
to be an AIO callback, do it in the same functions that allocated the
AIOCB.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/quorum.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index 3cb579e..a225327 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -139,9 +139,8 @@ static bool quorum_vote(QuorumAIOCB *acb);
 
 static void quorum_aio_finalize(QuorumAIOCB *acb)
 {
-    acb->has_completed = true;
     g_free(acb->qcrs);
-    qemu_coroutine_enter_if_inactive(acb->co);
+    g_free(acb);
 }
 
 static bool quorum_sha256_compare(QuorumVoteValue *a, QuorumVoteValue *b)
@@ -233,7 +232,8 @@ static void quorum_rewrite_aio_cb(void *opaque, int ret)
         return;
     }
 
-    quorum_aio_finalize(acb);
+    acb->has_completed = true;
+    qemu_coroutine_enter_if_inactive(acb->co);
 }
 
 static int read_fifo_child(QuorumAIOCB *acb);
@@ -279,7 +279,7 @@ static int quorum_fifo_aio_cb(void *opaque, int ret)
     acb->vote_ret = ret;
 
     /* FIXME: rewrite failed children if acb->children_read > 1? */
-    quorum_aio_finalize(acb);
+
     return ret;
 }
 
@@ -317,7 +317,8 @@ static void quorum_aio_cb(void *opaque, int ret)
 
     /* if no rewrite is done the code will finish right away */
     if (!rewrite) {
-        quorum_aio_finalize(acb);
+        acb->has_completed = true;
+        qemu_coroutine_enter_if_inactive(acb->co);
     }
 }
 
@@ -716,7 +717,8 @@ static int quorum_co_readv(BlockDriverState *bs,
     } else {
         ret = read_fifo_child(acb);
     }
-    g_free(acb);
+    quorum_aio_finalize(acb);
+
     return ret;
 }
 
@@ -759,6 +761,7 @@ static int quorum_co_writev(BlockDriverState *bs,
     }
 
     ret = acb->vote_ret;
+    quorum_aio_finalize(acb);
 
     return ret;
 }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [RFC PATCH 5/8] quorum: Inline quorum_aio_cb()
  2016-11-10 17:19 [Qemu-devel] [RFC PATCH 0/8] quorum: Implement .bdrv_co_preadv/pwritev() Kevin Wolf
                   ` (3 preceding siblings ...)
  2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 4/8] quorum: Do cleanup in caller coroutine Kevin Wolf
@ 2016-11-10 17:19 ` Kevin Wolf
  2016-11-17 14:25   ` Alberto Garcia
  2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 6/8] quorum: Avoid bdrv_aio_writev() for rewrites Kevin Wolf
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Kevin Wolf @ 2016-11-10 17:19 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, berto, qemu-devel

This is a conversion to a more natural coroutine style and improves the
readability of the driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/quorum.c | 112 ++++++++++++++++++++++++++-------------------------------
 1 file changed, 51 insertions(+), 61 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index a225327..b2bb3af 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -130,7 +130,6 @@ struct QuorumAIOCB {
     QuorumVotes votes;
 
     bool is_read;
-    bool has_completed;
     int vote_ret;
     int children_read;          /* how many children have been read from */
 };
@@ -173,7 +172,6 @@ static QuorumAIOCB *quorum_aio_get(BlockDriverState *bs,
     acb->rewrite_count = 0;
     acb->votes.compare = quorum_sha256_compare;
     QLIST_INIT(&acb->votes.vote_list);
-    acb->has_completed = false;
     acb->is_read = false;
     acb->vote_ret = 0;
 
@@ -226,13 +224,6 @@ static void quorum_rewrite_aio_cb(void *opaque, int ret)
 
     /* one less rewrite to do */
     acb->rewrite_count--;
-
-    /* wait until all rewrite callbacks have completed */
-    if (acb->rewrite_count) {
-        return;
-    }
-
-    acb->has_completed = true;
     qemu_coroutine_enter_if_inactive(acb->co);
 }
 
@@ -283,45 +274,6 @@ static int quorum_fifo_aio_cb(void *opaque, int ret)
     return ret;
 }
 
-static void quorum_aio_cb(void *opaque, int ret)
-{
-    QuorumChildRequest *sacb = opaque;
-    QuorumAIOCB *acb = sacb->parent;
-    BDRVQuorumState *s = acb->bs->opaque;
-    bool rewrite = false;
-    int i;
-
-    sacb->ret = ret;
-    if (ret == 0) {
-        acb->success_count++;
-    } else {
-        quorum_report_bad_acb(sacb, ret);
-    }
-    acb->count++;
-    assert(acb->count <= s->num_children);
-    assert(acb->success_count <= s->num_children);
-    if (acb->count < s->num_children) {
-        return;
-    }
-
-    /* Do the vote on read */
-    if (acb->is_read) {
-        rewrite = quorum_vote(acb);
-        for (i = 0; i < s->num_children; i++) {
-            qemu_vfree(acb->qcrs[i].buf);
-            qemu_iovec_destroy(&acb->qcrs[i].qiov);
-        }
-    } else {
-        quorum_has_too_much_io_failed(acb);
-    }
-
-    /* if no rewrite is done the code will finish right away */
-    if (!rewrite) {
-        acb->has_completed = true;
-        qemu_coroutine_enter_if_inactive(acb->co);
-    }
-}
-
 static void quorum_report_bad_versions(BDRVQuorumState *s,
                                        QuorumAIOCB *acb,
                                        QuorumVoteValue *value)
@@ -645,19 +597,32 @@ static void read_quorum_children_entry(void *opaque)
     QuorumAIOCB *acb = co->acb;
     BDRVQuorumState *s = acb->bs->opaque;
     int i = co->i;
-    int ret;
+    QuorumChildRequest *sacb = &acb->qcrs[i];
     co = NULL; /* Not valid after the first yield */
 
-    acb->qcrs[i].bs = s->children[i]->bs;
-    ret = bdrv_co_preadv(s->children[i], acb->sector_num * BDRV_SECTOR_SIZE,
-                         acb->nb_sectors * BDRV_SECTOR_SIZE,
-                         &acb->qcrs[i].qiov, 0);
-    quorum_aio_cb(&acb->qcrs[i], ret);
+    sacb->bs = s->children[i]->bs;
+    sacb->ret = bdrv_co_preadv(s->children[i],
+                               acb->sector_num * BDRV_SECTOR_SIZE,
+                               acb->nb_sectors * BDRV_SECTOR_SIZE,
+                               &acb->qcrs[i].qiov, 0);
+
+    if (sacb->ret == 0) {
+        acb->success_count++;
+    } else {
+        quorum_report_bad_acb(sacb, sacb->ret);
+    }
+
+    acb->count++;
+    assert(acb->count <= s->num_children);
+    assert(acb->success_count <= s->num_children);
+
+    qemu_coroutine_enter_if_inactive(acb->co);
 }
 
 static int read_quorum_children(QuorumAIOCB *acb)
 {
     BDRVQuorumState *s = acb->bs->opaque;
+    bool rewrite = false;
     int i, ret;
 
     acb->children_read = s->num_children;
@@ -678,7 +643,19 @@ static int read_quorum_children(QuorumAIOCB *acb)
         qemu_coroutine_enter(co);
     }
 
-    if (!acb->has_completed) {
+    while (acb->count < s->num_children) {
+        qemu_coroutine_yield();
+    }
+
+    /* Do the vote on read */
+    rewrite = quorum_vote(acb);
+    for (i = 0; i < s->num_children; i++) {
+        qemu_vfree(acb->qcrs[i].buf);
+        qemu_iovec_destroy(&acb->qcrs[i].qiov);
+    }
+
+    assert(rewrite == !!acb->rewrite_count);
+    while (acb->rewrite_count) {
         qemu_coroutine_yield();
     }
 
@@ -728,13 +705,24 @@ static void write_quorum_entry(void *opaque)
     QuorumAIOCB *acb = co->acb;
     BDRVQuorumState *s = acb->bs->opaque;
     int i = co->i;
-    int ret;
+    QuorumChildRequest *sacb = &acb->qcrs[i];
     co = NULL; /* Not valid after the first yield */
 
-    acb->qcrs[i].bs = s->children[i]->bs;
-    ret = bdrv_co_pwritev(s->children[i], acb->sector_num * BDRV_SECTOR_SIZE,
-                          acb->nb_sectors * BDRV_SECTOR_SIZE, acb->qiov, 0);
-    quorum_aio_cb(&acb->qcrs[i], ret);
+    sacb->bs = s->children[i]->bs;
+    sacb->ret = bdrv_co_pwritev(s->children[i],
+                                acb->sector_num * BDRV_SECTOR_SIZE,
+                                acb->nb_sectors * BDRV_SECTOR_SIZE,
+                                acb->qiov, 0);
+    if (sacb->ret == 0) {
+        acb->success_count++;
+    } else {
+        quorum_report_bad_acb(sacb, sacb->ret);
+    }
+    acb->count++;
+    assert(acb->count <= s->num_children);
+    assert(acb->success_count <= s->num_children);
+
+    qemu_coroutine_enter_if_inactive(acb->co);
 }
 
 static int quorum_co_writev(BlockDriverState *bs,
@@ -756,10 +744,12 @@ static int quorum_co_writev(BlockDriverState *bs,
         qemu_coroutine_enter(co);
     }
 
-    if (!acb->has_completed) {
+    while (acb->count < s->num_children) {
         qemu_coroutine_yield();
     }
 
+    quorum_has_too_much_io_failed(acb);
+
     ret = acb->vote_ret;
     quorum_aio_finalize(acb);
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [RFC PATCH 6/8] quorum: Avoid bdrv_aio_writev() for rewrites
  2016-11-10 17:19 [Qemu-devel] [RFC PATCH 0/8] quorum: Implement .bdrv_co_preadv/pwritev() Kevin Wolf
                   ` (4 preceding siblings ...)
  2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 5/8] quorum: Inline quorum_aio_cb() Kevin Wolf
@ 2016-11-10 17:19 ` Kevin Wolf
  2016-11-11  2:25   ` Eric Blake
  2016-11-17 14:54   ` Alberto Garcia
  2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 7/8] quorum: Implement .bdrv_co_preadv/pwritev() Kevin Wolf
                   ` (3 subsequent siblings)
  9 siblings, 2 replies; 35+ messages in thread
From: Kevin Wolf @ 2016-11-10 17:19 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, berto, qemu-devel

Replacing it with bdrv_co_pwritev() prepares us for byte granularity
requests and gets us rid of the last bdrv_aio_*() user in quorum.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/quorum.c | 52 +++++++++++++++++++++++++++++++++-------------------
 1 file changed, 33 insertions(+), 19 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index b2bb3af..1426115 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -134,6 +134,11 @@ struct QuorumAIOCB {
     int children_read;          /* how many children have been read from */
 };
 
+typedef struct QuorumCo {
+    QuorumAIOCB *acb;
+    int i;
+} QuorumCo;
+
 static bool quorum_vote(QuorumAIOCB *acb);
 
 static void quorum_aio_finalize(QuorumAIOCB *acb)
@@ -218,15 +223,6 @@ static bool quorum_has_too_much_io_failed(QuorumAIOCB *acb)
     return false;
 }
 
-static void quorum_rewrite_aio_cb(void *opaque, int ret)
-{
-    QuorumAIOCB *acb = opaque;
-
-    /* one less rewrite to do */
-    acb->rewrite_count--;
-    qemu_coroutine_enter_if_inactive(acb->co);
-}
-
 static int read_fifo_child(QuorumAIOCB *acb);
 
 static void quorum_copy_qiov(QEMUIOVector *dest, QEMUIOVector *source)
@@ -293,7 +289,25 @@ static void quorum_report_bad_versions(BDRVQuorumState *s,
     }
 }
 
-static bool quorum_rewrite_bad_versions(BDRVQuorumState *s, QuorumAIOCB *acb,
+static void quorum_rewrite_entry(void *opaque)
+{
+    QuorumCo *co = opaque;
+    QuorumAIOCB *acb = co->acb;
+    BDRVQuorumState *s = acb->bs->opaque;
+    int ret;
+
+    ret = bdrv_co_pwritev(s->children[co->i],
+                          acb->sector_num * BDRV_SECTOR_SIZE,
+                          acb->nb_sectors * BDRV_SECTOR_SIZE,
+                          acb->qiov, 0);
+    (void) ret;
+
+    /* one less rewrite to do */
+    acb->rewrite_count--;
+    qemu_coroutine_enter_if_inactive(acb->co);
+}
+
+static bool quorum_rewrite_bad_versions(QuorumAIOCB *acb,
                                         QuorumVoteValue *value)
 {
     QuorumVoteVersion *version;
@@ -321,9 +335,14 @@ static bool quorum_rewrite_bad_versions(BDRVQuorumState *s, QuorumAIOCB *acb,
             continue;
         }
         QLIST_FOREACH(item, &version->items, next) {
-            bdrv_aio_writev(s->children[item->index], acb->sector_num,
-                            acb->qiov, acb->nb_sectors, quorum_rewrite_aio_cb,
-                            acb);
+            Coroutine *co;
+            QuorumCo data = {
+                .acb = acb,
+                .i = item->index,
+            };
+
+            co = qemu_coroutine_create(quorum_rewrite_entry, &data);
+            qemu_coroutine_enter(co);
         }
     }
 
@@ -577,7 +596,7 @@ static bool quorum_vote(QuorumAIOCB *acb)
 
     /* corruption correction is enabled */
     if (s->rewrite_corrupted) {
-        rewrite = quorum_rewrite_bad_versions(s, acb, &winner->value);
+        rewrite = quorum_rewrite_bad_versions(acb, &winner->value);
     }
 
 free_exit:
@@ -586,11 +605,6 @@ free_exit:
     return rewrite;
 }
 
-typedef struct QuorumCo {
-    QuorumAIOCB *acb;
-    int i;
-} QuorumCo;
-
 static void read_quorum_children_entry(void *opaque)
 {
     QuorumCo *co = opaque;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [RFC PATCH 7/8] quorum: Implement .bdrv_co_preadv/pwritev()
  2016-11-10 17:19 [Qemu-devel] [RFC PATCH 0/8] quorum: Implement .bdrv_co_preadv/pwritev() Kevin Wolf
                   ` (5 preceding siblings ...)
  2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 6/8] quorum: Avoid bdrv_aio_writev() for rewrites Kevin Wolf
@ 2016-11-10 17:19 ` Kevin Wolf
  2016-11-11  2:37   ` Eric Blake
  2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 8/8] quorum: Inline quorum_fifo_aio_cb() Kevin Wolf
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Kevin Wolf @ 2016-11-10 17:19 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, berto, qemu-devel

This enables byte granularity requests on quorum nodes.

Note that the QMP events emitted by the driver are an external API that
we were careless enough to define as sector based. The offset and length
of requests reported in events are rounded down therefore.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/quorum.c | 75 ++++++++++++++++++++++++++--------------------------------
 1 file changed, 33 insertions(+), 42 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index 1426115..45939c5 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -114,8 +114,8 @@ struct QuorumAIOCB {
     Coroutine *co;
 
     /* Request metadata */
-    uint64_t sector_num;
-    int nb_sectors;
+    uint64_t offset;
+    uint64_t bytes;
 
     QEMUIOVector *qiov;         /* calling IOV */
 
@@ -159,8 +159,8 @@ static bool quorum_64bits_compare(QuorumVoteValue *a, QuorumVoteValue *b)
 
 static QuorumAIOCB *quorum_aio_get(BlockDriverState *bs,
                                    QEMUIOVector *qiov,
-                                   uint64_t sector_num,
-                                   int nb_sectors)
+                                   uint64_t offset,
+                                   uint64_t bytes)
 {
     BDRVQuorumState *s = bs->opaque;
     QuorumAIOCB *acb = g_new(QuorumAIOCB, 1);
@@ -168,8 +168,8 @@ static QuorumAIOCB *quorum_aio_get(BlockDriverState *bs,
 
     acb->co = qemu_coroutine_self();
     acb->bs = bs;
-    acb->sector_num = sector_num;
-    acb->nb_sectors = nb_sectors;
+    acb->offset = offset;
+    acb->bytes = bytes;
     acb->qiov = qiov;
     acb->qcrs = g_new0(QuorumChildRequest, s->num_children);
     acb->count = 0;
@@ -189,8 +189,8 @@ static QuorumAIOCB *quorum_aio_get(BlockDriverState *bs,
     return acb;
 }
 
-static void quorum_report_bad(QuorumOpType type, uint64_t sector_num,
-                              int nb_sectors, char *node_name, int ret)
+static void quorum_report_bad(QuorumOpType type, uint64_t offset,
+                              uint64_t bytes, char *node_name, int ret)
 {
     const char *msg = NULL;
     if (ret < 0) {
@@ -198,14 +198,17 @@ static void quorum_report_bad(QuorumOpType type, uint64_t sector_num,
     }
 
     qapi_event_send_quorum_report_bad(type, !!msg, msg, node_name,
-                                      sector_num, nb_sectors, &error_abort);
+                                      offset / BDRV_SECTOR_SIZE,
+                                      bytes / BDRV_SECTOR_SIZE, &error_abort);
 }
 
 static void quorum_report_failure(QuorumAIOCB *acb)
 {
     const char *reference = bdrv_get_device_or_node_name(acb->bs);
-    qapi_event_send_quorum_failure(reference, acb->sector_num,
-                                   acb->nb_sectors, &error_abort);
+    qapi_event_send_quorum_failure(reference,
+                                   acb->offset / BDRV_SECTOR_SIZE,
+                                   acb->bytes / BDRV_SECTOR_SIZE,
+                                   &error_abort);
 }
 
 static int quorum_vote_error(QuorumAIOCB *acb);
@@ -242,8 +245,7 @@ static void quorum_report_bad_acb(QuorumChildRequest *sacb, int ret)
 {
     QuorumAIOCB *acb = sacb->parent;
     QuorumOpType type = acb->is_read ? QUORUM_OP_TYPE_READ : QUORUM_OP_TYPE_WRITE;
-    quorum_report_bad(type, acb->sector_num, acb->nb_sectors,
-                      sacb->bs->node_name, ret);
+    quorum_report_bad(type, acb->offset, acb->bytes, sacb->bs->node_name, ret);
 }
 
 static int quorum_fifo_aio_cb(void *opaque, int ret)
@@ -282,8 +284,7 @@ static void quorum_report_bad_versions(BDRVQuorumState *s,
             continue;
         }
         QLIST_FOREACH(item, &version->items, next) {
-            quorum_report_bad(QUORUM_OP_TYPE_READ, acb->sector_num,
-                              acb->nb_sectors,
+            quorum_report_bad(QUORUM_OP_TYPE_READ, acb->offset, acb->bytes,
                               s->children[item->index]->bs->node_name, 0);
         }
     }
@@ -296,9 +297,7 @@ static void quorum_rewrite_entry(void *opaque)
     BDRVQuorumState *s = acb->bs->opaque;
     int ret;
 
-    ret = bdrv_co_pwritev(s->children[co->i],
-                          acb->sector_num * BDRV_SECTOR_SIZE,
-                          acb->nb_sectors * BDRV_SECTOR_SIZE,
+    ret = bdrv_co_pwritev(s->children[co->i], acb->offset, acb->bytes,
                           acb->qiov, 0);
     (void) ret;
 
@@ -462,8 +461,8 @@ static void GCC_FMT_ATTR(2, 3) quorum_err(QuorumAIOCB *acb,
     va_list ap;
 
     va_start(ap, fmt);
-    fprintf(stderr, "quorum: sector_num=%" PRId64 " nb_sectors=%d ",
-            acb->sector_num, acb->nb_sectors);
+    fprintf(stderr, "quorum: offset=%" PRIu64 " bytes=%" PRIu64 " ",
+            acb->offset, acb->bytes);
     vfprintf(stderr, fmt, ap);
     fprintf(stderr, "\n");
     va_end(ap);
@@ -481,9 +480,8 @@ static bool quorum_compare(QuorumAIOCB *acb,
     if (s->is_blkverify) {
         offset = qemu_iovec_compare(a, b);
         if (offset != -1) {
-            quorum_err(acb, "contents mismatch in sector %" PRId64,
-                       acb->sector_num +
-                       (uint64_t)(offset / BDRV_SECTOR_SIZE));
+            quorum_err(acb, "contents mismatch at offset %" PRIu64,
+                       acb->offset + offset);
         }
         return true;
     }
@@ -615,9 +613,7 @@ static void read_quorum_children_entry(void *opaque)
     co = NULL; /* Not valid after the first yield */
 
     sacb->bs = s->children[i]->bs;
-    sacb->ret = bdrv_co_preadv(s->children[i],
-                               acb->sector_num * BDRV_SECTOR_SIZE,
-                               acb->nb_sectors * BDRV_SECTOR_SIZE,
+    sacb->ret = bdrv_co_preadv(s->children[i], acb->offset, acb->bytes,
                                &acb->qcrs[i].qiov, 0);
 
     if (sacb->ret == 0) {
@@ -685,19 +681,17 @@ static int read_fifo_child(QuorumAIOCB *acb)
     int ret;
 
     acb->qcrs[n].bs = s->children[n]->bs;
-    ret = bdrv_co_preadv(s->children[n], acb->sector_num * BDRV_SECTOR_SIZE,
-                         acb->nb_sectors * BDRV_SECTOR_SIZE, acb->qiov, 0);
+    ret = bdrv_co_preadv(s->children[n], acb->offset, acb->bytes, acb->qiov, 0);
     ret = quorum_fifo_aio_cb(&acb->qcrs[n], ret);
 
     return ret;
 }
 
-static int quorum_co_readv(BlockDriverState *bs,
-                           int64_t sector_num, int nb_sectors,
-                           QEMUIOVector *qiov)
+static int quorum_co_preadv(BlockDriverState *bs, uint64_t offset,
+                            uint64_t bytes, QEMUIOVector *qiov, int flags)
 {
     BDRVQuorumState *s = bs->opaque;
-    QuorumAIOCB *acb = quorum_aio_get(bs, qiov, sector_num, nb_sectors);
+    QuorumAIOCB *acb = quorum_aio_get(bs, qiov, offset, bytes);
     int ret;
 
     acb->is_read = true;
@@ -723,9 +717,7 @@ static void write_quorum_entry(void *opaque)
     co = NULL; /* Not valid after the first yield */
 
     sacb->bs = s->children[i]->bs;
-    sacb->ret = bdrv_co_pwritev(s->children[i],
-                                acb->sector_num * BDRV_SECTOR_SIZE,
-                                acb->nb_sectors * BDRV_SECTOR_SIZE,
+    sacb->ret = bdrv_co_pwritev(s->children[i], acb->offset, acb->bytes,
                                 acb->qiov, 0);
     if (sacb->ret == 0) {
         acb->success_count++;
@@ -739,12 +731,11 @@ static void write_quorum_entry(void *opaque)
     qemu_coroutine_enter_if_inactive(acb->co);
 }
 
-static int quorum_co_writev(BlockDriverState *bs,
-                            int64_t sector_num, int nb_sectors,
-                            QEMUIOVector *qiov)
+static int quorum_co_pwritev(BlockDriverState *bs, uint64_t offset,
+                             uint64_t bytes, QEMUIOVector *qiov, int flags)
 {
     BDRVQuorumState *s = bs->opaque;
-    QuorumAIOCB *acb = quorum_aio_get(bs, qiov, sector_num, nb_sectors);
+    QuorumAIOCB *acb = quorum_aio_get(bs, qiov, offset, bytes);
     int i, ret;
 
     for (i = 0; i < s->num_children; i++) {
@@ -811,7 +802,7 @@ static coroutine_fn int quorum_co_flush(BlockDriverState *bs)
         result = bdrv_co_flush(s->children[i]->bs);
         if (result) {
             quorum_report_bad(QUORUM_OP_TYPE_FLUSH, 0,
-                              bdrv_nb_sectors(s->children[i]->bs),
+                              bdrv_getlength(s->children[i]->bs),
                               s->children[i]->bs->node_name, result);
             result_value.l = result;
             quorum_count_vote(&error_votes, &result_value, i);
@@ -1144,8 +1135,8 @@ static BlockDriver bdrv_quorum = {
 
     .bdrv_getlength                     = quorum_getlength,
 
-    .bdrv_co_readv                      = quorum_co_readv,
-    .bdrv_co_writev                     = quorum_co_writev,
+    .bdrv_co_preadv                     = quorum_co_preadv,
+    .bdrv_co_pwritev                    = quorum_co_pwritev,
 
     .bdrv_add_child                     = quorum_add_child,
     .bdrv_del_child                     = quorum_del_child,
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [RFC PATCH 8/8] quorum: Inline quorum_fifo_aio_cb()
  2016-11-10 17:19 [Qemu-devel] [RFC PATCH 0/8] quorum: Implement .bdrv_co_preadv/pwritev() Kevin Wolf
                   ` (6 preceding siblings ...)
  2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 7/8] quorum: Implement .bdrv_co_preadv/pwritev() Kevin Wolf
@ 2016-11-10 17:19 ` Kevin Wolf
  2016-11-18  9:47   ` Alberto Garcia
  2016-11-11  9:56 ` [Qemu-devel] [RFC PATCH 0/8] quorum: Implement .bdrv_co_preadv/pwritev() Paolo Bonzini
  2016-11-13  3:18 ` no-reply
  9 siblings, 1 reply; 35+ messages in thread
From: Kevin Wolf @ 2016-11-10 17:19 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, berto, qemu-devel

Inlining the function removes some boilerplace code and replaces
recursion by a simple loop, so the code becomes somewhat easier to
understand.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/quorum.c | 42 +++++++++++++-----------------------------
 1 file changed, 13 insertions(+), 29 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index 45939c5..0e91d59 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -248,30 +248,6 @@ static void quorum_report_bad_acb(QuorumChildRequest *sacb, int ret)
     quorum_report_bad(type, acb->offset, acb->bytes, sacb->bs->node_name, ret);
 }
 
-static int quorum_fifo_aio_cb(void *opaque, int ret)
-{
-    QuorumChildRequest *sacb = opaque;
-    QuorumAIOCB *acb = sacb->parent;
-    BDRVQuorumState *s = acb->bs->opaque;
-
-    assert(acb->is_read && s->read_pattern == QUORUM_READ_PATTERN_FIFO);
-
-    if (ret < 0) {
-        quorum_report_bad_acb(sacb, ret);
-
-        /* We try to read next child in FIFO order if we fail to read */
-        if (acb->children_read < s->num_children) {
-            return read_fifo_child(acb);
-        }
-    }
-
-    acb->vote_ret = ret;
-
-    /* FIXME: rewrite failed children if acb->children_read > 1? */
-
-    return ret;
-}
-
 static void quorum_report_bad_versions(BDRVQuorumState *s,
                                        QuorumAIOCB *acb,
                                        QuorumVoteValue *value)
@@ -677,12 +653,20 @@ static int read_quorum_children(QuorumAIOCB *acb)
 static int read_fifo_child(QuorumAIOCB *acb)
 {
     BDRVQuorumState *s = acb->bs->opaque;
-    int n = acb->children_read++;
-    int ret;
+    int n, ret;
+
+    /* We try to read the next child in FIFO order if we failed to read */
+    do {
+        n = acb->children_read++;
+        acb->qcrs[n].bs = s->children[n]->bs;
+        ret = bdrv_co_preadv(s->children[n], acb->offset, acb->bytes,
+                             acb->qiov, 0);
+        if (ret < 0) {
+            quorum_report_bad_acb(&acb->qcrs[n], ret);
+        }
+    } while(ret < 0 && acb->children_read < s->num_children);
 
-    acb->qcrs[n].bs = s->children[n]->bs;
-    ret = bdrv_co_preadv(s->children[n], acb->offset, acb->bytes, acb->qiov, 0);
-    ret = quorum_fifo_aio_cb(&acb->qcrs[n], ret);
+    /* FIXME: rewrite failed children if acb->children_read > 1? */
 
     return ret;
 }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 1/8] coroutine: Introduce qemu_coroutine_enter_if_inactive()
  2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 1/8] coroutine: Introduce qemu_coroutine_enter_if_inactive() Kevin Wolf
@ 2016-11-10 23:49   ` Eric Blake
  2016-11-17  9:30   ` Alberto Garcia
  1 sibling, 0 replies; 35+ messages in thread
From: Eric Blake @ 2016-11-10 23:49 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: berto, qemu-devel, mreitz

[-- Attachment #1: Type: text/plain, Size: 1310 bytes --]

On 11/10/2016 11:19 AM, Kevin Wolf wrote:
> In the context of asynchronous work, if we have a worker coroutine that
> didn't yield, the parent coroutine cannot be reentered because it hasn't
> yielded yet. In this case we don't even have to reenter the parent
> because it will see that the work is already done and won't even yield.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  include/qemu/coroutine.h | 6 ++++++
>  util/qemu-coroutine.c    | 8 ++++++++
>  2 files changed, 14 insertions(+)
> 

> +++ b/util/qemu-coroutine.c
> @@ -19,6 +19,7 @@
>  #include "qemu/atomic.h"
>  #include "qemu/coroutine.h"
>  #include "qemu/coroutine_int.h"
> +#include "block/aio.h"

Why do you need this include?

>  
>  enum {
>      POOL_BATCH_SIZE = 64,
> @@ -131,6 +132,13 @@ void qemu_coroutine_enter(Coroutine *co)
>      }
>  }
>  
> +void qemu_coroutine_enter_if_inactive(Coroutine *co)
> +{
> +    if (!qemu_coroutine_entered(co)) {
> +        qemu_coroutine_enter(co);
> +    }
> +}
> +
>  void coroutine_fn qemu_coroutine_yield(void)
>  {
>      Coroutine *self = qemu_coroutine_self();
> 

Otherwise:
Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 2/8] quorum: Remove s from quorum_aio_get() arguments
  2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 2/8] quorum: Remove s from quorum_aio_get() arguments Kevin Wolf
@ 2016-11-10 23:52   ` Eric Blake
  2016-11-11  9:58   ` Paolo Bonzini
  2016-11-11 14:18   ` Alberto Garcia
  2 siblings, 0 replies; 35+ messages in thread
From: Eric Blake @ 2016-11-10 23:52 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: berto, qemu-devel, mreitz

[-- Attachment #1: Type: text/plain, Size: 446 bytes --]

On 11/10/2016 11:19 AM, Kevin Wolf wrote:
> There is no point in passing the value of bs->opaque in order to
> overwrite it with itself.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/quorum.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 3/8] quorum: Implement .bdrv_co_readv/writev
  2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 3/8] quorum: Implement .bdrv_co_readv/writev Kevin Wolf
@ 2016-11-11  1:56   ` Eric Blake
  2016-11-16 15:57   ` Alberto Garcia
  1 sibling, 0 replies; 35+ messages in thread
From: Eric Blake @ 2016-11-11  1:56 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: berto, qemu-devel, mreitz

[-- Attachment #1: Type: text/plain, Size: 1740 bytes --]

On 11/10/2016 11:19 AM, Kevin Wolf wrote:
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/quorum.c | 194 ++++++++++++++++++++++++++++++++++-----------------------
>  1 file changed, 117 insertions(+), 77 deletions(-)
> 


> +
> +static void read_quorum_children_entry(void *opaque)
> +{
> +    QuorumCo *co = opaque;
> +    QuorumAIOCB *acb = co->acb;
> +    BDRVQuorumState *s = acb->bs->opaque;
> +    int i = co->i;
> +    int ret;
> +    co = NULL; /* Not valid after the first yield */

Why bother to invalidate co...

> +
> +    acb->qcrs[i].bs = s->children[i]->bs;
> +    ret = bdrv_co_preadv(s->children[i], acb->sector_num * BDRV_SECTOR_SIZE,
> +                         acb->nb_sectors * BDRV_SECTOR_SIZE,
> +                         &acb->qcrs[i].qiov, 0);
> +    quorum_aio_cb(&acb->qcrs[i], ret);
> +}

...when it isn't used later? Is it just for future-proofing edits made
in later patches?

> +static void write_quorum_entry(void *opaque)
> +{
> +    QuorumCo *co = opaque;
> +    QuorumAIOCB *acb = co->acb;
> +    BDRVQuorumState *s = acb->bs->opaque;
> +    int i = co->i;
> +    int ret;
> +    co = NULL; /* Not valid after the first yield */
> +
> +    acb->qcrs[i].bs = s->children[i]->bs;
> +    ret = bdrv_co_pwritev(s->children[i], acb->sector_num * BDRV_SECTOR_SIZE,
> +                          acb->nb_sectors * BDRV_SECTOR_SIZE, acb->qiov, 0);
> +    quorum_aio_cb(&acb->qcrs[i], ret);
>  }

and again

Otherwise, the conversion looks sane to me, but I'm just weak enough on
coroutines that I won't give R-b while this is still in RFC

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 4/8] quorum: Do cleanup in caller coroutine
  2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 4/8] quorum: Do cleanup in caller coroutine Kevin Wolf
@ 2016-11-11  2:18   ` Eric Blake
  2016-11-17 10:04   ` Alberto Garcia
  1 sibling, 0 replies; 35+ messages in thread
From: Eric Blake @ 2016-11-11  2:18 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: berto, qemu-devel, mreitz

[-- Attachment #1: Type: text/plain, Size: 552 bytes --]

On 11/10/2016 11:19 AM, Kevin Wolf wrote:
> Instead of calling quorum_aio_finalize() deeply nested in what used
> to be an AIO callback, do it in the same functions that allocated the
> AIOCB.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/quorum.c | 15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 

Looks clean, but again I'm weak enough on coroutines that I won't give
R-b to an RFC

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 6/8] quorum: Avoid bdrv_aio_writev() for rewrites
  2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 6/8] quorum: Avoid bdrv_aio_writev() for rewrites Kevin Wolf
@ 2016-11-11  2:25   ` Eric Blake
  2016-11-17 14:54   ` Alberto Garcia
  1 sibling, 0 replies; 35+ messages in thread
From: Eric Blake @ 2016-11-11  2:25 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: berto, qemu-devel, mreitz

[-- Attachment #1: Type: text/plain, Size: 1134 bytes --]

On 11/10/2016 11:19 AM, Kevin Wolf wrote:
> Replacing it with bdrv_co_pwritev() prepares us for byte granularity
> requests and gets us rid of the last bdrv_aio_*() user in quorum.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/quorum.c | 52 +++++++++++++++++++++++++++++++++-------------------
>  1 file changed, 33 insertions(+), 19 deletions(-)
> 
> diff --git a/block/quorum.c b/block/quorum.c
> index b2bb3af..1426115 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -134,6 +134,11 @@ struct QuorumAIOCB {
>      int children_read;          /* how many children have been read from */
>  };
>  
> +typedef struct QuorumCo {
> +    QuorumAIOCB *acb;
> +    int i;
> +} QuorumCo;
> +

> @@ -586,11 +605,6 @@ free_exit:
>      return rewrite;
>  }
>  
> -typedef struct QuorumCo {
> -    QuorumAIOCB *acb;
> -    int i;
> -} QuorumCo;

Code motion of a struct added earlier in this series; do you want to
declare it up front to begin with to minimize churn?


-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 7/8] quorum: Implement .bdrv_co_preadv/pwritev()
  2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 7/8] quorum: Implement .bdrv_co_preadv/pwritev() Kevin Wolf
@ 2016-11-11  2:37   ` Eric Blake
  2016-11-11  9:58     ` Kevin Wolf
  0 siblings, 1 reply; 35+ messages in thread
From: Eric Blake @ 2016-11-11  2:37 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: berto, qemu-devel, mreitz

[-- Attachment #1: Type: text/plain, Size: 2503 bytes --]

On 11/10/2016 11:19 AM, Kevin Wolf wrote:
> This enables byte granularity requests on quorum nodes.
> 
> Note that the QMP events emitted by the driver are an external API that
> we were careless enough to define as sector based. The offset and length
> of requests reported in events are rounded down therefore.

Would it be better to round offset down and length up?  A length of 0
looks odd.

Should we add more fields to the two affected events (QUORUM_FAILURE and
QUORUM_REPORT_BAD)? We have to keep the existing fields for back-compat,
but we could add new fields that give byte-based locations for
management apps smart enough to use the new instead of the old
(particularly since the old fields are named 'sector-num' and
'sectors-count').

> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/quorum.c | 75 ++++++++++++++++++++++++++--------------------------------
>  1 file changed, 33 insertions(+), 42 deletions(-)
> 

> @@ -198,14 +198,17 @@ static void quorum_report_bad(QuorumOpType type, uint64_t sector_num,
>      }
>  
>      qapi_event_send_quorum_report_bad(type, !!msg, msg, node_name,
> -                                      sector_num, nb_sectors, &error_abort);
> +                                      offset / BDRV_SECTOR_SIZE,
> +                                      bytes / BDRV_SECTOR_SIZE, &error_abort);

Rounding the offset down makes sense, but rounding the bytes down can
lead to weird messages.  Blindly rounding it up to a sector boundary can
also be wrong (example: writing 2 bytes at offset 511 really affects
1024 bytes when you consider that two sectors had to undergo
read-modify-write). Don't we have a helper routine for determining the
end boundary when we have to convert an offset and length to a courser
alignment?


> @@ -462,8 +461,8 @@ static void GCC_FMT_ATTR(2, 3) quorum_err(QuorumAIOCB *acb,
>      va_list ap;
>  
>      va_start(ap, fmt);
> -    fprintf(stderr, "quorum: sector_num=%" PRId64 " nb_sectors=%d ",
> -            acb->sector_num, acb->nb_sectors);
> +    fprintf(stderr, "quorum: offset=%" PRIu64 " bytes=%" PRIu64 " ",
> +            acb->offset, acb->bytes);

Might be worth a separate patch to get rid of fprintf and use correct
error reporting.  But not the work for this patch.

Overall, I like that you are getting rid of more sector-based cruft.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 0/8] quorum: Implement .bdrv_co_preadv/pwritev()
  2016-11-10 17:19 [Qemu-devel] [RFC PATCH 0/8] quorum: Implement .bdrv_co_preadv/pwritev() Kevin Wolf
                   ` (7 preceding siblings ...)
  2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 8/8] quorum: Inline quorum_fifo_aio_cb() Kevin Wolf
@ 2016-11-11  9:56 ` Paolo Bonzini
  2016-11-11 10:22   ` Kevin Wolf
  2016-11-13  3:18 ` no-reply
  9 siblings, 1 reply; 35+ messages in thread
From: Paolo Bonzini @ 2016-11-11  9:56 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: berto, qemu-devel, mreitz



On 10/11/2016 18:19, Kevin Wolf wrote:
> This is part of a series that I'm working on and that aims to remove the
> bdrv_aio_*() emulation in io.c. blkdebug and blkverify were easy, but for
> quorum I need a few more patches, so I'm sending this out as an RFC while I
> continue work on the rest (QED, and then possibly some polishing).
> 
> After the series, quorum doesn't only use the preferred interfaces, but the
> code becomes a little easier to follow and byte granularity requests are
> supported. There is probably still some potential for additional cleanups in a
> follow-up series, but this should already be enough to be worth merging.

Honestly I don't see the point.  It seems easier, more practical and
more effective to convert bdrv_aio_* to byte ranges, especially since
QED would be basically a rewrite.

Paolo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 2/8] quorum: Remove s from quorum_aio_get() arguments
  2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 2/8] quorum: Remove s from quorum_aio_get() arguments Kevin Wolf
  2016-11-10 23:52   ` Eric Blake
@ 2016-11-11  9:58   ` Paolo Bonzini
  2016-11-11 14:18   ` Alberto Garcia
  2 siblings, 0 replies; 35+ messages in thread
From: Paolo Bonzini @ 2016-11-11  9:58 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: berto, qemu-devel, mreitz



On 10/11/2016 18:19, Kevin Wolf wrote:
> There is no point in passing the value of bs->opaque in order to
> overwrite it with itself.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/quorum.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/block/quorum.c b/block/quorum.c
> index d122299..dfa9fd3 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -171,18 +171,17 @@ static bool quorum_64bits_compare(QuorumVoteValue *a, QuorumVoteValue *b)
>      return a->l == b->l;
>  }
>  
> -static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
> -                                   BlockDriverState *bs,
> +static QuorumAIOCB *quorum_aio_get(BlockDriverState *bs,
>                                     QEMUIOVector *qiov,
>                                     uint64_t sector_num,
>                                     int nb_sectors,
>                                     BlockCompletionFunc *cb,
>                                     void *opaque)
>  {
> +    BDRVQuorumState *s = bs->opaque;
>      QuorumAIOCB *acb = qemu_aio_get(&quorum_aiocb_info, bs, cb, opaque);
>      int i;
>  
> -    acb->common.bs->opaque = s;
>      acb->sector_num = sector_num;
>      acb->nb_sectors = nb_sectors;
>      acb->qiov = qiov;
> @@ -691,7 +690,7 @@ static BlockAIOCB *quorum_aio_readv(BlockDriverState *bs,
>                                      void *opaque)
>  {
>      BDRVQuorumState *s = bs->opaque;
> -    QuorumAIOCB *acb = quorum_aio_get(s, bs, qiov, sector_num,
> +    QuorumAIOCB *acb = quorum_aio_get(bs, qiov, sector_num,
>                                        nb_sectors, cb, opaque);
>      acb->is_read = true;
>      acb->children_read = 0;
> @@ -711,7 +710,7 @@ static BlockAIOCB *quorum_aio_writev(BlockDriverState *bs,
>                                       void *opaque)
>  {
>      BDRVQuorumState *s = bs->opaque;
> -    QuorumAIOCB *acb = quorum_aio_get(s, bs, qiov, sector_num, nb_sectors,
> +    QuorumAIOCB *acb = quorum_aio_get(bs, qiov, sector_num, nb_sectors,
>                                        cb, opaque);
>      int i;
>  
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 7/8] quorum: Implement .bdrv_co_preadv/pwritev()
  2016-11-11  2:37   ` Eric Blake
@ 2016-11-11  9:58     ` Kevin Wolf
  2016-11-11 15:08       ` Eric Blake
  2016-11-17 15:30       ` Alberto Garcia
  0 siblings, 2 replies; 35+ messages in thread
From: Kevin Wolf @ 2016-11-11  9:58 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-block, berto, qemu-devel, mreitz

[-- Attachment #1: Type: text/plain, Size: 3037 bytes --]

Am 11.11.2016 um 03:37 hat Eric Blake geschrieben:
> On 11/10/2016 11:19 AM, Kevin Wolf wrote:
> > This enables byte granularity requests on quorum nodes.
> > 
> > Note that the QMP events emitted by the driver are an external API that
> > we were careless enough to define as sector based. The offset and length
> > of requests reported in events are rounded down therefore.
> 
> Would it be better to round offset down and length up?  A length of 0
> looks odd.

To be honest, I don't know what these events are used for and what the
most useful rounding would be. Maybe Berto can tell?

> Should we add more fields to the two affected events (QUORUM_FAILURE and
> QUORUM_REPORT_BAD)? We have to keep the existing fields for back-compat,
> but we could add new fields that give byte-based locations for
> management apps smart enough to use the new instead of the old
> (particularly since the old fields are named 'sector-num' and
> 'sectors-count').

If there is a user for the new fields, I can do that.

> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >  block/quorum.c | 75 ++++++++++++++++++++++++++--------------------------------
> >  1 file changed, 33 insertions(+), 42 deletions(-)
> > 
> 
> > @@ -198,14 +198,17 @@ static void quorum_report_bad(QuorumOpType type, uint64_t sector_num,
> >      }
> >  
> >      qapi_event_send_quorum_report_bad(type, !!msg, msg, node_name,
> > -                                      sector_num, nb_sectors, &error_abort);
> > +                                      offset / BDRV_SECTOR_SIZE,
> > +                                      bytes / BDRV_SECTOR_SIZE, &error_abort);
> 
> Rounding the offset down makes sense, but rounding the bytes down can
> lead to weird messages.  Blindly rounding it up to a sector boundary can
> also be wrong (example: writing 2 bytes at offset 511 really affects
> 1024 bytes when you consider that two sectors had to undergo
> read-modify-write). Don't we have a helper routine for determining the
> end boundary when we have to convert an offset and length to a courser
> alignment?

Hm, I would have to check the header files. I don't know one off the top
of my head. If you find it, let me know.

> > @@ -462,8 +461,8 @@ static void GCC_FMT_ATTR(2, 3) quorum_err(QuorumAIOCB *acb,
> >      va_list ap;
> >  
> >      va_start(ap, fmt);
> > -    fprintf(stderr, "quorum: sector_num=%" PRId64 " nb_sectors=%d ",
> > -            acb->sector_num, acb->nb_sectors);
> > +    fprintf(stderr, "quorum: offset=%" PRIu64 " bytes=%" PRIu64 " ",
> > +            acb->offset, acb->bytes);
> 
> Might be worth a separate patch to get rid of fprintf and use correct
> error reporting.  But not the work for this patch.

What would correct error reporting be in this case? A QMP event? Because
other than that and stderr, I don't think we have any channels for error
messages for I/O requests. We could use error_report(), but it would be
effectively the same thing as fprintf().

Kevin

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 0/8] quorum: Implement .bdrv_co_preadv/pwritev()
  2016-11-11  9:56 ` [Qemu-devel] [RFC PATCH 0/8] quorum: Implement .bdrv_co_preadv/pwritev() Paolo Bonzini
@ 2016-11-11 10:22   ` Kevin Wolf
  2016-11-18  9:51     ` Alberto Garcia
  0 siblings, 1 reply; 35+ messages in thread
From: Kevin Wolf @ 2016-11-11 10:22 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-block, berto, qemu-devel, mreitz

Am 11.11.2016 um 10:56 hat Paolo Bonzini geschrieben:
> On 10/11/2016 18:19, Kevin Wolf wrote:
> > This is part of a series that I'm working on and that aims to remove the
> > bdrv_aio_*() emulation in io.c. blkdebug and blkverify were easy, but for
> > quorum I need a few more patches, so I'm sending this out as an RFC while I
> > continue work on the rest (QED, and then possibly some polishing).
> > 
> > After the series, quorum doesn't only use the preferred interfaces, but the
> > code becomes a little easier to follow and byte granularity requests are
> > supported. There is probably still some potential for additional cleanups in a
> > follow-up series, but this should already be enough to be worth merging.
> 
> Honestly I don't see the point.  It seems easier, more practical and
> more effective to convert bdrv_aio_* to byte ranges, especially since
> QED would be basically a rewrite.

Well, it's an RFC for a reason, I want people to give their opinions
before consistently following through with this. But I do think that we
pointlessly offer too many different interfaces to do the same thing in
the block layer.

Anyway, I think that this series makes the quorum driver quite a bit
easier to follow (and it was the same thing really for blkdebug and
blkverify), even though I only made the really obvious big changes. If
someone looked into this, there are most likely more simplifications
that could be done now.

Sometimes, simplicity is in the eye of the beholder and that's why I'm
asking for more opinions, but for me this is a clear improvement. With
this, the added benefit that we can phase out at least one of the bdrv_*
interfaces and byte granularity as a side effect, my question is rather:
What would be the point in not doing it?


As for QED, that shouldn't be hard. Quorum was really the hardest
conversion because it actually uses AIO to implement parallelism.
blkverify was similar, but on a much simpler level. But QED won't need
to create any coroutines internally because each request is processed
strictly sequentially. It's a simple mechanical conversion from:

    bdrv_aio_foo(..., cb, opaque);

to:

    ret = bdrv_co_foo(...);
    cb(opaque, ret);

And that's it. The rest is cleanup work that we may or may not do.

Kevin

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 2/8] quorum: Remove s from quorum_aio_get() arguments
  2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 2/8] quorum: Remove s from quorum_aio_get() arguments Kevin Wolf
  2016-11-10 23:52   ` Eric Blake
  2016-11-11  9:58   ` Paolo Bonzini
@ 2016-11-11 14:18   ` Alberto Garcia
  2 siblings, 0 replies; 35+ messages in thread
From: Alberto Garcia @ 2016-11-11 14:18 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: mreitz, qemu-devel

On Thu 10 Nov 2016 06:19:03 PM CET, Kevin Wolf wrote:
> There is no point in passing the value of bs->opaque in order to
> overwrite it with itself.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: Alberto Garcia <berto@igalia.com>

Berto

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 7/8] quorum: Implement .bdrv_co_preadv/pwritev()
  2016-11-11  9:58     ` Kevin Wolf
@ 2016-11-11 15:08       ` Eric Blake
  2016-11-17 15:30       ` Alberto Garcia
  1 sibling, 0 replies; 35+ messages in thread
From: Eric Blake @ 2016-11-11 15:08 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, berto, qemu-devel, mreitz

[-- Attachment #1: Type: text/plain, Size: 2948 bytes --]

On 11/11/2016 03:58 AM, Kevin Wolf wrote:
>> Should we add more fields to the two affected events (QUORUM_FAILURE and
>> QUORUM_REPORT_BAD)? We have to keep the existing fields for back-compat,
>> but we could add new fields that give byte-based locations for
>> management apps smart enough to use the new instead of the old
>> (particularly since the old fields are named 'sector-num' and
>> 'sectors-count').
> 
> If there is a user for the new fields, I can do that.

Libvirt is not using quorums yet, but would definitely prefer to use
byte-based information rather than sector based (especially since the
documentation isn't specific on how much a sector is; a quorum built on
top of disks with 4k sectors may be weird to interpret if you don't know
that qemu is hard-coded to 512-byte sectors)


>>>      qapi_event_send_quorum_report_bad(type, !!msg, msg, node_name,
>>> -                                      sector_num, nb_sectors, &error_abort);
>>> +                                      offset / BDRV_SECTOR_SIZE,
>>> +                                      bytes / BDRV_SECTOR_SIZE, &error_abort);
>>
>> Rounding the offset down makes sense, but rounding the bytes down can
>> lead to weird messages.  Blindly rounding it up to a sector boundary can
>> also be wrong (example: writing 2 bytes at offset 511 really affects
>> 1024 bytes when you consider that two sectors had to undergo
>> read-modify-write). Don't we have a helper routine for determining the
>> end boundary when we have to convert an offset and length to a courser
>> alignment?
> 
> Hm, I would have to check the header files. I don't know one off the top
> of my head. If you find it, let me know.
> 

I guess I was thinking of something like
io.c:bdrv_round_sectors_to_clusters(), but didn't readily find a
counterpart for rounding bytes to sectors or request_alignment.  Don't
know if it would help to have one, or not.

>>> @@ -462,8 +461,8 @@ static void GCC_FMT_ATTR(2, 3) quorum_err(QuorumAIOCB *acb,
>>>      va_list ap;
>>>  
>>>      va_start(ap, fmt);
>>> -    fprintf(stderr, "quorum: sector_num=%" PRId64 " nb_sectors=%d ",
>>> -            acb->sector_num, acb->nb_sectors);
>>> +    fprintf(stderr, "quorum: offset=%" PRIu64 " bytes=%" PRIu64 " ",
>>> +            acb->offset, acb->bytes);
>>
>> Might be worth a separate patch to get rid of fprintf and use correct
>> error reporting.  But not the work for this patch.
> 
> What would correct error reporting be in this case? A QMP event? Because
> other than that and stderr, I don't think we have any channels for error
> messages for I/O requests. We could use error_report(), but it would be
> effectively the same thing as fprintf().

Hmm, you're probably right that a QMP event would be best, if anything
is needed at all.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 0/8] quorum: Implement .bdrv_co_preadv/pwritev()
  2016-11-10 17:19 [Qemu-devel] [RFC PATCH 0/8] quorum: Implement .bdrv_co_preadv/pwritev() Kevin Wolf
                   ` (8 preceding siblings ...)
  2016-11-11  9:56 ` [Qemu-devel] [RFC PATCH 0/8] quorum: Implement .bdrv_co_preadv/pwritev() Paolo Bonzini
@ 2016-11-13  3:18 ` no-reply
  9 siblings, 0 replies; 35+ messages in thread
From: no-reply @ 2016-11-13  3:18 UTC (permalink / raw)
  To: kwolf; +Cc: famz, qemu-block, berto, qemu-devel, mreitz

Hi,

Your series seems to have some coding style problems. See output below for
more information:

Type: series
Subject: [Qemu-devel] [RFC PATCH 0/8] quorum: Implement .bdrv_co_preadv/pwritev()
Message-id: 1478798349-28983-1-git-send-email-kwolf@redhat.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

# Useful git options
git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git show --no-patch --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
235adae quorum: Inline quorum_fifo_aio_cb()
d6f6356 quorum: Implement .bdrv_co_preadv/pwritev()
806d272 quorum: Avoid bdrv_aio_writev() for rewrites
5fc63d2 quorum: Inline quorum_aio_cb()
f741198 quorum: Do cleanup in caller coroutine
0c84966 quorum: Implement .bdrv_co_readv/writev
de49af6 quorum: Remove s from quorum_aio_get() arguments
d0e058f coroutine: Introduce qemu_coroutine_enter_if_inactive()

=== OUTPUT BEGIN ===
fatal: unrecognized argument: --no-patch
Checking PATCH 1/8: ...
fatal: unrecognized argument: --no-patch
Checking PATCH 2/8: ...
fatal: unrecognized argument: --no-patch
Checking PATCH 3/8: ...
fatal: unrecognized argument: --no-patch
Checking PATCH 4/8: ...
fatal: unrecognized argument: --no-patch
Checking PATCH 5/8: ...
fatal: unrecognized argument: --no-patch
Checking PATCH 6/8: ...
fatal: unrecognized argument: --no-patch
Checking PATCH 7/8: ...
fatal: unrecognized argument: --no-patch
Checking PATCH 8/8: ...
ERROR: space required before the open parenthesis '('
#65: FILE: block/quorum.c:667:
+    } while(ret < 0 && acb->children_read < s->num_children);

total: 1 errors, 0 warnings, 55 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@freelists.org

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 3/8] quorum: Implement .bdrv_co_readv/writev
  2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 3/8] quorum: Implement .bdrv_co_readv/writev Kevin Wolf
  2016-11-11  1:56   ` Eric Blake
@ 2016-11-16 15:57   ` Alberto Garcia
  1 sibling, 0 replies; 35+ messages in thread
From: Alberto Garcia @ 2016-11-16 15:57 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: mreitz, qemu-devel

On Thu 10 Nov 2016 06:19:04 PM CET, Kevin Wolf wrote:

> +typedef struct QuorumCo {
> +    QuorumAIOCB *acb;
>      int i;

Maybe 'i' could rename to something a bit more descriptive ('idx', I
don't know).

> +} QuorumCo;
> +
> +static void read_quorum_children_entry(void *opaque)
> +{
> +    QuorumCo *co = opaque;
> +    QuorumAIOCB *acb = co->acb;
> +    BDRVQuorumState *s = acb->bs->opaque;
> +    int i = co->i;
> +    int ret;
> +    co = NULL; /* Not valid after the first yield */

I also don't understand this last line. Is it to make sure that no one
tries to use it after the bdrv_co_preadv() call?

> +    acb->qcrs[i].bs = s->children[i]->bs;
> +    ret = bdrv_co_preadv(s->children[i], acb->sector_num * BDRV_SECTOR_SIZE,
> +                         acb->nb_sectors * BDRV_SECTOR_SIZE,
> +                         &acb->qcrs[i].qiov, 0);
> +    quorum_aio_cb(&acb->qcrs[i], ret);
> +}

Otherwise the patch looks good to me.

Berto

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 1/8] coroutine: Introduce qemu_coroutine_enter_if_inactive()
  2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 1/8] coroutine: Introduce qemu_coroutine_enter_if_inactive() Kevin Wolf
  2016-11-10 23:49   ` Eric Blake
@ 2016-11-17  9:30   ` Alberto Garcia
  1 sibling, 0 replies; 35+ messages in thread
From: Alberto Garcia @ 2016-11-17  9:30 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: mreitz, qemu-devel

On Thu 10 Nov 2016 06:19:02 PM CET, Kevin Wolf wrote:
> diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
> index 737bffa..a06befe 100644
> --- a/util/qemu-coroutine.c
> +++ b/util/qemu-coroutine.c
> @@ -19,6 +19,7 @@
>  #include "qemu/atomic.h"
>  #include "qemu/coroutine.h"
>  #include "qemu/coroutine_int.h"
> +#include "block/aio.h"

I also don't see why this include is necessary.

Otherwise,

Reviewed-by: Alberto Garcia <berto@igalia.com>

Berto

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 4/8] quorum: Do cleanup in caller coroutine
  2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 4/8] quorum: Do cleanup in caller coroutine Kevin Wolf
  2016-11-11  2:18   ` Eric Blake
@ 2016-11-17 10:04   ` Alberto Garcia
  1 sibling, 0 replies; 35+ messages in thread
From: Alberto Garcia @ 2016-11-17 10:04 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: mreitz, qemu-devel

On Thu 10 Nov 2016 06:19:05 PM CET, Kevin Wolf wrote:
> Instead of calling quorum_aio_finalize() deeply nested in what used
> to be an AIO callback, do it in the same functions that allocated the
> AIOCB.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: Alberto Garcia <berto@igalia.com>

Berto

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 5/8] quorum: Inline quorum_aio_cb()
  2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 5/8] quorum: Inline quorum_aio_cb() Kevin Wolf
@ 2016-11-17 14:25   ` Alberto Garcia
  0 siblings, 0 replies; 35+ messages in thread
From: Alberto Garcia @ 2016-11-17 14:25 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: mreitz, qemu-devel

On Thu 10 Nov 2016 06:19:06 PM CET, Kevin Wolf wrote:
> This is a conversion to a more natural coroutine style and improves the
> readability of the driver.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

With this patch read_quorum_children_entry() and write_quorum_entry()
become identical (except for bdrv_co_{preadv,pwritev} of course), so
maybe we can factor them out.

But that can be left for a separate patch.

Reviewed-by: Alberto Garcia <berto@igalia.com>

Berto

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 6/8] quorum: Avoid bdrv_aio_writev() for rewrites
  2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 6/8] quorum: Avoid bdrv_aio_writev() for rewrites Kevin Wolf
  2016-11-11  2:25   ` Eric Blake
@ 2016-11-17 14:54   ` Alberto Garcia
  2016-11-18 12:21     ` Kevin Wolf
  1 sibling, 1 reply; 35+ messages in thread
From: Alberto Garcia @ 2016-11-17 14:54 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: mreitz, qemu-devel

On Thu 10 Nov 2016 06:19:07 PM CET, Kevin Wolf wrote:
> Replacing it with bdrv_co_pwritev() prepares us for byte granularity
> requests and gets us rid of the last bdrv_aio_*() user in quorum.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/quorum.c | 52 +++++++++++++++++++++++++++++++++-------------------
>  1 file changed, 33 insertions(+), 19 deletions(-)
>
> diff --git a/block/quorum.c b/block/quorum.c
> index b2bb3af..1426115 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -134,6 +134,11 @@ struct QuorumAIOCB {
>      int children_read;          /* how many children have been read from */
>  };
>  
> +typedef struct QuorumCo {
> +    QuorumAIOCB *acb;
> +    int i;
> +} QuorumCo;

What Eric said: this could be up here in the first patch already.

> -static void quorum_rewrite_aio_cb(void *opaque, int ret)
> -{
> -    QuorumAIOCB *acb = opaque;
> -
> -    /* one less rewrite to do */
> -    acb->rewrite_count--;
> -    qemu_coroutine_enter_if_inactive(acb->co);
> -}

You're moving this logic to quorum_rewrite_entry(), but the comment in
quorum_rewrite_bad_versions() still refers to this function.

> +static void quorum_rewrite_entry(void *opaque)
> +{
> +    QuorumCo *co = opaque;
> +    QuorumAIOCB *acb = co->acb;
> +    BDRVQuorumState *s = acb->bs->opaque;
> +    int ret;
> +
> +    ret = bdrv_co_pwritev(s->children[co->i],
> +                          acb->sector_num * BDRV_SECTOR_SIZE,
> +                          acb->nb_sectors * BDRV_SECTOR_SIZE,
> +                          acb->qiov, 0);
> +    (void) ret;

Why do you need 'ret' at all? If it's a placeholder to remind us to do
something with this value in the future, you can simply add a FIXME
comment.

> +    /* one less rewrite to do */
> +    acb->rewrite_count--;
> +    qemu_coroutine_enter_if_inactive(acb->co);

I think you should only enter acb->co when acb->rewrite_count reaches
zero.

In all other cases the main coroutine simply iterates inside the while()
loop, verifies that the number is still positive and yields again.

The same applies to all other cases of qemu_coroutine_enter_if_inactive,
by the way (I failed to notice it in patch #5).

Berto

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 7/8] quorum: Implement .bdrv_co_preadv/pwritev()
  2016-11-11  9:58     ` Kevin Wolf
  2016-11-11 15:08       ` Eric Blake
@ 2016-11-17 15:30       ` Alberto Garcia
  1 sibling, 0 replies; 35+ messages in thread
From: Alberto Garcia @ 2016-11-17 15:30 UTC (permalink / raw)
  To: Kevin Wolf, Eric Blake; +Cc: qemu-block, qemu-devel, mreitz

On Fri 11 Nov 2016 10:58:57 AM CET, Kevin Wolf wrote:

>> > Note that the QMP events emitted by the driver are an external API
>> > that we were careless enough to define as sector based. The offset
>> > and length of requests reported in events are rounded down
>> > therefore.
>> 
>> Would it be better to round offset down and length up?  A length of 0
>> looks odd.
>
> To be honest, I don't know what these events are used for and what the
> most useful rounding would be. Maybe Berto can tell?

What makes sense if we don't want to break the API is to round as Eric
suggests. Or, more specifically:

sector-num = ROUND_SECTOR_DOWN(offset)
sectors-count = ROUND_SECTOR_UP(offset + length) - sector-num

>> Should we add more fields to the two affected events (QUORUM_FAILURE
>> and QUORUM_REPORT_BAD)? We have to keep the existing fields for
>> back-compat, but we could add new fields that give byte-based
>> locations for management apps smart enough to use the new instead of
>> the old (particularly since the old fields are named 'sector-num' and
>> 'sectors-count').
>
> If there is a user for the new fields, I can do that.

I wouldn't do it until we have a use case.

>> > @@ -462,8 +461,8 @@ static void GCC_FMT_ATTR(2, 3) quorum_err(QuorumAIOCB *acb,
>> >      va_list ap;
>> >  
>> >      va_start(ap, fmt);
>> > -    fprintf(stderr, "quorum: sector_num=%" PRId64 " nb_sectors=%d ",
>> > -            acb->sector_num, acb->nb_sectors);
>> > +    fprintf(stderr, "quorum: offset=%" PRIu64 " bytes=%" PRIu64 " ",
>> > +            acb->offset, acb->bytes);
>> 
>> Might be worth a separate patch to get rid of fprintf and use correct
>> error reporting.  But not the work for this patch.
>
> What would correct error reporting be in this case? A QMP event?
> Because other than that and stderr, I don't think we have any channels
> for error messages for I/O requests. We could use error_report(), but
> it would be effectively the same thing as fprintf().

I'm not sure what was the original use case of this, but the
functionality should be equivalente to the blkverify driver.

I think it should actually be possible to replace blkverify completely
with quorum:

https://lists.gnu.org/archive/html/qemu-devel/2012-08/msg02817.html

Berto

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 8/8] quorum: Inline quorum_fifo_aio_cb()
  2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 8/8] quorum: Inline quorum_fifo_aio_cb() Kevin Wolf
@ 2016-11-18  9:47   ` Alberto Garcia
  0 siblings, 0 replies; 35+ messages in thread
From: Alberto Garcia @ 2016-11-18  9:47 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: mreitz, qemu-devel

On Thu 10 Nov 2016 06:19:09 PM CET, Kevin Wolf wrote:
> Inlining the function removes some boilerplace code and replaces
> recursion by a simple loop, so the code becomes somewhat easier to
> understand.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: Alberto Garcia <berto@igalia.com>

Berto

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 0/8] quorum: Implement .bdrv_co_preadv/pwritev()
  2016-11-11 10:22   ` Kevin Wolf
@ 2016-11-18  9:51     ` Alberto Garcia
  2016-11-18 11:10       ` Paolo Bonzini
  0 siblings, 1 reply; 35+ messages in thread
From: Alberto Garcia @ 2016-11-18  9:51 UTC (permalink / raw)
  To: Kevin Wolf, Paolo Bonzini; +Cc: qemu-block, qemu-devel, mreitz

On Fri 11 Nov 2016 11:22:26 AM CET, Kevin Wolf wrote:

>> Honestly I don't see the point.  It seems easier, more practical and
>> more effective to convert bdrv_aio_* to byte ranges, especially since
>> QED would be basically a rewrite.
>
> Well, it's an RFC for a reason, I want people to give their opinions
> before consistently following through with this. But I do think that
> we pointlessly offer too many different interfaces to do the same
> thing in the block layer.

I cannot speak much about the AIO/coroutine interfaces (why do we have
both? is the latter supposed to supersede the former? I'd appreciate
some background here) but I did find the code easier to follow after
these patches, so from that point of view the series looks good to me.

Berto

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 0/8] quorum: Implement .bdrv_co_preadv/pwritev()
  2016-11-18  9:51     ` Alberto Garcia
@ 2016-11-18 11:10       ` Paolo Bonzini
  0 siblings, 0 replies; 35+ messages in thread
From: Paolo Bonzini @ 2016-11-18 11:10 UTC (permalink / raw)
  To: Alberto Garcia, Kevin Wolf; +Cc: qemu-block, qemu-devel, mreitz



On 18/11/2016 10:51, Alberto Garcia wrote:
> 
>>> >> Honestly I don't see the point.  It seems easier, more practical and
>>> >> more effective to convert bdrv_aio_* to byte ranges, especially since
>>> >> QED would be basically a rewrite.
>> >
>> > Well, it's an RFC for a reason, I want people to give their opinions
>> > before consistently following through with this. But I do think that
>> > we pointlessly offer too many different interfaces to do the same
>> > thing in the block layer.
> I cannot speak much about the AIO/coroutine interfaces (why do we have
> both? is the latter supposed to supersede the former? I'd appreciate
> some background here) but I did find the code easier to follow after
> these patches, so from that point of view the series looks good to me.

If you like it, who am I to say no! :)

Paolo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 6/8] quorum: Avoid bdrv_aio_writev() for rewrites
  2016-11-17 14:54   ` Alberto Garcia
@ 2016-11-18 12:21     ` Kevin Wolf
  2016-11-18 12:33       ` Alberto Garcia
  2016-11-18 21:11       ` Eric Blake
  0 siblings, 2 replies; 35+ messages in thread
From: Kevin Wolf @ 2016-11-18 12:21 UTC (permalink / raw)
  To: Alberto Garcia; +Cc: qemu-block, mreitz, qemu-devel

Am 17.11.2016 um 15:54 hat Alberto Garcia geschrieben:
> On Thu 10 Nov 2016 06:19:07 PM CET, Kevin Wolf wrote:
> > +static void quorum_rewrite_entry(void *opaque)
> > +{
> > +    QuorumCo *co = opaque;
> > +    QuorumAIOCB *acb = co->acb;
> > +    BDRVQuorumState *s = acb->bs->opaque;
> > +    int ret;
> > +
> > +    ret = bdrv_co_pwritev(s->children[co->i],
> > +                          acb->sector_num * BDRV_SECTOR_SIZE,
> > +                          acb->nb_sectors * BDRV_SECTOR_SIZE,
> > +                          acb->qiov, 0);
> > +    (void) ret;
> 
> Why do you need 'ret' at all? If it's a placeholder to remind us to do
> something with this value in the future, you can simply add a FIXME
> comment.

I'm not sure whether we want to fix anything, it looks intentional to
me. I just wanted to be explicit about the ignored return value, both
for human readers and for tools like Coverity.

> > +    /* one less rewrite to do */
> > +    acb->rewrite_count--;
> > +    qemu_coroutine_enter_if_inactive(acb->co);
> 
> I think you should only enter acb->co when acb->rewrite_count reaches
> zero.
> 
> In all other cases the main coroutine simply iterates inside the while()
> loop, verifies that the number is still positive and yields again.
> 
> The same applies to all other cases of qemu_coroutine_enter_if_inactive,
> by the way (I failed to notice it in patch #5).

I think I like it better this way because it keeps the loop condition
local to the caller instead of spreading it across the caller and the
places that reenter. On the other hand, I can see that not doing the
extra context switch might be a little more efficient.

If you feel strongly about this, I will change it.

Kevin

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 6/8] quorum: Avoid bdrv_aio_writev() for rewrites
  2016-11-18 12:21     ` Kevin Wolf
@ 2016-11-18 12:33       ` Alberto Garcia
  2016-11-18 21:11       ` Eric Blake
  1 sibling, 0 replies; 35+ messages in thread
From: Alberto Garcia @ 2016-11-18 12:33 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, mreitz, qemu-devel

On Fri 18 Nov 2016 01:21:12 PM CET, Kevin Wolf wrote:
>> > +    /* one less rewrite to do */
>> > +    acb->rewrite_count--;
>> > +    qemu_coroutine_enter_if_inactive(acb->co);
>> 
>> I think you should only enter acb->co when acb->rewrite_count reaches
>> zero.
>> 
>> In all other cases the main coroutine simply iterates inside the
>> while() loop, verifies that the number is still positive and yields
>> again.
>> 
>> The same applies to all other cases of
>> qemu_coroutine_enter_if_inactive, by the way (I failed to notice it
>> in patch #5).
>
> I think I like it better this way because it keeps the loop condition
> local to the caller instead of spreading it across the caller and the
> places that reenter. On the other hand, I can see that not doing the
> extra context switch might be a little more efficient.

Apart from the efficiency, I actually find it a bit easier to understand
if it's made explicit:

    /* Once all rewrites are done we can wake up the caller */
    if (--acb->rewrite_count == 0) {
        qemu_coroutine_enter_if_inactive(acb->co);
    }

But I can understand that some people may find your way easier. I don't
feel strongly about it, so no need to change it.

Berto

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 6/8] quorum: Avoid bdrv_aio_writev() for rewrites
  2016-11-18 12:21     ` Kevin Wolf
  2016-11-18 12:33       ` Alberto Garcia
@ 2016-11-18 21:11       ` Eric Blake
  2016-11-21 11:56         ` Kevin Wolf
  1 sibling, 1 reply; 35+ messages in thread
From: Eric Blake @ 2016-11-18 21:11 UTC (permalink / raw)
  To: Kevin Wolf, Alberto Garcia; +Cc: qemu-devel, qemu-block, mreitz

[-- Attachment #1: Type: text/plain, Size: 2434 bytes --]

On 11/18/2016 06:21 AM, Kevin Wolf wrote:

>>> +    ret = bdrv_co_pwritev(s->children[co->i],
>>> +                          acb->sector_num * BDRV_SECTOR_SIZE,
>>> +                          acb->nb_sectors * BDRV_SECTOR_SIZE,
>>> +                          acb->qiov, 0);
>>> +    (void) ret;
>>
>> Why do you need 'ret' at all? If it's a placeholder to remind us to do
>> something with this value in the future, you can simply add a FIXME
>> comment.
> 
> I'm not sure whether we want to fix anything, it looks intentional to
> me. I just wanted to be explicit about the ignored return value, both
> for human readers and for tools like Coverity.

In bdrv_co_flush(), we have:

    /* Return value is ignored - it's ok if wait queue is empty */
    qemu_co_queue_next(&bs->flush_queue);

I don't know if Coverity would squawk, but the cast to void looks a bit
stranger to me than a comment, where what we did in bdrv_co_flush()
seems reasonable.  There's also the patch proposal to introduce
ignore_value() in place of a cast to void, which is a bit more
self-documenting about places that intentionally ignore a return value
while still shutting Coverity up:

https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg05165.html

> 
>>> +    /* one less rewrite to do */
>>> +    acb->rewrite_count--;
>>> +    qemu_coroutine_enter_if_inactive(acb->co);
>>
>> I think you should only enter acb->co when acb->rewrite_count reaches
>> zero.
>>
>> In all other cases the main coroutine simply iterates inside the while()
>> loop, verifies that the number is still positive and yields again.
>>
>> The same applies to all other cases of qemu_coroutine_enter_if_inactive,
>> by the way (I failed to notice it in patch #5).
> 
> I think I like it better this way because it keeps the loop condition
> local to the caller instead of spreading it across the caller and the
> places that reenter. On the other hand, I can see that not doing the
> extra context switch might be a little more efficient.

Do we have a feel for how many context switches this would save?  If
it's in the noise, cleaner code is probably a win; but if it is a
hotspot, then we should definitely try the optimization.

> 
> If you feel strongly about this, I will change it.
> 
> Kevin
> 
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 6/8] quorum: Avoid bdrv_aio_writev() for rewrites
  2016-11-18 21:11       ` Eric Blake
@ 2016-11-21 11:56         ` Kevin Wolf
  0 siblings, 0 replies; 35+ messages in thread
From: Kevin Wolf @ 2016-11-21 11:56 UTC (permalink / raw)
  To: Eric Blake; +Cc: Alberto Garcia, qemu-devel, qemu-block, mreitz

[-- Attachment #1: Type: text/plain, Size: 2841 bytes --]

Am 18.11.2016 um 22:11 hat Eric Blake geschrieben:
> On 11/18/2016 06:21 AM, Kevin Wolf wrote:
> 
> >>> +    ret = bdrv_co_pwritev(s->children[co->i],
> >>> +                          acb->sector_num * BDRV_SECTOR_SIZE,
> >>> +                          acb->nb_sectors * BDRV_SECTOR_SIZE,
> >>> +                          acb->qiov, 0);
> >>> +    (void) ret;
> >>
> >> Why do you need 'ret' at all? If it's a placeholder to remind us to do
> >> something with this value in the future, you can simply add a FIXME
> >> comment.
> > 
> > I'm not sure whether we want to fix anything, it looks intentional to
> > me. I just wanted to be explicit about the ignored return value, both
> > for human readers and for tools like Coverity.
> 
> In bdrv_co_flush(), we have:
> 
>     /* Return value is ignored - it's ok if wait queue is empty */
>     qemu_co_queue_next(&bs->flush_queue);
> 
> I don't know if Coverity would squawk, but the cast to void looks a bit
> stranger to me than a comment, where what we did in bdrv_co_flush()
> seems reasonable.  There's also the patch proposal to introduce
> ignore_value() in place of a cast to void, which is a bit more
> self-documenting about places that intentionally ignore a return value
> while still shutting Coverity up:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg05165.html

If you don't like the void cast, I can remove it. After all, the
bdrv_aio_writev() call in the original code didn't have it either. I'm
just surprised that it feels strange to both of you, I thought it was a
well-known idiom for "this value is intentionally unused".

> > 
> >>> +    /* one less rewrite to do */
> >>> +    acb->rewrite_count--;
> >>> +    qemu_coroutine_enter_if_inactive(acb->co);
> >>
> >> I think you should only enter acb->co when acb->rewrite_count reaches
> >> zero.
> >>
> >> In all other cases the main coroutine simply iterates inside the while()
> >> loop, verifies that the number is still positive and yields again.
> >>
> >> The same applies to all other cases of qemu_coroutine_enter_if_inactive,
> >> by the way (I failed to notice it in patch #5).
> > 
> > I think I like it better this way because it keeps the loop condition
> > local to the caller instead of spreading it across the caller and the
> > places that reenter. On the other hand, I can see that not doing the
> > extra context switch might be a little more efficient.
> 
> Do we have a feel for how many context switches this would save?  If
> it's in the noise, cleaner code is probably a win; but if it is a
> hotspot, then we should definitely try the optimization.

Should normally be (num_children - 1) per request. With inconsistent
results and enabled rewrites, add another one for each child that needs
to be corrected.

Kevin

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2016-11-21 11:56 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-10 17:19 [Qemu-devel] [RFC PATCH 0/8] quorum: Implement .bdrv_co_preadv/pwritev() Kevin Wolf
2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 1/8] coroutine: Introduce qemu_coroutine_enter_if_inactive() Kevin Wolf
2016-11-10 23:49   ` Eric Blake
2016-11-17  9:30   ` Alberto Garcia
2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 2/8] quorum: Remove s from quorum_aio_get() arguments Kevin Wolf
2016-11-10 23:52   ` Eric Blake
2016-11-11  9:58   ` Paolo Bonzini
2016-11-11 14:18   ` Alberto Garcia
2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 3/8] quorum: Implement .bdrv_co_readv/writev Kevin Wolf
2016-11-11  1:56   ` Eric Blake
2016-11-16 15:57   ` Alberto Garcia
2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 4/8] quorum: Do cleanup in caller coroutine Kevin Wolf
2016-11-11  2:18   ` Eric Blake
2016-11-17 10:04   ` Alberto Garcia
2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 5/8] quorum: Inline quorum_aio_cb() Kevin Wolf
2016-11-17 14:25   ` Alberto Garcia
2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 6/8] quorum: Avoid bdrv_aio_writev() for rewrites Kevin Wolf
2016-11-11  2:25   ` Eric Blake
2016-11-17 14:54   ` Alberto Garcia
2016-11-18 12:21     ` Kevin Wolf
2016-11-18 12:33       ` Alberto Garcia
2016-11-18 21:11       ` Eric Blake
2016-11-21 11:56         ` Kevin Wolf
2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 7/8] quorum: Implement .bdrv_co_preadv/pwritev() Kevin Wolf
2016-11-11  2:37   ` Eric Blake
2016-11-11  9:58     ` Kevin Wolf
2016-11-11 15:08       ` Eric Blake
2016-11-17 15:30       ` Alberto Garcia
2016-11-10 17:19 ` [Qemu-devel] [RFC PATCH 8/8] quorum: Inline quorum_fifo_aio_cb() Kevin Wolf
2016-11-18  9:47   ` Alberto Garcia
2016-11-11  9:56 ` [Qemu-devel] [RFC PATCH 0/8] quorum: Implement .bdrv_co_preadv/pwritev() Paolo Bonzini
2016-11-11 10:22   ` Kevin Wolf
2016-11-18  9:51     ` Alberto Garcia
2016-11-18 11:10       ` Paolo Bonzini
2016-11-13  3:18 ` no-reply

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.