* [Qemu-devel] [PATCH 0/8] quorum: Implement .bdrv_co_preadv/pwritev()
@ 2016-11-21 17:31 Kevin Wolf
  2016-11-21 17:31 ` [Qemu-devel] [PATCH 1/8] coroutine: Introduce qemu_coroutine_enter_if_inactive() Kevin Wolf
                   ` (7 more replies)
  0 siblings, 8 replies; 23+ messages in thread
From: Kevin Wolf @ 2016-11-21 17:31 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, berto, eblake, mreitz, qemu-devel

This is part of my work that aims to remove the bdrv_aio_*() emulation in io.c.

After this series, quorum not only uses the preferred interfaces, but the
code also becomes a little easier to follow, and byte granularity requests
are supported. There is probably still some potential for additional
cleanups in a follow-up series, but this should already be enough to be
worth merging.

v1 (from RFC):
- Patch 1: Removed useless #include [Eric]
- Patch 3:
  * Define QuorumCo at the top so that a later patch doesn't
    have to move it [Eric]
  * Renamed QuorumCo.i to QuorumCo.idx [Berto]
  * Removed useless co = NULL [Eric]
- Patch 5/6: Reenter caller coroutine only after the last request [Berto]
- Patch 6: Replace void cast by comment [Berto]
- Patch 7: Change rounding [Eric/Berto]

Kevin Wolf (8):
  coroutine: Introduce qemu_coroutine_enter_if_inactive()
  quorum: Remove s from quorum_aio_get() arguments
  quorum: Implement .bdrv_co_readv/writev
  quorum: Do cleanup in caller coroutine
  quorum: Inline quorum_aio_cb()
  quorum: Avoid bdrv_aio_writev() for rewrites
  quorum: Implement .bdrv_co_preadv/pwritev()
  quorum: Inline quorum_fifo_aio_cb()

 block/quorum.c           | 387 +++++++++++++++++++++++++----------------------
 include/qemu/coroutine.h |   6 +
 util/qemu-coroutine.c    |   7 +
 3 files changed, 221 insertions(+), 179 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH 1/8] coroutine: Introduce qemu_coroutine_enter_if_inactive()
  2016-11-21 17:31 [Qemu-devel] [PATCH 0/8] quorum: Implement .bdrv_co_preadv/pwritev() Kevin Wolf
@ 2016-11-21 17:31 ` Kevin Wolf
  2016-11-21 17:31 ` [Qemu-devel] [PATCH 2/8] quorum: Remove s from quorum_aio_get() arguments Kevin Wolf
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Kevin Wolf @ 2016-11-21 17:31 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, berto, eblake, mreitz, qemu-devel

In the context of asynchronous work, if a worker coroutine completes
without ever yielding, control returns directly to the parent coroutine,
which consequently hasn't yielded yet and must not be reentered. In this
case we don't have to reenter the parent at all: it will see that the
work is already done and won't even yield.
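
A minimal sketch of the intended usage pattern (the Work struct,
do_the_work() and the function names are made up for illustration; the
quorum conversion later in this series follows the same shape):

    /* Hypothetical example, not part of this patch */
    typedef struct Work {
        Coroutine *caller;
        bool done;
    } Work;

    static void coroutine_fn worker_entry(void *opaque)
    {
        Work *w = opaque;

        do_the_work(w);             /* may or may not yield internally */
        w->done = true;

        /* Wake the caller only if it already yielded; if the work
         * finished without yielding, the caller is still active and
         * must not be entered again. */
        qemu_coroutine_enter_if_inactive(w->caller);
    }

    static void coroutine_fn start_work(Work *w)
    {
        Coroutine *co;

        w->caller = qemu_coroutine_self();
        w->done = false;

        co = qemu_coroutine_create(worker_entry, w);
        qemu_coroutine_enter(co);

        if (!w->done) {
            qemu_coroutine_yield();
        }
    }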

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
---
 include/qemu/coroutine.h | 6 ++++++
 util/qemu-coroutine.c    | 7 +++++++
 2 files changed, 13 insertions(+)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index e6a60d5..12584ed 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -71,6 +71,12 @@ Coroutine *qemu_coroutine_create(CoroutineEntry *entry, void *opaque);
 void qemu_coroutine_enter(Coroutine *coroutine);
 
 /**
+ * Transfer control to a coroutine if it's not active (i.e. part of the call
+ * stack of the running coroutine). Otherwise, do nothing.
+ */
+void qemu_coroutine_enter_if_inactive(Coroutine *co);
+
+/**
  * Transfer control back to a coroutine's caller
  *
  * This function does not return until the coroutine is re-entered using
diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
index 737bffa..a5d2f6c 100644
--- a/util/qemu-coroutine.c
+++ b/util/qemu-coroutine.c
@@ -131,6 +131,13 @@ void qemu_coroutine_enter(Coroutine *co)
     }
 }
 
+void qemu_coroutine_enter_if_inactive(Coroutine *co)
+{
+    if (!qemu_coroutine_entered(co)) {
+        qemu_coroutine_enter(co);
+    }
+}
+
 void coroutine_fn qemu_coroutine_yield(void)
 {
     Coroutine *self = qemu_coroutine_self();
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH 2/8] quorum: Remove s from quorum_aio_get() arguments
  2016-11-21 17:31 [Qemu-devel] [PATCH 0/8] quorum: Implement .bdrv_co_preadv/pwritev() Kevin Wolf
  2016-11-21 17:31 ` [Qemu-devel] [PATCH 1/8] coroutine: Introduce qemu_coroutine_enter_if_inactive() Kevin Wolf
@ 2016-11-21 17:31 ` Kevin Wolf
  2016-11-21 17:31 ` [Qemu-devel] [PATCH 3/8] quorum: Implement .bdrv_co_readv/writev Kevin Wolf
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Kevin Wolf @ 2016-11-21 17:31 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, berto, eblake, mreitz, qemu-devel

There is no point in passing the value of bs->opaque in order to
overwrite it with itself.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
---
 block/quorum.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index d122299..dfa9fd3 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -171,18 +171,17 @@ static bool quorum_64bits_compare(QuorumVoteValue *a, QuorumVoteValue *b)
     return a->l == b->l;
 }
 
-static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
-                                   BlockDriverState *bs,
+static QuorumAIOCB *quorum_aio_get(BlockDriverState *bs,
                                    QEMUIOVector *qiov,
                                    uint64_t sector_num,
                                    int nb_sectors,
                                    BlockCompletionFunc *cb,
                                    void *opaque)
 {
+    BDRVQuorumState *s = bs->opaque;
     QuorumAIOCB *acb = qemu_aio_get(&quorum_aiocb_info, bs, cb, opaque);
     int i;
 
-    acb->common.bs->opaque = s;
     acb->sector_num = sector_num;
     acb->nb_sectors = nb_sectors;
     acb->qiov = qiov;
@@ -691,7 +690,7 @@ static BlockAIOCB *quorum_aio_readv(BlockDriverState *bs,
                                     void *opaque)
 {
     BDRVQuorumState *s = bs->opaque;
-    QuorumAIOCB *acb = quorum_aio_get(s, bs, qiov, sector_num,
+    QuorumAIOCB *acb = quorum_aio_get(bs, qiov, sector_num,
                                       nb_sectors, cb, opaque);
     acb->is_read = true;
     acb->children_read = 0;
@@ -711,7 +710,7 @@ static BlockAIOCB *quorum_aio_writev(BlockDriverState *bs,
                                      void *opaque)
 {
     BDRVQuorumState *s = bs->opaque;
-    QuorumAIOCB *acb = quorum_aio_get(s, bs, qiov, sector_num, nb_sectors,
+    QuorumAIOCB *acb = quorum_aio_get(bs, qiov, sector_num, nb_sectors,
                                       cb, opaque);
     int i;
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH 3/8] quorum: Implement .bdrv_co_readv/writev
  2016-11-21 17:31 [Qemu-devel] [PATCH 0/8] quorum: Implement .bdrv_co_preadv/pwritev() Kevin Wolf
  2016-11-21 17:31 ` [Qemu-devel] [PATCH 1/8] coroutine: Introduce qemu_coroutine_enter_if_inactive() Kevin Wolf
  2016-11-21 17:31 ` [Qemu-devel] [PATCH 2/8] quorum: Remove s from quorum_aio_get() arguments Kevin Wolf
@ 2016-11-21 17:31 ` Kevin Wolf
  2016-11-21 17:58   ` Eric Blake
  2016-11-22  7:39   ` Alberto Garcia
  2016-11-21 17:31 ` [Qemu-devel] [PATCH 4/8] quorum: Do cleanup in caller coroutine Kevin Wolf
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 23+ messages in thread
From: Kevin Wolf @ 2016-11-21 17:31 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, berto, eblake, mreitz, qemu-devel

This converts the quorum block driver from implementing callback-based
read/write interfaces to coroutine-based ones. This is the first step
and will allow us to simplify the code further.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/quorum.c | 192 ++++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 115 insertions(+), 77 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index dfa9fd3..6a7bd91 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -97,7 +97,7 @@ typedef struct QuorumAIOCB QuorumAIOCB;
  * $children_count QuorumChildRequest.
  */
 typedef struct QuorumChildRequest {
-    BlockAIOCB *aiocb;
+    BlockDriverState *bs;
     QEMUIOVector qiov;
     uint8_t *buf;
     int ret;
@@ -110,7 +110,8 @@ typedef struct QuorumChildRequest {
  * used to do operations on each children and track overall progress.
  */
 struct QuorumAIOCB {
-    BlockAIOCB common;
+    BlockDriverState *bs;
+    Coroutine *co;
 
     /* Request metadata */
     uint64_t sector_num;
@@ -129,36 +130,23 @@ struct QuorumAIOCB {
     QuorumVotes votes;
 
     bool is_read;
+    bool has_completed;
     int vote_ret;
     int children_read;          /* how many children have been read from */
 };
 
-static bool quorum_vote(QuorumAIOCB *acb);
-
-static void quorum_aio_cancel(BlockAIOCB *blockacb)
-{
-    QuorumAIOCB *acb = container_of(blockacb, QuorumAIOCB, common);
-    BDRVQuorumState *s = acb->common.bs->opaque;
-    int i;
-
-    /* cancel all callbacks */
-    for (i = 0; i < s->num_children; i++) {
-        if (acb->qcrs[i].aiocb) {
-            bdrv_aio_cancel_async(acb->qcrs[i].aiocb);
-        }
-    }
-}
+typedef struct QuorumCo {
+    QuorumAIOCB *acb;
+    int idx;
+} QuorumCo;
 
-static AIOCBInfo quorum_aiocb_info = {
-    .aiocb_size         = sizeof(QuorumAIOCB),
-    .cancel_async       = quorum_aio_cancel,
-};
+static bool quorum_vote(QuorumAIOCB *acb);
 
 static void quorum_aio_finalize(QuorumAIOCB *acb)
 {
-    acb->common.cb(acb->common.opaque, acb->vote_ret);
+    acb->has_completed = true;
     g_free(acb->qcrs);
-    qemu_aio_unref(acb);
+    qemu_coroutine_enter_if_inactive(acb->co);
 }
 
 static bool quorum_sha256_compare(QuorumVoteValue *a, QuorumVoteValue *b)
@@ -174,14 +162,14 @@ static bool quorum_64bits_compare(QuorumVoteValue *a, QuorumVoteValue *b)
 static QuorumAIOCB *quorum_aio_get(BlockDriverState *bs,
                                    QEMUIOVector *qiov,
                                    uint64_t sector_num,
-                                   int nb_sectors,
-                                   BlockCompletionFunc *cb,
-                                   void *opaque)
+                                   int nb_sectors)
 {
     BDRVQuorumState *s = bs->opaque;
-    QuorumAIOCB *acb = qemu_aio_get(&quorum_aiocb_info, bs, cb, opaque);
+    QuorumAIOCB *acb = g_new(QuorumAIOCB, 1);
     int i;
 
+    acb->co = qemu_coroutine_self();
+    acb->bs = bs;
     acb->sector_num = sector_num;
     acb->nb_sectors = nb_sectors;
     acb->qiov = qiov;
@@ -191,6 +179,7 @@ static QuorumAIOCB *quorum_aio_get(BlockDriverState *bs,
     acb->rewrite_count = 0;
     acb->votes.compare = quorum_sha256_compare;
     QLIST_INIT(&acb->votes.vote_list);
+    acb->has_completed = false;
     acb->is_read = false;
     acb->vote_ret = 0;
 
@@ -217,7 +206,7 @@ static void quorum_report_bad(QuorumOpType type, uint64_t sector_num,
 
 static void quorum_report_failure(QuorumAIOCB *acb)
 {
-    const char *reference = bdrv_get_device_or_node_name(acb->common.bs);
+    const char *reference = bdrv_get_device_or_node_name(acb->bs);
     qapi_event_send_quorum_failure(reference, acb->sector_num,
                                    acb->nb_sectors, &error_abort);
 }
@@ -226,7 +215,7 @@ static int quorum_vote_error(QuorumAIOCB *acb);
 
 static bool quorum_has_too_much_io_failed(QuorumAIOCB *acb)
 {
-    BDRVQuorumState *s = acb->common.bs->opaque;
+    BDRVQuorumState *s = acb->bs->opaque;
 
     if (acb->success_count < s->threshold) {
         acb->vote_ret = quorum_vote_error(acb);
@@ -252,7 +241,7 @@ static void quorum_rewrite_aio_cb(void *opaque, int ret)
     quorum_aio_finalize(acb);
 }
 
-static BlockAIOCB *read_fifo_child(QuorumAIOCB *acb);
+static int read_fifo_child(QuorumAIOCB *acb);
 
 static void quorum_copy_qiov(QEMUIOVector *dest, QEMUIOVector *source)
 {
@@ -272,14 +261,14 @@ static void quorum_report_bad_acb(QuorumChildRequest *sacb, int ret)
     QuorumAIOCB *acb = sacb->parent;
     QuorumOpType type = acb->is_read ? QUORUM_OP_TYPE_READ : QUORUM_OP_TYPE_WRITE;
     quorum_report_bad(type, acb->sector_num, acb->nb_sectors,
-                      sacb->aiocb->bs->node_name, ret);
+                      sacb->bs->node_name, ret);
 }
 
-static void quorum_fifo_aio_cb(void *opaque, int ret)
+static int quorum_fifo_aio_cb(void *opaque, int ret)
 {
     QuorumChildRequest *sacb = opaque;
     QuorumAIOCB *acb = sacb->parent;
-    BDRVQuorumState *s = acb->common.bs->opaque;
+    BDRVQuorumState *s = acb->bs->opaque;
 
     assert(acb->is_read && s->read_pattern == QUORUM_READ_PATTERN_FIFO);
 
@@ -288,8 +277,7 @@ static void quorum_fifo_aio_cb(void *opaque, int ret)
 
         /* We try to read next child in FIFO order if we fail to read */
         if (acb->children_read < s->num_children) {
-            read_fifo_child(acb);
-            return;
+            return read_fifo_child(acb);
         }
     }
 
@@ -297,13 +285,14 @@ static void quorum_fifo_aio_cb(void *opaque, int ret)
 
     /* FIXME: rewrite failed children if acb->children_read > 1? */
     quorum_aio_finalize(acb);
+    return ret;
 }
 
 static void quorum_aio_cb(void *opaque, int ret)
 {
     QuorumChildRequest *sacb = opaque;
     QuorumAIOCB *acb = sacb->parent;
-    BDRVQuorumState *s = acb->common.bs->opaque;
+    BDRVQuorumState *s = acb->bs->opaque;
     bool rewrite = false;
     int i;
 
@@ -518,7 +507,7 @@ static bool quorum_compare(QuorumAIOCB *acb,
                            QEMUIOVector *a,
                            QEMUIOVector *b)
 {
-    BDRVQuorumState *s = acb->common.bs->opaque;
+    BDRVQuorumState *s = acb->bs->opaque;
     ssize_t offset;
 
     /* This driver will replace blkverify in this particular case */
@@ -538,7 +527,7 @@ static bool quorum_compare(QuorumAIOCB *acb,
 /* Do a vote to get the error code */
 static int quorum_vote_error(QuorumAIOCB *acb)
 {
-    BDRVQuorumState *s = acb->common.bs->opaque;
+    BDRVQuorumState *s = acb->bs->opaque;
     QuorumVoteVersion *winner = NULL;
     QuorumVotes error_votes;
     QuorumVoteValue result_value;
@@ -573,7 +562,7 @@ static bool quorum_vote(QuorumAIOCB *acb)
     bool rewrite = false;
     int i, j, ret;
     QuorumVoteValue hash;
-    BDRVQuorumState *s = acb->common.bs->opaque;
+    BDRVQuorumState *s = acb->bs->opaque;
     QuorumVoteVersion *winner;
 
     if (quorum_has_too_much_io_failed(acb)) {
@@ -649,10 +638,25 @@ free_exit:
     return rewrite;
 }
 
-static BlockAIOCB *read_quorum_children(QuorumAIOCB *acb)
+static void read_quorum_children_entry(void *opaque)
 {
-    BDRVQuorumState *s = acb->common.bs->opaque;
-    int i;
+    QuorumCo *co = opaque;
+    QuorumAIOCB *acb = co->acb;
+    BDRVQuorumState *s = acb->bs->opaque;
+    int i = co->idx;
+    int ret;
+
+    acb->qcrs[i].bs = s->children[i]->bs;
+    ret = bdrv_co_preadv(s->children[i], acb->sector_num * BDRV_SECTOR_SIZE,
+                         acb->nb_sectors * BDRV_SECTOR_SIZE,
+                         &acb->qcrs[i].qiov, 0);
+    quorum_aio_cb(&acb->qcrs[i], ret);
+}
+
+static int read_quorum_children(QuorumAIOCB *acb)
+{
+    BDRVQuorumState *s = acb->bs->opaque;
+    int i, ret;
 
     acb->children_read = s->num_children;
     for (i = 0; i < s->num_children; i++) {
@@ -662,65 +666,99 @@ static BlockAIOCB *read_quorum_children(QuorumAIOCB *acb)
     }
 
     for (i = 0; i < s->num_children; i++) {
-        acb->qcrs[i].aiocb = bdrv_aio_readv(s->children[i], acb->sector_num,
-                                            &acb->qcrs[i].qiov, acb->nb_sectors,
-                                            quorum_aio_cb, &acb->qcrs[i]);
+        Coroutine *co;
+        QuorumCo data = {
+            .acb = acb,
+            .idx = i,
+        };
+
+        co = qemu_coroutine_create(read_quorum_children_entry, &data);
+        qemu_coroutine_enter(co);
     }
 
-    return &acb->common;
+    if (!acb->has_completed) {
+        qemu_coroutine_yield();
+    }
+
+    ret = acb->vote_ret;
+
+    return ret;
 }
 
-static BlockAIOCB *read_fifo_child(QuorumAIOCB *acb)
+static int read_fifo_child(QuorumAIOCB *acb)
 {
-    BDRVQuorumState *s = acb->common.bs->opaque;
+    BDRVQuorumState *s = acb->bs->opaque;
     int n = acb->children_read++;
+    int ret;
 
-    acb->qcrs[n].aiocb = bdrv_aio_readv(s->children[n], acb->sector_num,
-                                        acb->qiov, acb->nb_sectors,
-                                        quorum_fifo_aio_cb, &acb->qcrs[n]);
+    acb->qcrs[n].bs = s->children[n]->bs;
+    ret = bdrv_co_preadv(s->children[n], acb->sector_num * BDRV_SECTOR_SIZE,
+                         acb->nb_sectors * BDRV_SECTOR_SIZE, acb->qiov, 0);
+    ret = quorum_fifo_aio_cb(&acb->qcrs[n], ret);
 
-    return &acb->common;
+    return ret;
 }
 
-static BlockAIOCB *quorum_aio_readv(BlockDriverState *bs,
-                                    int64_t sector_num,
-                                    QEMUIOVector *qiov,
-                                    int nb_sectors,
-                                    BlockCompletionFunc *cb,
-                                    void *opaque)
+static int quorum_co_readv(BlockDriverState *bs,
+                           int64_t sector_num, int nb_sectors,
+                           QEMUIOVector *qiov)
 {
     BDRVQuorumState *s = bs->opaque;
-    QuorumAIOCB *acb = quorum_aio_get(bs, qiov, sector_num,
-                                      nb_sectors, cb, opaque);
+    QuorumAIOCB *acb = quorum_aio_get(bs, qiov, sector_num, nb_sectors);
+    int ret;
+
     acb->is_read = true;
     acb->children_read = 0;
 
     if (s->read_pattern == QUORUM_READ_PATTERN_QUORUM) {
-        return read_quorum_children(acb);
+        ret = read_quorum_children(acb);
+    } else {
+        ret = read_fifo_child(acb);
     }
+    g_free(acb);
+    return ret;
+}
 
-    return read_fifo_child(acb);
+static void write_quorum_entry(void *opaque)
+{
+    QuorumCo *co = opaque;
+    QuorumAIOCB *acb = co->acb;
+    BDRVQuorumState *s = acb->bs->opaque;
+    int i = co->idx;
+    int ret;
+
+    acb->qcrs[i].bs = s->children[i]->bs;
+    ret = bdrv_co_pwritev(s->children[i], acb->sector_num * BDRV_SECTOR_SIZE,
+                          acb->nb_sectors * BDRV_SECTOR_SIZE, acb->qiov, 0);
+    quorum_aio_cb(&acb->qcrs[i], ret);
 }
 
-static BlockAIOCB *quorum_aio_writev(BlockDriverState *bs,
-                                     int64_t sector_num,
-                                     QEMUIOVector *qiov,
-                                     int nb_sectors,
-                                     BlockCompletionFunc *cb,
-                                     void *opaque)
+static int quorum_co_writev(BlockDriverState *bs,
+                            int64_t sector_num, int nb_sectors,
+                            QEMUIOVector *qiov)
 {
     BDRVQuorumState *s = bs->opaque;
-    QuorumAIOCB *acb = quorum_aio_get(bs, qiov, sector_num, nb_sectors,
-                                      cb, opaque);
-    int i;
+    QuorumAIOCB *acb = quorum_aio_get(bs, qiov, sector_num, nb_sectors);
+    int i, ret;
 
     for (i = 0; i < s->num_children; i++) {
-        acb->qcrs[i].aiocb = bdrv_aio_writev(s->children[i], sector_num,
-                                             qiov, nb_sectors, &quorum_aio_cb,
-                                             &acb->qcrs[i]);
+        Coroutine *co;
+        QuorumCo data = {
+            .acb = acb,
+            .idx = i,
+        };
+
+        co = qemu_coroutine_create(write_quorum_entry, &data);
+        qemu_coroutine_enter(co);
     }
 
-    return &acb->common;
+    if (!acb->has_completed) {
+        qemu_coroutine_yield();
+    }
+
+    ret = acb->vote_ret;
+
+    return ret;
 }
 
 static int64_t quorum_getlength(BlockDriverState *bs)
@@ -1097,8 +1135,8 @@ static BlockDriver bdrv_quorum = {
 
     .bdrv_getlength                     = quorum_getlength,
 
-    .bdrv_aio_readv                     = quorum_aio_readv,
-    .bdrv_aio_writev                    = quorum_aio_writev,
+    .bdrv_co_readv                      = quorum_co_readv,
+    .bdrv_co_writev                     = quorum_co_writev,
 
     .bdrv_add_child                     = quorum_add_child,
     .bdrv_del_child                     = quorum_del_child,
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH 4/8] quorum: Do cleanup in caller coroutine
  2016-11-21 17:31 [Qemu-devel] [PATCH 0/8] quorum: Implement .bdrv_co_preadv/pwritev() Kevin Wolf
                   ` (2 preceding siblings ...)
  2016-11-21 17:31 ` [Qemu-devel] [PATCH 3/8] quorum: Implement .bdrv_co_readv/writev Kevin Wolf
@ 2016-11-21 17:31 ` Kevin Wolf
  2016-11-21 19:03   ` Eric Blake
  2016-11-21 17:31 ` [Qemu-devel] [PATCH 5/8] quorum: Inline quorum_aio_cb() Kevin Wolf
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Kevin Wolf @ 2016-11-21 17:31 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, berto, eblake, mreitz, qemu-devel

Instead of calling quorum_aio_finalize() deeply nested in what used
to be an AIO callback, do it in the same functions that allocated the
AIOCB.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
---
 block/quorum.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index 6a7bd91..e044010 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -144,9 +144,8 @@ static bool quorum_vote(QuorumAIOCB *acb);
 
 static void quorum_aio_finalize(QuorumAIOCB *acb)
 {
-    acb->has_completed = true;
     g_free(acb->qcrs);
-    qemu_coroutine_enter_if_inactive(acb->co);
+    g_free(acb);
 }
 
 static bool quorum_sha256_compare(QuorumVoteValue *a, QuorumVoteValue *b)
@@ -238,7 +237,8 @@ static void quorum_rewrite_aio_cb(void *opaque, int ret)
         return;
     }
 
-    quorum_aio_finalize(acb);
+    acb->has_completed = true;
+    qemu_coroutine_enter_if_inactive(acb->co);
 }
 
 static int read_fifo_child(QuorumAIOCB *acb);
@@ -284,7 +284,7 @@ static int quorum_fifo_aio_cb(void *opaque, int ret)
     acb->vote_ret = ret;
 
     /* FIXME: rewrite failed children if acb->children_read > 1? */
-    quorum_aio_finalize(acb);
+
     return ret;
 }
 
@@ -322,7 +322,8 @@ static void quorum_aio_cb(void *opaque, int ret)
 
     /* if no rewrite is done the code will finish right away */
     if (!rewrite) {
-        quorum_aio_finalize(acb);
+        acb->has_completed = true;
+        qemu_coroutine_enter_if_inactive(acb->co);
     }
 }
 
@@ -715,7 +716,8 @@ static int quorum_co_readv(BlockDriverState *bs,
     } else {
         ret = read_fifo_child(acb);
     }
-    g_free(acb);
+    quorum_aio_finalize(acb);
+
     return ret;
 }
 
@@ -757,6 +759,7 @@ static int quorum_co_writev(BlockDriverState *bs,
     }
 
     ret = acb->vote_ret;
+    quorum_aio_finalize(acb);
 
     return ret;
 }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH 5/8] quorum: Inline quorum_aio_cb()
  2016-11-21 17:31 [Qemu-devel] [PATCH 0/8] quorum: Implement .bdrv_co_preadv/pwritev() Kevin Wolf
                   ` (3 preceding siblings ...)
  2016-11-21 17:31 ` [Qemu-devel] [PATCH 4/8] quorum: Do cleanup in caller coroutine Kevin Wolf
@ 2016-11-21 17:31 ` Kevin Wolf
  2016-11-21 19:21   ` Eric Blake
  2016-11-22  7:43   ` Alberto Garcia
  2016-11-21 17:31 ` [Qemu-devel] [PATCH 6/8] quorum: Avoid bdrv_aio_writev() for rewrites Kevin Wolf
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 23+ messages in thread
From: Kevin Wolf @ 2016-11-21 17:31 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, berto, eblake, mreitz, qemu-devel

This is a conversion to a more natural coroutine style, and it improves
the readability of the driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/quorum.c | 118 ++++++++++++++++++++++++++++-----------------------------
 1 file changed, 57 insertions(+), 61 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index e044010..8513414 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -130,7 +130,6 @@ struct QuorumAIOCB {
     QuorumVotes votes;
 
     bool is_read;
-    bool has_completed;
     int vote_ret;
     int children_read;          /* how many children have been read from */
 };
@@ -178,7 +177,6 @@ static QuorumAIOCB *quorum_aio_get(BlockDriverState *bs,
     acb->rewrite_count = 0;
     acb->votes.compare = quorum_sha256_compare;
     QLIST_INIT(&acb->votes.vote_list);
-    acb->has_completed = false;
     acb->is_read = false;
     acb->vote_ret = 0;
 
@@ -231,13 +229,6 @@ static void quorum_rewrite_aio_cb(void *opaque, int ret)
 
     /* one less rewrite to do */
     acb->rewrite_count--;
-
-    /* wait until all rewrite callbacks have completed */
-    if (acb->rewrite_count) {
-        return;
-    }
-
-    acb->has_completed = true;
     qemu_coroutine_enter_if_inactive(acb->co);
 }
 
@@ -288,45 +279,6 @@ static int quorum_fifo_aio_cb(void *opaque, int ret)
     return ret;
 }
 
-static void quorum_aio_cb(void *opaque, int ret)
-{
-    QuorumChildRequest *sacb = opaque;
-    QuorumAIOCB *acb = sacb->parent;
-    BDRVQuorumState *s = acb->bs->opaque;
-    bool rewrite = false;
-    int i;
-
-    sacb->ret = ret;
-    if (ret == 0) {
-        acb->success_count++;
-    } else {
-        quorum_report_bad_acb(sacb, ret);
-    }
-    acb->count++;
-    assert(acb->count <= s->num_children);
-    assert(acb->success_count <= s->num_children);
-    if (acb->count < s->num_children) {
-        return;
-    }
-
-    /* Do the vote on read */
-    if (acb->is_read) {
-        rewrite = quorum_vote(acb);
-        for (i = 0; i < s->num_children; i++) {
-            qemu_vfree(acb->qcrs[i].buf);
-            qemu_iovec_destroy(&acb->qcrs[i].qiov);
-        }
-    } else {
-        quorum_has_too_much_io_failed(acb);
-    }
-
-    /* if no rewrite is done the code will finish right away */
-    if (!rewrite) {
-        acb->has_completed = true;
-        qemu_coroutine_enter_if_inactive(acb->co);
-    }
-}
-
 static void quorum_report_bad_versions(BDRVQuorumState *s,
                                        QuorumAIOCB *acb,
                                        QuorumVoteValue *value)
@@ -645,18 +597,34 @@ static void read_quorum_children_entry(void *opaque)
     QuorumAIOCB *acb = co->acb;
     BDRVQuorumState *s = acb->bs->opaque;
     int i = co->idx;
-    int ret;
+    QuorumChildRequest *sacb = &acb->qcrs[i];
+
+    sacb->bs = s->children[i]->bs;
+    sacb->ret = bdrv_co_preadv(s->children[i],
+                               acb->sector_num * BDRV_SECTOR_SIZE,
+                               acb->nb_sectors * BDRV_SECTOR_SIZE,
+                               &acb->qcrs[i].qiov, 0);
+
+    if (sacb->ret == 0) {
+        acb->success_count++;
+    } else {
+        quorum_report_bad_acb(sacb, sacb->ret);
+    }
 
-    acb->qcrs[i].bs = s->children[i]->bs;
-    ret = bdrv_co_preadv(s->children[i], acb->sector_num * BDRV_SECTOR_SIZE,
-                         acb->nb_sectors * BDRV_SECTOR_SIZE,
-                         &acb->qcrs[i].qiov, 0);
-    quorum_aio_cb(&acb->qcrs[i], ret);
+    acb->count++;
+    assert(acb->count <= s->num_children);
+    assert(acb->success_count <= s->num_children);
+
+    /* Wake up the caller after the last read */
+    if (acb->count == s->num_children) {
+        qemu_coroutine_enter_if_inactive(acb->co);
+    }
 }
 
 static int read_quorum_children(QuorumAIOCB *acb)
 {
     BDRVQuorumState *s = acb->bs->opaque;
+    bool rewrite = false;
     int i, ret;
 
     acb->children_read = s->num_children;
@@ -677,7 +645,19 @@ static int read_quorum_children(QuorumAIOCB *acb)
         qemu_coroutine_enter(co);
     }
 
-    if (!acb->has_completed) {
+    while (acb->count < s->num_children) {
+        qemu_coroutine_yield();
+    }
+
+    /* Do the vote on read */
+    rewrite = quorum_vote(acb);
+    for (i = 0; i < s->num_children; i++) {
+        qemu_vfree(acb->qcrs[i].buf);
+        qemu_iovec_destroy(&acb->qcrs[i].qiov);
+    }
+
+    assert(rewrite == !!acb->rewrite_count);
+    while (acb->rewrite_count) {
         qemu_coroutine_yield();
     }
 
@@ -727,12 +707,26 @@ static void write_quorum_entry(void *opaque)
     QuorumAIOCB *acb = co->acb;
     BDRVQuorumState *s = acb->bs->opaque;
     int i = co->idx;
-    int ret;
+    QuorumChildRequest *sacb = &acb->qcrs[i];
+
+    sacb->bs = s->children[i]->bs;
+    sacb->ret = bdrv_co_pwritev(s->children[i],
+                                acb->sector_num * BDRV_SECTOR_SIZE,
+                                acb->nb_sectors * BDRV_SECTOR_SIZE,
+                                acb->qiov, 0);
+    if (sacb->ret == 0) {
+        acb->success_count++;
+    } else {
+        quorum_report_bad_acb(sacb, sacb->ret);
+    }
+    acb->count++;
+    assert(acb->count <= s->num_children);
+    assert(acb->success_count <= s->num_children);
 
-    acb->qcrs[i].bs = s->children[i]->bs;
-    ret = bdrv_co_pwritev(s->children[i], acb->sector_num * BDRV_SECTOR_SIZE,
-                          acb->nb_sectors * BDRV_SECTOR_SIZE, acb->qiov, 0);
-    quorum_aio_cb(&acb->qcrs[i], ret);
+    /* Wake up the caller after the last write */
+    if (acb->count == s->num_children) {
+        qemu_coroutine_enter_if_inactive(acb->co);
+    }
 }
 
 static int quorum_co_writev(BlockDriverState *bs,
@@ -754,10 +748,12 @@ static int quorum_co_writev(BlockDriverState *bs,
         qemu_coroutine_enter(co);
     }
 
-    if (!acb->has_completed) {
+    while (acb->count < s->num_children) {
         qemu_coroutine_yield();
     }
 
+    quorum_has_too_much_io_failed(acb);
+
     ret = acb->vote_ret;
     quorum_aio_finalize(acb);
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH 6/8] quorum: Avoid bdrv_aio_writev() for rewrites
  2016-11-21 17:31 [Qemu-devel] [PATCH 0/8] quorum: Implement .bdrv_co_preadv/pwritev() Kevin Wolf
                   ` (4 preceding siblings ...)
  2016-11-21 17:31 ` [Qemu-devel] [PATCH 5/8] quorum: Inline quorum_aio_cb() Kevin Wolf
@ 2016-11-21 17:31 ` Kevin Wolf
  2016-11-21 19:52   ` Eric Blake
  2016-11-22  7:45   ` Alberto Garcia
  2016-11-21 17:31 ` [Qemu-devel] [PATCH 7/8] quorum: Implement .bdrv_co_preadv/pwritev() Kevin Wolf
  2016-11-21 17:31 ` [Qemu-devel] [PATCH 8/8] quorum: Inline quorum_fifo_aio_cb() Kevin Wolf
  7 siblings, 2 replies; 23+ messages in thread
From: Kevin Wolf @ 2016-11-21 17:31 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, berto, eblake, mreitz, qemu-devel

Replacing it with bdrv_co_pwritev() prepares us for byte granularity
requests and gets rid of the last bdrv_aio_*() user in quorum.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/quorum.c | 46 +++++++++++++++++++++++++++++++---------------
 1 file changed, 31 insertions(+), 15 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index 8513414..8e674a8 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -223,15 +223,6 @@ static bool quorum_has_too_much_io_failed(QuorumAIOCB *acb)
     return false;
 }
 
-static void quorum_rewrite_aio_cb(void *opaque, int ret)
-{
-    QuorumAIOCB *acb = opaque;
-
-    /* one less rewrite to do */
-    acb->rewrite_count--;
-    qemu_coroutine_enter_if_inactive(acb->co);
-}
-
 static int read_fifo_child(QuorumAIOCB *acb);
 
 static void quorum_copy_qiov(QEMUIOVector *dest, QEMUIOVector *source)
@@ -298,7 +289,27 @@ static void quorum_report_bad_versions(BDRVQuorumState *s,
     }
 }
 
-static bool quorum_rewrite_bad_versions(BDRVQuorumState *s, QuorumAIOCB *acb,
+static void quorum_rewrite_entry(void *opaque)
+{
+    QuorumCo *co = opaque;
+    QuorumAIOCB *acb = co->acb;
+    BDRVQuorumState *s = acb->bs->opaque;
+
+    /* Ignore any errors, it's just a correction attempt for already
+     * corrupted data. */
+    bdrv_co_pwritev(s->children[co->idx],
+                    acb->sector_num * BDRV_SECTOR_SIZE,
+                    acb->nb_sectors * BDRV_SECTOR_SIZE,
+                    acb->qiov, 0);
+
+    /* Wake up the caller after the last rewrite */
+    acb->rewrite_count--;
+    if (!acb->rewrite_count) {
+        qemu_coroutine_enter_if_inactive(acb->co);
+    }
+}
+
+static bool quorum_rewrite_bad_versions(QuorumAIOCB *acb,
                                         QuorumVoteValue *value)
 {
     QuorumVoteVersion *version;
@@ -317,7 +328,7 @@ static bool quorum_rewrite_bad_versions(BDRVQuorumState *s, QuorumAIOCB *acb,
         }
     }
 
-    /* quorum_rewrite_aio_cb will count down this to zero */
+    /* quorum_rewrite_entry will count down this to zero */
     acb->rewrite_count = count;
 
     /* now fire the correcting rewrites */
@@ -326,9 +337,14 @@ static bool quorum_rewrite_bad_versions(BDRVQuorumState *s, QuorumAIOCB *acb,
             continue;
         }
         QLIST_FOREACH(item, &version->items, next) {
-            bdrv_aio_writev(s->children[item->index], acb->sector_num,
-                            acb->qiov, acb->nb_sectors, quorum_rewrite_aio_cb,
-                            acb);
+            Coroutine *co;
+            QuorumCo data = {
+                .acb = acb,
+                .idx = item->index,
+            };
+
+            co = qemu_coroutine_create(quorum_rewrite_entry, &data);
+            qemu_coroutine_enter(co);
         }
     }
 
@@ -582,7 +598,7 @@ static bool quorum_vote(QuorumAIOCB *acb)
 
     /* corruption correction is enabled */
     if (s->rewrite_corrupted) {
-        rewrite = quorum_rewrite_bad_versions(s, acb, &winner->value);
+        rewrite = quorum_rewrite_bad_versions(acb, &winner->value);
     }
 
 free_exit:
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH 7/8] quorum: Implement .bdrv_co_preadv/pwritev()
  2016-11-21 17:31 [Qemu-devel] [PATCH 0/8] quorum: Implement .bdrv_co_preadv/pwritev() Kevin Wolf
                   ` (5 preceding siblings ...)
  2016-11-21 17:31 ` [Qemu-devel] [PATCH 6/8] quorum: Avoid bdrv_aio_writev() for rewrites Kevin Wolf
@ 2016-11-21 17:31 ` Kevin Wolf
  2016-11-21 20:04   ` Eric Blake
  2016-11-21 17:31 ` [Qemu-devel] [PATCH 8/8] quorum: Inline quorum_fifo_aio_cb() Kevin Wolf
  7 siblings, 1 reply; 23+ messages in thread
From: Kevin Wolf @ 2016-11-21 17:31 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, berto, eblake, mreitz, qemu-devel

This enables byte granularity requests on quorum nodes.

Note that the QMP events emitted by the driver are an external API that
we were careless enough to define as sector-based. The offset and length
of requests reported in events are therefore rounded to sector boundaries.
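
For example, with 512-byte sectors, a 100-byte request at offset 1000
covers the byte range [1000, 1100) and is reported by quorum_report_bad()
below as start_sector = 1000 / 512 = 1 and
nb_sectors = DIV_ROUND_UP(1100, 512) - 1 = 2.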

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/quorum.c | 79 ++++++++++++++++++++++++++--------------------------------
 1 file changed, 36 insertions(+), 43 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index 8e674a8..ae0fe05 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -114,8 +114,8 @@ struct QuorumAIOCB {
     Coroutine *co;
 
     /* Request metadata */
-    uint64_t sector_num;
-    int nb_sectors;
+    uint64_t offset;
+    uint64_t bytes;
 
     QEMUIOVector *qiov;         /* calling IOV */
 
@@ -159,8 +159,8 @@ static bool quorum_64bits_compare(QuorumVoteValue *a, QuorumVoteValue *b)
 
 static QuorumAIOCB *quorum_aio_get(BlockDriverState *bs,
                                    QEMUIOVector *qiov,
-                                   uint64_t sector_num,
-                                   int nb_sectors)
+                                   uint64_t offset,
+                                   uint64_t bytes)
 {
     BDRVQuorumState *s = bs->opaque;
     QuorumAIOCB *acb = g_new(QuorumAIOCB, 1);
@@ -168,8 +168,8 @@ static QuorumAIOCB *quorum_aio_get(BlockDriverState *bs,
 
     acb->co = qemu_coroutine_self();
     acb->bs = bs;
-    acb->sector_num = sector_num;
-    acb->nb_sectors = nb_sectors;
+    acb->offset = offset;
+    acb->bytes = bytes;
     acb->qiov = qiov;
     acb->qcrs = g_new0(QuorumChildRequest, s->num_children);
     acb->count = 0;
@@ -189,23 +189,28 @@ static QuorumAIOCB *quorum_aio_get(BlockDriverState *bs,
     return acb;
 }
 
-static void quorum_report_bad(QuorumOpType type, uint64_t sector_num,
-                              int nb_sectors, char *node_name, int ret)
+static void quorum_report_bad(QuorumOpType type, uint64_t offset,
+                              uint64_t bytes, char *node_name, int ret)
 {
     const char *msg = NULL;
+    int64_t start_sector = offset / BDRV_SECTOR_SIZE;
+    int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
+
     if (ret < 0) {
         msg = strerror(-ret);
     }
 
-    qapi_event_send_quorum_report_bad(type, !!msg, msg, node_name,
-                                      sector_num, nb_sectors, &error_abort);
+    qapi_event_send_quorum_report_bad(type, !!msg, msg, node_name, start_sector,
+                                      end_sector - start_sector, &error_abort);
 }
 
 static void quorum_report_failure(QuorumAIOCB *acb)
 {
     const char *reference = bdrv_get_device_or_node_name(acb->bs);
-    qapi_event_send_quorum_failure(reference, acb->sector_num,
-                                   acb->nb_sectors, &error_abort);
+    qapi_event_send_quorum_failure(reference,
+                                   acb->offset / BDRV_SECTOR_SIZE,
+                                   acb->bytes / BDRV_SECTOR_SIZE,
+                                   &error_abort);
 }
 
 static int quorum_vote_error(QuorumAIOCB *acb);
@@ -242,8 +247,7 @@ static void quorum_report_bad_acb(QuorumChildRequest *sacb, int ret)
 {
     QuorumAIOCB *acb = sacb->parent;
     QuorumOpType type = acb->is_read ? QUORUM_OP_TYPE_READ : QUORUM_OP_TYPE_WRITE;
-    quorum_report_bad(type, acb->sector_num, acb->nb_sectors,
-                      sacb->bs->node_name, ret);
+    quorum_report_bad(type, acb->offset, acb->bytes, sacb->bs->node_name, ret);
 }
 
 static int quorum_fifo_aio_cb(void *opaque, int ret)
@@ -282,8 +286,7 @@ static void quorum_report_bad_versions(BDRVQuorumState *s,
             continue;
         }
         QLIST_FOREACH(item, &version->items, next) {
-            quorum_report_bad(QUORUM_OP_TYPE_READ, acb->sector_num,
-                              acb->nb_sectors,
+            quorum_report_bad(QUORUM_OP_TYPE_READ, acb->offset, acb->bytes,
                               s->children[item->index]->bs->node_name, 0);
         }
     }
@@ -297,9 +300,7 @@ static void quorum_rewrite_entry(void *opaque)
 
     /* Ignore any errors, it's just a correction attempt for already
      * corrupted data. */
-    bdrv_co_pwritev(s->children[co->idx],
-                    acb->sector_num * BDRV_SECTOR_SIZE,
-                    acb->nb_sectors * BDRV_SECTOR_SIZE,
+    bdrv_co_pwritev(s->children[co->idx], acb->offset, acb->bytes,
                     acb->qiov, 0);
 
     /* Wake up the caller after the last rewrite */
@@ -464,8 +465,8 @@ static void GCC_FMT_ATTR(2, 3) quorum_err(QuorumAIOCB *acb,
     va_list ap;
 
     va_start(ap, fmt);
-    fprintf(stderr, "quorum: sector_num=%" PRId64 " nb_sectors=%d ",
-            acb->sector_num, acb->nb_sectors);
+    fprintf(stderr, "quorum: offset=%" PRIu64 " bytes=%" PRIu64 " ",
+            acb->offset, acb->bytes);
     vfprintf(stderr, fmt, ap);
     fprintf(stderr, "\n");
     va_end(ap);
@@ -483,9 +484,8 @@ static bool quorum_compare(QuorumAIOCB *acb,
     if (s->is_blkverify) {
         offset = qemu_iovec_compare(a, b);
         if (offset != -1) {
-            quorum_err(acb, "contents mismatch in sector %" PRId64,
-                       acb->sector_num +
-                       (uint64_t)(offset / BDRV_SECTOR_SIZE));
+            quorum_err(acb, "contents mismatch at offset %" PRIu64,
+                       acb->offset + offset);
         }
         return true;
     }
@@ -616,9 +616,7 @@ static void read_quorum_children_entry(void *opaque)
     QuorumChildRequest *sacb = &acb->qcrs[i];
 
     sacb->bs = s->children[i]->bs;
-    sacb->ret = bdrv_co_preadv(s->children[i],
-                               acb->sector_num * BDRV_SECTOR_SIZE,
-                               acb->nb_sectors * BDRV_SECTOR_SIZE,
+    sacb->ret = bdrv_co_preadv(s->children[i], acb->offset, acb->bytes,
                                &acb->qcrs[i].qiov, 0);
 
     if (sacb->ret == 0) {
@@ -689,19 +687,17 @@ static int read_fifo_child(QuorumAIOCB *acb)
     int ret;
 
     acb->qcrs[n].bs = s->children[n]->bs;
-    ret = bdrv_co_preadv(s->children[n], acb->sector_num * BDRV_SECTOR_SIZE,
-                         acb->nb_sectors * BDRV_SECTOR_SIZE, acb->qiov, 0);
+    ret = bdrv_co_preadv(s->children[n], acb->offset, acb->bytes, acb->qiov, 0);
     ret = quorum_fifo_aio_cb(&acb->qcrs[n], ret);
 
     return ret;
 }
 
-static int quorum_co_readv(BlockDriverState *bs,
-                           int64_t sector_num, int nb_sectors,
-                           QEMUIOVector *qiov)
+static int quorum_co_preadv(BlockDriverState *bs, uint64_t offset,
+                            uint64_t bytes, QEMUIOVector *qiov, int flags)
 {
     BDRVQuorumState *s = bs->opaque;
-    QuorumAIOCB *acb = quorum_aio_get(bs, qiov, sector_num, nb_sectors);
+    QuorumAIOCB *acb = quorum_aio_get(bs, qiov, offset, bytes);
     int ret;
 
     acb->is_read = true;
@@ -726,9 +722,7 @@ static void write_quorum_entry(void *opaque)
     QuorumChildRequest *sacb = &acb->qcrs[i];
 
     sacb->bs = s->children[i]->bs;
-    sacb->ret = bdrv_co_pwritev(s->children[i],
-                                acb->sector_num * BDRV_SECTOR_SIZE,
-                                acb->nb_sectors * BDRV_SECTOR_SIZE,
+    sacb->ret = bdrv_co_pwritev(s->children[i], acb->offset, acb->bytes,
                                 acb->qiov, 0);
     if (sacb->ret == 0) {
         acb->success_count++;
@@ -745,12 +739,11 @@ static void write_quorum_entry(void *opaque)
     }
 }
 
-static int quorum_co_writev(BlockDriverState *bs,
-                            int64_t sector_num, int nb_sectors,
-                            QEMUIOVector *qiov)
+static int quorum_co_pwritev(BlockDriverState *bs, uint64_t offset,
+                             uint64_t bytes, QEMUIOVector *qiov, int flags)
 {
     BDRVQuorumState *s = bs->opaque;
-    QuorumAIOCB *acb = quorum_aio_get(bs, qiov, sector_num, nb_sectors);
+    QuorumAIOCB *acb = quorum_aio_get(bs, qiov, offset, bytes);
     int i, ret;
 
     for (i = 0; i < s->num_children; i++) {
@@ -817,7 +810,7 @@ static coroutine_fn int quorum_co_flush(BlockDriverState *bs)
         result = bdrv_co_flush(s->children[i]->bs);
         if (result) {
             quorum_report_bad(QUORUM_OP_TYPE_FLUSH, 0,
-                              bdrv_nb_sectors(s->children[i]->bs),
+                              bdrv_getlength(s->children[i]->bs),
                               s->children[i]->bs->node_name, result);
             result_value.l = result;
             quorum_count_vote(&error_votes, &result_value, i);
@@ -1150,8 +1143,8 @@ static BlockDriver bdrv_quorum = {
 
     .bdrv_getlength                     = quorum_getlength,
 
-    .bdrv_co_readv                      = quorum_co_readv,
-    .bdrv_co_writev                     = quorum_co_writev,
+    .bdrv_co_preadv                     = quorum_co_preadv,
+    .bdrv_co_pwritev                    = quorum_co_pwritev,
 
     .bdrv_add_child                     = quorum_add_child,
     .bdrv_del_child                     = quorum_del_child,
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH 8/8] quorum: Inline quorum_fifo_aio_cb()
  2016-11-21 17:31 [Qemu-devel] [PATCH 0/8] quorum: Implement .bdrv_co_preadv/pwritev() Kevin Wolf
                   ` (6 preceding siblings ...)
  2016-11-21 17:31 ` [Qemu-devel] [PATCH 7/8] quorum: Implement .bdrv_co_preadv/pwritev() Kevin Wolf
@ 2016-11-21 17:31 ` Kevin Wolf
  2016-11-21 20:08   ` Eric Blake
  7 siblings, 1 reply; 23+ messages in thread
From: Kevin Wolf @ 2016-11-21 17:31 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, berto, eblake, mreitz, qemu-devel

Inlining the function removes some boilerplate code and replaces
recursion with a simple loop, so the code becomes somewhat easier to
understand.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
---
 block/quorum.c | 42 +++++++++++++-----------------------------
 1 file changed, 13 insertions(+), 29 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index ae0fe05..ff287bf 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -250,30 +250,6 @@ static void quorum_report_bad_acb(QuorumChildRequest *sacb, int ret)
     quorum_report_bad(type, acb->offset, acb->bytes, sacb->bs->node_name, ret);
 }
 
-static int quorum_fifo_aio_cb(void *opaque, int ret)
-{
-    QuorumChildRequest *sacb = opaque;
-    QuorumAIOCB *acb = sacb->parent;
-    BDRVQuorumState *s = acb->bs->opaque;
-
-    assert(acb->is_read && s->read_pattern == QUORUM_READ_PATTERN_FIFO);
-
-    if (ret < 0) {
-        quorum_report_bad_acb(sacb, ret);
-
-        /* We try to read next child in FIFO order if we fail to read */
-        if (acb->children_read < s->num_children) {
-            return read_fifo_child(acb);
-        }
-    }
-
-    acb->vote_ret = ret;
-
-    /* FIXME: rewrite failed children if acb->children_read > 1? */
-
-    return ret;
-}
-
 static void quorum_report_bad_versions(BDRVQuorumState *s,
                                        QuorumAIOCB *acb,
                                        QuorumVoteValue *value)
@@ -683,12 +659,20 @@ static int read_quorum_children(QuorumAIOCB *acb)
 static int read_fifo_child(QuorumAIOCB *acb)
 {
     BDRVQuorumState *s = acb->bs->opaque;
-    int n = acb->children_read++;
-    int ret;
+    int n, ret;
+
+    /* We try to read the next child in FIFO order if we failed to read */
+    do {
+        n = acb->children_read++;
+        acb->qcrs[n].bs = s->children[n]->bs;
+        ret = bdrv_co_preadv(s->children[n], acb->offset, acb->bytes,
+                             acb->qiov, 0);
+        if (ret < 0) {
+            quorum_report_bad_acb(&acb->qcrs[n], ret);
+        }
+    } while (ret < 0 && acb->children_read < s->num_children);
 
-    acb->qcrs[n].bs = s->children[n]->bs;
-    ret = bdrv_co_preadv(s->children[n], acb->offset, acb->bytes, acb->qiov, 0);
-    ret = quorum_fifo_aio_cb(&acb->qcrs[n], ret);
+    /* FIXME: rewrite failed children if acb->children_read > 1? */
 
     return ret;
 }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 3/8] quorum: Implement .bdrv_co_readv/writev
  2016-11-21 17:31 ` [Qemu-devel] [PATCH 3/8] quorum: Implement .bdrv_co_readv/writev Kevin Wolf
@ 2016-11-21 17:58   ` Eric Blake
  2016-11-22 11:32     ` Kevin Wolf
  2016-11-22  7:39   ` Alberto Garcia
  1 sibling, 1 reply; 23+ messages in thread
From: Eric Blake @ 2016-11-21 17:58 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: berto, mreitz, qemu-devel


On 11/21/2016 11:31 AM, Kevin Wolf wrote:
> This converts the quorum block driver from implementing callback-based
> interfaces for read/write to coroutine-based ones. This is the first
> step that will allow us further simplification of the code.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/quorum.c | 192 ++++++++++++++++++++++++++++++++++-----------------------
>  1 file changed, 115 insertions(+), 77 deletions(-)
> 

> @@ -174,14 +162,14 @@ static bool quorum_64bits_compare(QuorumVoteValue *a, QuorumVoteValue *b)
>  static QuorumAIOCB *quorum_aio_get(BlockDriverState *bs,
>                                     QEMUIOVector *qiov,
>                                     uint64_t sector_num,
> -                                   int nb_sectors,
> -                                   BlockCompletionFunc *cb,
> -                                   void *opaque)
> +                                   int nb_sectors)
>  {
>      BDRVQuorumState *s = bs->opaque;
> -    QuorumAIOCB *acb = qemu_aio_get(&quorum_aiocb_info, bs, cb, opaque);
> +    QuorumAIOCB *acb = g_new(QuorumAIOCB, 1);

Worth using g_new0() here...

>      int i;
>  
> +    acb->co = qemu_coroutine_self();
> +    acb->bs = bs;
>      acb->sector_num = sector_num;
>      acb->nb_sectors = nb_sectors;
>      acb->qiov = qiov;
> @@ -191,6 +179,7 @@ static QuorumAIOCB *quorum_aio_get(BlockDriverState *bs,
>      acb->rewrite_count = 0;
>      acb->votes.compare = quorum_sha256_compare;
>      QLIST_INIT(&acb->votes.vote_list);
> +    acb->has_completed = false;
>      acb->is_read = false;
>      acb->vote_ret = 0;

...to eliminate 0-assignments here? Not a show-stopper to leave it
as-is, though.


> -static BlockAIOCB *read_fifo_child(QuorumAIOCB *acb);
> +static int read_fifo_child(QuorumAIOCB *acb);
>  
>  static void quorum_copy_qiov(QEMUIOVector *dest, QEMUIOVector *source)
>  {
> @@ -272,14 +261,14 @@ static void quorum_report_bad_acb(QuorumChildRequest *sacb, int ret)
>      QuorumAIOCB *acb = sacb->parent;
>      QuorumOpType type = acb->is_read ? QUORUM_OP_TYPE_READ : QUORUM_OP_TYPE_WRITE;
>      quorum_report_bad(type, acb->sector_num, acb->nb_sectors,
> -                      sacb->aiocb->bs->node_name, ret);
> +                      sacb->bs->node_name, ret);
>  }
>  
> -static void quorum_fifo_aio_cb(void *opaque, int ret)
> +static int quorum_fifo_aio_cb(void *opaque, int ret)
>  {
>      QuorumChildRequest *sacb = opaque;
>      QuorumAIOCB *acb = sacb->parent;
> -    BDRVQuorumState *s = acb->common.bs->opaque;
> +    BDRVQuorumState *s = acb->bs->opaque;
>  
>      assert(acb->is_read && s->read_pattern == QUORUM_READ_PATTERN_FIFO);
>  
> @@ -288,8 +277,7 @@ static void quorum_fifo_aio_cb(void *opaque, int ret)
>  
>          /* We try to read next child in FIFO order if we fail to read */
>          if (acb->children_read < s->num_children) {
> -            read_fifo_child(acb);
> -            return;
> +            return read_fifo_child(acb);
>          }

Question unrelated to this patch: in FIFO mode, are we doing work
sequentially or in parallel?  That is, does the quorum code kick off all
children simultaneously, then wait until the first child answers with
success (and abort all remaining children) or failure (at which point
moving to the second child may already have an answer)?  Or does it only
kick off the first child, wait for a response, and not start the second
child until after the first child fails?  I guess one way has more
potentially wasted work (and a stress test of our ability to cancel work
on secondary children), while the other has higher latencies, so maybe
it is something that a future quorum patch may want to make configurable?

>  
> -static BlockAIOCB *read_fifo_child(QuorumAIOCB *acb)
> +static int read_fifo_child(QuorumAIOCB *acb)
>  {
> -    BDRVQuorumState *s = acb->common.bs->opaque;
> +    BDRVQuorumState *s = acb->bs->opaque;
>      int n = acb->children_read++;
> +    int ret;
>  
> -    acb->qcrs[n].aiocb = bdrv_aio_readv(s->children[n], acb->sector_num,
> -                                        acb->qiov, acb->nb_sectors,
> -                                        quorum_fifo_aio_cb, &acb->qcrs[n]);
> +    acb->qcrs[n].bs = s->children[n]->bs;
> +    ret = bdrv_co_preadv(s->children[n], acb->sector_num * BDRV_SECTOR_SIZE,
> +                         acb->nb_sectors * BDRV_SECTOR_SIZE, acb->qiov, 0);
> +    ret = quorum_fifo_aio_cb(&acb->qcrs[n], ret);

somewhat answering myself - it looks like the current fifo approach is
high-latency rather than parallel, in that at most one child is being
run at a time.

The conversion itself looks sane;
Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 4/8] quorum: Do cleanup in caller coroutine
  2016-11-21 17:31 ` [Qemu-devel] [PATCH 4/8] quorum: Do cleanup in caller coroutine Kevin Wolf
@ 2016-11-21 19:03   ` Eric Blake
  0 siblings, 0 replies; 23+ messages in thread
From: Eric Blake @ 2016-11-21 19:03 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: berto, mreitz, qemu-devel


On 11/21/2016 11:31 AM, Kevin Wolf wrote:
> Instead of calling quorum_aio_finalize() deeply nested in what used
> to be an AIO callback, do it in the same functions that allocated the
> AIOCB.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> Reviewed-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/quorum.c | 15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 5/8] quorum: Inline quorum_aio_cb()
  2016-11-21 17:31 ` [Qemu-devel] [PATCH 5/8] quorum: Inline quorum_aio_cb() Kevin Wolf
@ 2016-11-21 19:21   ` Eric Blake
  2016-11-22  7:43   ` Alberto Garcia
  1 sibling, 0 replies; 23+ messages in thread
From: Eric Blake @ 2016-11-21 19:21 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: berto, mreitz, qemu-devel


On 11/21/2016 11:31 AM, Kevin Wolf wrote:
> This is a conversion to a more natural coroutine style and improves the
> readability of the driver.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/quorum.c | 118 ++++++++++++++++++++++++++++-----------------------------
>  1 file changed, 57 insertions(+), 61 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 6/8] quorum: Avoid bdrv_aio_writev() for rewrites
  2016-11-21 17:31 ` [Qemu-devel] [PATCH 6/8] quorum: Avoid bdrv_aio_writev() for rewrites Kevin Wolf
@ 2016-11-21 19:52   ` Eric Blake
  2016-11-22  7:45   ` Alberto Garcia
  1 sibling, 0 replies; 23+ messages in thread
From: Eric Blake @ 2016-11-21 19:52 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: berto, mreitz, qemu-devel


On 11/21/2016 11:31 AM, Kevin Wolf wrote:
> Replacing it with bdrv_co_pwritev() prepares us for byte granularity
> requests and gets us rid of the last bdrv_aio_*() user in quorum.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/quorum.c | 46 +++++++++++++++++++++++++++++++---------------
>  1 file changed, 31 insertions(+), 15 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 7/8] quorum: Implement .bdrv_co_preadv/pwritev()
  2016-11-21 17:31 ` [Qemu-devel] [PATCH 7/8] quorum: Implement .bdrv_co_preadv/pwritev() Kevin Wolf
@ 2016-11-21 20:04   ` Eric Blake
  2016-11-22 11:45     ` Kevin Wolf
  0 siblings, 1 reply; 23+ messages in thread
From: Eric Blake @ 2016-11-21 20:04 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: berto, mreitz, qemu-devel


On 11/21/2016 11:31 AM, Kevin Wolf wrote:
> This enables byte granularity requests on quorum nodes.
> 
> Note that the QMP events emitted by the driver are an external API that
> we were careless enough to define as sector based. The offset and length
> of requests reported in events are rounded therefore.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/quorum.c | 79 ++++++++++++++++++++++++++--------------------------------
>  1 file changed, 36 insertions(+), 43 deletions(-)
> 

> -static void quorum_report_bad(QuorumOpType type, uint64_t sector_num,
> -                              int nb_sectors, char *node_name, int ret)
> +static void quorum_report_bad(QuorumOpType type, uint64_t offset,
> +                              uint64_t bytes, char *node_name, int ret)
>  {
>      const char *msg = NULL;
> +    int64_t start_sector = offset / BDRV_SECTOR_SIZE;
> +    int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);

This one looks correct,

> +
>      if (ret < 0) {
>          msg = strerror(-ret);
>      }
>  
> -    qapi_event_send_quorum_report_bad(type, !!msg, msg, node_name,
> -                                      sector_num, nb_sectors, &error_abort);
> +    qapi_event_send_quorum_report_bad(type, !!msg, msg, node_name, start_sector,
> +                                      end_sector - start_sector, &error_abort);
>  }
>  
>  static void quorum_report_failure(QuorumAIOCB *acb)
>  {
>      const char *reference = bdrv_get_device_or_node_name(acb->bs);
> -    qapi_event_send_quorum_failure(reference, acb->sector_num,
> -                                   acb->nb_sectors, &error_abort);
> +    qapi_event_send_quorum_failure(reference,
> +                                   acb->offset / BDRV_SECTOR_SIZE,
> +                                   acb->bytes / BDRV_SECTOR_SIZE,

but this one still looks like it could give unexpected results for
acb->bytes < BDRV_SECTOR_SIZE.
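
For illustration, a small standalone program (not QEMU code; BDRV_SECTOR_SIZE
and DIV_ROUND_UP are redefined locally with their usual values/semantics)
showing how plain division hides a sub-sector request, while the rounded form
from the quorum_report_bad() hunk above keeps it visible:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Local stand-ins for the QEMU macros, for this sketch only. */
    #define BDRV_SECTOR_SIZE 512
    #define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

    int main(void)
    {
        uint64_t offset = 1024, bytes = 200;  /* smaller than one sector */

        /* Plain division truncates: 200 bytes are reported as 0 sectors. */
        printf("truncated: %" PRIu64 " sectors\n", bytes / BDRV_SECTOR_SIZE);

        /* Rounding the end offset up reports 1 sector instead of 0. */
        uint64_t start = offset / BDRV_SECTOR_SIZE;
        uint64_t end = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
        printf("rounded:   %" PRIu64 " sectors\n", end - start);
        return 0;
    }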


>  
> -static int quorum_co_readv(BlockDriverState *bs,
> -                           int64_t sector_num, int nb_sectors,
> -                           QEMUIOVector *qiov)
> +static int quorum_co_preadv(BlockDriverState *bs, uint64_t offset,
> +                            uint64_t bytes, QEMUIOVector *qiov, int flags)
>  {

Is it worth adding assert(!flags)?  For now, the block layer doesn't
have any defined flags (and if it did, we'd probably want to add a
.supported_read_flags to parallel the existing .supported_write_flags).

>      BDRVQuorumState *s = bs->opaque;
> -    QuorumAIOCB *acb = quorum_aio_get(bs, qiov, sector_num, nb_sectors);
> +    QuorumAIOCB *acb = quorum_aio_get(bs, qiov, offset, bytes);
>      int ret;
>  
>      acb->is_read = true;
> @@ -726,9 +722,7 @@ static void write_quorum_entry(void *opaque)
>      QuorumChildRequest *sacb = &acb->qcrs[i];
>  
>      sacb->bs = s->children[i]->bs;
> -    sacb->ret = bdrv_co_pwritev(s->children[i],
> -                                acb->sector_num * BDRV_SECTOR_SIZE,
> -                                acb->nb_sectors * BDRV_SECTOR_SIZE,
> +    sacb->ret = bdrv_co_pwritev(s->children[i], acb->offset, acb->bytes,
>                                  acb->qiov, 0);

A follow-up patch that supports non-zero flags such as BDRV_REQUEST_FUA
might be nice (presumably, we would advertise supported_write_flags of
BDRV_REQUEST_FUA only if all children support it; a standalone sketch of
that idea follows further down in this message).  I agree that this patch
is not the place to do it, though, so hard-coding to 0 may be okay, if...

>      if (sacb->ret == 0) {
>          acb->success_count++;
> @@ -745,12 +739,11 @@ static void write_quorum_entry(void *opaque)
>      }
>  }
>  
> -static int quorum_co_writev(BlockDriverState *bs,
> -                            int64_t sector_num, int nb_sectors,
> -                            QEMUIOVector *qiov)
> +static int quorum_co_pwritev(BlockDriverState *bs, uint64_t offset,
> +                             uint64_t bytes, QEMUIOVector *qiov, int flags)
>  {

...you make a decision on whether asserting that flags is zero is best
for now.
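
Here is the standalone sketch mentioned above: a toy model (nothing here is
the real quorum driver; the flag value, struct and function names are
invented for illustration) of advertising a write flag only when every
child supports it:

    #include <stdio.h>

    /* Invented stand-in flag for this sketch; not a QEMU definition. */
    #define REQ_FUA 0x1

    typedef struct Child {
        unsigned supported_write_flags;
    } Child;

    /* Keep only the flags that *every* child can honour. */
    static unsigned quorum_supported_write_flags(const Child *children, int n)
    {
        unsigned flags = REQ_FUA;
        int i;

        for (i = 0; i < n; i++) {
            flags &= children[i].supported_write_flags;
        }
        return flags;
    }

    int main(void)
    {
        Child all_fua[2]     = { { REQ_FUA }, { REQ_FUA } };
        Child one_missing[2] = { { REQ_FUA }, { 0 } };

        printf("all children FUA-capable: 0x%x\n",
               quorum_supported_write_flags(all_fua, 2));
        printf("one child lacks FUA:      0x%x\n",
               quorum_supported_write_flags(one_missing, 2));
        return 0;
    }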

[Huh - side thought: right now, we don't have any defined semantics for
BDRV_REQUEST_FUA on reads (although we modeled it in part after SCSI,
which does have it defined for reads).  But quorum rewrites on read
might be an interesting application of where we can trigger a write
during reads, and where we may want to guarantee FUA semantics on those
writes, thus making a potentially plausible use of the flag on read]

Due to my question about rounding in the event code, I'm omitting R-b
for now.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 8/8] quorum: Inline quorum_fifo_aio_cb()
  2016-11-21 17:31 ` [Qemu-devel] [PATCH 8/8] quorum: Inline quorum_fifo_aio_cb() Kevin Wolf
@ 2016-11-21 20:08   ` Eric Blake
  2016-11-22  9:23     ` Alberto Garcia
  0 siblings, 1 reply; 23+ messages in thread
From: Eric Blake @ 2016-11-21 20:08 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: berto, mreitz, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1872 bytes --]

On 11/21/2016 11:31 AM, Kevin Wolf wrote:
> Inlining the function removes some boilerplate code and replaces
> recursion with a simple loop, so the code becomes somewhat easier to
> understand.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> Reviewed-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/quorum.c | 42 +++++++++++++-----------------------------
>  1 file changed, 13 insertions(+), 29 deletions(-)
> 

>                                         QuorumVoteValue *value)
> @@ -683,12 +659,20 @@ static int read_quorum_children(QuorumAIOCB *acb)
>  static int read_fifo_child(QuorumAIOCB *acb)
>  {
>      BDRVQuorumState *s = acb->bs->opaque;
> -    int n = acb->children_read++;
> -    int ret;
> +    int n, ret;
> +
> +    /* We try to read the next child in FIFO order if we failed to read */
> +    do {
> +        n = acb->children_read++;
> +        acb->qcrs[n].bs = s->children[n]->bs;
> +        ret = bdrv_co_preadv(s->children[n], acb->offset, acb->bytes,
> +                             acb->qiov, 0);
> +        if (ret < 0) {
> +            quorum_report_bad_acb(&acb->qcrs[n], ret);
> +        }
> +    } while (ret < 0 && acb->children_read < s->num_children);

Back to my comments earlier in the series - We may want to think about a
parallel FIFO mode, where we kick off all reads, but then are prepared
to cancel reads on later children once we have a positive answer on an
earlier child.  This rewrite may interfere with such a mode, where we'd
have to go back to the callback approach.  But I'm not convinced we need
the complexity of a parallel FIFO mode, and your rewrite is indeed
simpler to read, so I won't let it hold up my review.

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 3/8] quorum: Implement .bdrv_co_readv/writev
  2016-11-21 17:31 ` [Qemu-devel] [PATCH 3/8] quorum: Implement .bdrv_co_readv/writev Kevin Wolf
  2016-11-21 17:58   ` Eric Blake
@ 2016-11-22  7:39   ` Alberto Garcia
  1 sibling, 0 replies; 23+ messages in thread
From: Alberto Garcia @ 2016-11-22  7:39 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: eblake, mreitz, qemu-devel

On Mon 21 Nov 2016 06:31:23 PM CET, Kevin Wolf <kwolf@redhat.com> wrote:
> This converts the quorum block driver from implementing callback-based
> interfaces for read/write to coroutine-based ones. This is the first
> step that will allow us further simplification of the code.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: Alberto Garcia <berto@igalia.com>

Berto

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 5/8] quorum: Inline quorum_aio_cb()
  2016-11-21 17:31 ` [Qemu-devel] [PATCH 5/8] quorum: Inline quorum_aio_cb() Kevin Wolf
  2016-11-21 19:21   ` Eric Blake
@ 2016-11-22  7:43   ` Alberto Garcia
  1 sibling, 0 replies; 23+ messages in thread
From: Alberto Garcia @ 2016-11-22  7:43 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: eblake, mreitz, qemu-devel

On Mon 21 Nov 2016 06:31:25 PM CET, Kevin Wolf <kwolf@redhat.com> wrote:
> This is a conversion to a more natural coroutine style and improves the
> readability of the driver.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: Alberto Garcia <berto@igalia.com>

Berto

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 6/8] quorum: Avoid bdrv_aio_writev() for rewrites
  2016-11-21 17:31 ` [Qemu-devel] [PATCH 6/8] quorum: Avoid bdrv_aio_writev() for rewrites Kevin Wolf
  2016-11-21 19:52   ` Eric Blake
@ 2016-11-22  7:45   ` Alberto Garcia
  1 sibling, 0 replies; 23+ messages in thread
From: Alberto Garcia @ 2016-11-22  7:45 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: eblake, mreitz, qemu-devel

On Mon 21 Nov 2016 06:31:26 PM CET, Kevin Wolf <kwolf@redhat.com> wrote:
> Replacing it with bdrv_co_pwritev() prepares us for byte granularity
> requests and gets rid of the last bdrv_aio_*() user in quorum.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: Alberto Garcia <berto@igalia.com>

Berto

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 8/8] quorum: Inline quorum_fifo_aio_cb()
  2016-11-21 20:08   ` Eric Blake
@ 2016-11-22  9:23     ` Alberto Garcia
  2016-11-22 12:51       ` Eric Blake
  0 siblings, 1 reply; 23+ messages in thread
From: Alberto Garcia @ 2016-11-22  9:23 UTC (permalink / raw)
  To: Eric Blake, Kevin Wolf, qemu-block; +Cc: mreitz, qemu-devel

On Mon 21 Nov 2016 09:08:06 PM CET, Eric Blake <eblake@redhat.com> wrote:

> Back to my comments earlier in the series - We may want to think about
> a parallel FIFO mode, where we kick off all reads, but then are
> prepared to cancel reads on later children once we have a positive
> answer on an earlier child.

And what happens if the positive answer comes first from a later child?
Does COLO rely on the assumption that the first child is always read
first?

Berto

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 3/8] quorum: Implement .bdrv_co_readv/writev
  2016-11-21 17:58   ` Eric Blake
@ 2016-11-22 11:32     ` Kevin Wolf
  0 siblings, 0 replies; 23+ messages in thread
From: Kevin Wolf @ 2016-11-22 11:32 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-block, berto, mreitz, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 5722 bytes --]

Am 21.11.2016 um 18:58 hat Eric Blake geschrieben:
> On 11/21/2016 11:31 AM, Kevin Wolf wrote:
> > This converts the quorum block driver from implementing callback-based
> > interfaces for read/write to coroutine-based ones. This is the first
> > step that will allow us further simplification of the code.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >  block/quorum.c | 192 ++++++++++++++++++++++++++++++++++-----------------------
> >  1 file changed, 115 insertions(+), 77 deletions(-)
> > 
> 
> > @@ -174,14 +162,14 @@ static bool quorum_64bits_compare(QuorumVoteValue *a, QuorumVoteValue *b)
> >  static QuorumAIOCB *quorum_aio_get(BlockDriverState *bs,
> >                                     QEMUIOVector *qiov,
> >                                     uint64_t sector_num,
> > -                                   int nb_sectors,
> > -                                   BlockCompletionFunc *cb,
> > -                                   void *opaque)
> > +                                   int nb_sectors)
> >  {
> >      BDRVQuorumState *s = bs->opaque;
> > -    QuorumAIOCB *acb = qemu_aio_get(&quorum_aiocb_info, bs, cb, opaque);
> > +    QuorumAIOCB *acb = g_new(QuorumAIOCB, 1);
> 
> Worth using g_new0() here...
> 
> >      int i;
> >  
> > +    acb->co = qemu_coroutine_self();
> > +    acb->bs = bs;
> >      acb->sector_num = sector_num;
> >      acb->nb_sectors = nb_sectors;
> >      acb->qiov = qiov;
> > @@ -191,6 +179,7 @@ static QuorumAIOCB *quorum_aio_get(BlockDriverState *bs,
> >      acb->rewrite_count = 0;
> >      acb->votes.compare = quorum_sha256_compare;
> >      QLIST_INIT(&acb->votes.vote_list);
> > +    acb->has_completed = false;
> >      acb->is_read = false;
> >      acb->vote_ret = 0;
> 
> ...to eliminate 0-assignments here? Not a show-stopper to leave it
> as-is, though.

Not in this patch anyway. I could add a cleanup patch at the end of the
series or as a follow-up, though. As you probably know by now, my style
of writing this in new code would use a compound literal:

    QuorumAIOCB *acb = g_new(QuorumAIOCB, 1);
    *acb = (QuorumAIOCB) {
        ...
    };
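
A small self-contained illustration of that style (the struct and its fields
are made up here, not the real QuorumAIOCB): every member left out of the
compound literal is zero-initialized, which is what makes the explicit
"= 0" assignments unnecessary:

    #include <glib.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Invented example struct, for illustration only. */
    typedef struct Req {
        int idx;
        bool is_read;
        int vote_ret;
    } Req;

    int main(void)
    {
        Req *req = g_new(Req, 1);

        /* vote_ret is not named below, so it becomes 0 automatically. */
        *req = (Req) {
            .idx     = 3,
            .is_read = true,
        };

        printf("idx=%d is_read=%d vote_ret=%d\n",
               req->idx, req->is_read, req->vote_ret);
        g_free(req);
        return 0;
    }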

> > -static BlockAIOCB *read_fifo_child(QuorumAIOCB *acb);
> > +static int read_fifo_child(QuorumAIOCB *acb);
> >  
> >  static void quorum_copy_qiov(QEMUIOVector *dest, QEMUIOVector *source)
> >  {
> > @@ -272,14 +261,14 @@ static void quorum_report_bad_acb(QuorumChildRequest *sacb, int ret)
> >      QuorumAIOCB *acb = sacb->parent;
> >      QuorumOpType type = acb->is_read ? QUORUM_OP_TYPE_READ : QUORUM_OP_TYPE_WRITE;
> >      quorum_report_bad(type, acb->sector_num, acb->nb_sectors,
> > -                      sacb->aiocb->bs->node_name, ret);
> > +                      sacb->bs->node_name, ret);
> >  }
> >  
> > -static void quorum_fifo_aio_cb(void *opaque, int ret)
> > +static int quorum_fifo_aio_cb(void *opaque, int ret)
> >  {
> >      QuorumChildRequest *sacb = opaque;
> >      QuorumAIOCB *acb = sacb->parent;
> > -    BDRVQuorumState *s = acb->common.bs->opaque;
> > +    BDRVQuorumState *s = acb->bs->opaque;
> >  
> >      assert(acb->is_read && s->read_pattern == QUORUM_READ_PATTERN_FIFO);
> >  
> > @@ -288,8 +277,7 @@ static void quorum_fifo_aio_cb(void *opaque, int ret)
> >  
> >          /* We try to read next child in FIFO order if we fail to read */
> >          if (acb->children_read < s->num_children) {
> > -            read_fifo_child(acb);
> > -            return;
> > +            return read_fifo_child(acb);
> >          }
> 
> Question unrelated to this patch: in FIFO mode, are we doing work
> sequentially or in parallel?  That is, does the quorum code kick off all
> children simultaneously, then wait until the first child answers with
> success (and abort all remaining children) or failure (at which point
> moving to the second child may already have an answer)?  Or does it only
> kick off the first child, wait for a response, and not start the second
> child until after the first child fails?

It's the latter. This is quite easy to see in the new model (at the
end of this patch series) because in FIFO mode, reads don't spawn
coroutines, but just have a loop of bdrv_co_preadv() calls.
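
As a toy standalone model of that sequential behaviour (read_child() merely
stands in for bdrv_co_preadv() and the failure pattern is made up), a later
child is only ever tried after the earlier one has returned an error:

    #include <stdio.h>

    /* Pretend the first two children fail and the third succeeds. */
    static int read_child(int idx)
    {
        return idx < 2 ? -5 /* -EIO */ : 0;
    }

    int main(void)
    {
        int num_children = 3;
        int n, ret = -1;

        for (n = 0; n < num_children; n++) {
            ret = read_child(n);          /* strictly one child at a time */
            printf("child %d -> %d\n", n, ret);
            if (ret == 0) {
                break;                    /* first success ends the loop */
            }
        }
        return ret == 0 ? 0 : 1;
    }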

> I guess one way has more
> potentially wasted work (and a stress test of our ability to cancel work
> on secondary children), while the other has higher latencies, so maybe
> it is something that a future quorum patch may want to make configurable?

Our ability to cancel work barely exists, so I'm not too sure whether
the other way would really be worth implementing.

> >  
> > -static BlockAIOCB *read_fifo_child(QuorumAIOCB *acb)
> > +static int read_fifo_child(QuorumAIOCB *acb)
> >  {
> > -    BDRVQuorumState *s = acb->common.bs->opaque;
> > +    BDRVQuorumState *s = acb->bs->opaque;
> >      int n = acb->children_read++;
> > +    int ret;
> >  
> > -    acb->qcrs[n].aiocb = bdrv_aio_readv(s->children[n], acb->sector_num,
> > -                                        acb->qiov, acb->nb_sectors,
> > -                                        quorum_fifo_aio_cb, &acb->qcrs[n]);
> > +    acb->qcrs[n].bs = s->children[n]->bs;
> > +    ret = bdrv_co_preadv(s->children[n], acb->sector_num * BDRV_SECTOR_SIZE,
> > +                         acb->nb_sectors * BDRV_SECTOR_SIZE, acb->qiov, 0);
> > +    ret = quorum_fifo_aio_cb(&acb->qcrs[n], ret);
> 
> somewhat answering myself - it looks like the current fifo approach is
> high-latency rather than parallel, in that at most one child is being
> run at a time.

Yes, you can see it in this patch already, even if it's even clearer at
the end of the series.

Kevin

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 7/8] quorum: Implement .bdrv_co_preadv/pwritev()
  2016-11-21 20:04   ` Eric Blake
@ 2016-11-22 11:45     ` Kevin Wolf
  2016-11-22 12:49       ` Eric Blake
  0 siblings, 1 reply; 23+ messages in thread
From: Kevin Wolf @ 2016-11-22 11:45 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-block, berto, mreitz, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 3637 bytes --]

Am 21.11.2016 um 21:04 hat Eric Blake geschrieben:
> On 11/21/2016 11:31 AM, Kevin Wolf wrote:
> > This enables byte granularity requests on quorum nodes.
> > 
> > Note that the QMP events emitted by the driver are an external API that
> > we were careless enough to define as sector-based. The offset and length
> > of requests reported in events are therefore rounded.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >  block/quorum.c | 79 ++++++++++++++++++++++++++--------------------------------
> >  1 file changed, 36 insertions(+), 43 deletions(-)
> > 
> 
> > -static void quorum_report_bad(QuorumOpType type, uint64_t sector_num,
> > -                              int nb_sectors, char *node_name, int ret)
> > +static void quorum_report_bad(QuorumOpType type, uint64_t offset,
> > +                              uint64_t bytes, char *node_name, int ret)
> >  {
> >      const char *msg = NULL;
> > +    int64_t start_sector = offset / BDRV_SECTOR_SIZE;
> > +    int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
> 
> This one looks correct,
> 
> > +
> >      if (ret < 0) {
> >          msg = strerror(-ret);
> >      }
> >  
> > -    qapi_event_send_quorum_report_bad(type, !!msg, msg, node_name,
> > -                                      sector_num, nb_sectors, &error_abort);
> > +    qapi_event_send_quorum_report_bad(type, !!msg, msg, node_name, start_sector,
> > +                                      end_sector - start_sector, &error_abort);
> >  }
> >  
> >  static void quorum_report_failure(QuorumAIOCB *acb)
> >  {
> >      const char *reference = bdrv_get_device_or_node_name(acb->bs);
> > -    qapi_event_send_quorum_failure(reference, acb->sector_num,
> > -                                   acb->nb_sectors, &error_abort);
> > +    qapi_event_send_quorum_failure(reference,
> > +                                   acb->offset / BDRV_SECTOR_SIZE,
> > +                                   acb->bytes / BDRV_SECTOR_SIZE,
> 
> but this one still looks like it could give unexpected results for
> acb->bytes < BDRV_SECTOR_SIZE.

Thanks, I missed this one. I'll send a v2.

> > -static int quorum_co_readv(BlockDriverState *bs,
> > -                           int64_t sector_num, int nb_sectors,
> > -                           QEMUIOVector *qiov)
> > +static int quorum_co_preadv(BlockDriverState *bs, uint64_t offset,
> > +                            uint64_t bytes, QEMUIOVector *qiov, int flags)
> >  {
> 
> Is it worth adding assert(!flags)?  For now, the block layer doesn't
> have any defined flags (and if it did, we'd probably want to add a
> .supported_read_flags to parallel the existing .supported_write_flags).

I don't think we need to assert this; no other driver does that. We have
.supported_write_flags and I would indeed add .supported_read_flags
if/when we start using flags for read requests, so we can be reasonably
sure that only those flags are set even without asserting it.

> [Huh - side thought: right now, we don't have any defined semantics for
> BDRV_REQUEST_FUA on reads (although we modeled it in part after SCSI,
> which does have it defined for reads).  But quorum rewrites on read
> might be an interesting application of where we can trigger a write
> during reads, and where we may want to guarantee FUA semantics on those
> writes, thus making a potentially plausible use of the flag on read]

Makes sense to me, but that's something for a different series. And
actually, I'm not sure who would even send a read with FUA set today.
Can this even happen yet?

Kevin

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 7/8] quorum: Implement .bdrv_co_preadv/pwritev()
  2016-11-22 11:45     ` Kevin Wolf
@ 2016-11-22 12:49       ` Eric Blake
  0 siblings, 0 replies; 23+ messages in thread
From: Eric Blake @ 2016-11-22 12:49 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, berto, mreitz, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1840 bytes --]

On 11/22/2016 05:45 AM, Kevin Wolf wrote:
>>
>> Is it worth adding assert(!flags)?  For now, the block layer doesn't
>> have any defined flags (and if it did, we'd probably want to add a
>> .supported_read_flags to parallel the existing .supported_write_flags).
> 
> I don't think we need to assert this, no other driver does that. We have
> .supported_write_flags and I would indeed add .supported_read_flags
> if/when we start using flags for read requests, so we can be reasonably
> sure that only those flags are set even without asserting it.

Seems reasonable.

> 
>> [Huh - side thought: right now, we don't have any defined semantics for
>> BDRV_REQUEST_FUA on reads (although we modeled it in part after SCSI,
>> which does have it defined for reads).  But quorum rewrites on read
>> might be an interesting application of where we can trigger a write
>> during reads, and where we may want to guarantee FUA semantics on those
>> writes, thus making a potentially plausible use of the flag on read]
> 
> Makes sense to me, but that's something for a different series. And
> actually, I'm not sure who would even send a read with FUA set today.
> Can this even happen yet?

Our iscsi backend has to emulate guests that send FUA on a SCSI read
command (since the SCSI spec has defined semantics for it); right now,
it does that by ignoring the flag (if I read the code correctly). The
NBD spec also says that clients MAY send FUA on any command including
read, but that the server MAY ignore FUA on all but writes.  Our NBD
client code used to send the FUA bit on flush (but not read), but I
ripped that out in a89ef0c.  So yeah, it's definitely something for a
different series, if ever.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 8/8] quorum: Inline quorum_fifo_aio_cb()
  2016-11-22  9:23     ` Alberto Garcia
@ 2016-11-22 12:51       ` Eric Blake
  0 siblings, 0 replies; 23+ messages in thread
From: Eric Blake @ 2016-11-22 12:51 UTC (permalink / raw)
  To: Alberto Garcia, Kevin Wolf, qemu-block; +Cc: mreitz, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1082 bytes --]

On 11/22/2016 03:23 AM, Alberto Garcia wrote:
> On Mon 21 Nov 2016 09:08:06 PM CET, Eric Blake <eblake@redhat.com> wrote:
> 
>> Back to my comments earlier in the series - We may want to think about
>> a parallel FIFO mode, where we kick off all reads, but then are
>> prepared to cancel reads on later children once we have a positive
>> answer on an earlier child.
> 
> And what happens if the positive answer comes first from a later child?
> Does COLO rely on the assumption that the first child is always read
> first?

In parallel FIFO mode, it wouldn't matter if the second child has an
answer first; you STILL have to wait for the first child's answer. It's
just that if the first child fails, you have a headstart (or even an
answer already) on the second child, rather than starting the second
child from scratch.  In other words, parallel mode wastes a lot of work
when the first child is successful, but saves time if the first child fails.
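
As a toy sketch of that decision rule (standalone and with invented names,
not a proposal for actual quorum code): even if results arrive in any
wall-clock order, the answer is still taken from the earliest child in FIFO
order that succeeded:

    #include <stdio.h>

    int main(void)
    {
        /* Completion status per child, indexed by FIFO position; assume
         * all parallel reads have already finished. */
        int result[3] = { -5 /* child 0 failed */, 0, 0 };
        int chosen = -1;
        int n;

        for (n = 0; n < 3; n++) {         /* FIFO priority still decides */
            if (result[n] == 0) {
                chosen = n;
                break;
            }
        }
        printf("answer taken from child %d\n", chosen);
        return 0;
    }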

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2016-11-22 12:51 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-21 17:31 [Qemu-devel] [PATCH 0/8] quorum: Implement .bdrv_co_preadv/pwritev() Kevin Wolf
2016-11-21 17:31 ` [Qemu-devel] [PATCH 1/8] coroutine: Introduce qemu_coroutine_enter_if_inactive() Kevin Wolf
2016-11-21 17:31 ` [Qemu-devel] [PATCH 2/8] quorum: Remove s from quorum_aio_get() arguments Kevin Wolf
2016-11-21 17:31 ` [Qemu-devel] [PATCH 3/8] quorum: Implement .bdrv_co_readv/writev Kevin Wolf
2016-11-21 17:58   ` Eric Blake
2016-11-22 11:32     ` Kevin Wolf
2016-11-22  7:39   ` Alberto Garcia
2016-11-21 17:31 ` [Qemu-devel] [PATCH 4/8] quorum: Do cleanup in caller coroutine Kevin Wolf
2016-11-21 19:03   ` Eric Blake
2016-11-21 17:31 ` [Qemu-devel] [PATCH 5/8] quorum: Inline quorum_aio_cb() Kevin Wolf
2016-11-21 19:21   ` Eric Blake
2016-11-22  7:43   ` Alberto Garcia
2016-11-21 17:31 ` [Qemu-devel] [PATCH 6/8] quorum: Avoid bdrv_aio_writev() for rewrites Kevin Wolf
2016-11-21 19:52   ` Eric Blake
2016-11-22  7:45   ` Alberto Garcia
2016-11-21 17:31 ` [Qemu-devel] [PATCH 7/8] quorum: Implement .bdrv_co_preadv/pwritev() Kevin Wolf
2016-11-21 20:04   ` Eric Blake
2016-11-22 11:45     ` Kevin Wolf
2016-11-22 12:49       ` Eric Blake
2016-11-21 17:31 ` [Qemu-devel] [PATCH 8/8] quorum: Inline quorum_fifo_aio_cb() Kevin Wolf
2016-11-21 20:08   ` Eric Blake
2016-11-22  9:23     ` Alberto Garcia
2016-11-22 12:51       ` Eric Blake
