* [Qemu-devel] [PATCH v2 00/16] AioContext fine-grained locking, part 1 of 3, including bdrv_drain rewrite
@ 2016-03-16 14:16 Paolo Bonzini
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 01/16] block: make bdrv_start_throttled_reqs return void Paolo Bonzini
                   ` (16 more replies)
  0 siblings, 17 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-03-16 14:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, stefanha

Now that the current dataplane failure has been isolated to an unwanted
reentrancy in virtio, I can post v2 of these patches.  Please review,
as I would like to get part 2 into QEMU 2.6 as well (I plan to send it
out tomorrow, since I hope there will be no comments on this series and
I have already tested both parts).

The changes from v1 are minor and all correspond to Fam's review comments
(I have pointed them out in the individual patches: patches 2, 5, 7, 9 and 13).

Paolo

Paolo Bonzini (16):
  block: make bdrv_start_throttled_reqs return void
  block: move restarting of throttled reqs to block/throttle-groups.c
  block: introduce bdrv_no_throttling_begin/end
  block: plug whole tree at once, introduce bdrv_io_unplugged_begin/end
  mirror: use bottom half to re-enter coroutine
  block: add BDS field to count in-flight requests
  block: change drain to look only at one child at a time
  blockjob: introduce .drain callback for jobs
  block: wait for all pending I/O when doing synchronous requests
  nfs: replace aio_poll with bdrv_drain
  sheepdog: disable dataplane
  aio: introduce aio_context_in_iothread
  block: only call aio_poll from iothread
  iothread: release AioContext around aio_poll
  qemu-thread: introduce QemuRecMutex
  aio: convert from RFifoLock to QemuRecMutex

 async.c                         |  28 +---
 block.c                         |   4 +-
 block/backup.c                  |   7 +
 block/io.c                      | 276 ++++++++++++++++++++++++----------------
 block/linux-aio.c               |  13 +-
 block/mirror.c                  |  38 +++++-
 block/nfs.c                     |  50 +++++---
 block/qed-table.c               |  16 +--
 block/raw-aio.h                 |   2 +-
 block/raw-posix.c               |  16 +--
 block/sheepdog.c                |  19 +++
 block/throttle-groups.c         |  20 +++
 blockjob.c                      |  16 ++-
 docs/multiple-iothreads.txt     |  40 +++---
 include/block/aio.h             |  13 +-
 include/block/block.h           |   3 +-
 include/block/block_int.h       |  22 +++-
 include/block/blockjob.h        |   7 +
 include/block/throttle-groups.h |   1 +
 include/qemu/rfifolock.h        |  54 --------
 include/qemu/thread-posix.h     |   6 +
 include/qemu/thread-win32.h     |  10 ++
 include/qemu/thread.h           |   3 +
 iothread.c                      |  20 +--
 stubs/Makefile.objs             |   1 +
 stubs/iothread.c                |   8 ++
 tests/.gitignore                |   1 -
 tests/Makefile                  |   2 -
 tests/qemu-iotests/060          |   8 +-
 tests/qemu-iotests/060.out      |   4 +-
 tests/test-aio.c                |  22 ++--
 tests/test-rfifolock.c          |  92 --------------
 util/Makefile.objs              |   1 -
 util/qemu-thread-posix.c        |  13 ++
 util/qemu-thread-win32.c        |  25 ++++
 util/rfifolock.c                |  78 ------------
 36 files changed, 461 insertions(+), 478 deletions(-)
 delete mode 100644 include/qemu/rfifolock.h
 create mode 100644 stubs/iothread.c
 delete mode 100644 tests/test-rfifolock.c
 delete mode 100644 util/rfifolock.c

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v2 01/16] block: make bdrv_start_throttled_reqs return void
  2016-03-16 14:16 [Qemu-devel] [PATCH v2 00/16] AioContext fine-grained locking, part 1 of 3, including bdrv_drain rewrite Paolo Bonzini
@ 2016-03-16 14:16 ` Paolo Bonzini
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 02/16] block: move restarting of throttled reqs to block/throttle-groups.c Paolo Bonzini
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-03-16 14:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, stefanha

The return value is unused and I am not sure why it would be useful.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/io.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/block/io.c b/block/io.c
index a69bfc4..e58cfe2 100644
--- a/block/io.c
+++ b/block/io.c
@@ -75,10 +75,8 @@ void bdrv_set_io_limits(BlockDriverState *bs,
     }
 }
 
-/* this function drain all the throttled IOs */
-static bool bdrv_start_throttled_reqs(BlockDriverState *bs)
+static void bdrv_start_throttled_reqs(BlockDriverState *bs)
 {
-    bool drained = false;
     bool enabled = bs->io_limits_enabled;
     int i;
 
@@ -86,13 +84,11 @@ static bool bdrv_start_throttled_reqs(BlockDriverState *bs)
 
     for (i = 0; i < 2; i++) {
         while (qemu_co_enter_next(&bs->throttled_reqs[i])) {
-            drained = true;
+            ;
         }
     }
 
     bs->io_limits_enabled = enabled;
-
-    return drained;
 }
 
 void bdrv_io_limits_disable(BlockDriverState *bs)
-- 
1.8.3.1

* [Qemu-devel] [PATCH v2 02/16] block: move restarting of throttled reqs to block/throttle-groups.c
  2016-03-16 14:16 [Qemu-devel] [PATCH v2 00/16] AioContext fine-grained locking, part 1 of 3, including bdrv_drain rewrite Paolo Bonzini
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 01/16] block: make bdrv_start_throttled_reqs return void Paolo Bonzini
@ 2016-03-16 14:16 ` Paolo Bonzini
  2016-03-17  2:39   ` Fam Zheng
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 03/16] block: introduce bdrv_no_throttling_begin/end Paolo Bonzini
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2016-03-16 14:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, stefanha

We want to remove throttled_reqs from block/io.c.  This is the easy
part---hide the handling of throttled_reqs during disable/enable of
throttling within throttle-groups.c.
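
The restart loop that moves into throttle-groups.c simply keeps waking the
next queued coroutine, for each of the two directions, until both queues are
empty.  A minimal standalone model of that loop (the types and the
counter-based "queue" here are illustrative stand-ins, not the QEMU API):

```c
#include <assert.h>

/* Illustrative model of the two throttled-request queues:
 * index 0 holds queued reads, index 1 queued writes. */
typedef struct {
    int queued[2];
} Queues;

/* Stand-in for qemu_co_enter_next(): wake one queued waiter if any,
 * and report whether anything was woken. */
static int co_enter_next(Queues *q, int is_write)
{
    if (q->queued[is_write] == 0) {
        return 0;
    }
    q->queued[is_write]--;  /* the woken request re-submits unthrottled */
    return 1;
}

/* Mirrors the shape of throttle_group_restart_bs(): drain both
 * directions completely. */
static void restart_all(Queues *q)
{
    for (int i = 0; i < 2; i++) {
        while (co_enter_next(q, i)) {
            ;
        }
    }
}
```

The v1->v2 note above is the other half of the interface: after
throttle_group_config() only the *first* request of each queue is woken,
because the new limits may still require the rest to wait.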

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: only restart first request after throttle_group_config

 block/io.c                      | 15 +--------------
 block/throttle-groups.c         | 16 ++++++++++++++++
 include/block/throttle-groups.h |  1 +
 3 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/block/io.c b/block/io.c
index e58cfe2..5ee5032 100644
--- a/block/io.c
+++ b/block/io.c
@@ -66,28 +66,15 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
 void bdrv_set_io_limits(BlockDriverState *bs,
                         ThrottleConfig *cfg)
 {
-    int i;
-
     throttle_group_config(bs, cfg);
-
-    for (i = 0; i < 2; i++) {
-        qemu_co_enter_next(&bs->throttled_reqs[i]);
-    }
 }
 
 static void bdrv_start_throttled_reqs(BlockDriverState *bs)
 {
     bool enabled = bs->io_limits_enabled;
-    int i;
 
     bs->io_limits_enabled = false;
-
-    for (i = 0; i < 2; i++) {
-        while (qemu_co_enter_next(&bs->throttled_reqs[i])) {
-            ;
-        }
-    }
-
+    throttle_group_restart_bs(bs);
     bs->io_limits_enabled = enabled;
 }
 
diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index 4920e09..9f52d2b 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -313,6 +313,17 @@ void coroutine_fn throttle_group_co_io_limits_intercept(BlockDriverState *bs,
     qemu_mutex_unlock(&tg->lock);
 }
 
+void throttle_group_restart_bs(BlockDriverState *bs)
+{
+    int i;
+
+    for (i = 0; i < 2; i++) {
+        while (qemu_co_enter_next(&bs->throttled_reqs[i])) {
+            ;
+        }
+    }
+}
+
 /* Update the throttle configuration for a particular group. Similar
  * to throttle_config(), but guarantees atomicity within the
  * throttling group.
@@ -335,6 +346,11 @@ void throttle_group_config(BlockDriverState *bs, ThrottleConfig *cfg)
     }
     throttle_config(ts, tt, cfg);
     qemu_mutex_unlock(&tg->lock);
+
+    aio_context_acquire(bdrv_get_aio_context(bs));
+    qemu_co_enter_next(&bs->throttled_reqs[0]);
+    qemu_co_enter_next(&bs->throttled_reqs[1]);
+    aio_context_release(bdrv_get_aio_context(bs));
 }
 
 /* Get the throttle configuration from a particular group. Similar to
diff --git a/include/block/throttle-groups.h b/include/block/throttle-groups.h
index aba28f3..395f72d 100644
--- a/include/block/throttle-groups.h
+++ b/include/block/throttle-groups.h
@@ -38,6 +38,7 @@ void throttle_group_get_config(BlockDriverState *bs, ThrottleConfig *cfg);
 
 void throttle_group_register_bs(BlockDriverState *bs, const char *groupname);
 void throttle_group_unregister_bs(BlockDriverState *bs);
+void throttle_group_restart_bs(BlockDriverState *bs);
 
 void coroutine_fn throttle_group_co_io_limits_intercept(BlockDriverState *bs,
                                                         unsigned int bytes,
-- 
1.8.3.1

* [Qemu-devel] [PATCH v2 03/16] block: introduce bdrv_no_throttling_begin/end
  2016-03-16 14:16 [Qemu-devel] [PATCH v2 00/16] AioContext fine-grained locking, part 1 of 3, including bdrv_drain rewrite Paolo Bonzini
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 01/16] block: make bdrv_start_throttled_reqs return void Paolo Bonzini
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 02/16] block: move restarting of throttled reqs to block/throttle-groups.c Paolo Bonzini
@ 2016-03-16 14:16 ` Paolo Bonzini
  2016-03-17  2:52   ` Fam Zheng
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 04/16] block: plug whole tree at once, introduce bdrv_io_unplugged_begin/end Paolo Bonzini
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2016-03-16 14:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, stefanha

Extract the handling of throttling from bdrv_flush_io_queue.
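
The begin/end pair nests through a counter rather than saving and restoring
a boolean, so that overlapping callers compose correctly.  A minimal sketch
of the counter discipline (illustrative names, not the QEMU API):

```c
#include <assert.h>

/* Illustrative stand-in for BlockDriverState's io_limits_disabled field. */
typedef struct {
    unsigned io_limits_disabled;
} State;

/* Throttling is enforced only while no begin/end section is open. */
static int throttling_active(const State *s)
{
    return s->io_limits_disabled == 0;
}

static void no_throttling_begin(State *s)
{
    if (s->io_limits_disabled++ == 0) {
        /* outermost "begin": this is where queued requests restart */
    }
}

static void no_throttling_end(State *s)
{
    assert(s->io_limits_disabled > 0);
    --s->io_limits_disabled;
}
```

Nested begin/begin/end leaves throttling disabled until the outermost end,
which is exactly what the recursive walk over bs->children relies on.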

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c                   |  1 -
 block/io.c                | 56 +++++++++++++++++++++++++++++------------------
 block/throttle-groups.c   |  4 ++++
 include/block/block_int.h |  6 ++---
 4 files changed, 42 insertions(+), 25 deletions(-)

diff --git a/block.c b/block.c
index 59a18a3..bbd7de6 100644
--- a/block.c
+++ b/block.c
@@ -2304,7 +2304,6 @@ static void swap_feature_fields(BlockDriverState *bs_top,
 
     assert(!bs_new->throttle_state);
     if (bs_top->throttle_state) {
-        assert(bs_top->io_limits_enabled);
         bdrv_io_limits_enable(bs_new, throttle_group_get_name(bs_top));
         bdrv_io_limits_disable(bs_top);
     }
diff --git a/block/io.c b/block/io.c
index 5ee5032..272caac 100644
--- a/block/io.c
+++ b/block/io.c
@@ -69,28 +69,43 @@ void bdrv_set_io_limits(BlockDriverState *bs,
     throttle_group_config(bs, cfg);
 }
 
-static void bdrv_start_throttled_reqs(BlockDriverState *bs)
+static void bdrv_no_throttling_begin(BlockDriverState *bs)
 {
-    bool enabled = bs->io_limits_enabled;
+    BdrvChild *child;
+
+    QLIST_FOREACH(child, &bs->children, next) {
+        bdrv_no_throttling_begin(child->bs);
+    }
 
-    bs->io_limits_enabled = false;
-    throttle_group_restart_bs(bs);
-    bs->io_limits_enabled = enabled;
+    if (bs->io_limits_disabled++ == 0) {
+        throttle_group_restart_bs(bs);
+    }
+}
+
+static void bdrv_no_throttling_end(BlockDriverState *bs)
+{
+    BdrvChild *child;
+
+    --bs->io_limits_disabled;
+
+    QLIST_FOREACH(child, &bs->children, next) {
+        bdrv_no_throttling_end(child->bs);
+    }
 }
 
 void bdrv_io_limits_disable(BlockDriverState *bs)
 {
-    bs->io_limits_enabled = false;
-    bdrv_start_throttled_reqs(bs);
+    assert(bs->throttle_state);
+    bdrv_no_throttling_begin(bs);
     throttle_group_unregister_bs(bs);
+    bdrv_no_throttling_end(bs);
 }
 
 /* should be called before bdrv_set_io_limits if a limit is set */
 void bdrv_io_limits_enable(BlockDriverState *bs, const char *group)
 {
-    assert(!bs->io_limits_enabled);
+    assert(!bs->throttle_state);
     throttle_group_register_bs(bs, group);
-    bs->io_limits_enabled = true;
 }
 
 void bdrv_io_limits_update_group(BlockDriverState *bs, const char *group)
@@ -255,6 +270,7 @@ void bdrv_drain(BlockDriverState *bs)
 {
     bool busy = true;
 
+    bdrv_no_throttling_begin(bs);
     bdrv_drain_recurse(bs);
     while (busy) {
         /* Keep iterating */
@@ -262,6 +278,7 @@ void bdrv_drain(BlockDriverState *bs)
          busy = bdrv_requests_pending(bs);
          busy |= aio_poll(bdrv_get_aio_context(bs), busy);
     }
+    bdrv_no_throttling_end(bs);
 }
 
 /*
@@ -284,6 +301,7 @@ void bdrv_drain_all(void)
         if (bs->job) {
             block_job_pause(bs->job);
         }
+        bdrv_no_throttling_begin(bs);
         bdrv_drain_recurse(bs);
         aio_context_release(aio_context);
 
@@ -325,6 +343,7 @@ void bdrv_drain_all(void)
         AioContext *aio_context = bdrv_get_aio_context(bs);
 
         aio_context_acquire(aio_context);
+        bdrv_no_throttling_end(bs);
         if (bs->job) {
             block_job_resume(bs->job);
         }
@@ -555,11 +574,7 @@ static int bdrv_prwv_co(BlockDriverState *bs, int64_t offset,
      * will not fire; so the I/O throttling function has to be disabled here
      * if it has been enabled.
      */
-    if (bs->io_limits_enabled) {
-        fprintf(stderr, "Disabling I/O throttling on '%s' due "
-                        "to synchronous I/O.\n", bdrv_get_device_name(bs));
-        bdrv_io_limits_disable(bs);
-    }
+    bdrv_no_throttling_begin(bs);
 
     if (qemu_in_coroutine()) {
         /* Fast-path if already in coroutine context */
@@ -573,6 +588,8 @@ static int bdrv_prwv_co(BlockDriverState *bs, int64_t offset,
             aio_poll(aio_context, true);
         }
     }
+
+    bdrv_no_throttling_end(bs);
     return rwco.ret;
 }
 
@@ -608,13 +625,11 @@ int bdrv_read(BlockDriverState *bs, int64_t sector_num,
 int bdrv_read_unthrottled(BlockDriverState *bs, int64_t sector_num,
                           uint8_t *buf, int nb_sectors)
 {
-    bool enabled;
     int ret;
 
-    enabled = bs->io_limits_enabled;
-    bs->io_limits_enabled = false;
+    bdrv_no_throttling_begin(bs);
     ret = bdrv_read(bs, sector_num, buf, nb_sectors);
-    bs->io_limits_enabled = enabled;
+    bdrv_no_throttling_end(bs);
     return ret;
 }
 
@@ -952,7 +967,7 @@ static int coroutine_fn bdrv_co_do_preadv(BlockDriverState *bs,
     }
 
     /* throttling disk I/O */
-    if (bs->io_limits_enabled) {
+    if (bs->throttle_state) {
         throttle_group_co_io_limits_intercept(bs, bytes, false);
     }
 
@@ -1294,7 +1309,7 @@ static int coroutine_fn bdrv_co_do_pwritev(BlockDriverState *bs,
     }
 
     /* throttling disk I/O */
-    if (bs->io_limits_enabled) {
+    if (bs->throttle_state) {
         throttle_group_co_io_limits_intercept(bs, bytes, true);
     }
 
@@ -2749,7 +2764,6 @@ void bdrv_flush_io_queue(BlockDriverState *bs)
     } else if (bs->file) {
         bdrv_flush_io_queue(bs->file->bs);
     }
-    bdrv_start_throttled_reqs(bs);
 }
 
 void bdrv_drained_begin(BlockDriverState *bs)
diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index 9f52d2b..b348886 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -219,6 +219,10 @@ static bool throttle_group_schedule_timer(BlockDriverState *bs,
     ThrottleGroup *tg = container_of(ts, ThrottleGroup, ts);
     bool must_wait;
 
+    if (bs->io_limits_disabled) {
+        return false;
+    }
+
     /* Check if any of the timers in this group is already armed */
     if (tg->any_timer_armed[is_write]) {
         return true;
diff --git a/include/block/block_int.h b/include/block/block_int.h
index dda5ba0..dac6d43 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -412,10 +412,10 @@ struct BlockDriverState {
 
     /* I/O throttling.
      * throttle_state tells us if this BDS has I/O limits configured.
-     * io_limits_enabled tells us if they are currently being
-     * enforced, but it can be temporarily set to false */
+     * io_limits_disabled tells us if they are currently being enforced */
     CoQueue      throttled_reqs[2];
-    bool         io_limits_enabled;
+    unsigned int io_limits_disabled;
+
     /* The following fields are protected by the ThrottleGroup lock.
      * See the ThrottleGroup documentation for details. */
     ThrottleState *throttle_state;
-- 
1.8.3.1

* [Qemu-devel] [PATCH v2 04/16] block: plug whole tree at once, introduce bdrv_io_unplugged_begin/end
  2016-03-16 14:16 [Qemu-devel] [PATCH v2 00/16] AioContext fine-grained locking, part 1 of 3, including bdrv_drain rewrite Paolo Bonzini
                   ` (2 preceding siblings ...)
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 03/16] block: introduce bdrv_no_throttling_begin/end Paolo Bonzini
@ 2016-03-16 14:16 ` Paolo Bonzini
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 05/16] mirror: use bottom half to re-enter coroutine Paolo Bonzini
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-03-16 14:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, stefanha

Extract the handling of io_plug "depth" from linux-aio.c and let the
main bdrv_drain loop do nothing but wait on I/O.

Like the two newly introduced functions, bdrv_io_plug and bdrv_io_unplug
now operate on all children.  The visit order is now symmetrical between
plug and unplug, making it possible for formats to implement plug/unplug.
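
A rough model of how the two new counters interact: io_plugged tracks nested
bdrv_io_plug calls, io_plug_disabled tracks nested unplugged sections, and
the driver is effectively plugged only when the former is positive and the
latter zero.  (All names below are illustrative stand-ins, not the QEMU API;
the real code only calls into the driver at state transitions.)

```c
#include <assert.h>

/* Illustrative counters mirroring io_plugged / io_plug_disabled. */
typedef struct {
    unsigned io_plugged;       /* nested bdrv_io_plug calls          */
    unsigned io_plug_disabled; /* nested bdrv_io_unplugged_begin calls */
    int driver_plugged;        /* what the driver currently sees     */
} Node;

/* Plugged at the driver level iff at least one plug is outstanding
 * and no unplugged section is open. */
static void sync_driver(Node *n)
{
    n->driver_plugged = n->io_plugged > 0 && n->io_plug_disabled == 0;
}

static void io_plug(Node *n)            { n->io_plugged++; sync_driver(n); }
static void io_unplug(Node *n)          { assert(n->io_plugged); n->io_plugged--; sync_driver(n); }
static void io_unplugged_begin(Node *n) { n->io_plug_disabled++; sync_driver(n); }
static void io_unplugged_end(Node *n)   { assert(n->io_plug_disabled); n->io_plug_disabled--; sync_driver(n); }
```

This is why bdrv_drain can temporarily force everything out to the device
with unplugged_begin/end without disturbing a guest-visible plug section.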

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/io.c                | 72 +++++++++++++++++++++++++++++++++++------------
 block/linux-aio.c         | 13 ++++-----
 block/raw-aio.h           |  2 +-
 block/raw-posix.c         | 16 +----------
 include/block/block.h     |  3 +-
 include/block/block_int.h |  5 +++-
 6 files changed, 67 insertions(+), 44 deletions(-)

diff --git a/block/io.c b/block/io.c
index 272caac..d8d50b7 100644
--- a/block/io.c
+++ b/block/io.c
@@ -271,13 +271,14 @@ void bdrv_drain(BlockDriverState *bs)
     bool busy = true;
 
     bdrv_no_throttling_begin(bs);
+    bdrv_io_unplugged_begin(bs);
     bdrv_drain_recurse(bs);
     while (busy) {
         /* Keep iterating */
-         bdrv_flush_io_queue(bs);
          busy = bdrv_requests_pending(bs);
          busy |= aio_poll(bdrv_get_aio_context(bs), busy);
     }
+    bdrv_io_unplugged_end(bs);
     bdrv_no_throttling_end(bs);
 }
 
@@ -302,6 +303,7 @@ void bdrv_drain_all(void)
             block_job_pause(bs->job);
         }
         bdrv_no_throttling_begin(bs);
+        bdrv_io_unplugged_begin(bs);
         bdrv_drain_recurse(bs);
         aio_context_release(aio_context);
 
@@ -326,7 +328,6 @@ void bdrv_drain_all(void)
             aio_context_acquire(aio_context);
             while ((bs = bdrv_next(bs))) {
                 if (aio_context == bdrv_get_aio_context(bs)) {
-                    bdrv_flush_io_queue(bs);
                     if (bdrv_requests_pending(bs)) {
                         busy = true;
                         aio_poll(aio_context, busy);
@@ -343,6 +344,7 @@ void bdrv_drain_all(void)
         AioContext *aio_context = bdrv_get_aio_context(bs);
 
         aio_context_acquire(aio_context);
+        bdrv_io_unplugged_end(bs);
         bdrv_no_throttling_end(bs);
         if (bs->job) {
             block_job_resume(bs->job);
@@ -2738,31 +2740,65 @@ void bdrv_add_before_write_notifier(BlockDriverState *bs,
 
 void bdrv_io_plug(BlockDriverState *bs)
 {
-    BlockDriver *drv = bs->drv;
-    if (drv && drv->bdrv_io_plug) {
-        drv->bdrv_io_plug(bs);
-    } else if (bs->file) {
-        bdrv_io_plug(bs->file->bs);
+    BdrvChild *child;
+
+    if (bs->io_plugged++ == 0 && bs->io_plug_disabled == 0) {
+        BlockDriver *drv = bs->drv;
+        if (drv && drv->bdrv_io_plug) {
+            drv->bdrv_io_plug(bs);
+        }
+    }
+
+    QLIST_FOREACH(child, &bs->children, next) {
+        bdrv_io_plug(child->bs);
     }
 }
 
 void bdrv_io_unplug(BlockDriverState *bs)
 {
-    BlockDriver *drv = bs->drv;
-    if (drv && drv->bdrv_io_unplug) {
-        drv->bdrv_io_unplug(bs);
-    } else if (bs->file) {
-        bdrv_io_unplug(bs->file->bs);
+    BdrvChild *child;
+
+    QLIST_FOREACH(child, &bs->children, next) {
+        bdrv_io_unplug(child->bs);
+    }
+
+    if (--bs->io_plugged == 0 && bs->io_plug_disabled == 0) {
+        BlockDriver *drv = bs->drv;
+        if (drv && drv->bdrv_io_unplug) {
+            drv->bdrv_io_unplug(bs);
+        }
     }
 }
 
-void bdrv_flush_io_queue(BlockDriverState *bs)
+void bdrv_io_unplugged_begin(BlockDriverState *bs)
 {
-    BlockDriver *drv = bs->drv;
-    if (drv && drv->bdrv_flush_io_queue) {
-        drv->bdrv_flush_io_queue(bs);
-    } else if (bs->file) {
-        bdrv_flush_io_queue(bs->file->bs);
+    BdrvChild *child;
+
+    if (bs->io_plug_disabled++ == 0 && bs->io_plugged > 0) {
+        BlockDriver *drv = bs->drv;
+        if (drv && drv->bdrv_io_unplug) {
+            drv->bdrv_io_unplug(bs);
+        }
+    }
+
+    QLIST_FOREACH(child, &bs->children, next) {
+        bdrv_io_unplugged_begin(child->bs);
+    }
+}
+
+void bdrv_io_unplugged_end(BlockDriverState *bs)
+{
+    BdrvChild *child;
+
+    QLIST_FOREACH(child, &bs->children, next) {
+        bdrv_io_unplugged_end(child->bs);
+    }
+
+    if (--bs->io_plug_disabled == 0 && bs->io_plugged > 0) {
+        BlockDriver *drv = bs->drv;
+        if (drv && drv->bdrv_io_plug) {
+            drv->bdrv_io_plug(bs);
+        }
     }
 }
 
diff --git a/block/linux-aio.c b/block/linux-aio.c
index 805757e..102bf92 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -220,19 +220,16 @@ void laio_io_plug(BlockDriverState *bs, void *aio_ctx)
 {
     struct qemu_laio_state *s = aio_ctx;
 
-    s->io_q.plugged++;
+    assert(!s->io_q.plugged);
+    s->io_q.plugged = 1;
 }
 
-void laio_io_unplug(BlockDriverState *bs, void *aio_ctx, bool unplug)
+void laio_io_unplug(BlockDriverState *bs, void *aio_ctx)
 {
     struct qemu_laio_state *s = aio_ctx;
 
-    assert(s->io_q.plugged > 0 || !unplug);
-
-    if (unplug && --s->io_q.plugged > 0) {
-        return;
-    }
-
+    assert(s->io_q.plugged);
+    s->io_q.plugged = 0;
     if (!s->io_q.blocked && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
         ioq_submit(s);
     }
diff --git a/block/raw-aio.h b/block/raw-aio.h
index 31d791f..7600060 100644
--- a/block/raw-aio.h
+++ b/block/raw-aio.h
@@ -41,7 +41,7 @@ BlockAIOCB *laio_submit(BlockDriverState *bs, void *aio_ctx, int fd,
 void laio_detach_aio_context(void *s, AioContext *old_context);
 void laio_attach_aio_context(void *s, AioContext *new_context);
 void laio_io_plug(BlockDriverState *bs, void *aio_ctx);
-void laio_io_unplug(BlockDriverState *bs, void *aio_ctx, bool unplug);
+void laio_io_unplug(BlockDriverState *bs, void *aio_ctx);
 #endif
 
 #ifdef _WIN32
diff --git a/block/raw-posix.c b/block/raw-posix.c
index 8866121..7ecbee3 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1343,17 +1343,7 @@ static void raw_aio_unplug(BlockDriverState *bs)
 #ifdef CONFIG_LINUX_AIO
     BDRVRawState *s = bs->opaque;
     if (s->use_aio) {
-        laio_io_unplug(bs, s->aio_ctx, true);
-    }
-#endif
-}
-
-static void raw_aio_flush_io_queue(BlockDriverState *bs)
-{
-#ifdef CONFIG_LINUX_AIO
-    BDRVRawState *s = bs->opaque;
-    if (s->use_aio) {
-        laio_io_unplug(bs, s->aio_ctx, false);
+        laio_io_unplug(bs, s->aio_ctx);
     }
 #endif
 }
@@ -1947,7 +1937,6 @@ BlockDriver bdrv_file = {
     .bdrv_refresh_limits = raw_refresh_limits,
     .bdrv_io_plug = raw_aio_plug,
     .bdrv_io_unplug = raw_aio_unplug,
-    .bdrv_flush_io_queue = raw_aio_flush_io_queue,
 
     .bdrv_truncate = raw_truncate,
     .bdrv_getlength = raw_getlength,
@@ -2310,7 +2299,6 @@ static BlockDriver bdrv_host_device = {
     .bdrv_refresh_limits = raw_refresh_limits,
     .bdrv_io_plug = raw_aio_plug,
     .bdrv_io_unplug = raw_aio_unplug,
-    .bdrv_flush_io_queue = raw_aio_flush_io_queue,
 
     .bdrv_truncate      = raw_truncate,
     .bdrv_getlength	= raw_getlength,
@@ -2440,7 +2428,6 @@ static BlockDriver bdrv_host_cdrom = {
     .bdrv_refresh_limits = raw_refresh_limits,
     .bdrv_io_plug = raw_aio_plug,
     .bdrv_io_unplug = raw_aio_unplug,
-    .bdrv_flush_io_queue = raw_aio_flush_io_queue,
 
     .bdrv_truncate      = raw_truncate,
     .bdrv_getlength      = raw_getlength,
@@ -2576,7 +2563,6 @@ static BlockDriver bdrv_host_cdrom = {
     .bdrv_refresh_limits = raw_refresh_limits,
     .bdrv_io_plug = raw_aio_plug,
     .bdrv_io_unplug = raw_aio_unplug,
-    .bdrv_flush_io_queue = raw_aio_flush_io_queue,
 
     .bdrv_truncate      = raw_truncate,
     .bdrv_getlength      = raw_getlength,
diff --git a/include/block/block.h b/include/block/block.h
index eaa6426..527b691 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -524,7 +524,8 @@ int bdrv_probe_geometry(BlockDriverState *bs, HDGeometry *geo);
 
 void bdrv_io_plug(BlockDriverState *bs);
 void bdrv_io_unplug(BlockDriverState *bs);
-void bdrv_flush_io_queue(BlockDriverState *bs);
+void bdrv_io_unplugged_begin(BlockDriverState *bs);
+void bdrv_io_unplugged_end(BlockDriverState *bs);
 
 /**
  * bdrv_drained_begin:
diff --git a/include/block/block_int.h b/include/block/block_int.h
index dac6d43..13136a3 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -282,7 +282,6 @@ struct BlockDriver {
     /* io queue for linux-aio */
     void (*bdrv_io_plug)(BlockDriverState *bs);
     void (*bdrv_io_unplug)(BlockDriverState *bs);
-    void (*bdrv_flush_io_queue)(BlockDriverState *bs);
 
     /**
      * Try to get @bs's logical and physical block size.
@@ -477,6 +476,10 @@ struct BlockDriverState {
     uint64_t write_threshold_offset;
     NotifierWithReturn write_threshold_notifier;
 
+    /* counters for nested bdrv_io_plug and bdrv_io_unplugged_begin */
+    unsigned io_plugged;
+    unsigned io_plug_disabled;
+
     int quiesce_counter;
 };
 
-- 
1.8.3.1

* [Qemu-devel] [PATCH v2 05/16] mirror: use bottom half to re-enter coroutine
  2016-03-16 14:16 [Qemu-devel] [PATCH v2 00/16] AioContext fine-grained locking, part 1 of 3, including bdrv_drain rewrite Paolo Bonzini
                   ` (3 preceding siblings ...)
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 04/16] block: plug whole tree at once, introduce bdrv_io_unplugged_begin/end Paolo Bonzini
@ 2016-03-16 14:16 ` Paolo Bonzini
  2016-03-17  2:23   ` Fam Zheng
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 06/16] block: add BDS field to count in-flight requests Paolo Bonzini
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2016-03-16 14:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, stefanha

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: use aio_bh_new() [Fam]

 block/mirror.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 9635fa8..2c7874d 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -72,6 +72,7 @@ typedef struct MirrorOp {
     QEMUIOVector qiov;
     int64_t sector_num;
     int nb_sectors;
+    QEMUBH *co_enter_bh;
 } MirrorOp;
 
 static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read,
@@ -87,6 +88,18 @@ static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read,
     }
 }
 
+static void mirror_bh_cb(void *opaque)
+{
+    MirrorOp *op = opaque;
+    MirrorBlockJob *s = op->s;
+
+    qemu_bh_delete(op->co_enter_bh);
+    g_free(op);
+    if (s->waiting_for_io) {
+        qemu_coroutine_enter(s->common.co, NULL);
+    }
+}
+
 static void mirror_iteration_done(MirrorOp *op, int ret)
 {
     MirrorBlockJob *s = op->s;
@@ -117,11 +130,14 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
     }
 
     qemu_iovec_destroy(&op->qiov);
-    g_free(op);
 
-    if (s->waiting_for_io) {
-        qemu_coroutine_enter(s->common.co, NULL);
-    }
+    /* The I/O operation is not finished until the callback returns.
+     * If we call qemu_coroutine_enter here, there is the possibility
+     * of a deadlock when the coroutine calls bdrv_drained_begin.
+     */
+    op->co_enter_bh = aio_bh_new(bdrv_get_aio_context(s->target),
+                                 mirror_bh_cb, op);
+    qemu_bh_schedule(op->co_enter_bh);
 }
 
 static void mirror_write_complete(void *opaque, int ret)
-- 
1.8.3.1

* [Qemu-devel] [PATCH v2 06/16] block: add BDS field to count in-flight requests
  2016-03-16 14:16 [Qemu-devel] [PATCH v2 00/16] AioContext fine-grained locking, part 1 of 3, including bdrv_drain rewrite Paolo Bonzini
                   ` (4 preceding siblings ...)
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 05/16] mirror: use bottom half to re-enter coroutine Paolo Bonzini
@ 2016-03-16 14:16 ` Paolo Bonzini
  2016-03-17  2:53   ` Fam Zheng
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 07/16] block: change drain to look only at one child at a time Paolo Bonzini
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2016-03-16 14:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, stefanha

Unlike tracked_requests, this field also counts throttled requests,
and remains non-zero if an AIO operation needs a BH to be "really"
completed.

With this change, it is no longer necessary to have a dummy
BdrvTrackedRequest for requests that are never serialising, and
it is no longer necessary to poll the AioContext once after
bdrv_requests_pending(bs) returns false.
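
The field is a plain counter manipulated with atomics, and draining reduces
to polling while it is non-zero.  A minimal sketch under those assumptions
(illustrative types; the real code uses QEMU's atomic helpers rather than
C11 <stdatomic.h>):

```c
#include <assert.h>
#include <stdatomic.h>

/* Illustrative stand-in for the new BlockDriverState.in_flight field. */
typedef struct {
    atomic_uint in_flight;
} State;

/* Incremented when a request (including a throttled one, or an AIO
 * completion still waiting on its bottom half) enters the device... */
static void inc_in_flight(State *s)
{
    atomic_fetch_add(&s->in_flight, 1);
}

/* ...and decremented only when it is "really" complete. */
static void dec_in_flight(State *s)
{
    atomic_fetch_sub(&s->in_flight, 1);
}

/* drain's loop condition: keep running aio_poll while anything is
 * outstanding anywhere on this BDS. */
static int requests_pending(State *s)
{
    return atomic_load(&s->in_flight) > 0;
}
```

Because the counter covers throttled and BH-deferred work too, the drain
loop no longer needs the extra non-blocking aio_poll pass after the pending
check returns false.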

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/io.c                | 71 +++++++++++++++++++++++++++++++----------------
 block/mirror.c            |  2 ++
 include/block/block_int.h |  8 ++++--
 3 files changed, 54 insertions(+), 27 deletions(-)

diff --git a/block/io.c b/block/io.c
index d8d50b7..a9a23a6 100644
--- a/block/io.c
+++ b/block/io.c
@@ -224,13 +224,7 @@ bool bdrv_requests_pending(BlockDriverState *bs)
 {
     BdrvChild *child;
 
-    if (!QLIST_EMPTY(&bs->tracked_requests)) {
-        return true;
-    }
-    if (!qemu_co_queue_empty(&bs->throttled_reqs[0])) {
-        return true;
-    }
-    if (!qemu_co_queue_empty(&bs->throttled_reqs[1])) {
+    if (atomic_read(&bs->in_flight)) {
         return true;
     }
 
@@ -268,15 +262,12 @@ static void bdrv_drain_recurse(BlockDriverState *bs)
  */
 void bdrv_drain(BlockDriverState *bs)
 {
-    bool busy = true;
-
     bdrv_no_throttling_begin(bs);
     bdrv_io_unplugged_begin(bs);
     bdrv_drain_recurse(bs);
-    while (busy) {
+    while (bdrv_requests_pending(bs)) {
         /* Keep iterating */
-         busy = bdrv_requests_pending(bs);
-         busy |= aio_poll(bdrv_get_aio_context(bs), busy);
+         aio_poll(bdrv_get_aio_context(bs), true);
     }
     bdrv_io_unplugged_end(bs);
     bdrv_no_throttling_end(bs);
@@ -291,7 +282,7 @@ void bdrv_drain(BlockDriverState *bs)
 void bdrv_drain_all(void)
 {
     /* Always run first iteration so any pending completion BHs run */
-    bool busy = true;
+    bool waited = true;
     BlockDriverState *bs = NULL;
     GSList *aio_ctxs = NULL, *ctx;
 
@@ -318,8 +309,8 @@ void bdrv_drain_all(void)
      * request completion.  Therefore we must keep looping until there was no
      * more activity rather than simply draining each device independently.
      */
-    while (busy) {
-        busy = false;
+    while (waited) {
+        waited = false;
 
         for (ctx = aio_ctxs; ctx != NULL; ctx = ctx->next) {
             AioContext *aio_context = ctx->data;
@@ -329,12 +320,11 @@ void bdrv_drain_all(void)
             while ((bs = bdrv_next(bs))) {
                 if (aio_context == bdrv_get_aio_context(bs)) {
                     if (bdrv_requests_pending(bs)) {
-                        busy = true;
-                        aio_poll(aio_context, busy);
+                        aio_poll(aio_context, true);
+                        waited = true;
                     }
                 }
             }
-            busy |= aio_poll(aio_context, false);
             aio_context_release(aio_context);
         }
     }
@@ -457,6 +447,16 @@ static bool tracked_request_overlaps(BdrvTrackedRequest *req,
     return true;
 }
 
+void bdrv_inc_in_flight(BlockDriverState *bs)
+{
+    atomic_inc(&bs->in_flight);
+}
+
+void bdrv_dec_in_flight(BlockDriverState *bs)
+{
+    atomic_dec(&bs->in_flight);
+}
+
 static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
 {
     BlockDriverState *bs = self->bs;
@@ -963,6 +963,8 @@ static int coroutine_fn bdrv_co_do_preadv(BlockDriverState *bs,
         return ret;
     }
 
+    bdrv_inc_in_flight(bs);
+
     /* Don't do copy-on-read if we read data before write operation */
     if (bs->copy_on_read && !(flags & BDRV_REQ_NO_SERIALISING)) {
         flags |= BDRV_REQ_COPY_ON_READ;
@@ -1003,6 +1005,7 @@ static int coroutine_fn bdrv_co_do_preadv(BlockDriverState *bs,
                               use_local_qiov ? &local_qiov : qiov,
                               flags);
     tracked_request_end(&req);
+    bdrv_dec_in_flight(bs);
 
     if (use_local_qiov) {
         qemu_iovec_destroy(&local_qiov);
@@ -1310,6 +1313,8 @@ static int coroutine_fn bdrv_co_do_pwritev(BlockDriverState *bs,
         return ret;
     }
 
+    bdrv_inc_in_flight(bs);
+
     /* throttling disk I/O */
     if (bs->throttle_state) {
         throttle_group_co_io_limits_intercept(bs, bytes, true);
@@ -1408,6 +1413,7 @@ fail:
     qemu_vfree(tail_buf);
 out:
     tracked_request_end(&req);
+    bdrv_dec_in_flight(bs);
     return ret;
 }
 
@@ -2065,6 +2071,7 @@ static const AIOCBInfo bdrv_em_aiocb_info = {
 static void bdrv_aio_bh_cb(void *opaque)
 {
     BlockAIOCBSync *acb = opaque;
+    BlockDriverState *bs = acb->common.bs;
 
     if (!acb->is_write && acb->ret >= 0) {
         qemu_iovec_from_buf(acb->qiov, 0, acb->bounce, acb->qiov->size);
@@ -2072,6 +2079,7 @@ static void bdrv_aio_bh_cb(void *opaque)
     qemu_vfree(acb->bounce);
     acb->common.cb(acb->common.opaque, acb->ret);
     qemu_bh_delete(acb->bh);
+    bdrv_dec_in_flight(bs);
     acb->bh = NULL;
     qemu_aio_unref(acb);
 }
@@ -2087,6 +2095,7 @@ static BlockAIOCB *bdrv_aio_rw_vector(BlockDriverState *bs,
 {
     BlockAIOCBSync *acb;
 
+    bdrv_inc_in_flight(bs);
     acb = qemu_aio_get(&bdrv_em_aiocb_info, bs, cb, opaque);
     acb->is_write = is_write;
     acb->qiov = qiov;
@@ -2139,6 +2148,7 @@ static void bdrv_co_complete(BlockAIOCBCoroutine *acb)
 {
     if (!acb->need_bh) {
         acb->common.cb(acb->common.opaque, acb->req.error);
+        bdrv_dec_in_flight(acb->common.bs);
         qemu_aio_unref(acb);
     }
 }
@@ -2192,6 +2202,9 @@ static BlockAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
     Coroutine *co;
     BlockAIOCBCoroutine *acb;
 
+    /* Matched by bdrv_co_complete's bdrv_dec_in_flight.  */
+    bdrv_inc_in_flight(bs);
+
     acb = qemu_aio_get(&bdrv_em_co_aiocb_info, bs, cb, opaque);
     acb->need_bh = true;
     acb->req.error = -EINPROGRESS;
@@ -2225,6 +2238,9 @@ BlockAIOCB *bdrv_aio_flush(BlockDriverState *bs,
     Coroutine *co;
     BlockAIOCBCoroutine *acb;
 
+    /* Matched by bdrv_co_complete's bdrv_dec_in_flight.  */
+    bdrv_inc_in_flight(bs);
+
     acb = qemu_aio_get(&bdrv_em_co_aiocb_info, bs, cb, opaque);
     acb->need_bh = true;
     acb->req.error = -EINPROGRESS;
@@ -2254,6 +2270,9 @@ BlockAIOCB *bdrv_aio_discard(BlockDriverState *bs,
 
     trace_bdrv_aio_discard(bs, sector_num, nb_sectors, opaque);
 
+    /* Matched by bdrv_co_complete's bdrv_dec_in_flight.  */
+    bdrv_inc_in_flight(bs);
+
     acb = qemu_aio_get(&bdrv_em_co_aiocb_info, bs, cb, opaque);
     acb->need_bh = true;
     acb->req.error = -EINPROGRESS;
@@ -2361,14 +2380,14 @@ static void coroutine_fn bdrv_flush_co_entry(void *opaque)
 int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
 {
     int ret;
-    BdrvTrackedRequest req;
 
     if (!bs || !bdrv_is_inserted(bs) || bdrv_is_read_only(bs) ||
         bdrv_is_sg(bs)) {
         return 0;
     }
 
-    tracked_request_begin(&req, bs, 0, 0, BDRV_TRACKED_FLUSH);
+    bdrv_inc_in_flight(bs);
+
     /* Write back cached data to the OS even with cache=unsafe */
     BLKDBG_EVENT(bs->file, BLKDBG_FLUSH_TO_OS);
     if (bs->drv->bdrv_co_flush_to_os) {
@@ -2423,7 +2442,7 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
 flush_parent:
     ret = bs->file ? bdrv_co_flush(bs->file->bs) : 0;
 out:
-    tracked_request_end(&req);
+    bdrv_dec_in_flight(bs);
     return ret;
 }
 
@@ -2491,6 +2510,7 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,
         return 0;
     }
 
+    bdrv_inc_in_flight(bs);
     tracked_request_begin(&req, bs, sector_num, nb_sectors,
                           BDRV_TRACKED_DISCARD);
     bdrv_set_dirty(bs, sector_num, nb_sectors);
@@ -2543,6 +2563,7 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,
     ret = 0;
 out:
     tracked_request_end(&req);
+    bdrv_dec_in_flight(bs);
     return ret;
 }
 
@@ -2588,13 +2609,12 @@ static void bdrv_ioctl_bh_cb(void *opaque)
 static int bdrv_co_do_ioctl(BlockDriverState *bs, int req, void *buf)
 {
     BlockDriver *drv = bs->drv;
-    BdrvTrackedRequest tracked_req;
     CoroutineIOCompletion co = {
         .coroutine = qemu_coroutine_self(),
     };
     BlockAIOCB *acb;
 
-    tracked_request_begin(&tracked_req, bs, 0, 0, BDRV_TRACKED_IOCTL);
+    bdrv_inc_in_flight(bs);
     if (!drv || !drv->bdrv_aio_ioctl) {
         co.ret = -ENOTSUP;
         goto out;
@@ -2610,7 +2630,7 @@ static int bdrv_co_do_ioctl(BlockDriverState *bs, int req, void *buf)
     }
     qemu_coroutine_yield();
 out:
-    tracked_request_end(&tracked_req);
+    bdrv_dec_in_flight(bs);
     return co.ret;
 }
 
@@ -2667,6 +2687,9 @@ BlockAIOCB *bdrv_aio_ioctl(BlockDriverState *bs,
                                             bs, cb, opaque);
     Coroutine *co;
 
+    /* Matched by bdrv_co_complete's bdrv_dec_in_flight.  */
+    bdrv_inc_in_flight(bs);
+
     acb->need_bh = true;
     acb->req.error = -EINPROGRESS;
     acb->req.req = req;
diff --git a/block/mirror.c b/block/mirror.c
index 2c7874d..1aecafd 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -95,6 +95,7 @@ static void mirror_bh_cb(void *opaque)
 
     qemu_bh_delete(op->co_enter_bh);
     g_free(op);
+    bdrv_dec_in_flight(s->common.bs);
     if (s->waiting_for_io) {
         qemu_coroutine_enter(s->common.co, NULL);
     }
@@ -135,6 +136,7 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
      * If we call qemu_coroutine_enter here, there is the possibility
      * of a deadlock when the coroutine calls bdrv_drained_begin.
      */
+    bdrv_inc_in_flight(s->common.bs);
     op->co_enter_bh = aio_bh_new(bdrv_get_aio_context(s->target),
                                  mirror_bh_cb, op);
     qemu_bh_schedule(op->co_enter_bh);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 13136a3..b0311ac 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -63,8 +63,6 @@
 enum BdrvTrackedRequestType {
     BDRV_TRACKED_READ,
     BDRV_TRACKED_WRITE,
-    BDRV_TRACKED_FLUSH,
-    BDRV_TRACKED_IOCTL,
     BDRV_TRACKED_DISCARD,
 };
 
@@ -406,7 +404,8 @@ struct BlockDriverState {
     /* Callback before write request is processed */
     NotifierWithReturnList before_write_notifiers;
 
-    /* number of in-flight serialising requests */
+    /* number of in-flight requests; overall and serialising */
+    unsigned int in_flight;
     unsigned int serialising_in_flight;
 
     /* I/O throttling.
@@ -715,6 +714,9 @@ bool bdrv_requests_pending(BlockDriverState *bs);
 void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap **out);
 void bdrv_undo_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap *in);
 
+void bdrv_inc_in_flight(BlockDriverState *bs);
+void bdrv_dec_in_flight(BlockDriverState *bs);
+
 void blockdev_close_all_bdrv_states(void);
 
 #endif /* BLOCK_INT_H */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v2 07/16] block: change drain to look only at one child at a time
  2016-03-16 14:16 [Qemu-devel] [PATCH v2 00/16] AioContext fine-grained locking, part 1 of 3, including bdrv_drain rewrite Paolo Bonzini
                   ` (5 preceding siblings ...)
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 06/16] block: add BDS field to count in-flight requests Paolo Bonzini
@ 2016-03-16 14:16 ` Paolo Bonzini
  2016-03-17  3:10   ` Fam Zheng
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 08/16] blockjob: introduce .drain callback for jobs Paolo Bonzini
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2016-03-16 14:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, stefanha

bdrv_requests_pending checks children in order to also wait until internal
requests (such as metadata writes) have completed.  However, checking
children is in general overkill.  Child requests can be of two kinds:

- requests caused by an operation on bs, e.g. a bdrv_aio_write to bs
causing a write to bs->file->bs.  In this case, the parent's in_flight
count will always be incremented by at least one for every request in
the child.

- asynchronous metadata writes or flushes.  Such writes can be started
even if bs's in_flight count is zero, but not after the .bdrv_drain
callback has been invoked.

This patch therefore changes bdrv_drain to first finish I/O in the parent
(after which the parent's in_flight count will stay at zero), then invoke
the driver's .bdrv_drain callback (after which the parent will not generate
I/O on the child anymore), and finally wait for internal I/O in the
children to complete.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
	v1->v2: moved bdrv_drain callback after in_flight is 0
	in the parent [from QED drain discussion]
	  
 block/io.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/block/io.c b/block/io.c
index a9a23a6..db975ab 100644
--- a/block/io.c
+++ b/block/io.c
@@ -237,16 +237,25 @@ bool bdrv_requests_pending(BlockDriverState *bs)
     return false;
 }
 
-static void bdrv_drain_recurse(BlockDriverState *bs)
+static bool bdrv_drain_io_recurse(BlockDriverState *bs)
 {
     BdrvChild *child;
+    bool waited = false;
+
+    while (atomic_read(&bs->in_flight) > 0) {
+        aio_poll(bdrv_get_aio_context(bs), true);
+        waited = true;
+    }
 
     if (bs->drv && bs->drv->bdrv_drain) {
         bs->drv->bdrv_drain(bs);
     }
+
     QLIST_FOREACH(child, &bs->children, next) {
-        bdrv_drain_recurse(child->bs);
+        waited |= bdrv_drain_io_recurse(child->bs);
     }
+
+    return waited;
 }
 
 /*
@@ -264,11 +273,7 @@ void bdrv_drain(BlockDriverState *bs)
 {
     bdrv_no_throttling_begin(bs);
     bdrv_io_unplugged_begin(bs);
-    bdrv_drain_recurse(bs);
-    while (bdrv_requests_pending(bs)) {
-        /* Keep iterating */
-         aio_poll(bdrv_get_aio_context(bs), true);
-    }
+    bdrv_drain_io_recurse(bs);
     bdrv_io_unplugged_end(bs);
     bdrv_no_throttling_end(bs);
 }
@@ -295,7 +300,6 @@ void bdrv_drain_all(void)
         }
         bdrv_no_throttling_begin(bs);
         bdrv_io_unplugged_begin(bs);
-        bdrv_drain_recurse(bs);
         aio_context_release(aio_context);
 
         if (!g_slist_find(aio_ctxs, aio_context)) {
@@ -319,10 +323,7 @@ void bdrv_drain_all(void)
             aio_context_acquire(aio_context);
             while ((bs = bdrv_next(bs))) {
                 if (aio_context == bdrv_get_aio_context(bs)) {
-                    if (bdrv_requests_pending(bs)) {
-                        aio_poll(aio_context, true);
-                        waited = true;
-                    }
+                    waited |= bdrv_drain_io_recurse(bs);
                 }
             }
             aio_context_release(aio_context);
-- 
1.8.3.1


* [Qemu-devel] [PATCH v2 08/16] blockjob: introduce .drain callback for jobs
  2016-03-16 14:16 [Qemu-devel] [PATCH v2 00/16] AioContext fine-grained locking, part 1 of 3, including bdrv_drain rewrite Paolo Bonzini
                   ` (6 preceding siblings ...)
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 07/16] block: change drain to look only at one child at a time Paolo Bonzini
@ 2016-03-16 14:16 ` Paolo Bonzini
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 09/16] block: wait for all pending I/O when doing synchronous requests Paolo Bonzini
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-03-16 14:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, stefanha

This is required to decouple block jobs from running in an
AioContext.  With multiqueue block devices, a BlockDriverState
does not really belong to a single AioContext.

The solution is to first wait until all I/O operations are complete,
and then loop in the main thread until the block job completes
entirely.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/backup.c           |  7 +++++++
 block/mirror.c           | 12 ++++++++++--
 blockjob.c               | 16 +++++++++++++---
 include/block/blockjob.h |  7 +++++++
 4 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index ab3e345..eea337e 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -257,6 +257,12 @@ static void backup_abort(BlockJob *job)
     }
 }
 
+static void backup_drain(BlockJob *job)
+{
+    BackupBlockJob *s = container_of(job, BackupBlockJob, common);
+    bdrv_drain(s->target);
+}
+
 static const BlockJobDriver backup_job_driver = {
     .instance_size  = sizeof(BackupBlockJob),
     .job_type       = BLOCK_JOB_TYPE_BACKUP,
@@ -264,6 +270,7 @@ static const BlockJobDriver backup_job_driver = {
     .iostatus_reset = backup_iostatus_reset,
     .commit         = backup_commit,
     .abort          = backup_abort,
+    .drain          = backup_drain,
 };
 
 static BlockErrorAction backup_error_action(BackupBlockJob *job,
diff --git a/block/mirror.c b/block/mirror.c
index 1aecafd..b052dbf 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -433,7 +433,7 @@ static void mirror_free_init(MirrorBlockJob *s)
     }
 }
 
-static void mirror_drain(MirrorBlockJob *s)
+static void mirror_wait_for_completion(MirrorBlockJob *s)
 {
     while (s->in_flight > 0) {
         mirror_wait_for_io(s);
@@ -699,7 +699,7 @@ immediate_exit:
          * the target is a copy of the source.
          */
         assert(ret < 0 || (!s->synced && block_job_is_cancelled(&s->common)));
-        mirror_drain(s);
+        mirror_wait_for_completion(s);
     }
 
     assert(s->in_flight == 0);
@@ -780,12 +780,19 @@ static void mirror_complete(BlockJob *job, Error **errp)
     block_job_enter(&s->common);
 }
 
+static void mirror_drain(BlockJob *job)
+{
+    MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
+    bdrv_drain(s->target);
+}
+
 static const BlockJobDriver mirror_job_driver = {
     .instance_size = sizeof(MirrorBlockJob),
     .job_type      = BLOCK_JOB_TYPE_MIRROR,
     .set_speed     = mirror_set_speed,
     .iostatus_reset= mirror_iostatus_reset,
     .complete      = mirror_complete,
+    .drain         = mirror_drain,
 };
 
 static const BlockJobDriver commit_active_job_driver = {
@@ -795,6 +802,7 @@ static const BlockJobDriver commit_active_job_driver = {
     .iostatus_reset
                    = mirror_iostatus_reset,
     .complete      = mirror_complete,
+    .drain         = mirror_drain,
 };
 
 static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target,
diff --git a/blockjob.c b/blockjob.c
index 9fc37ca..5bb6f9b 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -289,16 +289,26 @@ static int block_job_finish_sync(BlockJob *job,
     assert(bs->job == job);
 
     block_job_ref(job);
+
+    /* finish will call block_job_enter (see e.g. block_job_cancel,
+     * or mirror_complete in block/mirror.c).  Barring bugs in the
+     * job coroutine, bdrv_drain should be enough to induce progress
+     * until the job completes or moves to the main thread.
+    */
     finish(job, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         block_job_unref(job);
         return -EBUSY;
     }
+    while (!job->deferred_to_main_loop && !job->completed) {
+        bdrv_drain(bs);
+        if (job->driver->drain) {
+            job->driver->drain(job);
+        }
+    }
     while (!job->completed) {
-        aio_poll(job->deferred_to_main_loop ? qemu_get_aio_context() :
-                                              bdrv_get_aio_context(bs),
-                 true);
+        aio_poll(qemu_get_aio_context(), true);
     }
     ret = (job->cancelled && job->ret == 0) ? -ECANCELED : job->ret;
     block_job_unref(job);
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 8bedc49..8e564df 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -70,6 +70,13 @@ typedef struct BlockJobDriver {
      * never both.
      */
     void (*abort)(BlockJob *job);
+
+    /**
+     * If the callback is not NULL, it will be invoked when the job has to be
+     * synchronously cancelled or completed; it should drain BlockDriverStates
+     * as required to ensure progress.
+     */
+    void (*drain)(BlockJob *job);
 } BlockJobDriver;
 
 /**
-- 
1.8.3.1


* [Qemu-devel] [PATCH v2 09/16] block: wait for all pending I/O when doing synchronous requests
  2016-03-16 14:16 [Qemu-devel] [PATCH v2 00/16] AioContext fine-grained locking, part 1 of 3, including bdrv_drain rewrite Paolo Bonzini
                   ` (7 preceding siblings ...)
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 08/16] blockjob: introduce .drain callback for jobs Paolo Bonzini
@ 2016-03-16 14:16 ` Paolo Bonzini
  2016-03-17  2:55   ` Fam Zheng
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 10/16] nfs: replace aio_poll with bdrv_drain Paolo Bonzini
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2016-03-16 14:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, stefanha

Synchronous I/O should in general happen either in the main thread (e.g.
for bdrv_open and bdrv_create) or between bdrv_drained_begin and
bdrv_drained_end.  Therefore, the simplest way to wait for it to finish
is to wait for _all_ pending I/O to complete.

In fact, there was one case in bdrv_close where we explicitly drained
after bdrv_flush; this is now unnecessary.  Likewise, we should probably
have called bdrv_drain_all after calls to bdrv_flush_all; that is now
unnecessary too.

This decouples synchronous I/O from aio_poll.  Where a request was
previously not tracked as part of bdrv_drain (e.g. bdrv_co_get_block_status),
we need to update the in_flight count.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
	v1->v2: remove unnecessary qed.c hunk [Fam]

 block.c                    |  1 -
 block/io.c                 | 39 ++++++++++++---------------------------
 block/qed-table.c          | 16 ++++------------
 tests/qemu-iotests/060     |  8 ++++++--
 tests/qemu-iotests/060.out |  4 +++-
 5 files changed, 25 insertions(+), 43 deletions(-)

diff --git a/block.c b/block.c
index bbd7de6..47f2367 100644
--- a/block.c
+++ b/block.c
@@ -2142,7 +2142,6 @@ static void bdrv_close(BlockDriverState *bs)
 
     bdrv_drained_begin(bs); /* complete I/O */
     bdrv_flush(bs);
-    bdrv_drain(bs); /* in case flush left pending I/O */
 
     bdrv_release_named_dirty_bitmaps(bs);
     assert(QLIST_EMPTY(&bs->dirty_bitmaps));
diff --git a/block/io.c b/block/io.c
index db975ab..0a99131 100644
--- a/block/io.c
+++ b/block/io.c
@@ -583,13 +583,9 @@ static int bdrv_prwv_co(BlockDriverState *bs, int64_t offset,
         /* Fast-path if already in coroutine context */
         bdrv_rw_co_entry(&rwco);
     } else {
-        AioContext *aio_context = bdrv_get_aio_context(bs);
-
         co = qemu_coroutine_create(bdrv_rw_co_entry);
         qemu_coroutine_enter(co, &rwco);
-        while (rwco.ret == NOT_DONE) {
-            aio_poll(aio_context, true);
-        }
+        bdrv_drain(bs);
     }
 
     bdrv_no_throttling_end(bs);
@@ -1535,17 +1531,19 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
     }
 
     *file = NULL;
+    bdrv_inc_in_flight(bs);
     ret = bs->drv->bdrv_co_get_block_status(bs, sector_num, nb_sectors, pnum,
                                             file);
     if (ret < 0) {
         *pnum = 0;
-        return ret;
+        goto out;
     }
 
     if (ret & BDRV_BLOCK_RAW) {
         assert(ret & BDRV_BLOCK_OFFSET_VALID);
-        return bdrv_get_block_status(bs->file->bs, ret >> BDRV_SECTOR_BITS,
-                                     *pnum, pnum, file);
+        ret = bdrv_get_block_status(bs->file->bs, ret >> BDRV_SECTOR_BITS,
+                                    *pnum, pnum, file);
+        goto out;
     }
 
     if (ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO)) {
@@ -1587,6 +1585,8 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
         }
     }
 
+out:
+    bdrv_dec_in_flight(bs);
     return ret;
 }
 
@@ -1652,13 +1652,9 @@ int64_t bdrv_get_block_status_above(BlockDriverState *bs,
         /* Fast-path if already in coroutine context */
         bdrv_get_block_status_above_co_entry(&data);
     } else {
-        AioContext *aio_context = bdrv_get_aio_context(bs);
-
         co = qemu_coroutine_create(bdrv_get_block_status_above_co_entry);
         qemu_coroutine_enter(co, &data);
-        while (!data.done) {
-            aio_poll(aio_context, true);
-        }
+        bdrv_drain(bs);
     }
     return data.ret;
 }
@@ -2459,13 +2455,9 @@ int bdrv_flush(BlockDriverState *bs)
         /* Fast-path if already in coroutine context */
         bdrv_flush_co_entry(&rwco);
     } else {
-        AioContext *aio_context = bdrv_get_aio_context(bs);
-
         co = qemu_coroutine_create(bdrv_flush_co_entry);
         qemu_coroutine_enter(co, &rwco);
-        while (rwco.ret == NOT_DONE) {
-            aio_poll(aio_context, true);
-        }
+        bdrv_drain(bs);
     }
 
     return rwco.ret;
@@ -2582,13 +2574,9 @@ int bdrv_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors)
         /* Fast-path if already in coroutine context */
         bdrv_discard_co_entry(&rwco);
     } else {
-        AioContext *aio_context = bdrv_get_aio_context(bs);
-
         co = qemu_coroutine_create(bdrv_discard_co_entry);
         qemu_coroutine_enter(co, &rwco);
-        while (rwco.ret == NOT_DONE) {
-            aio_poll(aio_context, true);
-        }
+        bdrv_drain(bs);
     }
 
     return rwco.ret;
@@ -2663,11 +2651,8 @@ int bdrv_ioctl(BlockDriverState *bs, unsigned long int req, void *buf)
         bdrv_co_ioctl_entry(&data);
     } else {
         Coroutine *co = qemu_coroutine_create(bdrv_co_ioctl_entry);
-
         qemu_coroutine_enter(co, &data);
-        while (data.ret == -EINPROGRESS) {
-            aio_poll(bdrv_get_aio_context(bs), true);
-        }
+        bdrv_drain(bs);
     }
     return data.ret;
 }
diff --git a/block/qed-table.c b/block/qed-table.c
index 802945f..3a8566f 100644
--- a/block/qed-table.c
+++ b/block/qed-table.c
@@ -173,9 +173,7 @@ int qed_read_l1_table_sync(BDRVQEDState *s)
 
     qed_read_table(s, s->header.l1_table_offset,
                    s->l1_table, qed_sync_cb, &ret);
-    while (ret == -EINPROGRESS) {
-        aio_poll(bdrv_get_aio_context(s->bs), true);
-    }
+    bdrv_drain(s->bs);
 
     return ret;
 }
@@ -194,9 +192,7 @@ int qed_write_l1_table_sync(BDRVQEDState *s, unsigned int index,
     int ret = -EINPROGRESS;
 
     qed_write_l1_table(s, index, n, qed_sync_cb, &ret);
-    while (ret == -EINPROGRESS) {
-        aio_poll(bdrv_get_aio_context(s->bs), true);
-    }
+    bdrv_drain(s->bs);
 
     return ret;
 }
@@ -267,9 +263,7 @@ int qed_read_l2_table_sync(BDRVQEDState *s, QEDRequest *request, uint64_t offset
     int ret = -EINPROGRESS;
 
     qed_read_l2_table(s, request, offset, qed_sync_cb, &ret);
-    while (ret == -EINPROGRESS) {
-        aio_poll(bdrv_get_aio_context(s->bs), true);
-    }
+    bdrv_drain(s->bs);
 
     return ret;
 }
@@ -289,9 +283,7 @@ int qed_write_l2_table_sync(BDRVQEDState *s, QEDRequest *request,
     int ret = -EINPROGRESS;
 
     qed_write_l2_table(s, request, index, n, flush, qed_sync_cb, &ret);
-    while (ret == -EINPROGRESS) {
-        aio_poll(bdrv_get_aio_context(s->bs), true);
-    }
+    bdrv_drain(s->bs);
 
     return ret;
 }
diff --git a/tests/qemu-iotests/060 b/tests/qemu-iotests/060
index c81319c..dffe35a 100755
--- a/tests/qemu-iotests/060
+++ b/tests/qemu-iotests/060
@@ -164,8 +164,12 @@ echo "open -o file.driver=blkdebug $TEST_IMG
 break cow_read 0
 aio_write 0k 1k
 wait_break 0
-write 64k 64k
-resume 0" | $QEMU_IO | _filter_qemu_io
+break pwritev_done 1
+aio_write 64k 64k
+wait_break 1
+resume 1
+resume 0
+aio_flush" | $QEMU_IO | _filter_qemu_io
 
 echo
 echo "=== Testing unallocated image header ==="
diff --git a/tests/qemu-iotests/060.out b/tests/qemu-iotests/060.out
index 5d40206..a0a9a11 100644
--- a/tests/qemu-iotests/060.out
+++ b/tests/qemu-iotests/060.out
@@ -105,7 +105,9 @@ discard 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 qcow2: Marking image as corrupt: Preventing invalid write on metadata (overlaps with active L2 table); further corruption events will be suppressed
 blkdebug: Suspended request '0'
-write failed: Input/output error
+blkdebug: Suspended request '1'
+blkdebug: Resuming request '1'
+aio_write failed: Input/output error
 blkdebug: Resuming request '0'
 aio_write failed: No medium found
 
-- 
1.8.3.1


* [Qemu-devel] [PATCH v2 10/16] nfs: replace aio_poll with bdrv_drain
  2016-03-16 14:16 [Qemu-devel] [PATCH v2 00/16] AioContext fine-grained locking, part 1 of 3, including bdrv_drain rewrite Paolo Bonzini
                   ` (8 preceding siblings ...)
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 09/16] block: wait for all pending I/O when doing synchronous requests Paolo Bonzini
@ 2016-03-16 14:16 ` Paolo Bonzini
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 11/16] sheepdog: disable dataplane Paolo Bonzini
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-03-16 14:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, stefanha

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/nfs.c | 50 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 30 insertions(+), 20 deletions(-)

diff --git a/block/nfs.c b/block/nfs.c
index 7220e89..5cebb83 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -48,6 +48,7 @@ typedef struct NFSClient {
 } NFSClient;
 
 typedef struct NFSRPC {
+    BlockDriverState *bs;
     int ret;
     int complete;
     QEMUIOVector *iov;
@@ -87,11 +88,12 @@ static void nfs_process_write(void *arg)
     nfs_set_events(client);
 }
 
-static void nfs_co_init_task(NFSClient *client, NFSRPC *task)
+static void nfs_co_init_task(BlockDriverState *bs, NFSRPC *task)
 {
     *task = (NFSRPC) {
         .co             = qemu_coroutine_self(),
-        .client         = client,
+        .bs             = bs,
+        .client         = bs->opaque,
     };
 }
 
@@ -109,6 +111,7 @@ nfs_co_generic_cb(int ret, struct nfs_context *nfs, void *data,
 {
     NFSRPC *task = private_data;
     task->ret = ret;
+    assert(!task->st);
     if (task->ret > 0 && task->iov) {
         if (task->ret <= task->iov->size) {
             qemu_iovec_from_buf(task->iov, 0, data, task->ret);
@@ -116,19 +119,12 @@ nfs_co_generic_cb(int ret, struct nfs_context *nfs, void *data,
             task->ret = -EIO;
         }
     }
-    if (task->ret == 0 && task->st) {
-        memcpy(task->st, data, sizeof(struct stat));
-    }
     if (task->ret < 0) {
         error_report("NFS Error: %s", nfs_get_error(nfs));
     }
-    if (task->co) {
-        task->bh = aio_bh_new(task->client->aio_context,
-                              nfs_co_generic_bh_cb, task);
-        qemu_bh_schedule(task->bh);
-    } else {
-        task->complete = 1;
-    }
+    task->bh = aio_bh_new(task->client->aio_context,
+                          nfs_co_generic_bh_cb, task);
+    qemu_bh_schedule(task->bh);
 }
 
 static int coroutine_fn nfs_co_readv(BlockDriverState *bs,
@@ -138,7 +134,7 @@ static int coroutine_fn nfs_co_readv(BlockDriverState *bs,
     NFSClient *client = bs->opaque;
     NFSRPC task;
 
-    nfs_co_init_task(client, &task);
+    nfs_co_init_task(bs, &task);
     task.iov = iov;
 
     if (nfs_pread_async(client->context, client->fh,
@@ -173,7 +169,7 @@ static int coroutine_fn nfs_co_writev(BlockDriverState *bs,
     NFSRPC task;
     char *buf = NULL;
 
-    nfs_co_init_task(client, &task);
+    nfs_co_init_task(bs, &task);
 
     buf = g_try_malloc(nb_sectors * BDRV_SECTOR_SIZE);
     if (nb_sectors && buf == NULL) {
@@ -209,7 +205,7 @@ static int coroutine_fn nfs_co_flush(BlockDriverState *bs)
     NFSClient *client = bs->opaque;
     NFSRPC task;
 
-    nfs_co_init_task(client, &task);
+    nfs_co_init_task(bs, &task);
 
     if (nfs_fsync_async(client->context, client->fh, nfs_co_generic_cb,
                         &task) != 0) {
@@ -469,6 +465,21 @@ static int nfs_has_zero_init(BlockDriverState *bs)
     return client->has_zero_init;
 }
 
+static void
+nfs_get_allocated_file_size_cb(int ret, struct nfs_context *nfs, void *data,
+                               void *private_data)
+{
+    NFSRPC *task = private_data;
+    task->ret = ret;
+    if (task->ret == 0) {
+        memcpy(task->st, data, sizeof(struct stat));
+    }
+    if (task->ret < 0) {
+        error_report("NFS Error: %s", nfs_get_error(nfs));
+    }
+    bdrv_dec_in_flight(task->bs);
+}
+
 static int64_t nfs_get_allocated_file_size(BlockDriverState *bs)
 {
     NFSClient *client = bs->opaque;
@@ -480,16 +491,15 @@ static int64_t nfs_get_allocated_file_size(BlockDriverState *bs)
         return client->st_blocks * 512;
     }
 
+    bdrv_inc_in_flight(bs);
+    task.bs = bs;
     task.st = &st;
-    if (nfs_fstat_async(client->context, client->fh, nfs_co_generic_cb,
+    if (nfs_fstat_async(client->context, client->fh, nfs_get_allocated_file_size_cb,
                         &task) != 0) {
         return -ENOMEM;
     }
 
-    while (!task.complete) {
-        nfs_set_events(client);
-        aio_poll(client->aio_context, true);
-    }
+    bdrv_drain(bs);
 
     return (task.ret < 0 ? task.ret : st.st_blocks * 512);
 }
-- 
1.8.3.1


* [Qemu-devel] [PATCH v2 11/16] sheepdog: disable dataplane
  2016-03-16 14:16 [Qemu-devel] [PATCH v2 00/16] AioContext fine-grained locking, part 1 of 3, including bdrv_drain rewrite Paolo Bonzini
                   ` (9 preceding siblings ...)
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 10/16] nfs: replace aio_poll with bdrv_drain Paolo Bonzini
@ 2016-03-16 14:16 ` Paolo Bonzini
  2016-03-23 10:45   ` Kevin Wolf
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 12/16] aio: introduce aio_context_in_iothread Paolo Bonzini
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2016-03-16 14:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, stefanha

sheepdog has some calls to aio_poll that are hard to eliminate, for
example in sd_snapshot_goto's call to do_req.  Since I have no good
means to test sheepdog, disable dataplane altogether for this
driver.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/sheepdog.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index a6e98a5..8ced3e5 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -364,6 +364,7 @@ struct SheepdogAIOCB {
 typedef struct BDRVSheepdogState {
     BlockDriverState *bs;
     AioContext *aio_context;
+    Error *blocker;
 
     SheepdogInode inode;
 
@@ -1422,6 +1423,21 @@ static int sd_open(BlockDriverState *bs, QDict *options, int flags,
     Error *local_err = NULL;
     const char *filename;
 
+    /* sd_snapshot_goto does blocking operations that call aio_poll
+     * (through do_req).  This can cause races with iothread:
+     *
+     *       main thread                       I/O thread
+     *       -----------------                 ------------------
+     *       while(srco.finished == false)
+     *                                         aio_poll(..., true)
+     *                                            srco.finished = true
+     *         aio_poll(..., true)
+     *
+     * Now aio_poll potentially blocks forever.
+     */
+    error_setg(&s->blocker, "sheepdog does not support iothreads");
+    bdrv_op_block(bs, BLOCK_OP_TYPE_DATAPLANE, s->blocker);
+
     s->bs = bs;
     s->aio_context = bdrv_get_aio_context(bs);
 
@@ -1962,6 +1978,9 @@ static void sd_close(BlockDriverState *bs)
                        false, NULL, NULL, NULL);
     closesocket(s->fd);
     g_free(s->host_spec);
+
+    bdrv_op_unblock(bs, BLOCK_OP_TYPE_DATAPLANE, s->blocker);
+    error_free(s->blocker);
 }
 
 static int64_t sd_getlength(BlockDriverState *bs)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v2 12/16] aio: introduce aio_context_in_iothread
  2016-03-16 14:16 [Qemu-devel] [PATCH v2 00/16] AioContext fine-grained locking, part 1 of 3, including bdrv_drain rewrite Paolo Bonzini
                   ` (10 preceding siblings ...)
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 11/16] sheepdog: disable dataplane Paolo Bonzini
@ 2016-03-16 14:16 ` Paolo Bonzini
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 13/16] block: only call aio_poll from iothread Paolo Bonzini
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-03-16 14:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, stefanha

This will be used by bdrv_drain (and indirectly by the synchronous I/O
helpers) to choose between aio_poll and QemuEvent.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/block/aio.h | 7 +++++++
 iothread.c          | 9 +++++++++
 stubs/Makefile.objs | 1 +
 stubs/iothread.c    | 8 ++++++++
 4 files changed, 25 insertions(+)
 create mode 100644 stubs/iothread.c

diff --git a/include/block/aio.h b/include/block/aio.h
index e086e3b..9434665 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -435,6 +435,13 @@ static inline bool aio_node_check(AioContext *ctx, bool is_external)
 }
 
 /**
+ * @ctx: the aio context
+ *
+ * Return whether we are running in the I/O thread that manages @ctx.
+ */
+bool aio_context_in_iothread(AioContext *ctx);
+
+/**
  * aio_context_setup:
  * @ctx: the aio context
  *
diff --git a/iothread.c b/iothread.c
index f183d38..8d40bb0 100644
--- a/iothread.c
+++ b/iothread.c
@@ -20,6 +20,7 @@
 #include "qmp-commands.h"
 #include "qemu/error-report.h"
 #include "qemu/rcu.h"
+#include "qemu/main-loop.h"
 
 typedef ObjectClass IOThreadClass;
 
@@ -28,6 +29,13 @@ typedef ObjectClass IOThreadClass;
 #define IOTHREAD_CLASS(klass) \
    OBJECT_CLASS_CHECK(IOThreadClass, klass, TYPE_IOTHREAD)
 
+static __thread IOThread *my_iothread;
+
+bool aio_context_in_iothread(AioContext *ctx)
+{
+    return ctx == (my_iothread ? my_iothread->ctx : qemu_get_aio_context());
+}
+
 static void *iothread_run(void *opaque)
 {
     IOThread *iothread = opaque;
@@ -35,6 +43,7 @@ static void *iothread_run(void *opaque)
 
     rcu_register_thread();
 
+    my_iothread = iothread;
     qemu_mutex_lock(&iothread->init_done_lock);
     iothread->thread_id = qemu_get_thread_id();
     qemu_cond_signal(&iothread->init_done_cond);
diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index e922de9..187cee1 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -13,6 +13,7 @@ stub-obj-y += gdbstub.o
 stub-obj-y += get-fd.o
 stub-obj-y += get-next-serial.o
 stub-obj-y += get-vm-name.o
+stub-obj-y += iothread.o
 stub-obj-y += iothread-lock.o
 stub-obj-y += is-daemonized.o
 stub-obj-y += machine-init-done.o
diff --git a/stubs/iothread.c b/stubs/iothread.c
new file mode 100644
index 0000000..6c02323
--- /dev/null
+++ b/stubs/iothread.c
@@ -0,0 +1,8 @@
+#include "qemu/osdep.h"
+#include "block/aio.h"
+#include "qemu/main-loop.h"
+
+bool aio_context_in_iothread(AioContext *ctx)
+{
+    return ctx == qemu_get_aio_context();
+}
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v2 13/16] block: only call aio_poll from iothread
  2016-03-16 14:16 [Qemu-devel] [PATCH v2 00/16] AioContext fine-grained locking, part 1 of 3, including bdrv_drain rewrite Paolo Bonzini
                   ` (11 preceding siblings ...)
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 12/16] aio: introduce aio_context_in_iothread Paolo Bonzini
@ 2016-03-16 14:16 ` Paolo Bonzini
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 14/16] iothread: release AioContext around aio_poll Paolo Bonzini
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-03-16 14:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, stefanha

aio_poll is not thread safe; for example bdrv_drain can hang if
the last in-flight I/O operation is completed in the I/O thread after
the main thread has checked bs->in_flight.

The bug remains latent as long as every aio_poll call happens within
aio_context_acquire/aio_context_release, but this will change soon.

To fix this, if bdrv_drain is called from outside the I/O thread handle
it internally in the BDS, without involving AioContext and aio_poll.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: Remove misleading comment [Fam]

 block.c                   |  2 ++
 block/io.c                | 18 +++++++++++++++---
 include/block/block_int.h |  5 ++++-
 3 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index 47f2367..2f13bfd 100644
--- a/block.c
+++ b/block.c
@@ -247,6 +247,7 @@ BlockDriverState *bdrv_new(void)
     qemu_co_queue_init(&bs->throttled_reqs[1]);
     bs->refcnt = 1;
     bs->aio_context = qemu_get_aio_context();
+    qemu_event_init(&bs->in_flight_event, true);
 
     QTAILQ_INSERT_TAIL(&all_bdrv_states, bs, bs_list);
 
@@ -2385,6 +2386,7 @@ static void bdrv_delete(BlockDriverState *bs)
     bdrv_make_anon(bs);
 
     QTAILQ_REMOVE(&all_bdrv_states, bs, bs_list);
+    qemu_event_destroy(&bs->in_flight_event);
 
     g_free(bs);
 }
diff --git a/block/io.c b/block/io.c
index 0a99131..c66b037 100644
--- a/block/io.c
+++ b/block/io.c
@@ -239,11 +239,21 @@ bool bdrv_requests_pending(BlockDriverState *bs)
 
 static bool bdrv_drain_io_recurse(BlockDriverState *bs)
 {
-    BdrvChild *child;
+    AioContext *ctx = bdrv_get_aio_context(bs);
     bool waited = false;
+    BdrvChild *child;
 
     while (atomic_read(&bs->in_flight) > 0) {
-        aio_poll(bdrv_get_aio_context(bs), true);
+        if (aio_context_in_iothread(ctx)) {
+            aio_poll(ctx, true);
+        } else {
+            qemu_event_reset(&bs->in_flight_event);
+            if (atomic_read(&bs->in_flight) > 0) {
+                aio_context_release(ctx);
+                qemu_event_wait(&bs->in_flight_event);
+                aio_context_acquire(ctx);
+            }
+        }
         waited = true;
     }
 
@@ -455,7 +465,9 @@ void bdrv_inc_in_flight(BlockDriverState *bs)
 
 void bdrv_dec_in_flight(BlockDriverState *bs)
 {
-    atomic_dec(&bs->in_flight);
+    if (atomic_fetch_dec(&bs->in_flight) == 1) {
+        qemu_event_set(&bs->in_flight_event);
+    }
 }
 
 static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
diff --git a/include/block/block_int.h b/include/block/block_int.h
index b0311ac..489aa3c 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -404,9 +404,12 @@ struct BlockDriverState {
     /* Callback before write request is processed */
     NotifierWithReturnList before_write_notifiers;
 
-    /* number of in-flight requests; overall and serialising */
+    /* number of in-flight requests; overall and serialising.
+     * in_flight_event is set when in_flight becomes 0.
+     */
     unsigned int in_flight;
     unsigned int serialising_in_flight;
+    QemuEvent in_flight_event;
 
     /* I/O throttling.
      * throttle_state tells us if this BDS has I/O limits configured.
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v2 14/16] iothread: release AioContext around aio_poll
  2016-03-16 14:16 [Qemu-devel] [PATCH v2 00/16] AioContext fine-grained locking, part 1 of 3, including bdrv_drain rewrite Paolo Bonzini
                   ` (12 preceding siblings ...)
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 13/16] block: only call aio_poll from iothread Paolo Bonzini
@ 2016-03-16 14:16 ` Paolo Bonzini
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 15/16] qemu-thread: introduce QemuRecMutex Paolo Bonzini
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-03-16 14:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, stefanha

This is the first step towards having fine-grained critical sections in
dataplane threads, which will resolve lock ordering problems between
address_space_* functions (which need the BQL when doing MMIO, even
after we complete RCU-based dispatch) and the AioContext.

Because AioContext does not use contention callbacks anymore, the
unit test has to be changed.

Previously applied as a0710f7995f914e3044e5899bd8ff6c43c62f916 and
then reverted.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 async.c                     | 22 +++-------------------
 docs/multiple-iothreads.txt | 40 +++++++++++++++++++++++-----------------
 include/block/aio.h         |  3 ---
 iothread.c                  | 11 ++---------
 tests/test-aio.c            | 22 ++++++++++++++--------
 5 files changed, 42 insertions(+), 56 deletions(-)

diff --git a/async.c b/async.c
index d4dd2cc..0084d57 100644
--- a/async.c
+++ b/async.c
@@ -85,8 +85,8 @@ int aio_bh_poll(AioContext *ctx)
          * aio_notify again if necessary.
          */
         if (!bh->deleted && atomic_xchg(&bh->scheduled, 0)) {
-            /* Idle BHs and the notify BH don't count as progress */
-            if (!bh->idle && bh != ctx->notify_dummy_bh) {
+            /* Idle BHs don't count as progress */
+            if (!bh->idle) {
                 ret = 1;
             }
             bh->idle = 0;
@@ -238,7 +238,6 @@ aio_ctx_finalize(GSource     *source)
 {
     AioContext *ctx = (AioContext *) source;
 
-    qemu_bh_delete(ctx->notify_dummy_bh);
     thread_pool_free(ctx->thread_pool);
 
     qemu_mutex_lock(&ctx->bh_lock);
@@ -305,19 +304,6 @@ static void aio_timerlist_notify(void *opaque)
     aio_notify(opaque);
 }
 
-static void aio_rfifolock_cb(void *opaque)
-{
-    AioContext *ctx = opaque;
-
-    /* Kick owner thread in case they are blocked in aio_poll() */
-    qemu_bh_schedule(ctx->notify_dummy_bh);
-}
-
-static void notify_dummy_bh(void *opaque)
-{
-    /* Do nothing, we were invoked just to force the event loop to iterate */
-}
-
 static void event_notifier_dummy_cb(EventNotifier *e)
 {
 }
@@ -346,11 +332,9 @@ AioContext *aio_context_new(Error **errp)
                            event_notifier_dummy_cb);
     ctx->thread_pool = NULL;
     qemu_mutex_init(&ctx->bh_lock);
-    rfifolock_init(&ctx->lock, aio_rfifolock_cb, ctx);
+    rfifolock_init(&ctx->lock, NULL, NULL);
     timerlistgroup_init(&ctx->tlg, aio_timerlist_notify, ctx);
 
-    ctx->notify_dummy_bh = aio_bh_new(ctx, notify_dummy_bh, NULL);
-
     return ctx;
 fail:
     g_source_destroy(&ctx->source);
diff --git a/docs/multiple-iothreads.txt b/docs/multiple-iothreads.txt
index 40b8419..0e7cdb2 100644
--- a/docs/multiple-iothreads.txt
+++ b/docs/multiple-iothreads.txt
@@ -105,13 +105,10 @@ a BH in the target AioContext beforehand and then call qemu_bh_schedule().  No
 acquire/release or locking is needed for the qemu_bh_schedule() call.  But be
 sure to acquire the AioContext for aio_bh_new() if necessary.
 
-The relationship between AioContext and the block layer
--------------------------------------------------------
-The AioContext originates from the QEMU block layer because it provides a
-scoped way of running event loop iterations until all work is done.  This
-feature is used to complete all in-flight block I/O requests (see
-bdrv_drain_all()).  Nowadays AioContext is a generic event loop that can be
-used by any QEMU subsystem.
+AioContext and the block layer
+------------------------------
+The AioContext originates from the QEMU block layer, even though nowadays
+AioContext is a generic event loop that can be used by any QEMU subsystem.
 
 The block layer has support for AioContext integrated.  Each BlockDriverState
 is associated with an AioContext using bdrv_set_aio_context() and
@@ -122,13 +119,22 @@ Block layer code must therefore expect to run in an IOThread and avoid using
 old APIs that implicitly use the main loop.  See the "How to program for
 IOThreads" above for information on how to do that.
 
-If main loop code such as a QMP function wishes to access a BlockDriverState it
-must first call aio_context_acquire(bdrv_get_aio_context(bs)) to ensure the
-IOThread does not run in parallel.
-
-Long-running jobs (usually in the form of coroutines) are best scheduled in the
-BlockDriverState's AioContext to avoid the need to acquire/release around each
-bdrv_*() call.  Be aware that there is currently no mechanism to get notified
-when bdrv_set_aio_context() moves this BlockDriverState to a different
-AioContext (see bdrv_detach_aio_context()/bdrv_attach_aio_context()), so you
-may need to add this if you want to support long-running jobs.
+If main loop code such as a QMP function wishes to access a BlockDriverState
+it must first call aio_context_acquire(bdrv_get_aio_context(bs)) to ensure
+that callbacks in the IOThread do not run in parallel.
+
+Code running in the monitor typically needs to ensure that past
+requests from the guest are completed.  When a block device is running
+in an IOThread, the IOThread can also process requests from the guest
+(via ioeventfd).  To achieve both goals, wrap the code between
+bdrv_drained_begin() and bdrv_drained_end(), thus creating a "drained
+section".  The functions must be called between aio_context_acquire()
+and aio_context_release().  You can freely release and re-acquire the
+AioContext within a drained section.
+
+Long-running jobs (usually in the form of coroutines) are best scheduled in
+the BlockDriverState's AioContext to avoid the need to acquire/release around
+each bdrv_*() call.  The functions bdrv_add/remove_aio_context_notifier,
+or alternatively blk_add/remove_aio_context_notifier if you use BlockBackends,
+can be used to get a notification whenever bdrv_set_aio_context() moves a
+BlockDriverState to a different AioContext.
diff --git a/include/block/aio.h b/include/block/aio.h
index 9434665..ba1ee81 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -114,9 +114,6 @@ struct AioContext {
     bool notified;
     EventNotifier notifier;
 
-    /* Scheduling this BH forces the event loop it iterate */
-    QEMUBH *notify_dummy_bh;
-
     /* Thread pool for performing work and receiving completion callbacks */
     struct ThreadPool *thread_pool;
 
diff --git a/iothread.c b/iothread.c
index 8d40bb0..f66ec95 100644
--- a/iothread.c
+++ b/iothread.c
@@ -39,7 +39,6 @@ bool aio_context_in_iothread(AioContext *ctx)
 static void *iothread_run(void *opaque)
 {
     IOThread *iothread = opaque;
-    bool blocking;
 
     rcu_register_thread();
 
@@ -49,14 +48,8 @@ static void *iothread_run(void *opaque)
     qemu_cond_signal(&iothread->init_done_cond);
     qemu_mutex_unlock(&iothread->init_done_lock);
 
-    while (!iothread->stopping) {
-        aio_context_acquire(iothread->ctx);
-        blocking = true;
-        while (!iothread->stopping && aio_poll(iothread->ctx, blocking)) {
-            /* Progress was made, keep going */
-            blocking = false;
-        }
-        aio_context_release(iothread->ctx);
+    while (!atomic_read(&iothread->stopping)) {
+        aio_poll(iothread->ctx, true);
     }
 
     rcu_unregister_thread();
diff --git a/tests/test-aio.c b/tests/test-aio.c
index a109bd0..17a8ce9 100644
--- a/tests/test-aio.c
+++ b/tests/test-aio.c
@@ -100,6 +100,7 @@ static void event_ready_cb(EventNotifier *e)
 
 typedef struct {
     QemuMutex start_lock;
+    EventNotifier notifier;
     bool thread_acquired;
 } AcquireTestData;
 
@@ -111,6 +112,11 @@ static void *test_acquire_thread(void *opaque)
     qemu_mutex_lock(&data->start_lock);
     qemu_mutex_unlock(&data->start_lock);
 
+    /* event_notifier_set might be called either before or after
+     * the main thread's call to poll().  The test case's outcome
+     * should be the same in either case.
+     */
+    event_notifier_set(&data->notifier);
     aio_context_acquire(ctx);
     aio_context_release(ctx);
 
@@ -125,20 +131,19 @@ static void set_event_notifier(AioContext *ctx, EventNotifier *notifier,
     aio_set_event_notifier(ctx, notifier, false, handler);
 }
 
-static void dummy_notifier_read(EventNotifier *unused)
+static void dummy_notifier_read(EventNotifier *n)
 {
-    g_assert(false); /* should never be invoked */
+    event_notifier_test_and_clear(n);
 }
 
 static void test_acquire(void)
 {
     QemuThread thread;
-    EventNotifier notifier;
     AcquireTestData data;
 
     /* Dummy event notifier ensures aio_poll() will block */
-    event_notifier_init(&notifier, false);
-    set_event_notifier(ctx, &notifier, dummy_notifier_read);
+    event_notifier_init(&data.notifier, false);
+    set_event_notifier(ctx, &data.notifier, dummy_notifier_read);
     g_assert(!aio_poll(ctx, false)); /* consume aio_notify() */
 
     qemu_mutex_init(&data.start_lock);
@@ -152,12 +157,13 @@ static void test_acquire(void)
     /* Block in aio_poll(), let other thread kick us and acquire context */
     aio_context_acquire(ctx);
     qemu_mutex_unlock(&data.start_lock); /* let the thread run */
-    g_assert(!aio_poll(ctx, true));
+    g_assert(aio_poll(ctx, true));
+    g_assert(!data.thread_acquired);
     aio_context_release(ctx);
 
     qemu_thread_join(&thread);
-    set_event_notifier(ctx, &notifier, NULL);
-    event_notifier_cleanup(&notifier);
+    set_event_notifier(ctx, &data.notifier, NULL);
+    event_notifier_cleanup(&data.notifier);
 
     g_assert(data.thread_acquired);
 }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v2 15/16] qemu-thread: introduce QemuRecMutex
  2016-03-16 14:16 [Qemu-devel] [PATCH v2 00/16] AioContext fine-grained locking, part 1 of 3, including bdrv_drain rewrite Paolo Bonzini
                   ` (13 preceding siblings ...)
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 14/16] iothread: release AioContext around aio_poll Paolo Bonzini
@ 2016-03-16 14:16 ` Paolo Bonzini
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 16/16] aio: convert from RFifoLock to QemuRecMutex Paolo Bonzini
  2016-03-18 16:03 ` [Qemu-devel] [PATCH v2 00/16] AioContext fine-grained locking, part 1 of 3, including bdrv_drain rewrite Stefan Hajnoczi
  16 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-03-16 14:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, stefanha

GRecMutex is new in glib 2.32, so we cannot use it.  Introduce
a recursive mutex in qemu-thread instead; it will replace
RFifoLock.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/qemu/thread-posix.h |  6 ++++++
 include/qemu/thread-win32.h | 10 ++++++++++
 include/qemu/thread.h       |  3 +++
 util/qemu-thread-posix.c    | 13 +++++++++++++
 util/qemu-thread-win32.c    | 25 +++++++++++++++++++++++++
 5 files changed, 57 insertions(+)

diff --git a/include/qemu/thread-posix.h b/include/qemu/thread-posix.h
index eb5c7a1..2fb6b90 100644
--- a/include/qemu/thread-posix.h
+++ b/include/qemu/thread-posix.h
@@ -3,6 +3,12 @@
 #include "pthread.h"
 #include <semaphore.h>
 
+typedef QemuMutex QemuRecMutex;
+#define qemu_rec_mutex_destroy qemu_mutex_destroy
+#define qemu_rec_mutex_lock qemu_mutex_lock
+#define qemu_rec_mutex_trylock qemu_mutex_trylock
+#define qemu_rec_mutex_unlock qemu_mutex_unlock
+
 struct QemuMutex {
     pthread_mutex_t lock;
 };
diff --git a/include/qemu/thread-win32.h b/include/qemu/thread-win32.h
index 385ff5f..92c1969 100644
--- a/include/qemu/thread-win32.h
+++ b/include/qemu/thread-win32.h
@@ -7,6 +7,16 @@ struct QemuMutex {
     LONG owner;
 };
 
+typedef struct QemuRecMutex QemuRecMutex;
+struct QemuRecMutex {
+    CRITICAL_SECTION lock;
+};
+
+void qemu_rec_mutex_destroy(QemuRecMutex *mutex);
+void qemu_rec_mutex_lock(QemuRecMutex *mutex);
+int qemu_rec_mutex_trylock(QemuRecMutex *mutex);
+void qemu_rec_mutex_unlock(QemuRecMutex *mutex);
+
 struct QemuCond {
     LONG waiters, target;
     HANDLE sema;
diff --git a/include/qemu/thread.h b/include/qemu/thread.h
index bdae6df..cb10595 100644
--- a/include/qemu/thread.h
+++ b/include/qemu/thread.h
@@ -23,6 +23,9 @@ void qemu_mutex_lock(QemuMutex *mutex);
 int qemu_mutex_trylock(QemuMutex *mutex);
 void qemu_mutex_unlock(QemuMutex *mutex);
 
+/* Prototypes for other functions are in thread-posix.h/thread-win32.h.  */
+void qemu_rec_mutex_init(QemuRecMutex *mutex);
+
 void qemu_cond_init(QemuCond *cond);
 void qemu_cond_destroy(QemuCond *cond);
 
diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
index 74a3023..1aec83f 100644
--- a/util/qemu-thread-posix.c
+++ b/util/qemu-thread-posix.c
@@ -80,6 +80,19 @@ void qemu_mutex_unlock(QemuMutex *mutex)
         error_exit(err, __func__);
 }
 
+void qemu_rec_mutex_init(QemuRecMutex *mutex)
+{
+    int err;
+    pthread_mutexattr_t attr;
+
+    pthread_mutexattr_init(&attr);
+    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
+    err = pthread_mutex_init(&mutex->lock, &attr);
+    if (err) {
+        error_exit(err, __func__);
+    }
+}
+
 void qemu_cond_init(QemuCond *cond)
 {
     int err;
diff --git a/util/qemu-thread-win32.c b/util/qemu-thread-win32.c
index 98a5ddf..171d83c 100644
--- a/util/qemu-thread-win32.c
+++ b/util/qemu-thread-win32.c
@@ -79,6 +79,31 @@ void qemu_mutex_unlock(QemuMutex *mutex)
     LeaveCriticalSection(&mutex->lock);
 }
 
+void qemu_rec_mutex_init(QemuRecMutex *mutex)
+{
+    InitializeCriticalSection(&mutex->lock);
+}
+
+void qemu_rec_mutex_destroy(QemuRecMutex *mutex)
+{
+    DeleteCriticalSection(&mutex->lock);
+}
+
+void qemu_rec_mutex_lock(QemuRecMutex *mutex)
+{
+    EnterCriticalSection(&mutex->lock);
+}
+
+int qemu_rec_mutex_trylock(QemuRecMutex *mutex)
+{
+    return !TryEnterCriticalSection(&mutex->lock);
+}
+
+void qemu_rec_mutex_unlock(QemuRecMutex *mutex)
+{
+    LeaveCriticalSection(&mutex->lock);
+}
+
 void qemu_cond_init(QemuCond *cond)
 {
     memset(cond, 0, sizeof(*cond));
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v2 16/16] aio: convert from RFifoLock to QemuRecMutex
  2016-03-16 14:16 [Qemu-devel] [PATCH v2 00/16] AioContext fine-grained locking, part 1 of 3, including bdrv_drain rewrite Paolo Bonzini
                   ` (14 preceding siblings ...)
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 15/16] qemu-thread: introduce QemuRecMutex Paolo Bonzini
@ 2016-03-16 14:16 ` Paolo Bonzini
  2016-03-18 16:03 ` [Qemu-devel] [PATCH v2 00/16] AioContext fine-grained locking, part 1 of 3, including bdrv_drain rewrite Stefan Hajnoczi
  16 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-03-16 14:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, stefanha

It is simpler and a bit faster, and QEMU does not need the contention
callbacks (and thus the fairness) anymore.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 async.c                  |  8 ++---
 include/block/aio.h      |  3 +-
 include/qemu/rfifolock.h | 54 ----------------------------
 tests/.gitignore         |  1 -
 tests/Makefile           |  2 --
 tests/test-rfifolock.c   | 92 ------------------------------------------------
 util/Makefile.objs       |  1 -
 util/rfifolock.c         | 78 ----------------------------------------
 8 files changed, 5 insertions(+), 234 deletions(-)
 delete mode 100644 include/qemu/rfifolock.h
 delete mode 100644 tests/test-rfifolock.c
 delete mode 100644 util/rfifolock.c

diff --git a/async.c b/async.c
index 0084d57..c9f2fc2 100644
--- a/async.c
+++ b/async.c
@@ -254,7 +254,7 @@ aio_ctx_finalize(GSource     *source)
 
     aio_set_event_notifier(ctx, &ctx->notifier, false, NULL);
     event_notifier_cleanup(&ctx->notifier);
-    rfifolock_destroy(&ctx->lock);
+    qemu_rec_mutex_destroy(&ctx->lock);
     qemu_mutex_destroy(&ctx->bh_lock);
     timerlistgroup_deinit(&ctx->tlg);
 }
@@ -332,7 +332,7 @@ AioContext *aio_context_new(Error **errp)
                            event_notifier_dummy_cb);
     ctx->thread_pool = NULL;
     qemu_mutex_init(&ctx->bh_lock);
-    rfifolock_init(&ctx->lock, NULL, NULL);
+    qemu_rec_mutex_init(&ctx->lock);
     timerlistgroup_init(&ctx->tlg, aio_timerlist_notify, ctx);
 
     return ctx;
@@ -353,10 +353,10 @@ void aio_context_unref(AioContext *ctx)
 
 void aio_context_acquire(AioContext *ctx)
 {
-    rfifolock_lock(&ctx->lock);
+    qemu_rec_mutex_lock(&ctx->lock);
 }
 
 void aio_context_release(AioContext *ctx)
 {
-    rfifolock_unlock(&ctx->lock);
+    qemu_rec_mutex_unlock(&ctx->lock);
 }
diff --git a/include/block/aio.h b/include/block/aio.h
index ba1ee81..d9ff265 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -19,7 +19,6 @@
 #include "qemu/queue.h"
 #include "qemu/event_notifier.h"
 #include "qemu/thread.h"
-#include "qemu/rfifolock.h"
 #include "qemu/timer.h"
 
 typedef struct BlockAIOCB BlockAIOCB;
@@ -52,7 +51,7 @@ struct AioContext {
     GSource source;
 
     /* Protects all fields from multi-threaded access */
-    RFifoLock lock;
+    QemuRecMutex lock;
 
     /* The list of registered AIO handlers */
     QLIST_HEAD(, AioHandler) aio_handlers;
diff --git a/include/qemu/rfifolock.h b/include/qemu/rfifolock.h
deleted file mode 100644
index b23ab53..0000000
--- a/include/qemu/rfifolock.h
+++ /dev/null
@@ -1,54 +0,0 @@
-/*
- * Recursive FIFO lock
- *
- * Copyright Red Hat, Inc. 2013
- *
- * Authors:
- *  Stefan Hajnoczi   <stefanha@redhat.com>
- *
- * This work is licensed under the terms of the GNU GPL, version 2 or later.
- * See the COPYING file in the top-level directory.
- *
- */
-
-#ifndef QEMU_RFIFOLOCK_H
-#define QEMU_RFIFOLOCK_H
-
-#include "qemu/thread.h"
-
-/* Recursive FIFO lock
- *
- * This lock provides more features than a plain mutex:
- *
- * 1. Fairness - enforces FIFO order.
- * 2. Nesting - can be taken recursively.
- * 3. Contention callback - optional, called when thread must wait.
- *
- * The recursive FIFO lock is heavyweight so prefer other synchronization
- * primitives if you do not need its features.
- */
-typedef struct {
-    QemuMutex lock;             /* protects all fields */
-
-    /* FIFO order */
-    unsigned int head;          /* active ticket number */
-    unsigned int tail;          /* waiting ticket number */
-    QemuCond cond;              /* used to wait for our ticket number */
-
-    /* Nesting */
-    QemuThread owner_thread;    /* thread that currently has ownership */
-    unsigned int nesting;       /* amount of nesting levels */
-
-    /* Contention callback */
-    void (*cb)(void *);         /* called when thread must wait, with ->lock
-                                 * held so it may not recursively lock/unlock
-                                 */
-    void *cb_opaque;
-} RFifoLock;
-
-void rfifolock_init(RFifoLock *r, void (*cb)(void *), void *opaque);
-void rfifolock_destroy(RFifoLock *r);
-void rfifolock_lock(RFifoLock *r);
-void rfifolock_unlock(RFifoLock *r);
-
-#endif /* QEMU_RFIFOLOCK_H */
diff --git a/tests/.gitignore b/tests/.gitignore
index 787c95c..b3da3f1 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -51,7 +51,6 @@ test-qmp-introspect.[ch]
 test-qmp-marshal.c
 test-qmp-output-visitor
 test-rcu-list
-test-rfifolock
 test-string-input-visitor
 test-string-output-visitor
 test-thread-pool
diff --git a/tests/Makefile b/tests/Makefile
index cd4bbd4..8a4e42f 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -39,7 +39,6 @@ check-unit-y += tests/test-visitor-serialization$(EXESUF)
 check-unit-y += tests/test-iov$(EXESUF)
 gcov-files-test-iov-y = util/iov.c
 check-unit-y += tests/test-aio$(EXESUF)
-check-unit-$(CONFIG_POSIX) += tests/test-rfifolock$(EXESUF)
 check-unit-y += tests/test-throttle$(EXESUF)
 gcov-files-test-aio-$(CONFIG_WIN32) = aio-win32.c
 gcov-files-test-aio-$(CONFIG_POSIX) = aio-posix.c
@@ -404,7 +403,6 @@ tests/check-qom-interface$(EXESUF): tests/check-qom-interface.o $(test-qom-obj-y
 tests/check-qom-proplist$(EXESUF): tests/check-qom-proplist.o $(test-qom-obj-y)
 tests/test-coroutine$(EXESUF): tests/test-coroutine.o $(test-block-obj-y)
 tests/test-aio$(EXESUF): tests/test-aio.o $(test-block-obj-y)
-tests/test-rfifolock$(EXESUF): tests/test-rfifolock.o $(test-util-obj-y)
 tests/test-throttle$(EXESUF): tests/test-throttle.o $(test-block-obj-y)
 tests/test-blockjob-txn$(EXESUF): tests/test-blockjob-txn.o $(test-block-obj-y) $(test-util-obj-y)
 tests/test-thread-pool$(EXESUF): tests/test-thread-pool.o $(test-block-obj-y)
diff --git a/tests/test-rfifolock.c b/tests/test-rfifolock.c
deleted file mode 100644
index 9a3cb24..0000000
--- a/tests/test-rfifolock.c
+++ /dev/null
@@ -1,92 +0,0 @@
-/*
- * RFifoLock tests
- *
- * Copyright Red Hat, Inc. 2013
- *
- * Authors:
- *  Stefan Hajnoczi    <stefanha@redhat.com>
- *
- * This work is licensed under the terms of the GNU LGPL, version 2 or later.
- * See the COPYING.LIB file in the top-level directory.
- */
-
-#include "qemu/osdep.h"
-#include <glib.h>
-#include "qemu-common.h"
-#include "qemu/rfifolock.h"
-
-static void test_nesting(void)
-{
-    RFifoLock lock;
-
-    /* Trivial test, ensure the lock is recursive */
-    rfifolock_init(&lock, NULL, NULL);
-    rfifolock_lock(&lock);
-    rfifolock_lock(&lock);
-    rfifolock_lock(&lock);
-    rfifolock_unlock(&lock);
-    rfifolock_unlock(&lock);
-    rfifolock_unlock(&lock);
-    rfifolock_destroy(&lock);
-}
-
-typedef struct {
-    RFifoLock lock;
-    int fd[2];
-} CallbackTestData;
-
-static void rfifolock_cb(void *opaque)
-{
-    CallbackTestData *data = opaque;
-    int ret;
-    char c = 0;
-
-    ret = write(data->fd[1], &c, sizeof(c));
-    g_assert(ret == 1);
-}
-
-static void *callback_thread(void *opaque)
-{
-    CallbackTestData *data = opaque;
-
-    /* The other thread holds the lock so the contention callback will be
-     * invoked...
-     */
-    rfifolock_lock(&data->lock);
-    rfifolock_unlock(&data->lock);
-    return NULL;
-}
-
-static void test_callback(void)
-{
-    CallbackTestData data;
-    QemuThread thread;
-    int ret;
-    char c;
-
-    rfifolock_init(&data.lock, rfifolock_cb, &data);
-    ret = qemu_pipe(data.fd);
-    g_assert(ret == 0);
-
-    /* Hold lock but allow the callback to kick us by writing to the pipe */
-    rfifolock_lock(&data.lock);
-    qemu_thread_create(&thread, "callback_thread",
-                       callback_thread, &data, QEMU_THREAD_JOINABLE);
-    ret = read(data.fd[0], &c, sizeof(c));
-    g_assert(ret == 1);
-    rfifolock_unlock(&data.lock);
-    /* If we got here then the callback was invoked, as expected */
-
-    qemu_thread_join(&thread);
-    close(data.fd[0]);
-    close(data.fd[1]);
-    rfifolock_destroy(&data.lock);
-}
-
-int main(int argc, char **argv)
-{
-    g_test_init(&argc, &argv, NULL);
-    g_test_add_func("/nesting", test_nesting);
-    g_test_add_func("/callback", test_callback);
-    return g_test_run();
-}
diff --git a/util/Makefile.objs b/util/Makefile.objs
index a8a777e..c0223c6 100644
--- a/util/Makefile.objs
+++ b/util/Makefile.objs
@@ -23,7 +23,6 @@ util-obj-y += crc32c.o
 util-obj-y += throttle.o
 util-obj-y += getauxval.o
 util-obj-y += readline.o
-util-obj-y += rfifolock.o
 util-obj-y += rcu.o
 util-obj-y += qemu-coroutine.o qemu-coroutine-lock.o qemu-coroutine-io.o
 util-obj-y += qemu-coroutine-sleep.o
diff --git a/util/rfifolock.c b/util/rfifolock.c
deleted file mode 100644
index c22f5fe..0000000
--- a/util/rfifolock.c
+++ /dev/null
@@ -1,78 +0,0 @@
-/*
- * Recursive FIFO lock
- *
- * Copyright Red Hat, Inc. 2013
- *
- * Authors:
- *  Stefan Hajnoczi   <stefanha@redhat.com>
- *
- * This work is licensed under the terms of the GNU LGPL, version 2 or later.
- * See the COPYING.LIB file in the top-level directory.
- *
- */
-
-#include "qemu/osdep.h"
-#include "qemu/rfifolock.h"
-
-void rfifolock_init(RFifoLock *r, void (*cb)(void *), void *opaque)
-{
-    qemu_mutex_init(&r->lock);
-    r->head = 0;
-    r->tail = 0;
-    qemu_cond_init(&r->cond);
-    r->nesting = 0;
-    r->cb = cb;
-    r->cb_opaque = opaque;
-}
-
-void rfifolock_destroy(RFifoLock *r)
-{
-    qemu_cond_destroy(&r->cond);
-    qemu_mutex_destroy(&r->lock);
-}
-
-/*
- * Theory of operation:
- *
- * In order to ensure FIFO ordering, implement a ticketlock.  Threads acquiring
- * the lock enqueue themselves by incrementing the tail index.  When the lock
- * is unlocked, the head is incremented and waiting threads are notified.
- *
- * Recursive locking does not take a ticket since the head is only incremented
- * when the outermost recursive caller unlocks.
- */
-void rfifolock_lock(RFifoLock *r)
-{
-    qemu_mutex_lock(&r->lock);
-
-    /* Take a ticket */
-    unsigned int ticket = r->tail++;
-
-    if (r->nesting > 0 && qemu_thread_is_self(&r->owner_thread)) {
-        r->tail--; /* put ticket back, we're nesting */
-    } else {
-        while (ticket != r->head) {
-            /* Invoke optional contention callback */
-            if (r->cb) {
-                r->cb(r->cb_opaque);
-            }
-            qemu_cond_wait(&r->cond, &r->lock);
-        }
-    }
-
-    qemu_thread_get_self(&r->owner_thread);
-    r->nesting++;
-    qemu_mutex_unlock(&r->lock);
-}
-
-void rfifolock_unlock(RFifoLock *r)
-{
-    qemu_mutex_lock(&r->lock);
-    assert(r->nesting > 0);
-    assert(qemu_thread_is_self(&r->owner_thread));
-    if (--r->nesting == 0) {
-        r->head++;
-        qemu_cond_broadcast(&r->cond);
-    }
-    qemu_mutex_unlock(&r->lock);
-}
-- 
1.8.3.1

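The series ends by converting the AioContext lock from the RFifoLock deleted above to a QemuRecMutex (patches 15-16). As a minimal sketch of what a recursive mutex provides, the nesting pattern that test_nesting() exercised can be modeled with a plain POSIX recursive mutex; the names here are illustrative, not QEMU's actual QemuRecMutex API. Note what is lost relative to the ticketlock: no FIFO fairness and no contention callback.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>

/* Sketch only: QemuRecMutex on POSIX hosts is essentially a
 * PTHREAD_MUTEX_RECURSIVE mutex.  Unlike RFifoLock there is no
 * ticket-based FIFO ordering and no contention callback. */
typedef struct {
    pthread_mutex_t lock;
} RecMutexSketch;

void rec_mutex_init(RecMutexSketch *m)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&m->lock, &attr);
    pthread_mutexattr_destroy(&attr);
}

/* Same pattern as test_nesting(): three nested lock/unlock pairs.
 * A non-recursive mutex would deadlock on the second lock. */
int rec_mutex_nesting_demo(void)
{
    RecMutexSketch m;

    rec_mutex_init(&m);
    pthread_mutex_lock(&m.lock);
    pthread_mutex_lock(&m.lock);
    pthread_mutex_lock(&m.lock);
    pthread_mutex_unlock(&m.lock);
    pthread_mutex_unlock(&m.lock);
    pthread_mutex_unlock(&m.lock);
    pthread_mutex_destroy(&m.lock);
    return 0;
}
```

The fairness and the contention-callback kick that RFifoLock provided are compensated for elsewhere in the series, by releasing the AioContext around aio_poll (patch 14) rather than kicking the holder out of it.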
^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v2 05/16] mirror: use bottom half to re-enter coroutine
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 05/16] mirror: use bottom half to re-enter coroutine Paolo Bonzini
@ 2016-03-17  2:23   ` Fam Zheng
  0 siblings, 0 replies; 26+ messages in thread
From: Fam Zheng @ 2016-03-17  2:23 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, stefanha

On Wed, 03/16 15:16, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>         v1->v2: use aio_bh_new() [Fam]
> 
>  block/mirror.c | 24 ++++++++++++++++++++----
>  1 file changed, 20 insertions(+), 4 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 9635fa8..2c7874d 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -72,6 +72,7 @@ typedef struct MirrorOp {
>      QEMUIOVector qiov;
>      int64_t sector_num;
>      int nb_sectors;
> +    QEMUBH *co_enter_bh;
>  } MirrorOp;
>  
>  static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read,
> @@ -87,6 +88,18 @@ static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read,
>      }
>  }
>  
> +static void mirror_bh_cb(void *opaque)
> +{
> +    MirrorOp *op = opaque;
> +    MirrorBlockJob *s = op->s;
> +
> +    qemu_bh_delete(op->co_enter_bh);
> +    g_free(op);
> +    if (s->waiting_for_io) {
> +        qemu_coroutine_enter(s->common.co, NULL);
> +    }
> +}
> +
>  static void mirror_iteration_done(MirrorOp *op, int ret)
>  {
>      MirrorBlockJob *s = op->s;
> @@ -117,11 +130,14 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
>      }
>  
>      qemu_iovec_destroy(&op->qiov);
> -    g_free(op);
>  
> -    if (s->waiting_for_io) {
> -        qemu_coroutine_enter(s->common.co, NULL);
> -    }
> +    /* The I/O operation is not finished until the callback returns.
> +     * If we call qemu_coroutine_enter here, there is the possibility
> +     * of a deadlock when the coroutine calls bdrv_drained_begin.
> +     */
> +    op->co_enter_bh = aio_bh_new(bdrv_get_aio_context(s->target),
> +                                 mirror_bh_cb, op);
> +    qemu_bh_schedule(op->co_enter_bh);

I think we can create the BH only when s->waiting_for_io is true,
but the else branch is less important, so:

Reviewed-by: Fam Zheng <famz@redhat.com>

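The pattern under discussion, deferring the coroutine re-entry from the completion callback to a bottom half that the event loop runs only after the callback has returned, can be sketched with a toy BH queue. The names below are illustrative, not QEMU's real aio_bh_new()/qemu_bh_schedule() API.

```c
#include <assert.h>
#include <stddef.h>

typedef void BHFunc(void *opaque);

typedef struct BH {
    BHFunc *cb;
    void *opaque;
    struct BH *next;
} BH;

static BH *bh_list;

static void bh_schedule(BH *bh)
{
    bh->next = bh_list;
    bh_list = bh;
}

/* The event loop runs scheduled BHs only after the current callback
 * has fully returned, so the I/O operation finishes first. */
static void bh_poll(void)
{
    while (bh_list) {
        BH *bh = bh_list;
        bh_list = bh->next;
        bh->cb(bh->opaque);
    }
}

static int entered;

static void enter_coroutine(void *opaque)
{
    (void)opaque;
    entered = 1;
}

int mirror_bh_demo(void)
{
    static BH op_bh = { enter_coroutine, NULL, NULL };

    /* Inside the completion callback: schedule instead of entering. */
    bh_schedule(&op_bh);
    assert(entered == 0);   /* not re-entered yet: no deadlock risk */
    bh_poll();              /* the main loop runs the BH later */
    return entered;
}
```

This is why the patch moves g_free(op) and the qemu_coroutine_enter into mirror_bh_cb: by the time the BH runs, the callback that completed the request has unwound, so the coroutine can safely call bdrv_drained_begin.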

* Re: [Qemu-devel] [PATCH v2 02/16] block: move restarting of throttled reqs to block/throttle-groups.c
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 02/16] block: move restarting of throttled reqs to block/throttle-groups.c Paolo Bonzini
@ 2016-03-17  2:39   ` Fam Zheng
  0 siblings, 0 replies; 26+ messages in thread
From: Fam Zheng @ 2016-03-17  2:39 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, stefanha

On Wed, 03/16 15:16, Paolo Bonzini wrote:
> We want to remove throttled_reqs from block/io.c.  This is the easy
> part---hide the handling of throttled_reqs during disable/enable of
> throttling within throttle-groups.c.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: Fam Zheng <famz@redhat.com>


* Re: [Qemu-devel] [PATCH v2 03/16] block: introduce bdrv_no_throttling_begin/end
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 03/16] block: introduce bdrv_no_throttling_begin/end Paolo Bonzini
@ 2016-03-17  2:52   ` Fam Zheng
  0 siblings, 0 replies; 26+ messages in thread
From: Fam Zheng @ 2016-03-17  2:52 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, stefanha

On Wed, 03/16 15:16, Paolo Bonzini wrote:
> Extract the handling of throttling from bdrv_flush_io_queue.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: Fam Zheng <famz@redhat.com>


* Re: [Qemu-devel] [PATCH v2 06/16] block: add BDS field to count in-flight requests
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 06/16] block: add BDS field to count in-flight requests Paolo Bonzini
@ 2016-03-17  2:53   ` Fam Zheng
  0 siblings, 0 replies; 26+ messages in thread
From: Fam Zheng @ 2016-03-17  2:53 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, stefanha

On Wed, 03/16 15:16, Paolo Bonzini wrote:
> Unlike tracked_requests, this field also counts throttled requests,
> and remains non-zero if an AIO operation needs a BH to be "really"
> completed.
> 
> With this change, it is no longer necessary to have a dummy
> BdrvTrackedRequest for requests that are never serialising, and
> it is no longer necessary to poll the AioContext once after
> bdrv_requests_pending(bs) returns false.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Assuming this has no change from v1:

Reviewed-by: Fam Zheng <famz@redhat.com>


* Re: [Qemu-devel] [PATCH v2 09/16] block: wait for all pending I/O when doing synchronous requests
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 09/16] block: wait for all pending I/O when doing synchronous requests Paolo Bonzini
@ 2016-03-17  2:55   ` Fam Zheng
  0 siblings, 0 replies; 26+ messages in thread
From: Fam Zheng @ 2016-03-17  2:55 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, stefanha

On Wed, 03/16 15:16, Paolo Bonzini wrote:
> Synchronous I/O should in general happen either in the main thread (e.g.
> for bdrv_open and bdrv_create) or between bdrv_drained_begin and
> bdrv_drained_end.  Therefore, the simplest way to wait for it to finish
> is to wait for _all_ pending I/O to complete.
> 
> In fact, there was one case in bdrv_close where we explicitly drained
> after bdrv_flush; this is now unnecessary.  And we should probably have
> called bdrv_drain_all after calls to bdrv_flush_all, which is now
> unnecessary too.
> 
> This decouples synchronous I/O from aio_poll.  When the request used
> not to be tracked as part of bdrv_drain (e.g. bdrv_co_get_block_status)
> we need to update the in_flight count.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: Fam Zheng <famz@redhat.com>


* Re: [Qemu-devel] [PATCH v2 07/16] block: change drain to look only at one child at a time
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 07/16] block: change drain to look only at one child at a time Paolo Bonzini
@ 2016-03-17  3:10   ` Fam Zheng
  0 siblings, 0 replies; 26+ messages in thread
From: Fam Zheng @ 2016-03-17  3:10 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, stefanha

On Wed, 03/16 15:16, Paolo Bonzini wrote:
> bdrv_requests_pending is checking children to also wait until internal
> requests (such as metadata writes) have completed.  However, checking
> children is in general overkill.  Children requests can be of two kinds:
> 
> - requests caused by an operation on bs, e.g. a bdrv_aio_write to bs
> causing a write to bs->file->bs.  In this case, the parent's in_flight
> count will always be incremented by at least one for every request in
> the child.
> 
> - asynchronous metadata writes or flushes.  Such writes can be started
> even if bs's in_flight count is zero, but not after the .bdrv_drain
> callback has been invoked.
> 
> This patch therefore changes bdrv_drain to finish I/O in the parent
> (after which the parent's in_flight count will remain at zero), call
> the .bdrv_drain callback (after which the parent will not generate I/O
> on the child anymore), and then wait for internal I/O in the children
> to complete.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> 	v1->v2: moved bdrv_drain callback after in_flight is 0
> 	in the parent [from QED drain discussion]

This is nice!

Reviewed-by: Fam Zheng <famz@redhat.com>

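The ordering this patch establishes, drain the parent first, then invoke the .bdrv_drain callback, then wait on the children, can be sketched with a toy model. One child per node and a synchronous "completion" keep it simple; the structure and names are illustrative, not QEMU's actual BlockDriverState.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the drain ordering; not QEMU's real code. */
typedef struct BDS {
    int in_flight;          /* this node's own pending requests */
    struct BDS *child;      /* single child for simplicity */
    int drained_cb_called;  /* stands in for .bdrv_drain */
} BDS;

static void complete_one(BDS *bs)
{
    if (bs->in_flight) {
        bs->in_flight--;
    }
}

static void bdrv_drain_sketch(BDS *bs)
{
    /* 1. Finish the parent's I/O.  Requests it issues to the child
     *    keep the parent's in_flight elevated, so this also covers
     *    child requests caused by operations on the parent. */
    while (bs->in_flight) {
        complete_one(bs);
    }

    /* 2. Only now invoke .bdrv_drain; afterwards the node may not
     *    start new internal (metadata) I/O on its children. */
    bs->drained_cb_called = 1;

    /* 3. Finally wait for internal I/O pending in the children. */
    if (bs->child) {
        bdrv_drain_sketch(bs->child);
    }
}

int drain_demo(void)
{
    BDS child = { 2, NULL, 0 };
    BDS parent = { 3, &child, 0 };

    bdrv_drain_sketch(&parent);
    return parent.in_flight == 0 && child.in_flight == 0 &&
           parent.drained_cb_called && child.drained_cb_called;
}
```

The v1->v2 change is visible in step 2: the callback runs only after the parent's own in_flight count has reached zero, which is what came out of the QED drain discussion.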

* Re: [Qemu-devel] [PATCH v2 00/16] AioContext fine-grained locking, part 1 of 3, including bdrv_drain rewrite
  2016-03-16 14:16 [Qemu-devel] [PATCH v2 00/16] AioContext fine-grained locking, part 1 of 3, including bdrv_drain rewrite Paolo Bonzini
                   ` (15 preceding siblings ...)
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 16/16] aio: convert from RFifoLock to QemuRecMutex Paolo Bonzini
@ 2016-03-18 16:03 ` Stefan Hajnoczi
  16 siblings, 0 replies; 26+ messages in thread
From: Stefan Hajnoczi @ 2016-03-18 16:03 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: famz, qemu-devel

On Wed, Mar 16, 2016 at 03:16:41PM +0100, Paolo Bonzini wrote:
> Now that the current dataplane failure has been isolated to an unwanted
> reentrancy in virtio, I can post v2 of these patches.  Please review,
> as I would like to get part 2 in QEMU 2.6 as well (I plan to send it out
> tomorrow, since I hope there will be no comments on this one and I have
> already tested both of them).
> 
> The changes from v1 are minor and all correspond to Fam's review (I've
> pointed them out in the single patches - patch 2,5,7,9,13).

Please rebase onto Kevin's block branch to handle conflicts around
unthrottled requests.

Thanks,
Stefan



* Re: [Qemu-devel] [PATCH v2 11/16] sheepdog: disable dataplane
  2016-03-16 14:16 ` [Qemu-devel] [PATCH v2 11/16] sheepdog: disable dataplane Paolo Bonzini
@ 2016-03-23 10:45   ` Kevin Wolf
  2016-03-23 11:07     ` Paolo Bonzini
  0 siblings, 1 reply; 26+ messages in thread
From: Kevin Wolf @ 2016-03-23 10:45 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: famz, qemu-devel, stefanha

Am 16.03.2016 um 15:16 hat Paolo Bonzini geschrieben:
> sheepdog has some calls to aio_poll that are hard to eliminate, for
> example in sd_snapshot_goto's call to do_req.  Since I don't have
> means to test sheepdog well, disable dataplane altogether for this
> driver.
> 
> Reviewed-by: Fam Zheng <famz@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  block/sheepdog.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/block/sheepdog.c b/block/sheepdog.c
> index a6e98a5..8ced3e5 100644
> --- a/block/sheepdog.c
> +++ b/block/sheepdog.c
> @@ -364,6 +364,7 @@ struct SheepdogAIOCB {
>  typedef struct BDRVSheepdogState {
>      BlockDriverState *bs;
>      AioContext *aio_context;
> +    Error *blocker;
>  
>      SheepdogInode inode;
>  
> @@ -1422,6 +1423,21 @@ static int sd_open(BlockDriverState *bs, QDict *options, int flags,
>      Error *local_err = NULL;
>      const char *filename;
>  
> +    /* sd_snapshot_goto does blocking operations that call aio_poll
> +     * (through do_req).  This can cause races with iothread:
> +     *
> +     *       main thread                       I/O thread
> +     *       -----------------                 ------------------
> +     *       while(srco.finished == false)
> +     *                                         aio_poll(..., true)
> +     *                                            srco.finished = true
> +     *         aio_poll(..., true)
> +     *
> +     * Now aio_poll potentially blocks forever.
> +     */
> +    error_setg(&s->blocker, "sheepdog does not support iothreads");
> +    bdrv_op_block(bs, BLOCK_OP_TYPE_DATAPLANE, s->blocker);

Our current op blockers are weak, so this doesn't completely rule out
having a sheepdog BDS under a dataplane device.

Actually, did you check that even the most obvious case works? If we
have a format layer on top (which we do by default), I think attaching
to dataplane would still work because only the blockers of the top level
would be considered. We have to be sure about this one.

But there are other, less common cases that could still result in a bad
setup. Essentially, this just blocks enabling dataplane on a device that
has a sheepdog BDS (on top), but it doesn't block opening a sheepdog
backend for a device that has dataplane already enabled. This includes
scenarios with removable media like virtio-scsi CD-ROMs, but also live
snapshots or block jobs with a target image on sheepdog.

I wouldn't feel comfortable about ignoring this second part, but maybe
we could get away with it if we have a plan how to fix it in the long
run. The new op blockers should be able to do that, but I guess it will
be well into the 2.7 development cycle, if not later, before we have
them.

Kevin


* Re: [Qemu-devel] [PATCH v2 11/16] sheepdog: disable dataplane
  2016-03-23 10:45   ` Kevin Wolf
@ 2016-03-23 11:07     ` Paolo Bonzini
  0 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-03-23 11:07 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: famz, qemu-devel, stefanha



On 23/03/2016 11:45, Kevin Wolf wrote:
> I wouldn't feel comfortable about ignoring this second part, but maybe
> we could get away with it if we have a plan how to fix it in the long
> run. The new op blockers should be able to do that, but I guess it will
> be well into the 2.7 development cycle, if not later, before we have
> them.

I have agreed with Stefan to only merge the first four patches of this
series (and even that seems unlikely right now).  I'll keep this in
mind, and probably just try and fix sheepdog.

Paolo


