* [Qemu-devel] [PATCH v4 0/8] bdrv_flush_io_queue removal, shared LinuxAioState
@ 2016-04-07 16:33 Paolo Bonzini
From: Paolo Bonzini @ 2016-04-07 16:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, berto, famz, qemu-block, stefanha

Patch 1 comes from Kevin's series to do BlockBackend throttling.

Patches 2-6 are from my bdrv_drain patches.  They apply on top of Fam's
patch (which will be in 2.6) that introduces bdrv_co_drain.  Patch 4
is new in this version, compared to v3.

Patches 7-8 are new but based on Ming Lei's old submission.
I'm including them here because they apply on top of patches 2-6.

Kevin Wolf (1):
  block: Don't disable I/O throttling on sync requests

Paolo Bonzini (7):
  block: make bdrv_start_throttled_reqs return void
  block: move restarting of throttled reqs to block/throttle-groups.c
  block: extract bdrv_drain_poll/bdrv_co_yield_to_drain from
    bdrv_drain/bdrv_co_drain
  block: introduce bdrv_no_throttling_begin/end
  block: plug whole tree at once, introduce bdrv_io_unplugged_begin/end
  linux-aio: make it more type safe
  linux-aio: share one LinuxAioState within an AioContext

 async.c                            |  23 +++++
 block.c                            |   1 -
 block/block-backend.c              |   6 +-
 block/io.c                         | 168 ++++++++++++++++++++++---------------
 block/linux-aio.c                  |  60 ++++++-------
 block/raw-posix.c                  | 133 ++++-------------------------
 block/raw-win32.c                  |   2 +-
 block/throttle-groups.c            |  18 ++++
 include/block/aio.h                |  13 +++
 include/block/block.h              |   3 +-
 include/block/block_int.h          |  14 +++-
 {block => include/block}/raw-aio.h |  15 ++--
 include/block/throttle-groups.h    |   1 +
 13 files changed, 216 insertions(+), 241 deletions(-)
 rename {block => include/block}/raw-aio.h (80%)

-- 
1.8.3.1


* [Qemu-devel] [PATCH v4 1/8] block: Don't disable I/O throttling on sync requests
  2016-04-07 16:33 [Qemu-devel] [PATCH v4 0/8] bdrv_flush_io_queue removal, shared LinuxAioState Paolo Bonzini
@ 2016-04-07 16:33 ` Paolo Bonzini
From: Paolo Bonzini @ 2016-04-07 16:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, berto, famz, qemu-block, stefanha

From: Kevin Wolf <kwolf@redhat.com>

We had to disable I/O throttling for synchronous requests because timers
did not run in nested event loops when the code was introduced.  This is
no longer the case: throttling works just fine even when using the
synchronous API.

The removed code is in fact dead code since commit a8823a3b ('block: Use
blk_co_pwritev() for blk_write()') because I/O throttling can only be
set on the top layer, but BlockBackend always uses the coroutine
interface now instead of using the sync API emulation in block.c.
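
For context, the synchronous path relies on the caller polling the event
loop until the coroutine finishes, and that nested loop now also runs the
throttle timers.  A hedged sketch of bdrv_prwv_co() as it looks around
this patch (details may differ slightly from the tree):

    if (qemu_in_coroutine()) {
        /* Fast-path if already in coroutine context */
        bdrv_rw_co_entry(&rwco);
    } else {
        AioContext *aio_context = bdrv_get_aio_context(bs);
        Coroutine *co = qemu_coroutine_create(bdrv_rw_co_entry);

        qemu_coroutine_enter(co, &rwco);
        while (rwco.ret == NOT_DONE) {
            /* the nested loop dispatches fd handlers, bottom halves and
             * timers, so a request queued by the throttling code can
             * still be restarted while we wait */
            aio_poll(aio_context, true);
        }
    }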

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <1458660792-3035-2-git-send-email-kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/io.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/block/io.c b/block/io.c
index a7dbf85..a91d862 100644
--- a/block/io.c
+++ b/block/io.c
@@ -608,17 +608,6 @@ static int bdrv_prwv_co(BlockDriverState *bs, int64_t offset,
         .flags = flags,
     };
 
-    /**
-     * In sync call context, when the vcpu is blocked, this throttling timer
-     * will not fire; so the I/O throttling function has to be disabled here
-     * if it has been enabled.
-     */
-    if (bs->io_limits_enabled) {
-        fprintf(stderr, "Disabling I/O throttling on '%s' due "
-                        "to synchronous I/O.\n", bdrv_get_device_name(bs));
-        bdrv_io_limits_disable(bs);
-    }
-
     if (qemu_in_coroutine()) {
         /* Fast-path if already in coroutine context */
         bdrv_rw_co_entry(&rwco);
-- 
1.8.3.1


* [Qemu-devel] [PATCH v4 2/8] block: make bdrv_start_throttled_reqs return void
  2016-04-07 16:33 [Qemu-devel] [PATCH v4 0/8] bdrv_flush_io_queue removal, shared LinuxAioState Paolo Bonzini
@ 2016-04-07 16:33 ` Paolo Bonzini
From: Paolo Bonzini @ 2016-04-07 16:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, berto, famz, qemu-block, stefanha

The return value is unused and I am not sure why it would be useful.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/io.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/block/io.c b/block/io.c
index a91d862..691baa6 100644
--- a/block/io.c
+++ b/block/io.c
@@ -71,10 +71,8 @@ void bdrv_set_io_limits(BlockDriverState *bs,
     }
 }
 
-/* this function drain all the throttled IOs */
-static bool bdrv_start_throttled_reqs(BlockDriverState *bs)
+static void bdrv_start_throttled_reqs(BlockDriverState *bs)
 {
-    bool drained = false;
     bool enabled = bs->io_limits_enabled;
     int i;
 
@@ -82,13 +80,11 @@ static bool bdrv_start_throttled_reqs(BlockDriverState *bs)
 
     for (i = 0; i < 2; i++) {
         while (qemu_co_enter_next(&bs->throttled_reqs[i])) {
-            drained = true;
+            ;
         }
     }
 
     bs->io_limits_enabled = enabled;
-
-    return drained;
 }
 
 void bdrv_io_limits_disable(BlockDriverState *bs)
-- 
1.8.3.1


* [Qemu-devel] [PATCH v4 3/8] block: move restarting of throttled reqs to block/throttle-groups.c
  2016-04-07 16:33 [Qemu-devel] [PATCH v4 0/8] bdrv_flush_io_queue removal, shared LinuxAioState Paolo Bonzini
@ 2016-04-07 16:33 ` Paolo Bonzini
From: Paolo Bonzini @ 2016-04-07 16:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, berto, famz, qemu-block, stefanha

We want to remove throttled_reqs from block/io.c.  This is the easy
part: hide the handling of throttled_reqs during disable/enable of
throttling inside throttle-groups.c.
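
The idiom that moves is small; as the diff below shows, restarting simply
re-enters every coroutine queued on the two throttled_reqs CoQueues
(index 0 for reads, 1 for writes):

    for (i = 0; i < 2; i++) {
        /* qemu_co_enter_next() returns false once the queue is empty */
        while (qemu_co_enter_next(&bs->throttled_reqs[i])) {
            ;
        }
    }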

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: only restart first request after throttle_group_config
	v3->v4: remove aio_context_acquire/release, caller does it [Berto]

 block/io.c                      | 15 +--------------
 block/throttle-groups.c         | 14 ++++++++++++++
 include/block/throttle-groups.h |  1 +
 3 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/block/io.c b/block/io.c
index 691baa6..9201b89 100644
--- a/block/io.c
+++ b/block/io.c
@@ -62,28 +62,15 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
 void bdrv_set_io_limits(BlockDriverState *bs,
                         ThrottleConfig *cfg)
 {
-    int i;
-
     throttle_group_config(bs, cfg);
-
-    for (i = 0; i < 2; i++) {
-        qemu_co_enter_next(&bs->throttled_reqs[i]);
-    }
 }
 
 static void bdrv_start_throttled_reqs(BlockDriverState *bs)
 {
     bool enabled = bs->io_limits_enabled;
-    int i;
 
     bs->io_limits_enabled = false;
-
-    for (i = 0; i < 2; i++) {
-        while (qemu_co_enter_next(&bs->throttled_reqs[i])) {
-            ;
-        }
-    }
-
+    throttle_group_restart_bs(bs);
     bs->io_limits_enabled = enabled;
 }
 
diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index 4920e09..b796f6b 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -313,6 +313,17 @@ void coroutine_fn throttle_group_co_io_limits_intercept(BlockDriverState *bs,
     qemu_mutex_unlock(&tg->lock);
 }
 
+void throttle_group_restart_bs(BlockDriverState *bs)
+{
+    int i;
+
+    for (i = 0; i < 2; i++) {
+        while (qemu_co_enter_next(&bs->throttled_reqs[i])) {
+            ;
+        }
+    }
+}
+
 /* Update the throttle configuration for a particular group. Similar
  * to throttle_config(), but guarantees atomicity within the
  * throttling group.
@@ -335,6 +346,9 @@ void throttle_group_config(BlockDriverState *bs, ThrottleConfig *cfg)
     }
     throttle_config(ts, tt, cfg);
     qemu_mutex_unlock(&tg->lock);
+
+    qemu_co_enter_next(&bs->throttled_reqs[0]);
+    qemu_co_enter_next(&bs->throttled_reqs[1]);
 }
 
 /* Get the throttle configuration from a particular group. Similar to
diff --git a/include/block/throttle-groups.h b/include/block/throttle-groups.h
index aba28f3..395f72d 100644
--- a/include/block/throttle-groups.h
+++ b/include/block/throttle-groups.h
@@ -38,6 +38,7 @@ void throttle_group_get_config(BlockDriverState *bs, ThrottleConfig *cfg);
 
 void throttle_group_register_bs(BlockDriverState *bs, const char *groupname);
 void throttle_group_unregister_bs(BlockDriverState *bs);
+void throttle_group_restart_bs(BlockDriverState *bs);
 
 void coroutine_fn throttle_group_co_io_limits_intercept(BlockDriverState *bs,
                                                         unsigned int bytes,
-- 
1.8.3.1


* [Qemu-devel] [PATCH v4 4/8] block: extract bdrv_drain_poll/bdrv_co_yield_to_drain from bdrv_drain/bdrv_co_drain
  2016-04-07 16:33 [Qemu-devel] [PATCH v4 0/8] bdrv_flush_io_queue removal, shared LinuxAioState Paolo Bonzini
@ 2016-04-07 16:33 ` Paolo Bonzini
From: Paolo Bonzini @ 2016-04-07 16:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, berto, famz, qemu-block, stefanha

Do not call bdrv_drain_recurse twice in bdrv_co_drain.  This is a small
tweak to the logic in Fam's patch; it is harmless, since no driver
implements bdrv_drain anyway, but better to get it right.
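
For reference, the coroutine case needs the bottom-half indirection
because a coroutine must not sit in aio_poll() itself.  A hedged sketch
of the mechanism in bdrv_co_yield_to_drain() below: schedule a BH, yield,
and let the BH poll and then re-enter the coroutine:

    BdrvCoDrainData data = {
        .co = qemu_coroutine_self(),
        .bs = bs,
        .done = false,
    };
    data.bh = aio_bh_new(bdrv_get_aio_context(bs), bdrv_co_drain_bh_cb,
                         &data);
    qemu_bh_schedule(data.bh);
    qemu_coroutine_yield();

    /* we are only resumed by the BH callback, after polling finished */
    assert(data.done);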

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
	v4: new

 block/io.c | 35 +++++++++++++++++++++++------------
 1 file changed, 23 insertions(+), 12 deletions(-)

diff --git a/block/io.c b/block/io.c
index 9201b89..15bc956 100644
--- a/block/io.c
+++ b/block/io.c
@@ -243,18 +243,30 @@ typedef struct {
     bool done;
 } BdrvCoDrainData;
 
+static void bdrv_drain_poll(BlockDriverState *bs)
+{
+    bool busy = true;
+
+    while (busy) {
+        /* Keep iterating */
+        bdrv_flush_io_queue(bs);
+        busy = bdrv_requests_pending(bs);
+        busy |= aio_poll(bdrv_get_aio_context(bs), busy);
+    }
+}
+
 static void bdrv_co_drain_bh_cb(void *opaque)
 {
     BdrvCoDrainData *data = opaque;
     Coroutine *co = data->co;
 
     qemu_bh_delete(data->bh);
-    bdrv_drain(data->bs);
+    bdrv_drain_poll(data->bs);
     data->done = true;
     qemu_coroutine_enter(co, NULL);
 }
 
-void coroutine_fn bdrv_co_drain(BlockDriverState *bs)
+static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs)
 {
     BdrvCoDrainData data;
 
@@ -288,20 +300,19 @@ void coroutine_fn bdrv_co_drain(BlockDriverState *bs)
  * not depend on events in other AioContexts.  In that case, use
  * bdrv_drain_all() instead.
  */
-void bdrv_drain(BlockDriverState *bs)
+void coroutine_fn bdrv_co_drain(BlockDriverState *bs)
 {
-    bool busy = true;
+    bdrv_drain_recurse(bs);
+    bdrv_co_yield_to_drain(bs);
+}
 
+void bdrv_drain(BlockDriverState *bs)
+{
     bdrv_drain_recurse(bs);
     if (qemu_in_coroutine()) {
-        bdrv_co_drain(bs);
-        return;
-    }
-    while (busy) {
-        /* Keep iterating */
-         bdrv_flush_io_queue(bs);
-         busy = bdrv_requests_pending(bs);
-         busy |= aio_poll(bdrv_get_aio_context(bs), busy);
+        bdrv_co_yield_to_drain(bs);
+    } else {
+        bdrv_drain_poll(bs);
     }
 }
 
-- 
1.8.3.1


* [Qemu-devel] [PATCH v4 5/8] block: introduce bdrv_no_throttling_begin/end
  2016-04-07 16:33 [Qemu-devel] [PATCH v4 0/8] bdrv_flush_io_queue removal, shared LinuxAioState Paolo Bonzini
@ 2016-04-07 16:33 ` Paolo Bonzini
From: Paolo Bonzini @ 2016-04-07 16:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, berto, famz, qemu-block, stefanha

Extract the handling of throttling from bdrv_flush_io_queue.  These
new functions will soon become BdrvChildRole callbacks, as they can
be generalized to "beginning of drain" and "end of drain".
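
Callers bracket a section that must not be throttled with the new pair;
because a counter is used instead of a bool, sections nest.  A usage
sketch mirroring blk_read_unthrottled() in the diff below:

    bdrv_no_throttling_begin(bs);   /* 0 -> 1: restart queued requests and
                                     * stop intercepting new ones */
    ret = blk_read(blk, sector_num, buf, nb_sectors);
    bdrv_no_throttling_end(bs);     /* 1 -> 0: throttling applies again */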

Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
	v3->v4: added assertion in bdrv_no_throttling_end [Berto]

 block.c                   |  1 -
 block/block-backend.c     |  6 ++----
 block/io.c                | 33 +++++++++++++++++++++------------
 block/throttle-groups.c   |  4 ++++
 include/block/block_int.h |  9 ++++++---
 5 files changed, 33 insertions(+), 20 deletions(-)

diff --git a/block.c b/block.c
index d4939b4..6cbad0e 100644
--- a/block.c
+++ b/block.c
@@ -2261,7 +2261,6 @@ static void swap_feature_fields(BlockDriverState *bs_top,
 
     assert(!bs_new->throttle_state);
     if (bs_top->throttle_state) {
-        assert(bs_top->io_limits_enabled);
         bdrv_io_limits_enable(bs_new, throttle_group_get_name(bs_top));
         bdrv_io_limits_disable(bs_top);
     }
diff --git a/block/block-backend.c b/block/block-backend.c
index d74f670..435856c 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -794,7 +794,6 @@ int blk_read_unthrottled(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
                          int nb_sectors)
 {
     BlockDriverState *bs = blk_bs(blk);
-    bool enabled;
     int ret;
 
     ret = blk_check_request(blk, sector_num, nb_sectors);
@@ -802,10 +801,9 @@ int blk_read_unthrottled(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
         return ret;
     }
 
-    enabled = bs->io_limits_enabled;
-    bs->io_limits_enabled = false;
+    bdrv_no_throttling_begin(bs);
     ret = blk_read(blk, sector_num, buf, nb_sectors);
-    bs->io_limits_enabled = enabled;
+    bdrv_no_throttling_end(bs);
     return ret;
 }
 
diff --git a/block/io.c b/block/io.c
index 15bc956..fbcf954 100644
--- a/block/io.c
+++ b/block/io.c
@@ -65,28 +65,32 @@ void bdrv_set_io_limits(BlockDriverState *bs,
     throttle_group_config(bs, cfg);
 }
 
-static void bdrv_start_throttled_reqs(BlockDriverState *bs)
+void bdrv_no_throttling_begin(BlockDriverState *bs)
 {
-    bool enabled = bs->io_limits_enabled;
+    if (bs->io_limits_disabled++ == 0) {
+        throttle_group_restart_bs(bs);
+    }
+}
 
-    bs->io_limits_enabled = false;
-    throttle_group_restart_bs(bs);
-    bs->io_limits_enabled = enabled;
+void bdrv_no_throttling_end(BlockDriverState *bs)
+{
+    assert(bs->io_limits_disabled);
+    --bs->io_limits_disabled;
 }
 
 void bdrv_io_limits_disable(BlockDriverState *bs)
 {
-    bs->io_limits_enabled = false;
-    bdrv_start_throttled_reqs(bs);
+    assert(bs->throttle_state);
+    bdrv_no_throttling_begin(bs);
     throttle_group_unregister_bs(bs);
+    bdrv_no_throttling_end(bs);
 }
 
 /* should be called before bdrv_set_io_limits if a limit is set */
 void bdrv_io_limits_enable(BlockDriverState *bs, const char *group)
 {
-    assert(!bs->io_limits_enabled);
+    assert(!bs->throttle_state);
     throttle_group_register_bs(bs, group);
-    bs->io_limits_enabled = true;
 }
 
 void bdrv_io_limits_update_group(BlockDriverState *bs, const char *group)
@@ -302,18 +306,22 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs)
  */
 void coroutine_fn bdrv_co_drain(BlockDriverState *bs)
 {
+    bdrv_no_throttling_begin(bs);
     bdrv_drain_recurse(bs);
     bdrv_co_yield_to_drain(bs);
+    bdrv_no_throttling_end(bs);
 }
 
 void bdrv_drain(BlockDriverState *bs)
 {
+    bdrv_no_throttling_begin(bs);
     bdrv_drain_recurse(bs);
     if (qemu_in_coroutine()) {
         bdrv_co_yield_to_drain(bs);
     } else {
         bdrv_drain_poll(bs);
     }
+    bdrv_no_throttling_end(bs);
 }
 
 /*
@@ -336,6 +344,7 @@ void bdrv_drain_all(void)
         if (bs->job) {
             block_job_pause(bs->job);
         }
+        bdrv_no_throttling_begin(bs);
         bdrv_drain_recurse(bs);
         aio_context_release(aio_context);
 
@@ -377,6 +386,7 @@ void bdrv_drain_all(void)
         AioContext *aio_context = bdrv_get_aio_context(bs);
 
         aio_context_acquire(aio_context);
+        bdrv_no_throttling_end(bs);
         if (bs->job) {
             block_job_resume(bs->job);
         }
@@ -980,7 +990,7 @@ int coroutine_fn bdrv_co_do_preadv(BlockDriverState *bs,
     }
 
     /* throttling disk I/O */
-    if (bs->io_limits_enabled) {
+    if (bs->throttle_state) {
         throttle_group_co_io_limits_intercept(bs, bytes, false);
     }
 
@@ -1330,7 +1340,7 @@ int coroutine_fn bdrv_co_do_pwritev(BlockDriverState *bs,
     }
 
     /* throttling disk I/O */
-    if (bs->io_limits_enabled) {
+    if (bs->throttle_state) {
         throttle_group_co_io_limits_intercept(bs, bytes, true);
     }
 
@@ -2772,7 +2782,6 @@ void bdrv_flush_io_queue(BlockDriverState *bs)
     } else if (bs->file) {
         bdrv_flush_io_queue(bs->file->bs);
     }
-    bdrv_start_throttled_reqs(bs);
 }
 
 void bdrv_drained_begin(BlockDriverState *bs)
diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index b796f6b..9ac063a 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -219,6 +219,10 @@ static bool throttle_group_schedule_timer(BlockDriverState *bs,
     ThrottleGroup *tg = container_of(ts, ThrottleGroup, ts);
     bool must_wait;
 
+    if (bs->io_limits_disabled) {
+        return false;
+    }
+
     /* Check if any of the timers in this group is already armed */
     if (tg->any_timer_armed[is_write]) {
         return true;
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 10d8759..ed9314b 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -424,10 +424,10 @@ struct BlockDriverState {
 
     /* I/O throttling.
      * throttle_state tells us if this BDS has I/O limits configured.
-     * io_limits_enabled tells us if they are currently being
-     * enforced, but it can be temporarily set to false */
+     * io_limits_disabled tells us if they are currently being enforced */
     CoQueue      throttled_reqs[2];
-    bool         io_limits_enabled;
+    unsigned int io_limits_disabled;
+
     /* The following fields are protected by the ThrottleGroup lock.
      * See the ThrottleGroup documentation for details. */
     ThrottleState *throttle_state;
@@ -713,6 +713,9 @@ BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
                                   const BdrvChildRole *child_role);
 void bdrv_root_unref_child(BdrvChild *child);
 
+void bdrv_no_throttling_begin(BlockDriverState *bs);
+void bdrv_no_throttling_end(BlockDriverState *bs);
+
 void blk_dev_change_media_cb(BlockBackend *blk, bool load);
 bool blk_dev_has_removable_media(BlockBackend *blk);
 bool blk_dev_has_tray(BlockBackend *blk);
-- 
1.8.3.1


* [Qemu-devel] [PATCH v4 6/8] block: plug whole tree at once, introduce bdrv_io_unplugged_begin/end
  2016-04-07 16:33 [Qemu-devel] [PATCH v4 0/8] bdrv_flush_io_queue removal, shared LinuxAioState Paolo Bonzini
@ 2016-04-07 16:33 ` Paolo Bonzini
From: Paolo Bonzini @ 2016-04-07 16:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, berto, famz, qemu-block, stefanha

Extract the handling of io_plug "depth" from linux-aio.c and let the
main bdrv_drain loop do nothing but wait on I/O.

Like the two newly introduced functions, bdrv_io_plug and bdrv_io_unplug
now operate on all children.  The visit order is now symmetrical between
plug and unplug, making it possible for formats to implement plug/unplug.
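
In other words, the pair now brackets the whole tree with a nesting
counter; a sketch of the intended usage (per the diff below, children are
plugged before the parent and unplugged after it):

    bdrv_io_plug(bs);     /* children first, then bs on its 0 -> 1
                           * transition, unless unplugging is in force */
    /* ... submit a batch of requests ... */
    bdrv_io_unplug(bs);   /* bs on its 1 -> 0 transition, then children,
                           * flushing whatever the driver has queued */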

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
	v3->v4: unplug children after parent, plug children before parent;
	bdrv_io_plug/unplug did it one way, bdrv_io_unplugged_begin/end
	didn't [Paolo]

 block/io.c                | 76 ++++++++++++++++++++++++++++++++++++-----------
 block/linux-aio.c         | 13 ++++----
 block/raw-aio.h           |  2 +-
 block/raw-posix.c         | 16 +---------
 include/block/block.h     |  3 +-
 include/block/block_int.h |  5 +++-
 6 files changed, 71 insertions(+), 44 deletions(-)

diff --git a/block/io.c b/block/io.c
index fbcf954..6b25216 100644
--- a/block/io.c
+++ b/block/io.c
@@ -253,7 +253,6 @@ static void bdrv_drain_poll(BlockDriverState *bs)
 
     while (busy) {
         /* Keep iterating */
-        bdrv_flush_io_queue(bs);
         busy = bdrv_requests_pending(bs);
         busy |= aio_poll(bdrv_get_aio_context(bs), busy);
     }
@@ -307,20 +306,24 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs)
 void coroutine_fn bdrv_co_drain(BlockDriverState *bs)
 {
     bdrv_no_throttling_begin(bs);
+    bdrv_io_unplugged_begin(bs);
     bdrv_drain_recurse(bs);
     bdrv_co_yield_to_drain(bs);
+    bdrv_io_unplugged_end(bs);
     bdrv_no_throttling_end(bs);
 }
 
 void bdrv_drain(BlockDriverState *bs)
 {
     bdrv_no_throttling_begin(bs);
+    bdrv_io_unplugged_begin(bs);
     bdrv_drain_recurse(bs);
     if (qemu_in_coroutine()) {
         bdrv_co_yield_to_drain(bs);
     } else {
         bdrv_drain_poll(bs);
     }
+    bdrv_io_unplugged_end(bs);
     bdrv_no_throttling_end(bs);
 }
 
@@ -345,6 +348,7 @@ void bdrv_drain_all(void)
             block_job_pause(bs->job);
         }
         bdrv_no_throttling_begin(bs);
+        bdrv_io_unplugged_begin(bs);
         bdrv_drain_recurse(bs);
         aio_context_release(aio_context);
 
@@ -369,7 +373,6 @@ void bdrv_drain_all(void)
             aio_context_acquire(aio_context);
             while ((bs = bdrv_next(bs))) {
                 if (aio_context == bdrv_get_aio_context(bs)) {
-                    bdrv_flush_io_queue(bs);
                     if (bdrv_requests_pending(bs)) {
                         busy = true;
                         aio_poll(aio_context, busy);
@@ -386,6 +389,7 @@ void bdrv_drain_all(void)
         AioContext *aio_context = bdrv_get_aio_context(bs);
 
         aio_context_acquire(aio_context);
+        bdrv_io_unplugged_end(bs);
         bdrv_no_throttling_end(bs);
         if (bs->job) {
             block_job_resume(bs->job);
@@ -2756,31 +2760,67 @@ void bdrv_add_before_write_notifier(BlockDriverState *bs,
 
 void bdrv_io_plug(BlockDriverState *bs)
 {
-    BlockDriver *drv = bs->drv;
-    if (drv && drv->bdrv_io_plug) {
-        drv->bdrv_io_plug(bs);
-    } else if (bs->file) {
-        bdrv_io_plug(bs->file->bs);
+    BdrvChild *child;
+
+    QLIST_FOREACH(child, &bs->children, next) {
+        bdrv_io_plug(child->bs);
+    }
+
+    if (bs->io_plugged++ == 0 && bs->io_plug_disabled == 0) {
+        BlockDriver *drv = bs->drv;
+        if (drv && drv->bdrv_io_plug) {
+            drv->bdrv_io_plug(bs);
+        }
     }
 }
 
 void bdrv_io_unplug(BlockDriverState *bs)
 {
-    BlockDriver *drv = bs->drv;
-    if (drv && drv->bdrv_io_unplug) {
-        drv->bdrv_io_unplug(bs);
-    } else if (bs->file) {
-        bdrv_io_unplug(bs->file->bs);
+    BdrvChild *child;
+
+    assert(bs->io_plugged);
+    if (--bs->io_plugged == 0 && bs->io_plug_disabled == 0) {
+        BlockDriver *drv = bs->drv;
+        if (drv && drv->bdrv_io_unplug) {
+            drv->bdrv_io_unplug(bs);
+        }
+    }
+
+    QLIST_FOREACH(child, &bs->children, next) {
+        bdrv_io_unplug(child->bs);
     }
 }
 
-void bdrv_flush_io_queue(BlockDriverState *bs)
+void bdrv_io_unplugged_begin(BlockDriverState *bs)
 {
-    BlockDriver *drv = bs->drv;
-    if (drv && drv->bdrv_flush_io_queue) {
-        drv->bdrv_flush_io_queue(bs);
-    } else if (bs->file) {
-        bdrv_flush_io_queue(bs->file->bs);
+    BdrvChild *child;
+
+    if (bs->io_plug_disabled++ == 0 && bs->io_plugged > 0) {
+        BlockDriver *drv = bs->drv;
+        if (drv && drv->bdrv_io_unplug) {
+            drv->bdrv_io_unplug(bs);
+        }
+    }
+
+    QLIST_FOREACH(child, &bs->children, next) {
+        bdrv_io_unplugged_begin(child->bs);
+    }
+}
+
+void bdrv_io_unplugged_end(BlockDriverState *bs)
+{
+    BdrvChild *child;
+
+    assert(bs->io_plug_disabled);
+    QLIST_FOREACH(child, &bs->children, next) {
+        bdrv_io_unplugged_end(child->bs);
+    }
+
+    if (--bs->io_plug_disabled == 0 && bs->io_plugged > 0) {
+        BlockDriver *drv = bs->drv;
+        if (drv && drv->bdrv_io_plug) {
+            drv->bdrv_io_plug(bs);
+        }
     }
 }
 
diff --git a/block/linux-aio.c b/block/linux-aio.c
index 805757e..102bf92 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -220,19 +220,16 @@ void laio_io_plug(BlockDriverState *bs, void *aio_ctx)
 {
     struct qemu_laio_state *s = aio_ctx;
 
-    s->io_q.plugged++;
+    assert(!s->io_q.plugged);
+    s->io_q.plugged = 1;
 }
 
-void laio_io_unplug(BlockDriverState *bs, void *aio_ctx, bool unplug)
+void laio_io_unplug(BlockDriverState *bs, void *aio_ctx)
 {
     struct qemu_laio_state *s = aio_ctx;
 
-    assert(s->io_q.plugged > 0 || !unplug);
-
-    if (unplug && --s->io_q.plugged > 0) {
-        return;
-    }
-
+    assert(s->io_q.plugged);
+    s->io_q.plugged = 0;
     if (!s->io_q.blocked && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
         ioq_submit(s);
     }
diff --git a/block/raw-aio.h b/block/raw-aio.h
index 811e375..0f5e35a 100644
--- a/block/raw-aio.h
+++ b/block/raw-aio.h
@@ -43,7 +43,7 @@ BlockAIOCB *laio_submit(BlockDriverState *bs, void *aio_ctx, int fd,
 void laio_detach_aio_context(void *s, AioContext *old_context);
 void laio_attach_aio_context(void *s, AioContext *new_context);
 void laio_io_plug(BlockDriverState *bs, void *aio_ctx);
-void laio_io_unplug(BlockDriverState *bs, void *aio_ctx, bool unplug);
+void laio_io_unplug(BlockDriverState *bs, void *aio_ctx);
 #endif
 
 #ifdef _WIN32
diff --git a/block/raw-posix.c b/block/raw-posix.c
index 906d5c9..50135ad 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1345,17 +1345,7 @@ static void raw_aio_unplug(BlockDriverState *bs)
 #ifdef CONFIG_LINUX_AIO
     BDRVRawState *s = bs->opaque;
     if (s->use_aio) {
-        laio_io_unplug(bs, s->aio_ctx, true);
-    }
-#endif
-}
-
-static void raw_aio_flush_io_queue(BlockDriverState *bs)
-{
-#ifdef CONFIG_LINUX_AIO
-    BDRVRawState *s = bs->opaque;
-    if (s->use_aio) {
-        laio_io_unplug(bs, s->aio_ctx, false);
+        laio_io_unplug(bs, s->aio_ctx);
     }
 #endif
 }
@@ -1949,7 +1939,6 @@ BlockDriver bdrv_file = {
     .bdrv_refresh_limits = raw_refresh_limits,
     .bdrv_io_plug = raw_aio_plug,
     .bdrv_io_unplug = raw_aio_unplug,
-    .bdrv_flush_io_queue = raw_aio_flush_io_queue,
 
     .bdrv_truncate = raw_truncate,
     .bdrv_getlength = raw_getlength,
@@ -2398,7 +2387,6 @@ static BlockDriver bdrv_host_device = {
     .bdrv_refresh_limits = raw_refresh_limits,
     .bdrv_io_plug = raw_aio_plug,
     .bdrv_io_unplug = raw_aio_unplug,
-    .bdrv_flush_io_queue = raw_aio_flush_io_queue,
 
     .bdrv_truncate      = raw_truncate,
     .bdrv_getlength	= raw_getlength,
@@ -2528,7 +2516,6 @@ static BlockDriver bdrv_host_cdrom = {
     .bdrv_refresh_limits = raw_refresh_limits,
     .bdrv_io_plug = raw_aio_plug,
     .bdrv_io_unplug = raw_aio_unplug,
-    .bdrv_flush_io_queue = raw_aio_flush_io_queue,
 
     .bdrv_truncate      = raw_truncate,
     .bdrv_getlength      = raw_getlength,
@@ -2664,7 +2651,6 @@ static BlockDriver bdrv_host_cdrom = {
     .bdrv_refresh_limits = raw_refresh_limits,
     .bdrv_io_plug = raw_aio_plug,
     .bdrv_io_unplug = raw_aio_unplug,
-    .bdrv_flush_io_queue = raw_aio_flush_io_queue,
 
     .bdrv_truncate      = raw_truncate,
     .bdrv_getlength      = raw_getlength,
diff --git a/include/block/block.h b/include/block/block.h
index 3a73137..0e8b4d1 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -520,7 +520,8 @@ int bdrv_probe_geometry(BlockDriverState *bs, HDGeometry *geo);
 
 void bdrv_io_plug(BlockDriverState *bs);
 void bdrv_io_unplug(BlockDriverState *bs);
-void bdrv_flush_io_queue(BlockDriverState *bs);
+void bdrv_io_unplugged_begin(BlockDriverState *bs);
+void bdrv_io_unplugged_end(BlockDriverState *bs);
 
 /**
  * bdrv_drained_begin:
diff --git a/include/block/block_int.h b/include/block/block_int.h
index ed9314b..f1aabb9 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -294,7 +294,6 @@ struct BlockDriver {
     /* io queue for linux-aio */
     void (*bdrv_io_plug)(BlockDriverState *bs);
     void (*bdrv_io_unplug)(BlockDriverState *bs);
-    void (*bdrv_flush_io_queue)(BlockDriverState *bs);
 
     /**
      * Try to get @bs's logical and physical block size.
@@ -484,6 +483,10 @@ struct BlockDriverState {
     uint64_t write_threshold_offset;
     NotifierWithReturn write_threshold_notifier;
 
+    /* counters for nested bdrv_io_plug and bdrv_io_unplugged_begin */
+    unsigned io_plugged;
+    unsigned io_plug_disabled;
+
     int quiesce_counter;
 };
 
-- 
1.8.3.1


* [Qemu-devel] [PATCH v4 7/8] linux-aio: make it more type safe
  2016-04-07 16:33 [Qemu-devel] [PATCH v4 0/8] bdrv_flush_io_queue removal, shared LinuxAioState Paolo Bonzini
@ 2016-04-07 16:33 ` Paolo Bonzini
From: Paolo Bonzini @ 2016-04-07 16:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, berto, famz, qemu-block, stefanha

Replace void* with an opaque LinuxAioState type.
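
This is the standard opaque-handle pattern: the header forward-declares
the struct, so callers gain type checking without seeing the layout.  In
outline (from the diff below):

    /* block/raw-aio.h: only the name is visible to users */
    typedef struct LinuxAioState LinuxAioState;
    LinuxAioState *laio_init(void);
    void laio_cleanup(LinuxAioState *s);

    /* block/linux-aio.c: the definition stays private */
    struct LinuxAioState {
        io_context_t ctx;
        EventNotifier e;
        /* ... */
    };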

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/linux-aio.c | 46 +++++++++++++++++-----------------------------
 block/raw-aio.h   | 15 ++++++++-------
 block/raw-posix.c |  4 ++--
 3 files changed, 27 insertions(+), 38 deletions(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index 102bf92..90ec98e 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -30,7 +30,7 @@
 
 struct qemu_laiocb {
     BlockAIOCB common;
-    struct qemu_laio_state *ctx;
+    LinuxAioState *ctx;
     struct iocb iocb;
     ssize_t ret;
     size_t nbytes;
@@ -46,7 +46,7 @@ typedef struct {
     QSIMPLEQ_HEAD(, qemu_laiocb) pending;
 } LaioQueue;
 
-struct qemu_laio_state {
+struct LinuxAioState {
     io_context_t ctx;
     EventNotifier e;
 
@@ -60,7 +60,7 @@ struct qemu_laio_state {
     int event_max;
 };
 
-static void ioq_submit(struct qemu_laio_state *s);
+static void ioq_submit(LinuxAioState *s);
 
 static inline ssize_t io_event_ret(struct io_event *ev)
 {
@@ -70,8 +70,7 @@ static inline ssize_t io_event_ret(struct io_event *ev)
 /*
  * Completes an AIO request (calls the callback and frees the ACB).
  */
-static void qemu_laio_process_completion(struct qemu_laio_state *s,
-    struct qemu_laiocb *laiocb)
+static void qemu_laio_process_completion(struct qemu_laiocb *laiocb)
 {
     int ret;
 
@@ -99,7 +98,7 @@ static void qemu_laio_process_completion(struct qemu_laio_state *s,
  *
  * The function is somewhat tricky because it supports nested event loops, for
  * example when a request callback invokes aio_poll().  In order to do this,
- * the completion events array and index are kept in qemu_laio_state.  The BH
+ * the completion events array and index are kept in LinuxAioState.  The BH
  * reschedules itself as long as there are completions pending so it will
  * either be called again in a nested event loop or will be called after all
  * events have been completed.  When there are no events left to complete, the
@@ -107,7 +106,7 @@ static void qemu_laio_process_completion(struct qemu_laio_state *s,
  */
 static void qemu_laio_completion_bh(void *opaque)
 {
-    struct qemu_laio_state *s = opaque;
+    LinuxAioState *s = opaque;
 
     /* Fetch more completion events when empty */
     if (s->event_idx == s->event_max) {
@@ -136,7 +135,7 @@ static void qemu_laio_completion_bh(void *opaque)
         laiocb->ret = io_event_ret(&s->events[s->event_idx]);
         s->event_idx++;
 
-        qemu_laio_process_completion(s, laiocb);
+        qemu_laio_process_completion(laiocb);
     }
 
     if (!s->io_q.plugged && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
@@ -146,7 +145,7 @@ static void qemu_laio_completion_bh(void *opaque)
 
 static void qemu_laio_completion_cb(EventNotifier *e)
 {
-    struct qemu_laio_state *s = container_of(e, struct qemu_laio_state, e);
+    LinuxAioState *s = container_of(e, LinuxAioState, e);
 
     if (event_notifier_test_and_clear(&s->e)) {
         qemu_bh_schedule(s->completion_bh);
@@ -185,7 +184,7 @@ static void ioq_init(LaioQueue *io_q)
     io_q->blocked = false;
 }
 
-static void ioq_submit(struct qemu_laio_state *s)
+static void ioq_submit(LinuxAioState *s)
 {
     int ret, len;
     struct qemu_laiocb *aiocb;
@@ -216,18 +215,14 @@ static void ioq_submit(struct qemu_laio_state *s)
     s->io_q.blocked = (s->io_q.n > 0);
 }
 
-void laio_io_plug(BlockDriverState *bs, void *aio_ctx)
+void laio_io_plug(BlockDriverState *bs, LinuxAioState *s)
 {
-    struct qemu_laio_state *s = aio_ctx;
-
     assert(!s->io_q.plugged);
     s->io_q.plugged = 1;
 }
 
-void laio_io_unplug(BlockDriverState *bs, void *aio_ctx)
+void laio_io_unplug(BlockDriverState *bs, LinuxAioState *s)
 {
-    struct qemu_laio_state *s = aio_ctx;
-
     assert(s->io_q.plugged);
     s->io_q.plugged = 0;
     if (!s->io_q.blocked && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
@@ -235,11 +230,10 @@ void laio_io_unplug(BlockDriverState *bs, void *aio_ctx)
     }
 }
 
-BlockAIOCB *laio_submit(BlockDriverState *bs, void *aio_ctx, int fd,
+BlockAIOCB *laio_submit(BlockDriverState *bs, LinuxAioState *s, int fd,
         int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
         BlockCompletionFunc *cb, void *opaque, int type)
 {
-    struct qemu_laio_state *s = aio_ctx;
     struct qemu_laiocb *laiocb;
     struct iocb *iocbs;
     off_t offset = sector_num * 512;
@@ -281,26 +275,22 @@ out_free_aiocb:
     return NULL;
 }
 
-void laio_detach_aio_context(void *s_, AioContext *old_context)
+void laio_detach_aio_context(LinuxAioState *s, AioContext *old_context)
 {
-    struct qemu_laio_state *s = s_;
-
     aio_set_event_notifier(old_context, &s->e, false, NULL);
     qemu_bh_delete(s->completion_bh);
 }
 
-void laio_attach_aio_context(void *s_, AioContext *new_context)
+void laio_attach_aio_context(LinuxAioState *s, AioContext *new_context)
 {
-    struct qemu_laio_state *s = s_;
-
     s->completion_bh = aio_bh_new(new_context, qemu_laio_completion_bh, s);
     aio_set_event_notifier(new_context, &s->e, false,
                            qemu_laio_completion_cb);
 }
 
-void *laio_init(void)
+LinuxAioState *laio_init(void)
 {
-    struct qemu_laio_state *s;
+    LinuxAioState *s;
 
     s = g_malloc0(sizeof(*s));
     if (event_notifier_init(&s->e, false) < 0) {
@@ -322,10 +312,8 @@ out_free_state:
     return NULL;
 }
 
-void laio_cleanup(void *s_)
+void laio_cleanup(LinuxAioState *s)
 {
-    struct qemu_laio_state *s = s_;
-
     event_notifier_cleanup(&s->e);
 
     if (io_destroy(s->ctx) != 0) {
diff --git a/block/raw-aio.h b/block/raw-aio.h
index 0f5e35a..714714e 100644
--- a/block/raw-aio.h
+++ b/block/raw-aio.h
@@ -35,15 +35,16 @@
 
 /* linux-aio.c - Linux native implementation */
 #ifdef CONFIG_LINUX_AIO
-void *laio_init(void);
-void laio_cleanup(void *s);
-BlockAIOCB *laio_submit(BlockDriverState *bs, void *aio_ctx, int fd,
+typedef struct LinuxAioState LinuxAioState;
+LinuxAioState *laio_init(void);
+void laio_cleanup(LinuxAioState *s);
+BlockAIOCB *laio_submit(BlockDriverState *bs, LinuxAioState *s, int fd,
         int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
         BlockCompletionFunc *cb, void *opaque, int type);
-void laio_detach_aio_context(void *s, AioContext *old_context);
-void laio_attach_aio_context(void *s, AioContext *new_context);
-void laio_io_plug(BlockDriverState *bs, void *aio_ctx);
-void laio_io_unplug(BlockDriverState *bs, void *aio_ctx);
+void laio_detach_aio_context(LinuxAioState *s, AioContext *old_context);
+void laio_attach_aio_context(LinuxAioState *s, AioContext *new_context);
+void laio_io_plug(BlockDriverState *bs, LinuxAioState *s);
+void laio_io_unplug(BlockDriverState *bs, LinuxAioState *s);
 #endif
 
 #ifdef _WIN32
diff --git a/block/raw-posix.c b/block/raw-posix.c
index 50135ad..71ec463 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -139,7 +139,7 @@ typedef struct BDRVRawState {
 
 #ifdef CONFIG_LINUX_AIO
     int use_aio;
-    void *aio_ctx;
+    LinuxAioState *aio_ctx;
 #endif
 #ifdef CONFIG_XFS
     bool is_xfs:1;
@@ -398,7 +398,7 @@ static void raw_attach_aio_context(BlockDriverState *bs,
 }
 
 #ifdef CONFIG_LINUX_AIO
-static int raw_set_aio(void **aio_ctx, int *use_aio, int bdrv_flags)
+static int raw_set_aio(LinuxAioState **aio_ctx, int *use_aio, int bdrv_flags)
 {
     int ret = -1;
     assert(aio_ctx != NULL);
-- 
1.8.3.1


* [Qemu-devel] [PATCH v4 8/8] linux-aio: share one LinuxAioState within an AioContext
  2016-04-07 16:33 [Qemu-devel] [PATCH v4 0/8] bdrv_flush_io_queue removal, shared LinuxAioState Paolo Bonzini
@ 2016-04-07 16:33 ` Paolo Bonzini
From: Paolo Bonzini @ 2016-04-07 16:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, berto, famz, qemu-block, stefanha

This has better performance because it executes fewer system calls
and does not use a bottom half per disk.

Originally proposed by Ming Lei.
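
The sharing scheme is a lazily created per-AioContext singleton; as the
diff below shows, users fetch the state on demand instead of each
BDRVRawState owning one:

    LinuxAioState *aio_get_linux_aio(AioContext *ctx)
    {
        if (!ctx->linux_aio) {
            ctx->linux_aio = laio_init();
            laio_attach_aio_context(ctx->linux_aio, ctx);
        }
        return ctx->linux_aio;
    }

    /* a caller, e.g. in raw_aio_plug(): */
    if (bs->open_flags & BDRV_O_NATIVE_AIO) {
        laio_io_plug(bs, aio_get_linux_aio(bdrv_get_aio_context(bs)));
    }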

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 async.c                            |  23 +++++++
 block/linux-aio.c                  |   3 +
 block/raw-posix.c                  | 119 +++++--------------------------------
 block/raw-win32.c                  |   2 +-
 include/block/aio.h                |  13 ++++
 {block => include/block}/raw-aio.h |   0
 6 files changed, 54 insertions(+), 106 deletions(-)
 rename {block => include/block}/raw-aio.h (100%)

diff --git a/async.c b/async.c
index b4bf205..6caa98c 100644
--- a/async.c
+++ b/async.c
@@ -29,6 +29,7 @@
 #include "block/thread-pool.h"
 #include "qemu/main-loop.h"
 #include "qemu/atomic.h"
+#include "block/raw-aio.h"
 
 /***********************************************************/
 /* bottom halves (can be seen as timers which expire ASAP) */
@@ -242,6 +243,14 @@ aio_ctx_finalize(GSource     *source)
     qemu_bh_delete(ctx->notify_dummy_bh);
     thread_pool_free(ctx->thread_pool);
 
+#ifdef CONFIG_LINUX_AIO
+    if (ctx->linux_aio) {
+        laio_detach_aio_context(ctx->linux_aio, ctx);
+        laio_cleanup(ctx->linux_aio);
+        ctx->linux_aio = NULL;
+    }
+#endif
+
     qemu_mutex_lock(&ctx->bh_lock);
     while (ctx->first_bh) {
         QEMUBH *next = ctx->first_bh->next;
@@ -282,6 +291,17 @@ ThreadPool *aio_get_thread_pool(AioContext *ctx)
     return ctx->thread_pool;
 }
 
+#ifdef CONFIG_LINUX_AIO
+LinuxAioState *aio_get_linux_aio(AioContext *ctx)
+{
+    if (!ctx->linux_aio) {
+        ctx->linux_aio = laio_init();
+        laio_attach_aio_context(ctx->linux_aio, ctx);
+    }
+    return ctx->linux_aio;
+}
+#endif
+
 void aio_notify(AioContext *ctx)
 {
     /* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
@@ -345,6 +365,9 @@ AioContext *aio_context_new(Error **errp)
                            false,
                            (EventNotifierHandler *)
                            event_notifier_dummy_cb);
+#ifdef CONFIG_LINUX_AIO
+    ctx->linux_aio = NULL;
+#endif
     ctx->thread_pool = NULL;
     qemu_mutex_init(&ctx->bh_lock);
     rfifolock_init(&ctx->lock, aio_rfifolock_cb, ctx);
diff --git a/block/linux-aio.c b/block/linux-aio.c
index 90ec98e..b73ca14 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -47,6 +47,8 @@ typedef struct {
 } LaioQueue;
 
 struct LinuxAioState {
+    AioContext *aio_context;
+
     io_context_t ctx;
     EventNotifier e;
 
@@ -283,6 +285,7 @@ void laio_detach_aio_context(LinuxAioState *s, AioContext *old_context)
 
 void laio_attach_aio_context(LinuxAioState *s, AioContext *new_context)
 {
+    s->aio_context = new_context;
     s->completion_bh = aio_bh_new(new_context, qemu_laio_completion_bh, s);
     aio_set_event_notifier(new_context, &s->e, false,
                            qemu_laio_completion_cb);
diff --git a/block/raw-posix.c b/block/raw-posix.c
index 71ec463..3b052a9 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -32,7 +32,7 @@
 #include "trace.h"
 #include "block/thread-pool.h"
 #include "qemu/iov.h"
-#include "raw-aio.h"
+#include "block/raw-aio.h"
 #include "qapi/util.h"
 #include "qapi/qmp/qstring.h"
 
@@ -137,10 +137,6 @@ typedef struct BDRVRawState {
     int open_flags;
     size_t buf_align;
 
-#ifdef CONFIG_LINUX_AIO
-    int use_aio;
-    LinuxAioState *aio_ctx;
-#endif
 #ifdef CONFIG_XFS
     bool is_xfs:1;
 #endif
@@ -154,9 +150,6 @@ typedef struct BDRVRawState {
 typedef struct BDRVRawReopenState {
     int fd;
     int open_flags;
-#ifdef CONFIG_LINUX_AIO
-    int use_aio;
-#endif
 } BDRVRawReopenState;
 
 static int fd_open(BlockDriverState *bs);
@@ -374,58 +367,15 @@ static void raw_parse_flags(int bdrv_flags, int *open_flags)
     }
 }
 
-static void raw_detach_aio_context(BlockDriverState *bs)
-{
 #ifdef CONFIG_LINUX_AIO
-    BDRVRawState *s = bs->opaque;
-
-    if (s->use_aio) {
-        laio_detach_aio_context(s->aio_ctx, bdrv_get_aio_context(bs));
-    }
-#endif
-}
-
-static void raw_attach_aio_context(BlockDriverState *bs,
-                                   AioContext *new_context)
+static bool raw_use_aio(int bdrv_flags)
 {
-#ifdef CONFIG_LINUX_AIO
-    BDRVRawState *s = bs->opaque;
-
-    if (s->use_aio) {
-        laio_attach_aio_context(s->aio_ctx, new_context);
-    }
-#endif
-}
-
-#ifdef CONFIG_LINUX_AIO
-static int raw_set_aio(LinuxAioState **aio_ctx, int *use_aio, int bdrv_flags)
-{
-    int ret = -1;
-    assert(aio_ctx != NULL);
-    assert(use_aio != NULL);
     /*
      * Currently Linux do AIO only for files opened with O_DIRECT
      * specified so check NOCACHE flag too
      */
-    if ((bdrv_flags & (BDRV_O_NOCACHE|BDRV_O_NATIVE_AIO)) ==
-                      (BDRV_O_NOCACHE|BDRV_O_NATIVE_AIO)) {
-
-        /* if non-NULL, laio_init() has already been run */
-        if (*aio_ctx == NULL) {
-            *aio_ctx = laio_init();
-            if (!*aio_ctx) {
-                goto error;
-            }
-        }
-        *use_aio = 1;
-    } else {
-        *use_aio = 0;
-    }
-
-    ret = 0;
-
-error:
-    return ret;
+    return (bdrv_flags & (BDRV_O_NOCACHE|BDRV_O_NATIVE_AIO)) ==
+                         (BDRV_O_NOCACHE|BDRV_O_NATIVE_AIO);
 }
 #endif
 
@@ -494,13 +444,7 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
     s->fd = fd;
 
 #ifdef CONFIG_LINUX_AIO
-    if (raw_set_aio(&s->aio_ctx, &s->use_aio, bdrv_flags)) {
-        qemu_close(fd);
-        ret = -errno;
-        error_setg_errno(errp, -ret, "Could not set AIO state");
-        goto fail;
-    }
-    if (!s->use_aio && (bdrv_flags & BDRV_O_NATIVE_AIO)) {
+    if (!raw_use_aio(bdrv_flags) && (bdrv_flags & BDRV_O_NATIVE_AIO)) {
         error_setg(errp, "aio=native was specified, but it requires "
                          "cache.direct=on, which was not specified.");
         ret = -EINVAL;
@@ -566,8 +510,6 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
     }
 #endif
 
-    raw_attach_aio_context(bs, bdrv_get_aio_context(bs));
-
     ret = 0;
 fail:
     if (filename && (bdrv_flags & BDRV_O_TEMPORARY)) {
@@ -608,18 +550,6 @@ static int raw_reopen_prepare(BDRVReopenState *state,
     state->opaque = g_new0(BDRVRawReopenState, 1);
     raw_s = state->opaque;
 
-#ifdef CONFIG_LINUX_AIO
-    raw_s->use_aio = s->use_aio;
-
-    /* we can use s->aio_ctx instead of a copy, because the use_aio flag is
-     * valid in the 'false' condition even if aio_ctx is set, and raw_set_aio()
-     * won't override aio_ctx if aio_ctx is non-NULL */
-    if (raw_set_aio(&s->aio_ctx, &raw_s->use_aio, state->flags)) {
-        error_setg(errp, "Could not set AIO state");
-        return -1;
-    }
-#endif
-
     if (s->type == FTYPE_CD) {
         raw_s->open_flags |= O_NONBLOCK;
     }
@@ -702,9 +632,6 @@ static void raw_reopen_commit(BDRVReopenState *state)
 
     qemu_close(s->fd);
     s->fd = raw_s->fd;
-#ifdef CONFIG_LINUX_AIO
-    s->use_aio = raw_s->use_aio;
-#endif
 
     g_free(state->opaque);
     state->opaque = NULL;
@@ -1319,8 +1246,9 @@ static BlockAIOCB *raw_aio_submit(BlockDriverState *bs,
         if (!bdrv_qiov_is_aligned(bs, qiov)) {
             type |= QEMU_AIO_MISALIGNED;
 #ifdef CONFIG_LINUX_AIO
-        } else if (s->use_aio) {
-            return laio_submit(bs, s->aio_ctx, s->fd, sector_num, qiov,
+        } else if (bs->open_flags & BDRV_O_NATIVE_AIO) {
+            LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
+            return laio_submit(bs, aio, s->fd, sector_num, qiov,
                                nb_sectors, cb, opaque, type);
 #endif
         }
@@ -1333,9 +1261,9 @@ static BlockAIOCB *raw_aio_submit(BlockDriverState *bs,
 static void raw_aio_plug(BlockDriverState *bs)
 {
 #ifdef CONFIG_LINUX_AIO
-    BDRVRawState *s = bs->opaque;
-    if (s->use_aio) {
-        laio_io_plug(bs, s->aio_ctx);
+    if (bs->open_flags & BDRV_O_NATIVE_AIO) {
+        LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
+        laio_io_plug(bs, aio);
     }
 #endif
 }
@@ -1343,9 +1271,9 @@ static void raw_aio_plug(BlockDriverState *bs)
 static void raw_aio_unplug(BlockDriverState *bs)
 {
 #ifdef CONFIG_LINUX_AIO
-    BDRVRawState *s = bs->opaque;
-    if (s->use_aio) {
-        laio_io_unplug(bs, s->aio_ctx);
+    if (bs->open_flags & BDRV_O_NATIVE_AIO) {
+        LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
+        laio_io_unplug(bs, aio);
     }
 #endif
 }
@@ -1381,13 +1309,6 @@ static void raw_close(BlockDriverState *bs)
 {
     BDRVRawState *s = bs->opaque;
 
-    raw_detach_aio_context(bs);
-
-#ifdef CONFIG_LINUX_AIO
-    if (s->use_aio) {
-        laio_cleanup(s->aio_ctx);
-    }
-#endif
     if (s->fd >= 0) {
         qemu_close(s->fd);
         s->fd = -1;
@@ -1946,9 +1867,6 @@ BlockDriver bdrv_file = {
     .bdrv_get_allocated_file_size
                         = raw_get_allocated_file_size,
 
-    .bdrv_detach_aio_context = raw_detach_aio_context,
-    .bdrv_attach_aio_context = raw_attach_aio_context,
-
     .create_opts = &raw_create_opts,
 };
 
@@ -2396,9 +2314,6 @@ static BlockDriver bdrv_host_device = {
     .bdrv_probe_blocksizes = hdev_probe_blocksizes,
     .bdrv_probe_geometry = hdev_probe_geometry,
 
-    .bdrv_detach_aio_context = raw_detach_aio_context,
-    .bdrv_attach_aio_context = raw_attach_aio_context,
-
     /* generic scsi device */
 #ifdef __linux__
     .bdrv_aio_ioctl     = hdev_aio_ioctl,
@@ -2523,9 +2438,6 @@ static BlockDriver bdrv_host_cdrom = {
     .bdrv_get_allocated_file_size
                         = raw_get_allocated_file_size,
 
-    .bdrv_detach_aio_context = raw_detach_aio_context,
-    .bdrv_attach_aio_context = raw_attach_aio_context,
-
     /* removable device support */
     .bdrv_is_inserted   = cdrom_is_inserted,
     .bdrv_eject         = cdrom_eject,
@@ -2658,9 +2570,6 @@ static BlockDriver bdrv_host_cdrom = {
     .bdrv_get_allocated_file_size
                         = raw_get_allocated_file_size,
 
-    .bdrv_detach_aio_context = raw_detach_aio_context,
-    .bdrv_attach_aio_context = raw_attach_aio_context,
-
     /* removable device support */
     .bdrv_is_inserted   = cdrom_is_inserted,
     .bdrv_eject         = cdrom_eject,
diff --git a/block/raw-win32.c b/block/raw-win32.c
index fd23891..ce77432 100644
--- a/block/raw-win32.c
+++ b/block/raw-win32.c
@@ -27,7 +27,7 @@
 #include "qemu/timer.h"
 #include "block/block_int.h"
 #include "qemu/module.h"
-#include "raw-aio.h"
+#include "block/raw-aio.h"
 #include "trace.h"
 #include "block/thread-pool.h"
 #include "qemu/iov.h"
diff --git a/include/block/aio.h b/include/block/aio.h
index 88a64ee..afd72a7 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -47,6 +47,9 @@ typedef struct AioHandler AioHandler;
 typedef void QEMUBHFunc(void *opaque);
 typedef void IOHandler(void *opaque);
 
+struct ThreadPool;
+struct LinuxAioState;
+
 struct AioContext {
     GSource source;
 
@@ -119,6 +122,13 @@ struct AioContext {
     /* Thread pool for performing work and receiving completion callbacks */
     struct ThreadPool *thread_pool;
 
+#ifdef CONFIG_LINUX_AIO
+    /* State for native Linux AIO.  Uses aio_context_acquire/release for
+     * locking.
+     */
+    struct LinuxAioState *linux_aio;
+#endif
+
     /* TimerLists for calling timers - one per clock type */
     QEMUTimerListGroup tlg;
 
@@ -335,6 +345,9 @@ GSource *aio_get_g_source(AioContext *ctx);
 /* Return the ThreadPool bound to this AioContext */
 struct ThreadPool *aio_get_thread_pool(AioContext *ctx);
 
+/* Return the LinuxAioState bound to this AioContext */
+struct LinuxAioState *aio_get_linux_aio(AioContext *ctx);
+
 /**
  * aio_timer_new:
  * @ctx: the aio context
diff --git a/block/raw-aio.h b/include/block/raw-aio.h
similarity index 100%
rename from block/raw-aio.h
rename to include/block/raw-aio.h
-- 
1.8.3.1


* Re: [Qemu-devel] [Qemu-block] [PATCH v4 8/8] linux-aio: share one LinuxAioState within an AioContext
  2016-04-07 16:33 ` [Qemu-devel] [PATCH v4 8/8] linux-aio: share one LinuxAioState within an AioContext Paolo Bonzini
@ 2016-04-19  9:09   ` Stefan Hajnoczi
From: Stefan Hajnoczi @ 2016-04-19  9:09 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, kwolf, famz, qemu-block, stefanha


On Thu, Apr 07, 2016 at 06:33:36PM +0200, Paolo Bonzini wrote:
> This has better performance because it executes fewer system calls
> and does not use a bottom half per disk.

Each aio_context_t is initialized for 128 in-flight requests in
laio_init().

Will it be possible to hit the limit now that all drives share the same
aio_context_t?

Stefan
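
For reference, the limit comes from laio_init() passing a fixed queue
depth to io_setup(); a hedged sketch of that function as of this series
(MAX_EVENTS was 128 in block/linux-aio.c at the time):

    #define MAX_EVENTS 128   /* kernel-side completion ring depth */

    LinuxAioState *laio_init(void)
    {
        LinuxAioState *s = g_malloc0(sizeof(*s));

        if (event_notifier_init(&s->e, false) < 0) {
            goto out_free_state;
        }
        /* every request submitted through this io_context_t shares the
         * 128-entry ring set up here */
        if (io_setup(MAX_EVENTS, &s->ctx) != 0) {
            goto out_close_efd;
        }
        ioq_init(&s->io_q);
        return s;

    out_close_efd:
        event_notifier_cleanup(&s->e);
    out_free_state:
        g_free(s);
        return NULL;
    }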



* Re: [Qemu-devel] [Qemu-block] [PATCH v4 0/8] bdrv_flush_io_queue removal, shared LinuxAioState
  2016-04-07 16:33 [Qemu-devel] [PATCH v4 0/8] bdrv_flush_io_queue removal, shared LinuxAioState Paolo Bonzini
@ 2016-04-19  9:10 ` Stefan Hajnoczi
From: Stefan Hajnoczi @ 2016-04-19  9:10 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, kwolf, famz, qemu-block, stefanha


On Thu, Apr 07, 2016 at 06:33:28PM +0200, Paolo Bonzini wrote:
> Patch 1 comes from Kevin's series to do BlockBackend throttling.
> 
> Patches 2-6 are from my bdrv_drain patches.  They apply on top of Fam's
> patch (which will be in 2.6) that introduces bdrv_co_drain.  Patch 4
> is new in this version, compared to v3.
> 
> Patches 7-8 are new but based on Ming Lei's old submission.
> I'm including them here because they apply on top of patches 2-6.
> 
> Kevin Wolf (1):
>   block: Don't disable I/O throttling on sync requests
> 
> Paolo Bonzini (7):
>   block: make bdrv_start_throttled_reqs return void
>   block: move restarting of throttled reqs to block/throttle-groups.c
>   block: extract bdrv_drain_poll/bdrv_co_yield_to_drain from
>     bdrv_drain/bdrv_co_drain
>   block: introduce bdrv_no_throttling_begin/end
>   block: plug whole tree at once, introduce bdrv_io_unplugged_begin/end
>   linux-aio: make it more type safe
>   linux-aio: share one LinuxAioState within an AioContext
> 
>  async.c                            |  23 +++++
>  block.c                            |   1 -
>  block/block-backend.c              |   6 +-
>  block/io.c                         | 168 ++++++++++++++++++++++---------------
>  block/linux-aio.c                  |  60 ++++++-------
>  block/raw-posix.c                  | 133 ++++-------------------------
>  block/raw-win32.c                  |   2 +-
>  block/throttle-groups.c            |  18 ++++
>  include/block/aio.h                |  13 +++
>  include/block/block.h              |   3 +-
>  include/block/block_int.h          |  14 +++-
>  {block => include/block}/raw-aio.h |  15 ++--
>  include/block/throttle-groups.h    |   1 +
>  13 files changed, 216 insertions(+), 241 deletions(-)
>  rename {block => include/block}/raw-aio.h (80%)

Looks good overall.  I have reservations about the last patch; see my
reply there.

Acked-by: Stefan Hajnoczi <stefanha@redhat.com>



* Re: [Qemu-devel] [Qemu-block] [PATCH v4 0/8] bdrv_flush_io_queue removal, shared LinuxAioState
  2016-04-19  9:10 ` [Qemu-devel] [Qemu-block] [PATCH v4 0/8] bdrv_flush_io_queue removal, shared LinuxAioState Stefan Hajnoczi
@ 2016-04-19 15:09   ` Kevin Wolf
From: Kevin Wolf @ 2016-04-19 15:09 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Paolo Bonzini, qemu-devel, famz, qemu-block, stefanha


On 19.04.2016 at 11:10, Stefan Hajnoczi wrote:
> On Thu, Apr 07, 2016 at 06:33:28PM +0200, Paolo Bonzini wrote:
> > Patch 1 comes from Kevin's series to do BlockBackend throttling.
> > 
> > Patches 2-6 are from my bdrv_drain patches.  They apply on top of Fam's
> > patch (which will be in 2.6) that introduces bdrv_co_drain.  Patch 4
> > is new in this version, compared to v3.
> > 
> > Patches 7-8 are new but based on Ming Lei's old submission.
> > I'm including them here because they apply on top of patches 2-6.
> > 
> > Kevin Wolf (1):
> >   block: Don't disable I/O throttling on sync requests
> > 
> > Paolo Bonzini (7):
> >   block: make bdrv_start_throttled_reqs return void
> >   block: move restarting of throttled reqs to block/throttle-groups.c
> >   block: extract bdrv_drain_poll/bdrv_co_yield_to_drain from
> >     bdrv_drain/bdrv_co_drain
> >   block: introduce bdrv_no_throttling_begin/end
> >   block: plug whole tree at once, introduce bdrv_io_unplugged_begin/end
> >   linux-aio: make it more type safe
> >   linux-aio: share one LinuxAioState within an AioContext
> > 
> >  async.c                            |  23 +++++
> >  block.c                            |   1 -
> >  block/block-backend.c              |   6 +-
> >  block/io.c                         | 168 ++++++++++++++++++++++---------------
> >  block/linux-aio.c                  |  60 ++++++-------
> >  block/raw-posix.c                  | 133 ++++-------------------------
> >  block/raw-win32.c                  |   2 +-
> >  block/throttle-groups.c            |  18 ++++
> >  include/block/aio.h                |  13 +++
> >  include/block/block.h              |   3 +-
> >  include/block/block_int.h          |  14 +++-
> >  {block => include/block}/raw-aio.h |  15 ++--
> >  include/block/throttle-groups.h    |   1 +
> >  13 files changed, 216 insertions(+), 241 deletions(-)
> >  rename {block => include/block}/raw-aio.h (80%)
> 
> Looks good overall.  I have reservations about the last patch, see my
> reply there.
> 
> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

Thanks, applied patches 1-7 to block-next.

Kevin

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v4 8/8] linux-aio: share one LinuxAioState within an AioContext
  2016-04-19  9:09   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
@ 2016-05-09 16:31     ` Paolo Bonzini
  2016-05-10  9:30       ` Stefan Hajnoczi
  0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2016-05-09 16:31 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel, kwolf, famz, qemu-block, stefanha



On 19/04/2016 11:09, Stefan Hajnoczi wrote:
>> > This has better performance because it executes fewer system calls
>> > and does not use a bottom half per disk.
> Each aio_context_t is initialized for 128 in-flight requests in
> laio_init().
> 
> Will it be possible to hit the limit now that all drives share the same
> aio_context_t?

It was also possible before, because the virtqueue can be bigger than
128 items; that's why there is logic to submit I/O requests after an
io_get_events.  As usual when the answer seems trivial, am I
misunderstanding your question?
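
Roughly, the pattern in question looks like this (a simplified sketch,
not the literal linux-aio.c code; process_completion() and
resubmit_pending() are illustrative names):

  #include <libaio.h>   /* io_getevents(), io_context_t, struct io_event */

  #define MAX_EVENTS 128

  static void process_completion(struct io_event *ev);  /* illustrative */
  static void resubmit_pending(io_context_t ctx);       /* illustrative */

  /* Runs when the completion eventfd fires: drain finished events first,
   * then retry whatever could not be submitted while the ring was full. */
  static void completion_cb(io_context_t ctx)
  {
      struct io_event events[MAX_EVENTS];
      int nevents, i;

      do {
          nevents = io_getevents(ctx, 0, MAX_EVENTS, events, NULL);
      } while (nevents == -EINTR);

      for (i = 0; i < nevents; i++) {
          process_completion(&events[i]);
      }

      /* Each completion frees a slot in the ring, so requests that were
       * queued after an earlier -EAGAIN can be submitted now. */
      resubmit_pending(ctx);
  }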

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v4 8/8] linux-aio: share one LinuxAioState within an AioContext
  2016-05-09 16:31     ` Paolo Bonzini
@ 2016-05-10  9:30       ` Stefan Hajnoczi
  2016-05-10  9:40         ` Kevin Wolf
  0 siblings, 1 reply; 19+ messages in thread
From: Stefan Hajnoczi @ 2016-05-10  9:30 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, kwolf, famz, qemu-block, stefanha

[-- Attachment #1: Type: text/plain, Size: 2467 bytes --]

On Mon, May 09, 2016 at 06:31:44PM +0200, Paolo Bonzini wrote:
> On 19/04/2016 11:09, Stefan Hajnoczi wrote:
> >> > This has better performance because it executes fewer system calls
> >> > and does not use a bottom half per disk.
> > Each aio_context_t is initialized for 128 in-flight requests in
> > laio_init().
> > 
> > Will it be possible to hit the limit now that all drives share the same
> > aio_context_t?
> 
> It was also possible before, because the virtqueue can be bigger than
> 128 items; that's why there is logic to submit I/O requests after an
> io_get_events.  As usual when the answer seems trivial, am I
> misunderstanding your question?

I'm concerned about a performance regression rather than correctness.

But looking at linux-aio.c there *is* a correctness problem:

  static void ioq_submit(struct qemu_laio_state *s)
  {
      int ret, len;
      struct qemu_laiocb *aiocb;
      struct iocb *iocbs[MAX_QUEUED_IO];
      QSIMPLEQ_HEAD(, qemu_laiocb) completed;

      do {
          len = 0;
          QSIMPLEQ_FOREACH(aiocb, &s->io_q.pending, next) {
              iocbs[len++] = &aiocb->iocb;
              if (len == MAX_QUEUED_IO) {
                  break;
              }
          }

          ret = io_submit(s->ctx, len, iocbs);
          if (ret == -EAGAIN) {
              break;
          }
          if (ret < 0) {
              abort();
          }

          s->io_q.n -= ret;
          aiocb = container_of(iocbs[ret - 1], struct qemu_laiocb, iocb);
          QSIMPLEQ_SPLIT_AFTER(&s->io_q.pending, aiocb, next, &completed);
      } while (ret == len && !QSIMPLEQ_EMPTY(&s->io_q.pending));
      s->io_q.blocked = (s->io_q.n > 0);
  }

io_submit() may have submitted some of the requests when -EAGAIN is
returned.  QEMU gets no indication of which requests were submitted.  It
may be possible to dig around in the s->ctx rings to find out, or we need
to keep track of the number of in-flight requests so that we never hit
-EAGAIN in the first place.

ioq_submit() pretends that no requests were submitted on -EAGAIN and
will submit them again next time.  This could result in double
completions.
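
A minimal sketch of the tracking option (in_flight, MAX_EVENTS and
ioq_submit_bounded() are illustrative names, not existing linux-aio.c
code):

  #include <libaio.h>

  #define MAX_EVENTS 128        /* matches the laio_init() ring size */

  static int in_flight;         /* decremented in the completion path */

  /* Never pass io_submit() more iocbs than the ring has free slots,
   * so that -EAGAIN cannot happen in the first place. */
  static void ioq_submit_bounded(io_context_t ctx,
                                 struct iocb **iocbs, int len)
  {
      int room = MAX_EVENTS - in_flight;
      int n = len < room ? len : room;
      int ret;

      if (n <= 0) {
          return;               /* ring full: retry after a completion */
      }
      ret = io_submit(ctx, n, iocbs);
      if (ret > 0) {
          in_flight += ret;
      }
  }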

Regarding performance, I'm thinking about a guest with 8 disks (queue
depth 32).  The worst case is when the guest submits 32 requests at once
but the Linux AIO event limit has already been reached.  Then the disk
is starved until other disks' requests complete.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v4 8/8] linux-aio: share one LinuxAioState within an AioContext
  2016-05-10  9:30       ` Stefan Hajnoczi
@ 2016-05-10  9:40         ` Kevin Wolf
  2016-05-10 10:32           ` Paolo Bonzini
  2016-05-11 13:18           ` Stefan Hajnoczi
  0 siblings, 2 replies; 19+ messages in thread
From: Kevin Wolf @ 2016-05-10  9:40 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Paolo Bonzini, qemu-devel, famz, qemu-block, stefanha

[-- Attachment #1: Type: text/plain, Size: 2968 bytes --]

On 10.05.2016 at 11:30, Stefan Hajnoczi wrote:
> On Mon, May 09, 2016 at 06:31:44PM +0200, Paolo Bonzini wrote:
> > On 19/04/2016 11:09, Stefan Hajnoczi wrote:
> > >> > This has better performance because it executes fewer system calls
> > >> > and does not use a bottom half per disk.
> > > Each aio_context_t is initialized for 128 in-flight requests in
> > > laio_init().
> > > 
> > > Will it be possible to hit the limit now that all drives share the same
> > > aio_context_t?
> > 
> > It was also possible before, because the virtqueue can be bigger than
> > 128 items; that's why there is logic to submit I/O requests after an
> > io_get_events.  As usual when the answer seems trivial, am I
> > misunderstanding your question?
> 
> I'm concerned about a performance regression rather than correctness.
> 
> But looking at linux-aio.c there *is* a correctness problem:
> 
>   static void ioq_submit(struct qemu_laio_state *s)
>   {
>       int ret, len;
>       struct qemu_laiocb *aiocb;
>       struct iocb *iocbs[MAX_QUEUED_IO];
>       QSIMPLEQ_HEAD(, qemu_laiocb) completed;
> 
>       do {
>           len = 0;
>           QSIMPLEQ_FOREACH(aiocb, &s->io_q.pending, next) {
>               iocbs[len++] = &aiocb->iocb;
>               if (len == MAX_QUEUED_IO) {
>                   break;
>               }
>           }
> 
>           ret = io_submit(s->ctx, len, iocbs);
>           if (ret == -EAGAIN) {
>               break;
>           }
>           if (ret < 0) {
>               abort();
>           }
> 
>           s->io_q.n -= ret;
>           aiocb = container_of(iocbs[ret - 1], struct qemu_laiocb, iocb);
>           QSIMPLEQ_SPLIT_AFTER(&s->io_q.pending, aiocb, next, &completed);
>       } while (ret == len && !QSIMPLEQ_EMPTY(&s->io_q.pending));
>       s->io_q.blocked = (s->io_q.n > 0);
>   }
> 
> io_submit() may have submitted some of the requests when -EAGAIN is
> returned.  QEMU gets no indication of which requests were submitted.

My understanding (which is based on the manpage rather than code) is
that -EAGAIN is only returned if no request could be submitted. In other
cases, the number of submitted requests is returned (similar to how
short reads work).
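
In other words, under that reading a caller can treat io_submit() like a
short read, along these lines (submit_all() is an invented helper, not
QEMU code):

  #include <libaio.h>

  static int submit_all(io_context_t ctx, struct iocb **iocbs, int len)
  {
      int done = 0;

      while (done < len) {
          int ret = io_submit(ctx, len - done, iocbs + done);
          if (ret < 0) {
              /* e.g. -EAGAIN: nothing beyond iocbs[done-1] was queued */
              return done > 0 ? done : ret;
          }
          done += ret;          /* short submit: only the first ret consumed */
      }
      return done;
  }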

> It may be possible to dig around in the s->ctx rings to find out, or we
> need to keep track of the number of in-flight requests so that we never
> hit -EAGAIN in the first place.
> 
> ioq_submit() pretends that no requests were submitted on -EAGAIN and
> will submit them again next time.  This could result in double
> completions.

Did you check in the code that this can happen?

> Regarding performance, I'm thinking about a guest with 8 disks (queue
> depth 32).  The worst case is when the guest submits 32 requests at once
> but the Linux AIO event limit has already been reached.  Then the disk
> is starved until other disks' requests complete.

Sounds like a valid concern.

Kevin

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v4 8/8] linux-aio: share one LinuxAioState within an AioContext
  2016-05-10  9:40         ` Kevin Wolf
@ 2016-05-10 10:32           ` Paolo Bonzini
  2016-05-11 13:22             ` Stefan Hajnoczi
  2016-05-11 13:18           ` Stefan Hajnoczi
  1 sibling, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2016-05-10 10:32 UTC (permalink / raw)
  To: Kevin Wolf, Stefan Hajnoczi; +Cc: qemu-devel, famz, qemu-block, stefanha



On 10/05/2016 11:40, Kevin Wolf wrote:
> > Regarding performance, I'm thinking about a guest with 8 disks (queue
> > depth 32).  The worst case is when the guest submits 32 requests at once
> > but the Linux AIO event limit has already been reached.  Then the disk
> > is starved until other disks' requests complete.
>
> Sounds like a valid concern.

Oh, so you're concerned about the non-dataplane case.  My suspicion is
that, with such a number of outstanding I/O, you probably have bad
performance unless you use dataplane.

Also, aio=threads has had a 64 I/O limit for years and we've never heard
about problems with stuck I/O.  But I agree that it's one area to keep
an eye on.

Paolo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v4 8/8] linux-aio: share one LinuxAioState within an AioContext
  2016-05-10  9:40         ` Kevin Wolf
  2016-05-10 10:32           ` Paolo Bonzini
@ 2016-05-11 13:18           ` Stefan Hajnoczi
  1 sibling, 0 replies; 19+ messages in thread
From: Stefan Hajnoczi @ 2016-05-11 13:18 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Stefan Hajnoczi, Paolo Bonzini, qemu-devel, famz, qemu-block

[-- Attachment #1: Type: text/plain, Size: 3335 bytes --]

On Tue, May 10, 2016 at 11:40:33AM +0200, Kevin Wolf wrote:
On 10.05.2016 at 11:30, Stefan Hajnoczi wrote:
> > On Mon, May 09, 2016 at 06:31:44PM +0200, Paolo Bonzini wrote:
> > > On 19/04/2016 11:09, Stefan Hajnoczi wrote:
> > > >> > This has better performance because it executes fewer system calls
> > > >> > and does not use a bottom half per disk.
> > > > Each aio_context_t is initialized for 128 in-flight requests in
> > > > laio_init().
> > > > 
> > > > Will it be possible to hit the limit now that all drives share the same
> > > > aio_context_t?
> > > 
> > > It was also possible before, because the virtqueue can be bigger than
> > > 128 items; that's why there is logic to submit I/O requests after an
> > > io_get_events.  As usual when the answer seems trivial, am I
> > > misunderstanding your question?
> > 
> > I'm concerned about a performance regression rather than correctness.
> > 
> > But looking at linux-aio.c there *is* a correctness problem:
> > 
> >   static void ioq_submit(struct qemu_laio_state *s)
> >   {
> >       int ret, len;
> >       struct qemu_laiocb *aiocb;
> >       struct iocb *iocbs[MAX_QUEUED_IO];
> >       QSIMPLEQ_HEAD(, qemu_laiocb) completed;
> > 
> >       do {
> >           len = 0;
> >           QSIMPLEQ_FOREACH(aiocb, &s->io_q.pending, next) {
> >               iocbs[len++] = &aiocb->iocb;
> >               if (len == MAX_QUEUED_IO) {
> >                   break;
> >               }
> >           }
> > 
> >           ret = io_submit(s->ctx, len, iocbs);
> >           if (ret == -EAGAIN) {
> >               break;
> >           }
> >           if (ret < 0) {
> >               abort();
> >           }
> > 
> >           s->io_q.n -= ret;
> >           aiocb = container_of(iocbs[ret - 1], struct qemu_laiocb, iocb);
> >           QSIMPLEQ_SPLIT_AFTER(&s->io_q.pending, aiocb, next, &completed);
> >       } while (ret == len && !QSIMPLEQ_EMPTY(&s->io_q.pending));
> >       s->io_q.blocked = (s->io_q.n > 0);
> >   }
> > 
> > io_submit() may have submitted some of the requests when -EAGAIN is
> > returned.  QEMU gets no indication of which requests were submitted.
> 
> My understanding (which is based on the manpage rather than code) is
> that -EAGAIN is only returned if no request could be submitted. In other
> cases, the number of submitted requests is returned (similar to how
> short reads work).

I misread the code:

  /*
   * AKPM: should this return a partial result if some of the IOs were
   * successfully submitted?
   */
  for (i=0; i<nr; i++) {
          struct iocb __user *user_iocb;
          struct iocb tmp;

          if (unlikely(__get_user(user_iocb, iocbpp + i))) {
                  ret = -EFAULT;
                  break;
          }

          if (unlikely(copy_from_user(&tmp, user_iocb, sizeof(tmp)))) {
                  ret = -EFAULT;
                  break;
          }

          ret = io_submit_one(ctx, user_iocb, &tmp, compat);
          if (ret)
                  break;
  }
  blk_finish_plug(&plug);

  percpu_ref_put(&ctx->users);
  return i ? i : ret;

You are right that it will return the number of submitted requests (and
no errno) if a failure occurs partway through.

So the "bug" I found does not exist.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v4 8/8] linux-aio: share one LinuxAioState within an AioContext
  2016-05-10 10:32           ` Paolo Bonzini
@ 2016-05-11 13:22             ` Stefan Hajnoczi
  0 siblings, 0 replies; 19+ messages in thread
From: Stefan Hajnoczi @ 2016-05-11 13:22 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, famz, qemu-block

[-- Attachment #1: Type: text/plain, Size: 870 bytes --]

On Tue, May 10, 2016 at 12:32:27PM +0200, Paolo Bonzini wrote:
> On 10/05/2016 11:40, Kevin Wolf wrote:
> > > Regarding performance, I'm thinking about a guest with 8 disks (queue
> > > depth 32).  The worst case is when the guest submits 32 requests at once
> > > but the Linux AIO event limit has already been reached.  Then the disk
> > > is starved until other disks' requests complete.
> >
> > Sounds like a valid concern.
> 
> Oh, so you're concerned about the non-dataplane case.  My suspicion is
> that, with such a number of outstanding I/O, you probably have bad
> performance unless you use dataplane.
> 
> Also, aio=threads has had a 64 I/O limit for years and we've never heard
> about problems with stuck I/O.  But I agree that it's one area to keep
> an eye on.

Good point.  In that case I don't feel so bad if we merge this.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v4 8/8] linux-aio: share one LinuxAioState within an AioContext
  2016-04-07 16:33 ` [Qemu-devel] [PATCH v4 8/8] linux-aio: share one LinuxAioState within an AioContext Paolo Bonzini
  2016-04-19  9:09   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
@ 2016-05-11 13:23   ` Stefan Hajnoczi
  1 sibling, 0 replies; 19+ messages in thread
From: Stefan Hajnoczi @ 2016-05-11 13:23 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, kwolf, famz, qemu-block, berto

[-- Attachment #1: Type: text/plain, Size: 772 bytes --]

On Thu, Apr 07, 2016 at 06:33:36PM +0200, Paolo Bonzini wrote:
> This has better performance because it executes fewer system calls
> and does not use a bottom half per disk.
> 
> Originally proposed by Ming Lei.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  async.c                            |  23 +++++++
>  block/linux-aio.c                  |   3 +
>  block/raw-posix.c                  | 119 +++++--------------------------------
>  block/raw-win32.c                  |   2 +-
>  include/block/aio.h                |  13 ++++
>  {block => include/block}/raw-aio.h |   0
>  6 files changed, 54 insertions(+), 106 deletions(-)
>  rename {block => include/block}/raw-aio.h (100%)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2016-05-11 13:23 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-07 16:33 [Qemu-devel] [PATCH v4 0/8] bdrv_flush_io_queue removal, shared LinuxAioState Paolo Bonzini
2016-04-07 16:33 ` [Qemu-devel] [PATCH v4 1/8] block: Don't disable I/O throttling on sync requests Paolo Bonzini
2016-04-07 16:33 ` [Qemu-devel] [PATCH v4 2/8] block: make bdrv_start_throttled_reqs return void Paolo Bonzini
2016-04-07 16:33 ` [Qemu-devel] [PATCH v4 3/8] block: move restarting of throttled reqs to block/throttle-groups.c Paolo Bonzini
2016-04-07 16:33 ` [Qemu-devel] [PATCH v4 4/8] block: extract bdrv_drain_poll/bdrv_co_yield_to_drain from bdrv_drain/bdrv_co_drain Paolo Bonzini
2016-04-07 16:33 ` [Qemu-devel] [PATCH v4 5/8] block: introduce bdrv_no_throttling_begin/end Paolo Bonzini
2016-04-07 16:33 ` [Qemu-devel] [PATCH v4 6/8] block: plug whole tree at once, introduce bdrv_io_unplugged_begin/end Paolo Bonzini
2016-04-07 16:33 ` [Qemu-devel] [PATCH v4 7/8] linux-aio: make it more type safe Paolo Bonzini
2016-04-07 16:33 ` [Qemu-devel] [PATCH v4 8/8] linux-aio: share one LinuxAioState within an AioContext Paolo Bonzini
2016-04-19  9:09   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2016-05-09 16:31     ` Paolo Bonzini
2016-05-10  9:30       ` Stefan Hajnoczi
2016-05-10  9:40         ` Kevin Wolf
2016-05-10 10:32           ` Paolo Bonzini
2016-05-11 13:22             ` Stefan Hajnoczi
2016-05-11 13:18           ` Stefan Hajnoczi
2016-05-11 13:23   ` [Qemu-devel] " Stefan Hajnoczi
2016-04-19  9:10 ` [Qemu-devel] [Qemu-block] [PATCH v4 0/8] bdrv_flush_io_queue removal, shared LinuxAioState Stefan Hajnoczi
2016-04-19 15:09   ` Kevin Wolf
