* [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer
@ 2014-05-01 14:54 Stefan Hajnoczi
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 01/22] block: use BlockDriverState AioContext Stefan Hajnoczi
                   ` (23 more replies)
  0 siblings, 24 replies; 53+ messages in thread
From: Stefan Hajnoczi @ 2014-05-01 14:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Paolo Bonzini, Shergill, Gurinder, Vinod, Chegu,
	Stefan Hajnoczi

This patch series switches virtio-blk data-plane from a custom Linux AIO
request queue to the QEMU block layer.  The previous "raw files only"
limitation is lifted.  All image formats and protocols can now be used with
virtio-blk data-plane.

How to review this series
-------------------------
I CCed the maintainer of each block driver that I modified.  You probably don't
need to review the entire series, just your patch.

From now on, fd handlers, timers, BHs, and event loop waits must explicitly use
the BlockDriverState's AioContext instead of the main loop.  Use
bdrv_get_aio_context(bs) to get the AioContext.  The following function calls
need to be converted (a sketch of the pattern follows the list):

 * qemu_aio_set_fd_handler() -> aio_set_fd_handler()
 * timer_new*() -> aio_timer_new()
 * qemu_bh_new() -> aio_bh_new()
 * qemu_aio_wait() -> aio_poll(aio_context, true)
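
For illustration, a minimal sketch of the conversion, assuming a hypothetical
driver with a BH callback my_driver_bh_cb (placeholder names, not code from
this series):

    /* Before: the BH is implicitly created in the main loop */
    acb->bh = qemu_bh_new(my_driver_bh_cb, acb);

    /* After: create the BH in the BlockDriverState's AioContext */
    acb->bh = aio_bh_new(bdrv_get_aio_context(bs), my_driver_bh_cb, acb);

    /* Likewise, synchronous waits poll that AioContext instead of the
     * main loop:
     */
    while (!done) {
        aio_poll(bdrv_get_aio_context(bs), true);
    }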

For simple block drivers this modification suffices, and the driver is then safe
to use outside the QEMU global mutex.

Block drivers that keep fd handlers, timers, or BHs registered when requests
have been drained need a little bit more work.  Examples of this are network
block drivers with keepalive timers, like iSCSI.

This series adds a new bdrv_set_aio_context(bs, aio_context) function that
moves a BlockDriverState into a new AioContext.  This function calls the block
driver's optional .bdrv_detach_aio_context() and .bdrv_attach_aio_context()
functions.  Implement detach/attach to move the fd handlers, timers, or BHs to
the new AioContext.
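
For such drivers, a rough sketch of the callback pair follows, modeled on the
keepalive-timer case (BDRVMyDrvState, mydrv_*, mydrv_keepalive_cb, and
KEEPALIVE_MS are made-up placeholders; the real iSCSI implementation is in
patch 08 below):

    /* Hypothetical driver state: one keepalive timer stays registered
     * even while requests are drained.
     */
    typedef struct BDRVMyDrvState {
        QEMUTimer *keepalive_timer;
    } BDRVMyDrvState;

    static void mydrv_detach_aio_context(BlockDriverState *bs)
    {
        BDRVMyDrvState *s = bs->opaque;

        /* Unregister everything from the old event loop */
        timer_del(s->keepalive_timer);
        timer_free(s->keepalive_timer);
        s->keepalive_timer = NULL;
    }

    static void mydrv_attach_aio_context(BlockDriverState *bs,
                                         AioContext *new_context)
    {
        BDRVMyDrvState *s = bs->opaque;

        /* Re-create the timer in the new event loop */
        s->keepalive_timer = aio_timer_new(new_context,
                                           QEMU_CLOCK_REALTIME, SCALE_MS,
                                           mydrv_keepalive_cb, s);
        timer_mod(s->keepalive_timer,
                  qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + KEEPALIVE_MS);
    }

The callbacks are then hooked up in the driver's BlockDriver definition via
.bdrv_detach_aio_context and .bdrv_attach_aio_context.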

Finally, block drivers that manage their own child nodes also need to
implement detach/attach because the generic block layer doesn't know about
their children.  Both ->file and ->backing_hd are automatically taken care of
but blkverify, quorum, and VMDK need to manually propagate detach/attach to
their children.

I have audited and modified all block drivers.  Block driver maintainers,
please check I did it correctly and didn't break your code.

Background
----------
The block layer is currently tied to the QEMU main loop for fd handlers, timer
callbacks, and BHs.  This means that even on hosts with many cores, parts of
block I/O processing happen in one thread and depend on the QEMU global mutex.

virtio-blk data-plane has shown that 1,000,000 IOPS is achievable if we use
additional threads that are not under the QEMU global mutex.

It is necessary to make the QEMU block layer aware that there may be more than
one event loop.  This way BlockDriverState can be used from a thread without
contention on the QEMU global mutex.

This series builds on the aio_context_acquire/release() interface that allows a
thread to temporarily grab an AioContext.  We add bdrv_set_aio_context(bs,
aio_context) for changing which AioContext a BlockDriverState uses.

The final patches convert virtio-blk data-plane to use the QEMU block layer and
let the BlockDriverState run in the IOThread AioContext.

What's next?
------------
I have already made block I/O throttling work in another AioContext and will
send the series out next week.

In order to keep this series reviewable, I'm holding back those patches for
now.  One could say, "throttling" them.

Thank you, thank you, I'll be here all night!

Stefan Hajnoczi (22):
  block: use BlockDriverState AioContext
  block: acquire AioContext in bdrv_close_all()
  block: add bdrv_set_aio_context()
  blkdebug: use BlockDriverState's AioContext
  blkverify: implement .bdrv_detach/attach_aio_context()
  curl: implement .bdrv_detach/attach_aio_context()
  gluster: use BlockDriverState's AioContext
  iscsi: implement .bdrv_detach/attach_aio_context()
  nbd: implement .bdrv_detach/attach_aio_context()
  nfs: implement .bdrv_detach/attach_aio_context()
  qed: use BlockDriverState's AioContext
  quorum: implement .bdrv_detach/attach_aio_context()
  block/raw-posix: implement .bdrv_detach/attach_aio_context()
  block/linux-aio: fix memory and fd leak
  rbd: use BlockDriverState's AioContext
  sheepdog: implement .bdrv_detach/attach_aio_context()
  ssh: use BlockDriverState's AioContext
  vmdk: implement .bdrv_detach/attach_aio_context()
  dataplane: use the QEMU block layer for I/O
  dataplane: delete IOQueue since it is no longer used
  dataplane: implement async flush
  raw-posix: drop raw_get_aio_fd() since it is no longer used

 block.c                          |  88 +++++++++++++--
 block/blkdebug.c                 |   2 +-
 block/blkverify.c                |  47 +++++---
 block/curl.c                     | 194 +++++++++++++++++++-------------
 block/gluster.c                  |   7 +-
 block/iscsi.c                    |  79 +++++++++----
 block/linux-aio.c                |  24 +++-
 block/nbd-client.c               |  24 +++-
 block/nbd-client.h               |   4 +
 block/nbd.c                      |  87 +++++++++------
 block/nfs.c                      |  80 ++++++++++----
 block/qed-table.c                |   8 +-
 block/qed.c                      |  35 +++++-
 block/quorum.c                   |  48 ++++++--
 block/raw-aio.h                  |   3 +
 block/raw-posix.c                |  82 ++++++++------
 block/rbd.c                      |   5 +-
 block/sheepdog.c                 | 118 +++++++++++++-------
 block/ssh.c                      |  36 +++---
 block/vmdk.c                     |  23 ++++
 hw/block/dataplane/Makefile.objs |   2 +-
 hw/block/dataplane/ioq.c         | 117 --------------------
 hw/block/dataplane/ioq.h         |  57 ----------
 hw/block/dataplane/virtio-blk.c  | 233 +++++++++++++++------------------------
 include/block/block.h            |  20 ++--
 include/block/block_int.h        |  36 ++++++
 26 files changed, 829 insertions(+), 630 deletions(-)
 delete mode 100644 hw/block/dataplane/ioq.c
 delete mode 100644 hw/block/dataplane/ioq.h

-- 
1.9.0


* [Qemu-devel] [PATCH 01/22] block: use BlockDriverState AioContext
  2014-05-01 14:54 [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer Stefan Hajnoczi
@ 2014-05-01 14:54 ` Stefan Hajnoczi
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 02/22] block: acquire AioContext in bdrv_close_all() Stefan Hajnoczi
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 53+ messages in thread
From: Stefan Hajnoczi @ 2014-05-01 14:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Paolo Bonzini, Shergill, Gurinder, Vinod, Chegu,
	Stefan Hajnoczi

Drop the assumption that we're using the main AioContext.  Convert
qemu_aio_wait() to aio_poll() and qemu_bh_new() to aio_bh_new() so the
BlockDriverState AioContext is used.

Note there is still one qemu_aio_wait() left in bdrv_create(), but no
BlockDriverState is available there and only main loop code invokes this
function.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block.c | 27 ++++++++++++++++++---------
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/block.c b/block.c
index 4745712..94999e5 100644
--- a/block.c
+++ b/block.c
@@ -2682,10 +2682,12 @@ static int bdrv_prwv_co(BlockDriverState *bs, int64_t offset,
         /* Fast-path if already in coroutine context */
         bdrv_rw_co_entry(&rwco);
     } else {
+        AioContext *aio_context = bdrv_get_aio_context(bs);
+
         co = qemu_coroutine_create(bdrv_rw_co_entry);
         qemu_coroutine_enter(co, &rwco);
         while (rwco.ret == NOT_DONE) {
-            qemu_aio_wait();
+            aio_poll(aio_context, true);
         }
     }
     return rwco.ret;
@@ -3903,10 +3905,12 @@ int64_t bdrv_get_block_status(BlockDriverState *bs, int64_t sector_num,
         /* Fast-path if already in coroutine context */
         bdrv_get_block_status_co_entry(&data);
     } else {
+        AioContext *aio_context = bdrv_get_aio_context(bs);
+
         co = qemu_coroutine_create(bdrv_get_block_status_co_entry);
         qemu_coroutine_enter(co, &data);
         while (!data.done) {
-            qemu_aio_wait();
+            aio_poll(aio_context, true);
         }
     }
     return data.ret;
@@ -4501,7 +4505,7 @@ static BlockDriverAIOCB *bdrv_aio_rw_vector(BlockDriverState *bs,
     acb->is_write = is_write;
     acb->qiov = qiov;
     acb->bounce = qemu_blockalign(bs, qiov->size);
-    acb->bh = qemu_bh_new(bdrv_aio_bh_cb, acb);
+    acb->bh = aio_bh_new(bdrv_get_aio_context(bs), bdrv_aio_bh_cb, acb);
 
     if (is_write) {
         qemu_iovec_to_buf(acb->qiov, 0, acb->bounce, qiov->size);
@@ -4540,13 +4544,14 @@ typedef struct BlockDriverAIOCBCoroutine {
 
 static void bdrv_aio_co_cancel_em(BlockDriverAIOCB *blockacb)
 {
+    AioContext *aio_context = bdrv_get_aio_context(blockacb->bs);
     BlockDriverAIOCBCoroutine *acb =
         container_of(blockacb, BlockDriverAIOCBCoroutine, common);
     bool done = false;
 
     acb->done = &done;
     while (!done) {
-        qemu_aio_wait();
+        aio_poll(aio_context, true);
     }
 }
 
@@ -4583,7 +4588,7 @@ static void coroutine_fn bdrv_co_do_rw(void *opaque)
             acb->req.nb_sectors, acb->req.qiov, acb->req.flags);
     }
 
-    acb->bh = qemu_bh_new(bdrv_co_em_bh, acb);
+    acb->bh = aio_bh_new(bdrv_get_aio_context(bs), bdrv_co_em_bh, acb);
     qemu_bh_schedule(acb->bh);
 }
 
@@ -4619,7 +4624,7 @@ static void coroutine_fn bdrv_aio_flush_co_entry(void *opaque)
     BlockDriverState *bs = acb->common.bs;
 
     acb->req.error = bdrv_co_flush(bs);
-    acb->bh = qemu_bh_new(bdrv_co_em_bh, acb);
+    acb->bh = aio_bh_new(bdrv_get_aio_context(bs), bdrv_co_em_bh, acb);
     qemu_bh_schedule(acb->bh);
 }
 
@@ -4646,7 +4651,7 @@ static void coroutine_fn bdrv_aio_discard_co_entry(void *opaque)
     BlockDriverState *bs = acb->common.bs;
 
     acb->req.error = bdrv_co_discard(bs, acb->req.sector, acb->req.nb_sectors);
-    acb->bh = qemu_bh_new(bdrv_co_em_bh, acb);
+    acb->bh = aio_bh_new(bdrv_get_aio_context(bs), bdrv_co_em_bh, acb);
     qemu_bh_schedule(acb->bh);
 }
 
@@ -4886,10 +4891,12 @@ int bdrv_flush(BlockDriverState *bs)
         /* Fast-path if already in coroutine context */
         bdrv_flush_co_entry(&rwco);
     } else {
+        AioContext *aio_context = bdrv_get_aio_context(bs);
+
         co = qemu_coroutine_create(bdrv_flush_co_entry);
         qemu_coroutine_enter(co, &rwco);
         while (rwco.ret == NOT_DONE) {
-            qemu_aio_wait();
+            aio_poll(aio_context, true);
         }
     }
 
@@ -4999,10 +5006,12 @@ int bdrv_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors)
         /* Fast-path if already in coroutine context */
         bdrv_discard_co_entry(&rwco);
     } else {
+        AioContext *aio_context = bdrv_get_aio_context(bs);
+
         co = qemu_coroutine_create(bdrv_discard_co_entry);
         qemu_coroutine_enter(co, &rwco);
         while (rwco.ret == NOT_DONE) {
-            qemu_aio_wait();
+            aio_poll(aio_context, true);
         }
     }
 
-- 
1.9.0


* [Qemu-devel] [PATCH 02/22] block: acquire AioContext in bdrv_close_all()
  2014-05-01 14:54 [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer Stefan Hajnoczi
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 01/22] block: use BlockDriverState AioContext Stefan Hajnoczi
@ 2014-05-01 14:54 ` Stefan Hajnoczi
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 03/22] block: add bdrv_set_aio_context() Stefan Hajnoczi
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 53+ messages in thread
From: Stefan Hajnoczi @ 2014-05-01 14:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Paolo Bonzini, Shergill, Gurinder, Vinod, Chegu,
	Stefan Hajnoczi

bdrv_close_all() closes all BlockDriverState instances.  It is called
from vl.c:main() and qemu-nbd when shutting down.

Some BlockDriverState instances may be running in another AioContext.
Make sure to acquire the AioContext before closing the BlockDriverState.

This will protect against race conditions once virtio-blk data-plane is
using the BlockDriverState from another AioContext event loop.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/block.c b/block.c
index 94999e5..9381918 100644
--- a/block.c
+++ b/block.c
@@ -1755,7 +1755,11 @@ void bdrv_close_all(void)
     BlockDriverState *bs;
 
     QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
+        AioContext *aio_context = bdrv_get_aio_context(bs);
+
+        aio_context_acquire(aio_context);
         bdrv_close(bs);
+        aio_context_release(aio_context);
     }
 }
 
-- 
1.9.0


* [Qemu-devel] [PATCH 03/22] block: add bdrv_set_aio_context()
  2014-05-01 14:54 [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer Stefan Hajnoczi
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 01/22] block: use BlockDriverState AioContext Stefan Hajnoczi
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 02/22] block: acquire AioContext in bdrv_close_all() Stefan Hajnoczi
@ 2014-05-01 14:54 ` Stefan Hajnoczi
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 04/22] blkdebug: use BlockDriverState's AioContext Stefan Hajnoczi
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 53+ messages in thread
From: Stefan Hajnoczi @ 2014-05-01 14:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Paolo Bonzini, Shergill, Gurinder, Vinod, Chegu,
	Stefan Hajnoczi

Up until now all BlockDriverState instances have used the QEMU main loop
for fd handlers, timers, and BHs.  This is not scalable on SMP guests
and hosts so we need to move to a model with multiple event loops on
different host CPUs.

bdrv_set_aio_context() assigns the AioContext event loop to use for a
particular BlockDriverState.  It first detaches the entire
BlockDriverState graph from the current AioContext and then attaches to
the new AioContext.

This function will be used by virtio-blk data-plane to assign a
BlockDriverState to its IOThread AioContext.  Make
bdrv_set_aio_context() public since data-plane should not include
block_int.h.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block.c                   | 57 +++++++++++++++++++++++++++++++++++++++++++++--
 include/block/block.h     | 11 +++++++++
 include/block/block_int.h | 36 ++++++++++++++++++++++++++++++
 3 files changed, 102 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index 9381918..9ef87fa 100644
--- a/block.c
+++ b/block.c
@@ -359,6 +359,7 @@ BlockDriverState *bdrv_new(const char *device_name, Error **errp)
     qemu_co_queue_init(&bs->throttled_reqs[0]);
     qemu_co_queue_init(&bs->throttled_reqs[1]);
     bs->refcnt = 1;
+    bs->aio_context = qemu_get_aio_context();
 
     return bs;
 }
@@ -5467,8 +5468,60 @@ out:
 
 AioContext *bdrv_get_aio_context(BlockDriverState *bs)
 {
-    /* Currently BlockDriverState always uses the main loop AioContext */
-    return qemu_get_aio_context();
+    return bs->aio_context;
+}
+
+void bdrv_detach_aio_context(BlockDriverState *bs)
+{
+    if (!bs->drv) {
+        return;
+    }
+
+    if (bs->drv->bdrv_detach_aio_context) {
+        bs->drv->bdrv_detach_aio_context(bs);
+    }
+    if (bs->file) {
+        bdrv_detach_aio_context(bs->file);
+    }
+    if (bs->backing_hd) {
+        bdrv_detach_aio_context(bs->backing_hd);
+    }
+
+    bs->aio_context = NULL;
+}
+
+void bdrv_attach_aio_context(BlockDriverState *bs,
+                             AioContext *new_context)
+{
+    if (!bs->drv) {
+        return;
+    }
+
+    bs->aio_context = new_context;
+
+    if (bs->backing_hd) {
+        bdrv_attach_aio_context(bs->backing_hd, new_context);
+    }
+    if (bs->file) {
+        bdrv_attach_aio_context(bs->file, new_context);
+    }
+    if (bs->drv->bdrv_attach_aio_context) {
+        bs->drv->bdrv_attach_aio_context(bs, new_context);
+    }
+}
+
+void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
+{
+    bdrv_drain_all(); /* ensure there are no in-flight requests */
+
+    bdrv_detach_aio_context(bs);
+
+    /* This function executes in the old AioContext so acquire the new one in
+     * case it runs in a different thread.
+     */
+    aio_context_acquire(new_context);
+    bdrv_attach_aio_context(bs, new_context);
+    aio_context_release(new_context);
 }
 
 void bdrv_add_before_write_notifier(BlockDriverState *bs,
diff --git a/include/block/block.h b/include/block/block.h
index c12808a..5660184 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -541,4 +541,15 @@ int bdrv_debug_remove_breakpoint(BlockDriverState *bs, const char *tag);
 int bdrv_debug_resume(BlockDriverState *bs, const char *tag);
 bool bdrv_debug_is_suspended(BlockDriverState *bs, const char *tag);
 
+/**
+ * bdrv_set_aio_context:
+ *
+ * Changes the #AioContext used for fd handlers, timers, and BHs by this
+ * BlockDriverState and all its children.
+ *
+ * This function must be called from the old #AioContext or with a lock held so
+ * the old #AioContext is not executing.
+ */
+void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context);
+
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index cd5bc73..42649fa 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -247,6 +247,19 @@ struct BlockDriver {
      */
     int (*bdrv_has_zero_init)(BlockDriverState *bs);
 
+    /* Remove fd handlers, timers, and other event loop callbacks so the event
+     * loop is no longer in use.  Called with no in-flight requests and in
+     * depth-first traversal order with parents before child nodes.
+     */
+    void (*bdrv_detach_aio_context)(BlockDriverState *bs);
+
+    /* Add fd handlers, timers, and other event loop callbacks so I/O requests
+     * can be processed again.  Called with no in-flight requests and in
+     * depth-first traversal order with child nodes before parent nodes.
+     */
+    void (*bdrv_attach_aio_context)(BlockDriverState *bs,
+                                    AioContext *new_context);
+
     QLIST_ENTRY(BlockDriver) list;
 };
 
@@ -295,6 +308,8 @@ struct BlockDriverState {
     const BlockDevOps *dev_ops;
     void *dev_opaque;
 
+    AioContext *aio_context; /* event loop used for fd handlers, timers, etc */
+
     char filename[1024];
     char backing_file[1024]; /* if non zero, the image is a diff of
                                 this file image */
@@ -389,6 +404,27 @@ void bdrv_add_before_write_notifier(BlockDriverState *bs,
  */
 AioContext *bdrv_get_aio_context(BlockDriverState *bs);
 
+/**
+ * bdrv_detach_aio_context:
+ *
+ * May be called from .bdrv_detach_aio_context() to detach children from the
+ * current #AioContext.  This is only needed by block drivers that manage their
+ * own children.  Both ->file and ->backing_hd are automatically handled and
+ * block drivers should not call this function on them explicitly.
+ */
+void bdrv_detach_aio_context(BlockDriverState *bs);
+
+/**
+ * bdrv_attach_aio_context:
+ *
+ * May be called from .bdrv_attach_aio_context() to attach children to the new
+ * #AioContext.  This is only needed by block drivers that manage their own
+ * children.  Both ->file and ->backing_hd are automatically handled and block
+ * drivers should not call this function on them explicitly.
+ */
+void bdrv_attach_aio_context(BlockDriverState *bs,
+                             AioContext *new_context);
+
 #ifdef _WIN32
 int is_windows_drive(const char *filename);
 #endif
-- 
1.9.0


* [Qemu-devel] [PATCH 04/22] blkdebug: use BlockDriverState's AioContext
  2014-05-01 14:54 [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer Stefan Hajnoczi
                   ` (2 preceding siblings ...)
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 03/22] block: add bdrv_set_aio_context() Stefan Hajnoczi
@ 2014-05-01 14:54 ` Stefan Hajnoczi
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 05/22] blkverify: implement .bdrv_detach/attach_aio_context() Stefan Hajnoczi
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 53+ messages in thread
From: Stefan Hajnoczi @ 2014-05-01 14:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Paolo Bonzini, Shergill, Gurinder, Vinod, Chegu,
	Stefan Hajnoczi

Drop the assumption that we're using the main AioContext.  Convert
qemu_bh_new() to aio_bh_new() so we use the BlockDriverState's
AioContext.

The .bdrv_detach_aio_context() and .bdrv_attach_aio_context() interfaces
are not needed since no fd handlers, timers, or BHs stay registered when
requests have been drained.

Cc: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/blkdebug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 380c736..f51407d 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -471,7 +471,7 @@ static BlockDriverAIOCB *inject_error(BlockDriverState *bs,
     acb = qemu_aio_get(&blkdebug_aiocb_info, bs, cb, opaque);
     acb->ret = -error;
 
-    bh = qemu_bh_new(error_callback_bh, acb);
+    bh = aio_bh_new(bdrv_get_aio_context(bs), error_callback_bh, acb);
     acb->bh = bh;
     qemu_bh_schedule(bh);
 
-- 
1.9.0


* [Qemu-devel] [PATCH 05/22] blkverify: implement .bdrv_detach/attach_aio_context()
  2014-05-01 14:54 [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer Stefan Hajnoczi
                   ` (3 preceding siblings ...)
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 04/22] blkdebug: use BlockDriverState's AioContext Stefan Hajnoczi
@ 2014-05-01 14:54 ` Stefan Hajnoczi
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 06/22] curl: " Stefan Hajnoczi
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 53+ messages in thread
From: Stefan Hajnoczi @ 2014-05-01 14:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Paolo Bonzini, Shergill, Gurinder, Vinod, Chegu,
	Stefan Hajnoczi

Drop the assumption that we're using the main AioContext.  Convert
qemu_bh_new() to aio_bh_new() and qemu_aio_wait() to aio_poll() so we
use the BlockDriverState's AioContext.

Implement .bdrv_detach/attach_aio_context() interfaces to propagate
detach/attach to BDRVBlkverifyState->test_file.  The block layer takes
care of ->file and ->backing_hd but doesn't know about our ->test_file
BlockDriverState, which is also part of the graph.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/blkverify.c | 47 ++++++++++++++++++++++++++++++++++-------------
 1 file changed, 34 insertions(+), 13 deletions(-)

diff --git a/block/blkverify.c b/block/blkverify.c
index e1c3117..621b785 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -39,12 +39,13 @@ struct BlkverifyAIOCB {
 static void blkverify_aio_cancel(BlockDriverAIOCB *blockacb)
 {
     BlkverifyAIOCB *acb = (BlkverifyAIOCB *)blockacb;
+    AioContext *aio_context = bdrv_get_aio_context(blockacb->bs);
     bool finished = false;
 
     /* Wait until request completes, invokes its callback, and frees itself */
     acb->finished = &finished;
     while (!finished) {
-        qemu_aio_wait();
+        aio_poll(aio_context, true);
     }
 }
 
@@ -228,7 +229,8 @@ static void blkverify_aio_cb(void *opaque, int ret)
             acb->verify(acb);
         }
 
-        acb->bh = qemu_bh_new(blkverify_aio_bh, acb);
+        acb->bh = aio_bh_new(bdrv_get_aio_context(acb->common.bs),
+                             blkverify_aio_bh, acb);
         qemu_bh_schedule(acb->bh);
         break;
     }
@@ -302,21 +304,40 @@ static bool blkverify_recurse_is_first_non_filter(BlockDriverState *bs,
     return bdrv_recurse_is_first_non_filter(s->test_file, candidate);
 }
 
+/* Propagate AioContext changes to ->test_file */
+static void blkverify_detach_aio_context(BlockDriverState *bs)
+{
+    BDRVBlkverifyState *s = bs->opaque;
+
+    bdrv_detach_aio_context(s->test_file);
+}
+
+static void blkverify_attach_aio_context(BlockDriverState *bs,
+                                         AioContext *new_context)
+{
+    BDRVBlkverifyState *s = bs->opaque;
+
+    bdrv_attach_aio_context(s->test_file, new_context);
+}
+
 static BlockDriver bdrv_blkverify = {
-    .format_name            = "blkverify",
-    .protocol_name          = "blkverify",
-    .instance_size          = sizeof(BDRVBlkverifyState),
+    .format_name                      = "blkverify",
+    .protocol_name                    = "blkverify",
+    .instance_size                    = sizeof(BDRVBlkverifyState),
+
+    .bdrv_parse_filename              = blkverify_parse_filename,
+    .bdrv_file_open                   = blkverify_open,
+    .bdrv_close                       = blkverify_close,
+    .bdrv_getlength                   = blkverify_getlength,
 
-    .bdrv_parse_filename    = blkverify_parse_filename,
-    .bdrv_file_open         = blkverify_open,
-    .bdrv_close             = blkverify_close,
-    .bdrv_getlength         = blkverify_getlength,
+    .bdrv_aio_readv                   = blkverify_aio_readv,
+    .bdrv_aio_writev                  = blkverify_aio_writev,
+    .bdrv_aio_flush                   = blkverify_aio_flush,
 
-    .bdrv_aio_readv         = blkverify_aio_readv,
-    .bdrv_aio_writev        = blkverify_aio_writev,
-    .bdrv_aio_flush         = blkverify_aio_flush,
+    .bdrv_attach_aio_context          = blkverify_attach_aio_context,
+    .bdrv_detach_aio_context          = blkverify_detach_aio_context,
 
-    .is_filter              = true,
+    .is_filter                        = true,
     .bdrv_recurse_is_first_non_filter = blkverify_recurse_is_first_non_filter,
 };
 
-- 
1.9.0


* [Qemu-devel] [PATCH 06/22] curl: implement .bdrv_detach/attach_aio_context()
  2014-05-01 14:54 [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer Stefan Hajnoczi
                   ` (4 preceding siblings ...)
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 05/22] blkverify: implement .bdrv_detach/attach_aio_context() Stefan Hajnoczi
@ 2014-05-01 14:54 ` Stefan Hajnoczi
  2014-05-04 11:00   ` Fam Zheng
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 07/22] gluster: use BlockDriverState's AioContext Stefan Hajnoczi
                   ` (17 subsequent siblings)
  23 siblings, 1 reply; 53+ messages in thread
From: Stefan Hajnoczi @ 2014-05-01 14:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Shergill, Gurinder, Alexander Graf,
	Stefan Hajnoczi, Paolo Bonzini, Vinod, Chegu

The curl block driver uses fd handlers, timers, and BHs.  The fd
handlers and timers are managed on behalf of libcurl, which controls
them using callback functions that the block driver implements.

The simplest way to implement .bdrv_detach/attach_aio_context() is to
clean up libcurl in the old event loop and initialize it again in the
new event loop.  We do not need to keep track of anything since there
are no pending requests when the AioContext is changed.

Also make sure to use aio_set_fd_handler() instead of
qemu_aio_set_fd_handler() and aio_bh_new() instead of qemu_bh_new() so
the current AioContext is passed in.

Cc: Alexander Graf <agraf@suse.de>
Cc: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/curl.c | 194 +++++++++++++++++++++++++++++++++++------------------------
 1 file changed, 116 insertions(+), 78 deletions(-)

diff --git a/block/curl.c b/block/curl.c
index 6731d28..88638ec 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -88,6 +88,7 @@ typedef struct BDRVCURLState {
     char *url;
     size_t readahead_size;
     bool accept_range;
+    AioContext *aio_context;
 } BDRVCURLState;
 
 static void curl_clean_state(CURLState *s);
@@ -111,21 +112,24 @@ static int curl_timer_cb(CURLM *multi, long timeout_ms, void *opaque)
 #endif
 
 static int curl_sock_cb(CURL *curl, curl_socket_t fd, int action,
-                        void *s, void *sp)
+                        void *userp, void *sp)
 {
+    BDRVCURLState *s = userp;
+
     DPRINTF("CURL (AIO): Sock action %d on fd %d\n", action, fd);
     switch (action) {
         case CURL_POLL_IN:
-            qemu_aio_set_fd_handler(fd, curl_multi_do, NULL, s);
+            aio_set_fd_handler(s->aio_context, fd, curl_multi_do, NULL, s);
             break;
         case CURL_POLL_OUT:
-            qemu_aio_set_fd_handler(fd, NULL, curl_multi_do, s);
+            aio_set_fd_handler(s->aio_context, fd, NULL, curl_multi_do, s);
             break;
         case CURL_POLL_INOUT:
-            qemu_aio_set_fd_handler(fd, curl_multi_do, curl_multi_do, s);
+            aio_set_fd_handler(s->aio_context, fd, curl_multi_do,
+                               curl_multi_do, s);
             break;
         case CURL_POLL_REMOVE:
-            qemu_aio_set_fd_handler(fd, NULL, NULL, NULL);
+            aio_set_fd_handler(s->aio_context, fd, NULL, NULL, NULL);
             break;
     }
 
@@ -430,6 +434,55 @@ static void curl_parse_filename(const char *filename, QDict *options,
     g_free(file);
 }
 
+static void curl_detach_aio_context(BlockDriverState *bs)
+{
+    BDRVCURLState *s = bs->opaque;
+    int i;
+
+    for (i = 0; i < CURL_NUM_STATES; i++) {
+        if (s->states[i].in_use) {
+            curl_clean_state(&s->states[i]);
+        }
+        if (s->states[i].curl) {
+            curl_easy_cleanup(s->states[i].curl);
+            s->states[i].curl = NULL;
+        }
+        if (s->states[i].orig_buf) {
+            g_free(s->states[i].orig_buf);
+            s->states[i].orig_buf = NULL;
+        }
+    }
+    if (s->multi) {
+        curl_multi_cleanup(s->multi);
+        s->multi = NULL;
+    }
+
+    timer_del(&s->timer);
+}
+
+static void curl_attach_aio_context(BlockDriverState *bs,
+                                    AioContext *new_context)
+{
+    BDRVCURLState *s = bs->opaque;
+
+    aio_timer_init(new_context, &s->timer,
+                   QEMU_CLOCK_REALTIME, SCALE_NS,
+                   curl_multi_timeout_do, s);
+
+    // Now we know the file exists and its size, so let's
+    // initialize the multi interface!
+
+    s->multi = curl_multi_init();
+    s->aio_context = new_context;
+    curl_multi_setopt(s->multi, CURLMOPT_SOCKETDATA, s);
+    curl_multi_setopt(s->multi, CURLMOPT_SOCKETFUNCTION, curl_sock_cb);
+#ifdef NEED_CURL_TIMER_CALLBACK
+    curl_multi_setopt(s->multi, CURLMOPT_TIMERDATA, s);
+    curl_multi_setopt(s->multi, CURLMOPT_TIMERFUNCTION, curl_timer_cb);
+#endif
+    curl_multi_do(s);
+}
+
 static QemuOptsList runtime_opts = {
     .name = "curl",
     .head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
@@ -523,21 +576,7 @@ static int curl_open(BlockDriverState *bs, QDict *options, int flags,
     curl_easy_cleanup(state->curl);
     state->curl = NULL;
 
-    aio_timer_init(bdrv_get_aio_context(bs), &s->timer,
-                   QEMU_CLOCK_REALTIME, SCALE_NS,
-                   curl_multi_timeout_do, s);
-
-    // Now we know the file exists and its size, so let's
-    // initialize the multi interface!
-
-    s->multi = curl_multi_init();
-    curl_multi_setopt(s->multi, CURLMOPT_SOCKETDATA, s);
-    curl_multi_setopt(s->multi, CURLMOPT_SOCKETFUNCTION, curl_sock_cb);
-#ifdef NEED_CURL_TIMER_CALLBACK
-    curl_multi_setopt(s->multi, CURLMOPT_TIMERDATA, s);
-    curl_multi_setopt(s->multi, CURLMOPT_TIMERFUNCTION, curl_timer_cb);
-#endif
-    curl_multi_do(s);
+    curl_attach_aio_context(bs, bdrv_get_aio_context(bs));
 
     qemu_opts_del(opts);
     return 0;
@@ -630,7 +669,7 @@ static BlockDriverAIOCB *curl_aio_readv(BlockDriverState *bs,
     acb->sector_num = sector_num;
     acb->nb_sectors = nb_sectors;
 
-    acb->bh = qemu_bh_new(curl_readv_bh_cb, acb);
+    acb->bh = aio_bh_new(bdrv_get_aio_context(bs), curl_readv_bh_cb, acb);
     qemu_bh_schedule(acb->bh);
     return &acb->common;
 }
@@ -638,25 +677,9 @@ static BlockDriverAIOCB *curl_aio_readv(BlockDriverState *bs,
 static void curl_close(BlockDriverState *bs)
 {
     BDRVCURLState *s = bs->opaque;
-    int i;
 
     DPRINTF("CURL: Close\n");
-    for (i=0; i<CURL_NUM_STATES; i++) {
-        if (s->states[i].in_use)
-            curl_clean_state(&s->states[i]);
-        if (s->states[i].curl) {
-            curl_easy_cleanup(s->states[i].curl);
-            s->states[i].curl = NULL;
-        }
-        if (s->states[i].orig_buf) {
-            g_free(s->states[i].orig_buf);
-            s->states[i].orig_buf = NULL;
-        }
-    }
-    if (s->multi)
-        curl_multi_cleanup(s->multi);
-
-    timer_del(&s->timer);
+    curl_detach_aio_context(bs);
 
     g_free(s->url);
 }
@@ -668,68 +691,83 @@ static int64_t curl_getlength(BlockDriverState *bs)
 }
 
 static BlockDriver bdrv_http = {
-    .format_name            = "http",
-    .protocol_name          = "http",
+    .format_name                = "http",
+    .protocol_name              = "http",
+
+    .instance_size              = sizeof(BDRVCURLState),
+    .bdrv_parse_filename        = curl_parse_filename,
+    .bdrv_file_open             = curl_open,
+    .bdrv_close                 = curl_close,
+    .bdrv_getlength             = curl_getlength,
 
-    .instance_size          = sizeof(BDRVCURLState),
-    .bdrv_parse_filename    = curl_parse_filename,
-    .bdrv_file_open         = curl_open,
-    .bdrv_close             = curl_close,
-    .bdrv_getlength         = curl_getlength,
+    .bdrv_aio_readv             = curl_aio_readv,
 
-    .bdrv_aio_readv         = curl_aio_readv,
+    .bdrv_detach_aio_context    = curl_detach_aio_context,
+    .bdrv_attach_aio_context    = curl_attach_aio_context,
 };
 
 static BlockDriver bdrv_https = {
-    .format_name            = "https",
-    .protocol_name          = "https",
+    .format_name                = "https",
+    .protocol_name              = "https",
 
-    .instance_size          = sizeof(BDRVCURLState),
-    .bdrv_parse_filename    = curl_parse_filename,
-    .bdrv_file_open         = curl_open,
-    .bdrv_close             = curl_close,
-    .bdrv_getlength         = curl_getlength,
+    .instance_size              = sizeof(BDRVCURLState),
+    .bdrv_parse_filename        = curl_parse_filename,
+    .bdrv_file_open             = curl_open,
+    .bdrv_close                 = curl_close,
+    .bdrv_getlength             = curl_getlength,
 
-    .bdrv_aio_readv         = curl_aio_readv,
+    .bdrv_aio_readv             = curl_aio_readv,
+
+    .bdrv_detach_aio_context    = curl_detach_aio_context,
+    .bdrv_attach_aio_context    = curl_attach_aio_context,
 };
 
 static BlockDriver bdrv_ftp = {
-    .format_name            = "ftp",
-    .protocol_name          = "ftp",
+    .format_name                = "ftp",
+    .protocol_name              = "ftp",
+
+    .instance_size              = sizeof(BDRVCURLState),
+    .bdrv_parse_filename        = curl_parse_filename,
+    .bdrv_file_open             = curl_open,
+    .bdrv_close                 = curl_close,
+    .bdrv_getlength             = curl_getlength,
 
-    .instance_size          = sizeof(BDRVCURLState),
-    .bdrv_parse_filename    = curl_parse_filename,
-    .bdrv_file_open         = curl_open,
-    .bdrv_close             = curl_close,
-    .bdrv_getlength         = curl_getlength,
+    .bdrv_aio_readv             = curl_aio_readv,
 
-    .bdrv_aio_readv         = curl_aio_readv,
+    .bdrv_detach_aio_context    = curl_detach_aio_context,
+    .bdrv_attach_aio_context    = curl_attach_aio_context,
 };
 
 static BlockDriver bdrv_ftps = {
-    .format_name            = "ftps",
-    .protocol_name          = "ftps",
+    .format_name                = "ftps",
+    .protocol_name              = "ftps",
 
-    .instance_size          = sizeof(BDRVCURLState),
-    .bdrv_parse_filename    = curl_parse_filename,
-    .bdrv_file_open         = curl_open,
-    .bdrv_close             = curl_close,
-    .bdrv_getlength         = curl_getlength,
+    .instance_size              = sizeof(BDRVCURLState),
+    .bdrv_parse_filename        = curl_parse_filename,
+    .bdrv_file_open             = curl_open,
+    .bdrv_close                 = curl_close,
+    .bdrv_getlength             = curl_getlength,
 
-    .bdrv_aio_readv         = curl_aio_readv,
+    .bdrv_aio_readv             = curl_aio_readv,
+
+    .bdrv_detach_aio_context    = curl_detach_aio_context,
+    .bdrv_attach_aio_context    = curl_attach_aio_context,
 };
 
 static BlockDriver bdrv_tftp = {
-    .format_name            = "tftp",
-    .protocol_name          = "tftp",
+    .format_name                = "tftp",
+    .protocol_name              = "tftp",
+
+    .instance_size              = sizeof(BDRVCURLState),
+    .bdrv_parse_filename        = curl_parse_filename,
+    .bdrv_file_open             = curl_open,
+    .bdrv_close                 = curl_close,
+    .bdrv_getlength             = curl_getlength,
 
-    .instance_size          = sizeof(BDRVCURLState),
-    .bdrv_parse_filename    = curl_parse_filename,
-    .bdrv_file_open         = curl_open,
-    .bdrv_close             = curl_close,
-    .bdrv_getlength         = curl_getlength,
+    .bdrv_aio_readv             = curl_aio_readv,
 
-    .bdrv_aio_readv         = curl_aio_readv,
+    .bdrv_detach_aio_context    = curl_detach_aio_context,
+    .bdrv_attach_aio_context    = curl_attach_aio_context,
 };
 
 static void curl_block_init(void)
-- 
1.9.0


* [Qemu-devel] [PATCH 07/22] gluster: use BlockDriverState's AioContext
  2014-05-01 14:54 [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer Stefan Hajnoczi
                   ` (5 preceding siblings ...)
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 06/22] curl: " Stefan Hajnoczi
@ 2014-05-01 14:54 ` Stefan Hajnoczi
  2014-05-05  8:39   ` Bharata B Rao
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 08/22] iscsi: implement .bdrv_detach/attach_aio_context() Stefan Hajnoczi
                   ` (16 subsequent siblings)
  23 siblings, 1 reply; 53+ messages in thread
From: Stefan Hajnoczi @ 2014-05-01 14:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Shergill, Gurinder, Stefan Hajnoczi, Bharata B Rao,
	Paolo Bonzini, Vinod, Chegu

Drop the assumption that we're using the main AioContext.  Use
aio_bh_new() instead of qemu_bh_new().

The .bdrv_detach_aio_context() and .bdrv_attach_aio_context() interfaces
are not needed since no fd handlers, timers, or BHs stay registered when
requests have been drained.

Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/gluster.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/block/gluster.c b/block/gluster.c
index 8836085..b358bdc 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -16,6 +16,7 @@ typedef struct GlusterAIOCB {
     int ret;
     QEMUBH *bh;
     Coroutine *coroutine;
+    AioContext *aio_context;
 } GlusterAIOCB;
 
 typedef struct BDRVGlusterState {
@@ -244,7 +245,7 @@ static void gluster_finish_aiocb(struct glfs_fd *fd, ssize_t ret, void *arg)
         acb->ret = -EIO; /* Partial read/write - fail it */
     }
 
-    acb->bh = qemu_bh_new(qemu_gluster_complete_aio, acb);
+    acb->bh = aio_bh_new(acb->aio_context, qemu_gluster_complete_aio, acb);
     qemu_bh_schedule(acb->bh);
 }
 
@@ -431,6 +432,7 @@ static coroutine_fn int qemu_gluster_co_write_zeroes(BlockDriverState *bs,
     acb->size = size;
     acb->ret = 0;
     acb->coroutine = qemu_coroutine_self();
+    acb->aio_context = bdrv_get_aio_context(bs);
 
     ret = glfs_zerofill_async(s->fd, offset, size, &gluster_finish_aiocb, acb);
     if (ret < 0) {
@@ -544,6 +546,7 @@ static coroutine_fn int qemu_gluster_co_rw(BlockDriverState *bs,
     acb->size = size;
     acb->ret = 0;
     acb->coroutine = qemu_coroutine_self();
+    acb->aio_context = bdrv_get_aio_context(bs);
 
     if (write) {
         ret = glfs_pwritev_async(s->fd, qiov->iov, qiov->niov, offset, 0,
@@ -600,6 +603,7 @@ static coroutine_fn int qemu_gluster_co_flush_to_disk(BlockDriverState *bs)
     acb->size = 0;
     acb->ret = 0;
     acb->coroutine = qemu_coroutine_self();
+    acb->aio_context = bdrv_get_aio_context(bs);
 
     ret = glfs_fsync_async(s->fd, &gluster_finish_aiocb, acb);
     if (ret < 0) {
@@ -628,6 +632,7 @@ static coroutine_fn int qemu_gluster_co_discard(BlockDriverState *bs,
     acb->size = 0;
     acb->ret = 0;
     acb->coroutine = qemu_coroutine_self();
+    acb->aio_context = bdrv_get_aio_context(bs);
 
     ret = glfs_discard_async(s->fd, offset, size, &gluster_finish_aiocb, acb);
     if (ret < 0) {
-- 
1.9.0


* [Qemu-devel] [PATCH 08/22] iscsi: implement .bdrv_detach/attach_aio_context()
  2014-05-01 14:54 [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer Stefan Hajnoczi
                   ` (6 preceding siblings ...)
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 07/22] gluster: use BlockDriverState's AioContext Stefan Hajnoczi
@ 2014-05-01 14:54 ` Stefan Hajnoczi
  2014-05-01 22:39   ` Peter Lieven
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 09/22] nbd: " Stefan Hajnoczi
                   ` (15 subsequent siblings)
  23 siblings, 1 reply; 53+ messages in thread
From: Stefan Hajnoczi @ 2014-05-01 14:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Ronnie Sahlberg, Shergill, Gurinder, Peter Lieven,
	Stefan Hajnoczi, Paolo Bonzini, Vinod, Chegu

Drop the assumption that we're using the main AioContext for Linux
AIO.  Convert qemu_aio_set_fd_handler() to aio_set_fd_handler() and
timer_new_ms() to aio_timer_new().

The .bdrv_detach/attach_aio_context() interfaces also need to be
implemented to move the fd and timer from the old to the new AioContext.

Cc: Peter Lieven <pl@kamp.de>
Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/iscsi.c | 79 +++++++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 55 insertions(+), 24 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index a30202b..81e3ebd 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -47,6 +47,7 @@
 
 typedef struct IscsiLun {
     struct iscsi_context *iscsi;
+    AioContext *aio_context;
     int lun;
     enum scsi_inquiry_peripheral_device_type type;
     int block_size;
@@ -69,6 +70,7 @@ typedef struct IscsiTask {
     struct scsi_task *task;
     Coroutine *co;
     QEMUBH *bh;
+    AioContext *aio_context;
 } IscsiTask;
 
 typedef struct IscsiAIOCB {
@@ -120,7 +122,7 @@ iscsi_schedule_bh(IscsiAIOCB *acb)
     if (acb->bh) {
         return;
     }
-    acb->bh = qemu_bh_new(iscsi_bh_cb, acb);
+    acb->bh = aio_bh_new(acb->iscsilun->aio_context, iscsi_bh_cb, acb);
     qemu_bh_schedule(acb->bh);
 }
 
@@ -156,7 +158,7 @@ iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
 
 out:
     if (iTask->co) {
-        iTask->bh = qemu_bh_new(iscsi_co_generic_bh_cb, iTask);
+        iTask->bh = aio_bh_new(iTask->aio_context, iscsi_co_generic_bh_cb, iTask);
         qemu_bh_schedule(iTask->bh);
     }
 }
@@ -164,8 +166,9 @@ out:
 static void iscsi_co_init_iscsitask(IscsiLun *iscsilun, struct IscsiTask *iTask)
 {
     *iTask = (struct IscsiTask) {
-        .co         = qemu_coroutine_self(),
-        .retries    = ISCSI_CMD_RETRIES,
+        .co             = qemu_coroutine_self(),
+        .retries        = ISCSI_CMD_RETRIES,
+        .aio_context    = iscsilun->aio_context,
     };
 }
 
@@ -196,7 +199,7 @@ iscsi_aio_cancel(BlockDriverAIOCB *blockacb)
                                      iscsi_abort_task_cb, acb);
 
     while (acb->status == -EINPROGRESS) {
-        qemu_aio_wait();
+        aio_poll(bdrv_get_aio_context(blockacb->bs), true);
     }
 }
 
@@ -219,10 +222,11 @@ iscsi_set_events(IscsiLun *iscsilun)
     ev = POLLIN;
     ev |= iscsi_which_events(iscsi);
     if (ev != iscsilun->events) {
-        qemu_aio_set_fd_handler(iscsi_get_fd(iscsi),
-                      iscsi_process_read,
-                      (ev & POLLOUT) ? iscsi_process_write : NULL,
-                      iscsilun);
+        aio_set_fd_handler(iscsilun->aio_context,
+                           iscsi_get_fd(iscsi),
+                           iscsi_process_read,
+                           (ev & POLLOUT) ? iscsi_process_write : NULL,
+                           iscsilun);
 
     }
 
@@ -620,7 +624,7 @@ static int iscsi_ioctl(BlockDriverState *bs, unsigned long int req, void *buf)
         iscsi_aio_ioctl(bs, req, buf, ioctl_cb, &status);
 
         while (status == -EINPROGRESS) {
-            qemu_aio_wait();
+            aio_poll(bdrv_get_aio_context(bs), true);
         }
 
         return 0;
@@ -1110,6 +1114,40 @@ fail_with_err:
     return NULL;
 }
 
+static void iscsi_detach_aio_context(BlockDriverState *bs)
+{
+    IscsiLun *iscsilun = bs->opaque;
+
+    aio_set_fd_handler(iscsilun->aio_context,
+                       iscsi_get_fd(iscsilun->iscsi),
+                       NULL, NULL, NULL);
+    iscsilun->events = 0;
+
+    if (iscsilun->nop_timer) {
+        timer_del(iscsilun->nop_timer);
+        timer_free(iscsilun->nop_timer);
+        iscsilun->nop_timer = NULL;
+    }
+}
+
+static void iscsi_attach_aio_context(BlockDriverState *bs,
+                                     AioContext *new_context)
+{
+    IscsiLun *iscsilun = bs->opaque;
+
+    iscsilun->aio_context = new_context;
+    iscsi_set_events(iscsilun);
+
+#if defined(LIBISCSI_FEATURE_NOP_COUNTER)
+    /* Set up a timer for sending out iSCSI NOPs */
+    iscsilun->nop_timer = aio_timer_new(iscsilun->aio_context,
+                                        QEMU_CLOCK_REALTIME, SCALE_MS,
+                                        iscsi_nop_timed_event, iscsilun);
+    timer_mod(iscsilun->nop_timer,
+              qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + NOP_INTERVAL);
+#endif
+}
+
 /*
  * We support iscsi url's on the form
  * iscsi://[<username>%<password>@]<host>[:<port>]/<targetname>/<lun>
@@ -1216,6 +1254,7 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
     }
 
     iscsilun->iscsi = iscsi;
+    iscsilun->aio_context = bdrv_get_aio_context(bs);
     iscsilun->lun   = iscsi_url->lun;
     iscsilun->has_write_same = true;
 
@@ -1289,11 +1328,7 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
     scsi_free_scsi_task(task);
     task = NULL;
 
-#if defined(LIBISCSI_FEATURE_NOP_COUNTER)
-    /* Set up a timer for sending out iSCSI NOPs */
-    iscsilun->nop_timer = timer_new_ms(QEMU_CLOCK_REALTIME, iscsi_nop_timed_event, iscsilun);
-    timer_mod(iscsilun->nop_timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + NOP_INTERVAL);
-#endif
+    iscsi_attach_aio_context(bs, iscsilun->aio_context);
 
 out:
     qemu_opts_del(opts);
@@ -1321,11 +1356,7 @@ static void iscsi_close(BlockDriverState *bs)
     IscsiLun *iscsilun = bs->opaque;
     struct iscsi_context *iscsi = iscsilun->iscsi;
 
-    if (iscsilun->nop_timer) {
-        timer_del(iscsilun->nop_timer);
-        timer_free(iscsilun->nop_timer);
-    }
-    qemu_aio_set_fd_handler(iscsi_get_fd(iscsi), NULL, NULL, NULL);
+    iscsi_detach_aio_context(bs);
     iscsi_destroy_context(iscsi);
     g_free(iscsilun->zeroblock);
     memset(iscsilun, 0, sizeof(IscsiLun));
@@ -1421,10 +1452,7 @@ static int iscsi_create(const char *filename, QEMUOptionParameter *options,
     if (ret != 0) {
         goto out;
     }
-    if (iscsilun->nop_timer) {
-        timer_del(iscsilun->nop_timer);
-        timer_free(iscsilun->nop_timer);
-    }
+    iscsi_detach_aio_context(bs);
     if (iscsilun->type != TYPE_DISK) {
         ret = -ENODEV;
         goto out;
@@ -1501,6 +1529,9 @@ static BlockDriver bdrv_iscsi = {
     .bdrv_ioctl       = iscsi_ioctl,
     .bdrv_aio_ioctl   = iscsi_aio_ioctl,
 #endif
+
+    .bdrv_detach_aio_context = iscsi_detach_aio_context,
+    .bdrv_attach_aio_context = iscsi_attach_aio_context,
 };
 
 static QemuOptsList qemu_iscsi_opts = {
-- 
1.9.0


* [Qemu-devel] [PATCH 09/22] nbd: implement .bdrv_detach/attach_aio_context()
  2014-05-01 14:54 [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer Stefan Hajnoczi
                   ` (7 preceding siblings ...)
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 08/22] iscsi: implement .bdrv_detach/attach_aio_context() Stefan Hajnoczi
@ 2014-05-01 14:54 ` Stefan Hajnoczi
  2014-05-02  7:40   ` Paolo Bonzini
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 10/22] nfs: " Stefan Hajnoczi
                   ` (14 subsequent siblings)
  23 siblings, 1 reply; 53+ messages in thread
From: Stefan Hajnoczi @ 2014-05-01 14:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Paolo Bonzini, Shergill, Gurinder, Vinod, Chegu,
	Stefan Hajnoczi

Drop the assumption that we're using the main AioContext.  Convert
qemu_aio_set_fd_handler() calls to aio_set_fd_handler().

The .bdrv_detach/attach_aio_context() interfaces also need to be
implemented to move the socket fd handler from the old to the new
AioContext.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/nbd-client.c | 24 ++++++++++++---
 block/nbd-client.h |  4 +++
 block/nbd.c        | 87 +++++++++++++++++++++++++++++++++---------------------
 3 files changed, 78 insertions(+), 37 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index 7d698cb..6e1c97c 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -49,7 +49,7 @@ static void nbd_teardown_connection(NbdClientSession *client)
     shutdown(client->sock, 2);
     nbd_recv_coroutines_enter_all(client);
 
-    qemu_aio_set_fd_handler(client->sock, NULL, NULL, NULL);
+    nbd_client_session_detach_aio_context(client);
     closesocket(client->sock);
     client->sock = -1;
 }
@@ -103,11 +103,14 @@ static int nbd_co_send_request(NbdClientSession *s,
     struct nbd_request *request,
     QEMUIOVector *qiov, int offset)
 {
+    AioContext *aio_context;
     int rc, ret;
 
     qemu_co_mutex_lock(&s->send_mutex);
     s->send_coroutine = qemu_coroutine_self();
-    qemu_aio_set_fd_handler(s->sock, nbd_reply_ready, nbd_restart_write, s);
+    aio_context = bdrv_get_aio_context(s->bs);
+    aio_set_fd_handler(aio_context, s->sock,
+                       nbd_reply_ready, nbd_restart_write, s);
     if (qiov) {
         if (!s->is_unix) {
             socket_set_cork(s->sock, 1);
@@ -126,7 +129,7 @@ static int nbd_co_send_request(NbdClientSession *s,
     } else {
         rc = nbd_send_request(s->sock, request);
     }
-    qemu_aio_set_fd_handler(s->sock, nbd_reply_ready, NULL, s);
+    aio_set_fd_handler(aio_context, s->sock, nbd_reply_ready, NULL, s);
     s->send_coroutine = NULL;
     qemu_co_mutex_unlock(&s->send_mutex);
     return rc;
@@ -335,6 +338,19 @@ int nbd_client_session_co_discard(NbdClientSession *client, int64_t sector_num,
 
 }
 
+void nbd_client_session_detach_aio_context(NbdClientSession *client)
+{
+    aio_set_fd_handler(bdrv_get_aio_context(client->bs), client->sock,
+                       NULL, NULL, NULL);
+}
+
+void nbd_client_session_attach_aio_context(NbdClientSession *client,
+                                           AioContext *new_context)
+{
+    aio_set_fd_handler(new_context, client->sock,
+                       nbd_reply_ready, NULL, client);
+}
+
 void nbd_client_session_close(NbdClientSession *client)
 {
     struct nbd_request request = {
@@ -381,7 +397,7 @@ int nbd_client_session_init(NbdClientSession *client, BlockDriverState *bs,
     /* Now that we're connected, set the socket to be non-blocking and
      * kick the reply mechanism.  */
     qemu_set_nonblock(sock);
-    qemu_aio_set_fd_handler(sock, nbd_reply_ready, NULL, client);
+    nbd_client_session_attach_aio_context(client, bdrv_get_aio_context(bs));
 
     logout("Established connection with NBD server\n");
     return 0;
diff --git a/block/nbd-client.h b/block/nbd-client.h
index f2a6337..cd478f3 100644
--- a/block/nbd-client.h
+++ b/block/nbd-client.h
@@ -47,4 +47,8 @@ int nbd_client_session_co_writev(NbdClientSession *client, int64_t sector_num,
 int nbd_client_session_co_readv(NbdClientSession *client, int64_t sector_num,
                                 int nb_sectors, QEMUIOVector *qiov);
 
+void nbd_client_session_detach_aio_context(NbdClientSession *client);
+void nbd_client_session_attach_aio_context(NbdClientSession *client,
+                                           AioContext *new_context);
+
 #endif /* NBD_CLIENT_H */
diff --git a/block/nbd.c b/block/nbd.c
index 613f258..4eda095 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -323,46 +323,67 @@ static int64_t nbd_getlength(BlockDriverState *bs)
     return s->client.size;
 }
 
+static void nbd_detach_aio_context(BlockDriverState *bs)
+{
+    BDRVNBDState *s = bs->opaque;
+
+    nbd_client_session_detach_aio_context(&s->client);
+}
+
+static void nbd_attach_aio_context(BlockDriverState *bs,
+                                   AioContext *new_context)
+{
+    BDRVNBDState *s = bs->opaque;
+
+    nbd_client_session_attach_aio_context(&s->client, new_context);
+}
+
 static BlockDriver bdrv_nbd = {
-    .format_name         = "nbd",
-    .protocol_name       = "nbd",
-    .instance_size       = sizeof(BDRVNBDState),
-    .bdrv_parse_filename = nbd_parse_filename,
-    .bdrv_file_open      = nbd_open,
-    .bdrv_co_readv       = nbd_co_readv,
-    .bdrv_co_writev      = nbd_co_writev,
-    .bdrv_close          = nbd_close,
-    .bdrv_co_flush_to_os = nbd_co_flush,
-    .bdrv_co_discard     = nbd_co_discard,
-    .bdrv_getlength      = nbd_getlength,
+    .format_name                = "nbd",
+    .protocol_name              = "nbd",
+    .instance_size              = sizeof(BDRVNBDState),
+    .bdrv_parse_filename        = nbd_parse_filename,
+    .bdrv_file_open             = nbd_open,
+    .bdrv_co_readv              = nbd_co_readv,
+    .bdrv_co_writev             = nbd_co_writev,
+    .bdrv_close                 = nbd_close,
+    .bdrv_co_flush_to_os        = nbd_co_flush,
+    .bdrv_co_discard            = nbd_co_discard,
+    .bdrv_getlength             = nbd_getlength,
+    .bdrv_detach_aio_context    = nbd_detach_aio_context,
+    .bdrv_attach_aio_context    = nbd_attach_aio_context,
 };
 
 static BlockDriver bdrv_nbd_tcp = {
-    .format_name         = "nbd",
-    .protocol_name       = "nbd+tcp",
-    .instance_size       = sizeof(BDRVNBDState),
-    .bdrv_parse_filename = nbd_parse_filename,
-    .bdrv_file_open      = nbd_open,
-    .bdrv_co_readv       = nbd_co_readv,
-    .bdrv_co_writev      = nbd_co_writev,
-    .bdrv_close          = nbd_close,
-    .bdrv_co_flush_to_os = nbd_co_flush,
-    .bdrv_co_discard     = nbd_co_discard,
-    .bdrv_getlength      = nbd_getlength,
+    .format_name                = "nbd",
+    .protocol_name              = "nbd+tcp",
+    .instance_size              = sizeof(BDRVNBDState),
+    .bdrv_parse_filename        = nbd_parse_filename,
+    .bdrv_file_open             = nbd_open,
+    .bdrv_co_readv              = nbd_co_readv,
+    .bdrv_co_writev             = nbd_co_writev,
+    .bdrv_close                 = nbd_close,
+    .bdrv_co_flush_to_os        = nbd_co_flush,
+    .bdrv_co_discard            = nbd_co_discard,
+    .bdrv_getlength             = nbd_getlength,
+    .bdrv_detach_aio_context    = nbd_detach_aio_context,
+    .bdrv_attach_aio_context    = nbd_attach_aio_context,
 };
 
 static BlockDriver bdrv_nbd_unix = {
-    .format_name         = "nbd",
-    .protocol_name       = "nbd+unix",
-    .instance_size       = sizeof(BDRVNBDState),
-    .bdrv_parse_filename = nbd_parse_filename,
-    .bdrv_file_open      = nbd_open,
-    .bdrv_co_readv       = nbd_co_readv,
-    .bdrv_co_writev      = nbd_co_writev,
-    .bdrv_close          = nbd_close,
-    .bdrv_co_flush_to_os = nbd_co_flush,
-    .bdrv_co_discard     = nbd_co_discard,
-    .bdrv_getlength      = nbd_getlength,
+    .format_name                = "nbd",
+    .protocol_name              = "nbd+unix",
+    .instance_size              = sizeof(BDRVNBDState),
+    .bdrv_parse_filename        = nbd_parse_filename,
+    .bdrv_file_open             = nbd_open,
+    .bdrv_co_readv              = nbd_co_readv,
+    .bdrv_co_writev             = nbd_co_writev,
+    .bdrv_close                 = nbd_close,
+    .bdrv_co_flush_to_os        = nbd_co_flush,
+    .bdrv_co_discard            = nbd_co_discard,
+    .bdrv_getlength             = nbd_getlength,
+    .bdrv_detach_aio_context    = nbd_detach_aio_context,
+    .bdrv_attach_aio_context    = nbd_attach_aio_context,
 };
 
 static void bdrv_nbd_init(void)
-- 
1.9.0


* [Qemu-devel] [PATCH 10/22] nfs: implement .bdrv_detach/attach_aio_context()
  2014-05-01 14:54 [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer Stefan Hajnoczi
                   ` (8 preceding siblings ...)
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 09/22] nbd: " Stefan Hajnoczi
@ 2014-05-01 14:54 ` Stefan Hajnoczi
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 11/22] qed: use BlockDriverState's AioContext Stefan Hajnoczi
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 53+ messages in thread
From: Stefan Hajnoczi @ 2014-05-01 14:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Shergill, Gurinder, Richard W.M. Jones,
	Stefan Hajnoczi, Paolo Bonzini, Vinod, Chegu

Drop the assumption that we're using the main AioContext.  The following
functions need to be converted:
 * qemu_bh_new() -> aio_bh_new()
 * qemu_aio_set_fd_handler() -> aio_set_fd_handler()
 * qemu_aio_wait() -> aio_poll()

The .bdrv_detach/attach_aio_context() interfaces also need to be
implemented to move the fd handler from the old to the new AioContext.

Cc: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
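For reviewers new to the hooks, a minimal caller-side sketch of how they are
exercised (it assumes the bdrv_set_aio_context() and IOThread interfaces
introduced earlier in this series; the helper name is made up):

/* Sketch only: hand an NFS-backed BlockDriverState over to an IOThread.
 * bdrv_set_aio_context() invokes nfs_detach_aio_context() under the old
 * context and nfs_attach_aio_context() under the new one.
 */
#include "block/block.h"
#include "sysemu/iothread.h"

static void move_nfs_bs_to_iothread(BlockDriverState *bs, IOThread *iothread)
{
    AioContext *new_ctx = iothread_get_aio_context(iothread);

    bdrv_set_aio_context(bs, new_ctx);
}

After this call the libnfs socket handler re-armed by nfs_set_events() runs in
the IOThread's event loop instead of the main loop.
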
 block/nfs.c | 80 ++++++++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 56 insertions(+), 24 deletions(-)

diff --git a/block/nfs.c b/block/nfs.c
index 9fa831f..33f198b 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -40,6 +40,7 @@ typedef struct NFSClient {
     struct nfsfh *fh;
     int events;
     bool has_zero_init;
+    AioContext *aio_context;
 } NFSClient;
 
 typedef struct NFSRPC {
@@ -49,6 +50,7 @@ typedef struct NFSRPC {
     struct stat *st;
     Coroutine *co;
     QEMUBH *bh;
+    AioContext *aio_context;
 } NFSRPC;
 
 static void nfs_process_read(void *arg);
@@ -58,10 +60,11 @@ static void nfs_set_events(NFSClient *client)
 {
     int ev = nfs_which_events(client->context);
     if (ev != client->events) {
-        qemu_aio_set_fd_handler(nfs_get_fd(client->context),
-                      (ev & POLLIN) ? nfs_process_read : NULL,
-                      (ev & POLLOUT) ? nfs_process_write : NULL,
-                      client);
+        aio_set_fd_handler(client->aio_context,
+                           nfs_get_fd(client->context),
+                           (ev & POLLIN) ? nfs_process_read : NULL,
+                           (ev & POLLOUT) ? nfs_process_write : NULL,
+                           client);
 
     }
     client->events = ev;
@@ -84,7 +87,8 @@ static void nfs_process_write(void *arg)
 static void nfs_co_init_task(NFSClient *client, NFSRPC *task)
 {
     *task = (NFSRPC) {
-        .co         = qemu_coroutine_self(),
+        .co             = qemu_coroutine_self(),
+        .aio_context    = client->aio_context,
     };
 }
 
@@ -116,7 +120,7 @@ nfs_co_generic_cb(int ret, struct nfs_context *nfs, void *data,
         error_report("NFS Error: %s", nfs_get_error(nfs));
     }
     if (task->co) {
-        task->bh = qemu_bh_new(nfs_co_generic_bh_cb, task);
+        task->bh = aio_bh_new(task->aio_context, nfs_co_generic_bh_cb, task);
         qemu_bh_schedule(task->bh);
     }
 }
@@ -224,13 +228,34 @@ static QemuOptsList runtime_opts = {
     },
 };
 
+static void nfs_detach_aio_context(BlockDriverState *bs)
+{
+    NFSClient *client = bs->opaque;
+
+    aio_set_fd_handler(client->aio_context,
+                       nfs_get_fd(client->context),
+                       NULL, NULL, NULL);
+    client->events = 0;
+}
+
+static void nfs_attach_aio_context(BlockDriverState *bs,
+                                   AioContext *new_context)
+{
+    NFSClient *client = bs->opaque;
+
+    client->aio_context = new_context;
+    nfs_set_events(client);
+}
+
 static void nfs_client_close(NFSClient *client)
 {
     if (client->context) {
         if (client->fh) {
             nfs_close(client->context, client->fh);
         }
-        qemu_aio_set_fd_handler(nfs_get_fd(client->context), NULL, NULL, NULL);
+        aio_set_fd_handler(client->aio_context,
+                           nfs_get_fd(client->context),
+                           NULL, NULL, NULL);
         nfs_destroy_context(client->context);
     }
     memset(client, 0, sizeof(NFSClient));
@@ -341,6 +366,8 @@ static int nfs_file_open(BlockDriverState *bs, QDict *options, int flags,
     QemuOpts *opts;
     Error *local_err = NULL;
 
+    client->aio_context = bdrv_get_aio_context(bs);
+
     opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
     qemu_opts_absorb_qdict(opts, options, &local_err);
     if (local_err) {
@@ -364,6 +391,8 @@ static int nfs_file_create(const char *url, QEMUOptionParameter *options,
     int64_t total_size = 0;
     NFSClient *client = g_malloc0(sizeof(NFSClient));
 
+    client->aio_context = qemu_get_aio_context();
+
     /* Read out options */
     while (options && options->name) {
         if (!strcmp(options->name, "size")) {
@@ -403,7 +432,7 @@ static int64_t nfs_get_allocated_file_size(BlockDriverState *bs)
 
     while (!task.complete) {
         nfs_set_events(client);
-        qemu_aio_wait();
+        aio_poll(bdrv_get_aio_context(bs), true);
     }
 
     return (task.ret < 0 ? task.ret : st.st_blocks * st.st_blksize);
@@ -416,22 +445,25 @@ static int nfs_file_truncate(BlockDriverState *bs, int64_t offset)
 }
 
 static BlockDriver bdrv_nfs = {
-    .format_name     = "nfs",
-    .protocol_name   = "nfs",
-
-    .instance_size   = sizeof(NFSClient),
-    .bdrv_needs_filename = true,
-    .bdrv_has_zero_init = nfs_has_zero_init,
-    .bdrv_get_allocated_file_size = nfs_get_allocated_file_size,
-    .bdrv_truncate = nfs_file_truncate,
-
-    .bdrv_file_open  = nfs_file_open,
-    .bdrv_close      = nfs_file_close,
-    .bdrv_create     = nfs_file_create,
-
-    .bdrv_co_readv         = nfs_co_readv,
-    .bdrv_co_writev        = nfs_co_writev,
-    .bdrv_co_flush_to_disk = nfs_co_flush,
+    .format_name                    = "nfs",
+    .protocol_name                  = "nfs",
+
+    .instance_size                  = sizeof(NFSClient),
+    .bdrv_needs_filename            = true,
+    .bdrv_has_zero_init             = nfs_has_zero_init,
+    .bdrv_get_allocated_file_size   = nfs_get_allocated_file_size,
+    .bdrv_truncate                  = nfs_file_truncate,
+
+    .bdrv_file_open                 = nfs_file_open,
+    .bdrv_close                     = nfs_file_close,
+    .bdrv_create                    = nfs_file_create,
+
+    .bdrv_co_readv                  = nfs_co_readv,
+    .bdrv_co_writev                 = nfs_co_writev,
+    .bdrv_co_flush_to_disk          = nfs_co_flush,
+
+    .bdrv_detach_aio_context        = nfs_detach_aio_context,
+    .bdrv_attach_aio_context        = nfs_attach_aio_context,
 };
 
 static void nfs_block_init(void)
-- 
1.9.0


* [Qemu-devel] [PATCH 11/22] qed: use BlockDriverState's AioContext
  2014-05-01 14:54 [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer Stefan Hajnoczi
                   ` (9 preceding siblings ...)
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 10/22] nfs: " Stefan Hajnoczi
@ 2014-05-01 14:54 ` Stefan Hajnoczi
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 12/22] quorum: implement .bdrv_detach/attach_aio_context() Stefan Hajnoczi
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 53+ messages in thread
From: Stefan Hajnoczi @ 2014-05-01 14:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Paolo Bonzini, Shergill, Gurinder, Vinod, Chegu,
	Stefan Hajnoczi

Drop the assumption that we're using the main AioContext.  Convert
qemu_bh_new() to aio_bh_new() and qemu_aio_wait() to aio_poll() so we're
using the BlockDriverState's AioContext.

Implement .bdrv_detach/attach_aio_context() interfaces to move the
QED_F_NEED_CHECK timer from the old AioContext to the new one.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
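The qemu_aio_wait() -> aio_poll() changes in qed-table.c all follow one idiom;
here is a condensed sketch as it would look inside block/qed-table.c
(qed_do_something() is a placeholder for the real async helpers):

/* Sketch: drive the BlockDriverState's own AioContext until the asynchronous
 * operation signals completion, instead of spinning the main loop.
 */
static int qed_sync_wait_sketch(BDRVQEDState *s)
{
    int ret = -EINPROGRESS;

    qed_do_something(s, qed_sync_cb, &ret);             /* placeholder async op */
    while (ret == -EINPROGRESS) {
        aio_poll(bdrv_get_aio_context(s->bs), true);    /* blocking poll */
    }
    return ret;
}
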
 block/qed-table.c |  8 ++++----
 block/qed.c       | 35 +++++++++++++++++++++++++++++------
 2 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/block/qed-table.c b/block/qed-table.c
index 76d2dcc..f61107a 100644
--- a/block/qed-table.c
+++ b/block/qed-table.c
@@ -173,7 +173,7 @@ int qed_read_l1_table_sync(BDRVQEDState *s)
     qed_read_table(s, s->header.l1_table_offset,
                    s->l1_table, qed_sync_cb, &ret);
     while (ret == -EINPROGRESS) {
-        qemu_aio_wait();
+        aio_poll(bdrv_get_aio_context(s->bs), true);
     }
 
     return ret;
@@ -194,7 +194,7 @@ int qed_write_l1_table_sync(BDRVQEDState *s, unsigned int index,
 
     qed_write_l1_table(s, index, n, qed_sync_cb, &ret);
     while (ret == -EINPROGRESS) {
-        qemu_aio_wait();
+        aio_poll(bdrv_get_aio_context(s->bs), true);
     }
 
     return ret;
@@ -267,7 +267,7 @@ int qed_read_l2_table_sync(BDRVQEDState *s, QEDRequest *request, uint64_t offset
 
     qed_read_l2_table(s, request, offset, qed_sync_cb, &ret);
     while (ret == -EINPROGRESS) {
-        qemu_aio_wait();
+        aio_poll(bdrv_get_aio_context(s->bs), true);
     }
 
     return ret;
@@ -289,7 +289,7 @@ int qed_write_l2_table_sync(BDRVQEDState *s, QEDRequest *request,
 
     qed_write_l2_table(s, request, index, n, flush, qed_sync_cb, &ret);
     while (ret == -EINPROGRESS) {
-        qemu_aio_wait();
+        aio_poll(bdrv_get_aio_context(s->bs), true);
     }
 
     return ret;
diff --git a/block/qed.c b/block/qed.c
index c130e42..79f5bd3 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -21,12 +21,13 @@
 static void qed_aio_cancel(BlockDriverAIOCB *blockacb)
 {
     QEDAIOCB *acb = (QEDAIOCB *)blockacb;
+    AioContext *aio_context = bdrv_get_aio_context(blockacb->bs);
     bool finished = false;
 
     /* Wait for the request to finish */
     acb->finished = &finished;
     while (!finished) {
-        qemu_aio_wait();
+        aio_poll(aio_context, true);
     }
 }
 
@@ -373,6 +374,27 @@ static void bdrv_qed_rebind(BlockDriverState *bs)
     s->bs = bs;
 }
 
+static void bdrv_qed_detach_aio_context(BlockDriverState *bs)
+{
+    BDRVQEDState *s = bs->opaque;
+
+    qed_cancel_need_check_timer(s);
+    timer_free(s->need_check_timer);
+}
+
+static void bdrv_qed_attach_aio_context(BlockDriverState *bs,
+                                        AioContext *new_context)
+{
+    BDRVQEDState *s = bs->opaque;
+
+    s->need_check_timer = aio_timer_new(new_context,
+                                        QEMU_CLOCK_VIRTUAL, SCALE_NS,
+                                        qed_need_check_timer_cb, s);
+    if (s->header.features & QED_F_NEED_CHECK) {
+        qed_start_need_check_timer(s);
+    }
+}
+
 static int bdrv_qed_open(BlockDriverState *bs, QDict *options, int flags,
                          Error **errp)
 {
@@ -496,8 +518,7 @@ static int bdrv_qed_open(BlockDriverState *bs, QDict *options, int flags,
         }
     }
 
-    s->need_check_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
-                                            qed_need_check_timer_cb, s);
+    bdrv_qed_attach_aio_context(bs, bdrv_get_aio_context(bs));
 
 out:
     if (ret) {
@@ -528,8 +549,7 @@ static void bdrv_qed_close(BlockDriverState *bs)
 {
     BDRVQEDState *s = bs->opaque;
 
-    qed_cancel_need_check_timer(s);
-    timer_free(s->need_check_timer);
+    bdrv_qed_detach_aio_context(bs);
 
     /* Ensure writes reach stable storage */
     bdrv_flush(bs->file);
@@ -919,7 +939,8 @@ static void qed_aio_complete(QEDAIOCB *acb, int ret)
 
     /* Arrange for a bh to invoke the completion function */
     acb->bh_ret = ret;
-    acb->bh = qemu_bh_new(qed_aio_complete_bh, acb);
+    acb->bh = aio_bh_new(bdrv_get_aio_context(acb->common.bs),
+                         qed_aio_complete_bh, acb);
     qemu_bh_schedule(acb->bh);
 
     /* Start next allocating write request waiting behind this one.  Note that
@@ -1644,6 +1665,8 @@ static BlockDriver bdrv_qed = {
     .bdrv_change_backing_file = bdrv_qed_change_backing_file,
     .bdrv_invalidate_cache    = bdrv_qed_invalidate_cache,
     .bdrv_check               = bdrv_qed_check,
+    .bdrv_detach_aio_context  = bdrv_qed_detach_aio_context,
+    .bdrv_attach_aio_context  = bdrv_qed_attach_aio_context,
 };
 
 static void bdrv_qed_init(void)
-- 
1.9.0


* [Qemu-devel] [PATCH 12/22] quorum: implement .bdrv_detach/attach_aio_context()
  2014-05-01 14:54 [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer Stefan Hajnoczi
                   ` (10 preceding siblings ...)
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 11/22] qed: use BlockDriverState's AioContext Stefan Hajnoczi
@ 2014-05-01 14:54 ` Stefan Hajnoczi
  2014-05-05 15:46   ` Benoît Canet
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 13/22] block/raw-posix: " Stefan Hajnoczi
                   ` (11 subsequent siblings)
  23 siblings, 1 reply; 53+ messages in thread
From: Stefan Hajnoczi @ 2014-05-01 14:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Benoît Canet, Shergill, Gurinder,
	Stefan Hajnoczi, Paolo Bonzini, Vinod, Chegu

Implement .bdrv_detach/attach_aio_context() interfaces to propagate
detach/attach to BDRVQuorumState->bs[] children.  The block layer takes
care of ->file and ->backing_hd but doesn't know about our ->bs[]
BlockDriverStates, which are also part of the graph.

Cc: Benoît Canet <benoit.canet@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
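To see why quorum must propagate by hand, here is a rough sketch of what the
generic attach helper added earlier in this series does -- a simplified
paraphrase, not the actual block.c code; see the earlier patch for the real
version:

/* Rough sketch: the generic helper only descends into ->backing_hd and
 * ->file before invoking the driver hook, so driver-private children such as
 * BDRVQuorumState->bs[] are invisible to it.
 */
void bdrv_attach_aio_context(BlockDriverState *bs, AioContext *new_context)
{
    if (bs->backing_hd) {
        bdrv_attach_aio_context(bs->backing_hd, new_context);
    }
    if (bs->file) {
        bdrv_attach_aio_context(bs->file, new_context);
    }
    if (bs->drv && bs->drv->bdrv_attach_aio_context) {
        bs->drv->bdrv_attach_aio_context(bs, new_context);
    }
}

quorum_attach_aio_context() below fills that gap by looping over s->bs[i]
explicitly, and quorum_detach_aio_context() mirrors it.
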
 block/quorum.c | 48 ++++++++++++++++++++++++++++++++++++------------
 1 file changed, 36 insertions(+), 12 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index ecec3a5..426077a 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -848,25 +848,49 @@ static void quorum_close(BlockDriverState *bs)
     g_free(s->bs);
 }
 
+static void quorum_detach_aio_context(BlockDriverState *bs)
+{
+    BDRVQuorumState *s = bs->opaque;
+    int i;
+
+    for (i = 0; i < s->num_children; i++) {
+        bdrv_detach_aio_context(s->bs[i]);
+    }
+}
+
+static void quorum_attach_aio_context(BlockDriverState *bs,
+                                      AioContext *new_context)
+{
+    BDRVQuorumState *s = bs->opaque;
+    int i;
+
+    for (i = 0; i < s->num_children; i++) {
+        bdrv_attach_aio_context(s->bs[i], new_context);
+    }
+}
+
 static BlockDriver bdrv_quorum = {
-    .format_name        = "quorum",
-    .protocol_name      = "quorum",
+    .format_name                        = "quorum",
+    .protocol_name                      = "quorum",
+
+    .instance_size                      = sizeof(BDRVQuorumState),
 
-    .instance_size      = sizeof(BDRVQuorumState),
+    .bdrv_file_open                     = quorum_open,
+    .bdrv_close                         = quorum_close,
 
-    .bdrv_file_open     = quorum_open,
-    .bdrv_close         = quorum_close,
+    .bdrv_co_flush_to_disk              = quorum_co_flush,
 
-    .bdrv_co_flush_to_disk = quorum_co_flush,
+    .bdrv_getlength                     = quorum_getlength,
 
-    .bdrv_getlength     = quorum_getlength,
+    .bdrv_aio_readv                     = quorum_aio_readv,
+    .bdrv_aio_writev                    = quorum_aio_writev,
+    .bdrv_invalidate_cache              = quorum_invalidate_cache,
 
-    .bdrv_aio_readv     = quorum_aio_readv,
-    .bdrv_aio_writev    = quorum_aio_writev,
-    .bdrv_invalidate_cache = quorum_invalidate_cache,
+    .bdrv_detach_aio_context            = quorum_detach_aio_context,
+    .bdrv_attach_aio_context            = quorum_attach_aio_context,
 
-    .is_filter           = true,
-    .bdrv_recurse_is_first_non_filter = quorum_recurse_is_first_non_filter,
+    .is_filter                          = true,
+    .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
 };
 
 static void bdrv_quorum_init(void)
-- 
1.9.0


* [Qemu-devel] [PATCH 13/22] block/raw-posix: implement .bdrv_detach/attach_aio_context()
  2014-05-01 14:54 [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer Stefan Hajnoczi
                   ` (11 preceding siblings ...)
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 12/22] quorum: implement .bdrv_detach/attach_aio_context() Stefan Hajnoczi
@ 2014-05-01 14:54 ` Stefan Hajnoczi
  2014-05-02  7:39   ` Paolo Bonzini
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 14/22] block/linux-aio: fix memory and fd leak Stefan Hajnoczi
                   ` (10 subsequent siblings)
  23 siblings, 1 reply; 53+ messages in thread
From: Stefan Hajnoczi @ 2014-05-01 14:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Paolo Bonzini, Shergill, Gurinder, Vinod, Chegu,
	Stefan Hajnoczi

Drop the assumption that we're using the main AioContext for Linux AIO.
Convert the Linux AIO event notifier to use aio_set_event_notifier().

The .bdrv_detach/attach_aio_context() interfaces also need to be
implemented to move the event notifier handler from the old to the new
AioContext.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
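For context, the state whose notifier moves between AioContexts looks roughly
like this (abridged; only the fields relevant to this patch are shown):

/* Abridged sketch of block/linux-aio.c's state: the new detach/attach helpers
 * move only the completion EventNotifier between AioContexts; the kernel
 * io_context_t itself is untouched.
 */
struct qemu_laio_state {
    io_context_t ctx;   /* kernel AIO context (io_setup()/io_submit()) */
    EventNotifier e;    /* signalled on completion, handled by
                           qemu_laio_completion_cb() */
    /* ... request bookkeeping omitted ... */
};
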
 block/linux-aio.c | 16 ++++++++++++++--
 block/raw-aio.h   |  2 ++
 block/raw-posix.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index 53434e2..7ff3897 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -177,6 +177,20 @@ out_free_aiocb:
     return NULL;
 }
 
+void laio_detach_aio_context(void *s_, AioContext *old_context)
+{
+    struct qemu_laio_state *s = s_;
+
+    aio_set_event_notifier(old_context, &s->e, NULL);
+}
+
+void laio_attach_aio_context(void *s_, AioContext *new_context)
+{
+    struct qemu_laio_state *s = s_;
+
+    aio_set_event_notifier(new_context, &s->e, qemu_laio_completion_cb);
+}
+
 void *laio_init(void)
 {
     struct qemu_laio_state *s;
@@ -190,8 +204,6 @@ void *laio_init(void)
         goto out_close_efd;
     }
 
-    qemu_aio_set_event_notifier(&s->e, qemu_laio_completion_cb);
-
     return s;
 
 out_close_efd:
diff --git a/block/raw-aio.h b/block/raw-aio.h
index 7ad0a8a..9a761ee 100644
--- a/block/raw-aio.h
+++ b/block/raw-aio.h
@@ -37,6 +37,8 @@ void *laio_init(void);
 BlockDriverAIOCB *laio_submit(BlockDriverState *bs, void *aio_ctx, int fd,
         int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
         BlockDriverCompletionFunc *cb, void *opaque, int type);
+void laio_detach_aio_context(void *s, AioContext *old_context);
+void laio_attach_aio_context(void *s, AioContext *new_context);
 #endif
 
 #ifdef _WIN32
diff --git a/block/raw-posix.c b/block/raw-posix.c
index 1688e16..9fef157 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -304,6 +304,29 @@ static void raw_parse_flags(int bdrv_flags, int *open_flags)
     }
 }
 
+static void raw_detach_aio_context(BlockDriverState *bs)
+{
+#ifdef CONFIG_LINUX_AIO
+    BDRVRawState *s = bs->opaque;
+
+    if (s->use_aio) {
+        laio_detach_aio_context(s->aio_ctx, bdrv_get_aio_context(bs));
+    }
+#endif
+}
+
+static void raw_attach_aio_context(BlockDriverState *bs,
+                                   AioContext *new_context)
+{
+#ifdef CONFIG_LINUX_AIO
+    BDRVRawState *s = bs->opaque;
+
+    if (s->use_aio) {
+        laio_attach_aio_context(s->aio_ctx, new_context);
+    }
+#endif
+}
+
 #ifdef CONFIG_LINUX_AIO
 static int raw_set_aio(void **aio_ctx, int *use_aio, int bdrv_flags)
 {
@@ -444,6 +467,8 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
     }
 #endif
 
+    raw_attach_aio_context(bs, bdrv_get_aio_context(bs));
+
     ret = 0;
 fail:
     qemu_opts_del(opts);
@@ -1053,6 +1078,9 @@ static BlockDriverAIOCB *raw_aio_flush(BlockDriverState *bs,
 static void raw_close(BlockDriverState *bs)
 {
     BDRVRawState *s = bs->opaque;
+
+    raw_detach_aio_context(bs);
+
     if (s->fd >= 0) {
         qemu_close(s->fd);
         s->fd = -1;
@@ -1448,6 +1476,9 @@ static BlockDriver bdrv_file = {
     .bdrv_get_allocated_file_size
                         = raw_get_allocated_file_size,
 
+    .bdrv_detach_aio_context = raw_detach_aio_context,
+    .bdrv_attach_aio_context = raw_attach_aio_context,
+
     .create_options = raw_create_options,
 };
 
@@ -1848,6 +1879,9 @@ static BlockDriver bdrv_host_device = {
     .bdrv_get_allocated_file_size
                         = raw_get_allocated_file_size,
 
+    .bdrv_detach_aio_context = raw_detach_aio_context,
+    .bdrv_attach_aio_context = raw_attach_aio_context,
+
     /* generic scsi device */
 #ifdef __linux__
     .bdrv_ioctl         = hdev_ioctl,
@@ -1990,6 +2024,9 @@ static BlockDriver bdrv_host_floppy = {
     .bdrv_get_allocated_file_size
                         = raw_get_allocated_file_size,
 
+    .bdrv_detach_aio_context = raw_detach_aio_context,
+    .bdrv_attach_aio_context = raw_attach_aio_context,
+
     /* removable device support */
     .bdrv_is_inserted   = floppy_is_inserted,
     .bdrv_media_changed = floppy_media_changed,
@@ -2115,6 +2152,9 @@ static BlockDriver bdrv_host_cdrom = {
     .bdrv_get_allocated_file_size
                         = raw_get_allocated_file_size,
 
+    .bdrv_detach_aio_context = raw_detach_aio_context,
+    .bdrv_attach_aio_context = raw_attach_aio_context,
+
     /* removable device support */
     .bdrv_is_inserted   = cdrom_is_inserted,
     .bdrv_eject         = cdrom_eject,
@@ -2246,6 +2286,9 @@ static BlockDriver bdrv_host_cdrom = {
     .bdrv_get_allocated_file_size
                         = raw_get_allocated_file_size,
 
+    .bdrv_detach_aio_context = raw_detach_aio_context,
+    .bdrv_attach_aio_context = raw_attach_aio_context,
+
     /* removable device support */
     .bdrv_is_inserted   = cdrom_is_inserted,
     .bdrv_eject         = cdrom_eject,
-- 
1.9.0


* [Qemu-devel] [PATCH 14/22] block/linux-aio: fix memory and fd leak
  2014-05-01 14:54 [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer Stefan Hajnoczi
                   ` (12 preceding siblings ...)
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 13/22] block/raw-posix: " Stefan Hajnoczi
@ 2014-05-01 14:54 ` Stefan Hajnoczi
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 15/22] rbd: use BlockDriverState's AioContext Stefan Hajnoczi
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 53+ messages in thread
From: Stefan Hajnoczi @ 2014-05-01 14:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Paolo Bonzini, Shergill, Gurinder, Vinod, Chegu,
	Stefan Hajnoczi

Hot unplugging a -drive aio=native,file=test.img,format=raw image leaves
the Linux AIO event notifier and struct qemu_laio_state allocated.
Luckily nothing uses the event notifier after the BlockDriverState has
been closed, so the handler function is never called.

It's still worth fixing this resource leak.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
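The new laio_cleanup() is the mirror image of laio_init().  A condensed sketch
of the lifecycle as raw-posix now drives it (no error handling, the function
name is illustrative):

/* Sketch: every laio_init() gets a matching laio_cleanup() in raw_close(),
 * after the completion notifier has been unhooked from its AioContext.
 */
static void laio_lifecycle_sketch(AioContext *ctx)
{
    void *s = laio_init();            /* event_notifier_init() + io_setup() */

    laio_attach_aio_context(s, ctx);  /* watch the completion notifier */
    /* ... submit I/O with laio_submit() ... */
    laio_detach_aio_context(s, ctx);  /* stop watching it */
    laio_cleanup(s);                  /* event_notifier_cleanup() + g_free() */
}
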
 block/linux-aio.c | 8 ++++++++
 block/raw-aio.h   | 1 +
 block/raw-posix.c | 5 +++++
 3 files changed, 14 insertions(+)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index 7ff3897..f0a2c08 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -212,3 +212,11 @@ out_free_state:
     g_free(s);
     return NULL;
 }
+
+void laio_cleanup(void *s_)
+{
+    struct qemu_laio_state *s = s_;
+
+    event_notifier_cleanup(&s->e);
+    g_free(s);
+}
diff --git a/block/raw-aio.h b/block/raw-aio.h
index 9a761ee..55e0ccc 100644
--- a/block/raw-aio.h
+++ b/block/raw-aio.h
@@ -34,6 +34,7 @@
 /* linux-aio.c - Linux native implementation */
 #ifdef CONFIG_LINUX_AIO
 void *laio_init(void);
+void laio_cleanup(void *s);
 BlockDriverAIOCB *laio_submit(BlockDriverState *bs, void *aio_ctx, int fd,
         int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
         BlockDriverCompletionFunc *cb, void *opaque, int type);
diff --git a/block/raw-posix.c b/block/raw-posix.c
index 9fef157..36366a6 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1081,6 +1081,11 @@ static void raw_close(BlockDriverState *bs)
 
     raw_detach_aio_context(bs);
 
+#ifdef CONFIG_LINUX_AIO
+    if (s->use_aio) {
+        laio_cleanup(s->aio_ctx);
+    }
+#endif
     if (s->fd >= 0) {
         qemu_close(s->fd);
         s->fd = -1;
-- 
1.9.0


* [Qemu-devel] [PATCH 15/22] rbd: use BlockDriverState's AioContext
  2014-05-01 14:54 [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer Stefan Hajnoczi
                   ` (13 preceding siblings ...)
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 14/22] block/linux-aio: fix memory and fd leak Stefan Hajnoczi
@ 2014-05-01 14:54 ` Stefan Hajnoczi
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 16/22] sheepdog: implement .bdrv_detach/attach_aio_context() Stefan Hajnoczi
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 53+ messages in thread
From: Stefan Hajnoczi @ 2014-05-01 14:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Shergill, Gurinder, Stefan Hajnoczi, Josh Durgin,
	Paolo Bonzini, Vinod, Chegu

Drop the assumption that we're using the main AioContext.  Convert
qemu_bh_new() to aio_bh_new() and qemu_aio_wait() to aio_poll().

The .bdrv_detach_aio_context() and .bdrv_attach_aio_context() interfaces
are not needed since no fd handlers, timers, or BHs stay registered when
requests have been drained.

Cc: Josh Durgin <josh.durgin@inktank.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
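The conversion boils down to one idiom; a minimal sketch with an invented
helper name (the real code schedules rbd_finish_bh() for its RADOSCB):

/* Sketch: schedule the completion bottom half in the AioContext of the
 * BlockDriverState that issued the request, not in the main loop.
 */
static void sched_completion_bh(BlockDriverAIOCB *acb, QEMUBHFunc *cb,
                                void *opaque)
{
    QEMUBH *bh = aio_bh_new(bdrv_get_aio_context(acb->bs), cb, opaque);

    qemu_bh_schedule(bh);
}
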
 block/rbd.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index dbc79f4..41f7bdc 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -548,7 +548,7 @@ static void qemu_rbd_aio_cancel(BlockDriverAIOCB *blockacb)
     acb->cancelled = 1;
 
     while (acb->status == -EINPROGRESS) {
-        qemu_aio_wait();
+        aio_poll(bdrv_get_aio_context(acb->common.bs), true);
     }
 
     qemu_aio_release(acb);
@@ -581,7 +581,8 @@ static void rbd_finish_aiocb(rbd_completion_t c, RADOSCB *rcb)
     rcb->ret = rbd_aio_get_return_value(c);
     rbd_aio_release(c);
 
-    acb->bh = qemu_bh_new(rbd_finish_bh, rcb);
+    acb->bh = aio_bh_new(bdrv_get_aio_context(acb->common.bs),
+                         rbd_finish_bh, rcb);
     qemu_bh_schedule(acb->bh);
 }
 
-- 
1.9.0


* [Qemu-devel] [PATCH 16/22] sheepdog: implement .bdrv_detach/attach_aio_context()
  2014-05-01 14:54 [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer Stefan Hajnoczi
                   ` (14 preceding siblings ...)
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 15/22] rbd: use BlockDriverState's AioContext Stefan Hajnoczi
@ 2014-05-01 14:54 ` Stefan Hajnoczi
  2014-05-05  8:10   ` Liu Yuan
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 17/22] ssh: use BlockDriverState's AioContext Stefan Hajnoczi
                   ` (7 subsequent siblings)
  23 siblings, 1 reply; 53+ messages in thread
From: Stefan Hajnoczi @ 2014-05-01 14:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Shergill, Gurinder, Stefan Hajnoczi, Liu Yuan,
	Paolo Bonzini, Vinod, Chegu, MORITA Kazutaka

Drop the assumption that we're using the main AioContext.  Convert
qemu_aio_set_fd_handler() to aio_set_fd_handler() and qemu_aio_wait() to
aio_poll().

The .bdrv_detach/attach_aio_context() interfaces also need to be
implemented to move the socket fd handler from the old to the new
AioContext.

Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Cc: Liu Yuan <namei.unix@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
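The detach/attach pair added below reduces to the following pattern; a minimal
generic sketch (move_fd_handler() is a made-up helper, not part of the patch):

/* Sketch: an fd handler belongs to exactly one AioContext, so moving the
 * sheepdog socket means unregistering it in the old context and re-registering
 * it in the new one -- which is what sd_detach/attach_aio_context() do.
 */
static void move_fd_handler(AioContext *old_ctx, AioContext *new_ctx,
                            int fd, IOHandler *read_cb, void *opaque)
{
    aio_set_fd_handler(old_ctx, fd, NULL, NULL, NULL);       /* detach */
    aio_set_fd_handler(new_ctx, fd, read_cb, NULL, opaque);  /* attach */
}
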
 block/sheepdog.c | 118 +++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 80 insertions(+), 38 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 0eb33ee..4727fc1 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -314,6 +314,7 @@ struct SheepdogAIOCB {
 
 typedef struct BDRVSheepdogState {
     BlockDriverState *bs;
+    AioContext *aio_context;
 
     SheepdogInode inode;
 
@@ -496,7 +497,7 @@ static void sd_aio_cancel(BlockDriverAIOCB *blockacb)
             sd_finish_aiocb(acb);
             return;
         }
-        qemu_aio_wait();
+        aio_poll(s->aio_context, true);
     }
 }
 
@@ -582,6 +583,7 @@ static void restart_co_req(void *opaque)
 
 typedef struct SheepdogReqCo {
     int sockfd;
+    AioContext *aio_context;
     SheepdogReq *hdr;
     void *data;
     unsigned int *wlen;
@@ -602,14 +604,14 @@ static coroutine_fn void do_co_req(void *opaque)
     unsigned int *rlen = srco->rlen;
 
     co = qemu_coroutine_self();
-    qemu_aio_set_fd_handler(sockfd, NULL, restart_co_req, co);
+    aio_set_fd_handler(srco->aio_context, sockfd, NULL, restart_co_req, co);
 
     ret = send_co_req(sockfd, hdr, data, wlen);
     if (ret < 0) {
         goto out;
     }
 
-    qemu_aio_set_fd_handler(sockfd, restart_co_req, NULL, co);
+    aio_set_fd_handler(srco->aio_context, sockfd, restart_co_req, NULL, co);
 
     ret = qemu_co_recv(sockfd, hdr, sizeof(*hdr));
     if (ret != sizeof(*hdr)) {
@@ -634,18 +636,19 @@ static coroutine_fn void do_co_req(void *opaque)
 out:
     /* there is at most one request for this sockfd, so it is safe to
      * set each handler to NULL. */
-    qemu_aio_set_fd_handler(sockfd, NULL, NULL, NULL);
+    aio_set_fd_handler(srco->aio_context, sockfd, NULL, NULL, NULL);
 
     srco->ret = ret;
     srco->finished = true;
 }
 
-static int do_req(int sockfd, SheepdogReq *hdr, void *data,
-                  unsigned int *wlen, unsigned int *rlen)
+static int do_req(int sockfd, AioContext *aio_context, SheepdogReq *hdr,
+                  void *data, unsigned int *wlen, unsigned int *rlen)
 {
     Coroutine *co;
     SheepdogReqCo srco = {
         .sockfd = sockfd,
+        .aio_context = aio_context,
         .hdr = hdr,
         .data = data,
         .wlen = wlen,
@@ -660,7 +663,7 @@ static int do_req(int sockfd, SheepdogReq *hdr, void *data,
         co = qemu_coroutine_create(do_co_req);
         qemu_coroutine_enter(co, &srco);
         while (!srco.finished) {
-            qemu_aio_wait();
+            aio_poll(aio_context, true);
         }
     }
 
@@ -712,7 +715,7 @@ static coroutine_fn void reconnect_to_sdog(void *opaque)
     BDRVSheepdogState *s = opaque;
     AIOReq *aio_req, *next;
 
-    qemu_aio_set_fd_handler(s->fd, NULL, NULL, NULL);
+    aio_set_fd_handler(s->aio_context, s->fd, NULL, NULL, NULL);
     close(s->fd);
     s->fd = -1;
 
@@ -923,7 +926,7 @@ static int get_sheep_fd(BDRVSheepdogState *s)
         return fd;
     }
 
-    qemu_aio_set_fd_handler(fd, co_read_response, NULL, s);
+    aio_set_fd_handler(s->aio_context, fd, co_read_response, NULL, s);
     return fd;
 }
 
@@ -1093,7 +1096,7 @@ static int find_vdi_name(BDRVSheepdogState *s, const char *filename,
     hdr.snapid = snapid;
     hdr.flags = SD_FLAG_CMD_WRITE;
 
-    ret = do_req(fd, (SheepdogReq *)&hdr, buf, &wlen, &rlen);
+    ret = do_req(fd, s->aio_context, (SheepdogReq *)&hdr, buf, &wlen, &rlen);
     if (ret) {
         goto out;
     }
@@ -1173,7 +1176,8 @@ static void coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
 
     qemu_co_mutex_lock(&s->lock);
     s->co_send = qemu_coroutine_self();
-    qemu_aio_set_fd_handler(s->fd, co_read_response, co_write_request, s);
+    aio_set_fd_handler(s->aio_context, s->fd,
+                       co_read_response, co_write_request, s);
     socket_set_cork(s->fd, 1);
 
     /* send a header */
@@ -1191,12 +1195,13 @@ static void coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
     }
 out:
     socket_set_cork(s->fd, 0);
-    qemu_aio_set_fd_handler(s->fd, co_read_response, NULL, s);
+    aio_set_fd_handler(s->aio_context, s->fd, co_read_response, NULL, s);
     s->co_send = NULL;
     qemu_co_mutex_unlock(&s->lock);
 }
 
-static int read_write_object(int fd, char *buf, uint64_t oid, uint8_t copies,
+static int read_write_object(int fd, AioContext *aio_context, char *buf,
+                             uint64_t oid, uint8_t copies,
                              unsigned int datalen, uint64_t offset,
                              bool write, bool create, uint32_t cache_flags)
 {
@@ -1229,7 +1234,7 @@ static int read_write_object(int fd, char *buf, uint64_t oid, uint8_t copies,
     hdr.offset = offset;
     hdr.copies = copies;
 
-    ret = do_req(fd, (SheepdogReq *)&hdr, buf, &wlen, &rlen);
+    ret = do_req(fd, aio_context, (SheepdogReq *)&hdr, buf, &wlen, &rlen);
     if (ret) {
         error_report("failed to send a request to the sheep");
         return ret;
@@ -1244,19 +1249,23 @@ static int read_write_object(int fd, char *buf, uint64_t oid, uint8_t copies,
     }
 }
 
-static int read_object(int fd, char *buf, uint64_t oid, uint8_t copies,
+static int read_object(int fd, AioContext *aio_context, char *buf,
+                       uint64_t oid, uint8_t copies,
                        unsigned int datalen, uint64_t offset,
                        uint32_t cache_flags)
 {
-    return read_write_object(fd, buf, oid, copies, datalen, offset, false,
+    return read_write_object(fd, aio_context, buf, oid, copies,
+                             datalen, offset, false,
                              false, cache_flags);
 }
 
-static int write_object(int fd, char *buf, uint64_t oid, uint8_t copies,
+static int write_object(int fd, AioContext *aio_context, char *buf,
+                        uint64_t oid, uint8_t copies,
                         unsigned int datalen, uint64_t offset, bool create,
                         uint32_t cache_flags)
 {
-    return read_write_object(fd, buf, oid, copies, datalen, offset, true,
+    return read_write_object(fd, aio_context, buf, oid, copies,
+                             datalen, offset, true,
                              create, cache_flags);
 }
 
@@ -1279,7 +1288,7 @@ static int reload_inode(BDRVSheepdogState *s, uint32_t snapid, const char *tag)
         goto out;
     }
 
-    ret = read_object(fd, (char *)inode, vid_to_vdi_oid(vid),
+    ret = read_object(fd, s->aio_context, (char *)inode, vid_to_vdi_oid(vid),
                       s->inode.nr_copies, sizeof(*inode), 0, s->cache_flags);
     if (ret < 0) {
         goto out;
@@ -1354,6 +1363,22 @@ out:
     }
 }
 
+static void sd_detach_aio_context(BlockDriverState *bs)
+{
+    BDRVSheepdogState *s = bs->opaque;
+
+    aio_set_fd_handler(s->aio_context, s->fd, NULL, NULL, NULL);
+}
+
+static void sd_attach_aio_context(BlockDriverState *bs,
+                                  AioContext *new_context)
+{
+    BDRVSheepdogState *s = bs->opaque;
+
+    s->aio_context = new_context;
+    aio_set_fd_handler(new_context, s->fd, co_read_response, NULL, s);
+}
+
 /* TODO Convert to fine grained options */
 static QemuOptsList runtime_opts = {
     .name = "sheepdog",
@@ -1382,6 +1407,7 @@ static int sd_open(BlockDriverState *bs, QDict *options, int flags,
     const char *filename;
 
     s->bs = bs;
+    s->aio_context = bdrv_get_aio_context(bs);
 
     opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
     qemu_opts_absorb_qdict(opts, options, &local_err);
@@ -1443,8 +1469,8 @@ static int sd_open(BlockDriverState *bs, QDict *options, int flags,
     }
 
     buf = g_malloc(SD_INODE_SIZE);
-    ret = read_object(fd, buf, vid_to_vdi_oid(vid), 0, SD_INODE_SIZE, 0,
-                      s->cache_flags);
+    ret = read_object(fd, s->aio_context, buf, vid_to_vdi_oid(vid),
+                      0, SD_INODE_SIZE, 0, s->cache_flags);
 
     closesocket(fd);
 
@@ -1463,7 +1489,7 @@ static int sd_open(BlockDriverState *bs, QDict *options, int flags,
     g_free(buf);
     return 0;
 out:
-    qemu_aio_set_fd_handler(s->fd, NULL, NULL, NULL);
+    aio_set_fd_handler(bdrv_get_aio_context(bs), s->fd, NULL, NULL, NULL);
     if (s->fd >= 0) {
         closesocket(s->fd);
     }
@@ -1505,7 +1531,7 @@ static int do_sd_create(BDRVSheepdogState *s, uint32_t *vdi_id, int snapshot)
     hdr.copy_policy = s->inode.copy_policy;
     hdr.copies = s->inode.nr_copies;
 
-    ret = do_req(fd, (SheepdogReq *)&hdr, buf, &wlen, &rlen);
+    ret = do_req(fd, s->aio_context, (SheepdogReq *)&hdr, buf, &wlen, &rlen);
 
     closesocket(fd);
 
@@ -1751,7 +1777,8 @@ static void sd_close(BlockDriverState *bs)
     hdr.data_length = wlen;
     hdr.flags = SD_FLAG_CMD_WRITE;
 
-    ret = do_req(fd, (SheepdogReq *)&hdr, s->name, &wlen, &rlen);
+    ret = do_req(fd, s->aio_context, (SheepdogReq *)&hdr,
+                 s->name, &wlen, &rlen);
 
     closesocket(fd);
 
@@ -1760,7 +1787,7 @@ static void sd_close(BlockDriverState *bs)
         error_report("%s, %s", sd_strerror(rsp->result), s->name);
     }
 
-    qemu_aio_set_fd_handler(s->fd, NULL, NULL, NULL);
+    aio_set_fd_handler(bdrv_get_aio_context(bs), s->fd, NULL, NULL, NULL);
     closesocket(s->fd);
     g_free(s->host_spec);
 }
@@ -1794,8 +1821,9 @@ static int sd_truncate(BlockDriverState *bs, int64_t offset)
     /* we don't need to update entire object */
     datalen = SD_INODE_SIZE - sizeof(s->inode.data_vdi_id);
     s->inode.vdi_size = offset;
-    ret = write_object(fd, (char *)&s->inode, vid_to_vdi_oid(s->inode.vdi_id),
-                       s->inode.nr_copies, datalen, 0, false, s->cache_flags);
+    ret = write_object(fd, s->aio_context, (char *)&s->inode,
+                       vid_to_vdi_oid(s->inode.vdi_id), s->inode.nr_copies,
+                       datalen, 0, false, s->cache_flags);
     close(fd);
 
     if (ret < 0) {
@@ -1861,7 +1889,8 @@ static bool sd_delete(BDRVSheepdogState *s)
         return false;
     }
 
-    ret = do_req(fd, (SheepdogReq *)&hdr, s->name, &wlen, &rlen);
+    ret = do_req(fd, s->aio_context, (SheepdogReq *)&hdr,
+                 s->name, &wlen, &rlen);
     closesocket(fd);
     if (ret) {
         return false;
@@ -1913,8 +1942,8 @@ static int sd_create_branch(BDRVSheepdogState *s)
         goto out;
     }
 
-    ret = read_object(fd, buf, vid_to_vdi_oid(vid), s->inode.nr_copies,
-                      SD_INODE_SIZE, 0, s->cache_flags);
+    ret = read_object(fd, s->aio_context, buf, vid_to_vdi_oid(vid),
+                      s->inode.nr_copies, SD_INODE_SIZE, 0, s->cache_flags);
 
     closesocket(fd);
 
@@ -2157,8 +2186,9 @@ static int sd_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info)
         goto cleanup;
     }
 
-    ret = write_object(fd, (char *)&s->inode, vid_to_vdi_oid(s->inode.vdi_id),
-                       s->inode.nr_copies, datalen, 0, false, s->cache_flags);
+    ret = write_object(fd, s->aio_context, (char *)&s->inode,
+                       vid_to_vdi_oid(s->inode.vdi_id), s->inode.nr_copies,
+                       datalen, 0, false, s->cache_flags);
     if (ret < 0) {
         error_report("failed to write snapshot's inode.");
         goto cleanup;
@@ -2173,8 +2203,9 @@ static int sd_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info)
 
     inode = (SheepdogInode *)g_malloc(datalen);
 
-    ret = read_object(fd, (char *)inode, vid_to_vdi_oid(new_vid),
-                      s->inode.nr_copies, datalen, 0, s->cache_flags);
+    ret = read_object(fd, s->aio_context, (char *)inode,
+                      vid_to_vdi_oid(new_vid), s->inode.nr_copies,
+                      datalen, 0, s->cache_flags);
 
     if (ret < 0) {
         error_report("failed to read new inode info. %s", strerror(errno));
@@ -2277,7 +2308,8 @@ static int sd_snapshot_list(BlockDriverState *bs, QEMUSnapshotInfo **psn_tab)
     req.opcode = SD_OP_READ_VDIS;
     req.data_length = max;
 
-    ret = do_req(fd, (SheepdogReq *)&req, vdi_inuse, &wlen, &rlen);
+    ret = do_req(fd, s->aio_context, (SheepdogReq *)&req,
+                 vdi_inuse, &wlen, &rlen);
 
     closesocket(fd);
     if (ret) {
@@ -2302,7 +2334,8 @@ static int sd_snapshot_list(BlockDriverState *bs, QEMUSnapshotInfo **psn_tab)
         }
 
         /* we don't need to read entire object */
-        ret = read_object(fd, (char *)&inode, vid_to_vdi_oid(vid),
+        ret = read_object(fd, s->aio_context, (char *)&inode,
+                          vid_to_vdi_oid(vid),
                           0, SD_INODE_SIZE - sizeof(inode.data_vdi_id), 0,
                           s->cache_flags);
 
@@ -2364,11 +2397,11 @@ static int do_load_save_vmstate(BDRVSheepdogState *s, uint8_t *data,
 
         create = (offset == 0);
         if (load) {
-            ret = read_object(fd, (char *)data, vmstate_oid,
+            ret = read_object(fd, s->aio_context, (char *)data, vmstate_oid,
                               s->inode.nr_copies, data_len, offset,
                               s->cache_flags);
         } else {
-            ret = write_object(fd, (char *)data, vmstate_oid,
+            ret = write_object(fd, s->aio_context, (char *)data, vmstate_oid,
                                s->inode.nr_copies, data_len, offset, create,
                                s->cache_flags);
         }
@@ -2541,6 +2574,9 @@ static BlockDriver bdrv_sheepdog = {
     .bdrv_save_vmstate  = sd_save_vmstate,
     .bdrv_load_vmstate  = sd_load_vmstate,
 
+    .bdrv_detach_aio_context = sd_detach_aio_context,
+    .bdrv_attach_aio_context = sd_attach_aio_context,
+
     .create_options = sd_create_options,
 };
 
@@ -2571,6 +2607,9 @@ static BlockDriver bdrv_sheepdog_tcp = {
     .bdrv_save_vmstate  = sd_save_vmstate,
     .bdrv_load_vmstate  = sd_load_vmstate,
 
+    .bdrv_detach_aio_context = sd_detach_aio_context,
+    .bdrv_attach_aio_context = sd_attach_aio_context,
+
     .create_options = sd_create_options,
 };
 
@@ -2601,6 +2640,9 @@ static BlockDriver bdrv_sheepdog_unix = {
     .bdrv_save_vmstate  = sd_save_vmstate,
     .bdrv_load_vmstate  = sd_load_vmstate,
 
+    .bdrv_detach_aio_context = sd_detach_aio_context,
+    .bdrv_attach_aio_context = sd_attach_aio_context,
+
     .create_options = sd_create_options,
 };
 
-- 
1.9.0


* [Qemu-devel] [PATCH 17/22] ssh: use BlockDriverState's AioContext
  2014-05-01 14:54 [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer Stefan Hajnoczi
                   ` (15 preceding siblings ...)
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 16/22] sheepdog: implement .bdrv_detach/attach_aio_context() Stefan Hajnoczi
@ 2014-05-01 14:54 ` Stefan Hajnoczi
  2014-05-01 15:03   ` Richard W.M. Jones
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 18/22] vmdk: implement .bdrv_detach/attach_aio_context() Stefan Hajnoczi
                   ` (6 subsequent siblings)
  23 siblings, 1 reply; 53+ messages in thread
From: Stefan Hajnoczi @ 2014-05-01 14:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Shergill, Gurinder, Richard W.M. Jones,
	Stefan Hajnoczi, Paolo Bonzini, Vinod, Chegu

Drop the assumption that we're using the main AioContext.  Use
bdrv_get_aio_context() to register fd handlers in the right AioContext
for this BlockDriverState.

The .bdrv_detach_aio_context() and .bdrv_attach_aio_context() interfaces
are not needed since no fd handlers, timers, or BHs stay registered when
requests have been drained.

For now this doesn't make much difference, but it will allow ssh to work
in IOThread instances in the future.

Cc: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
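The reason no detach/attach hooks are needed: the socket handler only exists
across a single yield.  Condensing set_fd_handler()/clear_fd_handler()/
co_yield() into one illustrative function (the name is made up):

/* Sketch: the handler lives only between arm and disarm around one yield, so
 * nothing stays registered once in-flight requests have drained.
 */
static coroutine_fn void yield_until_fd_ready(BlockDriverState *bs, int fd)
{
    Coroutine *co = qemu_coroutine_self();

    /* arm: wake this coroutine when the socket is ready */
    aio_set_fd_handler(bdrv_get_aio_context(bs), fd, restart_coroutine, NULL, co);
    qemu_coroutine_yield();
    /* disarm before returning, so nothing outlives the request */
    aio_set_fd_handler(bdrv_get_aio_context(bs), fd, NULL, NULL, NULL);
}
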
 block/ssh.c | 36 +++++++++++++++++++-----------------
 1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/block/ssh.c b/block/ssh.c
index aa63c9d..3f4a9fb 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -742,7 +742,7 @@ static void restart_coroutine(void *opaque)
     qemu_coroutine_enter(co, NULL);
 }
 
-static coroutine_fn void set_fd_handler(BDRVSSHState *s)
+static coroutine_fn void set_fd_handler(BDRVSSHState *s, BlockDriverState *bs)
 {
     int r;
     IOHandler *rd_handler = NULL, *wr_handler = NULL;
@@ -760,24 +760,26 @@ static coroutine_fn void set_fd_handler(BDRVSSHState *s)
     DPRINTF("s->sock=%d rd_handler=%p wr_handler=%p", s->sock,
             rd_handler, wr_handler);
 
-    qemu_aio_set_fd_handler(s->sock, rd_handler, wr_handler, co);
+    aio_set_fd_handler(bdrv_get_aio_context(bs), s->sock,
+                       rd_handler, wr_handler, co);
 }
 
-static coroutine_fn void clear_fd_handler(BDRVSSHState *s)
+static coroutine_fn void clear_fd_handler(BDRVSSHState *s,
+                                          BlockDriverState *bs)
 {
     DPRINTF("s->sock=%d", s->sock);
-    qemu_aio_set_fd_handler(s->sock, NULL, NULL, NULL);
+    aio_set_fd_handler(bdrv_get_aio_context(bs), s->sock, NULL, NULL, NULL);
 }
 
 /* A non-blocking call returned EAGAIN, so yield, ensuring the
  * handlers are set up so that we'll be rescheduled when there is an
  * interesting event on the socket.
  */
-static coroutine_fn void co_yield(BDRVSSHState *s)
+static coroutine_fn void co_yield(BDRVSSHState *s, BlockDriverState *bs)
 {
-    set_fd_handler(s);
+    set_fd_handler(s, bs);
     qemu_coroutine_yield();
-    clear_fd_handler(s);
+    clear_fd_handler(s, bs);
 }
 
 /* SFTP has a function `libssh2_sftp_seek64' which seeks to a position
@@ -807,7 +809,7 @@ static void ssh_seek(BDRVSSHState *s, int64_t offset, int flags)
     }
 }
 
-static coroutine_fn int ssh_read(BDRVSSHState *s,
+static coroutine_fn int ssh_read(BDRVSSHState *s, BlockDriverState *bs,
                                  int64_t offset, size_t size,
                                  QEMUIOVector *qiov)
 {
@@ -840,7 +842,7 @@ static coroutine_fn int ssh_read(BDRVSSHState *s,
         DPRINTF("sftp_read returned %zd", r);
 
         if (r == LIBSSH2_ERROR_EAGAIN || r == LIBSSH2_ERROR_TIMEOUT) {
-            co_yield(s);
+            co_yield(s, bs);
             goto again;
         }
         if (r < 0) {
@@ -875,14 +877,14 @@ static coroutine_fn int ssh_co_readv(BlockDriverState *bs,
     int ret;
 
     qemu_co_mutex_lock(&s->lock);
-    ret = ssh_read(s, sector_num * BDRV_SECTOR_SIZE,
+    ret = ssh_read(s, bs, sector_num * BDRV_SECTOR_SIZE,
                    nb_sectors * BDRV_SECTOR_SIZE, qiov);
     qemu_co_mutex_unlock(&s->lock);
 
     return ret;
 }
 
-static int ssh_write(BDRVSSHState *s,
+static int ssh_write(BDRVSSHState *s, BlockDriverState *bs,
                      int64_t offset, size_t size,
                      QEMUIOVector *qiov)
 {
@@ -910,7 +912,7 @@ static int ssh_write(BDRVSSHState *s,
         DPRINTF("sftp_write returned %zd", r);
 
         if (r == LIBSSH2_ERROR_EAGAIN || r == LIBSSH2_ERROR_TIMEOUT) {
-            co_yield(s);
+            co_yield(s, bs);
             goto again;
         }
         if (r < 0) {
@@ -929,7 +931,7 @@ static int ssh_write(BDRVSSHState *s,
          */
         if (r == 0) {
             ssh_seek(s, offset + written, SSH_SEEK_WRITE|SSH_SEEK_FORCE);
-            co_yield(s);
+            co_yield(s, bs);
             goto again;
         }
 
@@ -957,7 +959,7 @@ static coroutine_fn int ssh_co_writev(BlockDriverState *bs,
     int ret;
 
     qemu_co_mutex_lock(&s->lock);
-    ret = ssh_write(s, sector_num * BDRV_SECTOR_SIZE,
+    ret = ssh_write(s, bs, sector_num * BDRV_SECTOR_SIZE,
                     nb_sectors * BDRV_SECTOR_SIZE, qiov);
     qemu_co_mutex_unlock(&s->lock);
 
@@ -978,7 +980,7 @@ static void unsafe_flush_warning(BDRVSSHState *s, const char *what)
 
 #ifdef HAS_LIBSSH2_SFTP_FSYNC
 
-static coroutine_fn int ssh_flush(BDRVSSHState *s)
+static coroutine_fn int ssh_flush(BDRVSSHState *s, BlockDriverState *bs)
 {
     int r;
 
@@ -986,7 +988,7 @@ static coroutine_fn int ssh_flush(BDRVSSHState *s)
  again:
     r = libssh2_sftp_fsync(s->sftp_handle);
     if (r == LIBSSH2_ERROR_EAGAIN || r == LIBSSH2_ERROR_TIMEOUT) {
-        co_yield(s);
+        co_yield(s, bs);
         goto again;
     }
     if (r == LIBSSH2_ERROR_SFTP_PROTOCOL &&
@@ -1008,7 +1010,7 @@ static coroutine_fn int ssh_co_flush(BlockDriverState *bs)
     int ret;
 
     qemu_co_mutex_lock(&s->lock);
-    ret = ssh_flush(s);
+    ret = ssh_flush(s, bs);
     qemu_co_mutex_unlock(&s->lock);
 
     return ret;
-- 
1.9.0


* [Qemu-devel] [PATCH 18/22] vmdk: implement .bdrv_detach/attach_aio_context()
  2014-05-01 14:54 [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer Stefan Hajnoczi
                   ` (16 preceding siblings ...)
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 17/22] ssh: use BlockDriverState's AioContext Stefan Hajnoczi
@ 2014-05-01 14:54 ` Stefan Hajnoczi
  2014-05-04  9:50   ` Fam Zheng
  2014-05-04 10:17   ` Fam Zheng
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 19/22] dataplane: use the QEMU block layer for I/O Stefan Hajnoczi
                   ` (5 subsequent siblings)
  23 siblings, 2 replies; 53+ messages in thread
From: Stefan Hajnoczi @ 2014-05-01 14:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Shergill, Gurinder, Stefan Hajnoczi,
	Paolo Bonzini, Vinod, Chegu

Implement .bdrv_detach/attach_aio_context() interfaces to propagate
detach/attach to BDRVVmdkState->extents[].file.  The block layer takes
care of ->file and ->backing_hd but doesn't know about our extents'
BlockDriverStates, which are also part of the graph.

Cc: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
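Background for reviewers: a VMDK image may be split across several extent
files, each opened as its own BlockDriverState that only vmdk.c knows about.
Schematically (abridged, not the full VmdkExtent definition):

/* Abridged sketch: bs->file and bs->backing_hd are covered by the generic
 * code, but each extent's BlockDriverState is private to vmdk.c, hence the
 * explicit loop over s->extents[] below.
 */
typedef struct VmdkExtent {
    BlockDriverState *file;   /* per-extent BDS moved by this patch */
    /* ... cluster size, L1/L2 tables, etc. omitted ... */
} VmdkExtent;
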
 block/vmdk.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/block/vmdk.c b/block/vmdk.c
index 06a1f9f..1ca944a 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -2063,6 +2063,27 @@ static ImageInfoSpecific *vmdk_get_specific_info(BlockDriverState *bs)
     return spec_info;
 }
 
+static void vmdk_detach_aio_context(BlockDriverState *bs)
+{
+    BDRVVmdkState *s = bs->opaque;
+    int i;
+
+    for (i = 0; i < s->num_extents; i++) {
+        bdrv_detach_aio_context(s->extents[i].file);
+    }
+}
+
+static void vmdk_attach_aio_context(BlockDriverState *bs,
+                                    AioContext *new_context)
+{
+    BDRVVmdkState *s = bs->opaque;
+    int i;
+
+    for (i = 0; i < s->num_extents; i++) {
+        bdrv_attach_aio_context(s->extents[i].file, new_context);
+    }
+}
+
 static QEMUOptionParameter vmdk_create_options[] = {
     {
         .name = BLOCK_OPT_SIZE,
@@ -2118,6 +2139,8 @@ static BlockDriver bdrv_vmdk = {
     .bdrv_has_zero_init           = vmdk_has_zero_init,
     .bdrv_get_specific_info       = vmdk_get_specific_info,
     .bdrv_refresh_limits          = vmdk_refresh_limits,
+    .bdrv_detach_aio_context      = vmdk_detach_aio_context,
+    .bdrv_attach_aio_context      = vmdk_attach_aio_context,
 
     .create_options               = vmdk_create_options,
 };
-- 
1.9.0


* [Qemu-devel] [PATCH 19/22] dataplane: use the QEMU block layer for I/O
  2014-05-01 14:54 [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer Stefan Hajnoczi
                   ` (17 preceding siblings ...)
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 18/22] vmdk: implement .bdrv_detach/attach_aio_context() Stefan Hajnoczi
@ 2014-05-01 14:54 ` Stefan Hajnoczi
  2014-05-04 11:51   ` Fam Zheng
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 20/22] dataplane: delete IOQueue since it is no longer used Stefan Hajnoczi
                   ` (4 subsequent siblings)
  23 siblings, 1 reply; 53+ messages in thread
From: Stefan Hajnoczi @ 2014-05-01 14:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Paolo Bonzini, Shergill, Gurinder, Vinod, Chegu,
	Stefan Hajnoczi

Stop using a custom Linux AIO request queue from ioq.h and instead use
the QEMU block layer for I/O.

This patch adjusts the VirtIOBlockRequest struct to hold the fields needed
by bdrv_aio_readv()/bdrv_aio_writev().  ioq.h used struct iovec and struct
iocb, which we no longer need directly.

Modify dataplane start/stop to set the AioContext on the
BlockDriverState.  We also no longer need to get the raw-posix file
descriptor.  This means image formats are now supported with dataplane!

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
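The start/stop handoff described above reduces to the following; a simplified
sketch (error handling, vring and notifier setup omitted; the function names
are illustrative):

/* Sketch: on start, move the BlockDriverState into the IOThread's AioContext
 * so bdrv_aio_readv()/bdrv_aio_writev() completions run there; on stop, move
 * it back to the main loop so the rest of QEMU can touch it again.
 */
static void dataplane_start_sketch(VirtIOBlockDataPlane *s)
{
    aio_context_acquire(s->ctx);
    bdrv_set_aio_context(s->blk->conf.bs, s->ctx);
    aio_context_release(s->ctx);
}

static void dataplane_stop_sketch(VirtIOBlockDataPlane *s)
{
    aio_context_acquire(s->ctx);
    bdrv_set_aio_context(s->blk->conf.bs, qemu_get_aio_context());
    aio_context_release(s->ctx);
}
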
 hw/block/dataplane/virtio-blk.c | 194 ++++++++++++++--------------------------
 1 file changed, 66 insertions(+), 128 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 70b8a5a..0cd74f2 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -17,7 +17,6 @@
 #include "qemu/thread.h"
 #include "qemu/error-report.h"
 #include "hw/virtio/dataplane/vring.h"
-#include "ioq.h"
 #include "block/block.h"
 #include "hw/virtio/virtio-blk.h"
 #include "virtio-blk.h"
@@ -25,20 +24,14 @@
 #include "hw/virtio/virtio-bus.h"
 #include "qom/object_interfaces.h"
 
-enum {
-    SEG_MAX = 126,                  /* maximum number of I/O segments */
-    VRING_MAX = SEG_MAX + 2,        /* maximum number of vring descriptors */
-    REQ_MAX = VRING_MAX,            /* maximum number of requests in the vring,
-                                     * is VRING_MAX / 2 with traditional and
-                                     * VRING_MAX with indirect descriptors */
-};
-
 typedef struct {
-    struct iocb iocb;               /* Linux AIO control block */
+    VirtIOBlockDataPlane *s;
     QEMUIOVector *inhdr;            /* iovecs for virtio_blk_inhdr */
     VirtQueueElement *elem;         /* saved data from the virtqueue */
-    struct iovec *bounce_iov;       /* used if guest buffers are unaligned */
-    QEMUIOVector *read_qiov;        /* for read completion /w bounce buffer */
+    QEMUIOVector qiov;              /* original request iovecs */
+    struct iovec bounce_iov;        /* used if guest buffers are unaligned */
+    QEMUIOVector bounce_qiov;       /* bounce buffer iovecs */
+    bool read;                      /* read or write? */
 } VirtIOBlockRequest;
 
 struct VirtIOBlockDataPlane {
@@ -61,15 +54,7 @@ struct VirtIOBlockDataPlane {
     IOThread *iothread;
     IOThread internal_iothread_obj;
     AioContext *ctx;
-    EventNotifier io_notifier;      /* Linux AIO completion */
     EventNotifier host_notifier;    /* doorbell */
-
-    IOQueue ioqueue;                /* Linux AIO queue (should really be per
-                                       IOThread) */
-    VirtIOBlockRequest requests[REQ_MAX]; /* pool of requests, managed by the
-                                             queue */
-
-    unsigned int num_reqs;
 };
 
 /* Raise an interrupt to signal guest, if necessary */
@@ -82,33 +67,28 @@ static void notify_guest(VirtIOBlockDataPlane *s)
     event_notifier_set(s->guest_notifier);
 }
 
-static void complete_request(struct iocb *iocb, ssize_t ret, void *opaque)
+static void complete_rdwr(void *opaque, int ret)
 {
-    VirtIOBlockDataPlane *s = opaque;
-    VirtIOBlockRequest *req = container_of(iocb, VirtIOBlockRequest, iocb);
+    VirtIOBlockRequest *req = opaque;
     struct virtio_blk_inhdr hdr;
     int len;
 
-    if (likely(ret >= 0)) {
+    if (likely(ret == 0)) {
         hdr.status = VIRTIO_BLK_S_OK;
-        len = ret;
+        len = req->qiov.size;
     } else {
         hdr.status = VIRTIO_BLK_S_IOERR;
         len = 0;
     }
 
-    trace_virtio_blk_data_plane_complete_request(s, req->elem->index, ret);
+    trace_virtio_blk_data_plane_complete_request(req->s, req->elem->index, ret);
 
-    if (req->read_qiov) {
-        assert(req->bounce_iov);
-        qemu_iovec_from_buf(req->read_qiov, 0, req->bounce_iov->iov_base, len);
-        qemu_iovec_destroy(req->read_qiov);
-        g_slice_free(QEMUIOVector, req->read_qiov);
+    if (req->read && req->bounce_iov.iov_base) {
+        qemu_iovec_from_buf(&req->qiov, 0, req->bounce_iov.iov_base, len);
     }
 
-    if (req->bounce_iov) {
-        qemu_vfree(req->bounce_iov->iov_base);
-        g_slice_free(struct iovec, req->bounce_iov);
+    if (req->bounce_iov.iov_base) {
+        qemu_vfree(req->bounce_iov.iov_base);
     }
 
     qemu_iovec_from_buf(req->inhdr, 0, &hdr, sizeof(hdr));
@@ -119,9 +99,9 @@ static void complete_request(struct iocb *iocb, ssize_t ret, void *opaque)
      * written to, but for virtio-blk it seems to be the number of bytes
      * transferred plus the status bytes.
      */
-    vring_push(&s->vring, req->elem, len + sizeof(hdr));
-    req->elem = NULL;
-    s->num_reqs--;
+    vring_push(&req->s->vring, req->elem, len + sizeof(hdr));
+    notify_guest(req->s);
+    g_slice_free(VirtIOBlockRequest, req);
 }
 
 static void complete_request_early(VirtIOBlockDataPlane *s, VirtQueueElement *elem,
@@ -152,51 +132,53 @@ static void do_get_id_cmd(VirtIOBlockDataPlane *s,
     complete_request_early(s, elem, inhdr, VIRTIO_BLK_S_OK);
 }
 
-static int do_rdwr_cmd(VirtIOBlockDataPlane *s, bool read,
-                       struct iovec *iov, unsigned iov_cnt,
-                       long long offset, VirtQueueElement *elem,
-                       QEMUIOVector *inhdr)
+static void do_rdwr_cmd(VirtIOBlockDataPlane *s, bool read,
+                        struct iovec *iov, unsigned iov_cnt,
+                        int64_t sector_num, VirtQueueElement *elem,
+                        QEMUIOVector *inhdr)
 {
-    struct iocb *iocb;
-    QEMUIOVector qiov;
-    struct iovec *bounce_iov = NULL;
-    QEMUIOVector *read_qiov = NULL;
-
-    qemu_iovec_init_external(&qiov, iov, iov_cnt);
-    if (!bdrv_qiov_is_aligned(s->blk->conf.bs, &qiov)) {
-        void *bounce_buffer = qemu_blockalign(s->blk->conf.bs, qiov.size);
-
-        if (read) {
-            /* Need to copy back from bounce buffer on completion */
-            read_qiov = g_slice_new(QEMUIOVector);
-            qemu_iovec_init(read_qiov, iov_cnt);
-            qemu_iovec_concat_iov(read_qiov, iov, iov_cnt, 0, qiov.size);
-        } else {
-            qemu_iovec_to_buf(&qiov, 0, bounce_buffer, qiov.size);
+    VirtIOBlockRequest *req = g_slice_new(VirtIOBlockRequest);
+    QEMUIOVector *qiov;
+    int nb_sectors;
+
+    /* Fill in virtio block metadata needed for completion */
+    memset(req, 0, sizeof(*req));
+    req->s = s;
+    req->elem = elem;
+    req->inhdr = inhdr;
+    req->read = read;
+    qemu_iovec_init_external(&req->qiov, iov, iov_cnt);
+
+    qiov = &req->qiov;
+
+    if (!bdrv_qiov_is_aligned(s->blk->conf.bs, qiov)) {
+        void *bounce_buffer = qemu_blockalign(s->blk->conf.bs, qiov->size);
+
+        /* Populate bounce buffer with data for writes */
+        if (!read) {
+            qemu_iovec_to_buf(qiov, 0, bounce_buffer, qiov->size);
         }
 
         /* Redirect I/O to aligned bounce buffer */
-        bounce_iov = g_slice_new(struct iovec);
-        bounce_iov->iov_base = bounce_buffer;
-        bounce_iov->iov_len = qiov.size;
-        iov = bounce_iov;
-        iov_cnt = 1;
+        req->bounce_iov.iov_base = bounce_buffer;
+        req->bounce_iov.iov_len = qiov->size;
+        qemu_iovec_init_external(&req->bounce_qiov, &req->bounce_iov, 1);
+        qiov = &req->bounce_qiov;
     }
 
-    iocb = ioq_rdwr(&s->ioqueue, read, iov, iov_cnt, offset);
+    nb_sectors = qiov->size / BDRV_SECTOR_SIZE;
 
-    /* Fill in virtio block metadata needed for completion */
-    VirtIOBlockRequest *req = container_of(iocb, VirtIOBlockRequest, iocb);
-    req->elem = elem;
-    req->inhdr = inhdr;
-    req->bounce_iov = bounce_iov;
-    req->read_qiov = read_qiov;
-    return 0;
+    if (read) {
+        bdrv_aio_readv(s->blk->conf.bs, sector_num, qiov, nb_sectors,
+                       complete_rdwr, req);
+    } else {
+        bdrv_aio_writev(s->blk->conf.bs, sector_num, qiov, nb_sectors,
+                        complete_rdwr, req);
+    }
 }
 
-static int process_request(IOQueue *ioq, VirtQueueElement *elem)
+static int process_request(VirtIOBlockDataPlane *s, VirtQueueElement *elem)
 {
-    VirtIOBlockDataPlane *s = container_of(ioq, VirtIOBlockDataPlane, ioqueue);
     struct iovec *iov = elem->out_sg;
     struct iovec *in_iov = elem->in_sg;
     unsigned out_num = elem->out_num;
@@ -231,11 +213,15 @@ static int process_request(IOQueue *ioq, VirtQueueElement *elem)
 
     switch (outhdr.type) {
     case VIRTIO_BLK_T_IN:
-        do_rdwr_cmd(s, true, in_iov, in_num, outhdr.sector * 512, elem, inhdr);
+        do_rdwr_cmd(s, true, in_iov, in_num,
+                    outhdr.sector * 512 / BDRV_SECTOR_SIZE,
+                    elem, inhdr);
         return 0;
 
     case VIRTIO_BLK_T_OUT:
-        do_rdwr_cmd(s, false, iov, out_num, outhdr.sector * 512, elem, inhdr);
+        do_rdwr_cmd(s, false, iov, out_num,
+                    outhdr.sector * 512 / BDRV_SECTOR_SIZE,
+                    elem, inhdr);
         return 0;
 
     case VIRTIO_BLK_T_SCSI_CMD:
@@ -271,7 +257,6 @@ static void handle_notify(EventNotifier *e)
 
     VirtQueueElement *elem;
     int ret;
-    unsigned int num_queued;
 
     event_notifier_test_and_clear(&s->host_notifier);
     for (;;) {
@@ -288,7 +273,7 @@ static void handle_notify(EventNotifier *e)
             trace_virtio_blk_data_plane_process_request(s, elem->out_num,
                                                         elem->in_num, elem->index);
 
-            if (process_request(&s->ioqueue, elem) < 0) {
+            if (process_request(s, elem) < 0) {
                 vring_set_broken(&s->vring);
                 vring_free_element(elem);
                 ret = -EFAULT;
@@ -303,44 +288,10 @@ static void handle_notify(EventNotifier *e)
             if (vring_enable_notification(s->vdev, &s->vring)) {
                 break;
             }
-        } else { /* ret == -ENOBUFS or fatal error, iovecs[] is depleted */
-            /* Since there are no iovecs[] left, stop processing for now.  Do
-             * not re-enable guest->host notifies since the I/O completion
-             * handler knows to check for more vring descriptors anyway.
-             */
+        } else { /* fatal error */
             break;
         }
     }
-
-    num_queued = ioq_num_queued(&s->ioqueue);
-    if (num_queued > 0) {
-        s->num_reqs += num_queued;
-
-        int rc = ioq_submit(&s->ioqueue);
-        if (unlikely(rc < 0)) {
-            fprintf(stderr, "ioq_submit failed %d\n", rc);
-            exit(1);
-        }
-    }
-}
-
-static void handle_io(EventNotifier *e)
-{
-    VirtIOBlockDataPlane *s = container_of(e, VirtIOBlockDataPlane,
-                                           io_notifier);
-
-    event_notifier_test_and_clear(&s->io_notifier);
-    if (ioq_run_completion(&s->ioqueue, complete_request, s) > 0) {
-        notify_guest(s);
-    }
-
-    /* If there were more requests than iovecs, the vring will not be empty yet
-     * so check again.  There should now be enough resources to process more
-     * requests.
-     */
-    if (unlikely(vring_more_avail(&s->vring))) {
-        handle_notify(&s->host_notifier);
-    }
 }
 
 /* Context: QEMU global mutex held */
@@ -431,7 +382,6 @@ void virtio_blk_data_plane_start(VirtIOBlockDataPlane *s)
     BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(s->vdev)));
     VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
     VirtQueue *vq;
-    int i;
 
     if (s->started) {
         return;
@@ -464,24 +414,18 @@ void virtio_blk_data_plane_start(VirtIOBlockDataPlane *s)
     }
     s->host_notifier = *virtio_queue_get_host_notifier(vq);
 
-    /* Set up ioqueue */
-    ioq_init(&s->ioqueue, s->fd, REQ_MAX);
-    for (i = 0; i < ARRAY_SIZE(s->requests); i++) {
-        ioq_put_iocb(&s->ioqueue, &s->requests[i].iocb);
-    }
-    s->io_notifier = *ioq_get_notifier(&s->ioqueue);
-
     s->starting = false;
     s->started = true;
     trace_virtio_blk_data_plane_start(s);
 
+    bdrv_set_aio_context(s->blk->conf.bs, s->ctx);
+
     /* Kick right away to begin processing requests already in vring */
     event_notifier_set(virtio_queue_get_host_notifier(vq));
 
     /* Get this show started by hooking up our callbacks */
     aio_context_acquire(s->ctx);
     aio_set_event_notifier(s->ctx, &s->host_notifier, handle_notify);
-    aio_set_event_notifier(s->ctx, &s->io_notifier, handle_io);
     aio_context_release(s->ctx);
 }
 
@@ -501,13 +445,8 @@ void virtio_blk_data_plane_stop(VirtIOBlockDataPlane *s)
     /* Stop notifications for new requests from guest */
     aio_set_event_notifier(s->ctx, &s->host_notifier, NULL);
 
-    /* Complete pending requests */
-    while (s->num_reqs > 0) {
-        aio_poll(s->ctx, true);
-    }
-
-    /* Stop ioq callbacks (there are no pending requests left) */
-    aio_set_event_notifier(s->ctx, &s->io_notifier, NULL);
+    /* Drain and switch bs back to the QEMU main loop */
+    bdrv_set_aio_context(s->blk->conf.bs, qemu_get_aio_context());
 
     aio_context_release(s->ctx);
 
@@ -516,7 +455,6 @@ void virtio_blk_data_plane_stop(VirtIOBlockDataPlane *s)
      */
     vring_teardown(&s->vring, s->vdev, 0);
 
-    ioq_cleanup(&s->ioqueue);
     k->set_host_notifier(qbus->parent, 0, false);
 
     /* Clean up guest notifier (irq) */
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [Qemu-devel] [PATCH 20/22] dataplane: delete IOQueue since it is no longer used
  2014-05-01 14:54 [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer Stefan Hajnoczi
                   ` (18 preceding siblings ...)
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 19/22] dataplane: use the QEMU block layer for I/O Stefan Hajnoczi
@ 2014-05-01 14:54 ` Stefan Hajnoczi
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 21/22] dataplane: implement async flush Stefan Hajnoczi
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 53+ messages in thread
From: Stefan Hajnoczi @ 2014-05-01 14:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Paolo Bonzini, Shergill, Gurinder, Vinod, Chegu,
	Stefan Hajnoczi

This custom Linux AIO request queue is no longer used by virtio-blk
data-plane.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/block/dataplane/Makefile.objs |   2 +-
 hw/block/dataplane/ioq.c         | 117 ---------------------------------------
 hw/block/dataplane/ioq.h         |  57 -------------------
 3 files changed, 1 insertion(+), 175 deletions(-)
 delete mode 100644 hw/block/dataplane/ioq.c
 delete mode 100644 hw/block/dataplane/ioq.h

diff --git a/hw/block/dataplane/Makefile.objs b/hw/block/dataplane/Makefile.objs
index 9da2eb8..e786f66 100644
--- a/hw/block/dataplane/Makefile.objs
+++ b/hw/block/dataplane/Makefile.objs
@@ -1 +1 @@
-obj-y += ioq.o virtio-blk.o
+obj-y += virtio-blk.o
diff --git a/hw/block/dataplane/ioq.c b/hw/block/dataplane/ioq.c
deleted file mode 100644
index f709f87..0000000
--- a/hw/block/dataplane/ioq.c
+++ /dev/null
@@ -1,117 +0,0 @@
-/*
- * Linux AIO request queue
- *
- * Copyright 2012 IBM, Corp.
- * Copyright 2012 Red Hat, Inc. and/or its affiliates
- *
- * Authors:
- *   Stefan Hajnoczi <stefanha@redhat.com>
- *
- * This work is licensed under the terms of the GNU GPL, version 2 or later.
- * See the COPYING file in the top-level directory.
- *
- */
-
-#include "ioq.h"
-
-void ioq_init(IOQueue *ioq, int fd, unsigned int max_reqs)
-{
-    int rc;
-
-    ioq->fd = fd;
-    ioq->max_reqs = max_reqs;
-
-    memset(&ioq->io_ctx, 0, sizeof ioq->io_ctx);
-    rc = io_setup(max_reqs, &ioq->io_ctx);
-    if (rc != 0) {
-        fprintf(stderr, "ioq io_setup failed %d\n", rc);
-        exit(1);
-    }
-
-    rc = event_notifier_init(&ioq->io_notifier, 0);
-    if (rc != 0) {
-        fprintf(stderr, "ioq io event notifier creation failed %d\n", rc);
-        exit(1);
-    }
-
-    ioq->freelist = g_malloc0(sizeof ioq->freelist[0] * max_reqs);
-    ioq->freelist_idx = 0;
-
-    ioq->queue = g_malloc0(sizeof ioq->queue[0] * max_reqs);
-    ioq->queue_idx = 0;
-}
-
-void ioq_cleanup(IOQueue *ioq)
-{
-    g_free(ioq->freelist);
-    g_free(ioq->queue);
-
-    event_notifier_cleanup(&ioq->io_notifier);
-    io_destroy(ioq->io_ctx);
-}
-
-EventNotifier *ioq_get_notifier(IOQueue *ioq)
-{
-    return &ioq->io_notifier;
-}
-
-struct iocb *ioq_get_iocb(IOQueue *ioq)
-{
-    /* Underflow cannot happen since ioq is sized for max_reqs */
-    assert(ioq->freelist_idx != 0);
-
-    struct iocb *iocb = ioq->freelist[--ioq->freelist_idx];
-    ioq->queue[ioq->queue_idx++] = iocb;
-    return iocb;
-}
-
-void ioq_put_iocb(IOQueue *ioq, struct iocb *iocb)
-{
-    /* Overflow cannot happen since ioq is sized for max_reqs */
-    assert(ioq->freelist_idx != ioq->max_reqs);
-
-    ioq->freelist[ioq->freelist_idx++] = iocb;
-}
-
-struct iocb *ioq_rdwr(IOQueue *ioq, bool read, struct iovec *iov,
-                      unsigned int count, long long offset)
-{
-    struct iocb *iocb = ioq_get_iocb(ioq);
-
-    if (read) {
-        io_prep_preadv(iocb, ioq->fd, iov, count, offset);
-    } else {
-        io_prep_pwritev(iocb, ioq->fd, iov, count, offset);
-    }
-    io_set_eventfd(iocb, event_notifier_get_fd(&ioq->io_notifier));
-    return iocb;
-}
-
-int ioq_submit(IOQueue *ioq)
-{
-    int rc = io_submit(ioq->io_ctx, ioq->queue_idx, ioq->queue);
-    ioq->queue_idx = 0; /* reset */
-    return rc;
-}
-
-int ioq_run_completion(IOQueue *ioq, IOQueueCompletion *completion,
-                       void *opaque)
-{
-    struct io_event events[ioq->max_reqs];
-    int nevents, i;
-
-    do {
-        nevents = io_getevents(ioq->io_ctx, 0, ioq->max_reqs, events, NULL);
-    } while (nevents < 0 && errno == EINTR);
-    if (nevents < 0) {
-        return nevents;
-    }
-
-    for (i = 0; i < nevents; i++) {
-        ssize_t ret = ((uint64_t)events[i].res2 << 32) | events[i].res;
-
-        completion(events[i].obj, ret, opaque);
-        ioq_put_iocb(ioq, events[i].obj);
-    }
-    return nevents;
-}
diff --git a/hw/block/dataplane/ioq.h b/hw/block/dataplane/ioq.h
deleted file mode 100644
index b49b5de..0000000
--- a/hw/block/dataplane/ioq.h
+++ /dev/null
@@ -1,57 +0,0 @@
-/*
- * Linux AIO request queue
- *
- * Copyright 2012 IBM, Corp.
- * Copyright 2012 Red Hat, Inc. and/or its affiliates
- *
- * Authors:
- *   Stefan Hajnoczi <stefanha@redhat.com>
- *
- * This work is licensed under the terms of the GNU GPL, version 2 or later.
- * See the COPYING file in the top-level directory.
- *
- */
-
-#ifndef IOQ_H
-#define IOQ_H
-
-#include <libaio.h>
-#include "qemu/event_notifier.h"
-
-typedef struct {
-    int fd;                         /* file descriptor */
-    unsigned int max_reqs;          /* max length of freelist and queue */
-
-    io_context_t io_ctx;            /* Linux AIO context */
-    EventNotifier io_notifier;      /* Linux AIO eventfd */
-
-    /* Requests can complete in any order so a free list is necessary to manage
-     * available iocbs.
-     */
-    struct iocb **freelist;         /* free iocbs */
-    unsigned int freelist_idx;
-
-    /* Multiple requests are queued up before submitting them all in one go */
-    struct iocb **queue;            /* queued iocbs */
-    unsigned int queue_idx;
-} IOQueue;
-
-void ioq_init(IOQueue *ioq, int fd, unsigned int max_reqs);
-void ioq_cleanup(IOQueue *ioq);
-EventNotifier *ioq_get_notifier(IOQueue *ioq);
-struct iocb *ioq_get_iocb(IOQueue *ioq);
-void ioq_put_iocb(IOQueue *ioq, struct iocb *iocb);
-struct iocb *ioq_rdwr(IOQueue *ioq, bool read, struct iovec *iov,
-                      unsigned int count, long long offset);
-int ioq_submit(IOQueue *ioq);
-
-static inline unsigned int ioq_num_queued(IOQueue *ioq)
-{
-    return ioq->queue_idx;
-}
-
-typedef void IOQueueCompletion(struct iocb *iocb, ssize_t ret, void *opaque);
-int ioq_run_completion(IOQueue *ioq, IOQueueCompletion *completion,
-                       void *opaque);
-
-#endif /* IOQ_H */
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [Qemu-devel] [PATCH 21/22] dataplane: implement async flush
  2014-05-01 14:54 [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer Stefan Hajnoczi
                   ` (19 preceding siblings ...)
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 20/22] dataplane: delete IOQueue since it is no longer used Stefan Hajnoczi
@ 2014-05-01 14:54 ` Stefan Hajnoczi
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 22/22] raw-posix: drop raw_get_aio_fd() since it is no longer used Stefan Hajnoczi
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 53+ messages in thread
From: Stefan Hajnoczi @ 2014-05-01 14:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Paolo Bonzini, Shergill, Gurinder, Vinod, Chegu,
	Stefan Hajnoczi

Stop using the raw-posix file descriptor for synchronous
qemu_fdatasync().  Use bdrv_aio_flush() instead and drop the
VirtIOBlockDataPlane->fd field.
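
For readers skimming the series, the general shape of the completion-callback
pattern this patch adopts is sketched below; FlushOp, flush_cb() and
submit_flush() are illustrative names only, not part of the patch, and the
real data-plane code is in the diff that follows.

/* Minimal sketch of the bdrv_aio_flush() callback pattern (assumes
 * "block/block.h" and glib).  The callback runs in the BlockDriverState's
 * AioContext once the flush completes.
 */
typedef struct {
    BlockDriverState *bs;
    void (*done)(void *opaque, int status);  /* caller-supplied hook */
    void *opaque;
} FlushOp;

static void flush_cb(void *opaque, int ret)
{
    FlushOp *op = opaque;

    op->done(op->opaque, ret);  /* ret is 0 on success, negative errno on error */
    g_free(op);
}

static void submit_flush(FlushOp *op)
{
    /* Returns immediately; flush_cb() is invoked later on completion */
    bdrv_aio_flush(op->bs, flush_cb, op);
}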

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/block/dataplane/virtio-blk.c | 43 ++++++++++++++++++++++++++---------------
 1 file changed, 27 insertions(+), 16 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 0cd74f2..96a9aef 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -40,7 +40,6 @@ struct VirtIOBlockDataPlane {
     bool stopping;
 
     VirtIOBlkConf *blk;
-    int fd;                         /* image file descriptor */
 
     VirtIODevice *vdev;
     Vring vring;                    /* virtqueue vring */
@@ -177,6 +176,32 @@ static void do_rdwr_cmd(VirtIOBlockDataPlane *s, bool read,
     }
 }
 
+static void complete_flush(void *opaque, int ret)
+{
+    VirtIOBlockRequest *req = opaque;
+    unsigned char status;
+
+    if (ret == 0) {
+        status = VIRTIO_BLK_S_OK;
+    } else {
+        status = VIRTIO_BLK_S_IOERR;
+    }
+
+    complete_request_early(req->s, req->elem, req->inhdr, status);
+    g_slice_free(VirtIOBlockRequest, req);
+}
+
+static void do_flush_cmd(VirtIOBlockDataPlane *s, VirtQueueElement *elem,
+                         QEMUIOVector *inhdr)
+{
+    VirtIOBlockRequest *req = g_slice_new(VirtIOBlockRequest);
+    req->s = s;
+    req->elem = elem;
+    req->inhdr = inhdr;
+
+    bdrv_aio_flush(s->blk->conf.bs, complete_flush, req);
+}
+
 static int process_request(VirtIOBlockDataPlane *s, VirtQueueElement *elem)
 {
     struct iovec *iov = elem->out_sg;
@@ -230,12 +255,7 @@ static int process_request(VirtIOBlockDataPlane *s, VirtQueueElement *elem)
         return 0;
 
     case VIRTIO_BLK_T_FLUSH:
-        /* TODO fdsync not supported by Linux AIO, do it synchronously here! */
-        if (qemu_fdatasync(s->fd) < 0) {
-            complete_request_early(s, elem, inhdr, VIRTIO_BLK_S_IOERR);
-        } else {
-            complete_request_early(s, elem, inhdr, VIRTIO_BLK_S_OK);
-        }
+        do_flush_cmd(s, elem, inhdr);
         return 0;
 
     case VIRTIO_BLK_T_GET_ID:
@@ -300,7 +320,6 @@ void virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *blk,
                                   Error **errp)
 {
     VirtIOBlockDataPlane *s;
-    int fd;
 
     *dataplane = NULL;
 
@@ -329,16 +348,8 @@ void virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *blk,
         return;
     }
 
-    fd = raw_get_aio_fd(blk->conf.bs);
-    if (fd < 0) {
-        error_setg(errp, "drive is incompatible with x-data-plane, "
-                         "use format=raw,cache=none,aio=native");
-        return;
-    }
-
     s = g_new0(VirtIOBlockDataPlane, 1);
     s->vdev = vdev;
-    s->fd = fd;
     s->blk = blk;
 
     if (blk->iothread) {
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [Qemu-devel] [PATCH 22/22] raw-posix: drop raw_get_aio_fd() since it is no longer used
  2014-05-01 14:54 [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer Stefan Hajnoczi
                   ` (20 preceding siblings ...)
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 21/22] dataplane: implement async flush Stefan Hajnoczi
@ 2014-05-01 14:54 ` Stefan Hajnoczi
  2014-05-02  7:42 ` [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer Paolo Bonzini
  2014-05-05  9:17 ` Christian Borntraeger
  23 siblings, 0 replies; 53+ messages in thread
From: Stefan Hajnoczi @ 2014-05-01 14:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Paolo Bonzini, Shergill, Gurinder, Vinod, Chegu,
	Stefan Hajnoczi

virtio-blk data-plane now uses the QEMU block layer for I/O.  We do not
need raw_get_aio_fd() anymore.  It was a layering violation anyway, so
let's get rid of it.
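
For reference, code that previously needed the raw fd to hook into the right
event loop can now ask the block layer for the BlockDriverState's AioContext
instead; a minimal sketch, where watch_fd_for_bs() is an illustrative helper
rather than anything added by this series:

/* Sketch only: register an fd handler in the event loop that owns bs,
 * instead of hijacking the underlying file descriptor.  Assumes
 * "block/block.h" and "block/aio.h".
 */
static void watch_fd_for_bs(BlockDriverState *bs, int fd,
                            IOHandler *read_cb, void *opaque)
{
    AioContext *ctx = bdrv_get_aio_context(bs);

    aio_set_fd_handler(ctx, fd, read_cb, NULL, opaque);
}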

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/raw-posix.c     | 34 ----------------------------------
 include/block/block.h |  9 ---------
 2 files changed, 43 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 36366a6..17e5016 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -2301,40 +2301,6 @@ static BlockDriver bdrv_host_cdrom = {
 };
 #endif /* __FreeBSD__ */
 
-#ifdef CONFIG_LINUX_AIO
-/**
- * Return the file descriptor for Linux AIO
- *
- * This function is a layering violation and should be removed when it becomes
- * possible to call the block layer outside the global mutex.  It allows the
- * caller to hijack the file descriptor so I/O can be performed outside the
- * block layer.
- */
-int raw_get_aio_fd(BlockDriverState *bs)
-{
-    BDRVRawState *s;
-
-    if (!bs->drv) {
-        return -ENOMEDIUM;
-    }
-
-    if (bs->drv == bdrv_find_format("raw")) {
-        bs = bs->file;
-    }
-
-    /* raw-posix has several protocols so just check for raw_aio_readv */
-    if (bs->drv->bdrv_aio_readv != raw_aio_readv) {
-        return -ENOTSUP;
-    }
-
-    s = bs->opaque;
-    if (!s->use_aio) {
-        return -ENOTSUP;
-    }
-    return s->fd;
-}
-#endif /* CONFIG_LINUX_AIO */
-
 static void bdrv_file_init(void)
 {
     /*
diff --git a/include/block/block.h b/include/block/block.h
index 5660184..15e88fd 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -448,15 +448,6 @@ void bdrv_unref(BlockDriverState *bs);
 void bdrv_set_in_use(BlockDriverState *bs, int in_use);
 int bdrv_in_use(BlockDriverState *bs);
 
-#ifdef CONFIG_LINUX_AIO
-int raw_get_aio_fd(BlockDriverState *bs);
-#else
-static inline int raw_get_aio_fd(BlockDriverState *bs)
-{
-    return -ENOTSUP;
-}
-#endif
-
 enum BlockAcctType {
     BDRV_ACCT_READ,
     BDRV_ACCT_WRITE,
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [Qemu-devel] [PATCH 17/22] ssh: use BlockDriverState's AioContext
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 17/22] ssh: use BlockDriverState's AioContext Stefan Hajnoczi
@ 2014-05-01 15:03   ` Richard W.M. Jones
  2014-05-01 15:13     ` Stefan Hajnoczi
  0 siblings, 1 reply; 53+ messages in thread
From: Richard W.M. Jones @ 2014-05-01 15:03 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Paolo Bonzini, Shergill, Gurinder, Vinod, Chegu, qemu-devel

On Thu, May 01, 2014 at 04:54:41PM +0200, Stefan Hajnoczi wrote:
> Drop the assumption that we're using the main AioContext.  Use
> bdrv_get_aio_context() to register fd handlers in the right AioContext
> for this BlockDriverState.
> 
> The .bdrv_detach_aio_context() and .bdrv_attach_aio_context() interfaces
> are not needed since no fd handlers, timers, or BHs stay registered when
> requests have been drained.
> 
> For now this doesn't make much difference but will allow ssh to work in
> IOThread instances in the future.
> 
> Cc: Richard W.M. Jones <rjones@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block/ssh.c | 36 +++++++++++++++++++-----------------
>  1 file changed, 19 insertions(+), 17 deletions(-)
> 
> diff --git a/block/ssh.c b/block/ssh.c
> index aa63c9d..3f4a9fb 100644
> --- a/block/ssh.c
> +++ b/block/ssh.c
> @@ -742,7 +742,7 @@ static void restart_coroutine(void *opaque)
>      qemu_coroutine_enter(co, NULL);
>  }
>  
> -static coroutine_fn void set_fd_handler(BDRVSSHState *s)
> +static coroutine_fn void set_fd_handler(BDRVSSHState *s, BlockDriverState *bs)
>  {
>      int r;
>      IOHandler *rd_handler = NULL, *wr_handler = NULL;
> @@ -760,24 +760,26 @@ static coroutine_fn void set_fd_handler(BDRVSSHState *s)
>      DPRINTF("s->sock=%d rd_handler=%p wr_handler=%p", s->sock,
>              rd_handler, wr_handler);
>  
> -    qemu_aio_set_fd_handler(s->sock, rd_handler, wr_handler, co);
> +    aio_set_fd_handler(bdrv_get_aio_context(bs), s->sock,
> +                       rd_handler, wr_handler, co);
>  }
>  
> -static coroutine_fn void clear_fd_handler(BDRVSSHState *s)
> +static coroutine_fn void clear_fd_handler(BDRVSSHState *s,
> +                                          BlockDriverState *bs)
>  {
>      DPRINTF("s->sock=%d", s->sock);
> -    qemu_aio_set_fd_handler(s->sock, NULL, NULL, NULL);
> +    aio_set_fd_handler(bdrv_get_aio_context(bs), s->sock, NULL, NULL, NULL);
>  }
>  
>  /* A non-blocking call returned EAGAIN, so yield, ensuring the
>   * handlers are set up so that we'll be rescheduled when there is an
>   * interesting event on the socket.
>   */
> -static coroutine_fn void co_yield(BDRVSSHState *s)
> +static coroutine_fn void co_yield(BDRVSSHState *s, BlockDriverState *bs)
>  {
> -    set_fd_handler(s);
> +    set_fd_handler(s, bs);
>      qemu_coroutine_yield();
> -    clear_fd_handler(s);
> +    clear_fd_handler(s, bs);
>  }
>  
>  /* SFTP has a function `libssh2_sftp_seek64' which seeks to a position
> @@ -807,7 +809,7 @@ static void ssh_seek(BDRVSSHState *s, int64_t offset, int flags)
>      }
>  }
>  
> -static coroutine_fn int ssh_read(BDRVSSHState *s,
> +static coroutine_fn int ssh_read(BDRVSSHState *s, BlockDriverState *bs,
>                                   int64_t offset, size_t size,
>                                   QEMUIOVector *qiov)
>  {
> @@ -840,7 +842,7 @@ static coroutine_fn int ssh_read(BDRVSSHState *s,
>          DPRINTF("sftp_read returned %zd", r);
>  
>          if (r == LIBSSH2_ERROR_EAGAIN || r == LIBSSH2_ERROR_TIMEOUT) {
> -            co_yield(s);
> +            co_yield(s, bs);
>              goto again;
>          }
>          if (r < 0) {
> @@ -875,14 +877,14 @@ static coroutine_fn int ssh_co_readv(BlockDriverState *bs,
>      int ret;
>  
>      qemu_co_mutex_lock(&s->lock);
> -    ret = ssh_read(s, sector_num * BDRV_SECTOR_SIZE,
> +    ret = ssh_read(s, bs, sector_num * BDRV_SECTOR_SIZE,
>                     nb_sectors * BDRV_SECTOR_SIZE, qiov);
>      qemu_co_mutex_unlock(&s->lock);
>  
>      return ret;
>  }
>  
> -static int ssh_write(BDRVSSHState *s,
> +static int ssh_write(BDRVSSHState *s, BlockDriverState *bs,
>                       int64_t offset, size_t size,
>                       QEMUIOVector *qiov)
>  {
> @@ -910,7 +912,7 @@ static int ssh_write(BDRVSSHState *s,
>          DPRINTF("sftp_write returned %zd", r);
>  
>          if (r == LIBSSH2_ERROR_EAGAIN || r == LIBSSH2_ERROR_TIMEOUT) {
> -            co_yield(s);
> +            co_yield(s, bs);
>              goto again;
>          }
>          if (r < 0) {
> @@ -929,7 +931,7 @@ static int ssh_write(BDRVSSHState *s,
>           */
>          if (r == 0) {
>              ssh_seek(s, offset + written, SSH_SEEK_WRITE|SSH_SEEK_FORCE);
> -            co_yield(s);
> +            co_yield(s, bs);
>              goto again;
>          }
>  
> @@ -957,7 +959,7 @@ static coroutine_fn int ssh_co_writev(BlockDriverState *bs,
>      int ret;
>  
>      qemu_co_mutex_lock(&s->lock);
> -    ret = ssh_write(s, sector_num * BDRV_SECTOR_SIZE,
> +    ret = ssh_write(s, bs, sector_num * BDRV_SECTOR_SIZE,
>                      nb_sectors * BDRV_SECTOR_SIZE, qiov);
>      qemu_co_mutex_unlock(&s->lock);
>  
> @@ -978,7 +980,7 @@ static void unsafe_flush_warning(BDRVSSHState *s, const char *what)
>  
>  #ifdef HAS_LIBSSH2_SFTP_FSYNC
>  
> -static coroutine_fn int ssh_flush(BDRVSSHState *s)
> +static coroutine_fn int ssh_flush(BDRVSSHState *s, BlockDriverState *bs)
>  {
>      int r;
>  
> @@ -986,7 +988,7 @@ static coroutine_fn int ssh_flush(BDRVSSHState *s)
>   again:
>      r = libssh2_sftp_fsync(s->sftp_handle);
>      if (r == LIBSSH2_ERROR_EAGAIN || r == LIBSSH2_ERROR_TIMEOUT) {
> -        co_yield(s);
> +        co_yield(s, bs);
>          goto again;
>      }
>      if (r == LIBSSH2_ERROR_SFTP_PROTOCOL &&
> @@ -1008,7 +1010,7 @@ static coroutine_fn int ssh_co_flush(BlockDriverState *bs)
>      int ret;
>  
>      qemu_co_mutex_lock(&s->lock);
> -    ret = ssh_flush(s);
> +    ret = ssh_flush(s, bs);
>      qemu_co_mutex_unlock(&s->lock);
>  
>      return ret;
> -- 
> 1.9.0

As this appears to be simply about adding a context pointer to several
calls, it looks like a straightforward, mechanical change, so ACK.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Qemu-devel] [PATCH 17/22] ssh: use BlockDriverState's AioContext
  2014-05-01 15:03   ` Richard W.M. Jones
@ 2014-05-01 15:13     ` Stefan Hajnoczi
  0 siblings, 0 replies; 53+ messages in thread
From: Stefan Hajnoczi @ 2014-05-01 15:13 UTC (permalink / raw)
  To: Richard W.M. Jones
  Cc: Kevin Wolf, Shergill, Gurinder, qemu-devel, Stefan Hajnoczi,
	Paolo Bonzini, Vinod, Chegu

On Thu, May 1, 2014 at 5:03 PM, Richard W.M. Jones <rjones@redhat.com> wrote:
> On Thu, May 01, 2014 at 04:54:41PM +0200, Stefan Hajnoczi wrote:
>> Drop the assumption that we're using the main AioContext.  Use
>> bdrv_get_aio_context() to register fd handlers in the right AioContext
>> for this BlockDriverState.
>>
>> The .bdrv_detach_aio_context() and .bdrv_attach_aio_context() interfaces
>> are not needed since no fd handlers, timers, or BHs stay registered when
>> requests have been drained.
>>
>> For now this doesn't make much difference but will allow ssh to work in
>> IOThread instances in the future.
>>
>> Cc: Richard W.M. Jones <rjones@redhat.com>
>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>> ---
>>  block/ssh.c | 36 +++++++++++++++++++-----------------
>>  1 file changed, 19 insertions(+), 17 deletions(-)
>>
>> diff --git a/block/ssh.c b/block/ssh.c
>> index aa63c9d..3f4a9fb 100644
>> --- a/block/ssh.c
>> +++ b/block/ssh.c
>> @@ -742,7 +742,7 @@ static void restart_coroutine(void *opaque)
>>      qemu_coroutine_enter(co, NULL);
>>  }
>>
>> -static coroutine_fn void set_fd_handler(BDRVSSHState *s)
>> +static coroutine_fn void set_fd_handler(BDRVSSHState *s, BlockDriverState *bs)
>>  {
>>      int r;
>>      IOHandler *rd_handler = NULL, *wr_handler = NULL;
>> @@ -760,24 +760,26 @@ static coroutine_fn void set_fd_handler(BDRVSSHState *s)
>>      DPRINTF("s->sock=%d rd_handler=%p wr_handler=%p", s->sock,
>>              rd_handler, wr_handler);
>>
>> -    qemu_aio_set_fd_handler(s->sock, rd_handler, wr_handler, co);
>> +    aio_set_fd_handler(bdrv_get_aio_context(bs), s->sock,
>> +                       rd_handler, wr_handler, co);
>>  }
>>
>> -static coroutine_fn void clear_fd_handler(BDRVSSHState *s)
>> +static coroutine_fn void clear_fd_handler(BDRVSSHState *s,
>> +                                          BlockDriverState *bs)
>>  {
>>      DPRINTF("s->sock=%d", s->sock);
>> -    qemu_aio_set_fd_handler(s->sock, NULL, NULL, NULL);
>> +    aio_set_fd_handler(bdrv_get_aio_context(bs), s->sock, NULL, NULL, NULL);
>>  }
>>
>>  /* A non-blocking call returned EAGAIN, so yield, ensuring the
>>   * handlers are set up so that we'll be rescheduled when there is an
>>   * interesting event on the socket.
>>   */
>> -static coroutine_fn void co_yield(BDRVSSHState *s)
>> +static coroutine_fn void co_yield(BDRVSSHState *s, BlockDriverState *bs)
>>  {
>> -    set_fd_handler(s);
>> +    set_fd_handler(s, bs);
>>      qemu_coroutine_yield();
>> -    clear_fd_handler(s);
>> +    clear_fd_handler(s, bs);
>>  }
>>
>>  /* SFTP has a function `libssh2_sftp_seek64' which seeks to a position
>> @@ -807,7 +809,7 @@ static void ssh_seek(BDRVSSHState *s, int64_t offset, int flags)
>>      }
>>  }
>>
>> -static coroutine_fn int ssh_read(BDRVSSHState *s,
>> +static coroutine_fn int ssh_read(BDRVSSHState *s, BlockDriverState *bs,
>>                                   int64_t offset, size_t size,
>>                                   QEMUIOVector *qiov)
>>  {
>> @@ -840,7 +842,7 @@ static coroutine_fn int ssh_read(BDRVSSHState *s,
>>          DPRINTF("sftp_read returned %zd", r);
>>
>>          if (r == LIBSSH2_ERROR_EAGAIN || r == LIBSSH2_ERROR_TIMEOUT) {
>> -            co_yield(s);
>> +            co_yield(s, bs);
>>              goto again;
>>          }
>>          if (r < 0) {
>> @@ -875,14 +877,14 @@ static coroutine_fn int ssh_co_readv(BlockDriverState *bs,
>>      int ret;
>>
>>      qemu_co_mutex_lock(&s->lock);
>> -    ret = ssh_read(s, sector_num * BDRV_SECTOR_SIZE,
>> +    ret = ssh_read(s, bs, sector_num * BDRV_SECTOR_SIZE,
>>                     nb_sectors * BDRV_SECTOR_SIZE, qiov);
>>      qemu_co_mutex_unlock(&s->lock);
>>
>>      return ret;
>>  }
>>
>> -static int ssh_write(BDRVSSHState *s,
>> +static int ssh_write(BDRVSSHState *s, BlockDriverState *bs,
>>                       int64_t offset, size_t size,
>>                       QEMUIOVector *qiov)
>>  {
>> @@ -910,7 +912,7 @@ static int ssh_write(BDRVSSHState *s,
>>          DPRINTF("sftp_write returned %zd", r);
>>
>>          if (r == LIBSSH2_ERROR_EAGAIN || r == LIBSSH2_ERROR_TIMEOUT) {
>> -            co_yield(s);
>> +            co_yield(s, bs);
>>              goto again;
>>          }
>>          if (r < 0) {
>> @@ -929,7 +931,7 @@ static int ssh_write(BDRVSSHState *s,
>>           */
>>          if (r == 0) {
>>              ssh_seek(s, offset + written, SSH_SEEK_WRITE|SSH_SEEK_FORCE);
>> -            co_yield(s);
>> +            co_yield(s, bs);
>>              goto again;
>>          }
>>
>> @@ -957,7 +959,7 @@ static coroutine_fn int ssh_co_writev(BlockDriverState *bs,
>>      int ret;
>>
>>      qemu_co_mutex_lock(&s->lock);
>> -    ret = ssh_write(s, sector_num * BDRV_SECTOR_SIZE,
>> +    ret = ssh_write(s, bs, sector_num * BDRV_SECTOR_SIZE,
>>                      nb_sectors * BDRV_SECTOR_SIZE, qiov);
>>      qemu_co_mutex_unlock(&s->lock);
>>
>> @@ -978,7 +980,7 @@ static void unsafe_flush_warning(BDRVSSHState *s, const char *what)
>>
>>  #ifdef HAS_LIBSSH2_SFTP_FSYNC
>>
>> -static coroutine_fn int ssh_flush(BDRVSSHState *s)
>> +static coroutine_fn int ssh_flush(BDRVSSHState *s, BlockDriverState *bs)
>>  {
>>      int r;
>>
>> @@ -986,7 +988,7 @@ static coroutine_fn int ssh_flush(BDRVSSHState *s)
>>   again:
>>      r = libssh2_sftp_fsync(s->sftp_handle);
>>      if (r == LIBSSH2_ERROR_EAGAIN || r == LIBSSH2_ERROR_TIMEOUT) {
>> -        co_yield(s);
>> +        co_yield(s, bs);
>>          goto again;
>>      }
>>      if (r == LIBSSH2_ERROR_SFTP_PROTOCOL &&
>> @@ -1008,7 +1010,7 @@ static coroutine_fn int ssh_co_flush(BlockDriverState *bs)
>>      int ret;
>>
>>      qemu_co_mutex_lock(&s->lock);
>> -    ret = ssh_flush(s);
>> +    ret = ssh_flush(s, bs);
>>      qemu_co_mutex_unlock(&s->lock);
>>
>>      return ret;
>> --
>> 1.9.0
>
> As this appears to simply be about adding a context pointer to several
> calls, it seems to be a simple, mechanical change, so ACK.

Yes.  The cover letter of this series explains the reasons for the changes
and what to look out for, if you want to know more.

Stefan

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Qemu-devel] [PATCH 08/22] iscsi: implement .bdrv_detach/attach_aio_context()
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 08/22] iscsi: implement .bdrv_detach/attach_aio_context() Stefan Hajnoczi
@ 2014-05-01 22:39   ` Peter Lieven
  2014-05-07 10:07     ` Stefan Hajnoczi
  0 siblings, 1 reply; 53+ messages in thread
From: Peter Lieven @ 2014-05-01 22:39 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Shergill, Gurinder, qemu-devel, Ronnie Sahlberg,
	Paolo Bonzini, Vinod, Chegu


On 01.05.2014 at 16:54, Stefan Hajnoczi <stefanha@redhat.com> wrote:

> Drop the assumption that we're using the main AioContext for Linux
> AIO.  Convert qemu_aio_set_fd_handler() to aio_set_fd_handler() and
> timer_new_ms() to aio_timer_new().
> 
> The .bdrv_detach/attach_aio_context() interfaces also need to be
> implemented to move the fd and timer from the old to the new AioContext.
> 
> Cc: Peter Lieven <pl@kamp.de>
> Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> block/iscsi.c | 79 +++++++++++++++++++++++++++++++++++++++++------------------
> 1 file changed, 55 insertions(+), 24 deletions(-)
> 
> diff --git a/block/iscsi.c b/block/iscsi.c
> index a30202b..81e3ebd 100644
> --- a/block/iscsi.c
> +++ b/block/iscsi.c
> @@ -47,6 +47,7 @@
> 
> typedef struct IscsiLun {
>     struct iscsi_context *iscsi;
> +    AioContext *aio_context;
>     int lun;
>     enum scsi_inquiry_peripheral_device_type type;
>     int block_size;
> @@ -69,6 +70,7 @@ typedef struct IscsiTask {
>     struct scsi_task *task;
>     Coroutine *co;
>     QEMUBH *bh;
> +    AioContext *aio_context;
> } IscsiTask;
> 
> typedef struct IscsiAIOCB {
> @@ -120,7 +122,7 @@ iscsi_schedule_bh(IscsiAIOCB *acb)
>     if (acb->bh) {
>         return;
>     }
> -    acb->bh = qemu_bh_new(iscsi_bh_cb, acb);
> +    acb->bh = aio_bh_new(acb->iscsilun->aio_context, iscsi_bh_cb, acb);
>     qemu_bh_schedule(acb->bh);
> }
> 
> @@ -156,7 +158,7 @@ iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
> 
> out:
>     if (iTask->co) {
> -        iTask->bh = qemu_bh_new(iscsi_co_generic_bh_cb, iTask);
> +        iTask->bh = aio_bh_new(iTask->aio_context, iscsi_co_generic_bh_cb, iTask);
>         qemu_bh_schedule(iTask->bh);
>     }
> }
> @@ -164,8 +166,9 @@ out:
> static void iscsi_co_init_iscsitask(IscsiLun *iscsilun, struct IscsiTask *iTask)
> {
>     *iTask = (struct IscsiTask) {
> -        .co         = qemu_coroutine_self(),
> -        .retries    = ISCSI_CMD_RETRIES,
> +        .co             = qemu_coroutine_self(),
> +        .retries        = ISCSI_CMD_RETRIES,
> +        .aio_context    = iscsilun->aio_context,
>     };
> }
> 
> @@ -196,7 +199,7 @@ iscsi_aio_cancel(BlockDriverAIOCB *blockacb)
>                                      iscsi_abort_task_cb, acb);
> 
>     while (acb->status == -EINPROGRESS) {
> -        qemu_aio_wait();
> +        aio_poll(bdrv_get_aio_context(blockacb->bs), true);
>     }
> }
> 
> @@ -219,10 +222,11 @@ iscsi_set_events(IscsiLun *iscsilun)
>     ev = POLLIN;
>     ev |= iscsi_which_events(iscsi);
>     if (ev != iscsilun->events) {
> -        qemu_aio_set_fd_handler(iscsi_get_fd(iscsi),
> -                      iscsi_process_read,
> -                      (ev & POLLOUT) ? iscsi_process_write : NULL,
> -                      iscsilun);
> +        aio_set_fd_handler(iscsilun->aio_context,
> +                           iscsi_get_fd(iscsi),
> +                           iscsi_process_read,
> +                           (ev & POLLOUT) ? iscsi_process_write : NULL,
> +                           iscsilun);
> 
>     }
> 
> @@ -620,7 +624,7 @@ static int iscsi_ioctl(BlockDriverState *bs, unsigned long int req, void *buf)
>         iscsi_aio_ioctl(bs, req, buf, ioctl_cb, &status);
> 
>         while (status == -EINPROGRESS) {
> -            qemu_aio_wait();
> +            aio_poll(bdrv_get_aio_context(bs), true);
>         }
> 
>         return 0;
> @@ -1110,6 +1114,40 @@ fail_with_err:
>     return NULL;
> }
> 
> +static void iscsi_detach_aio_context(BlockDriverState *bs)
> +{
> +    IscsiLun *iscsilun = bs->opaque;
> +
> +    aio_set_fd_handler(iscsilun->aio_context,
> +                       iscsi_get_fd(iscsilun->iscsi),
> +                       NULL, NULL, NULL);
> +    iscsilun->events = 0;
> +
> +    if (iscsilun->nop_timer) {
> +        timer_del(iscsilun->nop_timer);
> +        timer_free(iscsilun->nop_timer);
> +        iscsilun->nop_timer = NULL;
> +    }
> +}
> +
> +static void iscsi_attach_aio_context(BlockDriverState *bs,
> +                                     AioContext *new_context)
> +{
> +    IscsiLun *iscsilun = bs->opaque;
> +
> +    iscsilun->aio_context = new_context;
> +    iscsi_set_events(iscsilun);
> +
> +#if defined(LIBISCSI_FEATURE_NOP_COUNTER)
> +    /* Set up a timer for sending out iSCSI NOPs */
> +    iscsilun->nop_timer = aio_timer_new(iscsilun->aio_context,
> +                                        QEMU_CLOCK_REALTIME, SCALE_MS,
> +                                        iscsi_nop_timed_event, iscsilun);
> +    timer_mod(iscsilun->nop_timer,
> +              qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + NOP_INTERVAL);
> +#endif
> +}

Is it still guaranteed that iscsi_nop_timed_event for a target is not invoked
while we are in another function/callback of the iscsi driver for the same target?

Peter


> +
> /*
>  * We support iscsi url's on the form
>  * iscsi://[<username>%<password>@]<host>[:<port>]/<targetname>/<lun>
> @@ -1216,6 +1254,7 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
>     }
> 
>     iscsilun->iscsi = iscsi;
> +    iscsilun->aio_context = bdrv_get_aio_context(bs);
>     iscsilun->lun   = iscsi_url->lun;
>     iscsilun->has_write_same = true;
> 
> @@ -1289,11 +1328,7 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
>     scsi_free_scsi_task(task);
>     task = NULL;
> 
> -#if defined(LIBISCSI_FEATURE_NOP_COUNTER)
> -    /* Set up a timer for sending out iSCSI NOPs */
> -    iscsilun->nop_timer = timer_new_ms(QEMU_CLOCK_REALTIME, iscsi_nop_timed_event, iscsilun);
> -    timer_mod(iscsilun->nop_timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + NOP_INTERVAL);
> -#endif
> +    iscsi_attach_aio_context(bs, iscsilun->aio_context);
> 
> out:
>     qemu_opts_del(opts);
> @@ -1321,11 +1356,7 @@ static void iscsi_close(BlockDriverState *bs)
>     IscsiLun *iscsilun = bs->opaque;
>     struct iscsi_context *iscsi = iscsilun->iscsi;
> 
> -    if (iscsilun->nop_timer) {
> -        timer_del(iscsilun->nop_timer);
> -        timer_free(iscsilun->nop_timer);
> -    }
> -    qemu_aio_set_fd_handler(iscsi_get_fd(iscsi), NULL, NULL, NULL);
> +    iscsi_detach_aio_context(bs);
>     iscsi_destroy_context(iscsi);
>     g_free(iscsilun->zeroblock);
>     memset(iscsilun, 0, sizeof(IscsiLun));
> @@ -1421,10 +1452,7 @@ static int iscsi_create(const char *filename, QEMUOptionParameter *options,
>     if (ret != 0) {
>         goto out;
>     }
> -    if (iscsilun->nop_timer) {
> -        timer_del(iscsilun->nop_timer);
> -        timer_free(iscsilun->nop_timer);
> -    }
> +    iscsi_detach_aio_context(bs);
>     if (iscsilun->type != TYPE_DISK) {
>         ret = -ENODEV;
>         goto out;
> @@ -1501,6 +1529,9 @@ static BlockDriver bdrv_iscsi = {
>     .bdrv_ioctl       = iscsi_ioctl,
>     .bdrv_aio_ioctl   = iscsi_aio_ioctl,
> #endif
> +
> +    .bdrv_detach_aio_context = iscsi_detach_aio_context,
> +    .bdrv_attach_aio_context = iscsi_attach_aio_context,
> };
> 
> static QemuOptsList qemu_iscsi_opts = {
> -- 
> 1.9.0
> 

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Qemu-devel] [PATCH 13/22] block/raw-posix: implement .bdrv_detach/attach_aio_context()
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 13/22] block/raw-posix: " Stefan Hajnoczi
@ 2014-05-02  7:39   ` Paolo Bonzini
  2014-05-02 11:45     ` Stefan Hajnoczi
  0 siblings, 1 reply; 53+ messages in thread
From: Paolo Bonzini @ 2014-05-02  7:39 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel; +Cc: Kevin Wolf, Shergill, Gurinder, Vinod, Chegu

On 01/05/2014 16:54, Stefan Hajnoczi wrote:
> Drop the assumption that we're using the main AioContext for Linux AIO.
> Convert the Linux AIO event notifier to use aio_set_event_notifier().
>
> The .bdrv_detach/attach_aio_context() interfaces also need to be
> implemented to move the event notifier handler from the old to the new
> AioContext.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Is the same needed for raw-win32?

Paolo

> ---
>  block/linux-aio.c | 16 ++++++++++++++--
>  block/raw-aio.h   |  2 ++
>  block/raw-posix.c | 43 +++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 59 insertions(+), 2 deletions(-)
>
> diff --git a/block/linux-aio.c b/block/linux-aio.c
> index 53434e2..7ff3897 100644
> --- a/block/linux-aio.c
> +++ b/block/linux-aio.c
> @@ -177,6 +177,20 @@ out_free_aiocb:
>      return NULL;
>  }
>
> +void laio_detach_aio_context(void *s_, AioContext *old_context)
> +{
> +    struct qemu_laio_state *s = s_;
> +
> +    aio_set_event_notifier(old_context, &s->e, NULL);
> +}
> +
> +void laio_attach_aio_context(void *s_, AioContext *new_context)
> +{
> +    struct qemu_laio_state *s = s_;
> +
> +    aio_set_event_notifier(new_context, &s->e, qemu_laio_completion_cb);
> +}
> +
>  void *laio_init(void)
>  {
>      struct qemu_laio_state *s;
> @@ -190,8 +204,6 @@ void *laio_init(void)
>          goto out_close_efd;
>      }
>
> -    qemu_aio_set_event_notifier(&s->e, qemu_laio_completion_cb);
> -
>      return s;
>
>  out_close_efd:
> diff --git a/block/raw-aio.h b/block/raw-aio.h
> index 7ad0a8a..9a761ee 100644
> --- a/block/raw-aio.h
> +++ b/block/raw-aio.h
> @@ -37,6 +37,8 @@ void *laio_init(void);
>  BlockDriverAIOCB *laio_submit(BlockDriverState *bs, void *aio_ctx, int fd,
>          int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
>          BlockDriverCompletionFunc *cb, void *opaque, int type);
> +void laio_detach_aio_context(void *s, AioContext *old_context);
> +void laio_attach_aio_context(void *s, AioContext *new_context);
>  #endif
>
>  #ifdef _WIN32
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index 1688e16..9fef157 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -304,6 +304,29 @@ static void raw_parse_flags(int bdrv_flags, int *open_flags)
>      }
>  }
>
> +static void raw_detach_aio_context(BlockDriverState *bs)
> +{
> +#ifdef CONFIG_LINUX_AIO
> +    BDRVRawState *s = bs->opaque;
> +
> +    if (s->use_aio) {
> +        laio_detach_aio_context(s->aio_ctx, bdrv_get_aio_context(bs));
> +    }
> +#endif
> +}
> +
> +static void raw_attach_aio_context(BlockDriverState *bs,
> +                                   AioContext *new_context)
> +{
> +#ifdef CONFIG_LINUX_AIO
> +    BDRVRawState *s = bs->opaque;
> +
> +    if (s->use_aio) {
> +        laio_attach_aio_context(s->aio_ctx, new_context);
> +    }
> +#endif
> +}
> +
>  #ifdef CONFIG_LINUX_AIO
>  static int raw_set_aio(void **aio_ctx, int *use_aio, int bdrv_flags)
>  {
> @@ -444,6 +467,8 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
>      }
>  #endif
>
> +    raw_attach_aio_context(bs, bdrv_get_aio_context(bs));
> +
>      ret = 0;
>  fail:
>      qemu_opts_del(opts);
> @@ -1053,6 +1078,9 @@ static BlockDriverAIOCB *raw_aio_flush(BlockDriverState *bs,
>  static void raw_close(BlockDriverState *bs)
>  {
>      BDRVRawState *s = bs->opaque;
> +
> +    raw_detach_aio_context(bs);
> +
>      if (s->fd >= 0) {
>          qemu_close(s->fd);
>          s->fd = -1;
> @@ -1448,6 +1476,9 @@ static BlockDriver bdrv_file = {
>      .bdrv_get_allocated_file_size
>                          = raw_get_allocated_file_size,
>
> +    .bdrv_detach_aio_context = raw_detach_aio_context,
> +    .bdrv_attach_aio_context = raw_attach_aio_context,
> +
>      .create_options = raw_create_options,
>  };
>
> @@ -1848,6 +1879,9 @@ static BlockDriver bdrv_host_device = {
>      .bdrv_get_allocated_file_size
>                          = raw_get_allocated_file_size,
>
> +    .bdrv_detach_aio_context = raw_detach_aio_context,
> +    .bdrv_attach_aio_context = raw_attach_aio_context,
> +
>      /* generic scsi device */
>  #ifdef __linux__
>      .bdrv_ioctl         = hdev_ioctl,
> @@ -1990,6 +2024,9 @@ static BlockDriver bdrv_host_floppy = {
>      .bdrv_get_allocated_file_size
>                          = raw_get_allocated_file_size,
>
> +    .bdrv_detach_aio_context = raw_detach_aio_context,
> +    .bdrv_attach_aio_context = raw_attach_aio_context,
> +
>      /* removable device support */
>      .bdrv_is_inserted   = floppy_is_inserted,
>      .bdrv_media_changed = floppy_media_changed,
> @@ -2115,6 +2152,9 @@ static BlockDriver bdrv_host_cdrom = {
>      .bdrv_get_allocated_file_size
>                          = raw_get_allocated_file_size,
>
> +    .bdrv_detach_aio_context = raw_detach_aio_context,
> +    .bdrv_attach_aio_context = raw_attach_aio_context,
> +
>      /* removable device support */
>      .bdrv_is_inserted   = cdrom_is_inserted,
>      .bdrv_eject         = cdrom_eject,
> @@ -2246,6 +2286,9 @@ static BlockDriver bdrv_host_cdrom = {
>      .bdrv_get_allocated_file_size
>                          = raw_get_allocated_file_size,
>
> +    .bdrv_detach_aio_context = raw_detach_aio_context,
> +    .bdrv_attach_aio_context = raw_attach_aio_context,
> +
>      /* removable device support */
>      .bdrv_is_inserted   = cdrom_is_inserted,
>      .bdrv_eject         = cdrom_eject,
>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Qemu-devel] [PATCH 09/22] nbd: implement .bdrv_detach/attach_aio_context()
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 09/22] nbd: " Stefan Hajnoczi
@ 2014-05-02  7:40   ` Paolo Bonzini
  0 siblings, 0 replies; 53+ messages in thread
From: Paolo Bonzini @ 2014-05-02  7:40 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel; +Cc: Kevin Wolf, Shergill, Gurinder, Vinod, Chegu

On 01/05/2014 16:54, Stefan Hajnoczi wrote:
> Drop the assumption that we're using the main AioContext.  Convert
> qemu_aio_set_fd_handler() calls to aio_set_fd_handler().
>
> The .bdrv_detach/attach_aio_context() interfaces also need to be
> implemented to move the socket fd handler from the old to the new
> AioContext.
>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block/nbd-client.c | 24 ++++++++++++---
>  block/nbd-client.h |  4 +++
>  block/nbd.c        | 87 +++++++++++++++++++++++++++++++++---------------------
>  3 files changed, 78 insertions(+), 37 deletions(-)
>
> diff --git a/block/nbd-client.c b/block/nbd-client.c
> index 7d698cb..6e1c97c 100644
> --- a/block/nbd-client.c
> +++ b/block/nbd-client.c
> @@ -49,7 +49,7 @@ static void nbd_teardown_connection(NbdClientSession *client)
>      shutdown(client->sock, 2);
>      nbd_recv_coroutines_enter_all(client);
>
> -    qemu_aio_set_fd_handler(client->sock, NULL, NULL, NULL);
> +    nbd_client_session_detach_aio_context(client);
>      closesocket(client->sock);
>      client->sock = -1;
>  }
> @@ -103,11 +103,14 @@ static int nbd_co_send_request(NbdClientSession *s,
>      struct nbd_request *request,
>      QEMUIOVector *qiov, int offset)
>  {
> +    AioContext *aio_context;
>      int rc, ret;
>
>      qemu_co_mutex_lock(&s->send_mutex);
>      s->send_coroutine = qemu_coroutine_self();
> -    qemu_aio_set_fd_handler(s->sock, nbd_reply_ready, nbd_restart_write, s);
> +    aio_context = bdrv_get_aio_context(s->bs);
> +    aio_set_fd_handler(aio_context, s->sock,
> +                       nbd_reply_ready, nbd_restart_write, s);
>      if (qiov) {
>          if (!s->is_unix) {
>              socket_set_cork(s->sock, 1);
> @@ -126,7 +129,7 @@ static int nbd_co_send_request(NbdClientSession *s,
>      } else {
>          rc = nbd_send_request(s->sock, request);
>      }
> -    qemu_aio_set_fd_handler(s->sock, nbd_reply_ready, NULL, s);
> +    aio_set_fd_handler(aio_context, s->sock, nbd_reply_ready, NULL, s);
>      s->send_coroutine = NULL;
>      qemu_co_mutex_unlock(&s->send_mutex);
>      return rc;
> @@ -335,6 +338,19 @@ int nbd_client_session_co_discard(NbdClientSession *client, int64_t sector_num,
>
>  }
>
> +void nbd_client_session_detach_aio_context(NbdClientSession *client)
> +{
> +    aio_set_fd_handler(bdrv_get_aio_context(client->bs), client->sock,
> +                       NULL, NULL, NULL);
> +}
> +
> +void nbd_client_session_attach_aio_context(NbdClientSession *client,
> +                                           AioContext *new_context)
> +{
> +    aio_set_fd_handler(new_context, client->sock,
> +                       nbd_reply_ready, NULL, client);
> +}
> +
>  void nbd_client_session_close(NbdClientSession *client)
>  {
>      struct nbd_request request = {
> @@ -381,7 +397,7 @@ int nbd_client_session_init(NbdClientSession *client, BlockDriverState *bs,
>      /* Now that we're connected, set the socket to be non-blocking and
>       * kick the reply mechanism.  */
>      qemu_set_nonblock(sock);
> -    qemu_aio_set_fd_handler(sock, nbd_reply_ready, NULL, client);
> +    nbd_client_session_attach_aio_context(client, bdrv_get_aio_context(bs));
>
>      logout("Established connection with NBD server\n");
>      return 0;
> diff --git a/block/nbd-client.h b/block/nbd-client.h
> index f2a6337..cd478f3 100644
> --- a/block/nbd-client.h
> +++ b/block/nbd-client.h
> @@ -47,4 +47,8 @@ int nbd_client_session_co_writev(NbdClientSession *client, int64_t sector_num,
>  int nbd_client_session_co_readv(NbdClientSession *client, int64_t sector_num,
>                                  int nb_sectors, QEMUIOVector *qiov);
>
> +void nbd_client_session_detach_aio_context(NbdClientSession *client);
> +void nbd_client_session_attach_aio_context(NbdClientSession *client,
> +                                           AioContext *new_context);
> +
>  #endif /* NBD_CLIENT_H */
> diff --git a/block/nbd.c b/block/nbd.c
> index 613f258..4eda095 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -323,46 +323,67 @@ static int64_t nbd_getlength(BlockDriverState *bs)
>      return s->client.size;
>  }
>
> +static void nbd_detach_aio_context(BlockDriverState *bs)
> +{
> +    BDRVNBDState *s = bs->opaque;
> +
> +    nbd_client_session_detach_aio_context(&s->client);
> +}
> +
> +static void nbd_attach_aio_context(BlockDriverState *bs,
> +                                   AioContext *new_context)
> +{
> +    BDRVNBDState *s = bs->opaque;
> +
> +    nbd_client_session_attach_aio_context(&s->client, new_context);
> +}
> +
>  static BlockDriver bdrv_nbd = {
> -    .format_name         = "nbd",
> -    .protocol_name       = "nbd",
> -    .instance_size       = sizeof(BDRVNBDState),
> -    .bdrv_parse_filename = nbd_parse_filename,
> -    .bdrv_file_open      = nbd_open,
> -    .bdrv_co_readv       = nbd_co_readv,
> -    .bdrv_co_writev      = nbd_co_writev,
> -    .bdrv_close          = nbd_close,
> -    .bdrv_co_flush_to_os = nbd_co_flush,
> -    .bdrv_co_discard     = nbd_co_discard,
> -    .bdrv_getlength      = nbd_getlength,
> +    .format_name                = "nbd",
> +    .protocol_name              = "nbd",
> +    .instance_size              = sizeof(BDRVNBDState),
> +    .bdrv_parse_filename        = nbd_parse_filename,
> +    .bdrv_file_open             = nbd_open,
> +    .bdrv_co_readv              = nbd_co_readv,
> +    .bdrv_co_writev             = nbd_co_writev,
> +    .bdrv_close                 = nbd_close,
> +    .bdrv_co_flush_to_os        = nbd_co_flush,
> +    .bdrv_co_discard            = nbd_co_discard,
> +    .bdrv_getlength             = nbd_getlength,
> +    .bdrv_detach_aio_context    = nbd_detach_aio_context,
> +    .bdrv_attach_aio_context    = nbd_attach_aio_context,
>  };
>
>  static BlockDriver bdrv_nbd_tcp = {
> -    .format_name         = "nbd",
> -    .protocol_name       = "nbd+tcp",
> -    .instance_size       = sizeof(BDRVNBDState),
> -    .bdrv_parse_filename = nbd_parse_filename,
> -    .bdrv_file_open      = nbd_open,
> -    .bdrv_co_readv       = nbd_co_readv,
> -    .bdrv_co_writev      = nbd_co_writev,
> -    .bdrv_close          = nbd_close,
> -    .bdrv_co_flush_to_os = nbd_co_flush,
> -    .bdrv_co_discard     = nbd_co_discard,
> -    .bdrv_getlength      = nbd_getlength,
> +    .format_name                = "nbd",
> +    .protocol_name              = "nbd+tcp",
> +    .instance_size              = sizeof(BDRVNBDState),
> +    .bdrv_parse_filename        = nbd_parse_filename,
> +    .bdrv_file_open             = nbd_open,
> +    .bdrv_co_readv              = nbd_co_readv,
> +    .bdrv_co_writev             = nbd_co_writev,
> +    .bdrv_close                 = nbd_close,
> +    .bdrv_co_flush_to_os        = nbd_co_flush,
> +    .bdrv_co_discard            = nbd_co_discard,
> +    .bdrv_getlength             = nbd_getlength,
> +    .bdrv_detach_aio_context    = nbd_detach_aio_context,
> +    .bdrv_attach_aio_context    = nbd_attach_aio_context,
>  };
>
>  static BlockDriver bdrv_nbd_unix = {
> -    .format_name         = "nbd",
> -    .protocol_name       = "nbd+unix",
> -    .instance_size       = sizeof(BDRVNBDState),
> -    .bdrv_parse_filename = nbd_parse_filename,
> -    .bdrv_file_open      = nbd_open,
> -    .bdrv_co_readv       = nbd_co_readv,
> -    .bdrv_co_writev      = nbd_co_writev,
> -    .bdrv_close          = nbd_close,
> -    .bdrv_co_flush_to_os = nbd_co_flush,
> -    .bdrv_co_discard     = nbd_co_discard,
> -    .bdrv_getlength      = nbd_getlength,
> +    .format_name                = "nbd",
> +    .protocol_name              = "nbd+unix",
> +    .instance_size              = sizeof(BDRVNBDState),
> +    .bdrv_parse_filename        = nbd_parse_filename,
> +    .bdrv_file_open             = nbd_open,
> +    .bdrv_co_readv              = nbd_co_readv,
> +    .bdrv_co_writev             = nbd_co_writev,
> +    .bdrv_close                 = nbd_close,
> +    .bdrv_co_flush_to_os        = nbd_co_flush,
> +    .bdrv_co_discard            = nbd_co_discard,
> +    .bdrv_getlength             = nbd_getlength,
> +    .bdrv_detach_aio_context    = nbd_detach_aio_context,
> +    .bdrv_attach_aio_context    = nbd_attach_aio_context,
>  };
>
>  static void bdrv_nbd_init(void)
>

Acked-by: Paolo Bonzini <pbonzini@redhat.com>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer
  2014-05-01 14:54 [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer Stefan Hajnoczi
                   ` (21 preceding siblings ...)
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 22/22] raw-posix: drop raw_get_aio_fd() since it is no longer used Stefan Hajnoczi
@ 2014-05-02  7:42 ` Paolo Bonzini
  2014-05-02 11:59   ` Stefan Hajnoczi
  2014-05-05  9:17 ` Christian Borntraeger
  23 siblings, 1 reply; 53+ messages in thread
From: Paolo Bonzini @ 2014-05-02  7:42 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel; +Cc: Kevin Wolf, Shergill, Gurinder, Vinod, Chegu

Il 01/05/2014 16:54, Stefan Hajnoczi ha scritto:
> This patch series switches virtio-blk data-plane from a custom Linux AIO
> request queue to the QEMU block layer.  The previous "raw files only"
> limitation is lifted.  All image formats and protocols can now be used with
> virtio-blk data-plane.

Yay!

> I have already made block I/O throttling work in another AioContext and will
> send the series out next week.
>
> In order to keep this series reviewable, I'm holding back those patches for
> now.  One could say, "throttling" them.

What's also missing is things like block jobs and live snapshots, right?

Also, blockstats is not thread-safe, and probably a couple of other things,
but that's minor and easy to fix.

Paolo


* Re: [Qemu-devel] [PATCH 13/22] block/raw-posix: implement .bdrv_detach/attach_aio_context()
  2014-05-02  7:39   ` Paolo Bonzini
@ 2014-05-02 11:45     ` Stefan Hajnoczi
  0 siblings, 0 replies; 53+ messages in thread
From: Stefan Hajnoczi @ 2014-05-02 11:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Shergill, Gurinder, Vinod, Chegu, qemu-devel,
	Stefan Hajnoczi

On Fri, May 2, 2014 at 9:39 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 01/05/2014 16:54, Stefan Hajnoczi ha scritto:
>
>> Drop the assumption that we're using the main AioContext for Linux AIO.
>> Convert the Linux AIO event notifier to use aio_set_event_notifier().
>>
>> The .bdrv_detach/attach_aio_context() interfaces also need to be
>> implemented to move the event notifier handler from the old to the new
>> AioContext.
>>
>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>
>
> Is the same needed for raw-win32?

You are right, I will add raw-win32 support in v2.

Stefan


* Re: [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer
  2014-05-02  7:42 ` [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer Paolo Bonzini
@ 2014-05-02 11:59   ` Stefan Hajnoczi
  0 siblings, 0 replies; 53+ messages in thread
From: Stefan Hajnoczi @ 2014-05-02 11:59 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Shergill, Gurinder, Vinod, Chegu, qemu-devel,
	Stefan Hajnoczi

On Fri, May 2, 2014 at 9:42 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 01/05/2014 16:54, Stefan Hajnoczi ha scritto:
>
>> This patch series switches virtio-blk data-plane from a custom Linux AIO
>> request queue to the QEMU block layer.  The previous "raw files only"
>> limitation is lifted.  All image formats and protocols can now be used
>> with
>> virtio-blk data-plane.
>
>
> Yay!
>
>
>> I have already made block I/O throttling work in another AioContext and
>> will
>> send the series out next week.
>>
>> In order to keep this series reviewable, I'm holding back those patches
>> for
>> now.  One could say, "throttling" them.
>
>
> What's also missing is things like block jobs and live snapshots, right?
>
> Also, blockstats is not thread-safe and probably a couple other things but
> that's minor and easy to fix.

Yes, my plan for QEMU 2.1 is to handle the monitor commands that
aren't protected by bdrv_in_use().

The list of limitations is still fairly long:
1. No I/O throttling
2. Snapshot, blockstats, and other commands are not supported
3. No block jobs
4. No run-time NBD server
5. No hot unplug
6. virtio-blk scsi=on and config-wce=on are not supported

#3, #4, #5, and some of #2 are protected by the bdrv_in_use()
mechanism.  They continue to return an error after this patch series
just like they do today.

#1 is my first priority because it isn't protected by anything.  For
#2 I'm auditing monitor commands and protecting things on a
case-by-case basis.

Once everything up to and including #5 has been finished we can stop
using bdrv_in_use() in such a crude way.  Things will be protected by
aio_context_acquire/release() or other methods (I still have the
"switch this coroutine into that AioContext" up my sleeve for block
jobs and other hard cases).
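
For illustration, a minimal sketch of what that acquire/release protection
could look like for a monitor command; monitor_do_something() is a made-up
name and error handling is omitted, so treat this as the pattern rather than
code from this series:

    static void monitor_do_something(BlockDriverState *bs)
    {
        AioContext *ctx = bdrv_get_aio_context(bs);

        aio_context_acquire(ctx);   /* serialize with the IOThread's event loop */
        /* ... safely call bdrv_*() functions on bs here ... */
        aio_context_release(ctx);
    }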

Finally, #6 is where we have to answer the question "can virtio-blk
dataplane simply share the normal virtio-blk.c code?".  I want to
unify the dataplane and non-dataplane virtio-blk emulation so there is
only one code path that either runs in vcpu/mainloop or dataplane
IOThread.

Stefan


* Re: [Qemu-devel] [PATCH 18/22] vmdk: implement .bdrv_detach/attach_aio_context()
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 18/22] vmdk: implement .bdrv_detach/attach_aio_context() Stefan Hajnoczi
@ 2014-05-04  9:50   ` Fam Zheng
  2014-05-04 10:17   ` Fam Zheng
  1 sibling, 0 replies; 53+ messages in thread
From: Fam Zheng @ 2014-05-04  9:50 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Paolo Bonzini, Shergill, Gurinder, Vinod, Chegu, qemu-devel

On Thu, 05/01 16:54, Stefan Hajnoczi wrote:
> Implement .bdrv_detach/attach_aio_context() interfaces to propagate
> detach/attach to BDRVVmdkState->extents[].file.  The block layer takes
> care of ->file and ->backing_hd but doesn't know about our extents
> BlockDriverStates, which are also part of the graph.
> 
> Cc: Fam Zheng <famz@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block/vmdk.c | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/block/vmdk.c b/block/vmdk.c
> index 06a1f9f..1ca944a 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -2063,6 +2063,27 @@ static ImageInfoSpecific *vmdk_get_specific_info(BlockDriverState *bs)
>      return spec_info;
>  }
>  
> +static void vmdk_detach_aio_context(BlockDriverState *bs)
> +{
> +    BDRVVmdkState *s = bs->opaque;
> +    int i;
> +
> +    for (i = 0; i < s->num_extents; i++) {
> +        bdrv_detach_aio_context(s->extents[i].file);
> +    }
> +}
> +
> +static void vmdk_attach_aio_context(BlockDriverState *bs,
> +                                    AioContext *new_context)
> +{
> +    BDRVVmdkState *s = bs->opaque;
> +    int i;
> +
> +    for (i = 0; i < s->num_extents; i++) {
> +        bdrv_attach_aio_context(s->extents[i].file, new_context);
> +    }
> +}
> +
>  static QEMUOptionParameter vmdk_create_options[] = {
>      {
>          .name = BLOCK_OPT_SIZE,
> @@ -2118,6 +2139,8 @@ static BlockDriver bdrv_vmdk = {
>      .bdrv_has_zero_init           = vmdk_has_zero_init,
>      .bdrv_get_specific_info       = vmdk_get_specific_info,
>      .bdrv_refresh_limits          = vmdk_refresh_limits,
> +    .bdrv_detach_aio_context      = vmdk_detach_aio_context,
> +    .bdrv_attach_aio_context      = vmdk_attach_aio_context,
>  
>      .create_options               = vmdk_create_options,
>  };

Looks good to me,

Reviewed-by: Fam Zheng <famz@redhat.com>


* Re: [Qemu-devel] [PATCH 18/22] vmdk: implement .bdrv_detach/attach_aio_context()
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 18/22] vmdk: implement .bdrv_detach/attach_aio_context() Stefan Hajnoczi
  2014-05-04  9:50   ` Fam Zheng
@ 2014-05-04 10:17   ` Fam Zheng
  2014-05-05 12:03     ` Stefan Hajnoczi
  1 sibling, 1 reply; 53+ messages in thread
From: Fam Zheng @ 2014-05-04 10:17 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Paolo Bonzini, Shergill, Gurinder, Vinod, Chegu, qemu-devel

On Thu, 05/01 16:54, Stefan Hajnoczi wrote:
> Implement .bdrv_detach/attach_aio_context() interfaces to propagate
> detach/attach to BDRVVmdkState->extents[].file.  The block layer takes
> care of ->file and ->backing_hd but doesn't know about our extents
> BlockDriverStates, which are also part of the graph.
> 
> Cc: Fam Zheng <famz@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block/vmdk.c | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/block/vmdk.c b/block/vmdk.c
> index 06a1f9f..1ca944a 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -2063,6 +2063,27 @@ static ImageInfoSpecific *vmdk_get_specific_info(BlockDriverState *bs)
>      return spec_info;
>  }
>  
> +static void vmdk_detach_aio_context(BlockDriverState *bs)
> +{
> +    BDRVVmdkState *s = bs->opaque;
> +    int i;
> +
> +    for (i = 0; i < s->num_extents; i++) {
> +        bdrv_detach_aio_context(s->extents[i].file);
> +    }
> +}
> +
> +static void vmdk_attach_aio_context(BlockDriverState *bs,
> +                                    AioContext *new_context)
> +{
> +    BDRVVmdkState *s = bs->opaque;
> +    int i;
> +
> +    for (i = 0; i < s->num_extents; i++) {
> +        bdrv_attach_aio_context(s->extents[i].file, new_context);
> +    }
> +}
> +
>  static QEMUOptionParameter vmdk_create_options[] = {
>      {
>          .name = BLOCK_OPT_SIZE,
> @@ -2118,6 +2139,8 @@ static BlockDriver bdrv_vmdk = {
>      .bdrv_has_zero_init           = vmdk_has_zero_init,
>      .bdrv_get_specific_info       = vmdk_get_specific_info,
>      .bdrv_refresh_limits          = vmdk_refresh_limits,
> +    .bdrv_detach_aio_context      = vmdk_detach_aio_context,
> +    .bdrv_attach_aio_context      = vmdk_attach_aio_context,
>  
>      .create_options               = vmdk_create_options,
>  };
> -- 

I'm wondering why we need to separate detach and attach as two functions, and
also add bdrv_set_aio_context in block.c, instead of a single
.bdrv_set_aio_context member which is called in bdrv_set_aio_context()? The
latter seems less code.

Thanks,
Fam


* Re: [Qemu-devel] [PATCH 06/22] curl: implement .bdrv_detach/attach_aio_context()
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 06/22] curl: " Stefan Hajnoczi
@ 2014-05-04 11:00   ` Fam Zheng
  2014-05-05 11:52     ` Stefan Hajnoczi
  0 siblings, 1 reply; 53+ messages in thread
From: Fam Zheng @ 2014-05-04 11:00 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Shergill, Gurinder, qemu-devel, Alexander Graf,
	Paolo Bonzini, Vinod, Chegu

On Thu, 05/01 16:54, Stefan Hajnoczi wrote:
> The curl block driver uses fd handlers, timers, and BHs.  The fd
> handlers and timers are managed on behalf of libcurl, which controls
> them using callback functions that the block driver implements.
> 
> The simplest way to implement .bdrv_detach/attach_aio_context() is to
> clean up libcurl in the old event loop and initialize it again in the
> new event loop.  We do not need to keep track of anything since there
> are no pending requests when the AioContext is changed.
> 
> Also make sure to use aio_set_fd_handler() instead of
> qemu_aio_set_fd_handler() and aio_bh_new() instead of qemu_bh_new() so
> the current AioContext is passed in.
> 
> Cc: Alexander Graf <agraf@suse.de>
> Cc: Fam Zheng <famz@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Might need to rebase on current master because of the latest curl fixes.

The patch itself looks good. Minor comments below.

> ---
>  block/curl.c | 194 +++++++++++++++++++++++++++++++++++------------------------
>  1 file changed, 116 insertions(+), 78 deletions(-)
> 
> diff --git a/block/curl.c b/block/curl.c
> index 6731d28..88638ec 100644
> --- a/block/curl.c
> +++ b/block/curl.c
> @@ -430,6 +434,55 @@ static void curl_parse_filename(const char *filename, QDict *options,
>      g_free(file);
>  }
>  
> +static void curl_detach_aio_context(BlockDriverState *bs)
> +{
> +    BDRVCURLState *s = bs->opaque;
> +    int i;
> +
> +    for (i = 0; i < CURL_NUM_STATES; i++) {
> +        if (s->states[i].in_use) {
> +            curl_clean_state(&s->states[i]);
> +        }
> +        if (s->states[i].curl) {
> +            curl_easy_cleanup(s->states[i].curl);
> +            s->states[i].curl = NULL;
> +        }
> +        if (s->states[i].orig_buf) {
> +            g_free(s->states[i].orig_buf);
> +            s->states[i].orig_buf = NULL;
> +        }
> +    }
> +    if (s->multi) {
> +        curl_multi_cleanup(s->multi);
> +        s->multi = NULL;
> +    }
> +
> +    timer_del(&s->timer);
> +}
> +
> +static void curl_attach_aio_context(BlockDriverState *bs,
> +                                    AioContext *new_context)
> +{
> +    BDRVCURLState *s = bs->opaque;
> +
> +    aio_timer_init(new_context, &s->timer,
> +                   QEMU_CLOCK_REALTIME, SCALE_NS,
> +                   curl_multi_timeout_do, s);
> +
> +    // Now we know the file exists and its size, so let's
> +    // initialize the multi interface!

I would keep this comment where it was. :)

> +
> +    s->multi = curl_multi_init();

Should we assert bdrv_attach_aio_context() is never called repeatedly or
without a preceding bdrv_detach_aio_context()? Otherwise s->multi could leak.

> +    s->aio_context = new_context;
> +    curl_multi_setopt(s->multi, CURLMOPT_SOCKETDATA, s);
> +    curl_multi_setopt(s->multi, CURLMOPT_SOCKETFUNCTION, curl_sock_cb);
> +#ifdef NEED_CURL_TIMER_CALLBACK
> +    curl_multi_setopt(s->multi, CURLMOPT_TIMERDATA, s);
> +    curl_multi_setopt(s->multi, CURLMOPT_TIMERFUNCTION, curl_timer_cb);
> +#endif
> +    curl_multi_do(s);

If you rebase to master, this call to curl_multi_do() is gone, among other
changes.

Thanks,
Fam


* Re: [Qemu-devel] [PATCH 19/22] dataplane: use the QEMU block layer for I/O
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 19/22] dataplane: use the QEMU block layer for I/O Stefan Hajnoczi
@ 2014-05-04 11:51   ` Fam Zheng
  2014-05-05 12:03     ` Stefan Hajnoczi
  0 siblings, 1 reply; 53+ messages in thread
From: Fam Zheng @ 2014-05-04 11:51 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Paolo Bonzini, Shergill, Gurinder, Vinod, Chegu, qemu-devel

On Thu, 05/01 16:54, Stefan Hajnoczi wrote:
> @@ -152,51 +132,53 @@ static void do_get_id_cmd(VirtIOBlockDataPlane *s,
>      complete_request_early(s, elem, inhdr, VIRTIO_BLK_S_OK);
>  }
>  
> -static int do_rdwr_cmd(VirtIOBlockDataPlane *s, bool read,
> -                       struct iovec *iov, unsigned iov_cnt,
> -                       long long offset, VirtQueueElement *elem,
> -                       QEMUIOVector *inhdr)
> +static void do_rdwr_cmd(VirtIOBlockDataPlane *s, bool read,
> +                        struct iovec *iov, unsigned iov_cnt,
> +                        int64_t sector_num, VirtQueueElement *elem,
> +                        QEMUIOVector *inhdr)
>  {
> -    struct iocb *iocb;
> -    QEMUIOVector qiov;
> -    struct iovec *bounce_iov = NULL;
> -    QEMUIOVector *read_qiov = NULL;
> -
> -    qemu_iovec_init_external(&qiov, iov, iov_cnt);
> -    if (!bdrv_qiov_is_aligned(s->blk->conf.bs, &qiov)) {
> -        void *bounce_buffer = qemu_blockalign(s->blk->conf.bs, qiov.size);
> -
> -        if (read) {
> -            /* Need to copy back from bounce buffer on completion */
> -            read_qiov = g_slice_new(QEMUIOVector);
> -            qemu_iovec_init(read_qiov, iov_cnt);
> -            qemu_iovec_concat_iov(read_qiov, iov, iov_cnt, 0, qiov.size);
> -        } else {
> -            qemu_iovec_to_buf(&qiov, 0, bounce_buffer, qiov.size);
> +    VirtIOBlockRequest *req = g_slice_new(VirtIOBlockRequest);

Could be g_slice_new0,

> +    QEMUIOVector *qiov;
> +    int nb_sectors;
> +
> +    /* Fill in virtio block metadata needed for completion */
> +    memset(req, 0, sizeof(*req));

so this memset is not needed.

> +    req->s = s;
> +    req->elem = elem;
> +    req->inhdr = inhdr;
> +    req->read = read;
> +    qemu_iovec_init_external(&req->qiov, iov, iov_cnt);
> +
> +    qiov = &req->qiov;
> +
> +    if (!bdrv_qiov_is_aligned(s->blk->conf.bs, qiov)) {
> +        void *bounce_buffer = qemu_blockalign(s->blk->conf.bs, qiov->size);
> +
> +        /* Populate bounce buffer with data for writes */
> +        if (!read) {
> +            qemu_iovec_to_buf(qiov, 0, bounce_buffer, qiov->size);
>          }
>  

Fam


* Re: [Qemu-devel] [PATCH 16/22] sheepdog: implement .bdrv_detach/attach_aio_context()
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 16/22] sheepdog: implement .bdrv_detach/attach_aio_context() Stefan Hajnoczi
@ 2014-05-05  8:10   ` Liu Yuan
  0 siblings, 0 replies; 53+ messages in thread
From: Liu Yuan @ 2014-05-05  8:10 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Shergill, Gurinder, qemu-devel, Paolo Bonzini, Vinod,
	Chegu, MORITA Kazutaka

On Thu, May 01, 2014 at 04:54:40PM +0200, Stefan Hajnoczi wrote:
> Drop the assumption that we're using the main AioContext.  Convert
> qemu_aio_set_fd_handler() to aio_set_fd_handler() and qemu_aio_wait() to
> aio_poll().
> 
> The .bdrv_detach/attach_aio_context() interfaces also need to be
> implemented to move the socket fd handler from the old to the new
> AioContext.
> 
> Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
> Cc: Liu Yuan <namei.unix@gmail.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block/sheepdog.c | 118 +++++++++++++++++++++++++++++++++++++------------------
>  1 file changed, 80 insertions(+), 38 deletions(-)
> 
> diff --git a/block/sheepdog.c b/block/sheepdog.c
> index 0eb33ee..4727fc1 100644
> --- a/block/sheepdog.c
> +++ b/block/sheepdog.c
> @@ -314,6 +314,7 @@ struct SheepdogAIOCB {
>  
>  typedef struct BDRVSheepdogState {
>      BlockDriverState *bs;
> +    AioContext *aio_context;
>  
>      SheepdogInode inode;
>  
> @@ -496,7 +497,7 @@ static void sd_aio_cancel(BlockDriverAIOCB *blockacb)
>              sd_finish_aiocb(acb);
>              return;
>          }
> -        qemu_aio_wait();
> +        aio_poll(s->aio_context, true);
>      }
>  }
>  
> @@ -582,6 +583,7 @@ static void restart_co_req(void *opaque)
>  
>  typedef struct SheepdogReqCo {
>      int sockfd;
> +    AioContext *aio_context;
>      SheepdogReq *hdr;
>      void *data;
>      unsigned int *wlen;
> @@ -602,14 +604,14 @@ static coroutine_fn void do_co_req(void *opaque)
>      unsigned int *rlen = srco->rlen;
>  
>      co = qemu_coroutine_self();
> -    qemu_aio_set_fd_handler(sockfd, NULL, restart_co_req, co);
> +    aio_set_fd_handler(srco->aio_context, sockfd, NULL, restart_co_req, co);
>  
>      ret = send_co_req(sockfd, hdr, data, wlen);
>      if (ret < 0) {
>          goto out;
>      }
>  
> -    qemu_aio_set_fd_handler(sockfd, restart_co_req, NULL, co);
> +    aio_set_fd_handler(srco->aio_context, sockfd, restart_co_req, NULL, co);
>  
>      ret = qemu_co_recv(sockfd, hdr, sizeof(*hdr));
>      if (ret != sizeof(*hdr)) {
> @@ -634,18 +636,19 @@ static coroutine_fn void do_co_req(void *opaque)
>  out:
>      /* there is at most one request for this sockfd, so it is safe to
>       * set each handler to NULL. */
> -    qemu_aio_set_fd_handler(sockfd, NULL, NULL, NULL);
> +    aio_set_fd_handler(srco->aio_context, sockfd, NULL, NULL, NULL);
>  
>      srco->ret = ret;
>      srco->finished = true;
>  }
>  
> -static int do_req(int sockfd, SheepdogReq *hdr, void *data,
> -                  unsigned int *wlen, unsigned int *rlen)
> +static int do_req(int sockfd, AioContext *aio_context, SheepdogReq *hdr,
> +                  void *data, unsigned int *wlen, unsigned int *rlen)
>  {
>      Coroutine *co;
>      SheepdogReqCo srco = {
>          .sockfd = sockfd,
> +        .aio_context = aio_context,
>          .hdr = hdr,
>          .data = data,
>          .wlen = wlen,
> @@ -660,7 +663,7 @@ static int do_req(int sockfd, SheepdogReq *hdr, void *data,
>          co = qemu_coroutine_create(do_co_req);
>          qemu_coroutine_enter(co, &srco);
>          while (!srco.finished) {
> -            qemu_aio_wait();
> +            aio_poll(aio_context, true);
>          }
>      }
>  
> @@ -712,7 +715,7 @@ static coroutine_fn void reconnect_to_sdog(void *opaque)
>      BDRVSheepdogState *s = opaque;
>      AIOReq *aio_req, *next;
>  
> -    qemu_aio_set_fd_handler(s->fd, NULL, NULL, NULL);
> +    aio_set_fd_handler(s->aio_context, s->fd, NULL, NULL, NULL);
>      close(s->fd);
>      s->fd = -1;
>  
> @@ -923,7 +926,7 @@ static int get_sheep_fd(BDRVSheepdogState *s)
>          return fd;
>      }
>  
> -    qemu_aio_set_fd_handler(fd, co_read_response, NULL, s);
> +    aio_set_fd_handler(s->aio_context, fd, co_read_response, NULL, s);
>      return fd;
>  }
>  
> @@ -1093,7 +1096,7 @@ static int find_vdi_name(BDRVSheepdogState *s, const char *filename,
>      hdr.snapid = snapid;
>      hdr.flags = SD_FLAG_CMD_WRITE;
>  
> -    ret = do_req(fd, (SheepdogReq *)&hdr, buf, &wlen, &rlen);
> +    ret = do_req(fd, s->aio_context, (SheepdogReq *)&hdr, buf, &wlen, &rlen);
>      if (ret) {
>          goto out;
>      }
> @@ -1173,7 +1176,8 @@ static void coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
>  
>      qemu_co_mutex_lock(&s->lock);
>      s->co_send = qemu_coroutine_self();
> -    qemu_aio_set_fd_handler(s->fd, co_read_response, co_write_request, s);
> +    aio_set_fd_handler(s->aio_context, s->fd,
> +                       co_read_response, co_write_request, s);
>      socket_set_cork(s->fd, 1);
>  
>      /* send a header */
> @@ -1191,12 +1195,13 @@ static void coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
>      }
>  out:
>      socket_set_cork(s->fd, 0);
> -    qemu_aio_set_fd_handler(s->fd, co_read_response, NULL, s);
> +    aio_set_fd_handler(s->aio_context, s->fd, co_read_response, NULL, s);
>      s->co_send = NULL;
>      qemu_co_mutex_unlock(&s->lock);
>  }
>  
> -static int read_write_object(int fd, char *buf, uint64_t oid, uint8_t copies,
> +static int read_write_object(int fd, AioContext *aio_context, char *buf,
> +                             uint64_t oid, uint8_t copies,
>                               unsigned int datalen, uint64_t offset,
>                               bool write, bool create, uint32_t cache_flags)
>  {
> @@ -1229,7 +1234,7 @@ static int read_write_object(int fd, char *buf, uint64_t oid, uint8_t copies,
>      hdr.offset = offset;
>      hdr.copies = copies;
>  
> -    ret = do_req(fd, (SheepdogReq *)&hdr, buf, &wlen, &rlen);
> +    ret = do_req(fd, aio_context, (SheepdogReq *)&hdr, buf, &wlen, &rlen);
>      if (ret) {
>          error_report("failed to send a request to the sheep");
>          return ret;
> @@ -1244,19 +1249,23 @@ static int read_write_object(int fd, char *buf, uint64_t oid, uint8_t copies,
>      }
>  }
>  
> -static int read_object(int fd, char *buf, uint64_t oid, uint8_t copies,
> +static int read_object(int fd, AioContext *aio_context, char *buf,
> +                       uint64_t oid, uint8_t copies,
>                         unsigned int datalen, uint64_t offset,
>                         uint32_t cache_flags)
>  {
> -    return read_write_object(fd, buf, oid, copies, datalen, offset, false,
> +    return read_write_object(fd, aio_context, buf, oid, copies,
> +                             datalen, offset, false,
>                               false, cache_flags);
>  }
>  
> -static int write_object(int fd, char *buf, uint64_t oid, uint8_t copies,
> +static int write_object(int fd, AioContext *aio_context, char *buf,
> +                        uint64_t oid, uint8_t copies,
>                          unsigned int datalen, uint64_t offset, bool create,
>                          uint32_t cache_flags)
>  {
> -    return read_write_object(fd, buf, oid, copies, datalen, offset, true,
> +    return read_write_object(fd, aio_context, buf, oid, copies,
> +                             datalen, offset, true,
>                               create, cache_flags);
>  }
>  
> @@ -1279,7 +1288,7 @@ static int reload_inode(BDRVSheepdogState *s, uint32_t snapid, const char *tag)
>          goto out;
>      }
>  
> -    ret = read_object(fd, (char *)inode, vid_to_vdi_oid(vid),
> +    ret = read_object(fd, s->aio_context, (char *)inode, vid_to_vdi_oid(vid),
>                        s->inode.nr_copies, sizeof(*inode), 0, s->cache_flags);
>      if (ret < 0) {
>          goto out;
> @@ -1354,6 +1363,22 @@ out:
>      }
>  }
>  
> +static void sd_detach_aio_context(BlockDriverState *bs)
> +{
> +    BDRVSheepdogState *s = bs->opaque;
> +
> +    aio_set_fd_handler(s->aio_context, s->fd, NULL, NULL, NULL);
> +}
> +
> +static void sd_attach_aio_context(BlockDriverState *bs,
> +                                  AioContext *new_context)
> +{
> +    BDRVSheepdogState *s = bs->opaque;
> +
> +    s->aio_context = new_context;
> +    aio_set_fd_handler(new_context, s->fd, co_read_response, NULL, s);
> +}
> +
>  /* TODO Convert to fine grained options */
>  static QemuOptsList runtime_opts = {
>      .name = "sheepdog",
> @@ -1382,6 +1407,7 @@ static int sd_open(BlockDriverState *bs, QDict *options, int flags,
>      const char *filename;
>  
>      s->bs = bs;
> +    s->aio_context = bdrv_get_aio_context(bs);
>  
>      opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
>      qemu_opts_absorb_qdict(opts, options, &local_err);
> @@ -1443,8 +1469,8 @@ static int sd_open(BlockDriverState *bs, QDict *options, int flags,
>      }
>  
>      buf = g_malloc(SD_INODE_SIZE);
> -    ret = read_object(fd, buf, vid_to_vdi_oid(vid), 0, SD_INODE_SIZE, 0,
> -                      s->cache_flags);
> +    ret = read_object(fd, s->aio_context, buf, vid_to_vdi_oid(vid),
> +                      0, SD_INODE_SIZE, 0, s->cache_flags);
>  
>      closesocket(fd);
>  
> @@ -1463,7 +1489,7 @@ static int sd_open(BlockDriverState *bs, QDict *options, int flags,
>      g_free(buf);
>      return 0;
>  out:
> -    qemu_aio_set_fd_handler(s->fd, NULL, NULL, NULL);
> +    aio_set_fd_handler(bdrv_get_aio_context(bs), s->fd, NULL, NULL, NULL);
>      if (s->fd >= 0) {
>          closesocket(s->fd);
>      }
> @@ -1505,7 +1531,7 @@ static int do_sd_create(BDRVSheepdogState *s, uint32_t *vdi_id, int snapshot)
>      hdr.copy_policy = s->inode.copy_policy;
>      hdr.copies = s->inode.nr_copies;
>  
> -    ret = do_req(fd, (SheepdogReq *)&hdr, buf, &wlen, &rlen);
> +    ret = do_req(fd, s->aio_context, (SheepdogReq *)&hdr, buf, &wlen, &rlen);
>  
>      closesocket(fd);
>  
> @@ -1751,7 +1777,8 @@ static void sd_close(BlockDriverState *bs)
>      hdr.data_length = wlen;
>      hdr.flags = SD_FLAG_CMD_WRITE;
>  
> -    ret = do_req(fd, (SheepdogReq *)&hdr, s->name, &wlen, &rlen);
> +    ret = do_req(fd, s->aio_context, (SheepdogReq *)&hdr,
> +                 s->name, &wlen, &rlen);
>  
>      closesocket(fd);
>  
> @@ -1760,7 +1787,7 @@ static void sd_close(BlockDriverState *bs)
>          error_report("%s, %s", sd_strerror(rsp->result), s->name);
>      }
>  
> -    qemu_aio_set_fd_handler(s->fd, NULL, NULL, NULL);
> +    aio_set_fd_handler(bdrv_get_aio_context(bs), s->fd, NULL, NULL, NULL);
>      closesocket(s->fd);
>      g_free(s->host_spec);
>  }
> @@ -1794,8 +1821,9 @@ static int sd_truncate(BlockDriverState *bs, int64_t offset)
>      /* we don't need to update entire object */
>      datalen = SD_INODE_SIZE - sizeof(s->inode.data_vdi_id);
>      s->inode.vdi_size = offset;
> -    ret = write_object(fd, (char *)&s->inode, vid_to_vdi_oid(s->inode.vdi_id),
> -                       s->inode.nr_copies, datalen, 0, false, s->cache_flags);
> +    ret = write_object(fd, s->aio_context, (char *)&s->inode,
> +                       vid_to_vdi_oid(s->inode.vdi_id), s->inode.nr_copies,
> +                       datalen, 0, false, s->cache_flags);
>      close(fd);
>  
>      if (ret < 0) {
> @@ -1861,7 +1889,8 @@ static bool sd_delete(BDRVSheepdogState *s)
>          return false;
>      }
>  
> -    ret = do_req(fd, (SheepdogReq *)&hdr, s->name, &wlen, &rlen);
> +    ret = do_req(fd, s->aio_context, (SheepdogReq *)&hdr,
> +                 s->name, &wlen, &rlen);
>      closesocket(fd);
>      if (ret) {
>          return false;
> @@ -1913,8 +1942,8 @@ static int sd_create_branch(BDRVSheepdogState *s)
>          goto out;
>      }
>  
> -    ret = read_object(fd, buf, vid_to_vdi_oid(vid), s->inode.nr_copies,
> -                      SD_INODE_SIZE, 0, s->cache_flags);
> +    ret = read_object(fd, s->aio_context, buf, vid_to_vdi_oid(vid),
> +                      s->inode.nr_copies, SD_INODE_SIZE, 0, s->cache_flags);
>  
>      closesocket(fd);
>  
> @@ -2157,8 +2186,9 @@ static int sd_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info)
>          goto cleanup;
>      }
>  
> -    ret = write_object(fd, (char *)&s->inode, vid_to_vdi_oid(s->inode.vdi_id),
> -                       s->inode.nr_copies, datalen, 0, false, s->cache_flags);
> +    ret = write_object(fd, s->aio_context, (char *)&s->inode,
> +                       vid_to_vdi_oid(s->inode.vdi_id), s->inode.nr_copies,
> +                       datalen, 0, false, s->cache_flags);
>      if (ret < 0) {
>          error_report("failed to write snapshot's inode.");
>          goto cleanup;
> @@ -2173,8 +2203,9 @@ static int sd_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info)
>  
>      inode = (SheepdogInode *)g_malloc(datalen);
>  
> -    ret = read_object(fd, (char *)inode, vid_to_vdi_oid(new_vid),
> -                      s->inode.nr_copies, datalen, 0, s->cache_flags);
> +    ret = read_object(fd, s->aio_context, (char *)inode,
> +                      vid_to_vdi_oid(new_vid), s->inode.nr_copies,
> +                      datalen, 0, s->cache_flags);
>  
>      if (ret < 0) {
>          error_report("failed to read new inode info. %s", strerror(errno));
> @@ -2277,7 +2308,8 @@ static int sd_snapshot_list(BlockDriverState *bs, QEMUSnapshotInfo **psn_tab)
>      req.opcode = SD_OP_READ_VDIS;
>      req.data_length = max;
>  
> -    ret = do_req(fd, (SheepdogReq *)&req, vdi_inuse, &wlen, &rlen);
> +    ret = do_req(fd, s->aio_context, (SheepdogReq *)&req,
> +                 vdi_inuse, &wlen, &rlen);
>  
>      closesocket(fd);
>      if (ret) {
> @@ -2302,7 +2334,8 @@ static int sd_snapshot_list(BlockDriverState *bs, QEMUSnapshotInfo **psn_tab)
>          }
>  
>          /* we don't need to read entire object */
> -        ret = read_object(fd, (char *)&inode, vid_to_vdi_oid(vid),
> +        ret = read_object(fd, s->aio_context, (char *)&inode,
> +                          vid_to_vdi_oid(vid),
>                            0, SD_INODE_SIZE - sizeof(inode.data_vdi_id), 0,
>                            s->cache_flags);
>  
> @@ -2364,11 +2397,11 @@ static int do_load_save_vmstate(BDRVSheepdogState *s, uint8_t *data,
>  
>          create = (offset == 0);
>          if (load) {
> -            ret = read_object(fd, (char *)data, vmstate_oid,
> +            ret = read_object(fd, s->aio_context, (char *)data, vmstate_oid,
>                                s->inode.nr_copies, data_len, offset,
>                                s->cache_flags);
>          } else {
> -            ret = write_object(fd, (char *)data, vmstate_oid,
> +            ret = write_object(fd, s->aio_context, (char *)data, vmstate_oid,
>                                 s->inode.nr_copies, data_len, offset, create,
>                                 s->cache_flags);
>          }
> @@ -2541,6 +2574,9 @@ static BlockDriver bdrv_sheepdog = {
>      .bdrv_save_vmstate  = sd_save_vmstate,
>      .bdrv_load_vmstate  = sd_load_vmstate,
>  
> +    .bdrv_detach_aio_context = sd_detach_aio_context,
> +    .bdrv_attach_aio_context = sd_attach_aio_context,
> +
>      .create_options = sd_create_options,
>  };
>  
> @@ -2571,6 +2607,9 @@ static BlockDriver bdrv_sheepdog_tcp = {
>      .bdrv_save_vmstate  = sd_save_vmstate,
>      .bdrv_load_vmstate  = sd_load_vmstate,
>  
> +    .bdrv_detach_aio_context = sd_detach_aio_context,
> +    .bdrv_attach_aio_context = sd_attach_aio_context,
> +
>      .create_options = sd_create_options,
>  };
>  
> @@ -2601,6 +2640,9 @@ static BlockDriver bdrv_sheepdog_unix = {
>      .bdrv_save_vmstate  = sd_save_vmstate,
>      .bdrv_load_vmstate  = sd_load_vmstate,
>  
> +    .bdrv_detach_aio_context = sd_detach_aio_context,
> +    .bdrv_attach_aio_context = sd_attach_aio_context,
> +
>      .create_options = sd_create_options,
>  };
>  
> -- 
> 1.9.0
> 

Acked-by: Liu Yuan <namei.unix@gmail.com>

Yuan


* Re: [Qemu-devel] [PATCH 07/22] gluster: use BlockDriverState's AioContext
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 07/22] gluster: use BlockDriverState's AioContext Stefan Hajnoczi
@ 2014-05-05  8:39   ` Bharata B Rao
  0 siblings, 0 replies; 53+ messages in thread
From: Bharata B Rao @ 2014-05-05  8:39 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Paolo Bonzini, Shergill, Gurinder, Vinod, Chegu, qemu-devel

On Thu, May 01, 2014 at 04:54:31PM +0200, Stefan Hajnoczi wrote:
> Drop the assumption that we're using the main AioContext.  Use
> aio_bh_new() instead of qemu_bh_new().
> 
> The .bdrv_detach_aio_context() and .bdrv_attach_aio_context() interfaces
> are not needed since no fd handlers, timers, or BHs stay registered when
> requests have been drained.
> 
> Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block/gluster.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)

Changes look fine from gluster driver's point of view.

Regards,
Bharata.


* Re: [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer
  2014-05-01 14:54 [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer Stefan Hajnoczi
                   ` (22 preceding siblings ...)
  2014-05-02  7:42 ` [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer Paolo Bonzini
@ 2014-05-05  9:17 ` Christian Borntraeger
  2014-05-05 12:05   ` Stefan Hajnoczi
  23 siblings, 1 reply; 53+ messages in thread
From: Christian Borntraeger @ 2014-05-05  9:17 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel
  Cc: Kevin Wolf, Paolo Bonzini, Shergill, Gurinder, Vinod, Chegu

On 01/05/14 16:54, Stefan Hajnoczi wrote:
> This patch series switches virtio-blk data-plane from a custom Linux AIO
> request queue to the QEMU block layer.  The previous "raw files only"
> limitation is lifted.  All image formats and protocols can now be used with
> virtio-blk data-plane.

Nice. Is there a git branch somewhere, so that we can test this on s390?

Christian
> 
> How to review this series
> -------------------------
> I CCed the maintainer of each block driver that I modified.  You probably don't
> need to review the entire series, just your patch.
> 
> From now on fd handlers, timers, BHs, and event loop wait must explicitly use
> BlockDriverState's AioContext instead of the main loop.  Use
> bdrv_get_aio_context(bs) to get the AioContext.  The following function calls
> need to be converted:
> 
>  * qemu_aio_set_fd_handler() -> aio_set_fd_handler()
>  * timer_new*() -> aio_timer_new()
>  * qemu_bh_new() -> aio_bh_new()
>  * qemu_aio_wait() -> aio_poll(aio_context, true)
> 
> For simple block drivers this modification suffices and it is now safe to use
> outside the QEMU global mutex.
> 
> Block drivers that keep fd handlers, timers, or BHs registered when requests
> have been drained need a little bit more work.  Examples of this are network
> block drivers with keepalive timers, like iSCSI.
> 
> This series adds a new bdrv_set_aio_context(bs, aio_context) function that
> moves a BlockDriverState into a new AioContext.  This function calls the block
> driver's optional .bdrv_detach_aio_context() and .bdrv_attach_aio_context()
> functions.  Implement detach/attach to move the fd handlers, timers, or BHs to
> the new AioContext.
> 
> Finally, block drivers that manage their own child nodes also need to
> implement detach/attach because the generic block layer doesn't know about
> their children.  Both ->file and ->backing_hd are automatically taken care of
> but blkverify, quorum, and VMDK need to manually propagate detach/attach to
> their children.
> 
> I have audited and modified all block drivers.  Block driver maintainers,
> please check I did it correctly and didn't break your code.
> 
> Background
> ----------
> The block layer is currently tied to the QEMU main loop for fd handlers, timer
> callbacks, and BHs.  This means that even on hosts with many cores, parts of
> block I/O processing happen in one thread and depend on the QEMU global mutex.
> 
> virtio-blk data-plane has shown that 1,000,000 IOPS is achievable if we use
> additional threads that are not under the QEMU global mutex.
> 
> It is necessary to make the QEMU block layer aware that there may be more than
> one event loop.  This way BlockDriverState can be used from a thread without
> contention on the QEMU global mutex.
> 
> This series builds on the aio_context_acquire/release() interface that allows a
> thread to temporarily grab an AioContext.  We add bdrv_set_aio_context(bs,
> aio_context) for changing which AioContext a BlockDriverState uses.
> 
> The final patches convert virtio-blk data-plane to use the QEMU block layer and
> let the BlockDriverState run in the IOThread AioContext.
> 
> What's next?
> ------------
> I have already made block I/O throttling work in another AioContext and will
> send the series out next week.
> 
> In order to keep this series reviewable, I'm holding back those patches for
> now.  One could say, "throttling" them.
> 
> Thank you, thank you, I'll be here all night!
> 
> Stefan Hajnoczi (22):
>   block: use BlockDriverState AioContext
>   block: acquire AioContext in bdrv_close_all()
>   block: add bdrv_set_aio_context()
>   blkdebug: use BlockDriverState's AioContext
>   blkverify: implement .bdrv_detach/attach_aio_context()
>   curl: implement .bdrv_detach/attach_aio_context()
>   gluster: use BlockDriverState's AioContext
>   iscsi: implement .bdrv_detach/attach_aio_context()
>   nbd: implement .bdrv_detach/attach_aio_context()
>   nfs: implement .bdrv_detach/attach_aio_context()
>   qed: use BlockDriverState's AioContext
>   quorum: implement .bdrv_detach/attach_aio_context()
>   block/raw-posix: implement .bdrv_detach/attach_aio_context()
>   block/linux-aio: fix memory and fd leak
>   rbd: use BlockDriverState's AioContext
>   sheepdog: implement .bdrv_detach/attach_aio_context()
>   ssh: use BlockDriverState's AioContext
>   vmdk: implement .bdrv_detach/attach_aio_context()
>   dataplane: use the QEMU block layer for I/O
>   dataplane: delete IOQueue since it is no longer used
>   dataplane: implement async flush
>   raw-posix: drop raw_get_aio_fd() since it is no longer used
> 
>  block.c                          |  88 +++++++++++++--
>  block/blkdebug.c                 |   2 +-
>  block/blkverify.c                |  47 +++++---
>  block/curl.c                     | 194 +++++++++++++++++++-------------
>  block/gluster.c                  |   7 +-
>  block/iscsi.c                    |  79 +++++++++----
>  block/linux-aio.c                |  24 +++-
>  block/nbd-client.c               |  24 +++-
>  block/nbd-client.h               |   4 +
>  block/nbd.c                      |  87 +++++++++------
>  block/nfs.c                      |  80 ++++++++++----
>  block/qed-table.c                |   8 +-
>  block/qed.c                      |  35 +++++-
>  block/quorum.c                   |  48 ++++++--
>  block/raw-aio.h                  |   3 +
>  block/raw-posix.c                |  82 ++++++++------
>  block/rbd.c                      |   5 +-
>  block/sheepdog.c                 | 118 +++++++++++++-------
>  block/ssh.c                      |  36 +++---
>  block/vmdk.c                     |  23 ++++
>  hw/block/dataplane/Makefile.objs |   2 +-
>  hw/block/dataplane/ioq.c         | 117 --------------------
>  hw/block/dataplane/ioq.h         |  57 ----------
>  hw/block/dataplane/virtio-blk.c  | 233 +++++++++++++++------------------------
>  include/block/block.h            |  20 ++--
>  include/block/block_int.h        |  36 ++++++
>  26 files changed, 829 insertions(+), 630 deletions(-)
>  delete mode 100644 hw/block/dataplane/ioq.c
>  delete mode 100644 hw/block/dataplane/ioq.h
> 


* Re: [Qemu-devel] [PATCH 06/22] curl: implement .bdrv_detach/attach_aio_context()
  2014-05-04 11:00   ` Fam Zheng
@ 2014-05-05 11:52     ` Stefan Hajnoczi
  0 siblings, 0 replies; 53+ messages in thread
From: Stefan Hajnoczi @ 2014-05-05 11:52 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Kevin Wolf, Shergill, Gurinder, qemu-devel, Alexander Graf,
	Stefan Hajnoczi, Paolo Bonzini, Vinod, Chegu

On Sun, May 04, 2014 at 07:00:26PM +0800, Fam Zheng wrote:
> On Thu, 05/01 16:54, Stefan Hajnoczi wrote:
> > The curl block driver uses fd handlers, timers, and BHs.  The fd
> > handlers and timers are managed on behalf of libcurl, which controls
> > them using callback functions that the block driver implements.
> > 
> > The simplest way to implement .bdrv_detach/attach_aio_context() is to
> > clean up libcurl in the old event loop and initialize it again in the
> > new event loop.  We do not need to keep track of anything since there
> > are no pending requests when the AioContext is changed.
> > 
> > Also make sure to use aio_set_fd_handler() instead of
> > qemu_aio_set_fd_handler() and aio_bh_new() instead of qemu_bh_new() so
> > the current AioContext is passed in.
> > 
> > Cc: Alexander Graf <agraf@suse.de>
> > Cc: Fam Zheng <famz@redhat.com>
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> 
> Might need to rebase on current master because of the latest curl fixes.
> 
> The patch itself looks good. Minor comments below.
> 
> > ---
> >  block/curl.c | 194 +++++++++++++++++++++++++++++++++++------------------------
> >  1 file changed, 116 insertions(+), 78 deletions(-)
> > 
> > diff --git a/block/curl.c b/block/curl.c
> > index 6731d28..88638ec 100644
> > --- a/block/curl.c
> > +++ b/block/curl.c
> > @@ -430,6 +434,55 @@ static void curl_parse_filename(const char *filename, QDict *options,
> >      g_free(file);
> >  }
> >  
> > +static void curl_detach_aio_context(BlockDriverState *bs)
> > +{
> > +    BDRVCURLState *s = bs->opaque;
> > +    int i;
> > +
> > +    for (i = 0; i < CURL_NUM_STATES; i++) {
> > +        if (s->states[i].in_use) {
> > +            curl_clean_state(&s->states[i]);
> > +        }
> > +        if (s->states[i].curl) {
> > +            curl_easy_cleanup(s->states[i].curl);
> > +            s->states[i].curl = NULL;
> > +        }
> > +        if (s->states[i].orig_buf) {
> > +            g_free(s->states[i].orig_buf);
> > +            s->states[i].orig_buf = NULL;
> > +        }
> > +    }
> > +    if (s->multi) {
> > +        curl_multi_cleanup(s->multi);
> > +        s->multi = NULL;
> > +    }
> > +
> > +    timer_del(&s->timer);
> > +}
> > +
> > +static void curl_attach_aio_context(BlockDriverState *bs,
> > +                                    AioContext *new_context)
> > +{
> > +    BDRVCURLState *s = bs->opaque;
> > +
> > +    aio_timer_init(new_context, &s->timer,
> > +                   QEMU_CLOCK_REALTIME, SCALE_NS,
> > +                   curl_multi_timeout_do, s);
> > +
> > +    // Now we know the file exists and its size, so let's
> > +    // initialize the multi interface!
> 
> I would keep this comment where it was. :)

Good point.

> > +
> > +    s->multi = curl_multi_init();
> 
> Should we assert bdrv_attach_aio_context() is never called repeatedly or
> without a preceding bdrv_detach_aio_context()? Otherwise s->multi could leak.

I'll add the appropriate assertions.
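
For example, something along these lines (just a sketch, assuming s->multi is
cleared by curl_detach_aio_context() as in the hunk above):

    assert(!s->multi);   /* a previous detach must have cleared the handle */
    s->multi = curl_multi_init();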

> > +    s->aio_context = new_context;
> > +    curl_multi_setopt(s->multi, CURLMOPT_SOCKETDATA, s);
> > +    curl_multi_setopt(s->multi, CURLMOPT_SOCKETFUNCTION, curl_sock_cb);
> > +#ifdef NEED_CURL_TIMER_CALLBACK
> > +    curl_multi_setopt(s->multi, CURLMOPT_TIMERDATA, s);
> > +    curl_multi_setopt(s->multi, CURLMOPT_TIMERFUNCTION, curl_timer_cb);
> > +#endif
> > +    curl_multi_do(s);
> 
> If you rebase to master, this call to curl_multi_do() is gone, among other
> changes.

Okay.  I'll rebase and resolve the conflicts.

Stefan


* Re: [Qemu-devel] [PATCH 18/22] vmdk: implement .bdrv_detach/attach_aio_context()
  2014-05-04 10:17   ` Fam Zheng
@ 2014-05-05 12:03     ` Stefan Hajnoczi
  0 siblings, 0 replies; 53+ messages in thread
From: Stefan Hajnoczi @ 2014-05-05 12:03 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Kevin Wolf, Shergill, Gurinder, qemu-devel, Stefan Hajnoczi,
	Paolo Bonzini, Vinod, Chegu

On Sun, May 04, 2014 at 06:17:45PM +0800, Fam Zheng wrote:
> On Thu, 05/01 16:54, Stefan Hajnoczi wrote:
> > @@ -2118,6 +2139,8 @@ static BlockDriver bdrv_vmdk = {
> >      .bdrv_has_zero_init           = vmdk_has_zero_init,
> >      .bdrv_get_specific_info       = vmdk_get_specific_info,
> >      .bdrv_refresh_limits          = vmdk_refresh_limits,
> > +    .bdrv_detach_aio_context      = vmdk_detach_aio_context,
> > +    .bdrv_attach_aio_context      = vmdk_attach_aio_context,
> >  
> >      .create_options               = vmdk_create_options,
> >  };
> > -- 
> 
> I'm wondering why we need to separate detach and attach into two functions, and
> also add bdrv_set_aio_context() in block.c, instead of having a single
> .bdrv_set_aio_context member that is called from bdrv_set_aio_context(). The
> latter seems like less code.

I can see it working either way, but here is why I chose to keep them
separate:

The detach/attach happens in two phases:

1. Parents are detached before child nodes - just in case the parent
   still needs the child in order to detach.

2. The new AioContext is acquired and then children are attached before
   their parent nodes - that way the parent knows it can already use its
   children during attach.

Acquiring the new AioContext for the minimum amount of time (attach
only) seems like a good idea.  Remember the AioContext may be
responsible for other I/O devices too so we should minimize the scope of
acquire/release.

Doing it all in a single .bdrv_set_aio_context() forces detach to happen
while the new AioContext is held.

Another reason why separate detach/attach is nice is that it allows
block drivers to avoid code duplication.  .bdrv_open() calls attach()
and .bdrv_close() calls detach().  A single .bdrv_set_aio_context()
function would need extra code to deal with the open (currently not
attached) and close (don't attach to a new context) scenarios.
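
Roughly, the generic code ends up shaped like this (a simplified sketch of the
two phases above; draining, recursion into ->file/->backing_hd and the driver
callbacks are folded into the detach/attach helpers, and the names may differ
from the actual patch):

    void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
    {
        /* Phase 1: detach in the old context, parents before children */
        bdrv_detach_aio_context(bs);

        /* Phase 2: hold the new context only for the attach,
         * children before their parents */
        aio_context_acquire(new_context);
        bdrv_attach_aio_context(bs, new_context);
        aio_context_release(new_context);
    }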

Stefan


* Re: [Qemu-devel] [PATCH 19/22] dataplane: use the QEMU block layer for I/O
  2014-05-04 11:51   ` Fam Zheng
@ 2014-05-05 12:03     ` Stefan Hajnoczi
  0 siblings, 0 replies; 53+ messages in thread
From: Stefan Hajnoczi @ 2014-05-05 12:03 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Kevin Wolf, Shergill, Gurinder, qemu-devel, Stefan Hajnoczi,
	Paolo Bonzini, Vinod, Chegu

On Sun, May 04, 2014 at 07:51:40PM +0800, Fam Zheng wrote:
> On Thu, 05/01 16:54, Stefan Hajnoczi wrote:
> > +    VirtIOBlockRequest *req = g_slice_new(VirtIOBlockRequest);
> 
> Could be g_slice_new0,
> 
> > +    QEMUIOVector *qiov;
> > +    int nb_sectors;
> > +
> > +    /* Fill in virtio block metadata needed for completion */
> > +    memset(req, 0, sizeof(*req));
> 
> so this memset is not needed.

Thanks, will fix.
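
Probably along these lines (sketch only, field names as in the patch above):

    VirtIOBlockRequest *req = g_slice_new0(VirtIOBlockRequest);

    /* g_slice_new0() returns zeroed memory, so the explicit
     * memset(req, 0, sizeof(*req)) is no longer needed. */
    req->s = s;
    req->elem = elem;
    req->inhdr = inhdr;
    req->read = read;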


* Re: [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer
  2014-05-05  9:17 ` Christian Borntraeger
@ 2014-05-05 12:05   ` Stefan Hajnoczi
  2014-05-05 12:46     ` Christian Borntraeger
  0 siblings, 1 reply; 53+ messages in thread
From: Stefan Hajnoczi @ 2014-05-05 12:05 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Kevin Wolf, Shergill, Gurinder, qemu-devel, Stefan Hajnoczi,
	Paolo Bonzini, Vinod, Chegu

On Mon, May 05, 2014 at 11:17:44AM +0200, Christian Borntraeger wrote:
> On 01/05/14 16:54, Stefan Hajnoczi wrote:
> > This patch series switches virtio-blk data-plane from a custom Linux AIO
> > request queue to the QEMU block layer.  The previous "raw files only"
> > limitation is lifted.  All image formats and protocols can now be used with
> > virtio-blk data-plane.
> 
> Nice. Is there a git branch somewhere, so that we can test this on s390?

Hi Christian,
I'm getting to work on v2 but you can grab this v1 series from git in
the meantime:

https://github.com/stefanha/qemu.git bdrv_set_aio_context

Stefan


* Re: [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer
  2014-05-05 12:05   ` Stefan Hajnoczi
@ 2014-05-05 12:46     ` Christian Borntraeger
  2014-05-06  8:39       ` Stefan Hajnoczi
  2014-05-06 13:30       ` Stefan Hajnoczi
  0 siblings, 2 replies; 53+ messages in thread
From: Christian Borntraeger @ 2014-05-05 12:46 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Shergill, Gurinder, qemu-devel, Stefan Hajnoczi,
	Paolo Bonzini, Vinod, Chegu

On 05/05/14 14:05, Stefan Hajnoczi wrote:
> On Mon, May 05, 2014 at 11:17:44AM +0200, Christian Borntraeger wrote:
>> On 01/05/14 16:54, Stefan Hajnoczi wrote:
>>> This patch series switches virtio-blk data-plane from a custom Linux AIO
>>> request queue to the QEMU block layer.  The previous "raw files only"
>>> limitation is lifted.  All image formats and protocols can now be used with
>>> virtio-blk data-plane.
>>
>> Nice. Is there a git branch somewhere, so that we can test this on s390?
> 
> Hi Christian,
> I'm getting to work on v2 but you can grab this v1 series from git in
> the meantime:
> 
> https://github.com/stefanha/qemu.git bdrv_set_aio_context
> 
> Stefan
> 

In general the main path seems to work fine.

With lots of devices (one qcow2, 23 raw scsi disks)
I get a hang on shutdown. kvm_stat claims that nothing is going on any more, but somehow threads are stuck in ppoll.

gdb tells me that 

all cpus have
#0  0x000003fffcde0ba0 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x000003fffcde3c0c in __pthread_mutex_cond_lock () from /lib64/libpthread.so.0
#2  0x000003fffcddc99a in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#3  0x00000000801f183a in qemu_cond_wait (cond=<optimized out>, mutex=mutex@entry=0x8072ba30 <qemu_global_mutex>) at /home/cborntra/REPOS/qemu/util/qemu-thread-posix.c:135
#4  0x00000000801512f2 in qemu_kvm_wait_io_event (cpu=<optimized out>) at /home/cborntra/REPOS/qemu/cpus.c:842
#5  qemu_kvm_cpu_thread_fn (arg=0x80a53e10) at /home/cborntra/REPOS/qemu/cpus.c:878

all iothreads have
#0  0x000003fffbc348e0 in ppoll () from /lib64/libc.so.6
#1  0x00000000800fcce6 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=fds@entry=0x3fff4001b00, nfds=nfds@entry=3, timeout=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:311
#3  0x000000008001ae4c in aio_poll (ctx=0x807dd610, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:221
#4  0x00000000800b2f6c in iothread_run (opaque=0x807dd4c8) at /home/cborntra/REPOS/qemu/iothread.c:41
#5  0x000003fffcdd8412 in start_thread () from /lib64/libpthread.so.0
#6  0x000003fffbc3f0ae in thread_start () from /lib64/libc.so.6

the main thread has
Thread 1 (Thread 0x3fff9e5c9b0 (LWP 33684)):
#0  0x000003fffbc348e0 in ppoll () from /lib64/libc.so.6
#1  0x00000000800fcce6 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=fds@entry=0x80ae8030, nfds=nfds@entry=4, timeout=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:311
#3  0x000000008001ae4c in aio_poll (ctx=ctx@entry=0x809a7ea0, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:221
#4  0x0000000080030c46 in bdrv_flush (bs=bs@entry=0x807e5900) at /home/cborntra/REPOS/qemu/block.c:4904
#5  0x0000000080030ce8 in bdrv_flush_all () at /home/cborntra/REPOS/qemu/block.c:3723
#6  0x0000000080152fe8 in do_vm_stop (state=<optimized out>) at /home/cborntra/REPOS/qemu/cpus.c:538
#7  vm_stop (state=<optimized out>) at /home/cborntra/REPOS/qemu/cpus.c:1219
#8  0x0000000000000000 in ?? ()


How are the ppoll calls supposed to return if there is nothing going on?

PS: I think I have seen this before, recently during managedsave, so it might have been introduced by the iothread rework rather than this series.





---- full trace ----
Thread 34 (Thread 0x3fff919c910 (LWP 33696)):
#0  0x000003fffcde0b5e in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x000003fffcde3c0c in __pthread_mutex_cond_lock () from /lib64/libpthread.so.0
#2  0x000003fffcddc99a in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#3  0x00000000801f183a in qemu_cond_wait (cond=<optimized out>, mutex=mutex@entry=0x8072ba30 <qemu_global_mutex>) at /home/cborntra/REPOS/qemu/util/qemu-thread-posix.c:135
#4  0x00000000801512f2 in qemu_kvm_wait_io_event (cpu=<optimized out>) at /home/cborntra/REPOS/qemu/cpus.c:842
#5  qemu_kvm_cpu_thread_fn (arg=0x809e1f00) at /home/cborntra/REPOS/qemu/cpus.c:878
#6  0x000003fffcdd8412 in start_thread () from /lib64/libpthread.so.0
#7  0x000003fffbc3f0ae in thread_start () from /lib64/libc.so.6

Thread 33 (Thread 0x3fff899c910 (LWP 33697)):
#0  0x000003fffcde0ba0 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x000003fffcde3c0c in __pthread_mutex_cond_lock () from /lib64/libpthread.so.0
#2  0x000003fffcddc99a in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#3  0x00000000801f183a in qemu_cond_wait (cond=<optimized out>, mutex=mutex@entry=0x8072ba30 <qemu_global_mutex>) at /home/cborntra/REPOS/qemu/util/qemu-thread-posix.c:135
#4  0x00000000801512f2 in qemu_kvm_wait_io_event (cpu=<optimized out>) at /home/cborntra/REPOS/qemu/cpus.c:842
#5  qemu_kvm_cpu_thread_fn (arg=0x809f2370) at /home/cborntra/REPOS/qemu/cpus.c:878
#6  0x000003fffcdd8412 in start_thread () from /lib64/libpthread.so.0
#7  0x000003fffbc3f0ae in thread_start () from /lib64/libc.so.6

Thread 32 (Thread 0x3fef3fff910 (LWP 33698)):
#0  0x000003fffcde0ba0 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x000003fffcde3c0c in __pthread_mutex_cond_lock () from /lib64/libpthread.so.0
#2  0x000003fffcddc99a in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#3  0x00000000801f183a in qemu_cond_wait (cond=<optimized out>, mutex=mutex@entry=0x8072ba30 <qemu_global_mutex>) at /home/cborntra/REPOS/qemu/util/qemu-thread-posix.c:135
#4  0x00000000801512f2 in qemu_kvm_wait_io_event (cpu=<optimized out>) at /home/cborntra/REPOS/qemu/cpus.c:842
#5  qemu_kvm_cpu_thread_fn (arg=0x80a027e0) at /home/cborntra/REPOS/qemu/cpus.c:878
#6  0x000003fffcdd8412 in start_thread () from /lib64/libpthread.so.0
#7  0x000003fffbc3f0ae in thread_start () from /lib64/libc.so.6

Thread 31 (Thread 0x3fef37ff910 (LWP 33699)):
#0  0x000003fffcde0b5e in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x000003fffcde3c0c in __pthread_mutex_cond_lock () from /lib64/libpthread.so.0
#2  0x000003fffcddc99a in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#3  0x00000000801f183a in qemu_cond_wait (cond=<optimized out>, mutex=mutex@entry=0x8072ba30 <qemu_global_mutex>) at /home/cborntra/REPOS/qemu/util/qemu-thread-posix.c:135
#4  0x00000000801512f2 in qemu_kvm_wait_io_event (cpu=<optimized out>) at /home/cborntra/REPOS/qemu/cpus.c:842
#5  qemu_kvm_cpu_thread_fn (arg=0x80a12c50) at /home/cborntra/REPOS/qemu/cpus.c:878
#6  0x000003fffcdd8412 in start_thread () from /lib64/libpthread.so.0
#7  0x000003fffbc3f0ae in thread_start () from /lib64/libc.so.6

Thread 30 (Thread 0x3fef2fff910 (LWP 33700)):
#0  0x000003fffcde0ba0 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x000003fffcde3c0c in __pthread_mutex_cond_lock () from /lib64/libpthread.so.0
#2  0x000003fffcddc99a in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#3  0x00000000801f183a in qemu_cond_wait (cond=<optimized out>, mutex=mutex@entry=0x8072ba30 <qemu_global_mutex>) at /home/cborntra/REPOS/qemu/util/qemu-thread-posix.c:135
#4  0x00000000801512f2 in qemu_kvm_wait_io_event (cpu=<optimized out>) at /home/cborntra/REPOS/qemu/cpus.c:842
#5  qemu_kvm_cpu_thread_fn (arg=0x80a230c0) at /home/cborntra/REPOS/qemu/cpus.c:878
#6  0x000003fffcdd8412 in start_thread () from /lib64/libpthread.so.0
#7  0x000003fffbc3f0ae in thread_start () from /lib64/libc.so.6

Thread 29 (Thread 0x3fef27ff910 (LWP 33701)):
#0  0x000003fffcde0ba0 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x000003fffcde3c0c in __pthread_mutex_cond_lock () from /lib64/libpthread.so.0
#2  0x000003fffcddc99a in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#3  0x00000000801f183a in qemu_cond_wait (cond=<optimized out>, mutex=mutex@entry=0x8072ba30 <qemu_global_mutex>) at /home/cborntra/REPOS/qemu/util/qemu-thread-posix.c:135
#4  0x00000000801512f2 in qemu_kvm_wait_io_event (cpu=<optimized out>) at /home/cborntra/REPOS/qemu/cpus.c:842
#5  qemu_kvm_cpu_thread_fn (arg=0x80a33530) at /home/cborntra/REPOS/qemu/cpus.c:878
#6  0x000003fffcdd8412 in start_thread () from /lib64/libpthread.so.0
#7  0x000003fffbc3f0ae in thread_start () from /lib64/libc.so.6

Thread 28 (Thread 0x3fef1fff910 (LWP 33702)):
#0  0x000003fffcde0ba0 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x000003fffcde3c0c in __pthread_mutex_cond_lock () from /lib64/libpthread.so.0
#2  0x000003fffcddc99a in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#3  0x00000000801f183a in qemu_cond_wait (cond=<optimized out>, mutex=mutex@entry=0x8072ba30 <qemu_global_mutex>) at /home/cborntra/REPOS/qemu/util/qemu-thread-posix.c:135
#4  0x00000000801512f2 in qemu_kvm_wait_io_event (cpu=<optimized out>) at /home/cborntra/REPOS/qemu/cpus.c:842
#5  qemu_kvm_cpu_thread_fn (arg=0x80a439a0) at /home/cborntra/REPOS/qemu/cpus.c:878
#6  0x000003fffcdd8412 in start_thread () from /lib64/libpthread.so.0
#7  0x000003fffbc3f0ae in thread_start () from /lib64/libc.so.6

Thread 27 (Thread 0x3fef17ff910 (LWP 33703)):
#0  0x000003fffcde0ba0 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x000003fffcde3c0c in __pthread_mutex_cond_lock () from /lib64/libpthread.so.0
#2  0x000003fffcddc99a in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#3  0x00000000801f183a in qemu_cond_wait (cond=<optimized out>, mutex=mutex@entry=0x8072ba30 <qemu_global_mutex>) at /home/cborntra/REPOS/qemu/util/qemu-thread-posix.c:135
#4  0x00000000801512f2 in qemu_kvm_wait_io_event (cpu=<optimized out>) at /home/cborntra/REPOS/qemu/cpus.c:842
#5  qemu_kvm_cpu_thread_fn (arg=0x80a53e10) at /home/cborntra/REPOS/qemu/cpus.c:878
#6  0x000003fffcdd8412 in start_thread () from /lib64/libpthread.so.0
#7  0x000003fffbc3f0ae in thread_start () from /lib64/libc.so.6

Thread 26 (Thread 0x3fef0fff910 (LWP 33704)):
#0  0x000003fffbc348e0 in ppoll () from /lib64/libc.so.6
#1  0x00000000800fcce6 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=fds@entry=0x80ae8030, nfds=nfds@entry=4, timeout=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:311
#3  0x000000008001ae4c in aio_poll (ctx=0x809a7ea0, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:221
#4  0x00000000800b2f6c in iothread_run (opaque=0x809a7d58) at /home/cborntra/REPOS/qemu/iothread.c:41
#5  0x000003fffcdd8412 in start_thread () from /lib64/libpthread.so.0
#6  0x000003fffbc3f0ae in thread_start () from /lib64/libc.so.6

Thread 25 (Thread 0x3fef07ff910 (LWP 33705)):
#0  0x000003fffbc348e0 in ppoll () from /lib64/libc.so.6
#1  0x00000000800fcce6 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=fds@entry=0x3fff4001b00, nfds=nfds@entry=3, timeout=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:311
#3  0x000000008001ae4c in aio_poll (ctx=0x807dd610, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:221
#4  0x00000000800b2f6c in iothread_run (opaque=0x807dd4c8) at /home/cborntra/REPOS/qemu/iothread.c:41
#5  0x000003fffcdd8412 in start_thread () from /lib64/libpthread.so.0
#6  0x000003fffbc3f0ae in thread_start () from /lib64/libc.so.6

Thread 24 (Thread 0x3feeffff910 (LWP 33706)):
#0  0x000003fffbc348e0 in ppoll () from /lib64/libc.so.6
#1  0x00000000800fcce6 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=fds@entry=0x807e5470, nfds=nfds@entry=3, timeout=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:311
#3  0x000000008001ae4c in aio_poll (ctx=0x807e0130, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:221
#4  0x00000000800b2f6c in iothread_run (opaque=0x807e0038) at /home/cborntra/REPOS/qemu/iothread.c:41
#5  0x000003fffcdd8412 in start_thread () from /lib64/libpthread.so.0
#6  0x000003fffbc3f0ae in thread_start () from /lib64/libc.so.6

Thread 23 (Thread 0x3feef7ff910 (LWP 33707)):
#0  0x000003fffbc348e0 in ppoll () from /lib64/libc.so.6
#1  0x00000000800fcce6 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=fds@entry=0x3fff400e350, nfds=nfds@entry=3, timeout=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:311
#3  0x000000008001ae4c in aio_poll (ctx=0x80a6f440, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:221
#4  0x00000000800b2f6c in iothread_run (opaque=0x80a6f348) at /home/cborntra/REPOS/qemu/iothread.c:41
#5  0x000003fffcdd8412 in start_thread () from /lib64/libpthread.so.0
#6  0x000003fffbc3f0ae in thread_start () from /lib64/libc.so.6

Thread 22 (Thread 0x3feeefff910 (LWP 33708)):
#0  0x000003fffbc348e0 in ppoll () from /lib64/libc.so.6
#1  0x00000000800fcce6 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=fds@entry=0x80ad5dd0, nfds=nfds@entry=3, timeout=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:311
#3  0x000000008001ae4c in aio_poll (ctx=0x80a65db0, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:221
#4  0x00000000800b2f6c in iothread_run (opaque=0x80a65c68) at /home/cborntra/REPOS/qemu/iothread.c:41
#5  0x000003fffcdd8412 in start_thread () from /lib64/libpthread.so.0
#6  0x000003fffbc3f0ae in thread_start () from /lib64/libc.so.6

Thread 21 (Thread 0x3feee7ff910 (LWP 33709)):
#0  0x000003fffbc348e0 in ppoll () from /lib64/libc.so.6
#1  0x00000000800fcce6 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=fds@entry=0x3fff400e380, nfds=nfds@entry=3, timeout=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:311
#3  0x000000008001ae4c in aio_poll (ctx=0x80a69f00, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:221
#4  0x00000000800b2f6c in iothread_run (opaque=0x80a69e08) at /home/cborntra/REPOS/qemu/iothread.c:41
#5  0x000003fffcdd8412 in start_thread () from /lib64/libpthread.so.0
#6  0x000003fffbc3f0ae in thread_start () from /lib64/libc.so.6

Thread 20 (Thread 0x3feedfff910 (LWP 33710)):
#0  0x000003fffbc348e0 in ppoll () from /lib64/libc.so.6
#1  0x00000000800fcce6 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=fds@entry=0x80ae8580, nfds=nfds@entry=3, timeout=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:311
#3  0x000000008001ae4c in aio_poll (ctx=0x80a7cb50, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:221
#4  0x00000000800b2f6c in iothread_run (opaque=0x80a7ca58) at /home/cborntra/REPOS/qemu/iothread.c:41
#5  0x000003fffcdd8412 in start_thread () from /lib64/libpthread.so.0
#6  0x000003fffbc3f0ae in thread_start () from /lib64/libc.so.6

Thread 19 (Thread 0x3feed7ff910 (LWP 33711)):
#0  0x000003fffbc348e0 in ppoll () from /lib64/libc.so.6
#1  0x00000000800fcce6 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=fds@entry=0x3fff4001180, nfds=nfds@entry=3, timeout=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:311
#3  0x000000008001ae4c in aio_poll (ctx=0x80a87050, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:221
#4  0x00000000800b2f6c in iothread_run (opaque=0x80a86f08) at /home/cborntra/REPOS/qemu/iothread.c:41
#5  0x000003fffcdd8412 in start_thread () from /lib64/libpthread.so.0
#6  0x000003fffbc3f0ae in thread_start () from /lib64/libc.so.6

Thread 18 (Thread 0x3feecfff910 (LWP 33712)):
#0  0x000003fffbc348e0 in ppoll () from /lib64/libc.so.6
#1  0x00000000800fcce6 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=fds@entry=0x80acf3d0, nfds=nfds@entry=3, timeout=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:311
#3  0x000000008001ae4c in aio_poll (ctx=0x80a83280, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:221
#4  0x00000000800b2f6c in iothread_run (opaque=0x80a83188) at /home/cborntra/REPOS/qemu/iothread.c:41
#5  0x000003fffcdd8412 in start_thread () from /lib64/libpthread.so.0
#6  0x000003fffbc3f0ae in thread_start () from /lib64/libc.so.6

Thread 17 (Thread 0x3feec7ff910 (LWP 33713)):
#0  0x000003fffbc348e0 in ppoll () from /lib64/libc.so.6
#1  0x00000000800fcce6 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=fds@entry=0x3fff4008fd0, nfds=nfds@entry=3, timeout=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:311
#3  0x000000008001ae4c in aio_poll (ctx=0x80a74bc0, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:221
#4  0x00000000800b2f6c in iothread_run (opaque=0x80a74a78) at /home/cborntra/REPOS/qemu/iothread.c:41
#5  0x000003fffcdd8412 in start_thread () from /lib64/libpthread.so.0
#6  0x000003fffbc3f0ae in thread_start () from /lib64/libc.so.6

Thread 16 (Thread 0x3feebfff910 (LWP 33714)):
#0  0x000003fffbc348e0 in ppoll () from /lib64/libc.so.6
#1  0x00000000800fcce6 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=fds@entry=0x80ae61d0, nfds=nfds@entry=3, timeout=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:311
#3  0x000000008001ae4c in aio_poll (ctx=0x80a78cf0, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:221
#4  0x00000000800b2f6c in iothread_run (opaque=0x80a78bf8) at /home/cborntra/REPOS/qemu/iothread.c:41
#5  0x000003fffcdd8412 in start_thread () from /lib64/libpthread.so.0
#6  0x000003fffbc3f0ae in thread_start () from /lib64/libc.so.6

Thread 15 (Thread 0x3feeb7ff910 (LWP 33715)):
#0  0x000003fffbc348e0 in ppoll () from /lib64/libc.so.6
#1  0x00000000800fcce6 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=fds@entry=0x3fff40011b0, nfds=nfds@entry=3, timeout=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:311
#3  0x000000008001ae4c in aio_poll (ctx=0x80a911a0, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:221
#4  0x00000000800b2f6c in iothread_run (opaque=0x80a910a8) at /home/cborntra/REPOS/qemu/iothread.c:41
#5  0x000003fffcdd8412 in start_thread () from /lib64/libpthread.so.0
#6  0x000003fffbc3f0ae in thread_start () from /lib64/libc.so.6

Thread 14 (Thread 0x3feeafff910 (LWP 33716)):
#0  0x000003fffbc348e0 in ppoll () from /lib64/libc.so.6
#1  0x00000000800fcce6 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=fds@entry=0x80af53d0, nfds=nfds@entry=3, timeout=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:311
#3  0x000000008001ae4c in aio_poll (ctx=0x80a9b680, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:221
#4  0x00000000800b2f6c in iothread_run (opaque=0x80a9b538) at /home/cborntra/REPOS/qemu/iothread.c:41
#5  0x000003fffcdd8412 in start_thread () from /lib64/libpthread.so.0
#6  0x000003fffbc3f0ae in thread_start () from /lib64/libc.so.6

Thread 13 (Thread 0x3feea7ff910 (LWP 33717)):
#0  0x000003fffbc348e0 in ppoll () from /lib64/libc.so.6
#1  0x00000000800fcce6 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=fds@entry=0x3fff4002bd0, nfds=nfds@entry=3, timeout=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:311
#3  0x000000008001ae4c in aio_poll (ctx=0x80a978b0, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:221
#4  0x00000000800b2f6c in iothread_run (opaque=0x80a977b8) at /home/cborntra/REPOS/qemu/iothread.c:41
#5  0x000003fffcdd8412 in start_thread () from /lib64/libpthread.so.0
#6  0x000003fffbc3f0ae in thread_start () from /lib64/libc.so.6

Thread 12 (Thread 0x3fee9fff910 (LWP 33718)):
#0  0x000003fffbc348e0 in ppoll () from /lib64/libc.so.6
#1  0x00000000800fcce6 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=fds@entry=0x80ae85b0, nfds=nfds@entry=3, timeout=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:311
#3  0x000000008001ae4c in aio_poll (ctx=0x80a891d0, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:221
#4  0x00000000800b2f6c in iothread_run (opaque=0x80a89088) at /home/cborntra/REPOS/qemu/iothread.c:41
#5  0x000003fffcdd8412 in start_thread () from /lib64/libpthread.so.0
#6  0x000003fffbc3f0ae in thread_start () from /lib64/libc.so.6

Thread 11 (Thread 0x3fee97ff910 (LWP 33719)):
#0  0x000003fffbc348e0 in ppoll () from /lib64/libc.so.6
#1  0x00000000800fcce6 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=fds@entry=0x3fff400a7d0, nfds=nfds@entry=3, timeout=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:311
#3  0x000000008001ae4c in aio_poll (ctx=0x80a8d320, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:221
#4  0x00000000800b2f6c in iothread_run (opaque=0x80a8d228) at /home/cborntra/REPOS/qemu/iothread.c:41
#5  0x000003fffcdd8412 in start_thread () from /lib64/libpthread.so.0
#6  0x000003fffbc3f0ae in thread_start () from /lib64/libc.so.6

Thread 10 (Thread 0x3fee8fff910 (LWP 33720)):
#0  0x000003fffbc348e0 in ppoll () from /lib64/libc.so.6
#1  0x00000000800fcce6 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=fds@entry=0x80ae27d0, nfds=nfds@entry=3, timeout=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:311
#3  0x000000008001ae4c in aio_poll (ctx=0x80aa57d0, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:221
#4  0x00000000800b2f6c in iothread_run (opaque=0x80aa56d8) at /home/cborntra/REPOS/qemu/iothread.c:41
#5  0x000003fffcdd8412 in start_thread () from /lib64/libpthread.so.0
#6  0x000003fffbc3f0ae in thread_start () from /lib64/libc.so.6

Thread 9 (Thread 0x3fee87ff910 (LWP 33721)):
#0  0x000003fffbc348e0 in ppoll () from /lib64/libc.so.6
#1  0x00000000800fcce6 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=fds@entry=0x3fff4018fd0, nfds=nfds@entry=3, timeout=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:311
#3  0x000000008001ae4c in aio_poll (ctx=0x80aafcd0, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:221
#4  0x00000000800b2f6c in iothread_run (opaque=0x80aafb88) at /home/cborntra/REPOS/qemu/iothread.c:41
#5  0x000003fffcdd8412 in start_thread () from /lib64/libpthread.so.0
#6  0x000003fffbc3f0ae in thread_start () from /lib64/libc.so.6

Thread 8 (Thread 0x3fee7fff910 (LWP 33722)):
#0  0x000003fffbc348e0 in ppoll () from /lib64/libc.so.6
#1  0x00000000800fcce6 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=fds@entry=0x80b02fd0, nfds=nfds@entry=3, timeout=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:311
#3  0x000000008001ae4c in aio_poll (ctx=0x80aabf00, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:221
#4  0x00000000800b2f6c in iothread_run (opaque=0x80aabe08) at /home/cborntra/REPOS/qemu/iothread.c:41
#5  0x000003fffcdd8412 in start_thread () from /lib64/libpthread.so.0
#6  0x000003fffbc3f0ae in thread_start () from /lib64/libc.so.6

Thread 7 (Thread 0x3fee77ff910 (LWP 33723)):
#0  0x000003fffbc348e0 in ppoll () from /lib64/libc.so.6
#1  0x00000000800fcce6 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=fds@entry=0x3fff4017580, nfds=nfds@entry=3, timeout=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:311
#3  0x000000008001ae4c in aio_poll (ctx=0x80a9d840, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:221
#4  0x00000000800b2f6c in iothread_run (opaque=0x80a9d6f8) at /home/cborntra/REPOS/qemu/iothread.c:41
#5  0x000003fffcdd8412 in start_thread () from /lib64/libpthread.so.0
#6  0x000003fffbc3f0ae in thread_start () from /lib64/libc.so.6

Thread 6 (Thread 0x3fee6fff910 (LWP 33724)):
#0  0x000003fffbc348e0 in ppoll () from /lib64/libc.so.6
#1  0x00000000800fcce6 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=fds@entry=0x80abc4d0, nfds=nfds@entry=3, timeout=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:311
#3  0x000000008001ae4c in aio_poll (ctx=0x80aa1970, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:221
#4  0x00000000800b2f6c in iothread_run (opaque=0x80aa1878) at /home/cborntra/REPOS/qemu/iothread.c:41
#5  0x000003fffcdd8412 in start_thread () from /lib64/libpthread.so.0
#6  0x000003fffbc3f0ae in thread_start () from /lib64/libc.so.6

Thread 5 (Thread 0x3fee67ff910 (LWP 33725)):
#0  0x000003fffbc348e0 in ppoll () from /lib64/libc.so.6
#1  0x00000000800fcce6 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=fds@entry=0x3fff40067d0, nfds=nfds@entry=3, timeout=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:311
#3  0x000000008001ae4c in aio_poll (ctx=0x80aba760, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:221
#4  0x00000000800b2f6c in iothread_run (opaque=0x80aba668) at /home/cborntra/REPOS/qemu/iothread.c:41
#5  0x000003fffcdd8412 in start_thread () from /lib64/libpthread.so.0
#6  0x000003fffbc3f0ae in thread_start () from /lib64/libc.so.6

Thread 4 (Thread 0x3fee5fff910 (LWP 33726)):
#0  0x000003fffbc348e0 in ppoll () from /lib64/libc.so.6
#1  0x00000000800fcce6 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=fds@entry=0x80af7bd0, nfds=nfds@entry=3, timeout=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:311
#3  0x000000008001ae4c in aio_poll (ctx=0x80ab2630, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:221
#4  0x00000000800b2f6c in iothread_run (opaque=0x80ab24e8) at /home/cborntra/REPOS/qemu/iothread.c:41
#5  0x000003fffcdd8412 in start_thread () from /lib64/libpthread.so.0
#6  0x000003fffbc3f0ae in thread_start () from /lib64/libc.so.6

Thread 3 (Thread 0x3fee57ff910 (LWP 33727)):
#0  0x000003fffbc348e0 in ppoll () from /lib64/libc.so.6
#1  0x00000000800fcce6 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=fds@entry=0x3fff400e2e0, nfds=nfds@entry=3, timeout=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:311
#3  0x000000008001ae4c in aio_poll (ctx=0x80ab5170, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:221
#4  0x00000000800b2f6c in iothread_run (opaque=0x80ab5078) at /home/cborntra/REPOS/qemu/iothread.c:41
#5  0x000003fffcdd8412 in start_thread () from /lib64/libpthread.so.0
#6  0x000003fffbc3f0ae in thread_start () from /lib64/libc.so.6

Thread 2 (Thread 0x3fee4fff910 (LWP 33728)):
#0  0x000003fffbc348e0 in ppoll () from /lib64/libc.so.6
#1  0x00000000800fcce6 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=fds@entry=0x80b02980, nfds=nfds@entry=3, timeout=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:311
#3  0x000000008001ae4c in aio_poll (ctx=0x80ac5ac0, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:221
#4  0x00000000800b2f6c in iothread_run (opaque=0x80ac59c8) at /home/cborntra/REPOS/qemu/iothread.c:41
#5  0x000003fffcdd8412 in start_thread () from /lib64/libpthread.so.0
#6  0x000003fffbc3f0ae in thread_start () from /lib64/libc.so.6

Thread 1 (Thread 0x3fff9e5c9b0 (LWP 33684)):
#0  0x000003fffbc348e0 in ppoll () from /lib64/libc.so.6
#1  0x00000000800fcce6 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=fds@entry=0x80ae8030, nfds=nfds@entry=4, timeout=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:311
#3  0x000000008001ae4c in aio_poll (ctx=ctx@entry=0x809a7ea0, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:221
#4  0x0000000080030c46 in bdrv_flush (bs=bs@entry=0x807e5900) at /home/cborntra/REPOS/qemu/block.c:4904
#5  0x0000000080030ce8 in bdrv_flush_all () at /home/cborntra/REPOS/qemu/block.c:3723
#6  0x0000000080152fe8 in do_vm_stop (state=<optimized out>) at /home/cborntra/REPOS/qemu/cpus.c:538
#7  vm_stop (state=<optimized out>) at /home/cborntra/REPOS/qemu/cpus.c:1219
#8  0x0000000000000000 in ?? ()

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Qemu-devel] [PATCH 12/22] quorum: implement .bdrv_detach/attach_aio_context()
  2014-05-01 14:54 ` [Qemu-devel] [PATCH 12/22] quorum: implement .bdrv_detach/attach_aio_context() Stefan Hajnoczi
@ 2014-05-05 15:46   ` Benoît Canet
  0 siblings, 0 replies; 53+ messages in thread
From: Benoît Canet @ 2014-05-05 15:46 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Benoît Canet, Shergill, Gurinder, qemu-devel,
	Paolo Bonzini, Vinod, Chegu

On Thursday, 01 May 2014 at 16:54:36 (+0200), Stefan Hajnoczi wrote:
> Implement .bdrv_detach/attach_aio_context() interfaces to propagate
> detach/attach to BDRVQuorumState->bs[] children.  The block layer takes
> care of ->file and ->backing_hd but doesn't know about our ->bs[]
> BlockDriverStates, which are also part of the graph.
> 
> Cc: Benoît Canet <benoit.canet@irqsave.net>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block/quorum.c | 48 ++++++++++++++++++++++++++++++++++++------------
>  1 file changed, 36 insertions(+), 12 deletions(-)
> 
> diff --git a/block/quorum.c b/block/quorum.c
> index ecec3a5..426077a 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -848,25 +848,49 @@ static void quorum_close(BlockDriverState *bs)
>      g_free(s->bs);
>  }
>  
> +static void quorum_detach_aio_context(BlockDriverState *bs)
> +{
> +    BDRVQuorumState *s = bs->opaque;
> +    int i;
> +
> +    for (i = 0; i < s->num_children; i++) {
> +        bdrv_detach_aio_context(s->bs[i]);
> +    }
> +}
> +
> +static void quorum_attach_aio_context(BlockDriverState *bs,
> +                                      AioContext *new_context)
> +{
> +    BDRVQuorumState *s = bs->opaque;
> +    int i;
> +
> +    for (i = 0; i < s->num_children; i++) {
> +        bdrv_attach_aio_context(s->bs[i], new_context);
> +    }
> +}
> +
>  static BlockDriver bdrv_quorum = {
> -    .format_name        = "quorum",
> -    .protocol_name      = "quorum",
> +    .format_name                        = "quorum",
> +    .protocol_name                      = "quorum",
> +
> +    .instance_size                      = sizeof(BDRVQuorumState),
>  
> -    .instance_size      = sizeof(BDRVQuorumState),
> +    .bdrv_file_open                     = quorum_open,
> +    .bdrv_close                         = quorum_close,
>  
> -    .bdrv_file_open     = quorum_open,
> -    .bdrv_close         = quorum_close,
> +    .bdrv_co_flush_to_disk              = quorum_co_flush,
>  
> -    .bdrv_co_flush_to_disk = quorum_co_flush,
> +    .bdrv_getlength                     = quorum_getlength,
>  
> -    .bdrv_getlength     = quorum_getlength,
> +    .bdrv_aio_readv                     = quorum_aio_readv,
> +    .bdrv_aio_writev                    = quorum_aio_writev,
> +    .bdrv_invalidate_cache              = quorum_invalidate_cache,
>  
> -    .bdrv_aio_readv     = quorum_aio_readv,
> -    .bdrv_aio_writev    = quorum_aio_writev,
> -    .bdrv_invalidate_cache = quorum_invalidate_cache,
> +    .bdrv_detach_aio_context            = quorum_detach_aio_context,
> +    .bdrv_attach_aio_context            = quorum_attach_aio_context,
>  
> -    .is_filter           = true,
> -    .bdrv_recurse_is_first_non_filter = quorum_recurse_is_first_non_filter,
> +    .is_filter                          = true,
> +    .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
>  };
>  
>  static void bdrv_quorum_init(void)
> -- 
> 1.9.0
> 
Looks good.

Reviewed-by: Benoit Canet <benoit@irqsave.net>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer
  2014-05-05 12:46     ` Christian Borntraeger
@ 2014-05-06  8:39       ` Stefan Hajnoczi
  2014-05-06 13:30       ` Stefan Hajnoczi
  1 sibling, 0 replies; 53+ messages in thread
From: Stefan Hajnoczi @ 2014-05-06  8:39 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Kevin Wolf, Stefan Hajnoczi, Shergill, Gurinder, qemu-devel,
	Paolo Bonzini, Vinod, Chegu

On Mon, May 05, 2014 at 02:46:09PM +0200, Christian Borntraeger wrote:
> On 05/05/14 14:05, Stefan Hajnoczi wrote:
> > On Mon, May 05, 2014 at 11:17:44AM +0200, Christian Borntraeger wrote:
> >> On 01/05/14 16:54, Stefan Hajnoczi wrote:
> >>> This patch series switches virtio-blk data-plane from a custom Linux AIO
> >>> request queue to the QEMU block layer.  The previous "raw files only"
> >>> limitation is lifted.  All image formats and protocols can now be used with
> >>> virtio-blk data-plane.
> >>
> >> Nice. Is there a git branch somewhere, so that we can test this on s390?
> > 
> > Hi Christian,
> > I'm getting to work on v2 but you can grab this v1 series from git in
> > the meantime:
> > 
> > https://github.com/stefanha/qemu.git bdrv_set_aio_context
> > 
> > Stefan
> > 
> 
> In general the main path seems to work fine.
> 
> With lots of devices (one qcow2, 23 raw scsi disks)
> I get a hang on shutdown. kvm_stat claims that nothing is going on any more, but somehow threads are stuck in ppoll.
> 
> gdb tells me that 
> 
> all cpus have
> #0  0x000003fffcde0ba0 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x000003fffcde3c0c in __pthread_mutex_cond_lock () from /lib64/libpthread.so.0
> #2  0x000003fffcddc99a in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #3  0x00000000801f183a in qemu_cond_wait (cond=<optimized out>, mutex=mutex@entry=0x8072ba30 <qemu_global_mutex>) at /home/cborntra/REPOS/qemu/util/qemu-thread-posix.c:135
> #4  0x00000000801512f2 in qemu_kvm_wait_io_event (cpu=<optimized out>) at /home/cborntra/REPOS/qemu/cpus.c:842
> #5  qemu_kvm_cpu_thread_fn (arg=0x80a53e10) at /home/cborntra/REPOS/qemu/cpus.c:878
> 
> all iothreads have
> #0  0x000003fffbc348e0 in ppoll () from /lib64/libc.so.6
> #1  0x00000000800fcce6 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
> #2  qemu_poll_ns (fds=fds@entry=0x3fff4001b00, nfds=nfds@entry=3, timeout=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:311
> #3  0x000000008001ae4c in aio_poll (ctx=0x807dd610, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:221
> #4  0x00000000800b2f6c in iothread_run (opaque=0x807dd4c8) at /home/cborntra/REPOS/qemu/iothread.c:41
> #5  0x000003fffcdd8412 in start_thread () from /lib64/libpthread.so.0
> #6  0x000003fffbc3f0ae in thread_start () from /lib64/libc.so.6
> 
> the main thread has
> Thread 1 (Thread 0x3fff9e5c9b0 (LWP 33684)):
> #0  0x000003fffbc348e0 in ppoll () from /lib64/libc.so.6
> #1  0x00000000800fcce6 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
> #2  qemu_poll_ns (fds=fds@entry=0x80ae8030, nfds=nfds@entry=4, timeout=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:311
> #3  0x000000008001ae4c in aio_poll (ctx=ctx@entry=0x809a7ea0, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:221
> #4  0x0000000080030c46 in bdrv_flush (bs=bs@entry=0x807e5900) at /home/cborntra/REPOS/qemu/block.c:4904
> #5  0x0000000080030ce8 in bdrv_flush_all () at /home/cborntra/REPOS/qemu/block.c:3723
> #6  0x0000000080152fe8 in do_vm_stop (state=<optimized out>) at /home/cborntra/REPOS/qemu/cpus.c:538
> #7  vm_stop (state=<optimized out>) at /home/cborntra/REPOS/qemu/cpus.c:1219
> #8  0x0000000000000000 in ?? ()
> 
> 
> How are the ppoll calls supposed to return if there is nothing going on?

The AioContext event loop has an event notifier to kick the AioContext.
This is how you can signal it from another thread.
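
For illustration, a minimal sketch (not part of this series) of how another
thread wakes an AioContext that is blocked in aio_poll().  aio_notify(),
aio_bh_new() and qemu_bh_schedule() are existing QEMU APIs; the helper names
are made up for the example.

#include "block/aio.h"

/* Wake an AioContext from another thread: aio_notify() signals the
 * context's internal EventNotifier, so the ppoll() inside aio_poll()
 * returns even if no other fd is ready. */
static void kick_aio_context(AioContext *ctx)
{
    aio_notify(ctx);
}

/* Scheduling a bottom half is the usual way to hand work to that event
 * loop; qemu_bh_schedule() calls aio_notify() internally.  The callback
 * is responsible for deleting the BH with qemu_bh_delete() when it runs. */
static void run_in_aio_context(AioContext *ctx, QEMUBHFunc *fn, void *opaque)
{
    QEMUBH *bh = aio_bh_new(ctx, fn, opaque);
    qemu_bh_schedule(bh);
}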

> PS: I think I have seen this before recently during managedsave, so it might have been introduced with the iothread rework instead of this one.

I suspect this is due to a race condition in bdrv_flush_all().  In this
series I added AioContext acquire/release for bdrv_close_all() so that
vl.c:main() shutdown works.  It's probably a similar issue.
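
For what it's worth, the shape of such a fix would presumably mirror the
bdrv_close_all() change: acquire each BlockDriverState's AioContext around
the flush so the nested aio_poll() runs against the right event loop.  A
rough sketch only (bdrv_states/device_list are block.c internals as of this
thread; this is not the actual patch):

void bdrv_flush_all(void)
{
    BlockDriverState *bs;

    QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
        AioContext *aio_context = bdrv_get_aio_context(bs);

        /* run the flush, and its nested aio_poll(), in bs's own context */
        aio_context_acquire(aio_context);
        bdrv_flush(bs);
        aio_context_release(aio_context);
    }
}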

Thanks for raising this issue, I'll investigate and send a fix.  I
suspect this is not the other issue which you saw during managedsave.

Stefan

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer
  2014-05-05 12:46     ` Christian Borntraeger
  2014-05-06  8:39       ` Stefan Hajnoczi
@ 2014-05-06 13:30       ` Stefan Hajnoczi
  1 sibling, 0 replies; 53+ messages in thread
From: Stefan Hajnoczi @ 2014-05-06 13:30 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Kevin Wolf, Shergill, Gurinder, qemu-devel, Stefan Hajnoczi,
	Paolo Bonzini, Vinod, Chegu

On Mon, May 05, 2014 at 02:46:09PM +0200, Christian Borntraeger wrote:
> On 05/05/14 14:05, Stefan Hajnoczi wrote:
> > On Mon, May 05, 2014 at 11:17:44AM +0200, Christian Borntraeger wrote:
> >> On 01/05/14 16:54, Stefan Hajnoczi wrote:
> >>> This patch series switches virtio-blk data-plane from a custom Linux AIO
> >>> request queue to the QEMU block layer.  The previous "raw files only"
> >>> limitation is lifted.  All image formats and protocols can now be used with
> >>> virtio-blk data-plane.
> >>
> >> Nice. Is there a git branch somewhere, so that we can test this on s390?
> > 
> > Hi Christian,
> > I'm getting to work on v2 but you can grab this v1 series from git in
> > the meantime:
> > 
> > https://github.com/stefanha/qemu.git bdrv_set_aio_context
> > 
> > Stefan
> > 
> 
> In general the main path seems to work fine.
> 
> With lots of devices (one qcow2, 23 raw scsi disks)
> I get a hang on shutdown. kvm_stat claims that nothing is going on any more, but somehow threads are stuck in ppoll.

Thanks for the debugging info and testing the fix on IRC.

I will include it in the next revision.

Stefan

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Qemu-devel] [PATCH 08/22] iscsi: implement .bdrv_detach/attach_aio_context()
  2014-05-01 22:39   ` Peter Lieven
@ 2014-05-07 10:07     ` Stefan Hajnoczi
  2014-05-07 10:29       ` Paolo Bonzini
  0 siblings, 1 reply; 53+ messages in thread
From: Stefan Hajnoczi @ 2014-05-07 10:07 UTC (permalink / raw)
  To: Peter Lieven
  Cc: Kevin Wolf, Stefan Hajnoczi, Shergill, Gurinder, qemu-devel,
	Ronnie Sahlberg, Paolo Bonzini, Vinod, Chegu

On Fri, May 02, 2014 at 12:39:06AM +0200, Peter Lieven wrote:
> > +static void iscsi_attach_aio_context(BlockDriverState *bs,
> > +                                     AioContext *new_context)
> > +{
> > +    IscsiLun *iscsilun = bs->opaque;
> > +
> > +    iscsilun->aio_context = new_context;
> > +    iscsi_set_events(iscsilun);
> > +
> > +#if defined(LIBISCSI_FEATURE_NOP_COUNTER)
> > +    /* Set up a timer for sending out iSCSI NOPs */
> > +    iscsilun->nop_timer = aio_timer_new(iscsilun->aio_context,
> > +                                        QEMU_CLOCK_REALTIME, SCALE_MS,
> > +                                        iscsi_nop_timed_event, iscsilun);
> > +    timer_mod(iscsilun->nop_timer,
> > +              qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + NOP_INTERVAL);
> > +#endif
> > +}
> 
> Is it still guaranteed that iscsi_nop_timed_event for a target is not invoked
> while we are in another function/callback of the iscsi driver for the same target?

This is a good point.

Previously, the nop timer was deferred until the qemu_aio_wait() loop
terminates.

With this patch the nop timer fires during aio_poll() loops for any
synchronous emulation that QEMU does (including iscsi_aio_cancel() and
.bdrv_ioctl() in block/iscsi.c).

I don't know libiscsi well enough to understand the implications.  I can
see that iscsi_reconnect() resends in-flight commands.  So what's the
upshot of all this?

BTW, is iscsi_reconnect() the right libiscsi interface to use since it
is synchronous?  It seems like this would block QEMU until the socket
has connected!  The guest would be frozen.

Stefan

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Qemu-devel] [PATCH 08/22] iscsi: implement .bdrv_detach/attach_aio_context()
  2014-05-07 10:07     ` Stefan Hajnoczi
@ 2014-05-07 10:29       ` Paolo Bonzini
  2014-05-07 14:09         ` Peter Lieven
  0 siblings, 1 reply; 53+ messages in thread
From: Paolo Bonzini @ 2014-05-07 10:29 UTC (permalink / raw)
  To: Stefan Hajnoczi, Peter Lieven
  Cc: Kevin Wolf, Stefan Hajnoczi, Shergill, Gurinder, qemu-devel,
	Ronnie Sahlberg, Vinod, Chegu

Il 07/05/2014 12:07, Stefan Hajnoczi ha scritto:
> On Fri, May 02, 2014 at 12:39:06AM +0200, Peter Lieven wrote:
>>> +static void iscsi_attach_aio_context(BlockDriverState *bs,
>>> +                                     AioContext *new_context)
>>> +{
>>> +    IscsiLun *iscsilun = bs->opaque;
>>> +
>>> +    iscsilun->aio_context = new_context;
>>> +    iscsi_set_events(iscsilun);
>>> +
>>> +#if defined(LIBISCSI_FEATURE_NOP_COUNTER)
>>> +    /* Set up a timer for sending out iSCSI NOPs */
>>> +    iscsilun->nop_timer = aio_timer_new(iscsilun->aio_context,
>>> +                                        QEMU_CLOCK_REALTIME, SCALE_MS,
>>> +                                        iscsi_nop_timed_event, iscsilun);
>>> +    timer_mod(iscsilun->nop_timer,
>>> +              qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + NOP_INTERVAL);
>>> +#endif
>>> +}
>>
>> Is it still guaranteed that iscsi_nop_timed_event for a target is not invoked
>> while we are in another function/callback of the iscsi driver for the same target?

Yes, since the timer is in the same AioContext as the iscsi driver 
callbacks.

> This is a good point.
>
> Previously, the nop timer was deferred until the qemu_aio_wait() loop
> terminates.
>
> With this patch the nop timer fires during aio_poll() loops for any
> synchronous emulation that QEMU does (including iscsi_aio_cancel() and
> .bdrv_ioctl() in block/iscsi.c).
>
> I don't know libiscsi well enough to understand the implications.  I can
> see that iscsi_reconnect() resends in-flight commands.  So what's the
> upshot of all this?

I think it's fine.  The target will process NOPs asynchronously, so 
iscsi_nop_timed_event will see no NOP in flight if the target is working 
properly.
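
For reference, the NOP keepalive path being discussed looks roughly like the
sketch below (simplified from block/iscsi.c; MAX_NOP_FAILURES, NOP_INTERVAL
and the IscsiLun fields are that file's own definitions, and details may
differ):

/* Runs only from the event loop, never re-entrantly from inside
 * iscsi_service().  If too many NOP-Outs went unanswered, the target is
 * presumed dead and a (synchronous) reconnect is triggered. */
static void iscsi_nop_timed_event(void *opaque)
{
    IscsiLun *iscsilun = opaque;

    if (iscsi_get_nops_in_flight(iscsilun->iscsi) > MAX_NOP_FAILURES) {
        error_report("iSCSI: NOP timeout. Reconnecting...");
        iscsi_reconnect(iscsilun->iscsi);
    } else if (iscsi_nop_out_async(iscsilun->iscsi, NULL, NULL, 0, NULL) != 0) {
        error_report("iSCSI: failed to send NOP-Out. Disabling NOP messages.");
        return;
    }

    /* re-arm the timer in the LUN's AioContext and refresh fd events */
    timer_mod(iscsilun->nop_timer,
              qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + NOP_INTERVAL);
    iscsi_set_events(iscsilun);
}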

> BTW, is iscsi_reconnect() the right libiscsi interface to use since it
> is synchronous?  It seems like this would block QEMU until the socket
> has connected!  The guest would be frozen.

There is no asynchronous interface yet for reconnection, unfortunately.

Paolo

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Qemu-devel] [PATCH 08/22] iscsi: implement .bdrv_detach/attach_aio_context()
  2014-05-07 10:29       ` Paolo Bonzini
@ 2014-05-07 14:09         ` Peter Lieven
  2014-05-08 11:33           ` Stefan Hajnoczi
  0 siblings, 1 reply; 53+ messages in thread
From: Peter Lieven @ 2014-05-07 14:09 UTC (permalink / raw)
  To: Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, Stefan Hajnoczi, Shergill, Gurinder, qemu-devel,
	Ronnie Sahlberg, Vinod, Chegu

On 07.05.2014 12:29, Paolo Bonzini wrote:
> Il 07/05/2014 12:07, Stefan Hajnoczi ha scritto:
>> On Fri, May 02, 2014 at 12:39:06AM +0200, Peter Lieven wrote:
>>>> +static void iscsi_attach_aio_context(BlockDriverState *bs,
>>>> +                                     AioContext *new_context)
>>>> +{
>>>> +    IscsiLun *iscsilun = bs->opaque;
>>>> +
>>>> +    iscsilun->aio_context = new_context;
>>>> +    iscsi_set_events(iscsilun);
>>>> +
>>>> +#if defined(LIBISCSI_FEATURE_NOP_COUNTER)
>>>> +    /* Set up a timer for sending out iSCSI NOPs */
>>>> +    iscsilun->nop_timer = aio_timer_new(iscsilun->aio_context,
>>>> +                                        QEMU_CLOCK_REALTIME, SCALE_MS,
>>>> +                                        iscsi_nop_timed_event, iscsilun);
>>>> +    timer_mod(iscsilun->nop_timer,
>>>> +              qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + NOP_INTERVAL);
>>>> +#endif
>>>> +}
>>>
>>> Is it still guaranteed that iscsi_nop_timed_event for a target is not invoked
>>> while we are in another function/callback of the iscsi driver for the same target?
>
> Yes, since the timer is in the same AioContext as the iscsi driver callbacks.


Ok. Stefan: What MUST NOT happen is that the timer gets fired while we are in iscsi_service.
As Paolo outlined, this cannot happen, right?

>
>> This is a good point.
>>
>> Previously, the nop timer was deferred until the qemu_aio_wait() loop
>> terminates.
>>
>> With this patch the nop timer fires during aio_poll() loops for any
>> synchronous emulation that QEMU does (including iscsi_aio_cancel() and
>> .bdrv_ioctl() in block/iscsi.c).
>>
>> I don't know libiscsi well enough to understand the implications.  I can
>> see that iscsi_reconnect() resends in-flight commands.  So what's the
>> upshot of all this?
>
> I think it's fine.  The target will process NOPs asynchronously, so iscsi_nop_timed_event will see no NOP in flight if the target is working properly.

Yes, or at most one in-flight NOP.

>
>> BTW, is iscsi_reconnect() the right libiscsi interface to use since it
>> is synchronous?  It seems like this would block QEMU until the socket
>> has connected!  The guest would be frozen.
>
> There is no asynchronous interface yet for reconnection, unfortunately.

We initiate the reconnect after we miss a few NOP replies. So the target is already down for approx. 30 seconds.
Every process inside the guest is already hanging or has timed out.

If I understand correctly, with the new patches only the communication with this target is hanging, isn't it?
So what benefit would an asynchronous reconnect have?

Peter

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Qemu-devel] [PATCH 08/22] iscsi: implement .bdrv_detach/attach_aio_context()
  2014-05-07 14:09         ` Peter Lieven
@ 2014-05-08 11:33           ` Stefan Hajnoczi
  2014-05-08 14:52             ` ronnie sahlberg
  0 siblings, 1 reply; 53+ messages in thread
From: Stefan Hajnoczi @ 2014-05-08 11:33 UTC (permalink / raw)
  To: Peter Lieven
  Cc: Kevin Wolf, Stefan Hajnoczi, Shergill, Gurinder, qemu-devel,
	Ronnie Sahlberg, Paolo Bonzini, Vinod, Chegu

On Wed, May 07, 2014 at 04:09:27PM +0200, Peter Lieven wrote:
> On 07.05.2014 12:29, Paolo Bonzini wrote:
> >Il 07/05/2014 12:07, Stefan Hajnoczi ha scritto:
> >>On Fri, May 02, 2014 at 12:39:06AM +0200, Peter Lieven wrote:
> >>>>+static void iscsi_attach_aio_context(BlockDriverState *bs,
> >>>>+                                     AioContext *new_context)
> >>>>+{
> >>>>+    IscsiLun *iscsilun = bs->opaque;
> >>>>+
> >>>>+    iscsilun->aio_context = new_context;
> >>>>+    iscsi_set_events(iscsilun);
> >>>>+
> >>>>+#if defined(LIBISCSI_FEATURE_NOP_COUNTER)
> >>>>+    /* Set up a timer for sending out iSCSI NOPs */
> >>>>+    iscsilun->nop_timer = aio_timer_new(iscsilun->aio_context,
> >>>>+                                        QEMU_CLOCK_REALTIME, SCALE_MS,
> >>>>+                                        iscsi_nop_timed_event, iscsilun);
> >>>>+    timer_mod(iscsilun->nop_timer,
> >>>>+              qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + NOP_INTERVAL);
> >>>>+#endif
> >>>>+}
> >>>
> >>>Is it still guaranteed that iscsi_nop_timed_event for a target is not invoked
> >>>while we are in another function/callback of the iscsi driver for the same target?
> >
> >Yes, since the timer is in the same AioContext as the iscsi driver callbacks.
> 
> 
> Ok. Stefan: What MUST NOT happen is that the timer gets fired while we are in iscsi_service.
> As Paolo outlined, this cannot happen, right?

Okay, I think we're safe then.  The timer can only be invoked during
aio_poll() event loop iterations.  It cannot be invoked while we're
inside iscsi_service().

> >>BTW, is iscsi_reconnect() the right libiscsi interface to use since it
> >>is synchronous?  It seems like this would block QEMU until the socket
> >>has connected!  The guest would be frozen.
> >
> >There is no asynchronous interface yet for reconnection, unfortunately.
> 
> We initiate the reconnect after we miss a few NOP replies. So the target is already down for approx. 30 seconds.
> Every process inside the guest is already hanging or has timed out.
> 
> If I understand correctly, with the new patches only the communication with this target is hanging, isn't it?
> So what benefit would an asynchronous reconnect have?

Asynchronous reconnect is desirable:

1. The QEMU monitor is blocked while we're waiting for the iSCSI target
   to accept our reconnect.  This means the management stack (libvirt)
   cannot control QEMU until we time out or succeed.

2. The guest is totally frozen - cannot execute instructions - because
   it will soon reach a point in the code that locks the QEMU global
   mutex (which is being held while we reconnect to the iSCSI target).

   This may be okayish for guests where the iSCSI LUN contains the
   "main" data that is being processed.  But what if an iSCSI LUN was
   just attached to a guest that is also doing other things that are
   independent (e.g. serving a website, processing data from a local
   disk, etc) - now the reconnect causes downtime for the entire guest.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Qemu-devel] [PATCH 08/22] iscsi: implement .bdrv_detach/attach_aio_context()
  2014-05-08 11:33           ` Stefan Hajnoczi
@ 2014-05-08 14:52             ` ronnie sahlberg
  2014-05-08 15:45               ` Peter Lieven
  0 siblings, 1 reply; 53+ messages in thread
From: ronnie sahlberg @ 2014-05-08 14:52 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Stefan Hajnoczi, Shergill, Gurinder, Peter Lieven,
	qemu-devel, Paolo Bonzini, Vinod, Chegu

On Thu, May 8, 2014 at 4:33 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> On Wed, May 07, 2014 at 04:09:27PM +0200, Peter Lieven wrote:
>> On 07.05.2014 12:29, Paolo Bonzini wrote:
>> >Il 07/05/2014 12:07, Stefan Hajnoczi ha scritto:
>> >>On Fri, May 02, 2014 at 12:39:06AM +0200, Peter Lieven wrote:
>> >>>>+static void iscsi_attach_aio_context(BlockDriverState *bs,
>> >>>>+                                     AioContext *new_context)
>> >>>>+{
>> >>>>+    IscsiLun *iscsilun = bs->opaque;
>> >>>>+
>> >>>>+    iscsilun->aio_context = new_context;
>> >>>>+    iscsi_set_events(iscsilun);
>> >>>>+
>> >>>>+#if defined(LIBISCSI_FEATURE_NOP_COUNTER)
>> >>>>+    /* Set up a timer for sending out iSCSI NOPs */
>> >>>>+    iscsilun->nop_timer = aio_timer_new(iscsilun->aio_context,
>> >>>>+                                        QEMU_CLOCK_REALTIME, SCALE_MS,
>> >>>>+                                        iscsi_nop_timed_event, iscsilun);
>> >>>>+    timer_mod(iscsilun->nop_timer,
>> >>>>+              qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + NOP_INTERVAL);
>> >>>>+#endif
>> >>>>+}
>> >>>
>> >>>Is it still guaranteed that iscsi_nop_timed_event for a target is not invoked
>> >>>while we are in another function/callback of the iscsi driver for the same target?
>> >
>> >Yes, since the timer is in the same AioContext as the iscsi driver callbacks.
>>
>>
>> Ok. Stefan: What MUST NOT happen is that the timer gets fired while we are in iscsi_service.
>> As Paolo outlined, this cannot happen, right?
>
> Okay, I think we're safe then.  The timer can only be invoked during
> aio_poll() event loop iterations.  It cannot be invoked while we're
> inside iscsi_service().
>
>> >>BTW, is iscsi_reconnect() the right libiscsi interface to use since it
>> >>is synchronous?  It seems like this would block QEMU until the socket
>> >>has connected!  The guest would be frozen.
>> >
>> >There is no asynchronous interface yet for reconnection, unfortunately.
>>
>> We initiate the reconnect after we miss a few NOP replies. So the target is already down for approx. 30 seconds.
>> Every process inside the guest is already hanging or has timed out.
>>
>> If I understand correctly, with the new patches only the communication with this target is hanging, isn't it?
>> So what benefit would an asynchronous reconnect have?
>
> Asynchronous reconnect is desirable:
>
> 1. The QEMU monitor is blocked while we're waiting for the iSCSI target
>    to accept our reconnect.  This means the management stack (libvirt)
>    cannot control QEMU until we time out or succeed.
>
> 2. The guest is totally frozen - cannot execute instructions - because
>    it will soon reach a point in the code that locks the QEMU global
>    mutex (which is being held while we reconnect to the iSCSI target).
>
>    This may be okayish for guests where the iSCSI LUN contains the
>    "main" data that is being processed.  But what if an iSCSI LUN was
>    just attached to a guest that is also doing other things that are
>    independent (e.g. serving a website, processing data from a local
>    disk, etc) - now the reconnect causes downtime for the entire guest.

I will look into making the reconnect async over the next few days.
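
Purely as a hypothetical sketch (such an API does not exist in libiscsi at
the time of writing, and the names below are made up): an asynchronous
reconnect would let QEMU start the reconnect and return immediately, with
progress driven by the normal fd handler in the LUN's AioContext, so neither
the monitor nor the guest is blocked while the connection and login complete.

/* Hypothetical: iscsi_reconnect_async() is not a real libiscsi call. */
static void iscsi_reconnect_done(struct iscsi_context *iscsi, int status,
                                 void *command_data, void *private_data)
{
    IscsiLun *iscsilun = private_data;

    if (status != 0) {
        error_report("iSCSI: reconnect failed");
    }
    iscsi_set_events(iscsilun);    /* re-register read/write interest */
}

static void iscsi_start_reconnect(IscsiLun *iscsilun)
{
    if (iscsi_reconnect_async(iscsilun->iscsi,
                              iscsi_reconnect_done, iscsilun) != 0) {
        error_report("iSCSI: could not start reconnect");
    }
}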

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Qemu-devel] [PATCH 08/22] iscsi: implement .bdrv_detach/attach_aio_context()
  2014-05-08 14:52             ` ronnie sahlberg
@ 2014-05-08 15:45               ` Peter Lieven
  0 siblings, 0 replies; 53+ messages in thread
From: Peter Lieven @ 2014-05-08 15:45 UTC (permalink / raw)
  To: ronnie sahlberg, Stefan Hajnoczi
  Cc: Kevin Wolf, Stefan Hajnoczi, Shergill, Gurinder, qemu-devel,
	Paolo Bonzini, Vinod, Chegu

Am 08.05.2014 16:52, schrieb ronnie sahlberg:
> On Thu, May 8, 2014 at 4:33 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>> On Wed, May 07, 2014 at 04:09:27PM +0200, Peter Lieven wrote:
>>> On 07.05.2014 12:29, Paolo Bonzini wrote:
>>>> Il 07/05/2014 12:07, Stefan Hajnoczi ha scritto:
>>>>> On Fri, May 02, 2014 at 12:39:06AM +0200, Peter Lieven wrote:
>>>>>>> +static void iscsi_attach_aio_context(BlockDriverState *bs,
>>>>>>> +                                     AioContext *new_context)
>>>>>>> +{
>>>>>>> +    IscsiLun *iscsilun = bs->opaque;
>>>>>>> +
>>>>>>> +    iscsilun->aio_context = new_context;
>>>>>>> +    iscsi_set_events(iscsilun);
>>>>>>> +
>>>>>>> +#if defined(LIBISCSI_FEATURE_NOP_COUNTER)
>>>>>>> +    /* Set up a timer for sending out iSCSI NOPs */
>>>>>>> +    iscsilun->nop_timer = aio_timer_new(iscsilun->aio_context,
>>>>>>> +                                        QEMU_CLOCK_REALTIME, SCALE_MS,
>>>>>>> +                                        iscsi_nop_timed_event, iscsilun);
>>>>>>> +    timer_mod(iscsilun->nop_timer,
>>>>>>> +              qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + NOP_INTERVAL);
>>>>>>> +#endif
>>>>>>> +}
>>>>>> Is it still guaranteed that iscsi_nop_timed_event for a target is not invoked
>>>>>> while we are in another function/callback of the iscsi driver for the same target?
>>>> Yes, since the timer is in the same AioContext as the iscsi driver callbacks.
>>>
>>> Ok. Stefan: What MUST NOT happen is that the timer gets fired while we are in iscsi_service.
>>> As Paolo outlined, this cannot happen, right?
>> Okay, I think we're safe then.  The timer can only be invoked during
>> aio_poll() event loop iterations.  It cannot be invoked while we're
>> inside iscsi_service().
>>
>>>>> BTW, is iscsi_reconnect() the right libiscsi interface to use since it
>>>>> is synchronous?  It seems like this would block QEMU until the socket
>>>>> has connected!  The guest would be frozen.
>>>> There is no asynchronous interface yet for reconnection, unfortunately.
>>> We initiate the reconnect after we miss a few NOP replies. So the target is already down for approx. 30 seconds.
>>> Every process inside the guest is already hanging or has timed out.
>>>
>>> If I understand correctly, with the new patches only the communication with this target is hanging, isn't it?
>>> So what benefit would an asynchronous reconnect have?
>> Asynchronous reconnect is desirable:
>>
>> 1. The QEMU monitor is blocked while we're waiting for the iSCSI target
>>    to accept our reconnect.  This means the management stack (libvirt)
>>    cannot control QEMU until we time out or succeed.
>>
>> 2. The guest is totally frozen - cannot execute instructions - because
>>    it will soon reach a point in the code that locks the QEMU global
>>    mutex (which is being held while we reconnect to the iSCSI target).
>>
>>    This may be okayish for guests where the iSCSI LUN contains the
>>    "main" data that is being processed.  But what if an iSCSI LUN was
>>    just attached to a guest that is also doing other things that are
>>    independent (e.g. serving a website, processing data from a local
>>    disk, etc) - now the reconnect causes downtime for the entire guest.
> I will look into making the reconnect async over the next few days.

Thanks for looking into this. I have a few things in mind that I will
post on github to the issue you created.

Peter

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Qemu-devel] [PATCH 08/22] iscsi: implement .bdrv_detach/attach_aio_context()
@ 2014-05-01 22:47 Peter Lieven
  0 siblings, 0 replies; 53+ messages in thread
From: Peter Lieven @ 2014-05-01 22:47 UTC (permalink / raw)
  To: pl
  Cc: Kevin Wolf, Ronnie Sahlberg, Shergill, Gurinder, qemu-devel,
	Stefan Hajnoczi, Paolo Bonzini, Vinod, Chegu

Stefan Hajnoczi wrote:
> Drop the assumption that we're using the main AioContext for Linux
> AIO.  Convert qemu_aio_set_fd_handler() to aio_set_fd_handler() and
> timer_new_ms() to aio_timer_new().
>
> The .bdrv_detach/attach_aio_context() interfaces also need to be
> implemented to move the fd and timer from the old to the new AioContext.
>
> Cc: Peter Lieven <pl@kamp.de>
> Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block/iscsi.c | 79 +++++++++++++++++++++++++++++++++++++++++------------------
>  1 file changed, 55 insertions(+), 24 deletions(-)
>
> diff --git a/block/iscsi.c b/block/iscsi.c
> index a30202b..81e3ebd 100644
> --- a/block/iscsi.c
> +++ b/block/iscsi.c
> @@ -47,6 +47,7 @@
>
>  typedef struct IscsiLun {
>      struct iscsi_context *iscsi;
> +    AioContext *aio_context;
>      int lun;
>      enum scsi_inquiry_peripheral_device_type type;
>      int block_size;
> @@ -69,6 +70,7 @@ typedef struct IscsiTask {
>      struct scsi_task *task;
>      Coroutine *co;
>      QEMUBH *bh;
> +    AioContext *aio_context;
>  } IscsiTask;
>
>  typedef struct IscsiAIOCB {
> @@ -120,7 +122,7 @@ iscsi_schedule_bh(IscsiAIOCB *acb)
>      if (acb->bh) {
>          return;
>      }
> -    acb->bh = qemu_bh_new(iscsi_bh_cb, acb);
> +    acb->bh = aio_bh_new(acb->iscsilun->aio_context, iscsi_bh_cb, acb);
>      qemu_bh_schedule(acb->bh);
>  }
>
> @@ -156,7 +158,7 @@ iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
>
>  out:
>      if (iTask->co) {
> -        iTask->bh = qemu_bh_new(iscsi_co_generic_bh_cb, iTask);
> +        iTask->bh = aio_bh_new(iTask->aio_context,
> iscsi_co_generic_bh_cb, iTask);
>          qemu_bh_schedule(iTask->bh);
>      }
>  }
> @@ -164,8 +166,9 @@ out:
>  static void iscsi_co_init_iscsitask(IscsiLun *iscsilun, struct IscsiTask *iTask)
>  {
>      *iTask = (struct IscsiTask) {
> -        .co         = qemu_coroutine_self(),
> -        .retries    = ISCSI_CMD_RETRIES,
> +        .co             = qemu_coroutine_self(),%

^ permalink raw reply	[flat|nested] 53+ messages in thread

end of thread, other threads:[~2014-05-08 15:45 UTC | newest]

Thread overview: 53+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-01 14:54 [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer Stefan Hajnoczi
2014-05-01 14:54 ` [Qemu-devel] [PATCH 01/22] block: use BlockDriverState AioContext Stefan Hajnoczi
2014-05-01 14:54 ` [Qemu-devel] [PATCH 02/22] block: acquire AioContext in bdrv_close_all() Stefan Hajnoczi
2014-05-01 14:54 ` [Qemu-devel] [PATCH 03/22] block: add bdrv_set_aio_context() Stefan Hajnoczi
2014-05-01 14:54 ` [Qemu-devel] [PATCH 04/22] blkdebug: use BlockDriverState's AioContext Stefan Hajnoczi
2014-05-01 14:54 ` [Qemu-devel] [PATCH 05/22] blkverify: implement .bdrv_detach/attach_aio_context() Stefan Hajnoczi
2014-05-01 14:54 ` [Qemu-devel] [PATCH 06/22] curl: " Stefan Hajnoczi
2014-05-04 11:00   ` Fam Zheng
2014-05-05 11:52     ` Stefan Hajnoczi
2014-05-01 14:54 ` [Qemu-devel] [PATCH 07/22] gluster: use BlockDriverState's AioContext Stefan Hajnoczi
2014-05-05  8:39   ` Bharata B Rao
2014-05-01 14:54 ` [Qemu-devel] [PATCH 08/22] iscsi: implement .bdrv_detach/attach_aio_context() Stefan Hajnoczi
2014-05-01 22:39   ` Peter Lieven
2014-05-07 10:07     ` Stefan Hajnoczi
2014-05-07 10:29       ` Paolo Bonzini
2014-05-07 14:09         ` Peter Lieven
2014-05-08 11:33           ` Stefan Hajnoczi
2014-05-08 14:52             ` ronnie sahlberg
2014-05-08 15:45               ` Peter Lieven
2014-05-01 14:54 ` [Qemu-devel] [PATCH 09/22] nbd: " Stefan Hajnoczi
2014-05-02  7:40   ` Paolo Bonzini
2014-05-01 14:54 ` [Qemu-devel] [PATCH 10/22] nfs: " Stefan Hajnoczi
2014-05-01 14:54 ` [Qemu-devel] [PATCH 11/22] qed: use BlockDriverState's AioContext Stefan Hajnoczi
2014-05-01 14:54 ` [Qemu-devel] [PATCH 12/22] quorum: implement .bdrv_detach/attach_aio_context() Stefan Hajnoczi
2014-05-05 15:46   ` Benoît Canet
2014-05-01 14:54 ` [Qemu-devel] [PATCH 13/22] block/raw-posix: " Stefan Hajnoczi
2014-05-02  7:39   ` Paolo Bonzini
2014-05-02 11:45     ` Stefan Hajnoczi
2014-05-01 14:54 ` [Qemu-devel] [PATCH 14/22] block/linux-aio: fix memory and fd leak Stefan Hajnoczi
2014-05-01 14:54 ` [Qemu-devel] [PATCH 15/22] rbd: use BlockDriverState's AioContext Stefan Hajnoczi
2014-05-01 14:54 ` [Qemu-devel] [PATCH 16/22] sheepdog: implement .bdrv_detach/attach_aio_context() Stefan Hajnoczi
2014-05-05  8:10   ` Liu Yuan
2014-05-01 14:54 ` [Qemu-devel] [PATCH 17/22] ssh: use BlockDriverState's AioContext Stefan Hajnoczi
2014-05-01 15:03   ` Richard W.M. Jones
2014-05-01 15:13     ` Stefan Hajnoczi
2014-05-01 14:54 ` [Qemu-devel] [PATCH 18/22] vmdk: implement .bdrv_detach/attach_aio_context() Stefan Hajnoczi
2014-05-04  9:50   ` Fam Zheng
2014-05-04 10:17   ` Fam Zheng
2014-05-05 12:03     ` Stefan Hajnoczi
2014-05-01 14:54 ` [Qemu-devel] [PATCH 19/22] dataplane: use the QEMU block layer for I/O Stefan Hajnoczi
2014-05-04 11:51   ` Fam Zheng
2014-05-05 12:03     ` Stefan Hajnoczi
2014-05-01 14:54 ` [Qemu-devel] [PATCH 20/22] dataplane: delete IOQueue since it is no longer used Stefan Hajnoczi
2014-05-01 14:54 ` [Qemu-devel] [PATCH 21/22] dataplane: implement async flush Stefan Hajnoczi
2014-05-01 14:54 ` [Qemu-devel] [PATCH 22/22] raw-posix: drop raw_get_aio_fd() since it is no longer used Stefan Hajnoczi
2014-05-02  7:42 ` [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer Paolo Bonzini
2014-05-02 11:59   ` Stefan Hajnoczi
2014-05-05  9:17 ` Christian Borntraeger
2014-05-05 12:05   ` Stefan Hajnoczi
2014-05-05 12:46     ` Christian Borntraeger
2014-05-06  8:39       ` Stefan Hajnoczi
2014-05-06 13:30       ` Stefan Hajnoczi
2014-05-01 22:47 [Qemu-devel] [PATCH 08/22] iscsi: implement .bdrv_detach/attach_aio_context() Peter Lieven
