All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v2 00/20] dataplane: remove RFifoLock
@ 2016-10-17 13:54 Paolo Bonzini
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 01/20] replication: interrupt failover if the main device is closed Paolo Bonzini
                   ` (20 more replies)
  0 siblings, 21 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-10-17 13:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, famz, kwolf, stefanha

This patch reorganizes aio_poll callers to establish new rules for
dataplane locking.  The idea is that I/O operations on a dataplane
BDS (i.e. one where the AioContext is not the main one) do not call
aio_poll anymore.  Instead, they wait for the operation to end in the
other I/O thread, at which point the other I/O thread calls bdrv_wakeup
to wake up the main thread.

With this change, only one thread runs aio_poll for an AioContext.
While aio_context_acquire/release is still needed to protect the BDSes,
it need not interrupt the other thread's event loop anymore, and therefore
it does not need contention callbacks anymore.  Thus the patch can remove
RFifoLock.  This fixes possible hangs in bdrv_drain_all, reproducible (for
example) by unplugging a virtio-scsi-dataplane device while there is I/O
going on for a virtio-blk-dataplane on the same I/O thread.

Patch 1 is a bugfix that I already posted.

Patch 2 makes blockjobs independent of aio_poll, the reason for which
should be apparent from the explanation above.

Patch 3 is an independent mirror bugfix, that I wanted to submit separately
but happens to fix a hang in COLO replication.  Like patch 1 I believe
it's pre-existing and merely exposed by these patches.

Patches 4 to 10 introduce the infrastructure to wake up the main thread
while bdrv_drain or other synchronous operations are running.  Patches 11
to 16 do other changes to prepare for this.  Notably bdrv_drain_all
needs to be called without holding any AioContext lock, so bdrv_reopen
releases the lock temporarily (and callers of bdrv_reopen needs fixing).

Patch 17 then does the big change, after which there are just some
cleanups left to do.

Paolo

Fam Zheng (1):
  qed: Implement .bdrv_drain

Paolo Bonzini (19):
  replication: interrupt failover if the main device is closed
  blockjob: introduce .drain callback for jobs
  mirror: use bdrv_drained_begin/bdrv_drained_end
  block: add BDS field to count in-flight requests
  block: change drain to look only at one child at a time
  block: introduce BDRV_POLL_WHILE
  nfs: move nfs_set_events out of the while loops
  nfs: use BDRV_POLL_WHILE
  sheepdog: use BDRV_POLL_WHILE
  aio: introduce qemu_get_current_aio_context
  iothread: detach all block devices before stopping them
  replication: pass BlockDriverState to reopen_backing_file
  block: prepare bdrv_reopen_multiple to release AioContext
  qemu-io: acquire AioContext
  qemu-img: call aio_context_acquire/release around block job
  block: only call aio_poll on the current thread's AioContext
  iothread: release AioContext around aio_poll
  qemu-thread: introduce QemuRecMutex
  aio: convert from RFifoLock to QemuRecMutex

 async.c                         |  29 ++------
 block.c                         |   6 +-
 block/backup.c                  |  17 +++++
 block/block-backend.c           |  30 +++++---
 block/commit.c                  |   2 +-
 block/io.c                      | 148 ++++++++++++++++++++++------------------
 block/mirror.c                  |  70 +++++++++++++------
 block/nfs.c                     |  55 +++++++++------
 block/qed-table.c               |  16 ++---
 block/qed.c                     |  16 ++++-
 block/replication.c             |  27 +++++---
 block/sheepdog.c                |  67 ++++++++++--------
 blockjob.c                      |  37 +++++-----
 docs/multiple-iothreads.txt     |  40 ++++++-----
 hw/scsi/virtio-scsi-dataplane.c |   4 +-
 include/block/aio.h             |  24 +++++--
 include/block/block.h           |  29 +++++++-
 include/block/block_int.h       |  27 ++++++--
 include/block/blockjob.h        |   7 ++
 include/qemu/rfifolock.h        |  54 ---------------
 include/qemu/thread-posix.h     |   6 ++
 include/qemu/thread-win32.h     |  10 +++
 include/qemu/thread.h           |   3 +
 iothread.c                      |  33 ++++++---
 qemu-img.c                      |   6 ++
 qemu-io-cmds.c                  |   6 +-
 stubs/Makefile.objs             |   1 +
 stubs/iothread.c                |   8 +++
 tests/.gitignore                |   1 -
 tests/Makefile.include          |   2 -
 tests/test-aio.c                |  22 +++---
 tests/test-rfifolock.c          |  91 ------------------------
 util/Makefile.objs              |   1 -
 util/qemu-thread-posix.c        |  14 ++++
 util/qemu-thread-win32.c        |  25 +++++++
 util/rfifolock.c                |  78 ---------------------
 36 files changed, 526 insertions(+), 486 deletions(-)
 delete mode 100644 include/qemu/rfifolock.h
 create mode 100644 stubs/iothread.c
 delete mode 100644 tests/test-rfifolock.c
 delete mode 100644 util/rfifolock.c

-- 
2.7.4

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH 01/20] replication: interrupt failover if the main device is closed
  2016-10-17 13:54 [Qemu-devel] [PATCH v2 00/20] dataplane: remove RFifoLock Paolo Bonzini
@ 2016-10-17 13:54 ` Paolo Bonzini
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 02/20] blockjob: introduce .drain callback for jobs Paolo Bonzini
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-10-17 13:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, famz, kwolf, stefanha

Without this change, there is a race condition in tests/test-replication.
Depending on how fast the failover job (active commit) runs, there is a
chance of two bad things happening:

1) replication_done can be called after the secondary has been closed
and hence when the BDRVReplicationState is not valid anymore.

2) two copies of the active disk are present during the
/replication/secondary/stop test (that test runs immediately after
/replication/secondary/start, which tests failover).  This causes the
corruption detector to fire.

Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
Reviewed-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        (already applied to Stefan's tree)

 block/replication.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/replication.c b/block/replication.c
index 3bd1cf1..5231a00 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -133,6 +133,9 @@ static void replication_close(BlockDriverState *bs)
     if (s->replication_state == BLOCK_REPLICATION_RUNNING) {
         replication_stop(s->rs, false, NULL);
     }
+    if (s->replication_state == BLOCK_REPLICATION_FAILOVER) {
+        block_job_cancel_sync(s->active_disk->bs->job);
+    }
 
     if (s->mode == REPLICATION_MODE_SECONDARY) {
         g_free(s->top_id);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH 02/20] blockjob: introduce .drain callback for jobs
  2016-10-17 13:54 [Qemu-devel] [PATCH v2 00/20] dataplane: remove RFifoLock Paolo Bonzini
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 01/20] replication: interrupt failover if the main device is closed Paolo Bonzini
@ 2016-10-17 13:54 ` Paolo Bonzini
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 03/20] mirror: use bdrv_drained_begin/bdrv_drained_end Paolo Bonzini
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-10-17 13:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, famz, kwolf, stefanha

This is required to decouple block jobs from running in an
AioContext.  With multiqueue block devices, a BlockDriverState
does not really belong to a single AioContext.

The solution is to first wait until all I/O operations are
complete; then loop in the main thread for the block job to
complete entirely.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: save s->target around ref/drain/unref sequence [Stefan]

 block/backup.c           | 17 +++++++++++++++++
 block/mirror.c           | 35 +++++++++++++++++++++++++++--------
 blockjob.c               | 37 ++++++++++++++++++++-----------------
 include/block/blockjob.h |  7 +++++++
 4 files changed, 71 insertions(+), 25 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 582bd0f..130cb3f 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -300,6 +300,21 @@ void backup_cow_request_end(CowRequest *req)
     cow_request_end(req);
 }
 
+static void backup_drain(BlockJob *job)
+{
+    BackupBlockJob *s = container_of(job, BackupBlockJob, common);
+
+    /* Need to keep a reference in case blk_drain triggers execution
+     * of backup_complete...
+     */
+    if (s->target) {
+        BlockBackend *target = s->target;
+        blk_ref(target);
+        blk_drain(target);
+        blk_unref(target);
+    }
+}
+
 static const BlockJobDriver backup_job_driver = {
     .instance_size          = sizeof(BackupBlockJob),
     .job_type               = BLOCK_JOB_TYPE_BACKUP,
@@ -307,6 +322,7 @@ static const BlockJobDriver backup_job_driver = {
     .commit                 = backup_commit,
     .abort                  = backup_abort,
     .attached_aio_context   = backup_attached_aio_context,
+    .drain                  = backup_drain,
 };
 
 static BlockErrorAction backup_error_action(BackupBlockJob *job,
@@ -331,6 +347,7 @@ static void backup_complete(BlockJob *job, void *opaque)
     BackupCompleteData *data = opaque;
 
     blk_unref(s->target);
+    s->target = NULL;
 
     block_job_completed(job, data->ret);
     g_free(data);
diff --git a/block/mirror.c b/block/mirror.c
index f9d1fec..77d188a 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -469,7 +469,11 @@ static void mirror_free_init(MirrorBlockJob *s)
     }
 }
 
-static void mirror_drain(MirrorBlockJob *s)
+/* This is also used for the .pause callback. There is no matching
+ * mirror_resume() because mirror_run() will begin iterating again
+ * when the job is resumed.
+ */
+static void mirror_wait_for_all_io(MirrorBlockJob *s)
 {
     while (s->in_flight > 0) {
         mirror_wait_for_io(s);
@@ -528,6 +532,7 @@ static void mirror_exit(BlockJob *job, void *opaque)
     g_free(s->replaces);
     bdrv_op_unblock_all(target_bs, s->common.blocker);
     blk_unref(s->target);
+    s->target = NULL;
     block_job_completed(&s->common, data->ret);
     g_free(data);
     bdrv_drained_end(src);
@@ -582,7 +587,7 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
             sector_num += nb_sectors;
         }
 
-        mirror_drain(s);
+        mirror_wait_for_all_io(s);
     }
 
     /* First part, loop on the sectors and initialize the dirty bitmap.  */
@@ -786,7 +791,7 @@ immediate_exit:
          * the target is a copy of the source.
          */
         assert(ret < 0 || (!s->synced && block_job_is_cancelled(&s->common)));
-        mirror_drain(s);
+        mirror_wait_for_all_io(s);
     }
 
     assert(s->in_flight == 0);
@@ -870,14 +875,11 @@ static void mirror_complete(BlockJob *job, Error **errp)
     block_job_enter(&s->common);
 }
 
-/* There is no matching mirror_resume() because mirror_run() will begin
- * iterating again when the job is resumed.
- */
-static void coroutine_fn mirror_pause(BlockJob *job)
+static void mirror_pause(BlockJob *job)
 {
     MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
 
-    mirror_drain(s);
+    mirror_wait_for_all_io(s);
 }
 
 static void mirror_attached_aio_context(BlockJob *job, AioContext *new_context)
@@ -887,6 +889,21 @@ static void mirror_attached_aio_context(BlockJob *job, AioContext *new_context)
     blk_set_aio_context(s->target, new_context);
 }
 
+static void mirror_drain(BlockJob *job)
+{
+    MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
+
+    /* Need to keep a reference in case blk_drain triggers execution
+     * of mirror_complete...
+     */
+    if (s->target) {
+        BlockBackend *target = s->target;
+        blk_ref(target);
+        blk_drain(target);
+        blk_unref(target);
+    }
+}
+
 static const BlockJobDriver mirror_job_driver = {
     .instance_size          = sizeof(MirrorBlockJob),
     .job_type               = BLOCK_JOB_TYPE_MIRROR,
@@ -894,6 +911,7 @@ static const BlockJobDriver mirror_job_driver = {
     .complete               = mirror_complete,
     .pause                  = mirror_pause,
     .attached_aio_context   = mirror_attached_aio_context,
+    .drain                  = mirror_drain,
 };
 
 static const BlockJobDriver commit_active_job_driver = {
@@ -903,6 +921,7 @@ static const BlockJobDriver commit_active_job_driver = {
     .complete               = mirror_complete,
     .pause                  = mirror_pause,
     .attached_aio_context   = mirror_attached_aio_context,
+    .drain                  = mirror_drain,
 };
 
 static void mirror_start_job(const char *job_id, BlockDriverState *bs,
diff --git a/blockjob.c b/blockjob.c
index 43fecbe..7c88b30 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -74,17 +74,6 @@ BlockJob *block_job_get(const char *id)
     return NULL;
 }
 
-/* Normally the job runs in its BlockBackend's AioContext.  The exception is
- * block_job_defer_to_main_loop() where it runs in the QEMU main loop.  Code
- * that supports both cases uses this helper function.
- */
-static AioContext *block_job_get_aio_context(BlockJob *job)
-{
-    return job->deferred_to_main_loop ?
-           qemu_get_aio_context() :
-           blk_get_aio_context(job->blk);
-}
-
 static void block_job_attached_aio_context(AioContext *new_context,
                                            void *opaque)
 {
@@ -97,6 +86,17 @@ static void block_job_attached_aio_context(AioContext *new_context,
     block_job_resume(job);
 }
 
+static void block_job_drain(BlockJob *job)
+{
+    /* If job is !job->busy this kicks it into the next pause point. */
+    block_job_enter(job);
+
+    blk_drain(job->blk);
+    if (job->driver->drain) {
+        job->driver->drain(job);
+    }
+}
+
 static void block_job_detach_aio_context(void *opaque)
 {
     BlockJob *job = opaque;
@@ -106,12 +106,8 @@ static void block_job_detach_aio_context(void *opaque)
 
     block_job_pause(job);
 
-    if (!job->paused) {
-        /* If job is !job->busy this kicks it into the next pause point. */
-        block_job_enter(job);
-    }
     while (!job->paused && !job->completed) {
-        aio_poll(block_job_get_aio_context(job), true);
+        block_job_drain(job);
     }
 
     block_job_unref(job);
@@ -413,14 +409,21 @@ static int block_job_finish_sync(BlockJob *job,
     assert(blk_bs(job->blk)->job == job);
 
     block_job_ref(job);
+
     finish(job, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         block_job_unref(job);
         return -EBUSY;
     }
+    /* block_job_drain calls block_job_enter, and it should be enough to
+     * induce progress until the job completes or moves to the main thread.
+    */
+    while (!job->deferred_to_main_loop && !job->completed) {
+        block_job_drain(job);
+    }
     while (!job->completed) {
-        aio_poll(block_job_get_aio_context(job), true);
+        aio_poll(qemu_get_aio_context(), true);
     }
     ret = (job->cancelled && job->ret == 0) ? -ECANCELED : job->ret;
     block_job_unref(job);
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 4ddb4ae..2bb39f4 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -92,6 +92,13 @@ typedef struct BlockJobDriver {
      * besides job->blk to the new AioContext.
      */
     void (*attached_aio_context)(BlockJob *job, AioContext *new_context);
+
+    /*
+     * If the callback is not NULL, it will be invoked when the job has to be
+     * synchronously cancelled or completed; it should drain BlockDriverStates
+     * as required to ensure progress.
+     */
+    void (*drain)(BlockJob *job);
 } BlockJobDriver;
 
 /**
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH 03/20] mirror: use bdrv_drained_begin/bdrv_drained_end
  2016-10-17 13:54 [Qemu-devel] [PATCH v2 00/20] dataplane: remove RFifoLock Paolo Bonzini
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 01/20] replication: interrupt failover if the main device is closed Paolo Bonzini
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 02/20] blockjob: introduce .drain callback for jobs Paolo Bonzini
@ 2016-10-17 13:54 ` Paolo Bonzini
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 04/20] block: add BDS field to count in-flight requests Paolo Bonzini
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-10-17 13:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, famz, kwolf, stefanha

Ensure that there are no changes between the last check to
bdrv_get_dirty_count and the switch to the target.

There is already a bdrv_drained_end call, we only need to ensure
that bdrv_drained_begin is not called twice.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/mirror.c | 35 +++++++++++++++++++++++------------
 1 file changed, 23 insertions(+), 12 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 77d188a..45f9248 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -622,6 +622,7 @@ static void coroutine_fn mirror_run(void *opaque)
     MirrorExitData *data;
     BlockDriverState *bs = blk_bs(s->common.blk);
     BlockDriverState *target_bs = blk_bs(s->target);
+    bool need_drain = true;
     int64_t length;
     BlockDriverInfo bdi;
     char backing_filename[2]; /* we only need 2 characters because we are only
@@ -756,11 +757,26 @@ static void coroutine_fn mirror_run(void *opaque)
              * source has dirty data to copy!
              *
              * Note that I/O can be submitted by the guest while
-             * mirror_populate runs.
+             * mirror_populate runs, so pause it now.  Before deciding
+             * whether to switch to target check one last time if I/O has
+             * come in the meanwhile, and if not flush the data to disk.
              */
             trace_mirror_before_drain(s, cnt);
-            bdrv_co_drain(bs);
+
+            bdrv_drained_begin(bs);
             cnt = bdrv_get_dirty_count(s->dirty_bitmap);
+            if (cnt > 0) {
+                bdrv_drained_end(bs);
+                continue;
+            }
+
+            /* The two disks are in sync.  Exit and report successful
+             * completion.
+             */
+            assert(QLIST_EMPTY(&bs->tracked_requests));
+            s->common.cancelled = false;
+            need_drain = false;
+            break;
         }
 
         ret = 0;
@@ -773,13 +789,6 @@ static void coroutine_fn mirror_run(void *opaque)
         } else if (!should_complete) {
             delay_ns = (s->in_flight == 0 && cnt == 0 ? SLICE_TIME : 0);
             block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, delay_ns);
-        } else if (cnt == 0) {
-            /* The two disks are in sync.  Exit and report successful
-             * completion.
-             */
-            assert(QLIST_EMPTY(&bs->tracked_requests));
-            s->common.cancelled = false;
-            break;
         }
         s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
     }
@@ -791,6 +800,7 @@ immediate_exit:
          * the target is a copy of the source.
          */
         assert(ret < 0 || (!s->synced && block_job_is_cancelled(&s->common)));
+        assert(need_drain);
         mirror_wait_for_all_io(s);
     }
 
@@ -802,9 +812,10 @@ immediate_exit:
 
     data = g_malloc(sizeof(*data));
     data->ret = ret;
-    /* Before we switch to target in mirror_exit, make sure data doesn't
-     * change. */
-    bdrv_drained_begin(bs);
+
+    if (need_drain) {
+        bdrv_drained_begin(bs);
+    }
     block_job_defer_to_main_loop(&s->common, mirror_exit, data);
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH 04/20] block: add BDS field to count in-flight requests
  2016-10-17 13:54 [Qemu-devel] [PATCH v2 00/20] dataplane: remove RFifoLock Paolo Bonzini
                   ` (2 preceding siblings ...)
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 03/20] mirror: use bdrv_drained_begin/bdrv_drained_end Paolo Bonzini
@ 2016-10-17 13:54 ` Paolo Bonzini
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 05/20] block: change drain to look only at one child at a time Paolo Bonzini
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-10-17 13:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, famz, kwolf, stefanha

Unlike tracked_requests, this field also counts throttled requests,
and remains non-zero if an AIO operation needs a BH to be "really"
completed.

With this change, it is no longer necessary to have a dummy
BdrvTrackedRequest for requests that are never serialising, and
it is no longer necessary to poll the AioContext once after
bdrv_requests_pending(bs) returns false.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/block-backend.c     | 23 +++++++++++---
 block/io.c                | 81 ++++++++++++++++++++++++++++++++---------------
 include/block/block_int.h | 10 +++---
 3 files changed, 81 insertions(+), 33 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 1a724a8..234df1e 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -799,20 +799,25 @@ int coroutine_fn blk_co_preadv(BlockBackend *blk, int64_t offset,
                                BdrvRequestFlags flags)
 {
     int ret;
+    BlockDriverState *bs = blk_bs(blk);
 
-    trace_blk_co_preadv(blk, blk_bs(blk), offset, bytes, flags);
+    trace_blk_co_preadv(blk, bs, offset, bytes, flags);
 
     ret = blk_check_byte_request(blk, offset, bytes);
     if (ret < 0) {
         return ret;
     }
 
+    bdrv_inc_in_flight(bs);
+
     /* throttling disk I/O */
     if (blk->public.throttle_state) {
         throttle_group_co_io_limits_intercept(blk, bytes, false);
     }
 
-    return bdrv_co_preadv(blk->root, offset, bytes, qiov, flags);
+    ret = bdrv_co_preadv(blk->root, offset, bytes, qiov, flags);
+    bdrv_dec_in_flight(bs);
+    return ret;
 }
 
 int coroutine_fn blk_co_pwritev(BlockBackend *blk, int64_t offset,
@@ -820,14 +825,17 @@ int coroutine_fn blk_co_pwritev(BlockBackend *blk, int64_t offset,
                                 BdrvRequestFlags flags)
 {
     int ret;
+    BlockDriverState *bs = blk_bs(blk);
 
-    trace_blk_co_pwritev(blk, blk_bs(blk), offset, bytes, flags);
+    trace_blk_co_pwritev(blk, bs, offset, bytes, flags);
 
     ret = blk_check_byte_request(blk, offset, bytes);
     if (ret < 0) {
         return ret;
     }
 
+    bdrv_inc_in_flight(bs);
+
     /* throttling disk I/O */
     if (blk->public.throttle_state) {
         throttle_group_co_io_limits_intercept(blk, bytes, true);
@@ -837,7 +845,9 @@ int coroutine_fn blk_co_pwritev(BlockBackend *blk, int64_t offset,
         flags |= BDRV_REQ_FUA;
     }
 
-    return bdrv_co_pwritev(blk->root, offset, bytes, qiov, flags);
+    ret = bdrv_co_pwritev(blk->root, offset, bytes, qiov, flags);
+    bdrv_dec_in_flight(bs);
+    return ret;
 }
 
 typedef struct BlkRwCo {
@@ -930,6 +940,8 @@ int blk_make_zero(BlockBackend *blk, BdrvRequestFlags flags)
 static void error_callback_bh(void *opaque)
 {
     struct BlockBackendAIOCB *acb = opaque;
+
+    bdrv_dec_in_flight(acb->common.bs);
     acb->common.cb(acb->common.opaque, acb->ret);
     qemu_aio_unref(acb);
 }
@@ -940,6 +952,7 @@ BlockAIOCB *blk_abort_aio_request(BlockBackend *blk,
 {
     struct BlockBackendAIOCB *acb;
 
+    bdrv_inc_in_flight(blk_bs(blk));
     acb = blk_aio_get(&block_backend_aiocb_info, blk, cb, opaque);
     acb->blk = blk;
     acb->ret = ret;
@@ -962,6 +975,7 @@ static const AIOCBInfo blk_aio_em_aiocb_info = {
 static void blk_aio_complete(BlkAioEmAIOCB *acb)
 {
     if (acb->has_returned) {
+        bdrv_dec_in_flight(acb->common.bs);
         acb->common.cb(acb->common.opaque, acb->rwco.ret);
         qemu_aio_unref(acb);
     }
@@ -983,6 +997,7 @@ static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int bytes,
     BlkAioEmAIOCB *acb;
     Coroutine *co;
 
+    bdrv_inc_in_flight(blk_bs(blk));
     acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
     acb->rwco = (BlkRwCo) {
         .blk    = blk,
diff --git a/block/io.c b/block/io.c
index b136c89..8d46d8b 100644
--- a/block/io.c
+++ b/block/io.c
@@ -143,7 +143,7 @@ bool bdrv_requests_pending(BlockDriverState *bs)
 {
     BdrvChild *child;
 
-    if (!QLIST_EMPTY(&bs->tracked_requests)) {
+    if (atomic_read(&bs->in_flight)) {
         return true;
     }
 
@@ -176,12 +176,9 @@ typedef struct {
 
 static void bdrv_drain_poll(BlockDriverState *bs)
 {
-    bool busy = true;
-
-    while (busy) {
+    while (bdrv_requests_pending(bs)) {
         /* Keep iterating */
-        busy = bdrv_requests_pending(bs);
-        busy |= aio_poll(bdrv_get_aio_context(bs), busy);
+        aio_poll(bdrv_get_aio_context(bs), true);
     }
 }
 
@@ -189,8 +186,10 @@ static void bdrv_co_drain_bh_cb(void *opaque)
 {
     BdrvCoDrainData *data = opaque;
     Coroutine *co = data->co;
+    BlockDriverState *bs = data->bs;
 
-    bdrv_drain_poll(data->bs);
+    bdrv_dec_in_flight(bs);
+    bdrv_drain_poll(bs);
     data->done = true;
     qemu_coroutine_enter(co);
 }
@@ -209,6 +208,7 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs)
         .bs = bs,
         .done = false,
     };
+    bdrv_inc_in_flight(bs);
     aio_bh_schedule_oneshot(bdrv_get_aio_context(bs),
                             bdrv_co_drain_bh_cb, &data);
 
@@ -279,7 +279,7 @@ void bdrv_drain(BlockDriverState *bs)
 void bdrv_drain_all(void)
 {
     /* Always run first iteration so any pending completion BHs run */
-    bool busy = true;
+    bool waited = true;
     BlockDriverState *bs;
     BdrvNextIterator it;
     BlockJob *job = NULL;
@@ -313,8 +313,8 @@ void bdrv_drain_all(void)
      * request completion.  Therefore we must keep looping until there was no
      * more activity rather than simply draining each device independently.
      */
-    while (busy) {
-        busy = false;
+    while (waited) {
+        waited = false;
 
         for (ctx = aio_ctxs; ctx != NULL; ctx = ctx->next) {
             AioContext *aio_context = ctx->data;
@@ -323,12 +323,11 @@ void bdrv_drain_all(void)
             for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
                 if (aio_context == bdrv_get_aio_context(bs)) {
                     if (bdrv_requests_pending(bs)) {
-                        busy = true;
-                        aio_poll(aio_context, busy);
+                        aio_poll(aio_context, true);
+                        waited = true;
                     }
                 }
             }
-            busy |= aio_poll(aio_context, false);
             aio_context_release(aio_context);
         }
     }
@@ -476,6 +475,16 @@ static bool tracked_request_overlaps(BdrvTrackedRequest *req,
     return true;
 }
 
+void bdrv_inc_in_flight(BlockDriverState *bs)
+{
+    atomic_inc(&bs->in_flight);
+}
+
+void bdrv_dec_in_flight(BlockDriverState *bs)
+{
+    atomic_dec(&bs->in_flight);
+}
+
 static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
 {
     BlockDriverState *bs = self->bs;
@@ -1097,6 +1106,8 @@ int coroutine_fn bdrv_co_preadv(BdrvChild *child,
         return ret;
     }
 
+    bdrv_inc_in_flight(bs);
+
     /* Don't do copy-on-read if we read data before write operation */
     if (bs->copy_on_read && !(flags & BDRV_REQ_NO_SERIALISING)) {
         flags |= BDRV_REQ_COPY_ON_READ;
@@ -1132,6 +1143,7 @@ int coroutine_fn bdrv_co_preadv(BdrvChild *child,
                               use_local_qiov ? &local_qiov : qiov,
                               flags);
     tracked_request_end(&req);
+    bdrv_dec_in_flight(bs);
 
     if (use_local_qiov) {
         qemu_iovec_destroy(&local_qiov);
@@ -1480,6 +1492,7 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
         return ret;
     }
 
+    bdrv_inc_in_flight(bs);
     /*
      * Align write if necessary by performing a read-modify-write cycle.
      * Pad qiov with the read parts and be sure to have a tracked request not
@@ -1581,6 +1594,7 @@ fail:
     qemu_vfree(tail_buf);
 out:
     tracked_request_end(&req);
+    bdrv_dec_in_flight(bs);
     return ret;
 }
 
@@ -1705,17 +1719,19 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
     }
 
     *file = NULL;
+    bdrv_inc_in_flight(bs);
     ret = bs->drv->bdrv_co_get_block_status(bs, sector_num, nb_sectors, pnum,
                                             file);
     if (ret < 0) {
         *pnum = 0;
-        return ret;
+        goto out;
     }
 
     if (ret & BDRV_BLOCK_RAW) {
         assert(ret & BDRV_BLOCK_OFFSET_VALID);
-        return bdrv_get_block_status(bs->file->bs, ret >> BDRV_SECTOR_BITS,
-                                     *pnum, pnum, file);
+        ret = bdrv_get_block_status(bs->file->bs, ret >> BDRV_SECTOR_BITS,
+                                    *pnum, pnum, file);
+        goto out;
     }
 
     if (ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO)) {
@@ -1757,6 +1773,8 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
         }
     }
 
+out:
+    bdrv_dec_in_flight(bs);
     return ret;
 }
 
@@ -2102,6 +2120,7 @@ static const AIOCBInfo bdrv_em_co_aiocb_info = {
 static void bdrv_co_complete(BlockAIOCBCoroutine *acb)
 {
     if (!acb->need_bh) {
+        bdrv_dec_in_flight(acb->common.bs);
         acb->common.cb(acb->common.opaque, acb->req.error);
         qemu_aio_unref(acb);
     }
@@ -2152,6 +2171,9 @@ static BlockAIOCB *bdrv_co_aio_prw_vector(BdrvChild *child,
     Coroutine *co;
     BlockAIOCBCoroutine *acb;
 
+    /* Matched by bdrv_co_complete's bdrv_dec_in_flight.  */
+    bdrv_inc_in_flight(child->bs);
+
     acb = qemu_aio_get(&bdrv_em_co_aiocb_info, child->bs, cb, opaque);
     acb->child = child;
     acb->need_bh = true;
@@ -2185,6 +2207,9 @@ BlockAIOCB *bdrv_aio_flush(BlockDriverState *bs,
     Coroutine *co;
     BlockAIOCBCoroutine *acb;
 
+    /* Matched by bdrv_co_complete's bdrv_dec_in_flight.  */
+    bdrv_inc_in_flight(bs);
+
     acb = qemu_aio_get(&bdrv_em_co_aiocb_info, bs, cb, opaque);
     acb->need_bh = true;
     acb->req.error = -EINPROGRESS;
@@ -2213,6 +2238,9 @@ BlockAIOCB *bdrv_aio_pdiscard(BlockDriverState *bs, int64_t offset, int count,
 
     trace_bdrv_aio_pdiscard(bs, offset, count, opaque);
 
+    /* Matched by bdrv_co_complete's bdrv_dec_in_flight.  */
+    bdrv_inc_in_flight(bs);
+
     acb = qemu_aio_get(&bdrv_em_co_aiocb_info, bs, cb, opaque);
     acb->need_bh = true;
     acb->req.error = -EINPROGRESS;
@@ -2273,23 +2301,22 @@ static void coroutine_fn bdrv_flush_co_entry(void *opaque)
 int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
 {
     int ret;
-    BdrvTrackedRequest req;
 
     if (!bs || !bdrv_is_inserted(bs) || bdrv_is_read_only(bs) ||
         bdrv_is_sg(bs)) {
         return 0;
     }
 
-    tracked_request_begin(&req, bs, 0, 0, BDRV_TRACKED_FLUSH);
+    bdrv_inc_in_flight(bs);
 
     int current_gen = bs->write_gen;
 
     /* Wait until any previous flushes are completed */
-    while (bs->active_flush_req != NULL) {
+    while (bs->active_flush_req) {
         qemu_co_queue_wait(&bs->flush_queue);
     }
 
-    bs->active_flush_req = &req;
+    bs->active_flush_req = true;
 
     /* Write back all layers by calling one driver function */
     if (bs->drv->bdrv_co_flush) {
@@ -2359,11 +2386,11 @@ flush_parent:
 out:
     /* Notify any pending flushes that we have completed */
     bs->flushed_gen = current_gen;
-    bs->active_flush_req = NULL;
+    bs->active_flush_req = false;
     /* Return value is ignored - it's ok if wait queue is empty */
     qemu_co_queue_next(&bs->flush_queue);
 
-    tracked_request_end(&req);
+    bdrv_dec_in_flight(bs);
     return ret;
 }
 
@@ -2446,6 +2473,7 @@ int coroutine_fn bdrv_co_pdiscard(BlockDriverState *bs, int64_t offset,
         return 0;
     }
 
+    bdrv_inc_in_flight(bs);
     tracked_request_begin(&req, bs, offset, count, BDRV_TRACKED_DISCARD);
 
     ret = notifier_with_return_list_notify(&bs->before_write_notifiers, &req);
@@ -2492,6 +2520,7 @@ out:
     bdrv_set_dirty(bs, req.offset >> BDRV_SECTOR_BITS,
                    req.bytes >> BDRV_SECTOR_BITS);
     tracked_request_end(&req);
+    bdrv_dec_in_flight(bs);
     return ret;
 }
 
@@ -2524,13 +2553,12 @@ int bdrv_pdiscard(BlockDriverState *bs, int64_t offset, int count)
 static int bdrv_co_do_ioctl(BlockDriverState *bs, int req, void *buf)
 {
     BlockDriver *drv = bs->drv;
-    BdrvTrackedRequest tracked_req;
     CoroutineIOCompletion co = {
         .coroutine = qemu_coroutine_self(),
     };
     BlockAIOCB *acb;
 
-    tracked_request_begin(&tracked_req, bs, 0, 0, BDRV_TRACKED_IOCTL);
+    bdrv_inc_in_flight(bs);
     if (!drv || !drv->bdrv_aio_ioctl) {
         co.ret = -ENOTSUP;
         goto out;
@@ -2543,7 +2571,7 @@ static int bdrv_co_do_ioctl(BlockDriverState *bs, int req, void *buf)
     }
     qemu_coroutine_yield();
 out:
-    tracked_request_end(&tracked_req);
+    bdrv_dec_in_flight(bs);
     return co.ret;
 }
 
@@ -2600,6 +2628,9 @@ BlockAIOCB *bdrv_aio_ioctl(BlockDriverState *bs,
                                             bs, cb, opaque);
     Coroutine *co;
 
+    /* Matched by bdrv_co_complete's bdrv_dec_in_flight.  */
+    bdrv_inc_in_flight(bs);
+
     acb->need_bh = true;
     acb->req.error = -EINPROGRESS;
     acb->req.req = req;
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 3e79228..5a7308b 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -62,8 +62,6 @@
 enum BdrvTrackedRequestType {
     BDRV_TRACKED_READ,
     BDRV_TRACKED_WRITE,
-    BDRV_TRACKED_FLUSH,
-    BDRV_TRACKED_IOCTL,
     BDRV_TRACKED_DISCARD,
 };
 
@@ -443,7 +441,7 @@ struct BlockDriverState {
                          note this is a reference count */
 
     CoQueue flush_queue;            /* Serializing flush queue */
-    BdrvTrackedRequest *active_flush_req; /* Flush request in flight */
+    bool active_flush_req;          /* Flush request in flight? */
     unsigned int write_gen;         /* Current data generation */
     unsigned int flushed_gen;       /* Flushed write generation */
 
@@ -471,7 +469,8 @@ struct BlockDriverState {
     /* Callback before write request is processed */
     NotifierWithReturnList before_write_notifiers;
 
-    /* number of in-flight serialising requests */
+    /* number of in-flight requests; overall and serialising */
+    unsigned int in_flight;
     unsigned int serialising_in_flight;
 
     /* Offset after the highest byte written to */
@@ -785,6 +784,9 @@ bool bdrv_requests_pending(BlockDriverState *bs);
 void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap **out);
 void bdrv_undo_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap *in);
 
+void bdrv_inc_in_flight(BlockDriverState *bs);
+void bdrv_dec_in_flight(BlockDriverState *bs);
+
 void blockdev_close_all_bdrv_states(void);
 
 #endif /* BLOCK_INT_H */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH 05/20] block: change drain to look only at one child at a time
  2016-10-17 13:54 [Qemu-devel] [PATCH v2 00/20] dataplane: remove RFifoLock Paolo Bonzini
                   ` (3 preceding siblings ...)
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 04/20] block: add BDS field to count in-flight requests Paolo Bonzini
@ 2016-10-17 13:54 ` Paolo Bonzini
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 06/20] qed: Implement .bdrv_drain Paolo Bonzini
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-10-17 13:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, famz, kwolf, stefanha

bdrv_requests_pending is checking children to also wait until internal
requests (such as metadata writes) have completed.  However, checking
children is in general overkill.  Children requests can be of two kinds:

- requests caused by an operation on bs, e.g. a bdrv_aio_write to bs
causing a write to bs->file->bs.  In this case, the parent's in_flight
count will always be incremented by at least one for every request in
the child.

- asynchronous metadata writes or flushes.  Such writes can be started
even if bs's in_flight count is zero, but not after the .bdrv_drain
callback has been invoked.

This patch therefore changes bdrv_drain to finish I/O in the parent
(after which the parent's in_flight will be locked to zero), call
bdrv_drain (after which the parent will not generate I/O on the child
anymore), and then wait for internal I/O in the children to complete.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/io.c | 47 ++++++++++++++++++++++++++---------------------
 1 file changed, 26 insertions(+), 21 deletions(-)

diff --git a/block/io.c b/block/io.c
index 8d46d8b..afec968 100644
--- a/block/io.c
+++ b/block/io.c
@@ -156,16 +156,33 @@ bool bdrv_requests_pending(BlockDriverState *bs)
     return false;
 }
 
-static void bdrv_drain_recurse(BlockDriverState *bs)
+static bool bdrv_drain_poll(BlockDriverState *bs)
+{
+    bool waited = false;
+
+    while (atomic_read(&bs->in_flight) > 0) {
+        aio_poll(bdrv_get_aio_context(bs), true);
+        waited = true;
+    }
+    return waited;
+}
+
+static bool bdrv_drain_recurse(BlockDriverState *bs)
 {
     BdrvChild *child;
+    bool waited;
+
+    waited = bdrv_drain_poll(bs);
 
     if (bs->drv && bs->drv->bdrv_drain) {
         bs->drv->bdrv_drain(bs);
     }
+
     QLIST_FOREACH(child, &bs->children, next) {
-        bdrv_drain_recurse(child->bs);
+        waited |= bdrv_drain_recurse(child->bs);
     }
+
+    return waited;
 }
 
 typedef struct {
@@ -174,14 +191,6 @@ typedef struct {
     bool done;
 } BdrvCoDrainData;
 
-static void bdrv_drain_poll(BlockDriverState *bs)
-{
-    while (bdrv_requests_pending(bs)) {
-        /* Keep iterating */
-        aio_poll(bdrv_get_aio_context(bs), true);
-    }
-}
-
 static void bdrv_co_drain_bh_cb(void *opaque)
 {
     BdrvCoDrainData *data = opaque;
@@ -189,7 +198,7 @@ static void bdrv_co_drain_bh_cb(void *opaque)
     BlockDriverState *bs = data->bs;
 
     bdrv_dec_in_flight(bs);
-    bdrv_drain_poll(bs);
+    bdrv_drained_begin(bs);
     data->done = true;
     qemu_coroutine_enter(co);
 }
@@ -220,6 +229,11 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs)
 
 void bdrv_drained_begin(BlockDriverState *bs)
 {
+    if (qemu_in_coroutine()) {
+        bdrv_co_yield_to_drain(bs);
+        return;
+    }
+
     if (!bs->quiesce_counter++) {
         aio_disable_external(bdrv_get_aio_context(bs));
         bdrv_parent_drained_begin(bs);
@@ -227,11 +241,6 @@ void bdrv_drained_begin(BlockDriverState *bs)
 
     bdrv_io_unplugged_begin(bs);
     bdrv_drain_recurse(bs);
-    if (qemu_in_coroutine()) {
-        bdrv_co_yield_to_drain(bs);
-    } else {
-        bdrv_drain_poll(bs);
-    }
     bdrv_io_unplugged_end(bs);
 }
 
@@ -299,7 +308,6 @@ void bdrv_drain_all(void)
         aio_context_acquire(aio_context);
         bdrv_parent_drained_begin(bs);
         bdrv_io_unplugged_begin(bs);
-        bdrv_drain_recurse(bs);
         aio_context_release(aio_context);
 
         if (!g_slist_find(aio_ctxs, aio_context)) {
@@ -322,10 +330,7 @@ void bdrv_drain_all(void)
             aio_context_acquire(aio_context);
             for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
                 if (aio_context == bdrv_get_aio_context(bs)) {
-                    if (bdrv_requests_pending(bs)) {
-                        aio_poll(aio_context, true);
-                        waited = true;
-                    }
+                    waited |= bdrv_drain_recurse(bs);
                 }
             }
             aio_context_release(aio_context);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH 06/20] qed: Implement .bdrv_drain
  2016-10-17 13:54 [Qemu-devel] [PATCH v2 00/20] dataplane: remove RFifoLock Paolo Bonzini
                   ` (4 preceding siblings ...)
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 05/20] block: change drain to look only at one child at a time Paolo Bonzini
@ 2016-10-17 13:54 ` Paolo Bonzini
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 07/20] block: introduce BDRV_POLL_WHILE Paolo Bonzini
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-10-17 13:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, famz, kwolf, stefanha

From: Fam Zheng <famz@redhat.com>

The "need_check_timer" is used to clear the "NEED_CHECK" flag in the
image header after a grace period once metadata update has finished. To
comply with the bdrv_drain semantics, we should make sure it remains
deleted once .bdrv_drain is called.

The change to qed_need_check_timer_cb is needed because bdrv_qed_drain
is called after s->bs has been drained, and should not operate on it;
instead it should operate on the BdrvChild-ren exclusively.  Doing so
is easy because QED does not have a bdrv_co_flush_to_os callback, hence
all that is needed to flush it is to ensure writes have reached the disk.

Based on commit df9a681dc9a (which however included some unrelated
hunks, possibly due to a merge failure or an overlooked squash).
The patch was reverted because at the time bdrv_qed_drain could call
qed_plug_allocating_write_reqs while an allocating write was queued.
This however is not possible anymore after the previous patch;
.bdrv_drain is only called after all writes have completed at the
QED level, and its purpose is to trigger metadata writes in bs->file.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/qed.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/block/qed.c b/block/qed.c
index 3ee879b..1a7ef0a 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -336,7 +336,7 @@ static void qed_need_check_timer_cb(void *opaque)
     qed_plug_allocating_write_reqs(s);
 
     /* Ensure writes are on disk before clearing flag */
-    bdrv_aio_flush(s->bs, qed_clear_need_check, s);
+    bdrv_aio_flush(s->bs->file->bs, qed_clear_need_check, s);
 }
 
 static void qed_start_need_check_timer(BDRVQEDState *s)
@@ -378,6 +378,19 @@ static void bdrv_qed_attach_aio_context(BlockDriverState *bs,
     }
 }
 
+static void bdrv_qed_drain(BlockDriverState *bs)
+{
+    BDRVQEDState *s = bs->opaque;
+
+    /* Fire the timer immediately in order to start doing I/O as soon as the
+     * header is flushed.
+     */
+    if (s->need_check_timer && timer_pending(s->need_check_timer)) {
+        qed_cancel_need_check_timer(s);
+        qed_need_check_timer_cb(s);
+    }
+}
+
 static int bdrv_qed_open(BlockDriverState *bs, QDict *options, int flags,
                          Error **errp)
 {
@@ -1668,6 +1681,7 @@ static BlockDriver bdrv_qed = {
     .bdrv_check               = bdrv_qed_check,
     .bdrv_detach_aio_context  = bdrv_qed_detach_aio_context,
     .bdrv_attach_aio_context  = bdrv_qed_attach_aio_context,
+    .bdrv_drain               = bdrv_qed_drain,
 };
 
 static void bdrv_qed_init(void)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH 07/20] block: introduce BDRV_POLL_WHILE
  2016-10-17 13:54 [Qemu-devel] [PATCH v2 00/20] dataplane: remove RFifoLock Paolo Bonzini
                   ` (5 preceding siblings ...)
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 06/20] qed: Implement .bdrv_drain Paolo Bonzini
@ 2016-10-17 13:54 ` Paolo Bonzini
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 08/20] nfs: move nfs_set_events out of the while loops Paolo Bonzini
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-10-17 13:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, famz, kwolf, stefanha

We want the BDS event loop to run exclusively in the iothread that
owns the BDS's AioContext.  This macro will provide the synchronization
between the two event loops; for now it just wraps the common idiom
of a while loop around aio_poll.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: make macro name uppercase [Fam]
        leave bdrv_wakeup for later [Stefan]

 block/block-backend.c |  7 +------
 block/io.c            | 42 ++++++------------------------------------
 block/qed-table.c     | 16 ++++------------
 include/block/block.h |  9 +++++++++
 4 files changed, 20 insertions(+), 54 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 234df1e..27cb030 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -878,7 +878,6 @@ static int blk_prw(BlockBackend *blk, int64_t offset, uint8_t *buf,
                    int64_t bytes, CoroutineEntry co_entry,
                    BdrvRequestFlags flags)
 {
-    AioContext *aio_context;
     QEMUIOVector qiov;
     struct iovec iov;
     Coroutine *co;
@@ -900,11 +899,7 @@ static int blk_prw(BlockBackend *blk, int64_t offset, uint8_t *buf,
 
     co = qemu_coroutine_create(co_entry, &rwco);
     qemu_coroutine_enter(co);
-
-    aio_context = blk_get_aio_context(blk);
-    while (rwco.ret == NOT_DONE) {
-        aio_poll(aio_context, true);
-    }
+    BDRV_POLL_WHILE(blk_bs(blk), rwco.ret == NOT_DONE);
 
     return rwco.ret;
 }
diff --git a/block/io.c b/block/io.c
index afec968..f0682dd 100644
--- a/block/io.c
+++ b/block/io.c
@@ -156,23 +156,12 @@ bool bdrv_requests_pending(BlockDriverState *bs)
     return false;
 }
 
-static bool bdrv_drain_poll(BlockDriverState *bs)
-{
-    bool waited = false;
-
-    while (atomic_read(&bs->in_flight) > 0) {
-        aio_poll(bdrv_get_aio_context(bs), true);
-        waited = true;
-    }
-    return waited;
-}
-
 static bool bdrv_drain_recurse(BlockDriverState *bs)
 {
     BdrvChild *child;
     bool waited;
 
-    waited = bdrv_drain_poll(bs);
+    waited = BDRV_POLL_WHILE(bs, atomic_read(&bs->in_flight) > 0);
 
     if (bs->drv && bs->drv->bdrv_drain) {
         bs->drv->bdrv_drain(bs);
@@ -597,13 +586,9 @@ static int bdrv_prwv_co(BdrvChild *child, int64_t offset,
         /* Fast-path if already in coroutine context */
         bdrv_rw_co_entry(&rwco);
     } else {
-        AioContext *aio_context = bdrv_get_aio_context(child->bs);
-
         co = qemu_coroutine_create(bdrv_rw_co_entry, &rwco);
         qemu_coroutine_enter(co);
-        while (rwco.ret == NOT_DONE) {
-            aio_poll(aio_context, true);
-        }
+        BDRV_POLL_WHILE(child->bs, rwco.ret == NOT_DONE);
     }
     return rwco.ret;
 }
@@ -1845,14 +1830,10 @@ int64_t bdrv_get_block_status_above(BlockDriverState *bs,
         /* Fast-path if already in coroutine context */
         bdrv_get_block_status_above_co_entry(&data);
     } else {
-        AioContext *aio_context = bdrv_get_aio_context(bs);
-
         co = qemu_coroutine_create(bdrv_get_block_status_above_co_entry,
                                    &data);
         qemu_coroutine_enter(co);
-        while (!data.done) {
-            aio_poll(aio_context, true);
-        }
+        BDRV_POLL_WHILE(bs, !data.done);
     }
     return data.ret;
 }
@@ -2411,13 +2392,9 @@ int bdrv_flush(BlockDriverState *bs)
         /* Fast-path if already in coroutine context */
         bdrv_flush_co_entry(&flush_co);
     } else {
-        AioContext *aio_context = bdrv_get_aio_context(bs);
-
         co = qemu_coroutine_create(bdrv_flush_co_entry, &flush_co);
         qemu_coroutine_enter(co);
-        while (flush_co.ret == NOT_DONE) {
-            aio_poll(aio_context, true);
-        }
+        BDRV_POLL_WHILE(bs, flush_co.ret == NOT_DONE);
     }
 
     return flush_co.ret;
@@ -2543,13 +2520,9 @@ int bdrv_pdiscard(BlockDriverState *bs, int64_t offset, int count)
         /* Fast-path if already in coroutine context */
         bdrv_pdiscard_co_entry(&rwco);
     } else {
-        AioContext *aio_context = bdrv_get_aio_context(bs);
-
         co = qemu_coroutine_create(bdrv_pdiscard_co_entry, &rwco);
         qemu_coroutine_enter(co);
-        while (rwco.ret == NOT_DONE) {
-            aio_poll(aio_context, true);
-        }
+        BDRV_POLL_WHILE(bs, rwco.ret == NOT_DONE);
     }
 
     return rwco.ret;
@@ -2608,11 +2581,8 @@ int bdrv_ioctl(BlockDriverState *bs, unsigned long int req, void *buf)
         bdrv_co_ioctl_entry(&data);
     } else {
         Coroutine *co = qemu_coroutine_create(bdrv_co_ioctl_entry, &data);
-
         qemu_coroutine_enter(co);
-        while (data.ret == -EINPROGRESS) {
-            aio_poll(bdrv_get_aio_context(bs), true);
-        }
+        BDRV_POLL_WHILE(bs, data.ret == -EINPROGRESS);
     }
     return data.ret;
 }
diff --git a/block/qed-table.c b/block/qed-table.c
index 1a731df..ed443e2 100644
--- a/block/qed-table.c
+++ b/block/qed-table.c
@@ -174,9 +174,7 @@ int qed_read_l1_table_sync(BDRVQEDState *s)
 
     qed_read_table(s, s->header.l1_table_offset,
                    s->l1_table, qed_sync_cb, &ret);
-    while (ret == -EINPROGRESS) {
-        aio_poll(bdrv_get_aio_context(s->bs), true);
-    }
+    BDRV_POLL_WHILE(s->bs, ret == -EINPROGRESS);
 
     return ret;
 }
@@ -195,9 +193,7 @@ int qed_write_l1_table_sync(BDRVQEDState *s, unsigned int index,
     int ret = -EINPROGRESS;
 
     qed_write_l1_table(s, index, n, qed_sync_cb, &ret);
-    while (ret == -EINPROGRESS) {
-        aio_poll(bdrv_get_aio_context(s->bs), true);
-    }
+    BDRV_POLL_WHILE(s->bs, ret == -EINPROGRESS);
 
     return ret;
 }
@@ -268,9 +264,7 @@ int qed_read_l2_table_sync(BDRVQEDState *s, QEDRequest *request, uint64_t offset
     int ret = -EINPROGRESS;
 
     qed_read_l2_table(s, request, offset, qed_sync_cb, &ret);
-    while (ret == -EINPROGRESS) {
-        aio_poll(bdrv_get_aio_context(s->bs), true);
-    }
+    BDRV_POLL_WHILE(s->bs, ret == -EINPROGRESS);
 
     return ret;
 }
@@ -290,9 +284,7 @@ int qed_write_l2_table_sync(BDRVQEDState *s, QEDRequest *request,
     int ret = -EINPROGRESS;
 
     qed_write_l2_table(s, request, index, n, flush, qed_sync_cb, &ret);
-    while (ret == -EINPROGRESS) {
-        aio_poll(bdrv_get_aio_context(s->bs), true);
-    }
+    BDRV_POLL_WHILE(s->bs, ret == -EINPROGRESS);
 
     return ret;
 }
diff --git a/include/block/block.h b/include/block/block.h
index 107c603..3ba9e34 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -340,6 +340,15 @@ void bdrv_drain(BlockDriverState *bs);
 void coroutine_fn bdrv_co_drain(BlockDriverState *bs);
 void bdrv_drain_all(void);
 
+#define BDRV_POLL_WHILE(bs, cond) ({                       \
+    bool waited_ = false;                                  \
+    BlockDriverState *bs_ = (bs);                          \
+    while ((cond)) {                                       \
+        aio_poll(bdrv_get_aio_context(bs_), true);         \
+        waited_ = true;                                    \
+    }                                                      \
+    waited_; })
+
 int bdrv_pdiscard(BlockDriverState *bs, int64_t offset, int count);
 int bdrv_co_pdiscard(BlockDriverState *bs, int64_t offset, int count);
 int bdrv_has_zero_init_1(BlockDriverState *bs);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH 08/20] nfs: move nfs_set_events out of the while loops
  2016-10-17 13:54 [Qemu-devel] [PATCH v2 00/20] dataplane: remove RFifoLock Paolo Bonzini
                   ` (6 preceding siblings ...)
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 07/20] block: introduce BDRV_POLL_WHILE Paolo Bonzini
@ 2016-10-17 13:54 ` Paolo Bonzini
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 09/20] nfs: use BDRV_POLL_WHILE Paolo Bonzini
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-10-17 13:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, famz, kwolf, stefanha

nfs_set_events only needs to be called once before entering the
while loop; afterwards, nfs_process_read and nfs_process_write
take care of it.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/nfs.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/nfs.c b/block/nfs.c
index c3db2ec..c8df8d8 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -149,8 +149,8 @@ static int coroutine_fn nfs_co_readv(BlockDriverState *bs,
         return -ENOMEM;
     }
 
+    nfs_set_events(client);
     while (!task.complete) {
-        nfs_set_events(client);
         qemu_coroutine_yield();
     }
 
@@ -191,8 +191,8 @@ static int coroutine_fn nfs_co_writev(BlockDriverState *bs,
         return -ENOMEM;
     }
 
+    nfs_set_events(client);
     while (!task.complete) {
-        nfs_set_events(client);
         qemu_coroutine_yield();
     }
 
@@ -217,8 +217,8 @@ static int coroutine_fn nfs_co_flush(BlockDriverState *bs)
         return -ENOMEM;
     }
 
+    nfs_set_events(client);
     while (!task.complete) {
-        nfs_set_events(client);
         qemu_coroutine_yield();
     }
 
@@ -513,8 +513,8 @@ static int64_t nfs_get_allocated_file_size(BlockDriverState *bs)
         return -ENOMEM;
     }
 
+    nfs_set_events(client);
     while (!task.complete) {
-        nfs_set_events(client);
         aio_poll(client->aio_context, true);
     }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH 09/20] nfs: use BDRV_POLL_WHILE
  2016-10-17 13:54 [Qemu-devel] [PATCH v2 00/20] dataplane: remove RFifoLock Paolo Bonzini
                   ` (7 preceding siblings ...)
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 08/20] nfs: move nfs_set_events out of the while loops Paolo Bonzini
@ 2016-10-17 13:54 ` Paolo Bonzini
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 10/20] sheepdog: " Paolo Bonzini
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-10-17 13:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, famz, kwolf, stefanha

This will make it possible to use nfs_get_allocated_file_size on
a file that is not in the main AioContext.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: no bdrv_wakeup yet

 block/nfs.c | 46 ++++++++++++++++++++++++++++------------------
 1 file changed, 28 insertions(+), 18 deletions(-)

diff --git a/block/nfs.c b/block/nfs.c
index c8df8d8..7474fbc 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -52,6 +52,7 @@ typedef struct NFSClient {
 } NFSClient;
 
 typedef struct NFSRPC {
+    BlockDriverState *bs;
     int ret;
     int complete;
     QEMUIOVector *iov;
@@ -90,11 +91,12 @@ static void nfs_process_write(void *arg)
     nfs_set_events(client);
 }
 
-static void nfs_co_init_task(NFSClient *client, NFSRPC *task)
+static void nfs_co_init_task(BlockDriverState *bs, NFSRPC *task)
 {
     *task = (NFSRPC) {
         .co             = qemu_coroutine_self(),
-        .client         = client,
+        .bs             = bs,
+        .client         = bs->opaque,
     };
 }
 
@@ -111,6 +113,7 @@ nfs_co_generic_cb(int ret, struct nfs_context *nfs, void *data,
 {
     NFSRPC *task = private_data;
     task->ret = ret;
+    assert(!task->st);
     if (task->ret > 0 && task->iov) {
         if (task->ret <= task->iov->size) {
             qemu_iovec_from_buf(task->iov, 0, data, task->ret);
@@ -118,18 +121,11 @@ nfs_co_generic_cb(int ret, struct nfs_context *nfs, void *data,
             task->ret = -EIO;
         }
     }
-    if (task->ret == 0 && task->st) {
-        memcpy(task->st, data, sizeof(struct stat));
-    }
     if (task->ret < 0) {
         error_report("NFS Error: %s", nfs_get_error(nfs));
     }
-    if (task->co) {
-        aio_bh_schedule_oneshot(task->client->aio_context,
-                                nfs_co_generic_bh_cb, task);
-    } else {
-        task->complete = 1;
-    }
+    aio_bh_schedule_oneshot(task->client->aio_context,
+                            nfs_co_generic_bh_cb, task);
 }
 
 static int coroutine_fn nfs_co_readv(BlockDriverState *bs,
@@ -139,7 +135,7 @@ static int coroutine_fn nfs_co_readv(BlockDriverState *bs,
     NFSClient *client = bs->opaque;
     NFSRPC task;
 
-    nfs_co_init_task(client, &task);
+    nfs_co_init_task(bs, &task);
     task.iov = iov;
 
     if (nfs_pread_async(client->context, client->fh,
@@ -174,7 +170,7 @@ static int coroutine_fn nfs_co_writev(BlockDriverState *bs,
     NFSRPC task;
     char *buf = NULL;
 
-    nfs_co_init_task(client, &task);
+    nfs_co_init_task(bs, &task);
 
     buf = g_try_malloc(nb_sectors * BDRV_SECTOR_SIZE);
     if (nb_sectors && buf == NULL) {
@@ -210,7 +206,7 @@ static int coroutine_fn nfs_co_flush(BlockDriverState *bs)
     NFSClient *client = bs->opaque;
     NFSRPC task;
 
-    nfs_co_init_task(client, &task);
+    nfs_co_init_task(bs, &task);
 
     if (nfs_fsync_async(client->context, client->fh, nfs_co_generic_cb,
                         &task) != 0) {
@@ -496,6 +492,21 @@ static int nfs_has_zero_init(BlockDriverState *bs)
     return client->has_zero_init;
 }
 
+static void
+nfs_get_allocated_file_size_cb(int ret, struct nfs_context *nfs, void *data,
+                               void *private_data)
+{
+    NFSRPC *task = private_data;
+    task->ret = ret;
+    if (task->ret == 0) {
+        memcpy(task->st, data, sizeof(struct stat));
+    }
+    if (task->ret < 0) {
+        error_report("NFS Error: %s", nfs_get_error(nfs));
+    }
+    task->complete = 1;
+}
+
 static int64_t nfs_get_allocated_file_size(BlockDriverState *bs)
 {
     NFSClient *client = bs->opaque;
@@ -507,16 +518,15 @@ static int64_t nfs_get_allocated_file_size(BlockDriverState *bs)
         return client->st_blocks * 512;
     }
 
+    task.bs = bs;
     task.st = &st;
-    if (nfs_fstat_async(client->context, client->fh, nfs_co_generic_cb,
+    if (nfs_fstat_async(client->context, client->fh, nfs_get_allocated_file_size_cb,
                         &task) != 0) {
         return -ENOMEM;
     }
 
     nfs_set_events(client);
-    while (!task.complete) {
-        aio_poll(client->aio_context, true);
-    }
+    BDRV_POLL_WHILE(bs, !task.complete);
 
     return (task.ret < 0 ? task.ret : st.st_blocks * 512);
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH 10/20] sheepdog: use BDRV_POLL_WHILE
  2016-10-17 13:54 [Qemu-devel] [PATCH v2 00/20] dataplane: remove RFifoLock Paolo Bonzini
                   ` (8 preceding siblings ...)
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 09/20] nfs: use BDRV_POLL_WHILE Paolo Bonzini
@ 2016-10-17 13:54 ` Paolo Bonzini
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 11/20] aio: introduce qemu_get_current_aio_context Paolo Bonzini
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-10-17 13:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, famz, kwolf, stefanha

This is important when the sheepdog driver works on a BlockDriverState
that is attached to an I/O thread other than the main thread.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: no bdrv_wakeup yet

 block/sheepdog.c | 64 +++++++++++++++++++++++++++++++-------------------------
 1 file changed, 35 insertions(+), 29 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index ccbf7e1..16a5c1c 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -641,6 +641,7 @@ static void restart_co_req(void *opaque)
 
 typedef struct SheepdogReqCo {
     int sockfd;
+    BlockDriverState *bs;
     AioContext *aio_context;
     SheepdogReq *hdr;
     void *data;
@@ -708,13 +709,14 @@ out:
  *
  * Return 0 on success, -errno in case of error.
  */
-static int do_req(int sockfd, AioContext *aio_context, SheepdogReq *hdr,
+static int do_req(int sockfd, BlockDriverState *bs, SheepdogReq *hdr,
                   void *data, unsigned int *wlen, unsigned int *rlen)
 {
     Coroutine *co;
     SheepdogReqCo srco = {
         .sockfd = sockfd,
-        .aio_context = aio_context,
+        .aio_context = bs ? bdrv_get_aio_context(bs) : qemu_get_aio_context(),
+        .bs = bs,
         .hdr = hdr,
         .data = data,
         .wlen = wlen,
@@ -727,9 +729,14 @@ static int do_req(int sockfd, AioContext *aio_context, SheepdogReq *hdr,
         do_co_req(&srco);
     } else {
         co = qemu_coroutine_create(do_co_req, &srco);
-        qemu_coroutine_enter(co);
-        while (!srco.finished) {
-            aio_poll(aio_context, true);
+        if (bs) {
+            qemu_coroutine_enter(co);
+            BDRV_POLL_WHILE(bs, !srco.finished);
+        } else {
+            qemu_coroutine_enter(co);
+            while (!srco.finished) {
+                aio_poll(qemu_get_aio_context(), true);
+            }
         }
     }
 
@@ -1125,7 +1132,7 @@ static int find_vdi_name(BDRVSheepdogState *s, const char *filename,
     hdr.snapid = snapid;
     hdr.flags = SD_FLAG_CMD_WRITE;
 
-    ret = do_req(fd, s->aio_context, (SheepdogReq *)&hdr, buf, &wlen, &rlen);
+    ret = do_req(fd, s->bs, (SheepdogReq *)&hdr, buf, &wlen, &rlen);
     if (ret) {
         error_setg_errno(errp, -ret, "cannot get vdi info");
         goto out;
@@ -1240,7 +1247,7 @@ out:
     qemu_co_mutex_unlock(&s->lock);
 }
 
-static int read_write_object(int fd, AioContext *aio_context, char *buf,
+static int read_write_object(int fd, BlockDriverState *bs, char *buf,
                              uint64_t oid, uint8_t copies,
                              unsigned int datalen, uint64_t offset,
                              bool write, bool create, uint32_t cache_flags)
@@ -1274,7 +1281,7 @@ static int read_write_object(int fd, AioContext *aio_context, char *buf,
     hdr.offset = offset;
     hdr.copies = copies;
 
-    ret = do_req(fd, aio_context, (SheepdogReq *)&hdr, buf, &wlen, &rlen);
+    ret = do_req(fd, bs, (SheepdogReq *)&hdr, buf, &wlen, &rlen);
     if (ret) {
         error_report("failed to send a request to the sheep");
         return ret;
@@ -1289,22 +1296,22 @@ static int read_write_object(int fd, AioContext *aio_context, char *buf,
     }
 }
 
-static int read_object(int fd, AioContext *aio_context, char *buf,
+static int read_object(int fd, BlockDriverState *bs, char *buf,
                        uint64_t oid, uint8_t copies,
                        unsigned int datalen, uint64_t offset,
                        uint32_t cache_flags)
 {
-    return read_write_object(fd, aio_context, buf, oid, copies,
+    return read_write_object(fd, bs, buf, oid, copies,
                              datalen, offset, false,
                              false, cache_flags);
 }
 
-static int write_object(int fd, AioContext *aio_context, char *buf,
+static int write_object(int fd, BlockDriverState *bs, char *buf,
                         uint64_t oid, uint8_t copies,
                         unsigned int datalen, uint64_t offset, bool create,
                         uint32_t cache_flags)
 {
-    return read_write_object(fd, aio_context, buf, oid, copies,
+    return read_write_object(fd, bs, buf, oid, copies,
                              datalen, offset, true,
                              create, cache_flags);
 }
@@ -1331,7 +1338,7 @@ static int reload_inode(BDRVSheepdogState *s, uint32_t snapid, const char *tag)
         goto out;
     }
 
-    ret = read_object(fd, s->aio_context, (char *)inode, vid_to_vdi_oid(vid),
+    ret = read_object(fd, s->bs, (char *)inode, vid_to_vdi_oid(vid),
                       s->inode.nr_copies, SD_INODE_HEADER_SIZE, 0,
                       s->cache_flags);
     if (ret < 0) {
@@ -1489,7 +1496,7 @@ static int sd_open(BlockDriverState *bs, QDict *options, int flags,
     }
 
     buf = g_malloc(SD_INODE_SIZE);
-    ret = read_object(fd, s->aio_context, buf, vid_to_vdi_oid(vid),
+    ret = read_object(fd, s->bs, buf, vid_to_vdi_oid(vid),
                       0, SD_INODE_SIZE, 0, s->cache_flags);
 
     closesocket(fd);
@@ -1618,7 +1625,7 @@ static int do_sd_create(BDRVSheepdogState *s, uint32_t *vdi_id, int snapshot,
     hdr.copies = s->inode.nr_copies;
     hdr.block_size_shift = s->inode.block_size_shift;
 
-    ret = do_req(fd, s->aio_context, (SheepdogReq *)&hdr, buf, &wlen, &rlen);
+    ret = do_req(fd, NULL, (SheepdogReq *)&hdr, buf, &wlen, &rlen);
 
     closesocket(fd);
 
@@ -1886,7 +1893,7 @@ static int sd_create(const char *filename, QemuOpts *opts,
         hdr.opcode = SD_OP_GET_CLUSTER_DEFAULT;
         hdr.proto_ver = SD_PROTO_VER;
 
-        ret = do_req(fd, s->aio_context, (SheepdogReq *)&hdr,
+        ret = do_req(fd, NULL, (SheepdogReq *)&hdr,
                      NULL, &wlen, &rlen);
         closesocket(fd);
         if (ret) {
@@ -1951,7 +1958,7 @@ static void sd_close(BlockDriverState *bs)
     hdr.data_length = wlen;
     hdr.flags = SD_FLAG_CMD_WRITE;
 
-    ret = do_req(fd, s->aio_context, (SheepdogReq *)&hdr,
+    ret = do_req(fd, s->bs, (SheepdogReq *)&hdr,
                  s->name, &wlen, &rlen);
 
     closesocket(fd);
@@ -2000,7 +2007,7 @@ static int sd_truncate(BlockDriverState *bs, int64_t offset)
     /* we don't need to update entire object */
     datalen = SD_INODE_SIZE - sizeof(s->inode.data_vdi_id);
     s->inode.vdi_size = offset;
-    ret = write_object(fd, s->aio_context, (char *)&s->inode,
+    ret = write_object(fd, s->bs, (char *)&s->inode,
                        vid_to_vdi_oid(s->inode.vdi_id), s->inode.nr_copies,
                        datalen, 0, false, s->cache_flags);
     close(fd);
@@ -2070,7 +2077,7 @@ static bool sd_delete(BDRVSheepdogState *s)
         return false;
     }
 
-    ret = do_req(fd, s->aio_context, (SheepdogReq *)&hdr,
+    ret = do_req(fd, s->bs, (SheepdogReq *)&hdr,
                  s->name, &wlen, &rlen);
     closesocket(fd);
     if (ret) {
@@ -2126,7 +2133,7 @@ static int sd_create_branch(BDRVSheepdogState *s)
         goto out;
     }
 
-    ret = read_object(fd, s->aio_context, buf, vid_to_vdi_oid(vid),
+    ret = read_object(fd, s->bs, buf, vid_to_vdi_oid(vid),
                       s->inode.nr_copies, SD_INODE_SIZE, 0, s->cache_flags);
 
     closesocket(fd);
@@ -2411,7 +2418,7 @@ static int sd_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info)
         goto cleanup;
     }
 
-    ret = write_object(fd, s->aio_context, (char *)&s->inode,
+    ret = write_object(fd, s->bs, (char *)&s->inode,
                        vid_to_vdi_oid(s->inode.vdi_id), s->inode.nr_copies,
                        datalen, 0, false, s->cache_flags);
     if (ret < 0) {
@@ -2426,7 +2433,7 @@ static int sd_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info)
         goto cleanup;
     }
 
-    ret = read_object(fd, s->aio_context, (char *)inode,
+    ret = read_object(fd, s->bs, (char *)inode,
                       vid_to_vdi_oid(new_vid), s->inode.nr_copies, datalen, 0,
                       s->cache_flags);
 
@@ -2528,7 +2535,7 @@ static bool remove_objects(BDRVSheepdogState *s)
             i++;
         }
 
-        ret = write_object(fd, s->aio_context,
+        ret = write_object(fd, s->bs,
                            (char *)&inode->data_vdi_id[start_idx],
                            vid_to_vdi_oid(s->inode.vdi_id), inode->nr_copies,
                            (i - start_idx) * sizeof(uint32_t),
@@ -2600,7 +2607,7 @@ static int sd_snapshot_delete(BlockDriverState *bs,
         return -1;
     }
 
-    ret = do_req(fd, s->aio_context, (SheepdogReq *)&hdr,
+    ret = do_req(fd, s->bs, (SheepdogReq *)&hdr,
                  buf, &wlen, &rlen);
     closesocket(fd);
     if (ret) {
@@ -2652,8 +2659,7 @@ static int sd_snapshot_list(BlockDriverState *bs, QEMUSnapshotInfo **psn_tab)
     req.opcode = SD_OP_READ_VDIS;
     req.data_length = max;
 
-    ret = do_req(fd, s->aio_context, &req,
-                 vdi_inuse, &wlen, &rlen);
+    ret = do_req(fd, s->bs, &req, vdi_inuse, &wlen, &rlen);
 
     closesocket(fd);
     if (ret) {
@@ -2679,7 +2685,7 @@ static int sd_snapshot_list(BlockDriverState *bs, QEMUSnapshotInfo **psn_tab)
         }
 
         /* we don't need to read entire object */
-        ret = read_object(fd, s->aio_context, (char *)&inode,
+        ret = read_object(fd, s->bs, (char *)&inode,
                           vid_to_vdi_oid(vid),
                           0, SD_INODE_SIZE - sizeof(inode.data_vdi_id), 0,
                           s->cache_flags);
@@ -2745,11 +2751,11 @@ static int do_load_save_vmstate(BDRVSheepdogState *s, uint8_t *data,
 
         create = (offset == 0);
         if (load) {
-            ret = read_object(fd, s->aio_context, (char *)data, vmstate_oid,
+            ret = read_object(fd, s->bs, (char *)data, vmstate_oid,
                               s->inode.nr_copies, data_len, offset,
                               s->cache_flags);
         } else {
-            ret = write_object(fd, s->aio_context, (char *)data, vmstate_oid,
+            ret = write_object(fd, s->bs, (char *)data, vmstate_oid,
                                s->inode.nr_copies, data_len, offset, create,
                                s->cache_flags);
         }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH 11/20] aio: introduce qemu_get_current_aio_context
  2016-10-17 13:54 [Qemu-devel] [PATCH v2 00/20] dataplane: remove RFifoLock Paolo Bonzini
                   ` (9 preceding siblings ...)
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 10/20] sheepdog: " Paolo Bonzini
@ 2016-10-17 13:54 ` Paolo Bonzini
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 12/20] iothread: detach all block devices before stopping them Paolo Bonzini
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-10-17 13:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, famz, kwolf, stefanha

This will be used by BDRV_POLL_WHILE (and thus by bdrv_drain)
to choose how to wait for I/O completion.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: adjust documentation according to Stefan's suggestion

 include/block/aio.h | 18 ++++++++++++++++++
 iothread.c          |  9 +++++++++
 stubs/Makefile.objs |  1 +
 stubs/iothread.c    |  8 ++++++++
 4 files changed, 36 insertions(+)
 create mode 100644 stubs/iothread.c

diff --git a/include/block/aio.h b/include/block/aio.h
index b9fe2cb..040b3b1 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -453,6 +453,24 @@ static inline bool aio_node_check(AioContext *ctx, bool is_external)
 }
 
 /**
+ * Return the AioContext whose event loop runs in the current thread.
+ *
+ * If called from an IOThread this will be the IOThread's AioContext.  If
+ * called from another thread it will be the main loop AioContext.
+ */
+AioContext *qemu_get_current_aio_context(void);
+
+/**
+ * @ctx: the aio context
+ *
+ * Return whether we are running in the I/O thread that manages @ctx.
+ */
+static inline bool aio_context_in_iothread(AioContext *ctx)
+{
+    return ctx == qemu_get_current_aio_context();
+}
+
+/**
  * aio_context_setup:
  * @ctx: the aio context
  *
diff --git a/iothread.c b/iothread.c
index fbeb8de..62c8796 100644
--- a/iothread.c
+++ b/iothread.c
@@ -20,6 +20,7 @@
 #include "qmp-commands.h"
 #include "qemu/error-report.h"
 #include "qemu/rcu.h"
+#include "qemu/main-loop.h"
 
 typedef ObjectClass IOThreadClass;
 
@@ -28,6 +29,13 @@ typedef ObjectClass IOThreadClass;
 #define IOTHREAD_CLASS(klass) \
    OBJECT_CLASS_CHECK(IOThreadClass, klass, TYPE_IOTHREAD)
 
+static __thread IOThread *my_iothread;
+
+AioContext *qemu_get_current_aio_context(void)
+{
+    return my_iothread ? my_iothread->ctx : qemu_get_aio_context();
+}
+
 static void *iothread_run(void *opaque)
 {
     IOThread *iothread = opaque;
@@ -35,6 +43,7 @@ static void *iothread_run(void *opaque)
 
     rcu_register_thread();
 
+    my_iothread = iothread;
     qemu_mutex_lock(&iothread->init_done_lock);
     iothread->thread_id = qemu_get_thread_id();
     qemu_cond_signal(&iothread->init_done_cond);
diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index c5850e8..84b9d9e 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -17,6 +17,7 @@ stub-obj-y += gdbstub.o
 stub-obj-y += get-fd.o
 stub-obj-y += get-next-serial.o
 stub-obj-y += get-vm-name.o
+stub-obj-y += iothread.o
 stub-obj-y += iothread-lock.o
 stub-obj-y += is-daemonized.o
 stub-obj-y += machine-init-done.o
diff --git a/stubs/iothread.c b/stubs/iothread.c
new file mode 100644
index 0000000..8cc9e28
--- /dev/null
+++ b/stubs/iothread.c
@@ -0,0 +1,8 @@
+#include "qemu/osdep.h"
+#include "block/aio.h"
+#include "qemu/main-loop.h"
+
+AioContext *qemu_get_current_aio_context(void)
+{
+    return qemu_get_aio_context();
+}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH 12/20] iothread: detach all block devices before stopping them
  2016-10-17 13:54 [Qemu-devel] [PATCH v2 00/20] dataplane: remove RFifoLock Paolo Bonzini
                   ` (10 preceding siblings ...)
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 11/20] aio: introduce qemu_get_current_aio_context Paolo Bonzini
@ 2016-10-17 13:54 ` Paolo Bonzini
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 13/20] replication: pass BlockDriverState to reopen_backing_file Paolo Bonzini
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-10-17 13:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, famz, kwolf, stefanha

Soon bdrv_drain will not call aio_poll itself on iothreads.  If block
devices are left hanging off the iothread's AioContext, there will be no
one to do I/O for those poor devices.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: add "if ... continue" [Fam]

 iothread.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/iothread.c b/iothread.c
index 62c8796..fdfb440 100644
--- a/iothread.c
+++ b/iothread.c
@@ -16,6 +16,7 @@
 #include "qom/object_interfaces.h"
 #include "qemu/module.h"
 #include "block/aio.h"
+#include "block/block.h"
 #include "sysemu/iothread.h"
 #include "qmp-commands.h"
 #include "qemu/error-report.h"
@@ -199,6 +200,18 @@ IOThreadInfoList *qmp_query_iothreads(Error **errp)
 void iothread_stop_all(void)
 {
     Object *container = object_get_objects_root();
+    BlockDriverState *bs;
+    BdrvNextIterator it;
+
+    for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
+        AioContext *ctx = bdrv_get_aio_context(bs);
+        if (ctx == qemu_get_aio_context()) {
+            continue;
+        }
+        aio_context_acquire(ctx);
+        bdrv_set_aio_context(bs, qemu_get_aio_context());
+        aio_context_release(ctx);
+    }
 
     object_child_foreach(container, iothread_stop, NULL);
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH 13/20] replication: pass BlockDriverState to reopen_backing_file
  2016-10-17 13:54 [Qemu-devel] [PATCH v2 00/20] dataplane: remove RFifoLock Paolo Bonzini
                   ` (11 preceding siblings ...)
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 12/20] iothread: detach all block devices before stopping them Paolo Bonzini
@ 2016-10-17 13:54 ` Paolo Bonzini
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 14/20] block: prepare bdrv_reopen_multiple to release AioContext Paolo Bonzini
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-10-17 13:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, famz, kwolf, stefanha

This will be needed in the next patch to retrieve the AioContext.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/replication.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/block/replication.c b/block/replication.c
index 5231a00..af47479 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -317,9 +317,10 @@ static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp)
     }
 }
 
-static void reopen_backing_file(BDRVReplicationState *s, bool writable,
+static void reopen_backing_file(BlockDriverState *bs, bool writable,
                                 Error **errp)
 {
+    BDRVReplicationState *s = bs->opaque;
     BlockReopenQueue *reopen_queue = NULL;
     int orig_hidden_flags, orig_secondary_flags;
     int new_hidden_flags, new_secondary_flags;
@@ -359,8 +360,9 @@ static void reopen_backing_file(BDRVReplicationState *s, bool writable,
     }
 }
 
-static void backup_job_cleanup(BDRVReplicationState *s)
+static void backup_job_cleanup(BlockDriverState *bs)
 {
+    BDRVReplicationState *s = bs->opaque;
     BlockDriverState *top_bs;
 
     top_bs = bdrv_lookup_bs(s->top_id, s->top_id, NULL);
@@ -369,19 +371,20 @@ static void backup_job_cleanup(BDRVReplicationState *s)
     }
     bdrv_op_unblock_all(top_bs, s->blocker);
     error_free(s->blocker);
-    reopen_backing_file(s, false, NULL);
+    reopen_backing_file(bs, false, NULL);
 }
 
 static void backup_job_completed(void *opaque, int ret)
 {
-    BDRVReplicationState *s = opaque;
+    BlockDriverState *bs = opaque;
+    BDRVReplicationState *s = bs->opaque;
 
     if (s->replication_state != BLOCK_REPLICATION_FAILOVER) {
         /* The backup job is cancelled unexpectedly */
         s->error = -EIO;
     }
 
-    backup_job_cleanup(s);
+    backup_job_cleanup(bs);
 }
 
 static bool check_top_bs(BlockDriverState *top_bs, BlockDriverState *bs)
@@ -477,7 +480,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
         }
 
         /* reopen the backing file in r/w mode */
-        reopen_backing_file(s, true, &local_err);
+        reopen_backing_file(bs, true, &local_err);
         if (local_err) {
             error_propagate(errp, local_err);
             aio_context_release(aio_context);
@@ -492,7 +495,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
         if (!top_bs || !bdrv_is_root_node(top_bs) ||
             !check_top_bs(top_bs, bs)) {
             error_setg(errp, "No top_bs or it is invalid");
-            reopen_backing_file(s, false, NULL);
+            reopen_backing_file(bs, false, NULL);
             aio_context_release(aio_context);
             return;
         }
@@ -502,10 +505,10 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
         backup_start("replication-backup", s->secondary_disk->bs,
                      s->hidden_disk->bs, 0, MIRROR_SYNC_MODE_NONE, NULL, false,
                      BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
-                     backup_job_completed, s, NULL, &local_err);
+                     backup_job_completed, bs, NULL, &local_err);
         if (local_err) {
             error_propagate(errp, local_err);
-            backup_job_cleanup(s);
+            backup_job_cleanup(bs);
             aio_context_release(aio_context);
             return;
         }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH 14/20] block: prepare bdrv_reopen_multiple to release AioContext
  2016-10-17 13:54 [Qemu-devel] [PATCH v2 00/20] dataplane: remove RFifoLock Paolo Bonzini
                   ` (12 preceding siblings ...)
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 13/20] replication: pass BlockDriverState to reopen_backing_file Paolo Bonzini
@ 2016-10-17 13:54 ` Paolo Bonzini
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 15/20] qemu-io: acquire AioContext Paolo Bonzini
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-10-17 13:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, famz, kwolf, stefanha

After the next patch bdrv_drain_all will have to be called without holding any
AioContext.  Prepare to do this by adding an AioContext argument to
bdrv_reopen_multiple.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c               | 4 ++--
 block/commit.c        | 2 +-
 block/replication.c   | 3 ++-
 include/block/block.h | 2 +-
 qemu-io-cmds.c        | 2 +-
 5 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index 7f3e7bc..fbe485c 100644
--- a/block.c
+++ b/block.c
@@ -2082,7 +2082,7 @@ BlockReopenQueue *bdrv_reopen_queue(BlockReopenQueue *bs_queue,
  * to all devices.
  *
  */
-int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)
+int bdrv_reopen_multiple(AioContext *ctx, BlockReopenQueue *bs_queue, Error **errp)
 {
     int ret = -1;
     BlockReopenQueueEntry *bs_entry, *next;
@@ -2131,7 +2131,7 @@ int bdrv_reopen(BlockDriverState *bs, int bdrv_flags, Error **errp)
     Error *local_err = NULL;
     BlockReopenQueue *queue = bdrv_reopen_queue(NULL, bs, NULL, bdrv_flags);
 
-    ret = bdrv_reopen_multiple(queue, &local_err);
+    ret = bdrv_reopen_multiple(bdrv_get_aio_context(bs), queue, &local_err);
     if (local_err != NULL) {
         error_propagate(errp, local_err);
     }
diff --git a/block/commit.c b/block/commit.c
index 9f67a8b..499ecca 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -251,7 +251,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
                                          orig_overlay_flags | BDRV_O_RDWR);
     }
     if (reopen_queue) {
-        bdrv_reopen_multiple(reopen_queue, &local_err);
+        bdrv_reopen_multiple(bdrv_get_aio_context(bs), reopen_queue, &local_err);
         if (local_err != NULL) {
             error_propagate(errp, local_err);
             block_job_unref(&s->common);
diff --git a/block/replication.c b/block/replication.c
index af47479..c6962e2 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -355,7 +355,8 @@ static void reopen_backing_file(BlockDriverState *bs, bool writable,
     }
 
     if (reopen_queue) {
-        bdrv_reopen_multiple(reopen_queue, &local_err);
+        bdrv_reopen_multiple(bdrv_get_aio_context(bs),
+                             reopen_queue, &local_err);
         error_propagate(errp, local_err);
     }
 }
diff --git a/include/block/block.h b/include/block/block.h
index 3ba9e34..fb86611 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -218,7 +218,7 @@ BlockDriverState *bdrv_open(const char *filename, const char *reference,
 BlockReopenQueue *bdrv_reopen_queue(BlockReopenQueue *bs_queue,
                                     BlockDriverState *bs,
                                     QDict *options, int flags);
-int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp);
+int bdrv_reopen_multiple(AioContext *ctx, BlockReopenQueue *bs_queue, Error **errp);
 int bdrv_reopen(BlockDriverState *bs, int bdrv_flags, Error **errp);
 int bdrv_reopen_prepare(BDRVReopenState *reopen_state,
                         BlockReopenQueue *queue, Error **errp);
diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index 3a3838a..dbe0413 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -1956,7 +1956,7 @@ static int reopen_f(BlockBackend *blk, int argc, char **argv)
     qemu_opts_reset(&reopen_opts);
 
     brq = bdrv_reopen_queue(NULL, bs, opts, flags);
-    bdrv_reopen_multiple(brq, &local_err);
+    bdrv_reopen_multiple(bdrv_get_aio_context(bs), brq, &local_err);
     if (local_err) {
         error_report_err(local_err);
     } else {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH 15/20] qemu-io: acquire AioContext
  2016-10-17 13:54 [Qemu-devel] [PATCH v2 00/20] dataplane: remove RFifoLock Paolo Bonzini
                   ` (13 preceding siblings ...)
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 14/20] block: prepare bdrv_reopen_multiple to release AioContext Paolo Bonzini
@ 2016-10-17 13:54 ` Paolo Bonzini
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 16/20] qemu-img: call aio_context_acquire/release around block job Paolo Bonzini
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-10-17 13:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, famz, kwolf, stefanha

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: new [qemu-iotests]

 qemu-io-cmds.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index dbe0413..4750e9a 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -2216,6 +2216,7 @@ static const cmdinfo_t help_cmd = {
 
 bool qemuio_command(BlockBackend *blk, const char *cmd)
 {
+    AioContext *ctx;
     char *input;
     const cmdinfo_t *ct;
     char **v;
@@ -2227,7 +2228,10 @@ bool qemuio_command(BlockBackend *blk, const char *cmd)
     if (c) {
         ct = find_command(v[0]);
         if (ct) {
+            ctx = blk ? blk_get_aio_context(blk) : qemu_get_aio_context();
+            aio_context_acquire(ctx);
             done = command(blk, ct, c, v);
+            aio_context_release(ctx);
         } else {
             fprintf(stderr, "command \"%s\" not found\n", v[0]);
         }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH 16/20] qemu-img: call aio_context_acquire/release around block job
  2016-10-17 13:54 [Qemu-devel] [PATCH v2 00/20] dataplane: remove RFifoLock Paolo Bonzini
                   ` (14 preceding siblings ...)
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 15/20] qemu-io: acquire AioContext Paolo Bonzini
@ 2016-10-17 13:54 ` Paolo Bonzini
  2016-10-19  0:54   ` Fam Zheng
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 17/20] block: only call aio_poll on the current thread's AioContext Paolo Bonzini
                   ` (4 subsequent siblings)
  20 siblings, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2016-10-17 13:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, famz, kwolf, stefanha

This will be needed by bdrv_reopen_multiple, which calls
bdrv_drain_all and thus will *release* the AioContext.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: new [qemu-iotests]

 qemu-img.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/qemu-img.c b/qemu-img.c
index 02c07b9..ad7c964 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -795,6 +795,7 @@ static void run_block_job(BlockJob *job, Error **errp)
 {
     AioContext *aio_context = blk_get_aio_context(job->blk);
 
+    aio_context_acquire(aio_context);
     do {
         aio_poll(aio_context, true);
         qemu_progress_print(job->len ?
@@ -802,6 +803,7 @@ static void run_block_job(BlockJob *job, Error **errp)
     } while (!job->ready);
 
     block_job_complete_sync(job, errp);
+    aio_context_release(aio_context);
 
     /* A block job may finish instantaneously without publishing any progress,
      * so just signal completion here */
@@ -819,6 +821,7 @@ static int img_commit(int argc, char **argv)
     Error *local_err = NULL;
     CommonBlockJobCBInfo cbi;
     bool image_opts = false;
+    AioContext *aio_context;
 
     fmt = NULL;
     cache = BDRV_DEFAULT_CACHE;
@@ -928,8 +931,11 @@ static int img_commit(int argc, char **argv)
         .bs   = bs,
     };
 
+    aio_context = bdrv_get_aio_context(bs);
+    aio_context_acquire(aio_context);
     commit_active_start("commit", bs, base_bs, 0, BLOCKDEV_ON_ERROR_REPORT,
                         common_block_job_cb, &cbi, &local_err, false);
+    aio_context_release(aio_context);
     if (local_err) {
         goto done;
     }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH 17/20] block: only call aio_poll on the current thread's AioContext
  2016-10-17 13:54 [Qemu-devel] [PATCH v2 00/20] dataplane: remove RFifoLock Paolo Bonzini
                   ` (15 preceding siblings ...)
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 16/20] qemu-img: call aio_context_acquire/release around block job Paolo Bonzini
@ 2016-10-17 13:54 ` Paolo Bonzini
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 18/20] iothread: release AioContext around aio_poll Paolo Bonzini
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-10-17 13:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, famz, kwolf, stefanha

aio_poll is not thread safe; for example bdrv_drain can hang if
the last in-flight I/O operation is completed in the I/O thread after
the main thread has checked bs->in_flight.

The bug remains latent as long as all of it is called within
aio_context_acquire/aio_context_release, but this will change soon.

To fix this, if bdrv_drain is called from outside the I/O thread,
signal the main AioContext through a dummy bottom half.  The event
loop then only runs in the I/O thread.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: bdrv_wakeup moved here and documented [Stefan]

 async.c                         |  1 +
 block.c                         |  2 ++
 block/io.c                      | 12 ++++++++++++
 block/nfs.c                     |  1 +
 block/sheepdog.c                |  3 +++
 hw/scsi/virtio-scsi-dataplane.c |  4 +---
 include/block/block.h           | 24 +++++++++++++++++++++---
 include/block/block_int.h       | 17 +++++++++++++++++
 8 files changed, 58 insertions(+), 6 deletions(-)

diff --git a/async.c b/async.c
index f30d011..fb37b03 100644
--- a/async.c
+++ b/async.c
@@ -61,6 +61,7 @@ void aio_bh_schedule_oneshot(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
     smp_wmb();
     ctx->first_bh = bh;
     qemu_mutex_unlock(&ctx->bh_lock);
+    aio_notify(ctx);
 }
 
 QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
diff --git a/block.c b/block.c
index fbe485c..a17baab 100644
--- a/block.c
+++ b/block.c
@@ -2090,7 +2090,9 @@ int bdrv_reopen_multiple(AioContext *ctx, BlockReopenQueue *bs_queue, Error **er
 
     assert(bs_queue != NULL);
 
+    aio_context_release(ctx);
     bdrv_drain_all();
+    aio_context_acquire(ctx);
 
     QSIMPLEQ_FOREACH(bs_entry, bs_queue, entry) {
         if (bdrv_reopen_prepare(&bs_entry->state, bs_queue, &local_err)) {
diff --git a/block/io.c b/block/io.c
index f0682dd..91f8a08 100644
--- a/block/io.c
+++ b/block/io.c
@@ -474,9 +474,21 @@ void bdrv_inc_in_flight(BlockDriverState *bs)
     atomic_inc(&bs->in_flight);
 }
 
+static void dummy_bh_cb(void *opaque)
+{
+}
+
+void bdrv_wakeup(BlockDriverState *bs)
+{
+    if (bs->wakeup) {
+        aio_bh_schedule_oneshot(qemu_get_aio_context(), dummy_bh_cb, NULL);
+    }
+}
+
 void bdrv_dec_in_flight(BlockDriverState *bs)
 {
     atomic_dec(&bs->in_flight);
+    bdrv_wakeup(bs);
 }
 
 static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
diff --git a/block/nfs.c b/block/nfs.c
index 7474fbc..88c60a9 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -505,6 +505,7 @@ nfs_get_allocated_file_size_cb(int ret, struct nfs_context *nfs, void *data,
         error_report("NFS Error: %s", nfs_get_error(nfs));
     }
     task->complete = 1;
+    bdrv_wakeup(task->bs);
 }
 
 static int64_t nfs_get_allocated_file_size(BlockDriverState *bs)
diff --git a/block/sheepdog.c b/block/sheepdog.c
index 16a5c1c..1fb9173 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -702,6 +702,9 @@ out:
 
     srco->ret = ret;
     srco->finished = true;
+    if (srco->bs) {
+        bdrv_wakeup(srco->bs);
+    }
 }
 
 /*
diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index b173b94..9424f0e 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -189,13 +189,11 @@ void virtio_scsi_dataplane_stop(VirtIOSCSI *s)
     assert(s->ctx == iothread_get_aio_context(vs->conf.iothread));
 
     aio_context_acquire(s->ctx);
-
     virtio_scsi_clear_aio(s);
+    aio_context_release(s->ctx);
 
     blk_drain_all(); /* ensure there are no in-flight requests */
 
-    aio_context_release(s->ctx);
-
     for (i = 0; i < vs->conf.num_queues + 2; i++) {
         virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
     }
diff --git a/include/block/block.h b/include/block/block.h
index fb86611..df25665 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -343,9 +343,27 @@ void bdrv_drain_all(void);
 #define BDRV_POLL_WHILE(bs, cond) ({                       \
     bool waited_ = false;                                  \
     BlockDriverState *bs_ = (bs);                          \
-    while ((cond)) {                                       \
-        aio_poll(bdrv_get_aio_context(bs_), true);         \
-        waited_ = true;                                    \
+    AioContext *ctx_ = bdrv_get_aio_context(bs_);          \
+    if (aio_context_in_iothread(ctx_)) {                   \
+        while ((cond)) {                                   \
+            aio_poll(ctx_, true);                          \
+            waited_ = true;                                \
+        }                                                  \
+    } else {                                               \
+        assert(qemu_get_current_aio_context() ==           \
+               qemu_get_aio_context());                    \
+        /* Ask bdrv_dec_in_flight to wake up the main      \
+         * QEMU AioContext.                                \
+         */                                                \
+        assert(!bs_->wakeup);                              \
+        bs_->wakeup = true;                                \
+        while ((cond)) {                                   \
+            aio_context_release(ctx_);                     \
+            aio_poll(qemu_get_aio_context(), true);        \
+            aio_context_acquire(ctx_);                     \
+            waited_ = true;                                \
+        }                                                  \
+        bs_->wakeup = false;                               \
     }                                                      \
     waited_; })
 
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 5a7308b..2289ca1 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -473,6 +473,8 @@ struct BlockDriverState {
     unsigned int in_flight;
     unsigned int serialising_in_flight;
 
+    bool wakeup;
+
     /* Offset after the highest byte written to */
     uint64_t wr_highest_offset;
 
@@ -631,6 +633,21 @@ void bdrv_remove_aio_context_notifier(BlockDriverState *bs,
                                       void (*aio_context_detached)(void *),
                                       void *opaque);
 
+/**
+ * bdrv_wakeup:
+ * @bs: The BlockDriverState for which an I/O operation has been completed.
+ *
+ * Wake up the main thread if it is waiting on BDRV_POLL_WHILE.  During
+ * synchronous I/O on a BlockDriverState that is attached to another
+ * I/O thread, the main thread lets the I/O thread's event loop run,
+ * waiting for the I/O operation to complete.  A bdrv_wakeup will wake
+ * up the main thread if necessary.
+ *
+ * Manual calls to bdrv_wakeup are rarely necessary, because
+ * bdrv_dec_in_flight already calls it.
+ */
+void bdrv_wakeup(BlockDriverState *bs);
+
 #ifdef _WIN32
 int is_windows_drive(const char *filename);
 #endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH 18/20] iothread: release AioContext around aio_poll
  2016-10-17 13:54 [Qemu-devel] [PATCH v2 00/20] dataplane: remove RFifoLock Paolo Bonzini
                   ` (16 preceding siblings ...)
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 17/20] block: only call aio_poll on the current thread's AioContext Paolo Bonzini
@ 2016-10-17 13:54 ` Paolo Bonzini
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 19/20] qemu-thread: introduce QemuRecMutex Paolo Bonzini
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-10-17 13:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, famz, kwolf, stefanha

This is the first step towards having fine-grained critical sections in
dataplane threads, which will resolve lock ordering problems between
address_space_* functions (which need the BQL when doing MMIO, even
after we complete RCU-based dispatch) and the AioContext.

Because AioContext does not use contention callbacks anymore, the
unit test has to be changed.

Previously applied as a0710f7995f914e3044e5899bd8ff6c43c62f916 and
then reverted.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 async.c                     | 22 +++-------------------
 docs/multiple-iothreads.txt | 40 +++++++++++++++++++++++-----------------
 include/block/aio.h         |  3 ---
 iothread.c                  | 11 ++---------
 tests/test-aio.c            | 22 ++++++++++++++--------
 5 files changed, 42 insertions(+), 56 deletions(-)

diff --git a/async.c b/async.c
index fb37b03..27db772 100644
--- a/async.c
+++ b/async.c
@@ -107,8 +107,8 @@ int aio_bh_poll(AioContext *ctx)
          * aio_notify again if necessary.
          */
         if (atomic_xchg(&bh->scheduled, 0)) {
-            /* Idle BHs and the notify BH don't count as progress */
-            if (!bh->idle && bh != ctx->notify_dummy_bh) {
+            /* Idle BHs don't count as progress */
+            if (!bh->idle) {
                 ret = 1;
             }
             bh->idle = 0;
@@ -260,7 +260,6 @@ aio_ctx_finalize(GSource     *source)
 {
     AioContext *ctx = (AioContext *) source;
 
-    qemu_bh_delete(ctx->notify_dummy_bh);
     thread_pool_free(ctx->thread_pool);
 
 #ifdef CONFIG_LINUX_AIO
@@ -346,19 +345,6 @@ static void aio_timerlist_notify(void *opaque)
     aio_notify(opaque);
 }
 
-static void aio_rfifolock_cb(void *opaque)
-{
-    AioContext *ctx = opaque;
-
-    /* Kick owner thread in case they are blocked in aio_poll() */
-    qemu_bh_schedule(ctx->notify_dummy_bh);
-}
-
-static void notify_dummy_bh(void *opaque)
-{
-    /* Do nothing, we were invoked just to force the event loop to iterate */
-}
-
 static void event_notifier_dummy_cb(EventNotifier *e)
 {
 }
@@ -386,11 +372,9 @@ AioContext *aio_context_new(Error **errp)
 #endif
     ctx->thread_pool = NULL;
     qemu_mutex_init(&ctx->bh_lock);
-    rfifolock_init(&ctx->lock, aio_rfifolock_cb, ctx);
+    rfifolock_init(&ctx->lock, NULL, NULL);
     timerlistgroup_init(&ctx->tlg, aio_timerlist_notify, ctx);
 
-    ctx->notify_dummy_bh = aio_bh_new(ctx, notify_dummy_bh, NULL);
-
     return ctx;
 fail:
     g_source_destroy(&ctx->source);
diff --git a/docs/multiple-iothreads.txt b/docs/multiple-iothreads.txt
index 40b8419..0e7cdb2 100644
--- a/docs/multiple-iothreads.txt
+++ b/docs/multiple-iothreads.txt
@@ -105,13 +105,10 @@ a BH in the target AioContext beforehand and then call qemu_bh_schedule().  No
 acquire/release or locking is needed for the qemu_bh_schedule() call.  But be
 sure to acquire the AioContext for aio_bh_new() if necessary.
 
-The relationship between AioContext and the block layer
--------------------------------------------------------
-The AioContext originates from the QEMU block layer because it provides a
-scoped way of running event loop iterations until all work is done.  This
-feature is used to complete all in-flight block I/O requests (see
-bdrv_drain_all()).  Nowadays AioContext is a generic event loop that can be
-used by any QEMU subsystem.
+AioContext and the block layer
+------------------------------
+The AioContext originates from the QEMU block layer, even though nowadays
+AioContext is a generic event loop that can be used by any QEMU subsystem.
 
 The block layer has support for AioContext integrated.  Each BlockDriverState
 is associated with an AioContext using bdrv_set_aio_context() and
@@ -122,13 +119,22 @@ Block layer code must therefore expect to run in an IOThread and avoid using
 old APIs that implicitly use the main loop.  See the "How to program for
 IOThreads" above for information on how to do that.
 
-If main loop code such as a QMP function wishes to access a BlockDriverState it
-must first call aio_context_acquire(bdrv_get_aio_context(bs)) to ensure the
-IOThread does not run in parallel.
-
-Long-running jobs (usually in the form of coroutines) are best scheduled in the
-BlockDriverState's AioContext to avoid the need to acquire/release around each
-bdrv_*() call.  Be aware that there is currently no mechanism to get notified
-when bdrv_set_aio_context() moves this BlockDriverState to a different
-AioContext (see bdrv_detach_aio_context()/bdrv_attach_aio_context()), so you
-may need to add this if you want to support long-running jobs.
+If main loop code such as a QMP function wishes to access a BlockDriverState
+it must first call aio_context_acquire(bdrv_get_aio_context(bs)) to ensure
+that callbacks in the IOThread do not run in parallel.
+
+Code running in the monitor typically needs to ensure that past
+requests from the guest are completed.  When a block device is running
+in an IOThread, the IOThread can also process requests from the guest
+(via ioeventfd).  To achieve both objects, wrap the code between
+bdrv_drained_begin() and bdrv_drained_end(), thus creating a "drained
+section".  The functions must be called between aio_context_acquire()
+and aio_context_release().  You can freely release and re-acquire the
+AioContext within a drained section.
+
+Long-running jobs (usually in the form of coroutines) are best scheduled in
+the BlockDriverState's AioContext to avoid the need to acquire/release around
+each bdrv_*() call.  The functions bdrv_add/remove_aio_context_notifier,
+or alternatively blk_add/remove_aio_context_notifier if you use BlockBackends,
+can be used to get a notification whenever bdrv_set_aio_context() moves a
+BlockDriverState to a different AioContext.
diff --git a/include/block/aio.h b/include/block/aio.h
index 040b3b1..8e1885b 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -116,9 +116,6 @@ struct AioContext {
     bool notified;
     EventNotifier notifier;
 
-    /* Scheduling this BH forces the event loop it iterate */
-    QEMUBH *notify_dummy_bh;
-
     /* Thread pool for performing work and receiving completion callbacks */
     struct ThreadPool *thread_pool;
 
diff --git a/iothread.c b/iothread.c
index fdfb440..bd70344 100644
--- a/iothread.c
+++ b/iothread.c
@@ -40,7 +40,6 @@ AioContext *qemu_get_current_aio_context(void)
 static void *iothread_run(void *opaque)
 {
     IOThread *iothread = opaque;
-    bool blocking;
 
     rcu_register_thread();
 
@@ -50,14 +49,8 @@ static void *iothread_run(void *opaque)
     qemu_cond_signal(&iothread->init_done_cond);
     qemu_mutex_unlock(&iothread->init_done_lock);
 
-    while (!iothread->stopping) {
-        aio_context_acquire(iothread->ctx);
-        blocking = true;
-        while (!iothread->stopping && aio_poll(iothread->ctx, blocking)) {
-            /* Progress was made, keep going */
-            blocking = false;
-        }
-        aio_context_release(iothread->ctx);
+    while (!atomic_read(&iothread->stopping)) {
+        aio_poll(iothread->ctx, true);
     }
 
     rcu_unregister_thread();
diff --git a/tests/test-aio.c b/tests/test-aio.c
index 03aa846..5be99f8 100644
--- a/tests/test-aio.c
+++ b/tests/test-aio.c
@@ -100,6 +100,7 @@ static void event_ready_cb(EventNotifier *e)
 
 typedef struct {
     QemuMutex start_lock;
+    EventNotifier notifier;
     bool thread_acquired;
 } AcquireTestData;
 
@@ -111,6 +112,11 @@ static void *test_acquire_thread(void *opaque)
     qemu_mutex_lock(&data->start_lock);
     qemu_mutex_unlock(&data->start_lock);
 
+    /* event_notifier_set might be called either before or after
+     * the main thread's call to poll().  The test case's outcome
+     * should be the same in either case.
+     */
+    event_notifier_set(&data->notifier);
     aio_context_acquire(ctx);
     aio_context_release(ctx);
 
@@ -125,20 +131,19 @@ static void set_event_notifier(AioContext *ctx, EventNotifier *notifier,
     aio_set_event_notifier(ctx, notifier, false, handler);
 }
 
-static void dummy_notifier_read(EventNotifier *unused)
+static void dummy_notifier_read(EventNotifier *n)
 {
-    g_assert(false); /* should never be invoked */
+    event_notifier_test_and_clear(n);
 }
 
 static void test_acquire(void)
 {
     QemuThread thread;
-    EventNotifier notifier;
     AcquireTestData data;
 
     /* Dummy event notifier ensures aio_poll() will block */
-    event_notifier_init(&notifier, false);
-    set_event_notifier(ctx, &notifier, dummy_notifier_read);
+    event_notifier_init(&data.notifier, false);
+    set_event_notifier(ctx, &data.notifier, dummy_notifier_read);
     g_assert(!aio_poll(ctx, false)); /* consume aio_notify() */
 
     qemu_mutex_init(&data.start_lock);
@@ -152,12 +157,13 @@ static void test_acquire(void)
     /* Block in aio_poll(), let other thread kick us and acquire context */
     aio_context_acquire(ctx);
     qemu_mutex_unlock(&data.start_lock); /* let the thread run */
-    g_assert(!aio_poll(ctx, true));
+    g_assert(aio_poll(ctx, true));
+    g_assert(!data.thread_acquired);
     aio_context_release(ctx);
 
     qemu_thread_join(&thread);
-    set_event_notifier(ctx, &notifier, NULL);
-    event_notifier_cleanup(&notifier);
+    set_event_notifier(ctx, &data.notifier, NULL);
+    event_notifier_cleanup(&data.notifier);
 
     g_assert(data.thread_acquired);
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH 19/20] qemu-thread: introduce QemuRecMutex
  2016-10-17 13:54 [Qemu-devel] [PATCH v2 00/20] dataplane: remove RFifoLock Paolo Bonzini
                   ` (17 preceding siblings ...)
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 18/20] iothread: release AioContext around aio_poll Paolo Bonzini
@ 2016-10-17 13:54 ` Paolo Bonzini
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 20/20] aio: convert from RFifoLock to QemuRecMutex Paolo Bonzini
  2016-10-19  0:55 ` [Qemu-devel] [PATCH v2 00/20] dataplane: remove RFifoLock Fam Zheng
  20 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-10-17 13:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, famz, kwolf, stefanha

GRecMutex is new in glib 2.32, so we cannot use it.  Introduce
a recursive mutex in qemu-thread instead, which will be used
instead of RFifoLock.

Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/qemu/thread-posix.h |  6 ++++++
 include/qemu/thread-win32.h | 10 ++++++++++
 include/qemu/thread.h       |  3 +++
 util/qemu-thread-posix.c    | 14 ++++++++++++++
 util/qemu-thread-win32.c    | 25 +++++++++++++++++++++++++
 5 files changed, 58 insertions(+)

diff --git a/include/qemu/thread-posix.h b/include/qemu/thread-posix.h
index aa03567..09d1e15 100644
--- a/include/qemu/thread-posix.h
+++ b/include/qemu/thread-posix.h
@@ -4,6 +4,12 @@
 #include <pthread.h>
 #include <semaphore.h>
 
+typedef QemuMutex QemuRecMutex;
+#define qemu_rec_mutex_destroy qemu_mutex_destroy
+#define qemu_rec_mutex_lock qemu_mutex_lock
+#define qemu_rec_mutex_try_lock qemu_mutex_try_lock
+#define qemu_rec_mutex_unlock qemu_mutex_unlock
+
 struct QemuMutex {
     pthread_mutex_t lock;
 };
diff --git a/include/qemu/thread-win32.h b/include/qemu/thread-win32.h
index c7ce8dc..5fb6541 100644
--- a/include/qemu/thread-win32.h
+++ b/include/qemu/thread-win32.h
@@ -8,6 +8,16 @@ struct QemuMutex {
     LONG owner;
 };
 
+typedef struct QemuRecMutex QemuRecMutex;
+struct QemuRecMutex {
+    CRITICAL_SECTION lock;
+};
+
+void qemu_rec_mutex_destroy(QemuRecMutex *mutex);
+void qemu_rec_mutex_lock(QemuRecMutex *mutex);
+int qemu_rec_mutex_trylock(QemuRecMutex *mutex);
+void qemu_rec_mutex_unlock(QemuRecMutex *mutex);
+
 struct QemuCond {
     LONG waiters, target;
     HANDLE sema;
diff --git a/include/qemu/thread.h b/include/qemu/thread.h
index 31237e9..e8e665f 100644
--- a/include/qemu/thread.h
+++ b/include/qemu/thread.h
@@ -25,6 +25,9 @@ void qemu_mutex_lock(QemuMutex *mutex);
 int qemu_mutex_trylock(QemuMutex *mutex);
 void qemu_mutex_unlock(QemuMutex *mutex);
 
+/* Prototypes for other functions are in thread-posix.h/thread-win32.h.  */
+void qemu_rec_mutex_init(QemuRecMutex *mutex);
+
 void qemu_cond_init(QemuCond *cond);
 void qemu_cond_destroy(QemuCond *cond);
 
diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
index 74a3023..1bb00a8 100644
--- a/util/qemu-thread-posix.c
+++ b/util/qemu-thread-posix.c
@@ -80,6 +80,20 @@ void qemu_mutex_unlock(QemuMutex *mutex)
         error_exit(err, __func__);
 }
 
+void qemu_rec_mutex_init(QemuRecMutex *mutex)
+{
+    int err;
+    pthread_mutexattr_t attr;
+
+    pthread_mutexattr_init(&attr);
+    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
+    err = pthread_mutex_init(&mutex->lock, &attr);
+    pthread_mutexattr_destroy(&attr);
+    if (err) {
+        error_exit(err, __func__);
+    }
+}
+
 void qemu_cond_init(QemuCond *cond)
 {
     int err;
diff --git a/util/qemu-thread-win32.c b/util/qemu-thread-win32.c
index 98a5ddf..171d83c 100644
--- a/util/qemu-thread-win32.c
+++ b/util/qemu-thread-win32.c
@@ -79,6 +79,31 @@ void qemu_mutex_unlock(QemuMutex *mutex)
     LeaveCriticalSection(&mutex->lock);
 }
 
+void qemu_rec_mutex_init(QemuRecMutex *mutex)
+{
+    InitializeCriticalSection(&mutex->lock);
+}
+
+void qemu_rec_mutex_destroy(QemuRecMutex *mutex)
+{
+    DeleteCriticalSection(&mutex->lock);
+}
+
+void qemu_rec_mutex_lock(QemuRecMutex *mutex)
+{
+    EnterCriticalSection(&mutex->lock);
+}
+
+int qemu_rec_mutex_trylock(QemuRecMutex *mutex)
+{
+    return !TryEnterCriticalSection(&mutex->lock);
+}
+
+void qemu_rec_mutex_unlock(QemuRecMutex *mutex)
+{
+    LeaveCriticalSection(&mutex->lock);
+}
+
 void qemu_cond_init(QemuCond *cond)
 {
     memset(cond, 0, sizeof(*cond));
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH 20/20] aio: convert from RFifoLock to QemuRecMutex
  2016-10-17 13:54 [Qemu-devel] [PATCH v2 00/20] dataplane: remove RFifoLock Paolo Bonzini
                   ` (18 preceding siblings ...)
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 19/20] qemu-thread: introduce QemuRecMutex Paolo Bonzini
@ 2016-10-17 13:54 ` Paolo Bonzini
  2016-10-19  0:55 ` [Qemu-devel] [PATCH v2 00/20] dataplane: remove RFifoLock Fam Zheng
  20 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-10-17 13:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, famz, kwolf, stefanha

It is simpler and a bit faster, and QEMU does not need the contention
callbacks (and thus the fairness) anymore.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 async.c                  |  8 ++---
 include/block/aio.h      |  3 +-
 include/qemu/rfifolock.h | 54 ----------------------------
 tests/.gitignore         |  1 -
 tests/Makefile.include   |  2 --
 tests/test-rfifolock.c   | 91 ------------------------------------------------
 util/Makefile.objs       |  1 -
 util/rfifolock.c         | 78 -----------------------------------------
 8 files changed, 5 insertions(+), 233 deletions(-)
 delete mode 100644 include/qemu/rfifolock.h
 delete mode 100644 tests/test-rfifolock.c
 delete mode 100644 util/rfifolock.c

diff --git a/async.c b/async.c
index 27db772..b2de360 100644
--- a/async.c
+++ b/async.c
@@ -284,7 +284,7 @@ aio_ctx_finalize(GSource     *source)
 
     aio_set_event_notifier(ctx, &ctx->notifier, false, NULL);
     event_notifier_cleanup(&ctx->notifier);
-    rfifolock_destroy(&ctx->lock);
+    qemu_rec_mutex_destroy(&ctx->lock);
     qemu_mutex_destroy(&ctx->bh_lock);
     timerlistgroup_deinit(&ctx->tlg);
 }
@@ -372,7 +372,7 @@ AioContext *aio_context_new(Error **errp)
 #endif
     ctx->thread_pool = NULL;
     qemu_mutex_init(&ctx->bh_lock);
-    rfifolock_init(&ctx->lock, NULL, NULL);
+    qemu_rec_mutex_init(&ctx->lock);
     timerlistgroup_init(&ctx->tlg, aio_timerlist_notify, ctx);
 
     return ctx;
@@ -393,10 +393,10 @@ void aio_context_unref(AioContext *ctx)
 
 void aio_context_acquire(AioContext *ctx)
 {
-    rfifolock_lock(&ctx->lock);
+    qemu_rec_mutex_lock(&ctx->lock);
 }
 
 void aio_context_release(AioContext *ctx)
 {
-    rfifolock_unlock(&ctx->lock);
+    qemu_rec_mutex_unlock(&ctx->lock);
 }
diff --git a/include/block/aio.h b/include/block/aio.h
index 8e1885b..c7ae27c 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -18,7 +18,6 @@
 #include "qemu/queue.h"
 #include "qemu/event_notifier.h"
 #include "qemu/thread.h"
-#include "qemu/rfifolock.h"
 #include "qemu/timer.h"
 
 typedef struct BlockAIOCB BlockAIOCB;
@@ -54,7 +53,7 @@ struct AioContext {
     GSource source;
 
     /* Protects all fields from multi-threaded access */
-    RFifoLock lock;
+    QemuRecMutex lock;
 
     /* The list of registered AIO handlers */
     QLIST_HEAD(, AioHandler) aio_handlers;
diff --git a/include/qemu/rfifolock.h b/include/qemu/rfifolock.h
deleted file mode 100644
index b23ab53..0000000
--- a/include/qemu/rfifolock.h
+++ /dev/null
@@ -1,54 +0,0 @@
-/*
- * Recursive FIFO lock
- *
- * Copyright Red Hat, Inc. 2013
- *
- * Authors:
- *  Stefan Hajnoczi   <stefanha@redhat.com>
- *
- * This work is licensed under the terms of the GNU GPL, version 2 or later.
- * See the COPYING file in the top-level directory.
- *
- */
-
-#ifndef QEMU_RFIFOLOCK_H
-#define QEMU_RFIFOLOCK_H
-
-#include "qemu/thread.h"
-
-/* Recursive FIFO lock
- *
- * This lock provides more features than a plain mutex:
- *
- * 1. Fairness - enforces FIFO order.
- * 2. Nesting - can be taken recursively.
- * 3. Contention callback - optional, called when thread must wait.
- *
- * The recursive FIFO lock is heavyweight so prefer other synchronization
- * primitives if you do not need its features.
- */
-typedef struct {
-    QemuMutex lock;             /* protects all fields */
-
-    /* FIFO order */
-    unsigned int head;          /* active ticket number */
-    unsigned int tail;          /* waiting ticket number */
-    QemuCond cond;              /* used to wait for our ticket number */
-
-    /* Nesting */
-    QemuThread owner_thread;    /* thread that currently has ownership */
-    unsigned int nesting;       /* amount of nesting levels */
-
-    /* Contention callback */
-    void (*cb)(void *);         /* called when thread must wait, with ->lock
-                                 * held so it may not recursively lock/unlock
-                                 */
-    void *cb_opaque;
-} RFifoLock;
-
-void rfifolock_init(RFifoLock *r, void (*cb)(void *), void *opaque);
-void rfifolock_destroy(RFifoLock *r);
-void rfifolock_lock(RFifoLock *r);
-void rfifolock_unlock(RFifoLock *r);
-
-#endif /* QEMU_RFIFOLOCK_H */
diff --git a/tests/.gitignore b/tests/.gitignore
index 9f3d2ee..2cd897e 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -66,7 +66,6 @@ test-qmp-marshal.c
 test-qmp-output-visitor
 test-rcu-list
 test-replication
-test-rfifolock
 test-string-input-visitor
 test-string-output-visitor
 test-thread-pool
diff --git a/tests/Makefile.include b/tests/Makefile.include
index a77777c..26e7e90 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -43,7 +43,6 @@ check-unit-y += tests/test-visitor-serialization$(EXESUF)
 check-unit-y += tests/test-iov$(EXESUF)
 gcov-files-test-iov-y = util/iov.c
 check-unit-y += tests/test-aio$(EXESUF)
-check-unit-$(CONFIG_POSIX) += tests/test-rfifolock$(EXESUF)
 check-unit-y += tests/test-throttle$(EXESUF)
 gcov-files-test-aio-$(CONFIG_WIN32) = aio-win32.c
 gcov-files-test-aio-$(CONFIG_POSIX) = aio-posix.c
@@ -481,7 +480,6 @@ tests/check-qom-interface$(EXESUF): tests/check-qom-interface.o $(test-qom-obj-y
 tests/check-qom-proplist$(EXESUF): tests/check-qom-proplist.o $(test-qom-obj-y)
 tests/test-coroutine$(EXESUF): tests/test-coroutine.o $(test-block-obj-y)
 tests/test-aio$(EXESUF): tests/test-aio.o $(test-block-obj-y)
-tests/test-rfifolock$(EXESUF): tests/test-rfifolock.o $(test-util-obj-y)
 tests/test-throttle$(EXESUF): tests/test-throttle.o $(test-block-obj-y)
 tests/test-blockjob$(EXESUF): tests/test-blockjob.o $(test-block-obj-y) $(test-util-obj-y)
 tests/test-blockjob-txn$(EXESUF): tests/test-blockjob-txn.o $(test-block-obj-y) $(test-util-obj-y)
diff --git a/tests/test-rfifolock.c b/tests/test-rfifolock.c
deleted file mode 100644
index 471a811..0000000
--- a/tests/test-rfifolock.c
+++ /dev/null
@@ -1,91 +0,0 @@
-/*
- * RFifoLock tests
- *
- * Copyright Red Hat, Inc. 2013
- *
- * Authors:
- *  Stefan Hajnoczi    <stefanha@redhat.com>
- *
- * This work is licensed under the terms of the GNU LGPL, version 2 or later.
- * See the COPYING.LIB file in the top-level directory.
- */
-
-#include "qemu/osdep.h"
-#include "qemu-common.h"
-#include "qemu/rfifolock.h"
-
-static void test_nesting(void)
-{
-    RFifoLock lock;
-
-    /* Trivial test, ensure the lock is recursive */
-    rfifolock_init(&lock, NULL, NULL);
-    rfifolock_lock(&lock);
-    rfifolock_lock(&lock);
-    rfifolock_lock(&lock);
-    rfifolock_unlock(&lock);
-    rfifolock_unlock(&lock);
-    rfifolock_unlock(&lock);
-    rfifolock_destroy(&lock);
-}
-
-typedef struct {
-    RFifoLock lock;
-    int fd[2];
-} CallbackTestData;
-
-static void rfifolock_cb(void *opaque)
-{
-    CallbackTestData *data = opaque;
-    int ret;
-    char c = 0;
-
-    ret = write(data->fd[1], &c, sizeof(c));
-    g_assert(ret == 1);
-}
-
-static void *callback_thread(void *opaque)
-{
-    CallbackTestData *data = opaque;
-
-    /* The other thread holds the lock so the contention callback will be
-     * invoked...
-     */
-    rfifolock_lock(&data->lock);
-    rfifolock_unlock(&data->lock);
-    return NULL;
-}
-
-static void test_callback(void)
-{
-    CallbackTestData data;
-    QemuThread thread;
-    int ret;
-    char c;
-
-    rfifolock_init(&data.lock, rfifolock_cb, &data);
-    ret = qemu_pipe(data.fd);
-    g_assert(ret == 0);
-
-    /* Hold lock but allow the callback to kick us by writing to the pipe */
-    rfifolock_lock(&data.lock);
-    qemu_thread_create(&thread, "callback_thread",
-                       callback_thread, &data, QEMU_THREAD_JOINABLE);
-    ret = read(data.fd[0], &c, sizeof(c));
-    g_assert(ret == 1);
-    rfifolock_unlock(&data.lock);
-    /* If we got here then the callback was invoked, as expected */
-
-    qemu_thread_join(&thread);
-    close(data.fd[0]);
-    close(data.fd[1]);
-    rfifolock_destroy(&data.lock);
-}
-
-int main(int argc, char **argv)
-{
-    g_test_init(&argc, &argv, NULL);
-    g_test_add_func("/nesting", test_nesting);
-    g_test_add_func("/callback", test_callback);
-    return g_test_run();
-}
diff --git a/util/Makefile.objs b/util/Makefile.objs
index 36c7dcc..ad0f9c7 100644
--- a/util/Makefile.objs
+++ b/util/Makefile.objs
@@ -25,7 +25,6 @@ util-obj-y += uuid.o
 util-obj-y += throttle.o
 util-obj-y += getauxval.o
 util-obj-y += readline.o
-util-obj-y += rfifolock.o
 util-obj-y += rcu.o
 util-obj-y += qemu-coroutine.o qemu-coroutine-lock.o qemu-coroutine-io.o
 util-obj-y += qemu-coroutine-sleep.o
diff --git a/util/rfifolock.c b/util/rfifolock.c
deleted file mode 100644
index 084c2f0..0000000
--- a/util/rfifolock.c
+++ /dev/null
@@ -1,78 +0,0 @@
-/*
- * Recursive FIFO lock
- *
- * Copyright Red Hat, Inc. 2013
- *
- * Authors:
- *  Stefan Hajnoczi   <stefanha@redhat.com>
- *
- * This work is licensed under the terms of the GNU LGPL, version 2 or later.
- * See the COPYING.LIB file in the top-level directory.
- *
- */
-
-#include "qemu/osdep.h"
-#include "qemu/rfifolock.h"
-
-void rfifolock_init(RFifoLock *r, void (*cb)(void *), void *opaque)
-{
-    qemu_mutex_init(&r->lock);
-    r->head = 0;
-    r->tail = 0;
-    qemu_cond_init(&r->cond);
-    r->nesting = 0;
-    r->cb = cb;
-    r->cb_opaque = opaque;
-}
-
-void rfifolock_destroy(RFifoLock *r)
-{
-    qemu_cond_destroy(&r->cond);
-    qemu_mutex_destroy(&r->lock);
-}
-
-/*
- * Theory of operation:
- *
- * In order to ensure FIFO ordering, implement a ticketlock.  Threads acquiring
- * the lock enqueue themselves by incrementing the tail index.  When the lock
- * is unlocked, the head is incremented and waiting threads are notified.
- *
- * Recursive locking does not take a ticket since the head is only incremented
- * when the outermost recursive caller unlocks.
- */
-void rfifolock_lock(RFifoLock *r)
-{
-    qemu_mutex_lock(&r->lock);
-
-    /* Take a ticket */
-    unsigned int ticket = r->tail++;
-
-    if (r->nesting > 0 && qemu_thread_is_self(&r->owner_thread)) {
-        r->tail--; /* put ticket back, we're nesting */
-    } else {
-        while (ticket != r->head) {
-            /* Invoke optional contention callback */
-            if (r->cb) {
-                r->cb(r->cb_opaque);
-            }
-            qemu_cond_wait(&r->cond, &r->lock);
-        }
-        qemu_thread_get_self(&r->owner_thread);
-    }
-
-    r->nesting++;
-    qemu_mutex_unlock(&r->lock);
-}
-
-void rfifolock_unlock(RFifoLock *r)
-{
-    qemu_mutex_lock(&r->lock);
-    assert(r->nesting > 0);
-    assert(qemu_thread_is_self(&r->owner_thread));
-    if (--r->nesting == 0) {
-        r->head++;
-        qemu_cond_broadcast(&r->cond);
-    }
-    qemu_mutex_unlock(&r->lock);
-}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH 16/20] qemu-img: call aio_context_acquire/release around block job
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 16/20] qemu-img: call aio_context_acquire/release around block job Paolo Bonzini
@ 2016-10-19  0:54   ` Fam Zheng
  2016-10-19 11:05     ` Paolo Bonzini
  0 siblings, 1 reply; 26+ messages in thread
From: Fam Zheng @ 2016-10-19  0:54 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, qemu-block, kwolf, stefanha

On Mon, 10/17 15:54, Paolo Bonzini wrote:
> This will be needed by bdrv_reopen_multiple, which calls
> bdrv_drain_all and thus will *release* the AioContext.

Looks okay, but I wonder how bdrv_drain_all releasing AioContext break anything?

Fam

> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>         v1->v2: new [qemu-iotests]
> 
>  qemu-img.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/qemu-img.c b/qemu-img.c
> index 02c07b9..ad7c964 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -795,6 +795,7 @@ static void run_block_job(BlockJob *job, Error **errp)
>  {
>      AioContext *aio_context = blk_get_aio_context(job->blk);
>  
> +    aio_context_acquire(aio_context);
>      do {
>          aio_poll(aio_context, true);
>          qemu_progress_print(job->len ?
> @@ -802,6 +803,7 @@ static void run_block_job(BlockJob *job, Error **errp)
>      } while (!job->ready);
>  
>      block_job_complete_sync(job, errp);
> +    aio_context_release(aio_context);
>  
>      /* A block job may finish instantaneously without publishing any progress,
>       * so just signal completion here */
> @@ -819,6 +821,7 @@ static int img_commit(int argc, char **argv)
>      Error *local_err = NULL;
>      CommonBlockJobCBInfo cbi;
>      bool image_opts = false;
> +    AioContext *aio_context;
>  
>      fmt = NULL;
>      cache = BDRV_DEFAULT_CACHE;
> @@ -928,8 +931,11 @@ static int img_commit(int argc, char **argv)
>          .bs   = bs,
>      };
>  
> +    aio_context = bdrv_get_aio_context(bs);
> +    aio_context_acquire(aio_context);
>      commit_active_start("commit", bs, base_bs, 0, BLOCKDEV_ON_ERROR_REPORT,
>                          common_block_job_cb, &cbi, &local_err, false);
> +    aio_context_release(aio_context);
>      if (local_err) {
>          goto done;
>      }
> -- 
> 2.7.4
> 
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/20] dataplane: remove RFifoLock
  2016-10-17 13:54 [Qemu-devel] [PATCH v2 00/20] dataplane: remove RFifoLock Paolo Bonzini
                   ` (19 preceding siblings ...)
  2016-10-17 13:54 ` [Qemu-devel] [PATCH 20/20] aio: convert from RFifoLock to QemuRecMutex Paolo Bonzini
@ 2016-10-19  0:55 ` Fam Zheng
  2016-10-19  0:55   ` Fam Zheng
  2016-10-27  1:30   ` Fam Zheng
  20 siblings, 2 replies; 26+ messages in thread
From: Fam Zheng @ 2016-10-19  0:55 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, qemu-block, kwolf, stefanha

On Mon, 10/17 15:54, Paolo Bonzini wrote:
> This patch reorganizes aio_poll callers to establish new rules for
> dataplane locking.  The idea is that I/O operations on a dataplane
> BDS (i.e. one where the AioContext is not the main one) do not call
> aio_poll anymore.  Instead, they wait for the operation to end in the
> other I/O thread, at which point the other I/O thread calls bdrv_wakeup
> to wake up the main thread.
> 
> With this change, only one thread runs aio_poll for an AioContext.
> While aio_context_acquire/release is still needed to protect the BDSes,
> it need not interrupt the other thread's event loop anymore, and therefore
> it does not need contention callbacks anymore.  Thus the patch can remove
> RFifoLock.  This fixes possible hangs in bdrv_drain_all, reproducible (for
> example) by unplugging a virtio-scsi-dataplane device while there is I/O
> going on for a virtio-blk-dataplane on the same I/O thread.
> 
> Patch 1 is a bugfix that I already posted.
> 
> Patch 2 makes blockjobs independent of aio_poll, the reason for which
> should be apparent from the explanation above.
> 
> Patch 3 is an independent mirror bugfix, that I wanted to submit separately
> but happens to fix a hang in COLO replication.  Like patch 1 I believe
> it's pre-existing and merely exposed by these patches.
> 
> Patches 4 to 10 introduce the infrastructure to wake up the main thread
> while bdrv_drain or other synchronous operations are running.  Patches 11
> to 16 do other changes to prepare for this.  Notably bdrv_drain_all
> needs to be called without holding any AioContext lock, so bdrv_reopen
> releases the lock temporarily (and callers of bdrv_reopen needs fixing).
> 
> Patch 17 then does the big change, after which there are just some
> cleanups left to do.
> 
> Paolo
> 
> Fam Zheng (1):
>   qed: Implement .bdrv_drain
> 
> Paolo Bonzini (19):
>   replication: interrupt failover if the main device is closed
>   blockjob: introduce .drain callback for jobs
>   mirror: use bdrv_drained_begin/bdrv_drained_end
>   block: add BDS field to count in-flight requests
>   block: change drain to look only at one child at a time
>   block: introduce BDRV_POLL_WHILE
>   nfs: move nfs_set_events out of the while loops
>   nfs: use BDRV_POLL_WHILE
>   sheepdog: use BDRV_POLL_WHILE
>   aio: introduce qemu_get_current_aio_context
>   iothread: detach all block devices before stopping them
>   replication: pass BlockDriverState to reopen_backing_file
>   block: prepare bdrv_reopen_multiple to release AioContext
>   qemu-io: acquire AioContext
>   qemu-img: call aio_context_acquire/release around block job
>   block: only call aio_poll on the current thread's AioContext
>   iothread: release AioContext around aio_poll
>   qemu-thread: introduce QemuRecMutex
>   aio: convert from RFifoLock to QemuRecMutex

Modulo the one harmful question, series:

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/20] dataplane: remove RFifoLock
  2016-10-19  0:55 ` [Qemu-devel] [PATCH v2 00/20] dataplane: remove RFifoLock Fam Zheng
@ 2016-10-19  0:55   ` Fam Zheng
  2016-10-27  1:30   ` Fam Zheng
  1 sibling, 0 replies; 26+ messages in thread
From: Fam Zheng @ 2016-10-19  0:55 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, qemu-block, kwolf, stefanha

On Wed, 10/19 08:55, Fam Zheng wrote:
> Modulo the one harmful question, series:

s/harmful/unharmful/, apparently.

Fam

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH 16/20] qemu-img: call aio_context_acquire/release around block job
  2016-10-19  0:54   ` Fam Zheng
@ 2016-10-19 11:05     ` Paolo Bonzini
  0 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-10-19 11:05 UTC (permalink / raw)
  To: Fam Zheng; +Cc: qemu-devel, qemu-block, kwolf, stefanha



On 19/10/2016 02:54, Fam Zheng wrote:
> On Mon, 10/17 15:54, Paolo Bonzini wrote:
>> This will be needed by bdrv_reopen_multiple, which calls
>> bdrv_drain_all and thus will *release* the AioContext.
> 
> Looks okay, but I wonder how bdrv_drain_all releasing AioContext break anything?

If you don't acquire it first, you get an assertion failure for
releasing a mutex you haven't acquired yet.

Initially I made aio_context_acquire a nop for the default QEMU context,
but then decided it was a hack.

Paolo

> Fam
> 
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>         v1->v2: new [qemu-iotests]
>>
>>  qemu-img.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/qemu-img.c b/qemu-img.c
>> index 02c07b9..ad7c964 100644
>> --- a/qemu-img.c
>> +++ b/qemu-img.c
>> @@ -795,6 +795,7 @@ static void run_block_job(BlockJob *job, Error **errp)
>>  {
>>      AioContext *aio_context = blk_get_aio_context(job->blk);
>>  
>> +    aio_context_acquire(aio_context);
>>      do {
>>          aio_poll(aio_context, true);
>>          qemu_progress_print(job->len ?
>> @@ -802,6 +803,7 @@ static void run_block_job(BlockJob *job, Error **errp)
>>      } while (!job->ready);
>>  
>>      block_job_complete_sync(job, errp);
>> +    aio_context_release(aio_context);
>>  
>>      /* A block job may finish instantaneously without publishing any progress,
>>       * so just signal completion here */
>> @@ -819,6 +821,7 @@ static int img_commit(int argc, char **argv)
>>      Error *local_err = NULL;
>>      CommonBlockJobCBInfo cbi;
>>      bool image_opts = false;
>> +    AioContext *aio_context;
>>  
>>      fmt = NULL;
>>      cache = BDRV_DEFAULT_CACHE;
>> @@ -928,8 +931,11 @@ static int img_commit(int argc, char **argv)
>>          .bs   = bs,
>>      };
>>  
>> +    aio_context = bdrv_get_aio_context(bs);
>> +    aio_context_acquire(aio_context);
>>      commit_active_start("commit", bs, base_bs, 0, BLOCKDEV_ON_ERROR_REPORT,
>>                          common_block_job_cb, &cbi, &local_err, false);
>> +    aio_context_release(aio_context);
>>      if (local_err) {
>>          goto done;
>>      }
>> -- 
>> 2.7.4
>>
>>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/20] dataplane: remove RFifoLock
  2016-10-19  0:55 ` [Qemu-devel] [PATCH v2 00/20] dataplane: remove RFifoLock Fam Zheng
  2016-10-19  0:55   ` Fam Zheng
@ 2016-10-27  1:30   ` Fam Zheng
  1 sibling, 0 replies; 26+ messages in thread
From: Fam Zheng @ 2016-10-27  1:30 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, stefanha, qemu-devel, qemu-block

On Wed, 10/19 08:55, Fam Zheng wrote:
> Reviewed-by: Fam Zheng <famz@redhat.com>

FWIW in merging it:

Acked-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2016-10-27  1:30 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-17 13:54 [Qemu-devel] [PATCH v2 00/20] dataplane: remove RFifoLock Paolo Bonzini
2016-10-17 13:54 ` [Qemu-devel] [PATCH 01/20] replication: interrupt failover if the main device is closed Paolo Bonzini
2016-10-17 13:54 ` [Qemu-devel] [PATCH 02/20] blockjob: introduce .drain callback for jobs Paolo Bonzini
2016-10-17 13:54 ` [Qemu-devel] [PATCH 03/20] mirror: use bdrv_drained_begin/bdrv_drained_end Paolo Bonzini
2016-10-17 13:54 ` [Qemu-devel] [PATCH 04/20] block: add BDS field to count in-flight requests Paolo Bonzini
2016-10-17 13:54 ` [Qemu-devel] [PATCH 05/20] block: change drain to look only at one child at a time Paolo Bonzini
2016-10-17 13:54 ` [Qemu-devel] [PATCH 06/20] qed: Implement .bdrv_drain Paolo Bonzini
2016-10-17 13:54 ` [Qemu-devel] [PATCH 07/20] block: introduce BDRV_POLL_WHILE Paolo Bonzini
2016-10-17 13:54 ` [Qemu-devel] [PATCH 08/20] nfs: move nfs_set_events out of the while loops Paolo Bonzini
2016-10-17 13:54 ` [Qemu-devel] [PATCH 09/20] nfs: use BDRV_POLL_WHILE Paolo Bonzini
2016-10-17 13:54 ` [Qemu-devel] [PATCH 10/20] sheepdog: " Paolo Bonzini
2016-10-17 13:54 ` [Qemu-devel] [PATCH 11/20] aio: introduce qemu_get_current_aio_context Paolo Bonzini
2016-10-17 13:54 ` [Qemu-devel] [PATCH 12/20] iothread: detach all block devices before stopping them Paolo Bonzini
2016-10-17 13:54 ` [Qemu-devel] [PATCH 13/20] replication: pass BlockDriverState to reopen_backing_file Paolo Bonzini
2016-10-17 13:54 ` [Qemu-devel] [PATCH 14/20] block: prepare bdrv_reopen_multiple to release AioContext Paolo Bonzini
2016-10-17 13:54 ` [Qemu-devel] [PATCH 15/20] qemu-io: acquire AioContext Paolo Bonzini
2016-10-17 13:54 ` [Qemu-devel] [PATCH 16/20] qemu-img: call aio_context_acquire/release around block job Paolo Bonzini
2016-10-19  0:54   ` Fam Zheng
2016-10-19 11:05     ` Paolo Bonzini
2016-10-17 13:54 ` [Qemu-devel] [PATCH 17/20] block: only call aio_poll on the current thread's AioContext Paolo Bonzini
2016-10-17 13:54 ` [Qemu-devel] [PATCH 18/20] iothread: release AioContext around aio_poll Paolo Bonzini
2016-10-17 13:54 ` [Qemu-devel] [PATCH 19/20] qemu-thread: introduce QemuRecMutex Paolo Bonzini
2016-10-17 13:54 ` [Qemu-devel] [PATCH 20/20] aio: convert from RFifoLock to QemuRecMutex Paolo Bonzini
2016-10-19  0:55 ` [Qemu-devel] [PATCH v2 00/20] dataplane: remove RFifoLock Fam Zheng
2016-10-19  0:55   ` Fam Zheng
2016-10-27  1:30   ` Fam Zheng

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.