[Qemu-devel] [PATCH 04/16] aio: only call aio_poll_internal from iothread

From: Paolo Bonzini <pbonzini@redhat.com>
To: qemu-devel@nongnu.org
Cc: stefanha@redhat.com
Subject: [Qemu-devel] [PATCH 04/16] aio: only call aio_poll_internal from iothread
Date: Fri, 15 Jan 2016 16:12:07 +0100	[thread overview]
Message-ID: <1452870739-28484-5-git-send-email-pbonzini@redhat.com> (raw)
In-Reply-To: <1452870739-28484-1-git-send-email-pbonzini@redhat.com>

aio_poll is not thread safe; it can report progress incorrectly when
called from the main thread.  The bug remains latent as long as
all of it is called within aio_context_acquire/aio_context_release,
but this will change soon.

The details of the bug are pretty simple, but fixing it in an
efficient way is thorny.  There are plenty of comments and formal
models in the patch, so I will refer to it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 async.c                         | 216 +++++++++++++++++++++++++++++++++++++++-
 block/io.c                      |  14 ++-
 docs/aio_poll_drain.promela     | 210 ++++++++++++++++++++++++++++++++++++++
 docs/aio_poll_drain_bug.promela | 158 +++++++++++++++++++++++++++++
 docs/aio_poll_sync_io.promela   |  88 ++++++++++++++++
 docs/multiple-iothreads.txt     |  20 ++--
 include/block/aio.h             |  12 ++-
 stubs/iothread-lock.c           |   5 +
 8 files changed, 701 insertions(+), 22 deletions(-)
 create mode 100644 docs/aio_poll_drain.promela
 create mode 100644 docs/aio_poll_drain_bug.promela
 create mode 100644 docs/aio_poll_sync_io.promela

diff --git a/async.c b/async.c
index 856aa75..29f2845 100644
--- a/async.c
+++ b/async.c
@@ -299,12 +299,224 @@ void aio_notify_accept(AioContext *ctx)
     }
 }
 
+/* aio_poll_internal is not thread-safe; it only reports progress
+ * correctly when called from one thread, because it has no
+ * history of what happened in different threads.  When called
+ * from two threads, there is a race:
+ *
+ *      main thread                       I/O thread
+ *      -----------------------           --------------------------
+ *      blk_drain
+ *        bdrv_requests_pending -> true
+ *                                        aio_poll_internal
+ *                                          process last request
+ *        aio_poll_internal
+ *
+ * Now aio_poll_internal will never exit, because there is no pending
+ * I/O on the AioContext.
+ *
+ * Therefore, aio_poll is a wrapper around aio_poll_internal that allows
+ * usage from _two_ threads: the I/O thread of course, and the main thread.
+ * When called from the main thread, aio_poll just asks the I/O thread
+ * for a nudge as soon as the next call to aio_poll is complete.
+ * Because we use QemuEvent, and QemuEvent supports a single consumer
+ * only, this only works when the calling thread holds the big QEMU lock.
+ *
+ * Because aio_poll is used in a loop, spurious wakeups are okay.
+ * Therefore, the I/O thread calls qemu_event_set very liberally
+ * (it helps that qemu_event_set is cheap on an already-set event).
+ * generally used in a loop, it's okay to have spurious wakeups.
+ * Similarly it is okay to return true when no progress was made
+ * (as long as this doesn't happen forever, or you get livelock).
+ *
+ * The important thing is that you need to report progress from
+ * aio_poll(ctx, false) correctly.  This is complicated and the
+ * implementation builds on event_notifier_set plus
+ * aio_poll(ctx, true).
+ *
+ * The implementation consists of the following functions:
+ * - aio_poll_and_wake: runs in the iothread and takes care of
+ *   waking up the main thread after aio_poll_internal returns
+ *
+ * - aio_wait_iteration: implementation of aio_poll(ctx, true)
+ *   for the main thread
+ *
+ * - aio_force_iteration: implementation of aio_poll(ctx, false)
+ *   for the main thread
+ */
+static bool aio_poll_and_wake(AioContext *ctx, bool blocking)
+{
+    bool progress = aio_poll_internal(ctx, blocking);
+    smp_wmb();
+    if (progress) {
+        ctx->progress = true;
+    }
+
+    qemu_event_set(&ctx->sync_io_event);
+    return progress;
+}
+
+static bool aio_wait_iteration(AioContext *ctx)
+{
+    /* Wait until at least one iteration has passed since the main thread
+     * last called aio_poll.
+     */
+    qemu_event_wait(&ctx->sync_io_event);
+
+    /* Remember how aio_poll is used---in a loop, until a guard condition
+     * becomes true.   By resetting the event here, we ensure that the guard
+     * condition will be checked before the next call to qemu_event_wait.
+     * The above race is resolved as follows.
+     *
+     *      main thread                       I/O thread
+     *      -----------------------           --------------------------
+     *      blk_drain
+     *        bdrv_requests_pending -> true
+     *                                        aio_poll_internal
+     *                                          process last request
+     *                                        qemu_event_set
+     *        qemu_event_wait
+     */
+    qemu_event_reset(&ctx->sync_io_event);
+
+    return atomic_xchg(&ctx->progress, false);
+}
+
+/* The first idea is to simply implement non-blocking aio_poll as
+ *
+ *     // cannot use aio_notify; because of the notify_me optimization,
+ *     // aio_notify might do nothing and then poll will block
+ *     event_notifier_set(&ctx->notifier);
+ *     progress = aio_wait_iteration(ctx);
+ *
+ * which doesn't work because of a relatively simple race.  Remember
+ * that spurious wakeups of aio_poll are common.  What can happen then is:
+ *
+ *     main thread                             I/O thread
+ *     ---------------------------             -------------------------
+ *                                             qemu_event_set();
+ *                                             ...
+ *                                             qemu_bh_schedule()
+ *     aio_poll(ctx, true)
+ *       aio_wait_iteration
+ *         qemu_event_wait();
+ *         qemu_event_reset();
+ *                                             ctx->progress = true;
+ *         progress=xchg(ctx->progress, false)
+ *                                             qemu_event_set();
+ *     ...
+ *     aio_poll(ctx, false)
+ *       event_notifier_set()
+ *       aio_wait_iteration
+ *         qemu_event_wait();
+ *         qemu_event_reset();
+ *       <<returns false>>
+ *
+ * aio_poll's contract ensures that *some* progress is made if possible;
+ * hence the bottom half should be executed before aio_poll(ctx, false)
+ * returns, and aio_poll should return true.  The failure happened because
+ * aio_poll_internal has not run at all since the last qemu_event_wait;
+ * the execution threads interleaved so that aio_poll(ctx, false)
+ * immediately returns false.
+ *
+ *
+ * The next idea then is to add a reset of the event in the non-blocking
+ * aio_poll.  This has the same problem, just a little harder to trigger:
+ *
+ *     main thread                             I/O thread
+ *     ---------------------------             -----------------------------
+ *                                             qemu_event_set();
+ *                                             qemu_bh_schedule()
+ *     aio_poll(ctx, true)
+ *       aio_wait_iteration
+ *         qemu_event_wait();
+ *         qemu_event_reset();
+ *                                             ctx->progress = true;
+ *         progress=xchg(ctx->progress, false)
+ *     ...
+ *     aio_poll(ctx, false)
+ *       qemu_event_reset();
+ *                                             qemu_event_set();
+ *       event_notifier_set()
+ *       aio_wait_iteration
+ *         qemu_event_wait();
+ *         qemu_event_reset();
+ *         progress=xchg(ctx->progress, false)
+ *       <<returns false>>
+ *
+ *
+ * Let's change approach and try running aio_wait_iteration *twice*.
+ * This doesn't work either, but for a different reason.  If the main
+ * thread is "slow", an entire execution of the loop happens in the I/O
+ * thread, and a preceding aio_poll(ctx, true) can leave ctx->progress
+ * equal to false.  But unlike the previous example, some progress was
+ * made *after* qemu_event_wait returned:
+ *
+ *     main thread                             I/O thread
+ *     ---------------------------             -----------------------------
+ *                                             qemu_event_set();
+ *                                             ...
+ *     aio_poll(ctx, true)
+ *       aio_wait_iteration
+ *         qemu_event_wait();
+ *         qemu_event_reset();
+ *                                             ctx->progress = true;
+ *                                             qemu_event_set();
+ *                                             ...
+ *                                             qemu_bh_schedule()
+ *         progress=xchg(ctx->progress, false)
+ *     aio_poll(ctx, false)
+ *       event_notifier_set()
+ *       aio_wait_iteration
+ *         qemu_event_wait();
+ *         qemu_event_reset();
+ *         progress=xchg(ctx->progress, false)
+ *                                             qemu_event_set();
+ *       event_notifier_set()
+ *       aio_wait_iteration
+ *         qemu_event_wait();
+ *         qemu_event_reset();
+ *         progress|=xchg(ctx->progress, false)
+ *       <<returns false>>
+ *
+ * However, applying both workarounds solves the problems.  The initial
+ * qemu_event_reset removes leftovers of previously completed iterations
+ * (such as in the last example).  Then, the double iteration ensures
+ * at least one execution of the dispatch phase to happen.  The dispatch
+ * phase will then execute pending bottom halves and return progress
+ * correctly.
+ */
+static bool aio_force_iteration(AioContext *ctx)
+{
+    bool progress;
+    qemu_event_reset(&ctx->sync_io_event);
+
+    event_notifier_set(&ctx->notifier);
+    progress = aio_wait_iteration(ctx);
+
+    event_notifier_set(&ctx->notifier);
+    progress |= aio_wait_iteration(ctx);
+
+    return progress;
+}
+
 bool aio_poll(AioContext *ctx, bool blocking)
 {
-    assert(qemu_mutex_iothread_locked() ||
-           aio_context_in_iothread(ctx));
+    bool progress;
+
+    if (aio_context_in_iothread(ctx)) {
+        return aio_poll_and_wake(ctx, blocking);
+    }
 
-    return aio_poll_internal(ctx, blocking);
+    assert(qemu_mutex_iothread_locked());
+    aio_context_release(ctx);
+    if (blocking) {
+        progress = aio_wait_iteration(ctx);
+    } else {
+        progress = aio_force_iteration(ctx);
+    }
+    aio_context_acquire(ctx);
+    return progress;
 }
 
 static void aio_timerlist_notify(void *opaque)
diff --git a/block/io.c b/block/io.c
index 34118d8..d1f9a1e 100644
--- a/block/io.c
+++ b/block/io.c
@@ -269,15 +269,19 @@ static void bdrv_drain_recurse(BlockDriverState *bs)
  */
 void bdrv_drain(BlockDriverState *bs)
 {
-    bool busy = true;
+    bool busy;
 
     bdrv_drain_recurse(bs);
-    while (busy) {
+    do {
         /* Keep iterating */
          bdrv_flush_io_queue(bs);
-         busy = bdrv_requests_pending(bs);
-         busy |= aio_poll(bdrv_get_aio_context(bs), busy);
-    }
+         if (bdrv_requests_pending(bs)) {
+             aio_poll(bdrv_get_aio_context(bs), true);
+             busy = true;
+         } else {
+             busy = aio_poll(bdrv_get_aio_context(bs), false);
+         }
+    } while (busy);
 }
 
 /*
diff --git a/docs/aio_poll_drain.promela b/docs/aio_poll_drain.promela
new file mode 100644
index 0000000..302b333
--- /dev/null
+++ b/docs/aio_poll_drain.promela
@@ -0,0 +1,210 @@
+/*
+ * Demonstrate the behavior of blk_drain() when called from the
+ * iothread.
+ *
+ * Author: Paolo Bonzini <pbonzini@redhat.com>
+ *
+ * This file is in the public domain.  If you really want a license,
+ * the WTFPL will do.
+ *
+ * To simulate it:
+ *     spin -p docs/aio_poll_sync_io.promela
+ *
+ * To verify it:
+ *     spin -a docs/aio_poll_sync_io.promela
+ *     gcc -O2 pan.c -DSAFETY
+ *     ./a.out
+ *
+ * To verify it (with a bug planted):
+ *     spin -DBUG1 -a docs/aio_poll_sync_io.promela
+ *     gcc -O2 pan.c -DSAFETY
+ *     ./a.out
+ *
+ * (try BUG2/BUG3/BUG4 too)
+ */
+
+#define REQS         3
+#define REQS_INITIAL 1
+
+int req_submitted = 0;
+int req_completed = 0;
+int req_pending   = 0;
+
+
+/* This is a mutex.  */
+int mutex = 0;
+#define LOCK(mutex)   atomic { mutex == 0 -> mutex = 1; }
+#define UNLOCK(mutex) mutex = 0
+
+/* A QEMUBH (e.g. from thread-pool.c) */
+bool io_bh_scheduled;
+bool spawn_bh_scheduled;
+
+/* The AioContext's EventNotifier */
+bool aio_notify;
+
+/* The AioContext's QemuEvent.  Typically the first call to aio_poll
+ * produces a spurious wakeup, which is modeled by these initializers.
+ */
+bool progress = 1;
+bool sync_io_event = 1;
+
+proctype io_worker(int req_id)
+{
+    LOCK(mutex);
+        req_completed++;
+    UNLOCK(mutex);
+
+    io_bh_scheduled = 1;
+    aio_notify = 1;
+}
+
+#define SUBMIT_ONE_REQ {                         \
+       run io_worker(req_submitted);             \
+       req_submitted++;                          \
+       req_pending++;                            \
+    }
+
+active proctype iothread()
+{
+    bool old;
+    bool aio_poll_progress;
+
+    /* The iothread keeps calling aio_poll(..., true).  At the end
+     * of the simulation it is stuck here waiting for a request,
+     * so mark this as a valid end state.
+     */
+end:
+    do
+        :: aio_notify -> {
+            /* aio_notify_accept */
+            aio_notify = 0;
+
+            aio_poll_progress = 0;
+
+            /* Execute the "bottom half" which can spawn more requests */
+            atomic { old = spawn_bh_scheduled; spawn_bh_scheduled = 0; }
+            if
+                :: old -> {
+                    LOCK(mutex);
+                        if
+                           :: req_submitted < REQS -> SUBMIT_ONE_REQ
+                           :: else                 -> skip;
+                        fi;
+                    UNLOCK(mutex);
+                    aio_poll_progress = 1;
+                }
+                :: else -> skip;
+            fi;
+
+            /* Execute the bottom half which spawns the other bh */
+            atomic { old = io_bh_scheduled; io_bh_scheduled = 0; }
+            if
+                :: old -> {
+                    LOCK(mutex);
+                        req_pending = req_pending - req_completed;
+                        req_completed = 0;
+                        assert(req_pending >= 0);
+                    UNLOCK(mutex);
+                    aio_poll_progress = 1;
+                    spawn_bh_scheduled = 1;
+                    aio_notify = 1;
+                }
+                :: else -> skip;
+            fi;
+
+            /* The above is aio_poll_internal, this is aio_poll */
+            if
+                :: aio_poll_progress -> progress = 1;
+                :: else -> skip;
+            fi;
+            sync_io_event = 1;
+        }
+    od;
+}
+
+#define AIO_POLL_ITERATION             \
+    if                                 \
+        :: sync_io_event -> skip;      \
+    fi;                                \
+    sync_io_event = 0;                 \
+    atomic {                           \
+        aio_poll_result = aio_poll_result | progress;    \
+        progress = 0;                  \
+    }
+    
+#define AIO_POLL                       \
+    aio_poll_result = 0;               \
+    AIO_POLL_ITERATION
+
+#if defined BUG1
+/* Skip the nonblocking aio_poll altogether */
+#define AIO_POLL_NONBLOCKING aio_poll_result = 0
+
+#elif defined BUG2
+/* Simple but wrong implementation */
+#define AIO_POLL_NONBLOCKING           \
+    aio_poll_result = 0;               \
+    aio_notify = 1;                    \
+    AIO_POLL_ITERATION
+
+#elif defined BUG3
+/* Workaround 1 is not enough */
+#define AIO_POLL_NONBLOCKING           \
+    sync_io_event = 0;                 \
+    aio_poll_result = 0;               \
+    aio_notify = 1;                    \
+    AIO_POLL_ITERATION
+
+#elif defined BUG4
+/* Workaround 2 is not enough */
+#define AIO_POLL_NONBLOCKING           \
+    aio_poll_result = 0;               \
+    aio_notify = 1;                    \
+    AIO_POLL_ITERATION                 \
+    aio_notify = 1;                    \
+    AIO_POLL_ITERATION
+
+#else
+/* Combining them works! */
+#define AIO_POLL_NONBLOCKING           \
+    sync_io_event = 0;                 \
+    aio_poll_result = 0;               \
+    aio_notify = 1;                    \
+    AIO_POLL_ITERATION                 \
+    aio_notify = 1;                    \
+    AIO_POLL_ITERATION
+#endif
+
+active proctype mainthread()
+{
+    bool aio_poll_result;
+    bool busy = true;
+
+    LOCK(mutex);
+        /* Start with a few requests */
+        do
+           :: req_submitted < REQS_INITIAL -> SUBMIT_ONE_REQ
+           :: else                         -> break;
+        od;
+    UNLOCK(mutex);
+
+    do
+        :: !busy -> break;
+
+        :: else -> {
+            /* This is blk_drain */
+            LOCK(mutex);
+                busy = req_pending > 0;
+            UNLOCK(mutex);
+            if
+               :: busy -> AIO_POLL;
+               :: else -> AIO_POLL_NONBLOCKING; busy = aio_poll_result;
+            fi;
+        }
+    od;
+
+    LOCK(mutex);
+        assert(!req_pending && req_submitted == REQS);
+    UNLOCK(mutex);
+}
diff --git a/docs/aio_poll_drain_bug.promela b/docs/aio_poll_drain_bug.promela
new file mode 100644
index 0000000..26f351d
--- /dev/null
+++ b/docs/aio_poll_drain_bug.promela
@@ -0,0 +1,158 @@
+/*
+ * Demonstrate the behavior of blk_drain() when called from the
+ * iothread.
+ *
+ * Author: Paolo Bonzini <pbonzini@redhat.com>
+ *
+ * This file is in the public domain.  If you really want a license,
+ * the WTFPL will do.
+ *
+ * To simulate it:
+ *     spin -p docs/aio_poll_sync_io.promela
+ *
+ * To verify it:
+ *     spin -a docs/aio_poll_sync_io.promela
+ *     gcc -O2 pan.c -DSAFETY
+ *     ./a.out
+ *
+ * Note that the model verifies successfully, but it is actually buggy.
+ * The reason is that the bottom half structure in the iothread is
+ * overly simplified.
+ *
+ * This explains why blk_drain() needs to use non-blocking aio_poll,
+ * for whose modeling see aio_poll_drain.promela.
+ */
+
+#define REQS         3
+#define REQS_INITIAL 1
+
+int req_submitted = 0;
+int req_completed = 0;
+int req_pending   = 0;
+
+
+/* This is a mutex.  */
+int mutex = 0;
+#define LOCK(mutex)   atomic { mutex == 0 -> mutex = 1; }
+#define UNLOCK(mutex) mutex = 0
+
+/* A QEMUBH (e.g. from thread-pool.c) */
+bool io_bh_scheduled;
+
+/* The AioContext's EventNotifier */
+bool aio_notify;
+
+/* The AioContext's QemuEvent.  Typically the first call to aio_poll
+ * produces a spurious wakeup.  In aio_poll_drain.promela, the
+ * initializers are set to 1 to model this.  This model is buggy
+ * so we skip this additional torture...
+ */
+bool progress;
+bool sync_io_event;
+
+proctype io_worker(int req_id)
+{
+    LOCK(mutex);
+        req_completed++;
+    UNLOCK(mutex);
+
+    io_bh_scheduled = 1;
+    aio_notify = 1;
+}
+
+#define SUBMIT_ONE_REQ {                         \
+       run io_worker(req_submitted);             \
+       req_submitted++;                          \
+       req_pending++;                            \
+    }
+
+active proctype iothread()
+{
+    bool old;
+    bool aio_poll_progress;
+
+    /* The iothread keeps calling aio_poll(..., true).  At the end
+     * of the simulation it is stuck here waiting for a request,
+     * so mark this as a valid end state.
+     */
+end:
+    do
+        :: aio_notify -> {
+            /* aio_notify_accept */
+            aio_notify = 0;
+
+            aio_poll_progress = 0;
+
+            atomic { old = io_bh_scheduled; io_bh_scheduled = 0; }
+            if
+                :: old -> {
+                    LOCK(mutex);
+                        req_pending = req_pending - req_completed;
+                        req_completed = 0;
+                        assert(req_pending >= 0);
+
+                        if
+                           :: req_submitted < REQS -> SUBMIT_ONE_REQ
+                           :: else                 -> skip;
+                        fi;
+                    UNLOCK(mutex);
+                    aio_poll_progress = 1;
+                }
+                :: else -> skip;
+            fi;
+
+            /* The above is aio_poll_internal, this is aio_poll */
+            if
+                :: aio_poll_progress -> progress = 1;
+                :: else -> skip;
+            fi;
+            sync_io_event = 1;
+        }
+    od;
+}
+
+#define AIO_POLL_ITERATION             \
+    if                                 \
+        :: sync_io_event -> skip;      \
+    fi;                                \
+    sync_io_event = 0;                 \
+    atomic {                           \
+        aio_poll_result = aio_poll_result | progress;    \
+        progress = 0;                  \
+    }
+
+#define AIO_POLL                       \
+    aio_poll_result = 0;               \
+    AIO_POLL_ITERATION
+
+active proctype mainthread()
+{
+    bool aio_poll_result;
+    bool busy = true;
+
+    LOCK(mutex);
+        /* Start with a few requests */
+        do
+           :: req_submitted < REQS_INITIAL -> SUBMIT_ONE_REQ
+           :: else                         -> break;
+        od;
+    UNLOCK(mutex);
+
+    do
+        :: true -> {
+            /* This is blk_drain */
+            LOCK(mutex);
+                busy = req_pending > 0;
+            UNLOCK(mutex);
+            if
+               :: busy -> AIO_POLL;
+               :: else -> break;
+            fi;
+        }
+    od;
+
+    LOCK(mutex);
+        assert(!req_pending && req_submitted == REQS);
+    UNLOCK(mutex);
+}
+
diff --git a/docs/aio_poll_sync_io.promela b/docs/aio_poll_sync_io.promela
new file mode 100644
index 0000000..3851d8e
--- /dev/null
+++ b/docs/aio_poll_sync_io.promela
@@ -0,0 +1,88 @@
+/*
+ * Demonstrate the behavior of aio_poll(ctx, true) when called from the
+ * iothread.
+ *
+ * Author: Paolo Bonzini <pbonzini@redhat.com>
+ *
+ * This file is in the public domain.  If you really want a license,
+ * the WTFPL will do.
+ *
+ * To simulate it:
+ *     spin -p docs/aio_poll_sync_io.promela
+ *
+ * To verify it:
+ *     spin -a docs/aio_poll_sync_io.promela
+ *     gcc -O2 pan.c -DSAFETY
+ *     ./a.out
+ *
+ * To verify it (with a bug planted in the model):
+ *     spin -a -DBUG docs/aio_poll_sync_io.promela
+ *     gcc -O2 pan.c -DSAFETY
+ *     ./a.out
+ */
+
+/* A QEMUBH (e.g. from thread-pool.c) */
+bool io_bh_scheduled;
+
+/* The AioContext's EventNotifier */
+bool aio_notify;
+
+/* The "guard condition" for synchronous I/O */
+bool io_done;
+
+/* The AioContext's QemuEvent */
+bool sync_io_event;
+
+active proctype io_worker()
+{
+    /* qemu_bh_schedule */
+    io_bh_scheduled = 1;
+    aio_notify = 1;
+}
+
+active proctype iothread()
+{
+    bool old;
+
+    /* The iothread keeps calling aio_poll(..., true).  At the end
+     * of the simulation it is stuck here waiting for a request,
+     * so mark this as a valid end state.
+     */
+end:
+    do
+        :: aio_notify -> {
+            /* aio_notify_accept */
+            aio_notify = 0;
+
+            /* Execute the "bottom half" */
+
+            atomic { old = io_bh_scheduled; io_bh_scheduled = 0; }
+            if
+                :: old -> { io_done = 1; }
+                :: else -> skip;
+            fi;
+
+            /* The above is aio_poll_internal, this is aio_poll */
+#ifndef BUG
+            sync_io_event = 1;
+#endif
+        }
+    od;
+}
+
+active proctype mainthread()
+{
+    do
+        :: io_done -> break;
+
+        :: else -> {
+            /* qemu_event_wait */
+            if
+               :: sync_io_event -> skip;
+            fi;
+
+            /* qemu_event_reset */
+            sync_io_event = 0;
+        }
+    od;
+}
diff --git a/docs/multiple-iothreads.txt b/docs/multiple-iothreads.txt
index 40b8419..ac3e74c 100644
--- a/docs/multiple-iothreads.txt
+++ b/docs/multiple-iothreads.txt
@@ -88,17 +88,15 @@ descriptors, event notifiers, timers, or BHs across threads:
 notifier, timer, or BH callbacks invoked by the AioContext.  No locking is
 necessary.
 
-2. Other threads wishing to access the AioContext must use
-aio_context_acquire()/aio_context_release() for mutual exclusion.  Once the
-context is acquired no other thread can access it or run event loop iterations
-in this AioContext.
-
-aio_context_acquire()/aio_context_release() calls may be nested.  This
-means you can call them if you're not sure whether #1 applies.
-
-There is currently no lock ordering rule if a thread needs to acquire multiple
-AioContexts simultaneously.  Therefore, it is only safe for code holding the
-QEMU global mutex to acquire other AioContexts.
+2. Other threads wishing to access the AioContext must take the QEMU global
+mutex *as well as* call aio_context_acquire()/aio_context_release() for
+mutual exclusion.  The QEMU global mutex must be taken outside the call
+to aio_context_acquire().
+
+No lock ordering rule is necessary if a thread needs to acquire multiple
+AioContexts simultaneously.  Because the IOThreads won't ever acquire
+multiple AioContexts, it is always safe for the owner of the QEMU global
+mutex to acquire any number of them.
 
 Side note: the best way to schedule a function call across threads is to create
 a BH in the target AioContext beforehand and then call qemu_bh_schedule().  No
diff --git a/include/block/aio.h b/include/block/aio.h
index 986be97..191bd3e 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -129,6 +129,12 @@ struct AioContext {
     int epollfd;
     bool epoll_enabled;
     bool epoll_available;
+
+    /* Returns whether there has been progress on the AioContext since
+     * the last time this field was set to false.
+     */
+    bool progress;
+    QemuEvent sync_io_event;
 };
 
 /**
@@ -296,9 +302,9 @@ bool aio_poll_internal(AioContext *ctx, bool blocking);
 /* Progress in completing AIO work to occur.  This can issue new pending
  * aio as a result of executing I/O completion or bh callbacks.
  *
- * Return whether any progress was made by executing AIO or bottom half
- * handlers.  If @blocking == true, this should always be true except
- * if someone called aio_notify.
+ * Return whether any progress was made since the last call to aio_poll
+ * in the execution of AIO or bottom half handlers.  If @blocking ==
+ * true, this should always be true except if someone called aio_notify.
  *
  * If there are no pending bottom halves, but there are pending AIO
  * operations, it may not be possible to make any progress without
diff --git a/stubs/iothread-lock.c b/stubs/iothread-lock.c
index dda6f6b..171e92f 100644
--- a/stubs/iothread-lock.c
+++ b/stubs/iothread-lock.c
@@ -13,3 +13,8 @@ void qemu_mutex_lock_iothread(void)
 void qemu_mutex_unlock_iothread(void)
 {
 }
+
+bool aio_context_in_iothread(AioContext *ctx)
+{
+    return true;
+}
-- 
2.5.0