All of lore.kernel.org
 help / color / mirror / Atom feed
From: Paolo Bonzini <pbonzini@redhat.com>
To: qemu-devel@nongnu.org
Cc: stefanha@redhat.com
Subject: [Qemu-devel] [PATCH 04/16] aio: only call aio_poll_internal from iothread
Date: Fri, 15 Jan 2016 16:12:07 +0100	[thread overview]
Message-ID: <1452870739-28484-5-git-send-email-pbonzini@redhat.com> (raw)
In-Reply-To: <1452870739-28484-1-git-send-email-pbonzini@redhat.com>

aio_poll is not thread safe; it can report progress incorrectly when
called from the main thread.  The bug remains latent as long as
all of it is called within aio_context_acquire/aio_context_release,
but this will change soon.

The details of the bug are pretty simple, but fixing it in an
efficient way is thorny.  There are plenty of comments and formal
models in the patch, so I will refer to it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 async.c                         | 216 +++++++++++++++++++++++++++++++++++++++-
 block/io.c                      |  14 ++-
 docs/aio_poll_drain.promela     | 210 ++++++++++++++++++++++++++++++++++++++
 docs/aio_poll_drain_bug.promela | 158 +++++++++++++++++++++++++++++
 docs/aio_poll_sync_io.promela   |  88 ++++++++++++++++
 docs/multiple-iothreads.txt     |  20 ++--
 include/block/aio.h             |  12 ++-
 stubs/iothread-lock.c           |   5 +
 8 files changed, 701 insertions(+), 22 deletions(-)
 create mode 100644 docs/aio_poll_drain.promela
 create mode 100644 docs/aio_poll_drain_bug.promela
 create mode 100644 docs/aio_poll_sync_io.promela

diff --git a/async.c b/async.c
index 856aa75..29f2845 100644
--- a/async.c
+++ b/async.c
@@ -299,12 +299,224 @@ void aio_notify_accept(AioContext *ctx)
     }
 }
 
+/* aio_poll_internal is not thread-safe; it only reports progress
+ * correctly when called from one thread, because it has no
+ * history of what happened in different threads.  When called
+ * from two threads, there is a race:
+ *
+ *      main thread                       I/O thread
+ *      -----------------------           --------------------------
+ *      blk_drain
+ *        bdrv_requests_pending -> true
+ *                                        aio_poll_internal
+ *                                          process last request
+ *        aio_poll_internal
+ *
+ * Now aio_poll_internal will never exit, because there is no pending
+ * I/O on the AioContext.
+ *
+ * Therefore, aio_poll is a wrapper around aio_poll_internal that allows
+ * usage from _two_ threads: the I/O thread of course, and the main thread.
+ * When called from the main thread, aio_poll just asks the I/O thread
+ * for a nudge as soon as the next call to aio_poll is complete.
+ * Because we use QemuEvent, and QemuEvent supports a single consumer
+ * only, this only works when the calling thread holds the big QEMU lock.
+ *
+ * Because aio_poll is used in a loop, spurious wakeups are okay.
+ * Therefore, the I/O thread calls qemu_event_set very liberally
+ * (it helps that qemu_event_set is cheap on an already-set event).
+ * generally used in a loop, it's okay to have spurious wakeups.
+ * Similarly it is okay to return true when no progress was made
+ * (as long as this doesn't happen forever, or you get livelock).
+ *
+ * The important thing is that you need to report progress from
+ * aio_poll(ctx, false) correctly.  This is complicated and the
+ * implementation builds on event_notifier_set plus
+ * aio_poll(ctx, true).
+ *
+ * The implementation consists of the following functions:
+ * - aio_poll_and_wake: runs in the iothread and takes care of
+ *   waking up the main thread after aio_poll_internal returns
+ *
+ * - aio_wait_iteration: implementation of aio_poll(ctx, true)
+ *   for the main thread
+ *
+ * - aio_force_iteration: implementation of aio_poll(ctx, false)
+ *   for the main thread
+ */
+static bool aio_poll_and_wake(AioContext *ctx, bool blocking)
+{
+    bool progress = aio_poll_internal(ctx, blocking);
+    smp_wmb();
+    if (progress) {
+        ctx->progress = true;
+    }
+
+    qemu_event_set(&ctx->sync_io_event);
+    return progress;
+}
+
+static bool aio_wait_iteration(AioContext *ctx)
+{
+    /* Wait until at least one iteration has passed since the main thread
+     * last called aio_poll.
+     */
+    qemu_event_wait(&ctx->sync_io_event);
+
+    /* Remember how aio_poll is used---in a loop, until a guard condition
+     * becomes true.   By resetting the event here, we ensure that the guard
+     * condition will be checked before the next call to qemu_event_wait.
+     * The above race is resolved as follows.
+     *
+     *      main thread                       I/O thread
+     *      -----------------------           --------------------------
+     *      blk_drain
+     *        bdrv_requests_pending -> true
+     *                                        aio_poll_internal
+     *                                          process last request
+     *                                        qemu_event_set
+     *        qemu_event_wait
+     */
+    qemu_event_reset(&ctx->sync_io_event);
+
+    return atomic_xchg(&ctx->progress, false);
+}
+
+/* The first idea is to simply implement non-blocking aio_poll as
+ *
+ *     // cannot use aio_notify; because of the notify_me optimization,
+ *     // aio_notify might do nothing and then poll will block
+ *     event_notifier_set(&ctx->notifier);
+ *     progress = aio_wait_iteration(ctx);
+ *
+ * which doesn't work because of a relatively simple race.  Remember
+ * that spurious wakeups of aio_poll are common.  What can happen then is:
+ *
+ *     main thread                             I/O thread
+ *     ---------------------------             -------------------------
+ *                                             qemu_event_set();
+ *                                             ...
+ *                                             qemu_bh_schedule()
+ *     aio_poll(ctx, true)
+ *       aio_wait_iteration
+ *         qemu_event_wait();
+ *         qemu_event_reset();
+ *                                             ctx->progress = true;
+ *         progress=xchg(ctx->progress, false)
+ *                                             qemu_event_set();
+ *     ...
+ *     aio_poll(ctx, false)
+ *       event_notifier_set()
+ *       aio_wait_iteration
+ *         qemu_event_wait();
+ *         qemu_event_reset();
+ *       <<returns false>>
+ *
+ * aio_poll's contract ensures that *some* progress is made if possible;
+ * hence the bottom half should be executed before aio_poll(ctx, false)
+ * returns, and aio_poll should return true.  The failure happened because
+ * aio_poll_internal has not run at all since the last qemu_event_wait;
+ * the execution threads interleaved so that aio_poll(ctx, false)
+ * immediately returns false.
+ *
+ *
+ * The next idea then is to add a reset of the event in the non-blocking
+ * aio_poll.  This has the same problem, just a little harder to trigger:
+ *
+ *     main thread                             I/O thread
+ *     ---------------------------             -----------------------------
+ *                                             qemu_event_set();
+ *                                             qemu_bh_schedule()
+ *     aio_poll(ctx, true)
+ *       aio_wait_iteration
+ *         qemu_event_wait();
+ *         qemu_event_reset();
+ *                                             ctx->progress = true;
+ *         progress=xchg(ctx->progress, false)
+ *     ...
+ *     aio_poll(ctx, false)
+ *       qemu_event_reset();
+ *                                             qemu_event_set();
+ *       event_notifier_set()
+ *       aio_wait_iteration
+ *         qemu_event_wait();
+ *         qemu_event_reset();
+ *         progress=xchg(ctx->progress, false)
+ *       <<returns false>>
+ *
+ *
+ * Let's change approach and try running aio_wait_iteration *twice*.
+ * This doesn't work either, but for a different reason.  If the main
+ * thread is "slow", an entire execution of the loop happens in the I/O
+ * thread, and a preceding aio_poll(ctx, true) can leave ctx->progress
+ * equal to false.  But unlike the previous example, some progress was
+ * made *after* qemu_event_wait returned:
+ *
+ *     main thread                             I/O thread
+ *     ---------------------------             -----------------------------
+ *                                             qemu_event_set();
+ *                                             ...
+ *     aio_poll(ctx, true)
+ *       aio_wait_iteration
+ *         qemu_event_wait();
+ *         qemu_event_reset();
+ *                                             ctx->progress = true;
+ *                                             qemu_event_set();
+ *                                             ...
+ *                                             qemu_bh_schedule()
+ *         progress=xchg(ctx->progress, false)
+ *     aio_poll(ctx, false)
+ *       event_notifier_set()
+ *       aio_wait_iteration
+ *         qemu_event_wait();
+ *         qemu_event_reset();
+ *         progress=xchg(ctx->progress, false)
+ *                                             qemu_event_set();
+ *       event_notifier_set()
+ *       aio_wait_iteration
+ *         qemu_event_wait();
+ *         qemu_event_reset();
+ *         progress|=xchg(ctx->progress, false)
+ *       <<returns false>>
+ *
+ * However, applying both workarounds solves the problems.  The initial
+ * qemu_event_reset removes leftovers of previously completed iterations
+ * (such as in the last example).  Then, the double iteration ensures
+ * at least one execution of the dispatch phase to happen.  The dispatch
+ * phase will then execute pending bottom halves and return progress
+ * correctly.
+ */
+static bool aio_force_iteration(AioContext *ctx)
+{
+    bool progress;
+    qemu_event_reset(&ctx->sync_io_event);
+
+    event_notifier_set(&ctx->notifier);
+    progress = aio_wait_iteration(ctx);
+
+    event_notifier_set(&ctx->notifier);
+    progress |= aio_wait_iteration(ctx);
+
+    return progress;
+}
+
 bool aio_poll(AioContext *ctx, bool blocking)
 {
-    assert(qemu_mutex_iothread_locked() ||
-           aio_context_in_iothread(ctx));
+    bool progress;
+
+    if (aio_context_in_iothread(ctx)) {
+        return aio_poll_and_wake(ctx, blocking);
+    }
 
-    return aio_poll_internal(ctx, blocking);
+    assert(qemu_mutex_iothread_locked());
+    aio_context_release(ctx);
+    if (blocking) {
+        progress = aio_wait_iteration(ctx);
+    } else {
+        progress = aio_force_iteration(ctx);
+    }
+    aio_context_acquire(ctx);
+    return progress;
 }
 
 static void aio_timerlist_notify(void *opaque)
diff --git a/block/io.c b/block/io.c
index 34118d8..d1f9a1e 100644
--- a/block/io.c
+++ b/block/io.c
@@ -269,15 +269,19 @@ static void bdrv_drain_recurse(BlockDriverState *bs)
  */
 void bdrv_drain(BlockDriverState *bs)
 {
-    bool busy = true;
+    bool busy;
 
     bdrv_drain_recurse(bs);
-    while (busy) {
+    do {
         /* Keep iterating */
          bdrv_flush_io_queue(bs);
-         busy = bdrv_requests_pending(bs);
-         busy |= aio_poll(bdrv_get_aio_context(bs), busy);
-    }
+         if (bdrv_requests_pending(bs)) {
+             aio_poll(bdrv_get_aio_context(bs), true);
+             busy = true;
+         } else {
+             busy = aio_poll(bdrv_get_aio_context(bs), false);
+         }
+    } while (busy);
 }
 
 /*
diff --git a/docs/aio_poll_drain.promela b/docs/aio_poll_drain.promela
new file mode 100644
index 0000000..302b333
--- /dev/null
+++ b/docs/aio_poll_drain.promela
@@ -0,0 +1,210 @@
+/*
+ * Demonstrate the behavior of blk_drain() when called from the
+ * iothread.
+ *
+ * Author: Paolo Bonzini <pbonzini@redhat.com>
+ *
+ * This file is in the public domain.  If you really want a license,
+ * the WTFPL will do.
+ *
+ * To simulate it:
+ *     spin -p docs/aio_poll_sync_io.promela
+ *
+ * To verify it:
+ *     spin -a docs/aio_poll_sync_io.promela
+ *     gcc -O2 pan.c -DSAFETY
+ *     ./a.out
+ *
+ * To verify it (with a bug planted):
+ *     spin -DBUG1 -a docs/aio_poll_sync_io.promela
+ *     gcc -O2 pan.c -DSAFETY
+ *     ./a.out
+ *
+ * (try BUG2/BUG3/BUG4 too)
+ */
+
+#define REQS         3
+#define REQS_INITIAL 1
+
+int req_submitted = 0;
+int req_completed = 0;
+int req_pending   = 0;
+
+
+/* This is a mutex.  */
+int mutex = 0;
+#define LOCK(mutex)   atomic { mutex == 0 -> mutex = 1; }
+#define UNLOCK(mutex) mutex = 0
+
+/* A QEMUBH (e.g. from thread-pool.c) */
+bool io_bh_scheduled;
+bool spawn_bh_scheduled;
+
+/* The AioContext's EventNotifier */
+bool aio_notify;
+
+/* The AioContext's QemuEvent.  Typically the first call to aio_poll
+ * produces a spurious wakeup, which is modeled by these initializers.
+ */
+bool progress = 1;
+bool sync_io_event = 1;
+
+proctype io_worker(int req_id)
+{
+    LOCK(mutex);
+        req_completed++;
+    UNLOCK(mutex);
+
+    io_bh_scheduled = 1;
+    aio_notify = 1;
+}
+
+#define SUBMIT_ONE_REQ {                         \
+       run io_worker(req_submitted);             \
+       req_submitted++;                          \
+       req_pending++;                            \
+    }
+
+active proctype iothread()
+{
+    bool old;
+    bool aio_poll_progress;
+
+    /* The iothread keeps calling aio_poll(..., true).  At the end
+     * of the simulation it is stuck here waiting for a request,
+     * so mark this as a valid end state.
+     */
+end:
+    do
+        :: aio_notify -> {
+            /* aio_notify_accept */
+            aio_notify = 0;
+
+            aio_poll_progress = 0;
+
+            /* Execute the "bottom half" which can spawn more requests */
+            atomic { old = spawn_bh_scheduled; spawn_bh_scheduled = 0; }
+            if
+                :: old -> {
+                    LOCK(mutex);
+                        if
+                           :: req_submitted < REQS -> SUBMIT_ONE_REQ
+                           :: else                 -> skip;
+                        fi;
+                    UNLOCK(mutex);
+                    aio_poll_progress = 1;
+                }
+                :: else -> skip;
+            fi;
+
+            /* Execute the bottom half which spawns the other bh */
+            atomic { old = io_bh_scheduled; io_bh_scheduled = 0; }
+            if
+                :: old -> {
+                    LOCK(mutex);
+                        req_pending = req_pending - req_completed;
+                        req_completed = 0;
+                        assert(req_pending >= 0);
+                    UNLOCK(mutex);
+                    aio_poll_progress = 1;
+                    spawn_bh_scheduled = 1;
+                    aio_notify = 1;
+                }
+                :: else -> skip;
+            fi;
+
+            /* The above is aio_poll_internal, this is aio_poll */
+            if
+                :: aio_poll_progress -> progress = 1;
+                :: else -> skip;
+            fi;
+            sync_io_event = 1;
+        }
+    od;
+}
+
+#define AIO_POLL_ITERATION             \
+    if                                 \
+        :: sync_io_event -> skip;      \
+    fi;                                \
+    sync_io_event = 0;                 \
+    atomic {                           \
+        aio_poll_result = aio_poll_result | progress;    \
+        progress = 0;                  \
+    }
+    
+#define AIO_POLL                       \
+    aio_poll_result = 0;               \
+    AIO_POLL_ITERATION
+
+#if defined BUG1
+/* Skip the nonblocking aio_poll altogether */
+#define AIO_POLL_NONBLOCKING aio_poll_result = 0
+
+#elif defined BUG2
+/* Simple but wrong implementation */
+#define AIO_POLL_NONBLOCKING           \
+    aio_poll_result = 0;               \
+    aio_notify = 1;                    \
+    AIO_POLL_ITERATION
+
+#elif defined BUG3
+/* Workaround 1 is not enough */
+#define AIO_POLL_NONBLOCKING           \
+    sync_io_event = 0;                 \
+    aio_poll_result = 0;               \
+    aio_notify = 1;                    \
+    AIO_POLL_ITERATION
+
+#elif defined BUG4
+/* Workaround 2 is not enough */
+#define AIO_POLL_NONBLOCKING           \
+    aio_poll_result = 0;               \
+    aio_notify = 1;                    \
+    AIO_POLL_ITERATION                 \
+    aio_notify = 1;                    \
+    AIO_POLL_ITERATION
+
+#else
+/* Combining them works! */
+#define AIO_POLL_NONBLOCKING           \
+    sync_io_event = 0;                 \
+    aio_poll_result = 0;               \
+    aio_notify = 1;                    \
+    AIO_POLL_ITERATION                 \
+    aio_notify = 1;                    \
+    AIO_POLL_ITERATION
+#endif
+
+active proctype mainthread()
+{
+    bool aio_poll_result;
+    bool busy = true;
+
+    LOCK(mutex);
+        /* Start with a few requests */
+        do
+           :: req_submitted < REQS_INITIAL -> SUBMIT_ONE_REQ
+           :: else                         -> break;
+        od;
+    UNLOCK(mutex);
+
+    do
+        :: !busy -> break;
+
+        :: else -> {
+            /* This is blk_drain */
+            LOCK(mutex);
+                busy = req_pending > 0;
+            UNLOCK(mutex);
+            if
+               :: busy -> AIO_POLL;
+               :: else -> AIO_POLL_NONBLOCKING; busy = aio_poll_result;
+            fi;
+        }
+    od;
+
+    LOCK(mutex);
+        assert(!req_pending && req_submitted == REQS);
+    UNLOCK(mutex);
+}
diff --git a/docs/aio_poll_drain_bug.promela b/docs/aio_poll_drain_bug.promela
new file mode 100644
index 0000000..26f351d
--- /dev/null
+++ b/docs/aio_poll_drain_bug.promela
@@ -0,0 +1,158 @@
+/*
+ * Demonstrate the behavior of blk_drain() when called from the
+ * iothread.
+ *
+ * Author: Paolo Bonzini <pbonzini@redhat.com>
+ *
+ * This file is in the public domain.  If you really want a license,
+ * the WTFPL will do.
+ *
+ * To simulate it:
+ *     spin -p docs/aio_poll_sync_io.promela
+ *
+ * To verify it:
+ *     spin -a docs/aio_poll_sync_io.promela
+ *     gcc -O2 pan.c -DSAFETY
+ *     ./a.out
+ *
+ * Note that the model verifies successfully, but it is actually buggy.
+ * The reason is that the bottom half structure in the iothread is
+ * overly simplified.
+ *
+ * This explains why blk_drain() needs to use non-blocking aio_poll,
+ * for whose modeling see aio_poll_drain.promela.
+ */
+
+#define REQS         3
+#define REQS_INITIAL 1
+
+int req_submitted = 0;
+int req_completed = 0;
+int req_pending   = 0;
+
+
+/* This is a mutex.  */
+int mutex = 0;
+#define LOCK(mutex)   atomic { mutex == 0 -> mutex = 1; }
+#define UNLOCK(mutex) mutex = 0
+
+/* A QEMUBH (e.g. from thread-pool.c) */
+bool io_bh_scheduled;
+
+/* The AioContext's EventNotifier */
+bool aio_notify;
+
+/* The AioContext's QemuEvent.  Typically the first call to aio_poll
+ * produces a spurious wakeup.  In aio_poll_drain.promela, the
+ * initializers are set to 1 to model this.  This model is buggy
+ * so we skip this additional torture...
+ */
+bool progress;
+bool sync_io_event;
+
+proctype io_worker(int req_id)
+{
+    LOCK(mutex);
+        req_completed++;
+    UNLOCK(mutex);
+
+    io_bh_scheduled = 1;
+    aio_notify = 1;
+}
+
+#define SUBMIT_ONE_REQ {                         \
+       run io_worker(req_submitted);             \
+       req_submitted++;                          \
+       req_pending++;                            \
+    }
+
+active proctype iothread()
+{
+    bool old;
+    bool aio_poll_progress;
+
+    /* The iothread keeps calling aio_poll(..., true).  At the end
+     * of the simulation it is stuck here waiting for a request,
+     * so mark this as a valid end state.
+     */
+end:
+    do
+        :: aio_notify -> {
+            /* aio_notify_accept */
+            aio_notify = 0;
+
+            aio_poll_progress = 0;
+
+            atomic { old = io_bh_scheduled; io_bh_scheduled = 0; }
+            if
+                :: old -> {
+                    LOCK(mutex);
+                        req_pending = req_pending - req_completed;
+                        req_completed = 0;
+                        assert(req_pending >= 0);
+
+                        if
+                           :: req_submitted < REQS -> SUBMIT_ONE_REQ
+                           :: else                 -> skip;
+                        fi;
+                    UNLOCK(mutex);
+                    aio_poll_progress = 1;
+                }
+                :: else -> skip;
+            fi;
+
+            /* The above is aio_poll_internal, this is aio_poll */
+            if
+                :: aio_poll_progress -> progress = 1;
+                :: else -> skip;
+            fi;
+            sync_io_event = 1;
+        }
+    od;
+}
+
+#define AIO_POLL_ITERATION             \
+    if                                 \
+        :: sync_io_event -> skip;      \
+    fi;                                \
+    sync_io_event = 0;                 \
+    atomic {                           \
+        aio_poll_result = aio_poll_result | progress;    \
+        progress = 0;                  \
+    }
+
+#define AIO_POLL                       \
+    aio_poll_result = 0;               \
+    AIO_POLL_ITERATION
+
+active proctype mainthread()
+{
+    bool aio_poll_result;
+    bool busy = true;
+
+    LOCK(mutex);
+        /* Start with a few requests */
+        do
+           :: req_submitted < REQS_INITIAL -> SUBMIT_ONE_REQ
+           :: else                         -> break;
+        od;
+    UNLOCK(mutex);
+
+    do
+        :: true -> {
+            /* This is blk_drain */
+            LOCK(mutex);
+                busy = req_pending > 0;
+            UNLOCK(mutex);
+            if
+               :: busy -> AIO_POLL;
+               :: else -> break;
+            fi;
+        }
+    od;
+
+    LOCK(mutex);
+        assert(!req_pending && req_submitted == REQS);
+    UNLOCK(mutex);
+}
+
diff --git a/docs/aio_poll_sync_io.promela b/docs/aio_poll_sync_io.promela
new file mode 100644
index 0000000..3851d8e
--- /dev/null
+++ b/docs/aio_poll_sync_io.promela
@@ -0,0 +1,88 @@
+/*
+ * Demonstrate the behavior of aio_poll(ctx, true) when called from the
+ * iothread.
+ *
+ * Author: Paolo Bonzini <pbonzini@redhat.com>
+ *
+ * This file is in the public domain.  If you really want a license,
+ * the WTFPL will do.
+ *
+ * To simulate it:
+ *     spin -p docs/aio_poll_sync_io.promela
+ *
+ * To verify it:
+ *     spin -a docs/aio_poll_sync_io.promela
+ *     gcc -O2 pan.c -DSAFETY
+ *     ./a.out
+ *
+ * To verify it (with a bug planted in the model):
+ *     spin -a -DBUG docs/aio_poll_sync_io.promela
+ *     gcc -O2 pan.c -DSAFETY
+ *     ./a.out
+ */
+
+/* A QEMUBH (e.g. from thread-pool.c) */
+bool io_bh_scheduled;
+
+/* The AioContext's EventNotifier */
+bool aio_notify;
+
+/* The "guard condition" for synchronous I/O */
+bool io_done;
+
+/* The AioContext's QemuEvent */
+bool sync_io_event;
+
+active proctype io_worker()
+{
+    /* qemu_bh_schedule */
+    io_bh_scheduled = 1;
+    aio_notify = 1;
+}
+
+active proctype iothread()
+{
+    bool old;
+
+    /* The iothread keeps calling aio_poll(..., true).  At the end
+     * of the simulation it is stuck here waiting for a request,
+     * so mark this as a valid end state.
+     */
+end:
+    do
+        :: aio_notify -> {
+            /* aio_notify_accept */
+            aio_notify = 0;
+
+            /* Execute the "bottom half" */
+
+            atomic { old = io_bh_scheduled; io_bh_scheduled = 0; }
+            if
+                :: old -> { io_done = 1; }
+                :: else -> skip;
+            fi;
+
+            /* The above is aio_poll_internal, this is aio_poll */
+#ifndef BUG
+            sync_io_event = 1;
+#endif
+        }
+    od;
+}
+
+active proctype mainthread()
+{
+    do
+        :: io_done -> break;
+
+        :: else -> {
+            /* qemu_event_wait */
+            if
+               :: sync_io_event -> skip;
+            fi;
+
+            /* qemu_event_reset */
+            sync_io_event = 0;
+        }
+    od;
+}
diff --git a/docs/multiple-iothreads.txt b/docs/multiple-iothreads.txt
index 40b8419..ac3e74c 100644
--- a/docs/multiple-iothreads.txt
+++ b/docs/multiple-iothreads.txt
@@ -88,17 +88,15 @@ descriptors, event notifiers, timers, or BHs across threads:
 notifier, timer, or BH callbacks invoked by the AioContext.  No locking is
 necessary.
 
-2. Other threads wishing to access the AioContext must use
-aio_context_acquire()/aio_context_release() for mutual exclusion.  Once the
-context is acquired no other thread can access it or run event loop iterations
-in this AioContext.
-
-aio_context_acquire()/aio_context_release() calls may be nested.  This
-means you can call them if you're not sure whether #1 applies.
-
-There is currently no lock ordering rule if a thread needs to acquire multiple
-AioContexts simultaneously.  Therefore, it is only safe for code holding the
-QEMU global mutex to acquire other AioContexts.
+2. Other threads wishing to access the AioContext must take the QEMU global
+mutex *as well as* call aio_context_acquire()/aio_context_release() for
+mutual exclusion.  The QEMU global mutex must be taken outside the call
+to aio_context_acquire().
+
+No lock ordering rule is necessary if a thread needs to acquire multiple
+AioContexts simultaneously.  Because the IOThreads won't ever acquire
+multiple AioContexts, it is always safe for the owner of the QEMU global
+mutex to acquire any number of them.
 
 Side note: the best way to schedule a function call across threads is to create
 a BH in the target AioContext beforehand and then call qemu_bh_schedule().  No
diff --git a/include/block/aio.h b/include/block/aio.h
index 986be97..191bd3e 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -129,6 +129,12 @@ struct AioContext {
     int epollfd;
     bool epoll_enabled;
     bool epoll_available;
+
+    /* Returns whether there has been progress on the AioContext since
+     * the last time this field was set to false.
+     */
+    bool progress;
+    QemuEvent sync_io_event;
 };
 
 /**
@@ -296,9 +302,9 @@ bool aio_poll_internal(AioContext *ctx, bool blocking);
 /* Progress in completing AIO work to occur.  This can issue new pending
  * aio as a result of executing I/O completion or bh callbacks.
  *
- * Return whether any progress was made by executing AIO or bottom half
- * handlers.  If @blocking == true, this should always be true except
- * if someone called aio_notify.
+ * Return whether any progress was made since the last call to aio_poll
+ * in the execution of AIO or bottom half handlers.  If @blocking ==
+ * true, this should always be true except if someone called aio_notify.
  *
  * If there are no pending bottom halves, but there are pending AIO
  * operations, it may not be possible to make any progress without
diff --git a/stubs/iothread-lock.c b/stubs/iothread-lock.c
index dda6f6b..171e92f 100644
--- a/stubs/iothread-lock.c
+++ b/stubs/iothread-lock.c
@@ -13,3 +13,8 @@ void qemu_mutex_lock_iothread(void)
 void qemu_mutex_unlock_iothread(void)
 {
 }
+
+bool aio_context_in_iothread(AioContext *ctx)
+{
+    return true;
+}
-- 
2.5.0

  parent reply	other threads:[~2016-01-15 15:12 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-01-15 15:12 [Qemu-devel] [PATCH 00/16] aio: first part of aio_context_acquire/release pushdown Paolo Bonzini
2016-01-15 15:12 ` [Qemu-devel] [PATCH 01/16] aio: introduce aio_context_in_iothread Paolo Bonzini
2016-01-15 15:12 ` [Qemu-devel] [PATCH 02/16] aio: do not really acquire/release the main AIO context Paolo Bonzini
2016-01-15 15:12 ` [Qemu-devel] [PATCH 03/16] aio: introduce aio_poll_internal Paolo Bonzini
2016-01-15 15:12 ` Paolo Bonzini [this message]
2016-01-15 15:12 ` [Qemu-devel] [PATCH 05/16] iothread: release AioContext around aio_poll Paolo Bonzini
2016-02-02 14:52   ` Stefan Hajnoczi
2016-02-02 15:01     ` Paolo Bonzini
2016-02-03  9:34       ` Stefan Hajnoczi
2016-02-03  9:52         ` Paolo Bonzini
2016-02-05 17:51           ` Stefan Hajnoczi
2016-02-08 15:34             ` Paolo Bonzini
2016-01-15 15:12 ` [Qemu-devel] [PATCH 06/16] qemu-thread: introduce QemuRecMutex Paolo Bonzini
2016-01-15 15:12 ` [Qemu-devel] [PATCH 07/16] aio: convert from RFifoLock to QemuRecMutex Paolo Bonzini
2016-01-15 15:12 ` [Qemu-devel] [PATCH 08/16] aio: rename bh_lock to list_lock Paolo Bonzini
2016-01-15 15:12 ` [Qemu-devel] [PATCH 09/16] qemu-thread: introduce QemuLockCnt Paolo Bonzini
2016-01-15 15:12 ` [Qemu-devel] [PATCH 10/16] aio: make ctx->list_lock a QemuLockCnt, subsuming ctx->walking_bh Paolo Bonzini
2016-01-15 15:12 ` [Qemu-devel] [PATCH 11/16] qemu-thread: optimize QemuLockCnt with futexes on Linux Paolo Bonzini
2016-01-15 15:12 ` [Qemu-devel] [PATCH 12/16] aio: tweak walking in dispatch phase Paolo Bonzini
2016-01-15 15:12 ` [Qemu-devel] [PATCH 13/16] aio-posix: remove walking_handlers, protecting AioHandler list with list_lock Paolo Bonzini
2016-01-15 15:12 ` [Qemu-devel] [PATCH 14/16] aio-win32: " Paolo Bonzini
2016-01-15 15:12 ` [Qemu-devel] [PATCH 15/16] aio: document locking Paolo Bonzini
2016-01-15 15:12 ` [Qemu-devel] [PATCH 16/16] aio: push aio_context_acquire/release down to dispatching Paolo Bonzini
2016-02-02  9:41 ` [Qemu-devel] [PATCH 00/16] aio: first part of aio_context_acquire/release pushdown Paolo Bonzini
2016-02-02 17:26 ` Stefan Hajnoczi
2016-02-08 16:14 [Qemu-devel] [PATCH v2 " Paolo Bonzini
2016-02-08 16:14 ` [Qemu-devel] [PATCH 04/16] aio: only call aio_poll_internal from iothread Paolo Bonzini
2016-02-08 22:22   ` Eric Blake
2016-02-09 11:45 [Qemu-devel] [PATCH v3 00/16] aio: first part of aio_context_acquire/release pushdown Paolo Bonzini
2016-02-09 11:46 ` [Qemu-devel] [PATCH 04/16] aio: only call aio_poll_internal from iothread Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1452870739-28484-5-git-send-email-pbonzini@redhat.com \
    --to=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.