* [Qemu-devel] [PATCH 0/6] Make CoMutex/CoQueue/CoRwlock thread-safe
@ 2017-02-13 18:12 Paolo Bonzini
  2017-02-13 18:12 ` [Qemu-devel] [PATCH 1/6] coroutine-lock: make CoMutex thread-safe Paolo Bonzini
                   ` (7 more replies)
  0 siblings, 8 replies; 14+ messages in thread
From: Paolo Bonzini @ 2017-02-13 18:12 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, stefanha, qemu-block

This is yet another tiny bit of the multiqueue work, this time affecting
the synchronization infrastructure for coroutines.  Currently, coroutines
synchronize between the main I/O thread and the dataplane iothread through
the AioContext lock.  However, for multiqueue a single BDS will be used
by multiple iothreads and hence multiple AioContexts.  This calls for
a different approach to coroutine synchronization, and this series is my
attempt.

After the previous series, coroutines are already bound to an AioContext
while they wait on a CoMutex.  Of course currently a CoMutex is generally
not used across multiple iothreads, because you have to acquire/release
the AioContext around CoMutex critical sections.  This series is the
missing link between the aio_co_schedule/aio_co_wake infrastructure and
making BDS thread-safe: by making CoMutex thread-safe, it removes
the need for a "thread-based" mutex around it.

This will still need some changes in the format drivers because, for
multiqueue, CoMutexes would need to be used like "real" thread mutexes.
Code like this:

    ...
    qemu_co_mutex_unlock()
    ... /* still access shared data, but don't yield */
    qemu_coroutine_yield()

might be required to use this other pattern:

    ... /* access shared data, but don't yield */
    qemu_co_mutex_unlock()
    qemu_coroutine_yield()

because adding a second AioContext is already introducing concurrency that
wasn't there before.  Still, even if you have to take concurrency into
account, multitasking between coroutines remains non-preemptive.  So for
example, it is easy to synchronize between qemu_co_mutex_lock's yield
and the qemu_coroutine_enter in aio_co_schedule's bottom half.
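
Concretely, consider what can happen in the first pattern between the
unlock and the yield once a second AioContext is in the picture
(s->lock and s->count are hypothetical names, used only to illustrate
the race):

    qemu_co_mutex_unlock(&s->lock);
    /* A coroutine in another iothread can acquire s->lock right here
     * and update s->count concurrently...
     */
    s->count++;    /* ...so this access becomes a data race */
    qemu_coroutine_yield();

With a single AioContext this interleaving was impossible, because the
unlocking coroutine kept running until the yield.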

CoMutex puts coroutines to sleep with qemu_coroutine_yield and wakes them
up with aio_co_wake.  I could have wrapped CoMutex's CoQueue with
a "regular" thread mutex or spinlock.  The resulting code would
have looked a lot like RFifoLock (with CoQueue replacing RFifoLock's
condition variable).  Instead, CoMutex is implemented from scratch and
CoQueue is made to depend on a CoMutex, similar to condition variables.
Most CoQueues already have a corresponding CoMutex so this is not a big
deal; converting the others is left for a future series, but a surprising
number of drivers actually need no change.

The mutex algorithm comes from OSv; it only needs two to four atomic ops
for a lock-unlock pair (two when uncontended) and if necessary
we could even take OSv's support for wait morphing (which avoids the
thundering herd problem) and add it to CoMutex and CoQueue.
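
For illustration, the uncontended fast path boils down to one atomic
operation on each side; a minimal sketch of patch 1, with the slow
paths and the holder/locks_held bookkeeping elided:

    void coroutine_fn qemu_co_mutex_lock(CoMutex *mutex)
    {
        if (atomic_fetch_inc(&mutex->locked) == 0) {
            return;    /* 0 -> 1: the mutex was free and is now ours */
        }
        /* slow path: push a CoWaitRecord, maybe steal a handoff, yield */
    }

    void coroutine_fn qemu_co_mutex_unlock(CoMutex *mutex)
    {
        if (atomic_fetch_dec(&mutex->locked) == 1) {
            return;    /* 1 -> 0: nobody was waiting */
        }
        /* slow path: pop and wake a waiter, or publish a handoff */
    }

The extra atomic ops of the contended case come from the cmpxchg-based
handoff protocol in the two slow paths.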

Performance of CoMutex is comparable to that of pthread mutexes.  However, you
cannot make a direct comparison between CoMutex (fair) and pthread_mutex_t
(unfair).  For this reason the testcase also measures performance of
a quick-and-dirty implementation of a fair mutex, based on MCS locks
and futexes.

Paolo

Paolo Bonzini (6):
  coroutine-lock: make CoMutex thread-safe
  coroutine-lock: add limited spinning to CoMutex
  test-aio-multithread: add performance comparison with thread-based
    mutexes
  coroutine-lock: place CoMutex before CoQueue in header
  coroutine-lock: add mutex argument to CoQueue APIs
  coroutine-lock: make CoRwlock thread-safe and fair

 block/backup.c               |   2 +-
 block/io.c                   |   4 +-
 block/nbd-client.c           |   2 +-
 block/qcow2-cluster.c        |   4 +-
 block/sheepdog.c             |   2 +-
 block/throttle-groups.c      |   2 +-
 hw/9pfs/9p.c                 |   2 +-
 include/qemu/coroutine.h     |  84 +++++++++------
 tests/test-aio-multithread.c | 250 +++++++++++++++++++++++++++++++++++++++++++
 util/qemu-coroutine-lock.c   | 247 ++++++++++++++++++++++++++++++++++++++----
 util/qemu-coroutine.c        |   2 +-
 util/trace-events            |   1 +
 12 files changed, 537 insertions(+), 65 deletions(-)

-- 
2.9.3

* [Qemu-devel] [PATCH 1/6] coroutine-lock: make CoMutex thread-safe
  2017-02-13 18:12 [Qemu-devel] [PATCH 0/6] Make CoMutex/CoQueue/CoRwlock thread-safe Paolo Bonzini
@ 2017-02-13 18:12 ` Paolo Bonzini
  2017-02-16 16:03   ` Stefan Hajnoczi
  2017-02-13 18:12 ` [Qemu-devel] [PATCH 2/6] coroutine-lock: add limited spinning to CoMutex Paolo Bonzini
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2017-02-13 18:12 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, stefanha, qemu-block

This uses the lock-free mutex described in the paper '"Blocking without
Locking", or LFTHREADS: A lock-free thread library' by Gidenstam and
Papatriantafilou.  The same technique is used in OSv, and in fact
the code is essentially a conversion to C of OSv's code.
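
In a nutshell, lockers are tracked in a counter and waiters in a
lock-free multiple-producer/single-consumer queue; a "handoff" token
resolves the race between an unlocker that finds the queue still empty
and a locker that has not yet enqueued itself.  Schematically (a
reader's summary of the code below, using its names):

    lock:   atomic_fetch_inc(&mutex->locked) == 0 -> got it; otherwise
            push a CoWaitRecord, try to steal mutex->handoff, and yield
    unlock: atomic_fetch_dec(&mutex->locked) == 1 -> done; otherwise
            pop and wake a waiter, or publish a sequence number in
            mutex->handoff for the incoming locker to pick up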

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/qemu/coroutine.h     |  17 ++++-
 tests/test-aio-multithread.c |  86 ++++++++++++++++++++++++
 util/qemu-coroutine-lock.c   | 153 ++++++++++++++++++++++++++++++++++++++++---
 util/trace-events            |   1 +
 4 files changed, 245 insertions(+), 12 deletions(-)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index 12584ed..fce228f 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -160,10 +160,23 @@ bool qemu_co_queue_empty(CoQueue *queue);
 /**
  * Provides a mutex that can be used to synchronise coroutines
  */
+struct CoWaitRecord;
 typedef struct CoMutex {
-    bool locked;
+    /* Count of pending lockers; 0 for a free mutex, 1 for an
+     * uncontended mutex.
+     */
+    unsigned locked;
+
+    /* A queue of waiters.  Elements are added atomically in front of
+     * from_push.  to_pop is only populated, and popped from, by whoever
+     * is in charge of the next wakeup.  This can be an unlocker or,
+     * through the handoff protocol, a locker that is about to go to sleep.
+     */
+    QSLIST_HEAD(, CoWaitRecord) from_push, to_pop;
+
+    unsigned handoff, sequence;
+
     Coroutine *holder;
-    CoQueue queue;
 } CoMutex;
 
 /**
diff --git a/tests/test-aio-multithread.c b/tests/test-aio-multithread.c
index 534807d..ada8c48 100644
--- a/tests/test-aio-multithread.c
+++ b/tests/test-aio-multithread.c
@@ -196,6 +196,88 @@ static void test_multi_co_schedule_10(void)
     test_multi_co_schedule(10);
 }
 
+/* CoMutex thread-safety.  */
+
+static uint32_t atomic_counter;
+static uint32_t running;
+static uint32_t counter;
+static CoMutex comutex;
+
+static void test_multi_co_mutex_entry(void *opaque)
+{
+    while (!atomic_mb_read(&now_stopping)) {
+        qemu_co_mutex_lock(&comutex);
+        counter++;
+        qemu_co_mutex_unlock(&comutex);
+
+        /* Increase atomic_counter *after* releasing the mutex.  Otherwise
+         * there is a chance (it happens about 1 in 3 runs) that the iothread
+         * exits before the coroutine is woken up, causing a spurious
+         * assertion failure.
+         */
+        atomic_inc(&atomic_counter);
+    }
+    atomic_dec(&running);
+}
+
+static void test_multi_co_mutex(int threads, int seconds)
+{
+    int i;
+
+    qemu_co_mutex_init(&comutex);
+    counter = 0;
+    atomic_counter = 0;
+    now_stopping = false;
+
+    create_aio_contexts();
+    assert(threads <= NUM_CONTEXTS);
+    running = threads;
+    for (i = 0; i < threads; i++) {
+        Coroutine *co1 = qemu_coroutine_create(test_multi_co_mutex_entry, NULL);
+        aio_co_schedule(ctx[i], co1);
+    }
+
+    g_usleep(seconds * 1000000);
+
+    atomic_mb_set(&now_stopping, true);
+    while (running > 0) {
+        g_usleep(100000);
+    }
+
+    join_aio_contexts();
+    g_test_message("%d iterations/second\n", counter / seconds);
+    g_assert_cmpint(counter, ==, atomic_counter);
+}
+
+/* Testing with NUM_CONTEXTS threads focuses on the queue.  The mutex however
+ * is too contended (and the threads spend too much time in aio_poll)
+ * to actually stress the handoff protocol.
+ */
+static void test_multi_co_mutex_1(void)
+{
+    test_multi_co_mutex(NUM_CONTEXTS, 1);
+}
+
+static void test_multi_co_mutex_10(void)
+{
+    test_multi_co_mutex(NUM_CONTEXTS, 10);
+}
+
+/* Testing with fewer threads stresses the handoff protocol too.  Still, the
+ * case where the locker _can_ pick up a handoff is very rare, happening
+ * about 10 times in 1 million, so increase the runtime a bit compared to
+ * other "quick" testcases that only run for 1 second.
+ */
+static void test_multi_co_mutex_2_3(void)
+{
+    test_multi_co_mutex(2, 3);
+}
+
+static void test_multi_co_mutex_2_30(void)
+{
+    test_multi_co_mutex(2, 30);
+}
+
 /* End of tests.  */
 
 int main(int argc, char **argv)
@@ -206,8 +288,12 @@ int main(int argc, char **argv)
     g_test_add_func("/aio/multi/lifecycle", test_lifecycle);
     if (g_test_quick()) {
         g_test_add_func("/aio/multi/schedule", test_multi_co_schedule_1);
+        g_test_add_func("/aio/multi/mutex/contended", test_multi_co_mutex_1);
+        g_test_add_func("/aio/multi/mutex/handoff", test_multi_co_mutex_2_3);
     } else {
         g_test_add_func("/aio/multi/schedule", test_multi_co_schedule_10);
+        g_test_add_func("/aio/multi/mutex/contended", test_multi_co_mutex_10);
+        g_test_add_func("/aio/multi/mutex/handoff", test_multi_co_mutex_2_30);
     }
     return g_test_run();
 }
diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
index e6afd1a..25da9fa 100644
--- a/util/qemu-coroutine-lock.c
+++ b/util/qemu-coroutine-lock.c
@@ -20,6 +20,10 @@
  * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
  * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
  * THE SOFTWARE.
+ *
+ * The lock-free mutex implementation is based on OSv
+ * (core/lfmutex.cc, include/lockfree/mutex.hh).
+ * Copyright (C) 2013 Cloudius Systems, Ltd.
  */
 
 #include "qemu/osdep.h"
@@ -111,27 +115,119 @@ bool qemu_co_queue_empty(CoQueue *queue)
     return QSIMPLEQ_FIRST(&queue->entries) == NULL;
 }
 
+/* The wait records are handled with a multiple-producer, single-consumer
+ * lock-free queue.  There cannot be two concurrent pop_waiter() calls
+ * because pop_waiter() can only be called while mutex->handoff is zero.
+ * This can happen in three cases:
+ * - in qemu_co_mutex_unlock, before the hand-off protocol has started.
+ *   In this case, qemu_co_mutex_lock will see mutex->handoff == 0 and
+ *   not take part in the handoff.
+ * - in qemu_co_mutex_lock, if it steals the hand-off responsibility from
+ *   qemu_co_mutex_unlock.  In this case, qemu_co_mutex_unlock will fail
+ *   the cmpxchg (it will see either 0 or the next sequence value) and
+ *   exit.  The next hand-off cannot begin until qemu_co_mutex_lock has
+ *   woken up someone.
+ * - in qemu_co_mutex_unlock, if it takes the hand-off token itself.
+ *   In this case another iteration starts with mutex->handoff == 0;
+ *   a concurrent qemu_co_mutex_lock will fail the cmpxchg, and
+ *   qemu_co_mutex_unlock will go back to case (1).
+ *
+ * The following functions manage this queue.
+ */
+typedef struct CoWaitRecord {
+    Coroutine *co;
+    QSLIST_ENTRY(CoWaitRecord) next;
+} CoWaitRecord;
+
+static void push_waiter(CoMutex *mutex, CoWaitRecord *w)
+{
+    w->co = qemu_coroutine_self();
+    QSLIST_INSERT_HEAD_ATOMIC(&mutex->from_push, w, next);
+}
+
+static void move_waiters(CoMutex *mutex)
+{
+    QSLIST_HEAD(, CoWaitRecord) reversed;
+    QSLIST_MOVE_ATOMIC(&reversed, &mutex->from_push);
+    while (!QSLIST_EMPTY(&reversed)) {
+        CoWaitRecord *w = QSLIST_FIRST(&reversed);
+        QSLIST_REMOVE_HEAD(&reversed, next);
+        QSLIST_INSERT_HEAD(&mutex->to_pop, w, next);
+    }
+}
+
+static CoWaitRecord *pop_waiter(CoMutex *mutex)
+{
+    CoWaitRecord *w;
+
+    if (QSLIST_EMPTY(&mutex->to_pop)) {
+        move_waiters(mutex);
+        if (QSLIST_EMPTY(&mutex->to_pop)) {
+            return NULL;
+        }
+    }
+    w = QSLIST_FIRST(&mutex->to_pop);
+    QSLIST_REMOVE_HEAD(&mutex->to_pop, next);
+    return w;
+}
+
+static bool has_waiters(CoMutex *mutex)
+{
+    return !QSLIST_EMPTY(&mutex->to_pop) || !QSLIST_EMPTY(&mutex->from_push);
+}
+
 void qemu_co_mutex_init(CoMutex *mutex)
 {
     memset(mutex, 0, sizeof(*mutex));
-    qemu_co_queue_init(&mutex->queue);
 }
 
-void coroutine_fn qemu_co_mutex_lock(CoMutex *mutex)
+static void coroutine_fn qemu_co_mutex_lock_slowpath(CoMutex *mutex)
 {
     Coroutine *self = qemu_coroutine_self();
+    CoWaitRecord w;
+    unsigned old_handoff;
 
     trace_qemu_co_mutex_lock_entry(mutex, self);
+    w.co = self;
+    push_waiter(mutex, &w);
+
+    /* This is the "Responsibility Hand-Off" protocol; a lock() picks from
+     * a concurrent unlock() the responsibility of waking somebody up.
+     */
+    old_handoff = atomic_mb_read(&mutex->handoff);
+    if (old_handoff &&
+        has_waiters(mutex) &&
+        atomic_cmpxchg(&mutex->handoff, old_handoff, 0) == old_handoff) {
+        /* There can be no concurrent pops, because there can be only
+         * one active handoff at a time.
+         */
+        CoWaitRecord *to_wake = pop_waiter(mutex);
+        Coroutine *co = to_wake->co;
+        if (co == self) {
+            /* We got the lock ourselves!  */
+            assert(to_wake == &w);
+            return;
+        }
 
-    while (mutex->locked) {
-        qemu_co_queue_wait(&mutex->queue);
+        aio_co_wake(co);
     }
 
-    mutex->locked = true;
+    qemu_coroutine_yield();
+    trace_qemu_co_mutex_lock_return(mutex, self);
+}
+
+void coroutine_fn qemu_co_mutex_lock(CoMutex *mutex)
+{
+    Coroutine *self = qemu_coroutine_self();
+
+    if (atomic_fetch_inc(&mutex->locked) == 0) {
+        /* Uncontended.  */
+        trace_qemu_co_mutex_lock_uncontended(mutex, self);
+    } else {
+        qemu_co_mutex_lock_slowpath(mutex);
+    }
     mutex->holder = self;
     self->locks_held++;
-
-    trace_qemu_co_mutex_lock_return(mutex, self);
 }
 
 void coroutine_fn qemu_co_mutex_unlock(CoMutex *mutex)
@@ -140,14 +236,51 @@ void coroutine_fn qemu_co_mutex_unlock(CoMutex *mutex)
 
     trace_qemu_co_mutex_unlock_entry(mutex, self);
 
-    assert(mutex->locked == true);
+    assert(mutex->locked);
     assert(mutex->holder == self);
     assert(qemu_in_coroutine());
 
-    mutex->locked = false;
     mutex->holder = NULL;
     self->locks_held--;
-    qemu_co_queue_next(&mutex->queue);
+    if (atomic_fetch_dec(&mutex->locked) == 1) {
+        /* No waiting qemu_co_mutex_lock().  Phew, that was easy!  */
+        return;
+    }
+
+    for (;;) {
+        CoWaitRecord *to_wake = pop_waiter(mutex);
+        unsigned our_handoff;
+
+        if (to_wake) {
+            Coroutine *co = to_wake->co;
+            aio_co_wake(co);
+            break;
+        }
+
+        /* Some concurrent lock() is in progress (we know this because
+         * mutex->locked was >1) but it hasn't yet put itself on the wait
+         * queue.  Pick a sequence number for the handoff protocol (not 0).
+         */
+        if (++mutex->sequence == 0) {
+            mutex->sequence = 1;
+        }
+
+        our_handoff = mutex->sequence;
+        atomic_mb_set(&mutex->handoff, our_handoff);
+        if (!has_waiters(mutex)) {
+            /* The concurrent lock has not added itself yet, so it
+             * will be able to pick our handoff.
+             */
+            break;
+        }
+
+        /* Try to do the handoff protocol ourselves; if somebody else has
+         * already taken it, however, we're done and they're responsible.
+         */
+        if (atomic_cmpxchg(&mutex->handoff, our_handoff, 0) != our_handoff) {
+            break;
+        }
+    }
 
     trace_qemu_co_mutex_unlock_return(mutex, self);
 }
diff --git a/util/trace-events b/util/trace-events
index 65c9787..ac27d94 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -28,6 +28,7 @@ qemu_coroutine_terminate(void *co) "self %p"
 
 # util/qemu-coroutine-lock.c
 qemu_co_queue_run_restart(void *co) "co %p"
+qemu_co_mutex_lock_uncontended(void *mutex, void *self) "mutex %p self %p"
 qemu_co_mutex_lock_entry(void *mutex, void *self) "mutex %p self %p"
 qemu_co_mutex_lock_return(void *mutex, void *self) "mutex %p self %p"
 qemu_co_mutex_unlock_entry(void *mutex, void *self) "mutex %p self %p"
-- 
2.9.3

* [Qemu-devel] [PATCH 2/6] coroutine-lock: add limited spinning to CoMutex
  2017-02-13 18:12 [Qemu-devel] [PATCH 0/6] Make CoMutex/CoQueue/CoRwlock thread-safe Paolo Bonzini
  2017-02-13 18:12 ` [Qemu-devel] [PATCH 1/6] coroutine-lock: make CoMutex thread-safe Paolo Bonzini
@ 2017-02-13 18:12 ` Paolo Bonzini
  2017-02-16 16:10   ` Stefan Hajnoczi
  2017-02-13 18:12 ` [Qemu-devel] [PATCH 3/6] test-aio-multithread: add performance comparison with thread-based mutexes Paolo Bonzini
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2017-02-13 18:12 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, stefanha, qemu-block

Running a very small critical section on pthread_mutex_t and CoMutex
shows that pthread_mutex_t is much faster because it doesn't actually
go to sleep.  What happens is that the critical section is shorter
than the latency of entering the kernel and thus FUTEX_WAIT always
fails.  With CoMutex there is no such latency, but you still want to
avoid wait and wakeup; so introduce that latency artificially, by
spinning for a while before going to sleep.

This only works with one waiter; because CoMutex is fair, it will
always have more waits and wakeups than a pthread_mutex_t.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/qemu/coroutine.h   |  5 +++++
 util/qemu-coroutine-lock.c | 51 ++++++++++++++++++++++++++++++++++++++++------
 util/qemu-coroutine.c      |  2 +-
 3 files changed, 51 insertions(+), 7 deletions(-)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index fce228f..12ce8e1 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -167,6 +167,11 @@ typedef struct CoMutex {
      */
     unsigned locked;
 
+    /* Context that is holding the lock.  Useful to avoid spinning
+     * when two coroutines on the same AioContext try to get the lock. :)
+     */
+    AioContext *ctx;
+
     /* A queue of waiters.  Elements are added atomically in front of
      * from_push.  to_pop is only populated, and popped from, by whoever
      * is in charge of the next wakeup.  This can be an unlocker or,
diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
index 25da9fa..73fe77c 100644
--- a/util/qemu-coroutine-lock.c
+++ b/util/qemu-coroutine-lock.c
@@ -30,6 +30,7 @@
 #include "qemu-common.h"
 #include "qemu/coroutine.h"
 #include "qemu/coroutine_int.h"
+#include "qemu/processor.h"
 #include "qemu/queue.h"
 #include "block/aio.h"
 #include "trace.h"
@@ -181,7 +182,18 @@ void qemu_co_mutex_init(CoMutex *mutex)
     memset(mutex, 0, sizeof(*mutex));
 }
 
-static void coroutine_fn qemu_co_mutex_lock_slowpath(CoMutex *mutex)
+static void coroutine_fn qemu_co_mutex_wake(CoMutex *mutex, Coroutine *co)
+{
+    /* Read co before co->ctx; pairs with smp_wmb() in
+     * qemu_coroutine_enter().
+     */
+    smp_read_barrier_depends();
+    mutex->ctx = co->ctx;
+    aio_co_wake(co);
+}
+
+static void coroutine_fn qemu_co_mutex_lock_slowpath(AioContext *ctx,
+                                                     CoMutex *mutex)
 {
     Coroutine *self = qemu_coroutine_self();
     CoWaitRecord w;
@@ -206,10 +218,11 @@ static void coroutine_fn qemu_co_mutex_lock_slowpath(CoMutex *mutex)
         if (co == self) {
             /* We got the lock ourselves!  */
             assert(to_wake == &w);
+            mutex->ctx = ctx;
             return;
         }
 
-        aio_co_wake(co);
+        qemu_co_mutex_wake(mutex, co);
     }
 
     qemu_coroutine_yield();
@@ -218,13 +231,39 @@ static void coroutine_fn qemu_co_mutex_lock_slowpath(CoMutex *mutex)
 
 void coroutine_fn qemu_co_mutex_lock(CoMutex *mutex)
 {
+    AioContext *ctx = qemu_get_current_aio_context();
     Coroutine *self = qemu_coroutine_self();
+    int waiters, i;
+
+    /* Running a very small critical section on pthread_mutex_t and CoMutex
+     * shows that pthread_mutex_t is much faster because it doesn't actually
+     * go to sleep.  What happens is that the critical section is shorter
+     * than the latency of entering the kernel and thus FUTEX_WAIT always
+     * fails.  With CoMutex there is no such latency, but you still want to
+     * avoid wait and wakeup; so introduce that latency artificially.
+     */
+    i = 0;
+retry_fast_path:
+    waiters = atomic_cmpxchg(&mutex->locked, 0, 1);
+    if (waiters != 0) {
+        while (waiters == 1 && ++i < 1000) {
+            if (atomic_read(&mutex->ctx) == ctx) {
+                break;
+            }
+            if (atomic_read(&mutex->locked) == 0) {
+                goto retry_fast_path;
+            }
+            cpu_relax();
+        }
+        waiters = atomic_fetch_inc(&mutex->locked);
+    }
 
-    if (atomic_fetch_inc(&mutex->locked) == 0) {
+    if (waiters == 0) {
         /* Uncontended.  */
         trace_qemu_co_mutex_lock_uncontended(mutex, self);
+        mutex->ctx = ctx;
     } else {
-        qemu_co_mutex_lock_slowpath(mutex);
+        qemu_co_mutex_lock_slowpath(ctx, mutex);
     }
     mutex->holder = self;
     self->locks_held++;
@@ -240,6 +279,7 @@ void coroutine_fn qemu_co_mutex_unlock(CoMutex *mutex)
     assert(mutex->holder == self);
     assert(qemu_in_coroutine());
 
+    mutex->ctx = NULL;
     mutex->holder = NULL;
     self->locks_held--;
     if (atomic_fetch_dec(&mutex->locked) == 1) {
@@ -252,8 +292,7 @@ void coroutine_fn qemu_co_mutex_unlock(CoMutex *mutex)
         unsigned our_handoff;
 
         if (to_wake) {
-            Coroutine *co = to_wake->co;
-            aio_co_wake(co);
+            qemu_co_mutex_wake(mutex, to_wake->co);
             break;
         }
 
diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
index 415600d..72412e5 100644
--- a/util/qemu-coroutine.c
+++ b/util/qemu-coroutine.c
@@ -118,7 +118,7 @@ void qemu_coroutine_enter(Coroutine *co)
     co->ctx = qemu_get_current_aio_context();
 
     /* Store co->ctx before anything that stores co.  Matches
-     * barrier in aio_co_wake.
+     * barrier in aio_co_wake and qemu_co_mutex_wake.
      */
     smp_wmb();
 
-- 
2.9.3

* [Qemu-devel] [PATCH 3/6] test-aio-multithread: add performance comparison with thread-based mutexes
  2017-02-13 18:12 [Qemu-devel] [PATCH 0/6] Make CoMutex/CoQueue/CoRwlock thread-safe Paolo Bonzini
  2017-02-13 18:12 ` [Qemu-devel] [PATCH 1/6] coroutine-lock: make CoMutex thread-safe Paolo Bonzini
  2017-02-13 18:12 ` [Qemu-devel] [PATCH 2/6] coroutine-lock: add limited spinning to CoMutex Paolo Bonzini
@ 2017-02-13 18:12 ` Paolo Bonzini
  2017-02-13 18:12 ` [Qemu-devel] [PATCH 4/6] coroutine-lock: place CoMutex before CoQueue in header Paolo Bonzini
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Paolo Bonzini @ 2017-02-13 18:12 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, stefanha, qemu-block

Add two implementations of the same benchmark as the previous patch,
but using pthreads.  One uses a normal QemuMutex; the other is Linux-only
and implements a fair mutex based on MCS locks and futexes.
This shows that the slower performance of the 5-thread case is due to
the fairness of CoMutex, rather than to coroutines.  If fairness does
not matter, as is the case with two threads, CoMutex can actually be
faster than pthreads.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 tests/test-aio-multithread.c | 164 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 164 insertions(+)

diff --git a/tests/test-aio-multithread.c b/tests/test-aio-multithread.c
index ada8c48..e7256a9 100644
--- a/tests/test-aio-multithread.c
+++ b/tests/test-aio-multithread.c
@@ -278,6 +278,162 @@ static void test_multi_co_mutex_2_30(void)
     test_multi_co_mutex(2, 30);
 }
 
+/* Same test with fair mutexes, for performance comparison.  */
+
+#ifdef CONFIG_LINUX
+#include "qemu/futex.h"
+
+/* The nodes for the mutex reside in this structure (on which we try to avoid
+ * false sharing).  The head of the mutex is in the "mutex_head" variable.
+ */
+static struct {
+    int next, locked;
+    int padding[14];
+} nodes[NUM_CONTEXTS] __attribute__((__aligned__(64)));
+
+static int mutex_head = -1;
+
+static void mcs_mutex_lock(void)
+{
+    int prev;
+
+    nodes[id].next = -1;
+    nodes[id].locked = 1;
+    prev = atomic_xchg(&mutex_head, id);
+    if (prev != -1) {
+        atomic_set(&nodes[prev].next, id);
+        qemu_futex_wait(&nodes[id].locked, 1);
+    }
+}
+
+static void mcs_mutex_unlock(void)
+{
+    int next;
+    if (nodes[id].next == -1) {
+        if (atomic_read(&mutex_head) == id &&
+            atomic_cmpxchg(&mutex_head, id, -1) == id) {
+            /* Last item in the list, exit.  */
+            return;
+        }
+        while (atomic_read(&nodes[id].next) == -1) {
+            /* mcs_mutex_lock did the xchg, but has not updated
+             * nodes[prev].next yet.
+             */
+        }
+    }
+
+    /* Wake up the next in line.  */
+    next = nodes[id].next;
+    nodes[next].locked = 0;
+    qemu_futex_wake(&nodes[next].locked, 1);
+}
+
+static void test_multi_fair_mutex_entry(void *opaque)
+{
+    while (!atomic_mb_read(&now_stopping)) {
+        mcs_mutex_lock();
+        counter++;
+        mcs_mutex_unlock();
+        atomic_inc(&atomic_counter);
+    }
+    atomic_dec(&running);
+}
+
+static void test_multi_fair_mutex(int threads, int seconds)
+{
+    int i;
+
+    assert(mutex_head == -1);
+    counter = 0;
+    atomic_counter = 0;
+    now_stopping = false;
+
+    create_aio_contexts();
+    assert(threads <= NUM_CONTEXTS);
+    running = threads;
+    for (i = 0; i < threads; i++) {
+        Coroutine *co1 = qemu_coroutine_create(test_multi_fair_mutex_entry, NULL);
+        aio_co_schedule(ctx[i], co1);
+    }
+
+    g_usleep(seconds * 1000000);
+
+    atomic_mb_set(&now_stopping, true);
+    while (running > 0) {
+        g_usleep(100000);
+    }
+
+    join_aio_contexts();
+    g_test_message("%d iterations/second\n", counter / seconds);
+    g_assert_cmpint(counter, ==, atomic_counter);
+}
+
+static void test_multi_fair_mutex_1(void)
+{
+    test_multi_fair_mutex(NUM_CONTEXTS, 1);
+}
+
+static void test_multi_fair_mutex_10(void)
+{
+    test_multi_fair_mutex(NUM_CONTEXTS, 10);
+}
+#endif
+
+/* Same test with pthread mutexes, for performance comparison and
+ * portability.  */
+
+static QemuMutex mutex;
+
+static void test_multi_mutex_entry(void *opaque)
+{
+    while (!atomic_mb_read(&now_stopping)) {
+        qemu_mutex_lock(&mutex);
+        counter++;
+        qemu_mutex_unlock(&mutex);
+        atomic_inc(&atomic_counter);
+    }
+    atomic_dec(&running);
+}
+
+static void test_multi_mutex(int threads, int seconds)
+{
+    int i;
+
+    qemu_mutex_init(&mutex);
+    counter = 0;
+    atomic_counter = 0;
+    now_stopping = false;
+
+    create_aio_contexts();
+    assert(threads <= NUM_CONTEXTS);
+    running = threads;
+    for (i = 0; i < threads; i++) {
+        Coroutine *co1 = qemu_coroutine_create(test_multi_mutex_entry, NULL);
+        aio_co_schedule(ctx[i], co1);
+    }
+
+    g_usleep(seconds * 1000000);
+
+    atomic_mb_set(&now_stopping, true);
+    while (running > 0) {
+        g_usleep(100000);
+    }
+
+    join_aio_contexts();
+    g_test_message("%d iterations/second\n", counter / seconds);
+    g_assert_cmpint(counter, ==, atomic_counter);
+}
+
+static void test_multi_mutex_1(void)
+{
+    test_multi_mutex(NUM_CONTEXTS, 1);
+}
+
+static void test_multi_mutex_10(void)
+{
+    test_multi_mutex(NUM_CONTEXTS, 10);
+}
+
 /* End of tests.  */
 
 int main(int argc, char **argv)
@@ -290,10 +446,18 @@ int main(int argc, char **argv)
         g_test_add_func("/aio/multi/schedule", test_multi_co_schedule_1);
         g_test_add_func("/aio/multi/mutex/contended", test_multi_co_mutex_1);
         g_test_add_func("/aio/multi/mutex/handoff", test_multi_co_mutex_2_3);
+#ifdef CONFIG_LINUX
+        g_test_add_func("/aio/multi/mutex/mcs", test_multi_fair_mutex_1);
+#endif
+        g_test_add_func("/aio/multi/mutex/pthread", test_multi_mutex_1);
     } else {
         g_test_add_func("/aio/multi/schedule", test_multi_co_schedule_10);
         g_test_add_func("/aio/multi/mutex/contended", test_multi_co_mutex_10);
         g_test_add_func("/aio/multi/mutex/handoff", test_multi_co_mutex_2_30);
+#ifdef CONFIG_LINUX
+        g_test_add_func("/aio/multi/mutex/mcs", test_multi_fair_mutex_10);
+#endif
+        g_test_add_func("/aio/multi/mutex/pthread", test_multi_mutex_10);
     }
     return g_test_run();
 }
-- 
2.9.3

* [Qemu-devel] [PATCH 4/6] coroutine-lock: place CoMutex before CoQueue in header
  2017-02-13 18:12 [Qemu-devel] [PATCH 0/6] Make CoMutex/CoQueue/CoRwlock thread-safe Paolo Bonzini
                   ` (2 preceding siblings ...)
  2017-02-13 18:12 ` [Qemu-devel] [PATCH 3/6] test-aio-multithread: add performance comparison with thread-based mutexes Paolo Bonzini
@ 2017-02-13 18:12 ` Paolo Bonzini
  2017-02-13 18:12 ` [Qemu-devel] [PATCH 5/6] coroutine-lock: add mutex argument to CoQueue APIs Paolo Bonzini
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Paolo Bonzini @ 2017-02-13 18:12 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, stefanha, qemu-block

This will avoid forward references in the next patch.  It is also
more logical because CoQueue is no longer the basic primitive.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/qemu/coroutine.h | 89 ++++++++++++++++++++++++------------------------
 1 file changed, 44 insertions(+), 45 deletions(-)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index 12ce8e1..9f68579 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -112,51 +112,6 @@ bool qemu_in_coroutine(void);
  */
 bool qemu_coroutine_entered(Coroutine *co);
 
-
-/**
- * CoQueues are a mechanism to queue coroutines in order to continue executing
- * them later. They provide the fundamental primitives on which coroutine locks
- * are built.
- */
-typedef struct CoQueue {
-    QSIMPLEQ_HEAD(, Coroutine) entries;
-} CoQueue;
-
-/**
- * Initialise a CoQueue. This must be called before any other operation is used
- * on the CoQueue.
- */
-void qemu_co_queue_init(CoQueue *queue);
-
-/**
- * Adds the current coroutine to the CoQueue and transfers control to the
- * caller of the coroutine.
- */
-void coroutine_fn qemu_co_queue_wait(CoQueue *queue);
-
-/**
- * Restarts the next coroutine in the CoQueue and removes it from the queue.
- *
- * Returns true if a coroutine was restarted, false if the queue is empty.
- */
-bool coroutine_fn qemu_co_queue_next(CoQueue *queue);
-
-/**
- * Restarts all coroutines in the CoQueue and leaves the queue empty.
- */
-void coroutine_fn qemu_co_queue_restart_all(CoQueue *queue);
-
-/**
- * Enter the next coroutine in the queue
- */
-bool qemu_co_enter_next(CoQueue *queue);
-
-/**
- * Checks if the CoQueue is empty.
- */
-bool qemu_co_queue_empty(CoQueue *queue);
-
-
 /**
  * Provides a mutex that can be used to synchronise coroutines
  */
@@ -202,6 +157,50 @@ void coroutine_fn qemu_co_mutex_lock(CoMutex *mutex);
  */
 void coroutine_fn qemu_co_mutex_unlock(CoMutex *mutex);
 
+
+/**
+ * CoQueues are a mechanism to queue coroutines in order to continue executing
+ * them later.
+ */
+typedef struct CoQueue {
+    QSIMPLEQ_HEAD(, Coroutine) entries;
+} CoQueue;
+
+/**
+ * Initialise a CoQueue. This must be called before any other operation is used
+ * on the CoQueue.
+ */
+void qemu_co_queue_init(CoQueue *queue);
+
+/**
+ * Adds the current coroutine to the CoQueue and transfers control to the
+ * caller of the coroutine.
+ */
+void coroutine_fn qemu_co_queue_wait(CoQueue *queue);
+
+/**
+ * Restarts the next coroutine in the CoQueue and removes it from the queue.
+ *
+ * Returns true if a coroutine was restarted, false if the queue is empty.
+ */
+bool coroutine_fn qemu_co_queue_next(CoQueue *queue);
+
+/**
+ * Restarts all coroutines in the CoQueue and leaves the queue empty.
+ */
+void coroutine_fn qemu_co_queue_restart_all(CoQueue *queue);
+
+/**
+ * Enter the next coroutine in the queue
+ */
+bool qemu_co_enter_next(CoQueue *queue);
+
+/**
+ * Checks if the CoQueue is empty.
+ */
+bool qemu_co_queue_empty(CoQueue *queue);
+
+
 typedef struct CoRwlock {
     bool writer;
     int reader;
-- 
2.9.3

* [Qemu-devel] [PATCH 5/6] coroutine-lock: add mutex argument to CoQueue APIs
  2017-02-13 18:12 [Qemu-devel] [PATCH 0/6] Make CoMutex/CoQueue/CoRwlock thread-safe Paolo Bonzini
                   ` (3 preceding siblings ...)
  2017-02-13 18:12 ` [Qemu-devel] [PATCH 4/6] coroutine-lock: place CoMutex before CoQueue in header Paolo Bonzini
@ 2017-02-13 18:12 ` Paolo Bonzini
  2017-02-13 18:12 ` [Qemu-devel] [PATCH 6/6] coroutine-lock: make CoRwlock thread-safe and fair Paolo Bonzini
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Paolo Bonzini @ 2017-02-13 18:12 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, stefanha, qemu-block

All that CoQueue needs in order to become thread-safe is help
from an external mutex.  Add this to the API.
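
The result is the familiar condition-variable pattern; s->lock,
s->queue and can_proceed() below are hypothetical names, shown only to
illustrate the API:

    qemu_co_mutex_lock(&s->lock);
    while (!can_proceed(s)) {
        /* Atomically releases s->lock, sleeps, and re-acquires the
         * lock before returning.
         */
        qemu_co_queue_wait(&s->queue, &s->lock);
    }
    /* ... touch the data protected by s->lock ... */
    qemu_co_mutex_unlock(&s->lock);

Callers that still rely on the AioContext lock instead can pass NULL,
as most of the conversions below do.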

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/backup.c             |  2 +-
 block/io.c                 |  4 ++--
 block/nbd-client.c         |  2 +-
 block/qcow2-cluster.c      |  4 +---
 block/sheepdog.c           |  2 +-
 block/throttle-groups.c    |  2 +-
 hw/9pfs/9p.c               |  2 +-
 include/qemu/coroutine.h   |  8 +++++---
 util/qemu-coroutine-lock.c | 24 +++++++++++++++++++++---
 9 files changed, 34 insertions(+), 16 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index ea38733..fe010e7 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -64,7 +64,7 @@ static void coroutine_fn wait_for_overlapping_requests(BackupBlockJob *job,
         retry = false;
         QLIST_FOREACH(req, &job->inflight_reqs, list) {
             if (end > req->start && start < req->end) {
-                qemu_co_queue_wait(&req->wait_queue);
+                qemu_co_queue_wait(&req->wait_queue, NULL);
                 retry = true;
                 break;
             }
diff --git a/block/io.c b/block/io.c
index a5c7d36..d5c4544 100644
--- a/block/io.c
+++ b/block/io.c
@@ -539,7 +539,7 @@ static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
                  * (instead of producing a deadlock in the former case). */
                 if (!req->waiting_for) {
                     self->waiting_for = req;
-                    qemu_co_queue_wait(&req->wait_queue);
+                    qemu_co_queue_wait(&req->wait_queue, NULL);
                     self->waiting_for = NULL;
                     retry = true;
                     waited = true;
@@ -2275,7 +2275,7 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
 
     /* Wait until any previous flushes are completed */
     while (bs->active_flush_req) {
-        qemu_co_queue_wait(&bs->flush_queue);
+        qemu_co_queue_wait(&bs->flush_queue, NULL);
     }
 
     bs->active_flush_req = true;
diff --git a/block/nbd-client.c b/block/nbd-client.c
index 10fcc9e..0dc12c2 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -182,7 +182,7 @@ static void nbd_coroutine_start(NBDClientSession *s,
     /* Poor man semaphore.  The free_sema is locked when no other request
      * can be accepted, and unlocked after receiving one reply.  */
     if (s->in_flight == MAX_NBD_REQUESTS) {
-        qemu_co_queue_wait(&s->free_sema);
+        qemu_co_queue_wait(&s->free_sema, NULL);
         assert(s->in_flight < MAX_NBD_REQUESTS);
     }
     s->in_flight++;
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 928c1e2..78c11d4 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -932,9 +932,7 @@ static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
             if (bytes == 0) {
                 /* Wait for the dependency to complete. We need to recheck
                  * the free/allocated clusters when we continue. */
-                qemu_co_mutex_unlock(&s->lock);
-                qemu_co_queue_wait(&old_alloc->dependent_requests);
-                qemu_co_mutex_lock(&s->lock);
+                qemu_co_queue_wait(&old_alloc->dependent_requests, &s->lock);
                 return -EAGAIN;
             }
         }
diff --git a/block/sheepdog.c b/block/sheepdog.c
index 32c4e4c..860ba61 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -486,7 +486,7 @@ static void wait_for_overlapping_aiocb(BDRVSheepdogState *s, SheepdogAIOCB *acb)
 retry:
     QLIST_FOREACH(cb, &s->inflight_aiocb_head, aiocb_siblings) {
         if (AIOCBOverlapping(acb, cb)) {
-            qemu_co_queue_wait(&s->overlapping_queue);
+            qemu_co_queue_wait(&s->overlapping_queue, NULL);
             goto retry;
         }
     }
diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index aade5de..b73e7a8 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -326,7 +326,7 @@ void coroutine_fn throttle_group_co_io_limits_intercept(BlockBackend *blk,
     if (must_wait || blkp->pending_reqs[is_write]) {
         blkp->pending_reqs[is_write]++;
         qemu_mutex_unlock(&tg->lock);
-        qemu_co_queue_wait(&blkp->throttled_reqs[is_write]);
+        qemu_co_queue_wait(&blkp->throttled_reqs[is_write], NULL);
         qemu_mutex_lock(&tg->lock);
         blkp->pending_reqs[is_write]--;
     }
diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index 99e9472..3af1c93 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -2374,7 +2374,7 @@ static void coroutine_fn v9fs_flush(void *opaque)
         /*
          * Wait for pdu to complete.
          */
-        qemu_co_queue_wait(&cancel_pdu->complete);
+        qemu_co_queue_wait(&cancel_pdu->complete, NULL);
         cancel_pdu->cancelled = 0;
         pdu_free(cancel_pdu);
     }
diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index 9f68579..d2de268 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -160,7 +160,8 @@ void coroutine_fn qemu_co_mutex_unlock(CoMutex *mutex);
 
 /**
  * CoQueues are a mechanism to queue coroutines in order to continue executing
- * them later.
+ * them later.  They are similar to condition variables, but they need help
+ * from an external mutex in order to maintain thread-safety.
  */
 typedef struct CoQueue {
     QSIMPLEQ_HEAD(, Coroutine) entries;
@@ -174,9 +175,10 @@ void qemu_co_queue_init(CoQueue *queue);
 
 /**
  * Adds the current coroutine to the CoQueue and transfers control to the
- * caller of the coroutine.
+ * caller of the coroutine.  The mutex is unlocked during the wait and
+ * locked again afterwards.
  */
-void coroutine_fn qemu_co_queue_wait(CoQueue *queue);
+void coroutine_fn qemu_co_queue_wait(CoQueue *queue, CoMutex *mutex);
 
 /**
  * Restarts the next coroutine in the CoQueue and removes it from the queue.
diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
index 73fe77c..b0a554f 100644
--- a/util/qemu-coroutine-lock.c
+++ b/util/qemu-coroutine-lock.c
@@ -40,12 +40,30 @@ void qemu_co_queue_init(CoQueue *queue)
     QSIMPLEQ_INIT(&queue->entries);
 }
 
-void coroutine_fn qemu_co_queue_wait(CoQueue *queue)
+void coroutine_fn qemu_co_queue_wait(CoQueue *queue, CoMutex *mutex)
 {
     Coroutine *self = qemu_coroutine_self();
     QSIMPLEQ_INSERT_TAIL(&queue->entries, self, co_queue_next);
+
+    if (mutex) {
+        qemu_co_mutex_unlock(mutex);
+    }
+
+    /* There is no race condition here.  Other threads will call
+     * aio_co_schedule on our AioContext, which can reenter this
+     * coroutine but only after this yield and after the main loop
+     * has gone through the next iteration.
+     */
     qemu_coroutine_yield();
     assert(qemu_in_coroutine());
+
+    /* TODO: OSv implements wait morphing here, where the wakeup
+     * primitive automatically places the woken coroutine on the
+     * mutex's queue.  This avoids the thundering herd effect.
+     */
+    if (mutex) {
+        qemu_co_mutex_lock(mutex);
+    }
 }
 
 /**
@@ -335,7 +353,7 @@ void qemu_co_rwlock_rdlock(CoRwlock *lock)
     Coroutine *self = qemu_coroutine_self();
 
     while (lock->writer) {
-        qemu_co_queue_wait(&lock->queue);
+        qemu_co_queue_wait(&lock->queue, NULL);
     }
     lock->reader++;
     self->locks_held++;
@@ -365,7 +383,7 @@ void qemu_co_rwlock_wrlock(CoRwlock *lock)
     Coroutine *self = qemu_coroutine_self();
 
     while (lock->writer || lock->reader) {
-        qemu_co_queue_wait(&lock->queue);
+        qemu_co_queue_wait(&lock->queue, NULL);
     }
     lock->writer = true;
     self->locks_held++;
-- 
2.9.3

* [Qemu-devel] [PATCH 6/6] coroutine-lock: make CoRwlock thread-safe and fair
  2017-02-13 18:12 [Qemu-devel] [PATCH 0/6] Make CoMutex/CoQueue/CoRwlock thread-safe Paolo Bonzini
                   ` (4 preceding siblings ...)
  2017-02-13 18:12 ` [Qemu-devel] [PATCH 5/6] coroutine-lock: add mutex argument to CoQueue APIs Paolo Bonzini
@ 2017-02-13 18:12 ` Paolo Bonzini
  2017-02-15  9:23   ` Fam Zheng
  2017-02-15  9:24 ` [Qemu-devel] [PATCH 0/6] Make CoMutex/CoQueue/CoRwlock thread-safe Fam Zheng
  2017-02-16 17:17 ` Stefan Hajnoczi
  7 siblings, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2017-02-13 18:12 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, stefanha, qemu-block

This adds a CoMutex around the existing CoQueue.  Because the write-side
can just take CoMutex, the old "writer" field is not necessary anymore.
Instead of removing it altogether, count the number of pending writers
during a read-side critical section and forbid further readers from
entering.
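
A hypothetical interleaving illustrates the fairness (coroutines A, B,
C and lock l are made-up names):

    /* 1. A: qemu_co_rwlock_rdlock(&l);  -> reader = 1                 */
    /* 2. B: qemu_co_rwlock_wrlock(&l);  -> sleeps, pending_writer = 1 */
    /* 3. C: qemu_co_rwlock_rdlock(&l);  -> sleeps too; a writer is in */
    /*       line, so C must not overtake B                            */
    /* 4. A: qemu_co_rwlock_unlock(&l);  -> reader = 0, wakes B        */
    /* 5. B: runs the write-side critical section; its unlock then     */
    /*       restarts all queued waiters, including C                  */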

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/qemu/coroutine.h   |  3 ++-
 util/qemu-coroutine-lock.c | 35 ++++++++++++++++++++++++-----------
 2 files changed, 26 insertions(+), 12 deletions(-)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index d2de268..e60beaf 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -204,8 +204,9 @@ bool qemu_co_queue_empty(CoQueue *queue);
 
 
 typedef struct CoRwlock {
-    bool writer;
+    int pending_writer;
     int reader;
+    CoMutex mutex;
     CoQueue queue;
 } CoRwlock;
 
diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
index b0a554f..6328eed 100644
--- a/util/qemu-coroutine-lock.c
+++ b/util/qemu-coroutine-lock.c
@@ -346,16 +346,22 @@ void qemu_co_rwlock_init(CoRwlock *lock)
 {
     memset(lock, 0, sizeof(*lock));
     qemu_co_queue_init(&lock->queue);
+    qemu_co_mutex_init(&lock->mutex);
 }
 
 void qemu_co_rwlock_rdlock(CoRwlock *lock)
 {
     Coroutine *self = qemu_coroutine_self();
 
-    while (lock->writer) {
-        qemu_co_queue_wait(&lock->queue, NULL);
+    qemu_co_mutex_lock(&lock->mutex);
+    /* For fairness, wait if a writer is in line.  */
+    while (lock->pending_writer) {
+        qemu_co_queue_wait(&lock->queue, &lock->mutex);
     }
     lock->reader++;
+    qemu_co_mutex_unlock(&lock->mutex);
+
+    /* The rest of the read-side critical section is run without the mutex.  */
     self->locks_held++;
 }
 
@@ -364,10 +370,13 @@ void qemu_co_rwlock_unlock(CoRwlock *lock)
     Coroutine *self = qemu_coroutine_self();
 
     assert(qemu_in_coroutine());
-    if (lock->writer) {
-        lock->writer = false;
+    if (!lock->reader) {
+        /* The critical section started in qemu_co_rwlock_wrlock.  */
         qemu_co_queue_restart_all(&lock->queue);
     } else {
+        self->locks_held--;
+
+        qemu_co_mutex_lock(&lock->mutex);
         lock->reader--;
         assert(lock->reader >= 0);
         /* Wakeup only one waiting writer */
@@ -375,16 +384,20 @@ void qemu_co_rwlock_unlock(CoRwlock *lock)
             qemu_co_queue_next(&lock->queue);
         }
     }
-    self->locks_held--;
+    qemu_co_mutex_unlock(&lock->mutex);
 }
 
 void qemu_co_rwlock_wrlock(CoRwlock *lock)
 {
-    Coroutine *self = qemu_coroutine_self();
-
-    while (lock->writer || lock->reader) {
-        qemu_co_queue_wait(&lock->queue, NULL);
+    qemu_co_mutex_lock(&lock->mutex);
+    lock->pending_writer++;
+    while (lock->reader) {
+        qemu_co_queue_wait(&lock->queue, &lock->mutex);
     }
-    lock->writer = true;
-    self->locks_held++;
+    lock->pending_writer--;
+
+    /* The rest of the write-side critical section is run with
+     * the mutex taken, so that lock->reader remains zero.
+     * There is no need to update self->locks_held.
+     */
 }
-- 
2.9.3

* Re: [Qemu-devel] [PATCH 6/6] coroutine-lock: make CoRwlock thread-safe and fair
  2017-02-13 18:12 ` [Qemu-devel] [PATCH 6/6] coroutine-lock: make CoRwlock thread-safe and fair Paolo Bonzini
@ 2017-02-15  9:23   ` Fam Zheng
  2017-02-15 11:20     ` Paolo Bonzini
  0 siblings, 1 reply; 14+ messages in thread
From: Fam Zheng @ 2017-02-15  9:23 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, stefanha, qemu-block

On Mon, 02/13 19:12, Paolo Bonzini wrote:
> This adds a CoMutex around the existing CoQueue.  Because the write-side

s/CoQueue/CoRwlock/

> can just take CoMutex, the old "writer" field is not necessary anymore.
> Instead of removing it altogether, count the number of pending writers
> during a read-side critical section and forbid further readers from
> entering.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  include/qemu/coroutine.h   |  3 ++-
>  util/qemu-coroutine-lock.c | 35 ++++++++++++++++++++++++-----------
>  2 files changed, 26 insertions(+), 12 deletions(-)
> 
> diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
> index d2de268..e60beaf 100644
> --- a/include/qemu/coroutine.h
> +++ b/include/qemu/coroutine.h
> @@ -204,8 +204,9 @@ bool qemu_co_queue_empty(CoQueue *queue);
>  
>  
>  typedef struct CoRwlock {
> -    bool writer;
> +    int pending_writer;
>      int reader;
> +    CoMutex mutex;
>      CoQueue queue;
>  } CoRwlock;
>  
> diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
> index b0a554f..6328eed 100644
> --- a/util/qemu-coroutine-lock.c
> +++ b/util/qemu-coroutine-lock.c
> @@ -346,16 +346,22 @@ void qemu_co_rwlock_init(CoRwlock *lock)
>  {
>      memset(lock, 0, sizeof(*lock));
>      qemu_co_queue_init(&lock->queue);
> +    qemu_co_mutex_init(&lock->mutex);
>  }
>  
>  void qemu_co_rwlock_rdlock(CoRwlock *lock)
>  {
>      Coroutine *self = qemu_coroutine_self();
>  
> -    while (lock->writer) {
> -        qemu_co_queue_wait(&lock->queue, NULL);
> +    qemu_co_mutex_lock(&lock->mutex);
> +    /* For fairness, wait if a writer is in line.  */
> +    while (lock->pending_writer) {
> +        qemu_co_queue_wait(&lock->queue, &lock->mutex);
>      }
>      lock->reader++;
> +    qemu_co_mutex_unlock(&lock->mutex);
> +
> +    /* The rest of the read-side critical section is run without the mutex.  */
>      self->locks_held++;
>  }
>  
> @@ -364,10 +370,13 @@ void qemu_co_rwlock_unlock(CoRwlock *lock)
>      Coroutine *self = qemu_coroutine_self();
>  
>      assert(qemu_in_coroutine());
> -    if (lock->writer) {
> -        lock->writer = false;
> +    if (!lock->reader) {
> +        /* The critical section started in qemu_co_rwlock_wrlock.  */
>          qemu_co_queue_restart_all(&lock->queue);
>      } else {
> +        self->locks_held--;
> +
> +        qemu_co_mutex_lock(&lock->mutex);
>          lock->reader--;
>          assert(lock->reader >= 0);
>          /* Wakeup only one waiting writer */
> @@ -375,16 +384,20 @@ void qemu_co_rwlock_unlock(CoRwlock *lock)
>              qemu_co_queue_next(&lock->queue);
>          }
>      }
> -    self->locks_held--;
> +    qemu_co_mutex_unlock(&lock->mutex);
>  }
>  
>  void qemu_co_rwlock_wrlock(CoRwlock *lock)
>  {
> -    Coroutine *self = qemu_coroutine_self();
> -
> -    while (lock->writer || lock->reader) {
> -        qemu_co_queue_wait(&lock->queue, NULL);
> +    qemu_co_mutex_lock(&lock->mutex);
> +    lock->pending_writer++;
> +    while (lock->reader) {
> +        qemu_co_queue_wait(&lock->queue, &lock->mutex);
>      }
> -    lock->writer = true;
> -    self->locks_held++;
> +    lock->pending_writer--;
> +
> +    /* The rest of the write-side critical section is run with
> +     * the mutex taken, so that lock->reader remains zero.
> +     * There is no need to update self->locks_held.
> +     */

But is it still better to update self->locks_held anyway for the
'assert(!co->locks_held)' in qemu_coroutine_enter? Or is the same thing checked
elsewhere?

Fam

* Re: [Qemu-devel] [PATCH 0/6] Make CoMutex/CoQueue/CoRwlock thread-safe
  2017-02-13 18:12 [Qemu-devel] [PATCH 0/6] Make CoMutex/CoQueue/CoRwlock thread-safe Paolo Bonzini
                   ` (5 preceding siblings ...)
  2017-02-13 18:12 ` [Qemu-devel] [PATCH 6/6] coroutine-lock: make CoRwlock thread-safe and fair Paolo Bonzini
@ 2017-02-15  9:24 ` Fam Zheng
  2017-02-16 17:17 ` Stefan Hajnoczi
  7 siblings, 0 replies; 14+ messages in thread
From: Fam Zheng @ 2017-02-15  9:24 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, stefanha, qemu-block

On Mon, 02/13 19:12, Paolo Bonzini wrote:
> This is yet another tiny bit of the multiqueue work, this time affecting
> the synchronization infrastructure for coroutines.  Currently, coroutines
> synchronize between the main I/O thread and the dataplane iothread through
> the AioContext lock.  However, for multiqueue a single BDS will be used
> by multiple iothreads and hence multiple AioContexts.  This calls for
> a different approach to coroutine synchronization, and this series is my
> attempt.

Looks good to me!

Reviewed-by: Fam Zheng <famz@redhat.com>

Fam

* Re: [Qemu-devel] [PATCH 6/6] coroutine-lock: make CoRwlock thread-safe and fair
  2017-02-15  9:23   ` Fam Zheng
@ 2017-02-15 11:20     ` Paolo Bonzini
  2017-02-15 11:37       ` Fam Zheng
  0 siblings, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2017-02-15 11:20 UTC (permalink / raw)
  To: Fam Zheng; +Cc: qemu-devel, stefanha, qemu-block



On 15/02/2017 10:23, Fam Zheng wrote:
> On Mon, 02/13 19:12, Paolo Bonzini wrote:
>> This adds a CoMutex around the existing CoQueue.  Because the write-side
> 
> s/CoQueue/CoRwlock/

No, I meant that CoRwlock has a CoQueue, and after this patch it is
wrapped by a CoMutex too.


>> @@ -375,16 +384,20 @@ void qemu_co_rwlock_unlock(CoRwlock *lock)
>>              qemu_co_queue_next(&lock->queue);
>>          }
>>      }
>> -    self->locks_held--;
>> +    qemu_co_mutex_unlock(&lock->mutex);
>>  }
>>  
>>  void qemu_co_rwlock_wrlock(CoRwlock *lock)
>>  {
>> -    Coroutine *self = qemu_coroutine_self();
>> -
>> -    while (lock->writer || lock->reader) {
>> -        qemu_co_queue_wait(&lock->queue, NULL);
>> +    qemu_co_mutex_lock(&lock->mutex);
>> +    lock->pending_writer++;
>> +    while (lock->reader) {
>> +        qemu_co_queue_wait(&lock->queue, &lock->mutex);
>>      }
>> -    lock->writer = true;
>> -    self->locks_held++;
>> +    lock->pending_writer--;
>> +
>> +    /* The rest of the write-side critical section is run with
>> +     * the mutex taken, so that lock->reader remains zero.
>> +     * There is no need to update self->locks_held.
>> +     */
> 
> But is it still better to update self->locks_held anyway for the
> 'assert(!co->locks_held)' in qemu_coroutine_enter? Or is the same thing checked
> elsewhere?

self->locks_held is already incremented by the qemu_co_mutex_lock call
at the beginning of qemu_co_rwlock_wrlock.  It is then decremented in
qemu_co_rwlock_unlock.

For the read side, rdlock _unlocks_ lock->mutex before returning, so
self->locks_held must be incremented by rdlock and decremented by unlock.
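
In other words, the accounting balances out on both paths (a summary
of the patch's behavior, not new code):

    wrlock(): qemu_co_mutex_lock(&lock->mutex)     -> self->locks_held++
    unlock(): qemu_co_mutex_unlock(&lock->mutex)   -> self->locks_held--

    rdlock(): lock->mutex locked and unlocked internally (net zero),
              then self->locks_held++ explicitly
    unlock(): self->locks_held-- explicitly; the lock/unlock of
              lock->mutex around the reader-count update is net zero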

Paolo

* Re: [Qemu-devel] [PATCH 6/6] coroutine-lock: make CoRwlock thread-safe and fair
  2017-02-15 11:20     ` Paolo Bonzini
@ 2017-02-15 11:37       ` Fam Zheng
  0 siblings, 0 replies; 14+ messages in thread
From: Fam Zheng @ 2017-02-15 11:37 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-block, qemu-devel, stefanha

On Wed, 02/15 12:20, Paolo Bonzini wrote:
> 
> 
> On 15/02/2017 10:23, Fam Zheng wrote:
> > On Mon, 02/13 19:12, Paolo Bonzini wrote:
> >> This adds a CoMutex around the existing CoQueue.  Because the write-side
> > 
> > s/CoQueue/CoRwlock/
> 
> No, I meant that CoRwlock has a CoQueue, and after this patch it is
> wrapped by a CoMutex too.

OK.

> 
> 
> >> @@ -375,16 +384,20 @@ void qemu_co_rwlock_unlock(CoRwlock *lock)
> >>              qemu_co_queue_next(&lock->queue);
> >>          }
> >>      }
> >> -    self->locks_held--;
> >> +    qemu_co_mutex_unlock(&lock->mutex);
> >>  }
> >>  
> >>  void qemu_co_rwlock_wrlock(CoRwlock *lock)
> >>  {
> >> -    Coroutine *self = qemu_coroutine_self();
> >> -
> >> -    while (lock->writer || lock->reader) {
> >> -        qemu_co_queue_wait(&lock->queue, NULL);
> >> +    qemu_co_mutex_lock(&lock->mutex);
> >> +    lock->pending_writer++;
> >> +    while (lock->reader) {
> >> +        qemu_co_queue_wait(&lock->queue, &lock->mutex);
> >>      }
> >> -    lock->writer = true;
> >> -    self->locks_held++;
> >> +    lock->pending_writer--;
> >> +
> >> +    /* The rest of the write-side critical section is run with
> >> +     * the mutex taken, so that lock->reader remains zero.
> >> +     * There is no need to update self->locks_held.
> >> +     */
> > 
> > But is it still better to update self->locks_held anyway for the
> > 'assert(!co->locks_held)' in qemu_coroutine_enter? Or is the same thing checked
> > elsewhere?
> 
> self->locks_held is already incremented by the qemu_co_mutex_lock call
> at the beginning of qemu_co_rwlock_wrlock.  It is then decremented in
> qemu_co_rwlock_unlock.
> 
> For the read side, rdlock _unlocks_ lock->mutex before returning, so
> self->locks_held must be incremented by rdlock and decremented by unlock.

Makes sense.

Fam

* Re: [Qemu-devel] [PATCH 1/6] coroutine-lock: make CoMutex thread-safe
  2017-02-13 18:12 ` [Qemu-devel] [PATCH 1/6] coroutine-lock: make CoMutex thread-safe Paolo Bonzini
@ 2017-02-16 16:03   ` Stefan Hajnoczi
  0 siblings, 0 replies; 14+ messages in thread
From: Stefan Hajnoczi @ 2017-02-16 16:03 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, famz, qemu-block

On Mon, Feb 13, 2017 at 07:12:39PM +0100, Paolo Bonzini wrote:
> diff --git a/tests/test-aio-multithread.c b/tests/test-aio-multithread.c
> index 534807d..ada8c48 100644
> --- a/tests/test-aio-multithread.c
> +++ b/tests/test-aio-multithread.c
> @@ -196,6 +196,88 @@ static void test_multi_co_schedule_10(void)
>      test_multi_co_schedule(10);
>  }
>  
> +/* CoMutex thread-safety.  */
> +
> +static uint32_t atomic_counter;
> +static uint32_t running;
> +static uint32_t counter;
> +static CoMutex comutex;
> +
> +static void test_multi_co_mutex_entry(void *opaque)

Minor nit: missing coroutine_fn.

* Re: [Qemu-devel] [PATCH 2/6] coroutine-lock: add limited spinning to CoMutex
  2017-02-13 18:12 ` [Qemu-devel] [PATCH 2/6] coroutine-lock: add limited spinning to CoMutex Paolo Bonzini
@ 2017-02-16 16:10   ` Stefan Hajnoczi
  0 siblings, 0 replies; 14+ messages in thread
From: Stefan Hajnoczi @ 2017-02-16 16:10 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, famz, qemu-block

On Mon, Feb 13, 2017 at 07:12:40PM +0100, Paolo Bonzini wrote:
> Running a very small critical section on pthread_mutex_t and CoMutex
> shows that pthread_mutex_t is much faster because it doesn't actually
> go to sleep.  What happens is that the critical section is shorter
> than the latency of entering the kernel and thus FUTEX_WAIT always
> fails.  With CoMutex there is no such latency, but you still want to
> avoid the wait and wakeup.  So introduce the latency artificially, by
> spinning for a limited time before going to sleep.
> 
> This only works with one waiter; because CoMutex is fair, it will
> always have more waits and wakeups than a pthread_mutex_t.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  include/qemu/coroutine.h   |  5 +++++
>  util/qemu-coroutine-lock.c | 51 ++++++++++++++++++++++++++++++++++++++++------
>  util/qemu-coroutine.c      |  2 +-
>  3 files changed, 51 insertions(+), 7 deletions(-)
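
To make the "spin a little before sleeping" idea concrete, here is a
generic sketch of the technique.  It is not the patch's actual algorithm
(the real slow path puts the coroutine to sleep and wakes it with
aio_co_wake), and the spin budget of 1000 iterations is made up:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <sched.h>

    typedef struct {
        atomic_bool locked;
    } SpinThenYieldLock;

    void spin_then_yield_lock(SpinThenYieldLock *l)
    {
        int spins = 0;
        bool expected = false;

        while (!atomic_compare_exchange_weak(&l->locked, &expected, true)) {
            expected = false;         /* failed CAS stored the old value */
            if (++spins < 1000) {     /* arbitrary spin budget */
                continue;             /* short critical sections end here */
            }
            sched_yield();            /* contended too long: stop burning CPU */
        }
    }

    void spin_then_yield_unlock(SpinThenYieldLock *l)
    {
        atomic_store(&l->locked, false);
    }

If the owner's critical section is shorter than the spin budget, the
waiter acquires the lock without ever paying for a sleep/wakeup pair,
which is the same effect the commit message describes for
pthread_mutex_t's failing FUTEX_WAIT.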

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [Qemu-devel] [PATCH 0/6] Make CoMutex/CoQueue/CoRwlock thread-safe
  2017-02-13 18:12 [Qemu-devel] [PATCH 0/6] Make CoMutex/CoQueue/CoRwlock thread-safe Paolo Bonzini
                   ` (6 preceding siblings ...)
  2017-02-15  9:24 ` [Qemu-devel] [PATCH 0/6] Make CoMutex/CoQueue/CoRwlock thread-safe Fam Zheng
@ 2017-02-16 17:17 ` Stefan Hajnoczi
  7 siblings, 0 replies; 14+ messages in thread
From: Stefan Hajnoczi @ 2017-02-16 17:17 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, famz, qemu-block

On Mon, Feb 13, 2017 at 07:12:38PM +0100, Paolo Bonzini wrote:
> This is yet another tiny bit of the multiqueue work, this time affecting
> the synchronization infrastructure for coroutines.  Currently, coroutines
> synchronize between the main I/O thread and the dataplane iothread through
> the AioContext lock.  However, for multiqueue a single BDS will be used
> by multiple iothreads and hence multiple AioContexts.  This calls for
> a different approach to coroutine synchronization, and this series is my
> attempt.
> 
> After the previous series, coroutines are already bound to an AioContext
> while they wait on a CoMutex.  Of course currently a CoMutex is generally
> not used across multiple iothreads, because you have to acquire/release
> the AioContext around CoMutex critical sections.  This series is the
> missing link between the aio_co_schedule/aio_co_wake infrastructure and
> making BDS thread-safe: by making CoMutex thread-safe, it removes
> the need for a "thread-based" mutex around it.
> 
> This will still need some changes in the formats because, for multiqueue,
> CoMutexes would need to be used like "real" thread mutexes.  Code like
> this:
> 
>     ...
>     qemu_co_mutex_unlock()
>     ... /* still access shared data, but don't yield */
>     qemu_coroutine_yield()
> 
> might be required to use this other pattern:
> 
>     ... /* access shared data, but don't yield */
>     qemu_co_mutex_unlock()
>     qemu_coroutine_yield()
> 
> because adding a second AioContext is already introducing concurrency that
> wasn't there before.  Still, even if you have to take concurrency into
> account, multitasking between coroutines remains non-preemptive.  So for
> example, it is easy to synchronize between qemu_co_mutex_lock's yield
> and the qemu_coroutine_enter in aio_co_schedule's bottom half.
> 
> CoMutex puts coroutines to sleep with qemu_coroutine_yield and wake them
> up with aio_co_wake.  I could have wrapped CoMutex's CoQueue with
> a "regular" thread mutex or spinlock.  The resulting code would
> have looked a lot like RFifoLock (with CoQueue replacing RFifoLock's
> condition variable).  Instead, CoMutex is implemented from scratch and
> CoQueue is made to depend on a CoMutex, similar to condition variables.
> Most CoQueues already have a corresponding CoMutex so this is not a big
> deal; converting the others is left for a future series, but a surprising
> number of drivers actually need no change.
> 
> The mutex algorithm comes from OSv; it only needs two to four atomic ops
> for a lock-unlock pair (two when uncontended) and if necessary
> we could even take OSv's support for wait morphing (which avoids the
> thundering herd problem) and add it to CoMutex and CoQueue.
> 
> Performance of CoMutex is comparable to pthread mutexes.  However, you
> cannot make a direct comparison between CoMutex (fair) and pthread_mutex_t
> (unfair).  For this reason the testcase also measures performance of
> a quick-and-dirty implementation of a fair mutex, based on MCS locks
> and futexes.
> 
> Paolo
> 
> Paolo Bonzini (6):
>   coroutine-lock: make CoMutex thread-safe
>   coroutine-lock: add limited spinning to CoMutex
>   test-aio-multithread: add performance comparison with thread-based
>     mutexes
>   coroutine-lock: place CoMutex before CoQueue in header
>   coroutine-lock: add mutex argument to CoQueue APIs
>   coroutine-lock: make CoRwlock thread-safe and fair
> 
>  block/backup.c               |   2 +-
>  block/io.c                   |   4 +-
>  block/nbd-client.c           |   2 +-
>  block/qcow2-cluster.c        |   4 +-
>  block/sheepdog.c             |   2 +-
>  block/throttle-groups.c      |   2 +-
>  hw/9pfs/9p.c                 |   2 +-
>  include/qemu/coroutine.h     |  84 +++++++++------
>  tests/test-aio-multithread.c | 250 +++++++++++++++++++++++++++++++++++++++++++
>  util/qemu-coroutine-lock.c   | 247 ++++++++++++++++++++++++++++++++++++++----
>  util/qemu-coroutine.c        |   2 +-
>  util/trace-events            |   1 +
>  12 files changed, 537 insertions(+), 65 deletions(-)
> 
> -- 
> 2.9.3
> 

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan


Thread overview: 14+ messages
2017-02-13 18:12 [Qemu-devel] [PATCH 0/6] Make CoMutex/CoQueue/CoRwlock thread-safe Paolo Bonzini
2017-02-13 18:12 ` [Qemu-devel] [PATCH 1/6] coroutine-lock: make CoMutex thread-safe Paolo Bonzini
2017-02-16 16:03   ` Stefan Hajnoczi
2017-02-13 18:12 ` [Qemu-devel] [PATCH 2/6] coroutine-lock: add limited spinning to CoMutex Paolo Bonzini
2017-02-16 16:10   ` Stefan Hajnoczi
2017-02-13 18:12 ` [Qemu-devel] [PATCH 3/6] test-aio-multithread: add performance comparison with thread-based mutexes Paolo Bonzini
2017-02-13 18:12 ` [Qemu-devel] [PATCH 4/6] coroutine-lock: place CoMutex before CoQueue in header Paolo Bonzini
2017-02-13 18:12 ` [Qemu-devel] [PATCH 5/6] coroutine-lock: add mutex argument to CoQueue APIs Paolo Bonzini
2017-02-13 18:12 ` [Qemu-devel] [PATCH 6/6] coroutine-lock: make CoRwlock thread-safe and fair Paolo Bonzini
2017-02-15  9:23   ` Fam Zheng
2017-02-15 11:20     ` Paolo Bonzini
2017-02-15 11:37       ` Fam Zheng
2017-02-15  9:24 ` [Qemu-devel] [PATCH 0/6] Make CoMutex/CoQueue/CoRwlock thread-safe Fam Zheng
2017-02-16 17:17 ` Stefan Hajnoczi
