qemu-devel.nongnu.org archive mirror
* [PATCH v3 0/5] coroutine rwlock downgrade fix, minor VDI changes
@ 2021-03-16 16:00 Paolo Bonzini
  2021-03-16 16:00 ` [PATCH 1/5] block/vdi: When writing new bmap entry fails, don't leak the buffer Paolo Bonzini
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Paolo Bonzini @ 2021-03-16 16:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: david.edmondson, kwolf, qemu-block

This is a resubmit of David Edmondson's series at
https://patchew.org/QEMU/20210309144015.557477-1-david.edmondson@oracle.com/.
After closer analysis on IRC, CoRwLock's attempt to ensure
fairness turned out to be flawed.  Therefore, this series
reimplements CoRwLock without using a CoQueue.  Tracking whether
each queued coroutine is a reader or a writer makes it possible to
never wake a writer when only readers should be allowed, and
vice versa.

David Edmondson (4):
  block/vdi: When writing new bmap entry fails, don't leak the buffer
  block/vdi: Don't assume that blocks are larger than VdiHeader
  coroutine/mutex: Store the coroutine in the CoWaitRecord only once
  test-coroutine: Add rwlock downgrade test

Paolo Bonzini (1):
  coroutine-lock: reimplement CoRwLock to fix downgrade bug

 block/vdi.c                 |  11 ++-
 include/qemu/coroutine.h    |  10 ++-
 tests/unit/test-coroutine.c | 112 ++++++++++++++++++++++++++
 util/qemu-coroutine-lock.c  | 151 +++++++++++++++++++++++-------------
 4 files changed, 225 insertions(+), 59 deletions(-)

-- 
2.29.2



^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/5] block/vdi: When writing new bmap entry fails, don't leak the buffer
  2021-03-16 16:00 [PATCH v3 0/5] coroutine rwlock downgrade fix, minor VDI changes Paolo Bonzini
@ 2021-03-16 16:00 ` Paolo Bonzini
  2021-03-16 16:00 ` [PATCH 2/5] block/vdi: Don't assume that blocks are larger than VdiHeader Paolo Bonzini
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Paolo Bonzini @ 2021-03-16 16:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: david.edmondson, kwolf, Philippe Mathieu-Daudé, qemu-block

From: David Edmondson <david.edmondson@oracle.com>

If a new bitmap entry is allocated, requiring the entire block to be
written, avoid leaking the buffer allocated for the block should
the write fail.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20210309144015.557477-2-david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/vdi.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/vdi.c b/block/vdi.c
index 5627e7d764..2a6dc26124 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -690,6 +690,7 @@ nonallocating_write:
 
     logout("finished data write\n");
     if (ret < 0) {
+        g_free(block);
         return ret;
     }
 
-- 
2.29.2





* [PATCH 2/5] block/vdi: Don't assume that blocks are larger than VdiHeader
  2021-03-16 16:00 [PATCH v3 0/5] coroutine rwlock downgrade fix, minor VDI changes Paolo Bonzini
  2021-03-16 16:00 ` [PATCH 1/5] block/vdi: When writing new bmap entry fails, don't leak the buffer Paolo Bonzini
@ 2021-03-16 16:00 ` Paolo Bonzini
  2021-03-16 16:00 ` [PATCH 3/5] coroutine/mutex: Store the coroutine in the CoWaitRecord only once Paolo Bonzini
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Paolo Bonzini @ 2021-03-16 16:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: david.edmondson, kwolf, qemu-block

From: David Edmondson <david.edmondson@oracle.com>

Given that the block size is read from the header of the VDI file, a
wide variety of sizes might be seen. Rather than reusing a
block-sized memory region when writing the VDI header, allocate an
appropriately sized buffer.

Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20210309144015.557477-3-david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/vdi.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/block/vdi.c b/block/vdi.c
index 2a6dc26124..548f8a057b 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -696,18 +696,20 @@ nonallocating_write:
 
     if (block) {
         /* One or more new blocks were allocated. */
-        VdiHeader *header = (VdiHeader *) block;
+        VdiHeader *header;
         uint8_t *base;
         uint64_t offset;
         uint32_t n_sectors;
 
+        g_free(block);
+        header = g_malloc(sizeof(*header));
+
         logout("now writing modified header\n");
         assert(VDI_IS_ALLOCATED(bmap_first));
         *header = s->header;
         vdi_header_to_le(header);
-        ret = bdrv_pwrite(bs->file, 0, block, sizeof(VdiHeader));
-        g_free(block);
-        block = NULL;
+        ret = bdrv_pwrite(bs->file, 0, header, sizeof(*header));
+        g_free(header);
 
         if (ret < 0) {
             return ret;
-- 
2.29.2





* [PATCH 3/5] coroutine/mutex: Store the coroutine in the CoWaitRecord only once
  2021-03-16 16:00 [PATCH v3 0/5] coroutine rwlock downgrade fix, minor VDI changes Paolo Bonzini
  2021-03-16 16:00 ` [PATCH 1/5] block/vdi: When writing new bmap entry fails, don't leak the buffer Paolo Bonzini
  2021-03-16 16:00 ` [PATCH 2/5] block/vdi: Don't assume that blocks are larger than VdiHeader Paolo Bonzini
@ 2021-03-16 16:00 ` Paolo Bonzini
  2021-03-16 16:00 ` [PATCH 4/5] coroutine-lock: reimplement CoRwLock to fix downgrade bug Paolo Bonzini
  2021-03-16 16:00 ` [PATCH 5/5] test-coroutine: Add rwlock downgrade test Paolo Bonzini
  4 siblings, 0 replies; 8+ messages in thread
From: Paolo Bonzini @ 2021-03-16 16:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: david.edmondson, kwolf, Philippe Mathieu-Daudé, qemu-block

From: David Edmondson <david.edmondson@oracle.com>

When taking the slow path for mutex acquisition, set the coroutine
value in the CoWaitRecord in push_waiter(), rather than both there and
in the caller.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20210309144015.557477-4-david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 util/qemu-coroutine-lock.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
index 5816bf8900..eb73cf11dc 100644
--- a/util/qemu-coroutine-lock.c
+++ b/util/qemu-coroutine-lock.c
@@ -204,7 +204,6 @@ static void coroutine_fn qemu_co_mutex_lock_slowpath(AioContext *ctx,
     unsigned old_handoff;
 
     trace_qemu_co_mutex_lock_entry(mutex, self);
-    w.co = self;
     push_waiter(mutex, &w);
 
     /* This is the "Responsibility Hand-Off" protocol; a lock() picks from
-- 
2.29.2





* [PATCH 4/5] coroutine-lock: reimplement CoRwLock to fix downgrade bug
  2021-03-16 16:00 [PATCH v3 0/5] coroutine rwlock downgrade fix, minor VDI changes Paolo Bonzini
                   ` (2 preceding siblings ...)
  2021-03-16 16:00 ` [PATCH 3/5] coroutine/mutex: Store the coroutine in the CoWaitRecord only once Paolo Bonzini
@ 2021-03-16 16:00 ` Paolo Bonzini
  2021-03-17 10:40   ` David Edmondson
  2021-03-16 16:00 ` [PATCH 5/5] test-coroutine: Add rwlock downgrade test Paolo Bonzini
  4 siblings, 1 reply; 8+ messages in thread
From: Paolo Bonzini @ 2021-03-16 16:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: david.edmondson, kwolf, qemu-block

A feature of the current rwlock is that if multiple coroutines hold a
reader lock, all must be runnable. The unlock implementation relies on
this, choosing to wake a single coroutine when the final read lock
holder exits the critical section, assuming that it will wake a
coroutine attempting to acquire a write lock.

The downgrade implementation violates this assumption by creating a
read lock owning coroutine that is exclusively runnable - any other
coroutines that are waiting to acquire a read lock are *not* made
runnable when the write lock holder converts its ownership to read
only.

To fix this, keep the queue of waiters explicitly in the CoRwLock
instead of using CoQueue, and store for each whether it is a
potential reader or a writer.  This way, downgrade can look at the
first queued coroutine and wake it if it is a reader, causing
all other readers to be released in turn.

Reported-by: David Edmondson <david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/qemu/coroutine.h   |  10 ++-
 util/qemu-coroutine-lock.c | 150 ++++++++++++++++++++++++-------------
 2 files changed, 106 insertions(+), 54 deletions(-)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index 84eab6e3bf..ae62d4bc8d 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -237,11 +237,15 @@ bool qemu_co_enter_next_impl(CoQueue *queue, QemuLockable *lock);
 bool qemu_co_queue_empty(CoQueue *queue);
 
 
+typedef struct CoRwTicket CoRwTicket;
 typedef struct CoRwlock {
-    int pending_writer;
-    int reader;
     CoMutex mutex;
-    CoQueue queue;
+
+    /* Number of readers, of -1 if owned for writing.  */
+    int owner;
+
+    /* Waiting coroutines.  */
+    QSIMPLEQ_HEAD(, CoRwTicket) tickets;
 } CoRwlock;
 
 /**
diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
index eb73cf11dc..655634d185 100644
--- a/util/qemu-coroutine-lock.c
+++ b/util/qemu-coroutine-lock.c
@@ -327,11 +327,70 @@ void coroutine_fn qemu_co_mutex_unlock(CoMutex *mutex)
     trace_qemu_co_mutex_unlock_return(mutex, self);
 }
 
+struct CoRwTicket {
+    bool read;
+    Coroutine *co;
+    QSIMPLEQ_ENTRY(CoRwTicket) next;
+};
+
 void qemu_co_rwlock_init(CoRwlock *lock)
 {
-    memset(lock, 0, sizeof(*lock));
-    qemu_co_queue_init(&lock->queue);
     qemu_co_mutex_init(&lock->mutex);
+    lock->owner = 0;
+    QSIMPLEQ_INIT(&lock->tickets);
+}
+
+/* Releases the internal CoMutex.  */
+static void qemu_co_rwlock_maybe_wake_one(CoRwlock *lock)
+{
+    CoRwTicket *tkt = QSIMPLEQ_FIRST(&lock->tickets);
+    Coroutine *co = NULL;
+
+    /*
+     * Setting lock->owner here prevents rdlock and wrlock from
+     * sneaking in between unlock and wake.
+     */
+
+    if (tkt) {
+        if (tkt->read) {
+            if (lock->owner >= 0) {
+                lock->owner++;
+                co = tkt->co;
+            }
+        } else {
+            if (lock->owner == 0) {
+                lock->owner = -1;
+                co = tkt->co;
+            }
+        }
+    }
+
+    if (co) {
+        QSIMPLEQ_REMOVE_HEAD(&lock->tickets, next);
+        qemu_co_mutex_unlock(&lock->mutex);
+        aio_co_wake(co);
+    } else {
+        qemu_co_mutex_unlock(&lock->mutex);
+    }
+}
+
+/* Releases the internal CoMutex.  */
+static void qemu_co_rwlock_sleep(bool read, CoRwlock *lock)
+{
+    CoRwTicket my_ticket = { read, qemu_coroutine_self() };
+
+    QSIMPLEQ_INSERT_TAIL(&lock->tickets, &my_ticket, next);
+    qemu_co_mutex_unlock(&lock->mutex);
+    qemu_coroutine_yield();
+
+    if (read) {
+        /* Possibly wake another reader, which will wake the next in line.  */
+        assert(lock->owner >= 1);
+        qemu_co_mutex_lock(&lock->mutex);
+        qemu_co_rwlock_maybe_wake_one(lock);
+    } else {
+        assert(lock->owner == -1);
+    }
 }
 
 void qemu_co_rwlock_rdlock(CoRwlock *lock)
@@ -339,13 +398,13 @@ void qemu_co_rwlock_rdlock(CoRwlock *lock)
 
     qemu_co_mutex_lock(&lock->mutex);
     /* For fairness, wait if a writer is in line.  */
-    while (lock->pending_writer) {
-        qemu_co_queue_wait(&lock->queue, &lock->mutex);
+    if (lock->owner == 0 || (lock->owner > 0 && QSIMPLEQ_EMPTY(&lock->tickets))) {
+        lock->owner++;
+        qemu_co_mutex_unlock(&lock->mutex);
+    } else {
+        qemu_co_rwlock_sleep(true, lock);
     }
-    lock->reader++;
-    qemu_co_mutex_unlock(&lock->mutex);
 
-    /* The rest of the read-side critical section is run without the mutex.  */
     self->locks_held++;
 }
 
@@ -355,69 +413,58 @@ void qemu_co_rwlock_unlock(CoRwlock *lock)
     Coroutine *self = qemu_coroutine_self();
 
     assert(qemu_in_coroutine());
-    if (!lock->reader) {
-        /* The critical section started in qemu_co_rwlock_wrlock.  */
-        qemu_co_queue_restart_all(&lock->queue);
+    self->locks_held--;
+
+    qemu_co_mutex_lock(&lock->mutex);
+    if (lock->owner == -1) {
+        lock->owner = 0;
     } else {
-        self->locks_held--;
+        lock->owner--;
+    }
 
-        qemu_co_mutex_lock(&lock->mutex);
-        lock->reader--;
-        assert(lock->reader >= 0);
-        /* Wakeup only one waiting writer */
-        if (!lock->reader) {
-            qemu_co_queue_next(&lock->queue);
-        }
+    if (lock->owner == 0) {
+        qemu_co_rwlock_maybe_wake_one(lock);
+    } else {
+        qemu_co_mutex_unlock(&lock->mutex);
     }
-    qemu_co_mutex_unlock(&lock->mutex);
 }
 
 void qemu_co_rwlock_downgrade(CoRwlock *lock)
 {
-    Coroutine *self = qemu_coroutine_self();
-
-    /* lock->mutex critical section started in qemu_co_rwlock_wrlock or
-     * qemu_co_rwlock_upgrade.
-     */
-    assert(lock->reader == 0);
-    lock->reader++;
-    qemu_co_mutex_unlock(&lock->mutex);
+    qemu_co_mutex_lock(&lock->mutex);
+    assert(lock->owner == -1);
+    lock->owner = 1;
 
-    /* The rest of the read-side critical section is run without the mutex.  */
-    self->locks_held++;
+    /* Possibly wake another reader, which will wake the next in line.  */
+    qemu_co_rwlock_maybe_wake_one(lock);
 }
 
 void qemu_co_rwlock_wrlock(CoRwlock *lock)
 {
+    Coroutine *self = qemu_coroutine_self();
+
     qemu_co_mutex_lock(&lock->mutex);
-    lock->pending_writer++;
-    while (lock->reader) {
-        qemu_co_queue_wait(&lock->queue, &lock->mutex);
+    if (lock->owner == 0) {
+        lock->owner = -1;
+        qemu_co_mutex_unlock(&lock->mutex);
+    } else {
+        qemu_co_rwlock_sleep(false, lock);
     }
-    lock->pending_writer--;
 
-    /* The rest of the write-side critical section is run with
-     * the mutex taken, so that lock->reader remains zero.
-     * There is no need to update self->locks_held.
-     */
+    self->locks_held++;
 }
 
 void qemu_co_rwlock_upgrade(CoRwlock *lock)
 {
-    Coroutine *self = qemu_coroutine_self();
-
     qemu_co_mutex_lock(&lock->mutex);
-    assert(lock->reader > 0);
-    lock->reader--;
-    lock->pending_writer++;
-    while (lock->reader) {
-        qemu_co_queue_wait(&lock->queue, &lock->mutex);
+    assert(lock->owner > 0);
+    /* For fairness, wait if a writer is in line.  */
+    if (lock->owner == 1 && QSIMPLEQ_EMPTY(&lock->tickets)) {
+        lock->owner = -1;
+        qemu_co_mutex_unlock(&lock->mutex);
+    } else {
+        lock->owner--;
+        qemu_co_rwlock_sleep(false, lock);
     }
-    lock->pending_writer--;
 
-    /* The rest of the write-side critical section is run with
-     * the mutex taken, similar to qemu_co_rwlock_wrlock.  Do
-     * not account for the lock twice in self->locks_held.
-     */
-    self->locks_held--;
 }
-- 
2.29.2





* [PATCH 5/5] test-coroutine: Add rwlock downgrade test
  2021-03-16 16:00 [PATCH v3 0/5] coroutine rwlock downgrade fix, minor VDI changes Paolo Bonzini
                   ` (3 preceding siblings ...)
  2021-03-16 16:00 ` [PATCH 4/5] coroutine-lock: reimplement CoRwLock to fix downgrade bug Paolo Bonzini
@ 2021-03-16 16:00 ` Paolo Bonzini
  4 siblings, 0 replies; 8+ messages in thread
From: Paolo Bonzini @ 2021-03-16 16:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: david.edmondson, kwolf, qemu-block

From: David Edmondson <david.edmondson@oracle.com>

Test that downgrading an rwlock does not result in a failure to
schedule coroutines queued on the rwlock.

The diagram associated with test_co_rwlock_downgrade() describes the
intended behaviour, but what was observed previously corresponds to:

| c1     | c2         | c3         | c4       |
|--------+------------+------------+----------|
| rdlock |            |            |          |
| yield  |            |            |          |
|        | wrlock     |            |          |
|        | <queued>   |            |          |
|        |            | rdlock     |          |
|        |            | <queued>   |          |
|        |            |            | wrlock   |
|        |            |            | <queued> |
| unlock |            |            |          |
| yield  |            |            |          |
|        | <dequeued> |            |          |
|        | downgrade  |            |          |
|        | ...        |            |          |
|        | unlock     |            |          |
|        |            | <dequeued> |          |
|        |            | <queued>   |          |

This results in a failure...

ERROR:../tests/test-coroutine.c:369:test_co_rwlock_downgrade: assertion failed: (c3_done)
Bail out! ERROR:../tests/test-coroutine.c:369:test_co_rwlock_downgrade: assertion failed: (c3_done)

...as a result of the c3 coroutine failing to run to completion.

Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20210309144015.557477-5-david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        Unlike in David's solution, here c3 is run as soon as c2
        is downgraded.  This only affects the drawing and not the
        test code.

 tests/unit/test-coroutine.c | 109 ++++++++++++++++++++++++++++++++++++
 1 file changed, 109 insertions(+)

diff --git a/tests/unit/test-coroutine.c b/tests/unit/test-coroutine.c
index e946d93a65..62b0092721 100644
--- a/tests/unit/test-coroutine.c
+++ b/tests/unit/test-coroutine.c
@@ -264,6 +264,114 @@ static void test_co_mutex_lockable(void)
     g_assert(QEMU_MAKE_LOCKABLE(null_pointer) == NULL);
 }
 
+static bool c1_done;
+static bool c2_done;
+static bool c3_done;
+static bool c4_done;
+
+static void coroutine_fn rwlock_c1(void *opaque)
+{
+    CoRwlock *l = opaque;
+
+    qemu_co_rwlock_rdlock(l);
+    qemu_coroutine_yield();
+
+    qemu_co_rwlock_unlock(l);
+    qemu_coroutine_yield();
+
+    c1_done = true;
+}
+
+static void coroutine_fn rwlock_c2(void *opaque)
+{
+    CoRwlock *l = opaque;
+
+    qemu_co_rwlock_wrlock(l);
+
+    qemu_co_rwlock_downgrade(l);
+    qemu_co_rwlock_unlock(l);
+    c2_done = true;
+}
+
+static void coroutine_fn rwlock_c3(void *opaque)
+{
+    CoRwlock *l = opaque;
+
+    qemu_co_rwlock_rdlock(l);
+
+    qemu_co_rwlock_unlock(l);
+    c3_done = true;
+}
+
+static void coroutine_fn rwlock_c4(void *opaque)
+{
+    CoRwlock *l = opaque;
+
+    qemu_co_rwlock_wrlock(l);
+
+    qemu_co_rwlock_unlock(l);
+    c4_done = true;
+}
+
+/*
+ * Check that downgrading a reader-writer lock does not cause a hang.
+ *
+ * Four coroutines are used to produce a situation where there are
+ * both reader and writer hopefuls waiting to acquire an rwlock that
+ * is held by a reader.
+ *
+ * The correct sequence of operations we aim to provoke can be
+ * represented as:
+ *
+ * | c1     | c2         | c3         | c4         |
+ * |--------+------------+------------+------------|
+ * | rdlock |            |            |            |
+ * | yield  |            |            |            |
+ * |        | wrlock     |            |            |
+ * |        | <queued>   |            |            |
+ * |        |            | rdlock     |            |
+ * |        |            | <queued>   |            |
+ * |        |            |            | wrlock     |
+ * |        |            |            | <queued>   |
+ * | unlock |            |            |            |
+ * | yield  |            |            |            |
+ * |        | <dequeued> |            |            |
+ * |        | downgrade  |            |            |
+ * |        |            | <dequeued> |            |
+ * |        |            | unlock     |            |
+ * |        | ...        |            |            |
+ * |        | unlock     |            |            |
+ * |        |            |            | <dequeued> |
+ * |        |            |            | unlock     |
+ */
+static void test_co_rwlock_downgrade(void)
+{
+    CoRwlock l;
+    Coroutine *c1, *c2, *c3, *c4;
+
+    qemu_co_rwlock_init(&l);
+
+    c1 = qemu_coroutine_create(rwlock_c1, &l);
+    c2 = qemu_coroutine_create(rwlock_c2, &l);
+    c3 = qemu_coroutine_create(rwlock_c3, &l);
+    c4 = qemu_coroutine_create(rwlock_c4, &l);
+
+    qemu_coroutine_enter(c1);
+    qemu_coroutine_enter(c2);
+    qemu_coroutine_enter(c3);
+    qemu_coroutine_enter(c4);
+
+    qemu_coroutine_enter(c1);
+
+    g_assert(c2_done);
+    g_assert(c3_done);
+    g_assert(c4_done);
+
+    qemu_coroutine_enter(c1);
+
+    g_assert(c1_done);
+}
+
 /*
  * Check that creation, enter, and return work
  */
@@ -501,6 +612,7 @@ int main(int argc, char **argv)
     g_test_add_func("/basic/order", test_order);
     g_test_add_func("/locking/co-mutex", test_co_mutex);
     g_test_add_func("/locking/co-mutex/lockable", test_co_mutex_lockable);
+    g_test_add_func("/locking/co-rwlock/downgrade", test_co_rwlock_downgrade);
     if (g_test_perf()) {
         g_test_add_func("/perf/lifecycle", perf_lifecycle);
         g_test_add_func("/perf/nesting", perf_nesting);
-- 
2.29.2




* Re: [PATCH 4/5] coroutine-lock: reimplement CoRwLock to fix downgrade bug
  2021-03-16 16:00 ` [PATCH 4/5] coroutine-lock: reimplement CoRwLock to fix downgrade bug Paolo Bonzini
@ 2021-03-17 10:40   ` David Edmondson
  2021-03-17 12:00     ` Paolo Bonzini
  0 siblings, 1 reply; 8+ messages in thread
From: David Edmondson @ 2021-03-17 10:40 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: kwolf, qemu-block

On Tuesday, 2021-03-16 at 17:00:06 +01, Paolo Bonzini wrote:

> A feature of the current rwlock is that if multiple coroutines hold a
> reader lock, all must be runnable. The unlock implementation relies on
> this, choosing to wake a single coroutine when the final read lock
> holder exits the critical section, assuming that it will wake a
> coroutine attempting to acquire a write lock.
>
> The downgrade implementation violates this assumption by creating a
> read lock owning coroutine that is exclusively runnable - any other
> coroutines that are waiting to acquire a read lock are *not* made
> runnable when the write lock holder converts its ownership to read
> only.
>
> To fix this, keep the queue of waiters explicitly in the CoRwLock
> instead of using CoQueue, and store for each whether it is a
> potential reader or a writer.  This way, downgrade can look at the
> first queued coroutine and wake it if it is a reader, causing
> all other readers to be released in turn.
>
> Reported-by: David Edmondson <david.edmondson@oracle.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  include/qemu/coroutine.h   |  10 ++-
>  util/qemu-coroutine-lock.c | 150 ++++++++++++++++++++++++-------------
>  2 files changed, 106 insertions(+), 54 deletions(-)
>
> diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
> index 84eab6e3bf..ae62d4bc8d 100644
> --- a/include/qemu/coroutine.h
> +++ b/include/qemu/coroutine.h
> @@ -237,11 +237,15 @@ bool qemu_co_enter_next_impl(CoQueue *queue, QemuLockable *lock);
>  bool qemu_co_queue_empty(CoQueue *queue);
>  
>  
> +typedef struct CoRwTicket CoRwTicket;
>  typedef struct CoRwlock {
> -    int pending_writer;
> -    int reader;
>      CoMutex mutex;
> -    CoQueue queue;
> +
> +    /* Number of readers, of -1 if owned for writing.  */

s/, of/, or/

> +    int owner;
> +
> +    /* Waiting coroutines.  */
> +    QSIMPLEQ_HEAD(, CoRwTicket) tickets;
>  } CoRwlock;

Isn't this...

 * ... Also, @qemu_co_rwlock_upgrade
 * only overrides CoRwlock fairness if there are no concurrent readers, so
 * another writer might run while @qemu_co_rwlock_upgrade blocks.

...now incorrect?

>  /**
> diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
> index eb73cf11dc..655634d185 100644
> --- a/util/qemu-coroutine-lock.c
> +++ b/util/qemu-coroutine-lock.c
> @@ -327,11 +327,70 @@ void coroutine_fn qemu_co_mutex_unlock(CoMutex *mutex)
>      trace_qemu_co_mutex_unlock_return(mutex, self);
>  }
>  
> +struct CoRwTicket {
> +    bool read;
> +    Coroutine *co;
> +    QSIMPLEQ_ENTRY(CoRwTicket) next;
> +};
> +
>  void qemu_co_rwlock_init(CoRwlock *lock)
>  {
> -    memset(lock, 0, sizeof(*lock));
> -    qemu_co_queue_init(&lock->queue);
>      qemu_co_mutex_init(&lock->mutex);
> +    lock->owner = 0;
> +    QSIMPLEQ_INIT(&lock->tickets);
> +}
> +
> +/* Releases the internal CoMutex.  */
> +static void qemu_co_rwlock_maybe_wake_one(CoRwlock *lock)
> +{
> +    CoRwTicket *tkt = QSIMPLEQ_FIRST(&lock->tickets);
> +    Coroutine *co = NULL;
> +
> +    /*
> +     * Setting lock->owner here prevents rdlock and wrlock from
> +     * sneaking in between unlock and wake.
> +     */
> +
> +    if (tkt) {
> +        if (tkt->read) {
> +            if (lock->owner >= 0) {
> +                lock->owner++;
> +                co = tkt->co;
> +            }
> +        } else {
> +            if (lock->owner == 0) {
> +                lock->owner = -1;
> +                co = tkt->co;
> +            }
> +        }
> +    }
> +
> +    if (co) {
> +        QSIMPLEQ_REMOVE_HEAD(&lock->tickets, next);
> +        qemu_co_mutex_unlock(&lock->mutex);
> +        aio_co_wake(co);
> +    } else {
> +        qemu_co_mutex_unlock(&lock->mutex);
> +    }
> +}
> +
> +/* Releases the internal CoMutex.  */
> +static void qemu_co_rwlock_sleep(bool read, CoRwlock *lock)
> +{
> +    CoRwTicket my_ticket = { read, qemu_coroutine_self() };
> +
> +    QSIMPLEQ_INSERT_TAIL(&lock->tickets, &my_ticket, next);
> +    qemu_co_mutex_unlock(&lock->mutex);
> +    qemu_coroutine_yield();
> +
> +    if (read) {
> +        /* Possibly wake another reader, which will wake the next in line.  */
> +        assert(lock->owner >= 1);
> +        qemu_co_mutex_lock(&lock->mutex);
> +        qemu_co_rwlock_maybe_wake_one(lock);
> +    } else {
> +        assert(lock->owner == -1);
> +    }
>  }
>  
>  void qemu_co_rwlock_rdlock(CoRwlock *lock)
> @@ -339,13 +398,13 @@ void qemu_co_rwlock_rdlock(CoRwlock *lock)
>  
>      qemu_co_mutex_lock(&lock->mutex);
>      /* For fairness, wait if a writer is in line.  */
> -    while (lock->pending_writer) {
> -        qemu_co_queue_wait(&lock->queue, &lock->mutex);
> +    if (lock->owner == 0 || (lock->owner > 0 && QSIMPLEQ_EMPTY(&lock->tickets))) {
> +        lock->owner++;
> +        qemu_co_mutex_unlock(&lock->mutex);
> +    } else {
> +        qemu_co_rwlock_sleep(true, lock);
>      }
> -    lock->reader++;
> -    qemu_co_mutex_unlock(&lock->mutex);
>  
> -    /* The rest of the read-side critical section is run without the mutex.  */
>      self->locks_held++;
>  }
>  
> @@ -355,69 +413,58 @@ void qemu_co_rwlock_unlock(CoRwlock *lock)
>      Coroutine *self = qemu_coroutine_self();
>  
>      assert(qemu_in_coroutine());
> -    if (!lock->reader) {
> -        /* The critical section started in qemu_co_rwlock_wrlock.  */
> -        qemu_co_queue_restart_all(&lock->queue);
> +    self->locks_held--;
> +
> +    qemu_co_mutex_lock(&lock->mutex);
> +    if (lock->owner == -1) {
> +        lock->owner = 0;
>      } else {
> -        self->locks_held--;
> +        lock->owner--;
> +    }
>  
> -        qemu_co_mutex_lock(&lock->mutex);
> -        lock->reader--;
> -        assert(lock->reader >= 0);
> -        /* Wakeup only one waiting writer */
> -        if (!lock->reader) {
> -            qemu_co_queue_next(&lock->queue);
> -        }
> +    if (lock->owner == 0) {
> +        qemu_co_rwlock_maybe_wake_one(lock);
> +    } else {
> +        qemu_co_mutex_unlock(&lock->mutex);
>      }
> -    qemu_co_mutex_unlock(&lock->mutex);
>  }
>  
>  void qemu_co_rwlock_downgrade(CoRwlock *lock)
>  {
> -    Coroutine *self = qemu_coroutine_self();
> -
> -    /* lock->mutex critical section started in qemu_co_rwlock_wrlock or
> -     * qemu_co_rwlock_upgrade.
> -     */
> -    assert(lock->reader == 0);
> -    lock->reader++;
> -    qemu_co_mutex_unlock(&lock->mutex);
> +    qemu_co_mutex_lock(&lock->mutex);
> +    assert(lock->owner == -1);
> +    lock->owner = 1;
>  
> -    /* The rest of the read-side critical section is run without the mutex.  */
> -    self->locks_held++;
> +    /* Possibly wake another reader, which will wake the next in line.  */
> +    qemu_co_rwlock_maybe_wake_one(lock);
>  }
>  
>  void qemu_co_rwlock_wrlock(CoRwlock *lock)
>  {
> +    Coroutine *self = qemu_coroutine_self();
> +
>      qemu_co_mutex_lock(&lock->mutex);
> -    lock->pending_writer++;
> -    while (lock->reader) {
> -        qemu_co_queue_wait(&lock->queue, &lock->mutex);
> +    if (lock->owner == 0) {
> +        lock->owner = -1;
> +        qemu_co_mutex_unlock(&lock->mutex);
> +    } else {
> +        qemu_co_rwlock_sleep(false, lock);
>      }
> -    lock->pending_writer--;
>  
> -    /* The rest of the write-side critical section is run with
> -     * the mutex taken, so that lock->reader remains zero.
> -     * There is no need to update self->locks_held.
> -     */
> +    self->locks_held++;
>  }
>  
>  void qemu_co_rwlock_upgrade(CoRwlock *lock)
>  {
> -    Coroutine *self = qemu_coroutine_self();
> -
>      qemu_co_mutex_lock(&lock->mutex);
> -    assert(lock->reader > 0);
> -    lock->reader--;
> -    lock->pending_writer++;
> -    while (lock->reader) {
> -        qemu_co_queue_wait(&lock->queue, &lock->mutex);
> +    assert(lock->owner > 0);
> +    /* For fairness, wait if a writer is in line.  */
> +    if (lock->owner == 1 && QSIMPLEQ_EMPTY(&lock->tickets)) {
> +        lock->owner = -1;
> +        qemu_co_mutex_unlock(&lock->mutex);
> +    } else {
> +        lock->owner--;
> +        qemu_co_rwlock_sleep(false, lock);

Doesn't this need something for the case where lock->owner hits 0?

If not, how is two readers both attempting to upgrade ever resolved?

It feels like it should jump into the second half of
qemu_co_rwlock_wrlock().

>      }
> -    lock->pending_writer--;
>  
> -    /* The rest of the write-side critical section is run with
> -     * the mutex taken, similar to qemu_co_rwlock_wrlock.  Do
> -     * not account for the lock twice in self->locks_held.
> -     */
> -    self->locks_held--;
>  }
> -- 
> 2.29.2

dme.
-- 
And you're standing here beside me, I love the passing of time.



* Re: [PATCH 4/5] coroutine-lock: reimplement CoRwLock to fix downgrade bug
  2021-03-17 10:40   ` David Edmondson
@ 2021-03-17 12:00     ` Paolo Bonzini
  0 siblings, 0 replies; 8+ messages in thread
From: Paolo Bonzini @ 2021-03-17 12:00 UTC (permalink / raw)
  To: David Edmondson, qemu-devel; +Cc: kwolf, qemu-block

On 17/03/21 11:40, David Edmondson wrote:
> Isn't this...
> 
>   * ... Also, @qemu_co_rwlock_upgrade
>   * only overrides CoRwlock fairness if there are no concurrent readers, so
>   * another writer might run while @qemu_co_rwlock_upgrade blocks.
> 
> ...now incorrect?

Maybe, but for sure the comment was too hard to parse.

>>
>> +    if (lock->owner == 1 && QSIMPLEQ_EMPTY(&lock->tickets)) {
>> +        lock->owner = -1;
>> +        qemu_co_mutex_unlock(&lock->mutex);
>> +    } else {
>> +        lock->owner--;
>> +        qemu_co_rwlock_sleep(false, lock);
> 
> Doesn't this need something for the case where lock->owner hits 0?

You're right, we need to call qemu_co_rwlock_maybe_wake_one if
lock->owner goes to 0.  The "else" branch would have to be

	if (--lock->owner == 0) {
		qemu_co_rwlock_maybe_wake_one(lock);
		qemu_co_mutex_lock(&lock->mutex);
	}
	qemu_co_rwlock_sleep(false, lock);

In the end it's actually simpler to just inline qemu_co_rwlock_sleep, 
which also leads to the following slightly more optimized code for the 
"else" branch:

	CoRwTicket my_ticket = { false, qemu_coroutine_self() };

	lock->owner--;
	QSIMPLEQ_INSERT_TAIL(&lock->tickets, &my_ticket, next);
	qemu_co_rwlock_maybe_wake_one(lock);
	qemu_coroutine_yield();
	assert(lock->owner == -1);

I'll add a testcase, too.  You don't even need two upgraders, for example:

	rdlock
	yield
	                wrlock
	upgrade
	<queued>        <dequeued>
	                unlock
         <dequeued>
	unlock

In fact even for this simple case, the old implementation got it wrong! 
  (The new one also did, but the fix is easy).

There are a couple other improvements that can be made: 
qemu_co_rwlock_unlock can also call qemu_co_rwlock_maybe_wake_one 
unconditionally, the "if" around the call is unnecessary; and I'll 
rename "owner" to "owners".

Anyway, there is nothing that really made you scream, so that's good.

Paolo




