qemu-devel.nongnu.org archive mirror
* [PATCH v5 0/6] coroutine rwlock downgrade fix, minor VDI changes
@ 2021-03-17 18:00 Paolo Bonzini
  2021-03-17 18:00 ` [PATCH 1/6] block/vdi: When writing new bmap entry fails, don't leak the buffer Paolo Bonzini
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: Paolo Bonzini @ 2021-03-17 18:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: david.edmondson, kwolf, qemu-block

This is a resubmit of David Edmondson's series at
https://patchew.org/QEMU/20210309144015.557477-1-david.edmondson@oracle.com/.
Closer analysis on IRC showed that CoRwlock's attempt to ensure
fairness was flawed.  This series therefore reimplements CoRwlock
without using a CoQueue: tracking whether each queued coroutine is a
reader or a writer makes it possible never to wake a writer when only
readers should be allowed, and vice versa.

v2->v3: new CoRwlock implementation

v3->v4: fix upgrade and add a test for that, too

v4->v5: typo

David Edmondson (4):
  block/vdi: When writing new bmap entry fails, don't leak the buffer
  block/vdi: Don't assume that blocks are larger than VdiHeader
  coroutine/mutex: Store the coroutine in the CoWaitRecord only once
  test-coroutine: Add rwlock downgrade test

Paolo Bonzini (2):
  coroutine-lock: reimplement CoRwlock to fix downgrade bug
  test-coroutine: add rwlock upgrade test

 block/vdi.c                 |  11 ++-
 include/qemu/coroutine.h    |  17 ++--
 tests/unit/test-coroutine.c | 161 ++++++++++++++++++++++++++++++++++++
 util/qemu-coroutine-lock.c  | 149 +++++++++++++++++++++------------
 4 files changed, 274 insertions(+), 64 deletions(-)

-- 
2.29.2



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/6] block/vdi: When writing new bmap entry fails, don't leak the buffer
  2021-03-17 18:00 [PATCH v5 0/6] coroutine rwlock downgrade fix, minor VDI changes Paolo Bonzini
@ 2021-03-17 18:00 ` Paolo Bonzini
  2021-03-24 14:25   ` Max Reitz
  2021-03-17 18:00 ` [PATCH 2/6] block/vdi: Don't assume that blocks are larger than VdiHeader Paolo Bonzini
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2021-03-17 18:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: david.edmondson, kwolf, Philippe Mathieu-Daudé, qemu-block

From: David Edmondson <david.edmondson@oracle.com>

If a new bitmap entry is allocated, requiring the entire block to be
written, avoid leaking the buffer allocated for the block should the
write fail.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20210309144015.557477-2-david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/vdi.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/vdi.c b/block/vdi.c
index 5627e7d764..2a6dc26124 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -690,6 +690,7 @@ nonallocating_write:
 
     logout("finished data write\n");
     if (ret < 0) {
+        g_free(block);
         return ret;
     }
 
-- 
2.29.2





* [PATCH 2/6] block/vdi: Don't assume that blocks are larger than VdiHeader
  2021-03-17 18:00 [PATCH v5 0/6] coroutine rwlock downgrade fix, minor VDI changes Paolo Bonzini
  2021-03-17 18:00 ` [PATCH 1/6] block/vdi: When writing new bmap entry fails, don't leak the buffer Paolo Bonzini
@ 2021-03-17 18:00 ` Paolo Bonzini
  2021-03-24 14:25   ` Max Reitz
  2021-03-17 18:00 ` [PATCH 3/6] coroutine/mutex: Store the coroutine in the CoWaitRecord only once Paolo Bonzini
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2021-03-17 18:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: david.edmondson, kwolf, qemu-block

From: David Edmondson <david.edmondson@oracle.com>

Given that the block size is read from the header of the VDI file, a
wide variety of sizes might be seen. Rather than reusing a block-sized
memory region when writing the VDI header, allocate an appropriately
sized buffer.

Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20210309144015.557477-3-david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/vdi.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/block/vdi.c b/block/vdi.c
index 2a6dc26124..548f8a057b 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -696,18 +696,20 @@ nonallocating_write:
 
     if (block) {
         /* One or more new blocks were allocated. */
-        VdiHeader *header = (VdiHeader *) block;
+        VdiHeader *header;
         uint8_t *base;
         uint64_t offset;
         uint32_t n_sectors;
 
+        g_free(block);
+        header = g_malloc(sizeof(*header));
+
         logout("now writing modified header\n");
         assert(VDI_IS_ALLOCATED(bmap_first));
         *header = s->header;
         vdi_header_to_le(header);
-        ret = bdrv_pwrite(bs->file, 0, block, sizeof(VdiHeader));
-        g_free(block);
-        block = NULL;
+        ret = bdrv_pwrite(bs->file, 0, header, sizeof(*header));
+        g_free(header);
 
         if (ret < 0) {
             return ret;
-- 
2.29.2





* [PATCH 3/6] coroutine/mutex: Store the coroutine in the CoWaitRecord only once
  2021-03-17 18:00 [PATCH v5 0/6] coroutine rwlock downgrade fix, minor VDI changes Paolo Bonzini
  2021-03-17 18:00 ` [PATCH 1/6] block/vdi: When writing new bmap entry fails, don't leak the buffer Paolo Bonzini
  2021-03-17 18:00 ` [PATCH 2/6] block/vdi: Don't assume that blocks are larger than VdiHeader Paolo Bonzini
@ 2021-03-17 18:00 ` Paolo Bonzini
  2021-03-17 18:00 ` [PATCH 4/6] coroutine-lock: reimplement CoRwlock to fix downgrade bug Paolo Bonzini
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2021-03-17 18:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: david.edmondson, kwolf, Philippe Mathieu-Daudé, qemu-block

From: David Edmondson <david.edmondson@oracle.com>

When taking the slow path for mutex acquisition, set the coroutine
value in the CoWaitRecord in push_waiter(), rather than both there and
in the caller.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20210309144015.557477-4-david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 util/qemu-coroutine-lock.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
index 5816bf8900..eb73cf11dc 100644
--- a/util/qemu-coroutine-lock.c
+++ b/util/qemu-coroutine-lock.c
@@ -204,7 +204,6 @@ static void coroutine_fn qemu_co_mutex_lock_slowpath(AioContext *ctx,
     unsigned old_handoff;
 
     trace_qemu_co_mutex_lock_entry(mutex, self);
-    w.co = self;
     push_waiter(mutex, &w);
 
     /* This is the "Responsibility Hand-Off" protocol; a lock() picks from
-- 
2.29.2





* [PATCH 4/6] coroutine-lock: reimplement CoRwlock to fix downgrade bug
  2021-03-17 18:00 [PATCH v5 0/6] coroutine rwlock downgrade fix, minor VDI changes Paolo Bonzini
                   ` (2 preceding siblings ...)
  2021-03-17 18:00 ` [PATCH 3/6] coroutine/mutex: Store the coroutine in the CoWaitRecord only once Paolo Bonzini
@ 2021-03-17 18:00 ` Paolo Bonzini
  2021-03-24 16:15   ` Stefan Hajnoczi
  2021-03-17 18:00 ` [PATCH 5/6] test-coroutine: add rwlock upgrade test Paolo Bonzini
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2021-03-17 18:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: david.edmondson, kwolf, qemu-block

An invariant of the current rwlock is that if multiple coroutines hold a
reader lock, all must be runnable. The unlock implementation relies on
this, choosing to wake a single coroutine when the final read lock
holder exits the critical section, assuming that it will wake a
coroutine attempting to acquire a write lock.

The downgrade implementation violates this assumption by creating a
read-lock-owning coroutine that is the only runnable one: any other
coroutines that are waiting to acquire a read lock are *not* made
runnable when the write lock holder converts its ownership to
read-only.

More generally, the old implementation had many other fairness bugs.
The root cause of the bugs was that CoQueue would wake up readers even
if there were pending writers, and would wake up writers even if there
were readers.  In that case, the coroutine would go back to sleep *at
the end* of the CoQueue, losing its place at the head of the line.

To fix this, keep the queue of waiters explicitly in the CoRwlock
instead of using CoQueue, and store for each waiter whether it is a
potential reader or a writer.  This way, downgrade can look at the
first queued coroutine and wake it only if it is a reader, causing
all other readers in line to be released in turn.

Reported-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
v3->v4: clean up the code and fix upgrade logic.  Fix upgrade comment too.

 include/qemu/coroutine.h   |  17 +++--
 util/qemu-coroutine-lock.c | 148 ++++++++++++++++++++++++-------------
 2 files changed, 106 insertions(+), 59 deletions(-)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index 84eab6e3bf..7919d3bb62 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -237,11 +237,15 @@ bool qemu_co_enter_next_impl(CoQueue *queue, QemuLockable *lock);
 bool qemu_co_queue_empty(CoQueue *queue);
 
 
+typedef struct CoRwTicket CoRwTicket;
 typedef struct CoRwlock {
-    int pending_writer;
-    int reader;
     CoMutex mutex;
-    CoQueue queue;
+
+    /* Number of readers, or -1 if owned for writing.  */
+    int owners;
+
+    /* Waiting coroutines.  */
+    QSIMPLEQ_HEAD(, CoRwTicket) tickets;
 } CoRwlock;
 
 /**
@@ -260,10 +264,9 @@ void qemu_co_rwlock_rdlock(CoRwlock *lock);
 /**
  * Write Locks the CoRwlock from a reader.  This is a bit more efficient than
  * @qemu_co_rwlock_unlock followed by a separate @qemu_co_rwlock_wrlock.
- * However, if the lock cannot be upgraded immediately, control is transferred
- * to the caller of the current coroutine.  Also, @qemu_co_rwlock_upgrade
- * only overrides CoRwlock fairness if there are no concurrent readers, so
- * another writer might run while @qemu_co_rwlock_upgrade blocks.
+ * Note that if the lock cannot be upgraded immediately, control is transferred
+ * to the caller of the current coroutine; another writer might run while
+ * @qemu_co_rwlock_upgrade blocks.
  */
 void qemu_co_rwlock_upgrade(CoRwlock *lock);
 
diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
index eb73cf11dc..2669403839 100644
--- a/util/qemu-coroutine-lock.c
+++ b/util/qemu-coroutine-lock.c
@@ -327,11 +327,51 @@ void coroutine_fn qemu_co_mutex_unlock(CoMutex *mutex)
     trace_qemu_co_mutex_unlock_return(mutex, self);
 }
 
+struct CoRwTicket {
+    bool read;
+    Coroutine *co;
+    QSIMPLEQ_ENTRY(CoRwTicket) next;
+};
+
 void qemu_co_rwlock_init(CoRwlock *lock)
 {
-    memset(lock, 0, sizeof(*lock));
-    qemu_co_queue_init(&lock->queue);
     qemu_co_mutex_init(&lock->mutex);
+    lock->owners = 0;
+    QSIMPLEQ_INIT(&lock->tickets);
+}
+
+/* Releases the internal CoMutex.  */
+static void qemu_co_rwlock_maybe_wake_one(CoRwlock *lock)
+{
+    CoRwTicket *tkt = QSIMPLEQ_FIRST(&lock->tickets);
+    Coroutine *co = NULL;
+
+    /*
+     * Setting lock->owners here prevents rdlock and wrlock from
+     * sneaking in between unlock and wake.
+     */
+
+    if (tkt) {
+        if (tkt->read) {
+            if (lock->owners >= 0) {
+                lock->owners++;
+                co = tkt->co;
+            }
+        } else {
+            if (lock->owners == 0) {
+                lock->owners = -1;
+                co = tkt->co;
+            }
+        }
+    }
+
+    if (co) {
+        QSIMPLEQ_REMOVE_HEAD(&lock->tickets, next);
+        qemu_co_mutex_unlock(&lock->mutex);
+        aio_co_wake(co);
+    } else {
+        qemu_co_mutex_unlock(&lock->mutex);
+    }
 }
 
 void qemu_co_rwlock_rdlock(CoRwlock *lock)
@@ -340,13 +380,22 @@ void qemu_co_rwlock_rdlock(CoRwlock *lock)
 
     qemu_co_mutex_lock(&lock->mutex);
     /* For fairness, wait if a writer is in line.  */
-    while (lock->pending_writer) {
-        qemu_co_queue_wait(&lock->queue, &lock->mutex);
+    if (lock->owners == 0 || (lock->owners > 0 && QSIMPLEQ_EMPTY(&lock->tickets))) {
+        lock->owners++;
+        qemu_co_mutex_unlock(&lock->mutex);
+    } else {
+        CoRwTicket my_ticket = { true, self };
+
+        QSIMPLEQ_INSERT_TAIL(&lock->tickets, &my_ticket, next);
+        qemu_co_mutex_unlock(&lock->mutex);
+        qemu_coroutine_yield();
+        assert(lock->owners >= 1);
+
+        /* Possibly wake another reader, which will wake the next in line.  */
+        qemu_co_mutex_lock(&lock->mutex);
+        qemu_co_rwlock_maybe_wake_one(lock);
     }
-    lock->reader++;
-    qemu_co_mutex_unlock(&lock->mutex);
 
-    /* The rest of the read-side critical section is run without the mutex.  */
     self->locks_held++;
 }
 
@@ -355,69 +404,64 @@ void qemu_co_rwlock_unlock(CoRwlock *lock)
     Coroutine *self = qemu_coroutine_self();
 
     assert(qemu_in_coroutine());
-    if (!lock->reader) {
-        /* The critical section started in qemu_co_rwlock_wrlock.  */
-        qemu_co_queue_restart_all(&lock->queue);
-    } else {
-        self->locks_held--;
+    self->locks_held--;
 
-        qemu_co_mutex_lock(&lock->mutex);
-        lock->reader--;
-        assert(lock->reader >= 0);
-        /* Wakeup only one waiting writer */
-        if (!lock->reader) {
-            qemu_co_queue_next(&lock->queue);
-        }
+    qemu_co_mutex_lock(&lock->mutex);
+    if (lock->owners > 0) {
+        lock->owners--;
+    } else {
+        assert(lock->owners == -1);
+        lock->owners = 0;
     }
-    qemu_co_mutex_unlock(&lock->mutex);
+
+    qemu_co_rwlock_maybe_wake_one(lock);
 }
 
 void qemu_co_rwlock_downgrade(CoRwlock *lock)
 {
-    Coroutine *self = qemu_coroutine_self();
-
-    /* lock->mutex critical section started in qemu_co_rwlock_wrlock or
-     * qemu_co_rwlock_upgrade.
-     */
-    assert(lock->reader == 0);
-    lock->reader++;
-    qemu_co_mutex_unlock(&lock->mutex);
+    qemu_co_mutex_lock(&lock->mutex);
+    assert(lock->owners == -1);
+    lock->owners = 1;
 
-    /* The rest of the read-side critical section is run without the mutex.  */
-    self->locks_held++;
+    /* Possibly wake another reader, which will wake the next in line.  */
+    qemu_co_rwlock_maybe_wake_one(lock);
 }
 
 void qemu_co_rwlock_wrlock(CoRwlock *lock)
 {
+    Coroutine *self = qemu_coroutine_self();
+
     qemu_co_mutex_lock(&lock->mutex);
-    lock->pending_writer++;
-    while (lock->reader) {
-        qemu_co_queue_wait(&lock->queue, &lock->mutex);
+    if (lock->owners == 0) {
+        lock->owners = -1;
+        qemu_co_mutex_unlock(&lock->mutex);
+    } else {
+        CoRwTicket my_ticket = { false, qemu_coroutine_self() };
+
+        QSIMPLEQ_INSERT_TAIL(&lock->tickets, &my_ticket, next);
+        qemu_co_mutex_unlock(&lock->mutex);
+        qemu_coroutine_yield();
+        assert(lock->owners == -1);
     }
-    lock->pending_writer--;
 
-    /* The rest of the write-side critical section is run with
-     * the mutex taken, so that lock->reader remains zero.
-     * There is no need to update self->locks_held.
-     */
+    self->locks_held++;
 }
 
 void qemu_co_rwlock_upgrade(CoRwlock *lock)
 {
-    Coroutine *self = qemu_coroutine_self();
-
     qemu_co_mutex_lock(&lock->mutex);
-    assert(lock->reader > 0);
-    lock->reader--;
-    lock->pending_writer++;
-    while (lock->reader) {
-        qemu_co_queue_wait(&lock->queue, &lock->mutex);
-    }
-    lock->pending_writer--;
+    assert(lock->owners > 0);
+    /* For fairness, wait if a writer is in line.  */
+    if (lock->owners == 1 && QSIMPLEQ_EMPTY(&lock->tickets)) {
+        lock->owners = -1;
+        qemu_co_mutex_unlock(&lock->mutex);
+    } else {
+        CoRwTicket my_ticket = { false, qemu_coroutine_self() };
 
-    /* The rest of the write-side critical section is run with
-     * the mutex taken, similar to qemu_co_rwlock_wrlock.  Do
-     * not account for the lock twice in self->locks_held.
-     */
-    self->locks_held--;
+        lock->owners--;
+        QSIMPLEQ_INSERT_TAIL(&lock->tickets, &my_ticket, next);
+        qemu_co_rwlock_maybe_wake_one(lock);
+        qemu_coroutine_yield();
+        assert(lock->owners == -1);
+    }
 }
-- 
2.29.2





* [PATCH 5/6] test-coroutine: add rwlock upgrade test
  2021-03-17 18:00 [PATCH v5 0/6] coroutine rwlock downgrade fix, minor VDI changes Paolo Bonzini
                   ` (3 preceding siblings ...)
  2021-03-17 18:00 ` [PATCH 4/6] coroutine-lock: reimplement CoRwlock to fix downgrade bug Paolo Bonzini
@ 2021-03-17 18:00 ` Paolo Bonzini
  2021-03-17 18:19   ` David Edmondson
  2021-03-17 18:00 ` [PATCH 6/6] test-coroutine: Add rwlock downgrade test Paolo Bonzini
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2021-03-17 18:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: david.edmondson, kwolf, qemu-block

Test that rwlock upgrade is fair, and that readers go back to sleep if
a writer is in line.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 tests/unit/test-coroutine.c | 62 +++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/tests/unit/test-coroutine.c b/tests/unit/test-coroutine.c
index e946d93a65..6e6f51d480 100644
--- a/tests/unit/test-coroutine.c
+++ b/tests/unit/test-coroutine.c
@@ -264,6 +264,67 @@ static void test_co_mutex_lockable(void)
     g_assert(QEMU_MAKE_LOCKABLE(null_pointer) == NULL);
 }
 
+static CoRwlock rwlock;
+
+/* Test that readers are properly sent back to the queue when upgrading,
+ * even if they are the sole readers.  The test scenario is as follows:
+ *
+ *
+ * | c1           | c2         |
+ * |--------------+------------+
+ * | rdlock       |            |
+ * | yield        |            |
+ * |              | wrlock     |
+ * |              | <queued>   |
+ * | upgrade      |            |
+ * | <queued>     | <dequeued> |
+ * |              | unlock     |
+ * | <dequeued>   |            |
+ * | unlock       |            |
+ */
+
+static void coroutine_fn rwlock_yield_upgrade(void *opaque)
+{
+    qemu_co_rwlock_rdlock(&rwlock);
+    qemu_coroutine_yield();
+
+    qemu_co_rwlock_upgrade(&rwlock);
+    qemu_co_rwlock_unlock(&rwlock);
+
+    *(bool *)opaque = true;
+}
+
+static void coroutine_fn rwlock_wrlock_yield(void *opaque)
+{
+    qemu_co_rwlock_wrlock(&rwlock);
+    qemu_coroutine_yield();
+
+    qemu_co_rwlock_unlock(&rwlock);
+    *(bool *)opaque = true;
+}
+
+static void test_co_rwlock_upgrade(void)
+{
+    bool c1_done = false;
+    bool c2_done = false;
+    Coroutine *c1, *c2;
+
+    qemu_co_rwlock_init(&rwlock);
+    c1 = qemu_coroutine_create(rwlock_yield_upgrade, &c1_done);
+    c2 = qemu_coroutine_create(rwlock_wrlock_yield, &c2_done);
+
+    qemu_coroutine_enter(c1);
+    qemu_coroutine_enter(c2);
+
+    /* c1 now should go to sleep.  */
+    qemu_coroutine_enter(c1);
+    g_assert(!c1_done);
+
+    qemu_coroutine_enter(c2);
+    g_assert(c1_done);
+    g_assert(c2_done);
+}
+
 /*
  * Check that creation, enter, and return work
  */
@@ -501,6 +562,7 @@ int main(int argc, char **argv)
     g_test_add_func("/basic/order", test_order);
     g_test_add_func("/locking/co-mutex", test_co_mutex);
     g_test_add_func("/locking/co-mutex/lockable", test_co_mutex_lockable);
+    g_test_add_func("/locking/co-rwlock/upgrade", test_co_rwlock_upgrade);
     if (g_test_perf()) {
         g_test_add_func("/perf/lifecycle", perf_lifecycle);
         g_test_add_func("/perf/nesting", perf_nesting);
-- 
2.29.2





* [PATCH 6/6] test-coroutine: Add rwlock downgrade test
  2021-03-17 18:00 [PATCH v5 0/6] coroutine rwlock downgrade fix, minor VDI changes Paolo Bonzini
                   ` (4 preceding siblings ...)
  2021-03-17 18:00 ` [PATCH 5/6] test-coroutine: add rwlock upgrade test Paolo Bonzini
@ 2021-03-17 18:00 ` Paolo Bonzini
  2021-03-24 14:26 ` [PATCH v5 0/6] coroutine rwlock downgrade fix, minor VDI changes Max Reitz
  2021-03-24 16:23 ` Stefan Hajnoczi
  7 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2021-03-17 18:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: david.edmondson, kwolf, qemu-block

From: David Edmondson <david.edmondson@oracle.com>

Test that downgrading an rwlock does not result in a failure to
schedule coroutines queued on the rwlock.

The diagram associated with test_co_rwlock_downgrade() describes the
intended behaviour, but what was observed previously corresponds to:

| c1     | c2         | c3         | c4       |
|--------+------------+------------+----------|
| rdlock |            |            |          |
| yield  |            |            |          |
|        | wrlock     |            |          |
|        | <queued>   |            |          |
|        |            | rdlock     |          |
|        |            | <queued>   |          |
|        |            |            | wrlock   |
|        |            |            | <queued> |
| unlock |            |            |          |
| yield  |            |            |          |
|        | <dequeued> |            |          |
|        | downgrade  |            |          |
|        | ...        |            |          |
|        | unlock     |            |          |
|        |            | <dequeued> |          |
|        |            | <queued>   |          |

This results in a failure...

ERROR:../tests/test-coroutine.c:369:test_co_rwlock_downgrade: assertion failed: (c3_done)
Bail out! ERROR:../tests/test-coroutine.c:369:test_co_rwlock_downgrade: assertion failed: (c3_done)

...as a result of the c3 coroutine failing to run to completion.

Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20210309144015.557477-5-david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 tests/unit/test-coroutine.c | 99 +++++++++++++++++++++++++++++++++++++
 1 file changed, 99 insertions(+)

diff --git a/tests/unit/test-coroutine.c b/tests/unit/test-coroutine.c
index 6e6f51d480..aa77a3bcb3 100644
--- a/tests/unit/test-coroutine.c
+++ b/tests/unit/test-coroutine.c
@@ -325,6 +325,104 @@ static void test_co_rwlock_upgrade(void)
     g_assert(c2_done);
 }
 
+static void coroutine_fn rwlock_rdlock_yield(void *opaque)
+{
+    qemu_co_rwlock_rdlock(&rwlock);
+    qemu_coroutine_yield();
+
+    qemu_co_rwlock_unlock(&rwlock);
+    qemu_coroutine_yield();
+
+    *(bool *)opaque = true;
+}
+
+static void coroutine_fn rwlock_wrlock_downgrade(void *opaque)
+{
+    qemu_co_rwlock_wrlock(&rwlock);
+
+    qemu_co_rwlock_downgrade(&rwlock);
+    qemu_co_rwlock_unlock(&rwlock);
+    *(bool *)opaque = true;
+}
+
+static void coroutine_fn rwlock_rdlock(void *opaque)
+{
+    qemu_co_rwlock_rdlock(&rwlock);
+
+    qemu_co_rwlock_unlock(&rwlock);
+    *(bool *)opaque = true;
+}
+
+static void coroutine_fn rwlock_wrlock(void *opaque)
+{
+    qemu_co_rwlock_wrlock(&rwlock);
+
+    qemu_co_rwlock_unlock(&rwlock);
+    *(bool *)opaque = true;
+}
+
+/*
+ * Check that downgrading a reader-writer lock does not cause a hang.
+ *
+ * Four coroutines are used to produce a situation where there are
+ * both reader and writer hopefuls waiting to acquire an rwlock that
+ * is held by a reader.
+ *
+ * The correct sequence of operations we aim to provoke can be
+ * represented as:
+ *
+ * | c1     | c2         | c3         | c4         |
+ * |--------+------------+------------+------------|
+ * | rdlock |            |            |            |
+ * | yield  |            |            |            |
+ * |        | wrlock     |            |            |
+ * |        | <queued>   |            |            |
+ * |        |            | rdlock     |            |
+ * |        |            | <queued>   |            |
+ * |        |            |            | wrlock     |
+ * |        |            |            | <queued>   |
+ * | unlock |            |            |            |
+ * | yield  |            |            |            |
+ * |        | <dequeued> |            |            |
+ * |        | downgrade  |            |            |
+ * |        |            | <dequeued> |            |
+ * |        |            | unlock     |            |
+ * |        | ...        |            |            |
+ * |        | unlock     |            |            |
+ * |        |            |            | <dequeued> |
+ * |        |            |            | unlock     |
+ */
+static void test_co_rwlock_downgrade(void)
+{
+    bool c1_done = false;
+    bool c2_done = false;
+    bool c3_done = false;
+    bool c4_done = false;
+    Coroutine *c1, *c2, *c3, *c4;
+
+    qemu_co_rwlock_init(&rwlock);
+
+    c1 = qemu_coroutine_create(rwlock_rdlock_yield, &c1_done);
+    c2 = qemu_coroutine_create(rwlock_wrlock_downgrade, &c2_done);
+    c3 = qemu_coroutine_create(rwlock_rdlock, &c3_done);
+    c4 = qemu_coroutine_create(rwlock_wrlock, &c4_done);
+
+    qemu_coroutine_enter(c1);
+    qemu_coroutine_enter(c2);
+    qemu_coroutine_enter(c3);
+    qemu_coroutine_enter(c4);
+
+    qemu_coroutine_enter(c1);
+
+    g_assert(c2_done);
+    g_assert(c3_done);
+    g_assert(c4_done);
+
+    qemu_coroutine_enter(c1);
+
+    g_assert(c1_done);
+}
+
 /*
  * Check that creation, enter, and return work
  */
@@ -563,6 +661,7 @@ int main(int argc, char **argv)
     g_test_add_func("/locking/co-mutex", test_co_mutex);
     g_test_add_func("/locking/co-mutex/lockable", test_co_mutex_lockable);
     g_test_add_func("/locking/co-rwlock/upgrade", test_co_rwlock_upgrade);
+    g_test_add_func("/locking/co-rwlock/downgrade", test_co_rwlock_downgrade);
     if (g_test_perf()) {
         g_test_add_func("/perf/lifecycle", perf_lifecycle);
         g_test_add_func("/perf/nesting", perf_nesting);
-- 
2.29.2




* Re: [PATCH 5/6] test-coroutine: add rwlock upgrade test
  2021-03-17 18:00 ` [PATCH 5/6] test-coroutine: add rwlock upgrade test Paolo Bonzini
@ 2021-03-17 18:19   ` David Edmondson
  0 siblings, 0 replies; 19+ messages in thread
From: David Edmondson @ 2021-03-17 18:19 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: kwolf, qemu-block

On Wednesday, 2021-03-17 at 19:00:12 +01, Paolo Bonzini wrote:

> Test that rwlock upgrade is fair, and readers go back to sleep if a writer
> is in line.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: David Edmondson <david.edmondson@oracle.com>

> ---
>  tests/unit/test-coroutine.c | 62 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 62 insertions(+)
>
> diff --git a/tests/unit/test-coroutine.c b/tests/unit/test-coroutine.c
> index e946d93a65..6e6f51d480 100644
> --- a/tests/unit/test-coroutine.c
> +++ b/tests/unit/test-coroutine.c
> @@ -264,6 +264,67 @@ static void test_co_mutex_lockable(void)
>      g_assert(QEMU_MAKE_LOCKABLE(null_pointer) == NULL);
>  }
>  
> +static CoRwlock rwlock;
> +
> +/* Test that readers are properly sent back to the queue when upgrading,
> + * even if they are the sole readers.  The test scenario is as follows:
> + *
> + *
> + * | c1           | c2         |
> + * |--------------+------------+
> + * | rdlock       |            |
> + * | yield        |            |
> + * |              | wrlock     |
> + * |              | <queued>   |
> + * | upgrade      |            |
> + * | <queued>     | <dequeued> |
> + * |              | unlock     |
> + * | <dequeued>   |            |
> + * | unlock       |            |
> + */
> +
> +static void coroutine_fn rwlock_yield_upgrade(void *opaque)
> +{
> +    qemu_co_rwlock_rdlock(&rwlock);
> +    qemu_coroutine_yield();
> +
> +    qemu_co_rwlock_upgrade(&rwlock);
> +    qemu_co_rwlock_unlock(&rwlock);
> +
> +    *(bool *)opaque = true;
> +}
> +
> +static void coroutine_fn rwlock_wrlock_yield(void *opaque)
> +{
> +    qemu_co_rwlock_wrlock(&rwlock);
> +    qemu_coroutine_yield();
> +
> +    qemu_co_rwlock_unlock(&rwlock);
> +    *(bool *)opaque = true;
> +}
> +
> +static void test_co_rwlock_upgrade(void)
> +{
> +    bool c1_done = false;
> +    bool c2_done = false;
> +    Coroutine *c1, *c2;
> +
> +    qemu_co_rwlock_init(&rwlock);
> +    c1 = qemu_coroutine_create(rwlock_yield_upgrade, &c1_done);
> +    c2 = qemu_coroutine_create(rwlock_wrlock_yield, &c2_done);
> +
> +    qemu_coroutine_enter(c1);
> +    qemu_coroutine_enter(c2);
> +
> +    /* c1 now should go to sleep.  */
> +    qemu_coroutine_enter(c1);
> +    g_assert(!c1_done);
> +
> +    qemu_coroutine_enter(c2);
> +    g_assert(c1_done);
> +    g_assert(c2_done);
> +}
> +
>  /*
>   * Check that creation, enter, and return work
>   */
> @@ -501,6 +562,7 @@ int main(int argc, char **argv)
>      g_test_add_func("/basic/order", test_order);
>      g_test_add_func("/locking/co-mutex", test_co_mutex);
>      g_test_add_func("/locking/co-mutex/lockable", test_co_mutex_lockable);
> +    g_test_add_func("/locking/co-rwlock/upgrade", test_co_rwlock_upgrade);
>      if (g_test_perf()) {
>          g_test_add_func("/perf/lifecycle", perf_lifecycle);
>          g_test_add_func("/perf/nesting", perf_nesting);
> -- 
> 2.29.2

dme.
-- 
Don't you know you're never going to get to France.



* Re: [PATCH 1/6] block/vdi: When writing new bmap entry fails, don't leak the buffer
  2021-03-17 18:00 ` [PATCH 1/6] block/vdi: When writing new bmap entry fails, don't leak the buffer Paolo Bonzini
@ 2021-03-24 14:25   ` Max Reitz
  0 siblings, 0 replies; 19+ messages in thread
From: Max Reitz @ 2021-03-24 14:25 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel
  Cc: david.edmondson, kwolf, Philippe Mathieu-Daudé, qemu-block

On 17.03.21 19:00, Paolo Bonzini wrote:
> From: David Edmondson <david.edmondson@oracle.com>
> 
> If a new bitmap entry is allocated, requiring the entire block to be
> written, avoid leaking the buffer allocated for the block should the
> write fail.
> 
> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> Signed-off-by: David Edmondson <david.edmondson@oracle.com>
> Message-Id: <20210309144015.557477-2-david.edmondson@oracle.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   block/vdi.c | 1 +
>   1 file changed, 1 insertion(+)

Reviewed-by: Max Reitz <mreitz@redhat.com>




* Re: [PATCH 2/6] block/vdi: Don't assume that blocks are larger than VdiHeader
  2021-03-17 18:00 ` [PATCH 2/6] block/vdi: Don't assume that blocks are larger than VdiHeader Paolo Bonzini
@ 2021-03-24 14:25   ` Max Reitz
  0 siblings, 0 replies; 19+ messages in thread
From: Max Reitz @ 2021-03-24 14:25 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: david.edmondson, kwolf, qemu-block

On 17.03.21 19:00, Paolo Bonzini wrote:
> From: David Edmondson <david.edmondson@oracle.com>
> 
> Given that the block size is read from the header of the VDI file, a
> wide variety of sizes might be seen. Rather than re-using a block
> sized memory region when writing the VDI header, allocate an
> appropriately sized buffer.
> 
> Signed-off-by: David Edmondson <david.edmondson@oracle.com>
> Message-Id: <20210309144015.557477-3-david.edmondson@oracle.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   block/vdi.c | 10 ++++++----
>   1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/block/vdi.c b/block/vdi.c
> index 2a6dc26124..548f8a057b 100644
> --- a/block/vdi.c
> +++ b/block/vdi.c
> @@ -696,18 +696,20 @@ nonallocating_write:
>   
>       if (block) {
>           /* One or more new blocks were allocated. */
> -        VdiHeader *header = (VdiHeader *) block;
> +        VdiHeader *header;
>           uint8_t *base;
>           uint64_t offset;
>           uint32_t n_sectors;
>   
> +        g_free(block);

This logic (checking @block != NULL) was a bit weird before, and this 
just seems to accommodate it.  We probably shouldn’t check @block, but 
some boolean, and *block should be freed after the while () loop.

Considering the “if (block)” was there already and it’d be weird to keep 
that check if *block were freed by that point, I suppose this isn’t 
making it worse, though, so:

Reviewed-by: Max Reitz <mreitz@redhat.com>

> +        header = g_malloc(sizeof(*header));
> +
>           logout("now writing modified header\n");
>           assert(VDI_IS_ALLOCATED(bmap_first));
>           *header = s->header;
>           vdi_header_to_le(header);
> -        ret = bdrv_pwrite(bs->file, 0, block, sizeof(VdiHeader));
> -        g_free(block);
> -        block = NULL;
> +        ret = bdrv_pwrite(bs->file, 0, header, sizeof(*header));
> +        g_free(header);
>   
>           if (ret < 0) {
>               return ret;
> 




* Re: [PATCH v5 0/6] coroutine rwlock downgrade fix, minor VDI changes
  2021-03-17 18:00 [PATCH v5 0/6] coroutine rwlock downgrade fix, minor VDI changes Paolo Bonzini
                   ` (5 preceding siblings ...)
  2021-03-17 18:00 ` [PATCH 6/6] test-coroutine: Add rwlock downgrade test Paolo Bonzini
@ 2021-03-24 14:26 ` Max Reitz
  2021-03-24 16:23 ` Stefan Hajnoczi
  7 siblings, 0 replies; 19+ messages in thread
From: Max Reitz @ 2021-03-24 14:26 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel, Stefan Hajnoczi
  Cc: david.edmondson, kwolf, qemu-block

On 17.03.21 19:00, Paolo Bonzini wrote:
> This is a resubmit of David Edmondson's series at
> https://patchew.org/QEMU/20210309144015.557477-1-david.edmondson@oracle.com/.
> After closer analysis on IRC, the CoRwlock's attempt to ensure
> fairness turned out to be flawed.  Therefore, this series
> reimplements CoRwlock without using a CoQueue.  Tracking whether
> each queued coroutine is a reader/writer makes it possible to
> never wake a writer when only readers should be allowed and
> vice versa.

I’m not a maintainer for patches 3 through 6.  Stefan is, though.  So 
Stefan, can you take this series with an R-b from me on 1 and 2?

Max




* Re: [PATCH 4/6] coroutine-lock: reimplement CoRwlock to fix downgrade bug
  2021-03-17 18:00 ` [PATCH 4/6] coroutine-lock: reimplement CoRwlock to fix downgrade bug Paolo Bonzini
@ 2021-03-24 16:15   ` Stefan Hajnoczi
  2021-03-24 16:40     ` Paolo Bonzini
  0 siblings, 1 reply; 19+ messages in thread
From: Stefan Hajnoczi @ 2021-03-24 16:15 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: david.edmondson, kwolf, qemu-devel, qemu-block

On Wed, Mar 17, 2021 at 07:00:11PM +0100, Paolo Bonzini wrote:
> +static void qemu_co_rwlock_maybe_wake_one(CoRwlock *lock)
> +{
> +    CoRwTicket *tkt = QSIMPLEQ_FIRST(&lock->tickets);
> +    Coroutine *co = NULL;
> +
> +    /*
> +     * Setting lock->owners here prevents rdlock and wrlock from
> +     * sneaking in between unlock and wake.
> +     */
> +
> +    if (tkt) {
> +        if (tkt->read) {
> +            if (lock->owners >= 0) {
> +                lock->owners++;
> +                co = tkt->co;
> +            }
> +        } else {
> +            if (lock->owners == 0) {
> +                lock->owners = -1;
> +                co = tkt->co;
> +            }
> +        }
> +    }
> +
> +    if (co) {
> +        QSIMPLEQ_REMOVE_HEAD(&lock->tickets, next);
> +        qemu_co_mutex_unlock(&lock->mutex);
> +        aio_co_wake(co);

I find it hard to reason about QSIMPLEQ_EMPTY(&lock->tickets) callers
that execute before co is entered. They see an empty queue even though a
coroutine is about to run. Updating owners above ensures that the code
correctly tracks the state of the rwlock, but I'm not 100% confident
about this aspect of the code.

>  void qemu_co_rwlock_upgrade(CoRwlock *lock)
>  {
> -    Coroutine *self = qemu_coroutine_self();
> -
>      qemu_co_mutex_lock(&lock->mutex);
> -    assert(lock->reader > 0);
> -    lock->reader--;
> -    lock->pending_writer++;
> -    while (lock->reader) {
> -        qemu_co_queue_wait(&lock->queue, &lock->mutex);
> -    }
> -    lock->pending_writer--;
> +    assert(lock->owners > 0);
> +    /* For fairness, wait if a writer is in line.  */
> +    if (lock->owners == 1 && QSIMPLEQ_EMPTY(&lock->tickets)) {
> +        lock->owners = -1;
> +        qemu_co_mutex_unlock(&lock->mutex);
> +    } else {
> +        CoRwTicket my_ticket = { false, qemu_coroutine_self() };
>  
> -    /* The rest of the write-side critical section is run with
> -     * the mutex taken, similar to qemu_co_rwlock_wrlock.  Do
> -     * not account for the lock twice in self->locks_held.
> -     */
> -    self->locks_held--;
> +        lock->owners--;
> +        QSIMPLEQ_INSERT_TAIL(&lock->tickets, &my_ticket, next);
> +        qemu_co_rwlock_maybe_wake_one(lock);
> +        qemu_coroutine_yield();
> +        assert(lock->owners == -1);

lock->owners is read outside lock->mutex here. Not sure if this can
cause problems.

> +    }
>  }

locks_held is kept unchanged across qemu_coroutine_yield() even though
the read lock has been released. rdlock() and wrlock() only increment
locks_held after acquiring the rwlock.

In practice I don't think it matters, but it seems inconsistent. If
locks_held is supposed to track tickets (not just coroutines currently
holding a lock), then rdlock() and wrlock() should increment before
yielding.


* Re: [PATCH v5 0/6] coroutine rwlock downgrade fix, minor VDI changes
  2021-03-17 18:00 [PATCH v5 0/6] coroutine rwlock downgrade fix, minor VDI changes Paolo Bonzini
                   ` (6 preceding siblings ...)
  2021-03-24 14:26 ` [PATCH v5 0/6] coroutine rwlock downgrade fix, minor VDI changes Max Reitz
@ 2021-03-24 16:23 ` Stefan Hajnoczi
  2021-03-24 16:43   ` Paolo Bonzini
  7 siblings, 1 reply; 19+ messages in thread
From: Stefan Hajnoczi @ 2021-03-24 16:23 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: david.edmondson, kwolf, qemu-devel, qemu-block

On Wed, Mar 17, 2021 at 07:00:07PM +0100, Paolo Bonzini wrote:
> This is a resubmit of David Edmondson's series at
> https://patchew.org/QEMU/20210309144015.557477-1-david.edmondson@oracle.com/.
> After closer analysis on IRC, the CoRwlock's attempt to ensure
> fairness turned out to be flawed.  Therefore, this series
> reimplements CoRwlock without using a CoQueue.  Tracking whether
> each queued coroutine is a reader/writer makes it possible to
> never wake a writer when only readers should be allowed and
> vice versa.
> 
> v2->v3: new CoRwlock implementation
> 
> v3->v4: fix upgrade and add a test for that, too
> 
> v4->v5: typo
> 
> David Edmondson (4):
>   block/vdi: When writing new bmap entry fails, don't leak the buffer
>   block/vdi: Don't assume that blocks are larger than VdiHeader
>   coroutine/mutex: Store the coroutine in the CoWaitRecord only once
>   test-coroutine: Add rwlock downgrade test
> 
> Paolo Bonzini (2):
>   coroutine-lock: reimplement CoRwlock to fix downgrade bug
>   test-coroutine: add rwlock upgrade test
> 
>  block/vdi.c                 |  11 ++-
>  include/qemu/coroutine.h    |  17 ++--
>  tests/unit/test-coroutine.c | 161 ++++++++++++++++++++++++++++++++++++
>  util/qemu-coroutine-lock.c  | 149 +++++++++++++++++++++------------
>  4 files changed, 274 insertions(+), 64 deletions(-)

I had questions about the rwlock implementation. The other patches look
ready to go.

Stefan


* Re: [PATCH 4/6] coroutine-lock: reimplement CoRwlock to fix downgrade bug
  2021-03-24 16:15   ` Stefan Hajnoczi
@ 2021-03-24 16:40     ` Paolo Bonzini
  0 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2021-03-24 16:40 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: david.edmondson, kwolf, qemu-devel, qemu-block

On 24/03/21 17:15, Stefan Hajnoczi wrote:
> On Wed, Mar 17, 2021 at 07:00:11PM +0100, Paolo Bonzini wrote:
>> +static void qemu_co_rwlock_maybe_wake_one(CoRwlock *lock)
>> +{
>> +    CoRwTicket *tkt = QSIMPLEQ_FIRST(&lock->tickets);
>> +    Coroutine *co = NULL;
>> +
>> +    /*
>> +     * Setting lock->owners here prevents rdlock and wrlock from
>> +     * sneaking in between unlock and wake.
>> +     */
>> +
>> +    if (tkt) {
>> +        if (tkt->read) {
>> +            if (lock->owners >= 0) {
>> +                lock->owners++;
>> +                co = tkt->co;
>> +            }
>> +        } else {
>> +            if (lock->owners == 0) {
>> +                lock->owners = -1;
>> +                co = tkt->co;
>> +            }
>> +        }
>> +    }
>> +
>> +    if (co) {
>> +        QSIMPLEQ_REMOVE_HEAD(&lock->tickets, next);
>> +        qemu_co_mutex_unlock(&lock->mutex);
>> +        aio_co_wake(co);
> 
> I find it hard to reason about QSIMPLEQ_EMPTY(&lock->tickets) callers
> that execute before co is entered. They see an empty queue even though a
> coroutine is about to run. Updating owners above ensures that the code
> correctly tracks the state of the rwlock, but I'm not 100% confident
> about this aspect of the code.

Good point.  The invariant when lock->mutex is released should be 
clarified; a better way to phrase the comment above "if (tkt)" is:

The invariant when lock->mutex is released is that every ticket is 
tracked in either lock->owners or lock->tickets.  By updating 
lock->owners here, rdlock/wrlock/upgrade will block even if they execute 
between qemu_co_mutex_unlock and aio_co_wake.

>> -    self->locks_held--;
>> +        lock->owners--;
>> +        QSIMPLEQ_INSERT_TAIL(&lock->tickets, &my_ticket, next);
>> +        qemu_co_rwlock_maybe_wake_one(lock);
>> +        qemu_coroutine_yield();
>> +        assert(lock->owners == -1);
> 
> lock->owners is read outside lock->mutex here. Not sure if this can
> cause problems.

True.  It is okay though because lock->owners cannot change until this 
coroutine unlocks.  A worse occurrence of the issue is in rdlock:

         assert(lock->owners >= 1);

         /* Possibly wake another reader, which will wake the next in line.  */
         qemu_co_mutex_lock(&lock->mutex);

where the assert should be moved after taking the lock, or possibly 
changed to use qatomic_read.  (I prefer the former).

> locks_held is kept unchanged across qemu_coroutine_yield() even though
> the read lock has been released. rdlock() and wrlock() only increment
> locks_held after acquiring the rwlock.
> 
> In practice I don't think it matters, but it seems inconsistent. If
> locks_held is supposed to track tickets (not just coroutines currently
> holding a lock), then rdlock() and wrlock() should increment before
> yielding.

locks_held (unlike owners) is not part of the lock, it's part of the 
Coroutine and only used for debugging (asserting that terminating 
coroutines are not holding any lock).

Paolo




* Re: [PATCH v5 0/6] coroutine rwlock downgrade fix, minor VDI changes
  2021-03-24 16:23 ` Stefan Hajnoczi
@ 2021-03-24 16:43   ` Paolo Bonzini
  0 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2021-03-24 16:43 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: david.edmondson, kwolf, qemu-devel, qemu-block

On 24/03/21 17:23, Stefan Hajnoczi wrote:
> On Wed, Mar 17, 2021 at 07:00:07PM +0100, Paolo Bonzini wrote:
>> This is a resubmit of David Edmondson's series at
>> https://patchew.org/QEMU/20210309144015.557477-1-david.edmondson@oracle.com/.
>> After closer analysis on IRC, the CoRwlock's attempt to ensure
>> fairness turned out to be flawed.  Therefore, this series
>> reimplements CoRwlock without using a CoQueue.  Tracking whether
>> each queued coroutine is a reader/writer makes it possible to
>> never wake a writer when only readers should be allowed and
>> vice versa.
>>
>> v2->v3: new CoRwlock implementation
>>
>> v3->v4: fix upgrade and add a test for that, too
>>
>> v4->v5: typo
>>
>> David Edmondson (4):
>>    block/vdi: When writing new bmap entry fails, don't leak the buffer
>>    block/vdi: Don't assume that blocks are larger than VdiHeader
>>    coroutine/mutex: Store the coroutine in the CoWaitRecord only once
>>    test-coroutine: Add rwlock downgrade test
>>
>> Paolo Bonzini (2):
>>    coroutine-lock: reimplement CoRwlock to fix downgrade bug
>>    test-coroutine: add rwlock upgrade test
>>
>>   block/vdi.c                 |  11 ++-
>>   include/qemu/coroutine.h    |  17 ++--
>>   tests/unit/test-coroutine.c | 161 ++++++++++++++++++++++++++++++++++++
>>   util/qemu-coroutine-lock.c  | 149 +++++++++++++++++++++------------
>>   4 files changed, 274 insertions(+), 64 deletions(-)
> 
> I had questions about the rwlock implementation. The other patches look
> ready to go.

Cool, none of them seem to be blockers but you raised good points.  I'll 
send v6 tomorrow.

Paolo




* Re: [PATCH 4/6] coroutine-lock: reimplement CoRwLock to fix downgrade bug
  2021-03-17 17:19     ` Paolo Bonzini
@ 2021-03-17 17:47       ` David Edmondson
  0 siblings, 0 replies; 19+ messages in thread
From: David Edmondson @ 2021-03-17 17:47 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: kwolf, qemu-block

On Wednesday, 2021-03-17 at 18:19:58 +01, Paolo Bonzini wrote:

> On 17/03/21 16:17, David Edmondson wrote:
>>> +    if (tkt) {
>>> +        if (tkt->read) {
>>> +            if (lock->owners >= 0) {
>>> +                lock->owners++;
>>> +                co = tkt->co;
>>> +            }
>>> +        } else {
>>> +            if (lock->owners == 0) {
>>> +                lock->owners = -1;
>>> +                co = tkt->co;
>>> +            }
>>> +        }
>>> +    }
>>> +
>>> +    if (co) {
>>> +        QSIMPLEQ_REMOVE_HEAD(&lock->tickets, next);
>>> +        qemu_co_mutex_unlock(&lock->mutex);
>>> +        aio_co_wake(co);
>>> +    } else {
>>> +        qemu_co_mutex_unlock(&lock->mutex);
>>> +    }
>> 
>> This block could be pushed up into the earlier block, but I imagine that
>> the compiler will do it for you.
>
> I guess I could do
>
>      if (!tkt || (tkt->read ? lock->owners < 0 : lock->owners != 0)) {
>          qemu_co_mutex_unlock(&lock->mutex);
>          return;
>      }
>      if (tkt->read) {
>          lock->owners++;
>      } else {
>          lock->owners = -1;
>      }
>
>      co = tkt->co;
>      QSIMPLEQ_REMOVE_HEAD(&lock->tickets, next);
>      qemu_co_mutex_unlock(&lock->mutex);
>      aio_co_wake(co);
>
> but I find it less readable.

Agreed.

> So that leaves only the of/or typo, right?

Yes.

dme.
-- 
In heaven there is no beer, that's why we drink it here.



* Re: [PATCH 4/6] coroutine-lock: reimplement CoRwLock to fix downgrade bug
  2021-03-17 15:17   ` David Edmondson
@ 2021-03-17 17:19     ` Paolo Bonzini
  2021-03-17 17:47       ` David Edmondson
  0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2021-03-17 17:19 UTC (permalink / raw)
  To: David Edmondson, qemu-devel; +Cc: kwolf, qemu-block

On 17/03/21 16:17, David Edmondson wrote:
>> +    if (tkt) {
>> +        if (tkt->read) {
>> +            if (lock->owners >= 0) {
>> +                lock->owners++;
>> +                co = tkt->co;
>> +            }
>> +        } else {
>> +            if (lock->owners == 0) {
>> +                lock->owners = -1;
>> +                co = tkt->co;
>> +            }
>> +        }
>> +    }
>> +
>> +    if (co) {
>> +        QSIMPLEQ_REMOVE_HEAD(&lock->tickets, next);
>> +        qemu_co_mutex_unlock(&lock->mutex);
>> +        aio_co_wake(co);
>> +    } else {
>> +        qemu_co_mutex_unlock(&lock->mutex);
>> +    }
> 
> This block could be pushed up into the earlier block, but I imagine that
> the compiler will do it for you.

I guess I could do

     if (!tkt || (tkt->read ? lock->owners < 0 : lock->owners != 0)) {
         qemu_co_mutex_unlock(&lock->mutex);
         return;
     }
     if (tkt->read) {
         lock->owners++;
     } else {
         lock->owners = -1;
     }

     co = tkt->co;
     QSIMPLEQ_REMOVE_HEAD(&lock->tickets, next);
     qemu_co_mutex_unlock(&lock->mutex);
     aio_co_wake(co);

but I find it less readable.  So that leaves only the of/or typo, right?

Paolo




* Re: [PATCH 4/6] coroutine-lock: reimplement CoRwLock to fix downgrade bug
  2021-03-17 12:16 ` [PATCH 4/6] coroutine-lock: reimplement CoRwLock to fix downgrade bug Paolo Bonzini
@ 2021-03-17 15:17   ` David Edmondson
  2021-03-17 17:19     ` Paolo Bonzini
  0 siblings, 1 reply; 19+ messages in thread
From: David Edmondson @ 2021-03-17 15:17 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: kwolf, qemu-block

On Wednesday, 2021-03-17 at 13:16:39 +01, Paolo Bonzini wrote:

> An invariant of the current rwlock is that if multiple coroutines hold a
> reader lock, all must be runnable. The unlock implementation relies on
> this, choosing to wake a single coroutine when the final read lock
> holder exits the critical section, assuming that it will wake a
> coroutine attempting to acquire a write lock.
>
> The downgrade implementation violates this assumption by creating a
> read lock owning coroutine that is exclusively runnable - any other
> coroutines that are waiting to acquire a read lock are *not* made
> runnable when the write lock holder converts its ownership to read
> only.
>
> More generally, the old implementation had lots of other fairness bugs.
> The root cause of the bugs was that CoQueue would wake up readers even
> if there were pending writers, and would wake up writers even if there
> were readers.  In that case, the coroutine would go back to sleep *at
> the end* of the CoQueue, losing its place at the head of the line.
>
> To fix this, keep the queue of waiters explicitly in the CoRwLock
> instead of using CoQueue, and store for each whether it is a
> potential reader or a writer.  This way, downgrade can look at the
> first queued coroutine and wake it only if it is a reader, causing
> all other readers in line to be released in turn.
>
> Reported-by: David Edmondson <david.edmondson@oracle.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

A couple of nits below, but it looks good overall.

Reviewed-by: David Edmondson <david.edmondson@oracle.com>

> ---
> v3->v4: clean up the code and fix upgrade logic.  Fix upgrade comment too.
>
>  include/qemu/coroutine.h   |  17 +++--
>  util/qemu-coroutine-lock.c | 148 ++++++++++++++++++++++++-------------
>  2 files changed, 106 insertions(+), 59 deletions(-)
>
> diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
> index 84eab6e3bf..7919d3bb62 100644
> --- a/include/qemu/coroutine.h
> +++ b/include/qemu/coroutine.h
> @@ -237,11 +237,15 @@ bool qemu_co_enter_next_impl(CoQueue *queue, QemuLockable *lock);
>  bool qemu_co_queue_empty(CoQueue *queue);
>  
>  
> +typedef struct CoRwTicket CoRwTicket;
>  typedef struct CoRwlock {
> -    int pending_writer;
> -    int reader;
>      CoMutex mutex;
> -    CoQueue queue;
> +
> +    /* Number of readers, of -1 if owned for writing.  */
> +    int owners;

s/, of/, or/

> +
> +    /* Waiting coroutines.  */
> +    QSIMPLEQ_HEAD(, CoRwTicket) tickets;
>  } CoRwlock;
>  
>  /**
> @@ -260,10 +264,9 @@ void qemu_co_rwlock_rdlock(CoRwlock *lock);
>  /**
>   * Write Locks the CoRwlock from a reader.  This is a bit more efficient than
>   * @qemu_co_rwlock_unlock followed by a separate @qemu_co_rwlock_wrlock.
> - * However, if the lock cannot be upgraded immediately, control is transferred
> - * to the caller of the current coroutine.  Also, @qemu_co_rwlock_upgrade
> - * only overrides CoRwlock fairness if there are no concurrent readers, so
> - * another writer might run while @qemu_co_rwlock_upgrade blocks.
> + * Note that if the lock cannot be upgraded immediately, control is transferred
> + * to the caller of the current coroutine; another writer might run while
> + * @qemu_co_rwlock_upgrade blocks.

This is better, thanks.

>   */
>  void qemu_co_rwlock_upgrade(CoRwlock *lock);
>  
> diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
> index eb73cf11dc..2669403839 100644
> --- a/util/qemu-coroutine-lock.c
> +++ b/util/qemu-coroutine-lock.c
> @@ -327,11 +327,51 @@ void coroutine_fn qemu_co_mutex_unlock(CoMutex *mutex)
>      trace_qemu_co_mutex_unlock_return(mutex, self);
>  }
>  
> +struct CoRwTicket {
> +    bool read;
> +    Coroutine *co;
> +    QSIMPLEQ_ENTRY(CoRwTicket) next;
> +};
> +
>  void qemu_co_rwlock_init(CoRwlock *lock)
>  {
> -    memset(lock, 0, sizeof(*lock));
> -    qemu_co_queue_init(&lock->queue);
>      qemu_co_mutex_init(&lock->mutex);
> +    lock->owners = 0;
> +    QSIMPLEQ_INIT(&lock->tickets);
> +}
> +
> +/* Releases the internal CoMutex.  */
> +static void qemu_co_rwlock_maybe_wake_one(CoRwlock *lock)
> +{
> +    CoRwTicket *tkt = QSIMPLEQ_FIRST(&lock->tickets);
> +    Coroutine *co = NULL;
> +
> +    /*
> +     * Setting lock->owners here prevents rdlock and wrlock from
> +     * sneaking in between unlock and wake.
> +     */
> +
> +    if (tkt) {
> +        if (tkt->read) {
> +            if (lock->owners >= 0) {
> +                lock->owners++;
> +                co = tkt->co;
> +            }
> +        } else {
> +            if (lock->owners == 0) {
> +                lock->owners = -1;
> +                co = tkt->co;
> +            }
> +        }
> +    }
> +
> +    if (co) {
> +        QSIMPLEQ_REMOVE_HEAD(&lock->tickets, next);
> +        qemu_co_mutex_unlock(&lock->mutex);
> +        aio_co_wake(co);
> +    } else {
> +        qemu_co_mutex_unlock(&lock->mutex);
> +    }

This block could be pushed up into the earlier block, but I imagine that
the compiler will do it for you.

>  }
>  
>  void qemu_co_rwlock_rdlock(CoRwlock *lock)
> @@ -340,13 +380,22 @@ void qemu_co_rwlock_rdlock(CoRwlock *lock)
>  
>      qemu_co_mutex_lock(&lock->mutex);
>      /* For fairness, wait if a writer is in line.  */
> -    while (lock->pending_writer) {
> -        qemu_co_queue_wait(&lock->queue, &lock->mutex);
> +    if (lock->owners == 0 || (lock->owners > 0 && QSIMPLEQ_EMPTY(&lock->tickets))) {
> +        lock->owners++;
> +        qemu_co_mutex_unlock(&lock->mutex);
> +    } else {
> +        CoRwTicket my_ticket = { true, self };
> +
> +        QSIMPLEQ_INSERT_TAIL(&lock->tickets, &my_ticket, next);
> +        qemu_co_mutex_unlock(&lock->mutex);
> +        qemu_coroutine_yield();
> +        assert(lock->owners >= 1);
> +
> +        /* Possibly wake another reader, which will wake the next in line.  */
> +        qemu_co_mutex_lock(&lock->mutex);
> +        qemu_co_rwlock_maybe_wake_one(lock);
>      }
> -    lock->reader++;
> -    qemu_co_mutex_unlock(&lock->mutex);
>  
> -    /* The rest of the read-side critical section is run without the mutex.  */
>      self->locks_held++;
>  }
>  
> @@ -355,69 +404,64 @@ void qemu_co_rwlock_unlock(CoRwlock *lock)
>      Coroutine *self = qemu_coroutine_self();
>  
>      assert(qemu_in_coroutine());
> -    if (!lock->reader) {
> -        /* The critical section started in qemu_co_rwlock_wrlock.  */
> -        qemu_co_queue_restart_all(&lock->queue);
> -    } else {
> -        self->locks_held--;
> +    self->locks_held--;
>  
> -        qemu_co_mutex_lock(&lock->mutex);
> -        lock->reader--;
> -        assert(lock->reader >= 0);
> -        /* Wakeup only one waiting writer */
> -        if (!lock->reader) {
> -            qemu_co_queue_next(&lock->queue);
> -        }
> +    qemu_co_mutex_lock(&lock->mutex);
> +    if (lock->owners > 0) {
> +        lock->owners--;
> +    } else {
> +        assert(lock->owners == -1);
> +        lock->owners = 0;
>      }
> -    qemu_co_mutex_unlock(&lock->mutex);
> +
> +    qemu_co_rwlock_maybe_wake_one(lock);
>  }
>  
>  void qemu_co_rwlock_downgrade(CoRwlock *lock)
>  {
> -    Coroutine *self = qemu_coroutine_self();
> -
> -    /* lock->mutex critical section started in qemu_co_rwlock_wrlock or
> -     * qemu_co_rwlock_upgrade.
> -     */
> -    assert(lock->reader == 0);
> -    lock->reader++;
> -    qemu_co_mutex_unlock(&lock->mutex);
> +    qemu_co_mutex_lock(&lock->mutex);
> +    assert(lock->owners == -1);
> +    lock->owners = 1;
>  
> -    /* The rest of the read-side critical section is run without the mutex.  */
> -    self->locks_held++;
> +    /* Possibly wake another reader, which will wake the next in line.  */
> +    qemu_co_rwlock_maybe_wake_one(lock);
>  }
>  
>  void qemu_co_rwlock_wrlock(CoRwlock *lock)
>  {
> +    Coroutine *self = qemu_coroutine_self();
> +
>      qemu_co_mutex_lock(&lock->mutex);
> -    lock->pending_writer++;
> -    while (lock->reader) {
> -        qemu_co_queue_wait(&lock->queue, &lock->mutex);
> +    if (lock->owners == 0) {
> +        lock->owners = -1;
> +        qemu_co_mutex_unlock(&lock->mutex);
> +    } else {
> +        CoRwTicket my_ticket = { false, qemu_coroutine_self() };
> +
> +        QSIMPLEQ_INSERT_TAIL(&lock->tickets, &my_ticket, next);
> +        qemu_co_mutex_unlock(&lock->mutex);
> +        qemu_coroutine_yield();
> +        assert(lock->owners == -1);
>      }
> -    lock->pending_writer--;
>  
> -    /* The rest of the write-side critical section is run with
> -     * the mutex taken, so that lock->reader remains zero.
> -     * There is no need to update self->locks_held.
> -     */
> +    self->locks_held++;
>  }
>  
>  void qemu_co_rwlock_upgrade(CoRwlock *lock)
>  {
> -    Coroutine *self = qemu_coroutine_self();
> -
>      qemu_co_mutex_lock(&lock->mutex);
> -    assert(lock->reader > 0);
> -    lock->reader--;
> -    lock->pending_writer++;
> -    while (lock->reader) {
> -        qemu_co_queue_wait(&lock->queue, &lock->mutex);
> -    }
> -    lock->pending_writer--;
> +    assert(lock->owners > 0);
> +    /* For fairness, wait if a writer is in line.  */
> +    if (lock->owners == 1 && QSIMPLEQ_EMPTY(&lock->tickets)) {
> +        lock->owners = -1;
> +        qemu_co_mutex_unlock(&lock->mutex);
> +    } else {
> +        CoRwTicket my_ticket = { false, qemu_coroutine_self() };
>  
> -    /* The rest of the write-side critical section is run with
> -     * the mutex taken, similar to qemu_co_rwlock_wrlock.  Do
> -     * not account for the lock twice in self->locks_held.
> -     */
> -    self->locks_held--;
> +        lock->owners--;
> +        QSIMPLEQ_INSERT_TAIL(&lock->tickets, &my_ticket, next);
> +        qemu_co_rwlock_maybe_wake_one(lock);
> +        qemu_coroutine_yield();
> +        assert(lock->owners == -1);
> +    }
>  }
> -- 
> 2.29.2

dme.
-- 
Today is different. Today is not the same.



* [PATCH 4/6] coroutine-lock: reimplement CoRwLock to fix downgrade bug
  2021-03-17 12:16 [PATCH v4 " Paolo Bonzini
@ 2021-03-17 12:16 ` Paolo Bonzini
  2021-03-17 15:17   ` David Edmondson
  0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2021-03-17 12:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: david.edmondson, kwolf, qemu-block

An invariant of the current rwlock is that if multiple coroutines hold a
reader lock, all must be runnable. The unlock implementation relies on
this, choosing to wake a single coroutine when the final read lock
holder exits the critical section, assuming that it will wake a
coroutine attempting to acquire a write lock.

The downgrade implementation violates this assumption by creating a
read lock owning coroutine that is exclusively runnable - any other
coroutines that are waiting to acquire a read lock are *not* made
runnable when the write lock holder converts its ownership to read
only.

More generally, the old implementation had lots of other fairness bugs.
The root cause of the bugs was that CoQueue would wake up readers even
if there were pending writers, and would wake up writers even if there
were readers.  In that case, the coroutine would go back to sleep *at
the end* of the CoQueue, losing its place at the head of the line.

To fix this, keep the queue of waiters explicitly in the CoRwLock
instead of using CoQueue, and store for each whether it is a
potential reader or a writer.  This way, downgrade can look at the
first queued coroutine and wake it only if it is a reader, causing
all other readers in line to be released in turn.

Reported-by: David Edmondson <david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
v3->v4: clean up the code and fix upgrade logic.  Fix upgrade comment too.

 include/qemu/coroutine.h   |  17 +++--
 util/qemu-coroutine-lock.c | 148 ++++++++++++++++++++++++-------------
 2 files changed, 106 insertions(+), 59 deletions(-)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index 84eab6e3bf..7919d3bb62 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -237,11 +237,15 @@ bool qemu_co_enter_next_impl(CoQueue *queue, QemuLockable *lock);
 bool qemu_co_queue_empty(CoQueue *queue);
 
 
+typedef struct CoRwTicket CoRwTicket;
 typedef struct CoRwlock {
-    int pending_writer;
-    int reader;
     CoMutex mutex;
-    CoQueue queue;
+
+    /* Number of readers, of -1 if owned for writing.  */
+    int owners;
+
+    /* Waiting coroutines.  */
+    QSIMPLEQ_HEAD(, CoRwTicket) tickets;
 } CoRwlock;
 
 /**
@@ -260,10 +264,9 @@ void qemu_co_rwlock_rdlock(CoRwlock *lock);
 /**
  * Write Locks the CoRwlock from a reader.  This is a bit more efficient than
  * @qemu_co_rwlock_unlock followed by a separate @qemu_co_rwlock_wrlock.
- * However, if the lock cannot be upgraded immediately, control is transferred
- * to the caller of the current coroutine.  Also, @qemu_co_rwlock_upgrade
- * only overrides CoRwlock fairness if there are no concurrent readers, so
- * another writer might run while @qemu_co_rwlock_upgrade blocks.
+ * Note that if the lock cannot be upgraded immediately, control is transferred
+ * to the caller of the current coroutine; another writer might run while
+ * @qemu_co_rwlock_upgrade blocks.
  */
 void qemu_co_rwlock_upgrade(CoRwlock *lock);
 
diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
index eb73cf11dc..2669403839 100644
--- a/util/qemu-coroutine-lock.c
+++ b/util/qemu-coroutine-lock.c
@@ -327,11 +327,51 @@ void coroutine_fn qemu_co_mutex_unlock(CoMutex *mutex)
     trace_qemu_co_mutex_unlock_return(mutex, self);
 }
 
+struct CoRwTicket {
+    bool read;
+    Coroutine *co;
+    QSIMPLEQ_ENTRY(CoRwTicket) next;
+};
+
 void qemu_co_rwlock_init(CoRwlock *lock)
 {
-    memset(lock, 0, sizeof(*lock));
-    qemu_co_queue_init(&lock->queue);
     qemu_co_mutex_init(&lock->mutex);
+    lock->owners = 0;
+    QSIMPLEQ_INIT(&lock->tickets);
+}
+
+/* Releases the internal CoMutex.  */
+static void qemu_co_rwlock_maybe_wake_one(CoRwlock *lock)
+{
+    CoRwTicket *tkt = QSIMPLEQ_FIRST(&lock->tickets);
+    Coroutine *co = NULL;
+
+    /*
+     * Setting lock->owners here prevents rdlock and wrlock from
+     * sneaking in between unlock and wake.
+     */
+
+    if (tkt) {
+        if (tkt->read) {
+            if (lock->owners >= 0) {
+                lock->owners++;
+                co = tkt->co;
+            }
+        } else {
+            if (lock->owners == 0) {
+                lock->owners = -1;
+                co = tkt->co;
+            }
+        }
+    }
+
+    if (co) {
+        QSIMPLEQ_REMOVE_HEAD(&lock->tickets, next);
+        qemu_co_mutex_unlock(&lock->mutex);
+        aio_co_wake(co);
+    } else {
+        qemu_co_mutex_unlock(&lock->mutex);
+    }
 }
 
 void qemu_co_rwlock_rdlock(CoRwlock *lock)
@@ -340,13 +380,22 @@ void qemu_co_rwlock_rdlock(CoRwlock *lock)
 
     qemu_co_mutex_lock(&lock->mutex);
     /* For fairness, wait if a writer is in line.  */
-    while (lock->pending_writer) {
-        qemu_co_queue_wait(&lock->queue, &lock->mutex);
+    if (lock->owners == 0 || (lock->owners > 0 && QSIMPLEQ_EMPTY(&lock->tickets))) {
+        lock->owners++;
+        qemu_co_mutex_unlock(&lock->mutex);
+    } else {
+        CoRwTicket my_ticket = { true, self };
+
+        QSIMPLEQ_INSERT_TAIL(&lock->tickets, &my_ticket, next);
+        qemu_co_mutex_unlock(&lock->mutex);
+        qemu_coroutine_yield();
+        assert(lock->owners >= 1);
+
+        /* Possibly wake another reader, which will wake the next in line.  */
+        qemu_co_mutex_lock(&lock->mutex);
+        qemu_co_rwlock_maybe_wake_one(lock);
     }
-    lock->reader++;
-    qemu_co_mutex_unlock(&lock->mutex);
 
-    /* The rest of the read-side critical section is run without the mutex.  */
     self->locks_held++;
 }
 
@@ -355,69 +404,64 @@ void qemu_co_rwlock_unlock(CoRwlock *lock)
     Coroutine *self = qemu_coroutine_self();
 
     assert(qemu_in_coroutine());
-    if (!lock->reader) {
-        /* The critical section started in qemu_co_rwlock_wrlock.  */
-        qemu_co_queue_restart_all(&lock->queue);
-    } else {
-        self->locks_held--;
+    self->locks_held--;
 
-        qemu_co_mutex_lock(&lock->mutex);
-        lock->reader--;
-        assert(lock->reader >= 0);
-        /* Wakeup only one waiting writer */
-        if (!lock->reader) {
-            qemu_co_queue_next(&lock->queue);
-        }
+    qemu_co_mutex_lock(&lock->mutex);
+    if (lock->owners > 0) {
+        lock->owners--;
+    } else {
+        assert(lock->owners == -1);
+        lock->owners = 0;
     }
-    qemu_co_mutex_unlock(&lock->mutex);
+
+    qemu_co_rwlock_maybe_wake_one(lock);
 }
 
 void qemu_co_rwlock_downgrade(CoRwlock *lock)
 {
-    Coroutine *self = qemu_coroutine_self();
-
-    /* lock->mutex critical section started in qemu_co_rwlock_wrlock or
-     * qemu_co_rwlock_upgrade.
-     */
-    assert(lock->reader == 0);
-    lock->reader++;
-    qemu_co_mutex_unlock(&lock->mutex);
+    qemu_co_mutex_lock(&lock->mutex);
+    assert(lock->owners == -1);
+    lock->owners = 1;
 
-    /* The rest of the read-side critical section is run without the mutex.  */
-    self->locks_held++;
+    /* Possibly wake another reader, which will wake the next in line.  */
+    qemu_co_rwlock_maybe_wake_one(lock);
 }
 
 void qemu_co_rwlock_wrlock(CoRwlock *lock)
 {
+    Coroutine *self = qemu_coroutine_self();
+
     qemu_co_mutex_lock(&lock->mutex);
-    lock->pending_writer++;
-    while (lock->reader) {
-        qemu_co_queue_wait(&lock->queue, &lock->mutex);
+    if (lock->owners == 0) {
+        lock->owners = -1;
+        qemu_co_mutex_unlock(&lock->mutex);
+    } else {
+        CoRwTicket my_ticket = { false, self };
+
+        QSIMPLEQ_INSERT_TAIL(&lock->tickets, &my_ticket, next);
+        qemu_co_mutex_unlock(&lock->mutex);
+        qemu_coroutine_yield();
+        assert(lock->owners == -1);
     }
-    lock->pending_writer--;
 
-    /* The rest of the write-side critical section is run with
-     * the mutex taken, so that lock->reader remains zero.
-     * There is no need to update self->locks_held.
-     */
+    self->locks_held++;
 }
 
 void qemu_co_rwlock_upgrade(CoRwlock *lock)
 {
-    Coroutine *self = qemu_coroutine_self();
-
     qemu_co_mutex_lock(&lock->mutex);
-    assert(lock->reader > 0);
-    lock->reader--;
-    lock->pending_writer++;
-    while (lock->reader) {
-        qemu_co_queue_wait(&lock->queue, &lock->mutex);
-    }
-    lock->pending_writer--;
+    assert(lock->owners > 0);
+    /* For fairness, wait if a writer is in line.  */
+    if (lock->owners == 1 && QSIMPLEQ_EMPTY(&lock->tickets)) {
+        lock->owners = -1;
+        qemu_co_mutex_unlock(&lock->mutex);
+    } else {
+        CoRwTicket my_ticket = { false, qemu_coroutine_self() };
 
-    /* The rest of the write-side critical section is run with
-     * the mutex taken, similar to qemu_co_rwlock_wrlock.  Do
-     * not account for the lock twice in self->locks_held.
-     */
-    self->locks_held--;
+        lock->owners--;
+        QSIMPLEQ_INSERT_TAIL(&lock->tickets, &my_ticket, next);
+        qemu_co_rwlock_maybe_wake_one(lock);
+        qemu_coroutine_yield();
+        assert(lock->owners == -1);
+    }
 }
-- 
2.29.2






Thread overview: 19+ messages
2021-03-17 18:00 [PATCH v5 0/6] coroutine rwlock downgrade fix, minor VDI changes Paolo Bonzini
2021-03-17 18:00 ` [PATCH 1/6] block/vdi: When writing new bmap entry fails, don't leak the buffer Paolo Bonzini
2021-03-24 14:25   ` Max Reitz
2021-03-17 18:00 ` [PATCH 2/6] block/vdi: Don't assume that blocks are larger than VdiHeader Paolo Bonzini
2021-03-24 14:25   ` Max Reitz
2021-03-17 18:00 ` [PATCH 3/6] coroutine/mutex: Store the coroutine in the CoWaitRecord only once Paolo Bonzini
2021-03-17 18:00 ` [PATCH 4/6] coroutine-lock: reimplement CoRwlock to fix downgrade bug Paolo Bonzini
2021-03-24 16:15   ` Stefan Hajnoczi
2021-03-24 16:40     ` Paolo Bonzini
2021-03-17 18:00 ` [PATCH 5/6] test-coroutine: add rwlock upgrade test Paolo Bonzini
2021-03-17 18:19   ` David Edmondson
2021-03-17 18:00 ` [PATCH 6/6] test-coroutine: Add rwlock downgrade test Paolo Bonzini
2021-03-24 14:26 ` [PATCH v5 0/6] coroutine rwlock downgrade fix, minor VDI changes Max Reitz
2021-03-24 16:23 ` Stefan Hajnoczi
2021-03-24 16:43   ` Paolo Bonzini
  -- strict thread matches above, loose matches on Subject: below --
2021-03-17 12:16 [PATCH v4 " Paolo Bonzini
2021-03-17 12:16 ` [PATCH 4/6] coroutine-lock: reimplement CoRwLock to fix downgrade bug Paolo Bonzini
2021-03-17 15:17   ` David Edmondson
2021-03-17 17:19     ` Paolo Bonzini
2021-03-17 17:47       ` David Edmondson
