qemu-devel.nongnu.org archive mirror
* [RFC PATCH 0/4] coroutine rwlock downgrade fix, minor VDI changes
@ 2021-03-09 10:21 David Edmondson
  2021-03-09 10:21 ` [RFC PATCH 1/4] block/vdi: When writing new bmap entry fails, don't leak the buffer David Edmondson
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: David Edmondson @ 2021-03-09 10:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, Stefan Weil, Max Reitz, David Edmondson,
	Stefan Hajnoczi

RFC because changing the coroutine code is scary and I'm new to it.

Stressing the VDI code with qemu-img:

  qemu-img convert -p -W -m 16 -O vdi input.qcow2 output.vdi

leads to a hang relatively quickly on a machine with sufficient
CPUs. A similar test targeting either raw or qcow2 formats, or
avoiding out-of-order writes, completes fine.

At the point of the hang all of the coroutines are sitting in
qemu_co_queue_wait_impl(), called from either qemu_co_rwlock_rdlock()
or qemu_co_rwlock_upgrade(), all referencing the same CoRwlock
(BDRVVdiState.bmap_lock).

The comment in the last patch explains what I believe is happening -
downgrading an rwlock from write to read can later result in a failure
to schedule an appropriate coroutine when the read lock is released.
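
For illustration, here is a sketch of one interleaving that I believe
can produce the hang. The coroutine names (c0..c3) and the ordering
are hypothetical, and the scheduling glue needed to force this exact
sequence is omitted, so treat it as a sketch rather than a ready-made
test:

  #include "qemu/osdep.h"
  #include "qemu/coroutine.h"

  /* All four coroutines contend for the same CoRwlock. */
  static CoRwlock rwlock;

  static void coroutine_fn c0_fn(void *opaque)
  {
      qemu_co_rwlock_rdlock(&rwlock);    /* reader = 1 */
      /* ... c1, c2 and c3 queue up while c0 holds the read lock ... */
      qemu_co_rwlock_unlock(&rwlock);    /* wakes c1, the head of the queue */
  }

  static void coroutine_fn c1_fn(void *opaque)
  {
      qemu_co_rwlock_wrlock(&rwlock);    /* waits for c0 to drop its read lock */
      qemu_co_rwlock_downgrade(&rwlock); /* reader = 1; c2 and c3 stay queued */
      qemu_co_rwlock_unlock(&rwlock);    /* wakes c2, a read hopeful */
  }

  static void coroutine_fn c2_fn(void *opaque)
  {
      qemu_co_rwlock_rdlock(&rwlock);    /* woken, but requeues: c3 raised pending_writer */
      qemu_co_rwlock_unlock(&rwlock);
  }

  static void coroutine_fn c3_fn(void *opaque)
  {
      qemu_co_rwlock_wrlock(&rwlock);    /* never woken again - the hang */
      qemu_co_rwlock_unlock(&rwlock);
  }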

A less invasive change might be to simply have the read side of the
unlock code mark *all* queued coroutines as runnable. This seems
somewhat wasteful, as any read hopefuls that run before a write
hopeful will immediately put themselves back on the queue.
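
For reference, that alternative would amount to something like this in
the read side of qemu_co_rwlock_unlock() (a sketch only, not what this
series does):

      qemu_co_mutex_lock(&lock->mutex);
      lock->reader--;
      assert(lock->reader >= 0);
      /* Wake every waiter; read hopefuls that run before a write
       * hopeful simply requeue themselves.
       */
      if (!lock->reader) {
          qemu_co_queue_restart_all(&lock->queue);
      }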

No code other than block/vdi.c appears to use
qemu_co_rwlock_downgrade().

The block/vdi.c changes are small things noticed by inspection when
looking for the cause of the hang.

David Edmondson (4):
  block/vdi: When writing new bmap entry fails, don't leak the buffer
  block/vdi: Don't assume that blocks are larger than VdiHeader
  coroutine/mutex: Store the coroutine in the CoWaitRecord only once
  coroutine/rwlock: Wake writers in preference to readers

 block/vdi.c                | 11 +++++++----
 include/qemu/coroutine.h   |  8 +++++---
 util/qemu-coroutine-lock.c | 25 +++++++++++++++----------
 3 files changed, 27 insertions(+), 17 deletions(-)

-- 
2.30.1



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC PATCH 1/4] block/vdi: When writing new bmap entry fails, don't leak the buffer
  2021-03-09 10:21 [RFC PATCH 0/4] coroutine rwlock downgrade fix, minor VDI changes David Edmondson
@ 2021-03-09 10:21 ` David Edmondson
  2021-03-09 11:09   ` Philippe Mathieu-Daudé
  2021-03-09 10:21 ` [RFC PATCH 2/4] block/vdi: Don't assume that blocks are larger than VdiHeader David Edmondson
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: David Edmondson @ 2021-03-09 10:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, Stefan Weil, Max Reitz, David Edmondson,
	Stefan Hajnoczi

If a new bitmap entry is allocated, requiring the entire block to be
written, avoid leaking the buffer allocated for the block should
the write fail.

Signed-off-by: David Edmondson <david.edmondson@oracle.com>
---
 block/vdi.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/vdi.c b/block/vdi.c
index 5627e7d764..2a6dc26124 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -690,6 +690,7 @@ nonallocating_write:
 
     logout("finished data write\n");
     if (ret < 0) {
+        g_free(block);
         return ret;
     }
 
-- 
2.30.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH 2/4] block/vdi: Don't assume that blocks are larger than VdiHeader
  2021-03-09 10:21 [RFC PATCH 0/4] coroutine rwlock downgrade fix, minor VDI changes David Edmondson
  2021-03-09 10:21 ` [RFC PATCH 1/4] block/vdi: When writing new bmap entry fails, don't leak the buffer David Edmondson
@ 2021-03-09 10:21 ` David Edmondson
  2021-03-09 10:21 ` [RFC PATCH 3/4] coroutine/mutex: Store the coroutine in the CoWaitRecord only once David Edmondson
  2021-03-09 10:21 ` [RFC PATCH 4/4] coroutine/rwlock: Wake writers in preference to readers David Edmondson
  3 siblings, 0 replies; 14+ messages in thread
From: David Edmondson @ 2021-03-09 10:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, Stefan Weil, Max Reitz, David Edmondson,
	Stefan Hajnoczi

Given that the block size is read from the header of the VDI file, a
wide variety of sizes might be seen. Rather than re-using a
block-sized memory region when writing the VDI header, allocate an
appropriately sized buffer.

Signed-off-by: David Edmondson <david.edmondson@oracle.com>
---
 block/vdi.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/block/vdi.c b/block/vdi.c
index 2a6dc26124..548f8a057b 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -696,18 +696,20 @@ nonallocating_write:
 
     if (block) {
         /* One or more new blocks were allocated. */
-        VdiHeader *header = (VdiHeader *) block;
+        VdiHeader *header;
         uint8_t *base;
         uint64_t offset;
         uint32_t n_sectors;
 
+        g_free(block);
+        header = g_malloc(sizeof(*header));
+
         logout("now writing modified header\n");
         assert(VDI_IS_ALLOCATED(bmap_first));
         *header = s->header;
         vdi_header_to_le(header);
-        ret = bdrv_pwrite(bs->file, 0, block, sizeof(VdiHeader));
-        g_free(block);
-        block = NULL;
+        ret = bdrv_pwrite(bs->file, 0, header, sizeof(*header));
+        g_free(header);
 
         if (ret < 0) {
             return ret;
-- 
2.30.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH 3/4] coroutine/mutex: Store the coroutine in the CoWaitRecord only once
  2021-03-09 10:21 [RFC PATCH 0/4] coroutine rwlock downgrade fix, minor VDI changes David Edmondson
  2021-03-09 10:21 ` [RFC PATCH 1/4] block/vdi: When writing new bmap entry fails, don't leak the buffer David Edmondson
  2021-03-09 10:21 ` [RFC PATCH 2/4] block/vdi: Don't assume that blocks are larger than VdiHeader David Edmondson
@ 2021-03-09 10:21 ` David Edmondson
  2021-03-09 10:49   ` Paolo Bonzini
  2021-03-09 11:11   ` Philippe Mathieu-Daudé
  2021-03-09 10:21 ` [RFC PATCH 4/4] coroutine/rwlock: Wake writers in preference to readers David Edmondson
  3 siblings, 2 replies; 14+ messages in thread
From: David Edmondson @ 2021-03-09 10:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, Stefan Weil, Max Reitz, David Edmondson,
	Stefan Hajnoczi

When taking the slow path for mutex acquisition, set the coroutine
value in the CoWaitRecord in push_waiter(), rather than both there and
in the caller.

Signed-off-by: David Edmondson <david.edmondson@oracle.com>
---
 util/qemu-coroutine-lock.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
index 5816bf8900..eb73cf11dc 100644
--- a/util/qemu-coroutine-lock.c
+++ b/util/qemu-coroutine-lock.c
@@ -204,7 +204,6 @@ static void coroutine_fn qemu_co_mutex_lock_slowpath(AioContext *ctx,
     unsigned old_handoff;
 
     trace_qemu_co_mutex_lock_entry(mutex, self);
-    w.co = self;
     push_waiter(mutex, &w);
 
     /* This is the "Responsibility Hand-Off" protocol; a lock() picks from
-- 
2.30.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH 4/4] coroutine/rwlock: Wake writers in preference to readers
  2021-03-09 10:21 [RFC PATCH 0/4] coroutine rwlock downgrade fix, minor VDI changes David Edmondson
                   ` (2 preceding siblings ...)
  2021-03-09 10:21 ` [RFC PATCH 3/4] coroutine/mutex: Store the coroutine in the CoWaitRecord only once David Edmondson
@ 2021-03-09 10:21 ` David Edmondson
  2021-03-09 10:59   ` Paolo Bonzini
  2021-03-09 11:06   ` Paolo Bonzini
  3 siblings, 2 replies; 14+ messages in thread
From: David Edmondson @ 2021-03-09 10:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, Stefan Weil, Max Reitz, David Edmondson,
	Stefan Hajnoczi

A feature of the current rwlock is that if multiple coroutines hold a
reader lock, all must be runnable. The unlock implementation relies on
this, choosing to wake a single coroutine when the final read lock
holder exits the critical section, assuming that it will wake a
coroutine attempting to acquire a write lock.

The downgrade implementation violates this assumption by creating a
read lock owning coroutine that is exclusively runnable - any other
coroutines that are waiting to acquire a read lock are *not* made
runnable when the write lock holder converts its ownership to read
only.

As a result of this, a coroutine that downgrades a write lock can
later cause unlock to wake a coroutine that is attempting to acquire a
read lock rather than one aiming for a write lock, should the
coroutines be so ordered in the wait queue.

If the wait queue contains both read hopefuls and write hopefuls, any
read hopeful coroutine that is woken will immediately go back onto the
wait queue when it attempts to acquire the rwlock, due to the pending
write acquisition. At this point there are no coroutines holding
either read or write locks and no way for the coroutines in the queue
to be made runnable. A hang ensues.

Address this by using separate queues for coroutines attempting to
acquire read and write ownership of the rwlock. When unlocking, prefer
to make runnable a coroutine that is waiting for a write lock, but if
none is available, make all coroutines waiting to take a read lock
runnable.

Signed-off-by: David Edmondson <david.edmondson@oracle.com>
---
 include/qemu/coroutine.h   |  8 +++++---
 util/qemu-coroutine-lock.c | 24 +++++++++++++++---------
 2 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index 84eab6e3bf..3dfbf57faf 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -241,7 +241,8 @@ typedef struct CoRwlock {
     int pending_writer;
     int reader;
     CoMutex mutex;
-    CoQueue queue;
+    CoQueue rqueue;
+    CoQueue wqueue;
 } CoRwlock;
 
 /**
@@ -283,8 +284,9 @@ void qemu_co_rwlock_downgrade(CoRwlock *lock);
 void qemu_co_rwlock_wrlock(CoRwlock *lock);
 
 /**
- * Unlocks the read/write lock and schedules the next coroutine that was
- * waiting for this lock to be run.
+ * Unlocks the read/write lock and schedules the next coroutine that
+ * was waiting for this lock to be run, preferring to wake one
+ * attempting to take a write lock over those taking a read lock.
  */
 void qemu_co_rwlock_unlock(CoRwlock *lock);
 
diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
index eb73cf11dc..c05c143142 100644
--- a/util/qemu-coroutine-lock.c
+++ b/util/qemu-coroutine-lock.c
@@ -330,7 +330,8 @@ void coroutine_fn qemu_co_mutex_unlock(CoMutex *mutex)
 void qemu_co_rwlock_init(CoRwlock *lock)
 {
     memset(lock, 0, sizeof(*lock));
-    qemu_co_queue_init(&lock->queue);
+    qemu_co_queue_init(&lock->rqueue);
+    qemu_co_queue_init(&lock->wqueue);
     qemu_co_mutex_init(&lock->mutex);
 }
 
@@ -341,7 +342,7 @@ void qemu_co_rwlock_rdlock(CoRwlock *lock)
     qemu_co_mutex_lock(&lock->mutex);
     /* For fairness, wait if a writer is in line.  */
     while (lock->pending_writer) {
-        qemu_co_queue_wait(&lock->queue, &lock->mutex);
+        qemu_co_queue_wait(&lock->rqueue, &lock->mutex);
     }
     lock->reader++;
     qemu_co_mutex_unlock(&lock->mutex);
@@ -356,17 +357,22 @@ void qemu_co_rwlock_unlock(CoRwlock *lock)
 
     assert(qemu_in_coroutine());
     if (!lock->reader) {
-        /* The critical section started in qemu_co_rwlock_wrlock.  */
-        qemu_co_queue_restart_all(&lock->queue);
+        /* The critical section started in qemu_co_rwlock_wrlock or
+         * qemu_co_rwlock_upgrade.
+         */
+        qemu_co_queue_restart_all(&lock->wqueue);
+        qemu_co_queue_restart_all(&lock->rqueue);
     } else {
         self->locks_held--;
 
         qemu_co_mutex_lock(&lock->mutex);
         lock->reader--;
         assert(lock->reader >= 0);
-        /* Wakeup only one waiting writer */
-        if (!lock->reader) {
-            qemu_co_queue_next(&lock->queue);
+        /* If there are no remaining readers wake one waiting writer
+         * or all waiting readers.
+         */
+        if (!lock->reader && !qemu_co_queue_next(&lock->wqueue)) {
+            qemu_co_queue_restart_all(&lock->rqueue);
         }
     }
     qemu_co_mutex_unlock(&lock->mutex);
@@ -392,7 +398,7 @@ void qemu_co_rwlock_wrlock(CoRwlock *lock)
     qemu_co_mutex_lock(&lock->mutex);
     lock->pending_writer++;
     while (lock->reader) {
-        qemu_co_queue_wait(&lock->queue, &lock->mutex);
+        qemu_co_queue_wait(&lock->wqueue, &lock->mutex);
     }
     lock->pending_writer--;
 
@@ -411,7 +417,7 @@ void qemu_co_rwlock_upgrade(CoRwlock *lock)
     lock->reader--;
     lock->pending_writer++;
     while (lock->reader) {
-        qemu_co_queue_wait(&lock->queue, &lock->mutex);
+        qemu_co_queue_wait(&lock->wqueue, &lock->mutex);
     }
     lock->pending_writer--;
 
-- 
2.30.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 3/4] coroutine/mutex: Store the coroutine in the CoWaitRecord only once
  2021-03-09 10:21 ` [RFC PATCH 3/4] coroutine/mutex: Store the coroutine in the CoWaitRecord only once David Edmondson
@ 2021-03-09 10:49   ` Paolo Bonzini
  2021-03-09 11:11   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 14+ messages in thread
From: Paolo Bonzini @ 2021-03-09 10:49 UTC (permalink / raw)
  To: David Edmondson, qemu-devel
  Cc: Kevin Wolf, Stefan Weil, Stefan Hajnoczi, qemu-block, Max Reitz

On 09/03/21 11:21, David Edmondson wrote:
> When taking the slow path for mutex acquisition, set the coroutine
> value in the CoWaitRecord in push_waiter(), rather than both there and
> in the caller.
> 
> Signed-off-by: David Edmondson <david.edmondson@oracle.com>
> ---
>   util/qemu-coroutine-lock.c | 1 -
>   1 file changed, 1 deletion(-)
> 
> diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
> index 5816bf8900..eb73cf11dc 100644
> --- a/util/qemu-coroutine-lock.c
> +++ b/util/qemu-coroutine-lock.c
> @@ -204,7 +204,6 @@ static void coroutine_fn qemu_co_mutex_lock_slowpath(AioContext *ctx,
>       unsigned old_handoff;
>   
>       trace_qemu_co_mutex_lock_entry(mutex, self);
> -    w.co = self;
>       push_waiter(mutex, &w);
>   
>       /* This is the "Responsibility Hand-Off" protocol; a lock() picks from
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 4/4] coroutine/rwlock: Wake writers in preference to readers
  2021-03-09 10:21 ` [RFC PATCH 4/4] coroutine/rwlock: Wake writers in preference to readers David Edmondson
@ 2021-03-09 10:59   ` Paolo Bonzini
  2021-03-09 11:06   ` Paolo Bonzini
  1 sibling, 0 replies; 14+ messages in thread
From: Paolo Bonzini @ 2021-03-09 10:59 UTC (permalink / raw)
  To: David Edmondson, qemu-devel
  Cc: Kevin Wolf, Stefan Weil, Stefan Hajnoczi, qemu-block, Max Reitz

On 09/03/21 11:21, David Edmondson wrote:
> A feature of the current rwlock is that if multiple coroutines hold a
> reader lock, all must be runnable. The unlock implementation relies on
> this, choosing to wake a single coroutine when the final read lock
> holder exits the critical section, assuming that it will wake a
> coroutine attempting to acquire a write lock.
> 
> The downgrade implementation violates this assumption by creating a
> read lock owning coroutine that is exclusively runnable - any other
> coroutines that are waiting to acquire a read lock are *not* made
> runnable when the write lock holder converts its ownership to read
> only.
> 
> As a result of this, a coroutine that downgrades a write lock can
> later cause unlock to wake a coroutine that is attempting to acquire a
> read lock rather than one aiming for a write lock, should the
> coroutines be so ordered in the wait queue.
> 
> If the wait queue contains both read hopefuls and write hopefuls, any
> read hopeful coroutine that is woken will immediately go back onto the
> wait queue when it attempts to acquire the rwlock, due to the pending
> write acquisition. At this point there are no coroutines holding
> either read or write locks and no way for the coroutines in the queue
> to be made runnable. A hang ensues.
> 
> Address this by using separate queues for coroutines attempting to
> acquire read and write ownership of the rwlock. When unlocking, prefer
> to make runnable a coroutine that is waiting for a write lock, but if
> none is available, make all coroutines waiting to take a read lock
> runnable.
> 
> Signed-off-by: David Edmondson <david.edmondson@oracle.com>

This is certainly the simplest solution, I like it.  And if I understand 
it correctly, doing this instead in unlock:

         if (lock->reader || !qemu_co_queue_next(&lock->wqueue)) {
             qemu_co_queue_restart_all(&lock->rqueue);

would be incorrect because readers could starve writers.

Regarding this particular bug, do you think you could write a testcase too?

Thanks,

Paolo

> ---
>   include/qemu/coroutine.h   |  8 +++++---
>   util/qemu-coroutine-lock.c | 24 +++++++++++++++---------
>   2 files changed, 20 insertions(+), 12 deletions(-)
> 
> diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
> index 84eab6e3bf..3dfbf57faf 100644
> --- a/include/qemu/coroutine.h
> +++ b/include/qemu/coroutine.h
> @@ -241,7 +241,8 @@ typedef struct CoRwlock {
>       int pending_writer;
>       int reader;
>       CoMutex mutex;
> -    CoQueue queue;
> +    CoQueue rqueue;
> +    CoQueue wqueue;
>   } CoRwlock;
>   
>   /**
> @@ -283,8 +284,9 @@ void qemu_co_rwlock_downgrade(CoRwlock *lock);
>   void qemu_co_rwlock_wrlock(CoRwlock *lock);
>   
>   /**
> - * Unlocks the read/write lock and schedules the next coroutine that was
> - * waiting for this lock to be run.
> + * Unlocks the read/write lock and schedules the next coroutine that
> + * was waiting for this lock to be run, preferring to wake one
> + * attempting to take a write lock over those taking a read lock.
>    */
>   void qemu_co_rwlock_unlock(CoRwlock *lock);
>   
> diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
> index eb73cf11dc..c05c143142 100644
> --- a/util/qemu-coroutine-lock.c
> +++ b/util/qemu-coroutine-lock.c
> @@ -330,7 +330,8 @@ void coroutine_fn qemu_co_mutex_unlock(CoMutex *mutex)
>   void qemu_co_rwlock_init(CoRwlock *lock)
>   {
>       memset(lock, 0, sizeof(*lock));
> -    qemu_co_queue_init(&lock->queue);
> +    qemu_co_queue_init(&lock->rqueue);
> +    qemu_co_queue_init(&lock->wqueue);
>       qemu_co_mutex_init(&lock->mutex);
>   }
>   
> @@ -341,7 +342,7 @@ void qemu_co_rwlock_rdlock(CoRwlock *lock)
>       qemu_co_mutex_lock(&lock->mutex);
>       /* For fairness, wait if a writer is in line.  */
>       while (lock->pending_writer) {
> -        qemu_co_queue_wait(&lock->queue, &lock->mutex);
> +        qemu_co_queue_wait(&lock->rqueue, &lock->mutex);
>       }
>       lock->reader++;
>       qemu_co_mutex_unlock(&lock->mutex);
> @@ -356,17 +357,22 @@ void qemu_co_rwlock_unlock(CoRwlock *lock)
>   
>       assert(qemu_in_coroutine());
>       if (!lock->reader) {
> -        /* The critical section started in qemu_co_rwlock_wrlock.  */
> -        qemu_co_queue_restart_all(&lock->queue);
> +        /* The critical section started in qemu_co_rwlock_wrlock or
> +         * qemu_co_rwlock_upgrade.
> +         */
> +        qemu_co_queue_restart_all(&lock->wqueue);
> +        qemu_co_queue_restart_all(&lock->rqueue);
>       } else {
>           self->locks_held--;
>   
>           qemu_co_mutex_lock(&lock->mutex);
>           lock->reader--;
>           assert(lock->reader >= 0);
> -        /* Wakeup only one waiting writer */
> -        if (!lock->reader) {
> -            qemu_co_queue_next(&lock->queue);
> +        /* If there are no remaining readers wake one waiting writer
> +         * or all waiting readers.
> +         */
> +        if (!lock->reader && !qemu_co_queue_next(&lock->wqueue)) {
> +            qemu_co_queue_restart_all(&lock->rqueue);
>           }
>       }
>       qemu_co_mutex_unlock(&lock->mutex);
> @@ -392,7 +398,7 @@ void qemu_co_rwlock_wrlock(CoRwlock *lock)
>       qemu_co_mutex_lock(&lock->mutex);
>       lock->pending_writer++;
>       while (lock->reader) {
> -        qemu_co_queue_wait(&lock->queue, &lock->mutex);
> +        qemu_co_queue_wait(&lock->wqueue, &lock->mutex);
>       }
>       lock->pending_writer--;
>   
> @@ -411,7 +417,7 @@ void qemu_co_rwlock_upgrade(CoRwlock *lock)
>       lock->reader--;
>       lock->pending_writer++;
>       while (lock->reader) {
> -        qemu_co_queue_wait(&lock->queue, &lock->mutex);
> +        qemu_co_queue_wait(&lock->wqueue, &lock->mutex);
>       }
>       lock->pending_writer--;
>   
> 



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 4/4] coroutine/rwlock: Wake writers in preference to readers
  2021-03-09 10:21 ` [RFC PATCH 4/4] coroutine/rwlock: Wake writers in preference to readers David Edmondson
  2021-03-09 10:59   ` Paolo Bonzini
@ 2021-03-09 11:06   ` Paolo Bonzini
  2021-03-09 11:57     ` David Edmondson
  1 sibling, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2021-03-09 11:06 UTC (permalink / raw)
  To: David Edmondson, qemu-devel
  Cc: Kevin Wolf, Stefan Weil, Stefan Hajnoczi, qemu-block, Max Reitz

On 09/03/21 11:21, David Edmondson wrote:
> -        /* The critical section started in qemu_co_rwlock_wrlock.  */
> -        qemu_co_queue_restart_all(&lock->queue);
> +        /* The critical section started in qemu_co_rwlock_wrlock or
> +         * qemu_co_rwlock_upgrade.
> +         */
> +        qemu_co_queue_restart_all(&lock->wqueue);
> +        qemu_co_queue_restart_all(&lock->rqueue);

Hmm, the devil is in the details---this is a thundering herd waiting to 
happen.  But fortunately this can be fixed while making the unlock 
primitive even simpler:

     if (lock->reader) {
          self->locks_held--;

          /* Read-side critical sections do not keep lock->mutex.  */
          qemu_co_mutex_lock(&lock->mutex);
          lock->reader--;
          assert(lock->reader >= 0);
      }

      /* If there are no remaining readers wake one waiting writer
       * or all waiting readers.
       */
      if (!lock->reader && !qemu_co_queue_next(&lock->wqueue)) {
          assert(!lock->pending_writer);
          qemu_co_queue_restart_all(&lock->rqueue);
      }

Thanks,

Paolo



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 1/4] block/vdi: When writing new bmap entry fails, don't leak the buffer
  2021-03-09 10:21 ` [RFC PATCH 1/4] block/vdi: When writing new bmap entry fails, don't leak the buffer David Edmondson
@ 2021-03-09 11:09   ` Philippe Mathieu-Daudé
  2021-03-09 11:58     ` David Edmondson
  0 siblings, 1 reply; 14+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-03-09 11:09 UTC (permalink / raw)
  To: David Edmondson, qemu-devel
  Cc: Kevin Wolf, Stefan Weil, Stefan Hajnoczi, qemu-block, Max Reitz

On 3/9/21 11:21 AM, David Edmondson wrote:
> If a new bitmap entry is allocated, requiring the entire block to be
> written, avoid leaking the buffer allocated for the block should
> the write fail.
> 
> Signed-off-by: David Edmondson <david.edmondson@oracle.com>

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>

> ---
>  block/vdi.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/block/vdi.c b/block/vdi.c
> index 5627e7d764..2a6dc26124 100644
> --- a/block/vdi.c
> +++ b/block/vdi.c
> @@ -690,6 +690,7 @@ nonallocating_write:
>  
>      logout("finished data write\n");
>      if (ret < 0) {
> +        g_free(block);
>          return ret;
>      }

Alternative using g_autofree:

-- >8 --
diff --git a/block/vdi.c b/block/vdi.c
index 5627e7d764a..1cd8ae2ba99 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -612,7 +612,7 @@ vdi_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
     uint64_t data_offset;
     uint32_t bmap_first = VDI_UNALLOCATED;
     uint32_t bmap_last = VDI_UNALLOCATED;
-    uint8_t *block = NULL;
+    g_autofree uint8_t *block = NULL;
     uint64_t bytes_done = 0;
     int ret = 0;

@@ -705,9 +705,6 @@ nonallocating_write:
         *header = s->header;
         vdi_header_to_le(header);
         ret = bdrv_pwrite(bs->file, 0, block, sizeof(VdiHeader));
-        g_free(block);
-        block = NULL;
-
         if (ret < 0) {
             return ret;
         }
---



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 3/4] coroutine/mutex: Store the coroutine in the CoWaitRecord only once
  2021-03-09 10:21 ` [RFC PATCH 3/4] coroutine/mutex: Store the coroutine in the CoWaitRecord only once David Edmondson
  2021-03-09 10:49   ` Paolo Bonzini
@ 2021-03-09 11:11   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 14+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-03-09 11:11 UTC (permalink / raw)
  To: David Edmondson, qemu-devel
  Cc: Kevin Wolf, Stefan Weil, Stefan Hajnoczi, qemu-block, Max Reitz

On 3/9/21 11:21 AM, David Edmondson wrote:
> When taking the slow path for mutex acquisition, set the coroutine
> value in the CoWaitRecord in push_waiter(), rather than both there and
> in the caller.
> 
> Signed-off-by: David Edmondson <david.edmondson@oracle.com>
> ---
>  util/qemu-coroutine-lock.c | 1 -
>  1 file changed, 1 deletion(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 4/4] coroutine/rwlock: Wake writers in preference to readers
  2021-03-09 11:06   ` Paolo Bonzini
@ 2021-03-09 11:57     ` David Edmondson
  0 siblings, 0 replies; 14+ messages in thread
From: David Edmondson @ 2021-03-09 11:57 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel
  Cc: Kevin Wolf, Stefan Weil, Stefan Hajnoczi, qemu-block, Max Reitz

On Tuesday, 2021-03-09 at 12:06:22 +01, Paolo Bonzini wrote:

> On 09/03/21 11:21, David Edmondson wrote:
>> -        /* The critical section started in qemu_co_rwlock_wrlock.  */
>> -        qemu_co_queue_restart_all(&lock->queue);
>> +        /* The critical section started in qemu_co_rwlock_wrlock or
>> +         * qemu_co_rwlock_upgrade.
>> +         */
>> +        qemu_co_queue_restart_all(&lock->wqueue);
>> +        qemu_co_queue_restart_all(&lock->rqueue);
>
> Hmm, the devil is in the details---this is a thundering herd waiting to 
> happen.  But fortunately this can be fixed while making the unlock 
> primitive even simpler:
>
>      if (lock->reader) {
>           self->locks_held--;
>
>           /* Read-side critical sections do not keep lock->mutex.  */
>           qemu_co_mutex_lock(&lock->mutex);
>           lock->reader--;
>           assert(lock->reader >= 0);
>       }
>
>       /* If there are no remaining readers wake one waiting writer
>        * or all waiting readers.
>        */
>       if (!lock->reader && !qemu_co_queue_next(&lock->wqueue)) {
>           assert(!lock->pending_writer);
>           qemu_co_queue_restart_all(&lock->rqueue);
>       }

That's a nice improvement, I agree. I'll roll it into another revision
and work on a test.

dme.
-- 
But uh oh, I love her because, she moves in her own way.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 1/4] block/vdi: When writing new bmap entry fails, don't leak the buffer
  2021-03-09 11:09   ` Philippe Mathieu-Daudé
@ 2021-03-09 11:58     ` David Edmondson
  2021-03-09 12:06       ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 14+ messages in thread
From: David Edmondson @ 2021-03-09 11:58 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel
  Cc: Kevin Wolf, Stefan Weil, Stefan Hajnoczi, qemu-block, Max Reitz

On Tuesday, 2021-03-09 at 12:09:55 +01, Philippe Mathieu-Daudé wrote:

> On 3/9/21 11:21 AM, David Edmondson wrote:
>> If a new bitmap entry is allocated, requiring the entire block to be
>> written, avoid leaking the buffer allocated for the block should
>> the write fail.
>> 
>> Signed-off-by: David Edmondson <david.edmondson@oracle.com>
>
> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>

Thanks.

>> ---
>>  block/vdi.c | 1 +
>>  1 file changed, 1 insertion(+)
>> 
>> diff --git a/block/vdi.c b/block/vdi.c
>> index 5627e7d764..2a6dc26124 100644
>> --- a/block/vdi.c
>> +++ b/block/vdi.c
>> @@ -690,6 +690,7 @@ nonallocating_write:
>>  
>>      logout("finished data write\n");
>>      if (ret < 0) {
>> +        g_free(block);
>>          return ret;
>>      }
>
> Alternative using g_autofree:

Newfangled witchy magic!

I'm happy to change it if you think it beneficial.

> -- >8 --
> diff --git a/block/vdi.c b/block/vdi.c
> index 5627e7d764a..1cd8ae2ba99 100644
> --- a/block/vdi.c
> +++ b/block/vdi.c
> @@ -612,7 +612,7 @@ vdi_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
>      uint64_t data_offset;
>      uint32_t bmap_first = VDI_UNALLOCATED;
>      uint32_t bmap_last = VDI_UNALLOCATED;
> -    uint8_t *block = NULL;
> +    g_autofree uint8_t *block = NULL;
>      uint64_t bytes_done = 0;
>      int ret = 0;
>
> @@ -705,9 +705,6 @@ nonallocating_write:
>          *header = s->header;
>          vdi_header_to_le(header);
>          ret = bdrv_pwrite(bs->file, 0, block, sizeof(VdiHeader));
> -        g_free(block);
> -        block = NULL;
> -
>          if (ret < 0) {
>              return ret;
>          }
> ---

dme.
-- 
Tonight I think I'll walk alone, I'll find my soul as I go home.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 1/4] block/vdi: When writing new bmap entry fails, don't leak the buffer
  2021-03-09 11:58     ` David Edmondson
@ 2021-03-09 12:06       ` Philippe Mathieu-Daudé
  2021-03-09 13:07         ` Paolo Bonzini
  0 siblings, 1 reply; 14+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-03-09 12:06 UTC (permalink / raw)
  To: David Edmondson, qemu-devel
  Cc: Kevin Wolf, Stefan Weil, Stefan Hajnoczi, qemu-block, Max Reitz

On 3/9/21 12:58 PM, David Edmondson wrote:
> On Tuesday, 2021-03-09 at 12:09:55 +01, Philippe Mathieu-Daudé wrote:
> 
>> On 3/9/21 11:21 AM, David Edmondson wrote:
>>> If a new bitmap entry is allocated, requiring the entire block to be
>>> written, avoid leaking the buffer allocated for the block should
>>> the write fail.
>>>
>>> Signed-off-by: David Edmondson <david.edmondson@oracle.com>
>>
>> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> 
> Thanks.
> 
>>> ---
>>>  block/vdi.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/block/vdi.c b/block/vdi.c
>>> index 5627e7d764..2a6dc26124 100644
>>> --- a/block/vdi.c
>>> +++ b/block/vdi.c
>>> @@ -690,6 +690,7 @@ nonallocating_write:
>>>  
>>>      logout("finished data write\n");
>>>      if (ret < 0) {
>>> +        g_free(block);
>>>          return ret;
>>>      }
>>
>> Alternative using g_autofree:
> 
> Newfangled witchy magic!
> 
> I'm happy to change it if you think it beneficial.

I then saw the next patch which keeps modifying the same
function, so this might not be a great improvement after
all.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 1/4] block/vdi: When writing new bmap entry fails, don't leak the buffer
  2021-03-09 12:06       ` Philippe Mathieu-Daudé
@ 2021-03-09 13:07         ` Paolo Bonzini
  0 siblings, 0 replies; 14+ messages in thread
From: Paolo Bonzini @ 2021-03-09 13:07 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, David Edmondson, qemu-devel
  Cc: Kevin Wolf, Stefan Weil, qemu-block, Stefan Hajnoczi, Max Reitz

On 09/03/21 13:06, Philippe Mathieu-Daudé wrote:
>> Newfangled witchy magic!
>>
>> I'm happy to change it if you think it beneficial.
> 
> I then saw the next patch which keeps modifying the same
> function, so this might not be a great improvement after
> all.

Yeah I was also going to suggest it but considering patch 2 it doesn't 
really flow well.

paolo



^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-03-09 13:23 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-09 10:21 [RFC PATCH 0/4] coroutine rwlock downgrade fix, minor VDI changes David Edmondson
2021-03-09 10:21 ` [RFC PATCH 1/4] block/vdi: When writing new bmap entry fails, don't leak the buffer David Edmondson
2021-03-09 11:09   ` Philippe Mathieu-Daudé
2021-03-09 11:58     ` David Edmondson
2021-03-09 12:06       ` Philippe Mathieu-Daudé
2021-03-09 13:07         ` Paolo Bonzini
2021-03-09 10:21 ` [RFC PATCH 2/4] block/vdi: Don't assume that blocks are larger than VdiHeader David Edmondson
2021-03-09 10:21 ` [RFC PATCH 3/4] coroutine/mutex: Store the coroutine in the CoWaitRecord only once David Edmondson
2021-03-09 10:49   ` Paolo Bonzini
2021-03-09 11:11   ` Philippe Mathieu-Daudé
2021-03-09 10:21 ` [RFC PATCH 4/4] coroutine/rwlock: Wake writers in preference to readers David Edmondson
2021-03-09 10:59   ` Paolo Bonzini
2021-03-09 11:06   ` Paolo Bonzini
2021-03-09 11:57     ` David Edmondson
