qemu-devel.nongnu.org archive mirror
* [PATCH 0/2] mirror: Fix hang (operation waiting for itself/circular dependency)
@ 2020-03-25 17:23 Kevin Wolf
  2020-03-25 17:23 ` [PATCH 1/2] Revert "mirror: Don't let an operation wait for itself" Kevin Wolf
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Kevin Wolf @ 2020-03-25 17:23 UTC
  To: qemu-block; +Cc: kwolf, jsnow, qemu-devel, mreitz

The recent fix didn't actually fix the whole problem. Not only can an
operation wait for itself, but we can also end up with circular
dependencies, such as two operations waiting for each other to complete.

This reverts the first fix and implements another approach.
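
To illustrate why skipping only the caller is not enough, here is a
minimal sketch in C. It is not taken from block/mirror.c: the Op struct
and the functions pick_skip_self(), waits_in_cycle() and
demo_mutual_wait() are hypothetical names for this example, the real
code waits on a CoQueue instead of storing a pointer, and the pseudo-op
and active-write checks are left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an entry in s->ops_in_flight. */
typedef struct Op {
    int id;
    struct Op *waits_for;   /* op whose completion this op is blocked on */
} Op;

/* Selection rule of the reverted fix: take the first op that is not self. */
static Op *pick_skip_self(Op *ops, size_t n, const Op *self)
{
    for (size_t i = 0; i < n; i++) {
        if (&ops[i] != self) {
            return &ops[i];
        }
    }
    return NULL;
}

/* Follow waits_for links; report true if they lead back to start. */
static bool waits_in_cycle(const Op *start, size_t max_steps)
{
    const Op *cur = start->waits_for;

    for (size_t i = 0; i < max_steps && cur != NULL; i++) {
        if (cur == start) {
            return true;
        }
        cur = cur->waits_for;
    }
    return false;
}

/*
 * Two ops are queued and both need a free slot, but neither has started
 * its I/O yet. Each one skips itself and therefore picks the other.
 */
bool demo_mutual_wait(void)
{
    Op ops[2] = { { .id = 0 }, { .id = 1 } };

    ops[0].waits_for = pick_skip_self(ops, 2, &ops[0]);
    ops[1].waits_for = pick_skip_self(ops, 2, &ops[1]);
    assert(ops[0].waits_for == &ops[1]);
    assert(ops[1].waits_for == &ops[0]);

    /* Neither op can complete, so neither ever wakes the other. */
    return waits_in_cycle(&ops[0], 2);
}
```

In this model the self-check removes the one-element cycle but leaves
the two-element cycle intact, which is the situation described above.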

Kevin Wolf (2):
  Revert "mirror: Don't let an operation wait for itself"
  mirror: Wait only for in-flight operations

 block/mirror.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

-- 
2.20.1




* [PATCH 1/2] Revert "mirror: Don't let an operation wait for itself"
  2020-03-25 17:23 [PATCH 0/2] mirror: Fix hang (operation waiting for itself/circular dependency) Kevin Wolf
@ 2020-03-25 17:23 ` Kevin Wolf
  2020-03-25 17:36   ` Eric Blake
  2020-03-25 17:23 ` [PATCH 2/2] mirror: Wait only for in-flight operations Kevin Wolf
  2020-03-25 17:49 ` [PATCH 0/2] mirror: Fix hang (operation waiting for itself/circular dependency) Kevin Wolf
  2 siblings, 1 reply; 6+ messages in thread
From: Kevin Wolf @ 2020-03-25 17:23 UTC
  To: qemu-block; +Cc: kwolf, jsnow, qemu-devel, mreitz

This reverts commit 7e6c4ff792734e196c8ca82564c56b5e7c6288ca.

The fix was incomplete as it only protected against requests waiting for
themselves, but not against requests waiting for each other. We need a
different solution.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/mirror.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 447051dbc6..393131b135 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -283,14 +283,11 @@ static int mirror_cow_align(MirrorBlockJob *s, int64_t *offset,
 }
 
 static inline void coroutine_fn
-mirror_wait_for_any_operation(MirrorBlockJob *s, MirrorOp *self, bool active)
+mirror_wait_for_any_operation(MirrorBlockJob *s, bool active)
 {
     MirrorOp *op;
 
     QTAILQ_FOREACH(op, &s->ops_in_flight, next) {
-        if (self == op) {
-            continue;
-        }
         /* Do not wait on pseudo ops, because it may in turn wait on
          * some other operation to start, which may in fact be the
          * caller of this function.  Since there is only one pseudo op
@@ -305,10 +302,10 @@ mirror_wait_for_any_operation(MirrorBlockJob *s, MirrorOp *self, bool active)
 }
 
 static inline void coroutine_fn
-mirror_wait_for_free_in_flight_slot(MirrorBlockJob *s, MirrorOp *self)
+mirror_wait_for_free_in_flight_slot(MirrorBlockJob *s)
 {
     /* Only non-active operations use up in-flight slots */
-    mirror_wait_for_any_operation(s, self, false);
+    mirror_wait_for_any_operation(s, false);
 }
 
 /* Perform a mirror copy operation.
@@ -351,7 +348,7 @@ static void coroutine_fn mirror_co_read(void *opaque)
 
     while (s->buf_free_count < nb_chunks) {
         trace_mirror_yield_in_flight(s, op->offset, s->in_flight);
-        mirror_wait_for_free_in_flight_slot(s, op);
+        mirror_wait_for_free_in_flight_slot(s);
     }
 
     /* Now make a QEMUIOVector taking enough granularity-sized chunks
@@ -558,7 +555,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 
         while (s->in_flight >= MAX_IN_FLIGHT) {
             trace_mirror_yield_in_flight(s, offset, s->in_flight);
-            mirror_wait_for_free_in_flight_slot(s, pseudo_op);
+            mirror_wait_for_free_in_flight_slot(s);
         }
 
         if (s->ret < 0) {
@@ -612,7 +609,7 @@ static void mirror_free_init(MirrorBlockJob *s)
 static void coroutine_fn mirror_wait_for_all_io(MirrorBlockJob *s)
 {
     while (s->in_flight > 0) {
-        mirror_wait_for_free_in_flight_slot(s, NULL);
+        mirror_wait_for_free_in_flight_slot(s);
     }
 }
 
@@ -809,7 +806,7 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
             if (s->in_flight >= MAX_IN_FLIGHT) {
                 trace_mirror_yield(s, UINT64_MAX, s->buf_free_count,
                                    s->in_flight);
-                mirror_wait_for_free_in_flight_slot(s, NULL);
+                mirror_wait_for_free_in_flight_slot(s);
                 continue;
             }
 
@@ -962,7 +959,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
         /* Do not start passive operations while there are active
          * writes in progress */
         while (s->in_active_write_counter) {
-            mirror_wait_for_any_operation(s, NULL, true);
+            mirror_wait_for_any_operation(s, true);
         }
 
         if (s->ret < 0) {
@@ -988,7 +985,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
             if (s->in_flight >= MAX_IN_FLIGHT || s->buf_free_count == 0 ||
                 (cnt == 0 && s->in_flight > 0)) {
                 trace_mirror_yield(s, cnt, s->buf_free_count, s->in_flight);
-                mirror_wait_for_free_in_flight_slot(s, NULL);
+                mirror_wait_for_free_in_flight_slot(s);
                 continue;
             } else if (cnt != 0) {
                 delay_ns = mirror_iteration(s);
-- 
2.20.1




* [PATCH 2/2] mirror: Wait only for in-flight operations
  2020-03-25 17:23 [PATCH 0/2] mirror: Fix hang (operation waiting for itself/circular dependency) Kevin Wolf
  2020-03-25 17:23 ` [PATCH 1/2] Revert "mirror: Don't let an operation wait for itself" Kevin Wolf
@ 2020-03-25 17:23 ` Kevin Wolf
  2020-03-25 17:39   ` Eric Blake
  2020-03-25 17:49 ` [PATCH 0/2] mirror: Fix hang (operation waiting for itself/circular dependency) Kevin Wolf
  2 siblings, 1 reply; 6+ messages in thread
From: Kevin Wolf @ 2020-03-25 17:23 UTC
  To: qemu-block; +Cc: kwolf, jsnow, qemu-devel, mreitz

mirror_wait_for_free_in_flight_slot() just picks a random operation to
wait for. However, a MirrorOp is already in s->ops_in_flight when
mirror_co_read() waits for free slots, so if not enough slots are
immediately available, an operation can end up waiting for itself, or
two or more operations can wait for each other to complete, which
results in a hang.

Fix this by adding a flag to MirrorOp that tells us if the request is
already in flight (and therefore occupies slots that it will later
free), and picking only such operations for waiting.
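
The selection rule described above can be sketched in isolation. This
is only an illustration under stated assumptions: SketchOp and
pick_wait_target() are hypothetical names, the struct keeps just the
three flags that the condition uses, and returning -1 when nothing
qualifies is a choice made for this sketch, not the behaviour of
block/mirror.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for MirrorOp, reduced to the flags being tested. */
typedef struct SketchOp {
    bool is_pseudo_op;
    bool is_active_write;
    bool is_in_flight;     /* set once the op has claimed its slots */
} SketchOp;

/* Same shape as the condition in the patched loop. */
static bool can_wait_on(const SketchOp *op, bool active)
{
    return !op->is_pseudo_op && op->is_in_flight &&
           op->is_active_write == active;
}

/*
 * Return the index of the first op that is safe to wait on, or -1 if
 * there is none (assumption of this sketch).
 */
int pick_wait_target(const SketchOp *ops, size_t n, bool active)
{
    assert(ops != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (can_wait_on(&ops[i], active)) {
            return (int)i;
        }
    }
    return -1;
}
```

An op that is still waiting for slots has is_in_flight == false and is
never chosen, so in this model a waiter can only block on an op that
already holds slots and will release them when it completes.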

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1794692
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/mirror.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/block/mirror.c b/block/mirror.c
index 393131b135..7fef52ded2 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -102,6 +102,7 @@ struct MirrorOp {
 
     bool is_pseudo_op;
     bool is_active_write;
+    bool is_in_flight;
     CoQueue waiting_requests;
     Coroutine *co;
 
@@ -293,7 +294,9 @@ mirror_wait_for_any_operation(MirrorBlockJob *s, bool active)
          * caller of this function.  Since there is only one pseudo op
          * at any given time, we will always find some real operation
          * to wait on. */
-        if (!op->is_pseudo_op && op->is_active_write == active) {
+        if (!op->is_pseudo_op && op->is_in_flight &&
+            op->is_active_write == active)
+        {
             qemu_co_queue_wait(&op->waiting_requests, NULL);
             return;
         }
@@ -367,6 +370,7 @@ static void coroutine_fn mirror_co_read(void *opaque)
     /* Copy the dirty cluster.  */
     s->in_flight++;
     s->bytes_in_flight += op->bytes;
+    op->is_in_flight = true;
     trace_mirror_one_iteration(s, op->offset, op->bytes);
 
     ret = bdrv_co_preadv(s->mirror_top_bs->backing, op->offset, op->bytes,
@@ -382,6 +386,7 @@ static void coroutine_fn mirror_co_zero(void *opaque)
     op->s->in_flight++;
     op->s->bytes_in_flight += op->bytes;
     *op->bytes_handled = op->bytes;
+    op->is_in_flight = true;
 
     ret = blk_co_pwrite_zeroes(op->s->target, op->offset, op->bytes,
                                op->s->unmap ? BDRV_REQ_MAY_UNMAP : 0);
@@ -396,6 +401,7 @@ static void coroutine_fn mirror_co_discard(void *opaque)
     op->s->in_flight++;
     op->s->bytes_in_flight += op->bytes;
     *op->bytes_handled = op->bytes;
+    op->is_in_flight = true;
 
     ret = blk_co_pdiscard(op->s->target, op->offset, op->bytes);
     mirror_write_complete(op, ret);
-- 
2.20.1




* Re: [PATCH 1/2] Revert "mirror: Don't let an operation wait for itself"
  2020-03-25 17:23 ` [PATCH 1/2] Revert "mirror: Don't let an operation wait for itself" Kevin Wolf
@ 2020-03-25 17:36   ` Eric Blake
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Blake @ 2020-03-25 17:36 UTC
  To: Kevin Wolf, qemu-block; +Cc: jsnow, qemu-devel, mreitz

On 3/25/20 12:23 PM, Kevin Wolf wrote:
> This reverts commit 7e6c4ff792734e196c8ca82564c56b5e7c6288ca.
> 
> The fix was incomplete as it only protected against requests waiting for
> themselves, but not against requests waiting for each other. We need a
> different solution.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>   block/mirror.c | 21 +++++++++------------
>   1 file changed, 9 insertions(+), 12 deletions(-)

Reviewed-by: Eric Blake <eblake@redhat.com>


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH 2/2] mirror: Wait only for in-flight operations
  2020-03-25 17:23 ` [PATCH 2/2] mirror: Wait only for in-flight operations Kevin Wolf
@ 2020-03-25 17:39   ` Eric Blake
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Blake @ 2020-03-25 17:39 UTC
  To: Kevin Wolf, qemu-block; +Cc: jsnow, qemu-devel, mreitz

On 3/25/20 12:23 PM, Kevin Wolf wrote:
> mirror_wait_for_free_in_flight_slot() just picks a random operation to
> wait for. However, a MirrorOp is already in s->ops_in_flight when
> mirror_co_read() waits for free slots, so if not enough slots are
> immediately available, an operation can end up waiting for itself, or
> two or more operations can wait for each other to complete, which
> results in a hang.
> 
> Fix this by adding a flag to MirrorOp that tells us if the request is
> already in flight (and therefore occupies slots that it will later
> free), and picking only such operations for waiting.
> 
> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1794692
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>   block/mirror.c | 8 +++++++-
>   1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 393131b135..7fef52ded2 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -102,6 +102,7 @@ struct MirrorOp {
>   
>       bool is_pseudo_op;
>       bool is_active_write;
> +    bool is_in_flight;
>       CoQueue waiting_requests;
>       Coroutine *co;
>   
> @@ -293,7 +294,9 @@ mirror_wait_for_any_operation(MirrorBlockJob *s, bool active)
>            * caller of this function.  Since there is only one pseudo op
>            * at any given time, we will always find some real operation
>            * to wait on. */
> -        if (!op->is_pseudo_op && op->is_active_write == active) {
> +        if (!op->is_pseudo_op && op->is_in_flight &&
> +            op->is_active_write == active)
> +        {
>               qemu_co_queue_wait(&op->waiting_requests, NULL);

Looks like a one-way transition - op->is_in_flight always starts as 
false, and only ever gets set to true (once the op is finished, op is no 
longer needed).  And being more selective on what you wait for here does 
look like it should work in more cases than what patch 1 reverted.

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH 0/2] mirror: Fix hang (operation waiting for itself/circular dependency)
  2020-03-25 17:23 [PATCH 0/2] mirror: Fix hang (operation waiting for itself/circular dependency) Kevin Wolf
  2020-03-25 17:23 ` [PATCH 1/2] Revert "mirror: Don't let an operation wait for itself" Kevin Wolf
  2020-03-25 17:23 ` [PATCH 2/2] mirror: Wait only for in-flight operations Kevin Wolf
@ 2020-03-25 17:49 ` Kevin Wolf
  2 siblings, 0 replies; 6+ messages in thread
From: Kevin Wolf @ 2020-03-25 17:49 UTC
  To: qemu-block; +Cc: jsnow, qemu-devel, mreitz

On 25.03.2020 at 18:23, Kevin Wolf wrote:
> The recent fix didn't actually fix the whole problem. Not only can an
> operation wait for itself, but we can also end up with circular
> dependencies, such as two operations waiting for each other to complete.
> 
> This reverts the first fix and implements another approach.

Hm, somehow this seems to break iotests 151. I don't actually understand
the backtrace, because that's during job initialisation, so my changes
shouldn't have had any effect yet:

(gdb) bt
#0  0x00007fba6d85057f in raise () at /lib64/libc.so.6
#1  0x00007fba6d83a895 in abort () at /lib64/libc.so.6
#2  0x00005624d94d109a in bitmap_new (nbits=<optimized out>) at /home/kwolf/source/qemu/include/qemu/bitmap.h:103
#3  0x00005624d94d109a in mirror_run (job=0x5624dc8d5560, errp=<optimized out>) at block/mirror.c:922
#4  0x00005624d988053f in job_co_entry (opaque=0x5624dc8d5560) at job.c:878
#5  0x00005624d998d3bb in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:115
#6  0x00007fba6d866250 in __start_context () at /lib64/libc.so.6
#7  0x00007fffa2d48130 in  ()
#8  0x0000000000000000 in  ()

Something to check tomorrow.

Kevin



