* [PATCH 0/3] Fix active mirror dead-lock
@ 2021-07-02 21:16 Vladimir Sementsov-Ogievskiy
From: Vladimir Sementsov-Ogievskiy @ 2021-07-02 21:16 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, mreitz, kwolf, vsementsov, jsnow, dem

Hi all!

We've hit a dead-lock in active mirror in our Rhev-8.4-based Qemu
build, and it's reproducible on master too.

Vladimir Sementsov-Ogievskiy (3):
  block/mirror: set .co for active-write MirrorOp objects
  iotest 151: add test-case that shows active mirror dead-lock
  block/mirror: fix active mirror dead-lock in mirror_wait_on_conflicts

 block/mirror.c             | 13 +++++++++
 tests/qemu-iotests/151     | 54 ++++++++++++++++++++++++++++++++++++--
 tests/qemu-iotests/151.out |  4 +--
 3 files changed, 67 insertions(+), 4 deletions(-)

-- 
2.29.2




* [PATCH 1/3] block/mirror: set .co for active-write MirrorOp objects
From: Vladimir Sementsov-Ogievskiy @ 2021-07-02 21:16 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, mreitz, kwolf, vsementsov, jsnow, dem

This field is unused, but it is very helpful for debugging.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/mirror.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/mirror.c b/block/mirror.c
index 019f6deaa5..ad6aac2f95 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1343,6 +1343,7 @@ static MirrorOp *coroutine_fn active_write_prepare(MirrorBlockJob *s,
         .bytes              = bytes,
         .is_active_write    = true,
         .is_in_flight       = true,
+        .co                 = qemu_coroutine_self(),
     };
     qemu_co_queue_init(&op->waiting_requests);
     QTAILQ_INSERT_TAIL(&s->ops_in_flight, op, next);
-- 
2.29.2




* [PATCH 2/3] iotest 151: add test-case that shows active mirror dead-lock
From: Vladimir Sementsov-Ogievskiy @ 2021-07-02 21:16 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, mreitz, kwolf, vsementsov, jsnow, dem

There is a dead-lock in active mirror: when we have parallel
intersecting requests (note that non-intersecting requests may be
considered intersecting after aligning to mirror granularity), it may
happen that request A waits for request B in mirror_wait_on_conflicts()
while request B waits for A.
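To illustrate the granularity remark above, here is a minimal Python sketch (not QEMU code) of the chunk arithmetic. It assumes a request's chunk range is obtained by rounding its start down and its end up to the mirror granularity; the 64 KiB granularity is an arbitrary assumption, and chunk_range() is a hypothetical helper. ranges_overlap() is modelled on the QEMU helper of the same name used in mirror_wait_on_conflicts().

```python
GRANULARITY = 64 * 1024  # assumed mirror granularity (64 KiB)


def chunk_range(offset, nbytes, granularity):
    """Return (start_chunk, nb_chunks) covering bytes [offset, offset + nbytes)."""
    start = offset // granularity                  # round start down
    end = -(-(offset + nbytes) // granularity)     # round end up (ceiling division)
    return start, end - start


def ranges_overlap(first1, len1, first2, len2):
    """True if [first1, first1 + len1) and [first2, first2 + len2) intersect."""
    last1 = first1 + len1 - 1
    last2 = first2 + len2 - 1
    return not (last2 < first1 or last1 < first2)


a = (0, 4096)      # bytes [0, 4 KiB)
b = (4096, 4096)   # bytes [4 KiB, 8 KiB): disjoint from a

byte_overlap = ranges_overlap(a[0], a[1], b[0], b[1])
chunk_overlap = ranges_overlap(*chunk_range(*a, GRANULARITY),
                               *chunk_range(*b, GRANULARITY))
print(byte_overlap, chunk_overlap)  # False True
```

With a 64 KiB granularity both 4 KiB writes land in chunk 0, so they are treated as conflicting even though their byte ranges are disjoint.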

See the test for details. The test currently dead-locks, which is why
it's disabled. The next commit will fix mirror and enable the test.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/151     | 62 ++++++++++++++++++++++++++++++++++++--
 tests/qemu-iotests/151.out |  4 +--
 2 files changed, 62 insertions(+), 4 deletions(-)

diff --git a/tests/qemu-iotests/151 b/tests/qemu-iotests/151
index 182f6b5321..ab46c5e8ba 100755
--- a/tests/qemu-iotests/151
+++ b/tests/qemu-iotests/151
@@ -38,8 +38,9 @@ class TestActiveMirror(iotests.QMPTestCase):
                       'if': 'none',
                       'node-name': 'source-node',
                       'driver': iotests.imgfmt,
-                      'file': {'driver': 'file',
-                               'filename': source_img}}
+                      'file': {'driver': 'blkdebug',
+                               'image': {'driver': 'file',
+                                         'filename': source_img}}}
 
         blk_target = {'node-name': 'target-node',
                       'driver': iotests.imgfmt,
@@ -141,6 +142,63 @@ class TestActiveMirror(iotests.QMPTestCase):
 
         self.potential_writes_in_flight = False
 
+    def testIntersectingActiveIO(self):
+        # FIXME: test-case is dead-locking. To reproduce dead-lock just drop
+        # this return statement
+        return
+
+        # Fill the source image
+        result = self.vm.hmp_qemu_io('source', 'write -P 1 0 2M')
+
+        # Start the block job (very slowly)
+        result = self.vm.qmp('blockdev-mirror',
+                             job_id='mirror',
+                             filter_node_name='mirror-node',
+                             device='source-node',
+                             target='target-node',
+                             sync='full',
+                             copy_mode='write-blocking',
+                             speed=1)
+
+        self.vm.hmp_qemu_io('source', 'break write_aio A')
+        self.vm.hmp_qemu_io('source', 'aio_write 0 1M')  # 1
+        self.vm.hmp_qemu_io('source', 'wait_break A')
+        self.vm.hmp_qemu_io('source', 'aio_write 0 2M')  # 2
+        self.vm.hmp_qemu_io('source', 'aio_write 0 2M')  # 3
+
+        # Now 2 and 3 are in mirror_wait_on_conflicts, waiting for 1
+
+        self.vm.hmp_qemu_io('source', 'break write_aio B')
+        self.vm.hmp_qemu_io('source', 'aio_write 1M 2M')  # 4
+        self.vm.hmp_qemu_io('source', 'wait_break B')
+
+        # 4 doesn't wait for 2 and 3, because they didn't yet set
+        # in_flight_bitmap. So, nothing prevents 4 to go except for our
+        # break-point B.
+
+        self.vm.hmp_qemu_io('source', 'resume A')
+
+        # Now we resumed 1, so 2 and 3 goes to the next iteration of while loop
+        # in mirror_wait_on_conflicts(). They don't exit, as bitmap is dirty
+        # due to request 4. And they start to wait: 2 wait for 3, 3 wait for 2
+        # - DEAD LOCK.
+        # Note that it's important that we add request 4 at last: requests are
+        # appended to the list, so we are sure that 4 is last in the list, so 2
+        # and 3 now waits for each other, not for 4.
+
+        self.vm.hmp_qemu_io('source', 'resume B')
+
+        # Resuming 4 doesn't help, 2 and 3 already dead-locked
+        # To check the dead-lock run:
+        #    gdb -p $(pidof qemu-system-x86_64) -ex 'set $job=(MirrorBlockJob *)jobs.lh_first' -ex 'p *$job->ops_in_flight.tqh_first' -ex 'p *$job->ops_in_flight.tqh_first->next.tqe_next'
+        # You'll see two MirrorOp objects waiting on each other
+
+        result = self.vm.qmp('block-job-set-speed', device='mirror', speed=0)
+        self.assert_qmp(result, 'return', {})
+        self.complete_and_wait(drive='mirror')
+
+        self.potential_writes_in_flight = False
+
 
 if __name__ == '__main__':
     iotests.main(supported_fmts=['qcow2', 'raw'],
diff --git a/tests/qemu-iotests/151.out b/tests/qemu-iotests/151.out
index 8d7e996700..89968f35d7 100644
--- a/tests/qemu-iotests/151.out
+++ b/tests/qemu-iotests/151.out
@@ -1,5 +1,5 @@
-...
+....
 ----------------------------------------------------------------------
-Ran 3 tests
+Ran 4 tests
 
 OK
-- 
2.29.2




* [PATCH 3/3] block/mirror: fix active mirror dead-lock in mirror_wait_on_conflicts
From: Vladimir Sementsov-Ogievskiy @ 2021-07-02 21:16 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, mreitz, kwolf, vsementsov, jsnow, dem

It's possible that requests start to wait for each other in
mirror_wait_on_conflicts(). To avoid this, let's use the same technique
as block/io.c uses in bdrv_wait_serialising_requests_locked() /
bdrv_find_conflicting_request(): don't wait on an intersecting request
if it is already waiting for some other request.
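A minimal, hypothetical Python model (not QEMU code) of the rule above, replaying the scenario from testIntersectingActiveIO(): request 1 has just completed, requests 2 and 3 were both queued on it, and request 4 overlaps both. It assumes 2 resumes before 3 and uses 1 MiB chunks instead of real offsets; Op, choose_op_to_wait_on() and replay() are illustrative names only.

```python
class Op:
    """Toy stand-in for MirrorOp: a chunk range plus the new waiting_for_op field."""

    def __init__(self, name, start_chunk, nb_chunks):
        self.name = name
        self.start_chunk = start_chunk
        self.nb_chunks = nb_chunks
        self.waiting_for_op = None


def ranges_overlap(first1, len1, first2, len2):
    last1 = first1 + len1 - 1
    last2 = first2 + len2 - 1
    return not (last2 < first1 or last1 < first2)


def choose_op_to_wait_on(me, ops_in_flight, skip_waiting_ops):
    """Return the first overlapping op that `me` would wait on, or None."""
    for op in ops_in_flight:
        if op is me:
            continue
        if ranges_overlap(me.start_chunk, me.nb_chunks,
                          op.start_chunk, op.nb_chunks):
            if skip_waiting_ops and op.waiting_for_op is not None:
                continue  # the fix: never wait on an op that is itself waiting
            return op
    return None


def replay(skip_waiting_ops):
    """Return {waiter: name of the op it waits on} after requests 2 and 3 resume."""
    op1 = Op("1", 0, 1)   # 0..1M, already completed
    op2 = Op("2", 0, 2)   # 0..2M
    op3 = Op("3", 0, 2)   # 0..2M
    op4 = Op("4", 1, 2)   # 1M..3M, held at breakpoint B
    op2.waiting_for_op = op1
    op3.waiting_for_op = op1
    ops_in_flight = [op2, op3, op4]  # op1 has left the list

    for me in (op2, op3):            # assumed wake-up order: 2, then 3
        me.waiting_for_op = None     # back from qemu_co_queue_wait()
        me.waiting_for_op = choose_op_to_wait_on(me, ops_in_flight,
                                                 skip_waiting_ops)
    return {op.name: op.waiting_for_op.name if op.waiting_for_op else None
            for op in (op2, op3)}


def is_deadlocked(waits):
    """True if two requests wait directly on each other."""
    return any(waits.get(dst) == src for src, dst in waits.items())


before = replay(skip_waiting_ops=False)
after = replay(skip_waiting_ops=True)
print(before, is_deadlocked(before))  # {'2': '3', '3': '2'} True
print(after, is_deadlocked(after))    # {'2': '4', '3': '4'} False
```

When request 2 runs, request 3's waiting_for_op still points at the completed request 1. The rule treats such an op as one that will pick its own target once it wakes up, so request 2 moves on to request 4, and request 3 then skips request 2 for the same reason.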

For details of the dead-lock, see the testIntersectingActiveIO()
test-case, which this commit fixes.

Fixes: d06107ade0ce74dc39739bac80de84b51ec18546
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---

Note that I failed to run the test on d06107ade0ce74d, which introduced
active mirror. But the problem most likely exists there too, and it's
better to keep the "Fixes:" tag than to drop it.

 block/mirror.c         | 12 ++++++++++++
 tests/qemu-iotests/151 | 18 +++++-------------
 2 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index ad6aac2f95..98fc66eabf 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -107,6 +107,7 @@ struct MirrorOp {
     bool is_in_flight;
     CoQueue waiting_requests;
     Coroutine *co;
+    MirrorOp *waiting_for_op;
 
     QTAILQ_ENTRY(MirrorOp) next;
 };
@@ -159,7 +160,18 @@ static void coroutine_fn mirror_wait_on_conflicts(MirrorOp *self,
             if (ranges_overlap(self_start_chunk, self_nb_chunks,
                                op_start_chunk, op_nb_chunks))
             {
+                /*
+                 * If the operation is already (indirectly) waiting for us, or
+                 * will wait for us as soon as it wakes up, then just go on
+                 * (instead of producing a deadlock in the former case).
+                 */
+                if (op->waiting_for_op) {
+                    continue;
+                }
+
+                self->waiting_for_op = op;
                 qemu_co_queue_wait(&op->waiting_requests, NULL);
+                self->waiting_for_op = NULL;
                 break;
             }
         }
diff --git a/tests/qemu-iotests/151 b/tests/qemu-iotests/151
index ab46c5e8ba..93d14193d0 100755
--- a/tests/qemu-iotests/151
+++ b/tests/qemu-iotests/151
@@ -143,10 +143,6 @@ class TestActiveMirror(iotests.QMPTestCase):
         self.potential_writes_in_flight = False
 
     def testIntersectingActiveIO(self):
-        # FIXME: test-case is dead-locking. To reproduce dead-lock just drop
-        # this return statement
-        return
-
         # Fill the source image
         result = self.vm.hmp_qemu_io('source', 'write -P 1 0 2M')
 
@@ -180,18 +176,14 @@ class TestActiveMirror(iotests.QMPTestCase):
 
         # Now we resumed 1, so 2 and 3 goes to the next iteration of while loop
         # in mirror_wait_on_conflicts(). They don't exit, as bitmap is dirty
-        # due to request 4. And they start to wait: 2 wait for 3, 3 wait for 2
-        # - DEAD LOCK.
-        # Note that it's important that we add request 4 at last: requests are
-        # appended to the list, so we are sure that 4 is last in the list, so 2
-        # and 3 now waits for each other, not for 4.
+        # due to request 4.
+        # In the past at that point 2 and 3 would wait for each other producing
+        # a dead-lock. Now this is fixed and they will wait for request 4.
 
         self.vm.hmp_qemu_io('source', 'resume B')
 
-        # Resuming 4 doesn't help, 2 and 3 already dead-locked
-        # To check the dead-lock run:
-        #    gdb -p $(pidof qemu-system-x86_64) -ex 'set $job=(MirrorBlockJob *)jobs.lh_first' -ex 'p *$job->ops_in_flight.tqh_first' -ex 'p *$job->ops_in_flight.tqh_first->next.tqe_next'
-        # You'll see two MirrorOp objects waiting on each other
+        # After resuming 4, one of 2 and 3 goes first and set in_flight_bitmap,
+        # so the other will wait for it.
 
         result = self.vm.qmp('block-job-set-speed', device='mirror', speed=0)
         self.assert_qmp(result, 'return', {})
-- 
2.29.2




* Re: [PATCH 0/3] Fix active mirror dead-lock
From: Vladimir Sementsov-Ogievskiy @ 2021-07-02 22:34 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, mreitz, kwolf, jsnow, Denis V. Lunev

[Fix Den's email address in CC]

03.07.2021 00:16, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> We've faced a dead-lock in active mirror in our Rhev-8.4 based Qemu
> build. And it's reproducible on master too.
> 
> Vladimir Sementsov-Ogievskiy (3):
>    block/mirror: set .co for active-write MirrorOp objects
>    iotest 151: add test-case that shows active mirror dead-lock
>    block/mirror: fix active mirror dead-lock in mirror_wait_on_conflicts
> 
>   block/mirror.c             | 13 +++++++++
>   tests/qemu-iotests/151     | 54 ++++++++++++++++++++++++++++++++++++--
>   tests/qemu-iotests/151.out |  4 +--
>   3 files changed, 67 insertions(+), 4 deletions(-)
> 


-- 
Best regards,
Vladimir



* Re: [PATCH 0/3] Fix active mirror dead-lock
From: Kevin Wolf @ 2021-07-14 14:08 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy; +Cc: dem, jsnow, qemu-devel, qemu-block, mreitz

Am 02.07.2021 um 23:16 hat Vladimir Sementsov-Ogievskiy geschrieben:
> Hi all!
> 
> We've faced a dead-lock in active mirror in our Rhev-8.4 based Qemu
> build. And it's reproducible on master too.

Thanks, applied to the block branch.

Kevin



