qemu-devel.nongnu.org archive mirror
* [PATCH 0/4] nbd reconnect new fixes
@ 2020-09-03 19:02 Vladimir Sementsov-Ogievskiy
  2020-09-03 19:02 ` [PATCH 1/4] block/nbd: fix drain dead-lock because of nbd reconnect-delay Vladimir Sementsov-Ogievskiy
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-03 19:02 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, mreitz, kwolf, eblake, vsementsov, den

Hi! Let's continue fixing the nbd reconnect feature.

This series is based on "[PULL 0/6] NBD patches for 2020-09-02"
Based-on: <20200902215400.2673028-1-eblake@redhat.com>

Vladimir Sementsov-Ogievskiy (4):
  block/nbd: fix drain dead-lock because of nbd reconnect-delay
  block/nbd: correctly use qio_channel_detach_aio_context when needed
  block/nbd: fix reconnect-delay
  block/nbd: nbd_co_reconnect_loop(): don't connect if drained

 block/nbd.c | 71 ++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 60 insertions(+), 11 deletions(-)

-- 
2.18.0




* [PATCH 1/4] block/nbd: fix drain dead-lock because of nbd reconnect-delay
  2020-09-03 19:02 [PATCH 0/4] nbd reconnect new fixes Vladimir Sementsov-Ogievskiy
@ 2020-09-03 19:02 ` Vladimir Sementsov-Ogievskiy
  2020-09-23 15:08   ` Eric Blake
  2021-02-03 10:53   ` Roman Kagan
  2020-09-03 19:02 ` [PATCH 2/4] block/nbd: correctly use qio_channel_detach_aio_context when needed Vladimir Sementsov-Ogievskiy
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 19+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-03 19:02 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, mreitz, kwolf, eblake, vsementsov, den

We pause the reconnect process during a drained section. So, if we have
requests waiting for reconnect, we should cancel them; otherwise they
deadlock the drained section.

How to reproduce:

1. Create an image:
   qemu-img create -f qcow2 xx 100M

2. Start NBD server:
   qemu-nbd xx

3. Start a VM with a second NBD disk on node2, like this:

  ./build/x86_64-softmmu/qemu-system-x86_64 -nodefaults -drive \
     file=/work/images/cent7.qcow2 -drive \
     driver=nbd,server.type=inet,server.host=192.168.100.5,server.port=10809,reconnect-delay=60 \
     -vnc :0 -m 2G -enable-kvm -vga std

4. Access the VM through VNC (or some other way), and check that the NBD
   drive works:

   dd if=/dev/sdb of=/dev/null bs=1M count=10

   - the command should succeed.

5. Now, kill the nbd server, and run dd in the guest again:

   dd if=/dev/sdb of=/dev/null bs=1M count=10

Now QEMU is trying to reconnect, and dd-generated requests are waiting
for the connection (they will wait up to 60 seconds, see the
reconnect-delay option above, and then fail). But suddenly, the VM may
hang completely in the deadlock. You may need to increase the
reconnect-delay period to catch the deadlock.

The VM doesn't respond because the drain deadlock happens in the CPU
thread with the global mutex taken. That's not a good thing by itself
and is not fixed by this commit (the proper fix is to use iothreads).
Still, this commit fixes the drain deadlock itself.

Note: we could probably continue reconnecting during the drained section
instead. To achieve this, we may move negotiation to the connect thread
to make it independent of the bs aio context. But expanding the drained
section doesn't seem good anyway. So, for now let's fix the bug in the
simplest way.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/nbd.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index 9daf003bea..912ea27be7 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -242,6 +242,11 @@ static void coroutine_fn nbd_client_co_drain_begin(BlockDriverState *bs)
     }
 
     nbd_co_establish_connection_cancel(bs, false);
+
+    if (s->state == NBD_CLIENT_CONNECTING_WAIT) {
+        s->state = NBD_CLIENT_CONNECTING_NOWAIT;
+        qemu_co_queue_restart_all(&s->free_sema);
+    }
 }
 
 static void coroutine_fn nbd_client_co_drain_end(BlockDriverState *bs)
-- 
2.18.0




* [PATCH 2/4] block/nbd: correctly use qio_channel_detach_aio_context when needed
  2020-09-03 19:02 [PATCH 0/4] nbd reconnect new fixes Vladimir Sementsov-Ogievskiy
  2020-09-03 19:02 ` [PATCH 1/4] block/nbd: fix drain dead-lock because of nbd reconnect-delay Vladimir Sementsov-Ogievskiy
@ 2020-09-03 19:02 ` Vladimir Sementsov-Ogievskiy
  2020-09-23 15:10   ` Eric Blake
  2020-09-03 19:03 ` [PATCH 3/4] block/nbd: fix reconnect-delay Vladimir Sementsov-Ogievskiy
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-03 19:02 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, mreitz, kwolf, eblake, vsementsov, den

Don't use the nbd_client_detach_aio_context() driver handler where we
want to finalize the connection. We should directly use
qio_channel_detach_aio_context() in such cases. The driver handler may
(and will) contain other things unrelated to the qio channel.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/nbd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 912ea27be7..a495ad7ddf 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -549,7 +549,7 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
 
     /* Finalize previous connection if any */
     if (s->ioc) {
-        nbd_client_detach_aio_context(s->bs);
+        qio_channel_detach_aio_context(QIO_CHANNEL(s->ioc));
         object_unref(OBJECT(s->sioc));
         s->sioc = NULL;
         object_unref(OBJECT(s->ioc));
@@ -707,7 +707,7 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
 
     s->connection_co = NULL;
     if (s->ioc) {
-        nbd_client_detach_aio_context(s->bs);
+        qio_channel_detach_aio_context(QIO_CHANNEL(s->ioc));
         object_unref(OBJECT(s->sioc));
         s->sioc = NULL;
         object_unref(OBJECT(s->ioc));
-- 
2.18.0




* [PATCH 3/4] block/nbd: fix reconnect-delay
  2020-09-03 19:02 [PATCH 0/4] nbd reconnect new fixes Vladimir Sementsov-Ogievskiy
  2020-09-03 19:02 ` [PATCH 1/4] block/nbd: fix drain dead-lock because of nbd reconnect-delay Vladimir Sementsov-Ogievskiy
  2020-09-03 19:02 ` [PATCH 2/4] block/nbd: correctly use qio_channel_detach_aio_context when needed Vladimir Sementsov-Ogievskiy
@ 2020-09-03 19:03 ` Vladimir Sementsov-Ogievskiy
  2020-09-23 15:15   ` Eric Blake
  2020-09-03 19:03 ` [PATCH 4/4] block/nbd: nbd_co_reconnect_loop(): don't connect if drained Vladimir Sementsov-Ogievskiy
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-03 19:03 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, mreitz, kwolf, eblake, vsementsov, den

reconnect-delay has a design flaw: we handle it in the same loop where
we do the connection attempt. So, reconnect-delay may be exceeded by the
unpredictable duration of the connection attempt.

Let's instead use a separate timer.

How to reproduce the bug:

1. Create an image on node1:
   qemu-img create -f qcow2 xx 100M

2. Start NBD server on node1:
   qemu-nbd xx

3. On node2 start qemu-io:

./build/qemu-io --image-opts \
driver=nbd,server.type=inet,server.host=192.168.100.5,server.port=10809,reconnect-delay=15

4. Type 'read 0 512' at the qemu-io prompt to check that the
   connection works

Be careful: you should perform steps 5-7 within a short time, less than
15 seconds.

5. Kill the NBD server on node1

6. Run 'read 0 512' at the qemu-io prompt again, to make sure that the
   NBD client enters the reconnect loop.

7. On node1, run the following command:

   sudo iptables -A INPUT -p tcp --dport 10809 -j DROP

This will make the connect() call of qemu-io at node2 take a long time.

And you'll see that the read command in qemu-io hangs for a long time,
more than the 15 seconds specified by the reconnect-delay parameter.
That's the bug.

8. Don't forget to drop the iptables rule on node1:

   sudo iptables -D INPUT -p tcp --dport 10809 -j DROP

Important note: Step [5] is necessary to reproduce _this_ bug. If we
skip step [5], the read command (step 6) will still hang for a long time
and this commit doesn't help, because then there is no long connect() to
an unreachable host, but a long sendmsg() to an unreachable host. That
should be fixed by enabling and adjusting keep-alive on the socket,
which is a topic for a further patch set.
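
For reference, a minimal sketch of what enabling keep-alive on a raw
Linux TCP socket looks like; the timeout values below are made-up
examples, and the real fix would of course have to go through QEMU's
socket setup code and make the values configurable:

#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Sketch only: enable TCP keep-alive with aggressive timeouts (Linux) */
static int sketch_set_keep_alive(int fd)
{
    int on = 1;
    int idle = 5;      /* seconds of idle time before the first probe */
    int interval = 1;  /* seconds between probes */
    int count = 5;     /* unanswered probes before the connection is dead */

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval,
                   sizeof(interval)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count)) < 0) {
        return -errno;
    }

    return 0;
}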

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/nbd.c | 59 +++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 50 insertions(+), 9 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index a495ad7ddf..caae0e6d31 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -122,6 +122,8 @@ typedef struct BDRVNBDState {
     Error *connect_err;
     bool wait_in_flight;
 
+    QEMUTimer *reconnect_delay_timer;
+
     NBDClientRequest requests[MAX_NBD_REQUESTS];
     NBDReply reply;
     BlockDriverState *bs;
@@ -188,10 +190,49 @@ static void nbd_recv_coroutines_wake_all(BDRVNBDState *s)
     }
 }
 
+static void reconnect_delay_timer_del(BDRVNBDState *s)
+{
+    if (s->reconnect_delay_timer) {
+        timer_del(s->reconnect_delay_timer);
+        timer_free(s->reconnect_delay_timer);
+        s->reconnect_delay_timer = NULL;
+    }
+}
+
+static void reconnect_delay_timer_cb(void *opaque)
+{
+    BDRVNBDState *s = opaque;
+
+    if (s->state == NBD_CLIENT_CONNECTING_WAIT) {
+        s->state = NBD_CLIENT_CONNECTING_NOWAIT;
+        while (qemu_co_enter_next(&s->free_sema, NULL)) {
+            /* Resume all queued requests */
+        }
+    }
+
+    reconnect_delay_timer_del(s);
+}
+
+static void reconnect_delay_timer_init(BDRVNBDState *s, uint64_t expire_time_ns)
+{
+    if (s->state != NBD_CLIENT_CONNECTING_WAIT) {
+        return;
+    }
+
+    assert(!s->reconnect_delay_timer);
+    s->reconnect_delay_timer = aio_timer_new(bdrv_get_aio_context(s->bs),
+                                             QEMU_CLOCK_REALTIME,
+                                             SCALE_NS,
+                                             reconnect_delay_timer_cb, s);
+    timer_mod(s->reconnect_delay_timer, expire_time_ns);
+}
+
 static void nbd_client_detach_aio_context(BlockDriverState *bs)
 {
     BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
 
+    /* Timer is deleted in nbd_client_co_drain_begin() */
+    assert(!s->reconnect_delay_timer);
     qio_channel_detach_aio_context(QIO_CHANNEL(s->ioc));
 }
 
@@ -243,6 +284,8 @@ static void coroutine_fn nbd_client_co_drain_begin(BlockDriverState *bs)
 
     nbd_co_establish_connection_cancel(bs, false);
 
+    reconnect_delay_timer_del(s);
+
     if (s->state == NBD_CLIENT_CONNECTING_WAIT) {
         s->state = NBD_CLIENT_CONNECTING_NOWAIT;
         qemu_co_queue_restart_all(&s->free_sema);
@@ -593,21 +636,17 @@ out:
 
 static coroutine_fn void nbd_co_reconnect_loop(BDRVNBDState *s)
 {
-    uint64_t start_time_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
-    uint64_t delay_ns = s->reconnect_delay * NANOSECONDS_PER_SECOND;
     uint64_t timeout = 1 * NANOSECONDS_PER_SECOND;
     uint64_t max_timeout = 16 * NANOSECONDS_PER_SECOND;
 
+    if (s->state == NBD_CLIENT_CONNECTING_WAIT) {
+        reconnect_delay_timer_init(s, qemu_clock_get_ns(QEMU_CLOCK_REALTIME) +
+                                   s->reconnect_delay * NANOSECONDS_PER_SECOND);
+    }
+
     nbd_reconnect_attempt(s);
 
     while (nbd_client_connecting(s)) {
-        if (s->state == NBD_CLIENT_CONNECTING_WAIT &&
-            qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - start_time_ns > delay_ns)
-        {
-            s->state = NBD_CLIENT_CONNECTING_NOWAIT;
-            qemu_co_queue_restart_all(&s->free_sema);
-        }
-
         if (s->drained) {
             bdrv_dec_in_flight(s->bs);
             s->wait_drained_end = true;
@@ -629,6 +668,8 @@ static coroutine_fn void nbd_co_reconnect_loop(BDRVNBDState *s)
 
         nbd_reconnect_attempt(s);
     }
+
+    reconnect_delay_timer_del(s);
 }
 
 static coroutine_fn void nbd_connection_entry(void *opaque)
-- 
2.18.0




* [PATCH 4/4] block/nbd: nbd_co_reconnect_loop(): don't connect if drained
  2020-09-03 19:02 [PATCH 0/4] nbd reconnect new fixes Vladimir Sementsov-Ogievskiy
                   ` (2 preceding siblings ...)
  2020-09-03 19:03 ` [PATCH 3/4] block/nbd: fix reconnect-delay Vladimir Sementsov-Ogievskiy
@ 2020-09-03 19:03 ` Vladimir Sementsov-Ogievskiy
  2020-09-23 15:16   ` Eric Blake
  2020-09-04 13:32 ` [PATCH 0/4] nbd reconnect new fixes no-reply
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-03 19:03 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, mreitz, kwolf, eblake, vsementsov, den

In the recent commit 12c75e20a269ac we improved nbd_co_reconnect_loop()
so that drain does not have to wait for an additional sleep. Similarly,
we shouldn't try to connect if the previous sleep was interrupted by
drain begin; otherwise drain_begin will have to wait for the whole
connection attempt.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/nbd.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index caae0e6d31..4548046cd7 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -661,6 +661,9 @@ static coroutine_fn void nbd_co_reconnect_loop(BDRVNBDState *s)
         } else {
             qemu_co_sleep_ns_wakeable(QEMU_CLOCK_REALTIME, timeout,
                                       &s->connection_co_sleep_ns_state);
+            if (s->drained) {
+                continue;
+            }
             if (timeout < max_timeout) {
                 timeout *= 2;
             }
-- 
2.18.0




* Re: [PATCH 0/4] nbd reconnect new fixes
  2020-09-03 19:02 [PATCH 0/4] nbd reconnect new fixes Vladimir Sementsov-Ogievskiy
                   ` (3 preceding siblings ...)
  2020-09-03 19:03 ` [PATCH 4/4] block/nbd: nbd_co_reconnect_loop(): don't connect if drained Vladimir Sementsov-Ogievskiy
@ 2020-09-04 13:32 ` no-reply
  2020-09-04 14:00   ` Vladimir Sementsov-Ogievskiy
  2020-09-04 13:34 ` no-reply
  2020-09-18 18:29 ` Vladimir Sementsov-Ogievskiy
  6 siblings, 1 reply; 19+ messages in thread
From: no-reply @ 2020-09-04 13:32 UTC (permalink / raw)
  To: vsementsov; +Cc: kwolf, vsementsov, qemu-block, qemu-devel, mreitz, den

Patchew URL: https://patchew.org/QEMU/20200903190301.367620-1-vsementsov@virtuozzo.com/



Hi,

This series failed build test on FreeBSD host. Please find the details below.






The full log is available at
http://patchew.org/logs/20200903190301.367620-1-vsementsov@virtuozzo.com/testing.FreeBSD/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com


* Re: [PATCH 0/4] nbd reconnect new fixes
  2020-09-03 19:02 [PATCH 0/4] nbd reconnect new fixes Vladimir Sementsov-Ogievskiy
                   ` (4 preceding siblings ...)
  2020-09-04 13:32 ` [PATCH 0/4] nbd reconnect new fixes no-reply
@ 2020-09-04 13:34 ` no-reply
  2020-09-04 14:01   ` Vladimir Sementsov-Ogievskiy
  2020-09-18 18:29 ` Vladimir Sementsov-Ogievskiy
  6 siblings, 1 reply; 19+ messages in thread
From: no-reply @ 2020-09-04 13:34 UTC (permalink / raw)
  To: vsementsov; +Cc: kwolf, vsementsov, qemu-block, qemu-devel, mreitz, den

Patchew URL: https://patchew.org/QEMU/20200903190301.367620-1-vsementsov@virtuozzo.com/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.






The full log is available at
http://patchew.org/logs/20200903190301.367620-1-vsementsov@virtuozzo.com/testing.docker-mingw@fedora/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com


* Re: [PATCH 0/4] nbd reconnect new fixes
  2020-09-04 13:32 ` [PATCH 0/4] nbd reconnect new fixes no-reply
@ 2020-09-04 14:00   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 19+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-04 14:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, mreitz, kwolf, eblake, den

04.09.2020 16:32, no-reply@patchew.org wrote:
> Patchew URL: https://patchew.org/QEMU/20200903190301.367620-1-vsementsov@virtuozzo.com/
> 
> 
> 
> Hi,
> 
> This series failed build test on FreeBSD host. Please find the details below.
> 
> 
> 
> 
> 
> 
> The full log is available at
> http://patchew.org/logs/20200903190301.367620-1-vsementsov@virtuozzo.com/testing.FreeBSD/?type=message.

Bad link. It shows:

N/A. Internal error while reading log file

> ---
> Email generated automatically by Patchew [https://patchew.org/].
> Please send your feedback to patchew-devel@redhat.com
> 


-- 
Best regards,
Vladimir



* Re: [PATCH 0/4] nbd reconnect new fixes
  2020-09-04 13:34 ` no-reply
@ 2020-09-04 14:01   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 19+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-04 14:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, mreitz, kwolf, eblake, den

04.09.2020 16:34, no-reply@patchew.org wrote:
> Patchew URL: https://patchew.org/QEMU/20200903190301.367620-1-vsementsov@virtuozzo.com/
> 
> 
> 
> Hi,
> 
> This series failed the docker-mingw@fedora build test. Please find the testing commands and
> their output below. If you have Docker installed, you can probably reproduce it
> locally.
> 
> 
> 
> 
> 
> 
> The full log is available at
> http://patchew.org/logs/20200903190301.367620-1-vsementsov@virtuozzo.com/testing.docker-mingw@fedora/?type=message.


Bad link. It shows:

N/A. Internal error while reading log file

> ---
> Email generated automatically by Patchew [https://patchew.org/].
> Please send your feedback to patchew-devel@redhat.com
> 


-- 
Best regards,
Vladimir



* Re: [PATCH 0/4] nbd reconnect new fixes
  2020-09-03 19:02 [PATCH 0/4] nbd reconnect new fixes Vladimir Sementsov-Ogievskiy
                   ` (5 preceding siblings ...)
  2020-09-04 13:34 ` no-reply
@ 2020-09-18 18:29 ` Vladimir Sementsov-Ogievskiy
  6 siblings, 0 replies; 19+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-18 18:29 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, mreitz, kwolf, eblake, den

ping

03.09.2020 22:02, Vladimir Sementsov-Ogievskiy wrote:
> Hi! Let's continue fixing nbd reconnect feature.
> 
> These series is based on "[PULL 0/6] NBD patches for 2020-09-02"
> Based-on: <20200902215400.2673028-1-eblake@redhat.com>
> 
> Vladimir Sementsov-Ogievskiy (4):
>    block/nbd: fix drain dead-lock because of nbd reconnect-delay
>    block/nbd: correctly use qio_channel_detach_aio_context when needed
>    block/nbd: fix reconnect-delay
>    block/nbd: nbd_co_reconnect_loop(): don't connect if drained
> 
>   block/nbd.c | 71 ++++++++++++++++++++++++++++++++++++++++++++---------
>   1 file changed, 60 insertions(+), 11 deletions(-)
> 


-- 
Best regards,
Vladimir



* Re: [PATCH 1/4] block/nbd: fix drain dead-lock because of nbd reconnect-delay
  2020-09-03 19:02 ` [PATCH 1/4] block/nbd: fix drain dead-lock because of nbd reconnect-delay Vladimir Sementsov-Ogievskiy
@ 2020-09-23 15:08   ` Eric Blake
  2021-02-03 10:53   ` Roman Kagan
  1 sibling, 0 replies; 19+ messages in thread
From: Eric Blake @ 2020-09-23 15:08 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block; +Cc: kwolf, den, qemu-devel, mreitz

On 9/3/20 2:02 PM, Vladimir Sementsov-Ogievskiy wrote:
> We pause reconnect process during drained section. So, if we have some
> requests, waiting for reconnect we should cancel them, otherwise they
> deadlock the drained section.
> 
> How to reproduce:
> 

> Now Qemu is trying to reconnect, and dd-generated requests are waiting
> for the connection (they will wait up to 60 seconds (see
> reconnect-delay option above) and than fail). But suddenly, vm may
> totally hang in the deadlock. You may need to increase reconnect-delay
> period to catch the dead-lock.
> 
> VM doesn't respond because drain dead-lock happens in cpu thread with
> global mutex taken. That's not good thing by itself and is not fixed
> by this commit (true way is using iothreads). Still this commit fixes
> drain dead-lock itself.
> 
> Note: probably, we can instead continue to reconnect during drained
> section. To achieve this, we may move negotiation to the connect thread
> to make it independent of bs aio context. But expanding drained section
> doesn't seem good anyway. So, let's now fix the bug the simplest way.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   block/nbd.c | 5 +++++
>   1 file changed, 5 insertions(+)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index 9daf003bea..912ea27be7 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -242,6 +242,11 @@ static void coroutine_fn nbd_client_co_drain_begin(BlockDriverState *bs)
>       }
>   
>       nbd_co_establish_connection_cancel(bs, false);
> +
> +    if (s->state == NBD_CLIENT_CONNECTING_WAIT) {
> +        s->state = NBD_CLIENT_CONNECTING_NOWAIT;
> +        qemu_co_queue_restart_all(&s->free_sema);
> +    }
>   }
>   

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH 2/4] block/nbd: correctly use qio_channel_detach_aio_context when needed
  2020-09-03 19:02 ` [PATCH 2/4] block/nbd: correctly use qio_channel_detach_aio_context when needed Vladimir Sementsov-Ogievskiy
@ 2020-09-23 15:10   ` Eric Blake
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Blake @ 2020-09-23 15:10 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block; +Cc: kwolf, den, qemu-devel, mreitz

On 9/3/20 2:02 PM, Vladimir Sementsov-Ogievskiy wrote:
> Don't use nbd_client_detach_aio_context() driver handler where we want
> to finalize the connection. We should directly use
> qio_channel_detach_aio_context() in such cases. Driver handler may (and
> will) contain another things, unrelated to the qio channel.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   block/nbd.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH 3/4] block/nbd: fix reconnect-delay
  2020-09-03 19:03 ` [PATCH 3/4] block/nbd: fix reconnect-delay Vladimir Sementsov-Ogievskiy
@ 2020-09-23 15:15   ` Eric Blake
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Blake @ 2020-09-23 15:15 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block; +Cc: kwolf, den, qemu-devel, mreitz

On 9/3/20 2:03 PM, Vladimir Sementsov-Ogievskiy wrote:
> reconnect-delay has a design flaw: we handle it in the same loop where
> we do connection attempt. So, reconnect-delay may be exceeded by
> unpredictable time of connection attempt.
> 
> Let's instead use separate timer.
> 
> How to reproduce the bug:
> 

> 
> This will make the connect() call of qemu-io at node2 take a long time.
> 
> And you'll see that read command in qemu-io will hang for a long time,
> more than 15 seconds specified by reconnect-delay parameter. It's the
> bug.
> 
> 8. Don't forget to drop iptables rule on node1:
> 
>     sudo iptables -D INPUT -p tcp --dport 10809 -j DROP
> 
> Important note: Step [5] is necessary to reproduce _this_ bug. If we
> miss step [5], the read command (step 6) will hang for a long time and
> this commit doesn't help, because there will be not long connect() to
> unreachable host, but long sendmsg() to unreachable host, which should
> be fixed by enabling and adjusting keep-alive on the socket, which is a
> thing for further patch set.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   block/nbd.c | 59 +++++++++++++++++++++++++++++++++++++++++++++--------
>   1 file changed, 50 insertions(+), 9 deletions(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH 4/4] block/nbd: nbd_co_reconnect_loop(): don't connect if drained
  2020-09-03 19:03 ` [PATCH 4/4] block/nbd: nbd_co_reconnect_loop(): don't connect if drained Vladimir Sementsov-Ogievskiy
@ 2020-09-23 15:16   ` Eric Blake
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Blake @ 2020-09-23 15:16 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block; +Cc: kwolf, den, qemu-devel, mreitz

On 9/3/20 2:03 PM, Vladimir Sementsov-Ogievskiy wrote:
> In a recent commit 12c75e20a269ac we've improved
> nbd_co_reconnect_loop() to not make drain wait for additional sleep.
> Similarly, we shouldn't try to connect, if previous sleep was
> interrupted by drain begin, otherwise drain_begin will have to wait for
> the whole connection attempt.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   block/nbd.c | 3 +++
>   1 file changed, 3 insertions(+)

Reviewed-by: Eric Blake <eblake@redhat.com>

> 
> diff --git a/block/nbd.c b/block/nbd.c
> index caae0e6d31..4548046cd7 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -661,6 +661,9 @@ static coroutine_fn void nbd_co_reconnect_loop(BDRVNBDState *s)
>           } else {
>               qemu_co_sleep_ns_wakeable(QEMU_CLOCK_REALTIME, timeout,
>                                         &s->connection_co_sleep_ns_state);
> +            if (s->drained) {
> +                continue;
> +            }
>               if (timeout < max_timeout) {
>                   timeout *= 2;
>               }
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH 1/4] block/nbd: fix drain dead-lock because of nbd reconnect-delay
  2020-09-03 19:02 ` [PATCH 1/4] block/nbd: fix drain dead-lock because of nbd reconnect-delay Vladimir Sementsov-Ogievskiy
  2020-09-23 15:08   ` Eric Blake
@ 2021-02-03 10:53   ` Roman Kagan
  2021-02-03 13:10     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 19+ messages in thread
From: Roman Kagan @ 2021-02-03 10:53 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy; +Cc: kwolf, qemu-block, qemu-devel, mreitz, den

On Thu, Sep 03, 2020 at 10:02:58PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> We pause reconnect process during drained section. So, if we have some
> requests, waiting for reconnect we should cancel them, otherwise they
> deadlock the drained section.
> 
> How to reproduce:
> 
> 1. Create an image:
>    qemu-img create -f qcow2 xx 100M
> 
> 2. Start NBD server:
>    qemu-nbd xx
> 
> 3. Start vm with second nbd disk on node2, like this:
> 
>   ./build/x86_64-softmmu/qemu-system-x86_64 -nodefaults -drive \
>      file=/work/images/cent7.qcow2 -drive \
>      driver=nbd,server.type=inet,server.host=192.168.100.5,server.port=10809,reconnect-delay=60 \
>      -vnc :0 -m 2G -enable-kvm -vga std
> 
> 4. Access the vm through vnc (or some other way?), and check that NBD
>    drive works:
> 
>    dd if=/dev/sdb of=/dev/null bs=1M count=10
> 
>    - the command should succeed.
> 
> 5. Now, kill the nbd server, and run dd in the guest again:
> 
>    dd if=/dev/sdb of=/dev/null bs=1M count=10
> 
> Now Qemu is trying to reconnect, and dd-generated requests are waiting
> for the connection (they will wait up to 60 seconds (see
> reconnect-delay option above) and than fail). But suddenly, vm may
> totally hang in the deadlock. You may need to increase reconnect-delay
> period to catch the dead-lock.
> 
> VM doesn't respond because drain dead-lock happens in cpu thread with
> global mutex taken. That's not good thing by itself and is not fixed
> by this commit (true way is using iothreads). Still this commit fixes
> drain dead-lock itself.
> 
> Note: probably, we can instead continue to reconnect during drained
> section. To achieve this, we may move negotiation to the connect thread
> to make it independent of bs aio context. But expanding drained section
> doesn't seem good anyway. So, let's now fix the bug the simplest way.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/nbd.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index 9daf003bea..912ea27be7 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -242,6 +242,11 @@ static void coroutine_fn nbd_client_co_drain_begin(BlockDriverState *bs)
>      }
>  
>      nbd_co_establish_connection_cancel(bs, false);
> +
> +    if (s->state == NBD_CLIENT_CONNECTING_WAIT) {
> +        s->state = NBD_CLIENT_CONNECTING_NOWAIT;
> +        qemu_co_queue_restart_all(&s->free_sema);
> +    }
>  }
>  
>  static void coroutine_fn nbd_client_co_drain_end(BlockDriverState *bs)

This basically defeats the whole purpose of reconnect: if the nbd client
is trying to reconnect, drain effectively cancels that and makes all
in-flight requests complete with an error.

I'm not suggesting to revert this patch (it's now in the tree as commit
8c517de24a), because the deadlock is no better, but I'm afraid the only
real fix is to implement reconnect during the drain section.  I'm still
trying to get my head around it so no patch yet, but I just wanted to
bring this up in case anybody beats me to it.

Thanks,
Roman.



* Re: [PATCH 1/4] block/nbd: fix drain dead-lock because of nbd reconnect-delay
  2021-02-03 10:53   ` Roman Kagan
@ 2021-02-03 13:10     ` Vladimir Sementsov-Ogievskiy
  2021-02-03 14:21       ` Roman Kagan
  0 siblings, 1 reply; 19+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-02-03 13:10 UTC (permalink / raw)
  To: Roman Kagan, qemu-block, qemu-devel, mreitz, kwolf, eblake, den

03.02.2021 13:53, Roman Kagan wrote:
> On Thu, Sep 03, 2020 at 10:02:58PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> We pause reconnect process during drained section. So, if we have some
>> requests, waiting for reconnect we should cancel them, otherwise they
>> deadlock the drained section.
>>
>> How to reproduce:
>>
>> 1. Create an image:
>>     qemu-img create -f qcow2 xx 100M
>>
>> 2. Start NBD server:
>>     qemu-nbd xx
>>
>> 3. Start vm with second nbd disk on node2, like this:
>>
>>    ./build/x86_64-softmmu/qemu-system-x86_64 -nodefaults -drive \
>>       file=/work/images/cent7.qcow2 -drive \
>>       driver=nbd,server.type=inet,server.host=192.168.100.5,server.port=10809,reconnect-delay=60 \
>>       -vnc :0 -m 2G -enable-kvm -vga std
>>
>> 4. Access the vm through vnc (or some other way?), and check that NBD
>>     drive works:
>>
>>     dd if=/dev/sdb of=/dev/null bs=1M count=10
>>
>>     - the command should succeed.
>>
>> 5. Now, kill the nbd server, and run dd in the guest again:
>>
>>     dd if=/dev/sdb of=/dev/null bs=1M count=10
>>
>> Now Qemu is trying to reconnect, and dd-generated requests are waiting
>> for the connection (they will wait up to 60 seconds (see
>> reconnect-delay option above) and than fail). But suddenly, vm may
>> totally hang in the deadlock. You may need to increase reconnect-delay
>> period to catch the dead-lock.
>>
>> VM doesn't respond because drain dead-lock happens in cpu thread with
>> global mutex taken. That's not good thing by itself and is not fixed
>> by this commit (true way is using iothreads). Still this commit fixes
>> drain dead-lock itself.
>>
>> Note: probably, we can instead continue to reconnect during drained
>> section. To achieve this, we may move negotiation to the connect thread
>> to make it independent of bs aio context. But expanding drained section
>> doesn't seem good anyway. So, let's now fix the bug the simplest way.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block/nbd.c | 5 +++++
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/block/nbd.c b/block/nbd.c
>> index 9daf003bea..912ea27be7 100644
>> --- a/block/nbd.c
>> +++ b/block/nbd.c
>> @@ -242,6 +242,11 @@ static void coroutine_fn nbd_client_co_drain_begin(BlockDriverState *bs)
>>       }
>>   
>>       nbd_co_establish_connection_cancel(bs, false);
>> +
>> +    if (s->state == NBD_CLIENT_CONNECTING_WAIT) {
>> +        s->state = NBD_CLIENT_CONNECTING_NOWAIT;
>> +        qemu_co_queue_restart_all(&s->free_sema);
>> +    }
>>   }
>>   
>>   static void coroutine_fn nbd_client_co_drain_end(BlockDriverState *bs)
> 
> This basically defeats the whole purpose of reconnect: if the nbd client
> is trying to reconnect, drain effectively cancels that and makes all
> in-flight requests to complete with an error.
> 
> I'm not suggesting to revert this patch (it's now in the tree as commit
> 8c517de24a), because the deadlock is no better, but I'm afraid the only
> real fix is to implement reconnect during the drain section.  I'm still
> trying to get my head around it so no patch yet, but I just wanted to
> bring this up in case anybody beats me to it.
> 


What do you mean by "reconnect during drained section"? Trying to establish the connection? Or keeping in-flight requests instead of cancelling them? We can't keep in-flight requests during a drained section, as that's the whole point of a drained section: no in-flight requests. So we'll have to either wait for them at drain_begin (waiting up to reconnect-delay, which doesn't seem good and triggers the deadlock in a non-iothread environment) or just cancel them.

Do you argue that waiting on drained-begin is somehow better than cancelling?

Or is the problem that after the drained section (assume it was short) we are in NBD_CLIENT_CONNECTING_NOWAIT? Fixing that should be simple enough: we just need to check the time at drained_end and, if the delay has not expired, enable NBD_CLIENT_CONNECTING_WAIT again.
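
Something like this (rough sketch only, untested; reconnect_deadline_ns
is a field I'm making up here, to be saved at drain_begin):

static void coroutine_fn nbd_client_co_drain_end(BlockDriverState *bs)
{
    BDRVNBDState *s = (BDRVNBDState *)bs->opaque;

    /* ... existing drain_end logic ... */

    /*
     * If the reconnect delay recorded at drain_begin has not expired yet,
     * return to the WAIT state, so that requests wait for the reconnect
     * again instead of failing immediately.
     */
    if (s->state == NBD_CLIENT_CONNECTING_NOWAIT &&
        qemu_clock_get_ns(QEMU_CLOCK_REALTIME) < s->reconnect_deadline_ns) {
        s->state = NBD_CLIENT_CONNECTING_WAIT;
    }
}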


-- 
Best regards,
Vladimir



* Re: [PATCH 1/4] block/nbd: fix drain dead-lock because of nbd reconnect-delay
  2021-02-03 13:10     ` Vladimir Sementsov-Ogievskiy
@ 2021-02-03 14:21       ` Roman Kagan
  2021-02-03 14:44         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 19+ messages in thread
From: Roman Kagan @ 2021-02-03 14:21 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy; +Cc: kwolf, qemu-block, qemu-devel, mreitz, den

On Wed, Feb 03, 2021 at 04:10:41PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 03.02.2021 13:53, Roman Kagan wrote:
> > On Thu, Sep 03, 2020 at 10:02:58PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > We pause reconnect process during drained section. So, if we have some
> > > requests, waiting for reconnect we should cancel them, otherwise they
> > > deadlock the drained section.
> > > 
> > > How to reproduce:
> > > 
> > > 1. Create an image:
> > >     qemu-img create -f qcow2 xx 100M
> > > 
> > > 2. Start NBD server:
> > >     qemu-nbd xx
> > > 
> > > 3. Start vm with second nbd disk on node2, like this:
> > > 
> > >    ./build/x86_64-softmmu/qemu-system-x86_64 -nodefaults -drive \
> > >       file=/work/images/cent7.qcow2 -drive \
> > >       driver=nbd,server.type=inet,server.host=192.168.100.5,server.port=10809,reconnect-delay=60 \
> > >       -vnc :0 -m 2G -enable-kvm -vga std
> > > 
> > > 4. Access the vm through vnc (or some other way?), and check that NBD
> > >     drive works:
> > > 
> > >     dd if=/dev/sdb of=/dev/null bs=1M count=10
> > > 
> > >     - the command should succeed.
> > > 
> > > 5. Now, kill the nbd server, and run dd in the guest again:
> > > 
> > >     dd if=/dev/sdb of=/dev/null bs=1M count=10
> > > 
> > > Now Qemu is trying to reconnect, and dd-generated requests are waiting
> > > for the connection (they will wait up to 60 seconds (see
> > > reconnect-delay option above) and than fail). But suddenly, vm may
> > > totally hang in the deadlock. You may need to increase reconnect-delay
> > > period to catch the dead-lock.
> > > 
> > > VM doesn't respond because drain dead-lock happens in cpu thread with
> > > global mutex taken. That's not good thing by itself and is not fixed
> > > by this commit (true way is using iothreads). Still this commit fixes
> > > drain dead-lock itself.
> > > 
> > > Note: probably, we can instead continue to reconnect during drained
> > > section. To achieve this, we may move negotiation to the connect thread
> > > to make it independent of bs aio context. But expanding drained section
> > > doesn't seem good anyway. So, let's now fix the bug the simplest way.
> > > 
> > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> > > ---
> > >   block/nbd.c | 5 +++++
> > >   1 file changed, 5 insertions(+)
> > > 
> > > diff --git a/block/nbd.c b/block/nbd.c
> > > index 9daf003bea..912ea27be7 100644
> > > --- a/block/nbd.c
> > > +++ b/block/nbd.c
> > > @@ -242,6 +242,11 @@ static void coroutine_fn nbd_client_co_drain_begin(BlockDriverState *bs)
> > >       }
> > >       nbd_co_establish_connection_cancel(bs, false);
> > > +
> > > +    if (s->state == NBD_CLIENT_CONNECTING_WAIT) {
> > > +        s->state = NBD_CLIENT_CONNECTING_NOWAIT;
> > > +        qemu_co_queue_restart_all(&s->free_sema);
> > > +    }
> > >   }
> > >   static void coroutine_fn nbd_client_co_drain_end(BlockDriverState *bs)
> > 
> > This basically defeats the whole purpose of reconnect: if the nbd client
> > is trying to reconnect, drain effectively cancels that and makes all
> > in-flight requests to complete with an error.
> > 
> > I'm not suggesting to revert this patch (it's now in the tree as commit
> > 8c517de24a), because the deadlock is no better, but I'm afraid the only
> > real fix is to implement reconnect during the drain section.  I'm still
> > trying to get my head around it so no patch yet, but I just wanted to
> > bring this up in case anybody beats me to it.
> > 
> 
> 
> What do you mean by "reconnect during drained section"? Trying to
> establish the connection? Or keeping in-flight requests instead of
> cancelling them? We can't keep in-flight requests during drained
> section, as it's the whole sense of drained-section: no in-flight
> requests. So we'll have to wait for them at drain_begin (waiting up to
> reconnect-delay, which doesn't seem good and triggers dead-lock for
> non-iothread environment) or just cancel them..
> 
> Do you argue that waiting on drained-begin is somehow better than
> cancelling?

Sorry I should have used more accurate wording to be clear.

Yes, my point is that canceling the requests on entry to a drained
section is incorrect.  And no, it doesn't matter if it can be long:
that's the price you pay for doing the drain.  Think of reconnect as a
special case of a slow connection: if an nbd reply from the server is
delayed for whatever reason without dropping the connection, you have to
wait here, too.  (In fact, reconnect is even slightly better here since
it has a configurable finite timeout while the message delay does not
AFAIK.)

Does it make sense?

> Or, the problem is that after drained section (assume it was short) we
> are in NBD_CLIENT_CONNECTING_NOWAIT ?  Fixing it should be simple
> enough, we just need to check the time at drained_end and if the delay
> is not expired enable NBD_CLIENT_CONNECTING_WAIT agaub,,

Hmm, I didn't get thus far but this is probably an issue too.

Thanks,
Roman.



* Re: [PATCH 1/4] block/nbd: fix drain dead-lock because of nbd reconnect-delay
  2021-02-03 14:21       ` Roman Kagan
@ 2021-02-03 14:44         ` Vladimir Sementsov-Ogievskiy
  2021-02-03 15:00           ` Roman Kagan
  0 siblings, 1 reply; 19+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-02-03 14:44 UTC (permalink / raw)
  To: Roman Kagan, qemu-block, qemu-devel, mreitz, kwolf, eblake, den

03.02.2021 17:21, Roman Kagan wrote:
> On Wed, Feb 03, 2021 at 04:10:41PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> 03.02.2021 13:53, Roman Kagan wrote:
>>> On Thu, Sep 03, 2020 at 10:02:58PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>> We pause reconnect process during drained section. So, if we have some
>>>> requests, waiting for reconnect we should cancel them, otherwise they
>>>> deadlock the drained section.
>>>>
>>>> How to reproduce:
>>>>
>>>> 1. Create an image:
>>>>      qemu-img create -f qcow2 xx 100M
>>>>
>>>> 2. Start NBD server:
>>>>      qemu-nbd xx
>>>>
>>>> 3. Start vm with second nbd disk on node2, like this:
>>>>
>>>>     ./build/x86_64-softmmu/qemu-system-x86_64 -nodefaults -drive \
>>>>        file=/work/images/cent7.qcow2 -drive \
>>>>        driver=nbd,server.type=inet,server.host=192.168.100.5,server.port=10809,reconnect-delay=60 \
>>>>        -vnc :0 -m 2G -enable-kvm -vga std
>>>>
>>>> 4. Access the vm through vnc (or some other way?), and check that NBD
>>>>      drive works:
>>>>
>>>>      dd if=/dev/sdb of=/dev/null bs=1M count=10
>>>>
>>>>      - the command should succeed.
>>>>
>>>> 5. Now, kill the nbd server, and run dd in the guest again:
>>>>
>>>>      dd if=/dev/sdb of=/dev/null bs=1M count=10
>>>>
>>>> Now Qemu is trying to reconnect, and dd-generated requests are waiting
>>>> for the connection (they will wait up to 60 seconds (see
>>>> reconnect-delay option above) and than fail). But suddenly, vm may
>>>> totally hang in the deadlock. You may need to increase reconnect-delay
>>>> period to catch the dead-lock.
>>>>
>>>> VM doesn't respond because drain dead-lock happens in cpu thread with
>>>> global mutex taken. That's not good thing by itself and is not fixed
>>>> by this commit (true way is using iothreads). Still this commit fixes
>>>> drain dead-lock itself.
>>>>
>>>> Note: probably, we can instead continue to reconnect during drained
>>>> section. To achieve this, we may move negotiation to the connect thread
>>>> to make it independent of bs aio context. But expanding drained section
>>>> doesn't seem good anyway. So, let's now fix the bug the simplest way.
>>>>
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>> ---
>>>>    block/nbd.c | 5 +++++
>>>>    1 file changed, 5 insertions(+)
>>>>
>>>> diff --git a/block/nbd.c b/block/nbd.c
>>>> index 9daf003bea..912ea27be7 100644
>>>> --- a/block/nbd.c
>>>> +++ b/block/nbd.c
>>>> @@ -242,6 +242,11 @@ static void coroutine_fn nbd_client_co_drain_begin(BlockDriverState *bs)
>>>>        }
>>>>        nbd_co_establish_connection_cancel(bs, false);
>>>> +
>>>> +    if (s->state == NBD_CLIENT_CONNECTING_WAIT) {
>>>> +        s->state = NBD_CLIENT_CONNECTING_NOWAIT;
>>>> +        qemu_co_queue_restart_all(&s->free_sema);
>>>> +    }
>>>>    }
>>>>    static void coroutine_fn nbd_client_co_drain_end(BlockDriverState *bs)
>>>
>>> This basically defeats the whole purpose of reconnect: if the nbd client
>>> is trying to reconnect, drain effectively cancels that and makes all
>>> in-flight requests to complete with an error.
>>>
>>> I'm not suggesting to revert this patch (it's now in the tree as commit
>>> 8c517de24a), because the deadlock is no better, but I'm afraid the only
>>> real fix is to implement reconnect during the drain section.  I'm still
>>> trying to get my head around it so no patch yet, but I just wanted to
>>> bring this up in case anybody beats me to it.
>>>
>>
>>
>> What do you mean by "reconnect during drained section"? Trying to
>> establish the connection? Or keeping in-flight requests instead of
>> cancelling them? We can't keep in-flight requests during drained
>> section, as it's the whole sense of drained-section: no in-flight
>> requests. So we'll have to wait for them at drain_begin (waiting up to
>> reconnect-delay, which doesn't seem good and triggers dead-lock for
>> non-iothread environment) or just cancel them..
>>
>> Do you argue that waiting on drained-begin is somehow better than
>> cancelling?
> 
> Sorry I should have used more accurate wording to be clear.
> 
> Yes, my point is that canceling the requests on entry to a drained
> section is incorrect.  And no, it doesn't matter if it can be long:
> that's the price you pay for doing the drain.  Think of reconnect as a
> special case of a slow connection: if an nbd reply from the server is
> delayed for whatever reason without dropping the connection, you have to
> wait here, too.  (In fact, reconnect is even slightly better here since
> it has a configurable finite timeout while the message delay does not
> AFAIK.)
> 
> Does it make sense?

Hmm, yes..

But then we should fix the original deadlock some other way. Probably it will
be possible only by using an iothread for the nbd node (I failed to find the
original thread where someone said that iothreads are a solution). And then we
can do the cancel in nbd_client_co_drain_begin() only if bs doesn't have a
separate iothread.
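
Roughly like this (only to illustrate the idea, untested):

static void coroutine_fn nbd_client_co_drain_begin(BlockDriverState *bs)
{
    BDRVNBDState *s = (BDRVNBDState *)bs->opaque;

    /* ... existing drain_begin logic ... */

    /*
     * Cancel requests waiting for the reconnect only when the node runs
     * in the main AioContext (no separate iothread): that is the case
     * where drain may be called from the cpu thread with the global
     * mutex taken and would otherwise dead-lock.
     */
    if (bdrv_get_aio_context(bs) == qemu_get_aio_context() &&
        s->state == NBD_CLIENT_CONNECTING_WAIT) {
        s->state = NBD_CLIENT_CONNECTING_NOWAIT;
        qemu_co_queue_restart_all(&s->free_sema);
    }
}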

> 
>> Or, the problem is that after drained section (assume it was short) we
>> are in NBD_CLIENT_CONNECTING_NOWAIT ?  Fixing it should be simple
>> enough, we just need to check the time at drained_end and if the delay
>> is not expired enable NBD_CLIENT_CONNECTING_WAIT agaub,,
> 
> Hmm, I didn't get thus far but this is probably an issue too.
> 
> Thanks,
> Roman.
> 


-- 
Best regards,
Vladimir



* Re: [PATCH 1/4] block/nbd: fix drain dead-lock because of nbd reconnect-delay
  2021-02-03 14:44         ` Vladimir Sementsov-Ogievskiy
@ 2021-02-03 15:00           ` Roman Kagan
  0 siblings, 0 replies; 19+ messages in thread
From: Roman Kagan @ 2021-02-03 15:00 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy; +Cc: kwolf, qemu-block, qemu-devel, mreitz, den

On Wed, Feb 03, 2021 at 05:44:34PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 03.02.2021 17:21, Roman Kagan wrote:
> > On Wed, Feb 03, 2021 at 04:10:41PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > 03.02.2021 13:53, Roman Kagan wrote:
> > > > On Thu, Sep 03, 2020 at 10:02:58PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > We pause reconnect process during drained section. So, if we have some
> > > > > requests, waiting for reconnect we should cancel them, otherwise they
> > > > > deadlock the drained section.
> > > > > 
> > > > > How to reproduce:
> > > > > 
> > > > > 1. Create an image:
> > > > >      qemu-img create -f qcow2 xx 100M
> > > > > 
> > > > > 2. Start NBD server:
> > > > >      qemu-nbd xx
> > > > > 
> > > > > 3. Start vm with second nbd disk on node2, like this:
> > > > > 
> > > > >     ./build/x86_64-softmmu/qemu-system-x86_64 -nodefaults -drive \
> > > > >        file=/work/images/cent7.qcow2 -drive \
> > > > >        driver=nbd,server.type=inet,server.host=192.168.100.5,server.port=10809,reconnect-delay=60 \
> > > > >        -vnc :0 -m 2G -enable-kvm -vga std
> > > > > 
> > > > > 4. Access the vm through vnc (or some other way?), and check that NBD
> > > > >      drive works:
> > > > > 
> > > > >      dd if=/dev/sdb of=/dev/null bs=1M count=10
> > > > > 
> > > > >      - the command should succeed.
> > > > > 
> > > > > 5. Now, kill the nbd server, and run dd in the guest again:
> > > > > 
> > > > >      dd if=/dev/sdb of=/dev/null bs=1M count=10
> > > > > 
> > > > > Now Qemu is trying to reconnect, and dd-generated requests are waiting
> > > > > for the connection (they will wait up to 60 seconds (see
> > > > > reconnect-delay option above) and than fail). But suddenly, vm may
> > > > > totally hang in the deadlock. You may need to increase reconnect-delay
> > > > > period to catch the dead-lock.
> > > > > 
> > > > > VM doesn't respond because drain dead-lock happens in cpu thread with
> > > > > global mutex taken. That's not good thing by itself and is not fixed
> > > > > by this commit (true way is using iothreads). Still this commit fixes
> > > > > drain dead-lock itself.
> > > > > 
> > > > > Note: probably, we can instead continue to reconnect during drained
> > > > > section. To achieve this, we may move negotiation to the connect thread
> > > > > to make it independent of bs aio context. But expanding drained section
> > > > > doesn't seem good anyway. So, let's now fix the bug the simplest way.
> > > > > 
> > > > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> > > > > ---
> > > > >    block/nbd.c | 5 +++++
> > > > >    1 file changed, 5 insertions(+)
> > > > > 
> > > > > diff --git a/block/nbd.c b/block/nbd.c
> > > > > index 9daf003bea..912ea27be7 100644
> > > > > --- a/block/nbd.c
> > > > > +++ b/block/nbd.c
> > > > > @@ -242,6 +242,11 @@ static void coroutine_fn nbd_client_co_drain_begin(BlockDriverState *bs)
> > > > >        }
> > > > >        nbd_co_establish_connection_cancel(bs, false);
> > > > > +
> > > > > +    if (s->state == NBD_CLIENT_CONNECTING_WAIT) {
> > > > > +        s->state = NBD_CLIENT_CONNECTING_NOWAIT;
> > > > > +        qemu_co_queue_restart_all(&s->free_sema);
> > > > > +    }
> > > > >    }
> > > > >    static void coroutine_fn nbd_client_co_drain_end(BlockDriverState *bs)
> > > > 
> > > > This basically defeats the whole purpose of reconnect: if the nbd client
> > > > is trying to reconnect, drain effectively cancels that and makes all
> > > > in-flight requests to complete with an error.
> > > > 
> > > > I'm not suggesting to revert this patch (it's now in the tree as commit
> > > > 8c517de24a), because the deadlock is no better, but I'm afraid the only
> > > > real fix is to implement reconnect during the drain section.  I'm still
> > > > trying to get my head around it so no patch yet, but I just wanted to
> > > > bring this up in case anybody beats me to it.
> > > > 
> > > 
> > > 
> > > What do you mean by "reconnect during drained section"? Trying to
> > > establish the connection? Or keeping in-flight requests instead of
> > > cancelling them? We can't keep in-flight requests during drained
> > > section, as it's the whole sense of drained-section: no in-flight
> > > requests. So we'll have to wait for them at drain_begin (waiting up to
> > > reconnect-delay, which doesn't seem good and triggers dead-lock for
> > > non-iothread environment) or just cancel them..
> > > 
> > > Do you argue that waiting on drained-begin is somehow better than
> > > cancelling?
> > 
> > Sorry I should have used more accurate wording to be clear.
> > 
> > Yes, my point is that canceling the requests on entry to a drained
> > section is incorrect.  And no, it doesn't matter if it can be long:
> > that's the price you pay for doing the drain.  Think of reconnect as a
> > special case of a slow connection: if an nbd reply from the server is
> > delayed for whatever reason without dropping the connection, you have to
> > wait here, too.  (In fact, reconnect is even slightly better here since
> > it has a configurable finite timeout while the message delay does not
> > AFAIK.)
> > 
> > Does it make sense?
> 
> Hmm, yes..
> 
> But then we should fix the original deadlock some other way.

Exactly.

> Probably it will be possible only by using iothread for nbd node (I
> failed to find original thread where someone said that iothreads is a
> solution). And than we can do cancel in nbd_client_co_drain_begin()
> only if bs doesn't have a separate iothread.

Without this commit, I see qemu hang with nbd in an iothread, too.  I'll
double-check if it's that very same deadlock or something else.

Thanks,
Roman.


