qemu-devel.nongnu.org archive mirror
* [PATCH 0/7] block/nbd: decouple reconnect from drain
@ 2021-03-15  6:06 Roman Kagan
  2021-03-15  6:06 ` [PATCH 1/7] block/nbd: avoid touching freed connect_thread Roman Kagan
                   ` (9 more replies)
  0 siblings, 10 replies; 30+ messages in thread
From: Roman Kagan @ 2021-03-15  6:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-block, Max Reitz, yc-core

The reconnection logic doesn't need to stop while in a drained section.
Moreover it has to be active during the drained section, as the requests
that were caught in-flight with the connection to the server broken can
only usefully get drained if the connection is restored.  Otherwise such
requests can only either stall, resulting in a deadlock (before
8c517de24a), or be aborted, defeating the purpose of the reconnection
machinery (after 8c517de24a).

This series aims to just stop messing with the drained section in the
reconnection code.

While doing so it undoes the effect of 5ad81b4946 ("nbd: Restrict
connection_co reentrance"); in case I've missed the point of that commit
I'd appreciate more scrutiny in this area.

Roman Kagan (7):
  block/nbd: avoid touching freed connect_thread
  block/nbd: use uniformly nbd_client_connecting_wait
  block/nbd: assert attach/detach runs in the proper context
  block/nbd: transfer reconnection stuff across aio_context switch
  block/nbd: better document a case in nbd_co_establish_connection
  block/nbd: decouple reconnect from drain
  block/nbd: stop manipulating in_flight counter

 block/nbd.c  | 191 +++++++++++++++++++++++----------------------------
 nbd/client.c |   2 -
 2 files changed, 86 insertions(+), 107 deletions(-)

-- 
2.30.2



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/7] block/nbd: avoid touching freed connect_thread
  2021-03-15  6:06 [PATCH 0/7] block/nbd: decouple reconnect from drain Roman Kagan
@ 2021-03-15  6:06 ` Roman Kagan
  2021-03-15 15:40   ` Vladimir Sementsov-Ogievskiy
  2021-04-06 16:21   ` Vladimir Sementsov-Ogievskiy
  2021-03-15  6:06 ` [PATCH 2/7] block/nbd: use uniformly nbd_client_connecting_wait Roman Kagan
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 30+ messages in thread
From: Roman Kagan @ 2021-03-15  6:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-block, Max Reitz, yc-core

When the NBD connection is being torn down, the connection thread gets
canceled and "detached", meaning it is about to get freed.

If this happens while the connection coroutine yielded waiting for the
connection thread to complete, when it resumes it may access the
invalidated connection thread data.

To prevent this, revalidate the ->connect_thread pointer in
nbd_co_establish_connection before using it after the yield.

Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
---
 block/nbd.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index c26dc5a54f..447d176b76 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -486,6 +486,15 @@ nbd_co_establish_connection(BlockDriverState *bs, Error **errp)
     s->wait_connect = true;
     qemu_coroutine_yield();
 
+    /*
+     * If nbd_co_establish_connection_cancel had a chance to run it may have
+     * invalidated ->connect_thread.
+     */
+    thr = s->connect_thread;
+    if (!thr) {
+        return -ECONNABORTED;
+    }
+
     qemu_mutex_lock(&thr->mutex);
 
     switch (thr->state) {
-- 
2.30.2



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 2/7] block/nbd: use uniformly nbd_client_connecting_wait
  2021-03-15  6:06 [PATCH 0/7] block/nbd: decouple reconnect from drain Roman Kagan
  2021-03-15  6:06 ` [PATCH 1/7] block/nbd: avoid touching freed connect_thread Roman Kagan
@ 2021-03-15  6:06 ` Roman Kagan
  2021-03-15 15:48   ` Vladimir Sementsov-Ogievskiy
  2021-03-15  6:06 ` [PATCH 3/7] block/nbd: assert attach/detach runs in the proper context Roman Kagan
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Roman Kagan @ 2021-03-15  6:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-block, Max Reitz, yc-core

Use nbd_client_connecting_wait uniformly all over block/nbd.c.

While at it, drop the redundant connecting-wait state check in
reconnect_delay_timer_init, as all its call sites perform it too.

Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
---
 block/nbd.c | 34 +++++++++++++++-------------------
 1 file changed, 15 insertions(+), 19 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 447d176b76..1d8edb5b21 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -165,6 +165,18 @@ static void nbd_clear_bdrvstate(BDRVNBDState *s)
     s->x_dirty_bitmap = NULL;
 }
 
+static bool nbd_client_connecting(BDRVNBDState *s)
+{
+    NBDClientState state = qatomic_load_acquire(&s->state);
+    return state == NBD_CLIENT_CONNECTING_WAIT ||
+        state == NBD_CLIENT_CONNECTING_NOWAIT;
+}
+
+static bool nbd_client_connecting_wait(BDRVNBDState *s)
+{
+    return qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT;
+}
+
 static void nbd_channel_error(BDRVNBDState *s, int ret)
 {
     if (ret == -EIO) {
@@ -205,7 +217,7 @@ static void reconnect_delay_timer_cb(void *opaque)
 {
     BDRVNBDState *s = opaque;
 
-    if (qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT) {
+    if (nbd_client_connecting_wait(s)) {
         s->state = NBD_CLIENT_CONNECTING_NOWAIT;
         while (qemu_co_enter_next(&s->free_sema, NULL)) {
             /* Resume all queued requests */
@@ -217,10 +229,6 @@ static void reconnect_delay_timer_cb(void *opaque)
 
 static void reconnect_delay_timer_init(BDRVNBDState *s, uint64_t expire_time_ns)
 {
-    if (qatomic_load_acquire(&s->state) != NBD_CLIENT_CONNECTING_WAIT) {
-        return;
-    }
-
     assert(!s->reconnect_delay_timer);
     s->reconnect_delay_timer = aio_timer_new(bdrv_get_aio_context(s->bs),
                                              QEMU_CLOCK_REALTIME,
@@ -346,18 +354,6 @@ static void nbd_teardown_connection(BlockDriverState *bs)
     assert(!s->connection_co);
 }
 
-static bool nbd_client_connecting(BDRVNBDState *s)
-{
-    NBDClientState state = qatomic_load_acquire(&s->state);
-    return state == NBD_CLIENT_CONNECTING_WAIT ||
-        state == NBD_CLIENT_CONNECTING_NOWAIT;
-}
-
-static bool nbd_client_connecting_wait(BDRVNBDState *s)
-{
-    return qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT;
-}
-
 static void connect_bh(void *opaque)
 {
     BDRVNBDState *state = opaque;
@@ -667,7 +663,7 @@ static coroutine_fn void nbd_co_reconnect_loop(BDRVNBDState *s)
     uint64_t timeout = 1 * NANOSECONDS_PER_SECOND;
     uint64_t max_timeout = 16 * NANOSECONDS_PER_SECOND;
 
-    if (qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT) {
+    if (nbd_client_connecting_wait(s)) {
         reconnect_delay_timer_init(s, qemu_clock_get_ns(QEMU_CLOCK_REALTIME) +
                                    s->reconnect_delay * NANOSECONDS_PER_SECOND);
     }
@@ -2473,7 +2469,7 @@ static void nbd_cancel_in_flight(BlockDriverState *bs)
 
     reconnect_delay_timer_del(s);
 
-    if (s->state == NBD_CLIENT_CONNECTING_WAIT) {
+    if (nbd_client_connecting_wait(s)) {
         s->state = NBD_CLIENT_CONNECTING_NOWAIT;
         qemu_co_queue_restart_all(&s->free_sema);
     }
-- 
2.30.2



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 3/7] block/nbd: assert attach/detach runs in the proper context
  2021-03-15  6:06 [PATCH 0/7] block/nbd: decouple reconnect from drain Roman Kagan
  2021-03-15  6:06 ` [PATCH 1/7] block/nbd: avoid touching freed connect_thread Roman Kagan
  2021-03-15  6:06 ` [PATCH 2/7] block/nbd: use uniformly nbd_client_connecting_wait Roman Kagan
@ 2021-03-15  6:06 ` Roman Kagan
  2021-03-15 16:41   ` Vladimir Sementsov-Ogievskiy
  2021-03-15  6:06 ` [PATCH 4/7] block/nbd: transfer reconnection stuff across aio_context switch Roman Kagan
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Roman Kagan @ 2021-03-15  6:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-block, Max Reitz, yc-core

Document (via a comment and an assert) that
nbd_client_detach_aio_context and nbd_client_attach_aio_context_bh run
in the desired aio_context.

Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
---
 block/nbd.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index 1d8edb5b21..658b827d24 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -241,6 +241,12 @@ static void nbd_client_detach_aio_context(BlockDriverState *bs)
 {
     BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
 
+    /*
+     * This runs in the (old, about to be detached) aio context of the @bs so
+     * accessing the stuff on @s is concurrency-free.
+     */
+    assert(qemu_get_current_aio_context() == bdrv_get_aio_context(bs));
+
     /* Timer is deleted in nbd_client_co_drain_begin() */
     assert(!s->reconnect_delay_timer);
     /*
@@ -258,6 +264,12 @@ static void nbd_client_attach_aio_context_bh(void *opaque)
     BlockDriverState *bs = opaque;
     BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
 
+    /*
+     * This runs in the (new, just attached) aio context of the @bs so
+     * accessing the stuff on @s is concurrency-free.
+     */
+    assert(qemu_get_current_aio_context() == bdrv_get_aio_context(bs));
+
     if (s->connection_co) {
         /*
          * The node is still drained, so we know the coroutine has yielded in
-- 
2.30.2



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 4/7] block/nbd: transfer reconnection stuff across aio_context switch
  2021-03-15  6:06 [PATCH 0/7] block/nbd: decouple reconnect from drain Roman Kagan
                   ` (2 preceding siblings ...)
  2021-03-15  6:06 ` [PATCH 3/7] block/nbd: assert attach/detach runs in the proper context Roman Kagan
@ 2021-03-15  6:06 ` Roman Kagan
  2021-03-15  6:06 ` [PATCH 5/7] block/nbd: better document a case in nbd_co_establish_connection Roman Kagan
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: Roman Kagan @ 2021-03-15  6:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-block, Max Reitz, yc-core

Make various pieces of the reconnection logic correctly survive the
transition of the BDRVNBDState from one aio_context to another.  In
particular,

- cancel the reconnect_delay_timer and rearm it in the new context;
- cancel the sleep of the connection_co between reconnect attempts so
  that it continues in the new context;
- prevent the connection thread from delivering its status to the old
  context, and retarget it to the new context on attach.

None of these is needed at the moment because the aio_context switch
happens within a drained section and that effectively terminates the
reconnection logic on entry and starts it over on exit.  However, this
patch paves the way to keeping the reconnection process active across
the drained section (in a followup patch).

Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
---
 block/nbd.c | 44 ++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 42 insertions(+), 2 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 658b827d24..a6d713ba58 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -126,6 +126,7 @@ typedef struct BDRVNBDState {
     bool wait_in_flight;
 
     QEMUTimer *reconnect_delay_timer;
+    int64_t reconnect_expire_time_ns;
 
     NBDClientRequest requests[MAX_NBD_REQUESTS];
     NBDReply reply;
@@ -240,6 +241,7 @@ static void reconnect_delay_timer_init(BDRVNBDState *s, uint64_t expire_time_ns)
 static void nbd_client_detach_aio_context(BlockDriverState *bs)
 {
     BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
+    NBDConnectThread *thr = s->connect_thread;
 
     /*
      * This runs in the (old, about to be detached) aio context of the @bs so
@@ -247,8 +249,31 @@ static void nbd_client_detach_aio_context(BlockDriverState *bs)
      */
     assert(qemu_get_current_aio_context() == bdrv_get_aio_context(bs));
 
-    /* Timer is deleted in nbd_client_co_drain_begin() */
-    assert(!s->reconnect_delay_timer);
+    /*
+     * Make sure the connection thread doesn't try to deliver its status to the
+     * old context.
+     */
+    qemu_mutex_lock(&thr->mutex);
+    thr->bh_ctx = NULL;
+    qemu_mutex_unlock(&thr->mutex);
+
+    /*
+     * Preserve the expiration time of the reconnect_delay_timer in order to
+     * resume it on the new aio context.
+     */
+    s->reconnect_expire_time_ns = s->reconnect_delay_timer ?
+        timer_expire_time_ns(s->reconnect_delay_timer) : -1;
+    reconnect_delay_timer_del(s);
+
+    /*
+     * If the connection coroutine was sleeping between reconnect attempts,
+     * wake it up now and let it continue the process in the new aio context.
+     * This will distort the exponential back-off but that's probably ok.
+     */
+    if (s->connection_co_sleep_ns_state) {
+        qemu_co_sleep_wake(s->connection_co_sleep_ns_state);
+    }
+
     /*
      * If reconnect is in progress we may have no ->ioc.  It will be
      * re-instantiated in the proper aio context once the connection is
@@ -263,6 +288,7 @@ static void nbd_client_attach_aio_context_bh(void *opaque)
 {
     BlockDriverState *bs = opaque;
     BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
+    NBDConnectThread *thr = s->connect_thread;
 
     /*
      * This runs in the (new, just attached) aio context of the @bs so
@@ -270,6 +296,20 @@ static void nbd_client_attach_aio_context_bh(void *opaque)
      */
     assert(qemu_get_current_aio_context() == bdrv_get_aio_context(bs));
 
+    if (nbd_client_connecting_wait(s) && s->reconnect_expire_time_ns >= 0) {
+        reconnect_delay_timer_init(s, s->reconnect_expire_time_ns);
+    }
+
+    /*
+     * If the connection thread hasn't completed connecting yet, make sure it
+     * can deliver its status in the new context.
+     */
+    qemu_mutex_lock(&thr->mutex);
+    if (thr->state == CONNECT_THREAD_RUNNING) {
+        thr->bh_ctx = qemu_get_current_aio_context();
+    }
+    qemu_mutex_unlock(&thr->mutex);
+
     if (s->connection_co) {
         /*
          * The node is still drained, so we know the coroutine has yielded in
-- 
2.30.2



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 5/7] block/nbd: better document a case in nbd_co_establish_connection
  2021-03-15  6:06 [PATCH 0/7] block/nbd: decouple reconnect from drain Roman Kagan
                   ` (3 preceding siblings ...)
  2021-03-15  6:06 ` [PATCH 4/7] block/nbd: transfer reconnection stuff across aio_context switch Roman Kagan
@ 2021-03-15  6:06 ` Roman Kagan
  2021-03-15  6:06 ` [PATCH 6/7] block/nbd: decouple reconnect from drain Roman Kagan
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: Roman Kagan @ 2021-03-15  6:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-block, Max Reitz, yc-core

Cosmetic: adjust the comment and the return value in
nbd_co_establish_connection for the case when it is entered while the
connection thread is still running.

Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
---
 block/nbd.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index a6d713ba58..a3eb9b9079 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -562,11 +562,12 @@ nbd_co_establish_connection(BlockDriverState *bs, Error **errp)
     case CONNECT_THREAD_RUNNING:
     case CONNECT_THREAD_RUNNING_DETACHED:
         /*
-         * Obviously, drained section wants to start. Report the attempt as
-         * failed. Still connect thread is executing in background, and its
+         * Spurious coroutine resume before the connection attempt has finished,
+         * presumably upon attaching a new aio_context. Report the attempt as
+         * failed.  Still connect thread is executing in background, and its
          * result may be used for next connection attempt.
          */
-        ret = -1;
+        ret = -ECONNABORTED;
         error_setg(errp, "Connection attempt cancelled by other operation");
         break;
 
-- 
2.30.2



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 6/7] block/nbd: decouple reconnect from drain
  2021-03-15  6:06 [PATCH 0/7] block/nbd: decouple reconnect from drain Roman Kagan
                   ` (4 preceding siblings ...)
  2021-03-15  6:06 ` [PATCH 5/7] block/nbd: better document a case in nbd_co_establish_connection Roman Kagan
@ 2021-03-15  6:06 ` Roman Kagan
  2021-03-15 20:10   ` Vladimir Sementsov-Ogievskiy
  2021-03-15  6:06 ` [PATCH 7/7] block/nbd: stop manipulating in_flight counter Roman Kagan
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Roman Kagan @ 2021-03-15  6:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-block, Max Reitz, yc-core

The reconnection logic doesn't need to stop while in a drained section.
Moreover it has to be active during the drained section, as the requests
that were caught in-flight with the connection to the server broken can
only usefully get drained if the connection is restored.  Otherwise such
requests can only either stall, resulting in a deadlock (before
8c517de24a), or be aborted, defeating the purpose of the reconnection
machinery (after 8c517de24a).

Since the pieces of the reconnection logic are now properly migrated
from one aio_context to another, it appears safe to just stop messing
with the drained section in the reconnection code.

Fixes: 5ad81b4946 ("nbd: Restrict connection_co reentrance")
Fixes: 8c517de24a ("block/nbd: fix drain dead-lock because of nbd reconnect-delay")
Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
---
 block/nbd.c | 79 +++--------------------------------------------------
 1 file changed, 4 insertions(+), 75 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index a3eb9b9079..a5a9e4aca5 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -117,8 +117,6 @@ typedef struct BDRVNBDState {
     Coroutine *connection_co;
     Coroutine *teardown_co;
     QemuCoSleepState *connection_co_sleep_ns_state;
-    bool drained;
-    bool wait_drained_end;
     int in_flight;
     NBDClientState state;
     int connect_status;
@@ -311,12 +309,6 @@ static void nbd_client_attach_aio_context_bh(void *opaque)
     qemu_mutex_unlock(&thr->mutex);
 
     if (s->connection_co) {
-        /*
-         * The node is still drained, so we know the coroutine has yielded in
-         * nbd_read_eof(), the only place where bs->in_flight can reach 0, or
-         * it is entered for the first time. Both places are safe for entering
-         * the coroutine.
-         */
         qemu_aio_coroutine_enter(bs->aio_context, s->connection_co);
     }
     bdrv_dec_in_flight(bs);
@@ -344,37 +336,6 @@ static void nbd_client_attach_aio_context(BlockDriverState *bs,
     aio_wait_bh_oneshot(new_context, nbd_client_attach_aio_context_bh, bs);
 }
 
-static void coroutine_fn nbd_client_co_drain_begin(BlockDriverState *bs)
-{
-    BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
-
-    s->drained = true;
-    if (s->connection_co_sleep_ns_state) {
-        qemu_co_sleep_wake(s->connection_co_sleep_ns_state);
-    }
-
-    nbd_co_establish_connection_cancel(bs, false);
-
-    reconnect_delay_timer_del(s);
-
-    if (qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT) {
-        s->state = NBD_CLIENT_CONNECTING_NOWAIT;
-        qemu_co_queue_restart_all(&s->free_sema);
-    }
-}
-
-static void coroutine_fn nbd_client_co_drain_end(BlockDriverState *bs)
-{
-    BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
-
-    s->drained = false;
-    if (s->wait_drained_end) {
-        s->wait_drained_end = false;
-        aio_co_wake(s->connection_co);
-    }
-}
-
-
 static void nbd_teardown_connection(BlockDriverState *bs)
 {
     BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
@@ -686,16 +647,6 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
 
     ret = nbd_client_handshake(s->bs, &local_err);
 
-    if (s->drained) {
-        s->wait_drained_end = true;
-        while (s->drained) {
-            /*
-             * We may be entered once from nbd_client_attach_aio_context_bh
-             * and then from nbd_client_co_drain_end. So here is a loop.
-             */
-            qemu_coroutine_yield();
-        }
-    }
     bdrv_inc_in_flight(s->bs);
 
 out:
@@ -724,26 +675,10 @@ static coroutine_fn void nbd_co_reconnect_loop(BDRVNBDState *s)
     nbd_reconnect_attempt(s);
 
     while (nbd_client_connecting(s)) {
-        if (s->drained) {
-            bdrv_dec_in_flight(s->bs);
-            s->wait_drained_end = true;
-            while (s->drained) {
-                /*
-                 * We may be entered once from nbd_client_attach_aio_context_bh
-                 * and then from nbd_client_co_drain_end. So here is a loop.
-                 */
-                qemu_coroutine_yield();
-            }
-            bdrv_inc_in_flight(s->bs);
-        } else {
-            qemu_co_sleep_ns_wakeable(QEMU_CLOCK_REALTIME, timeout,
-                                      &s->connection_co_sleep_ns_state);
-            if (s->drained) {
-                continue;
-            }
-            if (timeout < max_timeout) {
-                timeout *= 2;
-            }
+        qemu_co_sleep_ns_wakeable(QEMU_CLOCK_REALTIME, timeout,
+                                  &s->connection_co_sleep_ns_state);
+        if (timeout < max_timeout) {
+            timeout *= 2;
         }
 
         nbd_reconnect_attempt(s);
@@ -2548,8 +2483,6 @@ static BlockDriver bdrv_nbd = {
     .bdrv_getlength             = nbd_getlength,
     .bdrv_detach_aio_context    = nbd_client_detach_aio_context,
     .bdrv_attach_aio_context    = nbd_client_attach_aio_context,
-    .bdrv_co_drain_begin        = nbd_client_co_drain_begin,
-    .bdrv_co_drain_end          = nbd_client_co_drain_end,
     .bdrv_refresh_filename      = nbd_refresh_filename,
     .bdrv_co_block_status       = nbd_client_co_block_status,
     .bdrv_dirname               = nbd_dirname,
@@ -2577,8 +2510,6 @@ static BlockDriver bdrv_nbd_tcp = {
     .bdrv_getlength             = nbd_getlength,
     .bdrv_detach_aio_context    = nbd_client_detach_aio_context,
     .bdrv_attach_aio_context    = nbd_client_attach_aio_context,
-    .bdrv_co_drain_begin        = nbd_client_co_drain_begin,
-    .bdrv_co_drain_end          = nbd_client_co_drain_end,
     .bdrv_refresh_filename      = nbd_refresh_filename,
     .bdrv_co_block_status       = nbd_client_co_block_status,
     .bdrv_dirname               = nbd_dirname,
@@ -2606,8 +2537,6 @@ static BlockDriver bdrv_nbd_unix = {
     .bdrv_getlength             = nbd_getlength,
     .bdrv_detach_aio_context    = nbd_client_detach_aio_context,
     .bdrv_attach_aio_context    = nbd_client_attach_aio_context,
-    .bdrv_co_drain_begin        = nbd_client_co_drain_begin,
-    .bdrv_co_drain_end          = nbd_client_co_drain_end,
     .bdrv_refresh_filename      = nbd_refresh_filename,
     .bdrv_co_block_status       = nbd_client_co_block_status,
     .bdrv_dirname               = nbd_dirname,
-- 
2.30.2



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 7/7] block/nbd: stop manipulating in_flight counter
  2021-03-15  6:06 [PATCH 0/7] block/nbd: decouple reconnect from drain Roman Kagan
                   ` (5 preceding siblings ...)
  2021-03-15  6:06 ` [PATCH 6/7] block/nbd: decouple reconnect from drain Roman Kagan
@ 2021-03-15  6:06 ` Roman Kagan
  2021-03-15 20:15   ` Vladimir Sementsov-Ogievskiy
  2021-03-15 19:45 ` [PATCH 0/7] block/nbd: decouple reconnect from drain Vladimir Sementsov-Ogievskiy
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Roman Kagan @ 2021-03-15  6:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-block, Max Reitz, yc-core

As the reconnect logic no longer interferes with drained sections, it
appears unnecessary to explicitly manipulate the in_flight counter.

Fixes: 5ad81b4946 ("nbd: Restrict connection_co reentrance")
Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
---
 block/nbd.c  | 6 ------
 nbd/client.c | 2 --
 2 files changed, 8 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index a5a9e4aca5..3a22f5d897 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -311,7 +311,6 @@ static void nbd_client_attach_aio_context_bh(void *opaque)
     if (s->connection_co) {
         qemu_aio_coroutine_enter(bs->aio_context, s->connection_co);
     }
-    bdrv_dec_in_flight(bs);
 }
 
 static void nbd_client_attach_aio_context(BlockDriverState *bs,
@@ -327,7 +326,6 @@ static void nbd_client_attach_aio_context(BlockDriverState *bs,
         qio_channel_attach_aio_context(QIO_CHANNEL(s->ioc), new_context);
     }
 
-    bdrv_inc_in_flight(bs);
 
     /*
      * Need to wait here for the BH to run because the BH must run while the
@@ -643,11 +641,9 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
         goto out;
     }
 
-    bdrv_dec_in_flight(s->bs);
 
     ret = nbd_client_handshake(s->bs, &local_err);
 
-    bdrv_inc_in_flight(s->bs);
 
 out:
     s->connect_status = ret;
@@ -759,7 +755,6 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
 
     qemu_co_queue_restart_all(&s->free_sema);
     nbd_recv_coroutines_wake_all(s);
-    bdrv_dec_in_flight(s->bs);
 
     s->connection_co = NULL;
     if (s->ioc) {
@@ -2307,7 +2302,6 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
     nbd_init_connect_thread(s);
 
     s->connection_co = qemu_coroutine_create(nbd_connection_entry, s);
-    bdrv_inc_in_flight(bs);
     aio_co_schedule(bdrv_get_aio_context(bs), s->connection_co);
 
     return 0;
diff --git a/nbd/client.c b/nbd/client.c
index 0c2db4bcba..30d5383cb1 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -1434,9 +1434,7 @@ nbd_read_eof(BlockDriverState *bs, QIOChannel *ioc, void *buffer, size_t size,
 
         len = qio_channel_readv(ioc, &iov, 1, errp);
         if (len == QIO_CHANNEL_ERR_BLOCK) {
-            bdrv_dec_in_flight(bs);
             qio_channel_yield(ioc, G_IO_IN);
-            bdrv_inc_in_flight(bs);
             continue;
         } else if (len < 0) {
             return -EIO;
-- 
2.30.2



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/7] block/nbd: avoid touching freed connect_thread
  2021-03-15  6:06 ` [PATCH 1/7] block/nbd: avoid touching freed connect_thread Roman Kagan
@ 2021-03-15 15:40   ` Vladimir Sementsov-Ogievskiy
  2021-03-16 15:29     ` Roman Kagan
  2021-04-06 16:21   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 30+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-03-15 15:40 UTC (permalink / raw)
  To: Roman Kagan, qemu-devel
  Cc: yc-core, Eric Blake, Max Reitz, Kevin Wolf, qemu-block

15.03.2021 09:06, Roman Kagan wrote:
> When the NBD connection is being torn down, the connection thread gets
> canceled and "detached", meaning it is about to get freed.
> 
> If this happens while the connection coroutine yielded waiting for the
> connection thread to complete, when it resumes it may access the
> invalidated connection thread data.
> 
> To prevent this, revalidate the ->connect_thread pointer in
> nbd_co_establish_connection before using it after the yield.
> 
> Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>

Seems possible. Do you have a reproducer? Would be great to make an iotest.

> ---
>   block/nbd.c | 9 +++++++++
>   1 file changed, 9 insertions(+)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index c26dc5a54f..447d176b76 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -486,6 +486,15 @@ nbd_co_establish_connection(BlockDriverState *bs, Error **errp)
>       s->wait_connect = true;
>       qemu_coroutine_yield();
>   
> +    /*
> +     * If nbd_co_establish_connection_cancel had a chance to run it may have
> +     * invalidated ->connect_thread.
> +     */
> +    thr = s->connect_thread;
> +    if (!thr) {
> +        return -ECONNABORTED;
> +    }
> +
>       qemu_mutex_lock(&thr->mutex);
>   
>       switch (thr->state) {
> 

The patch reads a bit as if s->connect_thread may become something other than NULL during the yield. But it can't, actually. Maybe that will become possible with further patches?

Anyway, make sense to me:

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/7] block/nbd: use uniformly nbd_client_connecting_wait
  2021-03-15  6:06 ` [PATCH 2/7] block/nbd: use uniformly nbd_client_connecting_wait Roman Kagan
@ 2021-03-15 15:48   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 30+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-03-15 15:48 UTC (permalink / raw)
  To: Roman Kagan, qemu-devel
  Cc: yc-core, Eric Blake, Max Reitz, Kevin Wolf, qemu-block

15.03.2021 09:06, Roman Kagan wrote:
> Use nbd_client_connecting_wait uniformly all over the block/nbd.c.
> 
> While at this, drop the redundant check for nbd_client_connecting_wait
> in reconnect_delay_timer_init, as all its callsites do this check too.
> 
> Signed-off-by: Roman Kagan<rvkagan@yandex-team.ru>


Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/7] block/nbd: assert attach/detach runs in the proper context
  2021-03-15  6:06 ` [PATCH 3/7] block/nbd: assert attach/detach runs in the proper context Roman Kagan
@ 2021-03-15 16:41   ` Vladimir Sementsov-Ogievskiy
  2021-03-15 19:57     ` Roman Kagan
  0 siblings, 1 reply; 30+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-03-15 16:41 UTC (permalink / raw)
  To: Roman Kagan, qemu-devel
  Cc: yc-core, Eric Blake, Max Reitz, Kevin Wolf, qemu-block

15.03.2021 09:06, Roman Kagan wrote:
> Document (via a comment and an assert) that
> nbd_client_detach_aio_context and nbd_client_attach_aio_context_bh run
> in the desired aio_context
> 
> Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
> ---
>   block/nbd.c | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index 1d8edb5b21..658b827d24 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -241,6 +241,12 @@ static void nbd_client_detach_aio_context(BlockDriverState *bs)
>   {
>       BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
>   
> +    /*
> +     * This runs in the (old, about to be detached) aio context of the @bs so
> +     * accessing the stuff on @s is concurrency-free.
> +     */
> +    assert(qemu_get_current_aio_context() == bdrv_get_aio_context(bs));

Hmm. I don't think so. The handler is called from bdrv_set_aio_context_ignore(), which has the assertion g_assert(qemu_get_current_aio_context() == qemu_get_aio_context()). There is also a comment above bdrv_set_aio_context_ignore(): "The caller must own the AioContext lock for the old AioContext of bs".

So, we are not in the home context of bs here. We are in the main aio context and we hold AioContext lock of old bs context.

> +
>       /* Timer is deleted in nbd_client_co_drain_begin() */
>       assert(!s->reconnect_delay_timer);
>       /*
> @@ -258,6 +264,12 @@ static void nbd_client_attach_aio_context_bh(void *opaque)
>       BlockDriverState *bs = opaque;
>       BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
>   
> +    /*
> +     * This runs in the (new, just attached) aio context of the @bs so
> +     * accessing the stuff on @s is concurrency-free.
> +     */
> +    assert(qemu_get_current_aio_context() == bdrv_get_aio_context(bs));

This is correct just because we are in a BH scheduled for this context (I hope we can't reattach some third context before entering the BH in the second one :). So I don't think this assertion really adds anything.

> +
>       if (s->connection_co) {
>           /*
>            * The node is still drained, so we know the coroutine has yielded in
> 

I'm not sure that the asserted fact gives us "concurrency-free". For that we also need to ensure that all other things in the driver are always called in the same aio context. Still, it's a general assumption we have when writing block drivers: "everything runs in one aio context, so don't care". Sometimes it leads to bugs, as some things are still called even from non-coroutine context. Also, Paolo said (https://lists.gnu.org/archive/html/qemu-devel/2020-08/msg03814.html) that many iothreads will send requests to bs, and the code in the driver should be thread safe. I don't have a good understanding of all these things; what I have is:

   For now (at least we don't have problems in our RHEL-based downstream) it may be OK to think that in a block driver everything is protected by the AioContext lock and all the concurrency we have inside a block driver is switching between coroutines, never real parallelism. But in general it's not so, and with multiqueue it's not so.. So I'd not put such a comment :)

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/7] block/nbd: decouple reconnect from drain
  2021-03-15  6:06 [PATCH 0/7] block/nbd: decouple reconnect from drain Roman Kagan
                   ` (6 preceding siblings ...)
  2021-03-15  6:06 ` [PATCH 7/7] block/nbd: stop manipulating in_flight counter Roman Kagan
@ 2021-03-15 19:45 ` Vladimir Sementsov-Ogievskiy
  2021-03-16 15:52   ` Roman Kagan
  2021-03-16 14:41 ` Eric Blake
  2021-03-17  8:35 ` Vladimir Sementsov-Ogievskiy
  9 siblings, 1 reply; 30+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-03-15 19:45 UTC (permalink / raw)
  To: Roman Kagan, qemu-devel
  Cc: yc-core, Eric Blake, Max Reitz, Kevin Wolf, qemu-block

15.03.2021 09:06, Roman Kagan wrote:
> The reconnection logic doesn't need to stop while in a drained section.
> Moreover it has to be active during the drained section, as the requests
> that were caught in-flight with the connection to the server broken can
> only usefully get drained if the connection is restored.  Otherwise such
> requests can only either stall resulting in a deadlock (before
> 8c517de24a), or be aborted defeating the purpose of the reconnection
> machinery (after 8c517de24a).
> 
> This series aims to just stop messing with the drained section in the
> reconnection code.
> 
> While doing so it undoes the effect of 5ad81b4946 ("nbd: Restrict
> connection_co reentrance"); as I've missed the point of that commit I'd
> appreciate more scrutiny in this area.


The actual point is:

connection_co (together with all the functions called from it) has a lot of yield points. And we can't just enter the coroutine at any of them whenever we want, as it may break some BH that is actually being waited for at that yield point..

Still, we only need to care about yield points possible during a drained section, so we don't need to worry about the direct qemu_coroutine_yield() inside nbd_connection_entry().

Many things have changed since 5ad81b4946.. So probably, by now all the yield points in nbd_connection_entry that are possible during a drained section support reentering. But some analysis of the possible yield points should be done.

> 
> Roman Kagan (7):
>    block/nbd: avoid touching freed connect_thread
>    block/nbd: use uniformly nbd_client_connecting_wait
>    block/nbd: assert attach/detach runs in the proper context
>    block/nbd: transfer reconnection stuff across aio_context switch
>    block/nbd: better document a case in nbd_co_establish_connection
>    block/nbd: decouple reconnect from drain
>    block/nbd: stop manipulating in_flight counter
> 
>   block/nbd.c  | 191 +++++++++++++++++++++++----------------------------
>   nbd/client.c |   2 -
>   2 files changed, 86 insertions(+), 107 deletions(-)
> 


-- 
Best regards,
Vladimir



* Re: [PATCH 3/7] block/nbd: assert attach/detach runs in the proper context
  2021-03-15 16:41   ` Vladimir Sementsov-Ogievskiy
@ 2021-03-15 19:57     ` Roman Kagan
  0 siblings, 0 replies; 30+ messages in thread
From: Roman Kagan @ 2021-03-15 19:57 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: Kevin Wolf, qemu-block, qemu-devel, Max Reitz, yc-core

On Mon, Mar 15, 2021 at 07:41:32PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 15.03.2021 09:06, Roman Kagan wrote:
> > Document (via a comment and an assert) that
> > nbd_client_detach_aio_context and nbd_client_attach_aio_context_bh run
> > in the desired aio_context
> > 
> > Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
> > ---
> >   block/nbd.c | 12 ++++++++++++
> >   1 file changed, 12 insertions(+)
> > 
> > diff --git a/block/nbd.c b/block/nbd.c
> > index 1d8edb5b21..658b827d24 100644
> > --- a/block/nbd.c
> > +++ b/block/nbd.c
> > @@ -241,6 +241,12 @@ static void nbd_client_detach_aio_context(BlockDriverState *bs)
> >   {
> >       BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
> > +    /*
> > +     * This runs in the (old, about to be detached) aio context of the @bs so
> > +     * accessing the stuff on @s is concurrency-free.
> > +     */
> > +    assert(qemu_get_current_aio_context() == bdrv_get_aio_context(bs));
> 
> Hmm. I don't think so. The handler is called from bdrv_set_aio_context_ignore(), which have the assertion g_assert(qemu_get_current_aio_context() == qemu_get_aio_context());. There is also a comment above bdrv_set_aio_context_ignore() "The caller must own the AioContext lock for the old AioContext of bs".
> 
> So, we are not in the home context of bs here. We are in the main aio context and we hold AioContext lock of old bs context.

You're absolutely right.  I'm wondering where I got the idea of this
assertion from...

> 
> > +
> >       /* Timer is deleted in nbd_client_co_drain_begin() */
> >       assert(!s->reconnect_delay_timer);
> >       /*
> > @@ -258,6 +264,12 @@ static void nbd_client_attach_aio_context_bh(void *opaque)
> >       BlockDriverState *bs = opaque;
> >       BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
> > +    /*
> > +     * This runs in the (new, just attached) aio context of the @bs so
> > +     * accessing the stuff on @s is concurrency-free.
> > +     */
> > +    assert(qemu_get_current_aio_context() == bdrv_get_aio_context(bs));
> 
> This is correct just because we are in a BH, scheduled for this
> context (I hope we can't reattach some third context prior to entering
> the BH in the second:). So, I don't think this assertion really adds
> something.

Indeed.

> > +
> >       if (s->connection_co) {
> >           /*
> >            * The node is still drained, so we know the coroutine has yielded in
> > 
> 
> I'm not sure that the asserted fact gives us "concurrency-free". For
> this we also need to ensure that all other things in the driver are
> always called in same aio context.. Still, it's a general assumption
> we have when writing block drivers "everything in one aio context, so
> don't care".. Sometime it leads to bugs, as some things are still
> called even from non-coroutine context. Also, Paolo said
> (https://lists.gnu.org/archive/html/qemu-devel/2020-08/msg03814.html)
> that many iothreads will send requests to bs, and the code in driver
> should be thread safe. I don't have good understanding of all these
> things, and what I have is:
> 
> For now (at least we don't have problems in Rhel based downstream) it
> maybe OK to think that in block-driver everything is protected by
> AioContext lock and all concurrency we have inside block driver is
> switching between coroutines but never real parallelism. But in
> general it's not so, and with multiqueue it's not so.. So, I'd not put
> such a comment :)

So the patch is bogus in every respect; let's just drop it.

I hope it doesn't invalidate completely the rest of the series though.

Meanwhile I certainly need to update my idea of concurrency assumptions
in the block layer.

Thanks,
Roman.



* Re: [PATCH 6/7] block/nbd: decouple reconnect from drain
  2021-03-15  6:06 ` [PATCH 6/7] block/nbd: decouple reconnect from drain Roman Kagan
@ 2021-03-15 20:10   ` Vladimir Sementsov-Ogievskiy
  2021-03-16 16:03     ` Roman Kagan
  0 siblings, 1 reply; 30+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-03-15 20:10 UTC (permalink / raw)
  To: Roman Kagan, qemu-devel
  Cc: yc-core, Eric Blake, Max Reitz, Kevin Wolf, qemu-block

15.03.2021 09:06, Roman Kagan wrote:
> The reconnection logic doesn't need to stop while in a drained section.
> Moreover it has to be active during the drained section, as the requests
> that were caught in-flight with the connection to the server broken can
> only usefully get drained if the connection is restored.  Otherwise such
> requests can only either stall resulting in a deadlock (before
> 8c517de24a), or be aborted defeating the purpose of the reconnection
> machinery (after 8c517de24a).
> 
> Since the pieces of the reconnection logic are now properly migrated
> from one aio_context to another, it appears safe to just stop messing
> with the drained section in the reconnection code.
> 
> Fixes: 5ad81b4946 ("nbd: Restrict connection_co reentrance")

I wouldn't say that it "fixes" it. The behavior changes.. but 5ad81b4946 didn't introduce any bugs.

> Fixes: 8c517de24a ("block/nbd: fix drain dead-lock because of nbd reconnect-delay")

And here..

1. There is an existing problem (unrelated to NBD) in QEMU: a long I/O request that we have to wait for at drained_begin may trigger a deadlock (https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg01339.html)

2. So, when we have NBD reconnect (and therefore long I/O requests) we simply trigger this deadlock.. That's why I decided to cancel the requests (assuming they would most probably fail anyway).

I agree that the NBD driver is the wrong place for fixing the problem described in https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg01339.html, but if you just revert 8c517de24a, you'll see the deadlock again..



-- 
Best regards,
Vladimir



* Re: [PATCH 7/7] block/nbd: stop manipulating in_flight counter
  2021-03-15  6:06 ` [PATCH 7/7] block/nbd: stop manipulating in_flight counter Roman Kagan
@ 2021-03-15 20:15   ` Vladimir Sementsov-Ogievskiy
  2021-03-16 16:08     ` Roman Kagan
  0 siblings, 1 reply; 30+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-03-15 20:15 UTC (permalink / raw)
  To: Roman Kagan, qemu-devel
  Cc: yc-core, Eric Blake, Max Reitz, Kevin Wolf, qemu-block

15.03.2021 09:06, Roman Kagan wrote:
> As the reconnect logic no longer interferes with drained sections, it
> appears unnecessary to explicitly manipulate the in_flight counter.
> 
> Fixes: 5ad81b4946 ("nbd: Restrict connection_co reentrance")

And here you actually allow the qemu_aio_coroutine_enter() call in nbd_client_attach_aio_context_bh() to enter connection_co at any yield point that is possible during a drained section. An analysis should be done to be sure that all these yield points are safe for reentering by an external qemu_aio_coroutine_enter(). (By external I mean not the actual enter() we are waiting for at the yield() point.) For example, qemu_channel_yield() supports reentering.. and therefore (as I understand after looking through it quickly) nbd_read() should support reentering too..


-- 
Best regards,
Vladimir



* Re: [PATCH 0/7] block/nbd: decouple reconnect from drain
  2021-03-15  6:06 [PATCH 0/7] block/nbd: decouple reconnect from drain Roman Kagan
                   ` (7 preceding siblings ...)
  2021-03-15 19:45 ` [PATCH 0/7] block/nbd: decouple reconnect from drain Vladimir Sementsov-Ogievskiy
@ 2021-03-16 14:41 ` Eric Blake
  2021-03-16 16:10   ` Roman Kagan
  2021-03-17  8:35 ` Vladimir Sementsov-Ogievskiy
  9 siblings, 1 reply; 30+ messages in thread
From: Eric Blake @ 2021-03-16 14:41 UTC (permalink / raw)
  To: Roman Kagan, qemu-devel
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-block, yc-core, Max Reitz

On 3/15/21 1:06 AM, Roman Kagan wrote:
> The reconnection logic doesn't need to stop while in a drained section.
> Moreover it has to be active during the drained section, as the requests
> that were caught in-flight with the connection to the server broken can
> only usefully get drained if the connection is restored.  Otherwise such
> requests can only either stall resulting in a deadlock (before
> 8c517de24a), or be aborted defeating the purpose of the reconnection
> machinery (after 8c517de24a).
> 
> This series aims to just stop messing with the drained section in the
> reconnection code.
> 
> While doing so it undoes the effect of 5ad81b4946 ("nbd: Restrict
> connection_co reentrance"); as I've missed the point of that commit I'd
> appreciate more scrutiny in this area.

Soft freeze is today.  I'm leaning towards declaring this series as a
bug fix (and so give it some more soak time to get right, but still okay
for -rc1) rather than a feature addition (and therefore would need to be
in a pull request today).  Speak up now if this characterization is off
base.

> 
> Roman Kagan (7):
>   block/nbd: avoid touching freed connect_thread
>   block/nbd: use uniformly nbd_client_connecting_wait
>   block/nbd: assert attach/detach runs in the proper context
>   block/nbd: transfer reconnection stuff across aio_context switch
>   block/nbd: better document a case in nbd_co_establish_connection
>   block/nbd: decouple reconnect from drain
>   block/nbd: stop manipulating in_flight counter
> 
>  block/nbd.c  | 191 +++++++++++++++++++++++----------------------------
>  nbd/client.c |   2 -
>  2 files changed, 86 insertions(+), 107 deletions(-)
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH 1/7] block/nbd: avoid touching freed connect_thread
  2021-03-15 15:40   ` Vladimir Sementsov-Ogievskiy
@ 2021-03-16 15:29     ` Roman Kagan
  0 siblings, 0 replies; 30+ messages in thread
From: Roman Kagan @ 2021-03-16 15:29 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: Kevin Wolf, qemu-block, qemu-devel, Max Reitz, yc-core

On Mon, Mar 15, 2021 at 06:40:12PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 15.03.2021 09:06, Roman Kagan wrote:
> > When the NBD connection is being torn down, the connection thread gets
> > canceled and "detached", meaning it is about to get freed.
> > 
> > If this happens while the connection coroutine yielded waiting for the
> > connection thread to complete, when it resumes it may access the
> > invalidated connection thread data.
> > 
> > To prevent this, revalidate the ->connect_thread pointer in
> > nbd_co_establish_connection_cancel before using it after the yield.
> > 
> > Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
> 
> Seems possible. Do you have a reproducer? Would be great to make an iotest.

It triggered for me in iotest 277, but it seems to be timing-dependent.
I'll see if I can reproduce it reliably.

Thanks,
Roman.



* Re: [PATCH 0/7] block/nbd: decouple reconnect from drain
  2021-03-15 19:45 ` [PATCH 0/7] block/nbd: decouple reconnect from drain Vladimir Sementsov-Ogievskiy
@ 2021-03-16 15:52   ` Roman Kagan
  0 siblings, 0 replies; 30+ messages in thread
From: Roman Kagan @ 2021-03-16 15:52 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: Kevin Wolf, qemu-block, qemu-devel, Max Reitz, yc-core

On Mon, Mar 15, 2021 at 10:45:39PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 15.03.2021 09:06, Roman Kagan wrote:
> > The reconnection logic doesn't need to stop while in a drained section.
> > Moreover it has to be active during the drained section, as the requests
> > that were caught in-flight with the connection to the server broken can
> > only usefully get drained if the connection is restored.  Otherwise such
> > requests can only either stall resulting in a deadlock (before
> > 8c517de24a), or be aborted defeating the purpose of the reconnection
> > machinery (after 8c517de24a).
> > 
> > This series aims to just stop messing with the drained section in the
> > reconnection code.
> > 
> > While doing so it undoes the effect of 5ad81b4946 ("nbd: Restrict
> > connection_co reentrance"); as I've missed the point of that commit I'd
> > appreciate more scrutiny in this area.
> 
> 
> The actual point is:
> 
> connection_co (together with all the functions called from it) has a lot of yield points. And we can't just enter the coroutine at any of them whenever we want, as it may break some BH that is actually being waited for at that yield point..
> 
> Still, we should care only about yield points possible during drained section, so we don't need to care about direct qemu_coroutine_yield() inside nbd_connection_entry().
> 
> Many things changed since 5ad81b4946.. So probably, now all the (possible during drained section) yield points in nbd_connection_entry support reentering. But some analysis of possible yield points should be done.

Thanks for the explanation.  Will do this analysis.

Roman.



* Re: [PATCH 6/7] block/nbd: decouple reconnect from drain
  2021-03-15 20:10   ` Vladimir Sementsov-Ogievskiy
@ 2021-03-16 16:03     ` Roman Kagan
  2021-03-16 18:09       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 30+ messages in thread
From: Roman Kagan @ 2021-03-16 16:03 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: Kevin Wolf, qemu-block, qemu-devel, Max Reitz, yc-core

On Mon, Mar 15, 2021 at 11:10:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 15.03.2021 09:06, Roman Kagan wrote:
> > The reconnection logic doesn't need to stop while in a drained section.
> > Moreover it has to be active during the drained section, as the requests
> > that were caught in-flight with the connection to the server broken can
> > only usefully get drained if the connection is restored.  Otherwise such
> > requests can only either stall resulting in a deadlock (before
> > 8c517de24a), or be aborted defeating the purpose of the reconnection
> > machinery (after 8c517de24a).
> > 
> > Since the pieces of the reconnection logic are now properly migrated
> > from one aio_context to another, it appears safe to just stop messing
> > with the drained section in the reconnection code.
> > 
> > Fixes: 5ad81b4946 ("nbd: Restrict connection_co reentrance")
> 
> I'd not think that it "fixes" it. Behavior changes.. But 5ad81b4946
> didn't introduce any bugs.

I guess you're right.

In fact I did reproduce the situation when the drain made no progress
exactly because the only in-flight reference was taken by the
connection_co, but it may be due to some intermediate stage of the patch
development so I need to do a more thorough analysis to tell if it was
triggerable with the original code.

> > Fixes: 8c517de24a ("block/nbd: fix drain dead-lock because of nbd reconnect-delay")
> 
> And here..
> 
> 1. There is an existing problem (unrelated to nbd) in Qemu that long
> io request which we have to wait for at drained_begin may trigger a
> dead lock
> (https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg01339.html)
> 
> 2. So, when we have nbd reconnect (and therefore long io requests) we
> simply trigger this deadlock.. That's why I decided to cancel the
> requests (assuming they will most probably fail anyway).
> 
> I agree that nbd driver is wrong place for fixing the problem
> described in
> (https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg01339.html),
> but if you just revert 8c517de24a, you'll see the deadlock again..

I may have misunderstood that thread, but isn't that deadlock exactly
due to the requests being unable to ever drain because the
reconnection process is suspended while the drain is in progress?

Thanks,
Roman.



* Re: [PATCH 7/7] block/nbd: stop manipulating in_flight counter
  2021-03-15 20:15   ` Vladimir Sementsov-Ogievskiy
@ 2021-03-16 16:08     ` Roman Kagan
  2021-03-16 18:37       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 30+ messages in thread
From: Roman Kagan @ 2021-03-16 16:08 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: Kevin Wolf, qemu-block, qemu-devel, Max Reitz, yc-core

On Mon, Mar 15, 2021 at 11:15:44PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 15.03.2021 09:06, Roman Kagan wrote:
> > As the reconnect logic no longer interferes with drained sections, it
> > appears unnecessary to explicitly manipulate the in_flight counter.
> > 
> > Fixes: 5ad81b4946 ("nbd: Restrict connection_co reentrance")
> 
> And here you actually allow qemu_aio_coroutine_enter() call in
> nbd_client_attach_aio_context_bh() to enter connection_co in any yield
> point which is possible during drained section. The analysis should be
> done to be sure that all these yield points are safe for reentering by
> external qemu_aio_coroutine_enter(). (By external I mean not by the
> actual enter() we are waiting for at the yield() point. For example
> qemu_channel_yield() supports reentering.. And therefore (as I
> understand after fast looking through) nbd_read() should support
> reentering too..

I'll do a more thorough analysis of how safe it is.

FWIW this hasn't triggered any test failures yet, but that assert in
patch 3 didn't ever go off either so I'm not sure I can trust the tests
on this.

Thanks,
Roman.



* Re: [PATCH 0/7] block/nbd: decouple reconnect from drain
  2021-03-16 14:41 ` Eric Blake
@ 2021-03-16 16:10   ` Roman Kagan
  0 siblings, 0 replies; 30+ messages in thread
From: Roman Kagan @ 2021-03-16 16:10 UTC (permalink / raw)
  To: Eric Blake
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel,
	Max Reitz, yc-core

On Tue, Mar 16, 2021 at 09:41:36AM -0500, Eric Blake wrote:
> On 3/15/21 1:06 AM, Roman Kagan wrote:
> > The reconnection logic doesn't need to stop while in a drained section.
> > Moreover it has to be active during the drained section, as the requests
> > that were caught in-flight with the connection to the server broken can
> > only usefully get drained if the connection is restored.  Otherwise such
> > requests can only either stall resulting in a deadlock (before
> > 8c517de24a), or be aborted defeating the purpose of the reconnection
> > machinery (after 8c517de24a).
> > 
> > This series aims to just stop messing with the drained section in the
> > reconnection code.
> > 
> > While doing so it undoes the effect of 5ad81b4946 ("nbd: Restrict
> > connection_co reentrance"); as I've missed the point of that commit I'd
> > appreciate more scrutiny in this area.
> 
> Soft freeze is today.  I'm leaning towards declaring this series as a
> bug fix (and so give it some more soak time to get right, but still okay
> for -rc1) rather than a feature addition (and therefore would need to be
> in a pull request today).  Speak up now if this characterization is off
> base.

Yes I'd consider it a bug fix, too.  I'll do my best beating it into
shape before -rc2.

Thanks,
Roman.



* Re: [PATCH 6/7] block/nbd: decouple reconnect from drain
  2021-03-16 16:03     ` Roman Kagan
@ 2021-03-16 18:09       ` Vladimir Sementsov-Ogievskiy
  2021-03-26  6:16         ` Roman Kagan
  0 siblings, 1 reply; 30+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-03-16 18:09 UTC (permalink / raw)
  To: Roman Kagan, qemu-devel, yc-core, Eric Blake, Max Reitz,
	Kevin Wolf, qemu-block

16.03.2021 19:03, Roman Kagan wrote:
> On Mon, Mar 15, 2021 at 11:10:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> 15.03.2021 09:06, Roman Kagan wrote:
>>> The reconnection logic doesn't need to stop while in a drained section.
>>> Moreover it has to be active during the drained section, as the requests
>>> that were caught in-flight with the connection to the server broken can
>>> only usefully get drained if the connection is restored.  Otherwise such
>>> requests can only either stall resulting in a deadlock (before
>>> 8c517de24a), or be aborted defeating the purpose of the reconnection
>>> machinery (after 8c517de24a).
>>>
>>> Since the pieces of the reconnection logic are now properly migrated
>>> from one aio_context to another, it appears safe to just stop messing
>>> with the drained section in the reconnection code.
>>>
>>> Fixes: 5ad81b4946 ("nbd: Restrict connection_co reentrance")
>>
>> I'd not think that it "fixes" it. Behavior changes.. But 5ad81b4946
>> didn't introduce any bugs.
> 
> I guess you're right.
> 
> In fact I did reproduce the situation when the drain made no progress
> exactly becase the only in-flight reference was taken by the
> connection_co, but it may be due to some intermediate stage of the patch
> development so I need to do a more thorough analysis to tell if it was
> triggerable with the original code.
> 
>>> Fixes: 8c517de24a ("block/nbd: fix drain dead-lock because of nbd reconnect-delay")
>>
>> And here..
>>
>> 1. There is an existing problem (unrelated to nbd) in Qemu that long
>> io request which we have to wait for at drained_begin may trigger a
>> dead lock
>> (https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg01339.html)
>>
>> 2. So, when we have nbd reconnect (and therefore long io requests) we
>> simply trigger this deadlock.. That's why I decided to cancel the
>> requests (assuming they will most probably fail anyway).
>>
>> I agree that nbd driver is wrong place for fixing the problem
>> described in
>> (https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg01339.html),
>> but if you just revert 8c517de24a, you'll see the deadlock again..
> 
> I may have misunderstood that thread, but isn't that deadlock exactly
> due to the requests being unable to ever drain because the
> reconnection process is suspended while the drain is in progress?
> 


Hmm, I hadn't thought about it this way.  What you mean is that reconnection is cancelled on drain_begin, so drain_begin will never finish, because it waits for requests which will never be reconnected. So you are right, it's a deadlock too.

But as I remember, what is described in 8c517de24a is another deadlock, triggered by "just a long request during drain_begin". And it may be triggered again if we try to reconnect for several seconds during drained_begin() instead of cancelling the requests. Have you tried the scenario described in 8c517de24a on top of your series?



-- 
Best regards,
Vladimir



* Re: [PATCH 7/7] block/nbd: stop manipulating in_flight counter
  2021-03-16 16:08     ` Roman Kagan
@ 2021-03-16 18:37       ` Vladimir Sementsov-Ogievskiy
  2021-03-26  7:41         ` Roman Kagan
  0 siblings, 1 reply; 30+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-03-16 18:37 UTC (permalink / raw)
  To: Roman Kagan, qemu-devel, yc-core, Eric Blake, Max Reitz,
	Kevin Wolf, qemu-block

16.03.2021 19:08, Roman Kagan wrote:
> On Mon, Mar 15, 2021 at 11:15:44PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> 15.03.2021 09:06, Roman Kagan wrote:
>>> As the reconnect logic no longer interferes with drained sections, it
>>> appears unnecessary to explicitly manipulate the in_flight counter.
>>>
>>> Fixes: 5ad81b4946 ("nbd: Restrict connection_co reentrance")
>>
>> And here you actually allow qemu_aio_coroutine_enter() call in
>> nbd_client_attach_aio_context_bh() to enter connection_co in any yield
>> point which is possible during drained section. The analysis should be
>> done to be sure that all these yield points are safe for reentering by
>> external qemu_aio_coroutine_enter(). (By external I mean not by the
>> actual enter() we are waiting for at the yield() point. For example
>> qemu_channel_yield() supports reentering.. And therefore (as I
>> understand after fast looking through) nbd_read() should support
>> reentering too..
> 
> I'll do a more thorough analysis of how safe it is.
> 
> FWIW this hasn't triggered any test failures yet, but that assert in
> patch 3 didn't ever go off either so I'm not sure I can trust the tests
> on this.
> 

Hmm. First, we should consider qemu_coroutine_yield() in nbd_co_establish_connection().

Most of the purpose of nbd_co_establish_connection_cancel() is to avoid reentering this yield()..

And I don't know how to make it safely reenterable: keep in mind the BH that may already have been scheduled by connect_thread_func(). And if the BH is already scheduled, can we cancel it? I'm not sure.

We have qemu_bh_delete(). But is it possible that the BH is about to be executed and can no longer be removed by qemu_bh_delete()? I don't know.

And if we can't safely drop the BH at any moment, we should wait in nbd_client_detach_aio_context until the scheduled BH enters the connection_co.. or something like this.


-- 
Best regards,
Vladimir



* Re: [PATCH 0/7] block/nbd: decouple reconnect from drain
  2021-03-15  6:06 [PATCH 0/7] block/nbd: decouple reconnect from drain Roman Kagan
                   ` (8 preceding siblings ...)
  2021-03-16 14:41 ` Eric Blake
@ 2021-03-17  8:35 ` Vladimir Sementsov-Ogievskiy
  2021-03-26  8:07   ` Roman Kagan
  2021-04-07  7:45   ` Roman Kagan
  9 siblings, 2 replies; 30+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-03-17  8:35 UTC (permalink / raw)
  To: Roman Kagan, qemu-devel
  Cc: yc-core, Eric Blake, Max Reitz, Kevin Wolf, qemu-block

15.03.2021 09:06, Roman Kagan wrote:
> The reconnection logic doesn't need to stop while in a drained section.
> Moreover it has to be active during the drained section, as the requests
> that were caught in-flight with the connection to the server broken can
> only usefully get drained if the connection is restored.  Otherwise such
> requests can only either stall resulting in a deadlock (before
> 8c517de24a), or be aborted defeating the purpose of the reconnection
> machinery (after 8c517de24a).
> 
> This series aims to just stop messing with the drained section in the
> reconnection code.
> 
> While doing so it undoes the effect of 5ad81b4946 ("nbd: Restrict
> connection_co reentrance"); as I've missed the point of that commit I'd
> appreciate more scrutiny in this area.
> 
> Roman Kagan (7):
>    block/nbd: avoid touching freed connect_thread
>    block/nbd: use uniformly nbd_client_connecting_wait
>    block/nbd: assert attach/detach runs in the proper context
>    block/nbd: transfer reconnection stuff across aio_context switch
>    block/nbd: better document a case in nbd_co_establish_connection
>    block/nbd: decouple reconnect from drain
>    block/nbd: stop manipulating in_flight counter
> 
>   block/nbd.c  | 191 +++++++++++++++++++++++----------------------------
>   nbd/client.c |   2 -
>   2 files changed, 86 insertions(+), 107 deletions(-)
> 


Hmm. The big source of problems for this series is the weird logic around drain and aio context switching in the NBD driver.

Why do we have all this overly complicated logic with abuse of the in_flight counter in NBD? The answer is connection_co. NBD differs from other drivers in that it has a coroutine independent of the request coroutines. And we have to move this coroutine carefully to the new aio context. We can't just enter it from the new context; we want to be sure that connection_co is at one of the yield points that support reentering.

I have an idea of how to avoid this thing: drop connection_co at all.

1. nbd negotiation goes to the connection thread and becomes independent of any aio context.

2. waiting for the server reply goes to the request code. So, instead of reading the reply from the socket always in connection_co, we read it in the request coroutine, after sending the request. We'll need a CoMutex for it (as only one request coroutine should read from the socket), and we must be prepared that the incoming reply is not for _this_ request (in that case we should wake another request and continue reading from the socket).

but this may be too much for soft freeze.


Another idea:

You want all the requests to be completed on drain_begin(), not cancelled. Actually, you don't need reconnect running during the drained section for that. It should be enough to just wait for all current requests before disabling the reconnect in the drain_begin handler.

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/7] block/nbd: decouple reconnect from drain
  2021-03-16 18:09       ` Vladimir Sementsov-Ogievskiy
@ 2021-03-26  6:16         ` Roman Kagan
  0 siblings, 0 replies; 30+ messages in thread
From: Roman Kagan @ 2021-03-26  6:16 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: Kevin Wolf, qemu-block, qemu-devel, Max Reitz, yc-core

On Tue, Mar 16, 2021 at 09:09:12PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 16.03.2021 19:03, Roman Kagan wrote:
> > On Mon, Mar 15, 2021 at 11:10:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > 15.03.2021 09:06, Roman Kagan wrote:
> > > > The reconnection logic doesn't need to stop while in a drained section.
> > > > Moreover it has to be active during the drained section, as the requests
> > > > that were caught in-flight with the connection to the server broken can
> > > > only usefully get drained if the connection is restored.  Otherwise such
> > > > requests can only either stall resulting in a deadlock (before
> > > > 8c517de24a), or be aborted defeating the purpose of the reconnection
> > > > machinery (after 8c517de24a).
> > > > 
> > > > Since the pieces of the reconnection logic are now properly migrated
> > > > from one aio_context to another, it appears safe to just stop messing
> > > > with the drained section in the reconnection code.
> > > > 
> > > > Fixes: 5ad81b4946 ("nbd: Restrict connection_co reentrance")
> > > 
> > > I'd not think that it "fixes" it. Behavior changes.. But 5ad81b4946
> > > didn't introduce any bugs.
> > 
> > I guess you're right.
> > 
> > In fact I did reproduce the situation when the drain made no progress
> > exactly because the only in-flight reference was taken by the
> > connection_co, but it may be due to some intermediate stage of the patch
> > development so I need to do a more thorough analysis to tell if it was
> > triggerable with the original code.
> > 
> > > > Fixes: 8c517de24a ("block/nbd: fix drain dead-lock because of nbd reconnect-delay")
> > > 
> > > And here..
> > > 
> > > 1. There is an existing problem (unrelated to nbd) in Qemu that long
> > > io request which we have to wait for at drained_begin may trigger a
> > > dead lock
> > > (https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg01339.html)
> > > 
> > > 2. So, when we have nbd reconnect (and therefore long io requests) we
> > > simply trigger this deadlock.. That's why I decided to cancel the
> > > requests (assuming they will most probably fail anyway).
> > > 
> > > I agree that nbd driver is wrong place for fixing the problem
> > > described in
> > > (https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg01339.html),
> > > but if you just revert 8c517de24a, you'll see the deadlock again..
> > 
> > I may have misunderstood that thread, but isn't that deadlock exactly
> > due to the requests being unable to ever drain because the
> > reconnection process is suspended while the drain is in progress?
> > 
> 
> 
> Hmm, I hadn't thought about it this way.  What you mean is that
> reconnection is cancelled on drain_begin, so drain_begin will never
> finish, because it waits for requests, which will never be
> reconnected. So, you are right it's a deadlock too.

Right.

> But as I remember, the deadlock described in 8c517de24a is a different
> one, triggered by "just a long request during drain_begin". And it may
> be triggered again if we try to reconnect for several seconds during
> drained_begin() instead of cancelling requests.

IMO it wasn't a different deadlock, it was exactly this one: the drain
got stuck the way described above with the qemu global mutex taken, so
the rest of qemu got stuck too.

> Didn't you try the scenario described in 8c517de24a on top of your
> series?

I've tried it with the current master with just 8c517de24a reverted --
the deadlock reproduced in every one of several attempts.  Then with my
series, the deadlock didn't reproduce in any of my attempts.  More
precisely, qemu appeared frozen once the guest timed out the requests
and initiated ATA link soft reset, which presumably caused that drain,
but, as soon as the reconnect timer expired (resulting in requests
completed into the guest with errors) or the connection was restored
(resulting in requests completed successfully), it resumed normal
operation.

So yes, I think this series does address that deadlock.

Thanks,
Roman.



* Re: [PATCH 7/7] block/nbd: stop manipulating in_flight counter
  2021-03-16 18:37       ` Vladimir Sementsov-Ogievskiy
@ 2021-03-26  7:41         ` Roman Kagan
  0 siblings, 0 replies; 30+ messages in thread
From: Roman Kagan @ 2021-03-26  7:41 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: Kevin Wolf, qemu-block, qemu-devel, Max Reitz, yc-core

On Tue, Mar 16, 2021 at 09:37:13PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 16.03.2021 19:08, Roman Kagan wrote:
> > On Mon, Mar 15, 2021 at 11:15:44PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > 15.03.2021 09:06, Roman Kagan wrote:
> > > > As the reconnect logic no longer interferes with drained sections, it
> > > > appears unnecessary to explicitly manipulate the in_flight counter.
> > > > 
> > > > Fixes: 5ad81b4946 ("nbd: Restrict connection_co reentrance")
> > > 
> > > And here you actually allow qemu_aio_coroutine_enter() call in
> > > nbd_client_attach_aio_context_bh() to enter connection_co in any yield
> > > point which is possible during drained section. The analysis should be
> > > done to be sure that all these yield points are safe for reentering by
> > > external qemu_aio_coroutine_enter(). (By external I mean not by the
> > > actual enter() we are waiting for at the yield() point. For example
> > > qio_channel_yield() supports reentering.. And therefore (as I
> > > understand after fast looking through) nbd_read() should support
> > > reentering too..
> > 
> > I'll do a more thorough analysis of how safe it is.
> > 
> > FWIW this hasn't triggered any test failures yet, but that assert in
> > patch 3 didn't ever go off either so I'm not sure I can trust the tests
> > on this.
> > 
> 
> Hmm. First, we should consider qemu_coroutine_yield() in
> nbd_co_establish_connection().
> 
> Most of nbd_co_establish_connection_cancel()'s purpose is to avoid
> reentering this yield()..

Unless I'm overlooking something, nbd_co_establish_connection() is fine
with spurious entering at this yield point.  What does look problematic,
though, is your next point:

> And I don't know how to make it safely reenterable: keep in mind the bh
> that may already be scheduled by connect_thread_func(). And if the bh
> is already scheduled, can we cancel it? I'm not sure.
> 
> We have qemu_bh_delete(). But is it possible that the BH is about to
> be executed and can no longer be removed by qemu_bh_delete()? I don't
> know.
> 
> And if we can't safely drop the bh at any moment, we should wait in
> nbd_client_detach_aio_context until the scheduled bh enters the
> connection_co.. Or something like this

So I think it's not the reentry at this yield point per se that is
problematic; it's that the bh may have been scheduled before the
aio_context switch, so once it runs it would wake up connection_co on
the old aio_context.  I think it may be possible to address this by
adding a check into connect_bh().

Thanks,
Roman.



* Re: [PATCH 0/7] block/nbd: decouple reconnect from drain
  2021-03-17  8:35 ` Vladimir Sementsov-Ogievskiy
@ 2021-03-26  8:07   ` Roman Kagan
  2021-04-07  7:45   ` Roman Kagan
  1 sibling, 0 replies; 30+ messages in thread
From: Roman Kagan @ 2021-03-26  8:07 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: Kevin Wolf, qemu-block, qemu-devel, Max Reitz, yc-core

On Wed, Mar 17, 2021 at 11:35:31AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 15.03.2021 09:06, Roman Kagan wrote:
> > The reconnection logic doesn't need to stop while in a drained section.
> > Moreover it has to be active during the drained section, as the requests
> > that were caught in-flight with the connection to the server broken can
> > only usefully get drained if the connection is restored.  Otherwise such
> > requests can only either stall resulting in a deadlock (before
> > 8c517de24a), or be aborted defeating the purpose of the reconnection
> > machinery (after 8c517de24a).
> > 
> > This series aims to just stop messing with the drained section in the
> > reconnection code.
> > 
> > While doing so it undoes the effect of 5ad81b4946 ("nbd: Restrict
> > connection_co reentrance"); as I've missed the point of that commit I'd
> > appreciate more scrutiny in this area.
> > 
> > Roman Kagan (7):
> >    block/nbd: avoid touching freed connect_thread
> >    block/nbd: use uniformly nbd_client_connecting_wait
> >    block/nbd: assert attach/detach runs in the proper context
> >    block/nbd: transfer reconnection stuff across aio_context switch
> >    block/nbd: better document a case in nbd_co_establish_connection
> >    block/nbd: decouple reconnect from drain
> >    block/nbd: stop manipulating in_flight counter
> > 
> >   block/nbd.c  | 191 +++++++++++++++++++++++----------------------------
> >   nbd/client.c |   2 -
> >   2 files changed, 86 insertions(+), 107 deletions(-)
> > 
> 
> 
> Hmm. The huge source of problems for this series is weird logic around
> drain and aio context switch in NBD driver.
> 
> Why do we have all these too complicated logic with abuse of in_flight
> counter in NBD? The answer is connection_co. NBD differs from other
> drivers, it has a coroutine independent of request coroutines. And we
> have to move this coroutine carefully to new aio context. We can't
> just enter it from the new context, we want to be sure that
> connection_co is in one of yield points that supports reentering.
> 
> I have an idea of how to avoid this thing: drop connection_co at all.
> 
> 1. nbd negotiation goes to connection thread and becomes independent
> of any aio context.
> 
> 2. waiting for server reply goes to request code. So, instead of
> reading the reply from socket always in connection_co, we read in the
> request coroutine, after sending the request. We'll need a CoMutex for
> it (as only one request coroutine should read from socket), and be
> prepared that the incoming reply is not for _this_ request (in this case we
> should wake another request and continue read from socket).
> 
> but this may be too much for soft freeze.

This approach does look appealing to me, and I gave it a quick shot, but
the amount of changes it involves indeed exceeds the rc tolerance.

> Another idea:
> 
> You want all the requests be completed on drain_begin(), not
> cancelled. Actually, you don't need reconnect running during the drained
> section for it. It should be enough to just wait for all current
> requests before disabling the reconnect in drain_begin handler.

So effectively you suggest doing nbd's own drain within
bdrv_co_drain_begin callback.  I'm not totally sure there are no
assumptions this may break, but I'll try to look into this possibility.

Thanks,
Roman.



* Re: [PATCH 1/7] block/nbd: avoid touching freed connect_thread
  2021-03-15  6:06 ` [PATCH 1/7] block/nbd: avoid touching freed connect_thread Roman Kagan
  2021-03-15 15:40   ` Vladimir Sementsov-Ogievskiy
@ 2021-04-06 16:21   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 30+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-04-06 16:21 UTC (permalink / raw)
  To: Roman Kagan, qemu-devel
  Cc: yc-core, Eric Blake, Max Reitz, Kevin Wolf, qemu-block

15.03.2021 09:06, Roman Kagan wrote:
> When the NBD connection is being torn down, the connection thread gets
> canceled and "detached", meaning it is about to get freed.
> 
> If this happens while the connection coroutine yielded waiting for the
> connection thread to complete, when it resumes it may access the
> invalidated connection thread data.
> 
> To prevent this, revalidate the ->connect_thread pointer in
> nbd_co_establish_connection_cancel before using it after the yield.
> 
> Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
> ---
>   block/nbd.c | 9 +++++++++
>   1 file changed, 9 insertions(+)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index c26dc5a54f..447d176b76 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -486,6 +486,15 @@ nbd_co_establish_connection(BlockDriverState *bs, Error **errp)
>       s->wait_connect = true;
>       qemu_coroutine_yield();
>   
> +    /*
> +     * If nbd_co_establish_connection_cancel had a chance to run it may have
> +     * invalidated ->connect_thread.
> +     */
> +    thr = s->connect_thread;
> +    if (!thr) {
> +        return -ECONNABORTED;

nbd_co_establish_connection() tends to return -1 or 0, not -errno, so -1 is better here. Still, it doesn't really matter.

> +    }
> +
>       qemu_mutex_lock(&thr->mutex);
>   
>       switch (thr->state) {
> 


-- 
Best regards,
Vladimir



* Re: [PATCH 0/7] block/nbd: decouple reconnect from drain
  2021-03-17  8:35 ` Vladimir Sementsov-Ogievskiy
  2021-03-26  8:07   ` Roman Kagan
@ 2021-04-07  7:45   ` Roman Kagan
  2021-04-07 10:13     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 30+ messages in thread
From: Roman Kagan @ 2021-04-07  7:45 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: Kevin Wolf, qemu-block, qemu-devel, Max Reitz, yc-core

On Wed, Mar 17, 2021 at 11:35:31AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 15.03.2021 09:06, Roman Kagan wrote:
> > The reconnection logic doesn't need to stop while in a drained section.
> > Moreover it has to be active during the drained section, as the requests
> > that were caught in-flight with the connection to the server broken can
> > only usefully get drained if the connection is restored.  Otherwise such
> > requests can only either stall resulting in a deadlock (before
> > 8c517de24a), or be aborted defeating the purpose of the reconnection
> > machinery (after 8c517de24a).
> > 
> > This series aims to just stop messing with the drained section in the
> > reconnection code.
> > 
> > While doing so it undoes the effect of 5ad81b4946 ("nbd: Restrict
> > connection_co reentrance"); as I've missed the point of that commit I'd
> > appreciate more scrutiny in this area.
> > 
> > Roman Kagan (7):
> >    block/nbd: avoid touching freed connect_thread
> >    block/nbd: use uniformly nbd_client_connecting_wait
> >    block/nbd: assert attach/detach runs in the proper context
> >    block/nbd: transfer reconnection stuff across aio_context switch
> >    block/nbd: better document a case in nbd_co_establish_connection
> >    block/nbd: decouple reconnect from drain
> >    block/nbd: stop manipulating in_flight counter
> > 
> >   block/nbd.c  | 191 +++++++++++++++++++++++----------------------------
> >   nbd/client.c |   2 -
> >   2 files changed, 86 insertions(+), 107 deletions(-)
> > 
> 
> 
> Hmm. The huge source of problems for this series is weird logic around
> drain and aio context switch in NBD driver.
> 
> Why do we have all these too complicated logic with abuse of in_flight
> counter in NBD? The answer is connection_co. NBD differs from other
> drivers, it has a coroutine independent of request coroutines. And we
> have to move this coroutine carefully to new aio context. We can't
> just enter it from the new context, we want to be sure that
> connection_co is in one of yield points that supports reentering.
> 
> I have an idea of how to avoid this thing: drop connection_co at all.
> 
> 1. nbd negotiation goes to connection thread and becomes independent
> of any aio context.
> 
> 2. waiting for server reply goes to request code. So, instead of
> reading the reply from socket always in connection_co, we read in the
> request coroutine, after sending the request. We'll need a CoMutex for
> it (as only one request coroutine should read from socket), and be
> prepared that the incoming reply is not for _this_ request (in this case we
> should wake another request and continue read from socket).

The problem with this approach is that it would change the reconnect
behavior.

Currently connection_co purpose is three-fold:

1) receive the header of the server response, identify the request it
   pertains to, and wake the respective request coroutine

2) take on the responsibility to reestablish the connection when it's
   lost

3) monitor the idle connection and initiate the reconnect as soon as the
   connection is lost

Points 1 and 2 can be moved to the request coroutines indeed.  However I
don't see how to do 3 without an extra ever-running coroutine.
Sacrificing it would mean that a connection loss wouldn't be noticed and
the recovery wouldn't be attempted until a request arrived.

This change looks to me like a degradation compared to the current
state.

Roman.



* Re: [PATCH 0/7] block/nbd: decouple reconnect from drain
  2021-04-07  7:45   ` Roman Kagan
@ 2021-04-07 10:13     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 30+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-04-07 10:13 UTC (permalink / raw)
  To: Roman Kagan, qemu-devel, yc-core, Eric Blake, Max Reitz,
	Kevin Wolf, qemu-block

07.04.2021 10:45, Roman Kagan wrote:
> On Wed, Mar 17, 2021 at 11:35:31AM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> 15.03.2021 09:06, Roman Kagan wrote:
>>> The reconnection logic doesn't need to stop while in a drained section.
>>> Moreover it has to be active during the drained section, as the requests
>>> that were caught in-flight with the connection to the server broken can
>>> only usefully get drained if the connection is restored.  Otherwise such
>>> requests can only either stall resulting in a deadlock (before
>>> 8c517de24a), or be aborted defeating the purpose of the reconnection
>>> machinery (after 8c517de24a).
>>>
>>> This series aims to just stop messing with the drained section in the
>>> reconnection code.
>>>
>>> While doing so it undoes the effect of 5ad81b4946 ("nbd: Restrict
>>> connection_co reentrance"); as I've missed the point of that commit I'd
>>> appreciate more scrutiny in this area.
>>>
>>> Roman Kagan (7):
>>>     block/nbd: avoid touching freed connect_thread
>>>     block/nbd: use uniformly nbd_client_connecting_wait
>>>     block/nbd: assert attach/detach runs in the proper context
>>>     block/nbd: transfer reconnection stuff across aio_context switch
>>>     block/nbd: better document a case in nbd_co_establish_connection
>>>     block/nbd: decouple reconnect from drain
>>>     block/nbd: stop manipulating in_flight counter
>>>
>>>    block/nbd.c  | 191 +++++++++++++++++++++++----------------------------
>>>    nbd/client.c |   2 -
>>>    2 files changed, 86 insertions(+), 107 deletions(-)
>>>
>>
>>
>> Hmm. The huge source of problems for this series is weird logic around
>> drain and aio context switch in NBD driver.
>>
>> Why do we have all these too complicated logic with abuse of in_flight
>> counter in NBD? The answer is connection_co. NBD differs from other
>> drivers, it has a coroutine independent of request coroutines. And we
>> have to move this coroutine carefully to new aio context. We can't
>> just enter it from the new context, we want to be sure that
>> connection_co is in one of yield points that supports reentering.
>>
>> I have an idea of how to avoid this thing: drop connection_co at all.
>>
>> 1. nbd negotiation goes to connection thread and becomes independent
>> of any aio context.
>>
>> 2. waiting for server reply goes to request code. So, instead of
>> reading the reply from socket always in connection_co, we read in the
>> request coroutine, after sending the request. We'll need a CoMutex for
>> it (as only one request coroutine should read from socket), and be
>> prepared that the incoming reply is not for _this_ request (in this case we
>> should wake another request and continue read from socket).
> 
> The problem with this approach is that it would change the reconnect
> behavior.
> 
> Currently connection_co purpose is three-fold:
> 
> 1) receive the header of the server response, identify the request it
>     pertains to, and wake the respective request coroutine
> 
> 2) take on the responsibility to reestablish the connection when it's
>     lost
> 
> 3) monitor the idle connection and initiate the reconnect as soon as the
>     connection is lost
> 
> Points 1 and 2 can be moved to the request coroutines indeed.  However I
> don't see how to do 3 without an extra ever-running coroutine.
> Sacrificing it would mean that a connection loss wouldn't be noticed and
> the recovery wouldn't be attempted until a request arrived.
> 
> This change looks to me like a degradation compared to the current
> state.
> 

For 3 we can check the connection on a timeout:

  - getsockopt( .. SO_ERROR .. ), which could be done from the bs aio context, or even from the reconnect-thread context

  - or, we can create a PING request: just use some request with parameters for which we are sure the NBD server will take no action but report some expected error. We can issue such a request on a timeout when there are no more requests, just to check that the connection still works.

Note that any of these (just like the current [3], which is simply an endless read from the socket) will work only with keep-alive set, which is not on by default for now.

Anyway, I think the first step is splitting the connect thread out of nbd.c, which is overcomplicated now; I'm going to send a refactoring series for this.

-- 
Best regards,
Vladimir



end of thread, other threads:[~2021-04-07 10:15 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-15  6:06 [PATCH 0/7] block/nbd: decouple reconnect from drain Roman Kagan
2021-03-15  6:06 ` [PATCH 1/7] block/nbd: avoid touching freed connect_thread Roman Kagan
2021-03-15 15:40   ` Vladimir Sementsov-Ogievskiy
2021-03-16 15:29     ` Roman Kagan
2021-04-06 16:21   ` Vladimir Sementsov-Ogievskiy
2021-03-15  6:06 ` [PATCH 2/7] block/nbd: use uniformly nbd_client_connecting_wait Roman Kagan
2021-03-15 15:48   ` Vladimir Sementsov-Ogievskiy
2021-03-15  6:06 ` [PATCH 3/7] block/nbd: assert attach/detach runs in the proper context Roman Kagan
2021-03-15 16:41   ` Vladimir Sementsov-Ogievskiy
2021-03-15 19:57     ` Roman Kagan
2021-03-15  6:06 ` [PATCH 4/7] block/nbd: transfer reconnection stuff across aio_context switch Roman Kagan
2021-03-15  6:06 ` [PATCH 5/7] block/nbd: better document a case in nbd_co_establish_connection Roman Kagan
2021-03-15  6:06 ` [PATCH 6/7] block/nbd: decouple reconnect from drain Roman Kagan
2021-03-15 20:10   ` Vladimir Sementsov-Ogievskiy
2021-03-16 16:03     ` Roman Kagan
2021-03-16 18:09       ` Vladimir Sementsov-Ogievskiy
2021-03-26  6:16         ` Roman Kagan
2021-03-15  6:06 ` [PATCH 7/7] block/nbd: stop manipulating in_flight counter Roman Kagan
2021-03-15 20:15   ` Vladimir Sementsov-Ogievskiy
2021-03-16 16:08     ` Roman Kagan
2021-03-16 18:37       ` Vladimir Sementsov-Ogievskiy
2021-03-26  7:41         ` Roman Kagan
2021-03-15 19:45 ` [PATCH 0/7] block/nbd: decouple reconnect from drain Vladimir Sementsov-Ogievskiy
2021-03-16 15:52   ` Roman Kagan
2021-03-16 14:41 ` Eric Blake
2021-03-16 16:10   ` Roman Kagan
2021-03-17  8:35 ` Vladimir Sementsov-Ogievskiy
2021-03-26  8:07   ` Roman Kagan
2021-04-07  7:45   ` Roman Kagan
2021-04-07 10:13     ` Vladimir Sementsov-Ogievskiy
