* [Qemu-devel] [PATCH v9 0/3] NBD reconnect
@ 2019-09-17 17:13 Vladimir Sementsov-Ogievskiy
  2019-09-17 17:13 ` [Qemu-devel] [PATCH v9 1/3] qemu-coroutine-sleep: introduce qemu_co_sleep_wake Vladimir Sementsov-Ogievskiy
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-09-17 17:13 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, vsementsov, qemu-devel, mreitz, stefanha, den

Hi all!
Here is NBD reconnect. Previously, if the connection failed, all current
and future requests would fail. After this series, the nbd-client driver
will try to reconnect an unlimited number of times. During the first
@reconnect-delay seconds of reconnecting, all requests wait for the
connection, and if it is re-established they are resent. After the
@reconnect-delay period has expired, all requests fail (until a
successful reconnect).
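
For illustration only, the new behaviour is configured through the usual
NBD blockdev options; a QMP blockdev-add along the lines of the iotest in
patch 3 (the socket path below is just a placeholder) looks roughly like:

  {"execute": "blockdev-add",
   "arguments": {"node-name": "backup0",
                 "driver": "raw",
                 "file": {"driver": "nbd",
                          "server": {"type": "unix", "path": "/tmp/nbd.sock"},
                          "reconnect-delay": 10}}}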

v9:

01: - fix grammar [Eric] [1]
    - add "static" to qemu_co_sleep_ns__scheduled variable definition [Eric] [2]
    - add public typedef for internal QemuCoSleepState and use it [Kevin]
        [this leads to the return of co_sleep_cb, as we need a wrapper with
         a void * argument to pass to aio_timer_new] [3]
    - rename function and add helper with old name and old semantics [Kevin]
    - keep Kevin's r-b, hope [1], [2] and [3] are not too much.
02: rebase on 01 changes
    some tiny improvements from Eric's review

Vladimir Sementsov-Ogievskiy (3):
  qemu-coroutine-sleep: introduce qemu_co_sleep_wake
  block/nbd: nbd reconnect
  iotests: test nbd reconnect

 include/qemu/coroutine.h      |  23 ++-
 block/nbd.c                   | 332 +++++++++++++++++++++++++++-------
 util/qemu-coroutine-sleep.c   |  51 ++++--
 tests/qemu-iotests/264        |  65 +++++++
 tests/qemu-iotests/264.out    |  12 ++
 tests/qemu-iotests/group      |   1 +
 tests/qemu-iotests/iotests.py |   4 +
 7 files changed, 410 insertions(+), 78 deletions(-)
 create mode 100755 tests/qemu-iotests/264
 create mode 100644 tests/qemu-iotests/264.out

-- 
2.21.0




* [Qemu-devel] [PATCH v9 1/3] qemu-coroutine-sleep: introduce qemu_co_sleep_wake
  2019-09-17 17:13 [Qemu-devel] [PATCH v9 0/3] NBD reconnect Vladimir Sementsov-Ogievskiy
@ 2019-09-17 17:13 ` Vladimir Sementsov-Ogievskiy
  2019-09-23 14:53   ` Eric Blake
  2019-09-17 17:13 ` [Qemu-devel] [PATCH v9 2/3] block/nbd: nbd reconnect Vladimir Sementsov-Ogievskiy
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-09-17 17:13 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, vsementsov, qemu-devel, mreitz, stefanha, den

Introduce a function to gracefully wake a coroutine sleeping in
qemu_co_sleep_ns().
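
For illustration, the intended usage pattern (roughly how block/nbd.c uses
it in the next patch; names and error handling simplified):

    /* sleeping side: a coroutine that wants to be wakeable */
    QemuCoSleepState *sleep_state = NULL;
    qemu_co_sleep_ns_wakeable(QEMU_CLOCK_REALTIME, delay_ns, &sleep_state);
    /* when we get here, sleep_state has been reset to NULL */

    /* waking side (same AioContext): wake the sleeper early, if still asleep */
    if (sleep_state) {
        qemu_co_sleep_wake(sleep_state);
    }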

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
---
 include/qemu/coroutine.h    | 23 +++++++++++++++--
 util/qemu-coroutine-sleep.c | 51 +++++++++++++++++++++++++++----------
 2 files changed, 59 insertions(+), 15 deletions(-)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index 9801e7f5a4..8d55663062 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -273,10 +273,29 @@ void qemu_co_rwlock_wrlock(CoRwlock *lock);
  */
 void qemu_co_rwlock_unlock(CoRwlock *lock);
 
+typedef struct QemuCoSleepState QemuCoSleepState;
+
+/**
+ * Yield the coroutine for a given duration. During this yield, @sleep_state
+ * (if not NULL) is set to an opaque pointer, which may be used for
+ * qemu_co_sleep_wake(). Be careful: the pointer is reset to NULL when the
+ * timer fires. Don't save the obtained value in other variables and don't call
+ * qemu_co_sleep_wake() from another aio context.
+ */
+void coroutine_fn qemu_co_sleep_ns_wakeable(QEMUClockType type, int64_t ns,
+                                            QemuCoSleepState **sleep_state);
+static inline void coroutine_fn qemu_co_sleep_ns(QEMUClockType type, int64_t ns)
+{
+    qemu_co_sleep_ns_wakeable(type, ns, NULL);
+}
+
 /**
- * Yield the coroutine for a given duration
+ * Wake a coroutine if it is sleeping in qemu_co_sleep_ns. The timer will be
+ * deleted. @sleep_state must be the variable whose address was given to
+ * qemu_co_sleep_ns_wakeable() and should be checked to be non-NULL before calling
+ * qemu_co_sleep_wake().
  */
-void coroutine_fn qemu_co_sleep_ns(QEMUClockType type, int64_t ns);
+void qemu_co_sleep_wake(QemuCoSleepState *sleep_state);
 
 /**
  * Yield until a file descriptor becomes readable
diff --git a/util/qemu-coroutine-sleep.c b/util/qemu-coroutine-sleep.c
index 4bfdd30cbf..ae91b92b6e 100644
--- a/util/qemu-coroutine-sleep.c
+++ b/util/qemu-coroutine-sleep.c
@@ -17,31 +17,56 @@
 #include "qemu/timer.h"
 #include "block/aio.h"
 
-static void co_sleep_cb(void *opaque)
-{
-    Coroutine *co = opaque;
+static const char *qemu_co_sleep_ns__scheduled = "qemu_co_sleep_ns";
+
+struct QemuCoSleepState {
+    Coroutine *co;
+    QEMUTimer *ts;
+    QemuCoSleepState **user_state_pointer;
+};
 
+void qemu_co_sleep_wake(QemuCoSleepState *sleep_state)
+{
     /* Write of schedule protected by barrier write in aio_co_schedule */
-    atomic_set(&co->scheduled, NULL);
-    aio_co_wake(co);
+    const char *scheduled = atomic_cmpxchg(&sleep_state->co->scheduled,
+                                           qemu_co_sleep_ns__scheduled, NULL);
+
+    assert(scheduled == qemu_co_sleep_ns__scheduled);
+    if (sleep_state->user_state_pointer) {
+        *sleep_state->user_state_pointer = NULL;
+    }
+    timer_del(sleep_state->ts);
+    aio_co_wake(sleep_state->co);
 }
 
-void coroutine_fn qemu_co_sleep_ns(QEMUClockType type, int64_t ns)
+static void co_sleep_cb(void *opaque)
+{
+    qemu_co_sleep_wake(opaque);
+}
+
+void coroutine_fn qemu_co_sleep_ns_wakeable(QEMUClockType type, int64_t ns,
+                                            QemuCoSleepState **sleep_state)
 {
     AioContext *ctx = qemu_get_current_aio_context();
-    QEMUTimer *ts;
-    Coroutine *co = qemu_coroutine_self();
+    QemuCoSleepState state = {
+        .co = qemu_coroutine_self(),
+        .ts = aio_timer_new(ctx, type, SCALE_NS, co_sleep_cb, &state),
+        .user_state_pointer = sleep_state,
+    };
 
-    const char *scheduled = atomic_cmpxchg(&co->scheduled, NULL, __func__);
+    const char *scheduled = atomic_cmpxchg(&state.co->scheduled, NULL,
+                                           qemu_co_sleep_ns__scheduled);
     if (scheduled) {
         fprintf(stderr,
                 "%s: Co-routine was already scheduled in '%s'\n",
                 __func__, scheduled);
         abort();
     }
-    ts = aio_timer_new(ctx, type, SCALE_NS, co_sleep_cb, co);
-    timer_mod(ts, qemu_clock_get_ns(type) + ns);
+
+    if (sleep_state) {
+        *sleep_state = &state;
+    }
+    timer_mod(state.ts, qemu_clock_get_ns(type) + ns);
     qemu_coroutine_yield();
-    timer_del(ts);
-    timer_free(ts);
+    timer_free(state.ts);
 }
-- 
2.21.0




* [Qemu-devel] [PATCH v9 2/3] block/nbd: nbd reconnect
  2019-09-17 17:13 [Qemu-devel] [PATCH v9 0/3] NBD reconnect Vladimir Sementsov-Ogievskiy
  2019-09-17 17:13 ` [Qemu-devel] [PATCH v9 1/3] qemu-coroutine-sleep: introduce qemu_co_sleep_wake Vladimir Sementsov-Ogievskiy
@ 2019-09-17 17:13 ` Vladimir Sementsov-Ogievskiy
  2019-09-23 19:23   ` Eric Blake
  2019-09-17 17:13 ` [Qemu-devel] [PATCH v9 3/3] iotests: test " Vladimir Sementsov-Ogievskiy
  2019-09-24  9:27 ` [PATCH v9 4/3] " Vladimir Sementsov-Ogievskiy
  3 siblings, 1 reply; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-09-17 17:13 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, vsementsov, qemu-devel, mreitz, stefanha, den

Implement reconnect. To achieve this:

1. Add new modes:
   connecting-wait: reconnecting is in progress, and there have been
     only a few reconnect attempts so far, so all requests are
     waiting for the connection.
   connecting-nowait: reconnecting is in progress, there have already
     been many reconnect attempts, so all requests return errors.

   The two old modes are used too:
   connected: normal state
   quit: exiting after a fatal error or on close

Possible transitions are:

   * -> quit
   connecting-* -> connected
   connecting-wait -> connecting-nowait (the transition happens after
                      reconnect-delay seconds in connecting-wait mode)
   connected -> connecting-wait

2. Implement reconnect in connection_co: in connecting-* mode,
   connection_co tries to reconnect an unlimited number of times.

3. Retry NBD queries on channel error if we are in connecting-wait
   state.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/nbd.c | 332 ++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 269 insertions(+), 63 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 813c40d8f0..bec01d85c7 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -1,6 +1,7 @@
 /*
  * QEMU Block driver for  NBD
  *
+ * Copyright (c) 2019 Virtuozzo International GmbH.
  * Copyright (C) 2016 Red Hat, Inc.
  * Copyright (C) 2008 Bull S.A.S.
  *     Author: Laurent Vivier <Laurent.Vivier@bull.net>
@@ -55,6 +56,8 @@ typedef struct {
 } NBDClientRequest;
 
 typedef enum NBDClientState {
+    NBD_CLIENT_CONNECTING_WAIT,
+    NBD_CLIENT_CONNECTING_NOWAIT,
     NBD_CLIENT_CONNECTED,
     NBD_CLIENT_QUIT
 } NBDClientState;
@@ -67,8 +70,14 @@ typedef struct BDRVNBDState {
     CoMutex send_mutex;
     CoQueue free_sema;
     Coroutine *connection_co;
+    QemuCoSleepState *connection_co_sleep_ns_state;
+    bool drained;
+    bool wait_drained_end;
     int in_flight;
     NBDClientState state;
+    int connect_status;
+    Error *connect_err;
+    bool wait_in_flight;
 
     NBDClientRequest requests[MAX_NBD_REQUESTS];
     NBDReply reply;
@@ -83,10 +92,21 @@ typedef struct BDRVNBDState {
     char *x_dirty_bitmap;
 } BDRVNBDState;
 
-/* @ret will be used for reconnect in future */
+static int nbd_client_connect(BlockDriverState *bs, Error **errp);
+
 static void nbd_channel_error(BDRVNBDState *s, int ret)
 {
-    s->state = NBD_CLIENT_QUIT;
+    if (ret == -EIO) {
+        if (s->state == NBD_CLIENT_CONNECTED) {
+            s->state = s->reconnect_delay ? NBD_CLIENT_CONNECTING_WAIT :
+                                            NBD_CLIENT_CONNECTING_NOWAIT;
+        }
+    } else {
+        if (s->state == NBD_CLIENT_CONNECTED) {
+            qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
+        }
+        s->state = NBD_CLIENT_QUIT;
+    }
 }
 
 static void nbd_recv_coroutines_wake_all(BDRVNBDState *s)
@@ -129,7 +149,13 @@ static void nbd_client_attach_aio_context(BlockDriverState *bs,
 {
     BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
 
-    qio_channel_attach_aio_context(QIO_CHANNEL(s->ioc), new_context);
+    /*
+     * s->connection_co is either yielded from nbd_receive_reply or from
+     * nbd_reconnect_loop()
+     */
+    if (s->state == NBD_CLIENT_CONNECTED) {
+        qio_channel_attach_aio_context(QIO_CHANNEL(s->ioc), new_context);
+    }
 
     bdrv_inc_in_flight(bs);
 
@@ -140,24 +166,151 @@ static void nbd_client_attach_aio_context(BlockDriverState *bs,
     aio_wait_bh_oneshot(new_context, nbd_client_attach_aio_context_bh, bs);
 }
 
+static void coroutine_fn nbd_client_co_drain_begin(BlockDriverState *bs)
+{
+    BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
 
-static void nbd_teardown_connection(BlockDriverState *bs)
+    s->drained = true;
+    if (s->connection_co_sleep_ns_state) {
+        qemu_co_sleep_wake(s->connection_co_sleep_ns_state);
+    }
+}
+
+static void coroutine_fn nbd_client_co_drain_end(BlockDriverState *bs)
 {
     BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
 
-    assert(s->ioc);
+    s->drained = false;
+    if (s->wait_drained_end) {
+        s->wait_drained_end = false;
+        aio_co_wake(s->connection_co);
+    }
+}
+
 
-    /* finish any pending coroutines */
-    qio_channel_shutdown(s->ioc,
-                         QIO_CHANNEL_SHUTDOWN_BOTH,
-                         NULL);
+static void nbd_teardown_connection(BlockDriverState *bs)
+{
+    BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
+
+    if (s->state == NBD_CLIENT_CONNECTED) {
+        /* finish any pending coroutines */
+        assert(s->ioc);
+        qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
+    }
+    s->state = NBD_CLIENT_QUIT;
+    if (s->connection_co) {
+        if (s->connection_co_sleep_ns_state) {
+            qemu_co_sleep_wake(s->connection_co_sleep_ns_state);
+        }
+    }
     BDRV_POLL_WHILE(bs, s->connection_co);
+}
 
-    nbd_client_detach_aio_context(bs);
-    object_unref(OBJECT(s->sioc));
-    s->sioc = NULL;
-    object_unref(OBJECT(s->ioc));
-    s->ioc = NULL;
+static bool nbd_client_connecting(BDRVNBDState *s)
+{
+    return s->state == NBD_CLIENT_CONNECTING_WAIT ||
+        s->state == NBD_CLIENT_CONNECTING_NOWAIT;
+}
+
+static bool nbd_client_connecting_wait(BDRVNBDState *s)
+{
+    return s->state == NBD_CLIENT_CONNECTING_WAIT;
+}
+
+static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
+{
+    Error *local_err = NULL;
+
+    if (!nbd_client_connecting(s)) {
+        return;
+    }
+
+    /* Wait for completion of all in-flight requests */
+
+    qemu_co_mutex_lock(&s->send_mutex);
+
+    while (s->in_flight > 0) {
+        qemu_co_mutex_unlock(&s->send_mutex);
+        nbd_recv_coroutines_wake_all(s);
+        s->wait_in_flight = true;
+        qemu_coroutine_yield();
+        s->wait_in_flight = false;
+        qemu_co_mutex_lock(&s->send_mutex);
+    }
+
+    qemu_co_mutex_unlock(&s->send_mutex);
+
+    if (!nbd_client_connecting(s)) {
+        return;
+    }
+
+    /*
+     * Now we are sure that nobody is accessing the channel, and no one will
+     * try until we set the state to CONNECTED.
+     */
+
+    /* Finalize previous connection if any */
+    if (s->ioc) {
+        nbd_client_detach_aio_context(s->bs);
+        object_unref(OBJECT(s->sioc));
+        s->sioc = NULL;
+        object_unref(OBJECT(s->ioc));
+        s->ioc = NULL;
+    }
+
+    s->connect_status = nbd_client_connect(s->bs, &local_err);
+    error_free(s->connect_err);
+    s->connect_err = NULL;
+    error_propagate(&s->connect_err, local_err);
+    local_err = NULL;
+
+    if (s->connect_status < 0) {
+        /* failed attempt */
+        return;
+    }
+
+    /* successfully connected */
+    s->state = NBD_CLIENT_CONNECTED;
+    qemu_co_queue_restart_all(&s->free_sema);
+}
+
+static coroutine_fn void nbd_reconnect_loop(BDRVNBDState *s)
+{
+    uint64_t start_time_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
+    uint64_t delay_ns = s->reconnect_delay * NANOSECONDS_PER_SECOND;
+    uint64_t timeout = 1 * NANOSECONDS_PER_SECOND;
+    uint64_t max_timeout = 16 * NANOSECONDS_PER_SECOND;
+
+    nbd_reconnect_attempt(s);
+
+    while (nbd_client_connecting(s)) {
+        if (s->state == NBD_CLIENT_CONNECTING_WAIT &&
+            qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - start_time_ns > delay_ns)
+        {
+            s->state = NBD_CLIENT_CONNECTING_NOWAIT;
+            qemu_co_queue_restart_all(&s->free_sema);
+        }
+
+        qemu_co_sleep_ns_wakeable(QEMU_CLOCK_REALTIME, timeout,
+                                  &s->connection_co_sleep_ns_state);
+        if (s->drained) {
+            bdrv_dec_in_flight(s->bs);
+            s->wait_drained_end = true;
+            while (s->drained) {
+                /*
+                 * We may be entered once from nbd_client_attach_aio_context_bh
+                 * and then from nbd_client_co_drain_end. So here is a loop.
+                 */
+                qemu_coroutine_yield();
+            }
+            bdrv_inc_in_flight(s->bs);
+        }
+        if (timeout < max_timeout) {
+            timeout *= 2;
+        }
+
+        nbd_reconnect_attempt(s);
+    }
 }
 
 static coroutine_fn void nbd_connection_entry(void *opaque)
@@ -177,16 +330,26 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
          * Therefore we keep an additional in_flight reference all the time and
          * only drop it temporarily here.
          */
+
+        if (nbd_client_connecting(s)) {
+            nbd_reconnect_loop(s);
+        }
+
+        if (s->state != NBD_CLIENT_CONNECTED) {
+            continue;
+        }
+
         assert(s->reply.handle == 0);
         ret = nbd_receive_reply(s->bs, s->ioc, &s->reply, &local_err);
 
         if (local_err) {
             trace_nbd_read_reply_entry_fail(ret, error_get_pretty(local_err));
             error_free(local_err);
+            local_err = NULL;
         }
         if (ret <= 0) {
             nbd_channel_error(s, ret ? ret : -EIO);
-            break;
+            continue;
         }
 
         /*
@@ -201,7 +364,7 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
             (nbd_reply_is_structured(&s->reply) && !s->info.structured_reply))
         {
             nbd_channel_error(s, -EINVAL);
-            break;
+            continue;
         }
 
         /*
@@ -220,10 +383,19 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
         qemu_coroutine_yield();
     }
 
+    qemu_co_queue_restart_all(&s->free_sema);
     nbd_recv_coroutines_wake_all(s);
     bdrv_dec_in_flight(s->bs);
 
     s->connection_co = NULL;
+    if (s->ioc) {
+        nbd_client_detach_aio_context(s->bs);
+        object_unref(OBJECT(s->sioc));
+        s->sioc = NULL;
+        object_unref(OBJECT(s->ioc));
+        s->ioc = NULL;
+    }
+
     aio_wait_kick();
 }
 
@@ -235,7 +407,7 @@ static int nbd_co_send_request(BlockDriverState *bs,
     int rc, i = -1;
 
     qemu_co_mutex_lock(&s->send_mutex);
-    while (s->in_flight == MAX_NBD_REQUESTS) {
+    while (s->in_flight == MAX_NBD_REQUESTS || nbd_client_connecting_wait(s)) {
         qemu_co_queue_wait(&s->free_sema, &s->send_mutex);
     }
 
@@ -286,7 +458,11 @@ err:
             s->requests[i].coroutine = NULL;
             s->in_flight--;
         }
-        qemu_co_queue_next(&s->free_sema);
+        if (s->in_flight == 0 && s->wait_in_flight) {
+            aio_co_wake(s->connection_co);
+        } else {
+            qemu_co_queue_next(&s->free_sema);
+        }
     }
     qemu_co_mutex_unlock(&s->send_mutex);
     return rc;
@@ -666,10 +842,15 @@ static coroutine_fn int nbd_co_receive_one_chunk(
     } else {
         /* For assert at loop start in nbd_connection_entry */
         *reply = s->reply;
-        s->reply.handle = 0;
     }
+    s->reply.handle = 0;
 
-    if (s->connection_co) {
+    if (s->connection_co && !s->wait_in_flight) {
+        /*
+         * We must check s->wait_in_flight, because we may have been entered
+         * via nbd_recv_coroutines_wake_all(); in that case we should not
+         * wake connection_co here, as it will be woken by the last request.
+         */
         aio_co_wake(s->connection_co);
     }
 
@@ -781,7 +962,11 @@ break_loop:
 
     qemu_co_mutex_lock(&s->send_mutex);
     s->in_flight--;
-    qemu_co_queue_next(&s->free_sema);
+    if (s->in_flight == 0 && s->wait_in_flight) {
+        aio_co_wake(s->connection_co);
+    } else {
+        qemu_co_queue_next(&s->free_sema);
+    }
     qemu_co_mutex_unlock(&s->send_mutex);
 
     return false;
@@ -927,20 +1112,26 @@ static int nbd_co_request(BlockDriverState *bs, NBDRequest *request,
     } else {
         assert(request->type != NBD_CMD_WRITE);
     }
-    ret = nbd_co_send_request(bs, request, write_qiov);
-    if (ret < 0) {
-        return ret;
-    }
 
-    ret = nbd_co_receive_return_code(s, request->handle,
-                                     &request_ret, &local_err);
-    if (local_err) {
-        trace_nbd_co_request_fail(request->from, request->len, request->handle,
-                                  request->flags, request->type,
-                                  nbd_cmd_lookup(request->type),
-                                  ret, error_get_pretty(local_err));
-        error_free(local_err);
-    }
+    do {
+        ret = nbd_co_send_request(bs, request, write_qiov);
+        if (ret < 0) {
+            continue;
+        }
+
+        ret = nbd_co_receive_return_code(s, request->handle,
+                                         &request_ret, &local_err);
+        if (local_err) {
+            trace_nbd_co_request_fail(request->from, request->len,
+                                      request->handle, request->flags,
+                                      request->type,
+                                      nbd_cmd_lookup(request->type),
+                                      ret, error_get_pretty(local_err));
+            error_free(local_err);
+            local_err = NULL;
+        }
+    } while (ret < 0 && nbd_client_connecting_wait(s));
+
     return ret ? ret : request_ret;
 }
 
@@ -981,20 +1172,24 @@ static int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,
         request.len -= slop;
     }
 
-    ret = nbd_co_send_request(bs, &request, NULL);
-    if (ret < 0) {
-        return ret;
-    }
+    do {
+        ret = nbd_co_send_request(bs, &request, NULL);
+        if (ret < 0) {
+            continue;
+        }
+
+        ret = nbd_co_receive_cmdread_reply(s, request.handle, offset, qiov,
+                                           &request_ret, &local_err);
+        if (local_err) {
+            trace_nbd_co_request_fail(request.from, request.len, request.handle,
+                                      request.flags, request.type,
+                                      nbd_cmd_lookup(request.type),
+                                      ret, error_get_pretty(local_err));
+            error_free(local_err);
+            local_err = NULL;
+        }
+    } while (ret < 0 && nbd_client_connecting_wait(s));
 
-    ret = nbd_co_receive_cmdread_reply(s, request.handle, offset, qiov,
-                                       &request_ret, &local_err);
-    if (local_err) {
-        trace_nbd_co_request_fail(request.from, request.len, request.handle,
-                                  request.flags, request.type,
-                                  nbd_cmd_lookup(request.type),
-                                  ret, error_get_pretty(local_err));
-        error_free(local_err);
-    }
     return ret ? ret : request_ret;
 }
 
@@ -1131,20 +1326,25 @@ static int coroutine_fn nbd_client_co_block_status(
     if (s->info.min_block) {
         assert(QEMU_IS_ALIGNED(request.len, s->info.min_block));
     }
-    ret = nbd_co_send_request(bs, &request, NULL);
-    if (ret < 0) {
-        return ret;
-    }
+    do {
+        ret = nbd_co_send_request(bs, &request, NULL);
+        if (ret < 0) {
+            continue;
+        }
+
+        ret = nbd_co_receive_blockstatus_reply(s, request.handle, bytes,
+                                               &extent, &request_ret,
+                                               &local_err);
+        if (local_err) {
+            trace_nbd_co_request_fail(request.from, request.len, request.handle,
+                                      request.flags, request.type,
+                                      nbd_cmd_lookup(request.type),
+                                      ret, error_get_pretty(local_err));
+            error_free(local_err);
+            local_err = NULL;
+        }
+    } while (ret < 0 && nbd_client_connecting_wait(s));
 
-    ret = nbd_co_receive_blockstatus_reply(s, request.handle, bytes,
-                                           &extent, &request_ret, &local_err);
-    if (local_err) {
-        trace_nbd_co_request_fail(request.from, request.len, request.handle,
-                                  request.flags, request.type,
-                                  nbd_cmd_lookup(request.type),
-                                  ret, error_get_pretty(local_err));
-        error_free(local_err);
-    }
     if (ret < 0 || request_ret < 0) {
         return ret ? ret : request_ret;
     }
@@ -1163,9 +1363,9 @@ static void nbd_client_close(BlockDriverState *bs)
     BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
     NBDRequest request = { .type = NBD_CMD_DISC };
 
-    assert(s->ioc);
-
-    nbd_send_request(s->ioc, &request);
+    if (s->ioc) {
+        nbd_send_request(s->ioc, &request);
+    }
 
     nbd_teardown_connection(bs);
 }
@@ -1808,6 +2008,8 @@ static BlockDriver bdrv_nbd = {
     .bdrv_getlength             = nbd_getlength,
     .bdrv_detach_aio_context    = nbd_client_detach_aio_context,
     .bdrv_attach_aio_context    = nbd_client_attach_aio_context,
+    .bdrv_co_drain_begin        = nbd_client_co_drain_begin,
+    .bdrv_co_drain_end          = nbd_client_co_drain_end,
     .bdrv_refresh_filename      = nbd_refresh_filename,
     .bdrv_co_block_status       = nbd_client_co_block_status,
     .bdrv_dirname               = nbd_dirname,
@@ -1830,6 +2032,8 @@ static BlockDriver bdrv_nbd_tcp = {
     .bdrv_getlength             = nbd_getlength,
     .bdrv_detach_aio_context    = nbd_client_detach_aio_context,
     .bdrv_attach_aio_context    = nbd_client_attach_aio_context,
+    .bdrv_co_drain_begin        = nbd_client_co_drain_begin,
+    .bdrv_co_drain_end          = nbd_client_co_drain_end,
     .bdrv_refresh_filename      = nbd_refresh_filename,
     .bdrv_co_block_status       = nbd_client_co_block_status,
     .bdrv_dirname               = nbd_dirname,
@@ -1852,6 +2056,8 @@ static BlockDriver bdrv_nbd_unix = {
     .bdrv_getlength             = nbd_getlength,
     .bdrv_detach_aio_context    = nbd_client_detach_aio_context,
     .bdrv_attach_aio_context    = nbd_client_attach_aio_context,
+    .bdrv_co_drain_begin        = nbd_client_co_drain_begin,
+    .bdrv_co_drain_end          = nbd_client_co_drain_end,
     .bdrv_refresh_filename      = nbd_refresh_filename,
     .bdrv_co_block_status       = nbd_client_co_block_status,
     .bdrv_dirname               = nbd_dirname,
-- 
2.21.0




* [Qemu-devel] [PATCH v9 3/3] iotests: test nbd reconnect
  2019-09-17 17:13 [Qemu-devel] [PATCH v9 0/3] NBD reconnect Vladimir Sementsov-Ogievskiy
  2019-09-17 17:13 ` [Qemu-devel] [PATCH v9 1/3] qemu-coroutine-sleep: introduce qemu_co_sleep_wake Vladimir Sementsov-Ogievskiy
  2019-09-17 17:13 ` [Qemu-devel] [PATCH v9 2/3] block/nbd: nbd reconnect Vladimir Sementsov-Ogievskiy
@ 2019-09-17 17:13 ` Vladimir Sementsov-Ogievskiy
  2019-09-23 19:51   ` Eric Blake
  2019-09-24  9:27 ` [PATCH v9 4/3] " Vladimir Sementsov-Ogievskiy
  3 siblings, 1 reply; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-09-17 17:13 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, vsementsov, qemu-devel, mreitz, stefanha, den

Add a test which starts a backup to an NBD target and restarts the NBD
server during the backup.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/264        | 65 +++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/264.out    | 12 +++++++
 tests/qemu-iotests/group      |  1 +
 tests/qemu-iotests/iotests.py |  4 +++
 4 files changed, 82 insertions(+)
 create mode 100755 tests/qemu-iotests/264
 create mode 100644 tests/qemu-iotests/264.out

diff --git a/tests/qemu-iotests/264 b/tests/qemu-iotests/264
new file mode 100755
index 0000000000..e70f91c5ca
--- /dev/null
+++ b/tests/qemu-iotests/264
@@ -0,0 +1,65 @@
+#!/usr/bin/env python
+#
+# Test nbd reconnect
+#
+# Copyright (c) 2019 Virtuozzo International GmbH. All rights reserved.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import time
+
+import iotests
+from iotests import qemu_img_create, file_path, qemu_nbd_popen, log
+
+disk_a, disk_b, nbd_sock = file_path('disk_a', 'disk_b', 'nbd-sock')
+
+qemu_img_create('-f', iotests.imgfmt, disk_a, '5M')
+qemu_img_create('-f', iotests.imgfmt, disk_b, '5M')
+srv = qemu_nbd_popen('-k', nbd_sock, '-f', iotests.imgfmt, disk_b)
+time.sleep(1)
+
+vm = iotests.VM().add_drive(disk_a)
+vm.launch()
+vm.hmp_qemu_io('drive0', 'write 0 5M')
+
+vm.qmp_log('blockdev-add', filters=[iotests.filter_qmp_testfiles],
+           **{'node_name': 'backup0',
+              'driver': 'raw',
+              'file': {'driver': 'nbd',
+                       'server': {'type': 'unix', 'path': nbd_sock},
+                       'reconnect-delay': 10}})
+vm.qmp_log('blockdev-backup', device='drive0', sync='full', target='backup0',
+           speed=(1 * 1024 * 1024))
+
+time.sleep(1)
+log('Kill NBD server')
+srv.kill()
+
+jobs = vm.qmp('query-block-jobs')['return']
+if jobs and jobs[0]['offset'] < jobs[0]['len']:
+    log('Backup job is still in progress')
+
+time.sleep(1)
+
+log('Start NBD server')
+srv = qemu_nbd_popen('-k', nbd_sock, '-f', iotests.imgfmt, disk_b)
+
+vm.qmp_log('block-job-set-speed', device='drive0', speed=0)
+e = vm.event_wait('BLOCK_JOB_COMPLETED')
+log('Backup completed: {}'.format(e['data']['offset']))
+
+vm.qmp_log('blockdev-del', node_name='backup0')
+srv.kill()
+vm.shutdown()
diff --git a/tests/qemu-iotests/264.out b/tests/qemu-iotests/264.out
new file mode 100644
index 0000000000..4a2f4aa509
--- /dev/null
+++ b/tests/qemu-iotests/264.out
@@ -0,0 +1,12 @@
+{"execute": "blockdev-add", "arguments": {"driver": "raw", "file": {"driver": "nbd", "reconnect-delay": 10, "server": {"path": "TEST_DIR/PID-nbd-sock", "type": "unix"}}, "node-name": "backup0"}}
+{"return": {}}
+{"execute": "blockdev-backup", "arguments": {"device": "drive0", "speed": 1048576, "sync": "full", "target": "backup0"}}
+{"return": {}}
+Kill NBD server
+Backup job is still in progress
+Start NBD server
+{"execute": "block-job-set-speed", "arguments": {"device": "drive0", "speed": 0}}
+{"return": {}}
+Backup completed: 5242880
+{"execute": "blockdev-del", "arguments": {"node-name": "backup0"}}
+{"return": {}}
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 5d3da937e4..4f6dd6f153 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -275,5 +275,6 @@
 258 rw quick
 262 rw quick migration
 263 rw quick
+264 rw quick
 265 rw auto quick
 266 rw quick
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index b26271187c..a9c496dd7e 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -229,6 +229,10 @@ def qemu_nbd_early_pipe(*args):
     else:
         return exitcode, subp.communicate()[0]
 
+def qemu_nbd_popen(*args):
+    '''Run qemu-nbd in daemon mode and return the Popen object'''
+    return subprocess.Popen(qemu_nbd_args + ['--persistent'] + list(args))
+
 def compare_images(img1, img2, fmt1=imgfmt, fmt2=imgfmt):
     '''Return True if two image files are identical'''
     return qemu_img('compare', '-f', fmt1,
-- 
2.21.0




* Re: [PATCH v9 1/3] qemu-coroutine-sleep: introduce qemu_co_sleep_wake
  2019-09-17 17:13 ` [Qemu-devel] [PATCH v9 1/3] qemu-coroutine-sleep: introduce qemu_co_sleep_wake Vladimir Sementsov-Ogievskiy
@ 2019-09-23 14:53   ` Eric Blake
  0 siblings, 0 replies; 14+ messages in thread
From: Eric Blake @ 2019-09-23 14:53 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: kwolf, den, qemu-devel, stefanha, mreitz



On 9/17/19 12:13 PM, Vladimir Sementsov-Ogievskiy wrote:
> Introduce a function to gracefully wake a coroutine sleeping in
> qemu_co_sleep_ns().
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Reviewed-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  include/qemu/coroutine.h    | 23 +++++++++++++++--
>  util/qemu-coroutine-sleep.c | 51 +++++++++++++++++++++++++++----------
>  2 files changed, 59 insertions(+), 15 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



* Re: [PATCH v9 2/3] block/nbd: nbd reconnect
  2019-09-17 17:13 ` [Qemu-devel] [PATCH v9 2/3] block/nbd: nbd reconnect Vladimir Sementsov-Ogievskiy
@ 2019-09-23 19:23   ` Eric Blake
  2019-09-24  8:05     ` Vladimir Sementsov-Ogievskiy
  2019-10-04 17:29     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 2 replies; 14+ messages in thread
From: Eric Blake @ 2019-09-23 19:23 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: kwolf, den, qemu-devel, stefanha, mreitz



On 9/17/19 12:13 PM, Vladimir Sementsov-Ogievskiy wrote:
> Implement reconnect. To achieve this:
> 
> 1. add new modes:
>    connecting-wait: means, that reconnecting is in progress, and there
>      were small number of reconnect attempts, so all requests are
>      waiting for the connection.
>    connecting-nowait: reconnecting is in progress, there were a lot of
>      attempts of reconnect, all requests will return errors.
> 
>    two old modes are used too:
>    connected: normal state
>    quit: exiting after fatal error or on close
> 
> Possible transitions are:
> 
>    * -> quit
>    connecting-* -> connected
>    connecting-wait -> connecting-nowait (transition is done after
>                       reconnect-delay seconds in connecting-wait mode)
>    connected -> connecting-wait
> 
> 2. Implement reconnect in connection_co. So, in connecting-* mode,
>     connection_co, tries to reconnect unlimited times.
> 
> 3. Retry nbd queries on channel error, if we are in connecting-wait
>     state.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/nbd.c | 332 ++++++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 269 insertions(+), 63 deletions(-)
> 

> +
> +static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
> +{
> +    Error *local_err = NULL;
> +
> +    if (!nbd_client_connecting(s)) {
> +        return;
> +    }
> +
> +    /* Wait for completion of all in-flight requests */
> +
> +    qemu_co_mutex_lock(&s->send_mutex);
> +
> +    while (s->in_flight > 0) {
> +        qemu_co_mutex_unlock(&s->send_mutex);
> +        nbd_recv_coroutines_wake_all(s);
> +        s->wait_in_flight = true;
> +        qemu_coroutine_yield();
> +        s->wait_in_flight = false;
> +        qemu_co_mutex_lock(&s->send_mutex);
> +    }
> +
> +    qemu_co_mutex_unlock(&s->send_mutex);
> +
> +    if (!nbd_client_connecting(s)) {
> +        return;
> +    }
> +
> +    /*
> +     * Now we are sure that nobody is accessing the channel, and no one will
> +     * try until we set the state to CONNECTED.
> +     */
> +
> +    /* Finalize previous connection if any */
> +    if (s->ioc) {
> +        nbd_client_detach_aio_context(s->bs);
> +        object_unref(OBJECT(s->sioc));
> +        s->sioc = NULL;
> +        object_unref(OBJECT(s->ioc));
> +        s->ioc = NULL;
> +    }
> +
> +    s->connect_status = nbd_client_connect(s->bs, &local_err);
> +    error_free(s->connect_err);
> +    s->connect_err = NULL;
> +    error_propagate(&s->connect_err, local_err);
> +    local_err = NULL;
> +

Looks like a dead assignment to local_err.  But I see elsewhere you add
it, because you convert straight-line code into loops where it matters.

> +    if (s->connect_status < 0) {
> +        /* failed attempt */
> +        return;
> +    }
> +
> +    /* successfully connected */
> +    s->state = NBD_CLIENT_CONNECTED;
> +    qemu_co_queue_restart_all(&s->free_sema);
> +}
> +
> +static coroutine_fn void nbd_reconnect_loop(BDRVNBDState *s)

Coroutine functions should generally have '_co_' in their name.  I'd
prefer nbd_co_reconnect_loop.

> +{
> +    uint64_t start_time_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
> +    uint64_t delay_ns = s->reconnect_delay * NANOSECONDS_PER_SECOND;
> +    uint64_t timeout = 1 * NANOSECONDS_PER_SECOND;
> +    uint64_t max_timeout = 16 * NANOSECONDS_PER_SECOND;
> +
> +    nbd_reconnect_attempt(s);
> +
> +    while (nbd_client_connecting(s)) {
> +        if (s->state == NBD_CLIENT_CONNECTING_WAIT &&
> +            qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - start_time_ns > delay_ns)
> +        {
> +            s->state = NBD_CLIENT_CONNECTING_NOWAIT;
> +            qemu_co_queue_restart_all(&s->free_sema);
> +        }
> +
> +        qemu_co_sleep_ns_wakeable(QEMU_CLOCK_REALTIME, timeout,
> +                                  &s->connection_co_sleep_ns_state);
> +        if (s->drained) {
> +            bdrv_dec_in_flight(s->bs);
> +            s->wait_drained_end = true;
> +            while (s->drained) {
> +                /*
> +                 * We may be entered once from nbd_client_attach_aio_context_bh
> +                 * and then from nbd_client_co_drain_end. So here is a loop.
> +                 */
> +                qemu_coroutine_yield();
> +            }
> +            bdrv_inc_in_flight(s->bs);
> +        }
> +        if (timeout < max_timeout) {
> +            timeout *= 2;
> +        }

Exponential backoff, ok.  If I read the loop correctly, you've hardcoded
the max_timeout at 16s, which means the overall timeout is about 30s
when adding in the time of the earlier iterations.  Does that need to be
tunable?  But for now I can live with it.
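
(For reference: with those constants the sleep intervals sum to
1 + 2 + 4 + 8 + 16 = 31 seconds before the loop settles at the 16-second
maximum, not counting the time spent in the connection attempts themselves.)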

> +
> +        nbd_reconnect_attempt(s);
> +    }
>  }
>  
>  static coroutine_fn void nbd_connection_entry(void *opaque)
> @@ -177,16 +330,26 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
>           * Therefore we keep an additional in_flight reference all the time and
>           * only drop it temporarily here.
>           */
> +
> +        if (nbd_client_connecting(s)) {
> +            nbd_reconnect_loop(s);
> +        }
> +
> +        if (s->state != NBD_CLIENT_CONNECTED) {
> +            continue;
> +        }

Is 'continue' right, even if s->state == QUIT?

> +
>          assert(s->reply.handle == 0);
>          ret = nbd_receive_reply(s->bs, s->ioc, &s->reply, &local_err);
>  
>          if (local_err) {
>              trace_nbd_read_reply_entry_fail(ret, error_get_pretty(local_err));
>              error_free(local_err);
> +            local_err = NULL;

Could be fun in concert with your proposal to get rid of local_err ;)
But here, we aren't using error_propagate().

>          }
>          if (ret <= 0) {
>              nbd_channel_error(s, ret ? ret : -EIO);
> -            break;
> +            continue;
>          }
>  
>          /*

We're getting really close.  If you can answer my questions above, and
the only thing left is adding _co_ in the function name, I could tweak
that locally to spare you a v10.  At any rate, I'm tentatively queuing
this on my NBD tree; I'll probably do a pull request today without it,
and save it for next week's PR after I've had a week to hammer on it in
local tests.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



* Re: [PATCH v9 3/3] iotests: test nbd reconnect
  2019-09-17 17:13 ` [Qemu-devel] [PATCH v9 3/3] iotests: test " Vladimir Sementsov-Ogievskiy
@ 2019-09-23 19:51   ` Eric Blake
  2019-09-24  8:31     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Blake @ 2019-09-23 19:51 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: kwolf, den, qemu-devel, stefanha, mreitz



On 9/17/19 12:13 PM, Vladimir Sementsov-Ogievskiy wrote:
> Add test, which starts backup to nbd target and restarts nbd server
> during backup.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  tests/qemu-iotests/264        | 65 +++++++++++++++++++++++++++++++++++
>  tests/qemu-iotests/264.out    | 12 +++++++
>  tests/qemu-iotests/group      |  1 +
>  tests/qemu-iotests/iotests.py |  4 +++
>  4 files changed, 82 insertions(+)
>  create mode 100755 tests/qemu-iotests/264
>  create mode 100644 tests/qemu-iotests/264.out
> 

> +import time
> +
> +import iotests
> +from iotests import qemu_img_create, file_path, qemu_nbd_popen, log
> +
> +disk_a, disk_b, nbd_sock = file_path('disk_a', 'disk_b', 'nbd-sock')
> +
> +qemu_img_create('-f', iotests.imgfmt, disk_a, '5M')
> +qemu_img_create('-f', iotests.imgfmt, disk_b, '5M')
> +srv = qemu_nbd_popen('-k', nbd_sock, '-f', iotests.imgfmt, disk_b)
> +time.sleep(1)
> +

Hard-coded sleep times can be problematic (too long, and the test is no
longer quick; too short, and heavy load can interfere); is there any way
to poll for a completion event rather than just waiting one second?

> +vm = iotests.VM().add_drive(disk_a)
> +vm.launch()
> +vm.hmp_qemu_io('drive0', 'write 0 5M')
> +
> +vm.qmp_log('blockdev-add', filters=[iotests.filter_qmp_testfiles],
> +           **{'node_name': 'backup0',
> +              'driver': 'raw',
> +              'file': {'driver': 'nbd',
> +                       'server': {'type': 'unix', 'path': nbd_sock},
> +                       'reconnect-delay': 10}})
> +vm.qmp_log('blockdev-backup', device='drive0', sync='full', target='backup0',
> +           speed=(1 * 1024 * 1024))

Throttling to make sure we are likely to still be going and thus test
the reconnect logic.

> +
> +time.sleep(1)
> +log('Kill NBD server')
> +srv.kill()

Again, that hard-coded sleep looks a bit risky.

> +
> +jobs = vm.qmp('query-block-jobs')['return']
> +if jobs and jobs[0]['offset'] < jobs[0]['len']:
> +    log('Backup job is still in progress')
> +
> +time.sleep(1)

Why do we need yet another sleep?

> +
> +log('Start NBD server')
> +srv = qemu_nbd_popen('-k', nbd_sock, '-f', iotests.imgfmt, disk_b)
> +
> +vm.qmp_log('block-job-set-speed', device='drive0', speed=0)

So here, you release the throttle to finish the job.

> +e = vm.event_wait('BLOCK_JOB_COMPLETED')
> +log('Backup completed: {}'.format(e['data']['offset']))
> +
> +vm.qmp_log('blockdev-del', node_name='backup0')
> +srv.kill()
> +vm.shutdown()
> diff --git a/tests/qemu-iotests/264.out b/tests/qemu-iotests/264.out
> new file mode 100644
> index 0000000000..4a2f4aa509
> --- /dev/null
> +++ b/tests/qemu-iotests/264.out
> @@ -0,0 +1,12 @@
> +{"execute": "blockdev-add", "arguments": {"driver": "raw", "file": {"driver": "nbd", "reconnect-delay": 10, "server": {"path": "TEST_DIR/PID-nbd-sock", "type": "unix"}}, "node-name": "backup0"}}
> +{"return": {}}
> +{"execute": "blockdev-backup", "arguments": {"device": "drive0", "speed": 1048576, "sync": "full", "target": "backup0"}}
> +{"return": {}}
> +Kill NBD server
> +Backup job is still in progress
> +Start NBD server
> +{"execute": "block-job-set-speed", "arguments": {"device": "drive0", "speed": 0}}
> +{"return": {}}
> +Backup completed: 5242880
> +{"execute": "blockdev-del", "arguments": {"node-name": "backup0"}}
> +{"return": {}}
> diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
> index 5d3da937e4..4f6dd6f153 100644
> --- a/tests/qemu-iotests/group
> +++ b/tests/qemu-iotests/group
> @@ -275,5 +275,6 @@
>  258 rw quick
>  262 rw quick migration
>  263 rw quick
> +264 rw quick

With that many hard-coded sleeps, is it still quick?

/me applies the patch and runs it...

264      pass       [14:49:55] [14:50:01]   6s

that's borderline enough that I would not call it quick.

>  
> +def qemu_nbd_popen(*args):
> +    '''Run qemu-nbd in daemon mode and return the parent's exit code'''
> +    return subprocess.Popen(qemu_nbd_args + ['--persistent'] + list(args))
> +

Should you also use a pid file here, and wait for the existence of the
pid file before returning (rather than hard-coding sleep(1))?

>  def compare_images(img1, img2, fmt1=imgfmt, fmt2=imgfmt):
>      '''Return True if two image files are identical'''
>      return qemu_img('compare', '-f', fmt1,
> 

At any rate,

Tested-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



* Re: [PATCH v9 2/3] block/nbd: nbd reconnect
  2019-09-23 19:23   ` Eric Blake
@ 2019-09-24  8:05     ` Vladimir Sementsov-Ogievskiy
  2019-10-04 17:29     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-09-24  8:05 UTC (permalink / raw)
  To: Eric Blake, qemu-block; +Cc: kwolf, Denis Lunev, qemu-devel, stefanha, mreitz

23.09.2019 22:23, Eric Blake wrote:
> On 9/17/19 12:13 PM, Vladimir Sementsov-Ogievskiy wrote:
>> Implement reconnect. To achieve this:
>>
>> 1. add new modes:
>>     connecting-wait: means, that reconnecting is in progress, and there
>>       were small number of reconnect attempts, so all requests are
>>       waiting for the connection.
>>     connecting-nowait: reconnecting is in progress, there were a lot of
>>       attempts of reconnect, all requests will return errors.
>>
>>     two old modes are used too:
>>     connected: normal state
>>     quit: exiting after fatal error or on close
>>
>> Possible transitions are:
>>
>>     * -> quit
>>     connecting-* -> connected
>>     connecting-wait -> connecting-nowait (transition is done after
>>                        reconnect-delay seconds in connecting-wait mode)
>>     connected -> connecting-wait
>>
>> 2. Implement reconnect in connection_co. So, in connecting-* mode,
>>      connection_co, tries to reconnect unlimited times.
>>
>> 3. Retry nbd queries on channel error, if we are in connecting-wait
>>      state.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block/nbd.c | 332 ++++++++++++++++++++++++++++++++++++++++++----------
>>   1 file changed, 269 insertions(+), 63 deletions(-)
>>
> 
>> +
>> +static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
>> +{
>> +    Error *local_err = NULL;
>> +
>> +    if (!nbd_client_connecting(s)) {
>> +        return;
>> +    }
>> +
>> +    /* Wait for completion of all in-flight requests */
>> +
>> +    qemu_co_mutex_lock(&s->send_mutex);
>> +
>> +    while (s->in_flight > 0) {
>> +        qemu_co_mutex_unlock(&s->send_mutex);
>> +        nbd_recv_coroutines_wake_all(s);
>> +        s->wait_in_flight = true;
>> +        qemu_coroutine_yield();
>> +        s->wait_in_flight = false;
>> +        qemu_co_mutex_lock(&s->send_mutex);
>> +    }
>> +
>> +    qemu_co_mutex_unlock(&s->send_mutex);
>> +
>> +    if (!nbd_client_connecting(s)) {
>> +        return;
>> +    }
>> +
>> +    /*
>> +     * Now we are sure that nobody is accessing the channel, and no one will
>> +     * try until we set the state to CONNECTED.
>> +     */
>> +
>> +    /* Finalize previous connection if any */
>> +    if (s->ioc) {
>> +        nbd_client_detach_aio_context(s->bs);
>> +        object_unref(OBJECT(s->sioc));
>> +        s->sioc = NULL;
>> +        object_unref(OBJECT(s->ioc));
>> +        s->ioc = NULL;
>> +    }
>> +
>> +    s->connect_status = nbd_client_connect(s->bs, &local_err);
>> +    error_free(s->connect_err);
>> +    s->connect_err = NULL;
>> +    error_propagate(&s->connect_err, local_err);
>> +    local_err = NULL;
>> +
> 
> Looks like a dead assignment to local_err. 

Hmm, agree, it's dead.

> But I see elsewhere you add
> it, because you convert straight-line code into loops where it matters.
> 
>> +    if (s->connect_status < 0) {
>> +        /* failed attempt */
>> +        return;
>> +    }
>> +
>> +    /* successfully connected */
>> +    s->state = NBD_CLIENT_CONNECTED;
>> +    qemu_co_queue_restart_all(&s->free_sema);
>> +}
>> +
>> +static coroutine_fn void nbd_reconnect_loop(BDRVNBDState *s)
> 
> Coroutine functions should generally have '_co_' in their name.  I'd
> prefer nbd_co_reconnect_loop.

OK, agreed.

> 
>> +{
>> +    uint64_t start_time_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>> +    uint64_t delay_ns = s->reconnect_delay * NANOSECONDS_PER_SECOND;
>> +    uint64_t timeout = 1 * NANOSECONDS_PER_SECOND;
>> +    uint64_t max_timeout = 16 * NANOSECONDS_PER_SECOND;
>> +
>> +    nbd_reconnect_attempt(s);
>> +
>> +    while (nbd_client_connecting(s)) {
>> +        if (s->state == NBD_CLIENT_CONNECTING_WAIT &&
>> +            qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - start_time_ns > delay_ns)
>> +        {
>> +            s->state = NBD_CLIENT_CONNECTING_NOWAIT;
>> +            qemu_co_queue_restart_all(&s->free_sema);
>> +        }
>> +
>> +        qemu_co_sleep_ns_wakeable(QEMU_CLOCK_REALTIME, timeout,
>> +                                  &s->connection_co_sleep_ns_state);
>> +        if (s->drained) {
>> +            bdrv_dec_in_flight(s->bs);
>> +            s->wait_drained_end = true;
>> +            while (s->drained) {
>> +                /*
>> +                 * We may be entered once from nbd_client_attach_aio_context_bh
>> +                 * and then from nbd_client_co_drain_end. So here is a loop.
>> +                 */
>> +                qemu_coroutine_yield();
>> +            }
>> +            bdrv_inc_in_flight(s->bs);
>> +        }
>> +        if (timeout < max_timeout) {
>> +            timeout *= 2;
>> +        }
> 
> Exponential backup, ok.  If I read the loop correctly, you've hardcoded
> the max_timeout at 16s, which means the overall timeout is about 30s
> when adding in the time of the earlier iterations.  Does that need to be
> tunable?  But for now I can live with it.

I think we can add an option later if needed.

> 
>> +
>> +        nbd_reconnect_attempt(s);
>> +    }
>>   }
>>   
>>   static coroutine_fn void nbd_connection_entry(void *opaque)
>> @@ -177,16 +330,26 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
>>            * Therefore we keep an additional in_flight reference all the time and
>>            * only drop it temporarily here.
>>            */
>> +
>> +        if (nbd_client_connecting(s)) {
>> +            nbd_reconnect_loop(s);
>> +        }
>> +
>> +        if (s->state != NBD_CLIENT_CONNECTED) {
>> +            continue;
>> +        }
> 
> Is 'continue' right, even if s->state == QUIT?

It doesn't matter, as we jump back to "while (s->state != NBD_CLIENT_QUIT) {".

> 
>> +
>>           assert(s->reply.handle == 0);
>>           ret = nbd_receive_reply(s->bs, s->ioc, &s->reply, &local_err);
>>   
>>           if (local_err) {
>>               trace_nbd_read_reply_entry_fail(ret, error_get_pretty(local_err));
>>               error_free(local_err);
>> +            local_err = NULL;
> 
> Could be fun in concert with your proposal to get rid of local_err ;)
> But here, we aren't using error_propagate().

And we don't have an Error **errp parameter here either.

> 
>>           }
>>           if (ret <= 0) {
>>               nbd_channel_error(s, ret ? ret : -EIO);
>> -            break;
>> +            continue;
>>           }
>>   
>>           /*
> 
> We're getting really close.  If you can answer my questions above, and
> the only thing left is adding _co_ in the function name, I could tweak
> that locally to spare you a v10.  At any rate, I'm tentatively queuing
> this on my NBD tree; I'll probably do a pull request today without it,
> and save it for next week's PR after I've had a week to hammer on it in
> local tests.
> 

Thank you! That's great!

-- 
Best regards,
Vladimir


* Re: [PATCH v9 3/3] iotests: test nbd reconnect
  2019-09-23 19:51   ` Eric Blake
@ 2019-09-24  8:31     ` Vladimir Sementsov-Ogievskiy
  2019-10-04 18:05       ` Eric Blake
  0 siblings, 1 reply; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-09-24  8:31 UTC (permalink / raw)
  To: Eric Blake, qemu-block; +Cc: kwolf, Denis Lunev, qemu-devel, stefanha, mreitz

23.09.2019 22:51, Eric Blake wrote:
> On 9/17/19 12:13 PM, Vladimir Sementsov-Ogievskiy wrote:
>> Add test, which starts backup to nbd target and restarts nbd server
>> during backup.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   tests/qemu-iotests/264        | 65 +++++++++++++++++++++++++++++++++++
>>   tests/qemu-iotests/264.out    | 12 +++++++
>>   tests/qemu-iotests/group      |  1 +
>>   tests/qemu-iotests/iotests.py |  4 +++
>>   4 files changed, 82 insertions(+)
>>   create mode 100755 tests/qemu-iotests/264
>>   create mode 100644 tests/qemu-iotests/264.out
>>
> 
>> +import time
>> +
>> +import iotests
>> +from iotests import qemu_img_create, file_path, qemu_nbd_popen, log
>> +
>> +disk_a, disk_b, nbd_sock = file_path('disk_a', 'disk_b', 'nbd-sock')
>> +
>> +qemu_img_create('-f', iotests.imgfmt, disk_a, '5M')
>> +qemu_img_create('-f', iotests.imgfmt, disk_b, '5M')
>> +srv = qemu_nbd_popen('-k', nbd_sock, '-f', iotests.imgfmt, disk_b)
>> +time.sleep(1)
>> +
> 
> Hard-coded sleep times can be problematic (too long, and the test is no
> longer quick; too short, and heavy load can interfere); is there any way
> to poll for a completion event rather than just waiting one second?

Hmm, I think qemu-nbd doesn't send events. Here I just want to give it time
to start. Possibly we could enable tracing and wait for some trace point, but I'm
not sure that is a good idea. And sleep is simpler.

> 
>> +vm = iotests.VM().add_drive(disk_a)
>> +vm.launch()
>> +vm.hmp_qemu_io('drive0', 'write 0 5M')
>> +
>> +vm.qmp_log('blockdev-add', filters=[iotests.filter_qmp_testfiles],
>> +           **{'node_name': 'backup0',
>> +              'driver': 'raw',
>> +              'file': {'driver': 'nbd',
>> +                       'server': {'type': 'unix', 'path': nbd_sock},
>> +                       'reconnect-delay': 10}})
>> +vm.qmp_log('blockdev-backup', device='drive0', sync='full', target='backup0',
>> +           speed=(1 * 1024 * 1024))
> 
> Throttling to make sure we are likely to still be going and thus test
> the reconnect logic.
> 
>> +
>> +time.sleep(1)
>> +log('Kill NBD server')
>> +srv.kill()
> 
> Again, that hard-coded sleep looks a bit risky.

Hmm, you are right, it's definitely risky. I forgot that in the downstream Virtuozzo
branch this place is rewritten to be smarter; I'll resend an updated test, replying
to this thread.

> 
>> +
>> +jobs = vm.qmp('query-block-jobs')['return']
>> +if jobs and jobs[0]['offset'] < jobs[0]['len']:
>> +    log('Backup job is still in progress')
>> +
>> +time.sleep(1)
> 
> Why do we need yet another sleep?

The idea is to emulate the server being down for some time. It may be better
to disable throttling before this sleep, to give the backup more chance to issue
requests during the server downtime.

> 
>> +
>> +log('Start NBD server')
>> +srv = qemu_nbd_popen('-k', nbd_sock, '-f', iotests.imgfmt, disk_b)
>> +
>> +vm.qmp_log('block-job-set-speed', device='drive0', speed=0)
> 
> So here, you release the throttle to finish the job.
> 
>> +e = vm.event_wait('BLOCK_JOB_COMPLETED')
>> +log('Backup completed: {}'.format(e['data']['offset']))
>> +
>> +vm.qmp_log('blockdev-del', node_name='backup0')
>> +srv.kill()
>> +vm.shutdown()
>> diff --git a/tests/qemu-iotests/264.out b/tests/qemu-iotests/264.out
>> new file mode 100644
>> index 0000000000..4a2f4aa509
>> --- /dev/null
>> +++ b/tests/qemu-iotests/264.out
>> @@ -0,0 +1,12 @@
>> +{"execute": "blockdev-add", "arguments": {"driver": "raw", "file": {"driver": "nbd", "reconnect-delay": 10, "server": {"path": "TEST_DIR/PID-nbd-sock", "type": "unix"}}, "node-name": "backup0"}}
>> +{"return": {}}
>> +{"execute": "blockdev-backup", "arguments": {"device": "drive0", "speed": 1048576, "sync": "full", "target": "backup0"}}
>> +{"return": {}}
>> +Kill NBD server
>> +Backup job is still in progress
>> +Start NBD server
>> +{"execute": "block-job-set-speed", "arguments": {"device": "drive0", "speed": 0}}
>> +{"return": {}}
>> +Backup completed: 5242880
>> +{"execute": "blockdev-del", "arguments": {"node-name": "backup0"}}
>> +{"return": {}}
>> diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
>> index 5d3da937e4..4f6dd6f153 100644
>> --- a/tests/qemu-iotests/group
>> +++ b/tests/qemu-iotests/group
>> @@ -275,5 +275,6 @@
>>   258 rw quick
>>   262 rw quick migration
>>   263 rw quick
>> +264 rw quick
> 
> With that many hard-coded sleeps, is it still quick?
> 
> /me applies the patch and runs it...
> 
> 264      pass       [14:49:55] [14:50:01]   6s
> 
> that's borderline enough that I would not call it quick.
> 
>>   
>> +def qemu_nbd_popen(*args):
>> +    '''Run qemu-nbd in daemon mode and return the parent's exit code'''
>> +    return subprocess.Popen(qemu_nbd_args + ['--persistent'] + list(args))
>> +
> 
> Should you also use a pid file here, and wait for the existence of the
> pid file before returning (rather than hard-coding sleep(1))?

What do you mean / how would we do it?

We want to wait until the listening socket is prepared..

> 
>>   def compare_images(img1, img2, fmt1=imgfmt, fmt2=imgfmt):
>>       '''Return True if two image files are identical'''
>>       return qemu_img('compare', '-f', fmt1,
>>
> 
> At any rate,
> 
> Tested-by: Eric Blake <eblake@redhat.com>
> 


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v9 4/3] iotests: test nbd reconnect
  2019-09-17 17:13 [Qemu-devel] [PATCH v9 0/3] NBD reconnect Vladimir Sementsov-Ogievskiy
                   ` (2 preceding siblings ...)
  2019-09-17 17:13 ` [Qemu-devel] [PATCH v9 3/3] iotests: test " Vladimir Sementsov-Ogievskiy
@ 2019-09-24  9:27 ` Vladimir Sementsov-Ogievskiy
  3 siblings, 0 replies; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-09-24  9:27 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, vsementsov, qemu-devel, mreitz, stefanha, den

Add test, which starts backup to nbd target and restarts nbd server
during backup.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---

Updated test: avoid race conditions + some improvements.
  - introduce qemu_io_silent_check and use it to wait for NBD server
  start
  - use a size variable instead of '5M' in all places
  - use a smart waiting loop instead of a sleep, to wait for job progress
  - do srv.wait() after srv.kill(), to be sure that the server is stopped
  - drop throttling earlier

  Now the test takes 3 seconds for me. If that is still too slow for you,
  feel free to drop the 'quick' tag.

 tests/qemu-iotests/264        | 95 +++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/264.out    | 13 +++++
 tests/qemu-iotests/group      |  1 +
 tests/qemu-iotests/iotests.py | 11 ++++
 4 files changed, 120 insertions(+)
 create mode 100755 tests/qemu-iotests/264
 create mode 100644 tests/qemu-iotests/264.out

diff --git a/tests/qemu-iotests/264 b/tests/qemu-iotests/264
new file mode 100755
index 0000000000..c8cd97ae2b
--- /dev/null
+++ b/tests/qemu-iotests/264
@@ -0,0 +1,95 @@
+#!/usr/bin/env python
+#
+# Test nbd reconnect
+#
+# Copyright (c) 2019 Virtuozzo International GmbH.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import time
+
+import iotests
+from iotests import qemu_img_create, qemu_io_silent_check, file_path, \
+        qemu_nbd_popen, log
+
+disk_a, disk_b, nbd_sock = file_path('disk_a', 'disk_b', 'nbd-sock')
+nbd_uri = 'nbd+unix:///?socket=' + nbd_sock
+size = 5 * 1024 * 1024
+wait_limit = 3
+wait_step = 0.2
+
+qemu_img_create('-f', iotests.imgfmt, disk_a, str(size))
+qemu_img_create('-f', iotests.imgfmt, disk_b, str(size))
+srv = qemu_nbd_popen('-k', nbd_sock, '-f', iotests.imgfmt, disk_b)
+
+# Wait for NBD server availability
+t = 0
+ok = False
+while t < wait_limit:
+    ok = qemu_io_silent_check('-f', 'raw', '-c', 'read 0 512', nbd_uri)
+    if ok:
+        break
+    time.sleep(wait_step)
+    t += wait_step
+
+assert ok
+
+vm = iotests.VM().add_drive(disk_a)
+vm.launch()
+vm.hmp_qemu_io('drive0', 'write 0 {}'.format(size))
+
+vm.qmp_log('blockdev-add', filters=[iotests.filter_qmp_testfiles],
+           **{'node_name': 'backup0',
+              'driver': 'raw',
+              'file': {'driver': 'nbd',
+                       'server': {'type': 'unix', 'path': nbd_sock},
+                       'reconnect-delay': 10}})
+vm.qmp_log('blockdev-backup', device='drive0', sync='full', target='backup0',
+           speed=(1 * 1024 * 1024))
+
+# Wait for some progress
+t = 0
+while t < wait_limit:
+    jobs = vm.qmp('query-block-jobs')['return']
+    if jobs and jobs[0]['offset'] > 0:
+        break
+    time.sleep(wait_step)
+    t += wait_step
+
+if jobs and jobs[0]['offset'] > 0:
+    log('Backup job is started')
+
+log('Kill NBD server')
+srv.kill()
+srv.wait()
+
+jobs = vm.qmp('query-block-jobs')['return']
+if jobs and jobs[0]['offset'] < jobs[0]['len']:
+    log('Backup job is still in progress')
+
+vm.qmp_log('block-job-set-speed', device='drive0', speed=0)
+
+# Emulate server down time for 1 second
+time.sleep(1)
+
+log('Start NBD server')
+srv = qemu_nbd_popen('-k', nbd_sock, '-f', iotests.imgfmt, disk_b)
+
+e = vm.event_wait('BLOCK_JOB_COMPLETED')
+log('Backup completed: {}'.format(e['data']['offset']))
+
+vm.qmp_log('blockdev-del', node_name='backup0')
+srv.kill()
+vm.shutdown()
diff --git a/tests/qemu-iotests/264.out b/tests/qemu-iotests/264.out
new file mode 100644
index 0000000000..3000944b09
--- /dev/null
+++ b/tests/qemu-iotests/264.out
@@ -0,0 +1,13 @@
+{"execute": "blockdev-add", "arguments": {"driver": "raw", "file": {"driver": "nbd", "reconnect-delay": 10, "server": {"path": "TEST_DIR/PID-nbd-sock", "type": "unix"}}, "node-name": "backup0"}}
+{"return": {}}
+{"execute": "blockdev-backup", "arguments": {"device": "drive0", "speed": 1048576, "sync": "full", "target": "backup0"}}
+{"return": {}}
+Backup job is started
+Kill NBD server
+Backup job is still in progress
+{"execute": "block-job-set-speed", "arguments": {"device": "drive0", "speed": 0}}
+{"return": {}}
+Start NBD server
+Backup completed: 5242880
+{"execute": "blockdev-del", "arguments": {"node-name": "backup0"}}
+{"return": {}}
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 5d3da937e4..4f6dd6f153 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -275,5 +275,6 @@
 258 rw quick
 262 rw quick migration
 263 rw quick
+264 rw quick
 265 rw auto quick
 266 rw quick
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index b26271187c..8912bd4661 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -164,6 +164,13 @@ def qemu_io_silent(*args):
                          (-exitcode, ' '.join(args)))
     return exitcode
 
+def qemu_io_silent_check(*args):
+    '''Run qemu-io and return True if the subprocess returned 0'''
+    args = qemu_io_args + list(args)
+    exitcode = subprocess.call(args, stdout=open('/dev/null', 'w'),
+                               stderr=subprocess.STDOUT)
+    return exitcode == 0
+
 def get_virtio_scsi_device():
     if qemu_default_machine == 's390-ccw-virtio':
         return 'virtio-scsi-ccw'
@@ -229,6 +236,10 @@ def qemu_nbd_early_pipe(*args):
     else:
         return exitcode, subp.communicate()[0]
 
+def qemu_nbd_popen(*args):
+    '''Run qemu-nbd in daemon mode and return the Popen object'''
+    return subprocess.Popen(qemu_nbd_args + ['--persistent'] + list(args))
+
 def compare_images(img1, img2, fmt1=imgfmt, fmt2=imgfmt):
     '''Return True if two image files are identical'''
     return qemu_img('compare', '-f', fmt1,
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v9 2/3] block/nbd: nbd reconnect
  2019-09-23 19:23   ` Eric Blake
  2019-09-24  8:05     ` Vladimir Sementsov-Ogievskiy
@ 2019-10-04 17:29     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-10-04 17:29 UTC (permalink / raw)
  To: Eric Blake; +Cc: kwolf, Denis Lunev, qemu-block, qemu-devel, mreitz, stefanha

23 Sep 2019 22:23  Eric Blake <eblake@redhat.com> wrote:

On 9/17/19 12:13 PM, Vladimir Sementsov-Ogievskiy wrote:
> Implement reconnect. To achieve this:
>
> 1. add new modes:
>    connecting-wait: means, that reconnecting is in progress, and there
>      were small number of reconnect attempts, so all requests are
>      waiting for the connection.
>    connecting-nowait: reconnecting is in progress, there were a lot of
>      attempts of reconnect, all requests will return errors.
>
>    two old modes are used too:
>    connected: normal state
>    quit: exiting after fatal error or on close
>
> Possible transitions are:
>
>    * -> quit
>    connecting-* -> connected
>    connecting-wait -> connecting-nowait (transition is done after
>                       reconnect-delay seconds in connecting-wait mode)
>    connected -> connecting-wait
>
> 2. Implement reconnect in connection_co. So, in connecting-* mode,
>     connection_co, tries to reconnect unlimited times.
>
> 3. Retry nbd queries on channel error, if we are in connecting-wait
>     state.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/nbd.c | 332 ++++++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 269 insertions(+), 63 deletions(-)
>

> +
> +static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
> +{
> +    Error *local_err = NULL;
> +
> +    if (!nbd_client_connecting(s)) {
> +        return;
> +    }
> +
> +    /* Wait for completion of all in-flight requests */
> +
> +    qemu_co_mutex_lock(&s->send_mutex);
> +
> +    while (s->in_flight > 0) {
> +        qemu_co_mutex_unlock(&s->send_mutex);
> +        nbd_recv_coroutines_wake_all(s);
> +        s->wait_in_flight = true;
> +        qemu_coroutine_yield();
> +        s->wait_in_flight = false;
> +        qemu_co_mutex_lock(&s->send_mutex);
> +    }
> +
> +    qemu_co_mutex_unlock(&s->send_mutex);
> +
> +    if (!nbd_client_connecting(s)) {
> +        return;
> +    }
> +
> +    /*
> +     * Now we are sure that nobody is accessing the channel, and no one will
> +     * try until we set the state to CONNECTED.
> +     */
> +
> +    /* Finalize previous connection if any */
> +    if (s->ioc) {
> +        nbd_client_detach_aio_context(s->bs);
> +        object_unref(OBJECT(s->sioc));
> +        s->sioc = NULL;
> +        object_unref(OBJECT(s->ioc));
> +        s->ioc = NULL;
> +    }
> +
> +    s->connect_status = nbd_client_connect(s->bs, &local_err);
> +    error_free(s->connect_err);
> +    s->connect_err = NULL;
> +    error_propagate(&s->connect_err, local_err);
> +    local_err = NULL;
> +

Looks like a dead assignment to local_err.  But I see elsewhere you add
it, because you convert straight-line code into loops where it matters.

> +    if (s->connect_status < 0) {
> +        /* failed attempt */
> +        return;
> +    }
> +
> +    /* successfully connected */
> +    s->state = NBD_CLIENT_CONNECTED;
> +    qemu_co_queue_restart_all(&s->free_sema);
> +}
> +
> +static coroutine_fn void nbd_reconnect_loop(BDRVNBDState *s)

Coroutine functions should generally have '_co_' in their name.  I'd
prefer nbd_co_reconnect_loop.

> +{
> +    uint64_t start_time_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
> +    uint64_t delay_ns = s->reconnect_delay * NANOSECONDS_PER_SECOND;
> +    uint64_t timeout = 1 * NANOSECONDS_PER_SECOND;
> +    uint64_t max_timeout = 16 * NANOSECONDS_PER_SECOND;
> +
> +    nbd_reconnect_attempt(s);
> +
> +    while (nbd_client_connecting(s)) {
> +        if (s->state == NBD_CLIENT_CONNECTING_WAIT &&
> +            qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - start_time_ns > delay_ns)
> +        {
> +            s->state = NBD_CLIENT_CONNECTING_NOWAIT;
> +            qemu_co_queue_restart_all(&s->free_sema);
> +        }
> +
> +        qemu_co_sleep_ns_wakeable(QEMU_CLOCK_REALTIME, timeout,
> +                                  &s->connection_co_sleep_ns_state);
> +        if (s->drained) {
> +            bdrv_dec_in_flight(s->bs);
> +            s->wait_drained_end = true;
> +            while (s->drained) {
> +                /*
> +                 * We may be entered once from nbd_client_attach_aio_context_bh
> +                 * and then from nbd_client_co_drain_end. So here is a loop.
> +                 */
> +                qemu_coroutine_yield();
> +            }
> +            bdrv_inc_in_flight(s->bs);
> +        }
> +        if (timeout < max_timeout) {
> +            timeout *= 2;
> +        }

Exponential backoff, ok.  If I read the loop correctly, you've hardcoded
the max_timeout at 16s, which means the overall timeout is about 30s
(the sleeps go 1 + 2 + 4 + 8 + 16 seconds over the first five attempts)
when adding in the time of the earlier iterations.  Does that need to be
tunable?  But for now I can live with it.

> +
> +        nbd_reconnect_attempt(s);
> +    }
>  }
>
>  static coroutine_fn void nbd_connection_entry(void *opaque)
> @@ -177,16 +330,26 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
>           * Therefore we keep an additional in_flight reference all the time and
>           * only drop it temporarily here.
>           */
> +
> +        if (nbd_client_connecting(s)) {
> +            nbd_reconnect_loop(s);
> +        }
> +
> +        if (s->state != NBD_CLIENT_CONNECTED) {
> +            continue;
> +        }

Is 'continue' right, even if s->state == QUIT?

> +
>          assert(s->reply.handle == 0);
>          ret = nbd_receive_reply(s->bs, s->ioc, &s->reply, &local_err);
>
>          if (local_err) {
>              trace_nbd_read_reply_entry_fail(ret, error_get_pretty(local_err));
>              error_free(local_err);
> +            local_err = NULL;

Could be fun in concert with your proposal to get rid of local_err ;)
But here, we aren't using error_propagate().

>          }
>          if (ret <= 0) {
>              nbd_channel_error(s, ret ? ret : -EIO);
> -            break;
> +            continue;
>          }
>
>          /*

We're getting really close.  If you can answer my questions above, and
the only thing left is adding _co_ in the function name, I could tweak
that locally to spare you a v10.  At any rate, I'm tentatively queuing
this on my NBD tree; I'll probably do a pull request today without it,
and save it for next week's PR after I've had a week to hammer on it in
local tests.


A kind reminder. It's not a problem if you remember this but just haven't had time for it yet.

Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v9 3/3] iotests: test nbd reconnect
  2019-09-24  8:31     ` Vladimir Sementsov-Ogievskiy
@ 2019-10-04 18:05       ` Eric Blake
  2019-10-07 10:48         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Blake @ 2019-10-04 18:05 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: kwolf, Denis Lunev, qemu-devel, stefanha, mreitz

On 9/24/19 3:31 AM, Vladimir Sementsov-Ogievskiy wrote:

>>> +def qemu_nbd_popen(*args):
>>> +    '''Run qemu-nbd in daemon mode and return the parent's exit code'''
>>> +    return subprocess.Popen(qemu_nbd_args + ['--persistent'] + list(args))
>>> +
>>
>> Should you also use a pid file here, and wait for the existence of the
>> pid file before returning (rather than hard-coding sleep(1))?
> 
> What do you mean / how to do it?
> 
> We want to wait until listening socket is prepared..

In shell:

qemu-nbd --pid-file=/path/to/file ...
while [ ! -e /path/to/file ]; do
   sleep ... # fractional second, or exponential, or whatever...
done
# Now the listening socket is indeed prepared

You'd have to translate that idiom to python.
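
For illustration, a rough Python equivalent of that idiom might look like the
sketch below (untested; the paths, qemu-nbd arguments and timing values are
placeholders, not what the test actually uses):

import os
import subprocess
import time

pid_file = '/path/to/file'             # same path passed via --pid-file
srv = subprocess.Popen(['qemu-nbd', '--pid-file', pid_file,
                        '--persistent', '-k', '/path/to/nbd.sock',
                        'disk.img'])

delay = 0.05
while not os.path.exists(pid_file):
    time.sleep(delay)
    delay = min(delay * 2, 1.0)        # exponential back-off, capped at 1s
# Now the listening socket is indeed prepared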

Or:

pre-open Unix socket at /path/to/socket
LISTEN_PID=... LISTEN_FDS=1 qemu-nbd ... 3<>/path/to/socket

Now the socket is pre-created and passed into qemu-nbd via systemd 
socket activation, so you know the listening socket is ready without 
having to do any loop at all.  Here's a patch in libnbd where we just 
switched from waiting for the port to appear (because the test predated 
qemu-nbd pidfile support) to instead using socket activation, for reference:
https://github.com/libguestfs/libnbd/commit/352331d177

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v9 3/3] iotests: test nbd reconnect
  2019-10-04 18:05       ` Eric Blake
@ 2019-10-07 10:48         ` Vladimir Sementsov-Ogievskiy
  2019-10-07 20:03           ` Eric Blake
  0 siblings, 1 reply; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-10-07 10:48 UTC (permalink / raw)
  To: Eric Blake, qemu-block; +Cc: kwolf, Denis Lunev, qemu-devel, stefanha, mreitz

04.10.2019 21:05, Eric Blake wrote:
> On 9/24/19 3:31 AM, Vladimir Sementsov-Ogievskiy wrote:
> 
>>>> +def qemu_nbd_popen(*args):
>>>> +    '''Run qemu-nbd in daemon mode and return the parent's exit code'''
>>>> +    return subprocess.Popen(qemu_nbd_args + ['--persistent'] + list(args))
>>>> +
>>>
>>> Should you also use a pid file here, and wait for the existence of the
>>> pid file before returning (rather than hard-coding sleep(1))?
>>
>> What do you mean / how to do it?
>>
>> We want to wait until listening socket is prepared..
> 
> In shell:
> 
> qemu-nbd --pid-file=/path/to/file ...
> while [ ! -e /path/to/file ]; do
>    sleep ... # fractional second, or exponential, or whatever...
> done
> # Now the listening socket is indeed prepared
> 
> You'd have to translate that idiom to python.

I don't see how it is better than what I've done in 04.. But I can resend with this.
At least, the fact that the socket is initialized before creating the pid file is undocumented..

> 
> Or:
> 
> pre-open Unix socket at /path/to/socket
> LISTEN_PID=... LISTEN_FDS=1 qemu-nbd ... 3<>/path/to/socket
> 
> Now the socket is pre-created and passed into qemu-nbd via systemd socket activation, so you know the listening socket is ready without having to do any loop at all.  Here's a patch in libnbd where we just switched from waiting for the port to appear (because the test predated qemu-nbd pidfile support) to instead using socket activation, for reference:
> https://github.com/libguestfs/libnbd/commit/352331d177
> 

Hmm, I'm afraid I need more help with this; I don't know socket activation, and googling for some time didn't help.
How do I pre-open the socket? How do I set LISTEN_PID? Looking at the code I see that the activation path fails if
LISTEN_PID != getpid().. So I need to know the qemu-nbd pid before starting it? o_O

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v9 3/3] iotests: test nbd reconnect
  2019-10-07 10:48         ` Vladimir Sementsov-Ogievskiy
@ 2019-10-07 20:03           ` Eric Blake
  0 siblings, 0 replies; 14+ messages in thread
From: Eric Blake @ 2019-10-07 20:03 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: kwolf, Denis Lunev, qemu-devel, stefanha, mreitz

On 10/7/19 5:48 AM, Vladimir Sementsov-Ogievskiy wrote:

>>> We want to wait until listening socket is prepared..
>>
>> In shell:
>>
>> qemu-nbd --pid-file=/path/to/file ...
>> while [ ! -e /path/to/file ]; do
>>     sleep ... # fractional second, or exponential, or whatever...
>> done
>> # Now the listening socket is indeed prepared
>>
>> You'd have to translate that idiom to python.
> 
> Don't see, how it is better than what I've done in 04.. But I can resend with this.
> At least, the fact that socket is initialized before creating pid file is undocumented..

I just posted a patch to rectify that.

> 
>>
>> Or:
>>
>> pre-open Unix socket at /path/to/socket
>> LISTEN_PID=... LISTEN_FDS=1 qemu-nbd ... 3<>/path/to/socket
>>
>> Now the socket is pre-created and passed into qemu-nbd via systemd socket activation, so you know the listening socket is ready without having to do any loop at all.  Here's a patch in libnbd where we just switched from waiting for the port to appear (because the test predated qemu-nbd pidfile support) to instead using socket activation, for reference:
>> https://github.com/libguestfs/libnbd/commit/352331d177
>>
> 
> Hmm, I'm afraid I need more help in it, I don't know socket activation and googling for some time didn't help.
> How to pre-open the socket? How to set LISTEN_PID? Looking at the code I see that activation path failed if
> LISTEN_PID != getpid().. So I need to know qemu-nbd pid before starting it? o_O
> 

I'm not sure if a shell can open a Unix socket as a server.  The shell 
_does_ have the 'exec' builtin, such that you can manipulate the 
environment in shell and then use exec to guarantee the qemu-nbd process 
has the same pid as the shell process that just manipulated the 
environment; but it doesn't help unless you can also pass in the Unix 
socket pre-bound, and I'm not aware of command line tools that make that 
easy (maybe nc can do something like it, but then you have to wonder if 
nc is adding yet another layer of bounce-buffering).  But in other 
languages (C, Python, ...) you create a Unix socket, call bind() on it, 
then call fork().  In the parent process, you then close the just-bound 
fd, and call connect() on the socket - the connect will block until the 
child process uses the socket, but you are guaranteed that it won't fail 
because the server side is already bound.  In the child process, you use 
dup2() to move the fd to 3 and clear O_CLOEXEC, manipulate the 
environment to set LISTEN_PID to the current process id and LISTEN_FDS 
to 1, then exec.  The child process then has the correct id to realize 
that it has been handed a pre-bound socket, and skips any code that it 
would normally do to create the socket and bind it.
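
In Python, a rough sketch of that sequence (for illustration only; the socket 
path and qemu-nbd arguments are made up, and all error handling plus the 
fd == 3 corner case are omitted) might be:

import os
import socket

sock_path = '/tmp/nbd.sock'                     # placeholder path
srv_sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
srv_sock.bind(sock_path)
srv_sock.listen(1)

pid = os.fork()
if pid == 0:
    # child: put the pre-bound socket on fd 3, inheritable across exec
    os.dup2(srv_sock.fileno(), 3)
    env = dict(os.environ, LISTEN_PID=str(os.getpid()), LISTEN_FDS='1')
    os.execvpe('qemu-nbd', ['qemu-nbd', '--persistent', '-f', 'raw',
                            'disk.img'], env)

# parent: the socket is already bound and listening, so a connect() on
# sock_path cannot race with qemu-nbd's startup
srv_sock.close()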

Note that setting LISTEN_PID in the child process after a fork() from a 
multi-threaded parent is _extremely_ difficult to do correctly (setenv() 
is not async-signal-safe, and anything that is not async-signal-safe can 
cause deadlock if invoked between fork() and exec() if the parent 
process was multi-threaded) - so it may even be easier to do a 
double-exec() - the first exec is to a shim to get rid of the 
async-signal-safe restrictions, and can then set LISTEN_PID without 
issues, and the second exec to the actual target.  But I don't know of 
any such shim designed for common use.

That said, socket activation isn't a necessity to use, just a 
convenience.  I don't care if the regression test uses socket 
activation, as long as it works at testing nbd reconnect.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread

Thread overview: 14+ messages
2019-09-17 17:13 [Qemu-devel] [PATCH v9 0/3] NBD reconnect Vladimir Sementsov-Ogievskiy
2019-09-17 17:13 ` [Qemu-devel] [PATCH v9 1/3] qemu-coroutine-sleep: introduce qemu_co_sleep_wake Vladimir Sementsov-Ogievskiy
2019-09-23 14:53   ` Eric Blake
2019-09-17 17:13 ` [Qemu-devel] [PATCH v9 2/3] block/nbd: nbd reconnect Vladimir Sementsov-Ogievskiy
2019-09-23 19:23   ` Eric Blake
2019-09-24  8:05     ` Vladimir Sementsov-Ogievskiy
2019-10-04 17:29     ` Vladimir Sementsov-Ogievskiy
2019-09-17 17:13 ` [Qemu-devel] [PATCH v9 3/3] iotests: test " Vladimir Sementsov-Ogievskiy
2019-09-23 19:51   ` Eric Blake
2019-09-24  8:31     ` Vladimir Sementsov-Ogievskiy
2019-10-04 18:05       ` Eric Blake
2019-10-07 10:48         ` Vladimir Sementsov-Ogievskiy
2019-10-07 20:03           ` Eric Blake
2019-09-24  9:27 ` [PATCH v9 4/3] " Vladimir Sementsov-Ogievskiy
