* [Qemu-devel] [PATCH v4 00/10] NBD reconnect
@ 2018-07-31 17:30 Vladimir Sementsov-Ogievskiy
  2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 01/10] block/nbd-client: split channel errors from export errors Vladimir Sementsov-Ogievskiy
                   ` (12 more replies)
  0 siblings, 13 replies; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-31 17:30 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: armbru, mreitz, kwolf, pbonzini, eblake, vsementsov, den

Hi all.

Here is NBD reconnect. Previously, if the connection failed, all current
and future requests would fail. After this series, the nbd-client driver
will try to reconnect an unlimited number of times. During the first
@reconnect-delay seconds of reconnecting, all requests will wait for the
connection, and if it is re-established, the requests will be resent.
After the @reconnect-delay period expires, all requests fail (until a
successful reconnect).

v4: - add Eric's r-b to 01.
    - drop CONNECTING_INIT mode, don't reconnect on _open.
    - new api: only one parameter @reconnect-delay
    - new interval scheme between reconnect attempts
        (1 - 2 - 4 - 8 - 16 - 16 ... seconds)
    - fixes and refactorings in main patch (09), including merge with
      old 08 patch
    

v3:
06: fix build error in function 'nbd_co_send_request':
     error: 'i' may be used uninitialized in this function

v2 notes:
Here is v2 of NBD reconnect; it is very different from v1, so forget
about v1.
The series includes my "NBD reconnect: preliminary refactoring", with
changes in 05: leave asserts (Eric).

Vladimir Sementsov-Ogievskiy (10):
  block/nbd-client: split channel errors from export errors
  block/nbd: move connection code from block/nbd to block/nbd-client
  block/nbd-client: split connection from initialization
  block/nbd-client: fix nbd_reply_chunk_iter_receive
  block/nbd-client: don't check ioc
  block/nbd-client: move from quit to state
  block/nbd-client: rename read_reply_co to connection_co
  block/nbd: add cmdline and qapi parameter reconnect-delay
  block/nbd-client: nbd reconnect
  iotests: test nbd reconnect

 qapi/block-core.json          |  12 +-
 block/nbd-client.h            |  20 +-
 block/nbd-client.c            | 515 +++++++++++++++++++++++++++++++-----------
 block/nbd.c                   |  56 ++---
 tests/qemu-iotests/220        |  67 ++++++
 tests/qemu-iotests/220.out    |   7 +
 tests/qemu-iotests/group      |   1 +
 tests/qemu-iotests/iotests.py |   4 +
 8 files changed, 512 insertions(+), 170 deletions(-)
 create mode 100755 tests/qemu-iotests/220
 create mode 100644 tests/qemu-iotests/220.out

-- 
2.11.1


* [Qemu-devel] [PATCH v4 01/10] block/nbd-client: split channel errors from export errors
  2018-07-31 17:30 [Qemu-devel] [PATCH v4 00/10] NBD reconnect Vladimir Sementsov-Ogievskiy
@ 2018-07-31 17:30 ` Vladimir Sementsov-Ogievskiy
  2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 02/10] block/nbd: move connection code from block/nbd to block/nbd-client Vladimir Sementsov-Ogievskiy
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-31 17:30 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: armbru, mreitz, kwolf, pbonzini, eblake, vsementsov, den

To implement NBD reconnect in further patches, we need to distinguish
error codes returned by the NBD server from channel errors, so that we
reconnect only in the latter case.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 block/nbd-client.c | 83 +++++++++++++++++++++++++++++++-----------------------
 1 file changed, 47 insertions(+), 36 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index 9686ecbd5e..1efc3d36cc 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -501,11 +501,11 @@ static coroutine_fn int nbd_co_do_receive_one_chunk(
  */
 static coroutine_fn int nbd_co_receive_one_chunk(
         NBDClientSession *s, uint64_t handle, bool only_structured,
-        QEMUIOVector *qiov, NBDReply *reply, void **payload, Error **errp)
+        int *request_ret, QEMUIOVector *qiov, NBDReply *reply, void **payload,
+        Error **errp)
 {
-    int request_ret;
     int ret = nbd_co_do_receive_one_chunk(s, handle, only_structured,
-                                          &request_ret, qiov, payload, errp);
+                                          request_ret, qiov, payload, errp);
 
     if (ret < 0) {
         s->quit = true;
@@ -515,7 +515,6 @@ static coroutine_fn int nbd_co_receive_one_chunk(
             *reply = s->reply;
         }
         s->reply.handle = 0;
-        ret = request_ret;
     }
 
     if (s->read_reply_co) {
@@ -527,22 +526,17 @@ static coroutine_fn int nbd_co_receive_one_chunk(
 
 typedef struct NBDReplyChunkIter {
     int ret;
-    bool fatal;
+    int request_ret;
     Error *err;
     bool done, only_structured;
 } NBDReplyChunkIter;
 
-static void nbd_iter_error(NBDReplyChunkIter *iter, bool fatal,
-                           int ret, Error **local_err)
+static void nbd_iter_channel_error(NBDReplyChunkIter *iter,
+                                   int ret, Error **local_err)
 {
     assert(ret < 0);
 
-    if ((fatal && !iter->fatal) || iter->ret == 0) {
-        if (iter->ret != 0) {
-            error_free(iter->err);
-            iter->err = NULL;
-        }
-        iter->fatal = fatal;
+    if (!iter->ret) {
         iter->ret = ret;
         error_propagate(&iter->err, *local_err);
     } else {
@@ -552,6 +546,15 @@ static void nbd_iter_error(NBDReplyChunkIter *iter, bool fatal,
     *local_err = NULL;
 }
 
+static void nbd_iter_request_error(NBDReplyChunkIter *iter, int ret)
+{
+    assert(ret < 0);
+
+    if (!iter->request_ret) {
+        iter->request_ret = ret;
+    }
+}
+
 /* NBD_FOREACH_REPLY_CHUNK
  */
 #define NBD_FOREACH_REPLY_CHUNK(s, iter, handle, structured, \
@@ -567,13 +570,13 @@ static bool nbd_reply_chunk_iter_receive(NBDClientSession *s,
                                          QEMUIOVector *qiov, NBDReply *reply,
                                          void **payload)
 {
-    int ret;
+    int ret, request_ret;
     NBDReply local_reply;
     NBDStructuredReplyChunk *chunk;
     Error *local_err = NULL;
     if (s->quit) {
         error_setg(&local_err, "Connection closed");
-        nbd_iter_error(iter, true, -EIO, &local_err);
+        nbd_iter_channel_error(iter, -EIO, &local_err);
         goto break_loop;
     }
 
@@ -587,10 +590,12 @@ static bool nbd_reply_chunk_iter_receive(NBDClientSession *s,
     }
 
     ret = nbd_co_receive_one_chunk(s, handle, iter->only_structured,
-                                   qiov, reply, payload, &local_err);
+                                   &request_ret, qiov, reply, payload,
+                                   &local_err);
     if (ret < 0) {
-        /* If it is a fatal error s->quit is set by nbd_co_receive_one_chunk */
-        nbd_iter_error(iter, s->quit, ret, &local_err);
+        nbd_iter_channel_error(iter, ret, &local_err);
+    } else if (request_ret < 0) {
+        nbd_iter_request_error(iter, request_ret);
     }
 
     /* Do not execute the body of NBD_FOREACH_REPLY_CHUNK for simple reply. */
@@ -627,7 +632,7 @@ break_loop:
 }
 
 static int nbd_co_receive_return_code(NBDClientSession *s, uint64_t handle,
-                                      Error **errp)
+                                      int *request_ret, Error **errp)
 {
     NBDReplyChunkIter iter;
 
@@ -636,12 +641,13 @@ static int nbd_co_receive_return_code(NBDClientSession *s, uint64_t handle,
     }
 
     error_propagate(errp, iter.err);
+    *request_ret = iter.request_ret;
     return iter.ret;
 }
 
 static int nbd_co_receive_cmdread_reply(NBDClientSession *s, uint64_t handle,
                                         uint64_t offset, QEMUIOVector *qiov,
-                                        Error **errp)
+                                        int *request_ret, Error **errp)
 {
     NBDReplyChunkIter iter;
     NBDReply reply;
@@ -666,7 +672,7 @@ static int nbd_co_receive_cmdread_reply(NBDClientSession *s, uint64_t handle,
                                                 offset, qiov, &local_err);
             if (ret < 0) {
                 s->quit = true;
-                nbd_iter_error(&iter, true, ret, &local_err);
+                nbd_iter_channel_error(&iter, ret, &local_err);
             }
             break;
         default:
@@ -676,7 +682,7 @@ static int nbd_co_receive_cmdread_reply(NBDClientSession *s, uint64_t handle,
                 error_setg(&local_err,
                            "Unexpected reply type: %d (%s) for CMD_READ",
                            chunk->type, nbd_reply_type_lookup(chunk->type));
-                nbd_iter_error(&iter, true, -EINVAL, &local_err);
+                nbd_iter_channel_error(&iter, -EINVAL, &local_err);
             }
         }
 
@@ -685,12 +691,14 @@ static int nbd_co_receive_cmdread_reply(NBDClientSession *s, uint64_t handle,
     }
 
     error_propagate(errp, iter.err);
+    *request_ret = iter.request_ret;
     return iter.ret;
 }
 
 static int nbd_co_receive_blockstatus_reply(NBDClientSession *s,
                                             uint64_t handle, uint64_t length,
-                                            NBDExtent *extent, Error **errp)
+                                            NBDExtent *extent,
+                                            int *request_ret, Error **errp)
 {
     NBDReplyChunkIter iter;
     NBDReply reply;
@@ -712,7 +720,7 @@ static int nbd_co_receive_blockstatus_reply(NBDClientSession *s,
             if (received) {
                 s->quit = true;
                 error_setg(&local_err, "Several BLOCK_STATUS chunks in reply");
-                nbd_iter_error(&iter, true, -EINVAL, &local_err);
+                nbd_iter_channel_error(&iter, -EINVAL, &local_err);
             }
             received = true;
 
@@ -721,7 +729,7 @@ static int nbd_co_receive_blockstatus_reply(NBDClientSession *s,
                                                 &local_err);
             if (ret < 0) {
                 s->quit = true;
-                nbd_iter_error(&iter, true, ret, &local_err);
+                nbd_iter_channel_error(&iter, ret, &local_err);
             }
             break;
         default:
@@ -731,7 +739,7 @@ static int nbd_co_receive_blockstatus_reply(NBDClientSession *s,
                            "Unexpected reply type: %d (%s) "
                            "for CMD_BLOCK_STATUS",
                            chunk->type, nbd_reply_type_lookup(chunk->type));
-                nbd_iter_error(&iter, true, -EINVAL, &local_err);
+                nbd_iter_channel_error(&iter, -EINVAL, &local_err);
             }
         }
 
@@ -746,14 +754,16 @@ static int nbd_co_receive_blockstatus_reply(NBDClientSession *s,
             iter.ret = -EIO;
         }
     }
+
     error_propagate(errp, iter.err);
+    *request_ret = iter.request_ret;
     return iter.ret;
 }
 
 static int nbd_co_request(BlockDriverState *bs, NBDRequest *request,
                           QEMUIOVector *write_qiov)
 {
-    int ret;
+    int ret, request_ret;
     Error *local_err = NULL;
     NBDClientSession *client = nbd_get_client_session(bs);
 
@@ -769,17 +779,18 @@ static int nbd_co_request(BlockDriverState *bs, NBDRequest *request,
         return ret;
     }
 
-    ret = nbd_co_receive_return_code(client, request->handle, &local_err);
+    ret = nbd_co_receive_return_code(client, request->handle,
+                                     &request_ret, &local_err);
     if (local_err) {
         error_report_err(local_err);
     }
-    return ret;
+    return ret ? ret : request_ret;
 }
 
 int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,
                          uint64_t bytes, QEMUIOVector *qiov, int flags)
 {
-    int ret;
+    int ret, request_ret;
     Error *local_err = NULL;
     NBDClientSession *client = nbd_get_client_session(bs);
     NBDRequest request = {
@@ -800,11 +811,11 @@ int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,
     }
 
     ret = nbd_co_receive_cmdread_reply(client, request.handle, offset, qiov,
-                                       &local_err);
+                                       &request_ret, &local_err);
     if (local_err) {
         error_report_err(local_err);
     }
-    return ret;
+    return ret ? ret : request_ret;
 }
 
 int nbd_client_co_pwritev(BlockDriverState *bs, uint64_t offset,
@@ -898,7 +909,7 @@ int coroutine_fn nbd_client_co_block_status(BlockDriverState *bs,
                                             int64_t *pnum, int64_t *map,
                                             BlockDriverState **file)
 {
-    int64_t ret;
+    int ret, request_ret;
     NBDExtent extent = { 0 };
     NBDClientSession *client = nbd_get_client_session(bs);
     Error *local_err = NULL;
@@ -923,12 +934,12 @@ int coroutine_fn nbd_client_co_block_status(BlockDriverState *bs,
     }
 
     ret = nbd_co_receive_blockstatus_reply(client, request.handle, bytes,
-                                           &extent, &local_err);
+                                           &extent, &request_ret, &local_err);
     if (local_err) {
         error_report_err(local_err);
     }
-    if (ret < 0) {
-        return ret;
+    if (ret < 0 || request_ret < 0) {
+        return ret ? ret : request_ret;
     }
 
     assert(extent.length);
-- 
2.11.1


* [Qemu-devel] [PATCH v4 02/10] block/nbd: move connection code from block/nbd to block/nbd-client
  2018-07-31 17:30 [Qemu-devel] [PATCH v4 00/10] NBD reconnect Vladimir Sementsov-Ogievskiy
  2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 01/10] block/nbd-client: split channel errors from export errors Vladimir Sementsov-Ogievskiy
@ 2018-07-31 17:30 ` Vladimir Sementsov-Ogievskiy
  2019-01-16 15:56   ` Eric Blake
  2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 03/10] block/nbd-client: split connection from initialization Vladimir Sementsov-Ogievskiy
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-31 17:30 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: armbru, mreitz, kwolf, pbonzini, eblake, vsementsov, den

Keep all connection code in one file, so that reconnect can be
implemented in further patches.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/nbd-client.h |  2 +-
 block/nbd-client.c | 37 +++++++++++++++++++++++++++++++++++--
 block/nbd.c        | 40 ++--------------------------------------
 3 files changed, 38 insertions(+), 41 deletions(-)

diff --git a/block/nbd-client.h b/block/nbd-client.h
index cfc90550b9..2f047ba614 100644
--- a/block/nbd-client.h
+++ b/block/nbd-client.h
@@ -41,7 +41,7 @@ typedef struct NBDClientSession {
 NBDClientSession *nbd_get_client_session(BlockDriverState *bs);
 
 int nbd_client_init(BlockDriverState *bs,
-                    QIOChannelSocket *sock,
+                    SocketAddress *saddr,
                     const char *export_name,
                     QCryptoTLSCreds *tlscreds,
                     const char *hostname,
diff --git a/block/nbd-client.c b/block/nbd-client.c
index 1efc3d36cc..a0d8f2523e 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -976,8 +976,31 @@ void nbd_client_close(BlockDriverState *bs)
     nbd_teardown_connection(bs);
 }
 
+static QIOChannelSocket *nbd_establish_connection(SocketAddress *saddr,
+                                                  Error **errp)
+{
+    QIOChannelSocket *sioc;
+    Error *local_err = NULL;
+
+    sioc = qio_channel_socket_new();
+    qio_channel_set_name(QIO_CHANNEL(sioc), "nbd-client");
+
+    qio_channel_socket_connect_sync(sioc,
+                                    saddr,
+                                    &local_err);
+    if (local_err) {
+        object_unref(OBJECT(sioc));
+        error_propagate(errp, local_err);
+        return NULL;
+    }
+
+    qio_channel_set_delay(QIO_CHANNEL(sioc), false);
+
+    return sioc;
+}
+
 int nbd_client_init(BlockDriverState *bs,
-                    QIOChannelSocket *sioc,
+                    SocketAddress *saddr,
                     const char *export,
                     QCryptoTLSCreds *tlscreds,
                     const char *hostname,
@@ -987,6 +1010,15 @@ int nbd_client_init(BlockDriverState *bs,
     NBDClientSession *client = nbd_get_client_session(bs);
     int ret;
 
+    /* establish TCP connection, return error if it fails
+     * TODO: Configurable retry-until-timeout behaviour.
+     */
+    QIOChannelSocket *sioc = nbd_establish_connection(saddr, errp);
+
+    if (!sioc) {
+        return -ECONNREFUSED;
+    }
+
     /* NBD handshake */
     logout("session init %s\n", export);
     qio_channel_set_blocking(QIO_CHANNEL(sioc), true, NULL);
@@ -1001,12 +1033,14 @@ int nbd_client_init(BlockDriverState *bs,
     g_free(client->info.x_dirty_bitmap);
     if (ret < 0) {
         logout("Failed to negotiate with the NBD server\n");
+        object_unref(OBJECT(sioc));
         return ret;
     }
     if (client->info.flags & NBD_FLAG_READ_ONLY &&
         !bdrv_is_read_only(bs)) {
         error_setg(errp,
                    "request for write access conflicts with read-only export");
+        object_unref(OBJECT(sioc));
         return -EACCES;
     }
     if (client->info.flags & NBD_FLAG_SEND_FUA) {
@@ -1020,7 +1054,6 @@ int nbd_client_init(BlockDriverState *bs,
     qemu_co_mutex_init(&client->send_mutex);
     qemu_co_queue_init(&client->free_sema);
     client->sioc = sioc;
-    object_ref(OBJECT(client->sioc));
 
     if (!client->ioc) {
         client->ioc = QIO_CHANNEL(sioc);
diff --git a/block/nbd.c b/block/nbd.c
index e87699fb73..9db5eded89 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -295,30 +295,6 @@ NBDClientSession *nbd_get_client_session(BlockDriverState *bs)
     return &s->client;
 }
 
-static QIOChannelSocket *nbd_establish_connection(SocketAddress *saddr,
-                                                  Error **errp)
-{
-    QIOChannelSocket *sioc;
-    Error *local_err = NULL;
-
-    sioc = qio_channel_socket_new();
-    qio_channel_set_name(QIO_CHANNEL(sioc), "nbd-client");
-
-    qio_channel_socket_connect_sync(sioc,
-                                    saddr,
-                                    &local_err);
-    if (local_err) {
-        object_unref(OBJECT(sioc));
-        error_propagate(errp, local_err);
-        return NULL;
-    }
-
-    qio_channel_set_delay(QIO_CHANNEL(sioc), false);
-
-    return sioc;
-}
-
-
 static QCryptoTLSCreds *nbd_get_tls_creds(const char *id, Error **errp)
 {
     Object *obj;
@@ -394,7 +370,6 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
     BDRVNBDState *s = bs->opaque;
     QemuOpts *opts = NULL;
     Error *local_err = NULL;
-    QIOChannelSocket *sioc = NULL;
     QCryptoTLSCreds *tlscreds = NULL;
     const char *hostname = NULL;
     int ret = -EINVAL;
@@ -434,22 +409,11 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
         hostname = s->saddr->u.inet.host;
     }
 
-    /* establish TCP connection, return error if it fails
-     * TODO: Configurable retry-until-timeout behaviour.
-     */
-    sioc = nbd_establish_connection(s->saddr, errp);
-    if (!sioc) {
-        ret = -ECONNREFUSED;
-        goto error;
-    }
-
     /* NBD handshake */
-    ret = nbd_client_init(bs, sioc, s->export, tlscreds, hostname,
+    ret = nbd_client_init(bs, s->saddr, s->export, tlscreds, hostname,
                           qemu_opt_get(opts, "x-dirty-bitmap"), errp);
+
  error:
-    if (sioc) {
-        object_unref(OBJECT(sioc));
-    }
     if (tlscreds) {
         object_unref(OBJECT(tlscreds));
     }
-- 
2.11.1


* [Qemu-devel] [PATCH v4 03/10] block/nbd-client: split connection from initialization
  2018-07-31 17:30 [Qemu-devel] [PATCH v4 00/10] NBD reconnect Vladimir Sementsov-Ogievskiy
  2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 01/10] block/nbd-client: split channel errors from export errors Vladimir Sementsov-Ogievskiy
  2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 02/10] block/nbd: move connection code from block/nbd to block/nbd-client Vladimir Sementsov-Ogievskiy
@ 2018-07-31 17:30 ` Vladimir Sementsov-Ogievskiy
  2019-01-16 15:52   ` Eric Blake
  2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 04/10] block/nbd-client: fix nbd_reply_chunk_iter_receive Vladimir Sementsov-Ogievskiy
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-31 17:30 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: armbru, mreitz, kwolf, pbonzini, eblake, vsementsov, den

Split out the connection code so that it can be reused for reconnect.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/nbd-client.c | 33 ++++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index a0d8f2523e..a1813cbfe1 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -999,13 +999,13 @@ static QIOChannelSocket *nbd_establish_connection(SocketAddress *saddr,
     return sioc;
 }
 
-int nbd_client_init(BlockDriverState *bs,
-                    SocketAddress *saddr,
-                    const char *export,
-                    QCryptoTLSCreds *tlscreds,
-                    const char *hostname,
-                    const char *x_dirty_bitmap,
-                    Error **errp)
+static int nbd_client_connect(BlockDriverState *bs,
+                              SocketAddress *saddr,
+                              const char *export,
+                              QCryptoTLSCreds *tlscreds,
+                              const char *hostname,
+                              const char *x_dirty_bitmap,
+                              Error **errp)
 {
     NBDClientSession *client = nbd_get_client_session(bs);
     int ret;
@@ -1051,8 +1051,6 @@ int nbd_client_init(BlockDriverState *bs,
         bs->supported_zero_flags |= BDRV_REQ_MAY_UNMAP;
     }
 
-    qemu_co_mutex_init(&client->send_mutex);
-    qemu_co_queue_init(&client->free_sema);
     client->sioc = sioc;
 
     if (!client->ioc) {
@@ -1069,3 +1067,20 @@ int nbd_client_init(BlockDriverState *bs,
     logout("Established connection with NBD server\n");
     return 0;
 }
+
+int nbd_client_init(BlockDriverState *bs,
+                    SocketAddress *saddr,
+                    const char *export,
+                    QCryptoTLSCreds *tlscreds,
+                    const char *hostname,
+                    const char *x_dirty_bitmap,
+                    Error **errp)
+{
+    NBDClientSession *client = nbd_get_client_session(bs);
+
+    qemu_co_mutex_init(&client->send_mutex);
+    qemu_co_queue_init(&client->free_sema);
+
+    return nbd_client_connect(bs, saddr, export, tlscreds, hostname,
+                              x_dirty_bitmap, errp);
+}
-- 
2.11.1


* [Qemu-devel] [PATCH v4 04/10] block/nbd-client: fix nbd_reply_chunk_iter_receive
  2018-07-31 17:30 [Qemu-devel] [PATCH v4 00/10] NBD reconnect Vladimir Sementsov-Ogievskiy
                   ` (2 preceding siblings ...)
  2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 03/10] block/nbd-client: split connection from initialization Vladimir Sementsov-Ogievskiy
@ 2018-07-31 17:30 ` Vladimir Sementsov-Ogievskiy
  2019-01-16 16:01   ` Eric Blake
  2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 05/10] block/nbd-client: don't check ioc Vladimir Sementsov-Ogievskiy
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-31 17:30 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: armbru, mreitz, kwolf, pbonzini, eblake, vsementsov, den

Use the exported reply, not the variable that is about to be reused
(should not really matter).

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/nbd-client.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index a1813cbfe1..263d1721f9 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -599,7 +599,7 @@ static bool nbd_reply_chunk_iter_receive(NBDClientSession *s,
     }
 
     /* Do not execute the body of NBD_FOREACH_REPLY_CHUNK for simple reply. */
-    if (nbd_reply_is_simple(&s->reply) || s->quit) {
+    if (nbd_reply_is_simple(reply) || s->quit) {
         goto break_loop;
     }
 
-- 
2.11.1


* [Qemu-devel] [PATCH v4 05/10] block/nbd-client: don't check ioc
  2018-07-31 17:30 [Qemu-devel] [PATCH v4 00/10] NBD reconnect Vladimir Sementsov-Ogievskiy
                   ` (3 preceding siblings ...)
  2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 04/10] block/nbd-client: fix nbd_reply_chunk_iter_receive Vladimir Sementsov-Ogievskiy
@ 2018-07-31 17:30 ` Vladimir Sementsov-Ogievskiy
  2019-01-16 16:05   ` Eric Blake
  2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 06/10] block/nbd-client: move from quit to state Vladimir Sementsov-Ogievskiy
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-31 17:30 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: armbru, mreitz, kwolf, pbonzini, eblake, vsementsov, den

We have several paranoid checks for ioc != NULL. But ioc may become
NULL only on close, which should not happen during request handling.
Also, we check ioc only sometimes, not after each yield, which is
inconsistent. Let's drop these checks; however, for safety, let's
leave asserts in their place.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/nbd-client.c | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index 263d1721f9..7eaf0149f0 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -51,9 +51,7 @@ static void nbd_teardown_connection(BlockDriverState *bs)
 {
     NBDClientSession *client = nbd_get_client_session(bs);
 
-    if (!client->ioc) { /* Already closed */
-        return;
-    }
+    assert(client->ioc);
 
     /* finish any pending coroutines */
     qio_channel_shutdown(client->ioc,
@@ -150,10 +148,7 @@ static int nbd_co_send_request(BlockDriverState *bs,
         rc = -EIO;
         goto err;
     }
-    if (!s->ioc) {
-        rc = -EPIPE;
-        goto err;
-    }
+    assert(s->ioc);
 
     if (qiov) {
         qio_channel_set_cork(s->ioc, true);
@@ -426,10 +421,11 @@ static coroutine_fn int nbd_co_do_receive_one_chunk(
     s->requests[i].receiving = true;
     qemu_coroutine_yield();
     s->requests[i].receiving = false;
-    if (!s->ioc || s->quit) {
+    if (s->quit) {
         error_setg(errp, "Connection closed");
         return -EIO;
     }
+    assert(s->ioc);
 
     assert(s->reply.handle == handle);
 
@@ -967,9 +963,7 @@ void nbd_client_close(BlockDriverState *bs)
     NBDClientSession *client = nbd_get_client_session(bs);
     NBDRequest request = { .type = NBD_CMD_DISC };
 
-    if (client->ioc == NULL) {
-        return;
-    }
+    assert(client->ioc);
 
     nbd_send_request(client->ioc, &request);
 
-- 
2.11.1


* [Qemu-devel] [PATCH v4 06/10] block/nbd-client: move from quit to state
  2018-07-31 17:30 [Qemu-devel] [PATCH v4 00/10] NBD reconnect Vladimir Sementsov-Ogievskiy
                   ` (4 preceding siblings ...)
  2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 05/10] block/nbd-client: don't check ioc Vladimir Sementsov-Ogievskiy
@ 2018-07-31 17:30 ` Vladimir Sementsov-Ogievskiy
  2019-01-16 16:25   ` Eric Blake
  2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 07/10] block/nbd-client: rename read_reply_co to connection_co Vladimir Sementsov-Ogievskiy
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-31 17:30 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: armbru, mreitz, kwolf, pbonzini, eblake, vsementsov, den

To implement reconnect we need several states for the client:
CONNECTED, QUIT and two CONNECTING states. The CONNECTING states will
be realized in the following patches. This patch implements CONNECTED
and QUIT.

QUIT means that we should close the connection and fail all current
and further requests (like the old quit = true).

CONNECTED means that the connection is ok and we can send requests
(like the old quit = false).

For the receiving loop we compare the current state with QUIT,
because reconnect will happen in the same loop, so it should keep
looping until the end.

Conversely, for requests we compare the current state with CONNECTED,
as we don't want to send requests in the CONNECTING states (which are
unreachable now, but will become reachable after the following
commits).

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/nbd-client.h |  9 ++++++++-
 block/nbd-client.c | 55 ++++++++++++++++++++++++++++++++----------------------
 2 files changed, 41 insertions(+), 23 deletions(-)

diff --git a/block/nbd-client.h b/block/nbd-client.h
index 2f047ba614..5367425774 100644
--- a/block/nbd-client.h
+++ b/block/nbd-client.h
@@ -23,6 +23,13 @@ typedef struct {
     bool receiving;         /* waiting for read_reply_co? */
 } NBDClientRequest;
 
+typedef enum NBDClientState {
+    NBD_CLIENT_CONNECTING_WAIT,
+    NBD_CLIENT_CONNECTING_NOWAIT,
+    NBD_CLIENT_CONNECTED,
+    NBD_CLIENT_QUIT
+} NBDClientState;
+
 typedef struct NBDClientSession {
     QIOChannelSocket *sioc; /* The master data channel */
     QIOChannel *ioc; /* The current I/O channel which may differ (eg TLS) */
@@ -32,10 +39,10 @@ typedef struct NBDClientSession {
     CoQueue free_sema;
     Coroutine *read_reply_co;
     int in_flight;
+    NBDClientState state;
 
     NBDClientRequest requests[MAX_NBD_REQUESTS];
     NBDReply reply;
-    bool quit;
 } NBDClientSession;
 
 NBDClientSession *nbd_get_client_session(BlockDriverState *bs);
diff --git a/block/nbd-client.c b/block/nbd-client.c
index 7eaf0149f0..a91fd3ea3e 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -34,6 +34,12 @@
 #define HANDLE_TO_INDEX(bs, handle) ((handle) ^ (uint64_t)(intptr_t)(bs))
 #define INDEX_TO_HANDLE(bs, index)  ((index)  ^ (uint64_t)(intptr_t)(bs))
 
+/* @ret would be used for reconnect in future */
+static void nbd_channel_error(NBDClientSession *s, int ret)
+{
+    s->state = NBD_CLIENT_QUIT;
+}
+
 static void nbd_recv_coroutines_wake_all(NBDClientSession *s)
 {
     int i;
@@ -73,14 +79,15 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
     int ret = 0;
     Error *local_err = NULL;
 
-    while (!s->quit) {
+    while (s->state != NBD_CLIENT_QUIT) {
         assert(s->reply.handle == 0);
         ret = nbd_receive_reply(s->ioc, &s->reply, &local_err);
         if (local_err) {
             error_report_err(local_err);
         }
         if (ret <= 0) {
-            break;
+            nbd_channel_error(s, ret ? ret : -EIO);
+            continue;
         }
 
         /* There's no need for a mutex on the receive side, because the
@@ -93,7 +100,8 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
             !s->requests[i].receiving ||
             (nbd_reply_is_structured(&s->reply) && !s->info.structured_reply))
         {
-            break;
+            nbd_channel_error(s, -EINVAL);
+            continue;
         }
 
         /* We're woken up again by the request itself.  Note that there
@@ -111,7 +119,6 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
         qemu_coroutine_yield();
     }
 
-    s->quit = true;
     nbd_recv_coroutines_wake_all(s);
     s->read_reply_co = NULL;
 }
@@ -121,12 +128,18 @@ static int nbd_co_send_request(BlockDriverState *bs,
                                QEMUIOVector *qiov)
 {
     NBDClientSession *s = nbd_get_client_session(bs);
-    int rc, i;
+    int rc, i = -1;
 
     qemu_co_mutex_lock(&s->send_mutex);
     while (s->in_flight == MAX_NBD_REQUESTS) {
         qemu_co_queue_wait(&s->free_sema, &s->send_mutex);
     }
+
+    if (s->state != NBD_CLIENT_CONNECTED) {
+        rc = -EIO;
+        goto err;
+    }
+
     s->in_flight++;
 
     for (i = 0; i < MAX_NBD_REQUESTS; i++) {
@@ -144,16 +157,12 @@ static int nbd_co_send_request(BlockDriverState *bs,
 
     request->handle = INDEX_TO_HANDLE(s, i);
 
-    if (s->quit) {
-        rc = -EIO;
-        goto err;
-    }
     assert(s->ioc);
 
     if (qiov) {
         qio_channel_set_cork(s->ioc, true);
         rc = nbd_send_request(s->ioc, request);
-        if (rc >= 0 && !s->quit) {
+        if (rc >= 0 && s->state == NBD_CLIENT_CONNECTED) {
             if (qio_channel_writev_all(s->ioc, qiov->iov, qiov->niov,
                                        NULL) < 0) {
                 rc = -EIO;
@@ -168,9 +177,11 @@ static int nbd_co_send_request(BlockDriverState *bs,
 
 err:
     if (rc < 0) {
-        s->quit = true;
-        s->requests[i].coroutine = NULL;
-        s->in_flight--;
+        nbd_channel_error(s, rc);
+        if (i != -1) {
+            s->requests[i].coroutine = NULL;
+            s->in_flight--;
+        }
         qemu_co_queue_next(&s->free_sema);
     }
     qemu_co_mutex_unlock(&s->send_mutex);
@@ -421,7 +432,7 @@ static coroutine_fn int nbd_co_do_receive_one_chunk(
     s->requests[i].receiving = true;
     qemu_coroutine_yield();
     s->requests[i].receiving = false;
-    if (s->quit) {
+    if (s->state != NBD_CLIENT_CONNECTED) {
         error_setg(errp, "Connection closed");
         return -EIO;
     }
@@ -504,7 +515,7 @@ static coroutine_fn int nbd_co_receive_one_chunk(
                                           request_ret, qiov, payload, errp);
 
     if (ret < 0) {
-        s->quit = true;
+        nbd_channel_error(s, ret);
     } else {
         /* For assert at loop start in nbd_read_reply_entry */
         if (reply) {
@@ -570,7 +581,7 @@ static bool nbd_reply_chunk_iter_receive(NBDClientSession *s,
     NBDReply local_reply;
     NBDStructuredReplyChunk *chunk;
     Error *local_err = NULL;
-    if (s->quit) {
+    if (s->state != NBD_CLIENT_CONNECTED) {
         error_setg(&local_err, "Connection closed");
         nbd_iter_channel_error(iter, -EIO, &local_err);
         goto break_loop;
@@ -595,7 +606,7 @@ static bool nbd_reply_chunk_iter_receive(NBDClientSession *s,
     }
 
     /* Do not execute the body of NBD_FOREACH_REPLY_CHUNK for simple reply. */
-    if (nbd_reply_is_simple(reply) || s->quit) {
+    if (nbd_reply_is_simple(reply) || s->state != NBD_CLIENT_CONNECTED) {
         goto break_loop;
     }
 
@@ -667,14 +678,14 @@ static int nbd_co_receive_cmdread_reply(NBDClientSession *s, uint64_t handle,
             ret = nbd_parse_offset_hole_payload(&reply.structured, payload,
                                                 offset, qiov, &local_err);
             if (ret < 0) {
-                s->quit = true;
+                nbd_channel_error(s, ret);
                 nbd_iter_channel_error(&iter, ret, &local_err);
             }
             break;
         default:
             if (!nbd_reply_type_is_error(chunk->type)) {
                 /* not allowed reply type */
-                s->quit = true;
+                nbd_channel_error(s, -EINVAL);
                 error_setg(&local_err,
                            "Unexpected reply type: %d (%s) for CMD_READ",
                            chunk->type, nbd_reply_type_lookup(chunk->type));
@@ -714,7 +725,7 @@ static int nbd_co_receive_blockstatus_reply(NBDClientSession *s,
         switch (chunk->type) {
         case NBD_REPLY_TYPE_BLOCK_STATUS:
             if (received) {
-                s->quit = true;
+                nbd_channel_error(s, -EINVAL);
                 error_setg(&local_err, "Several BLOCK_STATUS chunks in reply");
                 nbd_iter_channel_error(&iter, -EINVAL, &local_err);
             }
@@ -724,13 +735,13 @@ static int nbd_co_receive_blockstatus_reply(NBDClientSession *s,
                                                 payload, length, extent,
                                                 &local_err);
             if (ret < 0) {
-                s->quit = true;
+                nbd_channel_error(s, ret);
                 nbd_iter_channel_error(&iter, ret, &local_err);
             }
             break;
         default:
             if (!nbd_reply_type_is_error(chunk->type)) {
-                s->quit = true;
+                nbd_channel_error(s, -EINVAL);
                 error_setg(&local_err,
                            "Unexpected reply type: %d (%s) "
                            "for CMD_BLOCK_STATUS",
-- 
2.11.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH v4 07/10] block/nbd-client: rename read_reply_co to connection_co
  2018-07-31 17:30 [Qemu-devel] [PATCH v4 00/10] NBD reconnect Vladimir Sementsov-Ogievskiy
                   ` (5 preceding siblings ...)
  2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 06/10] block/nbd-client: move from quit to state Vladimir Sementsov-Ogievskiy
@ 2018-07-31 17:30 ` Vladimir Sementsov-Ogievskiy
  2019-01-16 16:35   ` Eric Blake
  2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 08/10] block/nbd: add cmdline and qapi parameter reconnect-delay Vladimir Sementsov-Ogievskiy
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-31 17:30 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: armbru, mreitz, kwolf, pbonzini, eblake, vsementsov, den

This coroutine will serve NBD reconnects, so rename it to something
more generic.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/nbd-client.h |  4 ++--
 block/nbd-client.c | 24 ++++++++++++------------
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/block/nbd-client.h b/block/nbd-client.h
index 5367425774..e7bda4d3c4 100644
--- a/block/nbd-client.h
+++ b/block/nbd-client.h
@@ -20,7 +20,7 @@
 typedef struct {
     Coroutine *coroutine;
     uint64_t offset;        /* original offset of the request */
-    bool receiving;         /* waiting for read_reply_co? */
+    bool receiving;         /* waiting for connection_co? */
 } NBDClientRequest;
 
 typedef enum NBDClientState {
@@ -37,7 +37,7 @@ typedef struct NBDClientSession {
 
     CoMutex send_mutex;
     CoQueue free_sema;
-    Coroutine *read_reply_co;
+    Coroutine *connection_co;
     int in_flight;
     NBDClientState state;
 
diff --git a/block/nbd-client.c b/block/nbd-client.c
index a91fd3ea3e..c184a9f0e9 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -63,7 +63,7 @@ static void nbd_teardown_connection(BlockDriverState *bs)
     qio_channel_shutdown(client->ioc,
                          QIO_CHANNEL_SHUTDOWN_BOTH,
                          NULL);
-    BDRV_POLL_WHILE(bs, client->read_reply_co);
+    BDRV_POLL_WHILE(bs, client->connection_co);
 
     nbd_client_detach_aio_context(bs);
     object_unref(OBJECT(client->sioc));
@@ -72,7 +72,7 @@ static void nbd_teardown_connection(BlockDriverState *bs)
     client->ioc = NULL;
 }
 
-static coroutine_fn void nbd_read_reply_entry(void *opaque)
+static coroutine_fn void nbd_connection_entry(void *opaque)
 {
     NBDClientSession *s = opaque;
     uint64_t i;
@@ -105,14 +105,14 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
         }
 
         /* We're woken up again by the request itself.  Note that there
-         * is no race between yielding and reentering read_reply_co.  This
+         * is no race between yielding and reentering connection_co.  This
          * is because:
          *
          * - if the request runs on the same AioContext, it is only
          *   entered after we yield
          *
          * - if the request runs on a different AioContext, reentering
-         *   read_reply_co happens through a bottom half, which can only
+         *   connection_co happens through a bottom half, which can only
          *   run after we yield.
          */
         aio_co_wake(s->requests[i].coroutine);
@@ -120,7 +120,7 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
     }
 
     nbd_recv_coroutines_wake_all(s);
-    s->read_reply_co = NULL;
+    s->connection_co = NULL;
 }
 
 static int nbd_co_send_request(BlockDriverState *bs,
@@ -428,7 +428,7 @@ static coroutine_fn int nbd_co_do_receive_one_chunk(
     }
     *request_ret = 0;
 
-    /* Wait until we're woken up by nbd_read_reply_entry.  */
+    /* Wait until we're woken up by nbd_connection_entry.  */
     s->requests[i].receiving = true;
     qemu_coroutine_yield();
     s->requests[i].receiving = false;
@@ -503,7 +503,7 @@ static coroutine_fn int nbd_co_do_receive_one_chunk(
 }
 
 /* nbd_co_receive_one_chunk
- * Read reply, wake up read_reply_co and set s->quit if needed.
+ * Read reply, wake up connection_co and set s->state if needed.
  * Return value is a fatal error code or normal nbd reply error code
  */
 static coroutine_fn int nbd_co_receive_one_chunk(
@@ -517,15 +517,15 @@ static coroutine_fn int nbd_co_receive_one_chunk(
     if (ret < 0) {
         nbd_channel_error(s, ret);
     } else {
-        /* For assert at loop start in nbd_read_reply_entry */
+        /* For assert at loop start in nbd_connection_entry */
         if (reply) {
             *reply = s->reply;
         }
         s->reply.handle = 0;
     }
 
-    if (s->read_reply_co) {
-        aio_co_wake(s->read_reply_co);
+    if (s->connection_co) {
+        aio_co_wake(s->connection_co);
     }
 
     return ret;
@@ -966,7 +966,7 @@ void nbd_client_attach_aio_context(BlockDriverState *bs,
 {
     NBDClientSession *client = nbd_get_client_session(bs);
     qio_channel_attach_aio_context(QIO_CHANNEL(client->ioc), new_context);
-    aio_co_schedule(new_context, client->read_reply_co);
+    aio_co_schedule(new_context, client->connection_co);
 }
 
 void nbd_client_close(BlockDriverState *bs)
@@ -1066,7 +1066,7 @@ static int nbd_client_connect(BlockDriverState *bs,
     /* Now that we're connected, set the socket to be non-blocking and
      * kick the reply mechanism.  */
     qio_channel_set_blocking(QIO_CHANNEL(sioc), false, NULL);
-    client->read_reply_co = qemu_coroutine_create(nbd_read_reply_entry, client);
+    client->connection_co = qemu_coroutine_create(nbd_connection_entry, client);
     nbd_client_attach_aio_context(bs, bdrv_get_aio_context(bs));
 
     logout("Established connection with NBD server\n");
-- 
2.11.1


* [Qemu-devel] [PATCH v4 08/10] block/nbd: add cmdline and qapi parameter reconnect-delay
  2018-07-31 17:30 [Qemu-devel] [PATCH v4 00/10] NBD reconnect Vladimir Sementsov-Ogievskiy
                   ` (6 preceding siblings ...)
  2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 07/10] block/nbd-client: rename read_reply_co to connection_co Vladimir Sementsov-Ogievskiy
@ 2018-07-31 17:30 ` Vladimir Sementsov-Ogievskiy
  2019-01-04 22:25   ` Eric Blake
  2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 09/10] block/nbd-client: nbd reconnect Vladimir Sementsov-Ogievskiy
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-31 17:30 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: armbru, mreitz, kwolf, pbonzini, eblake, vsementsov, den

Reconnect will be implemented in the following commit, so for now,
in the semantics described below, a disconnect itself counts as a "serious error".
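
As an illustration (not part of this patch), the new option would be used
roughly like this on the command line; the node name, host, port, export
name and delay value are all placeholders:

```shell
# Hypothetical invocation: attach an NBD export with a 60-second
# window during which requests wait for a reconnect.
qemu-system-x86_64 \
    -blockdev '{"driver": "nbd",
                "node-name": "nbd0",
                "server": {"type": "inet", "host": "192.0.2.1", "port": "10809"},
                "export": "drive0",
                "reconnect-delay": 60}'
```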

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 qapi/block-core.json | 12 +++++++++++-
 block/nbd-client.h   |  1 +
 block/nbd-client.c   |  1 +
 block/nbd.c          | 16 +++++++++++++++-
 4 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 5b9084a394..cf03402ec5 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3511,13 +3511,23 @@
 #                  traditional "base:allocation" block status (see
 #                  NBD_OPT_LIST_META_CONTEXT in the NBD protocol) (since 3.0)
 #
+# @reconnect-delay: Reconnect delay. On disconnect, the nbd client tries to
+#                   connect again, until success or a serious error. During
+#                   the first @reconnect-delay seconds of the reconnecting
+#                   loop, all requests are paused and have a chance to be
+#                   rerun if a successful connect occurs during this time.
+#                   After @reconnect-delay seconds, all delayed requests fail
+#                   and all following requests will fail too (until a
+#                   successful reconnect). Default 300 seconds (Since 3.1)
+#
 # Since: 2.9
 ##
 { 'struct': 'BlockdevOptionsNbd',
   'data': { 'server': 'SocketAddress',
             '*export': 'str',
             '*tls-creds': 'str',
-            '*x-dirty-bitmap': 'str' } }
+            '*x-dirty-bitmap': 'str',
+            '*reconnect-delay': 'uint32' } }
 
 ##
 # @BlockdevOptionsRaw:
diff --git a/block/nbd-client.h b/block/nbd-client.h
index e7bda4d3c4..ef8a6a9239 100644
--- a/block/nbd-client.h
+++ b/block/nbd-client.h
@@ -53,6 +53,7 @@ int nbd_client_init(BlockDriverState *bs,
                     QCryptoTLSCreds *tlscreds,
                     const char *hostname,
                     const char *x_dirty_bitmap,
+                    uint32_t reconnect_delay,
                     Error **errp);
 void nbd_client_close(BlockDriverState *bs);
 
diff --git a/block/nbd-client.c b/block/nbd-client.c
index c184a9f0e9..41e6e6e702 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -1079,6 +1079,7 @@ int nbd_client_init(BlockDriverState *bs,
                     QCryptoTLSCreds *tlscreds,
                     const char *hostname,
                     const char *x_dirty_bitmap,
+                    uint32_t reconnect_delay,
                     Error **errp)
 {
     NBDClientSession *client = nbd_get_client_session(bs);
diff --git a/block/nbd.c b/block/nbd.c
index 9db5eded89..b40fb5a977 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -360,6 +360,18 @@ static QemuOptsList nbd_runtime_opts = {
             .help = "experimental: expose named dirty bitmap in place of "
                     "block status",
         },
+        {
+            .name = "reconnect-delay",
+            .type = QEMU_OPT_NUMBER,
+            .help = "Reconnect delay. On disconnect, nbd client tries to "
+                    "connect again, until success or serious error. During "
+                    "first @reconnect-delay seconds of reconnecting loop all "
+                    "requests are paused and have a chance to be rerun, if "
+                    "successful connect occurs during this time. After "
+                    "@reconnect-delay seconds all delayed requests fail and "
+                    "all following requests will fail too (until "
+                    "successful reconnect). Default 300 seconds",
+        },
         { /* end of list */ }
     },
 };
@@ -411,7 +423,9 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
 
     /* NBD handshake */
     ret = nbd_client_init(bs, s->saddr, s->export, tlscreds, hostname,
-                          qemu_opt_get(opts, "x-dirty-bitmap"), errp);
+                          qemu_opt_get(opts, "x-dirty-bitmap"),
+                          qemu_opt_get_number(opts, "reconnect-delay", 300),
+                          errp);
 
  error:
     if (tlscreds) {
-- 
2.11.1


* [Qemu-devel] [PATCH v4 09/10] block/nbd-client: nbd reconnect
  2018-07-31 17:30 [Qemu-devel] [PATCH v4 00/10] NBD reconnect Vladimir Sementsov-Ogievskiy
                   ` (7 preceding siblings ...)
  2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 08/10] block/nbd: add cmdline and qapi parameter reconnect-delay Vladimir Sementsov-Ogievskiy
@ 2018-07-31 17:30 ` Vladimir Sementsov-Ogievskiy
  2018-11-02 12:39   ` Vladimir Sementsov-Ogievskiy
  2019-01-16 17:04   ` Eric Blake
  2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 10/10] iotests: test " Vladimir Sementsov-Ogievskiy
                   ` (3 subsequent siblings)
  12 siblings, 2 replies; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-31 17:30 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: armbru, mreitz, kwolf, pbonzini, eblake, vsementsov, den

Implement reconnect. To achieve this:

1. add new modes:
   connecting-wait: reconnecting is in progress, and only a small
     number of reconnect attempts have been made so far, so all
     requests are waiting for the connection.
   connecting-nowait: reconnecting is in progress, many reconnect
     attempts have already been made, and all requests return errors.

   two old modes are used too:
   connected: normal state
   quit: exiting after fatal error or on close

Possible transitions are:

   * -> quit
   connecting-* -> connected
   connecting-wait -> connecting-nowait (transition is done after
                      reconnect-delay seconds in connecting-wait mode)
   connected -> connecting-wait

2. Implement reconnect in connection_co: in connecting-* mode,
    connection_co tries to reconnect an unlimited number of times.

3. Retry nbd queries on channel error if we are in connecting-wait
    state.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/nbd-client.h |   4 +
 block/nbd-client.c | 304 +++++++++++++++++++++++++++++++++++++++++++----------
 2 files changed, 255 insertions(+), 53 deletions(-)

diff --git a/block/nbd-client.h b/block/nbd-client.h
index ef8a6a9239..52e4ec66be 100644
--- a/block/nbd-client.h
+++ b/block/nbd-client.h
@@ -40,6 +40,10 @@ typedef struct NBDClientSession {
     Coroutine *connection_co;
     int in_flight;
     NBDClientState state;
+    bool receiving;
+    int connect_status;
+    Error *connect_err;
+    bool wait_in_flight;
 
     NBDClientRequest requests[MAX_NBD_REQUESTS];
     NBDReply reply;
diff --git a/block/nbd-client.c b/block/nbd-client.c
index 41e6e6e702..b09907096d 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -34,10 +34,26 @@
 #define HANDLE_TO_INDEX(bs, handle) ((handle) ^ (uint64_t)(intptr_t)(bs))
 #define INDEX_TO_HANDLE(bs, index)  ((index)  ^ (uint64_t)(intptr_t)(bs))
 
-/* @ret would be used for reconnect in future */
+static int nbd_client_connect(BlockDriverState *bs,
+                              SocketAddress *saddr,
+                              const char *export,
+                              QCryptoTLSCreds *tlscreds,
+                              const char *hostname,
+                              const char *x_dirty_bitmap,
+                              Error **errp);
+
 static void nbd_channel_error(NBDClientSession *s, int ret)
 {
-    s->state = NBD_CLIENT_QUIT;
+    if (ret == -EIO) {
+        if (s->state == NBD_CLIENT_CONNECTED) {
+            s->state = NBD_CLIENT_CONNECTING_WAIT;
+        }
+    } else {
+        if (s->state == NBD_CLIENT_CONNECTED) {
+            qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
+        }
+        s->state = NBD_CLIENT_QUIT;
+    }
 }
 
 static void nbd_recv_coroutines_wake_all(NBDClientSession *s)
@@ -57,33 +73,151 @@ static void nbd_teardown_connection(BlockDriverState *bs)
 {
     NBDClientSession *client = nbd_get_client_session(bs);
 
-    assert(client->ioc);
-
-    /* finish any pending coroutines */
-    qio_channel_shutdown(client->ioc,
-                         QIO_CHANNEL_SHUTDOWN_BOTH,
-                         NULL);
+    if (client->state == NBD_CLIENT_CONNECTED) {
+        /* finish any pending coroutines */
+        assert(client->ioc);
+        qio_channel_shutdown(client->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
+    }
+    client->state = NBD_CLIENT_QUIT;
     BDRV_POLL_WHILE(bs, client->connection_co);
+}
+
+typedef struct NBDConnection {
+    BlockDriverState *bs;
+    SocketAddress *saddr;
+    const char *export;
+    QCryptoTLSCreds *tlscreds;
+    const char *hostname;
+    const char *x_dirty_bitmap;
+    uint32_t reconnect_delay;
+} NBDConnection;
+
+static bool nbd_client_connecting(NBDClientSession *client)
+{
+    return client->state == NBD_CLIENT_CONNECTING_WAIT ||
+           client->state == NBD_CLIENT_CONNECTING_NOWAIT;
+}
+
+static bool nbd_client_connecting_wait(NBDClientSession *client)
+{
+    return client->state == NBD_CLIENT_CONNECTING_WAIT;
+}
+
+static coroutine_fn void nbd_reconnect_attempt(NBDConnection *con)
+{
+    NBDClientSession *s = nbd_get_client_session(con->bs);
+    Error *local_err = NULL;
+
+    assert(nbd_client_connecting(s));
+
+    /* Wait completion of all in-flight requests */
+
+    qemu_co_mutex_lock(&s->send_mutex);
+
+    while (s->in_flight > 0) {
+        qemu_co_mutex_unlock(&s->send_mutex);
+        nbd_recv_coroutines_wake_all(s);
+        s->wait_in_flight = true;
+        qemu_coroutine_yield();
+        s->wait_in_flight = false;
+        qemu_co_mutex_lock(&s->send_mutex);
+    }
+
+    qemu_co_mutex_unlock(&s->send_mutex);
+
+    /* Now we are sure that nobody is accessing the channel, and nobody
+     * will try to access it until we set the state to CONNECTED.
+     */
+
+    /* Finalize previous connection if any */
+    if (s->ioc) {
+        nbd_client_detach_aio_context(con->bs);
+        object_unref(OBJECT(s->sioc));
+        s->sioc = NULL;
+        object_unref(OBJECT(s->ioc));
+        s->ioc = NULL;
+    }
+
+    s->connect_status = nbd_client_connect(con->bs, con->saddr,
+                                           con->export, con->tlscreds,
+                                           con->hostname, con->x_dirty_bitmap,
+                                           &local_err);
+    error_free(s->connect_err);
+    s->connect_err = NULL;
+    error_propagate(&s->connect_err, local_err);
+    local_err = NULL;
 
-    nbd_client_detach_aio_context(bs);
-    object_unref(OBJECT(client->sioc));
-    client->sioc = NULL;
-    object_unref(OBJECT(client->ioc));
-    client->ioc = NULL;
+    if (s->connect_status == -EINVAL) {
+        /* Protocol error or something like this, go to NBD_CLIENT_QUIT */
+        nbd_channel_error(s, s->connect_status);
+        return;
+    }
+
+    if (s->connect_status < 0) {
+        /* failed attempt */
+        return;
+    }
+
+    /* successfully connected */
+    s->state = NBD_CLIENT_CONNECTED;
+    qemu_co_queue_restart_all(&s->free_sema);
+}
+
+static coroutine_fn void nbd_reconnect_loop(NBDConnection *con)
+{
+    NBDClientSession *s = nbd_get_client_session(con->bs);
+    uint64_t start_time_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
+    uint64_t delay_ns = con->reconnect_delay * 1000000000UL;
+    uint64_t timeout = 1000000000UL; /* 1 sec */
+    uint64_t max_timeout = 16000000000UL; /* 16 sec */
+
+    if (!nbd_client_connecting(s)) {
+        return;
+    }
+
+    nbd_reconnect_attempt(con);
+
+    while (nbd_client_connecting(s)) {
+        if (s->state == NBD_CLIENT_CONNECTING_WAIT &&
+            qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - start_time_ns > delay_ns)
+        {
+            s->state = NBD_CLIENT_CONNECTING_NOWAIT;
+            qemu_co_queue_restart_all(&s->free_sema);
+        }
+
+        qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, timeout);
+        if (timeout < max_timeout) {
+            timeout *= 2;
+        }
+
+        nbd_reconnect_attempt(con);
+    }
 }
 
 static coroutine_fn void nbd_connection_entry(void *opaque)
 {
-    NBDClientSession *s = opaque;
+    NBDConnection *con = opaque;
+    NBDClientSession *s = nbd_get_client_session(con->bs);
     uint64_t i;
     int ret = 0;
     Error *local_err = NULL;
 
     while (s->state != NBD_CLIENT_QUIT) {
+        if (nbd_client_connecting(s)) {
+            nbd_reconnect_loop(con);
+        }
+
+        if (s->state != NBD_CLIENT_CONNECTED) {
+            continue;
+        }
+
+        s->receiving = true;
         assert(s->reply.handle == 0);
         ret = nbd_receive_reply(s->ioc, &s->reply, &local_err);
+        s->receiving = false;
         if (local_err) {
             error_report_err(local_err);
+            local_err = NULL;
         }
         if (ret <= 0) {
             nbd_channel_error(s, ret ? ret : -EIO);
@@ -119,8 +253,18 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
         qemu_coroutine_yield();
     }
 
+    qemu_co_queue_restart_all(&s->free_sema);
     nbd_recv_coroutines_wake_all(s);
     s->connection_co = NULL;
+    if (s->ioc) {
+        nbd_client_detach_aio_context(con->bs);
+        object_unref(OBJECT(s->sioc));
+        s->sioc = NULL;
+        object_unref(OBJECT(s->ioc));
+        s->ioc = NULL;
+    }
+
+    g_free(con);
 }
 
 static int nbd_co_send_request(BlockDriverState *bs,
@@ -131,7 +275,7 @@ static int nbd_co_send_request(BlockDriverState *bs,
     int rc, i = -1;
 
     qemu_co_mutex_lock(&s->send_mutex);
-    while (s->in_flight == MAX_NBD_REQUESTS) {
+    while (s->in_flight == MAX_NBD_REQUESTS || nbd_client_connecting_wait(s)) {
         qemu_co_queue_wait(&s->free_sema, &s->send_mutex);
     }
 
@@ -182,7 +326,11 @@ err:
             s->requests[i].coroutine = NULL;
             s->in_flight--;
         }
-        qemu_co_queue_next(&s->free_sema);
+        if (s->in_flight == 0 && s->wait_in_flight) {
+            aio_co_wake(s->connection_co);
+        } else {
+            qemu_co_queue_next(&s->free_sema);
+        }
     }
     qemu_co_mutex_unlock(&s->send_mutex);
     return rc;
@@ -521,10 +669,13 @@ static coroutine_fn int nbd_co_receive_one_chunk(
         if (reply) {
             *reply = s->reply;
         }
-        s->reply.handle = 0;
     }
+    s->reply.handle = 0;
 
-    if (s->connection_co) {
+    if (s->connection_co && !s->wait_in_flight) {
+        /* We must check s->wait_in_flight, because we may have been entered
+         * by nbd_recv_coroutines_wake_all(); in this case we should not
+         * wake connection_co here, it will be woken by the last request. */
         aio_co_wake(s->connection_co);
     }
 
@@ -632,7 +783,11 @@ break_loop:
 
     qemu_co_mutex_lock(&s->send_mutex);
     s->in_flight--;
-    qemu_co_queue_next(&s->free_sema);
+    if (s->in_flight == 0 && s->wait_in_flight) {
+        aio_co_wake(s->connection_co);
+    } else {
+        qemu_co_queue_next(&s->free_sema);
+    }
     qemu_co_mutex_unlock(&s->send_mutex);
 
     return false;
@@ -781,16 +936,21 @@ static int nbd_co_request(BlockDriverState *bs, NBDRequest *request,
     } else {
         assert(request->type != NBD_CMD_WRITE);
     }
-    ret = nbd_co_send_request(bs, request, write_qiov);
-    if (ret < 0) {
-        return ret;
-    }
 
-    ret = nbd_co_receive_return_code(client, request->handle,
-                                     &request_ret, &local_err);
-    if (local_err) {
-        error_report_err(local_err);
-    }
+    do {
+        ret = nbd_co_send_request(bs, request, write_qiov);
+        if (ret < 0) {
+            continue;
+        }
+
+        ret = nbd_co_receive_return_code(client, request->handle,
+                                         &request_ret, &local_err);
+        if (local_err) {
+            error_report_err(local_err);
+            local_err = NULL;
+        }
+    } while (ret < 0 && nbd_client_connecting_wait(client));
+
     return ret ? ret : request_ret;
 }
 
@@ -812,16 +972,21 @@ int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,
     if (!bytes) {
         return 0;
     }
-    ret = nbd_co_send_request(bs, &request, NULL);
-    if (ret < 0) {
-        return ret;
-    }
 
-    ret = nbd_co_receive_cmdread_reply(client, request.handle, offset, qiov,
-                                       &request_ret, &local_err);
-    if (local_err) {
-        error_report_err(local_err);
-    }
+    do {
+        ret = nbd_co_send_request(bs, &request, NULL);
+        if (ret < 0) {
+            continue;
+        }
+
+        ret = nbd_co_receive_cmdread_reply(client, request.handle, offset, qiov,
+                                           &request_ret, &local_err);
+        if (local_err) {
+            error_report_err(local_err);
+            local_err = NULL;
+        }
+    } while (ret < 0 && nbd_client_connecting_wait(client));
+
     return ret ? ret : request_ret;
 }
 
@@ -935,16 +1100,21 @@ int coroutine_fn nbd_client_co_block_status(BlockDriverState *bs,
         return BDRV_BLOCK_DATA;
     }
 
-    ret = nbd_co_send_request(bs, &request, NULL);
-    if (ret < 0) {
-        return ret;
-    }
+    do {
+        ret = nbd_co_send_request(bs, &request, NULL);
+        if (ret < 0) {
+            continue;
+        }
+
+        ret = nbd_co_receive_blockstatus_reply(client, request.handle, bytes,
+                                               &extent, &request_ret,
+                                               &local_err);
+        if (local_err) {
+            error_report_err(local_err);
+            local_err = NULL;
+        }
+    } while (ret < 0 && nbd_client_connecting_wait(client));
 
-    ret = nbd_co_receive_blockstatus_reply(client, request.handle, bytes,
-                                           &extent, &request_ret, &local_err);
-    if (local_err) {
-        error_report_err(local_err);
-    }
     if (ret < 0 || request_ret < 0) {
         return ret ? ret : request_ret;
     }
@@ -966,7 +1136,15 @@ void nbd_client_attach_aio_context(BlockDriverState *bs,
 {
     NBDClientSession *client = nbd_get_client_session(bs);
     qio_channel_attach_aio_context(QIO_CHANNEL(client->ioc), new_context);
-    aio_co_schedule(new_context, client->connection_co);
+    if (client->receiving) {
+        /* We schedule connection_co only if it is waiting for a read from
+         * the channel. We should not schedule it at some other yield point,
+         * or while it is currently executing (we call
+         * nbd_client_attach_aio_context from connection code).
+         */
+
+        aio_co_schedule(new_context, client->connection_co);
+    }
 }
 
 void nbd_client_close(BlockDriverState *bs)
@@ -974,9 +1152,9 @@ void nbd_client_close(BlockDriverState *bs)
     NBDClientSession *client = nbd_get_client_session(bs);
     NBDRequest request = { .type = NBD_CMD_DISC };
 
-    assert(client->ioc);
-
-    nbd_send_request(client->ioc, &request);
+    if (client->ioc) {
+        nbd_send_request(client->ioc, &request);
+    }
 
     nbd_teardown_connection(bs);
 }
@@ -1066,7 +1244,6 @@ static int nbd_client_connect(BlockDriverState *bs,
     /* Now that we're connected, set the socket to be non-blocking and
      * kick the reply mechanism.  */
     qio_channel_set_blocking(QIO_CHANNEL(sioc), false, NULL);
-    client->connection_co = qemu_coroutine_create(nbd_connection_entry, client);
     nbd_client_attach_aio_context(bs, bdrv_get_aio_context(bs));
 
     logout("Established connection with NBD server\n");
@@ -1082,11 +1259,32 @@ int nbd_client_init(BlockDriverState *bs,
                     uint32_t reconnect_delay,
                     Error **errp)
 {
+    int ret;
     NBDClientSession *client = nbd_get_client_session(bs);
+    NBDConnection *con;
 
     qemu_co_mutex_init(&client->send_mutex);
     qemu_co_queue_init(&client->free_sema);
 
-    return nbd_client_connect(bs, saddr, export, tlscreds, hostname,
-                              x_dirty_bitmap, errp);
+    ret = nbd_client_connect(bs, saddr, export, tlscreds, hostname,
+                             x_dirty_bitmap, errp);
+    if (ret < 0) {
+        return ret;
+    }
+    /* successfully connected */
+    client->state = NBD_CLIENT_CONNECTED;
+
+    con = g_new(NBDConnection, 1);
+    con->bs = bs;
+    con->saddr = saddr;
+    con->export = export;
+    con->tlscreds = tlscreds;
+    con->hostname = hostname;
+    con->x_dirty_bitmap = x_dirty_bitmap;
+    con->reconnect_delay = reconnect_delay;
+
+    client->connection_co = qemu_coroutine_create(nbd_connection_entry, con);
+    aio_co_schedule(bdrv_get_aio_context(bs), client->connection_co);
+
+    return 0;
 }
-- 
2.11.1
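The cover letter describes the pacing between reconnect attempts as
1 - 2 - 4 - 8 - 16 - 16 ... seconds. A minimal Python model of that capped
exponential backoff (illustrative only; the series implements this in C in
block/nbd-client.c):

```python
def reconnect_intervals(cap=16):
    """Yield the delays, in seconds, between reconnect attempts:
    1, 2, 4, 8, 16, 16, ... -- doubling until capped."""
    delay = 1
    while True:
        yield delay
        delay = min(delay * 2, cap)
```

Capping the interval keeps a long-lived reconnect loop from backing off so
far that recovery after a lengthy outage is needlessly delayed.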


* [Qemu-devel] [PATCH v4 10/10] iotests: test nbd reconnect
  2018-07-31 17:30 [Qemu-devel] [PATCH v4 00/10] NBD reconnect Vladimir Sementsov-Ogievskiy
                   ` (8 preceding siblings ...)
  2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 09/10] block/nbd-client: nbd reconnect Vladimir Sementsov-Ogievskiy
@ 2018-07-31 17:30 ` Vladimir Sementsov-Ogievskiy
  2019-01-16 17:11   ` Eric Blake
       [not found] ` <fc24ba9e-e325-6478-cb22-bc0a256c6e87@virtuozzo.com>
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-31 17:30 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: armbru, mreitz, kwolf, pbonzini, eblake, vsementsov, den

Add a test that starts a backup to an NBD target and restarts the NBD
server during the backup.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/220        | 67 +++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/220.out    |  7 +++++
 tests/qemu-iotests/group      |  1 +
 tests/qemu-iotests/iotests.py |  4 +++
 4 files changed, 79 insertions(+)
 create mode 100755 tests/qemu-iotests/220
 create mode 100644 tests/qemu-iotests/220.out

diff --git a/tests/qemu-iotests/220 b/tests/qemu-iotests/220
new file mode 100755
index 0000000000..c9702a7dad
--- /dev/null
+++ b/tests/qemu-iotests/220
@@ -0,0 +1,67 @@
+#!/usr/bin/env python
+#
+# Test nbd reconnect
+#
+# Copyright (c) 2018 Virtuozzo International GmbH
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import time
+
+import iotests
+from iotests import qemu_img_create, file_path, qemu_nbd_popen
+
+disk_a, disk_b, nbd_sock = file_path('disk_a', 'disk_b', 'nbd-sock')
+nbd_uri = 'nbd+unix:///exp?socket=' + nbd_sock
+
+qemu_img_create('-f', iotests.imgfmt, disk_a, '5M')
+qemu_img_create('-f', iotests.imgfmt, disk_b, '5M')
+srv = qemu_nbd_popen('-k', nbd_sock, '-x', 'exp', '-f', iotests.imgfmt, disk_b)
+time.sleep(1)
+
+vm = iotests.VM().add_drive(disk_a)
+vm.launch()
+vm.hmp_qemu_io('drive0', 'write 0 5M')
+
+print 'blockdev-add:', vm.qmp('blockdev-add', node_name='backup0', driver='raw',
+                              file={'driver':'nbd',
+                                    'export': 'exp',
+                                    'server': {'type': 'unix',
+                                               'path': nbd_sock}})
+print 'blockdev-backup:', vm.qmp('blockdev-backup', device='drive0',
+                                 sync='full', target='backup0')
+
+time.sleep(1)
+print 'Kill NBD server'
+srv.kill()
+
+jobs = vm.qmp('query-block-jobs')['return']
+if jobs and jobs[0]['offset'] < jobs[0]['len']:
+    print 'Backup job is still in progress'
+
+time.sleep(1)
+
+print 'Start NBD server'
+srv = qemu_nbd_popen('-k', nbd_sock, '-x', 'exp', '-f', iotests.imgfmt, disk_b)
+
+try:
+    e = vm.event_wait('BLOCK_JOB_COMPLETED')
+    print e['event'], ':', e['data']
+except:
+    pass
+
+print 'blockdev-del:', vm.qmp('blockdev-del', node_name='backup0')
+srv.kill()
+vm.shutdown()
diff --git a/tests/qemu-iotests/220.out b/tests/qemu-iotests/220.out
new file mode 100644
index 0000000000..dae1a49d9f
--- /dev/null
+++ b/tests/qemu-iotests/220.out
@@ -0,0 +1,7 @@
+blockdev-add: {u'return': {}}
+blockdev-backup: {u'return': {}}
+Kill NBD server
+Backup job is still in progress
+Start NBD server
+BLOCK_JOB_COMPLETED : {u'device': u'drive0', u'type': u'backup', u'speed': 0, u'len': 5242880, u'offset': 5242880}
+blockdev-del: {u'return': {}}
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index b973dc842d..ee2473c6a3 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -219,6 +219,7 @@
 217 rw auto quick
 218 rw auto quick
 219 rw auto
+220 rw auto quick
 221 rw auto quick
 222 rw auto quick
 223 rw auto quick
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 4e67fbbe96..17bc8c8e32 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -185,6 +185,10 @@ def qemu_nbd(*args):
     '''Run qemu-nbd in daemon mode and return the parent's exit code'''
     return subprocess.call(qemu_nbd_args + ['--fork'] + list(args))
 
+def qemu_nbd_popen(*args):
+    '''Run qemu-nbd in daemon mode and return the Popen object'''
+    return subprocess.Popen(qemu_nbd_args + ['--persistent'] + list(args))
+
 def compare_images(img1, img2, fmt1=imgfmt, fmt2=imgfmt):
     '''Return True if two image files are identical'''
     return qemu_img('compare', '-f', fmt1,
-- 
2.11.1


* Re: [Qemu-devel] [Qemu-block] [PATCH v4 00/10] NBD reconnect
       [not found] ` <fc24ba9e-e325-6478-cb22-bc0a256c6e87@virtuozzo.com>
@ 2018-10-09 19:33   ` John Snow
  2018-10-09 21:59     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 35+ messages in thread
From: John Snow @ 2018-10-09 19:33 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: kwolf, armbru, mreitz, den, pbonzini



On 09/17/2018 11:26 AM, Vladimir Sementsov-Ogievskiy wrote:
> ping
> 

Is this still pending or did I/we miss a v5?

> 31.07.2018 20:30, Vladimir Sementsov-Ogievskiy wrote:
>> Hi all.
>>
>> Here is NBD reconnect. Previously, if connection failed all current
>> and future requests will fail. After the series, nbd-client driver
>> will try to reconnect unlimited times. During first @reconnect-delay
>> seconds of reconnecting all requests will wait for the connection,
>> and if it is established requests will be resent. After
>> @reconnect-delay period all requests will be failed (until successful
>> reconnect).
>>
>> v4: - add Eric's r-b to 01.
>>      - drop CONNECTING_INIT mode, don't reconnect on _open.
>>      - new api: only one parameter @reconnect-delay
>>      - new interval scheme between reconnect attempts
>>          (1 - 2 - 4 - 8 - 16 - 16 ... seconds)
>>      - fixes and refactorings in main patch (09), including merge with
>>        old 08 patch
>>     
>> v3:
>> 06: fix build error in function 'nbd_co_send_request':
>>       error: 'i' may be used uninitialized in this function
>>
>> v2 notes:
>> Here is v2 of NBD reconnect, but it is very very different from v1, so,
>> forget about v1.
>> The series includes my "NBD reconnect: preliminary refactoring", with
>> changes in 05: leave asserts (Eric).
>>
>> Vladimir Sementsov-Ogievskiy (10):
>>    block/nbd-client: split channel errors from export errors
>>    block/nbd: move connection code from block/nbd to block/nbd-client
>>    block/nbd-client: split connection from initialization
>>    block/nbd-client: fix nbd_reply_chunk_iter_receive
>>    block/nbd-client: don't check ioc
>>    block/nbd-client: move from quit to state
>>    block/nbd-client: rename read_reply_co to connection_co
>>    block/nbd: add cmdline and qapi parameter reconnect-delay
>>    block/nbd-client: nbd reconnect
>>    iotests: test nbd reconnect
>>
>>   qapi/block-core.json          |  12 +-
>>   block/nbd-client.h            |  20 +-
>>   block/nbd-client.c            | 515
>> +++++++++++++++++++++++++++++++-----------
>>   block/nbd.c                   |  56 ++---
>>   tests/qemu-iotests/220        |  67 ++++++
>>   tests/qemu-iotests/220.out    |   7 +
>>   tests/qemu-iotests/group      |   1 +
>>   tests/qemu-iotests/iotests.py |   4 +
>>   8 files changed, 512 insertions(+), 170 deletions(-)
>>   create mode 100755 tests/qemu-iotests/220
>>   create mode 100644 tests/qemu-iotests/220.out
>>
> 
>


* Re: [Qemu-devel] [Qemu-block] [PATCH v4 00/10] NBD reconnect
  2018-10-09 19:33   ` [Qemu-devel] [Qemu-block] [PATCH v4 00/10] NBD reconnect John Snow
@ 2018-10-09 21:59     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-10-09 21:59 UTC (permalink / raw)
  To: John Snow, qemu-devel, qemu-block
  Cc: kwolf, armbru, mreitz, Denis Lunev, pbonzini



On 10/09/2018 10:33 PM, John Snow wrote:
> 
> 
> On 09/17/2018 11:26 AM, Vladimir Sementsov-Ogievskiy wrote:
>> ping
>>
> 
> Is this still pending or did I/we miss a v5?

still pending

> 
>> 31.07.2018 20:30, Vladimir Sementsov-Ogievskiy wrote:
>>> Hi all.
>>>
>>> Here is NBD reconnect. Previously, if connection failed all current
>>> and future requests will fail. After the series, nbd-client driver
>>> will try to reconnect unlimited times. During first @reconnect-delay
>>> seconds of reconnecting all requests will wait for the connection,
>>> and if it is established requests will be resent. After
>>> @reconnect-delay period all requests will be failed (until successful
>>> reconnect).
>>>
>>> v4: - add Eric's r-b to 01.
>>>       - drop CONNECTING_INIT mode, don't reconnect on _open.
>>>       - new api: only one parameter @reconnect-delay
>>>       - new interval scheme between reconnect attempts
>>>           (1 - 2 - 4 - 8 - 16 - 16 ... seconds)
>>>       - fixes and refactorings in main patch (09), including merge with
>>>         old 08 patch
>>>      
>>> v3:
>>> 06: fix build error in function 'nbd_co_send_request':
>>>        error: 'i' may be used uninitialized in this function
>>>
>>> v2 notes:
>>> Here is v2 of NBD reconnect, but it is very very different from v1, so,
>>> forget about v1.
>>> The series includes my "NBD reconnect: preliminary refactoring", with
>>> changes in 05: leave asserts (Eric).
>>>
>>> Vladimir Sementsov-Ogievskiy (10):
>>>     block/nbd-client: split channel errors from export errors
>>>     block/nbd: move connection code from block/nbd to block/nbd-client
>>>     block/nbd-client: split connection from initialization
>>>     block/nbd-client: fix nbd_reply_chunk_iter_receive
>>>     block/nbd-client: don't check ioc
>>>     block/nbd-client: move from quit to state
>>>     block/nbd-client: rename read_reply_co to connection_co
>>>     block/nbd: add cmdline and qapi parameter reconnect-delay
>>>     block/nbd-client: nbd reconnect
>>>     iotests: test nbd reconnect
>>>
>>>    qapi/block-core.json          |  12 +-
>>>    block/nbd-client.h            |  20 +-
>>>    block/nbd-client.c            | 515
>>> +++++++++++++++++++++++++++++++-----------
>>>    block/nbd.c                   |  56 ++---
>>>    tests/qemu-iotests/220        |  67 ++++++
>>>    tests/qemu-iotests/220.out    |   7 +
>>>    tests/qemu-iotests/group      |   1 +
>>>    tests/qemu-iotests/iotests.py |   4 +
>>>    8 files changed, 512 insertions(+), 170 deletions(-)
>>>    create mode 100755 tests/qemu-iotests/220
>>>    create mode 100644 tests/qemu-iotests/220.out
>>>
>>
>>


* Re: [Qemu-devel] [PATCH v4 09/10] block/nbd-client: nbd reconnect
  2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 09/10] block/nbd-client: nbd reconnect Vladimir Sementsov-Ogievskiy
@ 2018-11-02 12:39   ` Vladimir Sementsov-Ogievskiy
  2019-01-16 17:04   ` Eric Blake
  1 sibling, 0 replies; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-11-02 12:39 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: armbru, mreitz, kwolf, pbonzini, eblake, Denis Lunev

31.07.2018 20:30, Vladimir Sementsov-Ogievskiy wrote:
> Implement reconnect. To achieve this:
>
> 1. add new modes:
>     connecting-wait: means, that reconnecting is in progress, and there
>       were small number of reconnect attempts, so all requests are
>       waiting for the connection.
>     connecting-nowait: reconnecting is in progress, there were a lot of
>       attempts of reconnect, all requests will return errors.
>
>     two old modes are used too:
>     connected: normal state
>     quit: exiting after fatal error or on close
>
> Possible transitions are:
>
>     * -> quit
>     connecting-* -> connected
>     connecting-wait -> connecting-nowait (transition is done after
>                        reconnect-delay seconds in connecting-wait mode)
>     connected -> connecting-wait
>
> 2. Implement reconnect in connection_co. So, in connecting-* mode,
>      connection_co, tries to reconnect unlimited times.
>
> 3. Retry nbd queries on channel error, if we are in connecting-wait
>      state.
>
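[The states and transitions quoted above can be modeled directly. This is a
hypothetical Python sketch for illustration only; the real implementation is
C code in block/nbd-client.c, and the names below simply mirror the commit
message.]

```python
CONNECTED = 'connected'
CONNECTING_WAIT = 'connecting-wait'
CONNECTING_NOWAIT = 'connecting-nowait'
QUIT = 'quit'

# Transitions listed in the commit message (other than '* -> quit').
ALLOWED = {
    (CONNECTED, CONNECTING_WAIT),          # channel error starts reconnect
    (CONNECTING_WAIT, CONNECTING_NOWAIT),  # reconnect-delay seconds elapsed
    (CONNECTING_WAIT, CONNECTED),          # reconnect succeeded in time
    (CONNECTING_NOWAIT, CONNECTED),        # reconnect succeeded late
}

def can_transition(src, dst):
    # '* -> quit' is always allowed: fatal error or close.
    return dst == QUIT or (src, dst) in ALLOWED
```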
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   block/nbd-client.h |   4 +
>   block/nbd-client.c | 304 +++++++++++++++++++++++++++++++++++++++++++----------
>   2 files changed, 255 insertions(+), 53 deletions(-)
>
> diff --git a/block/nbd-client.h b/block/nbd-client.h
> index ef8a6a9239..52e4ec66be 100644
> --- a/block/nbd-client.h
> +++ b/block/nbd-client.h
> @@ -40,6 +40,10 @@ typedef struct NBDClientSession {
>       Coroutine *connection_co;
>       int in_flight;
>       NBDClientState state;
> +    bool receiving;
> +    int connect_status;
> +    Error *connect_err;
> +    bool wait_in_flight;
>   
>       NBDClientRequest requests[MAX_NBD_REQUESTS];
>       NBDReply reply;
> diff --git a/block/nbd-client.c b/block/nbd-client.c
> index 41e6e6e702..b09907096d 100644
> --- a/block/nbd-client.c
> +++ b/block/nbd-client.c
> @@ -34,10 +34,26 @@
>   #define HANDLE_TO_INDEX(bs, handle) ((handle) ^ (uint64_t)(intptr_t)(bs))
>   #define INDEX_TO_HANDLE(bs, index)  ((index)  ^ (uint64_t)(intptr_t)(bs))

[...]

> +static coroutine_fn void nbd_reconnect_attempt(NBDConnection *con)
> +{
> +    NBDClientSession *s = nbd_get_client_session(con->bs);
> +    Error *local_err = NULL;
> +
> +    assert(nbd_client_connecting(s));
> +
> +    /* Wait completion of all in-flight requests */
> +
> +    qemu_co_mutex_lock(&s->send_mutex);
> +
> +    while (s->in_flight > 0) {
> +        qemu_co_mutex_unlock(&s->send_mutex);
> +        nbd_recv_coroutines_wake_all(s);
> +        s->wait_in_flight = true;
> +        qemu_coroutine_yield();
> +        s->wait_in_flight = false;
> +        qemu_co_mutex_lock(&s->send_mutex);
> +    }
> +
> +    qemu_co_mutex_unlock(&s->send_mutex);
> +
> +    /* Now we are sure, that nobody accessing the channel now and nobody
> +     * will try to access the channel, until we set state to CONNECTED
> +     */
> +
> +    /* Finalize previous connection if any */
> +    if (s->ioc) {
> +        nbd_client_detach_aio_context(con->bs);
> +        object_unref(OBJECT(s->sioc));
> +        s->sioc = NULL;
> +        object_unref(OBJECT(s->ioc));
> +        s->ioc = NULL;
> +    }
> +
> +    s->connect_status = nbd_client_connect(con->bs, con->saddr,
> +                                           con->export, con->tlscreds,
> +                                           con->hostname, con->x_dirty_bitmap,
> +                                           &local_err);
> +    error_free(s->connect_err);
> +    s->connect_err = NULL;
> +    error_propagate(&s->connect_err, local_err);
> +    local_err = NULL;
>   
> -    nbd_client_detach_aio_context(bs);
> -    object_unref(OBJECT(client->sioc));
> -    client->sioc = NULL;
> -    object_unref(OBJECT(client->ioc));
> -    client->ioc = NULL;
> +    if (s->connect_status == -EINVAL) {
> +        /* Protocol error or something like this, go to NBD_CLIENT_QUIT */
> +        nbd_channel_error(s, s->connect_status);
> +        return;

Unfortunately, nbd_client_connect returns -EINVAL for I/O errors instead
of -EIO, and that is not trivial to fix. So, this if{} should be removed.
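[To illustrate the hazard being pointed out: a sketch of the retry decision,
with a hypothetical trust_einval flag modeling whether the quoted if{} is
kept. If nbd_client_connect can return -EINVAL for transient I/O failures,
trusting it wrongly turns retryable errors into fatal ones.]

```python
import errno

def is_fatal_connect_error(connect_status, trust_einval=True):
    """Classify a negative connect status: -EINVAL as a fatal protocol
    error (transition to 'quit'), anything else as a retryable attempt.
    trust_einval=False models dropping the if{} as suggested above."""
    if connect_status >= 0:
        return False
    return trust_einval and connect_status == -errno.EINVAL
```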

> +    }
> +
> +    if (s->connect_status < 0) {
> +        /* failed attempt */
> +        return;
> +    }
> +
> +    /* successfully connected */
> +    s->state = NBD_CLIENT_CONNECTED;
> +    qemu_co_queue_restart_all(&s->free_sema);
> +}
> +



-- 
Best regards,
Vladimir



* [Qemu-devel] ping Re: [PATCH v4 00/10] NBD reconnect
  2018-07-31 17:30 [Qemu-devel] [PATCH v4 00/10] NBD reconnect Vladimir Sementsov-Ogievskiy
                   ` (10 preceding siblings ...)
       [not found] ` <fc24ba9e-e325-6478-cb22-bc0a256c6e87@virtuozzo.com>
@ 2018-12-12 10:33 ` Vladimir Sementsov-Ogievskiy
  2018-12-29 12:23 ` [Qemu-devel] ping3 " Vladimir Sementsov-Ogievskiy
  12 siblings, 0 replies; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-12-12 10:33 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: armbru, mreitz, kwolf, pbonzini, eblake, Denis Lunev

ping

31.07.2018 20:30, Vladimir Sementsov-Ogievskiy wrote:
> Hi all.
> 
> Here is NBD reconnect. Previously, if connection failed all current
> and future requests will fail. After the series, nbd-client driver
> will try to reconnect unlimited times. During first @reconnect-delay
> seconds of reconnecting all requests will wait for the connection,
> and if it is established requests will be resent. After
> @reconnect-delay period all requests will be failed (until successful
> reconnect).
> 
> v4: - add Eric's r-b to 01.
>      - drop CONNECTING_INIT mode, don't reconnect on _open.
>      - new api: only one parameter @reconnect-delay
>      - new interval scheme between reconnect attempts
>          (1 - 2 - 4 - 8 - 16 - 16 ... seconds)
>      - fixes and refactorings in main patch (09), including merge with
>        old 08 patch
>      
> 
> v3:
> 06: fix build error in function 'nbd_co_send_request':
>       error: 'i' may be used uninitialized in this function
> 
> v2 notes:
> Here is v2 of NBD reconnect, but it is very very different from v1, so,
> forget about v1.
> The series includes my "NBD reconnect: preliminary refactoring", with
> changes in 05: leave asserts (Eric).
> 
> Vladimir Sementsov-Ogievskiy (10):
>    block/nbd-client: split channel errors from export errors
>    block/nbd: move connection code from block/nbd to block/nbd-client
>    block/nbd-client: split connection from initialization
>    block/nbd-client: fix nbd_reply_chunk_iter_receive
>    block/nbd-client: don't check ioc
>    block/nbd-client: move from quit to state
>    block/nbd-client: rename read_reply_co to connection_co
>    block/nbd: add cmdline and qapi parameter reconnect-delay
>    block/nbd-client: nbd reconnect
>    iotests: test nbd reconnect
> 
>   qapi/block-core.json          |  12 +-
>   block/nbd-client.h            |  20 +-
>   block/nbd-client.c            | 515 +++++++++++++++++++++++++++++++-----------
>   block/nbd.c                   |  56 ++---
>   tests/qemu-iotests/220        |  67 ++++++
>   tests/qemu-iotests/220.out    |   7 +
>   tests/qemu-iotests/group      |   1 +
>   tests/qemu-iotests/iotests.py |   4 +
>   8 files changed, 512 insertions(+), 170 deletions(-)
>   create mode 100755 tests/qemu-iotests/220
>   create mode 100644 tests/qemu-iotests/220.out
> 


-- 
Best regards,
Vladimir


* [Qemu-devel] ping3 Re: [PATCH v4 00/10] NBD reconnect
  2018-07-31 17:30 [Qemu-devel] [PATCH v4 00/10] NBD reconnect Vladimir Sementsov-Ogievskiy
                   ` (11 preceding siblings ...)
  2018-12-12 10:33 ` [Qemu-devel] ping " Vladimir Sementsov-Ogievskiy
@ 2018-12-29 12:23 ` Vladimir Sementsov-Ogievskiy
  12 siblings, 0 replies; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-12-29 12:23 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: armbru, mreitz, kwolf, pbonzini, eblake, Denis Lunev

ping

31.07.2018 20:30, Vladimir Sementsov-Ogievskiy wrote:
> Hi all.
> 
> Here is NBD reconnect. Previously, if connection failed all current
> and future requests will fail. After the series, nbd-client driver
> will try to reconnect unlimited times. During first @reconnect-delay
> seconds of reconnecting all requests will wait for the connection,
> and if it is established requests will be resent. After
> @reconnect-delay period all requests will be failed (until successful
> reconnect).
> 
> v4: - add Eric's r-b to 01.
>      - drop CONNECTING_INIT mode, don't reconnect on _open.
>      - new api: only one parameter @reconnect-delay
>      - new interval scheme between reconnect attempts
>          (1 - 2 - 4 - 8 - 16 - 16 ... seconds)
>      - fixes and refactorings in main patch (09), including merge with
>        old 08 patch
>      
> 
> v3:
> 06: fix build error in function 'nbd_co_send_request':
>       error: 'i' may be used uninitialized in this function
> 
> v2 notes:
> Here is v2 of NBD reconnect, but it is very very different from v1, so,
> forget about v1.
> The series includes my "NBD reconnect: preliminary refactoring", with
> changes in 05: leave asserts (Eric).
> 
> Vladimir Sementsov-Ogievskiy (10):
>    block/nbd-client: split channel errors from export errors
>    block/nbd: move connection code from block/nbd to block/nbd-client
>    block/nbd-client: split connection from initialization
>    block/nbd-client: fix nbd_reply_chunk_iter_receive
>    block/nbd-client: don't check ioc
>    block/nbd-client: move from quit to state
>    block/nbd-client: rename read_reply_co to connection_co
>    block/nbd: add cmdline and qapi parameter reconnect-delay
>    block/nbd-client: nbd reconnect
>    iotests: test nbd reconnect
> 
>   qapi/block-core.json          |  12 +-
>   block/nbd-client.h            |  20 +-
>   block/nbd-client.c            | 515 +++++++++++++++++++++++++++++++-----------
>   block/nbd.c                   |  56 ++---
>   tests/qemu-iotests/220        |  67 ++++++
>   tests/qemu-iotests/220.out    |   7 +
>   tests/qemu-iotests/group      |   1 +
>   tests/qemu-iotests/iotests.py |   4 +
>   8 files changed, 512 insertions(+), 170 deletions(-)
>   create mode 100755 tests/qemu-iotests/220
>   create mode 100644 tests/qemu-iotests/220.out
> 


-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [PATCH v4 08/10] block/nbd: add cmdline and qapi parameter reconnect-delay
  2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 08/10] block/nbd: add cmdline and qapi parameter reconnect-delay Vladimir Sementsov-Ogievskiy
@ 2019-01-04 22:25   ` Eric Blake
  2019-02-05 16:48     ` Vladimir Sementsov-Ogievskiy
  2019-04-11 15:47       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 2 replies; 35+ messages in thread
From: Eric Blake @ 2019-01-04 22:25 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: armbru, mreitz, kwolf, pbonzini, den


On 7/31/18 12:30 PM, Vladimir Sementsov-Ogievskiy wrote:
> Reconnect will be implemented in the following commit, so for now,
> in semantics below, disconnect itself is a "serious error".
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  qapi/block-core.json | 12 +++++++++++-
>  block/nbd-client.h   |  1 +
>  block/nbd-client.c   |  1 +
>  block/nbd.c          | 16 +++++++++++++++-
>  4 files changed, 28 insertions(+), 2 deletions(-)
> 
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 5b9084a394..cf03402ec5 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -3511,13 +3511,23 @@
>  #                  traditional "base:allocation" block status (see
>  #                  NBD_OPT_LIST_META_CONTEXT in the NBD protocol) (since 3.0)
>  #
> +# @reconnect-delay: Reconnect delay. On disconnect, nbd client tries to connect

Maybe 'On unexpected disconnect', since intentional disconnect is not
unexpected.

> +#                   again, until success or serious error. During first
> +#                   @reconnect-delay seconds of reconnecting loop all requests
> +#                   are paused and have a chance to rerun, if successful
> +#                   connect occures during this time. After @reconnect-delay

occurs

> +#                   seconds all delayed requests are failed and all following
> +#                   requests will be failed to (until successfull reconnect).

successful

> +#                   Default 300 seconds (Since 3.1)

My delay in reviewing means this now has to be 4.0.

I'm guessing that a delay of 0 means disable auto-reconnect.  From a
backwards-compatibility standpoint, no auto-reconnect is more in line
with what we previously had - but from a usability standpoint, trying to
reconnect can avoid turning transient network hiccups into permanent
loss of a device to EIO errors, especially if the retry timeout is long
enough to allow an administrator to reroute the network to an
alternative server.  So I'm probably okay with the default being
non-zero - but it DOES mean that where you used to get instant EIO
failures when a network connection was severed, you now have to wait for
the reconnect delay to expire, and 5 minutes can be a long wait.  Since
the long delay is guest-observable, can we run into issues where a guest
that is currently used to instant EIO and total loss of the device could
instead get confused by not getting any response for up to 5 minutes,
whether or not that response eventually turns out to be EIO or a
successful recovery?

> +++ b/block/nbd.c
> @@ -360,6 +360,18 @@ static QemuOptsList nbd_runtime_opts = {
>              .help = "experimental: expose named dirty bitmap in place of "
>                      "block status",
>          },
> +        {
> +            .name = "reconnect-delay",
> +            .type = QEMU_OPT_NUMBER,
> +            .help = "Reconnect delay. On disconnect, nbd client tries to"
> +                    "connect again, until success or serious error. During"
> +                    "first @reconnect-delay seconds of reconnecting loop all"
> +                    "requests are paused and have a chance to rerun, if"
> +                    "successful connect occures during this time. After"
> +                    "@reconnect-delay seconds all delayed requests are failed"
> +                    "and all following requests will be failed to (until"
> +                    "successfull reconnect). Default 300 seconds",

Same typos as in qapi.

The UI aspects look fine, now I need to review the patch series for code
issues :)


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [Qemu-devel] [PATCH v4 03/10] block/nbd-client: split connection from initialization
  2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 03/10] block/nbd-client: split connection from initialization Vladimir Sementsov-Ogievskiy
@ 2019-01-16 15:52   ` Eric Blake
  0 siblings, 0 replies; 35+ messages in thread
From: Eric Blake @ 2019-01-16 15:52 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: armbru, mreitz, kwolf, pbonzini, den


On 7/31/18 12:30 PM, Vladimir Sementsov-Ogievskiy wrote:
> Split connection code to reuse it for reconnect.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/nbd-client.c | 33 ++++++++++++++++++++++++---------
>  1 file changed, 24 insertions(+), 9 deletions(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [Qemu-devel] [PATCH v4 02/10] block/nbd: move connection code from block/nbd to block/nbd-client
  2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 02/10] block/nbd: move connection code from block/nbd to block/nbd-client Vladimir Sementsov-Ogievskiy
@ 2019-01-16 15:56   ` Eric Blake
  0 siblings, 0 replies; 35+ messages in thread
From: Eric Blake @ 2019-01-16 15:56 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: armbru, mreitz, kwolf, pbonzini, den


On 7/31/18 12:30 PM, Vladimir Sementsov-Ogievskiy wrote:
> Keep all connection code in one file, to be able to implement reconnect
> in further patches.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/nbd-client.h |  2 +-
>  block/nbd-client.c | 37 +++++++++++++++++++++++++++++++++++--
>  block/nbd.c        | 40 ++--------------------------------------
>  3 files changed, 38 insertions(+), 41 deletions(-)
> 

> @@ -1001,12 +1033,14 @@ int nbd_client_init(BlockDriverState *bs,
>      g_free(client->info.x_dirty_bitmap);
>      if (ret < 0) {
>          logout("Failed to negotiate with the NBD server\n");
> +        object_unref(OBJECT(sioc));
>          return ret;
>      }
>      if (client->info.flags & NBD_FLAG_READ_ONLY &&
>          !bdrv_is_read_only(bs)) {
>          error_setg(errp,
>                     "request for write access conflicts with read-only export");
> +        object_unref(OBJECT(sioc));
>          return -EACCES;
>      }

Conflicts with changes committed in the meantime. I think I can resolve
the conflicts, but you may want to post a v5.

The patch itself looks reasonable;
Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [Qemu-devel] [PATCH v4 04/10] block/nbd-client: fix nbd_reply_chunk_iter_receive
  2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 04/10] block/nbd-client: fix nbd_reply_chunk_iter_receive Vladimir Sementsov-Ogievskiy
@ 2019-01-16 16:01   ` Eric Blake
  0 siblings, 0 replies; 35+ messages in thread
From: Eric Blake @ 2019-01-16 16:01 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: armbru, mreitz, kwolf, pbonzini, den


On 7/31/18 12:30 PM, Vladimir Sementsov-Ogievskiy wrote:
> Use exported report, not the variable to be reused (should not really
> matter).
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/nbd-client.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

> 
> diff --git a/block/nbd-client.c b/block/nbd-client.c
> index a1813cbfe1..263d1721f9 100644
> --- a/block/nbd-client.c
> +++ b/block/nbd-client.c
> @@ -599,7 +599,7 @@ static bool nbd_reply_chunk_iter_receive(NBDClientSession *s,
>      }
>  
>      /* Do not execute the body of NBD_FOREACH_REPLY_CHUNK for simple reply. */
> -    if (nbd_reply_is_simple(&s->reply) || s->quit) {
> +    if (nbd_reply_is_simple(reply) || s->quit) {
>          goto break_loop;
>      }
>  
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v4 05/10] block/nbd-client: don't check ioc
  2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 05/10] block/nbd-client: don't check ioc Vladimir Sementsov-Ogievskiy
@ 2019-01-16 16:05   ` Eric Blake
  0 siblings, 0 replies; 35+ messages in thread
From: Eric Blake @ 2019-01-16 16:05 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: armbru, mreitz, kwolf, pbonzini, den

[-- Attachment #1: Type: text/plain, Size: 828 bytes --]

On 7/31/18 12:30 PM, Vladimir Sementsov-Ogievskiy wrote:
> We have several paranoiac checks for ioc != NULL. But ioc may become

s/paranoiac/paranoid/ sounds nicer, even if both forms are valid words.

> NULL only on close, which should not happen during requests handling.
> Also, we check ioc only sometimes, not after each yield, which is
> inconsistent. Let's drop these checks. However, for safety, lets leave

s/lets/let's/

> asserts instead.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/nbd-client.c | 16 +++++-----------
>  1 file changed, 5 insertions(+), 11 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v4 06/10] block/nbd-client: move from quit to state
  2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 06/10] block/nbd-client: move from quit to state Vladimir Sementsov-Ogievskiy
@ 2019-01-16 16:25   ` Eric Blake
  2019-01-16 16:58     ` Daniel P. Berrangé
  0 siblings, 1 reply; 35+ messages in thread
From: Eric Blake @ 2019-01-16 16:25 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: armbru, mreitz, kwolf, pbonzini, den, Daniel P. Berrangé

[-- Attachment #1: Type: text/plain, Size: 5104 bytes --]

[adding Dan]

On 7/31/18 12:30 PM, Vladimir Sementsov-Ogievskiy wrote:
> To implement reconnect we need several states for the client:
> CONNECTED, QUIT and two CONNECTING states. CONNECTING states will
> be realized in the following patches. This patch implements CONNECTED
> and QUIT.
> 
> QUIT means, that we should close the connection and fail all current
> and further requests (like old quit = true).
> 
> CONNECTED means that connection is ok, we can send requests (like old
> quit = false).
> 
> For receiving loop we use a comparison of the current state with QUIT,
> because reconnect will be in the same loop, so it should be looping
> until the end.
> 
> Opposite, for requests we use a comparison of the current state with
> CONNECTED, as we don't want to send requests in CONNECTING states (
> which are unreachable now, but will be reachable after the following
> commits)
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/nbd-client.h |  9 ++++++++-
>  block/nbd-client.c | 55 ++++++++++++++++++++++++++++++++----------------------
>  2 files changed, 41 insertions(+), 23 deletions(-)

Dan just recently proposed patches to SocketChardev in general to use a
state machine that distinguishes between connecting and connected:

https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg03339.html

I'm wondering how much of his work is related or can be reused to get
restartable connections on NBD sockets?

Remember, right now, the NBD code always starts in blocking mode, and
does single-threaded handshaking until it is ready for transmission,
then switches to non-blocking mode for all subsequent transmissions (so,
for example, servicing a read request can assume that the socket is
valid without further waiting).  But once we start allowing reconnects,
a read request will need to detect when one socket has gone down, and
wait for its replacement socket to come back up, in order to retry the
request; this retry is in a context where we are in non-blocking
context, but the retry must establish a new socket, and possibly convert
the socket into TLS mode, all before being ready to retry the read request.
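To make the shape of that retry path concrete, here is a minimal sketch (Python standing in for the C code; the state names follow the patch series, but the helper itself is invented for illustration) of the decision a request has to make once the socket can go away underneath it:

```python
from enum import Enum, auto

class ClientState(Enum):
    CONNECTING_WAIT = auto()    # reconnecting; requests wait and retry
    CONNECTING_NOWAIT = auto()  # reconnecting; requests fail immediately
    CONNECTED = auto()          # normal transmission phase
    QUIT = auto()               # fatal; fail everything

def request_action(state):
    """What a request should do when it observes the channel state."""
    if state is ClientState.CONNECTED:
        return "send"           # socket is valid, proceed as before
    if state is ClientState.CONNECTING_WAIT:
        return "wait"           # sleep until the new socket (and TLS) is up
    return "fail"               # CONNECTING_NOWAIT or QUIT: error out now
```

The point is that "wait" hides all of the work described above: establishing a new socket and possibly redoing the TLS handshake, from a non-blocking context.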

> 
> diff --git a/block/nbd-client.h b/block/nbd-client.h
> index 2f047ba614..5367425774 100644
> --- a/block/nbd-client.h
> +++ b/block/nbd-client.h
> @@ -23,6 +23,13 @@ typedef struct {
>      bool receiving;         /* waiting for read_reply_co? */
>  } NBDClientRequest;
>  
> +typedef enum NBDClientState {
> +    NBD_CLIENT_CONNECTING_WAIT,
> +    NBD_CLIENT_CONNECTING_NOWAIT,

Would we be better off adding these enum values in the later patch that
uses them?

> +    NBD_CLIENT_CONNECTED,
> +    NBD_CLIENT_QUIT
> +} NBDClientState;
> +
>  typedef struct NBDClientSession {
>      QIOChannelSocket *sioc; /* The master data channel */
>      QIOChannel *ioc; /* The current I/O channel which may differ (eg TLS) */
> @@ -32,10 +39,10 @@ typedef struct NBDClientSession {
>      CoQueue free_sema;
>      Coroutine *read_reply_co;
>      int in_flight;
> +    NBDClientState state;
>  
>      NBDClientRequest requests[MAX_NBD_REQUESTS];
>      NBDReply reply;
> -    bool quit;
>  } NBDClientSession;
>  
>  NBDClientSession *nbd_get_client_session(BlockDriverState *bs);
> diff --git a/block/nbd-client.c b/block/nbd-client.c
> index 7eaf0149f0..a91fd3ea3e 100644
> --- a/block/nbd-client.c
> +++ b/block/nbd-client.c
> @@ -34,6 +34,12 @@
>  #define HANDLE_TO_INDEX(bs, handle) ((handle) ^ (uint64_t)(intptr_t)(bs))
>  #define INDEX_TO_HANDLE(bs, index)  ((index)  ^ (uint64_t)(intptr_t)(bs))
>  
> +/* @ret would be used for reconnect in future */

s/would/will/

> +static void nbd_channel_error(NBDClientSession *s, int ret)
> +{
> +    s->state = NBD_CLIENT_QUIT;
> +}
> +
>  static void nbd_recv_coroutines_wake_all(NBDClientSession *s)
>  {
>      int i;
> @@ -73,14 +79,15 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
>      int ret = 0;
>      Error *local_err = NULL;
>  
> -    while (!s->quit) {
> +    while (s->state != NBD_CLIENT_QUIT) {
>          assert(s->reply.handle == 0);
>          ret = nbd_receive_reply(s->ioc, &s->reply, &local_err);
>          if (local_err) {
>              error_report_err(local_err);
>          }
>          if (ret <= 0) {
> -            break;
> +            nbd_channel_error(s, ret ? ret : -EIO);
> +            continue;

I guess the continue instead of the break is pre-supposing that
nbd_channel_error() might be able to recover in later patches?  But for
this patch, there is no change in control flow: after the state change
the loop condition no longer holds, so no further iterations run, the
same as a break would have done.

The rest of the patch looks sane, but fails to apply easily for me (I'm
getting enough rebase churn, that it's getting harder to state if it is
accurate against the latest git master).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v4 07/10] block/nbd-client: rename read_reply_co to connection_co
  2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 07/10] block/nbd-client: rename read_reply_co to connection_co Vladimir Sementsov-Ogievskiy
@ 2019-01-16 16:35   ` Eric Blake
  0 siblings, 0 replies; 35+ messages in thread
From: Eric Blake @ 2019-01-16 16:35 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: armbru, mreitz, kwolf, pbonzini, den

[-- Attachment #1: Type: text/plain, Size: 922 bytes --]

On 7/31/18 12:30 PM, Vladimir Sementsov-Ogievskiy wrote:
> This coroutine will serve nbd reconnects, so, rename it to be something
> more generic.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/nbd-client.h |  4 ++--
>  block/nbd-client.c | 24 ++++++++++++------------
>  2 files changed, 14 insertions(+), 14 deletions(-)
> 

> @@ -503,7 +503,7 @@ static coroutine_fn int nbd_co_do_receive_one_chunk(
>  }
>  
>  /* nbd_co_receive_one_chunk
> - * Read reply, wake up read_reply_co and set s->quit if needed.
> + * Read reply, wake up connection_co and set s->quit if needed.

Comment is stale; should have been fixed in 6/10 to mention s->state now
that s->quit does not exist.

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v4 06/10] block/nbd-client: move from quit to state
  2019-01-16 16:25   ` Eric Blake
@ 2019-01-16 16:58     ` Daniel P. Berrangé
  2019-02-05 16:35       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 35+ messages in thread
From: Daniel P. Berrangé @ 2019-01-16 16:58 UTC (permalink / raw)
  To: Eric Blake
  Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block, armbru,
	mreitz, kwolf, pbonzini, den

On Wed, Jan 16, 2019 at 10:25:03AM -0600, Eric Blake wrote:
> [adding Dan]
> 
> On 7/31/18 12:30 PM, Vladimir Sementsov-Ogievskiy wrote:
> > To implement reconnect we need several states for the client:
> > CONNECTED, QUIT and two CONNECTING states. CONNECTING states will
> > be realized in the following patches. This patch implements CONNECTED
> > and QUIT.
> > 
> > QUIT means, that we should close the connection and fail all current
> > and further requests (like old quit = true).
> > 
> > CONNECTED means that connection is ok, we can send requests (like old
> > quit = false).
> > 
> > For receiving loop we use a comparison of the current state with QUIT,
> > because reconnect will be in the same loop, so it should be looping
> > until the end.
> > 
> > Opposite, for requests we use a comparison of the current state with
> > CONNECTED, as we don't want to send requests in CONNECTING states (
> > which are unreachable now, but will be reachable after the following
> > commits)
> > 
> > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> > ---
> >  block/nbd-client.h |  9 ++++++++-
> >  block/nbd-client.c | 55 ++++++++++++++++++++++++++++++++----------------------
> >  2 files changed, 41 insertions(+), 23 deletions(-)
> 
> Dan just recently proposed patches to SocketChardev in general to use a
> state machine that distinguishes between connecting and connected:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg03339.html
> 
> I'm wondering how much of his work is related or can be reused to get
> restartable connections on NBD sockets?

There's nothing really special about what I did. Vladimir looks to
have taken basically the same kind of approach, but I don't think
there's real scope for sharing with chardevs, as each cares about
its own set of states.

> Remember, right now, the NBD code always starts in blocking mode, and
> does single-threaded handshaking until it is ready for transmission,
> then switches to non-blocking mode for all subsequent transmissions (so,
> for example, servicing a read request can assume that the socket is
> valid without further waiting).  But once we start allowing reconnects,
> a read request will need to detect when one socket has gone down, and
> wait for its replacement socket to come back up, in order to retry the
> request; this retry is in a context where we are in non-blocking
> context, but the retry must establish a new socket, and possibly convert
> the socket into TLS mode, all before being ready to retry the read request.

That makes it sound like the NBD handshake needs to be converted to
use entirely non-blocking I/O.

The TLS handshake already uses an asynchronous callback pattern and
to deal with that NBD had to create & run a private main loop to
complete the TLS handshake in its blocking code pattern.

You could potentially push this concept up to the top level. ie
implement the entire NBD handshake with async callbacks / non-blocking
I/O. Then simply use a private main loop to run that in a blocking
fashion for the initial connection. When you need to do re-connect
you now just run the async code without the extra main loop around
it.
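A rough model of that pattern (Python standing in for the C code, all names invented; the real thing would drive fd handlers from a GLib main context):

```python
def handshake_start(on_done):
    # Stand-in for the non-blocking handshake: real code would register
    # fd handlers and return immediately; here it completes at once so
    # only the control flow is visible.
    on_done(0)

def handshake_blocking():
    """Initial connection: run the async handshake under a private loop."""
    result = {}
    handshake_start(lambda ret: result.setdefault("ret", ret))
    while "ret" not in result:
        pass  # run one private-main-loop iteration in the real pattern
    return result["ret"]

def handshake_reconnect(on_done):
    """Reconnect: kick off the same async code, with no extra loop."""
    handshake_start(on_done)
```

The handshake logic exists exactly once; only the initial-connection caller wraps it to look blocking.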

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v4 09/10] block/nbd-client: nbd reconnect
  2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 09/10] block/nbd-client: nbd reconnect Vladimir Sementsov-Ogievskiy
  2018-11-02 12:39   ` Vladimir Sementsov-Ogievskiy
@ 2019-01-16 17:04   ` Eric Blake
  2019-02-05 17:07     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 35+ messages in thread
From: Eric Blake @ 2019-01-16 17:04 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: armbru, mreitz, kwolf, pbonzini, den

[-- Attachment #1: Type: text/plain, Size: 3779 bytes --]

On 7/31/18 12:30 PM, Vladimir Sementsov-Ogievskiy wrote:
> Implement reconnect. To achieve this:
> 
> 1. add new modes:
>    connecting-wait: means, that reconnecting is in progress, and there
>      were small number of reconnect attempts, so all requests are
>      waiting for the connection.
>    connecting-nowait: reconnecting is in progress, there were a lot of
>      attempts of reconnect, all requests will return errors.
> 
>    two old modes are used too:
>    connected: normal state
>    quit: exiting after fatal error or on close

What makes an error fatal?  Without reconnect, life is simple - if the
server sends something we can't parse, we permanently turn the device
into an error condition - because we have no way to get back in sync
with the server for further commands.  Your patch allows reconnect
attempts where the connection is down (we failed to send to the server
or failed to receive the server's reply), but why can we not ALSO
attempt to reconnect after a parse error?  A reconnect would let us get
back in sync for attempting further commands.  You're right that the
current command should probably fail in that case (if the server sent us
garbage for a specific request, it will probably do so again on a repeat
of that request; which is different than when we don't even know what
the server would have sent because of a disconnect).

Or, put another way, we KNOW we have (corner) cases where a mis-aligned
image can currently cause the server to return BLOCK_STATUS replies that
aren't aligned to the advertised minimum block size. Attempting to
read the last sector of an image then causes the client to see the
misaligned reply and complain, which we are treating as fatal. But why
not instead just fail that particular read, but still attempt a
reconnect, in order to attempt further reads elsewhere in the image that
do not trip up the server's misaligned reply?
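The policy I'm arguing for could be summarized like this (a hypothetical helper, not part of the patch; "kind" values are just labels for the two error classes discussed above):

```python
def on_error(kind):
    """Return (fail_current_request, attempt_reconnect) for an error kind."""
    if kind == "disconnect":
        # We never saw the reply, so it is safe to resend the request
        # once a new connection is up.
        return (False, True)
    if kind == "parse":
        # The server sent garbage for this request; the request itself
        # fails, but a reconnect could still resync the stream so that
        # other requests keep working.
        return (True, True)
    return (True, False)  # anything else stays fatal, as today
```

The patch as posted effectively treats "parse" like the last case; the question is whether it should be the middle one.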

> 
> Possible transitions are:
> 
>    * -> quit
>    connecting-* -> connected
>    connecting-wait -> connecting-nowait (transition is done after
>                       reconnect-delay seconds in connecting-wait mode)
>    connected -> connecting-wait
> 
> 2. Implement reconnect in connection_co. So, in connecting-* mode,
>     connection_co, tries to reconnect unlimited times.
> 
> 3. Retry nbd queries on channel error, if we are in connecting-wait
>     state.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/nbd-client.h |   4 +
>  block/nbd-client.c | 304 +++++++++++++++++++++++++++++++++++++++++++----------
>  2 files changed, 255 insertions(+), 53 deletions(-)
> 

> @@ -781,16 +936,21 @@ static int nbd_co_request(BlockDriverState *bs, NBDRequest *request,
>      } else {
>          assert(request->type != NBD_CMD_WRITE);
>      }
> -    ret = nbd_co_send_request(bs, request, write_qiov);
> -    if (ret < 0) {
> -        return ret;
> -    }
>  
> -    ret = nbd_co_receive_return_code(client, request->handle,
> -                                     &request_ret, &local_err);
> -    if (local_err) {
> -        error_report_err(local_err);
> -    }
> +    do {
> +        ret = nbd_co_send_request(bs, request, write_qiov);
> +        if (ret < 0) {
> +            continue;
> +        }
> +
> +        ret = nbd_co_receive_return_code(client, request->handle,
> +                                         &request_ret, &local_err);
> +        if (local_err) {
> +            error_report_err(local_err);
> +            local_err = NULL;

Conflicts with the conversion to use trace points.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v4 10/10] iotests: test nbd reconnect
  2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 10/10] iotests: test " Vladimir Sementsov-Ogievskiy
@ 2019-01-16 17:11   ` Eric Blake
  2019-04-11 16:02       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 35+ messages in thread
From: Eric Blake @ 2019-01-16 17:11 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: armbru, mreitz, kwolf, pbonzini, den

[-- Attachment #1: Type: text/plain, Size: 4466 bytes --]

On 7/31/18 12:30 PM, Vladimir Sementsov-Ogievskiy wrote:
> Add test, which starts backup to nbd target and restarts nbd server
> during backup.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  tests/qemu-iotests/220        | 67 +++++++++++++++++++++++++++++++++++++++++++
>  tests/qemu-iotests/220.out    |  7 +++++
>  tests/qemu-iotests/group      |  1 +
>  tests/qemu-iotests/iotests.py |  4 +++
>  4 files changed, 79 insertions(+)
>  create mode 100755 tests/qemu-iotests/220
>  create mode 100644 tests/qemu-iotests/220.out
> 

Test 220 has been created in the meantime; the obvious resolution is to
pick a new test number.

> diff --git a/tests/qemu-iotests/220 b/tests/qemu-iotests/220
> new file mode 100755
> index 0000000000..c9702a7dad
> --- /dev/null
> +++ b/tests/qemu-iotests/220
> @@ -0,0 +1,67 @@
> +#!/usr/bin/env python

> +
> +import iotests
> +from iotests import qemu_img_create, file_path, qemu_nbd_popen
> +
> +disk_a, disk_b, nbd_sock = file_path('disk_a', 'disk_b', 'nbd-sock')
> +nbd_uri = 'nbd+unix:///exp?socket=' + nbd_sock
> +
> +qemu_img_create('-f', iotests.imgfmt, disk_a, '5M')
> +qemu_img_create('-f', iotests.imgfmt, disk_b, '5M')
> +srv = qemu_nbd_popen('-k', nbd_sock, '-x', 'exp', '-f', iotests.imgfmt, disk_b)
> +time.sleep(1)
> +
> +vm = iotests.VM().add_drive(disk_a)
> +vm.launch()
> +vm.hmp_qemu_io('drive0', 'write 0 5M')
> +
> +print 'blockdev-add:', vm.qmp('blockdev-add', node_name='backup0', driver='raw',
> +                              file={'driver':'nbd',
> +                                    'export': 'exp',
> +                                    'server': {'type': 'unix',
> +                                               'path': nbd_sock}})
> +print 'blockdev-backup:', vm.qmp('blockdev-backup', device='drive0',
> +                                 sync='full', target='backup0')
> +
> +time.sleep(1)
> +print 'Kill NBD server'
> +srv.kill()
> +
> +jobs = vm.qmp('query-block-jobs')['return']
> +if jobs and jobs[0]['offset'] < jobs[0]['len']:
> +    print 'Backup job is still in progress'
> +
> +time.sleep(1)

That's a lot of sleep()s for a test marked quick. Are we sure it won't
fail under heavy load? Can you make the test more reliable by looking
for specific events rather than just a fixed-length sleep?
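For instance, a small polling helper with a deadline would make the test robust under load (this helper is hypothetical, modeled on the style of iotests.py, not an existing function there):

```python
import time

def wait_until(condition, timeout=10.0, interval=0.1):
    """Poll condition() until it returns true or the deadline passes."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False
```

The fixed `time.sleep(1)` calls would then become something like `wait_until(lambda: vm.qmp('query-block-jobs')['return'])`, and the completion wait is already event-based via vm.event_wait().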

> +
> +print 'Start NBD server'
> +srv = qemu_nbd_popen('-k', nbd_sock, '-x', 'exp', '-f', iotests.imgfmt, disk_b)
> +
> +try:
> +    e = vm.event_wait('BLOCK_JOB_COMPLETED')
> +    print e['event'], ':', e['data']
> +except:
> +    pass
> +
> +print 'blockdev-del:', vm.qmp('blockdev-del', node_name='backup0')
> +srv.kill()
> +vm.shutdown()
> diff --git a/tests/qemu-iotests/220.out b/tests/qemu-iotests/220.out
> new file mode 100644
> index 0000000000..dae1a49d9f
> --- /dev/null
> +++ b/tests/qemu-iotests/220.out
> @@ -0,0 +1,7 @@
> +blockdev-add: {u'return': {}}
> +blockdev-backup: {u'return': {}}
> +Kill NBD server
> +Backup job is still in progress
> +Start NBD server
> +BLOCK_JOB_COMPLETED : {u'device': u'drive0', u'type': u'backup', u'speed': 0, u'len': 5242880, u'offset': 5242880}
> +blockdev-del: {u'return': {}}
> diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
> index b973dc842d..ee2473c6a3 100644
> --- a/tests/qemu-iotests/group
> +++ b/tests/qemu-iotests/group
> @@ -219,6 +219,7 @@
>  217 rw auto quick
>  218 rw auto quick
>  219 rw auto
> +220 rw auto quick
>  221 rw auto quick
>  222 rw auto quick
>  223 rw auto quick
> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
> index 4e67fbbe96..17bc8c8e32 100644
> --- a/tests/qemu-iotests/iotests.py
> +++ b/tests/qemu-iotests/iotests.py
> @@ -185,6 +185,10 @@ def qemu_nbd(*args):
>      '''Run qemu-nbd in daemon mode and return the parent's exit code'''
>      return subprocess.call(qemu_nbd_args + ['--fork'] + list(args))
>  
> +def qemu_nbd_popen(*args):
> +    '''Run qemu-nbd in daemon mode and return the parent's exit code'''
> +    return subprocess.Popen(qemu_nbd_args + ['--persistent'] + list(args))
> +
>  def compare_images(img1, img2, fmt1=imgfmt, fmt2=imgfmt):
>      '''Return True if two image files are identical'''
>      return qemu_img('compare', '-f', fmt1,
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v4 06/10] block/nbd-client: move from quit to state
  2019-01-16 16:58     ` Daniel P. Berrangé
@ 2019-02-05 16:35       ` Vladimir Sementsov-Ogievskiy
  2019-02-06  8:51         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-02-05 16:35 UTC (permalink / raw)
  To: Daniel P. Berrangé, Eric Blake
  Cc: qemu-devel, qemu-block, armbru, mreitz, kwolf, pbonzini, Denis Lunev

16.01.2019 19:58, Daniel P. Berrangé wrote:
> On Wed, Jan 16, 2019 at 10:25:03AM -0600, Eric Blake wrote:
>> [adding Dan]
>>
>> On 7/31/18 12:30 PM, Vladimir Sementsov-Ogievskiy wrote:
>>> To implement reconnect we need several states for the client:
>>> CONNECTED, QUIT and two CONNECTING states. CONNECTING states will
>>> be realized in the following patches. This patch implements CONNECTED
>>> and QUIT.
>>>
>>> QUIT means, that we should close the connection and fail all current
>>> and further requests (like old quit = true).
>>>
>>> CONNECTED means that connection is ok, we can send requests (like old
>>> quit = false).
>>>
>>> For receiving loop we use a comparison of the current state with QUIT,
>>> because reconnect will be in the same loop, so it should be looping
>>> until the end.
>>>
>>> Opposite, for requests we use a comparison of the current state with
>>> CONNECTED, as we don't want to send requests in CONNECTING states (
>>> which are unreachable now, but will be reachable after the following
>>> commits)
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   block/nbd-client.h |  9 ++++++++-
>>>   block/nbd-client.c | 55 ++++++++++++++++++++++++++++++++----------------------
>>>   2 files changed, 41 insertions(+), 23 deletions(-)
>>
>> Dan just recently proposed patches to SocketChardev in general to use a
>> state machine that distinguishes between connecting and connected:
>>
>> https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg03339.html
>>
>> I'm wondering how much of his work is related or can be reused to get
>> restartable connections on NBD sockets?
> 
> There's nothing really special about what I did. Vladimir looks to
> have taken basically the same kind of approach, but I don't think
> there's real scope for sharing with chardevs, as each cares about
> its own set of states.
> 
>> Remember, right now, the NBD code always starts in blocking mode, and
>> does single-threaded handshaking until it is ready for transmission,
>> then switches to non-blocking mode for all subsequent transmissions (so,
>> for example, servicing a read request can assume that the socket is
>> valid without further waiting).  But once we start allowing reconnects,
>> a read request will need to detect when one socket has gone down, and
>> wait for its replacement socket to come back up, in order to retry the
>> request; this retry is in a context where we are in non-blocking
>> context, but the retry must establish a new socket, and possibly convert
>> the socket into TLS mode, all before being ready to retry the read request.
> 
> That makes it sound like the NBD handshake needs to be converted to
> use entirely non-blocking I/O.
> 
> The TLS handshake already uses an asynchronous callback pattern and
> to deal with that NBD had to create & run a private main loop to
> complete the TLS handshake in its blocking code pattern.
> 
> You could potentially push this concept up to the top level. ie
> implement the entire NBD handshake with async callbacks / non-blocking
> I/O. Then simply use a private main loop to run that in a blocking
> fashion for the initial connection. When you need to do re-connect
> you now just run the async code without the extra main loop around
> it.
> 

Hmm, you mean this code:

     data.loop = g_main_loop_new(g_main_context_default(), FALSE);
     trace_nbd_receive_starttls_tls_handshake();
     qio_channel_tls_handshake(tioc,
                               nbd_tls_handshake,
                               &data,
                               NULL,
                               NULL);

     if (!data.complete) {
         g_main_loop_run(data.loop);
     }
     g_main_loop_unref(data.loop);


What does this do in the context of QEMU? Isn't it more correct to do
coroutine-based async stuff, like in qcow2_open():

     if (qemu_in_coroutine()) {
         /* From bdrv_co_create.  */
         qcow2_open_entry(&qoc);
     } else {
         assert(qemu_get_current_aio_context() == qemu_get_aio_context());
         qemu_coroutine_enter(qemu_coroutine_create(qcow2_open_entry, &qoc));
         BDRV_POLL_WHILE(bs, qoc.ret == -EINPROGRESS);
     }
     return qoc.ret;

And then yield after handshake() and re-enter the coroutine from the
nbd_tls_handshake callback?
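Roughly this flow, modeled with a Python generator standing in for a QEMU coroutine (everything here is illustrative, not QEMU API):

```python
def handshake_co(events):
    # Coroutine body: kick off the TLS handshake, then yield until the
    # completion callback re-enters us with the result.
    events.append("start_handshake")
    ret = yield                       # qemu_coroutine_yield()
    events.append(("done", ret))
    return ret

def run():
    events = []
    co = handshake_co(events)
    next(co)                          # run until the yield after starting
    try:
        co.send(0)                    # the handshake callback "wakes" the
                                      # coroutine with the result
    except StopIteration as e:
        return events, e.value
```

No private main loop is needed; the callback simply resumes the suspended coroutine.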

Hmm, also, I checked: nobody calls g_main_context_default() in QEMU except
util/main-loop.c, nbd, and tests. So I'm not sure this is a valid thing
to do in nbd.


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v4 08/10] block/nbd: add cmdline and qapi parameter reconnect-delay
  2019-01-04 22:25   ` Eric Blake
@ 2019-02-05 16:48     ` Vladimir Sementsov-Ogievskiy
  2019-04-11 15:47       ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-02-05 16:48 UTC (permalink / raw)
  To: Eric Blake, qemu-devel, qemu-block
  Cc: armbru, mreitz, kwolf, pbonzini, Denis Lunev

05.01.2019 1:25, Eric Blake wrote:
> On 7/31/18 12:30 PM, Vladimir Sementsov-Ogievskiy wrote:
>> Reconnect will be implemented in the following commit, so for now,
>> in semantics below, disconnect itself is a "serious error".
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   qapi/block-core.json | 12 +++++++++++-
>>   block/nbd-client.h   |  1 +
>>   block/nbd-client.c   |  1 +
>>   block/nbd.c          | 16 +++++++++++++++-
>>   4 files changed, 28 insertions(+), 2 deletions(-)
>>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index 5b9084a394..cf03402ec5 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -3511,13 +3511,23 @@
>>   #                  traditional "base:allocation" block status (see
>>   #                  NBD_OPT_LIST_META_CONTEXT in the NBD protocol) (since 3.0)
>>   #
>> +# @reconnect-delay: Reconnect delay. On disconnect, nbd client tries to connect
> 
> Maybe 'On unexpected disconnect', since intentional disconnect is not
> unexpected.
> 
>> +#                   again, until success or serious error. During first
>> +#                   @reconnect-delay seconds of reconnecting loop all requests
>> +#                   are paused and have a chance to rerun, if successful
>> +#                   connect occures during this time. After @reconnect-delay
> 
> occurs
> 
>> +#                   seconds all delayed requests are failed and all following
>> +#                   requests will be failed to (until successfull reconnect).
> 
> successful
> 
>> +#                   Default 300 seconds (Since 3.1)
> 
> My delay in reviewing means this now has to be 4.0.
> 
> I'm guessing that a delay of 0 means disable auto-reconnect.  From a
> backwards-compatibility standpoint, no auto-reconnect is more in line
> with what we previously had - but from a usability standpoint, trying to
> reconnect can avoid turning transient network hiccups into permanent
> loss of a device to EIO errors, especially if the retry timeout is long
> enough to allow an administrator to reroute the network to an
> alternative server.  So I'm probably okay with the default being
> non-zero - but it DOES mean that where you used to get instant EIO
> failures when a network connection was severed, you now have to wait for
> the reconnect delay to expire, and 5 minutes can be a long wait.  Since
> the long delay is guest-observable, can we run into issues where a guest
> that is currently used to instant EIO and total loss of the device could
> instead get confused by not getting any response for up to 5 minutes,
> whether or not that response eventually turns out to be EIO or a
> successful recovery?

Hmm, do you have such scenarios in mind? Who could know?

If we are unsure, I'm OK with proceeding with no-reconnect behavior by
default. We can change the default in a separate patch later, if needed.

> 
>> +++ b/block/nbd.c
>> @@ -360,6 +360,18 @@ static QemuOptsList nbd_runtime_opts = {
>>               .help = "experimental: expose named dirty bitmap in place of "
>>                       "block status",
>>           },
>> +        {
>> +            .name = "reconnect-delay",
>> +            .type = QEMU_OPT_NUMBER,
>> +            .help = "Reconnect delay. On disconnect, nbd client tries to"
>> +                    "connect again, until success or serious error. During"
>> +                    "first @reconnect-delay seconds of reconnecting loop all"
>> +                    "requests are paused and have a chance to rerun, if"
>> +                    "successful connect occures during this time. After"
>> +                    "@reconnect-delay seconds all delayed requests are failed"
>> +                    "and all following requests will be failed to (until"
>> +                    "successfull reconnect). Default 300 seconds",
> 
> Same typos as in qapi.
> 
> The UI aspects look fine, now I need to review the patch series for code
> issues :)
> 
> 


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v4 09/10] block/nbd-client: nbd reconnect
  2019-01-16 17:04   ` Eric Blake
@ 2019-02-05 17:07     ` Vladimir Sementsov-Ogievskiy
  2019-02-05 17:15       ` Eric Blake
  0 siblings, 1 reply; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-02-05 17:07 UTC (permalink / raw)
  To: Eric Blake, qemu-devel, qemu-block
  Cc: armbru, mreitz, kwolf, pbonzini, Denis Lunev

16.01.2019 20:04, Eric Blake wrote:
> On 7/31/18 12:30 PM, Vladimir Sementsov-Ogievskiy wrote:
>> Implement reconnect. To achieve this:
>>
>> 1. add new modes:
>>     connecting-wait: means, that reconnecting is in progress, and there
>>       were small number of reconnect attempts, so all requests are
>>       waiting for the connection.
>>     connecting-nowait: reconnecting is in progress, there were a lot of
>>       attempts of reconnect, all requests will return errors.
>>
>>     two old modes are used too:
>>     connected: normal state
>>     quit: exiting after fatal error or on close
> 
> What makes an error fatal?  Without reconnect, life is simple - if the
> server sends something we can't parse, we permanently turn the device
> into an error condition - because we have no way to get back in sync
> with the server for further commands.  Your patch allows reconnect
> attempts where the connection is down (we failed to send to the server
> or failed to receive the server's reply), but why can we not ALSO
> attempt to reconnect after a parse error?  A reconnect would let us get
> back in sync for attempting further commands.  You're right that the
> current command should probably fail in that case (if the server sent us
> garbage for a specific request, it will probably do so again on a repeat
> of that request; which is different than when we don't even know what
> the server would have sent because of a disconnect).

My thought was that a protocol error is fatal, as it shows that the server is
buggy, and it seems unsafe to keep communicating with a buggy server.
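For illustration, the modes and transitions described in the quoted commit message can be sketched as a small state machine. This is a hypothetical sketch: the names mirror the commit message, not the actual QEMU code.

```python
from enum import Enum, auto

class NBDClientState(Enum):
    """Client states as described in the commit message."""
    CONNECTING_WAIT = auto()    # reconnecting; requests wait for the connection
    CONNECTING_NOWAIT = auto()  # reconnecting; requests fail immediately
    CONNECTED = auto()          # normal state
    QUIT = auto()               # fatal error or close

# Allowed transitions from the commit message:
#   * -> quit,  connecting-* -> connected,
#   connecting-wait -> connecting-nowait,  connected -> connecting-wait
ALLOWED = {
    (NBDClientState.CONNECTING_WAIT, NBDClientState.CONNECTED),
    (NBDClientState.CONNECTING_NOWAIT, NBDClientState.CONNECTED),
    (NBDClientState.CONNECTING_WAIT, NBDClientState.CONNECTING_NOWAIT),
    (NBDClientState.CONNECTED, NBDClientState.CONNECTING_WAIT),
}

def can_transition(old, new):
    """Any state may enter QUIT; every other edge must be listed explicitly."""
    return new is NBDClientState.QUIT or (old, new) in ALLOWED
```

Note that under this scheme there is no way back from connecting-nowait to connecting-wait: once the delay has expired, requests keep failing until a reconnect succeeds.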

> 
> Or, put another way, we KNOW we have (corner) cases where a mis-aligned
> image can currently cause the server to return BLOCK_STATUS replies that
> aren't aligned to the advertised minimum block size.  Attempting to
> read the last sector of an image then causes the client to see the
> misaligned reply and complain, which we are treating as fatal.

Do we have fixes for it?

> But why
> not instead just fail that particular read, but still attempt a
> reconnect, in order to attempt further reads elsewhere in the image that
> do not trip up the server's misaligned reply?
> 

Hmm, for these cases, if we consider these errors non-fatal, we don't even
need a reconnect.

If we want to consider protocol errors recoverable, we need reconnect only on a
wrong magic number and maybe some kinds of inconsistent variable data lengths.

And it may need an additional option, like --strict=false

>>
>> Possible transitions are:
>>
>>     * -> quit
>>     connecting-* -> connected
>>     connecting-wait -> connecting-nowait (transition is done after
>>                        reconnect-delay seconds in connecting-wait mode)
>>     connected -> connecting-wait
>>
>> 2. Implement reconnect in connection_co. So, in connecting-* mode,
>>      connection_co, tries to reconnect unlimited times.
>>
>> 3. Retry nbd queries on channel error, if we are in connecting-wait
>>      state.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block/nbd-client.h |   4 +
>>   block/nbd-client.c | 304 +++++++++++++++++++++++++++++++++++++++++++----------
>>   2 files changed, 255 insertions(+), 53 deletions(-)
>>
> 
>> @@ -781,16 +936,21 @@ static int nbd_co_request(BlockDriverState *bs, NBDRequest *request,
>>       } else {
>>           assert(request->type != NBD_CMD_WRITE);
>>       }
>> -    ret = nbd_co_send_request(bs, request, write_qiov);
>> -    if (ret < 0) {
>> -        return ret;
>> -    }
>>   
>> -    ret = nbd_co_receive_return_code(client, request->handle,
>> -                                     &request_ret, &local_err);
>> -    if (local_err) {
>> -        error_report_err(local_err);
>> -    }
>> +    do {
>> +        ret = nbd_co_send_request(bs, request, write_qiov);
>> +        if (ret < 0) {
>> +            continue;
>> +        }
>> +
>> +        ret = nbd_co_receive_return_code(client, request->handle,
>> +                                         &request_ret, &local_err);
>> +        if (local_err) {
>> +            error_report_err(local_err);
>> +            local_err = NULL;
> 
> Conflicts with the conversion to use trace points.
> 


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v4 09/10] block/nbd-client: nbd reconnect
  2019-02-05 17:07     ` Vladimir Sementsov-Ogievskiy
@ 2019-02-05 17:15       ` Eric Blake
  0 siblings, 0 replies; 35+ messages in thread
From: Eric Blake @ 2019-02-05 17:15 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: armbru, mreitz, kwolf, pbonzini, Denis Lunev


On 2/5/19 11:07 AM, Vladimir Sementsov-Ogievskiy wrote:

>> Or, put another way, we KNOW we have (corner) cases where a mis-aligned
>> image can currently cause the server to return BLOCK_STATUS replies that
>> aren't aligned to the advertised minimum block size.  Attempting to
>> read the last sector of an image then causes the client to see the
>> misaligned reply and complain, which we are treating as fatal.
> 
> Do we have fixes for it?

Not yet - it's still on my queue of things to fix after I get libvirt
incremental backup APIs in.  Might make 4.0, might not (but not the end
of the world; it's been known-broken since 3.0, so it's not a new
regression).

> 
>> But why
>> not instead just fail that particular read, but still attempt a
>> reconnect, in order to attempt further reads elsewhere in the image that
>> do not trip up the server's misaligned reply?
>>
> 
> Hmm, for these cases, if we consider these errors non-fatal, we don't even
> need a reconnect.

Well, it all depends on whether the client is still in sync with the
server. If either side has disobeyed the spec, and sent too many/too few
bytes compared to what the other side expects, you'll have magic number
mismatches, where you really DO need a reconnect to get back in sync.
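For illustration, "out of sync" here means the client can no longer trust the framing: the next bytes it reads do not start with a valid reply magic. A hypothetical sketch of such a check follows; the magic constants come from the NBD protocol specification, while the helper name is invented:

```python
import struct

NBD_SIMPLE_REPLY_MAGIC = 0x67446698      # NBD simple reply magic (per spec)
NBD_STRUCTURED_REPLY_MAGIC = 0x668e33ef  # NBD structured reply magic (per spec)

def reply_stream_in_sync(first4bytes):
    """Return True if the next reply header starts with a known magic.
    False means framing is lost and the only way to recover further
    commands is to drop the connection and reconnect."""
    (magic,) = struct.unpack('>I', first4bytes)  # magics are big-endian on the wire
    return magic in (NBD_SIMPLE_REPLY_MAGIC, NBD_STRUCTURED_REPLY_MAGIC)
```

A bad per-request payload can fail just that request, but a bad magic is exactly the case where reconnecting is the only way to get back in sync.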

> 
> If we want to consider protocol errors to be recoverable, we need reconnect only
> on wrong magic and may be some kind of inconsistent variable data lenghts..
> 
> And it may need addition option, like --strict=false

An option on how hard to try may be reasonable.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v4 06/10] block/nbd-client: move from quit to state
  2019-02-05 16:35       ` Vladimir Sementsov-Ogievskiy
@ 2019-02-06  8:51         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-02-06  8:51 UTC (permalink / raw)
  To: Daniel P. Berrangé, Eric Blake
  Cc: qemu-devel, qemu-block, armbru, mreitz, kwolf, pbonzini, Denis Lunev

05.02.2019 19:35, Vladimir Sementsov-Ogievskiy wrote:
> 16.01.2019 19:58, Daniel P. Berrangé wrote:
>> On Wed, Jan 16, 2019 at 10:25:03AM -0600, Eric Blake wrote:
>>> [adding Dan]
>>>
>>> On 7/31/18 12:30 PM, Vladimir Sementsov-Ogievskiy wrote:
>>>> To implement reconnect we need several states for the client:
>>>> CONNECTED, QUIT and two CONNECTING states. CONNECTING states will
>>>> be realized in the following patches. This patch implements CONNECTED
>>>> and QUIT.
>>>>
>>>> QUIT means, that we should close the connection and fail all current
>>>> and further requests (like old quit = true).
>>>>
>>>> CONNECTED means that connection is ok, we can send requests (like old
>>>> quit = false).
>>>>
>>>> For receiving loop we use a comparison of the current state with QUIT,
>>>> because reconnect will be in the same loop, so it should be looping
>>>> until the end.
>>>>
>>>> Opposite, for requests we use a comparison of the current state with
>>>> CONNECTED, as we don't want to send requests in CONNECTING states (
>>>> which are unreachable now, but will be reachable after the following
>>>> commits)
>>>>
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>> ---
>>>>   block/nbd-client.h |  9 ++++++++-
>>>>   block/nbd-client.c | 55 ++++++++++++++++++++++++++++++++----------------------
>>>>   2 files changed, 41 insertions(+), 23 deletions(-)
>>>
>>> Dan just recently proposed patches to SocketChardev in general to use a
>>> state machine that distinguishes between connecting and connected:
>>>
>>> https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg03339.html
>>>
>>> I'm wondering how much of his work is related or can be reused to get
>>> restartable connections on NBD sockets?
>>
>> There's nothing really special about what I did. Vladimir looks to
>> have basically done the same kind of approach, but I don't think
>> there's real scope for sharing with chardevs, as each care about
>> their own set of states.
>>
>>> Remember, right now, the NBD code always starts in blocking mode, and
>>> does single-threaded handshaking until it is ready for transmission,
>>> then switches to non-blocking mode for all subsequent transmissions (so,
>>> for example, servicing a read request can assume that the socket is
>>> valid without further waiting).  But once we start allowing reconnects,
>>> a read request will need to detect when one socket has gone down, and
>>> wait for its replacement socket to come back up, in order to retry the
>>> request; this retry is in a context where we are in non-blocking
>>> context, but the retry must establish a new socket, and possibly convert
>>> the socket into TLS mode, all before being ready to retry the read request.
>>
>> That makes it sound like the NBD handshake needs to be converted to
>> use entirely non-blocking I/O.
>>
>> The TLS handshake already uses an asynchronous callback pattern and
>> to deal with that NBD had to create & run a private main loop to
>> complete the TLS handshake in its blocking code pattern.
>>
>> You could potentially push this concept up to the top level. ie
>> implement the entire NBD handshake with async callbacks / non-blocking
>> I/O. Then simply use a private main loop to run that in a blocking
>> fashion for the initial connection. When you need to do re-connect
>> you now just run the async code without the extra main loop around
>> it.
>>
> 
> Hmm, you mean this code:
> 
>      data.loop = g_main_loop_new(g_main_context_default(), FALSE);
>      trace_nbd_receive_starttls_tls_handshake();
>      qio_channel_tls_handshake(tioc,
>                                nbd_tls_handshake,
>                                &data,
>                                NULL,
>                                NULL);
> 
>      if (!data.complete) {
>          g_main_loop_run(data.loop);
>      }
>      g_main_loop_unref(data.loop);
> 
> 
> What does this do in the context of QEMU? Isn't it more correct to do
> coroutine-based async stuff, like in qcow2_open():
> 
>      if (qemu_in_coroutine()) {
>          /* From bdrv_co_create.  */
>          qcow2_open_entry(&qoc);
>      } else {
>          assert(qemu_get_current_aio_context() == qemu_get_aio_context());
>          qemu_coroutine_enter(qemu_coroutine_create(qcow2_open_entry, &qoc));
>          BDRV_POLL_WHILE(bs, qoc.ret == -EINPROGRESS);
>      }
>      return qoc.ret;
> 
> And then yield after handshake() and enter back from nbd_tls_handshake callback?
> 
> Hmm, also, I checked: nobody calls g_main_context_default() in QEMU, except
> util/main-loop.c, nbd and tests. So I'm not sure that this is a valid thing
> to do in nbd.
> 
> 

Aha, we just don't have any bs in the nbd/ code. But anyway, moving to AioContext
and making negotiation non-blocking should be a good idea. I'll try to do
something around it.


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v4 08/10] block/nbd: add cmdline and qapi parameter reconnect-delay
@ 2019-04-11 15:47       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-04-11 15:47 UTC (permalink / raw)
  To: Eric Blake, qemu-devel, qemu-block
  Cc: armbru, mreitz, kwolf, pbonzini, Denis Lunev

05.01.2019 1:25, Eric Blake wrote:
> On 7/31/18 12:30 PM, Vladimir Sementsov-Ogievskiy wrote:
>> Reconnect will be implemented in the following commit, so for now,
>> in semantics below, disconnect itself is a "serious error".
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   qapi/block-core.json | 12 +++++++++++-
>>   block/nbd-client.h   |  1 +
>>   block/nbd-client.c   |  1 +
>>   block/nbd.c          | 16 +++++++++++++++-
>>   4 files changed, 28 insertions(+), 2 deletions(-)
>>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index 5b9084a394..cf03402ec5 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -3511,13 +3511,23 @@
>>   #                  traditional "base:allocation" block status (see
>>   #                  NBD_OPT_LIST_META_CONTEXT in the NBD protocol) (since 3.0)
>>   #
>> +# @reconnect-delay: Reconnect delay. On disconnect, nbd client tries to connect
> 
> Maybe 'On unexpected disconnect', since intentional disconnect is not
> unexpected.
> 
>> +#                   again, until success or serious error. During first
>> +#                   @reconnect-delay seconds of reconnecting loop all requests
>> +#                   are paused and have a chance to rerun, if successful
>> +#                   connect occures during this time. After @reconnect-delay
> 
> occurs
> 
>> +#                   seconds all delayed requests are failed and all following
>> +#                   requests will be failed to (until successfull reconnect).
> 
> successful
> 
>> +#                   Default 300 seconds (Since 3.1)
> 
> My delay in reviewing means this now has to be 4.0.
> 
> I'm guessing that a delay of 0 means disable auto-reconnect.

I doubt it, as reconnect-delay=0 is a valid case where all requests fail
(pre-patch behavior), but we still try to reconnect in the background. Any reason
not to try? If so, I suggest one of the following:

1. treat absence of the option as no-reconnect-at-all, so that absence and 0
    would not be the same

2. use the value -1 as no-reconnect-at-all

3. add an additional bool option "reconnect"
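For illustration, the reconnect-delay semantics, together with the 1 - 2 - 4 - 8 - 16 - 16 second retry intervals mentioned in the cover letter, could be sketched as below. The function names are hypothetical and do not correspond to the actual QEMU implementation:

```python
def request_action(seconds_since_disconnect, reconnect_delay):
    """During the first reconnect_delay seconds of the reconnect loop,
    requests wait and may be retried after a successful reconnect;
    afterwards they fail immediately.  reconnect_delay == 0 makes every
    request fail right away (the pre-patch behavior), while the client
    may still keep reconnecting in the background."""
    if seconds_since_disconnect < reconnect_delay:
        return 'wait'   # connecting-wait: pause the request
    return 'fail'       # connecting-nowait: fail the request now

def reconnect_interval(attempt):
    """Seconds to wait before reconnect attempt N (0-based):
    1, 2, 4, 8, 16, 16, ... - exponential backoff capped at 16s."""
    return min(2 ** attempt, 16)
```

Under this sketch, option 1 or 2 above would simply bypass the reconnect loop entirely instead of calling request_action at all.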

>  From a
> backwards-compatibility standpoint, no auto-reconnect is more in line
> with what we previously had - but from a usability standpoint, trying to
> reconnect can avoid turning transient network hiccups into permanent
> loss of a device to EIO errors, especially if the retry timeout is long
> enough to allow an administrator to reroute the network to an
> alternative server.  So I'm probably okay with the default being
> non-zero - but it DOES mean that where you used to get instant EIO
> failures when a network connection was severed, you now have to wait for
> the reconnect delay to expire, and 5 minutes can be a long wait.  Since
> the long delay is guest-observable, can we run into issues where a guest
> that is currently used to instant EIO and total loss of the device could
> instead get confused by not getting any response for up to 5 minutes,
> whether or not that response eventually turns out to be EIO or a
> successful recovery?
> 
>> +++ b/block/nbd.c
>> @@ -360,6 +360,18 @@ static QemuOptsList nbd_runtime_opts = {
>>               .help = "experimental: expose named dirty bitmap in place of "
>>                       "block status",
>>           },
>> +        {
>> +            .name = "reconnect-delay",
>> +            .type = QEMU_OPT_NUMBER,
>> +            .help = "Reconnect delay. On disconnect, nbd client tries to"
>> +                    "connect again, until success or serious error. During"
>> +                    "first @reconnect-delay seconds of reconnecting loop all"
>> +                    "requests are paused and have a chance to rerun, if"
>> +                    "successful connect occures during this time. After"
>> +                    "@reconnect-delay seconds all delayed requests are failed"
>> +                    "and all following requests will be failed to (until"
>> +                    "successfull reconnect). Default 300 seconds",
> 
> Same typos as in qapi.
> 
> The UI aspects look fine, now I need to review the patch series for code
> issues :)
> 
> 


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 35+ messages in thread


* Re: [Qemu-devel] [PATCH v4 10/10] iotests: test nbd reconnect
@ 2019-04-11 16:02       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 35+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-04-11 16:02 UTC (permalink / raw)
  To: Eric Blake, qemu-devel, qemu-block
  Cc: armbru, mreitz, kwolf, pbonzini, Denis Lunev

16.01.2019 20:11, Eric Blake wrote:
> On 7/31/18 12:30 PM, Vladimir Sementsov-Ogievskiy wrote:
>> Add test, which starts backup to nbd target and restarts nbd server
>> during backup.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   tests/qemu-iotests/220        | 67 +++++++++++++++++++++++++++++++++++++++++++
>>   tests/qemu-iotests/220.out    |  7 +++++
>>   tests/qemu-iotests/group      |  1 +
>>   tests/qemu-iotests/iotests.py |  4 +++
>>   4 files changed, 79 insertions(+)
>>   create mode 100755 tests/qemu-iotests/220
>>   create mode 100644 tests/qemu-iotests/220.out
>>
> 
> Test 220 has been created in the meantime; the obvious resolution is to
> pick a new test number.
> 
>> diff --git a/tests/qemu-iotests/220 b/tests/qemu-iotests/220
>> new file mode 100755
>> index 0000000000..c9702a7dad
>> --- /dev/null
>> +++ b/tests/qemu-iotests/220
>> @@ -0,0 +1,67 @@
>> +#!/usr/bin/env python
> 
>> +
>> +import iotests
>> +from iotests import qemu_img_create, file_path, qemu_nbd_popen
>> +
>> +disk_a, disk_b, nbd_sock = file_path('disk_a', 'disk_b', 'nbd-sock')
>> +nbd_uri = 'nbd+unix:///exp?socket=' + nbd_sock
>> +
>> +qemu_img_create('-f', iotests.imgfmt, disk_a, '5M')
>> +qemu_img_create('-f', iotests.imgfmt, disk_b, '5M')
>> +srv = qemu_nbd_popen('-k', nbd_sock, '-x', 'exp', '-f', iotests.imgfmt, disk_b)
>> +time.sleep(1)
>> +
>> +vm = iotests.VM().add_drive(disk_a)
>> +vm.launch()
>> +vm.hmp_qemu_io('drive0', 'write 0 5M')
>> +
>> +print 'blockdev-add:', vm.qmp('blockdev-add', node_name='backup0', driver='raw',
>> +                              file={'driver':'nbd',
>> +                                    'export': 'exp',
>> +                                    'server': {'type': 'unix',
>> +                                               'path': nbd_sock}})
>> +print 'blockdev-backup:', vm.qmp('blockdev-backup', device='drive0',
>> +                                 sync='full', target='backup0')
>> +
>> +time.sleep(1)
>> +print 'Kill NBD server'
>> +srv.kill()
>> +
>> +jobs = vm.qmp('query-block-jobs')['return']
>> +if jobs and jobs[0]['offset'] < jobs[0]['len']:
>> +    print 'Backup job is still in progress'
>> +
>> +time.sleep(1)
> 
> That's a lot of sleep()s for a test marked quick. Are we sure it won't
> fail under heavy load? Can you make the test more reliable by looking
> for specific events rather than just a fixed-length sleep?

Hmm.. 3 seconds is still quick, I think.

Firstly, I want the NBD server to actually be started, but there is no event for
that. Then I want the backup to make some progress; again, there is no event for
progress. And the last one is the actual server-unavailable time. So I don't
have a better idea.
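For what it's worth, one way to reduce reliance on fixed sleeps is to poll for a condition with a deadline instead of sleeping a fixed amount; the test then waits longer under heavy load rather than failing. A generic sketch (not the actual iotests API):

```python
import time

def wait_until(condition, timeout=10.0, interval=0.1):
    """Poll condition() until it returns truthy or the deadline passes.
    Returns True on success, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return bool(condition())  # one last check at the deadline
```

For example, instead of time.sleep(1) before checking job progress, something like wait_until(lambda: jobs() and jobs()[0]['offset'] > 0) could be used, though the server-start case would still need some form of polling on the socket.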

> 
>> +
>> +print 'Start NBD server'
>> +srv = qemu_nbd_popen('-k', nbd_sock, '-x', 'exp', '-f', iotests.imgfmt, disk_b)
>> +
>> +try:
>> +    e = vm.event_wait('BLOCK_JOB_COMPLETED')
>> +    print e['event'], ':', e['data']
>> +except:
>> +    pass
>> +
>> +print 'blockdev-del:', vm.qmp('blockdev-del', node_name='backup0')
>> +srv.kill()
>> +vm.shutdown()
>> diff --git a/tests/qemu-iotests/220.out b/tests/qemu-iotests/220.out
>> new file mode 100644
>> index 0000000000..dae1a49d9f
>> --- /dev/null
>> +++ b/tests/qemu-iotests/220.out
>> @@ -0,0 +1,7 @@
>> +blockdev-add: {u'return': {}}
>> +blockdev-backup: {u'return': {}}
>> +Kill NBD server
>> +Backup job is still in progress
>> +Start NBD server
>> +BLOCK_JOB_COMPLETED : {u'device': u'drive0', u'type': u'backup', u'speed': 0, u'len': 5242880, u'offset': 5242880}
>> +blockdev-del: {u'return': {}}
>> diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
>> index b973dc842d..ee2473c6a3 100644
>> --- a/tests/qemu-iotests/group
>> +++ b/tests/qemu-iotests/group
>> @@ -219,6 +219,7 @@
>>   217 rw auto quick
>>   218 rw auto quick
>>   219 rw auto
>> +220 rw auto quick
>>   221 rw auto quick
>>   222 rw auto quick
>>   223 rw auto quick
>> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
>> index 4e67fbbe96..17bc8c8e32 100644
>> --- a/tests/qemu-iotests/iotests.py
>> +++ b/tests/qemu-iotests/iotests.py
>> @@ -185,6 +185,10 @@ def qemu_nbd(*args):
>>       '''Run qemu-nbd in daemon mode and return the parent's exit code'''
>>       return subprocess.call(qemu_nbd_args + ['--fork'] + list(args))
>>   
>> +def qemu_nbd_popen(*args):
>> +    '''Start a persistent qemu-nbd process and return its Popen object'''
>> +    return subprocess.Popen(qemu_nbd_args + ['--persistent'] + list(args))
>> +
>>   def compare_images(img1, img2, fmt1=imgfmt, fmt2=imgfmt):
>>       '''Return True if two image files are identical'''
>>       return qemu_img('compare', '-f', fmt1,
>>
> 


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 35+ messages in thread


end of thread, other threads:[~2019-04-11 16:04 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-31 17:30 [Qemu-devel] [PATCH v4 00/10] NBD reconnect Vladimir Sementsov-Ogievskiy
2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 01/10] block/nbd-client: split channel errors from export errors Vladimir Sementsov-Ogievskiy
2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 02/10] block/nbd: move connection code from block/nbd to block/nbd-client Vladimir Sementsov-Ogievskiy
2019-01-16 15:56   ` Eric Blake
2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 03/10] block/nbd-client: split connection from initialization Vladimir Sementsov-Ogievskiy
2019-01-16 15:52   ` Eric Blake
2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 04/10] block/nbd-client: fix nbd_reply_chunk_iter_receive Vladimir Sementsov-Ogievskiy
2019-01-16 16:01   ` Eric Blake
2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 05/10] block/nbd-client: don't check ioc Vladimir Sementsov-Ogievskiy
2019-01-16 16:05   ` Eric Blake
2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 06/10] block/nbd-client: move from quit to state Vladimir Sementsov-Ogievskiy
2019-01-16 16:25   ` Eric Blake
2019-01-16 16:58     ` Daniel P. Berrangé
2019-02-05 16:35       ` Vladimir Sementsov-Ogievskiy
2019-02-06  8:51         ` Vladimir Sementsov-Ogievskiy
2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 07/10] block/nbd-client: rename read_reply_co to connection_co Vladimir Sementsov-Ogievskiy
2019-01-16 16:35   ` Eric Blake
2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 08/10] block/nbd: add cmdline and qapi parameter reconnect-delay Vladimir Sementsov-Ogievskiy
2019-01-04 22:25   ` Eric Blake
2019-02-05 16:48     ` Vladimir Sementsov-Ogievskiy
2019-04-11 15:47     ` Vladimir Sementsov-Ogievskiy
2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 09/10] block/nbd-client: nbd reconnect Vladimir Sementsov-Ogievskiy
2018-11-02 12:39   ` Vladimir Sementsov-Ogievskiy
2019-01-16 17:04   ` Eric Blake
2019-02-05 17:07     ` Vladimir Sementsov-Ogievskiy
2019-02-05 17:15       ` Eric Blake
2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 10/10] iotests: test " Vladimir Sementsov-Ogievskiy
2019-01-16 17:11   ` Eric Blake
2019-04-11 16:02     ` Vladimir Sementsov-Ogievskiy
     [not found] ` <fc24ba9e-e325-6478-cb22-bc0a256c6e87@virtuozzo.com>
2018-10-09 19:33   ` [Qemu-devel] [Qemu-block] [PATCH v4 00/10] NBD reconnect John Snow
2018-10-09 21:59     ` Vladimir Sementsov-Ogievskiy
2018-12-12 10:33 ` [Qemu-devel] ping " Vladimir Sementsov-Ogievskiy
2018-12-29 12:23 ` [Qemu-devel] ping3 " Vladimir Sementsov-Ogievskiy
