All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/8] nbd minimal structured read
@ 2017-09-25 13:57 Vladimir Sementsov-Ogievskiy
  2017-09-25 13:57 ` [Qemu-devel] [PATCH 1/8] block/nbd-client: assert qiov len once in nbd_co_request Vladimir Sementsov-Ogievskiy
                   ` (8 more replies)
  0 siblings, 9 replies; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-09-25 13:57 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: Hmreitz, kwolf, pbonzini, eblake, vsementsov, den

Hi all. Here is minimal structured-read extension implementation for nbd.
It is needed mostly to implement nbd block-status extension over it, so
it is minimal.

It's based on my three patches, currently on pull-request stage:
      block/nbd-client: refactor nbd_co_receive_reply
      block/nbd-client: simplify check in nbd_co_receive_reply
      block/nbd-client: nbd_co_send_request: fix return code

clone: tag up-nbd-minimal-structured-read-v1 from https://src.openvz.org/scm/~vsementsov/qemu.git
online: https://src.openvz.org/users/vsementsov/repos/qemu/browse?at=up-nbd-minimal-structured-read-v1

Note: actually it's a continuation of a part of my series "nbd: BLOCK_STATUS",
but it was sent about half a year ago, so let's consider it as new series.

Vladimir Sementsov-Ogievskiy (8):
  block/nbd-client: assert qiov len once in nbd_co_request
  block/nbd-client: refactor nbd_co_receive_reply
  nbd: rename NBD_REPLY_MAGIC to NBD_SIMPLE_REPLY_MAGIC
  nbd-server: refactor simple reply sending
  nbd: header constants indenting
  nbd: Minimal structured read for server
  nbd/client: refactor nbd_receive_starttls
  nbd: Minimal structured read for client

 block/nbd-client.h                       |   2 +
 include/block/nbd.h                      |  67 +++++++--
 nbd/nbd-internal.h                       |  44 +++---
 block/nbd-client.c                       | 104 +++++++++++---
 nbd/client.c                             | 233 ++++++++++++++++++++++++++-----
 nbd/server.c                             | 189 +++++++++++++++++--------
 nbd/trace-events                         |   7 +-
 tests/qemu-iotests/nbd-fault-injector.py |   4 +-
 8 files changed, 507 insertions(+), 143 deletions(-)

-- 
2.11.1

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH 1/8] block/nbd-client: assert qiov len once in nbd_co_request
  2017-09-25 13:57 [Qemu-devel] [PATCH 0/8] nbd minimal structured read Vladimir Sementsov-Ogievskiy
@ 2017-09-25 13:57 ` Vladimir Sementsov-Ogievskiy
  2017-09-25 21:58   ` Eric Blake
  2017-09-25 13:57 ` [Qemu-devel] [PATCH 2/8] block/nbd-client: refactor nbd_co_receive_reply Vladimir Sementsov-Ogievskiy
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-09-25 13:57 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: Hmreitz, kwolf, pbonzini, eblake, vsementsov, den

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/nbd-client.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index 9d1e154feb..88fd10270e 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -156,7 +156,6 @@ static int nbd_co_send_request(BlockDriverState *bs,
         qio_channel_set_cork(s->ioc, true);
         rc = nbd_send_request(s->ioc, request);
         if (rc >= 0 && !s->quit) {
-            assert(request->len == iov_size(qiov->iov, qiov->niov));
             if (qio_channel_writev_all(s->ioc, qiov->iov, qiov->niov,
                                        NULL) < 0) {
                 rc = -EIO;
@@ -197,7 +196,6 @@ static int nbd_co_receive_reply(NBDClientSession *s,
         assert(s->reply.handle == request->handle);
         ret = -s->reply.error;
         if (qiov && s->reply.error == 0) {
-            assert(request->len == iov_size(qiov->iov, qiov->niov));
             if (qio_channel_readv_all(s->ioc, qiov->iov, qiov->niov,
                                       NULL) < 0) {
                 ret = -EIO;
@@ -233,6 +231,7 @@ static int nbd_co_request(BlockDriverState *bs,
 
     assert(!qiov || request->type == NBD_CMD_WRITE ||
            request->type == NBD_CMD_READ);
+    assert(!qiov || request->len == iov_size(qiov->iov, qiov->niov));
     ret = nbd_co_send_request(bs, request,
                               request->type == NBD_CMD_WRITE ? qiov : NULL);
     if (ret < 0) {
-- 
2.11.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH 2/8] block/nbd-client: refactor nbd_co_receive_reply
  2017-09-25 13:57 [Qemu-devel] [PATCH 0/8] nbd minimal structured read Vladimir Sementsov-Ogievskiy
  2017-09-25 13:57 ` [Qemu-devel] [PATCH 1/8] block/nbd-client: assert qiov len once in nbd_co_request Vladimir Sementsov-Ogievskiy
@ 2017-09-25 13:57 ` Vladimir Sementsov-Ogievskiy
  2017-09-25 21:59   ` Eric Blake
  2017-09-25 13:57 ` [Qemu-devel] [PATCH 3/8] nbd: rename NBD_REPLY_MAGIC to NBD_SIMPLE_REPLY_MAGIC Vladimir Sementsov-Ogievskiy
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-09-25 13:57 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: Hmreitz, kwolf, pbonzini, eblake, vsementsov, den

Pass handle parameter directly, as the whole request isn't needed.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/nbd-client.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index 88fd10270e..e4f0c789f4 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -180,11 +180,11 @@ err:
 }
 
 static int nbd_co_receive_reply(NBDClientSession *s,
-                                NBDRequest *request,
+                                uint64_t handle,
                                 QEMUIOVector *qiov)
 {
     int ret;
-    int i = HANDLE_TO_INDEX(s, request->handle);
+    int i = HANDLE_TO_INDEX(s, handle);
 
     /* Wait until we're woken up by nbd_read_reply_entry.  */
     s->requests[i].receiving = true;
@@ -193,7 +193,7 @@ static int nbd_co_receive_reply(NBDClientSession *s,
     if (!s->ioc || s->quit) {
         ret = -EIO;
     } else {
-        assert(s->reply.handle == request->handle);
+        assert(s->reply.handle == handle);
         ret = -s->reply.error;
         if (qiov && s->reply.error == 0) {
             if (qio_channel_readv_all(s->ioc, qiov->iov, qiov->niov,
@@ -238,7 +238,7 @@ static int nbd_co_request(BlockDriverState *bs,
         return ret;
     }
 
-    return nbd_co_receive_reply(client, request,
+    return nbd_co_receive_reply(client, request->handle,
                                 request->type == NBD_CMD_READ ? qiov : NULL);
 }
 
-- 
2.11.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH 3/8] nbd: rename NBD_REPLY_MAGIC to NBD_SIMPLE_REPLY_MAGIC
  2017-09-25 13:57 [Qemu-devel] [PATCH 0/8] nbd minimal structured read Vladimir Sementsov-Ogievskiy
  2017-09-25 13:57 ` [Qemu-devel] [PATCH 1/8] block/nbd-client: assert qiov len once in nbd_co_request Vladimir Sementsov-Ogievskiy
  2017-09-25 13:57 ` [Qemu-devel] [PATCH 2/8] block/nbd-client: refactor nbd_co_receive_reply Vladimir Sementsov-Ogievskiy
@ 2017-09-25 13:57 ` Vladimir Sementsov-Ogievskiy
  2017-09-25 13:57 ` [Qemu-devel] [PATCH 4/8] nbd-server: refactor simple reply sending Vladimir Sementsov-Ogievskiy
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-09-25 13:57 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: Hmreitz, kwolf, pbonzini, eblake, vsementsov, den

To be consistent when NBD_STRUCTURED_REPLY_MAGIC will be introduced.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 nbd/nbd-internal.h                       | 2 +-
 nbd/client.c                             | 4 ++--
 nbd/server.c                             | 4 ++--
 tests/qemu-iotests/nbd-fault-injector.py | 4 ++--
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index 8a609a227f..2d6663de23 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -47,7 +47,7 @@
 #define NBD_OLDSTYLE_NEGOTIATE_SIZE (8 + 8 + 8 + 4 + 124)
 
 #define NBD_REQUEST_MAGIC       0x25609513
-#define NBD_REPLY_MAGIC         0x67446698
+#define NBD_SIMPLE_REPLY_MAGIC  0x67446698
 #define NBD_OPTS_MAGIC          0x49484156454F5054LL
 #define NBD_CLIENT_MAGIC        0x0000420281861253LL
 #define NBD_REP_MAGIC           0x0003e889045565a9LL
diff --git a/nbd/client.c b/nbd/client.c
index 68a0bc1ffc..cd5a2c80ac 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -931,7 +931,7 @@ int nbd_receive_reply(QIOChannel *ioc, NBDReply *reply, Error **errp)
     }
 
     /* Reply
-       [ 0 ..  3]    magic   (NBD_REPLY_MAGIC)
+       [ 0 ..  3]    magic   (NBD_SIMPLE_REPLY_MAGIC)
        [ 4 ..  7]    error   (0 == no error)
        [ 7 .. 15]    handle
      */
@@ -949,7 +949,7 @@ int nbd_receive_reply(QIOChannel *ioc, NBDReply *reply, Error **errp)
     }
     trace_nbd_receive_reply(magic, reply->error, reply->handle);
 
-    if (magic != NBD_REPLY_MAGIC) {
+    if (magic != NBD_SIMPLE_REPLY_MAGIC) {
         error_setg(errp, "invalid magic (got 0x%" PRIx32 ")", magic);
         return -EINVAL;
     }
diff --git a/nbd/server.c b/nbd/server.c
index 993ade30bb..a1b21a6951 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -911,11 +911,11 @@ static int nbd_send_reply(QIOChannel *ioc, NBDReply *reply, Error **errp)
     trace_nbd_send_reply(reply->error, reply->handle);
 
     /* Reply
-       [ 0 ..  3]    magic   (NBD_REPLY_MAGIC)
+       [ 0 ..  3]    magic   (NBD_SIMPLE_REPLY_MAGIC)
        [ 4 ..  7]    error   (0 == no error)
        [ 7 .. 15]    handle
      */
-    stl_be_p(buf, NBD_REPLY_MAGIC);
+    stl_be_p(buf, NBD_SIMPLE_REPLY_MAGIC);
     stl_be_p(buf + 4, reply->error);
     stq_be_p(buf + 8, reply->handle);
 
diff --git a/tests/qemu-iotests/nbd-fault-injector.py b/tests/qemu-iotests/nbd-fault-injector.py
index 1c10dcb51c..8a04d979aa 100755
--- a/tests/qemu-iotests/nbd-fault-injector.py
+++ b/tests/qemu-iotests/nbd-fault-injector.py
@@ -56,7 +56,7 @@ NBD_CMD_READ = 0
 NBD_CMD_WRITE = 1
 NBD_CMD_DISC = 2
 NBD_REQUEST_MAGIC = 0x25609513
-NBD_REPLY_MAGIC = 0x67446698
+NBD_SIMPLE_REPLY_MAGIC = 0x67446698
 NBD_PASSWD = 0x4e42444d41474943
 NBD_OPTS_MAGIC = 0x49484156454F5054
 NBD_CLIENT_MAGIC = 0x0000420281861253
@@ -166,7 +166,7 @@ def read_request(conn):
     return req
 
 def write_reply(conn, error, handle):
-    buf = reply_struct.pack(NBD_REPLY_MAGIC, error, handle)
+    buf = reply_struct.pack(NBD_SIMPLE_REPLY_MAGIC, error, handle)
     conn.send(buf, event='reply')
 
 def handle_connection(conn, use_export):
-- 
2.11.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH 4/8] nbd-server: refactor simple reply sending
  2017-09-25 13:57 [Qemu-devel] [PATCH 0/8] nbd minimal structured read Vladimir Sementsov-Ogievskiy
                   ` (2 preceding siblings ...)
  2017-09-25 13:57 ` [Qemu-devel] [PATCH 3/8] nbd: rename NBD_REPLY_MAGIC to NBD_SIMPLE_REPLY_MAGIC Vladimir Sementsov-Ogievskiy
@ 2017-09-25 13:57 ` Vladimir Sementsov-Ogievskiy
  2017-09-25 13:57 ` [Qemu-devel] [PATCH 5/8] nbd: header constants indenting Vladimir Sementsov-Ogievskiy
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-09-25 13:57 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: Hmreitz, kwolf, pbonzini, eblake, vsementsov, den

Get rid of calculating structure fields offsets by hand and set_cork,
rename nbd_co_send_reply to nbd_co_send_simple_reply. Do not use
NBDReply which will be upgraded in future patches to handle both simple
and structured replies and will be used only in the client.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/nbd.h |  6 ++++
 nbd/nbd-internal.h  |  9 +++++
 nbd/server.c        | 97 ++++++++++++++++++++++-------------------------------
 3 files changed, 56 insertions(+), 56 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 707fd37575..49008980f4 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -63,6 +63,12 @@ struct NBDReply {
 };
 typedef struct NBDReply NBDReply;
 
+typedef struct NBDSimpleReply {
+    uint32_t magic;  /* NBD_SIMPLE_REPLY_MAGIC */
+    uint32_t error;
+    uint64_t handle;
+} QEMU_PACKED NBDSimpleReply;
+
 /* Transmission (export) flags: sent from server to client during handshake,
    but describe what will happen during transmission */
 #define NBD_FLAG_HAS_FLAGS      (1 << 0)        /* Flags are there */
diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index 2d6663de23..320abef296 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -113,6 +113,15 @@ static inline int nbd_write(QIOChannel *ioc, const void *buffer, size_t size,
     return qio_channel_write_all(ioc, buffer, size, errp) < 0 ? -EIO : 0;
 }
 
+/* nbd_writev
+ * Writes @size bytes to @ioc. Returns 0 on success.
+ */
+static inline int nbd_writev(QIOChannel *ioc, const struct iovec *iov,
+                             size_t niov, Error **errp)
+{
+    return qio_channel_writev_all(ioc, iov, niov, errp) < 0 ? -EIO : 0;
+}
+
 struct NBDTLSHandshakeData {
     GMainLoop *loop;
     bool complete;
diff --git a/nbd/server.c b/nbd/server.c
index a1b21a6951..57d5598e0f 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -902,26 +902,6 @@ static int nbd_receive_request(QIOChannel *ioc, NBDRequest *request,
     return 0;
 }
 
-static int nbd_send_reply(QIOChannel *ioc, NBDReply *reply, Error **errp)
-{
-    uint8_t buf[NBD_REPLY_SIZE];
-
-    reply->error = system_errno_to_nbd_errno(reply->error);
-
-    trace_nbd_send_reply(reply->error, reply->handle);
-
-    /* Reply
-       [ 0 ..  3]    magic   (NBD_SIMPLE_REPLY_MAGIC)
-       [ 4 ..  7]    error   (0 == no error)
-       [ 7 .. 15]    handle
-     */
-    stl_be_p(buf, NBD_SIMPLE_REPLY_MAGIC);
-    stl_be_p(buf + 4, reply->error);
-    stq_be_p(buf + 8, reply->handle);
-
-    return nbd_write(ioc, buf, sizeof(buf), errp);
-}
-
 #define MAX_NBD_REQUESTS 16
 
 void nbd_client_get(NBDClient *client)
@@ -1208,38 +1188,51 @@ void nbd_export_close_all(void)
     }
 }
 
-static int nbd_co_send_reply(NBDRequestData *req, NBDReply *reply, int len,
-                             Error **errp)
+static int coroutine_fn nbd_co_send_iov(NBDClient *client, struct iovec *iov,
+                                        unsigned niov, Error **errp)
 {
-    NBDClient *client = req->client;
     int ret;
 
     g_assert(qemu_in_coroutine());
-
-    trace_nbd_co_send_reply(reply->handle, reply->error, len);
-
     qemu_co_mutex_lock(&client->send_lock);
     client->send_coroutine = qemu_coroutine_self();
 
-    if (!len) {
-        ret = nbd_send_reply(client->ioc, reply, errp);
-    } else {
-        qio_channel_set_cork(client->ioc, true);
-        ret = nbd_send_reply(client->ioc, reply, errp);
-        if (ret == 0) {
-            ret = nbd_write(client->ioc, req->data, len, errp);
-            if (ret < 0) {
-                ret = -EIO;
-            }
-        }
-        qio_channel_set_cork(client->ioc, false);
-    }
+    ret = nbd_writev(client->ioc, iov, niov, errp);
 
     client->send_coroutine = NULL;
     qemu_co_mutex_unlock(&client->send_lock);
+
     return ret;
 }
 
+static inline void set_be_simple_reply(NBDSimpleReply *reply, uint64_t error,
+                                       uint64_t handle)
+{
+    stl_be_p(&reply->magic, NBD_SIMPLE_REPLY_MAGIC);
+    stl_be_p(&reply->error, error);
+    stq_be_p(&reply->handle, handle);
+}
+
+static int nbd_co_send_simple_reply(NBDClient *client,
+                                    uint64_t handle,
+                                    uint32_t error,
+                                    void *data,
+                                    size_t size,
+                                    Error **errp)
+{
+    NBDSimpleReply reply;
+    struct iovec iov[] = {
+        {.iov_base = &reply, .iov_len = sizeof(reply)},
+        {.iov_base = data, .iov_len = size}
+    };
+
+    trace_nbd_co_send_reply(handle, error, size);
+
+    set_be_simple_reply(&reply, system_errno_to_nbd_errno(error), handle);
+
+    return nbd_co_send_iov(client, iov, size ? 2 : 1, errp);
+}
+
 /* nbd_co_receive_request
  * Collect a client request. Return 0 if request looks valid, -EIO to drop
  * connection right away, and any other negative value to report an error to
@@ -1331,7 +1324,6 @@ static coroutine_fn void nbd_trip(void *opaque)
     NBDExport *exp = client->exp;
     NBDRequestData *req;
     NBDRequest request = { 0 };    /* GCC thinks it can be used uninitialized */
-    NBDReply reply;
     int ret;
     int flags;
     int reply_data_len = 0;
@@ -1351,11 +1343,7 @@ static coroutine_fn void nbd_trip(void *opaque)
         goto disconnect;
     }
 
-    reply.handle = request.handle;
-    reply.error = 0;
-
     if (ret < 0) {
-        reply.error = -ret;
         goto reply;
     }
 
@@ -1374,7 +1362,6 @@ static coroutine_fn void nbd_trip(void *opaque)
             ret = blk_co_flush(exp->blk);
             if (ret < 0) {
                 error_setg_errno(&local_err, -ret, "flush failed");
-                reply.error = -ret;
                 break;
             }
         }
@@ -1383,7 +1370,6 @@ static coroutine_fn void nbd_trip(void *opaque)
                         req->data, request.len);
         if (ret < 0) {
             error_setg_errno(&local_err, -ret, "reading from file failed");
-            reply.error = -ret;
             break;
         }
 
@@ -1392,7 +1378,7 @@ static coroutine_fn void nbd_trip(void *opaque)
         break;
     case NBD_CMD_WRITE:
         if (exp->nbdflags & NBD_FLAG_READ_ONLY) {
-            reply.error = EROFS;
+            ret = -EROFS;
             break;
         }
 
@@ -1404,14 +1390,13 @@ static coroutine_fn void nbd_trip(void *opaque)
                          req->data, request.len, flags);
         if (ret < 0) {
             error_setg_errno(&local_err, -ret, "writing to file failed");
-            reply.error = -ret;
         }
 
         break;
     case NBD_CMD_WRITE_ZEROES:
         if (exp->nbdflags & NBD_FLAG_READ_ONLY) {
             error_setg(&local_err, "Server is read-only, return error");
-            reply.error = EROFS;
+            ret = -EROFS;
             break;
         }
 
@@ -1426,7 +1411,6 @@ static coroutine_fn void nbd_trip(void *opaque)
                                 request.len, flags);
         if (ret < 0) {
             error_setg_errno(&local_err, -ret, "writing to file failed");
-            reply.error = -ret;
         }
 
         break;
@@ -1438,7 +1422,6 @@ static coroutine_fn void nbd_trip(void *opaque)
         ret = blk_co_flush(exp->blk);
         if (ret < 0) {
             error_setg_errno(&local_err, -ret, "flush failed");
-            reply.error = -ret;
         }
 
         break;
@@ -1447,25 +1430,27 @@ static coroutine_fn void nbd_trip(void *opaque)
                               request.len);
         if (ret < 0) {
             error_setg_errno(&local_err, -ret, "discard failed");
-            reply.error = -ret;
         }
 
         break;
     default:
         error_setg(&local_err, "invalid request type (%" PRIu32 ") received",
                    request.type);
-        reply.error = EINVAL;
+        ret = -EINVAL;
     }
 
 reply:
     if (local_err) {
-        /* If we are here local_err is not fatal error, already stored in
-         * reply.error */
+        /* If we are here local_err is not fatal error, which should be sent
+         * to client. */
         error_report_err(local_err);
         local_err = NULL;
     }
 
-    if (nbd_co_send_reply(req, &reply, reply_data_len, &local_err) < 0) {
+    if (nbd_co_send_simple_reply(req->client, request.handle,
+                                 ret < 0 ? -ret : 0,
+                                 req->data, reply_data_len, &local_err) < 0)
+    {
         error_prepend(&local_err, "Failed to send reply: ");
         goto disconnect;
     }
-- 
2.11.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH 5/8] nbd: header constants indenting
  2017-09-25 13:57 [Qemu-devel] [PATCH 0/8] nbd minimal structured read Vladimir Sementsov-Ogievskiy
                   ` (3 preceding siblings ...)
  2017-09-25 13:57 ` [Qemu-devel] [PATCH 4/8] nbd-server: refactor simple reply sending Vladimir Sementsov-Ogievskiy
@ 2017-09-25 13:57 ` Vladimir Sementsov-Ogievskiy
  2017-09-25 13:57 ` [Qemu-devel] [PATCH 6/8] nbd: Minimal structured read for server Vladimir Sementsov-Ogievskiy
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-09-25 13:57 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: Hmreitz, kwolf, pbonzini, eblake, vsementsov, den

Prepare indenting for the following commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/nbd.h | 15 ++++++++-------
 nbd/nbd-internal.h  | 34 +++++++++++++++++-----------------
 2 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 49008980f4..a6df5ce8b5 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -71,13 +71,14 @@ typedef struct NBDSimpleReply {
 
 /* Transmission (export) flags: sent from server to client during handshake,
    but describe what will happen during transmission */
-#define NBD_FLAG_HAS_FLAGS      (1 << 0)        /* Flags are there */
-#define NBD_FLAG_READ_ONLY      (1 << 1)        /* Device is read-only */
-#define NBD_FLAG_SEND_FLUSH     (1 << 2)        /* Send FLUSH */
-#define NBD_FLAG_SEND_FUA       (1 << 3)        /* Send FUA (Force Unit Access) */
-#define NBD_FLAG_ROTATIONAL     (1 << 4)        /* Use elevator algorithm - rotational media */
-#define NBD_FLAG_SEND_TRIM      (1 << 5)        /* Send TRIM (discard) */
-#define NBD_FLAG_SEND_WRITE_ZEROES (1 << 6)     /* Send WRITE_ZEROES */
+#define NBD_FLAG_HAS_FLAGS         (1 << 0) /* Flags are there */
+#define NBD_FLAG_READ_ONLY         (1 << 1) /* Device is read-only */
+#define NBD_FLAG_SEND_FLUSH        (1 << 2) /* Send FLUSH */
+#define NBD_FLAG_SEND_FUA          (1 << 3) /* Send FUA (Force Unit Access) */
+#define NBD_FLAG_ROTATIONAL        (1 << 4) /* Use elevator algorithm -
+                                               rotational media */
+#define NBD_FLAG_SEND_TRIM         (1 << 5) /* Send TRIM (discard) */
+#define NBD_FLAG_SEND_WRITE_ZEROES (1 << 6) /* Send WRITE_ZEROES */
 
 /* New-style handshake (global) flags, sent from server to client, and
    control what will happen during handshake phase. */
diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index 320abef296..d96c9cc7fd 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -46,23 +46,23 @@
 /* Size of oldstyle negotiation */
 #define NBD_OLDSTYLE_NEGOTIATE_SIZE (8 + 8 + 8 + 4 + 124)
 
-#define NBD_REQUEST_MAGIC       0x25609513
-#define NBD_SIMPLE_REPLY_MAGIC  0x67446698
-#define NBD_OPTS_MAGIC          0x49484156454F5054LL
-#define NBD_CLIENT_MAGIC        0x0000420281861253LL
-#define NBD_REP_MAGIC           0x0003e889045565a9LL
-
-#define NBD_SET_SOCK            _IO(0xab, 0)
-#define NBD_SET_BLKSIZE         _IO(0xab, 1)
-#define NBD_SET_SIZE            _IO(0xab, 2)
-#define NBD_DO_IT               _IO(0xab, 3)
-#define NBD_CLEAR_SOCK          _IO(0xab, 4)
-#define NBD_CLEAR_QUE           _IO(0xab, 5)
-#define NBD_PRINT_DEBUG         _IO(0xab, 6)
-#define NBD_SET_SIZE_BLOCKS     _IO(0xab, 7)
-#define NBD_DISCONNECT          _IO(0xab, 8)
-#define NBD_SET_TIMEOUT         _IO(0xab, 9)
-#define NBD_SET_FLAGS           _IO(0xab, 10)
+#define NBD_REQUEST_MAGIC           0x25609513
+#define NBD_SIMPLE_REPLY_MAGIC      0x67446698
+#define NBD_OPTS_MAGIC              0x49484156454F5054LL
+#define NBD_CLIENT_MAGIC            0x0000420281861253LL
+#define NBD_REP_MAGIC               0x0003e889045565a9LL
+
+#define NBD_SET_SOCK                _IO(0xab, 0)
+#define NBD_SET_BLKSIZE             _IO(0xab, 1)
+#define NBD_SET_SIZE                _IO(0xab, 2)
+#define NBD_DO_IT                   _IO(0xab, 3)
+#define NBD_CLEAR_SOCK              _IO(0xab, 4)
+#define NBD_CLEAR_QUE               _IO(0xab, 5)
+#define NBD_PRINT_DEBUG             _IO(0xab, 6)
+#define NBD_SET_SIZE_BLOCKS         _IO(0xab, 7)
+#define NBD_DISCONNECT              _IO(0xab, 8)
+#define NBD_SET_TIMEOUT             _IO(0xab, 9)
+#define NBD_SET_FLAGS               _IO(0xab, 10)
 
 /* NBD errors are based on errno numbers, so there is a 1:1 mapping,
  * but only a limited set of errno values is specified in the protocol.
-- 
2.11.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH 6/8] nbd: Minimal structured read for server
  2017-09-25 13:57 [Qemu-devel] [PATCH 0/8] nbd minimal structured read Vladimir Sementsov-Ogievskiy
                   ` (4 preceding siblings ...)
  2017-09-25 13:57 ` [Qemu-devel] [PATCH 5/8] nbd: header constants indenting Vladimir Sementsov-Ogievskiy
@ 2017-09-25 13:57 ` Vladimir Sementsov-Ogievskiy
  2017-09-25 13:58 ` [Qemu-devel] [PATCH 7/8] nbd/client: refactor nbd_receive_starttls Vladimir Sementsov-Ogievskiy
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-09-25 13:57 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: Hmreitz, kwolf, pbonzini, eblake, vsementsov, den

Minimal implementation of structured read: one structured reply chunk,
no segmentation.
Minimal structured error implementation: no text message.
Support DF flag, but just ignore it, as there is no segmentation any
way.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/nbd.h |  31 ++++++++++++++++
 nbd/nbd-internal.h  |   1 +
 nbd/server.c        | 100 ++++++++++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 126 insertions(+), 6 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index a6df5ce8b5..314f2f9bbc 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -69,6 +69,25 @@ typedef struct NBDSimpleReply {
     uint64_t handle;
 } QEMU_PACKED NBDSimpleReply;
 
+typedef struct NBDStructuredReplyChunk {
+    uint32_t magic;  /* NBD_STRUCTURED_REPLY_MAGIC */
+    uint16_t flags;  /* combination of NBD_SREP_FLAG_* */
+    uint16_t type;   /* NBD_SREP_TYPE_* */
+    uint64_t handle; /* request handle */
+    uint32_t length; /* length of payload */
+} QEMU_PACKED NBDStructuredReplyChunk;
+
+typedef struct NBDStructuredRead {
+    NBDStructuredReplyChunk h;
+    uint64_t offset;
+} QEMU_PACKED NBDStructuredRead;
+
+typedef struct NBDStructuredError {
+    NBDStructuredReplyChunk h;
+    uint32_t error;
+    uint16_t message_length;
+} QEMU_PACKED NBDStructuredError;
+
 /* Transmission (export) flags: sent from server to client during handshake,
    but describe what will happen during transmission */
 #define NBD_FLAG_HAS_FLAGS         (1 << 0) /* Flags are there */
@@ -79,6 +98,7 @@ typedef struct NBDSimpleReply {
                                                rotational media */
 #define NBD_FLAG_SEND_TRIM         (1 << 5) /* Send TRIM (discard) */
 #define NBD_FLAG_SEND_WRITE_ZEROES (1 << 6) /* Send WRITE_ZEROES */
+#define NBD_FLAG_SEND_DF           (1 << 7) /* Send DF (Do not Fragment) */
 
 /* New-style handshake (global) flags, sent from server to client, and
    control what will happen during handshake phase. */
@@ -125,6 +145,7 @@ typedef struct NBDSimpleReply {
 /* Request flags, sent from client to server during transmission phase */
 #define NBD_CMD_FLAG_FUA        (1 << 0) /* 'force unit access' during write */
 #define NBD_CMD_FLAG_NO_HOLE    (1 << 1) /* don't punch hole on zero run */
+#define NBD_CMD_FLAG_DF         (1 << 2) /* don't fragment structured read */
 
 /* Supported request types */
 enum {
@@ -149,6 +170,16 @@ enum {
  * aren't overflowing some other buffer. */
 #define NBD_MAX_NAME_SIZE 256
 
+/* Structured reply flags */
+#define NBD_SREP_FLAG_DONE          (1 << 0) /* This reply-chunk is last */
+
+/* Structured reply types */
+#define NBD_SREP_ERR(value)         ((1 << 15) | (value))
+
+#define NBD_SREP_TYPE_NONE          0
+#define NBD_SREP_TYPE_OFFSET_DATA   1
+#define NBD_SREP_TYPE_ERROR         NBD_SREP_ERR(1)
+
 /* Details collected by NBD_OPT_EXPORT_NAME and NBD_OPT_GO */
 struct NBDExportInfo {
     /* Set by client before nbd_receive_negotiate() */
diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index d96c9cc7fd..6b0d1183ba 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -48,6 +48,7 @@
 
 #define NBD_REQUEST_MAGIC           0x25609513
 #define NBD_SIMPLE_REPLY_MAGIC      0x67446698
+#define NBD_STRUCTURED_REPLY_MAGIC  0x668e33ef
 #define NBD_OPTS_MAGIC              0x49484156454F5054LL
 #define NBD_CLIENT_MAGIC            0x0000420281861253LL
 #define NBD_REP_MAGIC               0x0003e889045565a9LL
diff --git a/nbd/server.c b/nbd/server.c
index 57d5598e0f..0af94a293d 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -98,6 +98,8 @@ struct NBDClient {
     QTAILQ_ENTRY(NBDClient) next;
     int nb_requests;
     bool closing;
+
+    bool structured_reply;
 };
 
 /* That's all folks */
@@ -760,6 +762,20 @@ static int nbd_negotiate_options(NBDClient *client, uint16_t myflags,
                     return ret;
                 }
                 break;
+
+            case NBD_OPT_STRUCTURED_REPLY:
+                if (client->structured_reply) {
+                    error_setg(errp, "Double negotiation of structured reply");
+                    return -EINVAL;
+                }
+                ret = nbd_negotiate_send_rep(client->ioc, NBD_REP_ACK, option,
+                                             errp);
+                if (ret < 0) {
+                    return ret;
+                }
+                client->structured_reply = true;
+                break;
+
             default:
                 if (nbd_drop(client->ioc, length, errp) < 0) {
                     return -EIO;
@@ -1233,6 +1249,61 @@ static int nbd_co_send_simple_reply(NBDClient *client,
     return nbd_co_send_iov(client, iov, size ? 2 : 1, errp);
 }
 
+static inline void set_be_chunk(NBDStructuredReplyChunk *chunk, uint16_t flags,
+                                uint16_t type, uint64_t handle, uint32_t length)
+{
+    stl_be_p(&chunk->magic, NBD_STRUCTURED_REPLY_MAGIC);
+    stw_be_p(&chunk->flags, flags);
+    stw_be_p(&chunk->type, type);
+    stq_be_p(&chunk->handle, handle);
+    stl_be_p(&chunk->length, length);
+}
+
+static inline int coroutine_fn nbd_co_send_buf(NBDClient *client, void *buf,
+                                               size_t size, Error **errp)
+{
+    struct iovec iov[] = {
+        {.iov_base = buf, .iov_len = size}
+    };
+
+    return nbd_co_send_iov(client, iov, 1, errp);
+}
+
+static int coroutine_fn nbd_co_send_structured_read(NBDClient *client,
+                                                    uint64_t handle,
+                                                    uint64_t offset,
+                                                    void *data,
+                                                    size_t size,
+                                                    Error **errp)
+{
+    NBDStructuredRead chunk;
+    struct iovec iov[] = {
+        {.iov_base = &chunk, .iov_len = sizeof(chunk)},
+        {.iov_base = data, .iov_len = size}
+    };
+
+    set_be_chunk(&chunk.h, NBD_SREP_FLAG_DONE, NBD_SREP_TYPE_OFFSET_DATA,
+                 handle, sizeof(chunk) - sizeof(chunk.h) + size);
+    stq_be_p(&chunk.offset, offset);
+
+    return nbd_co_send_iov(client, iov, 2, errp);
+}
+
+static int coroutine_fn nbd_co_send_structured_error(NBDClient *client,
+                                                     uint64_t handle,
+                                                     uint32_t error,
+                                                     Error **errp)
+{
+    NBDStructuredError chunk;
+
+    set_be_chunk(&chunk.h, NBD_SREP_FLAG_DONE, NBD_SREP_TYPE_ERROR, handle,
+                 sizeof(chunk) - sizeof(chunk.h));
+    stl_be_p(&chunk.error, error);
+    stw_be_p(&chunk.message_length, 0);
+
+    return nbd_co_send_buf(client, &chunk, sizeof(chunk), errp);
+}
+
 /* nbd_co_receive_request
  * Collect a client request. Return 0 if request looks valid, -EIO to drop
  * connection right away, and any other negative value to report an error to
@@ -1304,10 +1375,17 @@ static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request,
                    (uint64_t)client->exp->size);
         return request->type == NBD_CMD_WRITE ? -ENOSPC : -EINVAL;
     }
-    if (request->flags & ~(NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_NO_HOLE)) {
+    if (request->flags & ~(NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_NO_HOLE |
+                           NBD_CMD_FLAG_DF))
+    {
         error_setg(errp, "unsupported flags (got 0x%x)", request->flags);
         return -EINVAL;
     }
+    if (request->type != NBD_CMD_READ && (request->flags & NBD_CMD_FLAG_DF)) {
+        error_setg(errp, "DF flag used with command %d (%s) which is not READ",
+                   request->type, nbd_cmd_lookup(request->type));
+        return -EINVAL;
+    }
     if (request->type != NBD_CMD_WRITE_ZEROES &&
         (request->flags & NBD_CMD_FLAG_NO_HOLE)) {
         error_setg(errp, "unexpected flags (got 0x%x)", request->flags);
@@ -1374,7 +1452,6 @@ static coroutine_fn void nbd_trip(void *opaque)
         }
 
         reply_data_len = request.len;
-
         break;
     case NBD_CMD_WRITE:
         if (exp->nbdflags & NBD_FLAG_READ_ONLY) {
@@ -1447,10 +1524,21 @@ reply:
         local_err = NULL;
     }
 
-    if (nbd_co_send_simple_reply(req->client, request.handle,
-                                 ret < 0 ? -ret : 0,
-                                 req->data, reply_data_len, &local_err) < 0)
-    {
+    if (client->structured_reply && request.type == NBD_CMD_READ) {
+        if (ret < 0) {
+            ret = nbd_co_send_structured_error(req->client, request.handle,
+                                               -ret, &local_err);
+        } else {
+            ret = nbd_co_send_structured_read(req->client, request.handle,
+                                              request.from, req->data,
+                                              reply_data_len, &local_err);
+        }
+    } else {
+        ret = nbd_co_send_simple_reply(req->client, request.handle,
+                                       ret < 0 ? -ret : 0,
+                                       req->data, reply_data_len, &local_err);
+    }
+    if (ret < 0) {
         error_prepend(&local_err, "Failed to send reply: ");
         goto disconnect;
     }
-- 
2.11.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH 7/8] nbd/client: refactor nbd_receive_starttls
  2017-09-25 13:57 [Qemu-devel] [PATCH 0/8] nbd minimal structured read Vladimir Sementsov-Ogievskiy
                   ` (5 preceding siblings ...)
  2017-09-25 13:57 ` [Qemu-devel] [PATCH 6/8] nbd: Minimal structured read for server Vladimir Sementsov-Ogievskiy
@ 2017-09-25 13:58 ` Vladimir Sementsov-Ogievskiy
  2017-09-25 13:58 ` [Qemu-devel] [PATCH 8/8] nbd: Minimal structured read for client Vladimir Sementsov-Ogievskiy
  2017-09-25 14:06 ` [Qemu-devel] [PATCH 0/8] nbd minimal structured read Vladimir Sementsov-Ogievskiy
  8 siblings, 0 replies; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-09-25 13:58 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: Hmreitz, kwolf, pbonzini, eblake, vsementsov, den

Split out nbd_receive_simple_option to be reused for structured reply
option.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 nbd/client.c     | 64 ++++++++++++++++++++++++++++++++++++++++----------------
 nbd/trace-events |  7 ++++---
 2 files changed, 50 insertions(+), 21 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index cd5a2c80ac..51ae492e92 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -540,35 +540,63 @@ static int nbd_receive_query_exports(QIOChannel *ioc,
     }
 }
 
-static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
-                                        QCryptoTLSCreds *tlscreds,
-                                        const char *hostname, Error **errp)
+/* nbd_request_simple_option
+ * return 1 for successful negotiation,
+ *        0 if operation is unsupported,
+ *        -1 with errp set for any other error
+ */
+static int nbd_request_simple_option(QIOChannel *ioc, int opt, Error **errp)
 {
     nbd_opt_reply reply;
-    QIOChannelTLS *tioc;
-    struct NBDTLSHandshakeData data = { 0 };
 
-    trace_nbd_receive_starttls_request();
-    if (nbd_send_option_request(ioc, NBD_OPT_STARTTLS, 0, NULL, errp) < 0) {
-        return NULL;
+    trace_nbd_receive_simple_option_request(opt, nbd_opt_lookup(opt));
+    if (nbd_send_option_request(ioc, opt, 0, NULL, errp) < 0) {
+        return -1;
     }
 
-    trace_nbd_receive_starttls_reply();
-    if (nbd_receive_option_reply(ioc, NBD_OPT_STARTTLS, &reply, errp) < 0) {
-        return NULL;
+    trace_nbd_receive_simple_option_reply(opt, nbd_opt_lookup(opt));
+    if (nbd_receive_option_reply(ioc, opt, &reply, errp) < 0) {
+        return -1;
     }
 
-    if (reply.type != NBD_REP_ACK) {
-        error_setg(errp, "Server rejected request to start TLS %" PRIx32,
-                   reply.type);
+    if (reply.length != 0) {
+        error_setg(errp, "Option %d ('%s') response length is %" PRIu32
+                   " (it should be zero)", opt, nbd_opt_lookup(opt),
+                   reply.length);
         nbd_send_opt_abort(ioc);
-        return NULL;
+        return -1;
     }
 
-    if (reply.length != 0) {
-        error_setg(errp, "Start TLS response was not zero %" PRIu32,
-                   reply.length);
+    if (reply.type == NBD_REP_ERR_UNSUP) {
+        return 1;
+    }
+
+    if (reply.type != NBD_REP_ACK) {
+        error_setg(errp, "Server rejected request for option %d (%s) "
+                   "with reply %" PRIx32 " (%s)", opt, nbd_opt_lookup(opt),
+                   reply.type, nbd_rep_lookup(reply.type));
         nbd_send_opt_abort(ioc);
+        return -1;
+    }
+
+    trace_nbd_receive_simple_option_approved(opt, nbd_opt_lookup(opt));
+    return 0;
+}
+
+static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
+                                        QCryptoTLSCreds *tlscreds,
+                                        const char *hostname, Error **errp)
+{
+    int ret;
+    QIOChannelTLS *tioc;
+    struct NBDTLSHandshakeData data = { 0 };
+
+    ret = nbd_request_simple_option(ioc, NBD_OPT_STARTTLS, errp);
+    if (ret <= 0) {
+        if (ret == 0) {
+            error_setg(errp, "Server don't support STARTTLS option");
+            nbd_send_opt_abort(ioc);
+        }
         return NULL;
     }
 
diff --git a/nbd/trace-events b/nbd/trace-events
index 48a4f27682..ea44e6963f 100644
--- a/nbd/trace-events
+++ b/nbd/trace-events
@@ -9,9 +9,10 @@ nbd_opt_go_info_unknown(int info, const char *name) "Ignoring unknown info %d (%
 nbd_opt_go_info_block_size(uint32_t minimum, uint32_t preferred, uint32_t maximum) "Block sizes are 0x%" PRIx32 ", 0x%" PRIx32 ", 0x%" PRIx32
 nbd_receive_query_exports_start(const char *wantname) "Querying export list for '%s'"
 nbd_receive_query_exports_success(const char *wantname) "Found desired export name '%s'"
-nbd_receive_starttls_request(void) "Requesting TLS from server"
-nbd_receive_starttls_reply(void) "Getting TLS reply from server"
-nbd_receive_starttls_new_client(void) "TLS request approved, setting up TLS"
+nbd_receive_simple_option_request(int opt, const char *name) "Requesting option %d (%s) from server"
+nbd_receive_simple_option_reply(int opt, const char *name) "Getting reply for option %d (%s) from server"
+nbd_receive_simple_option_approved(int opt, const char *name) "Option %d (%s) approved"
+nbd_receive_starttls_new_client(void) "Setting up TLS"
 nbd_receive_starttls_tls_handshake(void) "Starting TLS handshake"
 nbd_receive_negotiate(void *tlscreds, const char *hostname) "Receiving negotiation tlscreds=%p hostname=%s"
 nbd_receive_negotiate_magic(uint64_t magic) "Magic is 0x%" PRIx64
-- 
2.11.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH 8/8] nbd: Minimal structured read for client
  2017-09-25 13:57 [Qemu-devel] [PATCH 0/8] nbd minimal structured read Vladimir Sementsov-Ogievskiy
                   ` (6 preceding siblings ...)
  2017-09-25 13:58 ` [Qemu-devel] [PATCH 7/8] nbd/client: refactor nbd_receive_starttls Vladimir Sementsov-Ogievskiy
@ 2017-09-25 13:58 ` Vladimir Sementsov-Ogievskiy
  2017-09-25 22:19   ` Eric Blake
  2017-09-25 14:06 ` [Qemu-devel] [PATCH 0/8] nbd minimal structured read Vladimir Sementsov-Ogievskiy
  8 siblings, 1 reply; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-09-25 13:58 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: Hmreitz, kwolf, pbonzini, eblake, vsementsov, den

Minimal implementation: drop most of additional error information.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/nbd-client.h  |   2 +
 include/block/nbd.h |  15 ++++-
 block/nbd-client.c  |  97 +++++++++++++++++++++++++-----
 nbd/client.c        | 169 +++++++++++++++++++++++++++++++++++++++++++++++-----
 4 files changed, 249 insertions(+), 34 deletions(-)

diff --git a/block/nbd-client.h b/block/nbd-client.h
index b435754b82..9e178de510 100644
--- a/block/nbd-client.h
+++ b/block/nbd-client.h
@@ -35,6 +35,8 @@ typedef struct NBDClientSession {
     NBDClientRequest requests[MAX_NBD_REQUESTS];
     NBDReply reply;
     bool quit;
+
+    bool structured_reply;
 } NBDClientSession;
 
 NBDClientSession *nbd_get_client_session(BlockDriverState *bs);
diff --git a/include/block/nbd.h b/include/block/nbd.h
index 314f2f9bbc..7604e80c49 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -57,11 +57,17 @@ struct NBDRequest {
 };
 typedef struct NBDRequest NBDRequest;
 
-struct NBDReply {
+typedef struct NBDReply {
+    bool simple;
     uint64_t handle;
     uint32_t error;
-};
-typedef struct NBDReply NBDReply;
+
+    uint16_t flags;
+    uint16_t type;
+    uint32_t tail_length;
+    uint64_t offset;
+    uint32_t hole_size;
+} NBDReply;
 
 typedef struct NBDSimpleReply {
     uint32_t magic;  /* NBD_SIMPLE_REPLY_MAGIC */
@@ -178,12 +184,15 @@ enum {
 
 #define NBD_SREP_TYPE_NONE          0
 #define NBD_SREP_TYPE_OFFSET_DATA   1
+#define NBD_SREP_TYPE_OFFSET_HOLE   2
 #define NBD_SREP_TYPE_ERROR         NBD_SREP_ERR(1)
+#define NBD_SREP_TYPE_ERROR_OFFSET  NBD_SREP_ERR(2)
 
 /* Details collected by NBD_OPT_EXPORT_NAME and NBD_OPT_GO */
 struct NBDExportInfo {
     /* Set by client before nbd_receive_negotiate() */
     bool request_sizes;
+    bool structured_reply;
     /* Set by server results during nbd_receive_negotiate() */
     uint64_t size;
     uint16_t flags;
diff --git a/block/nbd-client.c b/block/nbd-client.c
index e4f0c789f4..bdf9299bb9 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -179,9 +179,10 @@ err:
     return rc;
 }
 
-static int nbd_co_receive_reply(NBDClientSession *s,
-                                uint64_t handle,
-                                QEMUIOVector *qiov)
+static int nbd_co_receive_1_reply_or_chunk(NBDClientSession *s,
+                                           uint64_t handle,
+                                           bool *cont,
+                                           QEMUIOVector *qiov)
 {
     int ret;
     int i = HANDLE_TO_INDEX(s, handle);
@@ -191,29 +192,95 @@ static int nbd_co_receive_reply(NBDClientSession *s,
     qemu_coroutine_yield();
     s->requests[i].receiving = false;
     if (!s->ioc || s->quit) {
-        ret = -EIO;
-    } else {
-        assert(s->reply.handle == handle);
-        ret = -s->reply.error;
-        if (qiov && s->reply.error == 0) {
+        *cont = false;
+        return -EIO;
+    }
+
+    assert(s->reply.handle == handle);
+    *cont = !(s->reply.simple || (s->reply.flags & NBD_SREP_FLAG_DONE));
+    ret = -s->reply.error;
+    if (ret < 0) {
+        goto out;
+    }
+
+    if (s->reply.simple) {
+        if (qiov) {
             if (qio_channel_readv_all(s->ioc, qiov->iov, qiov->niov,
-                                      NULL) < 0) {
-                ret = -EIO;
-                s->quit = true;
+                                      NULL) < 0)
+            {
+                goto fatal;
             }
         }
+        goto out;
+    }
 
-        /* Tell the read handler to read another header.  */
-        s->reply.handle = 0;
+    /* here we deal with successful structured reply */
+    switch (s->reply.type) {
+        QEMUIOVector sub_qiov;
+    case NBD_SREP_TYPE_OFFSET_DATA:
+        if (!qiov || s->reply.offset + s->reply.tail_length > qiov->size) {
+            goto fatal;
+        }
+        qemu_iovec_init(&sub_qiov, qiov->niov);
+        qemu_iovec_concat(&sub_qiov, qiov, s->reply.offset,
+                          s->reply.tail_length);
+        ret = qio_channel_readv_all(s->ioc, sub_qiov.iov, sub_qiov.niov, NULL);
+        qemu_iovec_destroy(&sub_qiov);
+        if (ret < 0) {
+            goto fatal;
+        }
+        assert(ret == 0);
+        break;
+    case NBD_SREP_TYPE_OFFSET_HOLE:
+        if (!qiov || s->reply.offset + s->reply.hole_size > qiov->size) {
+            goto fatal;
+        }
+        qemu_iovec_memset(qiov, s->reply.offset, 0, s->reply.hole_size);
+        break;
+    case NBD_SREP_TYPE_NONE:
+        break;
+    default:
+        goto fatal;
     }
 
-    s->requests[i].coroutine = NULL;
+out:
+    /* For assert at loop start in nbd_read_reply_entry */
+    s->reply.handle = 0;
+
+    if (s->read_reply_co) {
+        aio_co_wake(s->read_reply_co);
+    }
+
+    return ret;
 
-    /* Kick the read_reply_co to get the next reply.  */
+fatal:
+    /* protocol or ioc failure */
+    *cont = false;
+    s->quit = true;
     if (s->read_reply_co) {
         aio_co_wake(s->read_reply_co);
     }
 
+    return -EIO;
+}
+
+static int nbd_co_receive_reply(NBDClientSession *s,
+                                uint64_t handle,
+                                QEMUIOVector *qiov)
+{
+    int ret = 0;
+    int i = HANDLE_TO_INDEX(s, handle);
+    bool cont = true;
+
+    while (cont) {
+        int rc = nbd_co_receive_1_reply_or_chunk(s, handle, &cont, qiov);
+        if (rc < 0 && ret == 0) {
+            ret = rc;
+        }
+    }
+
+    s->requests[i].coroutine = NULL;
+
     qemu_co_mutex_lock(&s->send_mutex);
     s->in_flight--;
     qemu_co_queue_next(&s->free_sema);
diff --git a/nbd/client.c b/nbd/client.c
index 51ae492e92..880eb17b85 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -719,6 +719,13 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name,
         if (fixedNewStyle) {
             int result;
 
+            result = nbd_request_simple_option(ioc, NBD_OPT_STRUCTURED_REPLY,
+                                               errp);
+            if (result < 0) {
+                goto fail;
+            }
+            info->structured_reply = result > 0;
+
             /* Try NBD_OPT_GO first - if it works, we are done (it
              * also gives us a good message if the server requires
              * TLS).  If it is not available, fall back to
@@ -759,6 +766,12 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name,
             goto fail;
         }
         be16_to_cpus(&info->flags);
+
+        if (info->structured_reply && !(info->flags & NBD_CMD_FLAG_DF)) {
+            error_setg(errp, "Structured reply is negotiated, "
+                             "but DF flag is not.");
+            goto fail;
+        }
     } else if (magic == NBD_CLIENT_MAGIC) {
         uint32_t oldflags;
 
@@ -942,6 +955,128 @@ int nbd_send_request(QIOChannel *ioc, NBDRequest *request)
     return nbd_write(ioc, buf, sizeof(buf), NULL);
 }
 
+/* nbd_receive_simple_reply
+ * Read simple reply except magic field (which should be already read)
+ */
+static int nbd_receive_simple_reply(QIOChannel *ioc, NBDReply *reply,
+                                    Error **errp)
+{
+    NBDSimpleReply simple_reply;
+    int ret;
+
+    ret = nbd_read(ioc, (uint8_t *)&simple_reply + sizeof(simple_reply.magic),
+                   sizeof(simple_reply) - sizeof(simple_reply.magic), errp);
+    if (ret < 0) {
+        return ret;
+    }
+
+    reply->error = be32_to_cpu(simple_reply.error);
+    reply->handle = be64_to_cpu(simple_reply.handle);
+
+    return 0;
+}
+
+/* nbd_receive_structured_reply_chunk
+ * Read structured reply chunk except magic field (which should be already read)
+ * Data for NBD_SREP_TYPE_OFFSET_DATA is not read too.
+ * tail_length field of reply out parameter corresponds to unread part of reply.
+ */
+static int nbd_receive_structured_reply_chunk(QIOChannel *ioc, NBDReply *reply,
+                                              Error **errp)
+{
+    NBDStructuredReplyChunk chunk;
+    ssize_t ret;
+    uint16_t message_size;
+
+    ret = nbd_read(ioc, (uint8_t *)&chunk + sizeof(chunk.magic),
+                          sizeof(chunk) - sizeof(chunk.magic), errp);
+    if (ret < 0) {
+        return ret;
+    }
+
+    reply->flags = be16_to_cpu(chunk.flags);
+    reply->type = be16_to_cpu(chunk.type);
+    reply->handle = be64_to_cpu(chunk.handle);
+    reply->tail_length = be32_to_cpu(chunk.length);
+
+    switch (reply->type) {
+    case NBD_SREP_TYPE_NONE:
+        break;
+    case NBD_SREP_TYPE_OFFSET_DATA:
+        if (reply->tail_length < sizeof(reply->offset)) {
+            return -EIO;
+        }
+        ret = nbd_read(ioc, &reply->offset, sizeof(reply->offset), errp);
+        if (ret < 0) {
+            return ret;
+        }
+        be64_to_cpus(&reply->offset);
+        reply->tail_length -= sizeof(reply->offset);
+
+        break;
+    case NBD_SREP_TYPE_OFFSET_HOLE:
+        ret = nbd_read(ioc, &reply->offset, sizeof(reply->offset), errp);
+        if (ret < 0) {
+            return ret;
+        }
+        be64_to_cpus(&reply->offset);
+
+        ret = nbd_read(ioc, &reply->hole_size, sizeof(reply->hole_size), errp);
+        if (ret < 0) {
+            return ret;
+        }
+        be32_to_cpus(&reply->hole_size);
+
+        break;
+    case NBD_SREP_TYPE_ERROR:
+    case NBD_SREP_TYPE_ERROR_OFFSET:
+        ret = nbd_read(ioc, &reply->error, sizeof(reply->error), errp);
+        if (ret < 0) {
+            return ret;
+        }
+        be32_to_cpus(&reply->error);
+
+        ret = nbd_read(ioc, &message_size, sizeof(message_size), errp);
+        if (ret < 0) {
+            return ret;
+        }
+        be16_to_cpus(&message_size);
+
+        if (message_size > 0) {
+            /* TODO: provide error message to user */
+            ret = nbd_drop(ioc, message_size, errp);
+            if (ret < 0) {
+                return ret;
+            }
+        }
+
+        if (reply->type == NBD_SREP_TYPE_ERROR_OFFSET) {
+            /* drop 64bit offset */
+            ret = nbd_drop(ioc, 8, errp);
+            if (ret < 0) {
+                return ret;
+            }
+        }
+        break;
+    default:
+        if (reply->type & (1 << 15)) {
+            /* unknown error */
+            ret = nbd_drop(ioc, reply->tail_length, errp);
+            if (ret < 0) {
+                return ret;
+            }
+
+            reply->error = NBD_EINVAL;
+            reply->tail_length = 0;
+        } else {
+            /* unknown non-error reply type */
+            return -EINVAL;
+        }
+    }
+
+    return 0;
+}
+
 /* nbd_receive_reply
  * Returns 1 on success
  *         0 on eof, when no data was read (errp is not set)
@@ -949,24 +1084,32 @@ int nbd_send_request(QIOChannel *ioc, NBDRequest *request)
  */
 int nbd_receive_reply(QIOChannel *ioc, NBDReply *reply, Error **errp)
 {
-    uint8_t buf[NBD_REPLY_SIZE];
     uint32_t magic;
     int ret;
 
-    ret = nbd_read_eof(ioc, buf, sizeof(buf), errp);
+    ret = nbd_read_eof(ioc, &magic, sizeof(magic), errp);
     if (ret <= 0) {
         return ret;
     }
 
-    /* Reply
-       [ 0 ..  3]    magic   (NBD_SIMPLE_REPLY_MAGIC)
-       [ 4 ..  7]    error   (0 == no error)
-       [ 7 .. 15]    handle
-     */
+    be32_to_cpus(&magic);
 
-    magic = ldl_be_p(buf);
-    reply->error  = ldl_be_p(buf + 4);
-    reply->handle = ldq_be_p(buf + 8);
+    switch (magic) {
+    case NBD_SIMPLE_REPLY_MAGIC:
+        reply->simple = true;
+        ret = nbd_receive_simple_reply(ioc, reply, errp);
+        break;
+    case NBD_STRUCTURED_REPLY_MAGIC:
+        reply->simple = false;
+        ret = nbd_receive_structured_reply_chunk(ioc, reply, errp);
+        break;
+    default:
+        error_setg(errp, "invalid magic (got 0x%" PRIx32 ")", magic);
+        return -EINVAL;
+    }
+    if (ret < 0) {
+        return ret;
+    }
 
     reply->error = nbd_errno_to_system_errno(reply->error);
 
@@ -977,11 +1120,5 @@ int nbd_receive_reply(QIOChannel *ioc, NBDReply *reply, Error **errp)
     }
     trace_nbd_receive_reply(magic, reply->error, reply->handle);
 
-    if (magic != NBD_SIMPLE_REPLY_MAGIC) {
-        error_setg(errp, "invalid magic (got 0x%" PRIx32 ")", magic);
-        return -EINVAL;
-    }
-
     return 1;
 }
-
-- 
2.11.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 0/8] nbd minimal structured read
  2017-09-25 13:57 [Qemu-devel] [PATCH 0/8] nbd minimal structured read Vladimir Sementsov-Ogievskiy
                   ` (7 preceding siblings ...)
  2017-09-25 13:58 ` [Qemu-devel] [PATCH 8/8] nbd: Minimal structured read for client Vladimir Sementsov-Ogievskiy
@ 2017-09-25 14:06 ` Vladimir Sementsov-Ogievskiy
  8 siblings, 0 replies; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-09-25 14:06 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: Max Reitz, kwolf, pbonzini, eblake, den

Add Max.

Max, sorry, I've mistaken in copy-pasting your email address..

25.09.2017 16:57, Vladimir Sementsov-Ogievskiy wrote:
> Hi all. Here is minimal structured-read extension implementation for nbd.
> It is needed mostly to implement nbd block-status extension over it, so
> it is minimal.
>
> It's based on my three patches, currently on pull-request stage:
>        block/nbd-client: refactor nbd_co_receive_reply
>        block/nbd-client: simplify check in nbd_co_receive_reply
>        block/nbd-client: nbd_co_send_request: fix return code
>
> clone: tag up-nbd-minimal-structured-read-v1 from https://src.openvz.org/scm/~vsementsov/qemu.git
> online: https://src.openvz.org/users/vsementsov/repos/qemu/browse?at=up-nbd-minimal-structured-read-v1
>
> Note: actually it's a continuation of a part of my series "nbd: BLOCK_STATUS",
> but it was sent about half a year ago, so let's consider it as new series.
>
> Vladimir Sementsov-Ogievskiy (8):
>    block/nbd-client: assert qiov len once in nbd_co_request
>    block/nbd-client: refactor nbd_co_receive_reply
>    nbd: rename NBD_REPLY_MAGIC to NBD_SIMPLE_REPLY_MAGIC
>    nbd-server: refactor simple reply sending
>    nbd: header constants indenting
>    nbd: Minimal structured read for server
>    nbd/client: refactor nbd_receive_starttls
>    nbd: Minimal structured read for client
>
>   block/nbd-client.h                       |   2 +
>   include/block/nbd.h                      |  67 +++++++--
>   nbd/nbd-internal.h                       |  44 +++---
>   block/nbd-client.c                       | 104 +++++++++++---
>   nbd/client.c                             | 233 ++++++++++++++++++++++++++-----
>   nbd/server.c                             | 189 +++++++++++++++++--------
>   nbd/trace-events                         |   7 +-
>   tests/qemu-iotests/nbd-fault-injector.py |   4 +-
>   8 files changed, 507 insertions(+), 143 deletions(-)
>


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 1/8] block/nbd-client: assert qiov len once in nbd_co_request
  2017-09-25 13:57 ` [Qemu-devel] [PATCH 1/8] block/nbd-client: assert qiov len once in nbd_co_request Vladimir Sementsov-Ogievskiy
@ 2017-09-25 21:58   ` Eric Blake
  0 siblings, 0 replies; 31+ messages in thread
From: Eric Blake @ 2017-09-25 21:58 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: mreitz, kwolf, pbonzini, den

[-- Attachment #1: Type: text/plain, Size: 1233 bytes --]

On 09/25/2017 08:57 AM, Vladimir Sementsov-Ogievskiy wrote:
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/nbd-client.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 

> @@ -233,6 +231,7 @@ static int nbd_co_request(BlockDriverState *bs,
>  
>      assert(!qiov || request->type == NBD_CMD_WRITE ||
>             request->type == NBD_CMD_READ);
> +    assert(!qiov || request->len == iov_size(qiov->iov, qiov->niov));

Asserting !qiov twice in a row feels a bit odd, although I see why you
have to do it to prevent a NULL dereference.  Would this be any easier
to read as a single assertion:

assert(!qiov ||
       ((request->type == NBD_CMD_WRITE ||
         request->type == NBD_CMD_READ) &&
        request->len == iov_size(qiov->iov, qiov->niov)));

Or, as a conditional:

if (qiov) {
    assert(request->type == NBD_CMD_WRITE || request->type == NBD_CMD_READ);
    assert(request->len == iov_size(qiov->iov, qiov->niov));
} else {
    assert(request->type != NBD_CMD_WRITE && request->type != NBD_CMD_READ);
}

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 2/8] block/nbd-client: refactor nbd_co_receive_reply
  2017-09-25 13:57 ` [Qemu-devel] [PATCH 2/8] block/nbd-client: refactor nbd_co_receive_reply Vladimir Sementsov-Ogievskiy
@ 2017-09-25 21:59   ` Eric Blake
  0 siblings, 0 replies; 31+ messages in thread
From: Eric Blake @ 2017-09-25 21:59 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: mreitz, kwolf, pbonzini, den

[-- Attachment #1: Type: text/plain, Size: 485 bytes --]

On 09/25/2017 08:57 AM, Vladimir Sementsov-Ogievskiy wrote:
> Pass handle parameter directly, as the whole request isn't needed.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/nbd-client.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 8/8] nbd: Minimal structured read for client
  2017-09-25 13:58 ` [Qemu-devel] [PATCH 8/8] nbd: Minimal structured read for client Vladimir Sementsov-Ogievskiy
@ 2017-09-25 22:19   ` Eric Blake
  2017-09-27 10:05     ` Vladimir Sementsov-Ogievskiy
  2017-10-03 10:07     ` [Qemu-devel] [Qemu-block] [PATCH 8/8] " Paolo Bonzini
  0 siblings, 2 replies; 31+ messages in thread
From: Eric Blake @ 2017-09-25 22:19 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: mreitz, kwolf, pbonzini, den

[-- Attachment #1: Type: text/plain, Size: 4990 bytes --]

On 09/25/2017 08:58 AM, Vladimir Sementsov-Ogievskiy wrote:
> Minimal implementation: drop most of additional error information.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/nbd-client.h  |   2 +
>  include/block/nbd.h |  15 ++++-
>  block/nbd-client.c  |  97 +++++++++++++++++++++++++-----
>  nbd/client.c        | 169 +++++++++++++++++++++++++++++++++++++++++++++++-----
>  4 files changed, 249 insertions(+), 34 deletions(-)
> 

> +++ b/include/block/nbd.h
> @@ -57,11 +57,17 @@ struct NBDRequest {
>  };
>  typedef struct NBDRequest NBDRequest;
>  
> -struct NBDReply {
> +typedef struct NBDReply {
> +    bool simple;
>      uint64_t handle;
>      uint32_t error;
> -};
> -typedef struct NBDReply NBDReply;
> +
> +    uint16_t flags;
> +    uint16_t type;
> +    uint32_t tail_length;
> +    uint64_t offset;
> +    uint32_t hole_size;
> +} NBDReply;

This feels like it should be a discriminated union, rather than a struct
containing fields that are only sometimes valid...

>  
>  #define NBD_SREP_TYPE_NONE          0
>  #define NBD_SREP_TYPE_OFFSET_DATA   1
> +#define NBD_SREP_TYPE_OFFSET_HOLE   2
>  #define NBD_SREP_TYPE_ERROR         NBD_SREP_ERR(1)
> +#define NBD_SREP_TYPE_ERROR_OFFSET  NBD_SREP_ERR(2)

...especially since there is more than one type of SREP packet layout.

I also wonder why you are defining constants in a piecemeal fashion,
rather than all at once (even if your minimal server implementation
doesn't send a particular constant, there's no harm in defining them all
up front).

> +++ b/block/nbd-client.c
> @@ -179,9 +179,10 @@ err:
>      return rc;
>  }
>  
> -static int nbd_co_receive_reply(NBDClientSession *s,
> -                                uint64_t handle,
> -                                QEMUIOVector *qiov)
> +static int nbd_co_receive_1_reply_or_chunk(NBDClientSession *s,

Long name, and unusual to mix in "1" instead of "one".  Would it be
better to name it nbd_co_receive_chunk, where we declare that a simple
reply is (roughly) the same amount of effort as a chunk in a structured
reply?

> +                                           uint64_t handle,
> +                                           bool *cont,
> +                                           QEMUIOVector *qiov)
>  {

No documentation of the function?

>      int ret;
>      int i = HANDLE_TO_INDEX(s, handle);
> @@ -191,29 +192,95 @@ static int nbd_co_receive_reply(NBDClientSession *s,
>      qemu_coroutine_yield();
>      s->requests[i].receiving = false;
>      if (!s->ioc || s->quit) {
> -        ret = -EIO;
> -    } else {
> -        assert(s->reply.handle == handle);
> -        ret = -s->reply.error;
> -        if (qiov && s->reply.error == 0) {
> +        *cont = false;
> +        return -EIO;
> +    }
> +
> +    assert(s->reply.handle == handle);
> +    *cont = !(s->reply.simple || (s->reply.flags & NBD_SREP_FLAG_DONE));

We need to validate that the server is not sending us SREP chunks unless
we negotiated them.  I'm thinking it might be better to do it here
(maybe you did it somewhere else, but I haven't seen it yet; I'm
reviewing the patch in textual order rather than the order in which
things are called).

> +    ret = -s->reply.error;
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    if (s->reply.simple) {
> +        if (qiov) {
>              if (qio_channel_readv_all(s->ioc, qiov->iov, qiov->niov,
> -                                      NULL) < 0) {
> -                ret = -EIO;
> -                s->quit = true;
> +                                      NULL) < 0)
> +            {
> +                goto fatal;
>              }
>          }
> +        goto out;
> +    }
>  
> -        /* Tell the read handler to read another header.  */
> -        s->reply.handle = 0;
> +    /* here we deal with successful structured reply */
> +    switch (s->reply.type) {
> +        QEMUIOVector sub_qiov;
> +    case NBD_SREP_TYPE_OFFSET_DATA:

This is putting a LOT of smarts directly into the receive routine.
Here's where I was previously wondering (and I think Paolo as well)
whether it might be better to split the efforts: the generic function
reads off the chunk information and any payload, but a per-command
callback function then parses the chunks.  Especially since the
definition of the chunks differs on a per-command basis (yes, the NBD
spec will try to not reuse an SREP chunk type across multiple commands
unless the semantics are similar, but that's a bit more fragile).  This
particularly matters given my statement above that you want a
discriminated union, rather than a struct that contains unused fields,
for handling different SREP chunk types.

My review has to pause here for now...


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 8/8] nbd: Minimal structured read for client
  2017-09-25 22:19   ` Eric Blake
@ 2017-09-27 10:05     ` Vladimir Sementsov-Ogievskiy
  2017-09-27 12:32       ` Vladimir Sementsov-Ogievskiy
  2017-10-03 10:07     ` [Qemu-devel] [Qemu-block] [PATCH 8/8] " Paolo Bonzini
  1 sibling, 1 reply; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-09-27 10:05 UTC (permalink / raw)
  To: Eric Blake, qemu-devel, qemu-block; +Cc: mreitz, kwolf, pbonzini, den

26.09.2017 01:19, Eric Blake wrote:
> On 09/25/2017 08:58 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Minimal implementation: drop most of additional error information.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block/nbd-client.h  |   2 +
>>   include/block/nbd.h |  15 ++++-
>>   block/nbd-client.c  |  97 +++++++++++++++++++++++++-----
>>   nbd/client.c        | 169 +++++++++++++++++++++++++++++++++++++++++++++++-----
>>   4 files changed, 249 insertions(+), 34 deletions(-)
>>
>> +++ b/include/block/nbd.h
>> @@ -57,11 +57,17 @@ struct NBDRequest {
>>   };
>>   typedef struct NBDRequest NBDRequest;
>>   
>> -struct NBDReply {
>> +typedef struct NBDReply {
>> +    bool simple;
>>       uint64_t handle;
>>       uint32_t error;
>> -};
>> -typedef struct NBDReply NBDReply;
>> +
>> +    uint16_t flags;
>> +    uint16_t type;
>> +    uint32_t tail_length;
>> +    uint64_t offset;
>> +    uint32_t hole_size;
>> +} NBDReply;
> This feels like it should be a discriminated union, rather than a struct
> containing fields that are only sometimes valid...

No:

simple, handle and error are always valid
flags, type, tail_length are valid for all structured replies
offset and hole_size are valid for structured hole reply

so, we have one case when all fields are valid.. how do you see it with 
union, what is the real difference? I think it would be just a complication.

>
>>   
>>   #define NBD_SREP_TYPE_NONE          0
>>   #define NBD_SREP_TYPE_OFFSET_DATA   1
>> +#define NBD_SREP_TYPE_OFFSET_HOLE   2
>>   #define NBD_SREP_TYPE_ERROR         NBD_SREP_ERR(1)
>> +#define NBD_SREP_TYPE_ERROR_OFFSET  NBD_SREP_ERR(2)
> ...especially since there is more than one type of SREP packet layout.
>
> I also wonder why you are defining constants in a piecemeal fashion,
> rather than all at once (even if your minimal server implementation
> doesn't send a particular constant, there's no harm in defining them all
> up front).

hmm. just to not define unused constants. It doesn't matter, I can 
define them all if you prefer.

>
>> +++ b/block/nbd-client.c
>> @@ -179,9 +179,10 @@ err:
>>       return rc;
>>   }
>>   
>> -static int nbd_co_receive_reply(NBDClientSession *s,
>> -                                uint64_t handle,
>> -                                QEMUIOVector *qiov)
>> +static int nbd_co_receive_1_reply_or_chunk(NBDClientSession *s,
> Long name, and unusual to mix in "1" instead of "one".  Would it be
> better to name it nbd_co_receive_chunk, where we declare that a simple
> reply is (roughly) the same amount of effort as a chunk in a structured
> reply?
>
>> +                                           uint64_t handle,
>> +                                           bool *cont,
>> +                                           QEMUIOVector *qiov)
>>   {
> No documentation of the function?
>
>>       int ret;
>>       int i = HANDLE_TO_INDEX(s, handle);
>> @@ -191,29 +192,95 @@ static int nbd_co_receive_reply(NBDClientSession *s,
>>       qemu_coroutine_yield();
>>       s->requests[i].receiving = false;
>>       if (!s->ioc || s->quit) {
>> -        ret = -EIO;
>> -    } else {
>> -        assert(s->reply.handle == handle);
>> -        ret = -s->reply.error;
>> -        if (qiov && s->reply.error == 0) {
>> +        *cont = false;
>> +        return -EIO;
>> +    }
>> +
>> +    assert(s->reply.handle == handle);
>> +    *cont = !(s->reply.simple || (s->reply.flags & NBD_SREP_FLAG_DONE));
> We need to validate that the server is not sending us SREP chunks unless
> we negotiated them.  I'm thinking it might be better to do it here
> (maybe you did it somewhere else, but I haven't seen it yet; I'm
> reviewing the patch in textual order rather than the order in which
> things are called).

No, I didn't. Will add (may be early, in reply_entry).

>
>> +    ret = -s->reply.error;
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>> +    if (s->reply.simple) {
>> +        if (qiov) {
>>               if (qio_channel_readv_all(s->ioc, qiov->iov, qiov->niov,
>> -                                      NULL) < 0) {
>> -                ret = -EIO;
>> -                s->quit = true;
>> +                                      NULL) < 0)
>> +            {
>> +                goto fatal;
>>               }
>>           }
>> +        goto out;
>> +    }
>>   
>> -        /* Tell the read handler to read another header.  */
>> -        s->reply.handle = 0;
>> +    /* here we deal with successful structured reply */
>> +    switch (s->reply.type) {
>> +        QEMUIOVector sub_qiov;
>> +    case NBD_SREP_TYPE_OFFSET_DATA:
> This is putting a LOT of smarts directly into the receive routine.
> Here's where I was previously wondering (and I think Paolo as well)
> whether it might be better to split the efforts: the generic function
> reads off the chunk information and any payload, but a per-command

Hmm. it was my idea to move all reading into one coroutine (in my 
refactoring series, but Paolo was against).

Or you mean to read a payload as raw? It would lead to double copying it 
to destination qiov, which I dislike..

> callback function then parses the chunks.

per-command? Then callback for CMD_READ would have all of these 
"smarts", so the whole code would not be simpler.. (However it will 
simplify adding block-status later).

>    Especially since the
> definition of the chunks differs on a per-command basis (yes, the NBD
> spec will try to not reuse an SREP chunk type across multiple commands
> unless the semantics are similar, but that's a bit more fragile).  This
> particularly matters given my statement above that you want a
> discriminated union, rather than a struct that contains unused fields,
> for handling different SREP chunk types.
>
> My review has to pause here for now...
>
>


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 8/8] nbd: Minimal structured read for client
  2017-09-27 10:05     ` Vladimir Sementsov-Ogievskiy
@ 2017-09-27 12:32       ` Vladimir Sementsov-Ogievskiy
  2017-09-27 15:10         ` [Qemu-devel] [PATCH v1.1 DRAFT] " Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-09-27 12:32 UTC (permalink / raw)
  To: Eric Blake, qemu-devel, qemu-block; +Cc: mreitz, kwolf, pbonzini, den

27.09.2017 13:05, Vladimir Sementsov-Ogievskiy wrote:
> 26.09.2017 01:19, Eric Blake wrote:
>> On 09/25/2017 08:58 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Minimal implementation: drop most of additional error information.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   block/nbd-client.h  |   2 +
>>>   include/block/nbd.h |  15 ++++-
>>>   block/nbd-client.c  |  97 +++++++++++++++++++++++++-----
>>>   nbd/client.c        | 169 
>>> +++++++++++++++++++++++++++++++++++++++++++++++-----
>>>   4 files changed, 249 insertions(+), 34 deletions(-)
>>>
>>> +++ b/include/block/nbd.h
>>> @@ -57,11 +57,17 @@ struct NBDRequest {
>>>   };
>>>   typedef struct NBDRequest NBDRequest;
>>>   -struct NBDReply {
>>> +typedef struct NBDReply {
>>> +    bool simple;
>>>       uint64_t handle;
>>>       uint32_t error;
>>> -};
>>> -typedef struct NBDReply NBDReply;
>>> +
>>> +    uint16_t flags;
>>> +    uint16_t type;
>>> +    uint32_t tail_length;
>>> +    uint64_t offset;
>>> +    uint32_t hole_size;
>>> +} NBDReply;
>> This feels like it should be a discriminated union, rather than a struct
>> containing fields that are only sometimes valid...
>
> No:
>
> simple, handle and error are always valid
> flags, type, tail_length are valid for all structured replies
> offset and hole_size are valid for structured hole reply
>
> so, we have one case when all fields are valid.. how do you see it 
> with union, what is the real difference? I think it would be just a 
> complication.
>
>>
>>>     #define NBD_SREP_TYPE_NONE 0
>>>   #define NBD_SREP_TYPE_OFFSET_DATA   1
>>> +#define NBD_SREP_TYPE_OFFSET_HOLE   2
>>>   #define NBD_SREP_TYPE_ERROR         NBD_SREP_ERR(1)
>>> +#define NBD_SREP_TYPE_ERROR_OFFSET  NBD_SREP_ERR(2)
>> ...especially since there is more than one type of SREP packet layout.
>>
>> I also wonder why you are defining constants in a piecemeal fashion,
>> rather than all at once (even if your minimal server implementation
>> doesn't send a particular constant, there's no harm in defining them all
>> up front).
>
> hmm. just to not define unused constants. It doesn't matter, I can 
> define them all if you prefer.
>
>>
>>> +++ b/block/nbd-client.c
>>> @@ -179,9 +179,10 @@ err:
>>>       return rc;
>>>   }
>>>   -static int nbd_co_receive_reply(NBDClientSession *s,
>>> -                                uint64_t handle,
>>> -                                QEMUIOVector *qiov)
>>> +static int nbd_co_receive_1_reply_or_chunk(NBDClientSession *s,
>> Long name, and unusual to mix in "1" instead of "one".  Would it be
>> better to name it nbd_co_receive_chunk, where we declare that a simple
>> reply is (roughly) the same amount of effort as a chunk in a structured
>> reply?
>>
>>> + uint64_t handle,
>>> +                                           bool *cont,
>>> +                                           QEMUIOVector *qiov)
>>>   {
>> No documentation of the function?
>>
>>>       int ret;
>>>       int i = HANDLE_TO_INDEX(s, handle);
>>> @@ -191,29 +192,95 @@ static int 
>>> nbd_co_receive_reply(NBDClientSession *s,
>>>       qemu_coroutine_yield();
>>>       s->requests[i].receiving = false;
>>>       if (!s->ioc || s->quit) {
>>> -        ret = -EIO;
>>> -    } else {
>>> -        assert(s->reply.handle == handle);
>>> -        ret = -s->reply.error;
>>> -        if (qiov && s->reply.error == 0) {
>>> +        *cont = false;
>>> +        return -EIO;
>>> +    }
>>> +
>>> +    assert(s->reply.handle == handle);
>>> +    *cont = !(s->reply.simple || (s->reply.flags & 
>>> NBD_SREP_FLAG_DONE));
>> We need to validate that the server is not sending us SREP chunks unless
>> we negotiated them.  I'm thinking it might be better to do it here
>> (maybe you did it somewhere else, but I haven't seen it yet; I'm
>> reviewing the patch in textual order rather than the order in which
>> things are called).
>
> No, I didn't. Will add (may be early, in reply_entry).
>
>>
>>> +    ret = -s->reply.error;
>>> +    if (ret < 0) {
>>> +        goto out;
>>> +    }
>>> +
>>> +    if (s->reply.simple) {
>>> +        if (qiov) {
>>>               if (qio_channel_readv_all(s->ioc, qiov->iov, qiov->niov,
>>> -                                      NULL) < 0) {
>>> -                ret = -EIO;
>>> -                s->quit = true;
>>> +                                      NULL) < 0)
>>> +            {
>>> +                goto fatal;
>>>               }
>>>           }
>>> +        goto out;
>>> +    }
>>>   -        /* Tell the read handler to read another header. */
>>> -        s->reply.handle = 0;
>>> +    /* here we deal with successful structured reply */
>>> +    switch (s->reply.type) {
>>> +        QEMUIOVector sub_qiov;
>>> +    case NBD_SREP_TYPE_OFFSET_DATA:
>> This is putting a LOT of smarts directly into the receive routine.
>> Here's where I was previously wondering (and I think Paolo as well)
>> whether it might be better to split the efforts: the generic function
>> reads off the chunk information and any payload, but a per-command
>
> Hmm. it was my idea to move all reading into one coroutine (in my 
> refactoring series, but Paolo was against).
>
> Or you mean to read a payload as raw? It would lead to double copying 
> it to destination qiov, which I dislike..
>
>> callback function then parses the chunks.
>
> per-command? Then callback for CMD_READ would have all of these 
> "smarts", so the whole code would not be simpler.. (However it will 
> simplify adding block-status later).
>
>>    Especially since the
>> definition of the chunks differs on a per-command basis (yes, the NBD
>> spec will try to not reuse an SREP chunk type across multiple commands
>> unless the semantics are similar, but that's a bit more fragile).  This
>> particularly matters given my statement above that you want a
>> discriminated union, rather than a struct that contains unused fields,
>> for handling different SREP chunk types.
>>
>> My review has to pause here for now...
>>
>>
>
>

I'll post a proposal of extendable architecture for this today to have a 
more concrete discussion.


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH v1.1 DRAFT] nbd: Minimal structured read for client
  2017-09-27 12:32       ` Vladimir Sementsov-Ogievskiy
@ 2017-09-27 15:10         ` Vladimir Sementsov-Ogievskiy
  2017-10-03  9:59           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-09-27 15:10 UTC (permalink / raw)
  To: qemu-block, qemu-devel; +Cc: pbonzini, eblake, mreitz, kwolf, vsementsov, den

Minimal implementation: drop most of additional error information.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---

Hi!

Here is a draft of how to refactor reply-payload receiving if you don't
like my previous simple (but not flexible) try. Ofcourse, if we agree on this
approach this patch should be splitted into several ones and many things
(error handling) should be improved.

The idea is:

nbd_read_reply_entry reads only reply header through nbd/client.c code.

Then, the payload is read through block/nbd-client-cmds.c:
simple payload: generic per-command handler, however it should only exist
  for CMD_READ
structured NONE: no payload, handle in nbd_co_receive_one_chunk
structured error: read by nbd_handle_structured_error_payload
  defined in block/nbd-client-cmds.c
structured success: read by per-[command X reply-type] handler
  defined in block/nbd-client-cmds.c

For now nbd-client-cmds.c looks more like nbd-payload.c, but, may be, we
should move command sending special-casing (CMD_WRITE) to it too..

Don't waste time on careful reviewing this patch, let's consider first
the concept of nbd-client-cmds.c,

 block/nbd-client.h      |  10 +++
 include/block/nbd.h     |  82 ++++++++++++++++--
 nbd/nbd-internal.h      |  25 ------
 block/nbd-client-cmds.c | 220 ++++++++++++++++++++++++++++++++++++++++++++++++
 block/nbd-client.c      | 118 ++++++++++++++++++++------
 nbd/client.c            | 128 ++++++++++++++++------------
 block/Makefile.objs     |   2 +-
 7 files changed, 475 insertions(+), 110 deletions(-)
 create mode 100644 block/nbd-client-cmds.c

diff --git a/block/nbd-client.h b/block/nbd-client.h
index b435754b82..abb88e4ea5 100644
--- a/block/nbd-client.h
+++ b/block/nbd-client.h
@@ -35,6 +35,8 @@ typedef struct NBDClientSession {
     NBDClientRequest requests[MAX_NBD_REQUESTS];
     NBDReply reply;
     bool quit;
+
+    bool structured_reply;
 } NBDClientSession;
 
 NBDClientSession *nbd_get_client_session(BlockDriverState *bs);
@@ -60,4 +62,12 @@ void nbd_client_detach_aio_context(BlockDriverState *bs);
 void nbd_client_attach_aio_context(BlockDriverState *bs,
                                    AioContext *new_context);
 
+int nbd_handle_structured_payload(QIOChannel *ioc, int cmd,
+                                  NBDStructuredReplyChunk *reply, void *opaque);
+int nbd_handle_simple_payload(QIOChannel *ioc, int cmd, void *opaque);
+
+int nbd_handle_structured_error_payload(QIOChannel *ioc,
+                                        NBDStructuredReplyChunk *reply,
+                                        int *request_ret);
+
 #endif /* NBD_CLIENT_H */
diff --git a/include/block/nbd.h b/include/block/nbd.h
index 314f2f9bbc..b9a4e1dfa9 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -57,12 +57,6 @@ struct NBDRequest {
 };
 typedef struct NBDRequest NBDRequest;
 
-struct NBDReply {
-    uint64_t handle;
-    uint32_t error;
-};
-typedef struct NBDReply NBDReply;
-
 typedef struct NBDSimpleReply {
     uint32_t magic;  /* NBD_SIMPLE_REPLY_MAGIC */
     uint32_t error;
@@ -77,6 +71,24 @@ typedef struct NBDStructuredReplyChunk {
     uint32_t length; /* length of payload */
 } QEMU_PACKED NBDStructuredReplyChunk;
 
+typedef union NBDReply {
+    NBDSimpleReply simple;
+    NBDStructuredReplyChunk structured;
+    struct {
+        uint32_t magic;
+        uint32_t _skip;
+        uint64_t handle;
+    } QEMU_PACKED;
+} NBDReply;
+
+#define NBD_SIMPLE_REPLY_MAGIC      0x67446698
+#define NBD_STRUCTURED_REPLY_MAGIC  0x668e33ef
+
+static inline bool nbd_reply_is_simple(NBDReply *reply)
+{
+    return reply->magic == NBD_SIMPLE_REPLY_MAGIC;
+}
+
 typedef struct NBDStructuredRead {
     NBDStructuredReplyChunk h;
     uint64_t offset;
@@ -88,6 +100,11 @@ typedef struct NBDStructuredError {
     uint16_t message_length;
 } QEMU_PACKED NBDStructuredError;
 
+typedef struct NBDPayloadOffsetHole {
+    uint64_t offset;
+    uint32_t hole_size;
+} QEMU_PACKED NBDPayloadOffsetHole;
+
 /* Transmission (export) flags: sent from server to client during handshake,
    but describe what will happen during transmission */
 #define NBD_FLAG_HAS_FLAGS         (1 << 0) /* Flags are there */
@@ -178,12 +195,54 @@ enum {
 
 #define NBD_SREP_TYPE_NONE          0
 #define NBD_SREP_TYPE_OFFSET_DATA   1
+#define NBD_SREP_TYPE_OFFSET_HOLE   2
 #define NBD_SREP_TYPE_ERROR         NBD_SREP_ERR(1)
+#define NBD_SREP_TYPE_ERROR_OFFSET  NBD_SREP_ERR(2)
+
+static inline bool nbd_srep_type_is_error(int type)
+{
+    return type & (1 << 15);
+}
+
+/* NBD errors are based on errno numbers, so there is a 1:1 mapping,
+ * but only a limited set of errno values is specified in the protocol.
+ * Everything else is squashed to EINVAL.
+ */
+#define NBD_SUCCESS    0
+#define NBD_EPERM      1
+#define NBD_EIO        5
+#define NBD_ENOMEM     12
+#define NBD_EINVAL     22
+#define NBD_ENOSPC     28
+#define NBD_ESHUTDOWN  108
+
+static inline int nbd_errno_to_system_errno(int err)
+{
+    switch (err) {
+    case NBD_SUCCESS:
+        return 0;
+    case NBD_EPERM:
+        return EPERM;
+    case NBD_EIO:
+        return EIO;
+    case NBD_ENOMEM:
+        return ENOMEM;
+    case NBD_ENOSPC:
+        return ENOSPC;
+    case NBD_ESHUTDOWN:
+        return ESHUTDOWN;
+    case NBD_EINVAL:
+        return EINVAL;
+    }
+
+    return EINVAL;
+}
 
 /* Details collected by NBD_OPT_EXPORT_NAME and NBD_OPT_GO */
 struct NBDExportInfo {
     /* Set by client before nbd_receive_negotiate() */
     bool request_sizes;
+    bool structured_reply;
     /* Set by server results during nbd_receive_negotiate() */
     uint64_t size;
     uint16_t flags;
@@ -233,4 +292,15 @@ void nbd_client_put(NBDClient *client);
 void nbd_server_start(SocketAddress *addr, const char *tls_creds,
                       Error **errp);
 
+/* nbd_read
+ * Reads @size bytes from @ioc. Returns 0 on success.
+ */
+static inline int nbd_read(QIOChannel *ioc, void *buffer, size_t size,
+                           Error **errp)
+{
+    return qio_channel_read_all(ioc, buffer, size, errp) < 0 ? -EIO : 0;
+}
+
+int nbd_drop(QIOChannel *ioc, size_t size, Error **errp);
+
 #endif
diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index 6b0d1183ba..9f7b6b68e8 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -47,8 +47,6 @@
 #define NBD_OLDSTYLE_NEGOTIATE_SIZE (8 + 8 + 8 + 4 + 124)
 
 #define NBD_REQUEST_MAGIC           0x25609513
-#define NBD_SIMPLE_REPLY_MAGIC      0x67446698
-#define NBD_STRUCTURED_REPLY_MAGIC  0x668e33ef
 #define NBD_OPTS_MAGIC              0x49484156454F5054LL
 #define NBD_CLIENT_MAGIC            0x0000420281861253LL
 #define NBD_REP_MAGIC               0x0003e889045565a9LL
@@ -65,18 +63,6 @@
 #define NBD_SET_TIMEOUT             _IO(0xab, 9)
 #define NBD_SET_FLAGS               _IO(0xab, 10)
 
-/* NBD errors are based on errno numbers, so there is a 1:1 mapping,
- * but only a limited set of errno values is specified in the protocol.
- * Everything else is squashed to EINVAL.
- */
-#define NBD_SUCCESS    0
-#define NBD_EPERM      1
-#define NBD_EIO        5
-#define NBD_ENOMEM     12
-#define NBD_EINVAL     22
-#define NBD_ENOSPC     28
-#define NBD_ESHUTDOWN  108
-
 /* nbd_read_eof
  * Tries to read @size bytes from @ioc.
  * Returns 1 on success
@@ -96,15 +82,6 @@ static inline int nbd_read_eof(QIOChannel *ioc, void *buffer, size_t size,
     return ret;
 }
 
-/* nbd_read
- * Reads @size bytes from @ioc. Returns 0 on success.
- */
-static inline int nbd_read(QIOChannel *ioc, void *buffer, size_t size,
-                           Error **errp)
-{
-    return qio_channel_read_all(ioc, buffer, size, errp) < 0 ? -EIO : 0;
-}
-
 /* nbd_write
  * Writes @size bytes to @ioc. Returns 0 on success.
  */
@@ -137,6 +114,4 @@ const char *nbd_rep_lookup(uint32_t rep);
 const char *nbd_info_lookup(uint16_t info);
 const char *nbd_cmd_lookup(uint16_t info);
 
-int nbd_drop(QIOChannel *ioc, size_t size, Error **errp);
-
 #endif
diff --git a/block/nbd-client-cmds.c b/block/nbd-client-cmds.c
new file mode 100644
index 0000000000..488a3dc267
--- /dev/null
+++ b/block/nbd-client-cmds.c
@@ -0,0 +1,220 @@
+/*
+ * QEMU Block driver for NBD
+ *
+ * Copyright (c) 2017 Parallels International GmbH
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "nbd-client.h"
+
+
+typedef int (*NBDCommandFn)(QIOChannel *ioc,
+                            NBDStructuredReplyChunk *chunk,
+                            void *opaque);
+typedef struct NBDCommand {
+    int (*read_simple_payload)(QIOChannel *ioc, void *opaque);
+    NBDCommandFn read_offset_data;
+    NBDCommandFn read_offset_hole;
+} NBDCommand;
+
+static int nbd_cmd_read__siple_payload(QIOChannel *ioc, void *opaque)
+{
+    QEMUIOVector *qiov = (QEMUIOVector *)opaque;
+    return qio_channel_readv_all(ioc, qiov->iov, qiov->niov, NULL);
+}
+
+static int nbd_cmd_read__offset_data(QIOChannel *ioc,
+                                     NBDStructuredReplyChunk *chunk,
+                                     void *opaque)
+{
+    QEMUIOVector *qiov = (QEMUIOVector *)opaque;
+    QEMUIOVector sub_qiov;
+    uint64_t offset;
+    size_t data_size;
+    int ret;
+
+    if (chunk->length < sizeof(offset)) {
+        return -1;
+    }
+
+    if (nbd_read(ioc, &offset, sizeof(offset), NULL) < 0) {
+        return -1;
+    }
+    be64_to_cpus(&offset);
+
+    data_size = chunk->length - sizeof(offset);
+    if (offset + data_size > qiov->size) {
+        return -1;
+    }
+
+    qemu_iovec_init(&sub_qiov, qiov->niov);
+    qemu_iovec_concat(&sub_qiov, qiov, offset, data_size);
+    ret = qio_channel_readv_all(ioc, sub_qiov.iov, sub_qiov.niov, NULL);
+    qemu_iovec_destroy(&sub_qiov);
+
+    return ret;
+}
+
+static int nbd_cmd_read__offset_hole(QIOChannel *ioc,
+                                     NBDStructuredReplyChunk *chunk,
+                                     void *opaque)
+{
+    QEMUIOVector *qiov = (QEMUIOVector *)opaque;
+    NBDPayloadOffsetHole pl;
+
+    if (chunk->length != sizeof(pl)) {
+        return -1;
+    }
+
+    if (nbd_read(ioc, &pl, sizeof(pl), NULL) < 0) {
+        return -1;
+    }
+
+    be64_to_cpus(&pl.offset);
+    be32_to_cpus(&pl.hole_size);
+
+    if (pl.offset + pl.hole_size > qiov->size) {
+        return -1;
+    }
+
+    qemu_iovec_memset(qiov, pl.offset, 0, pl.hole_size);
+
+    return 0;
+}
+
+NBDCommand nbd_cmd_read = {
+    .read_simple_payload = nbd_cmd_read__siple_payload,
+    .read_offset_data = nbd_cmd_read__offset_data,
+    .read_offset_hole = nbd_cmd_read__offset_hole,
+};
+
+static NBDCommand *nbd_get_cmd(int cmd)
+{
+    switch (cmd) {
+    case NBD_CMD_READ:
+        return &nbd_cmd_read;
+    }
+
+    return NULL;
+}
+
+static NBDCommandFn nbd_cmd_get_handler(int cmd, int reply_type)
+{
+    NBDCommand *command = nbd_get_cmd(cmd);
+
+    if (command == NULL) {
+        return NULL;
+    }
+
+    switch (reply_type) {
+    case NBD_SREP_TYPE_OFFSET_DATA:
+        return command->read_offset_data;
+    case NBD_SREP_TYPE_OFFSET_HOLE:
+        return command->read_offset_hole;
+    }
+
+    return NULL;
+}
+
+/* nbd_handle_structured_payload
+ * Find corresponding handler and call it.
+ * If not found return -1 (fail)
+ */
+int nbd_handle_structured_payload(QIOChannel *ioc, int cmd, NBDStructuredReplyChunk *chunk,
+                                  void *opaque)
+{
+    NBDCommandFn fn = nbd_cmd_get_handler(cmd, chunk->type);
+
+    if (fn == NULL) {
+        return -1;
+    }
+
+    return fn(ioc, chunk, opaque);
+}
+
+/* nbd_handle_simple_payload
+ * Find corresponding handler and call it.
+ * If not found return 0 (success)
+ */
+int nbd_handle_simple_payload(QIOChannel *ioc, int cmd, void *opaque)
+{
+    NBDCommand *command = nbd_get_cmd(cmd);
+
+    if (command == NULL || command->read_simple_payload == NULL) {
+        return 0;
+    }
+
+    return command->read_simple_payload(ioc, opaque);
+}
+
+int nbd_handle_structured_error_payload(QIOChannel *ioc, NBDStructuredReplyChunk *chunk,
+                                        int *request_ret)
+{
+    uint32_t error;
+    uint16_t message_size;
+    int ret;
+    assert(chunk->type & (1 << 15));
+
+    switch (chunk->type) {
+    case NBD_SREP_TYPE_ERROR:
+    case NBD_SREP_TYPE_ERROR_OFFSET:
+        ret = nbd_read(ioc, &error, sizeof(error), NULL);
+        if (ret < 0) {
+            return ret;
+        }
+        be32_to_cpus(&error);
+
+        ret = nbd_read(ioc, &message_size, sizeof(message_size), NULL);
+        if (ret < 0) {
+            return ret;
+        }
+        be16_to_cpus(&message_size);
+
+        if (message_size > 0) {
+            /* TODO: provide error message to user */
+            ret = nbd_drop(ioc, message_size, NULL);
+            if (ret < 0) {
+                return ret;
+            }
+        }
+
+        if (chunk->type == NBD_SREP_TYPE_ERROR_OFFSET) {
+            /* drop 64bit offset */
+            ret = nbd_drop(ioc, 8, NULL);
+            if (ret < 0) {
+                return ret;
+            }
+        }
+        break;
+    default:
+        /* unknown error */
+        ret = nbd_drop(ioc, chunk->length, NULL);
+        if (ret < 0) {
+            return ret;
+        }
+
+        error = NBD_EINVAL;
+    }
+
+    *request_ret = nbd_errno_to_system_errno(error);
+    return 0;
+}
diff --git a/block/nbd-client.c b/block/nbd-client.c
index e4f0c789f4..cf80564a83 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -179,9 +179,9 @@ err:
     return rc;
 }
 
-static int nbd_co_receive_reply(NBDClientSession *s,
-                                uint64_t handle,
-                                QEMUIOVector *qiov)
+static int nbd_co_receive_one_chunk(NBDClientSession *s, uint64_t handle,
+                                    int request_cmd, void *request_opaque,
+                                    bool *cont)
 {
     int ret;
     int i = HANDLE_TO_INDEX(s, handle);
@@ -191,29 +191,93 @@ static int nbd_co_receive_reply(NBDClientSession *s,
     qemu_coroutine_yield();
     s->requests[i].receiving = false;
     if (!s->ioc || s->quit) {
-        ret = -EIO;
-    } else {
-        assert(s->reply.handle == handle);
-        ret = -s->reply.error;
-        if (qiov && s->reply.error == 0) {
-            if (qio_channel_readv_all(s->ioc, qiov->iov, qiov->niov,
-                                      NULL) < 0) {
-                ret = -EIO;
-                s->quit = true;
+        *cont = false;
+        return -EIO;
+    }
+
+    assert(s->reply.handle == handle);
+
+    if (nbd_reply_is_simple(&s->reply)) {
+        *cont = false;
+        ret = -s->reply.simple.error;
+        if (s->structured_reply && request_cmd == NBD_CMD_READ) {
+            goto fatal;
+        }
+        if (ret == 0) {
+            if (nbd_handle_simple_payload(s->ioc, request_cmd,
+                                          request_opaque) < 0)
+            {
+                goto fatal;
             }
         }
+        goto out;
+    }
+
+    /* handle structured reply chunk */
+
+    *cont = !(s->reply.structured.flags & NBD_SREP_FLAG_DONE);
 
-        /* Tell the read handler to read another header.  */
-        s->reply.handle = 0;
+    if (s->reply.structured.type == NBD_SREP_TYPE_NONE) {
+        goto out;
     }
 
-    s->requests[i].coroutine = NULL;
+    if (nbd_srep_type_is_error(s->reply.structured.type)) {
+        if (nbd_handle_structured_error_payload(s->ioc, &s->reply.structured,
+                                                &ret) < 0)
+        {
+            goto fatal;
+        }
+        goto out;
+    }
+
+    /* here we deal with successful not-NONE structured reply */
+    if (nbd_handle_structured_payload(s->ioc, request_cmd,
+                                      &s->reply.structured,
+                                      request_opaque) < 0)
+    {
+        goto fatal;
+    }
+
+out:
+    /* For assert at loop start in nbd_read_reply_entry */
+    s->reply.handle = 0;
 
-    /* Kick the read_reply_co to get the next reply.  */
     if (s->read_reply_co) {
         aio_co_wake(s->read_reply_co);
     }
 
+    return ret;
+
+fatal:
+    /* protocol or ioc failure */
+    *cont = false;
+    s->quit = true;
+    if (s->read_reply_co) {
+        aio_co_wake(s->read_reply_co);
+    }
+
+    return -EIO;
+}
+
+static int nbd_co_receive_reply(NBDClientSession *s,
+                                uint64_t handle,
+                                int request_cmd,
+                                void *request_opaque)
+{
+    int ret = 0;
+    int i = HANDLE_TO_INDEX(s, handle);
+    bool cont = true;
+
+    while (cont) {
+        int rc = nbd_co_receive_one_chunk(s, handle, request_cmd,
+                                          request_opaque, &cont);
+        if (rc < 0 && ret == 0) {
+            ret = rc;
+        }
+    }
+
+    s->requests[i].coroutine = NULL;
+
     qemu_co_mutex_lock(&s->send_mutex);
     s->in_flight--;
     qemu_co_queue_next(&s->free_sema);
@@ -224,22 +288,28 @@ static int nbd_co_receive_reply(NBDClientSession *s,
 
 static int nbd_co_request(BlockDriverState *bs,
                           NBDRequest *request,
-                          QEMUIOVector *qiov)
+                          void *request_opaque)
 {
     NBDClientSession *client = nbd_get_client_session(bs);
     int ret;
+    QEMUIOVector *send_qiov = NULL;
+
+    if (request->type == NBD_CMD_WRITE) {
+        /* TODO: move request sending special casing to nbd-client-cmds.c like
+         * receiving part. */
+        send_qiov = request_opaque;
+        request_opaque = NULL;
+        assert(send_qiov &&
+               request->len == iov_size(send_qiov->iov, send_qiov->niov));
+    }
 
-    assert(!qiov || request->type == NBD_CMD_WRITE ||
-           request->type == NBD_CMD_READ);
-    assert(!qiov || request->len == iov_size(qiov->iov, qiov->niov));
-    ret = nbd_co_send_request(bs, request,
-                              request->type == NBD_CMD_WRITE ? qiov : NULL);
+    ret = nbd_co_send_request(bs, request, send_qiov);
     if (ret < 0) {
         return ret;
     }
 
-    return nbd_co_receive_reply(client, request->handle,
-                                request->type == NBD_CMD_READ ? qiov : NULL);
+    return nbd_co_receive_reply(client, request->handle, request->type,
+                                request_opaque);
 }
 
 int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,
diff --git a/nbd/client.c b/nbd/client.c
index 51ae492e92..d0e9b8f138 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -22,38 +22,6 @@
 #include "trace.h"
 #include "nbd-internal.h"
 
-static int nbd_errno_to_system_errno(int err)
-{
-    int ret;
-    switch (err) {
-    case NBD_SUCCESS:
-        ret = 0;
-        break;
-    case NBD_EPERM:
-        ret = EPERM;
-        break;
-    case NBD_EIO:
-        ret = EIO;
-        break;
-    case NBD_ENOMEM:
-        ret = ENOMEM;
-        break;
-    case NBD_ENOSPC:
-        ret = ENOSPC;
-        break;
-    case NBD_ESHUTDOWN:
-        ret = ESHUTDOWN;
-        break;
-    default:
-        trace_nbd_unknown_error(err);
-        /* fallthrough */
-    case NBD_EINVAL:
-        ret = EINVAL;
-        break;
-    }
-    return ret;
-}
-
 /* Definitions for opaque data types */
 
 static QTAILQ_HEAD(, NBDExport) exports = QTAILQ_HEAD_INITIALIZER(exports);
@@ -719,6 +687,13 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name,
         if (fixedNewStyle) {
             int result;
 
+            result = nbd_request_simple_option(ioc, NBD_OPT_STRUCTURED_REPLY,
+                                               errp);
+            if (result < 0) {
+                goto fail;
+            }
+            info->structured_reply = result > 0;
+
             /* Try NBD_OPT_GO first - if it works, we are done (it
              * also gives us a good message if the server requires
              * TLS).  If it is not available, fall back to
@@ -759,6 +734,12 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name,
             goto fail;
         }
         be16_to_cpus(&info->flags);
+
+        if (info->structured_reply && !(info->flags & NBD_CMD_FLAG_DF)) {
+            error_setg(errp, "Structured reply is negotiated, "
+                             "but DF flag is not.");
+            goto fail;
+        }
     } else if (magic == NBD_CLIENT_MAGIC) {
         uint32_t oldflags;
 
@@ -942,6 +923,46 @@ int nbd_send_request(QIOChannel *ioc, NBDRequest *request)
     return nbd_write(ioc, buf, sizeof(buf), NULL);
 }
 
+/* nbd_receive_simple_reply
+ * Read simple reply except magic field (which should be already read)
+ */
+static int nbd_receive_simple_reply(QIOChannel *ioc, NBDSimpleReply *reply,
+                                   Error **errp)
+{
+    int ret = nbd_read(ioc, (uint8_t *)reply + sizeof(reply->magic),
+                       sizeof(*reply) - sizeof(reply->magic), errp);
+    if (ret < 0) {
+        return ret;
+    }
+
+    be32_to_cpus(&reply->error);
+    be64_to_cpus(&reply->handle);
+
+    return 0;
+}
+
+/* nbd_receive_structured_reply_chunk
+ * Read structured reply chunk except magic field (which should be already read)
+ * Payload is not read.
+ */
+static int nbd_receive_structured_reply_chunk(QIOChannel *ioc,
+                                              NBDStructuredReplyChunk *chunk,
+                                              Error **errp)
+{
+    int ret = nbd_read(ioc, (uint8_t *)chunk + sizeof(chunk->magic),
+                       sizeof(*chunk) - sizeof(chunk->magic), errp);
+    if (ret < 0) {
+        return ret;
+    }
+
+    be16_to_cpus(&chunk->flags);
+    be16_to_cpus(&chunk->type);
+    be64_to_cpus(&chunk->handle);
+    be32_to_cpus(&chunk->length);
+
+    return 0;
+}
+
 /* nbd_receive_reply
  * Returns 1 on success
  *         0 on eof, when no data was read (errp is not set)
@@ -949,39 +970,38 @@ int nbd_send_request(QIOChannel *ioc, NBDRequest *request)
  */
 int nbd_receive_reply(QIOChannel *ioc, NBDReply *reply, Error **errp)
 {
-    uint8_t buf[NBD_REPLY_SIZE];
-    uint32_t magic;
     int ret;
 
-    ret = nbd_read_eof(ioc, buf, sizeof(buf), errp);
+    ret = nbd_read_eof(ioc, &reply->magic, sizeof(reply->magic), errp);
     if (ret <= 0) {
         return ret;
     }
 
-    /* Reply
-       [ 0 ..  3]    magic   (NBD_SIMPLE_REPLY_MAGIC)
-       [ 4 ..  7]    error   (0 == no error)
-       [ 7 .. 15]    handle
-     */
+    be32_to_cpus(&reply->magic);
 
-    magic = ldl_be_p(buf);
-    reply->error  = ldl_be_p(buf + 4);
-    reply->handle = ldq_be_p(buf + 8);
+    switch (reply->magic) {
+    case NBD_SIMPLE_REPLY_MAGIC:
+        ret = nbd_receive_simple_reply(ioc, &reply->simple, errp);
+        reply->simple.error = nbd_errno_to_system_errno(reply->simple.error);
 
-    reply->error = nbd_errno_to_system_errno(reply->error);
-
-    if (reply->error == ESHUTDOWN) {
-        /* This works even on mingw which lacks a native ESHUTDOWN */
-        error_setg(errp, "server shutting down");
+        if (reply->simple.error == ESHUTDOWN) {
+            /* This works even on mingw which lacks a native ESHUTDOWN */
+            error_setg(errp, "server shutting down");
+            return -EINVAL;
+        }
+        trace_nbd_receive_reply(reply->magic, reply->simple.error,
+                                reply->simple.handle);
+        break;
+    case NBD_STRUCTURED_REPLY_MAGIC:
+        ret = nbd_receive_structured_reply_chunk(ioc, &reply->structured, errp);
+        break;
+    default:
+        error_setg(errp, "invalid magic (got 0x%" PRIx32 ")", reply->magic);
         return -EINVAL;
     }
-    trace_nbd_receive_reply(magic, reply->error, reply->handle);
-
-    if (magic != NBD_SIMPLE_REPLY_MAGIC) {
-        error_setg(errp, "invalid magic (got 0x%" PRIx32 ")", magic);
-        return -EINVAL;
+    if (ret < 0) {
+        return ret;
     }
 
     return 1;
 }
-
diff --git a/block/Makefile.objs b/block/Makefile.objs
index 6eaf78a046..c99420f42b 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -12,7 +12,7 @@ block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
 block-obj-y += null.o mirror.o commit.o io.o
 block-obj-y += throttle-groups.o
 
-block-obj-y += nbd.o nbd-client.o sheepdog.o
+block-obj-y += nbd.o nbd-client.o nbd-client-cmds.o sheepdog.o
 block-obj-$(CONFIG_LIBISCSI) += iscsi.o
 block-obj-$(if $(CONFIG_LIBISCSI),y,n) += iscsi-opts.o
 block-obj-$(CONFIG_LIBNFS) += nfs.o
-- 
2.11.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH v1.1 DRAFT] nbd: Minimal structured read for client
  2017-09-27 15:10         ` [Qemu-devel] [PATCH v1.1 DRAFT] " Vladimir Sementsov-Ogievskiy
@ 2017-10-03  9:59           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-10-03  9:59 UTC (permalink / raw)
  To: qemu-block, qemu-devel; +Cc: pbonzini, eblake, mreitz, kwolf, den

Eric?


27.09.2017 18:10, Vladimir Sementsov-Ogievskiy wrote:
> Minimal implementation: drop most of additional error information.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>
> Hi!
>
> Here is a draft of how to refactor reply-payload receiving if you don't
> like my previous simple (but not flexible) try. Ofcourse, if we agree on this
> approach this patch should be splitted into several ones and many things
> (error handling) should be improved.
>
> The idea is:
>
> nbd_read_reply_entry reads only reply header through nbd/client.c code.
>
> Then, the payload is read through block/nbd-client-cmds.c:
> simple payload: generic per-command handler, however it should only exist
>    for CMD_READ
> structured NONE: no payload, handle in nbd_co_receive_one_chunk
> structured error: read by nbd_handle_structured_error_payload
>    defined in block/nbd-client-cmds.c
> structured success: read by per-[command X reply-type] handler
>    defined in block/nbd-client-cmds.c
>
> For now nbd-client-cmds.c looks more like nbd-payload.c, but, may be, we
> should move command sending special-casing (CMD_WRITE) to it too..
>
> Don't waste time on careful reviewing this patch, let's consider first
> the concept of nbd-client-cmds.c,
>
>   block/nbd-client.h      |  10 +++
>   include/block/nbd.h     |  82 ++++++++++++++++--
>   nbd/nbd-internal.h      |  25 ------
>   block/nbd-client-cmds.c | 220 ++++++++++++++++++++++++++++++++++++++++++++++++
>   block/nbd-client.c      | 118 ++++++++++++++++++++------
>   nbd/client.c            | 128 ++++++++++++++++------------
>   block/Makefile.objs     |   2 +-
>   7 files changed, 475 insertions(+), 110 deletions(-)
>   create mode 100644 block/nbd-client-cmds.c
>
> diff --git a/block/nbd-client.h b/block/nbd-client.h
> index b435754b82..abb88e4ea5 100644
> --- a/block/nbd-client.h
> +++ b/block/nbd-client.h
> @@ -35,6 +35,8 @@ typedef struct NBDClientSession {
>       NBDClientRequest requests[MAX_NBD_REQUESTS];
>       NBDReply reply;
>       bool quit;
> +
> +    bool structured_reply;
>   } NBDClientSession;
>   
>   NBDClientSession *nbd_get_client_session(BlockDriverState *bs);
> @@ -60,4 +62,12 @@ void nbd_client_detach_aio_context(BlockDriverState *bs);
>   void nbd_client_attach_aio_context(BlockDriverState *bs,
>                                      AioContext *new_context);
>   
> +int nbd_handle_structured_payload(QIOChannel *ioc, int cmd,
> +                                  NBDStructuredReplyChunk *reply, void *opaque);
> +int nbd_handle_simple_payload(QIOChannel *ioc, int cmd, void *opaque);
> +
> +int nbd_handle_structured_error_payload(QIOChannel *ioc,
> +                                        NBDStructuredReplyChunk *reply,
> +                                        int *request_ret);
> +
>   #endif /* NBD_CLIENT_H */
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 314f2f9bbc..b9a4e1dfa9 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -57,12 +57,6 @@ struct NBDRequest {
>   };
>   typedef struct NBDRequest NBDRequest;
>   
> -struct NBDReply {
> -    uint64_t handle;
> -    uint32_t error;
> -};
> -typedef struct NBDReply NBDReply;
> -
>   typedef struct NBDSimpleReply {
>       uint32_t magic;  /* NBD_SIMPLE_REPLY_MAGIC */
>       uint32_t error;
> @@ -77,6 +71,24 @@ typedef struct NBDStructuredReplyChunk {
>       uint32_t length; /* length of payload */
>   } QEMU_PACKED NBDStructuredReplyChunk;
>   
> +typedef union NBDReply {
> +    NBDSimpleReply simple;
> +    NBDStructuredReplyChunk structured;
> +    struct {
> +        uint32_t magic;
> +        uint32_t _skip;
> +        uint64_t handle;
> +    } QEMU_PACKED;
> +} NBDReply;
> +
> +#define NBD_SIMPLE_REPLY_MAGIC      0x67446698
> +#define NBD_STRUCTURED_REPLY_MAGIC  0x668e33ef
> +
> +static inline bool nbd_reply_is_simple(NBDReply *reply)
> +{
> +    return reply->magic == NBD_SIMPLE_REPLY_MAGIC;
> +}
> +
>   typedef struct NBDStructuredRead {
>       NBDStructuredReplyChunk h;
>       uint64_t offset;
> @@ -88,6 +100,11 @@ typedef struct NBDStructuredError {
>       uint16_t message_length;
>   } QEMU_PACKED NBDStructuredError;
>   
> +typedef struct NBDPayloadOffsetHole {
> +    uint64_t offset;
> +    uint32_t hole_size;
> +} QEMU_PACKED NBDPayloadOffsetHole;
> +
>   /* Transmission (export) flags: sent from server to client during handshake,
>      but describe what will happen during transmission */
>   #define NBD_FLAG_HAS_FLAGS         (1 << 0) /* Flags are there */
> @@ -178,12 +195,54 @@ enum {
>   
>   #define NBD_SREP_TYPE_NONE          0
>   #define NBD_SREP_TYPE_OFFSET_DATA   1
> +#define NBD_SREP_TYPE_OFFSET_HOLE   2
>   #define NBD_SREP_TYPE_ERROR         NBD_SREP_ERR(1)
> +#define NBD_SREP_TYPE_ERROR_OFFSET  NBD_SREP_ERR(2)
> +
> +static inline bool nbd_srep_type_is_error(int type)
> +{
> +    return type & (1 << 15);
> +}
> +
> +/* NBD errors are based on errno numbers, so there is a 1:1 mapping,
> + * but only a limited set of errno values is specified in the protocol.
> + * Everything else is squashed to EINVAL.
> + */
> +#define NBD_SUCCESS    0
> +#define NBD_EPERM      1
> +#define NBD_EIO        5
> +#define NBD_ENOMEM     12
> +#define NBD_EINVAL     22
> +#define NBD_ENOSPC     28
> +#define NBD_ESHUTDOWN  108
> +
> +static inline int nbd_errno_to_system_errno(int err)
> +{
> +    switch (err) {
> +    case NBD_SUCCESS:
> +        return 0;
> +    case NBD_EPERM:
> +        return EPERM;
> +    case NBD_EIO:
> +        return EIO;
> +    case NBD_ENOMEM:
> +        return ENOMEM;
> +    case NBD_ENOSPC:
> +        return ENOSPC;
> +    case NBD_ESHUTDOWN:
> +        return ESHUTDOWN;
> +    case NBD_EINVAL:
> +        return EINVAL;
> +    }
> +
> +    return EINVAL;
> +}
>   
>   /* Details collected by NBD_OPT_EXPORT_NAME and NBD_OPT_GO */
>   struct NBDExportInfo {
>       /* Set by client before nbd_receive_negotiate() */
>       bool request_sizes;
> +    bool structured_reply;
>       /* Set by server results during nbd_receive_negotiate() */
>       uint64_t size;
>       uint16_t flags;
> @@ -233,4 +292,15 @@ void nbd_client_put(NBDClient *client);
>   void nbd_server_start(SocketAddress *addr, const char *tls_creds,
>                         Error **errp);
>   
> +/* nbd_read
> + * Reads @size bytes from @ioc. Returns 0 on success.
> + */
> +static inline int nbd_read(QIOChannel *ioc, void *buffer, size_t size,
> +                           Error **errp)
> +{
> +    return qio_channel_read_all(ioc, buffer, size, errp) < 0 ? -EIO : 0;
> +}
> +
> +int nbd_drop(QIOChannel *ioc, size_t size, Error **errp);
> +
>   #endif
> diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
> index 6b0d1183ba..9f7b6b68e8 100644
> --- a/nbd/nbd-internal.h
> +++ b/nbd/nbd-internal.h
> @@ -47,8 +47,6 @@
>   #define NBD_OLDSTYLE_NEGOTIATE_SIZE (8 + 8 + 8 + 4 + 124)
>   
>   #define NBD_REQUEST_MAGIC           0x25609513
> -#define NBD_SIMPLE_REPLY_MAGIC      0x67446698
> -#define NBD_STRUCTURED_REPLY_MAGIC  0x668e33ef
>   #define NBD_OPTS_MAGIC              0x49484156454F5054LL
>   #define NBD_CLIENT_MAGIC            0x0000420281861253LL
>   #define NBD_REP_MAGIC               0x0003e889045565a9LL
> @@ -65,18 +63,6 @@
>   #define NBD_SET_TIMEOUT             _IO(0xab, 9)
>   #define NBD_SET_FLAGS               _IO(0xab, 10)
>   
> -/* NBD errors are based on errno numbers, so there is a 1:1 mapping,
> - * but only a limited set of errno values is specified in the protocol.
> - * Everything else is squashed to EINVAL.
> - */
> -#define NBD_SUCCESS    0
> -#define NBD_EPERM      1
> -#define NBD_EIO        5
> -#define NBD_ENOMEM     12
> -#define NBD_EINVAL     22
> -#define NBD_ENOSPC     28
> -#define NBD_ESHUTDOWN  108
> -
>   /* nbd_read_eof
>    * Tries to read @size bytes from @ioc.
>    * Returns 1 on success
> @@ -96,15 +82,6 @@ static inline int nbd_read_eof(QIOChannel *ioc, void *buffer, size_t size,
>       return ret;
>   }
>   
> -/* nbd_read
> - * Reads @size bytes from @ioc. Returns 0 on success.
> - */
> -static inline int nbd_read(QIOChannel *ioc, void *buffer, size_t size,
> -                           Error **errp)
> -{
> -    return qio_channel_read_all(ioc, buffer, size, errp) < 0 ? -EIO : 0;
> -}
> -
>   /* nbd_write
>    * Writes @size bytes to @ioc. Returns 0 on success.
>    */
> @@ -137,6 +114,4 @@ const char *nbd_rep_lookup(uint32_t rep);
>   const char *nbd_info_lookup(uint16_t info);
>   const char *nbd_cmd_lookup(uint16_t info);
>   
> -int nbd_drop(QIOChannel *ioc, size_t size, Error **errp);
> -
>   #endif
> diff --git a/block/nbd-client-cmds.c b/block/nbd-client-cmds.c
> new file mode 100644
> index 0000000000..488a3dc267
> --- /dev/null
> +++ b/block/nbd-client-cmds.c
> @@ -0,0 +1,220 @@
> +/*
> + * QEMU Block driver for NBD
> + *
> + * Copyright (c) 2017 Parallels International GmbH
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "nbd-client.h"
> +
> +
> +typedef int (*NBDCommandFn)(QIOChannel *ioc,
> +                            NBDStructuredReplyChunk *chunk,
> +                            void *opaque);
> +typedef struct NBDCommand {
> +    int (*read_simple_payload)(QIOChannel *ioc, void *opaque);
> +    NBDCommandFn read_offset_data;
> +    NBDCommandFn read_offset_hole;
> +} NBDCommand;
> +
> +static int nbd_cmd_read__siple_payload(QIOChannel *ioc, void *opaque)
> +{
> +    QEMUIOVector *qiov = (QEMUIOVector *)opaque;
> +    return qio_channel_readv_all(ioc, qiov->iov, qiov->niov, NULL);
> +}
> +
> +static int nbd_cmd_read__offset_data(QIOChannel *ioc,
> +                                     NBDStructuredReplyChunk *chunk,
> +                                     void *opaque)
> +{
> +    QEMUIOVector *qiov = (QEMUIOVector *)opaque;
> +    QEMUIOVector sub_qiov;
> +    uint64_t offset;
> +    size_t data_size;
> +    int ret;
> +
> +    if (chunk->length < sizeof(offset)) {
> +        return -1;
> +    }
> +
> +    if (nbd_read(ioc, &offset, sizeof(offset), NULL) < 0) {
> +        return -1;
> +    }
> +    be64_to_cpus(&offset);
> +
> +    data_size = chunk->length - sizeof(offset);
> +    if (offset + data_size > qiov->size) {
> +        return -1;
> +    }
> +
> +    qemu_iovec_init(&sub_qiov, qiov->niov);
> +    qemu_iovec_concat(&sub_qiov, qiov, offset, data_size);
> +    ret = qio_channel_readv_all(ioc, sub_qiov.iov, sub_qiov.niov, NULL);
> +    qemu_iovec_destroy(&sub_qiov);
> +
> +    return ret;
> +}
> +
> +static int nbd_cmd_read__offset_hole(QIOChannel *ioc,
> +                                     NBDStructuredReplyChunk *chunk,
> +                                     void *opaque)
> +{
> +    QEMUIOVector *qiov = (QEMUIOVector *)opaque;
> +    NBDPayloadOffsetHole pl;
> +
> +    if (chunk->length != sizeof(pl)) {
> +        return -1;
> +    }
> +
> +    if (nbd_read(ioc, &pl, sizeof(pl), NULL) < 0) {
> +        return -1;
> +    }
> +
> +    be64_to_cpus(&pl.offset);
> +    be32_to_cpus(&pl.hole_size);
> +
> +    if (pl.offset + pl.hole_size > qiov->size) {
> +        return -1;
> +    }
> +
> +    qemu_iovec_memset(qiov, pl.offset, 0, pl.hole_size);
> +
> +    return 0;
> +}
> +
> +NBDCommand nbd_cmd_read = {
> +    .read_simple_payload = nbd_cmd_read__siple_payload,
> +    .read_offset_data = nbd_cmd_read__offset_data,
> +    .read_offset_hole = nbd_cmd_read__offset_hole,
> +};
> +
> +static NBDCommand *nbd_get_cmd(int cmd)
> +{
> +    switch (cmd) {
> +    case NBD_CMD_READ:
> +        return &nbd_cmd_read;
> +    }
> +
> +    return NULL;
> +}
> +
> +static NBDCommandFn nbd_cmd_get_handler(int cmd, int reply_type)
> +{
> +    NBDCommand *command = nbd_get_cmd(cmd);
> +
> +    if (command == NULL) {
> +        return NULL;
> +    }
> +
> +    switch (reply_type) {
> +    case NBD_SREP_TYPE_OFFSET_DATA:
> +        return command->read_offset_data;
> +    case NBD_SREP_TYPE_OFFSET_HOLE:
> +        return command->read_offset_hole;
> +    }
> +
> +    return NULL;
> +}
> +
> +/* nbd_handle_structured_payload
> + * Find corresponding handler and call it.
> + * If not found return -1 (fail)
> + */
> +int nbd_handle_structured_payload(QIOChannel *ioc, int cmd, NBDStructuredReplyChunk *chunk,
> +                                  void *opaque)
> +{
> +    NBDCommandFn fn = nbd_cmd_get_handler(cmd, chunk->type);
> +
> +    if (fn == NULL) {
> +        return -1;
> +    }
> +
> +    return fn(ioc, chunk, opaque);
> +}
> +
> +/* nbd_handle_simple_payload
> + * Find corresponding handler and call it.
> + * If not found return 0 (success)
> + */
> +int nbd_handle_simple_payload(QIOChannel *ioc, int cmd, void *opaque)
> +{
> +    NBDCommand *command = nbd_get_cmd(cmd);
> +
> +    if (command == NULL || command->read_simple_payload == NULL) {
> +        return 0;
> +    }
> +
> +    return command->read_simple_payload(ioc, opaque);
> +}
> +
> +int nbd_handle_structured_error_payload(QIOChannel *ioc, NBDStructuredReplyChunk *chunk,
> +                                        int *request_ret)
> +{
> +    uint32_t error;
> +    uint16_t message_size;
> +    int ret;
> +    assert(chunk->type & (1 << 15));
> +
> +    switch (chunk->type) {
> +    case NBD_SREP_TYPE_ERROR:
> +    case NBD_SREP_TYPE_ERROR_OFFSET:
> +        ret = nbd_read(ioc, &error, sizeof(error), NULL);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +        be32_to_cpus(&error);
> +
> +        ret = nbd_read(ioc, &message_size, sizeof(message_size), NULL);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +        be16_to_cpus(&message_size);
> +
> +        if (message_size > 0) {
> +            /* TODO: provide error message to user */
> +            ret = nbd_drop(ioc, message_size, NULL);
> +            if (ret < 0) {
> +                return ret;
> +            }
> +        }
> +
> +        if (chunk->type == NBD_SREP_TYPE_ERROR_OFFSET) {
> +            /* drop 64bit offset */
> +            ret = nbd_drop(ioc, 8, NULL);
> +            if (ret < 0) {
> +                return ret;
> +            }
> +        }
> +        break;
> +    default:
> +        /* unknown error */
> +        ret = nbd_drop(ioc, chunk->length, NULL);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +
> +        error = NBD_EINVAL;
> +    }
> +
> +    *request_ret = nbd_errno_to_system_errno(error);
> +    return 0;
> +}
> diff --git a/block/nbd-client.c b/block/nbd-client.c
> index e4f0c789f4..cf80564a83 100644
> --- a/block/nbd-client.c
> +++ b/block/nbd-client.c
> @@ -179,9 +179,9 @@ err:
>       return rc;
>   }
>   
> -static int nbd_co_receive_reply(NBDClientSession *s,
> -                                uint64_t handle,
> -                                QEMUIOVector *qiov)
> +static int nbd_co_receive_one_chunk(NBDClientSession *s, uint64_t handle,
> +                                    int request_cmd, void *request_opaque,
> +                                    bool *cont)
>   {
>       int ret;
>       int i = HANDLE_TO_INDEX(s, handle);
> @@ -191,29 +191,93 @@ static int nbd_co_receive_reply(NBDClientSession *s,
>       qemu_coroutine_yield();
>       s->requests[i].receiving = false;
>       if (!s->ioc || s->quit) {
> -        ret = -EIO;
> -    } else {
> -        assert(s->reply.handle == handle);
> -        ret = -s->reply.error;
> -        if (qiov && s->reply.error == 0) {
> -            if (qio_channel_readv_all(s->ioc, qiov->iov, qiov->niov,
> -                                      NULL) < 0) {
> -                ret = -EIO;
> -                s->quit = true;
> +        *cont = false;
> +        return -EIO;
> +    }
> +
> +    assert(s->reply.handle == handle);
> +
> +    if (nbd_reply_is_simple(&s->reply)) {
> +        *cont = false;
> +        ret = -s->reply.simple.error;
> +        if (s->structured_reply && request_cmd == NBD_CMD_READ) {
> +            goto fatal;
> +        }
> +        if (ret == 0) {
> +            if (nbd_handle_simple_payload(s->ioc, request_cmd,
> +                                          request_opaque) < 0)
> +            {
> +                goto fatal;
>               }
>           }
> +        goto out;
> +    }
> +
> +    /* handle structured reply chunk */
> +
> +    *cont = !(s->reply.structured.flags & NBD_SREP_FLAG_DONE);
>   
> -        /* Tell the read handler to read another header.  */
> -        s->reply.handle = 0;
> +    if (s->reply.structured.type == NBD_SREP_TYPE_NONE) {
> +        goto out;
>       }
>   
> -    s->requests[i].coroutine = NULL;
> +    if (nbd_srep_type_is_error(s->reply.structured.type)) {
> +        if (nbd_handle_structured_error_payload(s->ioc, &s->reply.structured,
> +                                                &ret) < 0)
> +        {
> +            goto fatal;
> +        }
> +        goto out;
> +    }
> +
> +    /* here we deal with successful not-NONE structured reply */
> +    if (nbd_handle_structured_payload(s->ioc, request_cmd,
> +                                      &s->reply.structured,
> +                                      request_opaque) < 0)
> +    {
> +        goto fatal;
> +    }
> +
> +out:
> +    /* For assert at loop start in nbd_read_reply_entry */
> +    s->reply.handle = 0;
>   
> -    /* Kick the read_reply_co to get the next reply.  */
>       if (s->read_reply_co) {
>           aio_co_wake(s->read_reply_co);
>       }
>   
> +    return ret;
> +
> +fatal:
> +    /* protocol or ioc failure */
> +    *cont = false;
> +    s->quit = true;
> +    if (s->read_reply_co) {
> +        aio_co_wake(s->read_reply_co);
> +    }
> +
> +    return -EIO;
> +}
> +
> +static int nbd_co_receive_reply(NBDClientSession *s,
> +                                uint64_t handle,
> +                                int request_cmd,
> +                                void *request_opaque)
> +{
> +    int ret = 0;
> +    int i = HANDLE_TO_INDEX(s, handle);
> +    bool cont = true;
> +
> +    while (cont) {
> +        int rc = nbd_co_receive_one_chunk(s, handle, request_cmd,
> +                                          request_opaque, &cont);
> +        if (rc < 0 && ret == 0) {
> +            ret = rc;
> +        }
> +    }
> +
> +    s->requests[i].coroutine = NULL;
> +
>       qemu_co_mutex_lock(&s->send_mutex);
>       s->in_flight--;
>       qemu_co_queue_next(&s->free_sema);
> @@ -224,22 +288,28 @@ static int nbd_co_receive_reply(NBDClientSession *s,
>   
>   static int nbd_co_request(BlockDriverState *bs,
>                             NBDRequest *request,
> -                          QEMUIOVector *qiov)
> +                          void *request_opaque)
>   {
>       NBDClientSession *client = nbd_get_client_session(bs);
>       int ret;
> +    QEMUIOVector *send_qiov = NULL;
> +
> +    if (request->type == NBD_CMD_WRITE) {
> +        /* TODO: move request sending special casing to nbd-client-cmds.c like
> +         * receiving part. */
> +        send_qiov = request_opaque;
> +        request_opaque = NULL;
> +        assert(send_qiov &&
> +               request->len == iov_size(send_qiov->iov, send_qiov->niov));
> +    }
>   
> -    assert(!qiov || request->type == NBD_CMD_WRITE ||
> -           request->type == NBD_CMD_READ);
> -    assert(!qiov || request->len == iov_size(qiov->iov, qiov->niov));
> -    ret = nbd_co_send_request(bs, request,
> -                              request->type == NBD_CMD_WRITE ? qiov : NULL);
> +    ret = nbd_co_send_request(bs, request, send_qiov);
>       if (ret < 0) {
>           return ret;
>       }
>   
> -    return nbd_co_receive_reply(client, request->handle,
> -                                request->type == NBD_CMD_READ ? qiov : NULL);
> +    return nbd_co_receive_reply(client, request->handle, request->type,
> +                                request_opaque);
>   }
>   
>   int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,
> diff --git a/nbd/client.c b/nbd/client.c
> index 51ae492e92..d0e9b8f138 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -22,38 +22,6 @@
>   #include "trace.h"
>   #include "nbd-internal.h"
>   
> -static int nbd_errno_to_system_errno(int err)
> -{
> -    int ret;
> -    switch (err) {
> -    case NBD_SUCCESS:
> -        ret = 0;
> -        break;
> -    case NBD_EPERM:
> -        ret = EPERM;
> -        break;
> -    case NBD_EIO:
> -        ret = EIO;
> -        break;
> -    case NBD_ENOMEM:
> -        ret = ENOMEM;
> -        break;
> -    case NBD_ENOSPC:
> -        ret = ENOSPC;
> -        break;
> -    case NBD_ESHUTDOWN:
> -        ret = ESHUTDOWN;
> -        break;
> -    default:
> -        trace_nbd_unknown_error(err);
> -        /* fallthrough */
> -    case NBD_EINVAL:
> -        ret = EINVAL;
> -        break;
> -    }
> -    return ret;
> -}
> -
>   /* Definitions for opaque data types */
>   
>   static QTAILQ_HEAD(, NBDExport) exports = QTAILQ_HEAD_INITIALIZER(exports);
> @@ -719,6 +687,13 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name,
>           if (fixedNewStyle) {
>               int result;
>   
> +            result = nbd_request_simple_option(ioc, NBD_OPT_STRUCTURED_REPLY,
> +                                               errp);
> +            if (result < 0) {
> +                goto fail;
> +            }
> +            info->structured_reply = result > 0;
> +
>               /* Try NBD_OPT_GO first - if it works, we are done (it
>                * also gives us a good message if the server requires
>                * TLS).  If it is not available, fall back to
> @@ -759,6 +734,12 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name,
>               goto fail;
>           }
>           be16_to_cpus(&info->flags);
> +
> +        if (info->structured_reply && !(info->flags & NBD_CMD_FLAG_DF)) {
> +            error_setg(errp, "Structured reply is negotiated, "
> +                             "but DF flag is not.");
> +            goto fail;
> +        }
>       } else if (magic == NBD_CLIENT_MAGIC) {
>           uint32_t oldflags;
>   
> @@ -942,6 +923,46 @@ int nbd_send_request(QIOChannel *ioc, NBDRequest *request)
>       return nbd_write(ioc, buf, sizeof(buf), NULL);
>   }
>   
> +/* nbd_receive_simple_reply
> + * Read simple reply except magic field (which should be already read)
> + */
> +static int nbd_receive_simple_reply(QIOChannel *ioc, NBDSimpleReply *reply,
> +                                   Error **errp)
> +{
> +    int ret = nbd_read(ioc, (uint8_t *)reply + sizeof(reply->magic),
> +                       sizeof(*reply) - sizeof(reply->magic), errp);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    be32_to_cpus(&reply->error);
> +    be64_to_cpus(&reply->handle);
> +
> +    return 0;
> +}
> +
> +/* nbd_receive_structured_reply_chunk
> + * Read structured reply chunk except magic field (which should be already read)
> + * Payload is not read.
> + */
> +static int nbd_receive_structured_reply_chunk(QIOChannel *ioc,
> +                                              NBDStructuredReplyChunk *chunk,
> +                                              Error **errp)
> +{
> +    int ret = nbd_read(ioc, (uint8_t *)chunk + sizeof(chunk->magic),
> +                       sizeof(*chunk) - sizeof(chunk->magic), errp);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    be16_to_cpus(&chunk->flags);
> +    be16_to_cpus(&chunk->type);
> +    be64_to_cpus(&chunk->handle);
> +    be32_to_cpus(&chunk->length);
> +
> +    return 0;
> +}
> +
>   /* nbd_receive_reply
>    * Returns 1 on success
>    *         0 on eof, when no data was read (errp is not set)
> @@ -949,39 +970,38 @@ int nbd_send_request(QIOChannel *ioc, NBDRequest *request)
>    */
>   int nbd_receive_reply(QIOChannel *ioc, NBDReply *reply, Error **errp)
>   {
> -    uint8_t buf[NBD_REPLY_SIZE];
> -    uint32_t magic;
>       int ret;
>   
> -    ret = nbd_read_eof(ioc, buf, sizeof(buf), errp);
> +    ret = nbd_read_eof(ioc, &reply->magic, sizeof(reply->magic), errp);
>       if (ret <= 0) {
>           return ret;
>       }
>   
> -    /* Reply
> -       [ 0 ..  3]    magic   (NBD_SIMPLE_REPLY_MAGIC)
> -       [ 4 ..  7]    error   (0 == no error)
> -       [ 7 .. 15]    handle
> -     */
> +    be32_to_cpus(&reply->magic);
>   
> -    magic = ldl_be_p(buf);
> -    reply->error  = ldl_be_p(buf + 4);
> -    reply->handle = ldq_be_p(buf + 8);
> +    switch (reply->magic) {
> +    case NBD_SIMPLE_REPLY_MAGIC:
> +        ret = nbd_receive_simple_reply(ioc, &reply->simple, errp);
> +        reply->simple.error = nbd_errno_to_system_errno(reply->simple.error);
>   
> -    reply->error = nbd_errno_to_system_errno(reply->error);
> -
> -    if (reply->error == ESHUTDOWN) {
> -        /* This works even on mingw which lacks a native ESHUTDOWN */
> -        error_setg(errp, "server shutting down");
> +        if (reply->simple.error == ESHUTDOWN) {
> +            /* This works even on mingw which lacks a native ESHUTDOWN */
> +            error_setg(errp, "server shutting down");
> +            return -EINVAL;
> +        }
> +        trace_nbd_receive_reply(reply->magic, reply->simple.error,
> +                                reply->simple.handle);
> +        break;
> +    case NBD_STRUCTURED_REPLY_MAGIC:
> +        ret = nbd_receive_structured_reply_chunk(ioc, &reply->structured, errp);
> +        break;
> +    default:
> +        error_setg(errp, "invalid magic (got 0x%" PRIx32 ")", reply->magic);
>           return -EINVAL;
>       }
> -    trace_nbd_receive_reply(magic, reply->error, reply->handle);
> -
> -    if (magic != NBD_SIMPLE_REPLY_MAGIC) {
> -        error_setg(errp, "invalid magic (got 0x%" PRIx32 ")", magic);
> -        return -EINVAL;
> +    if (ret < 0) {
> +        return ret;
>       }
>   
>       return 1;
>   }
> -
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index 6eaf78a046..c99420f42b 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -12,7 +12,7 @@ block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
>   block-obj-y += null.o mirror.o commit.o io.o
>   block-obj-y += throttle-groups.o
>   
> -block-obj-y += nbd.o nbd-client.o sheepdog.o
> +block-obj-y += nbd.o nbd-client.o nbd-client-cmds.o sheepdog.o
>   block-obj-$(CONFIG_LIBISCSI) += iscsi.o
>   block-obj-$(if $(CONFIG_LIBISCSI),y,n) += iscsi-opts.o
>   block-obj-$(CONFIG_LIBNFS) += nfs.o


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 8/8] nbd: Minimal structured read for client
  2017-09-25 22:19   ` Eric Blake
  2017-09-27 10:05     ` Vladimir Sementsov-Ogievskiy
@ 2017-10-03 10:07     ` Paolo Bonzini
  2017-10-03 12:26       ` Vladimir Sementsov-Ogievskiy
  2017-10-03 12:58       ` Vladimir Sementsov-Ogievskiy
  1 sibling, 2 replies; 31+ messages in thread
From: Paolo Bonzini @ 2017-10-03 10:07 UTC (permalink / raw)
  To: Eric Blake, Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: kwolf, den, mreitz

On 26/09/2017 00:19, Eric Blake wrote:
>> +    /* here we deal with successful structured reply */
>> +    switch (s->reply.type) {
>> +        QEMUIOVector sub_qiov;
>> +    case NBD_SREP_TYPE_OFFSET_DATA:
> This is putting a LOT of smarts directly into the receive routine.
> Here's where I was previously wondering (and I think Paolo as well)
> whether it might be better to split the efforts: the generic function
> reads off the chunk information and any payload, but a per-command
> callback function then parses the chunks.  Especially since the
> definition of the chunks differs on a per-command basis (yes, the NBD
> spec will try to not reuse an SREP chunk type across multiple commands
> unless the semantics are similar, but that's a bit more fragile).  This
> particularly matters given my statement above that you want a
> discriminated union, rather than a struct that contains unused fields,
> for handling different SREP chunk types.

I think there should be two kinds of replies: 1) read directly into a
QEMUIOVector, using structured replies only as an encapsulation of the
payload; 2) read a chunk at a time into malloc-ed memory, yielding back
to the calling coroutine after receiving one complete chunk.

In the end this probably means that you have a read_chunk_header
function and a read_chunk function.  READ has a loop that calls
read_chunk_header followed by direct reading into the QEMUIOVector,
while everyone else calls read_chunk.

Maybe qio_channel_readv/writev_full could have "offset" and "bytes"
arguments.  Most code in iov_send_recv could be cut-and-pasted.  (When
sheepdog is converted to QIOChannel, iov_send_recv can go away).

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 8/8] nbd: Minimal structured read for client
  2017-10-03 10:07     ` [Qemu-devel] [Qemu-block] [PATCH 8/8] " Paolo Bonzini
@ 2017-10-03 12:26       ` Vladimir Sementsov-Ogievskiy
  2017-10-03 14:05         ` Paolo Bonzini
  2017-10-03 12:58       ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-10-03 12:26 UTC (permalink / raw)
  To: Paolo Bonzini, Eric Blake, qemu-devel, qemu-block; +Cc: kwolf, den, mreitz

03.10.2017 13:07, Paolo Bonzini wrote:
> On 26/09/2017 00:19, Eric Blake wrote:
>>> +    /* here we deal with successful structured reply */
>>> +    switch (s->reply.type) {
>>> +        QEMUIOVector sub_qiov;
>>> +    case NBD_SREP_TYPE_OFFSET_DATA:
>> This is putting a LOT of smarts directly into the receive routine.
>> Here's where I was previously wondering (and I think Paolo as well)
>> whether it might be better to split the efforts: the generic function
>> reads off the chunk information and any payload, but a per-command
>> callback function then parses the chunks.  Especially since the
>> definition of the chunks differs on a per-command basis (yes, the NBD
>> spec will try to not reuse an SREP chunk type across multiple commands
>> unless the semantics are similar, but that's a bit more fragile).  This
>> particularly matters given my statement above that you want a
>> discriminated union, rather than a struct that contains unused fields,
>> for handling different SREP chunk types.
> I think there should be two kinds of replies: 1) read directly into a
> QEMUIOVector, using structured replies only as an encapsulation of the

who should read to qiov? reply_entry, or calling coroutine?.. 
reply_entry should anyway
parse reply, to understand should it read it all or read it to qiov (or 
yield back, and then
calling coroutine will read to qiov)..

> payload; 2) read a chunk at a time into malloc-ed memory, yielding back
> to the calling coroutine after receiving one complete chunk.
>
> In the end this probably means that you have a read_chunk_header
> function and a read_chunk function.  READ has a loop that calls
> read_chunk_header followed by direct reading into the QEMUIOVector,
> while everyone else calls read_chunk.
>
> Maybe qio_channel_readv/writev_full could have "offset" and "bytes"
> arguments.  Most code in iov_send_recv could be cut-and-pasted.  (When
> sheepdog is converted to QIOChannel, iov_send_recv can go away).
>
> Paolo


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 8/8] nbd: Minimal structured read for client
  2017-10-03 10:07     ` [Qemu-devel] [Qemu-block] [PATCH 8/8] " Paolo Bonzini
  2017-10-03 12:26       ` Vladimir Sementsov-Ogievskiy
@ 2017-10-03 12:58       ` Vladimir Sementsov-Ogievskiy
  2017-10-03 13:35         ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-10-03 12:58 UTC (permalink / raw)
  To: Paolo Bonzini, Eric Blake, qemu-devel, qemu-block; +Cc: kwolf, den, mreitz

03.10.2017 13:07, Paolo Bonzini wrote:
> On 26/09/2017 00:19, Eric Blake wrote:
>>> +    /* here we deal with successful structured reply */
>>> +    switch (s->reply.type) {
>>> +        QEMUIOVector sub_qiov;
>>> +    case NBD_SREP_TYPE_OFFSET_DATA:
>> This is putting a LOT of smarts directly into the receive routine.
>> Here's where I was previously wondering (and I think Paolo as well)
>> whether it might be better to split the efforts: the generic function
>> reads off the chunk information and any payload, but a per-command
>> callback function then parses the chunks.  Especially since the
>> definition of the chunks differs on a per-command basis (yes, the NBD
>> spec will try to not reuse an SREP chunk type across multiple commands
>> unless the semantics are similar, but that's a bit more fragile).  This
>> particularly matters given my statement above that you want a
>> discriminated union, rather than a struct that contains unused fields,
>> for handling different SREP chunk types.
> I think there should be two kinds of replies: 1) read directly into a
> QEMUIOVector, using structured replies only as an encapsulation of the
> payload; 2) read a chunk at a time into malloc-ed memory, yielding back
> to the calling coroutine after receiving one complete chunk.
>
> In the end this probably means that you have a read_chunk_header
> function and a read_chunk function.  READ has a loop that calls
> read_chunk_header followed by direct reading into the QEMUIOVector,
> while everyone else calls read_chunk.

accordingly to spec, we can receive several error reply chunks to any 
request,
so loop, receiving them should be common for all requests I think

>
> Maybe qio_channel_readv/writev_full could have "offset" and "bytes"
> arguments.  Most code in iov_send_recv could be cut-and-pasted.  (When
> sheepdog is converted to QIOChannel, iov_send_recv can go away).
>
> Paolo


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 8/8] nbd: Minimal structured read for client
  2017-10-03 12:58       ` Vladimir Sementsov-Ogievskiy
@ 2017-10-03 13:35         ` Vladimir Sementsov-Ogievskiy
  2017-10-03 14:06           ` Paolo Bonzini
  0 siblings, 1 reply; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-10-03 13:35 UTC (permalink / raw)
  To: Paolo Bonzini, Eric Blake, qemu-devel, qemu-block; +Cc: kwolf, den, mreitz

03.10.2017 15:58, Vladimir Sementsov-Ogievskiy wrote:
> 03.10.2017 13:07, Paolo Bonzini wrote:
>> On 26/09/2017 00:19, Eric Blake wrote:
>>>> +    /* here we deal with successful structured reply */
>>>> +    switch (s->reply.type) {
>>>> +        QEMUIOVector sub_qiov;
>>>> +    case NBD_SREP_TYPE_OFFSET_DATA:
>>> This is putting a LOT of smarts directly into the receive routine.
>>> Here's where I was previously wondering (and I think Paolo as well)
>>> whether it might be better to split the efforts: the generic function
>>> reads off the chunk information and any payload, but a per-command
>>> callback function then parses the chunks.  Especially since the
>>> definition of the chunks differs on a per-command basis (yes, the NBD
>>> spec will try to not reuse an SREP chunk type across multiple commands
>>> unless the semantics are similar, but that's a bit more fragile).  This
>>> particularly matters given my statement above that you want a
>>> discriminated union, rather than a struct that contains unused fields,
>>> for handling different SREP chunk types.
>> I think there should be two kinds of replies: 1) read directly into a
>> QEMUIOVector, using structured replies only as an encapsulation of the
>> payload; 2) read a chunk at a time into malloc-ed memory, yielding back
>> to the calling coroutine after receiving one complete chunk.
>>
>> In the end this probably means that you have a read_chunk_header
>> function and a read_chunk function.  READ has a loop that calls
>> read_chunk_header followed by direct reading into the QEMUIOVector,
>> while everyone else calls read_chunk.
>
> accordingly to spec, we can receive several error reply chunks to any 
> request,
> so loop, receiving them should be common for all requests I think

as well as handling error chunks should be common.. What do you think 
about my DRAFT proposal?


>
>>
>> Maybe qio_channel_readv/writev_full could have "offset" and "bytes"
>> arguments.  Most code in iov_send_recv could be cut-and-pasted. (When
>> sheepdog is converted to QIOChannel, iov_send_recv can go away).
>>
>> Paolo
>
>


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 8/8] nbd: Minimal structured read for client
  2017-10-03 12:26       ` Vladimir Sementsov-Ogievskiy
@ 2017-10-03 14:05         ` Paolo Bonzini
  0 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2017-10-03 14:05 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Eric Blake, qemu-devel, qemu-block
  Cc: kwolf, den, mreitz

On 03/10/2017 14:26, Vladimir Sementsov-Ogievskiy wrote:
> 03.10.2017 13:07, Paolo Bonzini wrote:
>> On 26/09/2017 00:19, Eric Blake wrote:
>>>> +    /* here we deal with successful structured reply */
>>>> +    switch (s->reply.type) {
>>>> +        QEMUIOVector sub_qiov;
>>>> +    case NBD_SREP_TYPE_OFFSET_DATA:
>>> This is putting a LOT of smarts directly into the receive routine.
>>> Here's where I was previously wondering (and I think Paolo as well)
>>> whether it might be better to split the efforts: the generic function
>>> reads off the chunk information and any payload, but a per-command
>>> callback function then parses the chunks.  Especially since the
>>> definition of the chunks differs on a per-command basis (yes, the NBD
>>> spec will try to not reuse an SREP chunk type across multiple commands
>>> unless the semantics are similar, but that's a bit more fragile).  This
>>> particularly matters given my statement above that you want a
>>> discriminated union, rather than a struct that contains unused fields,
>>> for handling different SREP chunk types.
>> I think there should be two kinds of replies: 1) read directly into a
>> QEMUIOVector, using structured replies only as an encapsulation of the
> 
> who should read to qiov? reply_entry, or calling coroutine?..
> reply_entry should anyway
> parse reply, to understand should it read it all or read it to qiov (or
> yield back, and then
> calling coroutine will read to qiov)..

The CMD_READ coroutine should---either directly or, in case you have a
structured reply, after reading each chunk header.

Paolo

>> payload; 2) read a chunk at a time into malloc-ed memory, yielding back
>> to the calling coroutine after receiving one complete chunk.
>>
>> In the end this probably means that you have a read_chunk_header
>> function and a read_chunk function.  READ has a loop that calls
>> read_chunk_header followed by direct reading into the QEMUIOVector,
>> while everyone else calls read_chunk.
>>
>> Maybe qio_channel_readv/writev_full could have "offset" and "bytes"
>> arguments.  Most code in iov_send_recv could be cut-and-pasted.  (When
>> sheepdog is converted to QIOChannel, iov_send_recv can go away).
>>
>> Paolo
> 
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 8/8] nbd: Minimal structured read for client
  2017-10-03 13:35         ` Vladimir Sementsov-Ogievskiy
@ 2017-10-03 14:06           ` Paolo Bonzini
  2017-10-05  9:59             ` Vladimir Sementsov-Ogievskiy
  2017-10-05 10:02             ` Vladimir Sementsov-Ogievskiy
  0 siblings, 2 replies; 31+ messages in thread
From: Paolo Bonzini @ 2017-10-03 14:06 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Eric Blake, qemu-devel, qemu-block
  Cc: kwolf, den, mreitz

On 03/10/2017 15:35, Vladimir Sementsov-Ogievskiy wrote:
>>>
>>> In the end this probably means that you have a read_chunk_header
>>> function and a read_chunk function.  READ has a loop that calls
>>> read_chunk_header followed by direct reading into the QEMUIOVector,
>>> while everyone else calls read_chunk.
>>
>> accordingly to spec, we can receive several error reply chunks to any
>> request,
>> so loop, receiving them should be common for all requests I think
> 
> as well as handling error chunks should be common..

Yes, reading error chunks should be part of read_chunk_header.

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 8/8] nbd: Minimal structured read for client
  2017-10-03 14:06           ` Paolo Bonzini
@ 2017-10-05  9:59             ` Vladimir Sementsov-Ogievskiy
  2017-10-05 10:02             ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-10-05  9:59 UTC (permalink / raw)
  To: Paolo Bonzini, Eric Blake, qemu-devel, qemu-block; +Cc: kwolf, den, mreitz

03.10.2017 17:06, Paolo Bonzini wrote:
> On 03/10/2017 15:35, Vladimir Sementsov-Ogievskiy wrote:
>>>> In the end this probably means that you have a read_chunk_header
>>>> function and a read_chunk function.  READ has a loop that calls
>>>> read_chunk_header followed by direct reading into the QEMUIOVector,
>>>> while everyone else calls read_chunk.
>>> accordingly to spec, we can receive several error reply chunks to any
>>> request,
>>> so loop, receiving them should be common for all requests I think
>> as well as handling error chunks should be common..
> Yes, reading error chunks should be part of read_chunk_header.
>
> Paolo


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 8/8] nbd: Minimal structured read for client
  2017-10-03 14:06           ` Paolo Bonzini
  2017-10-05  9:59             ` Vladimir Sementsov-Ogievskiy
@ 2017-10-05 10:02             ` Vladimir Sementsov-Ogievskiy
  2017-10-05 10:36               ` Paolo Bonzini
  1 sibling, 1 reply; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-10-05 10:02 UTC (permalink / raw)
  To: Paolo Bonzini, Eric Blake, qemu-devel, qemu-block; +Cc: kwolf, den, mreitz

03.10.2017 17:06, Paolo Bonzini wrote:
> On 03/10/2017 15:35, Vladimir Sementsov-Ogievskiy wrote:
>>>> In the end this probably means that you have a read_chunk_header
>>>> function and a read_chunk function.  READ has a loop that calls
>>>> read_chunk_header followed by direct reading into the QEMUIOVector,
>>>> while everyone else calls read_chunk.
>>> accordingly to spec, we can receive several error reply chunks to any
>>> request,
>>> so loop, receiving them should be common for all requests I think
>> as well as handling error chunks should be common..
> Yes, reading error chunks should be part of read_chunk_header.
>
> Paolo

So, you want a loop in READ, and separate loop for other commands? Then 
we will have separate loop for BLOCK_STATUS and for all future commands 
with specific replies?

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 8/8] nbd: Minimal structured read for client
  2017-10-05 10:02             ` Vladimir Sementsov-Ogievskiy
@ 2017-10-05 10:36               ` Paolo Bonzini
  2017-10-05 22:12                 ` Eric Blake
  0 siblings, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2017-10-05 10:36 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Eric Blake, qemu-devel, qemu-block
  Cc: kwolf, den, mreitz

On 05/10/2017 12:02, Vladimir Sementsov-Ogievskiy wrote:
> 03.10.2017 17:06, Paolo Bonzini wrote:
>> On 03/10/2017 15:35, Vladimir Sementsov-Ogievskiy wrote:
>>>>> In the end this probably means that you have a read_chunk_header
>>>>> function and a read_chunk function.  READ has a loop that calls
>>>>> read_chunk_header followed by direct reading into the QEMUIOVector,
>>>>> while everyone else calls read_chunk.
>>>> accordingly to spec, we can receive several error reply chunks to any
>>>> request,
>>>> so loop, receiving them should be common for all requests I think
>>> as well as handling error chunks should be common..
>> Yes, reading error chunks should be part of read_chunk_header.
>>
>> Paolo
> 
> So, you want a loop in READ, and separate loop for other commands? Then
> we will have separate loop for BLOCK_STATUS and for all future commands
> with specific replies?

There should be a separate loop for each command.

The only difference between READ and other commands is that READ
receives directly in QEMUIOVector, while other commands can use a common
function to to receive each structured reply chunk into malloc-ed memory.

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 8/8] nbd: Minimal structured read for client
  2017-10-05 10:36               ` Paolo Bonzini
@ 2017-10-05 22:12                 ` Eric Blake
  2017-10-06  7:09                   ` Vladimir Sementsov-Ogievskiy
  2017-10-06  7:34                   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 2 replies; 31+ messages in thread
From: Eric Blake @ 2017-10-05 22:12 UTC (permalink / raw)
  To: Paolo Bonzini, Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: kwolf, den, mreitz

[-- Attachment #1: Type: text/plain, Size: 4206 bytes --]

On 10/05/2017 05:36 AM, Paolo Bonzini wrote:
> On 05/10/2017 12:02, Vladimir Sementsov-Ogievskiy wrote:
>> 03.10.2017 17:06, Paolo Bonzini wrote:
>>> On 03/10/2017 15:35, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> In the end this probably means that you have a read_chunk_header
>>>>>> function and a read_chunk function.  READ has a loop that calls
>>>>>> read_chunk_header followed by direct reading into the QEMUIOVector,
>>>>>> while everyone else calls read_chunk.
>>>>> accordingly to spec, we can receive several error reply chunks to any
>>>>> request,
>>>>> so loop, receiving them should be common for all requests I think
>>>> as well as handling error chunks should be common..
>>> Yes, reading error chunks should be part of read_chunk_header.
>>>
>>> Paolo
>>
>> So, you want a loop in READ, and separate loop for other commands? Then
>> we will have separate loop for BLOCK_STATUS and for all future commands
>> with specific replies?
> 
> There should be a separate loop for each command.
> 
> The only difference between READ and other commands is that READ
> receives directly in QEMUIOVector, while other commands can use a common
> function to to receive each structured reply chunk into malloc-ed memory.

To make sure we're on the same page, here's how I see it.  At a high
level, we have:

Each command calls nbd_co_send_request once, then calls
nbd_co_receive_reply in a loop until there is an indication of the last
packet.  nbd_co_receive_reply waits for data to come from the server,
and parses the header.

If the packet is unrecognized, report failure and request a quit
(negative return value)

If it is old-style:
- if the command is read, and we did not negotiate structured read, then
we also read the payload into qiov
- if the command is read, but we negotiated structured read, the server
is buggy, so report the bug and request a quit
- for all other commands, there is no payload, return success or failure
to the caller based on the header payload
- at any rate, the reply to the caller is that this is the final packet,
and there is no payload returned (so we return negative or 1, but never 0)

Otherwise, it is new-style:
- if we did not negotiate structured reply, the server is buggy, so
report the bug and request a quit (negative return)
- if the chunk is an error, we process the entire packet and report the
error; if we have any commands that care about the error details, we
could return the error in a malloc'd discriminated union, but we can
probably get by without that. If callers don't care about details, but
the error chunk is not final, it may be easier to just store the fact
that an error occurred and return 0 to tell the caller to keep looping,
and return the negative value later when the final chunk is finally received
- if the chunk is NBD_REPLY_TYPE_NONE, there is no payload, and this
should be the final chunk, so the return to the caller can be the same
as for old-style (return 1 if we had no earlier error packets, or the
saved negative value corresponding to the first error received)
- if the command is read, we can read the payload into qiov (saves
malloc'ing space for the reply only to copy it into the qiov), so we
don't have to return any data
- for any other command, we malloc space for the non-error payload, and
then it is up to the command's loop to process the payload

so the signature can be something like:

int nbd_co_receive_reply(NBDClientSession *s, QEMUIOVector *qiov,
                         void **payload)

where it returns -errno on failure, 0 if the command is not complete,
and 1 if the command is done.  READ passes qiov, which is fully
populated when the function returns 1; all other commands pass NULL.
Commands pass NULL for payload if they don't expect a payload return
(this includes READ); but a command that expects a payload
(BLOCK_STATUS) passes a pointer in payload and gets malloc'd space
stored there if return is 0 or 1.

Does that sound like we're on the right design track?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 8/8] nbd: Minimal structured read for client
  2017-10-05 22:12                 ` Eric Blake
@ 2017-10-06  7:09                   ` Vladimir Sementsov-Ogievskiy
  2017-10-06  7:23                     ` Vladimir Sementsov-Ogievskiy
  2017-10-06  7:34                   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-10-06  7:09 UTC (permalink / raw)
  To: Eric Blake, Paolo Bonzini, qemu-devel, qemu-block; +Cc: kwolf, den, mreitz

06.10.2017 01:12, Eric Blake wrote:
> On 10/05/2017 05:36 AM, Paolo Bonzini wrote:
>> On 05/10/2017 12:02, Vladimir Sementsov-Ogievskiy wrote:
>>> 03.10.2017 17:06, Paolo Bonzini wrote:
>>>> On 03/10/2017 15:35, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>> In the end this probably means that you have a read_chunk_header
>>>>>>> function and a read_chunk function.  READ has a loop that calls
>>>>>>> read_chunk_header followed by direct reading into the QEMUIOVector,
>>>>>>> while everyone else calls read_chunk.
>>>>>> accordingly to spec, we can receive several error reply chunks to any
>>>>>> request,
>>>>>> so loop, receiving them should be common for all requests I think
>>>>> as well as handling error chunks should be common..
>>>> Yes, reading error chunks should be part of read_chunk_header.
>>>>
>>>> Paolo
>>> So, you want a loop in READ, and separate loop for other commands? Then
>>> we will have separate loop for BLOCK_STATUS and for all future commands
>>> with specific replies?
>> There should be a separate loop for each command.
>>
>> The only difference between READ and other commands is that READ
>> receives directly in QEMUIOVector, while other commands can use a common
>> function to to receive each structured reply chunk into malloc-ed memory.
> To make sure we're on the same page, here's how I see it.  At a high
> level, we have:
>
> Each command calls nbd_co_send_request once, then calls
> nbd_co_receive_reply in a loop until there is an indication of the last
> packet.  nbd_co_receive_reply waits for data to come from the server,
> and parses the header.
>
> If the packet is unrecognized, report failure and request a quit
> (negative return value)
>
> If it is old-style:
> - if the command is read, and we did not negotiate structured read, then
> we also read the payload into qiov
> - if the command is read, but we negotiated structured read, the server
> is buggy, so report the bug and request a quit
> - for all other commands, there is no payload, return success or failure
> to the caller based on the header payload
> - at any rate, the reply to the caller is that this is the final packet,
> and there is no payload returned (so we return negative or 1, but never 0)
>
> Otherwise, it is new-style:
> - if we did not negotiate structured reply, the server is buggy, so
> report the bug and request a quit (negative return)
> - if the chunk is an error, we process the entire packet and report the
> error; if we have any commands that care about the error details, we
> could return the error in a malloc'd discriminated union, but we can
> probably get by without that. If callers don't care about details, but
> the error chunk is not final, it may be easier to just store the fact
> that an error occurred and return 0 to tell the caller to keep looping,
> and return the negative value later when the final chunk is finally received
> - if the chunk is NBD_REPLY_TYPE_NONE, there is no payload, and this
> should be the final chunk, so the return to the caller can be the same
> as for old-style (return 1 if we had no earlier error packets, or the
> saved negative value corresponding to the first error received)
> - if the command is read, we can read the payload into qiov (saves
> malloc'ing space for the reply only to copy it into the qiov), so we
> don't have to return any data
> - for any other command, we malloc space for the non-error payload, and
> then it is up to the command's loop to process the payload
>
> so the signature can be something like:
>
> int nbd_co_receive_reply(NBDClientSession *s, QEMUIOVector *qiov,
>                           void **payload)
>
> where it returns -errno on failure, 0 if the command is not complete,
> and 1 if the command is done.  READ passes qiov, which is fully
> populated when the function returns 1; all other commands pass NULL.
> Commands pass NULL for payload if they don't expect a payload return
> (this includes READ); but a command that expects a payload
> (BLOCK_STATUS) passes a pointer in payload and gets malloc'd space
> stored there if return is 0 or 1.
>
> Does that sound like we're on the right design track?
>


Hmm. and what is the difference with my patch? Except payload? The only 
command with payload
now is READ (except errors), but you (and me in my patch) suggest to 
fill this qiov in nbd_co_receive_reply
(nbd_co_receive_1_reply_or_chunk in my patch), so payload will be unused 
for now, we can add it
later if it will be needed for BLOCK_STATUS.

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 8/8] nbd: Minimal structured read for client
  2017-10-06  7:09                   ` Vladimir Sementsov-Ogievskiy
@ 2017-10-06  7:23                     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-10-06  7:23 UTC (permalink / raw)
  To: Eric Blake, Paolo Bonzini, qemu-devel, qemu-block; +Cc: kwolf, den, mreitz

06.10.2017 10:09, Vladimir Sementsov-Ogievskiy wrote:
> 06.10.2017 01:12, Eric Blake wrote:
>> On 10/05/2017 05:36 AM, Paolo Bonzini wrote:
>>> On 05/10/2017 12:02, Vladimir Sementsov-Ogievskiy wrote:
>>>> 03.10.2017 17:06, Paolo Bonzini wrote:
>>>>> On 03/10/2017 15:35, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>>> In the end this probably means that you have a read_chunk_header
>>>>>>>> function and a read_chunk function.  READ has a loop that calls
>>>>>>>> read_chunk_header followed by direct reading into the 
>>>>>>>> QEMUIOVector,
>>>>>>>> while everyone else calls read_chunk.
>>>>>>> accordingly to spec, we can receive several error reply chunks 
>>>>>>> to any
>>>>>>> request,
>>>>>>> so loop, receiving them should be common for all requests I think
>>>>>> as well as handling error chunks should be common..
>>>>> Yes, reading error chunks should be part of read_chunk_header.
>>>>>
>>>>> Paolo
>>>> So, you want a loop in READ, and separate loop for other commands? 
>>>> Then
>>>> we will have separate loop for BLOCK_STATUS and for all future 
>>>> commands
>>>> with specific replies?
>>> There should be a separate loop for each command.
>>>
>>> The only difference between READ and other commands is that READ
>>> receives directly in QEMUIOVector, while other commands can use a 
>>> common
>>> function to to receive each structured reply chunk into malloc-ed 
>>> memory.
>> To make sure we're on the same page, here's how I see it.  At a high
>> level, we have:
>>
>> Each command calls nbd_co_send_request once, then calls
>> nbd_co_receive_reply in a loop until there is an indication of the last
>> packet.  nbd_co_receive_reply waits for data to come from the server,
>> and parses the header.
>>
>> If the packet is unrecognized, report failure and request a quit
>> (negative return value)
>>
>> If it is old-style:
>> - if the command is read, and we did not negotiate structured read, then
>> we also read the payload into qiov
>> - if the command is read, but we negotiated structured read, the server
>> is buggy, so report the bug and request a quit
>> - for all other commands, there is no payload, return success or failure
>> to the caller based on the header payload
>> - at any rate, the reply to the caller is that this is the final packet,
>> and there is no payload returned (so we return negative or 1, but 
>> never 0)
>>
>> Otherwise, it is new-style:
>> - if we did not negotiate structured reply, the server is buggy, so
>> report the bug and request a quit (negative return)
>> - if the chunk is an error, we process the entire packet and report the
>> error; if we have any commands that care about the error details, we
>> could return the error in a malloc'd discriminated union, but we can
>> probably get by without that. If callers don't care about details, but
>> the error chunk is not final, it may be easier to just store the fact
>> that an error occurred and return 0 to tell the caller to keep looping,
>> and return the negative value later when the final chunk is finally 
>> received
>> - if the chunk is NBD_REPLY_TYPE_NONE, there is no payload, and this
>> should be the final chunk, so the return to the caller can be the same
>> as for old-style (return 1 if we had no earlier error packets, or the
>> saved negative value corresponding to the first error received)
>> - if the command is read, we can read the payload into qiov (saves
>> malloc'ing space for the reply only to copy it into the qiov), so we
>> don't have to return any data
>> - for any other command, we malloc space for the non-error payload, and
>> then it is up to the command's loop to process the payload
>>
>> so the signature can be something like:
>>
>> int nbd_co_receive_reply(NBDClientSession *s, QEMUIOVector *qiov,
>>                           void **payload)
>>
>> where it returns -errno on failure, 0 if the command is not complete,
>> and 1 if the command is done.  READ passes qiov, which is fully
>> populated when the function returns 1; all other commands pass NULL.
>> Commands pass NULL for payload if they don't expect a payload return
>> (this includes READ); but a command that expects a payload
>> (BLOCK_STATUS) passes a pointer in payload and gets malloc'd space
>> stored there if return is 0 or 1.
>>
>> Does that sound like we're on the right design track?
>>
>
>
> Hmm. and what is the difference with my patch? Except payload? The 
> only command with payload
> now is READ (except errors), but you (and me in my patch) suggest to 
> fill this qiov in nbd_co_receive_reply
> (nbd_co_receive_1_reply_or_chunk in my patch), so payload will be 
> unused for now, we can add it
> later if it will be needed for BLOCK_STATUS.
>

Ahm, sorry, I've already forget my original patch, which reads most of 
payload in nbd/client.c code called from reply_entry.. So, ok for me, I 
think I'll post a new version today.

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 8/8] nbd: Minimal structured read for client
  2017-10-05 22:12                 ` Eric Blake
  2017-10-06  7:09                   ` Vladimir Sementsov-Ogievskiy
@ 2017-10-06  7:34                   ` Vladimir Sementsov-Ogievskiy
  2017-10-06 13:44                     ` Eric Blake
  1 sibling, 1 reply; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-10-06  7:34 UTC (permalink / raw)
  To: Eric Blake, Paolo Bonzini, qemu-devel, qemu-block; +Cc: kwolf, den, mreitz

06.10.2017 01:12, Eric Blake wrote:
> On 10/05/2017 05:36 AM, Paolo Bonzini wrote:
>> On 05/10/2017 12:02, Vladimir Sementsov-Ogievskiy wrote:
>>> 03.10.2017 17:06, Paolo Bonzini wrote:
>>>> On 03/10/2017 15:35, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>> In the end this probably means that you have a read_chunk_header
>>>>>>> function and a read_chunk function.  READ has a loop that calls
>>>>>>> read_chunk_header followed by direct reading into the QEMUIOVector,
>>>>>>> while everyone else calls read_chunk.
>>>>>> accordingly to spec, we can receive several error reply chunks to any
>>>>>> request,
>>>>>> so loop, receiving them should be common for all requests I think
>>>>> as well as handling error chunks should be common..
>>>> Yes, reading error chunks should be part of read_chunk_header.
>>>>
>>>> Paolo
>>> So, you want a loop in READ, and separate loop for other commands? Then
>>> we will have separate loop for BLOCK_STATUS and for all future commands
>>> with specific replies?
>> There should be a separate loop for each command.
>>
>> The only difference between READ and other commands is that READ
>> receives directly in QEMUIOVector, while other commands can use a common
>> function to to receive each structured reply chunk into malloc-ed memory.
> To make sure we're on the same page, here's how I see it.  At a high
> level, we have:
>
> Each command calls nbd_co_send_request once, then calls
> nbd_co_receive_reply in a loop until there is an indication of the last
> packet.  nbd_co_receive_reply waits for data to come from the server,
> and parses the header.
>
> If the packet is unrecognized, report failure and request a quit
> (negative return value)
>
> If it is old-style:
> - if the command is read, and we did not negotiate structured read, then
> we also read the payload into qiov
> - if the command is read, but we negotiated structured read, the server
> is buggy, so report the bug and request a quit
> - for all other commands, there is no payload, return success or failure
> to the caller based on the header payload
> - at any rate, the reply to the caller is that this is the final packet,
> and there is no payload returned (so we return negative or 1, but never 0)
>
> Otherwise, it is new-style:
> - if we did not negotiate structured reply, the server is buggy, so
> report the bug and request a quit (negative return)
> - if the chunk is an error, we process the entire packet and report the
> error; if we have any commands that care about the error details, we
> could return the error in a malloc'd discriminated union, but we can
> probably get by without that. If callers don't care about details, but
> the error chunk is not final, it may be easier to just store the fact
> that an error occurred and return 0 to tell the caller to keep looping,
> and return the negative value later when the final chunk is finally received
> - if the chunk is NBD_REPLY_TYPE_NONE, there is no payload, and this
> should be the final chunk, so the return to the caller can be the same
> as for old-style (return 1 if we had no earlier error packets, or the
> saved negative value corresponding to the first error received)
> - if the command is read, we can read the payload into qiov (saves
> malloc'ing space for the reply only to copy it into the qiov), so we

but here you said "This is putting a LOT of smarts directly into the 
receive routine"

> don't have to return any data
> - for any other command, we malloc space for the non-error payload, and
> then it is up to the command's loop to process the payload
>
> so the signature can be something like:
>
> int nbd_co_receive_reply(NBDClientSession *s, QEMUIOVector *qiov,
>                           void **payload)
>
> where it returns -errno on failure, 0 if the command is not complete,
> and 1 if the command is done.  READ passes qiov, which is fully
> populated when the function returns 1; all other commands pass NULL.
> Commands pass NULL for payload if they don't expect a payload return
> (this includes READ); but a command that expects a payload
> (BLOCK_STATUS) passes a pointer in payload and gets malloc'd space
> stored there if return is 0 or 1.
>
> Does that sound like we're on the right design track?
>


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 8/8] nbd: Minimal structured read for client
  2017-10-06  7:34                   ` Vladimir Sementsov-Ogievskiy
@ 2017-10-06 13:44                     ` Eric Blake
  0 siblings, 0 replies; 31+ messages in thread
From: Eric Blake @ 2017-10-06 13:44 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Paolo Bonzini, qemu-devel, qemu-block
  Cc: kwolf, den, mreitz

[-- Attachment #1: Type: text/plain, Size: 1598 bytes --]

On 10/06/2017 02:34 AM, Vladimir Sementsov-Ogievskiy wrote:

>> - if the chunk is NBD_REPLY_TYPE_NONE, there is no payload, and this
>> should be the final chunk, so the return to the caller can be the same
>> as for old-style (return 1 if we had no earlier error packets, or the
>> saved negative value corresponding to the first error received)
>> - if the command is read, we can read the payload into qiov (saves
>> malloc'ing space for the reply only to copy it into the qiov), so we
> 
> but here you said "This is putting a LOT of smarts directly into the
> receive routine"

About the only reason to justify putting smarts into the receive routine
is if it is the most common case, where the overhead of not
special-casing will penalize us.  READ happens to be a frequent command,
all other commands, like BLOCK_STATUS, will probably be infrequent.
Making READ malloc the result, only to then copy it into the qiov, is a
waste of time; no other structured command will pass a qiov.  So yeah,
maybe I'm stating things a bit differently than in my earlier messages,
but that's because I've now stated reasonings for why it is okay to
special-case READ with a qiov differently from all other commands (none
of which will pass a qiov to the receive routine).

Thanks for persisting with this, and I'm looking forward to the next
revision that you post; hopefully, by taking this discussion, we are
making sure that the design is solid.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2017-10-06 13:44 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-25 13:57 [Qemu-devel] [PATCH 0/8] nbd minimal structured read Vladimir Sementsov-Ogievskiy
2017-09-25 13:57 ` [Qemu-devel] [PATCH 1/8] block/nbd-client: assert qiov len once in nbd_co_request Vladimir Sementsov-Ogievskiy
2017-09-25 21:58   ` Eric Blake
2017-09-25 13:57 ` [Qemu-devel] [PATCH 2/8] block/nbd-client: refactor nbd_co_receive_reply Vladimir Sementsov-Ogievskiy
2017-09-25 21:59   ` Eric Blake
2017-09-25 13:57 ` [Qemu-devel] [PATCH 3/8] nbd: rename NBD_REPLY_MAGIC to NBD_SIMPLE_REPLY_MAGIC Vladimir Sementsov-Ogievskiy
2017-09-25 13:57 ` [Qemu-devel] [PATCH 4/8] nbd-server: refactor simple reply sending Vladimir Sementsov-Ogievskiy
2017-09-25 13:57 ` [Qemu-devel] [PATCH 5/8] nbd: header constants indenting Vladimir Sementsov-Ogievskiy
2017-09-25 13:57 ` [Qemu-devel] [PATCH 6/8] nbd: Minimal structured read for server Vladimir Sementsov-Ogievskiy
2017-09-25 13:58 ` [Qemu-devel] [PATCH 7/8] nbd/client: refactor nbd_receive_starttls Vladimir Sementsov-Ogievskiy
2017-09-25 13:58 ` [Qemu-devel] [PATCH 8/8] nbd: Minimal structured read for client Vladimir Sementsov-Ogievskiy
2017-09-25 22:19   ` Eric Blake
2017-09-27 10:05     ` Vladimir Sementsov-Ogievskiy
2017-09-27 12:32       ` Vladimir Sementsov-Ogievskiy
2017-09-27 15:10         ` [Qemu-devel] [PATCH v1.1 DRAFT] " Vladimir Sementsov-Ogievskiy
2017-10-03  9:59           ` Vladimir Sementsov-Ogievskiy
2017-10-03 10:07     ` [Qemu-devel] [Qemu-block] [PATCH 8/8] " Paolo Bonzini
2017-10-03 12:26       ` Vladimir Sementsov-Ogievskiy
2017-10-03 14:05         ` Paolo Bonzini
2017-10-03 12:58       ` Vladimir Sementsov-Ogievskiy
2017-10-03 13:35         ` Vladimir Sementsov-Ogievskiy
2017-10-03 14:06           ` Paolo Bonzini
2017-10-05  9:59             ` Vladimir Sementsov-Ogievskiy
2017-10-05 10:02             ` Vladimir Sementsov-Ogievskiy
2017-10-05 10:36               ` Paolo Bonzini
2017-10-05 22:12                 ` Eric Blake
2017-10-06  7:09                   ` Vladimir Sementsov-Ogievskiy
2017-10-06  7:23                     ` Vladimir Sementsov-Ogievskiy
2017-10-06  7:34                   ` Vladimir Sementsov-Ogievskiy
2017-10-06 13:44                     ` Eric Blake
2017-09-25 14:06 ` [Qemu-devel] [PATCH 0/8] nbd minimal structured read Vladimir Sementsov-Ogievskiy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.