All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 00/18] nbd: BLOCK_STATUS
@ 2017-02-03 15:47 Vladimir Sementsov-Ogievskiy
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 01/18] nbd: rename NBD_REPLY_MAGIC to NBD_SIMPLE_REPLY_MAGIC Vladimir Sementsov-Ogievskiy
                   ` (18 more replies)
  0 siblings, 19 replies; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-02-03 15:47 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, eblake, den, stefanha

Hi all!

We really need exporting dirty bitmaps feature as well as remote
get_block_status for nbd devices. So, here is minimalistic and restricted
realization of 'structured reply' and 'block status' nbd protocol extension
(as second is developed over the first the combined spec may be found here:
https://github.com/NetworkBlockDevice/nbd/blob/extension-blockstatus/doc/proto.md)

What is done:

server:
 - can send not fragmented structured replies for CMD_READ (this was done only
   because the spec doesn't allow structure reply feature without maintaining
   structured read)
 - can export dirty bitmap through BLOCK_STATUS. Only one bitmap can be exported,
   negotiation query should be 'qemu-dirty-bitmap:<bitmap name>'
 - cab export block status through BLOCK_STATUS. Client can negotiate only one
   entity for exporting through BLOCK_STATUS - bitmap _or_ block status.
   negotiation query should be 'base:allocation', as defined in the spec.
   server sends only one extent on each BLOCK_STATUS query.

client:
 - can receive not fragmented structured replies for CMD_READ
 - can get load dirty bitmap through nbd. Be careful: bitmap for export is
   is selected during nbd negotiation - actually in open(). So, name argument for
   qmp block-dirty-bitmap-load is just a _new_ name for loaded bitmap.
   (any way, for us block-dirty-bitmap-load is just a way to test the feature,
    really we need only server part)
 - get_block_status works now through nbd CMD_BLOCK_STATUS, if base:allocation is
   negotiated for export.

It should be minimal but fully compatible realization of the spec.

web: https://src.openvz.org/users/vsementsov/repos/qemu/browse?at=nbd-block-status-v1
git: https://src.openvz.org/scm/~vsementsov/qemu.git (tag nbd-block-status-v1)

Vladimir Sementsov-Ogievskiy (18):
  nbd: rename NBD_REPLY_MAGIC to NBD_SIMPLE_REPLY_MAGIC
  nbd-server: refactor simple reply sending
  nbd: Minimal structured read for server
  nbd/client: refactor nbd_receive_starttls
  nbd/client: fix drop_sync
  nbd/client: refactor drop_sync
  nbd: Minimal structured read for client
  hbitmap: add next_zero function
  block/dirty-bitmap: add bdrv_dirty_bitmap_next()
  block/dirty-bitmap: add bdrv_load_dirty_bitmap
  nbd: BLOCK_STATUS for bitmap export: server part
  nbd: BLOCK_STATUS for bitmap export: client part
  nbd: add nbd_dirty_bitmap_load
  qmp: add x-debug-block-dirty-bitmap-sha256
  qmp: add block-dirty-bitmap-load
  iotests: add test for nbd dirty bitmap export
  nbd: BLOCK_STATUS for standard get_block_status function: server part
  nbd: BLOCK_STATUS for standard get_block_status function: client part

 block/dirty-bitmap.c                     |  70 ++++
 block/nbd-client.c                       | 230 ++++++++++-
 block/nbd-client.h                       |  13 +
 block/nbd.c                              |  44 +-
 blockdev.c                               |  57 +++
 include/block/block_int.h                |   4 +
 include/block/dirty-bitmap.h             |  10 +
 include/block/nbd.h                      |  73 +++-
 include/qemu/hbitmap.h                   |  16 +
 nbd/client.c                             | 373 ++++++++++++++---
 nbd/nbd-internal.h                       |  25 +-
 nbd/server.c                             | 661 ++++++++++++++++++++++++++++++-
 qapi/block-core.json                     |  46 ++-
 qemu-nbd.c                               |   2 +-
 tests/Makefile.include                   |   2 +-
 tests/qemu-iotests/180                   | 133 +++++++
 tests/qemu-iotests/180.out               |   5 +
 tests/qemu-iotests/group                 |   1 +
 tests/qemu-iotests/nbd-fault-injector.py |   4 +-
 util/hbitmap.c                           |  37 ++
 20 files changed, 1713 insertions(+), 93 deletions(-)
 create mode 100755 tests/qemu-iotests/180
 create mode 100644 tests/qemu-iotests/180.out

-- 
2.11.0

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Qemu-devel] [PATCH 01/18] nbd: rename NBD_REPLY_MAGIC to NBD_SIMPLE_REPLY_MAGIC
  2017-02-03 15:47 [Qemu-devel] [PATCH 00/18] nbd: BLOCK_STATUS Vladimir Sementsov-Ogievskiy
@ 2017-02-03 15:47 ` Vladimir Sementsov-Ogievskiy
  2017-02-06 19:54   ` Eric Blake
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 02/18] nbd-server: refactor simple reply sending Vladimir Sementsov-Ogievskiy
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-02-03 15:47 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, eblake, den, stefanha

To be consistent when NBD_STRUCTURED_REPLY_MAGIC will be introduced.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 nbd/client.c                             | 4 ++--
 nbd/nbd-internal.h                       | 2 +-
 nbd/server.c                             | 4 ++--
 tests/qemu-iotests/nbd-fault-injector.py | 4 ++--
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index ffb0743bce..de5c9366c7 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -788,7 +788,7 @@ ssize_t nbd_receive_reply(QIOChannel *ioc, NBDReply *reply)
     }
 
     /* Reply
-       [ 0 ..  3]    magic   (NBD_REPLY_MAGIC)
+       [ 0 ..  3]    magic   (NBD_SIMPLE_REPLY_MAGIC)
        [ 4 ..  7]    error   (0 == no error)
        [ 7 .. 15]    handle
      */
@@ -808,7 +808,7 @@ ssize_t nbd_receive_reply(QIOChannel *ioc, NBDReply *reply)
           ", handle = %" PRIu64" }",
           magic, reply->error, reply->handle);
 
-    if (magic != NBD_REPLY_MAGIC) {
+    if (magic != NBD_SIMPLE_REPLY_MAGIC) {
         LOG("invalid magic (got 0x%" PRIx32 ")", magic);
         return -EINVAL;
     }
diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index eee20abc25..49b66b6896 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -59,7 +59,7 @@
 #define NBD_REQUEST_SIZE        (4 + 2 + 2 + 8 + 8 + 4)
 #define NBD_REPLY_SIZE          (4 + 4 + 8)
 #define NBD_REQUEST_MAGIC       0x25609513
-#define NBD_REPLY_MAGIC         0x67446698
+#define NBD_SIMPLE_REPLY_MAGIC  0x67446698
 #define NBD_OPTS_MAGIC          0x49484156454F5054LL
 #define NBD_CLIENT_MAGIC        0x0000420281861253LL
 #define NBD_REP_MAGIC           0x0003e889045565a9LL
diff --git a/nbd/server.c b/nbd/server.c
index 5b76261666..b63a8b85e3 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -750,11 +750,11 @@ static ssize_t nbd_send_reply(QIOChannel *ioc, NBDReply *reply)
           reply->error, reply->handle);
 
     /* Reply
-       [ 0 ..  3]    magic   (NBD_REPLY_MAGIC)
+       [ 0 ..  3]    magic   (NBD_SIMPLE_REPLY_MAGIC)
        [ 4 ..  7]    error   (0 == no error)
        [ 7 .. 15]    handle
      */
-    stl_be_p(buf, NBD_REPLY_MAGIC);
+    stl_be_p(buf, NBD_SIMPLE_REPLY_MAGIC);
     stl_be_p(buf + 4, reply->error);
     stq_be_p(buf + 8, reply->handle);
 
diff --git a/tests/qemu-iotests/nbd-fault-injector.py b/tests/qemu-iotests/nbd-fault-injector.py
index 6c07191a5a..5d092ee1f6 100755
--- a/tests/qemu-iotests/nbd-fault-injector.py
+++ b/tests/qemu-iotests/nbd-fault-injector.py
@@ -56,7 +56,7 @@ NBD_CMD_READ = 0
 NBD_CMD_WRITE = 1
 NBD_CMD_DISC = 2
 NBD_REQUEST_MAGIC = 0x25609513
-NBD_REPLY_MAGIC = 0x67446698
+NBD_SIMPLE_REPLY_MAGIC = 0x67446698
 NBD_PASSWD = 0x4e42444d41474943
 NBD_OPTS_MAGIC = 0x49484156454F5054
 NBD_CLIENT_MAGIC = 0x0000420281861253
@@ -166,7 +166,7 @@ def read_request(conn):
     return req
 
 def write_reply(conn, error, handle):
-    buf = reply_struct.pack(NBD_REPLY_MAGIC, error, handle)
+    buf = reply_struct.pack(NBD_SIMPLE_REPLY_MAGIC, error, handle)
     conn.send(buf, event='reply')
 
 def handle_connection(conn, use_export):
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [Qemu-devel] [PATCH 02/18] nbd-server: refactor simple reply sending
  2017-02-03 15:47 [Qemu-devel] [PATCH 00/18] nbd: BLOCK_STATUS Vladimir Sementsov-Ogievskiy
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 01/18] nbd: rename NBD_REPLY_MAGIC to NBD_SIMPLE_REPLY_MAGIC Vladimir Sementsov-Ogievskiy
@ 2017-02-03 15:47 ` Vladimir Sementsov-Ogievskiy
  2017-02-06 21:09   ` Eric Blake
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 03/18] nbd: Minimal structured read for server Vladimir Sementsov-Ogievskiy
                   ` (16 subsequent siblings)
  18 siblings, 1 reply; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-02-03 15:47 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, eblake, den, stefanha

Rename functions appropriately and also make a separate copy of NBDReply
- NBDSimpleReply, to replace NBDReply for the server. NBDReply itself
will be upgraded in future patches to handle both simple and structured
replies in the client.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/nbd.h |  7 +++++++
 nbd/server.c        | 25 +++++++++++++------------
 2 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 3e373f0498..3c65cf8d87 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -63,6 +63,13 @@ struct NBDReply {
 };
 typedef struct NBDReply NBDReply;
 
+struct NBDSimpleReply {
+    /* uint32_t NBD_SIMPLE_REPLY_MAGIC */
+    uint64_t handle;
+    uint32_t error;
+};
+typedef struct NBDSimpleReply NBDSimpleReply;
+
 /* Transmission (export) flags: sent from server to client during handshake,
    but describe what will happen during transmission */
 #define NBD_FLAG_HAS_FLAGS      (1 << 0)        /* Flags are there */
diff --git a/nbd/server.c b/nbd/server.c
index b63a8b85e3..4cfc02123b 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -738,7 +738,7 @@ static ssize_t nbd_receive_request(QIOChannel *ioc, NBDRequest *request)
     return 0;
 }
 
-static ssize_t nbd_send_reply(QIOChannel *ioc, NBDReply *reply)
+static ssize_t nbd_send_simple_reply(QIOChannel *ioc, NBDSimpleReply *reply)
 {
     uint8_t buf[NBD_REPLY_SIZE];
     ssize_t ret;
@@ -1036,8 +1036,8 @@ void nbd_export_close_all(void)
     }
 }
 
-static ssize_t nbd_co_send_reply(NBDRequestData *req, NBDReply *reply,
-                                 int len)
+static ssize_t nbd_co_send_simple_reply(NBDRequestData *req,
+                                        NBDSimpleReply *reply, int len)
 {
     NBDClient *client = req->client;
     ssize_t rc, ret;
@@ -1048,10 +1048,10 @@ static ssize_t nbd_co_send_reply(NBDRequestData *req, NBDReply *reply,
     nbd_set_handlers(client);
 
     if (!len) {
-        rc = nbd_send_reply(client->ioc, reply);
+        rc = nbd_send_simple_reply(client->ioc, reply);
     } else {
         qio_channel_set_cork(client->ioc, true);
-        rc = nbd_send_reply(client->ioc, reply);
+        rc = nbd_send_simple_reply(client->ioc, reply);
         if (rc >= 0) {
             ret = write_sync(client->ioc, req->data, len);
             if (ret != len) {
@@ -1174,7 +1174,7 @@ static void nbd_trip(void *opaque)
     NBDExport *exp = client->exp;
     NBDRequestData *req;
     NBDRequest request;
-    NBDReply reply;
+    NBDSimpleReply reply;
     ssize_t ret;
     int flags;
 
@@ -1231,8 +1231,9 @@ static void nbd_trip(void *opaque)
         }
 
         TRACE("Read %" PRIu32" byte(s)", request.len);
-        if (nbd_co_send_reply(req, &reply, request.len) < 0)
+        if (nbd_co_send_simple_reply(req, &reply, request.len) < 0) {
             goto out;
+        }
         break;
     case NBD_CMD_WRITE:
         TRACE("Request type is WRITE");
@@ -1257,7 +1258,7 @@ static void nbd_trip(void *opaque)
             goto error_reply;
         }
 
-        if (nbd_co_send_reply(req, &reply, 0) < 0) {
+        if (nbd_co_send_simple_reply(req, &reply, 0) < 0) {
             goto out;
         }
         break;
@@ -1288,7 +1289,7 @@ static void nbd_trip(void *opaque)
             goto error_reply;
         }
 
-        if (nbd_co_send_reply(req, &reply, 0) < 0) {
+        if (nbd_co_send_simple_reply(req, &reply, 0) < 0) {
             goto out;
         }
         break;
@@ -1305,7 +1306,7 @@ static void nbd_trip(void *opaque)
             LOG("flush failed");
             reply.error = -ret;
         }
-        if (nbd_co_send_reply(req, &reply, 0) < 0) {
+        if (nbd_co_send_simple_reply(req, &reply, 0) < 0) {
             goto out;
         }
         break;
@@ -1317,7 +1318,7 @@ static void nbd_trip(void *opaque)
             LOG("discard failed");
             reply.error = -ret;
         }
-        if (nbd_co_send_reply(req, &reply, 0) < 0) {
+        if (nbd_co_send_simple_reply(req, &reply, 0) < 0) {
             goto out;
         }
         break;
@@ -1328,7 +1329,7 @@ static void nbd_trip(void *opaque)
         /* We must disconnect after NBD_CMD_WRITE if we did not
          * read the payload.
          */
-        if (nbd_co_send_reply(req, &reply, 0) < 0 || !req->complete) {
+        if (nbd_co_send_simple_reply(req, &reply, 0) < 0 || !req->complete) {
             goto out;
         }
         break;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [Qemu-devel] [PATCH 03/18] nbd: Minimal structured read for server
  2017-02-03 15:47 [Qemu-devel] [PATCH 00/18] nbd: BLOCK_STATUS Vladimir Sementsov-Ogievskiy
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 01/18] nbd: rename NBD_REPLY_MAGIC to NBD_SIMPLE_REPLY_MAGIC Vladimir Sementsov-Ogievskiy
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 02/18] nbd-server: refactor simple reply sending Vladimir Sementsov-Ogievskiy
@ 2017-02-03 15:47 ` Vladimir Sementsov-Ogievskiy
  2017-02-06 23:01   ` Eric Blake
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 04/18] nbd/client: refactor nbd_receive_starttls Vladimir Sementsov-Ogievskiy
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-02-03 15:47 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, eblake, den, stefanha

Minimal implementation of structured read: one data chunk + finishing
none chunk. No segmentation.
Minimal structured error implementation: no text message.
Support DF flag, but just ignore it, as there is no segmentation any
way.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/nbd.h |  31 +++++++++++++
 nbd/nbd-internal.h  |   2 +
 nbd/server.c        | 125 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 154 insertions(+), 4 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 3c65cf8d87..58b864f145 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -70,6 +70,25 @@ struct NBDSimpleReply {
 };
 typedef struct NBDSimpleReply NBDSimpleReply;
 
+typedef struct NBDStructuredReplyChunk {
+    uint32_t magic;
+    uint16_t flags;
+    uint16_t type;
+    uint64_t handle;
+    uint32_t length;
+} QEMU_PACKED NBDStructuredReplyChunk;
+
+typedef struct NBDStructuredRead {
+    NBDStructuredReplyChunk h;
+    uint64_t offset;
+} QEMU_PACKED NBDStructuredRead;
+
+typedef struct NBDStructuredError {
+    NBDStructuredReplyChunk h;
+    uint32_t error;
+    uint16_t message_length;
+} QEMU_PACKED NBDStructuredError;
+
 /* Transmission (export) flags: sent from server to client during handshake,
    but describe what will happen during transmission */
 #define NBD_FLAG_HAS_FLAGS      (1 << 0)        /* Flags are there */
@@ -79,6 +98,7 @@ typedef struct NBDSimpleReply NBDSimpleReply;
 #define NBD_FLAG_ROTATIONAL     (1 << 4)        /* Use elevator algorithm - rotational media */
 #define NBD_FLAG_SEND_TRIM      (1 << 5)        /* Send TRIM (discard) */
 #define NBD_FLAG_SEND_WRITE_ZEROES (1 << 6)     /* Send WRITE_ZEROES */
+#define NBD_FLAG_SEND_DF        (1 << 7)        /* Send DF (Do not Fragment) */
 
 /* New-style handshake (global) flags, sent from server to client, and
    control what will happen during handshake phase. */
@@ -106,6 +126,7 @@ typedef struct NBDSimpleReply NBDSimpleReply;
 /* Request flags, sent from client to server during transmission phase */
 #define NBD_CMD_FLAG_FUA        (1 << 0) /* 'force unit access' during write */
 #define NBD_CMD_FLAG_NO_HOLE    (1 << 1) /* don't punch hole on zero run */
+#define NBD_CMD_FLAG_DF         (1 << 2) /* don't fragment structured read */
 
 /* Supported request types */
 enum {
@@ -130,6 +151,16 @@ enum {
  * aren't overflowing some other buffer. */
 #define NBD_MAX_NAME_SIZE 256
 
+/* Structured reply flags */
+#define NBD_REPLY_FLAG_DONE 1
+
+/* Structured reply types */
+#define NBD_REPLY_TYPE_NONE 0
+#define NBD_REPLY_TYPE_OFFSET_DATA 1
+#define NBD_REPLY_TYPE_OFFSET_HOLE 2
+#define NBD_REPLY_TYPE_ERROR ((1 << 15) + 1)
+#define NBD_REPLY_TYPE_ERROR_OFFSET ((1 << 15) + 2)
+
 ssize_t nbd_wr_syncv(QIOChannel *ioc,
                      struct iovec *iov,
                      size_t niov,
diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index 49b66b6896..489eeaf887 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -60,6 +60,7 @@
 #define NBD_REPLY_SIZE          (4 + 4 + 8)
 #define NBD_REQUEST_MAGIC       0x25609513
 #define NBD_SIMPLE_REPLY_MAGIC  0x67446698
+#define NBD_STRUCTURED_REPLY_MAGIC 0x668e33ef
 #define NBD_OPTS_MAGIC          0x49484156454F5054LL
 #define NBD_CLIENT_MAGIC        0x0000420281861253LL
 #define NBD_REP_MAGIC           0x0003e889045565a9LL
@@ -81,6 +82,7 @@
 #define NBD_OPT_LIST            (3)
 #define NBD_OPT_PEEK_EXPORT     (4)
 #define NBD_OPT_STARTTLS        (5)
+#define NBD_OPT_STRUCTURED_REPLY (8)
 
 /* NBD errors are based on errno numbers, so there is a 1:1 mapping,
  * but only a limited set of errno values is specified in the protocol.
diff --git a/nbd/server.c b/nbd/server.c
index 4cfc02123b..cb79a93c87 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -100,6 +100,8 @@ struct NBDClient {
     QTAILQ_ENTRY(NBDClient) next;
     int nb_requests;
     bool closing;
+
+    bool structured_reply;
 };
 
 /* That's all folks */
@@ -573,6 +575,16 @@ static int nbd_negotiate_options(NBDClient *client)
                     return ret;
                 }
                 break;
+
+            case NBD_OPT_STRUCTURED_REPLY:
+                client->structured_reply = true;
+                ret = nbd_negotiate_send_rep(client->ioc, NBD_REP_ACK,
+                                             clientflags);
+                if (ret < 0) {
+                    return ret;
+                }
+                break;
+
             default:
                 if (nbd_negotiate_drop_sync(client->ioc, length) != length) {
                     return -EIO;
@@ -1067,6 +1079,86 @@ static ssize_t nbd_co_send_simple_reply(NBDRequestData *req,
     return rc;
 }
 
+static void set_be_chunk(NBDStructuredReplyChunk *chunk, uint16_t flags,
+                         uint16_t type, uint64_t handle, uint32_t length)
+{
+    stl_be_p(&chunk->magic, NBD_STRUCTURED_REPLY_MAGIC);
+    stw_be_p(&chunk->flags, flags);
+    stw_be_p(&chunk->type, type);
+    stq_be_p(&chunk->handle, handle);
+    stl_be_p(&chunk->length, length);
+}
+
+static int nbd_co_send_iov(NBDClient *client, struct iovec *iov, unsigned niov)
+{
+    ssize_t ret;
+    size_t size = iov_size(iov, niov);
+
+    g_assert(qemu_in_coroutine());
+    qemu_co_mutex_lock(&client->send_lock);
+    client->send_coroutine = qemu_coroutine_self();
+    nbd_set_handlers(client);
+
+    ret = nbd_wr_syncv(client->ioc, iov, niov, size, false);
+    if (ret >= 0 && ret != size) {
+        ret = -EIO;
+    }
+
+    client->send_coroutine = NULL;
+    nbd_set_handlers(client);
+    qemu_co_mutex_unlock(&client->send_lock);
+
+    return ret;
+}
+
+static inline int nbd_co_send_buf(NBDClient *client, void *buf, size_t size)
+{
+    struct iovec iov[] = {
+        {.iov_base = buf, .iov_len = size}
+    };
+
+    return nbd_co_send_iov(client, iov, 1);
+}
+
+static int nbd_co_send_structured_read(NBDClient *client, uint64_t handle,
+                                       uint64_t offset, void *data, size_t size)
+{
+    NBDStructuredRead chunk;
+
+    struct iovec iov[] = {
+        {.iov_base = &chunk, .iov_len = sizeof(chunk)},
+        {.iov_base = data, .iov_len = size}
+    };
+
+    set_be_chunk(&chunk.h, 0, NBD_REPLY_TYPE_OFFSET_DATA, handle,
+                 sizeof(chunk) - sizeof(chunk.h) + size);
+    stq_be_p(&chunk.offset, offset);
+
+    return nbd_co_send_iov(client, iov, 2);
+}
+
+static int nbd_co_send_structured_error(NBDClient *client, uint64_t handle,
+                                        uint32_t error)
+{
+    NBDStructuredError chunk;
+
+    set_be_chunk(&chunk.h, NBD_REPLY_FLAG_DONE, NBD_REPLY_TYPE_ERROR, handle,
+                 sizeof(chunk) - sizeof(chunk.h));
+    stl_be_p(&chunk.error, error);
+    stw_be_p(&chunk.message_length, 0);
+
+    return nbd_co_send_buf(client, &chunk, sizeof(chunk));
+}
+
+static int nbd_co_send_structured_none(NBDClient *client, uint64_t handle)
+{
+    NBDStructuredReplyChunk chunk;
+
+    set_be_chunk(&chunk, NBD_REPLY_FLAG_DONE, NBD_REPLY_TYPE_NONE, handle, 0);
+
+    return nbd_co_send_buf(client, &chunk, sizeof(chunk));
+}
+
 /* Collect a client request.  Return 0 if request looks valid, -EAGAIN
  * to keep trying the collection, -EIO to drop connection right away,
  * and any other negative value to report an error to the client
@@ -1147,7 +1239,8 @@ static ssize_t nbd_co_receive_request(NBDRequestData *req,
         rc = request->type == NBD_CMD_WRITE ? -ENOSPC : -EINVAL;
         goto out;
     }
-    if (request->flags & ~(NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_NO_HOLE)) {
+    if (request->flags & ~(NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_NO_HOLE |
+                           NBD_CMD_FLAG_DF)) {
         LOG("unsupported flags (got 0x%x)", request->flags);
         rc = -EINVAL;
         goto out;
@@ -1226,12 +1319,34 @@ static void nbd_trip(void *opaque)
                         req->data, request.len);
         if (ret < 0) {
             LOG("reading from file failed");
-            reply.error = -ret;
-            goto error_reply;
+            if (client->structured_reply) {
+                ret = nbd_co_send_structured_error(req->client, request.handle,
+                                                   -ret);
+                if (ret < 0) {
+                    goto out;
+                } else {
+                    break;
+                }
+            } else {
+                reply.error = -ret;
+                goto error_reply;
+            }
         }
 
         TRACE("Read %" PRIu32" byte(s)", request.len);
-        if (nbd_co_send_simple_reply(req, &reply, request.len) < 0) {
+        if (client->structured_reply) {
+            ret = nbd_co_send_structured_read(req->client, request.handle,
+                                              request.from, req->data,
+                                              request.len);
+            if (ret < 0) {
+                goto out;
+            }
+
+            ret = nbd_co_send_structured_none(req->client, request.handle);
+        } else {
+            ret = nbd_co_send_simple_reply(req, &reply, request.len);
+        }
+        if (ret < 0) {
             goto out;
         }
         break;
@@ -1444,6 +1559,8 @@ void nbd_client_new(NBDExport *exp,
     client->can_read = true;
     client->close = close_fn;
 
+    client->structured_reply = false;
+
     data->client = client;
     data->co = qemu_coroutine_create(nbd_co_client_start, data);
     qemu_coroutine_enter(data->co);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [Qemu-devel] [PATCH 04/18] nbd/client: refactor nbd_receive_starttls
  2017-02-03 15:47 [Qemu-devel] [PATCH 00/18] nbd: BLOCK_STATUS Vladimir Sementsov-Ogievskiy
                   ` (2 preceding siblings ...)
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 03/18] nbd: Minimal structured read for server Vladimir Sementsov-Ogievskiy
@ 2017-02-03 15:47 ` Vladimir Sementsov-Ogievskiy
  2017-02-07 16:32   ` Eric Blake
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 05/18] nbd/client: fix drop_sync Vladimir Sementsov-Ogievskiy
                   ` (14 subsequent siblings)
  18 siblings, 1 reply; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-02-03 15:47 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, eblake, den, stefanha

Split out nbd_receive_simple_option to be reused for structured reply
option.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 nbd/client.c       | 54 +++++++++++++++++++++++++++++++++++-------------------
 nbd/nbd-internal.h | 14 ++++++++++++++
 2 files changed, 49 insertions(+), 19 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index de5c9366c7..6caf6bda6d 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -395,39 +395,55 @@ static int nbd_receive_query_exports(QIOChannel *ioc,
     }
 }
 
-static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
-                                        QCryptoTLSCreds *tlscreds,
-                                        const char *hostname, Error **errp)
+static int nbd_receive_simple_option(QIOChannel *ioc, int opt,
+                                     bool abort_on_notsup, Error **errp)
 {
     nbd_opt_reply reply;
-    QIOChannelTLS *tioc;
-    struct NBDTLSHandshakeData data = { 0 };
 
-    TRACE("Requesting TLS from server");
-    if (nbd_send_option_request(ioc, NBD_OPT_STARTTLS, 0, NULL, errp) < 0) {
-        return NULL;
+    TRACE("Requesting '%s' option from server", nbd_opt_name(opt));
+    if (nbd_send_option_request(ioc, opt, 0, NULL, errp) < 0) {
+        return -1;
     }
 
-    TRACE("Getting TLS reply from server");
-    if (nbd_receive_option_reply(ioc, NBD_OPT_STARTTLS, &reply, errp) < 0) {
-        return NULL;
+    TRACE("Getting '%s' option reply from server", nbd_opt_name(opt));
+    if (nbd_receive_option_reply(ioc, opt, &reply, errp) < 0) {
+        return -1;
     }
 
     if (reply.type != NBD_REP_ACK) {
-        error_setg(errp, "Server rejected request to start TLS %" PRIx32,
-                   reply.type);
-        nbd_send_opt_abort(ioc);
-        return NULL;
+        error_setg(errp, "Server rejected request for '%s' option: %" PRIx32,
+                   nbd_opt_name(opt), reply.type);
+        if (abort_on_notsup) {
+            nbd_send_opt_abort(ioc);
+        }
+        return -1;
     }
 
     if (reply.length != 0) {
-        error_setg(errp, "Start TLS response was not zero %" PRIu32,
-                   reply.length);
-        nbd_send_opt_abort(ioc);
+        error_setg(errp, "'%s' option response was not zero %" PRIu32,
+                   nbd_opt_name(opt), reply.length);
+        if (abort_on_notsup) {
+            nbd_send_opt_abort(ioc);
+        }
+        return -1;
+    }
+
+    TRACE("%s 'option' approved", nbd_opt_name(opt));
+    return 0;
+}
+
+static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
+                                        QCryptoTLSCreds *tlscreds,
+                                        const char *hostname, Error **errp)
+{
+    QIOChannelTLS *tioc;
+    struct NBDTLSHandshakeData data = { 0 };
+
+    if (nbd_receive_simple_option(ioc, NBD_OPT_STARTTLS, true, errp) < 0) {
         return NULL;
     }
 
-    TRACE("TLS request approved, setting up TLS");
+    TRACE("Setting up TLS");
     tioc = qio_channel_tls_new_client(ioc, tlscreds, hostname, errp);
     if (!tioc) {
         return NULL;
diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index 489eeaf887..3284bfc85a 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -96,6 +96,20 @@
 #define NBD_ENOSPC     28
 #define NBD_ESHUTDOWN  108
 
+static inline const char *nbd_opt_name(int opt)
+{
+    switch (opt) {
+    case NBD_OPT_EXPORT_NAME: return "export_name";
+    case NBD_OPT_ABORT: return "abort";
+    case NBD_OPT_LIST: return "list";
+    case NBD_OPT_PEEK_EXPORT: return "peek_export";
+    case NBD_OPT_STARTTLS: return "tls";
+    case NBD_OPT_STRUCTURED_REPLY: return "structured_reply";
+    }
+
+    return "<unknown option>";
+}
+
 static inline ssize_t read_sync(QIOChannel *ioc, void *buffer, size_t size)
 {
     struct iovec iov = { .iov_base = buffer, .iov_len = size };
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [Qemu-devel] [PATCH 05/18] nbd/client: fix drop_sync
  2017-02-03 15:47 [Qemu-devel] [PATCH 00/18] nbd: BLOCK_STATUS Vladimir Sementsov-Ogievskiy
                   ` (3 preceding siblings ...)
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 04/18] nbd/client: refactor nbd_receive_starttls Vladimir Sementsov-Ogievskiy
@ 2017-02-03 15:47 ` Vladimir Sementsov-Ogievskiy
  2017-02-06 23:17   ` Eric Blake
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 06/18] nbd/client: refactor drop_sync Vladimir Sementsov-Ogievskiy
                   ` (13 subsequent siblings)
  18 siblings, 1 reply; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-02-03 15:47 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, eblake, den, stefanha

Comparison symbol is misused. It may lead to memory corruption.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 nbd/client.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/nbd/client.c b/nbd/client.c
index 6caf6bda6d..351731bc63 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -94,7 +94,7 @@ static ssize_t drop_sync(QIOChannel *ioc, size_t size)
     char small[1024];
     char *buffer;
 
-    buffer = sizeof(small) < size ? small : g_malloc(MIN(65536, size));
+    buffer = sizeof(small) > size ? small : g_malloc(MIN(65536, size));
     while (size > 0) {
         ssize_t count = read_sync(ioc, buffer, MIN(65536, size));
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [Qemu-devel] [PATCH 06/18] nbd/client: refactor drop_sync
  2017-02-03 15:47 [Qemu-devel] [PATCH 00/18] nbd: BLOCK_STATUS Vladimir Sementsov-Ogievskiy
                   ` (4 preceding siblings ...)
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 05/18] nbd/client: fix drop_sync Vladimir Sementsov-Ogievskiy
@ 2017-02-03 15:47 ` Vladimir Sementsov-Ogievskiy
  2017-02-06 23:19   ` Eric Blake
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 07/18] nbd: Minimal structured read for client Vladimir Sementsov-Ogievskiy
                   ` (12 subsequent siblings)
  18 siblings, 1 reply; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-02-03 15:47 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, eblake, den, stefanha

Return 0 on success to simplify success checking.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 nbd/client.c | 35 +++++++++++++++++++----------------
 1 file changed, 19 insertions(+), 16 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index 351731bc63..1c274f3012 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -86,31 +86,34 @@ static QTAILQ_HEAD(, NBDExport) exports = QTAILQ_HEAD_INITIALIZER(exports);
 
 */
 
-/* Discard length bytes from channel.  Return -errno on failure, or
- * the amount of bytes consumed. */
-static ssize_t drop_sync(QIOChannel *ioc, size_t size)
+/* Discard length bytes from channel.
+ * Return 0 on success and -errno on fail.
+ */
+static int drop_sync(QIOChannel *ioc, size_t size)
 {
-    ssize_t ret = 0;
+    ssize_t ret;
     char small[1024];
     char *buffer;
 
     buffer = sizeof(small) > size ? small : g_malloc(MIN(65536, size));
     while (size > 0) {
-        ssize_t count = read_sync(ioc, buffer, MIN(65536, size));
-
-        if (count <= 0) {
-            goto cleanup;
+        ret = read_sync(ioc, buffer, MIN(65536, size));
+        if (ret == 0) {
+            ret = -EIO;
+        }
+        if (ret < 0) {
+            break;
         }
-        assert(count <= size);
-        size -= count;
-        ret += count;
+
+        assert(ret <= size);
+        size -= ret;
     }
 
- cleanup:
     if (buffer != small) {
         g_free(buffer);
     }
-    return ret;
+
+    return ret < 0 ? ret : 0;
 }
 
 /* Send an option request.
@@ -334,7 +337,7 @@ static int nbd_receive_list(QIOChannel *ioc, const char *want, bool *match,
         return -1;
     }
     if (namelen != strlen(want)) {
-        if (drop_sync(ioc, len) != len) {
+        if (drop_sync(ioc, len) < 0) {
             error_setg(errp, "failed to skip export name with wrong length");
             nbd_send_opt_abort(ioc);
             return -1;
@@ -350,7 +353,7 @@ static int nbd_receive_list(QIOChannel *ioc, const char *want, bool *match,
     }
     name[namelen] = '\0';
     len -= namelen;
-    if (drop_sync(ioc, len) != len) {
+    if (drop_sync(ioc, len) < 0) {
         error_setg(errp, "failed to read export description");
         nbd_send_opt_abort(ioc);
         return -1;
@@ -635,7 +638,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
     }
 
     TRACE("Size is %" PRIu64 ", export flags %" PRIx16, *size, *flags);
-    if (zeroes && drop_sync(ioc, 124) != 124) {
+    if (zeroes && drop_sync(ioc, 124) < 0) {
         error_setg(errp, "Failed to read reserved block");
         goto fail;
     }
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [Qemu-devel] [PATCH 07/18] nbd: Minimal structured read for client
  2017-02-03 15:47 [Qemu-devel] [PATCH 00/18] nbd: BLOCK_STATUS Vladimir Sementsov-Ogievskiy
                   ` (5 preceding siblings ...)
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 06/18] nbd/client: refactor drop_sync Vladimir Sementsov-Ogievskiy
@ 2017-02-03 15:47 ` Vladimir Sementsov-Ogievskiy
  2017-02-07 20:14   ` Eric Blake
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 08/18] hbitmap: add next_zero function Vladimir Sementsov-Ogievskiy
                   ` (11 subsequent siblings)
  18 siblings, 1 reply; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-02-03 15:47 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, eblake, den, stefanha

Minimal implementation: always send DF flag, to not deal with fragmented
replies.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/nbd-client.c  |  47 +++++++++++----
 block/nbd-client.h  |   2 +
 include/block/nbd.h |  15 +++--
 nbd/client.c        | 170 ++++++++++++++++++++++++++++++++++++++++++++++------
 qemu-nbd.c          |   2 +-
 5 files changed, 203 insertions(+), 33 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index 3779c6c999..ff96bd1635 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -180,13 +180,20 @@ static void nbd_co_receive_reply(NBDClientSession *s,
     *reply = s->reply;
     if (reply->handle != request->handle ||
         !s->ioc) {
+        reply->simple = true;
         reply->error = EIO;
     } else {
-        if (qiov && reply->error == 0) {
-            ret = nbd_wr_syncv(s->ioc, qiov->iov, qiov->niov, request->len,
-                               true);
-            if (ret != request->len) {
-                reply->error = EIO;
+        if (qiov) {
+            if ((reply->simple ? reply->error == 0 :
+                         reply->type == NBD_REPLY_TYPE_OFFSET_DATA)) {
+                ret = nbd_wr_syncv(s->ioc, qiov->iov, qiov->niov, request->len,
+                                   true);
+                if (ret != request->len) {
+                    reply->error = EIO;
+                }
+            } else if (!reply->simple &&
+                       reply->type == NBD_REPLY_TYPE_OFFSET_HOLE) {
+                qemu_iovec_memset(qiov, 0, 0, request->len);
             }
         }
 
@@ -227,6 +234,7 @@ int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,
         .type = NBD_CMD_READ,
         .from = offset,
         .len = bytes,
+        .flags = client->structured_reply ? NBD_CMD_FLAG_DF : 0,
     };
     NBDReply reply;
     ssize_t ret;
@@ -237,12 +245,30 @@ int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,
     nbd_coroutine_start(client, &request);
     ret = nbd_co_send_request(bs, &request, NULL);
     if (ret < 0) {
-        reply.error = -ret;
-    } else {
-        nbd_co_receive_reply(client, &request, &reply, qiov);
+        goto out;
     }
+
+    nbd_co_receive_reply(client, &request, &reply, qiov);
+    if (reply.error != 0) {
+        ret = -reply.error;
+    }
+
+    if (!reply.simple) {
+        while (!(reply.flags & NBD_REPLY_FLAG_DONE)) {
+            nbd_co_receive_reply(client, &request, &reply, qiov);
+            if (reply.error != 0) {
+                ret = -reply.error;
+            }
+            if (reply.simple) {
+                ret = -EIO;
+                goto out;
+            }
+        }
+    }
+
+out:
     nbd_coroutine_end(client, &request);
-    return -reply.error;
+    return ret;
 }
 
 int nbd_client_co_pwritev(BlockDriverState *bs, uint64_t offset,
@@ -408,7 +434,8 @@ int nbd_client_init(BlockDriverState *bs,
                                 &client->nbdflags,
                                 tlscreds, hostname,
                                 &client->ioc,
-                                &client->size, errp);
+                                &client->size,
+                                &client->structured_reply, errp);
     if (ret < 0) {
         logout("Failed to negotiate with the NBD server\n");
         return ret;
diff --git a/block/nbd-client.h b/block/nbd-client.h
index f8d6006849..cba1f965bf 100644
--- a/block/nbd-client.h
+++ b/block/nbd-client.h
@@ -32,6 +32,8 @@ typedef struct NBDClientSession {
     NBDReply reply;
 
     bool is_unix;
+
+    bool structured_reply;
 } NBDClientSession;
 
 NBDClientSession *nbd_get_client_session(BlockDriverState *bs);
diff --git a/include/block/nbd.h b/include/block/nbd.h
index 58b864f145..dae2e4bd03 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -57,11 +57,16 @@ struct NBDRequest {
 };
 typedef struct NBDRequest NBDRequest;
 
-struct NBDReply {
+typedef struct NBDReply {
+    bool simple;
     uint64_t handle;
     uint32_t error;
-};
-typedef struct NBDReply NBDReply;
+
+    uint16_t flags;
+    uint16_t type;
+    uint32_t length;
+    uint64_t offset;
+} NBDReply;
 
 struct NBDSimpleReply {
     /* uint32_t NBD_SIMPLE_REPLY_MAGIC */
@@ -169,10 +174,10 @@ ssize_t nbd_wr_syncv(QIOChannel *ioc,
 int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
                           QCryptoTLSCreds *tlscreds, const char *hostname,
                           QIOChannel **outioc,
-                          off_t *size, Error **errp);
+                          off_t *size, bool *structured_reply, Error **errp);
 int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t flags, off_t size);
 ssize_t nbd_send_request(QIOChannel *ioc, NBDRequest *request);
-ssize_t nbd_receive_reply(QIOChannel *ioc, NBDReply *reply);
+int nbd_receive_reply(QIOChannel *ioc, NBDReply *reply);
 int nbd_client(int fd);
 int nbd_disconnect(int fd);
 
diff --git a/nbd/client.c b/nbd/client.c
index 1c274f3012..9225f7e30d 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -472,11 +472,10 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
     return QIO_CHANNEL(tioc);
 }
 
-
 int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
                           QCryptoTLSCreds *tlscreds, const char *hostname,
                           QIOChannel **outioc,
-                          off_t *size, Error **errp)
+                          off_t *size, bool *structured_reply, Error **errp)
 {
     char buf[256];
     uint64_t magic, s;
@@ -584,6 +583,12 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
             if (nbd_receive_query_exports(ioc, name, errp) < 0) {
                 goto fail;
             }
+
+            if (structured_reply != NULL) {
+                *structured_reply =
+                    nbd_receive_simple_option(ioc, NBD_OPT_STRUCTURED_REPLY,
+                                              false, NULL) == 0;
+            }
         }
         /* write the export name request */
         if (nbd_send_option_request(ioc, NBD_OPT_EXPORT_NAME, -1, name,
@@ -603,6 +608,14 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
             goto fail;
         }
         be16_to_cpus(flags);
+
+        if (!!structured_reply && *structured_reply &&
+            !(*flags & NBD_CMD_FLAG_DF))
+        {
+            error_setg(errp, "Structured reply is negotiated, "
+                             "but DF flag is not.");
+            goto fail;
+        }
     } else if (magic == NBD_CLIENT_MAGIC) {
         uint32_t oldflags;
 
@@ -790,20 +803,33 @@ ssize_t nbd_send_request(QIOChannel *ioc, NBDRequest *request)
     return 0;
 }
 
-ssize_t nbd_receive_reply(QIOChannel *ioc, NBDReply *reply)
+static inline int read_sync_check(QIOChannel *ioc, void *buffer, size_t size)
 {
-    uint8_t buf[NBD_REPLY_SIZE];
-    uint32_t magic;
     ssize_t ret;
 
-    ret = read_sync(ioc, buf, sizeof(buf));
+    ret = read_sync(ioc, buffer, size);
     if (ret < 0) {
         return ret;
     }
-
-    if (ret != sizeof(buf)) {
+    if (ret != size) {
         LOG("read failed");
-        return -EINVAL;
+        return -EIO;
+    }
+
+    return 0;
+}
+
+/* nbd_receive_simple_reply
+ * Read simple reply except magic field (which should be already read)
+ */
+static int nbd_receive_simple_reply(QIOChannel *ioc, NBDReply *reply)
+{
+    uint8_t buf[NBD_REPLY_SIZE - 4];
+    ssize_t ret;
+
+    ret = read_sync_check(ioc, buf, sizeof(buf));
+    if (ret < 0) {
+        return ret;
     }
 
     /* Reply
@@ -812,9 +838,124 @@ ssize_t nbd_receive_reply(QIOChannel *ioc, NBDReply *reply)
        [ 7 .. 15]    handle
      */
 
-    magic = ldl_be_p(buf);
-    reply->error  = ldl_be_p(buf + 4);
-    reply->handle = ldq_be_p(buf + 8);
+    reply->error  = ldl_be_p(buf);
+    reply->handle = ldq_be_p(buf + 4);
+
+    return 0;
+}
+
+/* nbd_receive_structured_reply_chunk
+ * Read structured reply chunk except magic field (which should be already read)
+ * Data for NBD_REPLY_TYPE_OFFSET_DATA is not read too.
+ * Length field of reply out parameter corresponds to unread part of reply.
+ */
+static int nbd_receive_structured_reply_chunk(QIOChannel *ioc, NBDReply *reply)
+{
+    NBDStructuredReplyChunk chunk;
+    ssize_t ret;
+    uint16_t message_size;
+
+    ret = read_sync_check(ioc, (uint8_t *)&chunk + sizeof(chunk.magic),
+                          sizeof(chunk) - sizeof(chunk.magic));
+    if (ret < 0) {
+        return ret;
+    }
+
+    reply->flags = be16_to_cpu(chunk.flags);
+    reply->type = be16_to_cpu(chunk.type);
+    reply->handle = be64_to_cpu(chunk.handle);
+    reply->length = be32_to_cpu(chunk.length);
+
+    switch (reply->type) {
+    case NBD_REPLY_TYPE_NONE:
+        break;
+    case NBD_REPLY_TYPE_OFFSET_DATA:
+    case NBD_REPLY_TYPE_OFFSET_HOLE:
+        ret = read_sync_check(ioc, &reply->offset, sizeof(reply->offset));
+        if (ret < 0) {
+            return ret;
+        }
+        be64_to_cpus(&reply->offset);
+        reply->length -= sizeof(reply->offset);
+        break;
+    case NBD_REPLY_TYPE_ERROR:
+    case NBD_REPLY_TYPE_ERROR_OFFSET:
+        ret = read_sync_check(ioc, &reply->error, sizeof(reply->error));
+        if (ret < 0) {
+            return ret;
+        }
+        be32_to_cpus(&reply->error);
+
+        ret = read_sync_check(ioc, &message_size, sizeof(message_size));
+        if (ret < 0) {
+            return ret;
+        }
+        be16_to_cpus(&message_size);
+
+        if (message_size > 0) {
+            /* TODO: provide error message to user */
+            ret = drop_sync(ioc, message_size);
+            if (ret < 0) {
+                return ret;
+            }
+        }
+
+        if (reply->type == NBD_REPLY_TYPE_ERROR_OFFSET) {
+            /* drop 64bit offset */
+            ret = drop_sync(ioc, 8);
+            if (ret < 0) {
+                return ret;
+            }
+        }
+        break;
+    default:
+        if (reply->type & (1 << 15)) {
+            /* unknown error */
+            ret = drop_sync(ioc, reply->length);
+            if (ret < 0) {
+                return ret;
+            }
+
+            reply->error = NBD_EINVAL;
+            reply->length = 0;
+        } else {
+            /* unknown non-error reply type */
+            return -EINVAL;
+        }
+    }
+
+    return 0;
+}
+
+int nbd_receive_reply(QIOChannel *ioc, NBDReply *reply)
+{
+    uint32_t magic;
+    int ret;
+
+    ret = read_sync_check(ioc, &magic, sizeof(magic));
+    if (ret < 0) {
+        return ret;
+    }
+
+    be32_to_cpus(&magic);
+
+    switch (magic) {
+    case NBD_SIMPLE_REPLY_MAGIC:
+        reply->simple = true;
+        ret = nbd_receive_simple_reply(ioc, reply);
+        break;
+    case NBD_STRUCTURED_REPLY_MAGIC:
+        reply->simple = false;
+        ret = nbd_receive_structured_reply_chunk(ioc, reply);
+        break;
+    default:
+        LOG("invalid magic (got 0x%" PRIx32 ")", magic);
+        return -EINVAL;
+    }
+
+    if (ret < 0) {
+        return ret;
+    }
 
     reply->error = nbd_errno_to_system_errno(reply->error);
 
@@ -827,10 +968,5 @@ ssize_t nbd_receive_reply(QIOChannel *ioc, NBDReply *reply)
           ", handle = %" PRIu64" }",
           magic, reply->error, reply->handle);
 
-    if (magic != NBD_SIMPLE_REPLY_MAGIC) {
-        LOG("invalid magic (got 0x%" PRIx32 ")", magic);
-        return -EINVAL;
-    }
     return 0;
 }
-
diff --git a/qemu-nbd.c b/qemu-nbd.c
index c734f627b4..de0099e333 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -272,7 +272,7 @@ static void *nbd_client_thread(void *arg)
 
     ret = nbd_receive_negotiate(QIO_CHANNEL(sioc), NULL, &nbdflags,
                                 NULL, NULL, NULL,
-                                &size, &local_error);
+                                &size, NULL, &local_error);
     if (ret < 0) {
         if (local_error) {
             error_report_err(local_error);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [Qemu-devel] [PATCH 08/18] hbitmap: add next_zero function
  2017-02-03 15:47 [Qemu-devel] [PATCH 00/18] nbd: BLOCK_STATUS Vladimir Sementsov-Ogievskiy
                   ` (6 preceding siblings ...)
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 07/18] nbd: Minimal structured read for client Vladimir Sementsov-Ogievskiy
@ 2017-02-03 15:47 ` Vladimir Sementsov-Ogievskiy
  2017-02-07 22:55   ` Eric Blake
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 09/18] block/dirty-bitmap: add bdrv_dirty_bitmap_next() Vladimir Sementsov-Ogievskiy
                   ` (10 subsequent siblings)
  18 siblings, 1 reply; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-02-03 15:47 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, eblake, den, stefanha

The function searches for next zero bit.
Also add interface for BdrvDirtyBitmap.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/dirty-bitmap.c         |  5 +++++
 include/block/dirty-bitmap.h |  2 ++
 include/qemu/hbitmap.h       |  8 ++++++++
 util/hbitmap.c               | 26 ++++++++++++++++++++++++++
 4 files changed, 41 insertions(+)

diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 519737c8d3..d2f23a00d9 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -533,3 +533,8 @@ int64_t bdrv_get_meta_dirty_count(BdrvDirtyBitmap *bitmap)
 {
     return hbitmap_count(bitmap->meta);
 }
+
+int64_t bdrv_dirty_bitmap_next_zero(BdrvDirtyBitmap *bitmap, uint64_t start)
+{
+    return hbitmap_next_zero(bitmap->bitmap, start);
+}
diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index 9dea14ba03..8826b74b37 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -72,4 +72,6 @@ void bdrv_dirty_bitmap_deserialize_zeroes(BdrvDirtyBitmap *bitmap,
                                           bool finish);
 void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap *bitmap);
 
+int64_t bdrv_dirty_bitmap_next_zero(BdrvDirtyBitmap *bitmap, uint64_t start);
+
 #endif
diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
index eb464759d5..d6fe553b12 100644
--- a/include/qemu/hbitmap.h
+++ b/include/qemu/hbitmap.h
@@ -257,6 +257,14 @@ void hbitmap_iter_init(HBitmapIter *hbi, const HBitmap *hb, uint64_t first);
  */
 unsigned long hbitmap_iter_skip_words(HBitmapIter *hbi);
 
+/* hbitmap_next_zero:
+ * @hb: The HBitmap to operate on
+ * @start: The bit to start from.
+ *
+ * Find next not dirty bit.
+ */
+int64_t hbitmap_next_zero(const HBitmap *hb, uint64_t start);
+
 /* hbitmap_create_meta:
  * Create a "meta" hbitmap to track dirtiness of the bits in this HBitmap.
  * The caller owns the created bitmap and must call hbitmap_free_meta(hb) to
diff --git a/util/hbitmap.c b/util/hbitmap.c
index 9f691b76bd..b850c2baf5 100644
--- a/util/hbitmap.c
+++ b/util/hbitmap.c
@@ -166,6 +166,32 @@ void hbitmap_iter_init(HBitmapIter *hbi, const HBitmap *hb, uint64_t first)
     }
 }
 
+int64_t hbitmap_next_zero(const HBitmap *hb, uint64_t start)
+{
+    size_t pos = (start >> hb->granularity) >> BITS_PER_LEVEL;
+    unsigned long *last_lev = hb->levels[HBITMAP_LEVELS - 1];
+    uint64_t sz = hb->sizes[HBITMAP_LEVELS - 1];
+    unsigned long cur = last_lev[pos];
+    unsigned start_bit_offset =
+            (start >> hb->granularity) & (BITS_PER_LONG - 1);
+    cur |= (1UL << start_bit_offset) - 1;
+
+    if (cur == (unsigned long)-1) {
+        do {
+            pos++;
+        } while (pos < sz && last_lev[pos] == (unsigned long)-1);
+
+        if (pos >= sz) {
+            return -1;
+        }
+
+        cur = last_lev[pos];
+    }
+
+    return MIN((pos << BITS_PER_LEVEL) + ctol(cur), hb->size - 1)
+            << hb->granularity;
+}
+
 bool hbitmap_empty(const HBitmap *hb)
 {
     return hb->count == 0;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [Qemu-devel] [PATCH 09/18] block/dirty-bitmap: add bdrv_dirty_bitmap_next()
  2017-02-03 15:47 [Qemu-devel] [PATCH 00/18] nbd: BLOCK_STATUS Vladimir Sementsov-Ogievskiy
                   ` (7 preceding siblings ...)
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 08/18] hbitmap: add next_zero function Vladimir Sementsov-Ogievskiy
@ 2017-02-03 15:47 ` Vladimir Sementsov-Ogievskiy
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 10/18] block/dirty-bitmap: add bdrv_load_dirty_bitmap Vladimir Sementsov-Ogievskiy
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-02-03 15:47 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, eblake, den, stefanha

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/dirty-bitmap.c         | 7 +++++++
 include/block/dirty-bitmap.h | 3 +++
 2 files changed, 10 insertions(+)

diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index d2f23a00d9..3b7db1d78c 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -538,3 +538,10 @@ int64_t bdrv_dirty_bitmap_next_zero(BdrvDirtyBitmap *bitmap, uint64_t start)
 {
     return hbitmap_next_zero(bitmap->bitmap, start);
 }
+
+BdrvDirtyBitmap *bdrv_dirty_bitmap_next(BlockDriverState *bs,
+                                        BdrvDirtyBitmap *bitmap)
+{
+    return bitmap == NULL ? QLIST_FIRST(&bs->dirty_bitmaps) :
+                            QLIST_NEXT(bitmap, list);
+}
diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index 8826b74b37..ff8163ba02 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -74,4 +74,7 @@ void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap *bitmap);
 
 int64_t bdrv_dirty_bitmap_next_zero(BdrvDirtyBitmap *bitmap, uint64_t start);
 
+BdrvDirtyBitmap *bdrv_dirty_bitmap_next(BlockDriverState *bs,
+                                        BdrvDirtyBitmap *bitmap);
+
 #endif
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [Qemu-devel] [PATCH 10/18] block/dirty-bitmap: add bdrv_load_dirty_bitmap
  2017-02-03 15:47 [Qemu-devel] [PATCH 00/18] nbd: BLOCK_STATUS Vladimir Sementsov-Ogievskiy
                   ` (8 preceding siblings ...)
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 09/18] block/dirty-bitmap: add bdrv_dirty_bitmap_next() Vladimir Sementsov-Ogievskiy
@ 2017-02-03 15:47 ` Vladimir Sementsov-Ogievskiy
  2017-02-08 11:45   ` Fam Zheng
  2017-02-16 12:37   ` Denis V. Lunev
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 11/18] nbd: BLOCK_STATUS for bitmap export: server part Vladimir Sementsov-Ogievskiy
                   ` (8 subsequent siblings)
  18 siblings, 2 replies; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-02-03 15:47 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, eblake, den, stefanha

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/dirty-bitmap.c         | 53 ++++++++++++++++++++++++++++++++++++++++++++
 include/block/block_int.h    |  4 ++++
 include/block/dirty-bitmap.h |  3 +++
 3 files changed, 60 insertions(+)

diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 3b7db1d78c..394d4328d5 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -545,3 +545,56 @@ BdrvDirtyBitmap *bdrv_dirty_bitmap_next(BlockDriverState *bs,
     return bitmap == NULL ? QLIST_FIRST(&bs->dirty_bitmaps) :
                             QLIST_NEXT(bitmap, list);
 }
+
+typedef struct BDRVLoadBitmapCo {
+    BlockDriverState *bs;
+    const char *name;
+    Error **errp;
+    BdrvDirtyBitmap *ret;
+    bool in_progress;
+} BDRVLoadBitmapCo;
+
+static void bdrv_load_dity_bitmap_co_entry(void *opaque)
+{
+    BDRVLoadBitmapCo *lbco = opaque;
+    BlockDriver *drv = lbco->bs->drv;
+
+    if (!!drv && !!drv->bdrv_dirty_bitmap_load) {
+        lbco->ret = drv->bdrv_dirty_bitmap_load(lbco->bs, lbco->name,
+                                                lbco->errp);
+    } else if (lbco->bs->file)  {
+        BlockDriverState *bs = lbco->bs;
+        lbco->bs = lbco->bs->file->bs;
+        bdrv_load_dity_bitmap_co_entry(lbco);
+        if (lbco->ret != NULL) {
+            QLIST_REMOVE(lbco->ret, list);
+            QLIST_INSERT_HEAD(&bs->dirty_bitmaps, lbco->ret, list);
+        }
+    } else {
+        lbco->ret = NULL;
+    }
+
+    lbco->in_progress = false;
+}
+
+BdrvDirtyBitmap *bdrv_load_dirty_bitmap(BlockDriverState *bs, const char *name,
+                                        Error **errp)
+{
+    Coroutine *co;
+    BDRVLoadBitmapCo lbco = {
+        .bs = bs,
+        .name = name,
+        .errp = errp,
+        .in_progress = true
+    };
+
+    if (qemu_in_coroutine()) {
+        bdrv_load_dity_bitmap_co_entry(&lbco);
+    } else {
+        co = qemu_coroutine_create(bdrv_load_dity_bitmap_co_entry, &lbco);
+        qemu_coroutine_enter(co);
+        BDRV_POLL_WHILE(bs, lbco.in_progress);
+    }
+
+    return lbco.ret;
+}
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 83a423c580..d3770db539 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -222,6 +222,10 @@ struct BlockDriver {
     int (*bdrv_get_info)(BlockDriverState *bs, BlockDriverInfo *bdi);
     ImageInfoSpecific *(*bdrv_get_specific_info)(BlockDriverState *bs);
 
+    BdrvDirtyBitmap *(*bdrv_dirty_bitmap_load)(BlockDriverState *bs,
+                                               const char *name,
+                                               Error **errp);
+
     int coroutine_fn (*bdrv_save_vmstate)(BlockDriverState *bs,
                                           QEMUIOVector *qiov,
                                           int64_t pos);
diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index ff8163ba02..c0c70a8c67 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -77,4 +77,7 @@ int64_t bdrv_dirty_bitmap_next_zero(BdrvDirtyBitmap *bitmap, uint64_t start);
 BdrvDirtyBitmap *bdrv_dirty_bitmap_next(BlockDriverState *bs,
                                         BdrvDirtyBitmap *bitmap);
 
+BdrvDirtyBitmap *bdrv_load_dirty_bitmap(BlockDriverState *bs, const char *name,
+                                        Error **errp);
+
 #endif
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [Qemu-devel] [PATCH 11/18] nbd: BLOCK_STATUS for bitmap export: server part
  2017-02-03 15:47 [Qemu-devel] [PATCH 00/18] nbd: BLOCK_STATUS Vladimir Sementsov-Ogievskiy
                   ` (9 preceding siblings ...)
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 10/18] block/dirty-bitmap: add bdrv_load_dirty_bitmap Vladimir Sementsov-Ogievskiy
@ 2017-02-03 15:47 ` Vladimir Sementsov-Ogievskiy
  2017-02-08 13:13   ` Eric Blake
  2017-02-16 13:00   ` Denis V. Lunev
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 12/18] nbd: BLOCK_STATUS for bitmap export: client part Vladimir Sementsov-Ogievskiy
                   ` (7 subsequent siblings)
  18 siblings, 2 replies; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-02-03 15:47 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, eblake, den, stefanha

Only one meta context type is defined: qemu-bitmap:<bitmap-name>.
Maximum one query is allowed for NBD_OPT_{SET,LIST}_META_CONTEXT,
NBD_REP_ERR_TOO_BIG is returned otherwise.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/nbd.h |  15 ++
 nbd/nbd-internal.h  |   6 +
 nbd/server.c        | 445 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 466 insertions(+)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index dae2e4bd03..516a24765c 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -94,6 +94,16 @@ typedef struct NBDStructuredError {
     uint16_t message_length;
 } QEMU_PACKED NBDStructuredError;
 
+typedef struct NBDStructuredMeta {
+    NBDStructuredReplyChunk h;
+    uint32_t context_id;
+} QEMU_PACKED NBDStructuredMeta;
+
+typedef struct NBDExtent {
+    uint32_t length;
+    uint32_t flags;
+} QEMU_PACKED NBDExtent;
+
 /* Transmission (export) flags: sent from server to client during handshake,
    but describe what will happen during transmission */
 #define NBD_FLAG_HAS_FLAGS      (1 << 0)        /* Flags are there */
@@ -120,6 +130,7 @@ typedef struct NBDStructuredError {
 
 #define NBD_REP_ACK             (1)             /* Data sending finished. */
 #define NBD_REP_SERVER          (2)             /* Export description. */
+#define NBD_REP_META_CONTEXT    (4)
 
 #define NBD_REP_ERR_UNSUP       NBD_REP_ERR(1)  /* Unknown option */
 #define NBD_REP_ERR_POLICY      NBD_REP_ERR(2)  /* Server denied */
@@ -127,6 +138,8 @@ typedef struct NBDStructuredError {
 #define NBD_REP_ERR_PLATFORM    NBD_REP_ERR(4)  /* Not compiled in */
 #define NBD_REP_ERR_TLS_REQD    NBD_REP_ERR(5)  /* TLS required */
 #define NBD_REP_ERR_SHUTDOWN    NBD_REP_ERR(7)  /* Server shutting down */
+#define NBD_REP_ERR_TOO_BIG     NBD_REP_ERR(9)  /* The request or the reply is
+                                                   too large to process */
 
 /* Request flags, sent from client to server during transmission phase */
 #define NBD_CMD_FLAG_FUA        (1 << 0) /* 'force unit access' during write */
@@ -142,6 +155,7 @@ enum {
     NBD_CMD_TRIM = 4,
     /* 5 reserved for failed experiment NBD_CMD_CACHE */
     NBD_CMD_WRITE_ZEROES = 6,
+    NBD_CMD_BLOCK_STATUS = 7
 };
 
 #define NBD_DEFAULT_PORT	10809
@@ -163,6 +177,7 @@ enum {
 #define NBD_REPLY_TYPE_NONE 0
 #define NBD_REPLY_TYPE_OFFSET_DATA 1
 #define NBD_REPLY_TYPE_OFFSET_HOLE 2
+#define NBD_REPLY_TYPE_BLOCK_STATUS 5
 #define NBD_REPLY_TYPE_ERROR ((1 << 15) + 1)
 #define NBD_REPLY_TYPE_ERROR_OFFSET ((1 << 15) + 2)
 
diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index 3284bfc85a..fbbcf69925 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -83,6 +83,10 @@
 #define NBD_OPT_PEEK_EXPORT     (4)
 #define NBD_OPT_STARTTLS        (5)
 #define NBD_OPT_STRUCTURED_REPLY (8)
+#define NBD_OPT_LIST_META_CONTEXT (9)
+#define NBD_OPT_SET_META_CONTEXT  (10)
+
+#define NBD_META_NS_BITMAPS "qemu-dirty-bitmap"
 
 /* NBD errors are based on errno numbers, so there is a 1:1 mapping,
  * but only a limited set of errno values is specified in the protocol.
@@ -105,6 +109,8 @@ static inline const char *nbd_opt_name(int opt)
     case NBD_OPT_PEEK_EXPORT: return "peek_export";
     case NBD_OPT_STARTTLS: return "tls";
     case NBD_OPT_STRUCTURED_REPLY: return "structured_reply";
+    case NBD_OPT_LIST_META_CONTEXT: return "list_meta_context";
+    case NBD_OPT_SET_META_CONTEXT: return "set_meta_context";
     }
 
     return "<unknown option>";
diff --git a/nbd/server.c b/nbd/server.c
index cb79a93c87..0b7b7230df 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -21,6 +21,8 @@
 #include "qapi/error.h"
 #include "nbd-internal.h"
 
+#define NBD_MAX_BITMAP_EXTENTS (0x100000 / 8) /* 1 mb of extents data */
+
 static int system_errno_to_nbd_errno(int err)
 {
     switch (err) {
@@ -102,6 +104,7 @@ struct NBDClient {
     bool closing;
 
     bool structured_reply;
+    BdrvDirtyBitmap *export_bitmap;
 };
 
 /* That's all folks */
@@ -421,7 +424,304 @@ static QIOChannel *nbd_negotiate_handle_starttls(NBDClient *client,
     return QIO_CHANNEL(tioc);
 }
 
+static int nbd_negotiate_read_size_string(QIOChannel *ioc, char **str,
+                                          uint32_t max_len)
+{
+    uint32_t len;
+
+    if (nbd_negotiate_read(ioc, &len, sizeof(len)) != sizeof(len)) {
+        LOG("read failed");
+        return -EIO;
+    }
+
+    cpu_to_be32s(&len);
+
+    if (max_len > 0 && len > max_len) {
+        LOG("Bad length received");
+        return -EINVAL;
+    }
+
+    *str = g_malloc(len + 1);
+
+    if (nbd_negotiate_read(ioc, *str, len) != len) {
+        LOG("read failed");
+        g_free(str);
+        return -EIO;
+    }
+    (*str)[len] = '\0';
+
+    return sizeof(len) + len;
+}
+
+static int nbd_negotiate_send_meta_context(QIOChannel *ioc,
+                                           const char *context,
+                                           uint32_t opt)
+{
+    int ret;
+    size_t len = strlen(context);
+    uint32_t context_id = cpu_to_be32(100);
+
+    ret = nbd_negotiate_send_rep_len(ioc, NBD_REP_META_CONTEXT, opt,
+                                     len + sizeof(context_id));
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (nbd_negotiate_write(ioc, &context_id, sizeof(context_id)) !=
+        sizeof(context_id))
+    {
+        LOG("write failed");
+        return -EIO;
+    }
+
+    if (nbd_negotiate_write(ioc, context, len) != len) {
+        LOG("write failed");
+        return -EIO;
+    }
+
+    return 0;
+}
+
+static int nbd_negotiate_send_bitmap(QIOChannel *ioc, const char *bitmap_name,
+                                     uint32_t opt)
+{
+    char *context = g_strdup_printf("%s:%s", NBD_META_NS_BITMAPS, bitmap_name);
+    int ret = nbd_negotiate_send_meta_context(ioc, context, opt);
+
+    g_free(context);
+
+    return ret;
+}
+
+static int nbd_negotiate_one_bitmap_query(QIOChannel *ioc, BlockDriverState *bs,
+                                          uint32_t opt, const char *query,
+                                          BdrvDirtyBitmap **bitmap)
+{
+    BdrvDirtyBitmap *bm = bdrv_find_dirty_bitmap(bs, query);
+    if (bm != NULL) {
+        if (bitmap != NULL) {
+            *bitmap = bm;
+        }
+        return nbd_negotiate_send_bitmap(ioc, query, opt);
+    }
+
+    return 0;
+}
+
+static int nbd_negotiate_one_meta_query(QIOChannel *ioc, BlockDriverState *bs,
+                                        uint32_t opt, BdrvDirtyBitmap **bitmap)
+{
+    int ret = 0, nb_read;
+    char *query, *colon, *namespace, *subquery;
+
+    *bitmap = NULL;
 
+    nb_read = nbd_negotiate_read_size_string(ioc, &query, 0);
+    if (nb_read < 0) {
+        return nb_read;
+    }
+
+    colon = strchr(query, ':');
+    if (colon == NULL) {
+        ret = -EINVAL;
+        goto out;
+    }
+    *colon = '\0';
+    namespace = query;
+    subquery = colon + 1;
+
+    if (strcmp(namespace, NBD_META_NS_BITMAPS) == 0) {
+        ret = nbd_negotiate_one_bitmap_query(ioc, bs, opt, subquery, bitmap);
+    }
+
+out:
+    g_free(query);
+    return ret < 0 ? ret : nb_read;
+}
+
+/* start handle LIST_META_CONTEXT and SET_META_CONTEXT requests
+ * @opt          should be NBD_OPT_LIST_META_CONTEXT or NBD_OPT_SET_META_CONTEXT
+ * @length       related option data to read
+ * @nb_queries   out parameter, number of queries specified by client
+ * @bs           out parameter, bs for export, selected by client
+ *               will be zero if some not critical error occured and error reply
+ *               was sent.
+ *
+ * Returns:
+ *   Err. code < 0 on critical error
+ *   Number of bytes read otherwise (will be equal to length on non critical
+ *     error or if there no queries in request)
+ */
+static int nbd_negotiate_opt_meta_context_start(NBDClient *client, uint32_t opt,
+                                                uint32_t length,
+                                                uint32_t *nb_queries,
+                                                BlockDriverState **bs)
+{
+    int ret;
+    NBDExport *exp;
+    char *export_name;
+    int nb_read = 0;
+
+    if (!client->structured_reply) {
+        uint32_t tail = length - nb_read;
+        LOG("Structured reply is not negotiated");
+
+        if (nbd_negotiate_drop_sync(client->ioc, tail) != tail) {
+            return -EIO;
+        }
+        ret = nbd_negotiate_send_rep_err(client->ioc, NBD_REP_ERR_INVALID, opt,
+                                         "Structured reply is not negotiated");
+        g_free(export_name);
+
+        if (ret < 0) {
+            return ret;
+        } else {
+            *bs = NULL;
+            *nb_queries = 0;
+            return length;
+        }
+    }
+
+    nb_read = nbd_negotiate_read_size_string(client->ioc, &export_name,
+                                             NBD_MAX_NAME_SIZE);
+    if (nb_read < 0) {
+        return nb_read;
+    }
+
+    exp = nbd_export_find(export_name);
+    if (exp == NULL) {
+        uint32_t tail = length - nb_read;
+        LOG("export '%s' is not found", export_name);
+
+        if (nbd_negotiate_drop_sync(client->ioc, tail) != tail) {
+            return -EIO;
+        }
+        ret = nbd_negotiate_send_rep_err(client->ioc, NBD_REP_ERR_INVALID, opt,
+                                         "export '%s' is not found",
+                                         export_name);
+        g_free(export_name);
+
+        if (ret < 0) {
+            return ret;
+        } else {
+            *bs = NULL;
+            *nb_queries = 0;
+            return length;
+        }
+    }
+    g_free(export_name);
+
+    *bs = blk_bs(exp->blk);
+    if (*bs == NULL) {
+        LOG("export without bs");
+        return -EINVAL;
+    }
+
+    if (nbd_negotiate_read(client->ioc, nb_queries,
+                           sizeof(*nb_queries)) != sizeof(*nb_queries))
+    {
+        LOG("read failed");
+        return -EIO;
+    }
+    cpu_to_be32s(nb_queries);
+
+    nb_read += sizeof(*nb_queries);
+
+    return nb_read;
+}
+
+static int nbd_negotiate_list_meta_context(NBDClient *client, uint32_t length)
+{
+    int ret;
+    BlockDriverState *bs;
+    uint32_t nb_queries;
+    int i;
+    int nb_read;
+
+    nb_read = nbd_negotiate_opt_meta_context_start(client,
+                                                   NBD_OPT_LIST_META_CONTEXT,
+                                                   length, &nb_queries, &bs);
+    if (nb_read < 0) {
+        return nb_read;
+    }
+    if (bs == NULL) {
+        /* error reply was already sent by nbd_negotiate_opt_meta_context_start
+         * */
+        return 0;
+    }
+
+    if (nb_queries == 0) {
+        BdrvDirtyBitmap *bm = NULL;
+
+        if (nb_read != length) {
+            return -EINVAL;
+        }
+
+        while ((bm = bdrv_dirty_bitmap_next(bs, bm)) != 0) {
+            nbd_negotiate_send_bitmap(client->ioc, bdrv_dirty_bitmap_name(bm),
+                                      NBD_OPT_LIST_META_CONTEXT);
+        }
+    }
+
+    for (i = 0; i < nb_queries; ++i) {
+        ret = nbd_negotiate_one_meta_query(client->ioc, bs,
+                                           NBD_OPT_LIST_META_CONTEXT, NULL);
+        if (ret < 0) {
+            return ret;
+        }
+
+        nb_read += ret;
+    }
+
+    if (nb_read != length) {
+        return -EINVAL;
+    }
+
+    return nbd_negotiate_send_rep(client->ioc, NBD_REP_ACK,
+                                  NBD_OPT_LIST_META_CONTEXT);
+}
+
+static int nbd_negotiate_set_meta_context(NBDClient *client, uint32_t length)
+{
+    int ret;
+    BlockDriverState *bs;
+    uint32_t nb_queries;
+    int nb_read;
+
+    nb_read = nbd_negotiate_opt_meta_context_start(client,
+                                                   NBD_OPT_SET_META_CONTEXT,
+                                                   length, &nb_queries, &bs);
+    if (nb_read < 0) {
+        return nb_read;
+    }
+    if (bs == NULL) {
+        /* error reply was already sent by nbd_negotiate_opt_meta_context_start
+         * */
+        return 0;
+    }
+
+    if (nb_queries == 0) {
+        return nbd_negotiate_send_rep(client->ioc, NBD_REP_ACK,
+                                      NBD_OPT_SET_META_CONTEXT);
+    }
+
+    if (nb_queries > 1) {
+        return nbd_negotiate_send_rep_err(client->ioc, NBD_REP_ERR_TOO_BIG,
+                                          NBD_OPT_SET_META_CONTEXT,
+                                          "Only one exporting context is"
+                                          "supported");
+    }
+
+    ret = nbd_negotiate_one_meta_query(client->ioc, bs,
+                                       NBD_OPT_SET_META_CONTEXT,
+                                       &client->export_bitmap);
+    if (ret < 0) {
+        return ret;
+    }
+
+    return nbd_negotiate_send_rep(client->ioc, NBD_REP_ACK,
+                                  NBD_OPT_SET_META_CONTEXT);
+}
 /* Process all NBD_OPT_* client option commands.
  * Return -errno on error, 0 on success. */
 static int nbd_negotiate_options(NBDClient *client)
@@ -585,6 +885,20 @@ static int nbd_negotiate_options(NBDClient *client)
                 }
                 break;
 
+            case NBD_OPT_LIST_META_CONTEXT:
+                ret = nbd_negotiate_list_meta_context(client, length);
+                if (ret < 0) {
+                    return ret;
+                }
+                break;
+
+            case NBD_OPT_SET_META_CONTEXT:
+                ret = nbd_negotiate_set_meta_context(client, length);
+                if (ret < 0) {
+                    return ret;
+                }
+                break;
+
             default:
                 if (nbd_negotiate_drop_sync(client->ioc, length) != length) {
                     return -EIO;
@@ -1159,6 +1473,124 @@ static int nbd_co_send_structured_none(NBDClient *client, uint64_t handle)
     return nbd_co_send_buf(client, &chunk, sizeof(chunk));
 }
 
+#define MAX_EXTENT_LENGTH UINT32_MAX
+
+static unsigned add_extents(NBDExtent *extents, unsigned nb_extents,
+                            uint64_t length, uint32_t flags)
+{
+    unsigned i = 0;
+    uint32_t big_chunk = (MAX_EXTENT_LENGTH >> 9) << 9;
+    uint32_t big_chunk_be = cpu_to_be32(big_chunk);
+    uint32_t flags_be = cpu_to_be32(flags);
+
+    for (i = 0; i < nb_extents && length > MAX_EXTENT_LENGTH;
+         i++, length -= big_chunk)
+    {
+        extents[i].length = big_chunk_be;
+        extents[i].flags = flags_be;
+    }
+
+    if (length > 0 && i < nb_extents) {
+        extents[i].length = cpu_to_be32(length);
+        extents[i].flags = flags_be;
+        i++;
+    }
+
+    return i;
+}
+
+static unsigned bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset,
+                                  uint64_t length, NBDExtent *extents,
+                                  unsigned nb_extents)
+{
+    uint64_t begin, end; /* dirty region */
+    uint64_t start_sector = offset >> BDRV_SECTOR_BITS;
+    uint64_t last_sector = (offset + length - 1) >> BDRV_SECTOR_BITS;
+    unsigned i = 0;
+    uint64_t len;
+    uint32_t ma = -1;
+    ma = (ma / bdrv_dirty_bitmap_granularity(bitmap)) *
+        bdrv_dirty_bitmap_granularity(bitmap);
+
+    BdrvDirtyBitmapIter *it = bdrv_dirty_iter_new(bitmap, start_sector);
+
+    assert(nb_extents > 0);
+
+    begin = bdrv_dirty_iter_next(it);
+    if (begin == -1) {
+        begin = last_sector + 1;
+    }
+    if (begin > start_sector) {
+        len = (begin - start_sector) << BDRV_SECTOR_BITS;
+        i += add_extents(extents + i, nb_extents - i, len, 0);
+    }
+
+    while (begin != -1 && begin <= last_sector && i < nb_extents) {
+        end = bdrv_dirty_bitmap_next_zero(bitmap, begin + 1);
+
+        i += add_extents(extents + i, nb_extents - i,
+                         (end - begin) << BDRV_SECTOR_BITS, 1);
+
+        if (end > last_sector || i >= nb_extents) {
+            break;
+        }
+
+        bdrv_set_dirty_iter(it, end);
+        begin = bdrv_dirty_iter_next(it);
+        if (begin == -1) {
+            begin = last_sector + 1;
+        }
+        if (begin > end) {
+            i += add_extents(extents + i, nb_extents - i,
+                             (begin - end) << BDRV_SECTOR_BITS, 0);
+        }
+    }
+
+    bdrv_dirty_iter_free(it);
+
+    extents[0].length =
+        cpu_to_be32(be32_to_cpu(extents[0].length) -
+                    (offset - (start_sector << BDRV_SECTOR_BITS)));
+
+    return i;
+}
+
+static int nbd_co_send_extents(NBDClient *client, uint64_t handle,
+                               NBDExtent *extents, unsigned nb_extents,
+                               uint32_t context_id)
+{
+    NBDStructuredMeta chunk;
+
+    struct iovec iov[] = {
+        {.iov_base = &chunk, .iov_len = sizeof(chunk)},
+        {.iov_base = extents, .iov_len = nb_extents * sizeof(extents[0])}
+    };
+
+    set_be_chunk(&chunk.h, NBD_REPLY_FLAG_DONE, NBD_REPLY_TYPE_BLOCK_STATUS,
+                 handle, sizeof(chunk) - sizeof(chunk.h) + iov[1].iov_len);
+    stl_be_p(&chunk.context_id, context_id);
+
+    return nbd_co_send_iov(client, iov, 2);
+}
+
+static int nbd_co_send_bitmap(NBDClient *client, uint64_t handle,
+                              BdrvDirtyBitmap *bitmap, uint64_t offset,
+                              uint64_t length, uint32_t context_id)
+{
+    int ret;
+    unsigned nb_extents;
+    NBDExtent *extents = g_new(NBDExtent, NBD_MAX_BITMAP_EXTENTS);
+
+    nb_extents = bitmap_to_extents(bitmap, offset, length, extents,
+                                   NBD_MAX_BITMAP_EXTENTS);
+
+    ret = nbd_co_send_extents(client, handle, extents, nb_extents, context_id);
+
+    g_free(extents);
+
+    return ret;
+}
+
 /* Collect a client request.  Return 0 if request looks valid, -EAGAIN
  * to keep trying the collection, -EIO to drop connection right away,
  * and any other negative value to report an error to the client
@@ -1437,6 +1869,19 @@ static void nbd_trip(void *opaque)
             goto out;
         }
         break;
+    case NBD_CMD_BLOCK_STATUS:
+        TRACE("Request type is BLOCK_STATUS");
+        if (client->export_bitmap == NULL) {
+            reply.error = EINVAL;
+            goto error_reply;
+        }
+        ret = nbd_co_send_bitmap(req->client, request.handle,
+                                 client->export_bitmap, request.from,
+                                 request.len, 0);
+        if (ret < 0) {
+            goto out;
+        }
+        break;
     default:
         LOG("invalid request type (%" PRIu32 ") received", request.type);
         reply.error = EINVAL;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [Qemu-devel] [PATCH 12/18] nbd: BLOCK_STATUS for bitmap export: client part
  2017-02-03 15:47 [Qemu-devel] [PATCH 00/18] nbd: BLOCK_STATUS Vladimir Sementsov-Ogievskiy
                   ` (10 preceding siblings ...)
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 11/18] nbd: BLOCK_STATUS for bitmap export: server part Vladimir Sementsov-Ogievskiy
@ 2017-02-03 15:47 ` Vladimir Sementsov-Ogievskiy
  2017-02-08 23:06   ` Eric Blake
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 13/18] nbd: add nbd_dirty_bitmap_load Vladimir Sementsov-Ogievskiy
                   ` (6 subsequent siblings)
  18 siblings, 1 reply; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-02-03 15:47 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, eblake, den, stefanha

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/nbd-client.c   | 146 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 block/nbd-client.h   |   6 +++
 block/nbd.c          |   9 +++-
 include/block/nbd.h  |   6 ++-
 nbd/client.c         | 103 +++++++++++++++++++++++++++++++++++-
 nbd/server.c         |   2 -
 qapi/block-core.json |   5 +-
 qemu-nbd.c           |   2 +-
 8 files changed, 270 insertions(+), 9 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index ff96bd1635..c7eb21fb02 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -388,6 +388,147 @@ int nbd_client_co_pdiscard(BlockDriverState *bs, int64_t offset, int count)
 
 }
 
+static inline ssize_t read_sync(QIOChannel *ioc, void *buffer, size_t size)
+{
+    struct iovec iov = { .iov_base = buffer, .iov_len = size };
+    /* Sockets are kept in blocking mode in the negotiation phase.  After
+     * that, a non-readable socket simply means that another thread stole
+     * our request/reply.  Synchronization is done with recv_coroutine, so
+     * that this is coroutine-safe.
+     */
+    return nbd_wr_syncv(ioc, &iov, 1, size, true);
+}
+
+static int nbd_client_co_cmd_block_status(BlockDriverState *bs, uint64_t offset,
+                                          uint64_t bytes, NBDExtent **pextents,
+                                          unsigned *nb_extents)
+{
+    int64_t ret;
+    NBDReply reply;
+    uint32_t context_id;
+    int64_t nb, i;
+    NBDExtent *extents = NULL;
+    NBDClientSession *client = nbd_get_client_session(bs);
+    NBDRequest request = {
+        .type = NBD_CMD_BLOCK_STATUS,
+        .from = offset,
+        .len = bytes,
+        .flags = 0,
+    };
+
+    nbd_coroutine_start(client, &request);
+
+    ret = nbd_co_send_request(bs, &request, NULL);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    nbd_co_receive_reply(client, &request, &reply, NULL);
+    if (reply.error != 0) {
+        ret = -reply.error;
+    }
+    if (reply.simple) {
+        ret = -EINVAL;
+        goto fail;
+    }
+    if (reply.error != 0) {
+        ret = -reply.error;
+        goto fail;
+    }
+    if (reply.type != NBD_REPLY_TYPE_BLOCK_STATUS) {
+        ret = -EINVAL;
+        goto fail;
+    }
+
+    read_sync(client->ioc, &context_id, sizeof(context_id));
+    cpu_to_be32s(&context_id);
+    if (client->meta_data_context_id != context_id) {
+        ret = -EINVAL;
+        goto fail;
+    }
+
+    nb = (reply.length - sizeof(context_id)) / sizeof(NBDExtent);
+    extents = g_new(NBDExtent, nb);
+    if (read_sync(client->ioc, extents, nb * sizeof(NBDExtent)) !=
+        nb * sizeof(NBDExtent))
+    {
+        ret = -EIO;
+        goto fail;
+    }
+
+    if (!(reply.flags && NBD_REPLY_FLAG_DONE)) {
+        nbd_co_receive_reply(client, &request, &reply, NULL);
+        if (reply.simple) {
+            ret = -EINVAL;
+            goto fail;
+        }
+        if (reply.error != 0) {
+            ret = -reply.error;
+            goto fail;
+        }
+        if (reply.type != NBD_REPLY_TYPE_NONE ||
+            !(reply.flags && NBD_REPLY_FLAG_DONE)) {
+            ret = -EINVAL;
+            goto fail;
+        }
+    }
+
+    for (i = 0; i < nb; ++i) {
+        cpu_to_be32s(&extents[i].length);
+        cpu_to_be32s(&extents[i].flags);
+    }
+
+    *pextents = extents;
+    *nb_extents = nb;
+    nbd_coroutine_end(client, &request);
+    return 0;
+
+fail:
+    g_free(extents);
+    nbd_coroutine_end(client, &request);
+    return ret;
+}
+
+/* nbd_client_co_load_bitmap_part() returns end of set area, i.e. first next
+ * byte of unknown status (may be >= disk size, which means that the bitmap was
+ * set up to the end).
+ */
+int64_t nbd_client_co_load_bitmap_part(BlockDriverState *bs, uint64_t offset,
+                                       uint64_t bytes, BdrvDirtyBitmap *bitmap)
+{
+    int64_t ret;
+    uint64_t start_byte;
+    uint32_t nb_extents;
+    int64_t i, start_sector, last_sector, nr_sectors;
+    NBDExtent *extents = NULL;
+
+    ret = nbd_client_co_cmd_block_status(bs, offset, bytes, &extents,
+                                         &nb_extents);
+    if (ret < 0) {
+        return ret;
+    }
+
+    start_byte = offset;
+    for (i = 0; i < nb_extents; ++i) {
+        if (extents[i].flags == 1) {
+            start_sector = start_byte >> BDRV_SECTOR_BITS;
+            last_sector =
+                (start_byte + extents[i].length - 1) >> BDRV_SECTOR_BITS;
+            nr_sectors = last_sector - start_sector + 1;
+
+            bdrv_set_dirty_bitmap(bitmap, start_sector, nr_sectors);
+        }
+
+        start_byte += extents[i].length;
+    }
+
+    g_free(extents);
+
+    return ROUND_UP((uint64_t)start_byte,
+                    (uint64_t)bdrv_dirty_bitmap_granularity(bitmap));
+}
+
+
 void nbd_client_detach_aio_context(BlockDriverState *bs)
 {
     aio_set_fd_handler(bdrv_get_aio_context(bs),
@@ -421,6 +562,7 @@ int nbd_client_init(BlockDriverState *bs,
                     const char *export,
                     QCryptoTLSCreds *tlscreds,
                     const char *hostname,
+                    const char *bitmap_name,
                     Error **errp)
 {
     NBDClientSession *client = nbd_get_client_session(bs);
@@ -435,7 +577,9 @@ int nbd_client_init(BlockDriverState *bs,
                                 tlscreds, hostname,
                                 &client->ioc,
                                 &client->size,
-                                &client->structured_reply, errp);
+                                &client->structured_reply,
+                                bitmap_name,
+                                &client->bitmap_ok, errp);
     if (ret < 0) {
         logout("Failed to negotiate with the NBD server\n");
         return ret;
diff --git a/block/nbd-client.h b/block/nbd-client.h
index cba1f965bf..e5ec89b9f6 100644
--- a/block/nbd-client.h
+++ b/block/nbd-client.h
@@ -34,6 +34,8 @@ typedef struct NBDClientSession {
     bool is_unix;
 
     bool structured_reply;
+    bool bitmap_ok;
+    uint32_t meta_data_context_id;
 } NBDClientSession;
 
 NBDClientSession *nbd_get_client_session(BlockDriverState *bs);
@@ -43,6 +45,7 @@ int nbd_client_init(BlockDriverState *bs,
                     const char *export_name,
                     QCryptoTLSCreds *tlscreds,
                     const char *hostname,
+                    const char *bitmap_name,
                     Error **errp);
 void nbd_client_close(BlockDriverState *bs);
 
@@ -54,9 +57,12 @@ int nbd_client_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
                                 int count, BdrvRequestFlags flags);
 int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,
                          uint64_t bytes, QEMUIOVector *qiov, int flags);
+int64_t nbd_client_co_load_bitmap_part(BlockDriverState *bs, uint64_t offset,
+                                       uint64_t bytes, BdrvDirtyBitmap *bitmap);
 
 void nbd_client_detach_aio_context(BlockDriverState *bs);
 void nbd_client_attach_aio_context(BlockDriverState *bs,
                                    AioContext *new_context);
 
+
 #endif /* NBD_CLIENT_H */
diff --git a/block/nbd.c b/block/nbd.c
index 35f24be069..63bc3f04d0 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -382,6 +382,11 @@ static QemuOptsList nbd_runtime_opts = {
             .type = QEMU_OPT_STRING,
             .help = "ID of the TLS credentials to use",
         },
+        {
+            .name = "bitmap",
+            .type = QEMU_OPT_STRING,
+            .help = "Name of dirty bitmap to export",
+        },
     },
 };
 
@@ -440,8 +445,8 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
     }
 
     /* NBD handshake */
-    ret = nbd_client_init(bs, sioc, s->export,
-                          tlscreds, hostname, errp);
+    ret = nbd_client_init(bs, sioc, s->export, tlscreds, hostname,
+                          qemu_opt_get(opts, "bitmap"), errp);
  error:
     if (sioc) {
         object_unref(OBJECT(sioc));
diff --git a/include/block/nbd.h b/include/block/nbd.h
index 516a24765c..08d5e51f21 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -181,6 +181,8 @@ enum {
 #define NBD_REPLY_TYPE_ERROR ((1 << 15) + 1)
 #define NBD_REPLY_TYPE_ERROR_OFFSET ((1 << 15) + 2)
 
+#define NBD_MAX_BITMAP_EXTENTS (0x100000 / 8) /* 1 mb of extents data */
+
 ssize_t nbd_wr_syncv(QIOChannel *ioc,
                      struct iovec *iov,
                      size_t niov,
@@ -189,7 +191,9 @@ ssize_t nbd_wr_syncv(QIOChannel *ioc,
 int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
                           QCryptoTLSCreds *tlscreds, const char *hostname,
                           QIOChannel **outioc,
-                          off_t *size, bool *structured_reply, Error **errp);
+                          off_t *size, bool *structured_reply,
+                          const char *bitmap_name, bool *bitmap_ok,
+                          Error **errp);
 int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t flags, off_t size);
 ssize_t nbd_send_request(QIOChannel *ioc, NBDRequest *request);
 int nbd_receive_reply(QIOChannel *ioc, NBDReply *reply);
diff --git a/nbd/client.c b/nbd/client.c
index 9225f7e30d..c3817b84fa 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -472,10 +472,101 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
     return QIO_CHANNEL(tioc);
 }
 
+static int nbd_receive_query_meta_context(QIOChannel *ioc, const char *export,
+                                          const char *context, bool *ok,
+                                          Error **errp)
+{
+    int ret;
+    nbd_opt_reply reply;
+    size_t export_len = strlen(export);
+    size_t context_len = strlen(context);
+    size_t data_len = 4 + export_len + 4 + 4 + context_len;
+
+    char *data = g_malloc(data_len);
+    char *p = data;
+    int nb_reps = 0;
+
+    *ok = false;
+    stl_be_p(p, export_len);
+    memcpy(p += 4, export, export_len);
+    stl_be_p(p += export_len, 1);
+    stl_be_p(p += 4, context_len);
+    memcpy(p += 4, context, context_len);
+
+    TRACE("Requesting set_meta_context option from server");
+    ret = nbd_send_option_request(ioc, NBD_OPT_SET_META_CONTEXT, data_len, data,
+                                errp);
+    if (ret < 0) {
+        goto out;
+    }
+
+    while (true) {
+        uint32_t context_id;
+        char *context_name;
+        size_t len;
+
+        ret = nbd_receive_option_reply(ioc, NBD_OPT_SET_META_CONTEXT, &reply,
+                                       errp);
+        if (ret < 0) {
+            goto out;
+        }
+
+        ret = nbd_handle_reply_err(ioc, &reply, errp);
+        if (ret <= 0) {
+            goto out;
+        }
+
+        if (reply.type != NBD_REP_META_CONTEXT) {
+            break;
+        }
+
+        if (read_sync(ioc, &context_id, sizeof(context_id)) !=
+            sizeof(context_id))
+        {
+            ret = -EIO;
+            goto out;
+        }
+
+        be32_to_cpus(&context_id);
+
+        len = reply.length - sizeof(context_id);
+        context_name = g_malloc(len + 1);
+        if (read_sync(ioc, context_name, len) != len) {
+
+            ret = -EIO;
+            goto out;
+        }
+        context_name[len] = '\0';
+
+        TRACE("set meta: %u %s", context_id, context_name);
+
+        nb_reps++;
+    }
+
+    *ok = nb_reps == 1 && reply.type == NBD_REP_ACK;
+
+out:
+    g_free(data);
+    return ret;
+}
+
+static int nbd_receive_query_bitmap(QIOChannel *ioc, const char *export,
+                                    const char *bitmap, bool *ok, Error **errp)
+{
+    char *context = g_strdup_printf("%s:%s", NBD_META_NS_BITMAPS, bitmap);
+    int ret = nbd_receive_query_meta_context(ioc, export, context, ok, errp);
+
+    g_free(context);
+
+    return ret;
+}
+
 int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
                           QCryptoTLSCreds *tlscreds, const char *hostname,
                           QIOChannel **outioc,
-                          off_t *size, bool *structured_reply, Error **errp)
+                          off_t *size, bool *structured_reply,
+                          const char *bitmap_name, bool *bitmap_ok,
+                          Error **errp)
 {
     char buf[256];
     uint64_t magic, s;
@@ -589,6 +680,16 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
                     nbd_receive_simple_option(ioc, NBD_OPT_STRUCTURED_REPLY,
                                               false, NULL) == 0;
             }
+
+            if (!!structured_reply && *structured_reply && !!bitmap_name) {
+                int ret;
+                assert(!!bitmap_ok);
+                ret = nbd_receive_query_bitmap(ioc, name, bitmap_name,
+                                               bitmap_ok, errp) == 0;
+                if (ret < 0) {
+                    goto fail;
+                }
+            }
         }
         /* write the export name request */
         if (nbd_send_option_request(ioc, NBD_OPT_EXPORT_NAME, -1, name,
diff --git a/nbd/server.c b/nbd/server.c
index 0b7b7230df..c96dda4086 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -21,8 +21,6 @@
 #include "qapi/error.h"
 #include "nbd-internal.h"
 
-#define NBD_MAX_BITMAP_EXTENTS (0x100000 / 8) /* 1 mb of extents data */
-
 static int system_errno_to_nbd_errno(int err)
 {
     switch (err) {
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 6b42216960..0e15c73774 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2331,12 +2331,15 @@
 #
 # @tls-creds:   #optional TLS credentials ID
 #
+# @bitmap:   #optional Dirty bitmap name to export (vz-7.4)
+#
 # Since: 2.8
 ##
 { 'struct': 'BlockdevOptionsNbd',
   'data': { 'server': 'SocketAddress',
             '*export': 'str',
-            '*tls-creds': 'str' } }
+            '*tls-creds': 'str',
+            '*bitmap': 'str'} }
 
 ##
 # @BlockdevOptionsRaw:
diff --git a/qemu-nbd.c b/qemu-nbd.c
index de0099e333..cf45444faf 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -272,7 +272,7 @@ static void *nbd_client_thread(void *arg)
 
     ret = nbd_receive_negotiate(QIO_CHANNEL(sioc), NULL, &nbdflags,
                                 NULL, NULL, NULL,
-                                &size, NULL, &local_error);
+                                &size, NULL, NULL, NULL, &local_error);
     if (ret < 0) {
         if (local_error) {
             error_report_err(local_error);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [Qemu-devel] [PATCH 13/18] nbd: add nbd_dirty_bitmap_load
  2017-02-03 15:47 [Qemu-devel] [PATCH 00/18] nbd: BLOCK_STATUS Vladimir Sementsov-Ogievskiy
                   ` (11 preceding siblings ...)
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 12/18] nbd: BLOCK_STATUS for bitmap export: client part Vladimir Sementsov-Ogievskiy
@ 2017-02-03 15:47 ` Vladimir Sementsov-Ogievskiy
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 14/18] qmp: add x-debug-block-dirty-bitmap-sha256 Vladimir Sementsov-Ogievskiy
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-02-03 15:47 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, eblake, den, stefanha

Realize bdrv_dirty_bitmap_load interface.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/nbd.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index 63bc3f04d0..b2b6fd1cf9 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -557,6 +557,35 @@ static void nbd_refresh_filename(BlockDriverState *bs, QDict *options)
     bs->full_open_options = opts;
 }
 
+static BdrvDirtyBitmap *nbd_dirty_bitmap_load(BlockDriverState *bs,
+                                              const char *name, Error **errp)
+{
+    int64_t offset = 0, end;
+    uint32_t ma = -1;
+    BdrvDirtyBitmap *bitmap =
+        bdrv_create_dirty_bitmap(bs, bdrv_get_default_bitmap_granularity(bs),
+                                 name, errp);
+    if (bitmap == NULL) {
+        return NULL;
+    }
+
+    end = bdrv_dirty_bitmap_size(bitmap) << BDRV_SECTOR_BITS;
+
+    while (offset < end) {
+        offset = nbd_client_co_load_bitmap_part(bs, offset,
+                                                MIN(end - offset, ma), bitmap);
+        if (offset < 0) {
+            goto fail;
+        }
+    }
+
+    return bitmap;
+
+fail:
+    bdrv_release_dirty_bitmap(bs, bitmap);
+    return NULL;
+}
+
 static BlockDriver bdrv_nbd = {
     .format_name                = "nbd",
     .protocol_name              = "nbd",
@@ -574,6 +603,7 @@ static BlockDriver bdrv_nbd = {
     .bdrv_detach_aio_context    = nbd_detach_aio_context,
     .bdrv_attach_aio_context    = nbd_attach_aio_context,
     .bdrv_refresh_filename      = nbd_refresh_filename,
+    .bdrv_dirty_bitmap_load     = nbd_dirty_bitmap_load,
 };
 
 static BlockDriver bdrv_nbd_tcp = {
@@ -593,6 +623,7 @@ static BlockDriver bdrv_nbd_tcp = {
     .bdrv_detach_aio_context    = nbd_detach_aio_context,
     .bdrv_attach_aio_context    = nbd_attach_aio_context,
     .bdrv_refresh_filename      = nbd_refresh_filename,
+    .bdrv_dirty_bitmap_load     = nbd_dirty_bitmap_load,
 };
 
 static BlockDriver bdrv_nbd_unix = {
@@ -612,6 +643,7 @@ static BlockDriver bdrv_nbd_unix = {
     .bdrv_detach_aio_context    = nbd_detach_aio_context,
     .bdrv_attach_aio_context    = nbd_attach_aio_context,
     .bdrv_refresh_filename      = nbd_refresh_filename,
+    .bdrv_dirty_bitmap_load     = nbd_dirty_bitmap_load,
 };
 
 static void bdrv_nbd_init(void)
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [Qemu-devel] [PATCH 14/18] qmp: add x-debug-block-dirty-bitmap-sha256
  2017-02-03 15:47 [Qemu-devel] [PATCH 00/18] nbd: BLOCK_STATUS Vladimir Sementsov-Ogievskiy
                   ` (12 preceding siblings ...)
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 13/18] nbd: add nbd_dirty_bitmap_load Vladimir Sementsov-Ogievskiy
@ 2017-02-03 15:47 ` Vladimir Sementsov-Ogievskiy
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 15/18] qmp: add block-dirty-bitmap-load Vladimir Sementsov-Ogievskiy
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-02-03 15:47 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, eblake, den, stefanha

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/dirty-bitmap.c         |  5 +++++
 blockdev.c                   | 29 +++++++++++++++++++++++++++++
 include/block/dirty-bitmap.h |  2 ++
 include/qemu/hbitmap.h       |  8 ++++++++
 qapi/block-core.json         | 27 +++++++++++++++++++++++++++
 tests/Makefile.include       |  2 +-
 util/hbitmap.c               | 11 +++++++++++
 7 files changed, 83 insertions(+), 1 deletion(-)

diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 394d4328d5..a4f77dcf73 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -598,3 +598,8 @@ BdrvDirtyBitmap *bdrv_load_dirty_bitmap(BlockDriverState *bs, const char *name,
 
     return lbco.ret;
 }
+
+char *bdrv_dirty_bitmap_sha256(const BdrvDirtyBitmap *bitmap, Error **errp)
+{
+    return hbitmap_sha256(bitmap->bitmap, errp);
+}
diff --git a/blockdev.c b/blockdev.c
index 245e1e1d17..1bc3fe386a 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2790,6 +2790,35 @@ void qmp_block_dirty_bitmap_clear(const char *node, const char *name,
     aio_context_release(aio_context);
 }
 
+BlockDirtyBitmapSha256 *qmp_x_debug_block_dirty_bitmap_sha256(const char *node,
+                                                              const char *name,
+                                                              Error **errp)
+{
+    AioContext *aio_context;
+    BdrvDirtyBitmap *bitmap;
+    BlockDriverState *bs;
+    BlockDirtyBitmapSha256 *ret = NULL;
+    char *sha256;
+
+    bitmap = block_dirty_bitmap_lookup(node, name, &bs, &aio_context, errp);
+    if (!bitmap || !bs) {
+        return NULL;
+    }
+
+    sha256 = bdrv_dirty_bitmap_sha256(bitmap, errp);
+    if (sha256 == NULL) {
+        goto out;
+    }
+
+    ret = g_new(BlockDirtyBitmapSha256, 1);
+    ret->sha256 = sha256;
+
+out:
+    aio_context_release(aio_context);
+
+    return ret;
+}
+
 void hmp_drive_del(Monitor *mon, const QDict *qdict)
 {
     const char *id = qdict_get_str(qdict, "id");
diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index c0c70a8c67..0efb5591d6 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -80,4 +80,6 @@ BdrvDirtyBitmap *bdrv_dirty_bitmap_next(BlockDriverState *bs,
 BdrvDirtyBitmap *bdrv_load_dirty_bitmap(BlockDriverState *bs, const char *name,
                                         Error **errp);
 
+char *bdrv_dirty_bitmap_sha256(const BdrvDirtyBitmap *bitmap, Error **errp);
+
 #endif
diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
index d6fe553b12..42685b4289 100644
--- a/include/qemu/hbitmap.h
+++ b/include/qemu/hbitmap.h
@@ -225,6 +225,14 @@ void hbitmap_deserialize_zeroes(HBitmap *hb, uint64_t start, uint64_t count,
 void hbitmap_deserialize_finish(HBitmap *hb);
 
 /**
+ * hbitmap_sha256:
+ * @bitmap: HBitmap to operate on.
+ *
+ * Returns SHA256 hash of the last level.
+ */
+char *hbitmap_sha256(const HBitmap *bitmap, Error **errp);
+
+/**
  * hbitmap_free:
  * @hb: HBitmap to operate on.
  *
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 0e15c73774..b258c45595 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1280,6 +1280,33 @@
   'data': 'BlockDirtyBitmap' }
 
 ##
+# @BlockDirtyBitmapSha256:
+#
+# SHA256 hash of dirty bitmap data
+#
+# @sha256: bitmap SHA256 hash
+#
+# Since: 2.9
+##
+  { 'struct': 'BlockDirtyBitmapSha256',
+    'data': {'sha256': 'str'} }
+
+##
+# @x-debug-block-dirty-bitmap-sha256:
+#
+# Get bitmap SHA256
+#
+# Returns: BlockDirtyBitmapSha256 on success
+#          If @node is not a valid block device, DeviceNotFound
+#          If @name is not found or if hashing is failed, GenericError with an
+#          explanation
+#
+# Since: 2.9
+##
+  { 'command': 'x-debug-block-dirty-bitmap-sha256',
+    'data': 'BlockDirtyBitmap', 'returns': 'BlockDirtyBitmapSha256' }
+
+##
 # @blockdev-mirror:
 #
 # Start mirroring a block device's writes to a new destination.
diff --git a/tests/Makefile.include b/tests/Makefile.include
index 4841d582a1..0ee7e30a63 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -497,7 +497,7 @@ tests/test-blockjob$(EXESUF): tests/test-blockjob.o $(test-block-obj-y) $(test-u
 tests/test-blockjob-txn$(EXESUF): tests/test-blockjob-txn.o $(test-block-obj-y) $(test-util-obj-y)
 tests/test-thread-pool$(EXESUF): tests/test-thread-pool.o $(test-block-obj-y)
 tests/test-iov$(EXESUF): tests/test-iov.o $(test-util-obj-y)
-tests/test-hbitmap$(EXESUF): tests/test-hbitmap.o $(test-util-obj-y)
+tests/test-hbitmap$(EXESUF): tests/test-hbitmap.o $(test-util-obj-y) $(test-crypto-obj-y)
 tests/test-x86-cpuid$(EXESUF): tests/test-x86-cpuid.o
 tests/test-xbzrle$(EXESUF): tests/test-xbzrle.o migration/xbzrle.o page_cache.o $(test-util-obj-y)
 tests/test-cutils$(EXESUF): tests/test-cutils.o util/cutils.o
diff --git a/util/hbitmap.c b/util/hbitmap.c
index b850c2baf5..64078d94a1 100644
--- a/util/hbitmap.c
+++ b/util/hbitmap.c
@@ -13,6 +13,7 @@
 #include "qemu/hbitmap.h"
 #include "qemu/host-utils.h"
 #include "trace.h"
+#include "crypto/hash.h"
 
 /* HBitmaps provides an array of bits.  The bits are stored as usual in an
  * array of unsigned longs, but HBitmap is also optimized to provide fast
@@ -699,3 +700,13 @@ void hbitmap_free_meta(HBitmap *hb)
     hbitmap_free(hb->meta);
     hb->meta = NULL;
 }
+
+char *hbitmap_sha256(const HBitmap *bitmap, Error **errp)
+{
+    size_t size = bitmap->sizes[HBITMAP_LEVELS - 1] * sizeof(unsigned long);
+    char *data = (char *)bitmap->levels[HBITMAP_LEVELS - 1];
+    char *hash = NULL;
+    qcrypto_hash_digest(QCRYPTO_HASH_ALG_SHA256, data, size, &hash, errp);
+
+    return hash;
+}
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [Qemu-devel] [PATCH 15/18] qmp: add block-dirty-bitmap-load
  2017-02-03 15:47 [Qemu-devel] [PATCH 00/18] nbd: BLOCK_STATUS Vladimir Sementsov-Ogievskiy
                   ` (13 preceding siblings ...)
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 14/18] qmp: add x-debug-block-dirty-bitmap-sha256 Vladimir Sementsov-Ogievskiy
@ 2017-02-03 15:47 ` Vladimir Sementsov-Ogievskiy
  2017-02-07 12:04   ` Fam Zheng
                     ` (2 more replies)
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 16/18] iotests: add test for nbd dirty bitmap export Vladimir Sementsov-Ogievskiy
                   ` (3 subsequent siblings)
  18 siblings, 3 replies; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-02-03 15:47 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, eblake, den, stefanha

For loading dirty bitmap from nbd server. Or for underlying storages for
other formats.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 blockdev.c           | 28 ++++++++++++++++++++++++++++
 qapi/block-core.json | 14 ++++++++++++++
 2 files changed, 42 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 1bc3fe386a..2529943e7f 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2790,6 +2790,34 @@ void qmp_block_dirty_bitmap_clear(const char *node, const char *name,
     aio_context_release(aio_context);
 }
 
+void qmp_block_dirty_bitmap_load(const char *node, const char *name,
+                                 Error **errp)
+{
+    AioContext *aio_context;
+    BlockDriverState *bs;
+
+    if (!node) {
+        error_setg(errp, "Node cannot be NULL");
+        return;
+    }
+    if (!name) {
+        error_setg(errp, "Bitmap name cannot be NULL");
+        return;
+    }
+    bs = bdrv_lookup_bs(node, node, NULL);
+    if (!bs) {
+        error_setg(errp, "Node '%s' not found", node);
+        return;
+    }
+
+    aio_context = bdrv_get_aio_context(bs);
+    aio_context_acquire(aio_context);
+
+    bdrv_load_dirty_bitmap(bs, name, errp);
+
+    aio_context_release(aio_context);
+}
+
 BlockDirtyBitmapSha256 *qmp_x_debug_block_dirty_bitmap_sha256(const char *node,
                                                               const char *name,
                                                               Error **errp)
diff --git a/qapi/block-core.json b/qapi/block-core.json
index b258c45595..63777ea55b 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1280,6 +1280,20 @@
   'data': 'BlockDirtyBitmap' }
 
 ##
+# @block-dirty-bitmap-load:
+#
+# Load a dirty bitmap from the storage (qcow2 file or nbd export)
+#
+# Returns: nothing on success
+#          If @node is not a valid block device, DeviceNotFound
+#          If @name is not found, GenericError with an explanation
+#
+# Since: vz-7.4
+##
+  { 'command': 'block-dirty-bitmap-load',
+    'data': 'BlockDirtyBitmap' }
+
+##
 # @BlockDirtyBitmapSha256:
 #
 # SHA256 hash of dirty bitmap data
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [Qemu-devel] [PATCH 16/18] iotests: add test for nbd dirty bitmap export
  2017-02-03 15:47 [Qemu-devel] [PATCH 00/18] nbd: BLOCK_STATUS Vladimir Sementsov-Ogievskiy
                   ` (14 preceding siblings ...)
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 15/18] qmp: add block-dirty-bitmap-load Vladimir Sementsov-Ogievskiy
@ 2017-02-03 15:47 ` Vladimir Sementsov-Ogievskiy
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 17/18] nbd: BLOCK_STATUS for standard get_block_status function: server part Vladimir Sementsov-Ogievskiy
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-02-03 15:47 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, eblake, den, stefanha

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/180     | 133 +++++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/180.out |   5 ++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 139 insertions(+)
 create mode 100755 tests/qemu-iotests/180
 create mode 100644 tests/qemu-iotests/180.out

diff --git a/tests/qemu-iotests/180 b/tests/qemu-iotests/180
new file mode 100755
index 0000000000..e8238a064a
--- /dev/null
+++ b/tests/qemu-iotests/180
@@ -0,0 +1,133 @@
+#!/usr/bin/env python
+#
+# Test case for NBD's bitmap export
+# Copyright (C) 2017 Virtuozzo.
+#
+# derived from io test 147, original copyright:
+# Copyright (C) 2016 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import os
+import socket
+import stat
+import time
+import iotests
+from iotests import cachemode, imgfmt, qemu_img, qemu_nbd
+
+NBD_PORT = 10811
+
+test_img = os.path.join(iotests.test_dir, 'test.img')
+unix_socket = os.path.join(iotests.test_dir, 'nbd.socket')
+
+class NBDBlockdevAddBase(iotests.QMPTestCase):
+    def blockdev_add_options(self, address, export=None):
+        options = { 'node-name': 'nbd-blockdev',
+                    'driver': 'raw',
+                    'file': {
+                        'driver': 'nbd',
+                        'server': address,
+                        'bitmap': 'mega'
+                    } }
+        if export is not None:
+            options['file']['export'] = export
+        return options
+
+    def client_test(self, filename, address, sha256, export=None):
+        bao = self.blockdev_add_options(address, export)
+        result = self.vm.qmp('blockdev-add', **bao)
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('query-named-block-nodes')
+        for node in result['return']:
+            if node['node-name'] == 'nbd-blockdev':
+                if isinstance(filename, str):
+                    self.assert_qmp(node, 'image/filename', filename)
+                else:
+                    self.assert_json_filename_equal(node['image']['filename'],
+                                                    filename)
+                break
+
+        result = self.vm.qmp('block-dirty-bitmap-load',
+                             node='nbd-blockdev', name='mega')
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('x-debug-block-dirty-bitmap-sha256',
+                               node='nbd-blockdev', name='mega')
+        self.assert_qmp(result, 'return/sha256', sha256);
+
+    def setUp(self):
+        qemu_img('create', '-f', iotests.imgfmt, test_img, '0x400000000')
+        self.vm = iotests.VM()
+        self.vm.launch()
+
+        self.server = iotests.VM('.server')
+        self.server.add_drive_raw('if=none,id=nbd-export,' +
+                                  'file=%s,' % test_img +
+                                  'format=%s,' % imgfmt +
+                                  'cache=%s' % cachemode)
+        self.server.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        self.server.shutdown()
+        os.remove(test_img)
+
+    def _server_up(self, address):
+        result = self.server.qmp('nbd-server-start', addr=address)
+        self.assert_qmp(result, 'return', {})
+
+        result = self.server.qmp('nbd-server-add', device='nbd-export')
+        self.assert_qmp(result, 'return', {})
+
+    def _server_down(self):
+        result = self.server.qmp('nbd-server-stop')
+        self.assert_qmp(result, 'return', {})
+
+    def test_export_bitmap(self):
+        address = { 'type': 'inet',
+                    'data': {
+                        'host': 'localhost',
+                        'port': str(NBD_PORT)
+                    } }
+
+        granularity = 65536
+        regions = [
+            { 'start': 0,           'count': 0x100000 },
+            { 'start': 0x100000000, 'count': 0x200000  },
+            { 'start': 0x399900000, 'count': 0x100000  }
+            ]
+
+        result = self.server.qmp('block-dirty-bitmap-add', node='nbd-export',
+                                 name='mega', granularity=granularity)
+        self.assert_qmp(result, 'return', {});
+
+        for r in regions:
+            self.server.hmp_qemu_io('nbd-export',
+                                    'write %d %d' % (r['start'], r['count']))
+
+        result = self.server.qmp('x-debug-block-dirty-bitmap-sha256',
+                               node='nbd-export', name='mega')
+        sha256 = result['return']['sha256']
+
+        self._server_up(address)
+        self.client_test('nbd://localhost:%i/nbd-export' % NBD_PORT,
+                         address, sha256, 'nbd-export')
+        #self._server_down()
+
+if __name__ == '__main__':
+    # Need to support image creation
+    iotests.main(supported_fmts=['vpc', 'parallels', 'qcow', 'vdi', 'qcow2',
+                                 'vmdk', 'raw', 'vhdx', 'qed'])
diff --git a/tests/qemu-iotests/180.out b/tests/qemu-iotests/180.out
new file mode 100644
index 0000000000..ae1213e6f8
--- /dev/null
+++ b/tests/qemu-iotests/180.out
@@ -0,0 +1,5 @@
+.
+----------------------------------------------------------------------
+Ran 1 tests
+
+OK
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 866c1a032d..9d06d3f862 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -165,3 +165,4 @@
 170 rw auto quick
 171 rw auto quick
 172 auto
+180 auto
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [Qemu-devel] [PATCH 17/18] nbd: BLOCK_STATUS for standard get_block_status function: server part
  2017-02-03 15:47 [Qemu-devel] [PATCH 00/18] nbd: BLOCK_STATUS Vladimir Sementsov-Ogievskiy
                   ` (15 preceding siblings ...)
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 16/18] iotests: add test for nbd dirty bitmap export Vladimir Sementsov-Ogievskiy
@ 2017-02-03 15:47 ` Vladimir Sementsov-Ogievskiy
  2017-02-09 15:38   ` Eric Blake
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 18/18] nbd: BLOCK_STATUS for standard get_block_status function: client part Vladimir Sementsov-Ogievskiy
  2017-02-15 17:05 ` [Qemu-devel] [PATCH 00/18] nbd: BLOCK_STATUS Paolo Bonzini
  18 siblings, 1 reply; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-02-03 15:47 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, eblake, den, stefanha

Minimal realization: only one extent in server answer is supported.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/nbd.h |  3 ++
 nbd/nbd-internal.h  |  1 +
 nbd/server.c        | 80 ++++++++++++++++++++++++++++++++++++++++++++++++-----
 3 files changed, 77 insertions(+), 7 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 08d5e51f21..69aee1eda1 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -181,6 +181,9 @@ enum {
 #define NBD_REPLY_TYPE_ERROR ((1 << 15) + 1)
 #define NBD_REPLY_TYPE_ERROR_OFFSET ((1 << 15) + 2)
 
+#define NBD_STATE_HOLE 1
+#define NBD_STATE_ZERO (1 << 1)
+
 #define NBD_MAX_BITMAP_EXTENTS (0x100000 / 8) /* 1 mb of extents data */
 
 ssize_t nbd_wr_syncv(QIOChannel *ioc,
diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index fbbcf69925..f89baa6049 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -86,6 +86,7 @@
 #define NBD_OPT_LIST_META_CONTEXT (9)
 #define NBD_OPT_SET_META_CONTEXT  (10)
 
+#define NBD_META_NS_BASE "base"
 #define NBD_META_NS_BITMAPS "qemu-dirty-bitmap"
 
 /* NBD errors are based on errno numbers, so there is a 1:1 mapping,
diff --git a/nbd/server.c b/nbd/server.c
index c96dda4086..cb87481382 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -103,6 +103,7 @@ struct NBDClient {
 
     bool structured_reply;
     BdrvDirtyBitmap *export_bitmap;
+    bool export_block_status;
 };
 
 /* That's all folks */
@@ -506,8 +507,24 @@ static int nbd_negotiate_one_bitmap_query(QIOChannel *ioc, BlockDriverState *bs,
     return 0;
 }
 
+static int nbd_negotiate_one_base_query(QIOChannel *ioc, BlockDriverState *bs,
+                                        uint32_t opt, const char *query,
+                                        bool *block_status)
+{
+    if (query[0] == '\0' || strcmp(query, "allocation") == 0) {
+        if (block_status != NULL) {
+            *block_status = true;
+        }
+
+        return nbd_negotiate_send_meta_context(ioc, "base:allocation", opt);
+    }
+
+    return 0;
+}
+
 static int nbd_negotiate_one_meta_query(QIOChannel *ioc, BlockDriverState *bs,
-                                        uint32_t opt, BdrvDirtyBitmap **bitmap)
+                                        uint32_t opt, BdrvDirtyBitmap **bitmap,
+                                        bool *block_status)
 {
     int ret = 0, nb_read;
     char *query, *colon, *namespace, *subquery;
@@ -530,6 +547,9 @@ static int nbd_negotiate_one_meta_query(QIOChannel *ioc, BlockDriverState *bs,
 
     if (strcmp(namespace, NBD_META_NS_BITMAPS) == 0) {
         ret = nbd_negotiate_one_bitmap_query(ioc, bs, opt, subquery, bitmap);
+    } else if (strcmp(namespace, NBD_META_NS_BASE) == 0) {
+        ret = nbd_negotiate_one_base_query(ioc, bs, opt, subquery,
+                                           block_status);
     }
 
 out:
@@ -663,7 +683,8 @@ static int nbd_negotiate_list_meta_context(NBDClient *client, uint32_t length)
 
     for (i = 0; i < nb_queries; ++i) {
         ret = nbd_negotiate_one_meta_query(client->ioc, bs,
-                                           NBD_OPT_LIST_META_CONTEXT, NULL);
+                                           NBD_OPT_LIST_META_CONTEXT, NULL,
+                                           NULL);
         if (ret < 0) {
             return ret;
         }
@@ -712,7 +733,8 @@ static int nbd_negotiate_set_meta_context(NBDClient *client, uint32_t length)
 
     ret = nbd_negotiate_one_meta_query(client->ioc, bs,
                                        NBD_OPT_SET_META_CONTEXT,
-                                       &client->export_bitmap);
+                                       &client->export_bitmap,
+                                       &client->export_block_status);
     if (ret < 0) {
         return ret;
     }
@@ -1497,6 +1519,30 @@ static unsigned add_extents(NBDExtent *extents, unsigned nb_extents,
     return i;
 }
 
+static int blockstatus_to_extent(BlockDriverState *bs, uint64_t offset,
+                                  uint64_t length, NBDExtent *extent)
+{
+    BlockDriverState *file;
+    uint64_t start_sector = offset >> BDRV_SECTOR_BITS;
+    uint64_t last_sector = (offset + length - 1) >> BDRV_SECTOR_BITS;
+    uint64_t begin = start_sector;
+    uint64_t end = last_sector + 1;
+
+    int nb = MIN(INT_MAX, end - begin);
+    int64_t ret = bdrv_get_block_status_above(bs, NULL, begin, nb, &nb, &file);
+    if (ret < 0) {
+        return ret;
+    }
+
+    extent->flags =
+        cpu_to_be32((ret & BDRV_BLOCK_ALLOCATED ? 0 : NBD_STATE_HOLE) |
+                    (ret & BDRV_BLOCK_ZERO      ? NBD_STATE_ZERO : 0));
+    extent->length = cpu_to_be32((nb << BDRV_SECTOR_BITS) -
+                                 (offset - (start_sector << BDRV_SECTOR_BITS)));
+
+    return 0;
+}
+
 static unsigned bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset,
                                   uint64_t length, NBDExtent *extents,
                                   unsigned nb_extents)
@@ -1589,6 +1635,21 @@ static int nbd_co_send_bitmap(NBDClient *client, uint64_t handle,
     return ret;
 }
 
+static int nbd_co_send_block_status(NBDClient *client, uint64_t handle,
+                                    BlockDriverState *bs, uint64_t offset,
+                                    uint64_t length, uint32_t context_id)
+{
+    int ret;
+    NBDExtent extent;
+
+    ret = blockstatus_to_extent(bs, offset, length, &extent);
+    if (ret < 0) {
+        return nbd_co_send_structured_error(client, handle, -ret);
+    }
+
+    return nbd_co_send_extents(client, handle, &extent, 1, context_id);
+}
+
 /* Collect a client request.  Return 0 if request looks valid, -EAGAIN
  * to keep trying the collection, -EIO to drop connection right away,
  * and any other negative value to report an error to the client
@@ -1869,13 +1930,18 @@ static void nbd_trip(void *opaque)
         break;
     case NBD_CMD_BLOCK_STATUS:
         TRACE("Request type is BLOCK_STATUS");
-        if (client->export_bitmap == NULL) {
+        if (!!client->export_bitmap) {
+            ret = nbd_co_send_bitmap(req->client, request.handle,
+                                     client->export_bitmap, request.from,
+                                     request.len, 0);
+        } else if (client->export_block_status) {
+            ret = nbd_co_send_block_status(req->client, request.handle,
+                                           blk_bs(exp->blk), request.from,
+                                           request.len, 0);
+        } else {
             reply.error = EINVAL;
             goto error_reply;
         }
-        ret = nbd_co_send_bitmap(req->client, request.handle,
-                                 client->export_bitmap, request.from,
-                                 request.len, 0);
         if (ret < 0) {
             goto out;
         }
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [Qemu-devel] [PATCH 18/18] nbd: BLOCK_STATUS for standard get_block_status function: client part
  2017-02-03 15:47 [Qemu-devel] [PATCH 00/18] nbd: BLOCK_STATUS Vladimir Sementsov-Ogievskiy
                   ` (16 preceding siblings ...)
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 17/18] nbd: BLOCK_STATUS for standard get_block_status function: server part Vladimir Sementsov-Ogievskiy
@ 2017-02-03 15:47 ` Vladimir Sementsov-Ogievskiy
  2017-02-09 16:00   ` Eric Blake
  2017-02-15 17:05 ` [Qemu-devel] [PATCH 00/18] nbd: BLOCK_STATUS Paolo Bonzini
  18 siblings, 1 reply; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-02-03 15:47 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, eblake, den, stefanha

Minimal realization: only first extent from the answer is used.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/nbd-client.c  | 41 ++++++++++++++++++++++++++++++++++++++++-
 block/nbd-client.h  |  5 +++++
 block/nbd.c         |  3 +++
 include/block/nbd.h |  2 +-
 nbd/client.c        | 23 ++++++++++++++++-------
 qemu-nbd.c          |  2 +-
 6 files changed, 66 insertions(+), 10 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index c7eb21fb02..e419c1497c 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -528,6 +528,44 @@ int64_t nbd_client_co_load_bitmap_part(BlockDriverState *bs, uint64_t offset,
                     (uint64_t)bdrv_dirty_bitmap_granularity(bitmap));
 }
 
+int64_t coroutine_fn nbd_client_co_get_block_status(BlockDriverState *bs,
+                                                    int64_t sector_num,
+                                                    int nb_sectors, int *pnum,
+                                                    BlockDriverState **file)
+{
+    int64_t ret;
+    uint32_t nb_extents;
+    NBDExtent *extents;
+    NBDClientSession *client = nbd_get_client_session(bs);
+
+    if (!client->block_status_ok) {
+        *pnum = nb_sectors;
+        ret = BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED;
+        if (bs->drv->protocol_name) {
+            ret |= BDRV_BLOCK_OFFSET_VALID | (sector_num * BDRV_SECTOR_SIZE);
+        }
+        return ret;
+    }
+
+    ret = nbd_client_co_cmd_block_status(bs, sector_num << BDRV_SECTOR_BITS,
+                                         nb_sectors << BDRV_SECTOR_BITS,
+                                         &extents, &nb_extents);
+    if (ret < 0) {
+        return ret;
+    }
+
+    *pnum = extents[0].length >> BDRV_SECTOR_BITS;
+    ret = (extents[0].flags & NBD_STATE_HOLE ? 0 : BDRV_BLOCK_ALLOCATED) |
+          (extents[0].flags & NBD_STATE_ZERO ? BDRV_BLOCK_ZERO : 0);
+
+    if ((ret & BDRV_BLOCK_ALLOCATED) && !(ret & BDRV_BLOCK_ZERO)) {
+        ret |= BDRV_BLOCK_DATA;
+    }
+
+    g_free(extents);
+
+    return ret;
+}
 
 void nbd_client_detach_aio_context(BlockDriverState *bs)
 {
@@ -579,7 +617,8 @@ int nbd_client_init(BlockDriverState *bs,
                                 &client->size,
                                 &client->structured_reply,
                                 bitmap_name,
-                                &client->bitmap_ok, errp);
+                                &client->bitmap_ok,
+                                &client->block_status_ok, errp);
     if (ret < 0) {
         logout("Failed to negotiate with the NBD server\n");
         return ret;
diff --git a/block/nbd-client.h b/block/nbd-client.h
index e5ec89b9f6..9848732628 100644
--- a/block/nbd-client.h
+++ b/block/nbd-client.h
@@ -34,6 +34,7 @@ typedef struct NBDClientSession {
     bool is_unix;
 
     bool structured_reply;
+    bool block_status_ok;
     bool bitmap_ok;
     uint32_t meta_data_context_id;
 } NBDClientSession;
@@ -59,6 +60,10 @@ int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,
                          uint64_t bytes, QEMUIOVector *qiov, int flags);
 int64_t nbd_client_co_load_bitmap_part(BlockDriverState *bs, uint64_t offset,
                                        uint64_t bytes, BdrvDirtyBitmap *bitmap);
+int64_t coroutine_fn nbd_client_co_get_block_status(BlockDriverState *bs,
+                                                    int64_t sector_num,
+                                                    int nb_sectors, int *pnum,
+                                                    BlockDriverState **file);
 
 void nbd_client_detach_aio_context(BlockDriverState *bs);
 void nbd_client_attach_aio_context(BlockDriverState *bs,
diff --git a/block/nbd.c b/block/nbd.c
index b2b6fd1cf9..b3a28f0746 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -604,6 +604,7 @@ static BlockDriver bdrv_nbd = {
     .bdrv_attach_aio_context    = nbd_attach_aio_context,
     .bdrv_refresh_filename      = nbd_refresh_filename,
     .bdrv_dirty_bitmap_load     = nbd_dirty_bitmap_load,
+    .bdrv_co_get_block_status   = nbd_client_co_get_block_status,
 };
 
 static BlockDriver bdrv_nbd_tcp = {
@@ -624,6 +625,7 @@ static BlockDriver bdrv_nbd_tcp = {
     .bdrv_attach_aio_context    = nbd_attach_aio_context,
     .bdrv_refresh_filename      = nbd_refresh_filename,
     .bdrv_dirty_bitmap_load     = nbd_dirty_bitmap_load,
+    .bdrv_co_get_block_status   = nbd_client_co_get_block_status,
 };
 
 static BlockDriver bdrv_nbd_unix = {
@@ -644,6 +646,7 @@ static BlockDriver bdrv_nbd_unix = {
     .bdrv_attach_aio_context    = nbd_attach_aio_context,
     .bdrv_refresh_filename      = nbd_refresh_filename,
     .bdrv_dirty_bitmap_load     = nbd_dirty_bitmap_load,
+    .bdrv_co_get_block_status   = nbd_client_co_get_block_status,
 };
 
 static void bdrv_nbd_init(void)
diff --git a/include/block/nbd.h b/include/block/nbd.h
index 69aee1eda1..58c1a3866b 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -196,7 +196,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
                           QIOChannel **outioc,
                           off_t *size, bool *structured_reply,
                           const char *bitmap_name, bool *bitmap_ok,
-                          Error **errp);
+                          bool *block_status_ok, Error **errp);
 int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t flags, off_t size);
 ssize_t nbd_send_request(QIOChannel *ioc, NBDRequest *request);
 int nbd_receive_reply(QIOChannel *ioc, NBDReply *reply);
diff --git a/nbd/client.c b/nbd/client.c
index c3817b84fa..1b478d112c 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -563,10 +563,10 @@ static int nbd_receive_query_bitmap(QIOChannel *ioc, const char *export,
 
 int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
                           QCryptoTLSCreds *tlscreds, const char *hostname,
-                          QIOChannel **outioc,
-                          off_t *size, bool *structured_reply,
+                          QIOChannel **outioc, off_t *size,
+                          bool *structured_reply,
                           const char *bitmap_name, bool *bitmap_ok,
-                          Error **errp)
+                          bool *block_status_ok, Error **errp)
 {
     char buf[256];
     uint64_t magic, s;
@@ -681,11 +681,19 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
                                               false, NULL) == 0;
             }
 
-            if (!!structured_reply && *structured_reply && !!bitmap_name) {
+            if (!!structured_reply && *structured_reply) {
                 int ret;
-                assert(!!bitmap_ok);
-                ret = nbd_receive_query_bitmap(ioc, name, bitmap_name,
-                                               bitmap_ok, errp) == 0;
+
+                if (!!bitmap_name) {
+                    assert(!!bitmap_ok);
+                    ret = nbd_receive_query_bitmap(ioc, name, bitmap_name,
+                                                   bitmap_ok, errp) == 0;
+                } else {
+                    ret = nbd_receive_query_meta_context(ioc, name,
+                                                         "base:allocation",
+                                                         block_status_ok,
+                                                         errp);
+                }
                 if (ret < 0) {
                     goto fail;
                 }
@@ -969,6 +977,7 @@ static int nbd_receive_structured_reply_chunk(QIOChannel *ioc, NBDReply *reply)
 
     switch (reply->type) {
     case NBD_REPLY_TYPE_NONE:
+    case NBD_REPLY_TYPE_BLOCK_STATUS:
         break;
     case NBD_REPLY_TYPE_OFFSET_DATA:
     case NBD_REPLY_TYPE_OFFSET_HOLE:
diff --git a/qemu-nbd.c b/qemu-nbd.c
index cf45444faf..e3a4733e60 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -272,7 +272,7 @@ static void *nbd_client_thread(void *arg)
 
     ret = nbd_receive_negotiate(QIO_CHANNEL(sioc), NULL, &nbdflags,
                                 NULL, NULL, NULL,
-                                &size, NULL, NULL, NULL, &local_error);
+                                &size, NULL, NULL, NULL, NULL, &local_error);
     if (ret < 0) {
         if (local_error) {
             error_report_err(local_error);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 01/18] nbd: rename NBD_REPLY_MAGIC to NBD_SIMPLE_REPLY_MAGIC
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 01/18] nbd: rename NBD_REPLY_MAGIC to NBD_SIMPLE_REPLY_MAGIC Vladimir Sementsov-Ogievskiy
@ 2017-02-06 19:54   ` Eric Blake
  0 siblings, 0 replies; 58+ messages in thread
From: Eric Blake @ 2017-02-06 19:54 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, den, stefanha

[-- Attachment #1: Type: text/plain, Size: 675 bytes --]

On 02/03/2017 09:47 AM, Vladimir Sementsov-Ogievskiy wrote:
> To be consistent when NBD_STRUCTURED_REPLY_MAGIC will be introduced.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  nbd/client.c                             | 4 ++--
>  nbd/nbd-internal.h                       | 2 +-
>  nbd/server.c                             | 4 ++--
>  tests/qemu-iotests/nbd-fault-injector.py | 4 ++--
>  4 files changed, 7 insertions(+), 7 deletions(-)

Mechanical, and makes sense.

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 02/18] nbd-server: refactor simple reply sending
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 02/18] nbd-server: refactor simple reply sending Vladimir Sementsov-Ogievskiy
@ 2017-02-06 21:09   ` Eric Blake
  0 siblings, 0 replies; 58+ messages in thread
From: Eric Blake @ 2017-02-06 21:09 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, den, stefanha

[-- Attachment #1: Type: text/plain, Size: 705 bytes --]

On 02/03/2017 09:47 AM, Vladimir Sementsov-Ogievskiy wrote:
> Rename functions appropriately and also make a separate copy of NBDReply
> - NBDSimpleReply, to replace NBDReply for the server. NBDReply itself
> will be upgraded in future patches to handle both simple and structured
> replies in the client.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  include/block/nbd.h |  7 +++++++
>  nbd/server.c        | 25 +++++++++++++------------
>  2 files changed, 20 insertions(+), 12 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 03/18] nbd: Minimal structured read for server
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 03/18] nbd: Minimal structured read for server Vladimir Sementsov-Ogievskiy
@ 2017-02-06 23:01   ` Eric Blake
  2017-02-07 17:44     ` Paolo Bonzini
  2017-05-04 10:58     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 2 replies; 58+ messages in thread
From: Eric Blake @ 2017-02-06 23:01 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, den, stefanha

[-- Attachment #1: Type: text/plain, Size: 6868 bytes --]

On 02/03/2017 09:47 AM, Vladimir Sementsov-Ogievskiy wrote:
> Minimal implementation of structured read: one data chunk + finishing
> none chunk. No segmentation.
> Minimal structured error implementation: no text message.
> Support DF flag, but just ignore it, as there is no segmentation any
> way.

Might be worth adding that this is still an experimental extension to
the NBD spec, and therefore that this implementation serves as proof of
concept and may still need tweaking if anything major turns up before
promoting it to stable.  It might also be worth a link to:

https://github.com/NetworkBlockDevice/nbd/blob/extension-structured-reply/doc/proto.md

> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  include/block/nbd.h |  31 +++++++++++++
>  nbd/nbd-internal.h  |   2 +
>  nbd/server.c        | 125 ++++++++++++++++++++++++++++++++++++++++++++++++++--
>  3 files changed, 154 insertions(+), 4 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 3c65cf8d87..58b864f145 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -70,6 +70,25 @@ struct NBDSimpleReply {
>  };
>  typedef struct NBDSimpleReply NBDSimpleReply;
>  
> +typedef struct NBDStructuredReplyChunk {
> +    uint32_t magic;
> +    uint16_t flags;
> +    uint16_t type;
> +    uint64_t handle;
> +    uint32_t length;
> +} QEMU_PACKED NBDStructuredReplyChunk;
> +
> +typedef struct NBDStructuredRead {
> +    NBDStructuredReplyChunk h;
> +    uint64_t offset;
> +} QEMU_PACKED NBDStructuredRead;
> +
> +typedef struct NBDStructuredError {
> +    NBDStructuredReplyChunk h;
> +    uint32_t error;
> +    uint16_t message_length;
> +} QEMU_PACKED NBDStructuredError;

Definitely a subset of all types added in the NBD protocol extension,
but reasonable for a minimal implementation.  Might be worth adding
comments to the types...

>  
> +/* Structured reply flags */
> +#define NBD_REPLY_FLAG_DONE 1
> +
> +/* Structured reply types */
> +#define NBD_REPLY_TYPE_NONE 0
> +#define NBD_REPLY_TYPE_OFFSET_DATA 1
> +#define NBD_REPLY_TYPE_OFFSET_HOLE 2
> +#define NBD_REPLY_TYPE_ERROR ((1 << 15) + 1)
> +#define NBD_REPLY_TYPE_ERROR_OFFSET ((1 << 15) + 2)

...that correspond to these constants that will be used in the [h.]type
field.

Also, it's a bit odd that you are defining constants that aren't
implemented here; I don't know if it is any cleaner to save the
definition for the unimplemented types until you actually implement them
(NBD_REPLY_TYPE_OFFSET_HOLE, NBD_REPLY_TYPE_ERROR_OFFSET).

> +++ b/nbd/nbd-internal.h
> @@ -60,6 +60,7 @@
>  #define NBD_REPLY_SIZE          (4 + 4 + 8)
>  #define NBD_REQUEST_MAGIC       0x25609513
>  #define NBD_SIMPLE_REPLY_MAGIC  0x67446698
> +#define NBD_STRUCTURED_REPLY_MAGIC 0x668e33ef
>  #define NBD_OPTS_MAGIC          0x49484156454F5054LL
>  #define NBD_CLIENT_MAGIC        0x0000420281861253LL
>  #define NBD_REP_MAGIC           0x0003e889045565a9LL

I would not be bothered if you wanted to reindent the other lines by 3
spaces so that all the macro definitions start on the same column.  But
I also won't require it.

> @@ -81,6 +82,7 @@
>  #define NBD_OPT_LIST            (3)
>  #define NBD_OPT_PEEK_EXPORT     (4)
>  #define NBD_OPT_STARTTLS        (5)
> +#define NBD_OPT_STRUCTURED_REPLY (8)

Similar comments about consistency in the definition column.

> +++ b/nbd/server.c
> @@ -100,6 +100,8 @@ struct NBDClient {
>      QTAILQ_ENTRY(NBDClient) next;
>      int nb_requests;
>      bool closing;
> +
> +    bool structured_reply;
>  };
>  
>  /* That's all folks */
> @@ -573,6 +575,16 @@ static int nbd_negotiate_options(NBDClient *client)
>                      return ret;
>                  }
>                  break;
> +
> +            case NBD_OPT_STRUCTURED_REPLY:
> +                client->structured_reply = true;
> +                ret = nbd_negotiate_send_rep(client->ioc, NBD_REP_ACK,
> +                                             clientflags);
> +                if (ret < 0) {
> +                    return ret;
> +                }
> +                break;
> +

As written, you allow the client to negotiate this more than once.  On
the one hand, we are idempotent, so it doesn't hurt if they do so; on
the other hand, it is a waste of bandwidth, and a client could abuse it
by sending an infinite stream of NBD_OPT_STRUCTURED_REPLY requests and
never moving into transmission phase, which is a mild form of
Denial-of-Service (they're hogging a socket from accomplishing useful
work for some other client).  It would be acceptable if we wanted to
disconnect any client that sends this option more than once, although
the NBD spec does not require us to do so.  Up to you if you think
that's worth adding.

>  
> +static void set_be_chunk(NBDStructuredReplyChunk *chunk, uint16_t flags,
> +                         uint16_t type, uint64_t handle, uint32_t length)

I'm not sure I like the name of this helper. I know what you are doing:
go from native-endian local variables into network-byte-order storage in
preparation for transmission, done at the last possible moment.  But I
also don't know if I have a good suggestion for a better name off hand.

> +{
> +    stl_be_p(&chunk->magic, NBD_STRUCTURED_REPLY_MAGIC);
> +    stw_be_p(&chunk->flags, flags);
> +    stw_be_p(&chunk->type, type);
> +    stq_be_p(&chunk->handle, handle);
> +    stl_be_p(&chunk->length, length);
> +}
> +
> +static int nbd_co_send_iov(NBDClient *client, struct iovec *iov, unsigned niov)

Probably should add the coroutine_fn annotation to this function and its
friends (yeah, the NBD code doesn't consistently use it yet, but it should).

> @@ -1147,7 +1239,8 @@ static ssize_t nbd_co_receive_request(NBDRequestData *req,
>          rc = request->type == NBD_CMD_WRITE ? -ENOSPC : -EINVAL;
>          goto out;
>      }
> -    if (request->flags & ~(NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_NO_HOLE)) {
> +    if (request->flags & ~(NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_NO_HOLE |
> +                           NBD_CMD_FLAG_DF)) {
>          LOG("unsupported flags (got 0x%x)", request->flags);
>          rc = -EINVAL;
>          goto out;

Missing a check that NBD_CMD_FLAG_DF is only set for NBD_CMD_READ (it is
not valid on any other command, at least in the current version of the
extension specification).

> @@ -1444,6 +1559,8 @@ void nbd_client_new(NBDExport *exp,
>      client->can_read = true;
>      client->close = close_fn;
>  
> +    client->structured_reply = false;

Dead assignment, since we used 'client = g_malloc0()' above.

Overall looks like it matches the spec.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 05/18] nbd/client: fix drop_sync
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 05/18] nbd/client: fix drop_sync Vladimir Sementsov-Ogievskiy
@ 2017-02-06 23:17   ` Eric Blake
  2017-02-15 14:50     ` Eric Blake
  0 siblings, 1 reply; 58+ messages in thread
From: Eric Blake @ 2017-02-06 23:17 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, den, stefanha, qemu-stable

[-- Attachment #1: Type: text/plain, Size: 1070 bytes --]

On 02/03/2017 09:47 AM, Vladimir Sementsov-Ogievskiy wrote:
> Comparison symbol is misused. It may lead to memory corruption.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  nbd/client.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Adding qemu-stable; this needs to be back-ported, and can be applied
independently from your series.

Reviewed-by: Eric Blake <eblake@redhat.com>

> 
> diff --git a/nbd/client.c b/nbd/client.c
> index 6caf6bda6d..351731bc63 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -94,7 +94,7 @@ static ssize_t drop_sync(QIOChannel *ioc, size_t size)
>      char small[1024];
>      char *buffer;
>  
> -    buffer = sizeof(small) < size ? small : g_malloc(MIN(65536, size));
> +    buffer = sizeof(small) > size ? small : g_malloc(MIN(65536, size));
>      while (size > 0) {
>          ssize_t count = read_sync(ioc, buffer, MIN(65536, size));
>  
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 06/18] nbd/client: refactor drop_sync
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 06/18] nbd/client: refactor drop_sync Vladimir Sementsov-Ogievskiy
@ 2017-02-06 23:19   ` Eric Blake
  2017-02-08  7:55     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 58+ messages in thread
From: Eric Blake @ 2017-02-06 23:19 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, den, stefanha

[-- Attachment #1: Type: text/plain, Size: 614 bytes --]

On 02/03/2017 09:47 AM, Vladimir Sementsov-Ogievskiy wrote:
> Return 0 on success to simplify success checking.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  nbd/client.c | 35 +++++++++++++++++++----------------
>  1 file changed, 19 insertions(+), 16 deletions(-)

I'm not sure that this simplifies anything.  You have a net addition in
lines of code, so unless some later patch is improved because of this,
I'm inclined to say this is needless churn.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 15/18] qmp: add block-dirty-bitmap-load
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 15/18] qmp: add block-dirty-bitmap-load Vladimir Sementsov-Ogievskiy
@ 2017-02-07 12:04   ` Fam Zheng
  2017-02-09 15:18   ` Eric Blake
  2017-02-15 16:57   ` Paolo Bonzini
  2 siblings, 0 replies; 58+ messages in thread
From: Fam Zheng @ 2017-02-07 12:04 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu-block, qemu-devel, jsnow, kwolf, mreitz, pbonzini, armbru,
	eblake, den, stefanha

On Fri, 02/03 18:47, Vladimir Sementsov-Ogievskiy wrote:
>  ##
> +# @block-dirty-bitmap-load:
> +#
> +# Load a dirty bitmap from the storage (qcow2 file or nbd export)
> +#
> +# Returns: nothing on success
> +#          If @node is not a valid block device, DeviceNotFound
> +#          If @name is not found, GenericError with an explanation
> +#
> +# Since: vz-7.4

Version number doesn't look very familiar :)

Fam

> +##
> +  { 'command': 'block-dirty-bitmap-load',
> +    'data': 'BlockDirtyBitmap' }
> +
> +##
>  # @BlockDirtyBitmapSha256:
>  #
>  # SHA256 hash of dirty bitmap data
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 04/18] nbd/client: refactor nbd_receive_starttls
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 04/18] nbd/client: refactor nbd_receive_starttls Vladimir Sementsov-Ogievskiy
@ 2017-02-07 16:32   ` Eric Blake
  2017-02-09  6:20     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 58+ messages in thread
From: Eric Blake @ 2017-02-07 16:32 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, den, stefanha

[-- Attachment #1: Type: text/plain, Size: 1973 bytes --]

On 02/03/2017 09:47 AM, Vladimir Sementsov-Ogievskiy wrote:
> Split out nbd_receive_simple_option to be reused for structured reply
> option.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  nbd/client.c       | 54 +++++++++++++++++++++++++++++++++++-------------------
>  nbd/nbd-internal.h | 14 ++++++++++++++
>  2 files changed, 49 insertions(+), 19 deletions(-)
> 

> +++ b/nbd/nbd-internal.h
> @@ -96,6 +96,20 @@
>  #define NBD_ENOSPC     28
>  #define NBD_ESHUTDOWN  108
>  
> +static inline const char *nbd_opt_name(int opt)
> +{
> +    switch (opt) {
> +    case NBD_OPT_EXPORT_NAME: return "export_name";

Does this really get past checkpatch?

> +    case NBD_OPT_ABORT: return "abort";
> +    case NBD_OPT_LIST: return "list";
> +    case NBD_OPT_PEEK_EXPORT: return "peek_export";
> +    case NBD_OPT_STARTTLS: return "tls";

Why just 'tls' instead of 'starttls'?

> +    case NBD_OPT_STRUCTURED_REPLY: return "structured_reply";
> +    }
> +
> +    return "<unknown option>";

Can you please consider making this include the %d representation of the
unknown option; perhaps by snprintf'ing into static storage?  While it
is unlikely that a well-behaved server will respond to a client with an
option the client doesn't recognize, it is much more likely that this
reverse lookup function will be used in a server to respond to an
unknown option from a client.

In fact, I might have split this into two patches: one providing
nbd_opt_name() and using it throughout the code base where appropriate,
and the other refactoring starttls in the client.

I'm not sure if the reverse lookup function needs to be inline in the
header; it could reasonably live in nbd/common.c, particularly if you
are going to take my advice to have it format a message for unknown values.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 03/18] nbd: Minimal structured read for server
  2017-02-06 23:01   ` Eric Blake
@ 2017-02-07 17:44     ` Paolo Bonzini
  2017-05-04 10:58     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 58+ messages in thread
From: Paolo Bonzini @ 2017-02-07 17:44 UTC (permalink / raw)
  To: Eric Blake, Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, armbru, den, stefanha

[-- Attachment #1: Type: text/plain, Size: 7140 bytes --]

On 07/02/2017 00:01, Eric Blake wrote:
> On 02/03/2017 09:47 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Minimal implementation of structured read: one data chunk + finishing
>> none chunk. No segmentation.
>> Minimal structured error implementation: no text message.
>> Support DF flag, but just ignore it, as there is no segmentation any
>> way.
> 
> Might be worth adding that this is still an experimental extension to
> the NBD spec, and therefore that this implementation serves as proof of
> concept and may still need tweaking if anything major turns up before
> promoting it to stable.  It might also be worth a link to:
> 
> https://github.com/NetworkBlockDevice/nbd/blob/extension-structured-reply/doc/proto.md

Wouter's slides from FOSDEM said the state is "discussion complete, not
yet implemented".

Paolo

>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>  include/block/nbd.h |  31 +++++++++++++
>>  nbd/nbd-internal.h  |   2 +
>>  nbd/server.c        | 125 ++++++++++++++++++++++++++++++++++++++++++++++++++--
>>  3 files changed, 154 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/block/nbd.h b/include/block/nbd.h
>> index 3c65cf8d87..58b864f145 100644
>> --- a/include/block/nbd.h
>> +++ b/include/block/nbd.h
>> @@ -70,6 +70,25 @@ struct NBDSimpleReply {
>>  };
>>  typedef struct NBDSimpleReply NBDSimpleReply;
>>  
>> +typedef struct NBDStructuredReplyChunk {
>> +    uint32_t magic;
>> +    uint16_t flags;
>> +    uint16_t type;
>> +    uint64_t handle;
>> +    uint32_t length;
>> +} QEMU_PACKED NBDStructuredReplyChunk;
>> +
>> +typedef struct NBDStructuredRead {
>> +    NBDStructuredReplyChunk h;
>> +    uint64_t offset;
>> +} QEMU_PACKED NBDStructuredRead;
>> +
>> +typedef struct NBDStructuredError {
>> +    NBDStructuredReplyChunk h;
>> +    uint32_t error;
>> +    uint16_t message_length;
>> +} QEMU_PACKED NBDStructuredError;
> 
> Definitely a subset of all types added in the NBD protocol extension,
> but reasonable for a minimal implementation.  Might be worth adding
> comments to the types...
> 
>>  
>> +/* Structured reply flags */
>> +#define NBD_REPLY_FLAG_DONE 1
>> +
>> +/* Structured reply types */
>> +#define NBD_REPLY_TYPE_NONE 0
>> +#define NBD_REPLY_TYPE_OFFSET_DATA 1
>> +#define NBD_REPLY_TYPE_OFFSET_HOLE 2
>> +#define NBD_REPLY_TYPE_ERROR ((1 << 15) + 1)
>> +#define NBD_REPLY_TYPE_ERROR_OFFSET ((1 << 15) + 2)
> 
> ...that correspond to these constants that will be used in the [h.]type
> field.
> 
> Also, it's a bit odd that you are defining constants that aren't
> implemented here; I don't know if it is any cleaner to save the
> definition for the unimplemented types until you actually implement them
> (NBD_REPLY_TYPE_OFFSET_HOLE, NBD_REPLY_TYPE_ERROR_OFFSET).
> 
>> +++ b/nbd/nbd-internal.h
>> @@ -60,6 +60,7 @@
>>  #define NBD_REPLY_SIZE          (4 + 4 + 8)
>>  #define NBD_REQUEST_MAGIC       0x25609513
>>  #define NBD_SIMPLE_REPLY_MAGIC  0x67446698
>> +#define NBD_STRUCTURED_REPLY_MAGIC 0x668e33ef
>>  #define NBD_OPTS_MAGIC          0x49484156454F5054LL
>>  #define NBD_CLIENT_MAGIC        0x0000420281861253LL
>>  #define NBD_REP_MAGIC           0x0003e889045565a9LL
> 
> I would not be bothered if you wanted to reindent the other lines by 3
> spaces so that all the macro definitions start on the same column.  But
> I also won't require it.
> 
>> @@ -81,6 +82,7 @@
>>  #define NBD_OPT_LIST            (3)
>>  #define NBD_OPT_PEEK_EXPORT     (4)
>>  #define NBD_OPT_STARTTLS        (5)
>> +#define NBD_OPT_STRUCTURED_REPLY (8)
> 
> Similar comments about consistency in the definition column.
> 
>> +++ b/nbd/server.c
>> @@ -100,6 +100,8 @@ struct NBDClient {
>>      QTAILQ_ENTRY(NBDClient) next;
>>      int nb_requests;
>>      bool closing;
>> +
>> +    bool structured_reply;
>>  };
>>  
>>  /* That's all folks */
>> @@ -573,6 +575,16 @@ static int nbd_negotiate_options(NBDClient *client)
>>                      return ret;
>>                  }
>>                  break;
>> +
>> +            case NBD_OPT_STRUCTURED_REPLY:
>> +                client->structured_reply = true;
>> +                ret = nbd_negotiate_send_rep(client->ioc, NBD_REP_ACK,
>> +                                             clientflags);
>> +                if (ret < 0) {
>> +                    return ret;
>> +                }
>> +                break;
>> +
> 
> As written, you allow the client to negotiate this more than once.  On
> the one hand, we are idempotent, so it doesn't hurt if they do so; on
> the other hand, it is a waste of bandwidth, and a client could abuse it
> by sending an infinite stream of NBD_OPT_STRUCTURED_REPLY requests and
> never moving into transmission phase, which is a mild form of
> Denial-of-Service (they're hogging a socket from accomplishing useful
> work for some other client).  It would be acceptable if we wanted to
> disconnect any client that sends this option more than once, although
> the NBD spec does not require us to do so.  Up to you if you think
> that's worth adding.
> 
>>  
>> +static void set_be_chunk(NBDStructuredReplyChunk *chunk, uint16_t flags,
>> +                         uint16_t type, uint64_t handle, uint32_t length)
> 
> I'm not sure I like the name of this helper. I know what you are doing:
> go from native-endian local variables into network-byte-order storage in
> preparation for transmission, done at the last possible moment.  But I
> also don't know if I have a good suggestion for a better name off hand.
> 
>> +{
>> +    stl_be_p(&chunk->magic, NBD_STRUCTURED_REPLY_MAGIC);
>> +    stw_be_p(&chunk->flags, flags);
>> +    stw_be_p(&chunk->type, type);
>> +    stq_be_p(&chunk->handle, handle);
>> +    stl_be_p(&chunk->length, length);
>> +}
>> +
>> +static int nbd_co_send_iov(NBDClient *client, struct iovec *iov, unsigned niov)
> 
> Probably should add the coroutine_fn annotation to this function and its
> friends (yeah, the NBD code doesn't consistently use it yet, but it should).
> 
>> @@ -1147,7 +1239,8 @@ static ssize_t nbd_co_receive_request(NBDRequestData *req,
>>          rc = request->type == NBD_CMD_WRITE ? -ENOSPC : -EINVAL;
>>          goto out;
>>      }
>> -    if (request->flags & ~(NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_NO_HOLE)) {
>> +    if (request->flags & ~(NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_NO_HOLE |
>> +                           NBD_CMD_FLAG_DF)) {
>>          LOG("unsupported flags (got 0x%x)", request->flags);
>>          rc = -EINVAL;
>>          goto out;
> 
> Missing a check that NBD_CMD_FLAG_DF is only set for NBD_CMD_READ (it is
> not valid on any other command, at least in the current version of the
> extension specification).
> 
>> @@ -1444,6 +1559,8 @@ void nbd_client_new(NBDExport *exp,
>>      client->can_read = true;
>>      client->close = close_fn;
>>  
>> +    client->structured_reply = false;
> 
> Dead assignment, since we used 'client = g_malloc0()' above.
> 
> Overall looks like it matches the spec.
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 07/18] nbd: Minimal structured read for client
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 07/18] nbd: Minimal structured read for client Vladimir Sementsov-Ogievskiy
@ 2017-02-07 20:14   ` Eric Blake
  2017-02-15 16:54     ` Paolo Bonzini
  2017-08-01 15:41     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 2 replies; 58+ messages in thread
From: Eric Blake @ 2017-02-07 20:14 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, den, stefanha

[-- Attachment #1: Type: text/plain, Size: 21352 bytes --]

On 02/03/2017 09:47 AM, Vladimir Sementsov-Ogievskiy wrote:
> Minimal implementation: always send DF flag, to not deal with fragmented
> replies.

This works well with your minimal server implementation, but I worry
that it will cause us to fall over when talking to a fully-compliant
server that chooses to send EOVERFLOW errors for any request larger than
64k when DF is set; it also makes it impossible to benefit from sparse
reads.  I guess that means we need to start thinking about followup
patches to flush out our implementation.  But maybe I can live with this
patch as is, since the goal of your series was not so much the full
power of structured reads, but getting to a point where we could use
structured reply for block status, even if it means your client can only
communicate with qemu-nbd as server for now, as long as we do get to the
rest of the patches for a full-blown structured read.

> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/nbd-client.c  |  47 +++++++++++----
>  block/nbd-client.h  |   2 +
>  include/block/nbd.h |  15 +++--
>  nbd/client.c        | 170 ++++++++++++++++++++++++++++++++++++++++++++++------
>  qemu-nbd.c          |   2 +-
>  5 files changed, 203 insertions(+), 33 deletions(-)

Hmm - no change to the testsuite. Structured reads seems like the sort
of thing that it would be nice to test with some canned server replies,
particularly with server behavior that is permitted by the NBD protocol
but does not happen by default in qemu-nbd.

> 
> diff --git a/block/nbd-client.c b/block/nbd-client.c
> index 3779c6c999..ff96bd1635 100644
> --- a/block/nbd-client.c
> +++ b/block/nbd-client.c
> @@ -180,13 +180,20 @@ static void nbd_co_receive_reply(NBDClientSession *s,
>      *reply = s->reply;
>      if (reply->handle != request->handle ||
>          !s->ioc) {
> +        reply->simple = true;
>          reply->error = EIO;

I don't think this is quite right - by setting reply->simple to true,
you are forcing the caller to treat this as the final packet related to
this request->handle, even though that might not be the case.

As it is, I wonder if this code is correct, even before your patch - the
server is allowed to give responses out-of-order (if we request multiple
reads without waiting for the first response) - I don't see how setting
reply->error to EIO if the request->handle indicates that we are
receiving an out-of-order response to some other packet, but that our
request is still awaiting traffic.

>      } else {
> -        if (qiov && reply->error == 0) {
> -            ret = nbd_wr_syncv(s->ioc, qiov->iov, qiov->niov, request->len,
> -                               true);
> -            if (ret != request->len) {
> -                reply->error = EIO;
> +        if (qiov) {
> +            if ((reply->simple ? reply->error == 0 :
> +                         reply->type == NBD_REPLY_TYPE_OFFSET_DATA)) {
> +                ret = nbd_wr_syncv(s->ioc, qiov->iov, qiov->niov, request->len,
> +                                   true);

This works only because you used the DF flag.  If we allow fragmenting,
then you have to be careful to write the reply into the correct offset
of the iov.

> +                if (ret != request->len) {
> +                    reply->error = EIO;
> +                }
> +            } else if (!reply->simple &&
> +                       reply->type == NBD_REPLY_TYPE_OFFSET_HOLE) {
> +                qemu_iovec_memset(qiov, 0, 0, request->len);
>              }

Up to here, you didn't do any inspection for NBD_REPLY_FLAG_DONE (so you
don't know if this is the last packet the server is sending for this
reqeust->handle), and didn't do any special casing for
NBD_REPLY_TYPE_NONE or for the various error replies.  I'm not sure if
this will always do what you want.  In fact, I'm not even sure if
reply->error is set correctly for all structured packets.

>          }
>  
> @@ -227,6 +234,7 @@ int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,
>          .type = NBD_CMD_READ,
>          .from = offset,
>          .len = bytes,
> +        .flags = client->structured_reply ? NBD_CMD_FLAG_DF : 0,
>      };
>      NBDReply reply;
>      ssize_t ret;
> @@ -237,12 +245,30 @@ int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,
>      nbd_coroutine_start(client, &request);
>      ret = nbd_co_send_request(bs, &request, NULL);
>      if (ret < 0) {
> -        reply.error = -ret;
> -    } else {
> -        nbd_co_receive_reply(client, &request, &reply, qiov);
> +        goto out;
>      }
> +
> +    nbd_co_receive_reply(client, &request, &reply, qiov);
> +    if (reply.error != 0) {
> +        ret = -reply.error;
> +    }
> +
> +    if (!reply.simple) {
> +        while (!(reply.flags & NBD_REPLY_FLAG_DONE)) {
> +            nbd_co_receive_reply(client, &request, &reply, qiov);
> +            if (reply.error != 0) {
> +                ret = -reply.error;
> +            }
> +            if (reply.simple) {

Hmm. It looks like this part of the loop is only triggered if
nbd_co_receive_reply() detects a handle mismatch and slams reply.simple
to true.  As long as we use the DF flag, it looks like the server should
never send an error packet followed by a data packet, and your
particular server implementation always set the DONE flag on the error
packet, so it got past your testing.  But if we don't rely on the DF
flag, a server could reasonable send an ERROR_OFFSET packet for half the
buffer, followed by a data packet for the other half of the buffer,
which may wipe out reply.error from the error packet.

> +                ret = -EIO;
> +                goto out;
> +            }
> +        }
> +    }
> +
> +out:
>      nbd_coroutine_end(client, &request);
> -    return -reply.error;
> +    return ret;
>  }
>  
>  int nbd_client_co_pwritev(BlockDriverState *bs, uint64_t offset,
> @@ -408,7 +434,8 @@ int nbd_client_init(BlockDriverState *bs,
>                                  &client->nbdflags,
>                                  tlscreds, hostname,
>                                  &client->ioc,
> -                                &client->size, errp);
> +                                &client->size,
> +                                &client->structured_reply, errp);
>      if (ret < 0) {
>          logout("Failed to negotiate with the NBD server\n");
>          return ret;
> diff --git a/block/nbd-client.h b/block/nbd-client.h
> index f8d6006849..cba1f965bf 100644
> --- a/block/nbd-client.h
> +++ b/block/nbd-client.h
> @@ -32,6 +32,8 @@ typedef struct NBDClientSession {
>      NBDReply reply;
>  
>      bool is_unix;
> +
> +    bool structured_reply;
>  } NBDClientSession;
>  
>  NBDClientSession *nbd_get_client_session(BlockDriverState *bs);
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 58b864f145..dae2e4bd03 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h

Can you add the use of an order file to list .h files first in your
diffs?  See
https://lists.gnu.org/archive/html/qemu-devel/2016-12/msg00288.html for
tips.

> @@ -57,11 +57,16 @@ struct NBDRequest {
>  };
>  typedef struct NBDRequest NBDRequest;
>  
> -struct NBDReply {
> +typedef struct NBDReply {
> +    bool simple;
>      uint64_t handle;
>      uint32_t error;
> -};
> -typedef struct NBDReply NBDReply;
> +
> +    uint16_t flags;
> +    uint16_t type;
> +    uint32_t length;
> +    uint64_t offset;
> +} NBDReply;

I don't know if this is the best way to represent things; I might have
used a union type, since not all fields are valid in all reply packets.

>  
>  struct NBDSimpleReply {
>      /* uint32_t NBD_SIMPLE_REPLY_MAGIC */
> @@ -169,10 +174,10 @@ ssize_t nbd_wr_syncv(QIOChannel *ioc,
>  int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
>                            QCryptoTLSCreds *tlscreds, const char *hostname,
>                            QIOChannel **outioc,
> -                          off_t *size, Error **errp);
> +                          off_t *size, bool *structured_reply, Error **errp);
>  int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t flags, off_t size);
>  ssize_t nbd_send_request(QIOChannel *ioc, NBDRequest *request);
> -ssize_t nbd_receive_reply(QIOChannel *ioc, NBDReply *reply);
> +int nbd_receive_reply(QIOChannel *ioc, NBDReply *reply);
>  int nbd_client(int fd);
>  int nbd_disconnect(int fd);
>  
> diff --git a/nbd/client.c b/nbd/client.c
> index 1c274f3012..9225f7e30d 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -472,11 +472,10 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
>      return QIO_CHANNEL(tioc);
>  }
>  
> -
>  int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
>                            QCryptoTLSCreds *tlscreds, const char *hostname,
>                            QIOChannel **outioc,
> -                          off_t *size, Error **errp)
> +                          off_t *size, bool *structured_reply, Error **errp)
>  {
>      char buf[256];
>      uint64_t magic, s;
> @@ -584,6 +583,12 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
>              if (nbd_receive_query_exports(ioc, name, errp) < 0) {
>                  goto fail;
>              }
> +
> +            if (structured_reply != NULL) {
> +                *structured_reply =
> +                    nbd_receive_simple_option(ioc, NBD_OPT_STRUCTURED_REPLY,
> +                                              false, NULL) == 0;

Okay, you're allowing the server to reject the option, in which case we
set structured_reply to false.  But re-reading patch 4/18,
nbd_receive_simple_option() can return -1 for multiple reasons, some of
them where it is still in sync (for sending more options), but others
where it is out of sync (such as failure to write, in which case the
connection MUST be dropped rather than trying to carry on).  I don't
think this handles errors correctly, and therefore I'm not even sure
that the refactoring in 4/18 is correct.

I think you may be better off with nbd_receive_simple_option() in 4/18
being tri-state: return -1 if the connection is unrecoverable (such as
after a write or read error, where we must not send or receive any more
data), 0 if the server replied with an error but the connection is still
in sync for trying something else, and 1 if the server replies with
success.  Then this code should check if the return is < 0 (kill
negotiation), == 0 (*structured_reply = false), or == 1
(*structured_reply = true).

> +            }
>          }
>          /* write the export name request */
>          if (nbd_send_option_request(ioc, NBD_OPT_EXPORT_NAME, -1, name,
> @@ -603,6 +608,14 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
>              goto fail;
>          }
>          be16_to_cpus(flags);
> +
> +        if (!!structured_reply && *structured_reply &&

Why do you need !! to coerce to bool, when && also coerces to bool?

> +            !(*flags & NBD_CMD_FLAG_DF))
> +        {
> +            error_setg(errp, "Structured reply is negotiated, "
> +                             "but DF flag is not.");

No trailing '.' for error_setg() messages.

Also, I'm not quite sure the NBD protocol allows an implementation that
supports structured read but does not support the DF flag.  Maybe that's
an NBD spec bug that we should get clarified.  (Ideally, the server
always advertises the DF flag if structured replies are negotiated,
because the common implementation of user-space handshake followed by
kernel transmission phase works best when the already-existing
ioctl(NBD_SET_FLAGS) can then be used to tell the kernel to use/expect
structure replies.)

> +            goto fail;
> +        }
>      } else if (magic == NBD_CLIENT_MAGIC) {
>          uint32_t oldflags;
>  
> @@ -790,20 +803,33 @@ ssize_t nbd_send_request(QIOChannel *ioc, NBDRequest *request)
>      return 0;
>  }
>  
> -ssize_t nbd_receive_reply(QIOChannel *ioc, NBDReply *reply)
> +static inline int read_sync_check(QIOChannel *ioc, void *buffer, size_t size)
>  {
> -    uint8_t buf[NBD_REPLY_SIZE];
> -    uint32_t magic;
>      ssize_t ret;
>  
> -    ret = read_sync(ioc, buf, sizeof(buf));
> +    ret = read_sync(ioc, buffer, size);
>      if (ret < 0) {
>          return ret;
>      }
> -
> -    if (ret != sizeof(buf)) {
> +    if (ret != size) {
>          LOG("read failed");
> -        return -EINVAL;
> +        return -EIO;
> +    }
> +
> +    return 0;
> +}
> +
> +/* nbd_receive_simple_reply
> + * Read simple reply except magic field (which should be already read)
> + */
> +static int nbd_receive_simple_reply(QIOChannel *ioc, NBDReply *reply)
> +{
> +    uint8_t buf[NBD_REPLY_SIZE - 4];
> +    ssize_t ret;
> +
> +    ret = read_sync_check(ioc, buf, sizeof(buf));

This patch is fairly long; I might have split the creation and initial
use of the read_sync_check() function into its own patch.

> +    if (ret < 0) {
> +        return ret;
>      }
>  
>      /* Reply
> @@ -812,9 +838,124 @@ ssize_t nbd_receive_reply(QIOChannel *ioc, NBDReply *reply)
>         [ 7 .. 15]    handle
>       */
>  
> -    magic = ldl_be_p(buf);
> -    reply->error  = ldl_be_p(buf + 4);
> -    reply->handle = ldq_be_p(buf + 8);
> +    reply->error  = ldl_be_p(buf);
> +    reply->handle = ldq_be_p(buf + 4);
> +
> +    return 0;
> +}
> +
> +/* nbd_receive_structured_reply_chunk
> + * Read structured reply chunk except magic field (which should be already read)
> + * Data for NBD_REPLY_TYPE_OFFSET_DATA is not read too.
> + * Length field of reply out parameter corresponds to unread part of reply.

Awkward wording; maybe:

Read the header portion of a structured reply chunk (except for magic,
which should already be read).  On success, the length field of reply
corresponds to number of bytes in the reply that still need to be read.

> + */
> +static int nbd_receive_structured_reply_chunk(QIOChannel *ioc, NBDReply *reply)
> +{
> +    NBDStructuredReplyChunk chunk;
> +    ssize_t ret;
> +    uint16_t message_size;
> +
> +    ret = read_sync_check(ioc, (uint8_t *)&chunk + sizeof(chunk.magic),

This looks like some ugly pointer math.  I wonder if you'd be better
served by having a separate packed struct without the magic field,
particularly since you have to read the magic in first to decide whether
to dispatch to simple/structured for the rest of the structure.  In
other words, note how patch 2/18 omits a uint32_t magic in
NBDSimpleReply; maybe patch 3/18 could do the same with
NBDStructuredReplyChunk.  Then again, that would impact how you code
things for subcasses like NBDStructuredRead, so I don't know how much
churn to request here.

> +                          sizeof(chunk) - sizeof(chunk.magic));
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    reply->flags = be16_to_cpu(chunk.flags);
> +    reply->type = be16_to_cpu(chunk.type);
> +    reply->handle = be64_to_cpu(chunk.handle);
> +    reply->length = be32_to_cpu(chunk.length);
> +
> +    switch (reply->type) {
> +    case NBD_REPLY_TYPE_NONE:
> +        break;
> +    case NBD_REPLY_TYPE_OFFSET_DATA:
> +    case NBD_REPLY_TYPE_OFFSET_HOLE:
> +        ret = read_sync_check(ioc, &reply->offset, sizeof(reply->offset));
> +        if (ret < 0) {
> +            return ret;
> +        }
> +        be64_to_cpus(&reply->offset);
> +        reply->length -= sizeof(reply->offset);
> +        break;
> +    case NBD_REPLY_TYPE_ERROR:
> +    case NBD_REPLY_TYPE_ERROR_OFFSET:

Here, I think that you want:

default:
if (reply->type & 0x8000) {

rather than specific NBD_REPLY_TYPE_ERROR labels, so that you can
gracefully handle ALL error packets sent by a server even if you haven't
explicitly coded for them (a good server should probably not be sending
an error packet you don't recognize, but the protocol also went to good
efforts to describe a common behavior of all error packets).

/me reads on...

> +        ret = read_sync_check(ioc, &reply->error, sizeof(reply->error));
> +        if (ret < 0) {
> +            return ret;
> +        }
> +        be32_to_cpus(&reply->error);
> +
> +        ret = read_sync_check(ioc, &message_size, sizeof(message_size));
> +        if (ret < 0) {
> +            return ret;
> +        }
> +        be16_to_cpus(&message_size);
> +
> +        if (message_size > 0) {
> +            /* TODO: provide error message to user */
> +            ret = drop_sync(ioc, message_size);
> +            if (ret < 0) {
> +                return ret;
> +            }
> +        }
> +
> +        if (reply->type == NBD_REPLY_TYPE_ERROR_OFFSET) {
> +            /* drop 64bit offset */
> +            ret = drop_sync(ioc, 8);
> +            if (ret < 0) {
> +                return ret;
> +            }
> +        }
> +        break;
> +    default:
> +        if (reply->type & (1 << 15)) {

...oh, you did, mostly.

> +            /* unknown error */
> +            ret = drop_sync(ioc, reply->length);

Still, even if it is an unknown error, you should still be able to
populate reply->error, rather than ignoring it.

> +            if (ret < 0) {
> +                return ret;
> +            }
> +
> +            reply->error = NBD_EINVAL;

Meanwhile, you should also probably ensure that reply->error is non-zero
even if the server was buggy and sent an error flag without telling you
an error value (your choice of EINVAL seems reasonable).

> +            reply->length = 0;
> +        } else {
> +            /* unknown non-error reply type */
> +            return -EINVAL;
> +        }
> +    }
> +
> +    return 0;
> +}
> +
> +int nbd_receive_reply(QIOChannel *ioc, NBDReply *reply)
> +{
> +    uint32_t magic;
> +    int ret;
> +
> +    ret = read_sync_check(ioc, &magic, sizeof(magic));
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    be32_to_cpus(&magic);
> +
> +    switch (magic) {
> +    case NBD_SIMPLE_REPLY_MAGIC:
> +        reply->simple = true;
> +        ret = nbd_receive_simple_reply(ioc, reply);
> +        break;
> +    case NBD_STRUCTURED_REPLY_MAGIC:
> +        reply->simple = false;
> +        ret = nbd_receive_structured_reply_chunk(ioc, reply);
> +        break;
> +    default:
> +        LOG("invalid magic (got 0x%" PRIx32 ")", magic);
> +        return -EINVAL;

This part looks good.

> +    }
> +
> +    if (ret < 0) {
> +        return ret;
> +    }
>  
>      reply->error = nbd_errno_to_system_errno(reply->error);
>  
> @@ -827,10 +968,5 @@ ssize_t nbd_receive_reply(QIOChannel *ioc, NBDReply *reply)
>            ", handle = %" PRIu64" }",
>            magic, reply->error, reply->handle);
>  
> -    if (magic != NBD_SIMPLE_REPLY_MAGIC) {
> -        LOG("invalid magic (got 0x%" PRIx32 ")", magic);
> -        return -EINVAL;
> -    }
>      return 0;
>  }
> -
> diff --git a/qemu-nbd.c b/qemu-nbd.c

Why are you deleting a blank line from the end of the file? Is that
rebase churn that should have been avoided in an earlier patch?  </me
goes and reads> - oh, it's been like that since commit 798bfe000 created
the file.  Thanks for cleaning it up (and I'm surprised checkpatch
doesn't flag it).

> index c734f627b4..de0099e333 100644
> --- a/qemu-nbd.c
> +++ b/qemu-nbd.c
> @@ -272,7 +272,7 @@ static void *nbd_client_thread(void *arg)
>  
>      ret = nbd_receive_negotiate(QIO_CHANNEL(sioc), NULL, &nbdflags,
>                                  NULL, NULL, NULL,
> -                                &size, &local_error);
> +                                &size, NULL, &local_error);

We could reasonably allow new-style structured reads when we're handling
the entire client side; but I agree that when qemu-nbd is used as the
handshaking phase before handing things over to the kernel that we can't
request structured reads, at least not until a) the kernel nbd module
implements structured read supports, and b) we have a way to tell that
the kernel is willing to accept structured read negotiation (and that's
where my earlier comments about the DF flag being a witness of
structured reads comes into play).

Umm, how does this patch compile?  You changed the signature of
nbd_receive_negotiate() to add a parameter, but did not modify the call
in block/nbd-client.c to pass the new parameter.  (So far, I've been
reviewing based just on email content; I also plan on applying and
testing your patches before all is said and done, but I sometimes
surprise myself with my ability to do a quality read-only review even
without spending time compiling things).

>      if (ret < 0) {
>          if (local_error) {
>              error_report_err(local_error);
> 

We'll definitely need a v2, but I'm grateful that you've tackled this work.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 08/18] hbitmap: add next_zero function
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 08/18] hbitmap: add next_zero function Vladimir Sementsov-Ogievskiy
@ 2017-02-07 22:55   ` Eric Blake
  2017-02-15 16:57     ` Paolo Bonzini
  0 siblings, 1 reply; 58+ messages in thread
From: Eric Blake @ 2017-02-07 22:55 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, den, stefanha

[-- Attachment #1: Type: text/plain, Size: 658 bytes --]

On 02/03/2017 09:47 AM, Vladimir Sementsov-Ogievskiy wrote:
> The function searches for next zero bit.
> Also add interface for BdrvDirtyBitmap.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/dirty-bitmap.c         |  5 +++++
>  include/block/dirty-bitmap.h |  2 ++
>  include/qemu/hbitmap.h       |  8 ++++++++
>  util/hbitmap.c               | 26 ++++++++++++++++++++++++++
>  4 files changed, 41 insertions(+)
> 

It would be nice to enhance the testsuite to cover hbitmap_next_zero().

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 06/18] nbd/client: refactor drop_sync
  2017-02-06 23:19   ` Eric Blake
@ 2017-02-08  7:55     ` Vladimir Sementsov-Ogievskiy
  2017-02-15 16:52       ` Paolo Bonzini
  0 siblings, 1 reply; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-02-08  7:55 UTC (permalink / raw)
  To: Eric Blake, qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, den, stefanha

07.02.2017 02:19, Eric Blake wrote:
> On 02/03/2017 09:47 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Return 0 on success to simplify success checking.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   nbd/client.c | 35 +++++++++++++++++++----------------
>>   1 file changed, 19 insertions(+), 16 deletions(-)
> I'm not sure that this simplifies anything.  You have a net addition in
> lines of code, so unless some later patch is improved because of this,
> I'm inclined to say this is needless churn.
>

I just dislike duplicating information like "drop_sync(ioc, 124) != 
124". In the code there is no place where positive and not equal to size 
return value actually handled. But it is not so important, if you are 
against i'll drop this, no problem.

One note: I don't have good understanding of the following: actually 
read can return positive value < queried size, which means that we 
should read again. But it is not handled in the code (handled, but just 
as an error), except drop_sync.. (With drop_sync it is side effect of 
using limited buffer size, yes?). Is it all ok?


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 10/18] block/dirty-bitmap: add bdrv_load_dirty_bitmap
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 10/18] block/dirty-bitmap: add bdrv_load_dirty_bitmap Vladimir Sementsov-Ogievskiy
@ 2017-02-08 11:45   ` Fam Zheng
  2017-02-16 12:37   ` Denis V. Lunev
  1 sibling, 0 replies; 58+ messages in thread
From: Fam Zheng @ 2017-02-08 11:45 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu-block, qemu-devel, jsnow, kwolf, mreitz, pbonzini, armbru,
	eblake, den, stefanha

On Fri, 02/03 18:47, Vladimir Sementsov-Ogievskiy wrote:
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/dirty-bitmap.c         | 53 ++++++++++++++++++++++++++++++++++++++++++++
>  include/block/block_int.h    |  4 ++++
>  include/block/dirty-bitmap.h |  3 +++
>  3 files changed, 60 insertions(+)
> 
> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
> index 3b7db1d78c..394d4328d5 100644
> --- a/block/dirty-bitmap.c
> +++ b/block/dirty-bitmap.c
> @@ -545,3 +545,56 @@ BdrvDirtyBitmap *bdrv_dirty_bitmap_next(BlockDriverState *bs,
>      return bitmap == NULL ? QLIST_FIRST(&bs->dirty_bitmaps) :
>                              QLIST_NEXT(bitmap, list);
>  }
> +
> +typedef struct BDRVLoadBitmapCo {
> +    BlockDriverState *bs;
> +    const char *name;
> +    Error **errp;
> +    BdrvDirtyBitmap *ret;
> +    bool in_progress;
> +} BDRVLoadBitmapCo;
> +
> +static void bdrv_load_dity_bitmap_co_entry(void *opaque)

Add coroutine_fn?

> +{
> +    BDRVLoadBitmapCo *lbco = opaque;
> +    BlockDriver *drv = lbco->bs->drv;
> +
> +    if (!!drv && !!drv->bdrv_dirty_bitmap_load) {
> +        lbco->ret = drv->bdrv_dirty_bitmap_load(lbco->bs, lbco->name,
> +                                                lbco->errp);
> +    } else if (lbco->bs->file)  {
> +        BlockDriverState *bs = lbco->bs;
> +        lbco->bs = lbco->bs->file->bs;
> +        bdrv_load_dity_bitmap_co_entry(lbco);
> +        if (lbco->ret != NULL) {
> +            QLIST_REMOVE(lbco->ret, list);
> +            QLIST_INSERT_HEAD(&bs->dirty_bitmaps, lbco->ret, list);
> +        }
> +    } else {
> +        lbco->ret = NULL;
> +    }
> +
> +    lbco->in_progress = false;
> +}
> +
> +BdrvDirtyBitmap *bdrv_load_dirty_bitmap(BlockDriverState *bs, const char *name,
> +                                        Error **errp)
> +{
> +    Coroutine *co;
> +    BDRVLoadBitmapCo lbco = {
> +        .bs = bs,
> +        .name = name,
> +        .errp = errp,
> +        .in_progress = true
> +    };
> +
> +    if (qemu_in_coroutine()) {
> +        bdrv_load_dity_bitmap_co_entry(&lbco);
> +    } else {
> +        co = qemu_coroutine_create(bdrv_load_dity_bitmap_co_entry, &lbco);
> +        qemu_coroutine_enter(co);
> +        BDRV_POLL_WHILE(bs, lbco.in_progress);
> +    }
> +
> +    return lbco.ret;
> +}
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 83a423c580..d3770db539 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -222,6 +222,10 @@ struct BlockDriver {
>      int (*bdrv_get_info)(BlockDriverState *bs, BlockDriverInfo *bdi);
>      ImageInfoSpecific *(*bdrv_get_specific_info)(BlockDriverState *bs);
>  
> +    BdrvDirtyBitmap *(*bdrv_dirty_bitmap_load)(BlockDriverState *bs,
> +                                               const char *name,
> +                                               Error **errp);
> +
>      int coroutine_fn (*bdrv_save_vmstate)(BlockDriverState *bs,
>                                            QEMUIOVector *qiov,
>                                            int64_t pos);
> diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
> index ff8163ba02..c0c70a8c67 100644
> --- a/include/block/dirty-bitmap.h
> +++ b/include/block/dirty-bitmap.h
> @@ -77,4 +77,7 @@ int64_t bdrv_dirty_bitmap_next_zero(BdrvDirtyBitmap *bitmap, uint64_t start);
>  BdrvDirtyBitmap *bdrv_dirty_bitmap_next(BlockDriverState *bs,
>                                          BdrvDirtyBitmap *bitmap);
>  
> +BdrvDirtyBitmap *bdrv_load_dirty_bitmap(BlockDriverState *bs, const char *name,
> +                                        Error **errp);
> +
>  #endif
> -- 
> 2.11.0
> 

Otherwise looks good!

Fam

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 11/18] nbd: BLOCK_STATUS for bitmap export: server part
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 11/18] nbd: BLOCK_STATUS for bitmap export: server part Vladimir Sementsov-Ogievskiy
@ 2017-02-08 13:13   ` Eric Blake
  2017-02-16 13:00   ` Denis V. Lunev
  1 sibling, 0 replies; 58+ messages in thread
From: Eric Blake @ 2017-02-08 13:13 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, den, stefanha

[-- Attachment #1: Type: text/plain, Size: 6742 bytes --]

On 02/03/2017 09:47 AM, Vladimir Sementsov-Ogievskiy wrote:
> Only one meta context type is defined: qemu-bitmap:<bitmap-name>.

Why 'qemu-bitmap:' instead of 'qemu:' for our namespace?  I guess it's
okay, though.

> Maximum one query is allowed for NBD_OPT_{SET,LIST}_META_CONTEXT,
> NBD_REP_ERR_TOO_BIG is returned otherwise.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---

This is only a high-level review; I may come back and look more closely
at details (and compare against the proposed spec) when I have more time.

>  /* Request flags, sent from client to server during transmission phase */
>  #define NBD_CMD_FLAG_FUA        (1 << 0) /* 'force unit access' during write */
> @@ -142,6 +155,7 @@ enum {
>      NBD_CMD_TRIM = 4,
>      /* 5 reserved for failed experiment NBD_CMD_CACHE */
>      NBD_CMD_WRITE_ZEROES = 6,
> +    NBD_CMD_BLOCK_STATUS = 7
>  };

Please keep the trailing comma (it makes it easier for later patches to
add new values without having to modify existing lines).

>  
>  #define NBD_DEFAULT_PORT	10809
> @@ -163,6 +177,7 @@ enum {
>  #define NBD_REPLY_TYPE_NONE 0
>  #define NBD_REPLY_TYPE_OFFSET_DATA 1
>  #define NBD_REPLY_TYPE_OFFSET_HOLE 2
> +#define NBD_REPLY_TYPE_BLOCK_STATUS 5
>  #define NBD_REPLY_TYPE_ERROR ((1 << 15) + 1)
>  #define NBD_REPLY_TYPE_ERROR_OFFSET ((1 << 15) + 2)

Might be worth formatting these so that the definitions all start in the
same column (if so, it affects an earlier patch in the series as well).

>  
> diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
> index 3284bfc85a..fbbcf69925 100644
> --- a/nbd/nbd-internal.h
> +++ b/nbd/nbd-internal.h
> @@ -83,6 +83,10 @@
>  #define NBD_OPT_PEEK_EXPORT     (4)
>  #define NBD_OPT_STARTTLS        (5)
>  #define NBD_OPT_STRUCTURED_REPLY (8)
> +#define NBD_OPT_LIST_META_CONTEXT (9)
> +#define NBD_OPT_SET_META_CONTEXT  (10)
> +
> +#define NBD_META_NS_BITMAPS "qemu-dirty-bitmap"

I still find it odd that we've split our #defines across two different
headers, not your fault.

This name does not match your commit message.  Why can't we use just
'qemu:' as our namespace?

>  
>  /* NBD errors are based on errno numbers, so there is a 1:1 mapping,
>   * but only a limited set of errno values is specified in the protocol.
> @@ -105,6 +109,8 @@ static inline const char *nbd_opt_name(int opt)
>      case NBD_OPT_PEEK_EXPORT: return "peek_export";
>      case NBD_OPT_STARTTLS: return "tls";
>      case NBD_OPT_STRUCTURED_REPLY: return "structured_reply";
> +    case NBD_OPT_LIST_META_CONTEXT: return "list_meta_context";
> +    case NBD_OPT_SET_META_CONTEXT: return "set_meta_context";

Same question as earlier as to whether this violates checkpatch
formatting, and whether it belongs in a .c instead of inline in the header.

> 
>  
> +static int nbd_negotiate_read_size_string(QIOChannel *ioc, char **str,
> +                                          uint32_t max_len)

I probably would have split the creation of this helper function into
its own patch.  Also, can it be utilized in any of the existing code, or
is the new code the only client?

> +
> +/* start handle LIST_META_CONTEXT and SET_META_CONTEXT requests
> + * @opt          should be NBD_OPT_LIST_META_CONTEXT or NBD_OPT_SET_META_CONTEXT
> + * @length       related option data to read
> + * @nb_queries   out parameter, number of queries specified by client
> + * @bs           out parameter, bs for export, selected by client
> + *               will be zero if some not critical error occured and error reply
> + *               was sent.

Wrong spelling of 'occurred', and awkward grammar.  How about:

will be zero if an error reply is successfully sent

> + *
> + * Returns:
> + *   Err. code < 0 on critical error

s/Err./Error/

> + *   Number of bytes read otherwise (will be equal to length on non critical
> + *     error or if there no queries in request)

will be equal to length if there were no errors, or no queries in the
request

> + */
> +static int nbd_negotiate_opt_meta_context_start(NBDClient *client, uint32_t opt,
> +                                                uint32_t length,
> +                                                uint32_t *nb_queries,
> +                                                BlockDriverState **bs)
> +{
> +    int ret;
> +    NBDExport *exp;
> +    char *export_name;

Uninitialized...[1]

> +    int nb_read = 0;
> +
> +    if (!client->structured_reply) {
> +        uint32_t tail = length - nb_read;
> +        LOG("Structured reply is not negotiated");

This LOG() is redundant...[2]

> +
> +        if (nbd_negotiate_drop_sync(client->ioc, tail) != tail) {
> +            return -EIO;
> +        }
> +        ret = nbd_negotiate_send_rep_err(client->ioc, NBD_REP_ERR_INVALID, opt,
> +                                         "Structured reply is not negotiated");

[2]...because this function also calls LOG().

> +        g_free(export_name);

[1]...so this crashes.  Oops.

> +
> +        if (ret < 0) {
> +            return ret;
> +        } else {
> +            *bs = NULL;
> +            *nb_queries = 0;
> +            return length;
> +        }
> +    }
> +


> +static int nbd_negotiate_set_meta_context(NBDClient *client, uint32_t length)
> +{
> +    int ret;
> +    BlockDriverState *bs;
> +    uint32_t nb_queries;
> +    int nb_read;
> +
> +    nb_read = nbd_negotiate_opt_meta_context_start(client,
> +                                                   NBD_OPT_SET_META_CONTEXT,
> +                                                   length, &nb_queries, &bs);
> +    if (nb_read < 0) {
> +        return nb_read;
> +    }
> +    if (bs == NULL) {
> +        /* error reply was already sent by nbd_negotiate_opt_meta_context_start
> +         * */
> +        return 0;
> +    }
> +
> +    if (nb_queries == 0) {
> +        return nbd_negotiate_send_rep(client->ioc, NBD_REP_ACK,
> +                                      NBD_OPT_SET_META_CONTEXT);
> +    }
> +
> +    if (nb_queries > 1) {
> +        return nbd_negotiate_send_rep_err(client->ioc, NBD_REP_ERR_TOO_BIG,
> +                                          NBD_OPT_SET_META_CONTEXT,
> +                                          "Only one exporting context is"
> +                                          "supported");

Why only one? What's wrong with supporting both 'qemu:xyz' (the dirty
bitmap, with whatever prefix name we actually settle on) and
'base:allocation' at the same time?


-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 12/18] nbd: BLOCK_STATUS for bitmap export: client part
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 12/18] nbd: BLOCK_STATUS for bitmap export: client part Vladimir Sementsov-Ogievskiy
@ 2017-02-08 23:06   ` Eric Blake
  0 siblings, 0 replies; 58+ messages in thread
From: Eric Blake @ 2017-02-08 23:06 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, den, stefanha

[-- Attachment #1: Type: text/plain, Size: 11155 bytes --]

On 02/03/2017 09:47 AM, Vladimir Sementsov-Ogievskiy wrote:
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

Rather light on the commit description. Might be worth pointing to the
NBD protocol that it is implementing, particularly since that part of
the protocol is still an extension and may be subject to change based on
what we learn in our implementation attempt.

> ---
>  block/nbd-client.c   | 146 ++++++++++++++++++++++++++++++++++++++++++++++++++-
>  block/nbd-client.h   |   6 +++
>  block/nbd.c          |   9 +++-
>  include/block/nbd.h  |   6 ++-
>  nbd/client.c         | 103 +++++++++++++++++++++++++++++++++++-
>  nbd/server.c         |   2 -
>  qapi/block-core.json |   5 +-
>  qemu-nbd.c           |   2 +-
>  8 files changed, 270 insertions(+), 9 deletions(-)

Again, this is a high-level review where I haven't closely checked the spec.

> 
> diff --git a/block/nbd-client.c b/block/nbd-client.c
> index ff96bd1635..c7eb21fb02 100644
> --- a/block/nbd-client.c
> +++ b/block/nbd-client.c
> @@ -388,6 +388,147 @@ int nbd_client_co_pdiscard(BlockDriverState *bs, int64_t offset, int count)
>  
>  }
>  
> +static inline ssize_t read_sync(QIOChannel *ioc, void *buffer, size_t size)
> +{
> +    struct iovec iov = { .iov_base = buffer, .iov_len = size };
> +    /* Sockets are kept in blocking mode in the negotiation phase.  After
> +     * that, a non-readable socket simply means that another thread stole
> +     * our request/reply.  Synchronization is done with recv_coroutine, so
> +     * that this is coroutine-safe.
> +     */
> +    return nbd_wr_syncv(ioc, &iov, 1, size, true);
> +}

Does this need a coroutine_fn tag?  I wonder if part of this patch
should be split, to introduce the new helper function and rework
existing calls in one patch, then add new BLOCK_STATUS stuff in another.

> +
> +static int nbd_client_co_cmd_block_status(BlockDriverState *bs, uint64_t offset,
> +                                          uint64_t bytes, NBDExtent **pextents,
> +                                          unsigned *nb_extents)
> +{

> +
> +    nbd_co_receive_reply(client, &request, &reply, NULL);
> +    if (reply.error != 0) {
> +        ret = -reply.error;
> +    }
> +    if (reply.simple) {
> +        ret = -EINVAL;
> +        goto fail;
> +    }
> +    if (reply.error != 0) {
> +        ret = -reply.error;
> +        goto fail;
> +    }

Duplicate check for reply.error.

> +    if (reply.type != NBD_REPLY_TYPE_BLOCK_STATUS) {
> +        ret = -EINVAL;
> +        goto fail;
> +    }
> +
> +    read_sync(client->ioc, &context_id, sizeof(context_id));
> +    cpu_to_be32s(&context_id);
> +    if (client->meta_data_context_id != context_id) {
> +        ret = -EINVAL;
> +        goto fail;
> +    }
> +
> +    nb = (reply.length - sizeof(context_id)) / sizeof(NBDExtent);
> +    extents = g_new(NBDExtent, nb);

reply.length is server-controlled. I didn't closely check whether we
have any sanity checks that prevent a malicious server from sending a
bogus length that would cause us to allocate far too much memory.  We
may already have such a check (we forbid incoming read packets larger
than 32M), so this is more just making sure that those existing checks
cover this code too.

> +    if (read_sync(client->ioc, extents, nb * sizeof(NBDExtent)) !=
> +        nb * sizeof(NBDExtent))
> +    {
> +        ret = -EIO;
> +        goto fail;
> +    }
> +
> +    if (!(reply.flags && NBD_REPLY_FLAG_DONE)) {

Wrong logic. s/&&/&/

> +        nbd_co_receive_reply(client, &request, &reply, NULL);
> +        if (reply.simple) {
> +            ret = -EINVAL;
> +            goto fail;
> +        }
> +        if (reply.error != 0) {
> +            ret = -reply.error;
> +            goto fail;
> +        }
> +        if (reply.type != NBD_REPLY_TYPE_NONE ||
> +            !(reply.flags && NBD_REPLY_FLAG_DONE)) {
> +            ret = -EINVAL;
> +            goto fail;

It looks like you are insisting that the server send at most one extent
packet, and that either that packet, or a concluding TYPE_NONE packet,
must end the transaction (and if not, you drop the connection). While
that may happen to be the implementation you wrote for qemu as the
server, I don't think it is a requirement in the NBD spec, and that a
server could reasonably send multiple fragments that you must reassemble
into the overall final reply.  In fact, the final reply packet could be
NBD_REPLY_TYPE_ERROR instead of NBD_REPLY_TYPE_NONE (if there was a
late-breaking error detection that invalidates code sent earlier), which
you should handle gracefully rather than aborting the connection.

> +        }
> +    }
> +
> +    for (i = 0; i < nb; ++i) {
> +        cpu_to_be32s(&extents[i].length);
> +        cpu_to_be32s(&extents[i].flags);

Is it worth sanity-checking that the server's reported extents actually
fall within the range we requested information on (well, other than the
last extent which may extend beyond the range)?  When it comes to
network communication, both the client and the server SHOULD assume that
the other side may become malicious at any time, and verify that all
data makes sense before relying on it.

> +    }
> +
> +    *pextents = extents;
> +    *nb_extents = nb;
> +    nbd_coroutine_end(client, &request);
> +    return 0;
> +
> +fail:
> +    g_free(extents);
> +    nbd_coroutine_end(client, &request);
> +    return ret;
> +}
> +

> +++ b/block/nbd-client.h

> @@ -34,6 +34,8 @@ typedef struct NBDClientSession {

>  
>  void nbd_client_detach_aio_context(BlockDriverState *bs);
>  void nbd_client_attach_aio_context(BlockDriverState *bs,
>                                     AioContext *new_context);
>  
> +
>  #endif /* NBD_CLIENT_H */

Why the added blank line here?

> diff --git a/block/nbd.c b/block/nbd.c
> index 35f24be069..63bc3f04d0 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -382,6 +382,11 @@ static QemuOptsList nbd_runtime_opts = {
>              .type = QEMU_OPT_STRING,
>              .help = "ID of the TLS credentials to use",
>          },
> +        {
> +            .name = "bitmap",
> +            .type = QEMU_OPT_STRING,
> +            .help = "Name of dirty bitmap to export",
> +        },
>      },
>  };

I'm confused here - it seems like the name of a dirty bitmap to export
is a server detail, not a client detail.  After all, it is the server
that is using the bitmap to inform the client of what extents have a
given property.  So does this hunk belong in a different patch in the
series?

> +++ b/include/block/nbd.h
> @@ -181,6 +181,8 @@ enum {
>  #define NBD_REPLY_TYPE_ERROR ((1 << 15) + 1)
>  #define NBD_REPLY_TYPE_ERROR_OFFSET ((1 << 15) + 2)
>  
> +#define NBD_MAX_BITMAP_EXTENTS (0x100000 / 8) /* 1 mb of extents data */

The commit message might be a good place to document why you picked this
arbitrary limit (1 million extents covers a LOT of file if each extent
in turn represents a cluster of that file); meanwhile, since each extent
occupies 8 bytes, this means that you are clamping the server to send at
most 8 megabytes before you drop the connection (compared to accepting
32 megabytes for a read).

>  
> +static int nbd_receive_query_meta_context(QIOChannel *ioc, const char *export,
> +                                          const char *context, bool *ok,
> +                                          Error **errp)
> +{
> +    int ret;
> +    nbd_opt_reply reply;
> +    size_t export_len = strlen(export);
> +    size_t context_len = strlen(context);
> +    size_t data_len = 4 + export_len + 4 + 4 + context_len;
> +
> +    char *data = g_malloc(data_len);
> +    char *p = data;
> +    int nb_reps = 0;
> +
> +    *ok = false;
> +    stl_be_p(p, export_len);
> +    memcpy(p += 4, export, export_len);
> +    stl_be_p(p += export_len, 1);
> +    stl_be_p(p += 4, context_len);
> +    memcpy(p += 4, context, context_len);

Is '4' better than 'sizeof(...)' here?

> +
> +    TRACE("Requesting set_meta_context option from server");
> +    ret = nbd_send_option_request(ioc, NBD_OPT_SET_META_CONTEXT, data_len, data,
> +                                errp);

> +        if (read_sync(ioc, &context_id, sizeof(context_id)) !=
> +            sizeof(context_id))
> +        {
> +            ret = -EIO;
> +            goto out;
> +        }
> +
> +        be32_to_cpus(&context_id);
> +
> +        len = reply.length - sizeof(context_id);
> +        context_name = g_malloc(len + 1);
> +        if (read_sync(ioc, context_name, len) != len) {
> +
> +            ret = -EIO;
> +            goto out;
> +        }
> +        context_name[len] = '\0';

Shouldn't we validate that the context name returned by the server
matches our expectations of the context name we requested?

> +
> +        TRACE("set meta: %u %s", context_id, context_name);
> +
> +        nb_reps++;
> +    }
> +
> +    *ok = nb_reps == 1 && reply.type == NBD_REP_ACK;

Insisting on exactly one context reply works only if you ensure that you
only request exactly one context; but we may want to request
'base:allocation' in addition to 'qemu:bitmapname' at some point.  Also,
I think the spec allows for the server to send a different number of
contexts than what we request (whether a context that it always provides
even though we didn't request it, or omitting something we requested
because it is unable to provide it).

> @@ -589,6 +680,16 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
>                      nbd_receive_simple_option(ioc, NBD_OPT_STRUCTURED_REPLY,
>                                                false, NULL) == 0;
>              }
> +
> +            if (!!structured_reply && *structured_reply && !!bitmap_name) {

Why the use of !! to coerce pointers to bool when you already have &&
coercing to bool?

> +                int ret;
> +                assert(!!bitmap_ok);

Another pointless use of !!

> +                ret = nbd_receive_query_bitmap(ioc, name, bitmap_name,
> +                                               bitmap_ok, errp) == 0;
> +                if (ret < 0) {
> +                    goto fail;
> +                }
> +            }
>          }
>          /* write the export name request */

> +++ b/nbd/server.c
> @@ -21,8 +21,6 @@
>  #include "qapi/error.h"
>  #include "nbd-internal.h"
>  
> -#define NBD_MAX_BITMAP_EXTENTS (0x100000 / 8) /* 1 mb of extents data */
> -

This is needless churn, since you only barely introduced this in an
earlier patch.  Stick it in the final file in the first patch where you
introduce it.

> +++ b/qapi/block-core.json
> @@ -2331,12 +2331,15 @@
>  #
>  # @tls-creds:   #optional TLS credentials ID
>  #
> +# @bitmap:   #optional Dirty bitmap name to export (vz-7.4)

s/vz-7.4/2.9/


-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 04/18] nbd/client: refactor nbd_receive_starttls
  2017-02-07 16:32   ` Eric Blake
@ 2017-02-09  6:20     ` Vladimir Sementsov-Ogievskiy
  2017-02-09 14:41       ` Eric Blake
  0 siblings, 1 reply; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-02-09  6:20 UTC (permalink / raw)
  To: Eric Blake, qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, den, stefanha

07.02.2017 19:32, Eric Blake wrote:
> On 02/03/2017 09:47 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Split out nbd_receive_simple_option to be reused for structured reply
>> option.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   nbd/client.c       | 54 +++++++++++++++++++++++++++++++++++-------------------
>>   nbd/nbd-internal.h | 14 ++++++++++++++
>>   2 files changed, 49 insertions(+), 19 deletions(-)
>>
>> +++ b/nbd/nbd-internal.h
>> @@ -96,6 +96,20 @@
>>   #define NBD_ENOSPC     28
>>   #define NBD_ESHUTDOWN  108
>>   
>> +static inline const char *nbd_opt_name(int opt)
>> +{
>> +    switch (opt) {
>> +    case NBD_OPT_EXPORT_NAME: return "export_name";
> Does this really get past checkpatch?
>
>> +    case NBD_OPT_ABORT: return "abort";
>> +    case NBD_OPT_LIST: return "list";
>> +    case NBD_OPT_PEEK_EXPORT: return "peek_export";
>> +    case NBD_OPT_STARTTLS: return "tls";
> Why just 'tls' instead of 'starttls'?
>
>> +    case NBD_OPT_STRUCTURED_REPLY: return "structured_reply";
>> +    }
>> +
>> +    return "<unknown option>";
> Can you please consider making this include the %d representation of the
> unknown option; perhaps by snprintf'ing into static storage?  While it

Hmm.. The caller should free it in this case. Currently, the 
get_opt_name is called only on error path or if tracing enabled. How to 
achieve this with variable string return value not complicating the code 
a lot? When there is unknown message there should be other mechanism to 
inform user

> is unlikely that a well-behaved server will respond to a client with an
> option the client doesn't recognize, it is much more likely that this
> reverse lookup function will be used in a server to respond to an
> unknown option from a client.
>
> In fact, I might have split this into two patches: one providing
> nbd_opt_name() and using it throughout the code base where appropriate,
> and the other refactoring starttls in the client.
>
> I'm not sure if the reverse lookup function needs to be inline in the
> header; it could reasonably live in nbd/common.c, particularly if you
> are going to take my advice to have it format a message for unknown values.
>


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 04/18] nbd/client: refactor nbd_receive_starttls
  2017-02-09  6:20     ` Vladimir Sementsov-Ogievskiy
@ 2017-02-09 14:41       ` Eric Blake
  2017-02-10 11:23         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 58+ messages in thread
From: Eric Blake @ 2017-02-09 14:41 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, den, stefanha

[-- Attachment #1: Type: text/plain, Size: 916 bytes --]

On 02/09/2017 12:20 AM, Vladimir Sementsov-Ogievskiy wrote:
> 07.02.2017 19:32, Eric Blake wrote:
>> On 02/03/2017 09:47 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Split out nbd_receive_simple_option to be reused for structured reply
>>> option.

>>> +    return "<unknown option>";
>> Can you please consider making this include the %d representation of the
>> unknown option; perhaps by snprintf'ing into static storage?  While it
> 
> Hmm.. The caller should free it in this case.

Only if you print it into malloc'd space. I think that printing it into
static storage may be sufficient (although then we have a race if more
than one thread is trying to use that static storage at the same time -
but do we ever have more than one thread trying to handle an error at
the same time?).

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 15/18] qmp: add block-dirty-bitmap-load
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 15/18] qmp: add block-dirty-bitmap-load Vladimir Sementsov-Ogievskiy
  2017-02-07 12:04   ` Fam Zheng
@ 2017-02-09 15:18   ` Eric Blake
  2017-02-15 16:57   ` Paolo Bonzini
  2 siblings, 0 replies; 58+ messages in thread
From: Eric Blake @ 2017-02-09 15:18 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, den, stefanha

[-- Attachment #1: Type: text/plain, Size: 1883 bytes --]

On 02/03/2017 09:47 AM, Vladimir Sementsov-Ogievskiy wrote:
> For loading dirty bitmap from nbd server. Or for underlying storages for

s/storages/storage/

> other formats.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  blockdev.c           | 28 ++++++++++++++++++++++++++++
>  qapi/block-core.json | 14 ++++++++++++++
>  2 files changed, 42 insertions(+)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 1bc3fe386a..2529943e7f 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -2790,6 +2790,34 @@ void qmp_block_dirty_bitmap_clear(const char *node, const char *name,
>      aio_context_release(aio_context);
>  }
>  
> +void qmp_block_dirty_bitmap_load(const char *node, const char *name,
> +                                 Error **errp)
> +{
> +    AioContext *aio_context;
> +    BlockDriverState *bs;
> +
> +    if (!node) {
> +        error_setg(errp, "Node cannot be NULL");
> +        return;
> +    }
> +    if (!name) {
> +        error_setg(errp, "Bitmap name cannot be NULL");
> +        return;
> +    }

QAPI guarantees that node and name are non-null for non-optional
arguments; these checks are dead code...

> +++ b/qapi/block-core.json
> @@ -1280,6 +1280,20 @@
>    'data': 'BlockDirtyBitmap' }
>  
>  ##
> +# @block-dirty-bitmap-load:
> +#
> +# Load a dirty bitmap from the storage (qcow2 file or nbd export)
> +#
> +# Returns: nothing on success
> +#          If @node is not a valid block device, DeviceNotFound
> +#          If @name is not found, GenericError with an explanation
> +#
> +# Since: vz-7.4
> +##
> +  { 'command': 'block-dirty-bitmap-load',
> +    'data': 'BlockDirtyBitmap' }

...since BlockDirtyBitmap defines them as required fields.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 17/18] nbd: BLOCK_STATUS for standard get_block_status function: server part
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 17/18] nbd: BLOCK_STATUS for standard get_block_status function: server part Vladimir Sementsov-Ogievskiy
@ 2017-02-09 15:38   ` Eric Blake
  2017-02-15 17:02     ` Paolo Bonzini
  2017-02-15 17:02     ` Paolo Bonzini
  0 siblings, 2 replies; 58+ messages in thread
From: Eric Blake @ 2017-02-09 15:38 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, den, stefanha

[-- Attachment #1: Type: text/plain, Size: 3830 bytes --]

On 02/03/2017 09:47 AM, Vladimir Sementsov-Ogievskiy wrote:
> Minimal realization: only one extent in server answer is supported.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  include/block/nbd.h |  3 ++
>  nbd/nbd-internal.h  |  1 +
>  nbd/server.c        | 80 ++++++++++++++++++++++++++++++++++++++++++++++++-----
>  3 files changed, 77 insertions(+), 7 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 08d5e51f21..69aee1eda1 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -181,6 +181,9 @@ enum {
>  #define NBD_REPLY_TYPE_ERROR ((1 << 15) + 1)
>  #define NBD_REPLY_TYPE_ERROR_OFFSET ((1 << 15) + 2)
>  
> +#define NBD_STATE_HOLE 1
> +#define NBD_STATE_ZERO (1 << 1)

Why not (1 << 0) for NBD_STATE_HOLE?

I almost wonder if it is better to rebase the series to implement
base:allocation first, rather than last, just for easier
interoperability testing with other implementations that still need to
implement structured replies; but I don't think it's a show-stopper.

>  
> +static int blockstatus_to_extent(BlockDriverState *bs, uint64_t offset,
> +                                  uint64_t length, NBDExtent *extent)
> +{
> +    BlockDriverState *file;
> +    uint64_t start_sector = offset >> BDRV_SECTOR_BITS;
> +    uint64_t last_sector = (offset + length - 1) >> BDRV_SECTOR_BITS;

Converting from bytes to sectors by rounding...

> +    uint64_t begin = start_sector;
> +    uint64_t end = last_sector + 1;
> +
> +    int nb = MIN(INT_MAX, end - begin);
> +    int64_t ret = bdrv_get_block_status_above(bs, NULL, begin, nb, &nb, &file);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    extent->flags =
> +        cpu_to_be32((ret & BDRV_BLOCK_ALLOCATED ? 0 : NBD_STATE_HOLE) |
> +                    (ret & BDRV_BLOCK_ZERO      ? NBD_STATE_ZERO : 0));
> +    extent->length = cpu_to_be32((nb << BDRV_SECTOR_BITS) -
> +                                 (offset - (start_sector << BDRV_SECTOR_BITS)));

...then computing the length by undoing the rounding. I really think we
should consider fixing bdrv_get_block_status_above() to be byte-based,
but that's a separate series.  Your calculations look correct in the
meantime, although '(offset & (BDRV_SECTOR_SIZE - 1))' may be a bit
easier to read than '(offset - (start_sector << BDRV_SECTOR_BITS))'.

> @@ -1869,13 +1930,18 @@ static void nbd_trip(void *opaque)
>          break;
>      case NBD_CMD_BLOCK_STATUS:
>          TRACE("Request type is BLOCK_STATUS");
> -        if (client->export_bitmap == NULL) {
> +        if (!!client->export_bitmap) {

!! is not necessary here.

> +            ret = nbd_co_send_bitmap(req->client, request.handle,
> +                                     client->export_bitmap, request.from,
> +                                     request.len, 0);
> +        } else if (client->export_block_status) {
> +            ret = nbd_co_send_block_status(req->client, request.handle,
> +                                           blk_bs(exp->blk), request.from,
> +                                           request.len, 0);

Umm, why are we sending only one status? If the client requests two ids
during NBD_OPT_SET_META_CONTEXT, we should be able to provide both
pieces of information at once.  For a minimal implementation, it works
for proof of concept, but it is pretty restrictive to tell clients that
they can only request a single status context.  I'm fine if we add that
functionality in a later patch, but we'd better have the implementation
ready for the same release as this patch (I still think 2.9 is a
reasonable goal).


-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 18/18] nbd: BLOCK_STATUS for standard get_block_status function: client part
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 18/18] nbd: BLOCK_STATUS for standard get_block_status function: client part Vladimir Sementsov-Ogievskiy
@ 2017-02-09 16:00   ` Eric Blake
  2017-02-15 17:04     ` Paolo Bonzini
  0 siblings, 1 reply; 58+ messages in thread
From: Eric Blake @ 2017-02-09 16:00 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, den, stefanha

[-- Attachment #1: Type: text/plain, Size: 4336 bytes --]

On 02/03/2017 09:47 AM, Vladimir Sementsov-Ogievskiy wrote:
> Minimal realization: only first extent from the answer is used.

If you're only going to use one extent, should you implement
NBD_CMD_FLAG_REQ_ONE support to save the server some effort?

> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/nbd-client.c  | 41 ++++++++++++++++++++++++++++++++++++++++-
>  block/nbd-client.h  |  5 +++++
>  block/nbd.c         |  3 +++
>  include/block/nbd.h |  2 +-
>  nbd/client.c        | 23 ++++++++++++++++-------
>  qemu-nbd.c          |  2 +-
>  6 files changed, 66 insertions(+), 10 deletions(-)

> +int64_t coroutine_fn nbd_client_co_get_block_status(BlockDriverState *bs,
> +                                                    int64_t sector_num,
> +                                                    int nb_sectors, int *pnum,
> +                                                    BlockDriverState **file)
> +{
> +    int64_t ret;
> +    uint32_t nb_extents;
> +    NBDExtent *extents;
> +    NBDClientSession *client = nbd_get_client_session(bs);
> +
> +    if (!client->block_status_ok) {
> +        *pnum = nb_sectors;
> +        ret = BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED;
> +        if (bs->drv->protocol_name) {
> +            ret |= BDRV_BLOCK_OFFSET_VALID | (sector_num * BDRV_SECTOR_SIZE);
> +        }
> +        return ret;
> +    }

Looks like a sane fallback when we don't have anything more accurate.

> +
> +    ret = nbd_client_co_cmd_block_status(bs, sector_num << BDRV_SECTOR_BITS,
> +                                         nb_sectors << BDRV_SECTOR_BITS,
> +                                         &extents, &nb_extents);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    *pnum = extents[0].length >> BDRV_SECTOR_BITS;
> +    ret = (extents[0].flags & NBD_STATE_HOLE ? 0 : BDRV_BLOCK_ALLOCATED) |
> +          (extents[0].flags & NBD_STATE_ZERO ? BDRV_BLOCK_ZERO : 0);
> +
> +    if ((ret & BDRV_BLOCK_ALLOCATED) && !(ret & BDRV_BLOCK_ZERO)) {
> +        ret |= BDRV_BLOCK_DATA;
> +    }

And this conversion looks correct.

> @@ -579,7 +617,8 @@ int nbd_client_init(BlockDriverState *bs,
>                                  &client->size,
>                                  &client->structured_reply,
>                                  bitmap_name,
> -                                &client->bitmap_ok, errp);
> +                                &client->bitmap_ok,
> +                                &client->block_status_ok, errp);

That's a lot of parameters we're adding; I'm wondering if its smarter to
start passing things through via a struct.  In fact, I have posted
patches previously (and need to rebase and repost them) that introduce
such a struct, in order to make the INFO extension viable.

> @@ -681,11 +681,19 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
>                                                false, NULL) == 0;
>              }
>  
> -            if (!!structured_reply && *structured_reply && !!bitmap_name) {
> +            if (!!structured_reply && *structured_reply) {
>                  int ret;
> -                assert(!!bitmap_ok);
> -                ret = nbd_receive_query_bitmap(ioc, name, bitmap_name,
> -                                               bitmap_ok, errp) == 0;
> +
> +                if (!!bitmap_name) {
> +                    assert(!!bitmap_ok);
> +                    ret = nbd_receive_query_bitmap(ioc, name, bitmap_name,
> +                                                   bitmap_ok, errp) == 0;
> +                } else {
> +                    ret = nbd_receive_query_meta_context(ioc, name,
> +                                                         "base:allocation",
> +                                                         block_status_ok,
> +                                                         errp);
> +                }

This looks weird trying to grab 'base:allocation' only if bitmap_name is
not provided.  The logic will probably have to be different if we want
to allow a client to track both pieces of information at once.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 04/18] nbd/client: refactor nbd_receive_starttls
  2017-02-09 14:41       ` Eric Blake
@ 2017-02-10 11:23         ` Vladimir Sementsov-Ogievskiy
  2017-02-11 19:30           ` Eric Blake
  0 siblings, 1 reply; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-02-10 11:23 UTC (permalink / raw)
  To: Eric Blake, qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, den, stefanha

09.02.2017 17:41, Eric Blake wrote:
> On 02/09/2017 12:20 AM, Vladimir Sementsov-Ogievskiy wrote:
>> 07.02.2017 19:32, Eric Blake wrote:
>>> On 02/03/2017 09:47 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>> Split out nbd_receive_simple_option to be reused for structured reply
>>>> option.
>>>> +    return "<unknown option>";
>>> Can you please consider making this include the %d representation of the
>>> unknown option; perhaps by snprintf'ing into static storage?  While it
>> Hmm.. The caller should free it in this case.
> Only if you print it into malloc'd space. I think that printing it into
> static storage may be sufficient (although then we have a race if more
> than one thread is trying to use that static storage at the same time -
> but do we ever have more than one thread trying to handle an error at
> the same time?).
>

This race would be if one thread decides to print two option names in 
one message. Or save one option in a var, then print other, then print var.



-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 04/18] nbd/client: refactor nbd_receive_starttls
  2017-02-10 11:23         ` Vladimir Sementsov-Ogievskiy
@ 2017-02-11 19:30           ` Eric Blake
  2017-02-20 16:14             ` Eric Blake
  0 siblings, 1 reply; 58+ messages in thread
From: Eric Blake @ 2017-02-11 19:30 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, den, stefanha

[-- Attachment #1: Type: text/plain, Size: 2103 bytes --]

On 02/10/2017 05:23 AM, Vladimir Sementsov-Ogievskiy wrote:
> 09.02.2017 17:41, Eric Blake wrote:
>> On 02/09/2017 12:20 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> 07.02.2017 19:32, Eric Blake wrote:
>>>> On 02/03/2017 09:47 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>>> Split out nbd_receive_simple_option to be reused for structured reply
>>>>> option.
>>>>> +    return "<unknown option>";
>>>> Can you please consider making this include the %d representation of
>>>> the
>>>> unknown option; perhaps by snprintf'ing into static storage?  While it
>>> Hmm.. The caller should free it in this case.
>> Only if you print it into malloc'd space. I think that printing it into
>> static storage may be sufficient (although then we have a race if more
>> than one thread is trying to use that static storage at the same time -
>> but do we ever have more than one thread trying to handle an error at
>> the same time?).
>>
> 
> This race would be if one thread decides to print two option names in
> one message. Or save one option in a var, then print other, then print var.

NBD_OPT_ are supposed to be handled sequentially. The NBD spec does NOT
allow for a single client to send a second NBD_OPT_ until after the
first one has received a final response.  So the only time we would have
two threads dealing with things in nbd/client.c during handshake is if
we have a single qemu process connecting as client to two different NBD
servers in parallel, where both servers issue an otherwise unknown opt
response at the same time, which I don't see as likely enough to be
worth avoiding static storage.

If you're still worried about the race (I'm not), to the point that you
don't want to use static storage just to avoid g_malloc(), then another
option is to make nbd_opt_name() take an input parameter for a buffer
and max size, and let the caller provide stack-based storage for
computing the resulting message (similar to how strerror_r does things).

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 05/18] nbd/client: fix drop_sync
  2017-02-06 23:17   ` Eric Blake
@ 2017-02-15 14:50     ` Eric Blake
  0 siblings, 0 replies; 58+ messages in thread
From: Eric Blake @ 2017-02-15 14:50 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: kwolf, famz, den, qemu-stable, armbru, mreitz, stefanha, pbonzini, jsnow

[-- Attachment #1: Type: text/plain, Size: 1401 bytes --]

On 02/06/2017 05:17 PM, Eric Blake wrote:
> On 02/03/2017 09:47 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Comparison symbol is misused. It may lead to memory corruption.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>  nbd/client.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Adding qemu-stable; this needs to be back-ported, and can be applied
> independently from your series.
> 
> Reviewed-by: Eric Blake <eblake@redhat.com>

For the record, this is now assigned CVE-2017-2630.  My apologies for
introducing the bug in the first place (commit 7d3123e).  The maintainer
may want to touch up the commit message to give those further details,
since it is security-related.

> 
>>
>> diff --git a/nbd/client.c b/nbd/client.c
>> index 6caf6bda6d..351731bc63 100644
>> --- a/nbd/client.c
>> +++ b/nbd/client.c
>> @@ -94,7 +94,7 @@ static ssize_t drop_sync(QIOChannel *ioc, size_t size)
>>      char small[1024];
>>      char *buffer;
>>  
>> -    buffer = sizeof(small) < size ? small : g_malloc(MIN(65536, size));
>> +    buffer = sizeof(small) > size ? small : g_malloc(MIN(65536, size));
>>      while (size > 0) {
>>          ssize_t count = read_sync(ioc, buffer, MIN(65536, size));
>>  
>>
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 06/18] nbd/client: refactor drop_sync
  2017-02-08  7:55     ` Vladimir Sementsov-Ogievskiy
@ 2017-02-15 16:52       ` Paolo Bonzini
  0 siblings, 0 replies; 58+ messages in thread
From: Paolo Bonzini @ 2017-02-15 16:52 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Eric Blake, qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, armbru, den, stefanha



On 08/02/2017 08:55, Vladimir Sementsov-Ogievskiy wrote:
> 07.02.2017 02:19, Eric Blake wrote:
>> On 02/03/2017 09:47 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Return 0 on success to simplify success checking.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>  nbd/client.c | 35 +++++++++++++++++++----------------
>>>  1 file changed, 19 insertions(+), 16 deletions(-)
>> I'm not sure that this simplifies anything.  You have a net addition in
>> lines of code, so unless some later patch is improved because of this,
>> I'm inclined to say this is needless churn.
>>
> 
> I just dislike duplicating information like "drop_sync(ioc, 124) !=
> 124". In the code there is no place where positive and not equal to size
> return value actually handled. But it is not so important, if you are
> against i'll drop this, no problem.

I think I agree with Vladimir.

> One note: I don't have good understanding of the following: actually
> read can return positive value < queried size, which means that we
> should read again. But it is not handled in the code (handled, but just
> as an error), except drop_sync.. (With drop_sync it is side effect of
> using limited buffer size, yes?). Is it all ok?

It is handled in nbd_wr_syncv.

Paolo

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 07/18] nbd: Minimal structured read for client
  2017-02-07 20:14   ` Eric Blake
@ 2017-02-15 16:54     ` Paolo Bonzini
  2017-08-01 15:41     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 58+ messages in thread
From: Paolo Bonzini @ 2017-02-15 16:54 UTC (permalink / raw)
  To: Eric Blake, Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, armbru, den, stefanha

[-- Attachment #1: Type: text/plain, Size: 22358 bytes --]



On 07/02/2017 21:14, Eric Blake wrote:
> On 02/03/2017 09:47 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Minimal implementation: always send DF flag, to not deal with fragmented
>> replies.
> 
> This works well with your minimal server implementation, but I worry
> that it will cause us to fall over when talking to a fully-compliant
> server that chooses to send EOVERFLOW errors for any request larger than
> 64k when DF is set; it also makes it impossible to benefit from sparse
> reads.  I guess that means we need to start thinking about followup
> patches to flush out our implementation.  But maybe I can live with this
> patch as is, since the goal of your series was not so much the full
> power of structured reads, but getting to a point where we could use
> structured reply for block status, even if it means your client can only
> communicate with qemu-nbd as server for now, as long as we do get to the
> rest of the patches for a full-blown structured read.

Can you post a diff that expresses this as a comment?  I'll squash the
comment into this commit.

>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>  block/nbd-client.c  |  47 +++++++++++----
>>  block/nbd-client.h  |   2 +
>>  include/block/nbd.h |  15 +++--
>>  nbd/client.c        | 170 ++++++++++++++++++++++++++++++++++++++++++++++------
>>  qemu-nbd.c          |   2 +-
>>  5 files changed, 203 insertions(+), 33 deletions(-)
> 
> Hmm - no change to the testsuite. Structured reads seems like the sort
> of thing that it would be nice to test with some canned server replies,
> particularly with server behavior that is permitted by the NBD protocol
> but does not happen by default in qemu-nbd.

This would require implementing some kind of mock server support.  That
would be a very good thing but not something we have much infrastructure
for (you could use either a real socket or a mock QIOChannel).

Thanks,

Paolo

>>
>> diff --git a/block/nbd-client.c b/block/nbd-client.c
>> index 3779c6c999..ff96bd1635 100644
>> --- a/block/nbd-client.c
>> +++ b/block/nbd-client.c
>> @@ -180,13 +180,20 @@ static void nbd_co_receive_reply(NBDClientSession *s,
>>      *reply = s->reply;
>>      if (reply->handle != request->handle ||
>>          !s->ioc) {
>> +        reply->simple = true;
>>          reply->error = EIO;
> 
> I don't think this is quite right - by setting reply->simple to true,
> you are forcing the caller to treat this as the final packet related to
> this request->handle, even though that might not be the case.
> 
> As it is, I wonder if this code is correct, even before your patch - the
> server is allowed to give responses out-of-order (if we request multiple
> reads without waiting for the first response) - I don't see how setting
> reply->error to EIO if the request->handle indicates that we are
> receiving an out-of-order response to some other packet, but that our
> request is still awaiting traffic.
> 
>>      } else {
>> -        if (qiov && reply->error == 0) {
>> -            ret = nbd_wr_syncv(s->ioc, qiov->iov, qiov->niov, request->len,
>> -                               true);
>> -            if (ret != request->len) {
>> -                reply->error = EIO;
>> +        if (qiov) {
>> +            if ((reply->simple ? reply->error == 0 :
>> +                         reply->type == NBD_REPLY_TYPE_OFFSET_DATA)) {
>> +                ret = nbd_wr_syncv(s->ioc, qiov->iov, qiov->niov, request->len,
>> +                                   true);
> 
> This works only because you used the DF flag.  If we allow fragmenting,
> then you have to be careful to write the reply into the correct offset
> of the iov.
> 
>> +                if (ret != request->len) {
>> +                    reply->error = EIO;
>> +                }
>> +            } else if (!reply->simple &&
>> +                       reply->type == NBD_REPLY_TYPE_OFFSET_HOLE) {
>> +                qemu_iovec_memset(qiov, 0, 0, request->len);
>>              }
> 
> Up to here, you didn't do any inspection for NBD_REPLY_FLAG_DONE (so you
> don't know if this is the last packet the server is sending for this
> reqeust->handle), and didn't do any special casing for
> NBD_REPLY_TYPE_NONE or for the various error replies.  I'm not sure if
> this will always do what you want.  In fact, I'm not even sure if
> reply->error is set correctly for all structured packets.
> 
>>          }
>>  
>> @@ -227,6 +234,7 @@ int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,
>>          .type = NBD_CMD_READ,
>>          .from = offset,
>>          .len = bytes,
>> +        .flags = client->structured_reply ? NBD_CMD_FLAG_DF : 0,
>>      };
>>      NBDReply reply;
>>      ssize_t ret;
>> @@ -237,12 +245,30 @@ int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,
>>      nbd_coroutine_start(client, &request);
>>      ret = nbd_co_send_request(bs, &request, NULL);
>>      if (ret < 0) {
>> -        reply.error = -ret;
>> -    } else {
>> -        nbd_co_receive_reply(client, &request, &reply, qiov);
>> +        goto out;
>>      }
>> +
>> +    nbd_co_receive_reply(client, &request, &reply, qiov);
>> +    if (reply.error != 0) {
>> +        ret = -reply.error;
>> +    }
>> +
>> +    if (!reply.simple) {
>> +        while (!(reply.flags & NBD_REPLY_FLAG_DONE)) {
>> +            nbd_co_receive_reply(client, &request, &reply, qiov);
>> +            if (reply.error != 0) {
>> +                ret = -reply.error;
>> +            }
>> +            if (reply.simple) {
> 
> Hmm. It looks like this part of the loop is only triggered if
> nbd_co_receive_reply() detects a handle mismatch and slams reply.simple
> to true.  As long as we use the DF flag, it looks like the server should
> never send an error packet followed by a data packet, and your
> particular server implementation always set the DONE flag on the error
> packet, so it got past your testing.  But if we don't rely on the DF
> flag, a server could reasonable send an ERROR_OFFSET packet for half the
> buffer, followed by a data packet for the other half of the buffer,
> which may wipe out reply.error from the error packet.
> 
>> +                ret = -EIO;
>> +                goto out;
>> +            }
>> +        }
>> +    }
>> +
>> +out:
>>      nbd_coroutine_end(client, &request);
>> -    return -reply.error;
>> +    return ret;
>>  }
>>  
>>  int nbd_client_co_pwritev(BlockDriverState *bs, uint64_t offset,
>> @@ -408,7 +434,8 @@ int nbd_client_init(BlockDriverState *bs,
>>                                  &client->nbdflags,
>>                                  tlscreds, hostname,
>>                                  &client->ioc,
>> -                                &client->size, errp);
>> +                                &client->size,
>> +                                &client->structured_reply, errp);
>>      if (ret < 0) {
>>          logout("Failed to negotiate with the NBD server\n");
>>          return ret;
>> diff --git a/block/nbd-client.h b/block/nbd-client.h
>> index f8d6006849..cba1f965bf 100644
>> --- a/block/nbd-client.h
>> +++ b/block/nbd-client.h
>> @@ -32,6 +32,8 @@ typedef struct NBDClientSession {
>>      NBDReply reply;
>>  
>>      bool is_unix;
>> +
>> +    bool structured_reply;
>>  } NBDClientSession;
>>  
>>  NBDClientSession *nbd_get_client_session(BlockDriverState *bs);
>> diff --git a/include/block/nbd.h b/include/block/nbd.h
>> index 58b864f145..dae2e4bd03 100644
>> --- a/include/block/nbd.h
>> +++ b/include/block/nbd.h
> 
> Can you add the use of an order file to list .h files first in your
> diffs?  See
> https://lists.gnu.org/archive/html/qemu-devel/2016-12/msg00288.html for
> tips.
> 
>> @@ -57,11 +57,16 @@ struct NBDRequest {
>>  };
>>  typedef struct NBDRequest NBDRequest;
>>  
>> -struct NBDReply {
>> +typedef struct NBDReply {
>> +    bool simple;
>>      uint64_t handle;
>>      uint32_t error;
>> -};
>> -typedef struct NBDReply NBDReply;
>> +
>> +    uint16_t flags;
>> +    uint16_t type;
>> +    uint32_t length;
>> +    uint64_t offset;
>> +} NBDReply;
> 
> I don't know if this is the best way to represent things; I might have
> used a union type, since not all fields are valid in all reply packets.
> 
>>  
>>  struct NBDSimpleReply {
>>      /* uint32_t NBD_SIMPLE_REPLY_MAGIC */
>> @@ -169,10 +174,10 @@ ssize_t nbd_wr_syncv(QIOChannel *ioc,
>>  int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
>>                            QCryptoTLSCreds *tlscreds, const char *hostname,
>>                            QIOChannel **outioc,
>> -                          off_t *size, Error **errp);
>> +                          off_t *size, bool *structured_reply, Error **errp);
>>  int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t flags, off_t size);
>>  ssize_t nbd_send_request(QIOChannel *ioc, NBDRequest *request);
>> -ssize_t nbd_receive_reply(QIOChannel *ioc, NBDReply *reply);
>> +int nbd_receive_reply(QIOChannel *ioc, NBDReply *reply);
>>  int nbd_client(int fd);
>>  int nbd_disconnect(int fd);
>>  
>> diff --git a/nbd/client.c b/nbd/client.c
>> index 1c274f3012..9225f7e30d 100644
>> --- a/nbd/client.c
>> +++ b/nbd/client.c
>> @@ -472,11 +472,10 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
>>      return QIO_CHANNEL(tioc);
>>  }
>>  
>> -
>>  int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
>>                            QCryptoTLSCreds *tlscreds, const char *hostname,
>>                            QIOChannel **outioc,
>> -                          off_t *size, Error **errp)
>> +                          off_t *size, bool *structured_reply, Error **errp)
>>  {
>>      char buf[256];
>>      uint64_t magic, s;
>> @@ -584,6 +583,12 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
>>              if (nbd_receive_query_exports(ioc, name, errp) < 0) {
>>                  goto fail;
>>              }
>> +
>> +            if (structured_reply != NULL) {
>> +                *structured_reply =
>> +                    nbd_receive_simple_option(ioc, NBD_OPT_STRUCTURED_REPLY,
>> +                                              false, NULL) == 0;
> 
> Okay, you're allowing the server to reject the option, in which case we
> set structured_reply to false.  But re-reading patch 4/18,
> nbd_receive_simple_option() can return -1 for multiple reasons, some of
> them where it is still in sync (for sending more options), but others
> where it is out of sync (such as failure to write, in which case the
> connection MUST be dropped rather than trying to carry on).  I don't
> think this handles errors correctly, and therefore I'm not even sure
> that the refactoring in 4/18 is correct.
> 
> I think you may be better off with nbd_receive_simple_option() in 4/18
> being tri-state: return -1 if the connection is unrecoverable (such as
> after a write or read error, where we must not send or receive any more
> data), 0 if the server replied with an error but the connection is still
> in sync for trying something else, and 1 if the server replies with
> success.  Then this code should check if the return is < 0 (kill
> negotiation), == 0 (*structured_reply = false), or == 1
> (*structured_reply = true).
> 
>> +            }
>>          }
>>          /* write the export name request */
>>          if (nbd_send_option_request(ioc, NBD_OPT_EXPORT_NAME, -1, name,
>> @@ -603,6 +608,14 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
>>              goto fail;
>>          }
>>          be16_to_cpus(flags);
>> +
>> +        if (!!structured_reply && *structured_reply &&
> 
> Why do you need !! to coerce to bool, when && also coerces to bool?
> 
>> +            !(*flags & NBD_CMD_FLAG_DF))
>> +        {
>> +            error_setg(errp, "Structured reply is negotiated, "
>> +                             "but DF flag is not.");
> 
> No trailing '.' for error_setg() messages.
> 
> Also, I'm not quite sure the NBD protocol allows an implementation that
> supports structured read but does not support the DF flag.  Maybe that's
> an NBD spec bug that we should get clarified.  (Ideally, the server
> always advertises the DF flag if structured replies are negotiated,
> because the common implementation of user-space handshake followed by
> kernel transmission phase works best when the already-existing
> ioctl(NBD_SET_FLAGS) can then be used to tell the kernel to use/expect
> structure replies.)
> 
>> +            goto fail;
>> +        }
>>      } else if (magic == NBD_CLIENT_MAGIC) {
>>          uint32_t oldflags;
>>  
>> @@ -790,20 +803,33 @@ ssize_t nbd_send_request(QIOChannel *ioc, NBDRequest *request)
>>      return 0;
>>  }
>>  
>> -ssize_t nbd_receive_reply(QIOChannel *ioc, NBDReply *reply)
>> +static inline int read_sync_check(QIOChannel *ioc, void *buffer, size_t size)
>>  {
>> -    uint8_t buf[NBD_REPLY_SIZE];
>> -    uint32_t magic;
>>      ssize_t ret;
>>  
>> -    ret = read_sync(ioc, buf, sizeof(buf));
>> +    ret = read_sync(ioc, buffer, size);
>>      if (ret < 0) {
>>          return ret;
>>      }
>> -
>> -    if (ret != sizeof(buf)) {
>> +    if (ret != size) {
>>          LOG("read failed");
>> -        return -EINVAL;
>> +        return -EIO;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +/* nbd_receive_simple_reply
>> + * Read simple reply except magic field (which should be already read)
>> + */
>> +static int nbd_receive_simple_reply(QIOChannel *ioc, NBDReply *reply)
>> +{
>> +    uint8_t buf[NBD_REPLY_SIZE - 4];
>> +    ssize_t ret;
>> +
>> +    ret = read_sync_check(ioc, buf, sizeof(buf));
> 
> This patch is fairly long; I might have split the creation and initial
> use of the read_sync_check() function into its own patch.
> 
>> +    if (ret < 0) {
>> +        return ret;
>>      }
>>  
>>      /* Reply
>> @@ -812,9 +838,124 @@ ssize_t nbd_receive_reply(QIOChannel *ioc, NBDReply *reply)
>>         [ 7 .. 15]    handle
>>       */
>>  
>> -    magic = ldl_be_p(buf);
>> -    reply->error  = ldl_be_p(buf + 4);
>> -    reply->handle = ldq_be_p(buf + 8);
>> +    reply->error  = ldl_be_p(buf);
>> +    reply->handle = ldq_be_p(buf + 4);
>> +
>> +    return 0;
>> +}
>> +
>> +/* nbd_receive_structured_reply_chunk
>> + * Read structured reply chunk except magic field (which should be already read)
>> + * Data for NBD_REPLY_TYPE_OFFSET_DATA is not read too.
>> + * Length field of reply out parameter corresponds to unread part of reply.
> 
> Awkward wording; maybe:
> 
> Read the header portion of a structured reply chunk (except for magic,
> which should already be read).  On success, the length field of reply
> corresponds to number of bytes in the reply that still need to be read.
> 
>> + */
>> +static int nbd_receive_structured_reply_chunk(QIOChannel *ioc, NBDReply *reply)
>> +{
>> +    NBDStructuredReplyChunk chunk;
>> +    ssize_t ret;
>> +    uint16_t message_size;
>> +
>> +    ret = read_sync_check(ioc, (uint8_t *)&chunk + sizeof(chunk.magic),
> 
> This looks like some ugly pointer math.  I wonder if you'd be better
> served by having a separate packed struct without the magic field,
> particularly since you have to read the magic in first to decide whether
> to dispatch to simple/structured for the rest of the structure.  In
> other words, note how patch 2/18 omits a uint32_t magic in
> NBDSimpleReply; maybe patch 3/18 could do the same with
> NBDStructuredReplyChunk.  Then again, that would impact how you code
> things for subcasses like NBDStructuredRead, so I don't know how much
> churn to request here.
> 
>> +                          sizeof(chunk) - sizeof(chunk.magic));
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +
>> +    reply->flags = be16_to_cpu(chunk.flags);
>> +    reply->type = be16_to_cpu(chunk.type);
>> +    reply->handle = be64_to_cpu(chunk.handle);
>> +    reply->length = be32_to_cpu(chunk.length);
>> +
>> +    switch (reply->type) {
>> +    case NBD_REPLY_TYPE_NONE:
>> +        break;
>> +    case NBD_REPLY_TYPE_OFFSET_DATA:
>> +    case NBD_REPLY_TYPE_OFFSET_HOLE:
>> +        ret = read_sync_check(ioc, &reply->offset, sizeof(reply->offset));
>> +        if (ret < 0) {
>> +            return ret;
>> +        }
>> +        be64_to_cpus(&reply->offset);
>> +        reply->length -= sizeof(reply->offset);
>> +        break;
>> +    case NBD_REPLY_TYPE_ERROR:
>> +    case NBD_REPLY_TYPE_ERROR_OFFSET:
> 
> Here, I think that you want:
> 
> default:
> if (reply->type & 0x8000) {
> 
> rather than specific NBD_REPLY_TYPE_ERROR labels, so that you can
> gracefully handle ALL error packets sent by a server even if you haven't
> explicitly coded for them (a good server should probably not be sending
> an error packet you don't recognize, but the protocol also went to good
> efforts to describe a common behavior of all error packets).
> 
> /me reads on...
> 
>> +        ret = read_sync_check(ioc, &reply->error, sizeof(reply->error));
>> +        if (ret < 0) {
>> +            return ret;
>> +        }
>> +        be32_to_cpus(&reply->error);
>> +
>> +        ret = read_sync_check(ioc, &message_size, sizeof(message_size));
>> +        if (ret < 0) {
>> +            return ret;
>> +        }
>> +        be16_to_cpus(&message_size);
>> +
>> +        if (message_size > 0) {
>> +            /* TODO: provide error message to user */
>> +            ret = drop_sync(ioc, message_size);
>> +            if (ret < 0) {
>> +                return ret;
>> +            }
>> +        }
>> +
>> +        if (reply->type == NBD_REPLY_TYPE_ERROR_OFFSET) {
>> +            /* drop 64bit offset */
>> +            ret = drop_sync(ioc, 8);
>> +            if (ret < 0) {
>> +                return ret;
>> +            }
>> +        }
>> +        break;
>> +    default:
>> +        if (reply->type & (1 << 15)) {
> 
> ...oh, you did, mostly.
> 
>> +            /* unknown error */
>> +            ret = drop_sync(ioc, reply->length);
> 
> Still, even if it is an unknown error, you should still be able to
> populate reply->error, rather than ignoring it.
> 
>> +            if (ret < 0) {
>> +                return ret;
>> +            }
>> +
>> +            reply->error = NBD_EINVAL;
> 
> Meanwhile, you should also probably ensure that reply->error is non-zero
> even if the server was buggy and sent an error flag without telling you
> an error value (your choice of EINVAL seems reasonable).
> 
>> +            reply->length = 0;
>> +        } else {
>> +            /* unknown non-error reply type */
>> +            return -EINVAL;
>> +        }
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +int nbd_receive_reply(QIOChannel *ioc, NBDReply *reply)
>> +{
>> +    uint32_t magic;
>> +    int ret;
>> +
>> +    ret = read_sync_check(ioc, &magic, sizeof(magic));
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +
>> +    be32_to_cpus(&magic);
>> +
>> +    switch (magic) {
>> +    case NBD_SIMPLE_REPLY_MAGIC:
>> +        reply->simple = true;
>> +        ret = nbd_receive_simple_reply(ioc, reply);
>> +        break;
>> +    case NBD_STRUCTURED_REPLY_MAGIC:
>> +        reply->simple = false;
>> +        ret = nbd_receive_structured_reply_chunk(ioc, reply);
>> +        break;
>> +    default:
>> +        LOG("invalid magic (got 0x%" PRIx32 ")", magic);
>> +        return -EINVAL;
> 
> This part looks good.
> 
>> +    }
>> +
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>>  
>>      reply->error = nbd_errno_to_system_errno(reply->error);
>>  
>> @@ -827,10 +968,5 @@ ssize_t nbd_receive_reply(QIOChannel *ioc, NBDReply *reply)
>>            ", handle = %" PRIu64" }",
>>            magic, reply->error, reply->handle);
>>  
>> -    if (magic != NBD_SIMPLE_REPLY_MAGIC) {
>> -        LOG("invalid magic (got 0x%" PRIx32 ")", magic);
>> -        return -EINVAL;
>> -    }
>>      return 0;
>>  }
>> -
>> diff --git a/qemu-nbd.c b/qemu-nbd.c
> 
> Why are you deleting a blank line from the end of the file? Is that
> rebase churn that should have been avoided in an earlier patch?  </me
> goes and reads> - oh, it's been like that since commit 798bfe000 created
> the file.  Thanks for cleaning it up (and I'm surprised checkpatch
> doesn't flag it).
> 
>> index c734f627b4..de0099e333 100644
>> --- a/qemu-nbd.c
>> +++ b/qemu-nbd.c
>> @@ -272,7 +272,7 @@ static void *nbd_client_thread(void *arg)
>>  
>>      ret = nbd_receive_negotiate(QIO_CHANNEL(sioc), NULL, &nbdflags,
>>                                  NULL, NULL, NULL,
>> -                                &size, &local_error);
>> +                                &size, NULL, &local_error);
> 
> We could reasonably allow new-style structured reads when we're handling
> the entire client side; but I agree that when qemu-nbd is used as the
> handshaking phase before handing things over to the kernel that we can't
> request structured reads, at least not until a) the kernel nbd module
> implements structured read supports, and b) we have a way to tell that
> the kernel is willing to accept structured read negotiation (and that's
> where my earlier comments about the DF flag being a witness of
> structured reads comes into play).
> 
> Umm, how does this patch compile?  You changed the signature of
> nbd_receive_negotiate() to add a parameter, but did not modify the call
> in block/nbd-client.c to pass the new parameter.  (So far, I've been
> reviewing based just on email content; I also plan on applying and
> testing your patches before all is said and done, but I sometimes
> surprise myself with my ability to do a quality read-only review even
> without spending time compiling things).
> 
>>      if (ret < 0) {
>>          if (local_error) {
>>              error_report_err(local_error);
>>
> 
> We'll definitely need a v2, but I'm grateful that you've tackled this work.
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 15/18] qmp: add block-dirty-bitmap-load
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 15/18] qmp: add block-dirty-bitmap-load Vladimir Sementsov-Ogievskiy
  2017-02-07 12:04   ` Fam Zheng
  2017-02-09 15:18   ` Eric Blake
@ 2017-02-15 16:57   ` Paolo Bonzini
  2 siblings, 0 replies; 58+ messages in thread
From: Paolo Bonzini @ 2017-02-15 16:57 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, armbru, eblake, den, stefanha



On 03/02/2017 16:47, Vladimir Sementsov-Ogievskiy wrote:
>  ##
> +# @block-dirty-bitmap-load:
> +#
> +# Load a dirty bitmap from the storage (qcow2 file or nbd export)

NBD export only in upstream QEMU?

Paolo

> +# Returns: nothing on success
> +#          If @node is not a valid block device, DeviceNotFound
> +#          If @name is not found, GenericError with an explanation
> +#

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 08/18] hbitmap: add next_zero function
  2017-02-07 22:55   ` Eric Blake
@ 2017-02-15 16:57     ` Paolo Bonzini
  0 siblings, 0 replies; 58+ messages in thread
From: Paolo Bonzini @ 2017-02-15 16:57 UTC (permalink / raw)
  To: Eric Blake, Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, armbru, den, stefanha

[-- Attachment #1: Type: text/plain, Size: 632 bytes --]



On 07/02/2017 23:55, Eric Blake wrote:
> On 02/03/2017 09:47 AM, Vladimir Sementsov-Ogievskiy wrote:
>> The function searches for next zero bit.
>> Also add interface for BdrvDirtyBitmap.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>  block/dirty-bitmap.c         |  5 +++++
>>  include/block/dirty-bitmap.h |  2 ++
>>  include/qemu/hbitmap.h       |  8 ++++++++
>>  util/hbitmap.c               | 26 ++++++++++++++++++++++++++
>>  4 files changed, 41 insertions(+)
>>
> 
> It would be nice to enhance the testsuite to cover hbitmap_next_zero().
> 

Agreed.

Paolo


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 17/18] nbd: BLOCK_STATUS for standard get_block_status function: server part
  2017-02-09 15:38   ` Eric Blake
@ 2017-02-15 17:02     ` Paolo Bonzini
  2017-02-15 17:02     ` Paolo Bonzini
  1 sibling, 0 replies; 58+ messages in thread
From: Paolo Bonzini @ 2017-02-15 17:02 UTC (permalink / raw)
  To: Eric Blake, Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, armbru, den, stefanha

[-- Attachment #1: Type: text/plain, Size: 1633 bytes --]



On 09/02/2017 16:38, Eric Blake wrote:
>> +static int blockstatus_to_extent(BlockDriverState *bs, uint64_t offset,
>> +                                  uint64_t length, NBDExtent *extent)
>> +{
>> +    BlockDriverState *file;
>> +    uint64_t start_sector = offset >> BDRV_SECTOR_BITS;
>> +    uint64_t last_sector = (offset + length - 1) >> BDRV_SECTOR_BITS;
> Converting from bytes to sectors by rounding...
> 
>> +    uint64_t begin = start_sector;
>> +    uint64_t end = last_sector + 1;
>> +
>> +    int nb = MIN(INT_MAX, end - begin);
>> +    int64_t ret = bdrv_get_block_status_above(bs, NULL, begin, nb, &nb, &file);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +
>> +    extent->flags =
>> +        cpu_to_be32((ret & BDRV_BLOCK_ALLOCATED ? 0 : NBD_STATE_HOLE) |
>> +                    (ret & BDRV_BLOCK_ZERO      ? NBD_STATE_ZERO : 0));
>> +    extent->length = cpu_to_be32((nb << BDRV_SECTOR_BITS) -
>> +                                 (offset - (start_sector << BDRV_SECTOR_BITS)));
> ...then computing the length by undoing the rounding. I really think we
> should consider fixing bdrv_get_block_status_above() to be byte-based,
> but that's a separate series.  Your calculations look correct in the
> meantime, although '(offset & (BDRV_SECTOR_SIZE - 1))' may be a bit
> easier to read than '(offset - (start_sector << BDRV_SECTOR_BITS))'.

Agreed.  And please make it a separate variable, i.e.

    uint64_t length;

    length = (nb << BDRV_SECTOR_BITS) - (offset & BDRV_SECTOR_SIZE - 1);
    ...
    extent->length = cpu_to_be32(length);

Paolo

Paolo


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 17/18] nbd: BLOCK_STATUS for standard get_block_status function: server part
  2017-02-09 15:38   ` Eric Blake
  2017-02-15 17:02     ` Paolo Bonzini
@ 2017-02-15 17:02     ` Paolo Bonzini
  1 sibling, 0 replies; 58+ messages in thread
From: Paolo Bonzini @ 2017-02-15 17:02 UTC (permalink / raw)
  To: Eric Blake, Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, armbru, den, stefanha

[-- Attachment #1: Type: text/plain, Size: 611 bytes --]



On 09/02/2017 16:38, Eric Blake wrote:
> Umm, why are we sending only one status? If the client requests two ids
> during NBD_OPT_SET_META_CONTEXT, we should be able to provide both
> pieces of information at once.  For a minimal implementation, it works
> for proof of concept, but it is pretty restrictive to tell clients that
> they can only request a single status context.  I'm fine if we add that
> functionality in a later patch, but we'd better have the implementation
> ready for the same release as this patch (I still think 2.9 is a
> reasonable goal).

Agreed on this too.

Paolo


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 18/18] nbd: BLOCK_STATUS for standard get_block_status function: client part
  2017-02-09 16:00   ` Eric Blake
@ 2017-02-15 17:04     ` Paolo Bonzini
  0 siblings, 0 replies; 58+ messages in thread
From: Paolo Bonzini @ 2017-02-15 17:04 UTC (permalink / raw)
  To: Eric Blake, Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, armbru, den, stefanha

[-- Attachment #1: Type: text/plain, Size: 1184 bytes --]



On 09/02/2017 17:00, Eric Blake wrote:
>> +    if (!client->block_status_ok) {
>> +        *pnum = nb_sectors;
>> +        ret = BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED;
>> +        if (bs->drv->protocol_name) {

This condition is always true, I think?

>> +            ret |= BDRV_BLOCK_OFFSET_VALID | (sector_num * BDRV_SECTOR_SIZE);
>> +        }
>> +        return ret;
>> +    }
> Looks like a sane fallback when we don't have anything more accurate.

>> +
>> +    ret = nbd_client_co_cmd_block_status(bs, sector_num << BDRV_SECTOR_BITS,
>> +                                         nb_sectors << BDRV_SECTOR_BITS,
>> +                                         &extents, &nb_extents);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +
>> +    *pnum = extents[0].length >> BDRV_SECTOR_BITS;
>> +    ret = (extents[0].flags & NBD_STATE_HOLE ? 0 : BDRV_BLOCK_ALLOCATED) |
>> +          (extents[0].flags & NBD_STATE_ZERO ? BDRV_BLOCK_ZERO : 0);
>> +
>> +    if ((ret & BDRV_BLOCK_ALLOCATED) && !(ret & BDRV_BLOCK_ZERO)) {
>> +        ret |= BDRV_BLOCK_DATA;
>> +    }

You can always return BDRV_BLOCK_OFFSET_VALID here, too.

Paolo


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 00/18] nbd: BLOCK_STATUS
  2017-02-03 15:47 [Qemu-devel] [PATCH 00/18] nbd: BLOCK_STATUS Vladimir Sementsov-Ogievskiy
                   ` (17 preceding siblings ...)
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 18/18] nbd: BLOCK_STATUS for standard get_block_status function: client part Vladimir Sementsov-Ogievskiy
@ 2017-02-15 17:05 ` Paolo Bonzini
  2017-07-13 11:10   ` Eric Blake
  18 siblings, 1 reply; 58+ messages in thread
From: Paolo Bonzini @ 2017-02-15 17:05 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, armbru, eblake, den, stefanha



On 03/02/2017 16:47, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> We really need exporting dirty bitmaps feature as well as remote
> get_block_status for nbd devices. So, here is minimalistic and restricted
> realization of 'structured reply' and 'block status' nbd protocol extension
> (as second is developed over the first the combined spec may be found here:
> https://github.com/NetworkBlockDevice/nbd/blob/extension-blockstatus/doc/proto.md)
> 
> What is done:
> 
> server:
>  - can send not fragmented structured replies for CMD_READ (this was done only
>    because the spec doesn't allow structure reply feature without maintaining
>    structured read)
>  - can export dirty bitmap through BLOCK_STATUS. Only one bitmap can be exported,
>    negotiation query should be 'qemu-dirty-bitmap:<bitmap name>'
>  - cab export block status through BLOCK_STATUS. Client can negotiate only one
>    entity for exporting through BLOCK_STATUS - bitmap _or_ block status.
>    negotiation query should be 'base:allocation', as defined in the spec.
>    server sends only one extent on each BLOCK_STATUS query.
> 
> client:
>  - can receive not fragmented structured replies for CMD_READ
>  - can get load dirty bitmap through nbd. Be careful: bitmap for export is
>    is selected during nbd negotiation - actually in open(). So, name argument for
>    qmp block-dirty-bitmap-load is just a _new_ name for loaded bitmap.
>    (any way, for us block-dirty-bitmap-load is just a way to test the feature,
>     really we need only server part)
>  - get_block_status works now through nbd CMD_BLOCK_STATUS, if base:allocation is
>    negotiated for export.
> 
> It should be minimal but fully compatible realization of the spec.

This will require v2, but it seems pretty close already.

Paolo

> web: https://src.openvz.org/users/vsementsov/repos/qemu/browse?at=nbd-block-status-v1
> git: https://src.openvz.org/scm/~vsementsov/qemu.git (tag nbd-block-status-v1)
> 
> Vladimir Sementsov-Ogievskiy (18):
>   nbd: rename NBD_REPLY_MAGIC to NBD_SIMPLE_REPLY_MAGIC
>   nbd-server: refactor simple reply sending
>   nbd: Minimal structured read for server
>   nbd/client: refactor nbd_receive_starttls
>   nbd/client: fix drop_sync
>   nbd/client: refactor drop_sync
>   nbd: Minimal structured read for client
>   hbitmap: add next_zero function
>   block/dirty-bitmap: add bdrv_dirty_bitmap_next()
>   block/dirty-bitmap: add bdrv_load_dirty_bitmap
>   nbd: BLOCK_STATUS for bitmap export: server part
>   nbd: BLOCK_STATUS for bitmap export: client part
>   nbd: add nbd_dirty_bitmap_load
>   qmp: add x-debug-block-dirty-bitmap-sha256
>   qmp: add block-dirty-bitmap-load
>   iotests: add test for nbd dirty bitmap export
>   nbd: BLOCK_STATUS for standard get_block_status function: server part
>   nbd: BLOCK_STATUS for standard get_block_status function: client part
> 
>  block/dirty-bitmap.c                     |  70 ++++
>  block/nbd-client.c                       | 230 ++++++++++-
>  block/nbd-client.h                       |  13 +
>  block/nbd.c                              |  44 +-
>  blockdev.c                               |  57 +++
>  include/block/block_int.h                |   4 +
>  include/block/dirty-bitmap.h             |  10 +
>  include/block/nbd.h                      |  73 +++-
>  include/qemu/hbitmap.h                   |  16 +
>  nbd/client.c                             | 373 ++++++++++++++---
>  nbd/nbd-internal.h                       |  25 +-
>  nbd/server.c                             | 661 ++++++++++++++++++++++++++++++-
>  qapi/block-core.json                     |  46 ++-
>  qemu-nbd.c                               |   2 +-
>  tests/Makefile.include                   |   2 +-
>  tests/qemu-iotests/180                   | 133 +++++++
>  tests/qemu-iotests/180.out               |   5 +
>  tests/qemu-iotests/group                 |   1 +
>  tests/qemu-iotests/nbd-fault-injector.py |   4 +-
>  util/hbitmap.c                           |  37 ++
>  20 files changed, 1713 insertions(+), 93 deletions(-)
>  create mode 100755 tests/qemu-iotests/180
>  create mode 100644 tests/qemu-iotests/180.out
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 10/18] block/dirty-bitmap: add bdrv_load_dirty_bitmap
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 10/18] block/dirty-bitmap: add bdrv_load_dirty_bitmap Vladimir Sementsov-Ogievskiy
  2017-02-08 11:45   ` Fam Zheng
@ 2017-02-16 12:37   ` Denis V. Lunev
  1 sibling, 0 replies; 58+ messages in thread
From: Denis V. Lunev @ 2017-02-16 12:37 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, eblake, stefanha

On 02/03/2017 06:47 PM, Vladimir Sementsov-Ogievskiy wrote:
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/dirty-bitmap.c         | 53 ++++++++++++++++++++++++++++++++++++++++++++
>  include/block/block_int.h    |  4 ++++
>  include/block/dirty-bitmap.h |  3 +++
>  3 files changed, 60 insertions(+)
>
> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
> index 3b7db1d78c..394d4328d5 100644
> --- a/block/dirty-bitmap.c
> +++ b/block/dirty-bitmap.c
> @@ -545,3 +545,56 @@ BdrvDirtyBitmap *bdrv_dirty_bitmap_next(BlockDriverState *bs,
>      return bitmap == NULL ? QLIST_FIRST(&bs->dirty_bitmaps) :
>                              QLIST_NEXT(bitmap, list);
>  }
> +
> +typedef struct BDRVLoadBitmapCo {
> +    BlockDriverState *bs;
> +    const char *name;
> +    Error **errp;
> +    BdrvDirtyBitmap *ret;
> +    bool in_progress;
> +} BDRVLoadBitmapCo;
> +
> +static void bdrv_load_dity_bitmap_co_entry(void *opaque)
> +{
> +    BDRVLoadBitmapCo *lbco = opaque;
> +    BlockDriver *drv = lbco->bs->drv;
> +
> +    if (!!drv && !!drv->bdrv_dirty_bitmap_load) {
> +        lbco->ret = drv->bdrv_dirty_bitmap_load(lbco->bs, lbco->name,
> +                                                lbco->errp);
> +    } else if (lbco->bs->file)  {
> +        BlockDriverState *bs = lbco->bs;
> +        lbco->bs = lbco->bs->file->bs;
> +        bdrv_load_dity_bitmap_co_entry(lbco);
> +        if (lbco->ret != NULL) {
> +            QLIST_REMOVE(lbco->ret, list);
> +            QLIST_INSERT_HEAD(&bs->dirty_bitmaps, lbco->ret, list);
> +        }
> +    } else {
> +        lbco->ret = NULL;
> +    }
> +
> +    lbco->in_progress = false;
> +}
> +
> +BdrvDirtyBitmap *bdrv_load_dirty_bitmap(BlockDriverState *bs, const char *name,
> +                                        Error **errp)
> +{
I think that you'd better check bs->drv here and not
call coroutine in this case at all.


> +    Coroutine *co;
> +    BDRVLoadBitmapCo lbco = {
> +        .bs = bs,
> +        .name = name,
> +        .errp = errp,
> +        .in_progress = true
> +    };
> +
> +    if (qemu_in_coroutine()) {
> +        bdrv_load_dity_bitmap_co_entry(&lbco);
> +    } else {
> +        co = qemu_coroutine_create(bdrv_load_dity_bitmap_co_entry, &lbco);
> +        qemu_coroutine_enter(co);
> +        BDRV_POLL_WHILE(bs, lbco.in_progress);
> +    }
> +
> +    return lbco.ret;
> +}
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 83a423c580..d3770db539 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -222,6 +222,10 @@ struct BlockDriver {
>      int (*bdrv_get_info)(BlockDriverState *bs, BlockDriverInfo *bdi);
>      ImageInfoSpecific *(*bdrv_get_specific_info)(BlockDriverState *bs);
>  
> +    BdrvDirtyBitmap *(*bdrv_dirty_bitmap_load)(BlockDriverState *bs,
> +                                               const char *name,
> +                                               Error **errp);
> +
>      int coroutine_fn (*bdrv_save_vmstate)(BlockDriverState *bs,
>                                            QEMUIOVector *qiov,
>                                            int64_t pos);
> diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
> index ff8163ba02..c0c70a8c67 100644
> --- a/include/block/dirty-bitmap.h
> +++ b/include/block/dirty-bitmap.h
> @@ -77,4 +77,7 @@ int64_t bdrv_dirty_bitmap_next_zero(BdrvDirtyBitmap *bitmap, uint64_t start);
>  BdrvDirtyBitmap *bdrv_dirty_bitmap_next(BlockDriverState *bs,
>                                          BdrvDirtyBitmap *bitmap);
>  
> +BdrvDirtyBitmap *bdrv_load_dirty_bitmap(BlockDriverState *bs, const char *name,
> +                                        Error **errp);
> +
>  #endif

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 11/18] nbd: BLOCK_STATUS for bitmap export: server part
  2017-02-03 15:47 ` [Qemu-devel] [PATCH 11/18] nbd: BLOCK_STATUS for bitmap export: server part Vladimir Sementsov-Ogievskiy
  2017-02-08 13:13   ` Eric Blake
@ 2017-02-16 13:00   ` Denis V. Lunev
  1 sibling, 0 replies; 58+ messages in thread
From: Denis V. Lunev @ 2017-02-16 13:00 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, eblake, stefanha

On 02/03/2017 06:47 PM, Vladimir Sementsov-Ogievskiy wrote:
> Only one meta context type is defined: qemu-bitmap:<bitmap-name>.
> Maximum one query is allowed for NBD_OPT_{SET,LIST}_META_CONTEXT,
> NBD_REP_ERR_TOO_BIG is returned otherwise.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
...
> +static int nbd_negotiate_opt_meta_context_start(NBDClient *client, uint32_t opt,
> +                                                uint32_t length,
> +                                                uint32_t *nb_queries,
> +                                                BlockDriverState **bs)
> +{
> +    int ret;
> +    NBDExport *exp;
> +    char *export_name;
> +    int nb_read = 0;
> +
> +    if (!client->structured_reply) {
> +        uint32_t tail = length - nb_read;
> +        LOG("Structured reply is not negotiated");
> +
> +        if (nbd_negotiate_drop_sync(client->ioc, tail) != tail) {
> +            return -EIO;
> +        }
> +        ret = nbd_negotiate_send_rep_err(client->ioc, NBD_REP_ERR_INVALID, opt,
> +                                         "Structured reply is not negotiated");
> +        g_free(export_name);
export_name is not initialized here! for me there is no need to free
anything here


> +
> +        if (ret < 0) {
> +            return ret;
> +        } else {
> +            *bs = NULL;
> +            *nb_queries = 0;
> +            return length;
> +        }
> +    }
> +
> +    nb_read = nbd_negotiate_read_size_string(client->ioc, &export_name,
> +                                             NBD_MAX_NAME_SIZE);
> +    if (nb_read < 0) {
> +        return nb_read;
> +    }
> +
> +    exp = nbd_export_find(export_name);
> +    if (exp == NULL) {
> +        uint32_t tail = length - nb_read;
> +        LOG("export '%s' is not found", export_name);
> +
> +        if (nbd_negotiate_drop_sync(client->ioc, tail) != tail) {
export_name is leaked on this path

> +            return -EIO;
> +        }
> +        ret = nbd_negotiate_send_rep_err(client->ioc, NBD_REP_ERR_INVALID, opt,
> +                                         "export '%s' is not found",
> +                                         export_name);
> +        g_free(export_name);
> +

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 04/18] nbd/client: refactor nbd_receive_starttls
  2017-02-11 19:30           ` Eric Blake
@ 2017-02-20 16:14             ` Eric Blake
  0 siblings, 0 replies; 58+ messages in thread
From: Eric Blake @ 2017-02-20 16:14 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: kwolf, famz, den, armbru, mreitz, stefanha, pbonzini, jsnow

[-- Attachment #1: Type: text/plain, Size: 1176 bytes --]

On 02/11/2017 01:30 PM, Eric Blake wrote:
>>>>> On 02/03/2017 09:47 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> Split out nbd_receive_simple_option to be reused for structured reply
>>>>>> option.
>>>>>> +    return "<unknown option>";
>>>>> Can you please consider making this include the %d representation of
>>>>> the
>>>>> unknown option; perhaps by snprintf'ing into static storage?  While it

> If you're still worried about the race (I'm not), to the point that you
> don't want to use static storage just to avoid g_malloc(), then another
> option is to make nbd_opt_name() take an input parameter for a buffer
> and max size, and let the caller provide stack-based storage for
> computing the resulting message (similar to how strerror_r does things).

Or, as long as the caller prints the value as well as the reverse-mapped
name, that would work too.  I'm going to add such a reverse-mapping for
my NBD_INFO_* patches about to land later today, so I'll make my
counter-proposal for NBD_OPT_* along those lines as part of my series.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 03/18] nbd: Minimal structured read for server
  2017-02-06 23:01   ` Eric Blake
  2017-02-07 17:44     ` Paolo Bonzini
@ 2017-05-04 10:58     ` Vladimir Sementsov-Ogievskiy
  2017-05-04 13:28       ` Eric Blake
  1 sibling, 1 reply; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-05-04 10:58 UTC (permalink / raw)
  To: Eric Blake, qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, den, stefanha

07.02.2017 02:01, Eric Blake wrote:
> On 02/03/2017 09:47 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Minimal implementation of structured read: one data chunk + finishing
>> none chunk. No segmentation.
>> Minimal structured error implementation: no text message.
>> Support DF flag, but just ignore it, as there is no segmentation any
>> way.
> Might be worth adding that this is still an experimental extension to
> the NBD spec, and therefore that this implementation serves as proof of
> concept and may still need tweaking if anything major turns up before
> promoting it to stable.  It might also be worth a link to:
>
> https://github.com/NetworkBlockDevice/nbd/blob/extension-structured-reply/doc/proto.md
>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   include/block/nbd.h |  31 +++++++++++++
>>   nbd/nbd-internal.h  |   2 +
>>   nbd/server.c        | 125 ++++++++++++++++++++++++++++++++++++++++++++++++++--
>>   3 files changed, 154 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/block/nbd.h b/include/block/nbd.h
>> index 3c65cf8d87..58b864f145 100644
>> --- a/include/block/nbd.h
>> +++ b/include/block/nbd.h
>> @@ -70,6 +70,25 @@ struct NBDSimpleReply {
>>   };
>>   typedef struct NBDSimpleReply NBDSimpleReply;
>>   
>> +typedef struct NBDStructuredReplyChunk {
>> +    uint32_t magic;
>> +    uint16_t flags;
>> +    uint16_t type;
>> +    uint64_t handle;
>> +    uint32_t length;
>> +} QEMU_PACKED NBDStructuredReplyChunk;
>> +
>> +typedef struct NBDStructuredRead {
>> +    NBDStructuredReplyChunk h;
>> +    uint64_t offset;
>> +} QEMU_PACKED NBDStructuredRead;
>> +
>> +typedef struct NBDStructuredError {
>> +    NBDStructuredReplyChunk h;
>> +    uint32_t error;
>> +    uint16_t message_length;
>> +} QEMU_PACKED NBDStructuredError;
> Definitely a subset of all types added in the NBD protocol extension,
> but reasonable for a minimal implementation.  Might be worth adding
> comments to the types...

Hmm, for me their names looks descriptive enough, but my sight may be 
biased.. What kind of comments do you want?

[...]


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 03/18] nbd: Minimal structured read for server
  2017-05-04 10:58     ` Vladimir Sementsov-Ogievskiy
@ 2017-05-04 13:28       ` Eric Blake
  2017-05-05 11:30         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 58+ messages in thread
From: Eric Blake @ 2017-05-04 13:28 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, den, stefanha

[-- Attachment #1: Type: text/plain, Size: 1999 bytes --]

On 05/04/2017 05:58 AM, Vladimir Sementsov-Ogievskiy wrote:

>>> @@ -70,6 +70,25 @@ struct NBDSimpleReply {
>>>   };
>>>   typedef struct NBDSimpleReply NBDSimpleReply;
>>>   +typedef struct NBDStructuredReplyChunk {
>>> +    uint32_t magic;
>>> +    uint16_t flags;
>>> +    uint16_t type;
>>> +    uint64_t handle;
>>> +    uint32_t length;
>>> +} QEMU_PACKED NBDStructuredReplyChunk;
>>> +
>>> +typedef struct NBDStructuredRead {
>>> +    NBDStructuredReplyChunk h;
>>> +    uint64_t offset;
>>> +} QEMU_PACKED NBDStructuredRead;
>>> +
>>> +typedef struct NBDStructuredError {
>>> +    NBDStructuredReplyChunk h;
>>> +    uint32_t error;
>>> +    uint16_t message_length;
>>> +} QEMU_PACKED NBDStructuredError;
>> Definitely a subset of all types added in the NBD protocol extension,
>> but reasonable for a minimal implementation.  Might be worth adding
>> comments to the types...
> 
> Hmm, for me their names looks descriptive enough, but my sight may be
> biased.. What kind of comments do you want?

I guess I was thinking of existing structs in include/block/nbd.h:

/* Handshake phase structs - this struct is passed on the wire */

struct nbd_option {
    uint64_t magic; /* NBD_OPTS_MAGIC */
    uint32_t option; /* NBD_OPT_* */
    uint32_t length;
} QEMU_PACKED;
typedef struct nbd_option nbd_option;


and compared to:

/* Transmission phase structs
 *
 * Note: these are _NOT_ the same as the network representation of an NBD
 * request and reply!
 */
struct NBDRequest {
    uint64_t handle;
    uint64_t from;
    uint32_t len;
    uint16_t flags; /* NBD_CMD_FLAG_* */
    uint16_t type; /* NBD_CMD_* */
};
typedef struct NBDRequest NBDRequest;

where the comments make it obvious whether QEMU_PACKED matters because
we are using the struct to directly map bytes read/written on the wire.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 03/18] nbd: Minimal structured read for server
  2017-05-04 13:28       ` Eric Blake
@ 2017-05-05 11:30         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-05-05 11:30 UTC (permalink / raw)
  To: Eric Blake, qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, den, stefanha

04.05.2017 16:28, Eric Blake wrote:
> On 05/04/2017 05:58 AM, Vladimir Sementsov-Ogievskiy wrote:
>
>>>> @@ -70,6 +70,25 @@ struct NBDSimpleReply {
>>>>    };
>>>>    typedef struct NBDSimpleReply NBDSimpleReply;
>>>>    +typedef struct NBDStructuredReplyChunk {
>>>> +    uint32_t magic;
>>>> +    uint16_t flags;
>>>> +    uint16_t type;
>>>> +    uint64_t handle;
>>>> +    uint32_t length;
>>>> +} QEMU_PACKED NBDStructuredReplyChunk;
>>>> +
>>>> +typedef struct NBDStructuredRead {
>>>> +    NBDStructuredReplyChunk h;
>>>> +    uint64_t offset;
>>>> +} QEMU_PACKED NBDStructuredRead;
>>>> +
>>>> +typedef struct NBDStructuredError {
>>>> +    NBDStructuredReplyChunk h;
>>>> +    uint32_t error;
>>>> +    uint16_t message_length;
>>>> +} QEMU_PACKED NBDStructuredError;
>>> Definitely a subset of all types added in the NBD protocol extension,
>>> but reasonable for a minimal implementation.  Might be worth adding
>>> comments to the types...
>> Hmm, for me their names looks descriptive enough, but my sight may be
>> biased.. What kind of comments do you want?
> I guess I was thinking of existing structs in include/block/nbd.h:
>
> /* Handshake phase structs - this struct is passed on the wire */
>
> struct nbd_option {
>      uint64_t magic; /* NBD_OPTS_MAGIC */
>      uint32_t option; /* NBD_OPT_* */
>      uint32_t length;
> } QEMU_PACKED;
> typedef struct nbd_option nbd_option;
>
>
> and compared to:
>
> /* Transmission phase structs
>   *
>   * Note: these are _NOT_ the same as the network representation of an NBD
>   * request and reply!
>   */
> struct NBDRequest {
>      uint64_t handle;
>      uint64_t from;
>      uint32_t len;
>      uint16_t flags; /* NBD_CMD_FLAG_* */
>      uint16_t type; /* NBD_CMD_* */
> };
> typedef struct NBDRequest NBDRequest;
>
> where the comments make it obvious whether QEMU_PACKED matters because
> we are using the struct to directly map bytes read/written on the wire.
>

Ok, thank you, will add.


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 00/18] nbd: BLOCK_STATUS
  2017-02-15 17:05 ` [Qemu-devel] [PATCH 00/18] nbd: BLOCK_STATUS Paolo Bonzini
@ 2017-07-13 11:10   ` Eric Blake
  0 siblings, 0 replies; 58+ messages in thread
From: Eric Blake @ 2017-07-13 11:10 UTC (permalink / raw)
  To: Paolo Bonzini, Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, armbru, den, stefanha

[-- Attachment #1: Type: text/plain, Size: 2261 bytes --]

On 02/15/2017 11:05 AM, Paolo Bonzini wrote:

[now several months later...]

> 
> 
> On 03/02/2017 16:47, Vladimir Sementsov-Ogievskiy wrote:
>> Hi all!
>>
>> We really need exporting dirty bitmaps feature as well as remote
>> get_block_status for nbd devices. So, here is minimalistic and restricted
>> realization of 'structured reply' and 'block status' nbd protocol extension
>> (as second is developed over the first the combined spec may be found here:
>> https://github.com/NetworkBlockDevice/nbd/blob/extension-blockstatus/doc/proto.md)
>>

>>
>> It should be minimal but fully compatible realization of the spec.
> 
> This will require v2, but it seems pretty close already.

This series now needs a major rebase on top of the cleanups already
present for qemu 2.10.  Realistically, I'm aiming to get my series [1]
for NBD_OPT_GO into 2.10 (we have less than a week before soft freeze,
so any reviews offered on that series will be appreciated), then the
work on adding structured read and BLOCK_STATUS will probably be big
enough that it will have to be 2.11 material, but getting it posted
sooner rather than later will make it easier to fine-tune all the details.

[1] https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg01971.html

And remember that in open source projects, review can often be the
bottleneck - it does not scale to have reviews done solely by the people
listed in MAINTAINERS.  My personal rule of thumb is that I try to
review 2 patches for every 1 that I submit, to make sure I'm not
contributing to a black hole of unreviewed patch backlog (although no
one will ever turn down a patch from a contributor unable to follow that
advice).  Other benefits of offering reviews: you quickly become more
familiar with the code base, and you earn some name-recognition in the
community (where it is more likely that your own patches get reviewed
quickly because someone recognizes that you are a regular contributor).
It's okay to state on a review that you are not familiar/comfortable
with the code, and want a second reviewer to double-check your work.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 07/18] nbd: Minimal structured read for client
  2017-02-07 20:14   ` Eric Blake
  2017-02-15 16:54     ` Paolo Bonzini
@ 2017-08-01 15:41     ` Vladimir Sementsov-Ogievskiy
  2017-08-01 15:56       ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-08-01 15:41 UTC (permalink / raw)
  To: Eric Blake, qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, den, stefanha

07.02.2017 23:14, Eric Blake wrote:
> On 02/03/2017 09:47 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Minimal implementation: always send DF flag, to not deal with fragmented
>> replies.
> This works well with your minimal server implementation, but I worry
> that it will cause us to fall over when talking to a fully-compliant
> server that chooses to send EOVERFLOW errors for any request larger than
> 64k when DF is set; it also makes it impossible to benefit from sparse
> reads.  I guess that means we need to start thinking about followup
> patches to flush out our implementation.  But maybe I can live with this
> patch as is, since the goal of your series was not so much the full
> power of structured reads, but getting to a point where we could use
> structured reply for block status, even if it means your client can only
> communicate with qemu-nbd as server for now, as long as we do get to the
> rest of the patches for a full-blown structured read.
>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block/nbd-client.c  |  47 +++++++++++----
>>   block/nbd-client.h  |   2 +
>>   include/block/nbd.h |  15 +++--
>>   nbd/client.c        | 170 ++++++++++++++++++++++++++++++++++++++++++++++------
>>   qemu-nbd.c          |   2 +-
>>   5 files changed, 203 insertions(+), 33 deletions(-)
> Hmm - no change to the testsuite. Structured reads seems like the sort
> of thing that it would be nice to test with some canned server replies,
> particularly with server behavior that is permitted by the NBD protocol
> but does not happen by default in qemu-nbd.
>
>> diff --git a/block/nbd-client.c b/block/nbd-client.c
>> index 3779c6c999..ff96bd1635 100644
>> --- a/block/nbd-client.c
>> +++ b/block/nbd-client.c
>> @@ -180,13 +180,20 @@ static void nbd_co_receive_reply(NBDClientSession *s,
>>       *reply = s->reply;
>>       if (reply->handle != request->handle ||
>>           !s->ioc) {
>> +        reply->simple = true;
>>           reply->error = EIO;
> I don't think this is quite right - by setting reply->simple to true,
> you are forcing the caller to treat this as the final packet related to
> this request->handle, even though that might not be the case.
>
> As it is, I wonder if this code is correct, even before your patch - the
> server is allowed to give responses out-of-order (if we request multiple
> reads without waiting for the first response) - I don't see how setting
> reply->error to EIO if the request->handle indicates that we are
> receiving an out-of-order response to some other packet, but that our
> request is still awaiting traffic.

Hmm, looks like it should initiate disconnect instead of just reporting 
error to io operation caller.


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 07/18] nbd: Minimal structured read for client
  2017-08-01 15:41     ` Vladimir Sementsov-Ogievskiy
@ 2017-08-01 15:56       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-08-01 15:56 UTC (permalink / raw)
  To: Eric Blake, qemu-block, qemu-devel
  Cc: famz, jsnow, kwolf, mreitz, pbonzini, armbru, den, stefanha

01.08.2017 18:41, Vladimir Sementsov-Ogievskiy wrote:
> 07.02.2017 23:14, Eric Blake wrote:
>> On 02/03/2017 09:47 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Minimal implementation: always send DF flag, to not deal with 
>>> fragmented
>>> replies.
>> This works well with your minimal server implementation, but I worry
>> that it will cause us to fall over when talking to a fully-compliant
>> server that chooses to send EOVERFLOW errors for any request larger than
>> 64k when DF is set; it also makes it impossible to benefit from sparse
>> reads.  I guess that means we need to start thinking about followup
>> patches to flush out our implementation.  But maybe I can live with this
>> patch as is, since the goal of your series was not so much the full
>> power of structured reads, but getting to a point where we could use
>> structured reply for block status, even if it means your client can only
>> communicate with qemu-nbd as server for now, as long as we do get to the
>> rest of the patches for a full-blown structured read.
>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   block/nbd-client.c  |  47 +++++++++++----
>>>   block/nbd-client.h  |   2 +
>>>   include/block/nbd.h |  15 +++--
>>>   nbd/client.c        | 170 
>>> ++++++++++++++++++++++++++++++++++++++++++++++------
>>>   qemu-nbd.c          |   2 +-
>>>   5 files changed, 203 insertions(+), 33 deletions(-)
>> Hmm - no change to the testsuite. Structured reads seems like the sort
>> of thing that it would be nice to test with some canned server replies,
>> particularly with server behavior that is permitted by the NBD protocol
>> but does not happen by default in qemu-nbd.
>>
>>> diff --git a/block/nbd-client.c b/block/nbd-client.c
>>> index 3779c6c999..ff96bd1635 100644
>>> --- a/block/nbd-client.c
>>> +++ b/block/nbd-client.c
>>> @@ -180,13 +180,20 @@ static void 
>>> nbd_co_receive_reply(NBDClientSession *s,
>>>       *reply = s->reply;
>>>       if (reply->handle != request->handle ||
>>>           !s->ioc) {
>>> +        reply->simple = true;
>>>           reply->error = EIO;
>> I don't think this is quite right - by setting reply->simple to true,
>> you are forcing the caller to treat this as the final packet related to
>> this request->handle, even though that might not be the case.
>>
>> As it is, I wonder if this code is correct, even before your patch - the
>> server is allowed to give responses out-of-order (if we request multiple
>> reads without waiting for the first response) - I don't see how setting
>> reply->error to EIO if the request->handle indicates that we are
>> receiving an out-of-order response to some other packet, but that our
>> request is still awaiting traffic.
>
> Hmm, looks like it should initiate disconnect instead of just 
> reporting error to io operation caller.
>
>

Also, nbd_co_send_request errors are not handled but just returned to 
the caller. Shouldn't the first

error on socket io initiate disconnect? I think, only the io errors, 
transferred from nbd-export block device

should be just returned to bdrv_{io} caller. NBD-protocol related errors 
(invalid handles, etc.) should initiate

disconnect, so that current and all future bdrv_{io}'s from client 
return -EIO.


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 58+ messages in thread

end of thread, other threads:[~2017-08-01 15:56 UTC | newest]

Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-03 15:47 [Qemu-devel] [PATCH 00/18] nbd: BLOCK_STATUS Vladimir Sementsov-Ogievskiy
2017-02-03 15:47 ` [Qemu-devel] [PATCH 01/18] nbd: rename NBD_REPLY_MAGIC to NBD_SIMPLE_REPLY_MAGIC Vladimir Sementsov-Ogievskiy
2017-02-06 19:54   ` Eric Blake
2017-02-03 15:47 ` [Qemu-devel] [PATCH 02/18] nbd-server: refactor simple reply sending Vladimir Sementsov-Ogievskiy
2017-02-06 21:09   ` Eric Blake
2017-02-03 15:47 ` [Qemu-devel] [PATCH 03/18] nbd: Minimal structured read for server Vladimir Sementsov-Ogievskiy
2017-02-06 23:01   ` Eric Blake
2017-02-07 17:44     ` Paolo Bonzini
2017-05-04 10:58     ` Vladimir Sementsov-Ogievskiy
2017-05-04 13:28       ` Eric Blake
2017-05-05 11:30         ` Vladimir Sementsov-Ogievskiy
2017-02-03 15:47 ` [Qemu-devel] [PATCH 04/18] nbd/client: refactor nbd_receive_starttls Vladimir Sementsov-Ogievskiy
2017-02-07 16:32   ` Eric Blake
2017-02-09  6:20     ` Vladimir Sementsov-Ogievskiy
2017-02-09 14:41       ` Eric Blake
2017-02-10 11:23         ` Vladimir Sementsov-Ogievskiy
2017-02-11 19:30           ` Eric Blake
2017-02-20 16:14             ` Eric Blake
2017-02-03 15:47 ` [Qemu-devel] [PATCH 05/18] nbd/client: fix drop_sync Vladimir Sementsov-Ogievskiy
2017-02-06 23:17   ` Eric Blake
2017-02-15 14:50     ` Eric Blake
2017-02-03 15:47 ` [Qemu-devel] [PATCH 06/18] nbd/client: refactor drop_sync Vladimir Sementsov-Ogievskiy
2017-02-06 23:19   ` Eric Blake
2017-02-08  7:55     ` Vladimir Sementsov-Ogievskiy
2017-02-15 16:52       ` Paolo Bonzini
2017-02-03 15:47 ` [Qemu-devel] [PATCH 07/18] nbd: Minimal structured read for client Vladimir Sementsov-Ogievskiy
2017-02-07 20:14   ` Eric Blake
2017-02-15 16:54     ` Paolo Bonzini
2017-08-01 15:41     ` Vladimir Sementsov-Ogievskiy
2017-08-01 15:56       ` Vladimir Sementsov-Ogievskiy
2017-02-03 15:47 ` [Qemu-devel] [PATCH 08/18] hbitmap: add next_zero function Vladimir Sementsov-Ogievskiy
2017-02-07 22:55   ` Eric Blake
2017-02-15 16:57     ` Paolo Bonzini
2017-02-03 15:47 ` [Qemu-devel] [PATCH 09/18] block/dirty-bitmap: add bdrv_dirty_bitmap_next() Vladimir Sementsov-Ogievskiy
2017-02-03 15:47 ` [Qemu-devel] [PATCH 10/18] block/dirty-bitmap: add bdrv_load_dirty_bitmap Vladimir Sementsov-Ogievskiy
2017-02-08 11:45   ` Fam Zheng
2017-02-16 12:37   ` Denis V. Lunev
2017-02-03 15:47 ` [Qemu-devel] [PATCH 11/18] nbd: BLOCK_STATUS for bitmap export: server part Vladimir Sementsov-Ogievskiy
2017-02-08 13:13   ` Eric Blake
2017-02-16 13:00   ` Denis V. Lunev
2017-02-03 15:47 ` [Qemu-devel] [PATCH 12/18] nbd: BLOCK_STATUS for bitmap export: client part Vladimir Sementsov-Ogievskiy
2017-02-08 23:06   ` Eric Blake
2017-02-03 15:47 ` [Qemu-devel] [PATCH 13/18] nbd: add nbd_dirty_bitmap_load Vladimir Sementsov-Ogievskiy
2017-02-03 15:47 ` [Qemu-devel] [PATCH 14/18] qmp: add x-debug-block-dirty-bitmap-sha256 Vladimir Sementsov-Ogievskiy
2017-02-03 15:47 ` [Qemu-devel] [PATCH 15/18] qmp: add block-dirty-bitmap-load Vladimir Sementsov-Ogievskiy
2017-02-07 12:04   ` Fam Zheng
2017-02-09 15:18   ` Eric Blake
2017-02-15 16:57   ` Paolo Bonzini
2017-02-03 15:47 ` [Qemu-devel] [PATCH 16/18] iotests: add test for nbd dirty bitmap export Vladimir Sementsov-Ogievskiy
2017-02-03 15:47 ` [Qemu-devel] [PATCH 17/18] nbd: BLOCK_STATUS for standard get_block_status function: server part Vladimir Sementsov-Ogievskiy
2017-02-09 15:38   ` Eric Blake
2017-02-15 17:02     ` Paolo Bonzini
2017-02-15 17:02     ` Paolo Bonzini
2017-02-03 15:47 ` [Qemu-devel] [PATCH 18/18] nbd: BLOCK_STATUS for standard get_block_status function: client part Vladimir Sementsov-Ogievskiy
2017-02-09 16:00   ` Eric Blake
2017-02-15 17:04     ` Paolo Bonzini
2017-02-15 17:05 ` [Qemu-devel] [PATCH 00/18] nbd: BLOCK_STATUS Paolo Bonzini
2017-07-13 11:10   ` Eric Blake

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.