* [PATCH v3 00/14] qemu patches for 64-bit NBD extensions
@ 2023-05-15 19:53 Eric Blake
  2023-05-15 19:53 ` [PATCH v3 01/14] nbd/client: Use smarter assert Eric Blake
                   ` (14 more replies)
  0 siblings, 15 replies; 38+ messages in thread
From: Eric Blake @ 2023-05-15 19:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: libguestfs, vsementsov

v2 was here:
https://lists.gnu.org/archive/html/qemu-devel/2022-11/msg02340.html

Since then:
 - upstream NBD has accepted the extension on a branch; once multiple
   implementations interoperate based on that spec, it will be promoted
   to mainline (my plan: qemu with this series, libnbd nearly ready to
   go, nbdkit a bit further out)
 - rebase to block changes in meantime
 - drop RFC patches for 64-bit NBD_CMD_READ (NBD spec did not take them)
 - per upstream spec decision, extended headers now mandate the use of
   NBD_REPLY_TYPE_BLOCK_STATUS_EXT rather than leaving the choice to the
   server based on reply size, which in turn required rearranging the
   server patches a bit
 - other changes that I noticed while testing with parallel changes
   being added to libnbd (link to those patches to follow in the next
   week or so)

Eric Blake (14):
  nbd/client: Use smarter assert
  nbd/client: Add safety check on chunk payload length
  nbd/server: Prepare for alternate-size headers
  nbd: Prepare for 64-bit request effect lengths
  nbd: Add types for extended headers
  nbd/server: Refactor handling of request payload
  nbd/server: Refactor to pass full request around
  nbd/server: Support 64-bit block status
  nbd/server: Initial support for extended headers
  nbd/client: Initial support for extended headers
  nbd/client: Accept 64-bit block status chunks
  nbd/client: Request extended headers during negotiation
  nbd/server: Prepare for per-request filtering of BLOCK_STATUS
  nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS

 docs/interop/nbd.txt                          |   1 +
 include/block/nbd.h                           | 165 +++--
 nbd/nbd-internal.h                            |   8 +-
 block/nbd.c                                   |  86 ++-
 nbd/client-connection.c                       |   4 +-
 nbd/client.c                                  | 143 ++--
 nbd/common.c                                  |  10 +-
 nbd/server.c                                  | 653 ++++++++++++------
 qemu-nbd.c                                    |   4 +
 block/trace-events                            |   1 +
 nbd/trace-events                              |  11 +-
 tests/qemu-iotests/223.out                    |  18 +-
 tests/qemu-iotests/233.out                    |   5 +
 tests/qemu-iotests/241.out                    |   3 +
 tests/qemu-iotests/307.out                    |  15 +-
 .../tests/nbd-qemu-allocation.out             |   3 +-
 16 files changed, 797 insertions(+), 333 deletions(-)


base-commit: 18b6727083acceac5d76ea0b8cb6f5cdef6858a7
-- 
2.40.1




* [PATCH v3 01/14] nbd/client: Use smarter assert
  2023-05-15 19:53 [PATCH v3 00/14] qemu patches for 64-bit NBD extensions Eric Blake
@ 2023-05-15 19:53 ` Eric Blake
  2023-05-29  8:20   ` Vladimir Sementsov-Ogievskiy
  2023-05-15 19:53 ` [PATCH v3 02/14] nbd/client: Add safety check on chunk payload length Eric Blake
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2023-05-15 19:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: libguestfs, vsementsov, Dr. David Alan Gilbert,
	open list:Network Block Dev...

Assigning strlen() to a uint32_t and then asserting that it isn't too
large doesn't catch the case of an input string 4G in length.
Thankfully, the incoming strings can never be that large: if the
export name or query is reflecting a string the client got from the
server, we already guarantee that we dropped the NBD connection if the
server sent more than 32M in a single reply to our NBD_OPT_* request;
if the export name is coming from qemu, nbd_receive_negotiate()
asserted that strlen(info->name) <= NBD_MAX_STRING_SIZE; and
similarly, a query string via x->dirty_bitmap coming from the user was
bounds-checked in either qemu-nbd or by the limitations of QMP.
Still, it doesn't hurt to be more explicit in how we write our
assertions, so that we don't have to analyze whether inadvertent
wraparound is possible.

Fixes: 93676c88 ("nbd: Don't send oversize strings", v4.2.0)
Reported-by: Dr. David Alan Gilbert <dave@treblig.org>
Signed-off-by: Eric Blake <eblake@redhat.com>

---

Looking through older branches, I came across this one that was never
applied at the time, but which also had a useful review comment from
Vladimir that invalidates the R-b it had back then.

v2 was here: https://lists.gnu.org/archive/html/qemu-devel/2022-10/msg02733.html
Since then: update David's email, use strnlen before strlen
---
 nbd/client.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index 30d5383cb19..ff75722e487 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -650,19 +650,20 @@ static int nbd_send_meta_query(QIOChannel *ioc, uint32_t opt,
                                Error **errp)
 {
     int ret;
-    uint32_t export_len = strlen(export);
+    uint32_t export_len;
     uint32_t queries = !!query;
     uint32_t query_len = 0;
     uint32_t data_len;
     char *data;
     char *p;

+    assert(strnlen(export, NBD_MAX_STRING_SIZE + 1) <= NBD_MAX_STRING_SIZE);
+    export_len = strlen(export);
     data_len = sizeof(export_len) + export_len + sizeof(queries);
-    assert(export_len <= NBD_MAX_STRING_SIZE);
     if (query) {
+        assert(strnlen(query, NBD_MAX_STRING_SIZE + 1) <= NBD_MAX_STRING_SIZE);
         query_len = strlen(query);
         data_len += sizeof(query_len) + query_len;
-        assert(query_len <= NBD_MAX_STRING_SIZE);
     } else {
         assert(opt == NBD_OPT_LIST_META_CONTEXT);
     }
-- 
2.40.1




* [PATCH v3 02/14] nbd/client: Add safety check on chunk payload length
  2023-05-15 19:53 [PATCH v3 00/14] qemu patches for 64-bit NBD extensions Eric Blake
  2023-05-15 19:53 ` [PATCH v3 01/14] nbd/client: Use smarter assert Eric Blake
@ 2023-05-15 19:53 ` Eric Blake
  2023-05-29  8:25   ` Vladimir Sementsov-Ogievskiy
  2023-05-15 19:53 ` [PATCH v3 03/14] nbd/server: Prepare for alternate-size headers Eric Blake
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2023-05-15 19:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: libguestfs, vsementsov, open list:Network Block Dev...

Our existing use of structured replies either reads into a qiov capped
at 32M (NBD_CMD_READ) or caps allocation to 1000 bytes (see
NBD_MAX_MALLOC_PAYLOAD in block/nbd.c).  But the existing length
checks are rather late; if we encounter a buggy (or malicious) server
that sends a super-large payload length, we should drop the connection
right then rather than assuming the layer on top will be careful.
This becomes more important when we permit 64-bit lengths, which are
even more prone to attempted denial-of-service abuse.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 nbd/client.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/nbd/client.c b/nbd/client.c
index ff75722e487..46f476400ab 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -1413,6 +1413,18 @@ static int nbd_receive_structured_reply_chunk(QIOChannel *ioc,
     chunk->handle = be64_to_cpu(chunk->handle);
     chunk->length = be32_to_cpu(chunk->length);

+    /*
+     * Because we use BLOCK_STATUS with REQ_ONE, and cap READ requests
+     * at 32M, no valid server should send us payload larger than
+     * this.  Even if we stopped using REQ_ONE, sane servers will cap
+     * the number of extents they return for block status.
+     */
+    if (chunk->length > NBD_MAX_BUFFER_SIZE + sizeof(NBDStructuredReadData)) {
+        error_setg(errp, "server chunk %" PRIu32 " (%s) payload is too long",
+                   chunk->type, nbd_reply_type_lookup(chunk->type));
+        return -EINVAL;
+    }
+
     return 0;
 }

-- 
2.40.1




* [PATCH v3 03/14] nbd/server: Prepare for alternate-size headers
  2023-05-15 19:53 [PATCH v3 00/14] qemu patches for 64-bit NBD extensions Eric Blake
  2023-05-15 19:53 ` [PATCH v3 01/14] nbd/client: Use smarter assert Eric Blake
  2023-05-15 19:53 ` [PATCH v3 02/14] nbd/client: Add safety check on chunk payload length Eric Blake
@ 2023-05-15 19:53 ` Eric Blake
  2023-05-29 14:26   ` Vladimir Sementsov-Ogievskiy
  2023-05-15 19:53 ` [PATCH v3 04/14] nbd: Prepare for 64-bit request effect lengths Eric Blake
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2023-05-15 19:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: libguestfs, vsementsov, Kevin Wolf, Hanna Reitz,
	open list:Network Block Dev...

Upstream NBD now documents[1] an extension that supports 64-bit effect
lengths in requests.  As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry[2].  Additionally, while the reply header is currently
16 bytes for a simple reply and 20 bytes for a structured reply, with
the extension enabled there will be only one reply type, a structured
reply of 32 bytes.  Since we are already wired up to use iovecs, it is
easiest to
allow for this change in header size by splitting each structured
reply across two iovecs, one for the header (which will become
variable-length in a future patch according to client negotiation),
and the other for the payload, and removing the header from the
payload struct definitions.  Interestingly, the client side code never
utilized the packed types, so only the server code needs to be
updated.

[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md
as of NBD commit e6f3b94a934

[2] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction.  But even though the extended reply length is widened to
64 bits, for now the NBD spec is clear that servers will not reply
with more than a maximum payload bounded by the 32-bit
NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
agree to transactions larger than 4G would require yet another
extension.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 include/block/nbd.h |  8 +++---
 nbd/server.c        | 64 ++++++++++++++++++++++++++++-----------------
 2 files changed, 44 insertions(+), 28 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index a4c98169c39..f1d838d24f5 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -96,28 +96,28 @@ typedef union NBDReply {

 /* Header of chunk for NBD_REPLY_TYPE_OFFSET_DATA */
 typedef struct NBDStructuredReadData {
-    NBDStructuredReplyChunk h; /* h.length >= 9 */
+    /* header's .length >= 9 */
     uint64_t offset;
     /* At least one byte of data payload follows, calculated from h.length */
 } QEMU_PACKED NBDStructuredReadData;

 /* Complete chunk for NBD_REPLY_TYPE_OFFSET_HOLE */
 typedef struct NBDStructuredReadHole {
-    NBDStructuredReplyChunk h; /* h.length == 12 */
+    /* header's .length == 12 */
     uint64_t offset;
     uint32_t length;
 } QEMU_PACKED NBDStructuredReadHole;

 /* Header of all NBD_REPLY_TYPE_ERROR* errors */
 typedef struct NBDStructuredError {
-    NBDStructuredReplyChunk h; /* h.length >= 6 */
+    /* header's .length >= 6 */
     uint32_t error;
     uint16_t message_length;
 } QEMU_PACKED NBDStructuredError;

 /* Header of NBD_REPLY_TYPE_BLOCK_STATUS */
 typedef struct NBDStructuredMeta {
-    NBDStructuredReplyChunk h; /* h.length >= 12 (at least one extent) */
+    /* header's .length >= 12 (at least one extent) */
     uint32_t context_id;
     /* extents follows */
 } QEMU_PACKED NBDStructuredMeta;
diff --git a/nbd/server.c b/nbd/server.c
index e239c2890fa..eefe3401560 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1885,9 +1885,12 @@ static int coroutine_fn nbd_co_send_iov(NBDClient *client, struct iovec *iov,
     return ret;
 }

-static inline void set_be_simple_reply(NBDSimpleReply *reply, uint64_t error,
-                                       uint64_t handle)
+static inline void set_be_simple_reply(NBDClient *client, struct iovec *iov,
+                                       uint64_t error, uint64_t handle)
 {
+    NBDSimpleReply *reply = iov->iov_base;
+
+    iov->iov_len = sizeof(*reply);
     stl_be_p(&reply->magic, NBD_SIMPLE_REPLY_MAGIC);
     stl_be_p(&reply->error, error);
     stq_be_p(&reply->handle, handle);
@@ -1900,23 +1903,27 @@ static int coroutine_fn nbd_co_send_simple_reply(NBDClient *client,
                                                  size_t len,
                                                  Error **errp)
 {
-    NBDSimpleReply reply;
+    NBDReply hdr;
     int nbd_err = system_errno_to_nbd_errno(error);
     struct iovec iov[] = {
-        {.iov_base = &reply, .iov_len = sizeof(reply)},
+        {.iov_base = &hdr},
         {.iov_base = data, .iov_len = len}
     };

     trace_nbd_co_send_simple_reply(handle, nbd_err, nbd_err_lookup(nbd_err),
                                    len);
-    set_be_simple_reply(&reply, nbd_err, handle);
+    set_be_simple_reply(client, &iov[0], nbd_err, handle);

     return nbd_co_send_iov(client, iov, len ? 2 : 1, errp);
 }

-static inline void set_be_chunk(NBDStructuredReplyChunk *chunk, uint16_t flags,
-                                uint16_t type, uint64_t handle, uint32_t length)
+static inline void set_be_chunk(NBDClient *client, struct iovec *iov,
+                                uint16_t flags, uint16_t type,
+                                uint64_t handle, uint32_t length)
 {
+    NBDStructuredReplyChunk *chunk = iov->iov_base;
+
+    iov->iov_len = sizeof(*chunk);
     stl_be_p(&chunk->magic, NBD_STRUCTURED_REPLY_MAGIC);
     stw_be_p(&chunk->flags, flags);
     stw_be_p(&chunk->type, type);
@@ -1928,13 +1935,14 @@ static int coroutine_fn nbd_co_send_structured_done(NBDClient *client,
                                                     uint64_t handle,
                                                     Error **errp)
 {
-    NBDStructuredReplyChunk chunk;
+    NBDReply hdr;
     struct iovec iov[] = {
-        {.iov_base = &chunk, .iov_len = sizeof(chunk)},
+        {.iov_base = &hdr},
     };

     trace_nbd_co_send_structured_done(handle);
-    set_be_chunk(&chunk, NBD_REPLY_FLAG_DONE, NBD_REPLY_TYPE_NONE, handle, 0);
+    set_be_chunk(client, &iov[0], NBD_REPLY_FLAG_DONE,
+                 NBD_REPLY_TYPE_NONE, handle, 0);

     return nbd_co_send_iov(client, iov, 1, errp);
 }
@@ -1947,20 +1955,21 @@ static int coroutine_fn nbd_co_send_structured_read(NBDClient *client,
                                                     bool final,
                                                     Error **errp)
 {
+    NBDReply hdr;
     NBDStructuredReadData chunk;
     struct iovec iov[] = {
+        {.iov_base = &hdr},
         {.iov_base = &chunk, .iov_len = sizeof(chunk)},
         {.iov_base = data, .iov_len = size}
     };

     assert(size);
     trace_nbd_co_send_structured_read(handle, offset, data, size);
-    set_be_chunk(&chunk.h, final ? NBD_REPLY_FLAG_DONE : 0,
-                 NBD_REPLY_TYPE_OFFSET_DATA, handle,
-                 sizeof(chunk) - sizeof(chunk.h) + size);
+    set_be_chunk(client, &iov[0], final ? NBD_REPLY_FLAG_DONE : 0,
+                 NBD_REPLY_TYPE_OFFSET_DATA, handle, iov[1].iov_len + size);
     stq_be_p(&chunk.offset, offset);

-    return nbd_co_send_iov(client, iov, 2, errp);
+    return nbd_co_send_iov(client, iov, 3, errp);
 }

 static int coroutine_fn nbd_co_send_structured_error(NBDClient *client,
@@ -1969,9 +1978,11 @@ static int coroutine_fn nbd_co_send_structured_error(NBDClient *client,
                                                      const char *msg,
                                                      Error **errp)
 {
+    NBDReply hdr;
     NBDStructuredError chunk;
     int nbd_err = system_errno_to_nbd_errno(error);
     struct iovec iov[] = {
+        {.iov_base = &hdr},
         {.iov_base = &chunk, .iov_len = sizeof(chunk)},
         {.iov_base = (char *)msg, .iov_len = msg ? strlen(msg) : 0},
     };
@@ -1979,12 +1990,12 @@ static int coroutine_fn nbd_co_send_structured_error(NBDClient *client,
     assert(nbd_err);
     trace_nbd_co_send_structured_error(handle, nbd_err,
                                        nbd_err_lookup(nbd_err), msg ? msg : "");
-    set_be_chunk(&chunk.h, NBD_REPLY_FLAG_DONE, NBD_REPLY_TYPE_ERROR, handle,
-                 sizeof(chunk) - sizeof(chunk.h) + iov[1].iov_len);
+    set_be_chunk(client, &iov[0], NBD_REPLY_FLAG_DONE,
+                 NBD_REPLY_TYPE_ERROR, handle, iov[1].iov_len + iov[2].iov_len);
     stl_be_p(&chunk.error, nbd_err);
-    stw_be_p(&chunk.message_length, iov[1].iov_len);
+    stw_be_p(&chunk.message_length, iov[2].iov_len);

-    return nbd_co_send_iov(client, iov, 1 + !!iov[1].iov_len, errp);
+    return nbd_co_send_iov(client, iov, 2 + !!iov[2].iov_len, errp);
 }

 /* Do a sparse read and send the structured reply to the client.
@@ -2022,19 +2033,22 @@ static int coroutine_fn nbd_co_send_sparse_read(NBDClient *client,
         assert(pnum && pnum <= size - progress);
         final = progress + pnum == size;
         if (status & BDRV_BLOCK_ZERO) {
+            NBDReply hdr;
             NBDStructuredReadHole chunk;
             struct iovec iov[] = {
+                {.iov_base = &hdr},
                 {.iov_base = &chunk, .iov_len = sizeof(chunk)},
             };

             trace_nbd_co_send_structured_read_hole(handle, offset + progress,
                                                    pnum);
-            set_be_chunk(&chunk.h, final ? NBD_REPLY_FLAG_DONE : 0,
+            set_be_chunk(client, &iov[0],
+                         final ? NBD_REPLY_FLAG_DONE : 0,
                          NBD_REPLY_TYPE_OFFSET_HOLE,
-                         handle, sizeof(chunk) - sizeof(chunk.h));
+                         handle, iov[1].iov_len);
             stq_be_p(&chunk.offset, offset + progress);
             stl_be_p(&chunk.length, pnum);
-            ret = nbd_co_send_iov(client, iov, 1, errp);
+            ret = nbd_co_send_iov(client, iov, 2, errp);
         } else {
             ret = blk_co_pread(exp->common.blk, offset + progress, pnum,
                                data + progress, 0);
@@ -2200,8 +2214,10 @@ static int coroutine_fn
 nbd_co_send_extents(NBDClient *client, uint64_t handle, NBDExtentArray *ea,
                     bool last, uint32_t context_id, Error **errp)
 {
+    NBDReply hdr;
     NBDStructuredMeta chunk;
     struct iovec iov[] = {
+        {.iov_base = &hdr},
         {.iov_base = &chunk, .iov_len = sizeof(chunk)},
         {.iov_base = ea->extents, .iov_len = ea->count * sizeof(ea->extents[0])}
     };
@@ -2210,12 +2226,12 @@ nbd_co_send_extents(NBDClient *client, uint64_t handle, NBDExtentArray *ea,

     trace_nbd_co_send_extents(handle, ea->count, context_id, ea->total_length,
                               last);
-    set_be_chunk(&chunk.h, last ? NBD_REPLY_FLAG_DONE : 0,
+    set_be_chunk(client, &iov[0], last ? NBD_REPLY_FLAG_DONE : 0,
                  NBD_REPLY_TYPE_BLOCK_STATUS,
-                 handle, sizeof(chunk) - sizeof(chunk.h) + iov[1].iov_len);
+                 handle, iov[1].iov_len + iov[2].iov_len);
     stl_be_p(&chunk.context_id, context_id);

-    return nbd_co_send_iov(client, iov, 2, errp);
+    return nbd_co_send_iov(client, iov, 3, errp);
 }

 /* Get block status from the exported device and send it to the client */
-- 
2.40.1




* [PATCH v3 04/14] nbd: Prepare for 64-bit request effect lengths
  2023-05-15 19:53 [PATCH v3 00/14] qemu patches for 64-bit NBD extensions Eric Blake
                   ` (2 preceding siblings ...)
  2023-05-15 19:53 ` [PATCH v3 03/14] nbd/server: Prepare for alternate-size headers Eric Blake
@ 2023-05-15 19:53 ` Eric Blake
  2023-05-30 13:05   ` Vladimir Sementsov-Ogievskiy
  2023-05-15 19:53 ` [PATCH v3 05/14] nbd: Add types for extended headers Eric Blake
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2023-05-15 19:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: libguestfs, vsementsov, Kevin Wolf, Hanna Reitz,
	open list:Network Block Dev...

Widen the length field of NBDRequest to 64-bits, although we can
assert that all current uses are still under 32 bits.  Move the
request magic number to nbd.h, to live alongside the reply magic
number.  Convert 'bool structured_reply' into a tri-state enum that
will eventually track whether the client successfully negotiated
extended headers with the server, allowing the nbd driver to pass
larger requests along where possible.  In this patch, however, the
enum never surpasses structured replies, so there is no semantic
change yet.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 include/block/nbd.h     | 36 +++++++++++++++++++++------------
 nbd/nbd-internal.h      |  3 +--
 block/nbd.c             | 45 +++++++++++++++++++++++++++--------------
 nbd/client-connection.c |  4 ++--
 nbd/client.c            | 18 ++++++++++-------
 nbd/server.c            | 37 +++++++++++++++++++--------------
 nbd/trace-events        |  8 ++++----
 7 files changed, 93 insertions(+), 58 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index f1d838d24f5..50626ab2744 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -1,5 +1,5 @@
 /*
- *  Copyright (C) 2016-2022 Red Hat, Inc.
+ *  Copyright Red Hat
  *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
  *
  *  Network Block Device
@@ -51,19 +51,26 @@ typedef struct NBDOptionReplyMetaContext {
     /* metadata context name follows */
 } QEMU_PACKED NBDOptionReplyMetaContext;

-/* Transmission phase structs
- *
- * Note: these are _NOT_ the same as the network representation of an NBD
- * request and reply!
+/* Transmission phase structs */
+
+/* Header style in use */
+typedef enum NBDHeaderStyle {
+    NBD_HEADER_SIMPLE,      /* default; simple replies only */
+    NBD_HEADER_STRUCTURED,  /* NBD_OPT_STRUCTURED_REPLY negotiated */
+    NBD_HEADER_EXTENDED,    /* NBD_OPT_EXTENDED_HEADERS negotiated */
+} NBDHeaderStyle;
+
+/*
+ * Note: NBDRequest is _NOT_ the same as the network representation of an NBD
+ * request!
  */
-struct NBDRequest {
+typedef struct NBDRequest {
     uint64_t handle;
-    uint64_t from;
-    uint32_t len;
+    uint64_t from;  /* Offset touched by the command */
+    uint64_t len;   /* Effect length; 32 bit limit without extended headers */
     uint16_t flags; /* NBD_CMD_FLAG_* */
-    uint16_t type; /* NBD_CMD_* */
-};
-typedef struct NBDRequest NBDRequest;
+    uint16_t type;  /* NBD_CMD_* */
+} NBDRequest;

 typedef struct NBDSimpleReply {
     uint32_t magic;  /* NBD_SIMPLE_REPLY_MAGIC */
@@ -236,6 +243,9 @@ enum {
  */
 #define NBD_MAX_STRING_SIZE 4096

+/* Transmission request structure */
+#define NBD_REQUEST_MAGIC           0x25609513
+
 /* Two types of reply structures */
 #define NBD_SIMPLE_REPLY_MAGIC      0x67446698
 #define NBD_STRUCTURED_REPLY_MAGIC  0x668e33ef
@@ -293,7 +303,7 @@ struct NBDExportInfo {

     /* In-out fields, set by client before nbd_receive_negotiate() and
      * updated by server results during nbd_receive_negotiate() */
-    bool structured_reply;
+    NBDHeaderStyle header_style;
     bool base_allocation; /* base:allocation context for NBD_CMD_BLOCK_STATUS */

     /* Set by server results during nbd_receive_negotiate() and
@@ -323,7 +333,7 @@ int nbd_receive_export_list(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
                             Error **errp);
 int nbd_init(int fd, QIOChannelSocket *sioc, NBDExportInfo *info,
              Error **errp);
-int nbd_send_request(QIOChannel *ioc, NBDRequest *request);
+int nbd_send_request(QIOChannel *ioc, NBDRequest *request, NBDHeaderStyle hdr);
 int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc,
                                    NBDReply *reply, Error **errp);
 int nbd_client(int fd);
diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index df42fef7066..133b1d94b50 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -1,7 +1,7 @@
 /*
  * NBD Internal Declarations
  *
- * Copyright (C) 2016 Red Hat, Inc.
+ * Copyright Red Hat
  *
  * This work is licensed under the terms of the GNU GPL, version 2 or later.
  * See the COPYING file in the top-level directory.
@@ -44,7 +44,6 @@
 #define NBD_OLDSTYLE_NEGOTIATE_SIZE (8 + 8 + 8 + 4 + 124)

 #define NBD_INIT_MAGIC              0x4e42444d41474943LL /* ASCII "NBDMAGIC" */
-#define NBD_REQUEST_MAGIC           0x25609513
 #define NBD_OPTS_MAGIC              0x49484156454F5054LL /* ASCII "IHAVEOPT" */
 #define NBD_CLIENT_MAGIC            0x0000420281861253LL
 #define NBD_REP_MAGIC               0x0003e889045565a9LL
diff --git a/block/nbd.c b/block/nbd.c
index a3f8f8a9d5e..6ad6a4f5ecd 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -1,8 +1,8 @@
 /*
- * QEMU Block driver for  NBD
+ * QEMU Block driver for NBD
  *
  * Copyright (c) 2019 Virtuozzo International GmbH.
- * Copyright (C) 2016 Red Hat, Inc.
+ * Copyright Red Hat
  * Copyright (C) 2008 Bull S.A.S.
  *     Author: Laurent Vivier <Laurent.Vivier@bull.net>
  *
@@ -341,7 +341,7 @@ int coroutine_fn nbd_co_do_establish_connection(BlockDriverState *bs,
          */
         NBDRequest request = { .type = NBD_CMD_DISC };

-        nbd_send_request(s->ioc, &request);
+        nbd_send_request(s->ioc, &request, s->info.header_style);

         yank_unregister_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name),
                                  nbd_yank, bs);
@@ -464,7 +464,8 @@ static coroutine_fn int nbd_receive_replies(BDRVNBDState *s, uint64_t handle)
             nbd_channel_error(s, ret);
             return ret;
         }
-        if (nbd_reply_is_structured(&s->reply) && !s->info.structured_reply) {
+        if (nbd_reply_is_structured(&s->reply) &&
+            s->info.header_style < NBD_HEADER_STRUCTURED) {
             nbd_channel_error(s, -EINVAL);
             return -EINVAL;
         }
@@ -525,14 +526,14 @@ nbd_co_send_request(BlockDriverState *bs, NBDRequest *request,

     if (qiov) {
         qio_channel_set_cork(s->ioc, true);
-        rc = nbd_send_request(s->ioc, request);
+        rc = nbd_send_request(s->ioc, request, s->info.header_style);
         if (rc >= 0 && qio_channel_writev_all(s->ioc, qiov->iov, qiov->niov,
                                               NULL) < 0) {
             rc = -EIO;
         }
         qio_channel_set_cork(s->ioc, false);
     } else {
-        rc = nbd_send_request(s->ioc, request);
+        rc = nbd_send_request(s->ioc, request, s->info.header_style);
     }
     qemu_co_mutex_unlock(&s->send_mutex);

@@ -867,7 +868,7 @@ static coroutine_fn int nbd_co_do_receive_one_chunk(
     }

     /* handle structured reply chunk */
-    assert(s->info.structured_reply);
+    assert(s->info.header_style >= NBD_HEADER_STRUCTURED);
     chunk = &s->reply.structured;

     if (chunk->type == NBD_REPLY_TYPE_NONE) {
@@ -1069,7 +1070,8 @@ static int coroutine_fn nbd_co_receive_cmdread_reply(BDRVNBDState *s, uint64_t h
     void *payload = NULL;
     Error *local_err = NULL;

-    NBD_FOREACH_REPLY_CHUNK(s, iter, handle, s->info.structured_reply,
+    NBD_FOREACH_REPLY_CHUNK(s, iter, handle,
+                            s->info.header_style >= NBD_HEADER_STRUCTURED,
                             qiov, &reply, &payload)
     {
         int ret;
@@ -1301,10 +1303,11 @@ nbd_client_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset, int64_t bytes,
     NBDRequest request = {
         .type = NBD_CMD_WRITE_ZEROES,
         .from = offset,
-        .len = bytes,  /* .len is uint32_t actually */
+        .len = bytes,
     };

-    assert(bytes <= UINT32_MAX); /* rely on max_pwrite_zeroes */
+    /* rely on max_pwrite_zeroes */
+    assert(bytes <= UINT32_MAX || s->info.header_style >= NBD_HEADER_EXTENDED);

     assert(!(s->info.flags & NBD_FLAG_READ_ONLY));
     if (!(s->info.flags & NBD_FLAG_SEND_WRITE_ZEROES)) {
@@ -1351,10 +1354,11 @@ nbd_client_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
     NBDRequest request = {
         .type = NBD_CMD_TRIM,
         .from = offset,
-        .len = bytes, /* len is uint32_t */
+        .len = bytes,
     };

-    assert(bytes <= UINT32_MAX); /* rely on max_pdiscard */
+    /* rely on max_pdiscard */
+    assert(bytes <= UINT32_MAX || s->info.header_style >= NBD_HEADER_EXTENDED);

     assert(!(s->info.flags & NBD_FLAG_READ_ONLY));
     if (!(s->info.flags & NBD_FLAG_SEND_TRIM) || !bytes) {
@@ -1376,8 +1380,7 @@ static int coroutine_fn GRAPH_RDLOCK nbd_client_co_block_status(
     NBDRequest request = {
         .type = NBD_CMD_BLOCK_STATUS,
         .from = offset,
-        .len = MIN(QEMU_ALIGN_DOWN(INT_MAX, bs->bl.request_alignment),
-                   MIN(bytes, s->info.size - offset)),
+        .len = MIN(bytes, s->info.size - offset),
         .flags = NBD_CMD_FLAG_REQ_ONE,
     };

@@ -1387,6 +1390,10 @@ static int coroutine_fn GRAPH_RDLOCK nbd_client_co_block_status(
         *file = bs;
         return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
     }
+    if (s->info.header_style < NBD_HEADER_EXTENDED) {
+        request.len = MIN(QEMU_ALIGN_DOWN(INT_MAX, bs->bl.request_alignment),
+                          request.len);
+    }

     /*
      * Work around the fact that the block layer doesn't do
@@ -1465,7 +1472,7 @@ static void nbd_client_close(BlockDriverState *bs)
     NBDRequest request = { .type = NBD_CMD_DISC };

     if (s->ioc) {
-        nbd_send_request(s->ioc, &request);
+        nbd_send_request(s->ioc, &request, s->info.header_style);
     }

     nbd_teardown_connection(bs);
@@ -1951,6 +1958,14 @@ static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
     bs->bl.max_pwrite_zeroes = max;
     bs->bl.max_transfer = max;

+    /*
+     * Assume that if the server supports extended headers, it also
+     * supports unlimited size zero and trim commands.
+     */
+    if (s->info.header_style >= NBD_HEADER_EXTENDED) {
+        bs->bl.max_pdiscard = bs->bl.max_pwrite_zeroes = 0;
+    }
+
     if (s->info.opt_block &&
         s->info.opt_block > bs->bl.opt_transfer) {
         bs->bl.opt_transfer = s->info.opt_block;
diff --git a/nbd/client-connection.c b/nbd/client-connection.c
index e5b1046a1c7..62d75af0bb3 100644
--- a/nbd/client-connection.c
+++ b/nbd/client-connection.c
@@ -1,5 +1,5 @@
 /*
- * QEMU Block driver for  NBD
+ * QEMU Block driver for NBD
  *
  * Copyright (c) 2021 Virtuozzo International GmbH.
  *
@@ -93,7 +93,7 @@ NBDClientConnection *nbd_client_connection_new(const SocketAddress *saddr,
         .do_negotiation = do_negotiation,

         .initial_info.request_sizes = true,
-        .initial_info.structured_reply = true,
+        .initial_info.header_style = NBD_HEADER_STRUCTURED,
         .initial_info.base_allocation = true,
         .initial_info.x_dirty_bitmap = g_strdup(x_dirty_bitmap),
         .initial_info.name = g_strdup(export_name ?: "")
diff --git a/nbd/client.c b/nbd/client.c
index 46f476400ab..17d1f57da60 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -1,5 +1,5 @@
 /*
- *  Copyright (C) 2016-2019 Red Hat, Inc.
+ *  Copyright Red Hat
  *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
  *
  *  Network Block Device Client Side
@@ -1031,9 +1031,10 @@ int nbd_receive_negotiate(AioContext *aio_context, QIOChannel *ioc,
     trace_nbd_receive_negotiate_name(info->name);

     result = nbd_start_negotiate(aio_context, ioc, tlscreds, hostname, outioc,
-                                 info->structured_reply, &zeroes, errp);
+                                 info->header_style >= NBD_HEADER_STRUCTURED,
+                                 &zeroes, errp);

-    info->structured_reply = false;
+    info->header_style = NBD_HEADER_SIMPLE;
     info->base_allocation = false;
     if (tlscreds && *outioc) {
         ioc = *outioc;
@@ -1041,7 +1042,7 @@ int nbd_receive_negotiate(AioContext *aio_context, QIOChannel *ioc,

     switch (result) {
     case 3: /* newstyle, with structured replies */
-        info->structured_reply = true;
+        info->header_style = NBD_HEADER_STRUCTURED;
         if (base_allocation) {
             result = nbd_negotiate_simple_meta_context(ioc, info, errp);
             if (result < 0) {
@@ -1179,7 +1180,8 @@ int nbd_receive_export_list(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
             memset(&array[count - 1], 0, sizeof(*array));
             array[count - 1].name = name;
             array[count - 1].description = desc;
-            array[count - 1].structured_reply = result == 3;
+            array[count - 1].header_style = result == 3 ?
+                NBD_HEADER_STRUCTURED : NBD_HEADER_SIMPLE;
         }

         for (i = 0; i < count; i++) {
@@ -1222,7 +1224,7 @@ int nbd_receive_export_list(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
         if (nbd_drop(ioc, 124, NULL) == 0) {
             NBDRequest request = { .type = NBD_CMD_DISC };

-            nbd_send_request(ioc, &request);
+            nbd_send_request(ioc, &request, NBD_HEADER_SIMPLE);
         }
         break;
     default:
@@ -1346,10 +1348,12 @@ int nbd_disconnect(int fd)

 #endif /* __linux__ */

-int nbd_send_request(QIOChannel *ioc, NBDRequest *request)
+int nbd_send_request(QIOChannel *ioc, NBDRequest *request, NBDHeaderStyle hdr)
 {
     uint8_t buf[NBD_REQUEST_SIZE];

+    assert(hdr < NBD_HEADER_EXTENDED);
+    assert(request->len <= UINT32_MAX);
     trace_nbd_send_request(request->from, request->len, request->handle,
                            request->flags, request->type,
                            nbd_cmd_lookup(request->type));
diff --git a/nbd/server.c b/nbd/server.c
index eefe3401560..cf38a104d9a 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1,5 +1,5 @@
 /*
- *  Copyright (C) 2016-2022 Red Hat, Inc.
+ *  Copyright Red Hat
  *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
  *
  *  Network Block Device Server Side
@@ -143,7 +143,7 @@ struct NBDClient {

     uint32_t check_align; /* If non-zero, check for aligned client requests */

-    bool structured_reply;
+    NBDHeaderStyle header_style;
     NBDExportMetaContexts export_meta;

     uint32_t opt; /* Current option being negotiated */
@@ -502,7 +502,7 @@ static int nbd_negotiate_handle_export_name(NBDClient *client, bool no_zeroes,
     }

     myflags = client->exp->nbdflags;
-    if (client->structured_reply) {
+    if (client->header_style >= NBD_HEADER_STRUCTURED) {
         myflags |= NBD_FLAG_SEND_DF;
     }
     trace_nbd_negotiate_new_style_size_flags(client->exp->size, myflags);
@@ -687,7 +687,7 @@ static int nbd_negotiate_handle_info(NBDClient *client, Error **errp)

     /* Send NBD_INFO_EXPORT always */
     myflags = exp->nbdflags;
-    if (client->structured_reply) {
+    if (client->header_style >= NBD_HEADER_STRUCTURED) {
         myflags |= NBD_FLAG_SEND_DF;
     }
     trace_nbd_negotiate_new_style_size_flags(exp->size, myflags);
@@ -985,7 +985,8 @@ static int nbd_negotiate_meta_queries(NBDClient *client,
     size_t i;
     size_t count = 0;

-    if (client->opt == NBD_OPT_SET_META_CONTEXT && !client->structured_reply) {
+    if (client->opt == NBD_OPT_SET_META_CONTEXT &&
+        client->header_style < NBD_HEADER_STRUCTURED) {
         return nbd_opt_invalid(client, errp,
                                "request option '%s' when structured reply "
                                "is not negotiated",
@@ -1261,13 +1262,13 @@ static int nbd_negotiate_options(NBDClient *client, Error **errp)
             case NBD_OPT_STRUCTURED_REPLY:
                 if (length) {
                     ret = nbd_reject_length(client, false, errp);
-                } else if (client->structured_reply) {
+                } else if (client->header_style >= NBD_HEADER_STRUCTURED) {
                     ret = nbd_negotiate_send_rep_err(
                         client, NBD_REP_ERR_INVALID, errp,
                         "structured reply already negotiated");
                 } else {
                     ret = nbd_negotiate_send_rep(client, NBD_REP_ACK, errp);
-                    client->structured_reply = true;
+                    client->header_style = NBD_HEADER_STRUCTURED;
                 }
                 break;

@@ -1438,7 +1439,7 @@ static int coroutine_fn nbd_receive_request(NBDClient *client, NBDRequest *reque
     request->type   = lduw_be_p(buf + 6);
     request->handle = ldq_be_p(buf + 8);
     request->from   = ldq_be_p(buf + 16);
-    request->len    = ldl_be_p(buf + 24);
+    request->len    = ldl_be_p(buf + 24); /* widen 32 to 64 bits */

     trace_nbd_receive_request(magic, request->flags, request->type,
                               request->from, request->len);
@@ -2343,7 +2344,7 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req, NBDRequest *
         request->type == NBD_CMD_CACHE)
     {
         if (request->len > NBD_MAX_BUFFER_SIZE) {
-            error_setg(errp, "len (%" PRIu32" ) is larger than max len (%u)",
+            error_setg(errp, "len (%" PRIu64" ) is larger than max len (%u)",
                        request->len, NBD_MAX_BUFFER_SIZE);
             return -EINVAL;
         }
@@ -2359,6 +2360,7 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req, NBDRequest *
     }

     if (request->type == NBD_CMD_WRITE) {
+        assert(request->len <= NBD_MAX_BUFFER_SIZE);
         if (nbd_read(client->ioc, req->data, request->len, "CMD_WRITE data",
                      errp) < 0)
         {
@@ -2380,7 +2382,7 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req, NBDRequest *
     }
     if (request->from > client->exp->size ||
         request->len > client->exp->size - request->from) {
-        error_setg(errp, "operation past EOF; From: %" PRIu64 ", Len: %" PRIu32
+        error_setg(errp, "operation past EOF; From: %" PRIu64 ", Len: %" PRIu64
                    ", Size: %" PRIu64, request->from, request->len,
                    client->exp->size);
         return (request->type == NBD_CMD_WRITE ||
@@ -2398,7 +2400,8 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req, NBDRequest *
                                               client->check_align);
     }
     valid_flags = NBD_CMD_FLAG_FUA;
-    if (request->type == NBD_CMD_READ && client->structured_reply) {
+    if (request->type == NBD_CMD_READ &&
+        client->header_style >= NBD_HEADER_STRUCTURED) {
         valid_flags |= NBD_CMD_FLAG_DF;
     } else if (request->type == NBD_CMD_WRITE_ZEROES) {
         valid_flags |= NBD_CMD_FLAG_NO_HOLE | NBD_CMD_FLAG_FAST_ZERO;
@@ -2424,7 +2427,7 @@ static coroutine_fn int nbd_send_generic_reply(NBDClient *client,
                                                const char *error_msg,
                                                Error **errp)
 {
-    if (client->structured_reply && ret < 0) {
+    if (client->header_style >= NBD_HEADER_STRUCTURED && ret < 0) {
         return nbd_co_send_structured_error(client, handle, -ret, error_msg,
                                             errp);
     } else {
@@ -2443,6 +2446,7 @@ static coroutine_fn int nbd_do_cmd_read(NBDClient *client, NBDRequest *request,
     NBDExport *exp = client->exp;

     assert(request->type == NBD_CMD_READ);
+    assert(request->len <= NBD_MAX_BUFFER_SIZE);

     /* XXX: NBD Protocol only documents use of FUA with WRITE */
     if (request->flags & NBD_CMD_FLAG_FUA) {
@@ -2453,8 +2457,8 @@ static coroutine_fn int nbd_do_cmd_read(NBDClient *client, NBDRequest *request,
         }
     }

-    if (client->structured_reply && !(request->flags & NBD_CMD_FLAG_DF) &&
-        request->len)
+    if (client->header_style >= NBD_HEADER_STRUCTURED &&
+        !(request->flags & NBD_CMD_FLAG_DF) && request->len)
     {
         return nbd_co_send_sparse_read(client, request->handle, request->from,
                                        data, request->len, errp);
@@ -2466,7 +2470,7 @@ static coroutine_fn int nbd_do_cmd_read(NBDClient *client, NBDRequest *request,
                                       "reading from file failed", errp);
     }

-    if (client->structured_reply) {
+    if (client->header_style >= NBD_HEADER_STRUCTURED) {
         if (request->len) {
             return nbd_co_send_structured_read(client, request->handle,
                                                request->from, data,
@@ -2494,6 +2498,7 @@ static coroutine_fn int nbd_do_cmd_cache(NBDClient *client, NBDRequest *request,
     NBDExport *exp = client->exp;

     assert(request->type == NBD_CMD_CACHE);
+    assert(request->len <= NBD_MAX_BUFFER_SIZE);

     ret = blk_co_preadv(exp->common.blk, request->from, request->len,
                         NULL, BDRV_REQ_COPY_ON_READ | BDRV_REQ_PREFETCH);
@@ -2527,6 +2532,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
         if (request->flags & NBD_CMD_FLAG_FUA) {
             flags |= BDRV_REQ_FUA;
         }
+        assert(request->len <= NBD_MAX_BUFFER_SIZE);
         ret = blk_co_pwrite(exp->common.blk, request->from, request->len, data,
                             flags);
         return nbd_send_generic_reply(client, request->handle, ret,
@@ -2570,6 +2576,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
             return nbd_send_generic_reply(client, request->handle, -EINVAL,
                                           "need non-zero length", errp);
         }
+        assert(request->len <= UINT32_MAX);
         if (client->export_meta.count) {
             bool dont_fragment = request->flags & NBD_CMD_FLAG_REQ_ONE;
             int contexts_remaining = client->export_meta.count;
diff --git a/nbd/trace-events b/nbd/trace-events
index b7032ca2778..e2c1d68688d 100644
--- a/nbd/trace-events
+++ b/nbd/trace-events
@@ -31,7 +31,7 @@ nbd_client_loop(void) "Doing NBD loop"
 nbd_client_loop_ret(int ret, const char *error) "NBD loop returned %d: %s"
 nbd_client_clear_queue(void) "Clearing NBD queue"
 nbd_client_clear_socket(void) "Clearing NBD socket"
-nbd_send_request(uint64_t from, uint32_t len, uint64_t handle, uint16_t flags, uint16_t type, const char *name) "Sending request to server: { .from = %" PRIu64", .len = %" PRIu32 ", .handle = %" PRIu64 ", .flags = 0x%" PRIx16 ", .type = %" PRIu16 " (%s) }"
+nbd_send_request(uint64_t from, uint64_t len, uint64_t handle, uint16_t flags, uint16_t type, const char *name) "Sending request to server: { .from = %" PRIu64", .len = %" PRIu64 ", .handle = %" PRIu64 ", .flags = 0x%" PRIx16 ", .type = %" PRIu16 " (%s) }"
 nbd_receive_simple_reply(int32_t error, const char *errname, uint64_t handle) "Got simple reply: { .error = %" PRId32 " (%s), handle = %" PRIu64" }"
 nbd_receive_structured_reply_chunk(uint16_t flags, uint16_t type, const char *name, uint64_t handle, uint32_t length) "Got structured reply chunk: { flags = 0x%" PRIx16 ", type = %d (%s), handle = %" PRIu64 ", length = %" PRIu32 " }"

@@ -60,7 +60,7 @@ nbd_negotiate_options_check_option(uint32_t option, const char *name) "Checking
 nbd_negotiate_begin(void) "Beginning negotiation"
 nbd_negotiate_new_style_size_flags(uint64_t size, unsigned flags) "advertising size %" PRIu64 " and flags 0x%x"
 nbd_negotiate_success(void) "Negotiation succeeded"
-nbd_receive_request(uint32_t magic, uint16_t flags, uint16_t type, uint64_t from, uint32_t len) "Got request: { magic = 0x%" PRIx32 ", .flags = 0x%" PRIx16 ", .type = 0x%" PRIx16 ", from = %" PRIu64 ", len = %" PRIu32 " }"
+nbd_receive_request(uint32_t magic, uint16_t flags, uint16_t type, uint64_t from, uint64_t len) "Got request: { magic = 0x%" PRIx32 ", .flags = 0x%" PRIx16 ", .type = 0x%" PRIx16 ", from = %" PRIu64 ", len = %" PRIu64 " }"
 nbd_blk_aio_attached(const char *name, void *ctx) "Export %s: Attaching clients to AIO context %p"
 nbd_blk_aio_detach(const char *name, void *ctx) "Export %s: Detaching clients from AIO context %p"
 nbd_co_send_simple_reply(uint64_t handle, uint32_t error, const char *errname, int len) "Send simple reply: handle = %" PRIu64 ", error = %" PRIu32 " (%s), len = %d"
@@ -70,8 +70,8 @@ nbd_co_send_structured_read_hole(uint64_t handle, uint64_t offset, size_t size)
 nbd_co_send_extents(uint64_t handle, unsigned int extents, uint32_t id, uint64_t length, int last) "Send block status reply: handle = %" PRIu64 ", extents = %u, context = %d (extents cover %" PRIu64 " bytes, last chunk = %d)"
 nbd_co_send_structured_error(uint64_t handle, int err, const char *errname, const char *msg) "Send structured error reply: handle = %" PRIu64 ", error = %d (%s), msg = '%s'"
 nbd_co_receive_request_decode_type(uint64_t handle, uint16_t type, const char *name) "Decoding type: handle = %" PRIu64 ", type = %" PRIu16 " (%s)"
-nbd_co_receive_request_payload_received(uint64_t handle, uint32_t len) "Payload received: handle = %" PRIu64 ", len = %" PRIu32
-nbd_co_receive_align_compliance(const char *op, uint64_t from, uint32_t len, uint32_t align) "client sent non-compliant unaligned %s request: from=0x%" PRIx64 ", len=0x%" PRIx32 ", align=0x%" PRIx32
+nbd_co_receive_request_payload_received(uint64_t handle, uint64_t len) "Payload received: handle = %" PRIu64 ", len = %" PRIu64
+nbd_co_receive_align_compliance(const char *op, uint64_t from, uint64_t len, uint32_t align) "client sent non-compliant unaligned %s request: from=0x%" PRIx64 ", len=0x%" PRIx64 ", align=0x%" PRIx32
 nbd_trip(void) "Reading request"

 # client-connection.c
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 05/14] nbd: Add types for extended headers
  2023-05-15 19:53 [PATCH v3 00/14] qemu patches for 64-bit NBD extensions Eric Blake
                   ` (3 preceding siblings ...)
  2023-05-15 19:53 ` [PATCH v3 04/14] nbd: Prepare for 64-bit request effect lengths Eric Blake
@ 2023-05-15 19:53 ` Eric Blake
  2023-05-30 13:23   ` Vladimir Sementsov-Ogievskiy
  2023-05-15 19:53 ` [PATCH v3 06/14] nbd/server: Refactor handling of request payload Eric Blake
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2023-05-15 19:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: libguestfs, vsementsov, Kevin Wolf, Hanna Reitz,
	open list:Network Block Dev...

Add the constants and structs necessary for later patches to start
implementing the NBD_OPT_EXTENDED_HEADERS extension in both the client
and server, matching recent commit e6f3b94a934 in the upstream nbd
project.  This patch does not change any existing behavior, but merely
sets the stage.

This patch does not change the status quo that neither the client nor
server use a packed-struct representation for the request header.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 docs/interop/nbd.txt |  1 +
 include/block/nbd.h  | 74 ++++++++++++++++++++++++++++++++------------
 nbd/common.c         | 10 +++++-
 3 files changed, 65 insertions(+), 20 deletions(-)

diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
index f5ca25174a6..abaf4c28a96 100644
--- a/docs/interop/nbd.txt
+++ b/docs/interop/nbd.txt
@@ -69,3 +69,4 @@ NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
 NBD_CMD_FLAG_FAST_ZERO
 * 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth"
 * 7.1: NBD_FLAG_CAN_MULTI_CONN for shareable writable exports
+* 8.1: NBD_OPT_EXTENDED_HEADERS
diff --git a/include/block/nbd.h b/include/block/nbd.h
index 50626ab2744..d753fb8006f 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -87,13 +87,24 @@ typedef struct NBDStructuredReplyChunk {
     uint32_t length; /* length of payload */
 } QEMU_PACKED NBDStructuredReplyChunk;

+typedef struct NBDExtendedReplyChunk {
+    uint32_t magic;  /* NBD_EXTENDED_REPLY_MAGIC */
+    uint16_t flags;  /* combination of NBD_REPLY_FLAG_* */
+    uint16_t type;   /* NBD_REPLY_TYPE_* */
+    uint64_t handle; /* request handle */
+    uint64_t offset; /* request offset */
+    uint64_t length; /* length of payload */
+} QEMU_PACKED NBDExtendedReplyChunk;
+
 typedef union NBDReply {
     NBDSimpleReply simple;
     NBDStructuredReplyChunk structured;
+    NBDExtendedReplyChunk extended;
     struct {
-        /* @magic and @handle fields have the same offset and size both in
-         * simple reply and structured reply chunk, so let them be accessible
-         * without ".simple." or ".structured." specification
+        /*
+         * @magic and @handle fields have the same offset and size in all
+         * forms of replies, so let them be accessible without ".simple.",
+         * ".structured.", or ".extended." specifications.
          */
         uint32_t magic;
         uint32_t _skip;
@@ -126,15 +137,29 @@ typedef struct NBDStructuredError {
 typedef struct NBDStructuredMeta {
     /* header's length >= 12 (at least one extent) */
     uint32_t context_id;
-    /* extents follows */
+    /* NBDExtent extents[] follows, array length implied by header */
 } QEMU_PACKED NBDStructuredMeta;

-/* Extent chunk for NBD_REPLY_TYPE_BLOCK_STATUS */
+/* Extent array for NBD_REPLY_TYPE_BLOCK_STATUS */
 typedef struct NBDExtent {
     uint32_t length;
     uint32_t flags; /* NBD_STATE_* */
 } QEMU_PACKED NBDExtent;

+/* Header of NBD_REPLY_TYPE_BLOCK_STATUS_EXT */
+typedef struct NBDStructuredMetaExt {
+    /* header's length >= 24 (at least one extent) */
+    uint32_t context_id;
+    uint32_t count; /* header length must be count * 16 + 8 */
+    /* NBDExtentExt extents[count] follows */
+} QEMU_PACKED NBDStructuredMetaExt;
+
+/* Extent array for NBD_REPLY_TYPE_BLOCK_STATUS_EXT */
+typedef struct NBDExtentExt {
+    uint64_t length;
+    uint64_t flags; /* NBD_STATE_* */
+} QEMU_PACKED NBDExtentExt;
+
 /* Transmission (export) flags: sent from server to client during handshake,
    but describe what will happen during transmission */
 enum {
@@ -187,6 +212,7 @@ enum {
 #define NBD_OPT_STRUCTURED_REPLY  (8)
 #define NBD_OPT_LIST_META_CONTEXT (9)
 #define NBD_OPT_SET_META_CONTEXT  (10)
+#define NBD_OPT_EXTENDED_HEADERS  (11)

 /* Option reply types. */
 #define NBD_REP_ERR(value) ((UINT32_C(1) << 31) | (value))
@@ -204,6 +230,8 @@ enum {
 #define NBD_REP_ERR_UNKNOWN         NBD_REP_ERR(6)  /* Export unknown */
 #define NBD_REP_ERR_SHUTDOWN        NBD_REP_ERR(7)  /* Server shutting down */
 #define NBD_REP_ERR_BLOCK_SIZE_REQD NBD_REP_ERR(8)  /* Need INFO_BLOCK_SIZE */
+#define NBD_REP_ERR_TOO_BIG         NBD_REP_ERR(9)  /* Payload size overflow */
+#define NBD_REP_ERR_EXT_HEADER_REQD NBD_REP_ERR(10) /* Need extended headers */

 /* Info types, used during NBD_REP_INFO */
 #define NBD_INFO_EXPORT         0
@@ -212,12 +240,14 @@ enum {
 #define NBD_INFO_BLOCK_SIZE     3

 /* Request flags, sent from client to server during transmission phase */
-#define NBD_CMD_FLAG_FUA        (1 << 0) /* 'force unit access' during write */
-#define NBD_CMD_FLAG_NO_HOLE    (1 << 1) /* don't punch hole on zero run */
-#define NBD_CMD_FLAG_DF         (1 << 2) /* don't fragment structured read */
-#define NBD_CMD_FLAG_REQ_ONE    (1 << 3) /* only one extent in BLOCK_STATUS
-                                          * reply chunk */
-#define NBD_CMD_FLAG_FAST_ZERO  (1 << 4) /* fail if WRITE_ZEROES is not fast */
+#define NBD_CMD_FLAG_FUA         (1 << 0) /* 'force unit access' during write */
+#define NBD_CMD_FLAG_NO_HOLE     (1 << 1) /* don't punch hole on zero run */
+#define NBD_CMD_FLAG_DF          (1 << 2) /* don't fragment structured read */
+#define NBD_CMD_FLAG_REQ_ONE     (1 << 3) \
+    /* only one extent in BLOCK_STATUS reply chunk */
+#define NBD_CMD_FLAG_FAST_ZERO   (1 << 4) /* fail if WRITE_ZEROES is not fast */
+#define NBD_CMD_FLAG_PAYLOAD_LEN (1 << 5) \
+    /* length describes payload, not effect; only with ext header */

 /* Supported request types */
 enum {
@@ -243,12 +273,17 @@ enum {
  */
 #define NBD_MAX_STRING_SIZE 4096

-/* Transmission request structure */
+/* Two types of request structures, a given client will only use 1 */
 #define NBD_REQUEST_MAGIC           0x25609513
+#define NBD_EXTENDED_REQUEST_MAGIC  0x21e41c71

-/* Two types of reply structures */
+/*
+ * Three types of reply structures, but what a client expects depends
+ * on NBD_OPT_STRUCTURED_REPLY and NBD_OPT_EXTENDED_HEADERS.
+ */
 #define NBD_SIMPLE_REPLY_MAGIC      0x67446698
 #define NBD_STRUCTURED_REPLY_MAGIC  0x668e33ef
+#define NBD_EXTENDED_REPLY_MAGIC    0x6e8a278c

 /* Structured reply flags */
 #define NBD_REPLY_FLAG_DONE          (1 << 0) /* This reply-chunk is last */
@@ -256,12 +291,13 @@ enum {
 /* Structured reply types */
 #define NBD_REPLY_ERR(value)         ((1 << 15) | (value))

-#define NBD_REPLY_TYPE_NONE          0
-#define NBD_REPLY_TYPE_OFFSET_DATA   1
-#define NBD_REPLY_TYPE_OFFSET_HOLE   2
-#define NBD_REPLY_TYPE_BLOCK_STATUS  5
-#define NBD_REPLY_TYPE_ERROR         NBD_REPLY_ERR(1)
-#define NBD_REPLY_TYPE_ERROR_OFFSET  NBD_REPLY_ERR(2)
+#define NBD_REPLY_TYPE_NONE              0
+#define NBD_REPLY_TYPE_OFFSET_DATA       1
+#define NBD_REPLY_TYPE_OFFSET_HOLE       2
+#define NBD_REPLY_TYPE_BLOCK_STATUS      5
+#define NBD_REPLY_TYPE_BLOCK_STATUS_EXT  6
+#define NBD_REPLY_TYPE_ERROR             NBD_REPLY_ERR(1)
+#define NBD_REPLY_TYPE_ERROR_OFFSET      NBD_REPLY_ERR(2)

 /* Extent flags for base:allocation in NBD_REPLY_TYPE_BLOCK_STATUS */
 #define NBD_STATE_HOLE (1 << 0)
diff --git a/nbd/common.c b/nbd/common.c
index ddfe7d11837..137466defd2 100644
--- a/nbd/common.c
+++ b/nbd/common.c
@@ -79,6 +79,8 @@ const char *nbd_opt_lookup(uint32_t opt)
         return "list meta context";
     case NBD_OPT_SET_META_CONTEXT:
         return "set meta context";
+    case NBD_OPT_EXTENDED_HEADERS:
+        return "extended headers";
     default:
         return "<unknown>";
     }
@@ -112,6 +114,10 @@ const char *nbd_rep_lookup(uint32_t rep)
         return "server shutting down";
     case NBD_REP_ERR_BLOCK_SIZE_REQD:
         return "block size required";
+    case NBD_REP_ERR_TOO_BIG:
+        return "option payload too big";
+    case NBD_REP_ERR_EXT_HEADER_REQD:
+        return "extended headers required";
     default:
         return "<unknown>";
     }
@@ -170,7 +176,9 @@ const char *nbd_reply_type_lookup(uint16_t type)
     case NBD_REPLY_TYPE_OFFSET_HOLE:
         return "hole";
     case NBD_REPLY_TYPE_BLOCK_STATUS:
-        return "block status";
+        return "block status (32-bit)";
+    case NBD_REPLY_TYPE_BLOCK_STATUS_EXT:
+        return "block status (64-bit)";
     case NBD_REPLY_TYPE_ERROR:
         return "generic error";
     case NBD_REPLY_TYPE_ERROR_OFFSET:
-- 
2.40.1




* [PATCH v3 06/14] nbd/server: Refactor handling of request payload
  2023-05-15 19:53 [PATCH v3 00/14] qemu patches for 64-bit NBD extensions Eric Blake
                   ` (4 preceding siblings ...)
  2023-05-15 19:53 ` [PATCH v3 05/14] nbd: Add types for extended headers Eric Blake
@ 2023-05-15 19:53 ` Eric Blake
  2023-05-31  8:04   ` Vladimir Sementsov-Ogievskiy
  2023-05-15 19:53 ` [PATCH v3 07/14] nbd/server: Refactor to pass full request around Eric Blake
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2023-05-15 19:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: libguestfs, vsementsov, open list:Network Block Dev...

Upcoming additions to support NBD 64-bit effect lengths allow for the
possibility to distinguish between payload length (capped at 32M) and
effect length (up to 63 bits).  Without that extension, only the
NBD_CMD_WRITE request has a payload; but with the extension, it makes
sense to allow at least NBD_CMD_BLOCK_STATUS to have both a payload
and effect length (where the payload is a limited-size struct that in
turns gives the real effect length as well as a subset of known ids
for which status is requested).  Other future NBD commands may also
have a request payload, so the 64-bit extension introduces a new
NBD_CMD_FLAG_PAYLOAD_LEN that distinguishes between whether the header
length is a payload length or an effect length, rather than
hard-coding the decision based on the command.  Note that we do not
support the payload version of BLOCK_STATUS yet.

For this patch, no semantic change is intended for a compliant client.
For a non-compliant client, it is possible that the error behavior
changes (a different message, a change on whether the connection is
killed or remains alive for the next command, or so forth), but all
errors should still be handled gracefully.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 nbd/server.c     | 55 +++++++++++++++++++++++++++++++++---------------
 nbd/trace-events |  1 +
 2 files changed, 39 insertions(+), 17 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index cf38a104d9a..5812a773ace 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2316,6 +2316,8 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req, NBDRequest *
                                                Error **errp)
 {
     NBDClient *client = req->client;
+    bool extended_with_payload;
+    int payload_len = 0;
     int valid_flags;
     int ret;

@@ -2329,27 +2331,41 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req, NBDRequest *
     trace_nbd_co_receive_request_decode_type(request->handle, request->type,
                                              nbd_cmd_lookup(request->type));

-    if (request->type != NBD_CMD_WRITE) {
-        /* No payload, we are ready to read the next request.  */
-        req->complete = true;
-    }
-
     if (request->type == NBD_CMD_DISC) {
         /* Special case: we're going to disconnect without a reply,
          * whether or not flags, from, or len are bogus */
+        req->complete = true;
         return -EIO;
     }

+    /* Payload and buffer handling. */
+    extended_with_payload = client->header_style >= NBD_HEADER_EXTENDED &&
+        (request->flags & NBD_CMD_FLAG_PAYLOAD_LEN);
     if (request->type == NBD_CMD_READ || request->type == NBD_CMD_WRITE ||
-        request->type == NBD_CMD_CACHE)
-    {
+        request->type == NBD_CMD_CACHE || extended_with_payload) {
         if (request->len > NBD_MAX_BUFFER_SIZE) {
             error_setg(errp, "len (%" PRIu64" ) is larger than max len (%u)",
                        request->len, NBD_MAX_BUFFER_SIZE);
             return -EINVAL;
         }

-        if (request->type != NBD_CMD_CACHE) {
+        if (request->type == NBD_CMD_WRITE || extended_with_payload) {
+            payload_len = request->len;
+            if (request->type != NBD_CMD_WRITE) {
+                /*
+                 * For now, we don't support payloads on other
+                 * commands; but we can keep the connection alive.
+                 */
+                request->len = 0;
+            } else if (client->header_style >= NBD_HEADER_EXTENDED &&
+                       !extended_with_payload) {
+                /* The client is noncompliant. Trace it, but proceed. */
+                trace_nbd_co_receive_ext_payload_compliance(request->from,
+                                                            request->len);
+            }
+        }
+
+        if (request->type == NBD_CMD_WRITE || request->type == NBD_CMD_READ) {
             req->data = blk_try_blockalign(client->exp->common.blk,
                                            request->len);
             if (req->data == NULL) {
@@ -2359,18 +2375,20 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req, NBDRequest *
         }
     }

-    if (request->type == NBD_CMD_WRITE) {
-        assert(request->len <= NBD_MAX_BUFFER_SIZE);
-        if (nbd_read(client->ioc, req->data, request->len, "CMD_WRITE data",
-                     errp) < 0)
-        {
+    if (payload_len) {
+        if (req->data) {
+            ret = nbd_read(client->ioc, req->data, payload_len,
+                           "CMD_WRITE data", errp);
+        } else {
+            ret = nbd_drop(client->ioc, payload_len, errp);
+        }
+        if (ret < 0) {
             return -EIO;
         }
-        req->complete = true;
-
         trace_nbd_co_receive_request_payload_received(request->handle,
-                                                      request->len);
+                                                      payload_len);
     }
+    req->complete = true;

     /* Sanity checks. */
     if (client->exp->nbdflags & NBD_FLAG_READ_ONLY &&
@@ -2400,7 +2418,10 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req, NBDRequest *
                                               client->check_align);
     }
     valid_flags = NBD_CMD_FLAG_FUA;
-    if (request->type == NBD_CMD_READ &&
+    if (request->type == NBD_CMD_WRITE &&
+        client->header_style >= NBD_HEADER_EXTENDED) {
+        valid_flags |= NBD_CMD_FLAG_PAYLOAD_LEN;
+    } else if (request->type == NBD_CMD_READ &&
         client->header_style >= NBD_HEADER_STRUCTURED) {
         valid_flags |= NBD_CMD_FLAG_DF;
     } else if (request->type == NBD_CMD_WRITE_ZEROES) {
diff --git a/nbd/trace-events b/nbd/trace-events
index e2c1d68688d..adf5666e207 100644
--- a/nbd/trace-events
+++ b/nbd/trace-events
@@ -71,6 +71,7 @@ nbd_co_send_extents(uint64_t handle, unsigned int extents, uint32_t id, uint64_t
 nbd_co_send_structured_error(uint64_t handle, int err, const char *errname, const char *msg) "Send structured error reply: handle = %" PRIu64 ", error = %d (%s), msg = '%s'"
 nbd_co_receive_request_decode_type(uint64_t handle, uint16_t type, const char *name) "Decoding type: handle = %" PRIu64 ", type = %" PRIu16 " (%s)"
 nbd_co_receive_request_payload_received(uint64_t handle, uint64_t len) "Payload received: handle = %" PRIu64 ", len = %" PRIu64
+nbd_co_receive_ext_payload_compliance(uint64_t from, uint64_t len) "client sent non-compliant write without payload flag: from=0x%" PRIx64 ", len=0x%" PRIx64
 nbd_co_receive_align_compliance(const char *op, uint64_t from, uint64_t len, uint32_t align) "client sent non-compliant unaligned %s request: from=0x%" PRIx64 ", len=0x%" PRIx64 ", align=0x%" PRIx32
 nbd_trip(void) "Reading request"

-- 
2.40.1




* [PATCH v3 07/14] nbd/server: Refactor to pass full request around
  2023-05-15 19:53 [PATCH v3 00/14] qemu patches for 64-bit NBD extensions Eric Blake
                   ` (5 preceding siblings ...)
  2023-05-15 19:53 ` [PATCH v3 06/14] nbd/server: Refactor handling of request payload Eric Blake
@ 2023-05-15 19:53 ` Eric Blake
  2023-05-31  8:13   ` Vladimir Sementsov-Ogievskiy
  2023-05-15 19:53 ` [PATCH v3 08/14] nbd/server: Support 64-bit block status Eric Blake
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2023-05-15 19:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: libguestfs, vsementsov, open list:Network Block Dev...

Part of NBD's 64-bit headers extension involves passing the client's
requested offset back as part of the reply header (one reason for this
change: converting absolute offsets stored in
NBD_REPLY_TYPE_OFFSET_DATA to relative offsets within the buffer is
easier if the absolute offset of the buffer is also available).  This
is a refactoring patch to pass the full request around the reply
stack, rather than just the handle, so that later patches can then
access request->from when extended headers are active.  But for this
patch, there are no semantic changes.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 nbd/server.c | 117 +++++++++++++++++++++++++++------------------------
 1 file changed, 61 insertions(+), 56 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 5812a773ace..ffab51efd26 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1887,18 +1887,18 @@ static int coroutine_fn nbd_co_send_iov(NBDClient *client, struct iovec *iov,
 }

 static inline void set_be_simple_reply(NBDClient *client, struct iovec *iov,
-                                       uint64_t error, uint64_t handle)
+                                       uint64_t error, NBDRequest *request)
 {
     NBDSimpleReply *reply = iov->iov_base;

     iov->iov_len = sizeof(*reply);
     stl_be_p(&reply->magic, NBD_SIMPLE_REPLY_MAGIC);
     stl_be_p(&reply->error, error);
-    stq_be_p(&reply->handle, handle);
+    stq_be_p(&reply->handle, request->handle);
 }

 static int coroutine_fn nbd_co_send_simple_reply(NBDClient *client,
-                                                 uint64_t handle,
+                                                 NBDRequest *request,
                                                  uint32_t error,
                                                  void *data,
                                                  size_t len,
@@ -1911,16 +1911,16 @@ static int coroutine_fn nbd_co_send_simple_reply(NBDClient *client,
         {.iov_base = data, .iov_len = len}
     };

-    trace_nbd_co_send_simple_reply(handle, nbd_err, nbd_err_lookup(nbd_err),
-                                   len);
-    set_be_simple_reply(client, &iov[0], nbd_err, handle);
+    trace_nbd_co_send_simple_reply(request->handle, nbd_err,
+                                   nbd_err_lookup(nbd_err), len);
+    set_be_simple_reply(client, &iov[0], nbd_err, request);

     return nbd_co_send_iov(client, iov, len ? 2 : 1, errp);
 }

 static inline void set_be_chunk(NBDClient *client, struct iovec *iov,
                                 uint16_t flags, uint16_t type,
-                                uint64_t handle, uint32_t length)
+                                NBDRequest *request, uint32_t length)
 {
     NBDStructuredReplyChunk *chunk = iov->iov_base;

@@ -1928,12 +1928,12 @@ static inline void set_be_chunk(NBDClient *client, struct iovec *iov,
     stl_be_p(&chunk->magic, NBD_STRUCTURED_REPLY_MAGIC);
     stw_be_p(&chunk->flags, flags);
     stw_be_p(&chunk->type, type);
-    stq_be_p(&chunk->handle, handle);
+    stq_be_p(&chunk->handle, request->handle);
     stl_be_p(&chunk->length, length);
 }

 static int coroutine_fn nbd_co_send_structured_done(NBDClient *client,
-                                                    uint64_t handle,
+                                                    NBDRequest *request,
                                                     Error **errp)
 {
     NBDReply hdr;
@@ -1941,15 +1941,15 @@ static int coroutine_fn nbd_co_send_structured_done(NBDClient *client,
         {.iov_base = &hdr},
     };

-    trace_nbd_co_send_structured_done(handle);
+    trace_nbd_co_send_structured_done(request->handle);
     set_be_chunk(client, &iov[0], NBD_REPLY_FLAG_DONE,
-                 NBD_REPLY_TYPE_NONE, handle, 0);
+                 NBD_REPLY_TYPE_NONE, request, 0);

     return nbd_co_send_iov(client, iov, 1, errp);
 }

 static int coroutine_fn nbd_co_send_structured_read(NBDClient *client,
-                                                    uint64_t handle,
+                                                    NBDRequest *request,
                                                     uint64_t offset,
                                                     void *data,
                                                     size_t size,
@@ -1965,16 +1965,16 @@ static int coroutine_fn nbd_co_send_structured_read(NBDClient *client,
     };

     assert(size);
-    trace_nbd_co_send_structured_read(handle, offset, data, size);
+    trace_nbd_co_send_structured_read(request->handle, offset, data, size);
     set_be_chunk(client, &iov[0], final ? NBD_REPLY_FLAG_DONE : 0,
-                 NBD_REPLY_TYPE_OFFSET_DATA, handle, iov[1].iov_len + size);
+                 NBD_REPLY_TYPE_OFFSET_DATA, request, iov[1].iov_len + size);
     stq_be_p(&chunk.offset, offset);

     return nbd_co_send_iov(client, iov, 3, errp);
 }

 static int coroutine_fn nbd_co_send_structured_error(NBDClient *client,
-                                                     uint64_t handle,
+                                                     NBDRequest *request,
                                                      uint32_t error,
                                                      const char *msg,
                                                      Error **errp)
@@ -1989,10 +1989,11 @@ static int coroutine_fn nbd_co_send_structured_error(NBDClient *client,
     };

     assert(nbd_err);
-    trace_nbd_co_send_structured_error(handle, nbd_err,
+    trace_nbd_co_send_structured_error(request->handle, nbd_err,
                                        nbd_err_lookup(nbd_err), msg ? msg : "");
     set_be_chunk(client, &iov[0], NBD_REPLY_FLAG_DONE,
-                 NBD_REPLY_TYPE_ERROR, handle, iov[1].iov_len + iov[2].iov_len);
+                 NBD_REPLY_TYPE_ERROR, request,
+                 iov[1].iov_len + iov[2].iov_len);
     stl_be_p(&chunk.error, nbd_err);
     stw_be_p(&chunk.message_length, iov[2].iov_len);

@@ -2004,7 +2005,7 @@ static int coroutine_fn nbd_co_send_structured_error(NBDClient *client,
  * reported to the client, at which point this function succeeds.
  */
 static int coroutine_fn nbd_co_send_sparse_read(NBDClient *client,
-                                                uint64_t handle,
+                                                NBDRequest *request,
                                                 uint64_t offset,
                                                 uint8_t *data,
                                                 size_t size,
@@ -2026,7 +2027,7 @@ static int coroutine_fn nbd_co_send_sparse_read(NBDClient *client,
             char *msg = g_strdup_printf("unable to check for holes: %s",
                                         strerror(-status));

-            ret = nbd_co_send_structured_error(client, handle, -status, msg,
+            ret = nbd_co_send_structured_error(client, request, -status, msg,
                                                errp);
             g_free(msg);
             return ret;
@@ -2041,12 +2042,12 @@ static int coroutine_fn nbd_co_send_sparse_read(NBDClient *client,
                 {.iov_base = &chunk, .iov_len = sizeof(chunk)},
             };

-            trace_nbd_co_send_structured_read_hole(handle, offset + progress,
+            trace_nbd_co_send_structured_read_hole(request->handle,
+                                                   offset + progress,
                                                    pnum);
             set_be_chunk(client, &iov[0],
                          final ? NBD_REPLY_FLAG_DONE : 0,
-                         NBD_REPLY_TYPE_OFFSET_HOLE,
-                         handle, iov[1].iov_len);
+                         NBD_REPLY_TYPE_OFFSET_HOLE, request, iov[1].iov_len);
             stq_be_p(&chunk.offset, offset + progress);
             stl_be_p(&chunk.length, pnum);
             ret = nbd_co_send_iov(client, iov, 2, errp);
@@ -2057,7 +2058,8 @@ static int coroutine_fn nbd_co_send_sparse_read(NBDClient *client,
                 error_setg_errno(errp, -ret, "reading from file failed");
                 break;
             }
-            ret = nbd_co_send_structured_read(client, handle, offset + progress,
+            ret = nbd_co_send_structured_read(client, request,
+                                              offset + progress,
                                               data + progress, pnum, final,
                                               errp);
         }
@@ -2212,7 +2214,7 @@ static int coroutine_fn blockalloc_to_extents(BlockBackend *blk,
  * @last controls whether NBD_REPLY_FLAG_DONE is sent.
  */
 static int coroutine_fn
-nbd_co_send_extents(NBDClient *client, uint64_t handle, NBDExtentArray *ea,
+nbd_co_send_extents(NBDClient *client, NBDRequest *request, NBDExtentArray *ea,
                     bool last, uint32_t context_id, Error **errp)
 {
     NBDReply hdr;
@@ -2225,11 +2227,11 @@ nbd_co_send_extents(NBDClient *client, uint64_t handle, NBDExtentArray *ea,

     nbd_extent_array_convert_to_be(ea);

-    trace_nbd_co_send_extents(handle, ea->count, context_id, ea->total_length,
-                              last);
+    trace_nbd_co_send_extents(request->handle, ea->count, context_id,
+                              ea->total_length, last);
     set_be_chunk(client, &iov[0], last ? NBD_REPLY_FLAG_DONE : 0,
                  NBD_REPLY_TYPE_BLOCK_STATUS,
-                 handle, iov[1].iov_len + iov[2].iov_len);
+                 request, iov[1].iov_len + iov[2].iov_len);
     stl_be_p(&chunk.context_id, context_id);

     return nbd_co_send_iov(client, iov, 3, errp);
@@ -2237,7 +2239,7 @@ nbd_co_send_extents(NBDClient *client, uint64_t handle, NBDExtentArray *ea,

 /* Get block status from the exported device and send it to the client */
 static int
-coroutine_fn nbd_co_send_block_status(NBDClient *client, uint64_t handle,
+coroutine_fn nbd_co_send_block_status(NBDClient *client, NBDRequest *request,
                                       BlockBackend *blk, uint64_t offset,
                                       uint32_t length, bool dont_fragment,
                                       bool last, uint32_t context_id,
@@ -2254,10 +2256,10 @@ coroutine_fn nbd_co_send_block_status(NBDClient *client, uint64_t handle,
     }
     if (ret < 0) {
         return nbd_co_send_structured_error(
-                client, handle, -ret, "can't get block status", errp);
+                client, request, -ret, "can't get block status", errp);
     }

-    return nbd_co_send_extents(client, handle, ea, last, context_id, errp);
+    return nbd_co_send_extents(client, request, ea, last, context_id, errp);
 }

 /* Populate @ea from a dirty bitmap. */
@@ -2292,17 +2294,20 @@ static void bitmap_to_extents(BdrvDirtyBitmap *bitmap,
     bdrv_dirty_bitmap_unlock(bitmap);
 }

-static int coroutine_fn nbd_co_send_bitmap(NBDClient *client, uint64_t handle,
-                                           BdrvDirtyBitmap *bitmap, uint64_t offset,
-                                           uint32_t length, bool dont_fragment, bool last,
-                                           uint32_t context_id, Error **errp)
+static int coroutine_fn nbd_co_send_bitmap(NBDClient *client,
+                                           NBDRequest *request,
+                                           BdrvDirtyBitmap *bitmap,
+                                           uint64_t offset,
+                                           uint32_t length, bool dont_fragment,
+                                           bool last, uint32_t context_id,
+                                           Error **errp)
 {
     unsigned int nb_extents = dont_fragment ? 1 : NBD_MAX_BLOCK_STATUS_EXTENTS;
     g_autoptr(NBDExtentArray) ea = nbd_extent_array_new(nb_extents);

     bitmap_to_extents(bitmap, offset, length, ea);

-    return nbd_co_send_extents(client, handle, ea, last, context_id, errp);
+    return nbd_co_send_extents(client, request, ea, last, context_id, errp);
 }

 /* nbd_co_receive_request
@@ -2443,16 +2448,16 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req, NBDRequest *
  * Returns 0 if connection is still live, -errno on failure to talk to client
  */
 static coroutine_fn int nbd_send_generic_reply(NBDClient *client,
-                                               uint64_t handle,
+                                               NBDRequest *request,
                                                int ret,
                                                const char *error_msg,
                                                Error **errp)
 {
     if (client->header_style >= NBD_HEADER_STRUCTURED && ret < 0) {
-        return nbd_co_send_structured_error(client, handle, -ret, error_msg,
+        return nbd_co_send_structured_error(client, request, -ret, error_msg,
                                             errp);
     } else {
-        return nbd_co_send_simple_reply(client, handle, ret < 0 ? -ret : 0,
+        return nbd_co_send_simple_reply(client, request, ret < 0 ? -ret : 0,
                                         NULL, 0, errp);
     }
 }
@@ -2473,7 +2478,7 @@ static coroutine_fn int nbd_do_cmd_read(NBDClient *client, NBDRequest *request,
     if (request->flags & NBD_CMD_FLAG_FUA) {
         ret = blk_co_flush(exp->common.blk);
         if (ret < 0) {
-            return nbd_send_generic_reply(client, request->handle, ret,
+            return nbd_send_generic_reply(client, request, ret,
                                           "flush failed", errp);
         }
     }
@@ -2481,26 +2486,26 @@ static coroutine_fn int nbd_do_cmd_read(NBDClient *client, NBDRequest *request,
     if (client->header_style >= NBD_HEADER_STRUCTURED &&
         !(request->flags & NBD_CMD_FLAG_DF) && request->len)
     {
-        return nbd_co_send_sparse_read(client, request->handle, request->from,
+        return nbd_co_send_sparse_read(client, request, request->from,
                                        data, request->len, errp);
     }

     ret = blk_co_pread(exp->common.blk, request->from, request->len, data, 0);
     if (ret < 0) {
-        return nbd_send_generic_reply(client, request->handle, ret,
+        return nbd_send_generic_reply(client, request, ret,
                                       "reading from file failed", errp);
     }

     if (client->header_style >= NBD_HEADER_STRUCTURED) {
         if (request->len) {
-            return nbd_co_send_structured_read(client, request->handle,
+            return nbd_co_send_structured_read(client, request,
                                                request->from, data,
                                                request->len, true, errp);
         } else {
-            return nbd_co_send_structured_done(client, request->handle, errp);
+            return nbd_co_send_structured_done(client, request, errp);
         }
     } else {
-        return nbd_co_send_simple_reply(client, request->handle, 0,
+        return nbd_co_send_simple_reply(client, request, 0,
                                         data, request->len, errp);
     }
 }
@@ -2524,7 +2529,7 @@ static coroutine_fn int nbd_do_cmd_cache(NBDClient *client, NBDRequest *request,
     ret = blk_co_preadv(exp->common.blk, request->from, request->len,
                         NULL, BDRV_REQ_COPY_ON_READ | BDRV_REQ_PREFETCH);

-    return nbd_send_generic_reply(client, request->handle, ret,
+    return nbd_send_generic_reply(client, request, ret,
                                   "caching data failed", errp);
 }

@@ -2556,7 +2561,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
         assert(request->len <= NBD_MAX_BUFFER_SIZE);
         ret = blk_co_pwrite(exp->common.blk, request->from, request->len, data,
                             flags);
-        return nbd_send_generic_reply(client, request->handle, ret,
+        return nbd_send_generic_reply(client, request, ret,
                                       "writing to file failed", errp);

     case NBD_CMD_WRITE_ZEROES:
@@ -2572,7 +2577,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
         }
         ret = blk_co_pwrite_zeroes(exp->common.blk, request->from, request->len,
                                    flags);
-        return nbd_send_generic_reply(client, request->handle, ret,
+        return nbd_send_generic_reply(client, request, ret,
                                       "writing to file failed", errp);

     case NBD_CMD_DISC:
@@ -2581,7 +2586,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,

     case NBD_CMD_FLUSH:
         ret = blk_co_flush(exp->common.blk);
-        return nbd_send_generic_reply(client, request->handle, ret,
+        return nbd_send_generic_reply(client, request, ret,
                                       "flush failed", errp);

     case NBD_CMD_TRIM:
@@ -2589,12 +2594,12 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
         if (ret >= 0 && request->flags & NBD_CMD_FLAG_FUA) {
             ret = blk_co_flush(exp->common.blk);
         }
-        return nbd_send_generic_reply(client, request->handle, ret,
+        return nbd_send_generic_reply(client, request, ret,
                                       "discard failed", errp);

     case NBD_CMD_BLOCK_STATUS:
         if (!request->len) {
-            return nbd_send_generic_reply(client, request->handle, -EINVAL,
+            return nbd_send_generic_reply(client, request, -EINVAL,
                                           "need non-zero length", errp);
         }
         assert(request->len <= UINT32_MAX);
@@ -2603,7 +2608,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
             int contexts_remaining = client->export_meta.count;

             if (client->export_meta.base_allocation) {
-                ret = nbd_co_send_block_status(client, request->handle,
+                ret = nbd_co_send_block_status(client, request,
                                                exp->common.blk,
                                                request->from,
                                                request->len, dont_fragment,
@@ -2616,7 +2621,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
             }

             if (client->export_meta.allocation_depth) {
-                ret = nbd_co_send_block_status(client, request->handle,
+                ret = nbd_co_send_block_status(client, request,
                                                exp->common.blk,
                                                request->from, request->len,
                                                dont_fragment,
@@ -2632,7 +2637,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
                 if (!client->export_meta.bitmaps[i]) {
                     continue;
                 }
-                ret = nbd_co_send_bitmap(client, request->handle,
+                ret = nbd_co_send_bitmap(client, request,
                                          client->exp->export_bitmaps[i],
                                          request->from, request->len,
                                          dont_fragment, !--contexts_remaining,
@@ -2646,7 +2651,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,

             return 0;
         } else {
-            return nbd_send_generic_reply(client, request->handle, -EINVAL,
+            return nbd_send_generic_reply(client, request, -EINVAL,
                                           "CMD_BLOCK_STATUS not negotiated",
                                           errp);
         }
@@ -2654,7 +2659,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
     default:
         msg = g_strdup_printf("invalid request type (%" PRIu32 ") received",
                               request->type);
-        ret = nbd_send_generic_reply(client, request->handle, -EINVAL, msg,
+        ret = nbd_send_generic_reply(client, request, -EINVAL, msg,
                                      errp);
         g_free(msg);
         return ret;
@@ -2717,7 +2722,7 @@ static coroutine_fn void nbd_trip(void *opaque)
         Error *export_err = local_err;

         local_err = NULL;
-        ret = nbd_send_generic_reply(client, request.handle, -EINVAL,
+        ret = nbd_send_generic_reply(client, &request, -EINVAL,
                                      error_get_pretty(export_err), &local_err);
         error_free(export_err);
     } else {
-- 
2.40.1




* [PATCH v3 08/14] nbd/server: Support 64-bit block status
  2023-05-15 19:53 [PATCH v3 00/14] qemu patches for 64-bit NBD extensions Eric Blake
                   ` (6 preceding siblings ...)
  2023-05-15 19:53 ` [PATCH v3 07/14] nbd/server: Refactor to pass full request around Eric Blake
@ 2023-05-15 19:53 ` Eric Blake
  2023-05-31 14:10   ` Vladimir Sementsov-Ogievskiy
  2023-05-15 19:53 ` [PATCH v3 09/14] nbd/server: Initial support for extended headers Eric Blake
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2023-05-15 19:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: libguestfs, vsementsov, open list:Network Block Dev...

The NBD spec states that if the client negotiates extended headers,
the server must avoid NBD_REPLY_TYPE_BLOCK_STATUS and instead use
NBD_REPLY_TYPE_BLOCK_STATUS_EXT which supports 64-bit lengths, even if
the reply does not need more than 32 bits.  As of this patch,
client->header_style is still never NBD_HEADER_EXTENDED, so the code
added here does not take effect until the next patch enables
negotiation.

For now, all metacontexts that we know how to export never populate
more than 32 bits of information, so we don't have to worry about
NBD_REP_ERR_EXT_HEADER_REQD or filtering during handshake, and we
always send all zeroes for the upper 32 bits of status during
NBD_CMD_BLOCK_STATUS.

Note that we previously had some interesting size-juggling on call
chains, such as:

nbd_co_send_block_status(uint32_t length)
-> blockstatus_to_extents(uint32_t bytes)
  -> bdrv_block_status_above(bytes, &uint64_t num)
  -> nbd_extent_array_add(uint64_t num)
    -> store num in 32-bit length

But we were lucky that it never overflowed: bdrv_block_status_above
never sets num larger than bytes, and we had previously been capping
'bytes' at 32 bits (since the protocol does not allow sending a larger
request without extended headers).  This patch adds some assertions
that ensure we continue to avoid overflowing 32 bits for a narrow
client, while fully utilizing 64 bits all the way through when the
client understands that.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 nbd/server.c | 86 +++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 62 insertions(+), 24 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index ffab51efd26..b4c15ae1a14 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2073,7 +2073,15 @@ static int coroutine_fn nbd_co_send_sparse_read(NBDClient *client,
 }

 typedef struct NBDExtentArray {
-    NBDExtent *extents;
+    NBDHeaderStyle style;           /* 32- or 64-bit extent descriptions */
+    union {
+        NBDStructuredMeta id;       /* style == NBD_HEADER_STRUCTURED */
+        NBDStructuredMetaExt meta;  /* style == NBD_HEADER_EXTENDED */
+    };
+    union {
+        NBDExtent *narrow;          /* style == NBD_HEADER_STRUCTURED */
+        NBDExtentExt *extents;      /* style == NBD_HEADER_EXTENDED */
+    };
     unsigned int nb_alloc;
     unsigned int count;
     uint64_t total_length;
@@ -2081,12 +2089,15 @@ typedef struct NBDExtentArray {
     bool converted_to_be;
 } NBDExtentArray;

-static NBDExtentArray *nbd_extent_array_new(unsigned int nb_alloc)
+static NBDExtentArray *nbd_extent_array_new(unsigned int nb_alloc,
+                                            NBDHeaderStyle style)
 {
     NBDExtentArray *ea = g_new0(NBDExtentArray, 1);

+    assert(style >= NBD_HEADER_STRUCTURED);
     ea->nb_alloc = nb_alloc;
-    ea->extents = g_new(NBDExtent, nb_alloc);
+    ea->extents = g_new(NBDExtentExt, nb_alloc);
+    ea->style = style;
     ea->can_add = true;

     return ea;
@@ -2100,17 +2111,37 @@ static void nbd_extent_array_free(NBDExtentArray *ea)
 G_DEFINE_AUTOPTR_CLEANUP_FUNC(NBDExtentArray, nbd_extent_array_free)

 /* Further modifications of the array after conversion are abandoned */
-static void nbd_extent_array_convert_to_be(NBDExtentArray *ea)
+static void nbd_extent_array_convert_to_be(NBDExtentArray *ea,
+                                           uint32_t context_id,
+                                           struct iovec *iov)
 {
     int i;

     assert(!ea->converted_to_be);
+    assert(iov[0].iov_base == &ea->meta);
+    assert(iov[1].iov_base == ea->extents);
     ea->can_add = false;
     ea->converted_to_be = true;

-    for (i = 0; i < ea->count; i++) {
-        ea->extents[i].flags = cpu_to_be32(ea->extents[i].flags);
-        ea->extents[i].length = cpu_to_be32(ea->extents[i].length);
+    stl_be_p(&ea->meta.context_id, context_id);
+    if (ea->style >= NBD_HEADER_EXTENDED) {
+        stl_be_p(&ea->meta.count, ea->count);
+        for (i = 0; i < ea->count; i++) {
+            ea->extents[i].length = cpu_to_be64(ea->extents[i].length);
+            ea->extents[i].flags = cpu_to_be64(ea->extents[i].flags);
+        }
+        iov[0].iov_len = sizeof(ea->meta);
+        iov[1].iov_len = ea->count * sizeof(ea->extents[0]);
+    } else {
+        /* Conversion reduces memory usage, order of iteration matters */
+        for (i = 0; i < ea->count; i++) {
+            assert(ea->extents[i].length <= UINT32_MAX);
+            assert((uint32_t) ea->extents[i].flags == ea->extents[i].flags);
+            ea->narrow[i].length = cpu_to_be32(ea->extents[i].length);
+            ea->narrow[i].flags = cpu_to_be32(ea->extents[i].flags);
+        }
+        iov[0].iov_len = sizeof(ea->id);
+        iov[1].iov_len = ea->count * sizeof(ea->narrow[0]);
     }
 }

@@ -2124,19 +2155,23 @@ static void nbd_extent_array_convert_to_be(NBDExtentArray *ea)
  * would result in an incorrect range reported to the client)
  */
 static int nbd_extent_array_add(NBDExtentArray *ea,
-                                uint32_t length, uint32_t flags)
+                                uint64_t length, uint32_t flags)
 {
     assert(ea->can_add);

     if (!length) {
         return 0;
     }
+    if (ea->style == NBD_HEADER_STRUCTURED) {
+        assert(length <= UINT32_MAX);
+    }

     /* Extend previous extent if flags are the same */
     if (ea->count > 0 && flags == ea->extents[ea->count - 1].flags) {
-        uint64_t sum = (uint64_t)length + ea->extents[ea->count - 1].length;
+        uint64_t sum = length + ea->extents[ea->count - 1].length;

-        if (sum <= UINT32_MAX) {
+        assert(sum >= length);
+        if (sum <= UINT32_MAX || ea->style >= NBD_HEADER_EXTENDED) {
             ea->extents[ea->count - 1].length = sum;
             ea->total_length += length;
             return 0;
@@ -2149,7 +2184,7 @@ static int nbd_extent_array_add(NBDExtentArray *ea,
     }

     ea->total_length += length;
-    ea->extents[ea->count] = (NBDExtent) {.length = length, .flags = flags};
+    ea->extents[ea->count] = (NBDExtentExt) {.length = length, .flags = flags};
     ea->count++;

     return 0;
@@ -2218,21 +2253,20 @@ nbd_co_send_extents(NBDClient *client, NBDRequest *request, NBDExtentArray *ea,
                     bool last, uint32_t context_id, Error **errp)
 {
     NBDReply hdr;
-    NBDStructuredMeta chunk;
     struct iovec iov[] = {
         {.iov_base = &hdr},
-        {.iov_base = &chunk, .iov_len = sizeof(chunk)},
-        {.iov_base = ea->extents, .iov_len = ea->count * sizeof(ea->extents[0])}
+        {.iov_base = &ea->meta},
+        {.iov_base = ea->extents}
     };
+    uint16_t type = client->header_style == NBD_HEADER_EXTENDED ?
+        NBD_REPLY_TYPE_BLOCK_STATUS_EXT : NBD_REPLY_TYPE_BLOCK_STATUS;

-    nbd_extent_array_convert_to_be(ea);
+    nbd_extent_array_convert_to_be(ea, context_id, &iov[1]);

     trace_nbd_co_send_extents(request->handle, ea->count, context_id,
                               ea->total_length, last);
-    set_be_chunk(client, &iov[0], last ? NBD_REPLY_FLAG_DONE : 0,
-                 NBD_REPLY_TYPE_BLOCK_STATUS,
+    set_be_chunk(client, &iov[0], last ? NBD_REPLY_FLAG_DONE : 0, type,
                  request, iov[1].iov_len + iov[2].iov_len);
-    stl_be_p(&chunk.context_id, context_id);

     return nbd_co_send_iov(client, iov, 3, errp);
 }
@@ -2241,13 +2275,14 @@ nbd_co_send_extents(NBDClient *client, NBDRequest *request, NBDExtentArray *ea,
 static int
 coroutine_fn nbd_co_send_block_status(NBDClient *client, NBDRequest *request,
                                       BlockBackend *blk, uint64_t offset,
-                                      uint32_t length, bool dont_fragment,
+                                      uint64_t length, bool dont_fragment,
                                       bool last, uint32_t context_id,
                                       Error **errp)
 {
     int ret;
     unsigned int nb_extents = dont_fragment ? 1 : NBD_MAX_BLOCK_STATUS_EXTENTS;
-    g_autoptr(NBDExtentArray) ea = nbd_extent_array_new(nb_extents);
+    g_autoptr(NBDExtentArray) ea =
+        nbd_extent_array_new(nb_extents, client->header_style);

     if (context_id == NBD_META_ID_BASE_ALLOCATION) {
         ret = blockstatus_to_extents(blk, offset, length, ea);
@@ -2270,11 +2305,12 @@ static void bitmap_to_extents(BdrvDirtyBitmap *bitmap,
     int64_t start, dirty_start, dirty_count;
     int64_t end = offset + length;
     bool full = false;
+    int64_t bound = es->style >= NBD_HEADER_EXTENDED ? INT64_MAX : INT32_MAX;

     bdrv_dirty_bitmap_lock(bitmap);

     for (start = offset;
-         bdrv_dirty_bitmap_next_dirty_area(bitmap, start, end, INT32_MAX,
+         bdrv_dirty_bitmap_next_dirty_area(bitmap, start, end, bound,
                                            &dirty_start, &dirty_count);
          start = dirty_start + dirty_count)
     {
@@ -2298,12 +2334,13 @@ static int coroutine_fn nbd_co_send_bitmap(NBDClient *client,
                                            NBDRequest *request,
                                            BdrvDirtyBitmap *bitmap,
                                            uint64_t offset,
-                                           uint32_t length, bool dont_fragment,
+                                           uint64_t length, bool dont_fragment,
                                            bool last, uint32_t context_id,
                                            Error **errp)
 {
     unsigned int nb_extents = dont_fragment ? 1 : NBD_MAX_BLOCK_STATUS_EXTENTS;
-    g_autoptr(NBDExtentArray) ea = nbd_extent_array_new(nb_extents);
+    g_autoptr(NBDExtentArray) ea =
+        nbd_extent_array_new(nb_extents, client->header_style);

     bitmap_to_extents(bitmap, offset, length, ea);

@@ -2602,7 +2639,8 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
             return nbd_send_generic_reply(client, request, -EINVAL,
                                           "need non-zero length", errp);
         }
-        assert(request->len <= UINT32_MAX);
+        assert(client->header_style >= NBD_HEADER_EXTENDED ||
+               request->len <= UINT32_MAX);
         if (client->export_meta.count) {
             bool dont_fragment = request->flags & NBD_CMD_FLAG_REQ_ONE;
             int contexts_remaining = client->export_meta.count;
-- 
2.40.1




* [PATCH v3 09/14] nbd/server: Initial support for extended headers
  2023-05-15 19:53 [PATCH v3 00/14] qemu patches for 64-bit NBD extensions Eric Blake
                   ` (7 preceding siblings ...)
  2023-05-15 19:53 ` [PATCH v3 08/14] nbd/server: Support 64-bit block status Eric Blake
@ 2023-05-15 19:53 ` Eric Blake
  2023-05-31 14:46   ` Vladimir Sementsov-Ogievskiy
  2023-05-15 19:53 ` [PATCH v3 10/14] nbd/client: " Eric Blake
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2023-05-15 19:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: libguestfs, vsementsov, open list:Network Block Dev...

Time to support clients that request extended headers.  Now we can
finally reach the code added across several previous patches.

Even though the NBD spec has been altered to allow us to accept
NBD_CMD_READ larger than the max payload size (provided our response
is a hole or is broken up over more than one data chunk), we do not
plan to take advantage of that, and continue to cap NBD_CMD_READ
at 32M regardless of header size.

For NBD_CMD_WRITE_ZEROES and NBD_CMD_TRIM, the block layer already
supports 64-bit operations without any effort on our part.  For
NBD_CMD_BLOCK_STATUS, the client's length is a hint, and the previous
patch took care of implementing the required
NBD_REPLY_TYPE_BLOCK_STATUS_EXT.
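For readers following along, the compact and extended request layouts
described above differ only in the magic number and in the width of the
trailing length field.  Here is a minimal, self-contained sketch of
packing both forms; the magic values and sizes are my reading of the
extension spec, not copied from qemu's nbd-internal.h:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed constants (verify against the NBD spec / qemu's headers). */
#define NBD_REQUEST_MAGIC           0x25609513u
#define NBD_EXTENDED_REQUEST_MAGIC  0x21e41c71u
#define NBD_REQUEST_SIZE            28   /* 4 + 2 + 2 + 8 + 8 + 4 */
#define NBD_EXTENDED_REQUEST_SIZE   32   /* 4 + 2 + 2 + 8 + 8 + 8 */

static void st32be(uint8_t *p, uint32_t v)
{
    p[0] = v >> 24; p[1] = v >> 16; p[2] = v >> 8; p[3] = (uint8_t)v;
}

static void st64be(uint8_t *p, uint64_t v)
{
    st32be(p, (uint32_t)(v >> 32));
    st32be(p + 4, (uint32_t)v);
}

/* Pack one request into buf; returns the number of bytes to send. */
static size_t pack_request(uint8_t *buf, int extended, uint16_t flags,
                           uint16_t type, uint64_t handle,
                           uint64_t from, uint64_t len)
{
    st32be(buf, extended ? NBD_EXTENDED_REQUEST_MAGIC : NBD_REQUEST_MAGIC);
    buf[4] = flags >> 8; buf[5] = (uint8_t)flags;
    buf[6] = type >> 8;  buf[7] = (uint8_t)type;
    st64be(buf + 8, handle);
    st64be(buf + 16, from);
    if (extended) {
        st64be(buf + 24, len);       /* full 64-bit effect length */
        return NBD_EXTENDED_REQUEST_SIZE;
    }
    assert(len <= UINT32_MAX);       /* compact form cannot carry more */
    st32be(buf + 24, (uint32_t)len);
    return NBD_REQUEST_SIZE;
}
```

The extended form trades four extra bytes per request for a 64-bit
effect length; qemu keeps one 32-byte buffer and picks the size from
the negotiated header style, as the hunk in nbd_receive_request() shows.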

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 nbd/nbd-internal.h |   5 +-
 nbd/server.c       | 130 +++++++++++++++++++++++++++++++++++----------
 2 files changed, 106 insertions(+), 29 deletions(-)

diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index 133b1d94b50..dfa02f77ee4 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -34,8 +34,11 @@
  * https://github.com/yoe/nbd/blob/master/doc/proto.md
  */

-/* Size of all NBD_OPT_*, without payload */
+/* Size of all compact NBD_CMD_*, without payload */
 #define NBD_REQUEST_SIZE            (4 + 2 + 2 + 8 + 8 + 4)
+/* Size of all extended NBD_CMD_*, without payload */
+#define NBD_EXTENDED_REQUEST_SIZE   (4 + 2 + 2 + 8 + 8 + 8)
+
 /* Size of all NBD_REP_* sent in answer to most NBD_OPT_*, without payload */
 #define NBD_REPLY_SIZE              (4 + 4 + 8)
 /* Size of reply to NBD_OPT_EXPORT_NAME */
diff --git a/nbd/server.c b/nbd/server.c
index b4c15ae1a14..6475a76c1f0 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -482,6 +482,10 @@ static int nbd_negotiate_handle_export_name(NBDClient *client, bool no_zeroes,
         [10 .. 133]   reserved     (0) [unless no_zeroes]
      */
     trace_nbd_negotiate_handle_export_name();
+    if (client->header_style >= NBD_HEADER_EXTENDED) {
+        error_setg(errp, "Extended headers already negotiated");
+        return -EINVAL;
+    }
     if (client->optlen > NBD_MAX_STRING_SIZE) {
         error_setg(errp, "Bad length received");
         return -EINVAL;
@@ -1262,7 +1266,11 @@ static int nbd_negotiate_options(NBDClient *client, Error **errp)
             case NBD_OPT_STRUCTURED_REPLY:
                 if (length) {
                     ret = nbd_reject_length(client, false, errp);
-                } else if (client->header_style >= NBD_HEADER_STRUCTURED) {
+                } else if (client->header_style >= NBD_HEADER_EXTENDED) {
+                    ret = nbd_negotiate_send_rep_err(
+                        client, NBD_REP_ERR_EXT_HEADER_REQD, errp,
+                        "extended headers already negotiated");
+                } else if (client->header_style == NBD_HEADER_STRUCTURED) {
                     ret = nbd_negotiate_send_rep_err(
                         client, NBD_REP_ERR_INVALID, errp,
                         "structured reply already negotiated");
@@ -1278,6 +1286,19 @@ static int nbd_negotiate_options(NBDClient *client, Error **errp)
                                                  errp);
                 break;

+            case NBD_OPT_EXTENDED_HEADERS:
+                if (length) {
+                    ret = nbd_reject_length(client, false, errp);
+                } else if (client->header_style >= NBD_HEADER_EXTENDED) {
+                    ret = nbd_negotiate_send_rep_err(
+                        client, NBD_REP_ERR_INVALID, errp,
+                        "extended headers already negotiated");
+                } else {
+                    ret = nbd_negotiate_send_rep(client, NBD_REP_ACK, errp);
+                    client->header_style = NBD_HEADER_EXTENDED;
+                }
+                break;
+
             default:
                 ret = nbd_opt_drop(client, NBD_REP_ERR_UNSUP, errp,
                                    "Unsupported option %" PRIu32 " (%s)",
@@ -1413,11 +1434,13 @@ nbd_read_eof(NBDClient *client, void *buffer, size_t size, Error **errp)
 static int coroutine_fn nbd_receive_request(NBDClient *client, NBDRequest *request,
                                             Error **errp)
 {
-    uint8_t buf[NBD_REQUEST_SIZE];
-    uint32_t magic;
+    uint8_t buf[NBD_EXTENDED_REQUEST_SIZE];
+    uint32_t magic, expect;
     int ret;
+    size_t size = client->header_style == NBD_HEADER_EXTENDED ?
+        NBD_EXTENDED_REQUEST_SIZE : NBD_REQUEST_SIZE;

-    ret = nbd_read_eof(client, buf, sizeof(buf), errp);
+    ret = nbd_read_eof(client, buf, size, errp);
     if (ret < 0) {
         return ret;
     }
@@ -1425,13 +1448,21 @@ static int coroutine_fn nbd_receive_request(NBDClient *client, NBDRequest *reque
         return -EIO;
     }

-    /* Request
-       [ 0 ..  3]   magic   (NBD_REQUEST_MAGIC)
-       [ 4 ..  5]   flags   (NBD_CMD_FLAG_FUA, ...)
-       [ 6 ..  7]   type    (NBD_CMD_READ, ...)
-       [ 8 .. 15]   handle
-       [16 .. 23]   from
-       [24 .. 27]   len
+    /*
+     * Compact request
+     *  [ 0 ..  3]   magic   (NBD_REQUEST_MAGIC)
+     *  [ 4 ..  5]   flags   (NBD_CMD_FLAG_FUA, ...)
+     *  [ 6 ..  7]   type    (NBD_CMD_READ, ...)
+     *  [ 8 .. 15]   handle
+     *  [16 .. 23]   from
+     *  [24 .. 27]   len
+     * Extended request
+     *  [ 0 ..  3]   magic   (NBD_EXTENDED_REQUEST_MAGIC)
+     *  [ 4 ..  5]   flags   (NBD_CMD_FLAG_FUA, NBD_CMD_FLAG_PAYLOAD_LEN, ...)
+     *  [ 6 ..  7]   type    (NBD_CMD_READ, ...)
+     *  [ 8 .. 15]   handle
+     *  [16 .. 23]   from
+     *  [24 .. 31]   len
      */

     magic = ldl_be_p(buf);
@@ -1439,12 +1470,18 @@ static int coroutine_fn nbd_receive_request(NBDClient *client, NBDRequest *reque
     request->type   = lduw_be_p(buf + 6);
     request->handle = ldq_be_p(buf + 8);
     request->from   = ldq_be_p(buf + 16);
-    request->len    = ldl_be_p(buf + 24); /* widen 32 to 64 bits */
+    if (client->header_style >= NBD_HEADER_EXTENDED) {
+        request->len = ldq_be_p(buf + 24);
+        expect = NBD_EXTENDED_REQUEST_MAGIC;
+    } else {
+        request->len = ldl_be_p(buf + 24); /* widen 32 to 64 bits */
+        expect = NBD_REQUEST_MAGIC;
+    }

     trace_nbd_receive_request(magic, request->flags, request->type,
                               request->from, request->len);

-    if (magic != NBD_REQUEST_MAGIC) {
+    if (magic != expect) {
         error_setg(errp, "invalid magic (got 0x%" PRIx32 ")", magic);
         return -EINVAL;
     }
@@ -1887,14 +1924,37 @@ static int coroutine_fn nbd_co_send_iov(NBDClient *client, struct iovec *iov,
 }

 static inline void set_be_simple_reply(NBDClient *client, struct iovec *iov,
-                                       uint64_t error, NBDRequest *request)
+                                       uint32_t error, NBDStructuredError *err,
+                                       NBDRequest *request)
 {
-    NBDSimpleReply *reply = iov->iov_base;
+    if (client->header_style >= NBD_HEADER_EXTENDED) {
+        NBDExtendedReplyChunk *chunk = iov->iov_base;

-    iov->iov_len = sizeof(*reply);
-    stl_be_p(&reply->magic, NBD_SIMPLE_REPLY_MAGIC);
-    stl_be_p(&reply->error, error);
-    stq_be_p(&reply->handle, request->handle);
+        iov->iov_len = sizeof(*chunk);
+        stl_be_p(&chunk->magic, NBD_EXTENDED_REPLY_MAGIC);
+        stw_be_p(&chunk->flags, NBD_REPLY_FLAG_DONE);
+        stq_be_p(&chunk->handle, request->handle);
+        stq_be_p(&chunk->offset, request->from);
+        if (error) {
+            assert(!iov[1].iov_base);
+            iov[1].iov_base = err;
+            iov[1].iov_len = sizeof(*err);
+            stw_be_p(&chunk->type, NBD_REPLY_TYPE_ERROR);
+            stq_be_p(&chunk->length, sizeof(*err));
+            stl_be_p(&err->error, error);
+            stw_be_p(&err->message_length, 0);
+        } else {
+            stw_be_p(&chunk->type, NBD_REPLY_TYPE_NONE);
+            stq_be_p(&chunk->length, 0);
+        }
+    } else {
+        NBDSimpleReply *reply = iov->iov_base;
+
+        iov->iov_len = sizeof(*reply);
+        stl_be_p(&reply->magic, NBD_SIMPLE_REPLY_MAGIC);
+        stl_be_p(&reply->error, error);
+        stq_be_p(&reply->handle, request->handle);
+    }
 }

 static int coroutine_fn nbd_co_send_simple_reply(NBDClient *client,
@@ -1906,30 +1966,44 @@ static int coroutine_fn nbd_co_send_simple_reply(NBDClient *client,
 {
     NBDReply hdr;
     int nbd_err = system_errno_to_nbd_errno(error);
+    NBDStructuredError err;
     struct iovec iov[] = {
         {.iov_base = &hdr},
         {.iov_base = data, .iov_len = len}
     };

+    assert(!len || !nbd_err);
     trace_nbd_co_send_simple_reply(request->handle, nbd_err,
                                    nbd_err_lookup(nbd_err), len);
-    set_be_simple_reply(client, &iov[0], nbd_err, request);
+    set_be_simple_reply(client, &iov[0], nbd_err, &err, request);

-    return nbd_co_send_iov(client, iov, len ? 2 : 1, errp);
+    return nbd_co_send_iov(client, iov, iov[1].iov_len ? 2 : 1, errp);
 }

 static inline void set_be_chunk(NBDClient *client, struct iovec *iov,
                                 uint16_t flags, uint16_t type,
                                 NBDRequest *request, uint32_t length)
 {
-    NBDStructuredReplyChunk *chunk = iov->iov_base;
+    if (client->header_style >= NBD_HEADER_EXTENDED) {
+        NBDExtendedReplyChunk *chunk = iov->iov_base;

-    iov->iov_len = sizeof(*chunk);
-    stl_be_p(&chunk->magic, NBD_STRUCTURED_REPLY_MAGIC);
-    stw_be_p(&chunk->flags, flags);
-    stw_be_p(&chunk->type, type);
-    stq_be_p(&chunk->handle, request->handle);
-    stl_be_p(&chunk->length, length);
+        iov->iov_len = sizeof(*chunk);
+        stl_be_p(&chunk->magic, NBD_EXTENDED_REPLY_MAGIC);
+        stw_be_p(&chunk->flags, flags);
+        stw_be_p(&chunk->type, type);
+        stq_be_p(&chunk->handle, request->handle);
+        stq_be_p(&chunk->offset, request->from);
+        stq_be_p(&chunk->length, length);
+    } else {
+        NBDStructuredReplyChunk *chunk = iov->iov_base;
+
+        iov->iov_len = sizeof(*chunk);
+        stl_be_p(&chunk->magic, NBD_STRUCTURED_REPLY_MAGIC);
+        stw_be_p(&chunk->flags, flags);
+        stw_be_p(&chunk->type, type);
+        stq_be_p(&chunk->handle, request->handle);
+        stl_be_p(&chunk->length, length);
+    }
 }

 static int coroutine_fn nbd_co_send_structured_done(NBDClient *client,
-- 
2.40.1




* [PATCH v3 10/14] nbd/client: Initial support for extended headers
  2023-05-15 19:53 [PATCH v3 00/14] qemu patches for 64-bit NBD extensions Eric Blake
                   ` (8 preceding siblings ...)
  2023-05-15 19:53 ` [PATCH v3 09/14] nbd/server: Initial support for extended headers Eric Blake
@ 2023-05-15 19:53 ` Eric Blake
  2023-05-31 15:26   ` Vladimir Sementsov-Ogievskiy
  2023-05-15 19:53 ` [PATCH v3 11/14] nbd/client: Accept 64-bit block status chunks Eric Blake
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2023-05-15 19:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: libguestfs, vsementsov, Kevin Wolf, Hanna Reitz,
	open list:Network Block Dev...

Update the client code to be able to send an extended request, and
parse an extended header from the server.  Note that since we reject
any structured reply with a too-large payload, we can always normalize
a valid header back into the compact form, so that the caller need not
deal with two branches of a union.  Still, until a later patch lets
the client negotiate extended headers, the code added here should not
be reached.  Note that because of the different magic numbers, it is
just as easy to trace and then tolerate a non-compliant server sending
the wrong header reply as it would be to insist that the server is
compliant.

The only caller to nbd_receive_reply() always passed NULL for errp;
since we are changing the signature anyway, I decided to sink the
decision to ignore errors one layer lower.
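The normalization mentioned above can be illustrated in isolation.  The
struct layouts below are simplified stand-ins for qemu's
NBDStructuredReplyChunk and NBDExtendedReplyChunk (the real ones are
packed wire structs), and the payload cap mirrors NBD_MAX_BUFFER_SIZE
plus the read-data header; treat this as a sketch of the idea, not the
actual implementation:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cap: 32 MiB of data plus an 8-byte read-data header. */
#define MAX_PAYLOAD ((uint64_t)32 * 1024 * 1024 + 8)

/* Simplified stand-ins for the two chunk forms. */
struct compact_chunk {
    uint16_t flags, type;
    uint64_t handle;
    uint32_t length;
};
struct extended_chunk {
    uint16_t flags, type;
    uint64_t handle;
    uint64_t offset;   /* extra field; ignored for now, as in the patch */
    uint64_t length;
};

/* Normalize an extended chunk into compact form.  Returns 0 on
 * success, -1 if the payload is too large (the client rejects such
 * servers outright, so the 32-bit field always suffices). */
static int normalize(const struct extended_chunk *ext,
                     struct compact_chunk *out)
{
    if (ext->length > MAX_PAYLOAD) {
        return -1;
    }
    out->flags = ext->flags;
    out->type = ext->type;
    out->handle = ext->handle;
    out->length = (uint32_t)ext->length;  /* safe: bounded above */
    return 0;
}
```

Because any payload over the cap is rejected outright, the 64-bit
length always fits in the compact chunk's 32-bit field, so callers
never have to look at the extended branch of the union.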

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 include/block/nbd.h |  2 +-
 block/nbd.c         |  3 +-
 nbd/client.c        | 86 +++++++++++++++++++++++++++++++--------------
 nbd/trace-events    |  1 +
 4 files changed, 63 insertions(+), 29 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index d753fb8006f..865bb4ee2e1 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -371,7 +371,7 @@ int nbd_init(int fd, QIOChannelSocket *sioc, NBDExportInfo *info,
              Error **errp);
 int nbd_send_request(QIOChannel *ioc, NBDRequest *request, NBDHeaderStyle hdr);
 int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc,
-                                   NBDReply *reply, Error **errp);
+                                   NBDReply *reply, NBDHeaderStyle hdr);
 int nbd_client(int fd);
 int nbd_disconnect(int fd);
 int nbd_errno_to_system_errno(int err);
diff --git a/block/nbd.c b/block/nbd.c
index 6ad6a4f5ecd..d6caea44928 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -458,7 +458,8 @@ static coroutine_fn int nbd_receive_replies(BDRVNBDState *s, uint64_t handle)

         /* We are under mutex and handle is 0. We have to do the dirty work. */
         assert(s->reply.handle == 0);
-        ret = nbd_receive_reply(s->bs, s->ioc, &s->reply, NULL);
+        ret = nbd_receive_reply(s->bs, s->ioc, &s->reply,
+                                s->info.header_style);
         if (ret <= 0) {
             ret = ret ? ret : -EIO;
             nbd_channel_error(s, ret);
diff --git a/nbd/client.c b/nbd/client.c
index 17d1f57da60..e5db3c8b79d 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -1350,22 +1350,29 @@ int nbd_disconnect(int fd)

 int nbd_send_request(QIOChannel *ioc, NBDRequest *request, NBDHeaderStyle hdr)
 {
-    uint8_t buf[NBD_REQUEST_SIZE];
+    uint8_t buf[NBD_EXTENDED_REQUEST_SIZE];
+    size_t len;

-    assert(hdr < NBD_HEADER_EXTENDED);
-    assert(request->len <= UINT32_MAX);
     trace_nbd_send_request(request->from, request->len, request->handle,
                            request->flags, request->type,
                            nbd_cmd_lookup(request->type));

-    stl_be_p(buf, NBD_REQUEST_MAGIC);
     stw_be_p(buf + 4, request->flags);
     stw_be_p(buf + 6, request->type);
     stq_be_p(buf + 8, request->handle);
     stq_be_p(buf + 16, request->from);
-    stl_be_p(buf + 24, request->len);
+    if (hdr >= NBD_HEADER_EXTENDED) {
+        stl_be_p(buf, NBD_EXTENDED_REQUEST_MAGIC);
+        stq_be_p(buf + 24, request->len);
+        len = NBD_EXTENDED_REQUEST_SIZE;
+    } else {
+        assert(request->len <= UINT32_MAX);
+        stl_be_p(buf, NBD_REQUEST_MAGIC);
+        stl_be_p(buf + 24, request->len);
+        len = NBD_REQUEST_SIZE;
+    }

-    return nbd_write(ioc, buf, sizeof(buf), NULL);
+    return nbd_write(ioc, buf, len, NULL);
 }

 /* nbd_receive_simple_reply
@@ -1394,28 +1401,34 @@ static int nbd_receive_simple_reply(QIOChannel *ioc, NBDSimpleReply *reply,

 /* nbd_receive_structured_reply_chunk
  * Read structured reply chunk except magic field (which should be already
- * read).
+ * read).  Normalize into the compact form.
  * Payload is not read.
  */
-static int nbd_receive_structured_reply_chunk(QIOChannel *ioc,
-                                              NBDStructuredReplyChunk *chunk,
+static int nbd_receive_structured_reply_chunk(QIOChannel *ioc, NBDReply *chunk,
                                               Error **errp)
 {
     int ret;
+    size_t len;
+    uint64_t payload_len;

-    assert(chunk->magic == NBD_STRUCTURED_REPLY_MAGIC);
+    if (chunk->magic == NBD_STRUCTURED_REPLY_MAGIC) {
+        len = sizeof(chunk->structured);
+    } else {
+        assert(chunk->magic == NBD_EXTENDED_REPLY_MAGIC);
+        len = sizeof(chunk->extended);
+    }

     ret = nbd_read(ioc, (uint8_t *)chunk + sizeof(chunk->magic),
-                   sizeof(*chunk) - sizeof(chunk->magic), "structured chunk",
+                   len - sizeof(chunk->magic), "structured chunk",
                    errp);
     if (ret < 0) {
         return ret;
     }

-    chunk->flags = be16_to_cpu(chunk->flags);
-    chunk->type = be16_to_cpu(chunk->type);
-    chunk->handle = be64_to_cpu(chunk->handle);
-    chunk->length = be32_to_cpu(chunk->length);
+    /* flags, type, and handle occupy same space between forms */
+    chunk->structured.flags = be16_to_cpu(chunk->structured.flags);
+    chunk->structured.type = be16_to_cpu(chunk->structured.type);
+    chunk->structured.handle = be64_to_cpu(chunk->structured.handle);

     /*
      * Because we use BLOCK_STATUS with REQ_ONE, and cap READ requests
@@ -1423,11 +1436,20 @@ static int nbd_receive_structured_reply_chunk(QIOChannel *ioc,
      * this.  Even if we stopped using REQ_ONE, sane servers will cap
      * the number of extents they return for block status.
      */
-    if (chunk->length > NBD_MAX_BUFFER_SIZE + sizeof(NBDStructuredReadData)) {
+    if (chunk->magic == NBD_STRUCTURED_REPLY_MAGIC) {
+        payload_len = be32_to_cpu(chunk->structured.length);
+    } else {
+        /* For now, we are ignoring the extended header offset. */
+        payload_len = be64_to_cpu(chunk->extended.length);
+        chunk->magic = NBD_STRUCTURED_REPLY_MAGIC;
+    }
+    if (payload_len > NBD_MAX_BUFFER_SIZE + sizeof(NBDStructuredReadData)) {
         error_setg(errp, "server chunk %" PRIu32 " (%s) payload is too long",
-                   chunk->type, nbd_rep_lookup(chunk->type));
+                   chunk->structured.type,
+                   nbd_rep_lookup(chunk->structured.type));
         return -EINVAL;
     }
+    chunk->structured.length = payload_len;

     return 0;
 }
@@ -1474,30 +1496,35 @@ nbd_read_eof(BlockDriverState *bs, QIOChannel *ioc, void *buffer, size_t size,

 /* nbd_receive_reply
  *
- * Decreases bs->in_flight while waiting for a new reply. This yield is where
- * we wait indefinitely and the coroutine must be able to be safely reentered
- * for nbd_client_attach_aio_context().
+ * Wait for a new reply. If this yields, the coroutine must be able to be
+ * safely reentered for nbd_client_attach_aio_context().  @hdr determines
+ * which reply magic we are expecting, although this normalizes the result
+ * so that the caller only has to work with compact headers.
  *
  * Returns 1 on success
- *         0 on eof, when no data was read (errp is not set)
- *         negative errno on failure (errp is set)
+ *         0 on eof, when no data was read
+ *         negative errno on failure
  */
 int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc,
-                                   NBDReply *reply, Error **errp)
+                                   NBDReply *reply, NBDHeaderStyle hdr)
 {
     int ret;
     const char *type;

-    ret = nbd_read_eof(bs, ioc, &reply->magic, sizeof(reply->magic), errp);
+    ret = nbd_read_eof(bs, ioc, &reply->magic, sizeof(reply->magic), NULL);
     if (ret <= 0) {
         return ret;
     }

     reply->magic = be32_to_cpu(reply->magic);

+    /* Diagnose but accept wrong-width header */
     switch (reply->magic) {
     case NBD_SIMPLE_REPLY_MAGIC:
-        ret = nbd_receive_simple_reply(ioc, &reply->simple, errp);
+        if (hdr >= NBD_HEADER_EXTENDED) {
+            trace_nbd_receive_wrong_header(reply->magic);
+        }
+        ret = nbd_receive_simple_reply(ioc, &reply->simple, NULL);
         if (ret < 0) {
             break;
         }
@@ -1506,7 +1533,12 @@ int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc,
                                        reply->handle);
         break;
     case NBD_STRUCTURED_REPLY_MAGIC:
-        ret = nbd_receive_structured_reply_chunk(ioc, &reply->structured, errp);
+    case NBD_EXTENDED_REPLY_MAGIC:
+        if ((hdr >= NBD_HEADER_EXTENDED) !=
+            (reply->magic == NBD_EXTENDED_REPLY_MAGIC)) {
+            trace_nbd_receive_wrong_header(reply->magic);
+        }
+        ret = nbd_receive_structured_reply_chunk(ioc, reply, NULL);
         if (ret < 0) {
             break;
         }
@@ -1517,7 +1549,7 @@ int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc,
                                                  reply->structured.length);
         break;
     default:
-        error_setg(errp, "invalid magic (got 0x%" PRIx32 ")", reply->magic);
+        trace_nbd_receive_wrong_header(reply->magic);
         return -EINVAL;
     }
     if (ret < 0) {
diff --git a/nbd/trace-events b/nbd/trace-events
index adf5666e207..c20df33a431 100644
--- a/nbd/trace-events
+++ b/nbd/trace-events
@@ -34,6 +34,7 @@ nbd_client_clear_socket(void) "Clearing NBD socket"
 nbd_send_request(uint64_t from, uint64_t len, uint64_t handle, uint16_t flags, uint16_t type, const char *name) "Sending request to server: { .from = %" PRIu64", .len = %" PRIu64 ", .handle = %" PRIu64 ", .flags = 0x%" PRIx16 ", .type = %" PRIu16 " (%s) }"
 nbd_receive_simple_reply(int32_t error, const char *errname, uint64_t handle) "Got simple reply: { .error = %" PRId32 " (%s), handle = %" PRIu64" }"
 nbd_receive_structured_reply_chunk(uint16_t flags, uint16_t type, const char *name, uint64_t handle, uint32_t length) "Got structured reply chunk: { flags = 0x%" PRIx16 ", type = %d (%s), handle = %" PRIu64 ", length = %" PRIu32 " }"
+nbd_receive_wrong_header(uint32_t magic) "Server sent unexpected magic 0x%" PRIx32

 # common.c
 nbd_unknown_error(int err) "Squashing unexpected error %d to EINVAL"
-- 
2.40.1




* [PATCH v3 11/14] nbd/client: Accept 64-bit block status chunks
  2023-05-15 19:53 [PATCH v3 00/14] qemu patches for 64-bit NBD extensions Eric Blake
                   ` (9 preceding siblings ...)
  2023-05-15 19:53 ` [PATCH v3 10/14] nbd/client: " Eric Blake
@ 2023-05-15 19:53 ` Eric Blake
  2023-05-31 17:00   ` Vladimir Sementsov-Ogievskiy
  2023-05-15 19:53 ` [PATCH v3 12/14] nbd/client: Request extended headers during negotiation Eric Blake
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2023-05-15 19:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: libguestfs, vsementsov, Kevin Wolf, Hanna Reitz,
	open list:Network Block Dev...

Because we use NBD_CMD_FLAG_REQ_ONE with NBD_CMD_BLOCK_STATUS, a
client in narrow mode should not be able to provoke a server into
sending a block status result larger than the client's 32-bit request.
But in extended mode, a 64-bit status request must be able to handle a
64-bit status result, once a future patch enables the client
requesting extended mode.  We can also tolerate a non-compliant server
sending the new chunk even when it should not.

In normal execution, we only request "base:allocation", which never
exceeds 32 bits. But during testing with x-dirty-bitmap, we can force
qemu to connect to some other context that might have 64-bit status
bits; however, we ignore those upper bits (other than mapping
qemu:allocation-depth into something that 'qemu-img map --output=json'
can expose), and since this is only for testing, we don't bother
checking whether more than the two least-significant bits are set.
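The two payload layouts being parsed here can be sketched as follows.
The wide layout (context id, then a 32-bit extent count, then 64-bit
length/flags pairs) follows the commit message and my reading of the
extension spec; the function and struct names are invented for
illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t ld32be(const uint8_t **p)
{
    const uint8_t *b = *p; *p += 4;
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | b[3];
}

static uint64_t ld64be(const uint8_t **p)
{
    uint64_t hi = ld32be(p);
    return (hi << 32) | ld32be(p);
}

struct extent { uint64_t length, flags; };

/* Parse the first extent of a BLOCK_STATUS payload.  'wide' selects
 * the NBD_REPLY_TYPE_BLOCK_STATUS_EXT layout, which inserts a 32-bit
 * extent count after the context id and widens each field to 64 bits.
 * This is a sketch, not qemu's parser. */
static int parse_first_extent(const uint8_t *payload, size_t len,
                              int wide, uint32_t *context_id,
                              struct extent *ext)
{
    size_t need = 4 + (wide ? 4 + 16 : 8);

    if (len < need) {
        return -1;               /* server must send >= one extent */
    }
    *context_id = ld32be(&payload);
    if (wide) {
        (void)ld32be(&payload);  /* extent count; extras ignored here */
        ext->length = ld64be(&payload);
        ext->flags = ld64be(&payload);
    } else {
        ext->length = ld32be(&payload);
        ext->flags = ld32be(&payload);
    }
    return ext->length ? 0 : -1; /* zero-length extent: non-compliant */
}
```

qemu additionally clamps the extent to the request length and merely
traces (rather than rejects) servers that send more extents than the
REQ_ONE flag allows, as the hunk above shows.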

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/nbd.c        | 39 ++++++++++++++++++++++++++++-----------
 block/trace-events |  1 +
 2 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index d6caea44928..150dfe7170c 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -610,13 +610,16 @@ static int nbd_parse_offset_hole_payload(BDRVNBDState *s,
  */
 static int nbd_parse_blockstatus_payload(BDRVNBDState *s,
                                          NBDStructuredReplyChunk *chunk,
-                                         uint8_t *payload, uint64_t orig_length,
-                                         NBDExtent *extent, Error **errp)
+                                         uint8_t *payload, bool wide,
+                                         uint64_t orig_length,
+                                         NBDExtentExt *extent, Error **errp)
 {
     uint32_t context_id;
+    uint32_t count = 0;
+    size_t len = wide ? sizeof(*extent) : sizeof(NBDExtent);

     /* The server succeeded, so it must have sent [at least] one extent */
-    if (chunk->length < sizeof(context_id) + sizeof(*extent)) {
+    if (chunk->length < sizeof(context_id) + wide * sizeof(count) + len) {
         error_setg(errp, "Protocol error: invalid payload for "
                          "NBD_REPLY_TYPE_BLOCK_STATUS");
         return -EINVAL;
@@ -631,8 +634,14 @@ static int nbd_parse_blockstatus_payload(BDRVNBDState *s,
         return -EINVAL;
     }

-    extent->length = payload_advance32(&payload);
-    extent->flags = payload_advance32(&payload);
+    if (wide) {
+        count = payload_advance32(&payload);
+        extent->length = payload_advance64(&payload);
+        extent->flags = payload_advance64(&payload);
+    } else {
+        extent->length = payload_advance32(&payload);
+        extent->flags = payload_advance32(&payload);
+    }

     if (extent->length == 0) {
         error_setg(errp, "Protocol error: server sent status chunk with "
@@ -672,7 +681,8 @@ static int nbd_parse_blockstatus_payload(BDRVNBDState *s,
      * connection; just ignore trailing extents, and clamp things to
      * the length of our request.
      */
-    if (chunk->length > sizeof(context_id) + sizeof(*extent)) {
+    if (count != wide ||
+        chunk->length > sizeof(context_id) + wide * sizeof(count) + len) {
         trace_nbd_parse_blockstatus_compliance("more than one extent");
     }
     if (extent->length > orig_length) {
@@ -1117,7 +1127,7 @@ static int coroutine_fn nbd_co_receive_cmdread_reply(BDRVNBDState *s, uint64_t h

 static int coroutine_fn nbd_co_receive_blockstatus_reply(BDRVNBDState *s,
                                                          uint64_t handle, uint64_t length,
-                                                         NBDExtent *extent,
+                                                         NBDExtentExt *extent,
                                                          int *request_ret, Error **errp)
 {
     NBDReplyChunkIter iter;
@@ -1125,6 +1135,7 @@ static int coroutine_fn nbd_co_receive_blockstatus_reply(BDRVNBDState *s,
     void *payload = NULL;
     Error *local_err = NULL;
     bool received = false;
+    bool wide = false;

     assert(!extent->length);
     NBD_FOREACH_REPLY_CHUNK(s, iter, handle, false, NULL, &reply, &payload) {
@@ -1134,7 +1145,13 @@ static int coroutine_fn nbd_co_receive_blockstatus_reply(BDRVNBDState *s,
         assert(nbd_reply_is_structured(&reply));

         switch (chunk->type) {
+        case NBD_REPLY_TYPE_BLOCK_STATUS_EXT:
+            wide = true;
+            /* fallthrough */
         case NBD_REPLY_TYPE_BLOCK_STATUS:
+            if (s->info.extended_headers != wide) {
+                trace_nbd_extended_headers_compliance("block_status");
+            }
             if (received) {
                 nbd_channel_error(s, -EINVAL);
                 error_setg(&local_err, "Several BLOCK_STATUS chunks in reply");
@@ -1142,9 +1159,9 @@ static int coroutine_fn nbd_co_receive_blockstatus_reply(BDRVNBDState *s,
             }
             received = true;

-            ret = nbd_parse_blockstatus_payload(s, &reply.structured,
-                                                payload, length, extent,
-                                                &local_err);
+            ret = nbd_parse_blockstatus_payload(
+                s, &reply.structured, payload, wide,
+                length, extent, &local_err);
             if (ret < 0) {
                 nbd_channel_error(s, ret);
                 nbd_iter_channel_error(&iter, ret, &local_err);
@@ -1374,7 +1391,7 @@ static int coroutine_fn GRAPH_RDLOCK nbd_client_co_block_status(
         int64_t *pnum, int64_t *map, BlockDriverState **file)
 {
     int ret, request_ret;
-    NBDExtent extent = { 0 };
+    NBDExtentExt extent = { 0 };
     BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
     Error *local_err = NULL;

diff --git a/block/trace-events b/block/trace-events
index 48dbf10c66f..afb32fcce5b 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -168,6 +168,7 @@ iscsi_xcopy(void *src_lun, uint64_t src_off, void *dst_lun, uint64_t dst_off, ui
 # nbd.c
 nbd_parse_blockstatus_compliance(const char *err) "ignoring extra data from non-compliant server: %s"
 nbd_structured_read_compliance(const char *type) "server sent non-compliant unaligned read %s chunk"
+nbd_extended_headers_compliance(const char *type) "server sent non-compliant %s chunk not matching choice of extended headers"
 nbd_read_reply_entry_fail(int ret, const char *err) "ret = %d, err: %s"
 nbd_co_request_fail(uint64_t from, uint32_t len, uint64_t handle, uint16_t flags, uint16_t type, const char *name, int ret, const char *err) "Request failed { .from = %" PRIu64", .len = %" PRIu32 ", .handle = %" PRIu64 ", .flags = 0x%" PRIx16 ", .type = %" PRIu16 " (%s) } ret = %d, err: %s"
 nbd_client_handshake(const char *export_name) "export '%s'"
-- 
2.40.1




* [PATCH v3 12/14] nbd/client: Request extended headers during negotiation
  2023-05-15 19:53 [PATCH v3 00/14] qemu patches for 64-bit NBD extensions Eric Blake
                   ` (10 preceding siblings ...)
  2023-05-15 19:53 ` [PATCH v3 11/14] nbd/client: Accept 64-bit block status chunks Eric Blake
@ 2023-05-15 19:53 ` Eric Blake
       [not found]   ` <1af7f692-b5de-c767-2568-1fc024a57133@yandex-team.ru>
  2023-05-15 19:53 ` [PATCH v3 13/14] nbd/server: Prepare for per-request filtering of BLOCK_STATUS Eric Blake
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2023-05-15 19:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: libguestfs, vsementsov, Kevin Wolf, Hanna Reitz,
	open list:Network Block Dev...

All the pieces are in place for a client to finally request extended
headers.  Note that we must not request extended headers when qemu-nbd
is used to connect to the kernel module (as nbd.ko does not expect
them), but there is no harm in all other clients requesting them.

Extended headers are not essential to the information collected during
'qemu-nbd --list', but probing for it gives us one more piece of
information in that output.  Update the iotests affected by the new
line of output.
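The resulting negotiation ladder (try extended headers first, fall back
to structured replies, then to simple replies) can be sketched as
below.  The option codes and the return convention are modeled on
nbd_request_simple_option() as used in this patch; the stand-in
"server" callbacks exist only so the sketch is self-contained:

```c
#include <assert.h>

enum header_style { HDR_SIMPLE, HDR_STRUCTURED, HDR_EXTENDED };

/* Hypothetical option codes; real values live in nbd.h. */
enum { OPT_STRUCTURED_REPLY = 8, OPT_EXTENDED_HEADERS = 11 };

/* try_option models nbd_request_simple_option(): >0 = server acked,
 * 0 = server refused (e.g. NBD_REP_ERR_UNSUP), <0 = hard error. */
static int negotiate_style(enum header_style want,
                           int (*try_option)(int opt),
                           enum header_style *got)
{
    int r;

    if (want >= HDR_EXTENDED) {
        r = try_option(OPT_EXTENDED_HEADERS);
        if (r < 0) {
            return -1;
        }
        if (r > 0) {
            *got = HDR_EXTENDED;  /* no need to also ask for structured */
            return 0;
        }
    }
    if (want >= HDR_STRUCTURED) {
        r = try_option(OPT_STRUCTURED_REPLY);
        if (r < 0) {
            return -1;
        }
        if (r > 0) {
            *got = HDR_STRUCTURED;
            return 0;
        }
    }
    *got = HDR_SIMPLE;            /* old server: simple replies only */
    return 0;
}

/* Stand-in servers, for demonstration only: */
static int old_server(int opt) { (void)opt; return 0; }
static int sr_server(int opt)  { return opt == OPT_STRUCTURED_REPLY; }
static int ext_server(int opt) { (void)opt; return 1; }
```

A server that acks NBD_OPT_EXTENDED_HEADERS commits to chunked-style
replies, which is why the ladder stops at the first successful option
rather than also requesting structured replies.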

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/nbd.c                                   |  5 +--
 nbd/client-connection.c                       |  2 +-
 nbd/client.c                                  | 38 ++++++++++++-------
 qemu-nbd.c                                    |  3 ++
 tests/qemu-iotests/223.out                    |  6 +++
 tests/qemu-iotests/233.out                    |  5 +++
 tests/qemu-iotests/241.out                    |  3 ++
 tests/qemu-iotests/307.out                    |  5 +++
 .../tests/nbd-qemu-allocation.out             |  1 +
 9 files changed, 51 insertions(+), 17 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 150dfe7170c..db107ff0806 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -1146,10 +1146,9 @@ static int coroutine_fn nbd_co_receive_blockstatus_reply(BDRVNBDState *s,

         switch (chunk->type) {
         case NBD_REPLY_TYPE_BLOCK_STATUS_EXT:
-            wide = true;
-            /* fallthrough */
         case NBD_REPLY_TYPE_BLOCK_STATUS:
-            if (s->info.extended_headers != wide) {
+            wide = chunk->type == NBD_REPLY_TYPE_BLOCK_STATUS_EXT;
+            if ((s->info.header_style == NBD_HEADER_EXTENDED) != wide) {
                 trace_nbd_extended_headers_compliance("block_status");
             }
             if (received) {
diff --git a/nbd/client-connection.c b/nbd/client-connection.c
index 62d75af0bb3..8e0606cadf0 100644
--- a/nbd/client-connection.c
+++ b/nbd/client-connection.c
@@ -93,7 +93,7 @@ NBDClientConnection *nbd_client_connection_new(const SocketAddress *saddr,
         .do_negotiation = do_negotiation,

         .initial_info.request_sizes = true,
-        .initial_info.header_style = NBD_HEADER_STRUCTURED,
+        .initial_info.header_style = NBD_HEADER_EXTENDED,
         .initial_info.base_allocation = true,
         .initial_info.x_dirty_bitmap = g_strdup(x_dirty_bitmap),
         .initial_info.name = g_strdup(export_name ?: "")
diff --git a/nbd/client.c b/nbd/client.c
index e5db3c8b79d..7edddfd2f83 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -879,11 +879,12 @@ static int nbd_list_meta_contexts(QIOChannel *ioc,
  *          1: server is newstyle, but can only accept EXPORT_NAME
  *          2: server is newstyle, but lacks structured replies
  *          3: server is newstyle and set up for structured replies
+ *          4: server is newstyle and set up for extended headers
  */
 static int nbd_start_negotiate(AioContext *aio_context, QIOChannel *ioc,
                                QCryptoTLSCreds *tlscreds,
                                const char *hostname, QIOChannel **outioc,
-                               bool structured_reply, bool *zeroes,
+                               NBDHeaderStyle style, bool *zeroes,
                                Error **errp)
 {
     ERRP_GUARD();
@@ -961,15 +962,23 @@ static int nbd_start_negotiate(AioContext *aio_context, QIOChannel *ioc,
         if (fixedNewStyle) {
             int result = 0;

-            if (structured_reply) {
+            if (style >= NBD_HEADER_EXTENDED) {
+                result = nbd_request_simple_option(ioc,
+                                                   NBD_OPT_EXTENDED_HEADERS,
+                                                   false, errp);
+                if (result) {
+                    return result < 0 ? -EINVAL : 4;
+                }
+            }
+            if (style >= NBD_HEADER_STRUCTURED) {
                 result = nbd_request_simple_option(ioc,
                                                    NBD_OPT_STRUCTURED_REPLY,
                                                    false, errp);
-                if (result < 0) {
-                    return -EINVAL;
+                if (result) {
+                    return result < 0 ? -EINVAL : 3;
                 }
             }
-            return 2 + result;
+            return 2;
         } else {
             return 1;
         }
@@ -1031,8 +1040,7 @@ int nbd_receive_negotiate(AioContext *aio_context, QIOChannel *ioc,
     trace_nbd_receive_negotiate_name(info->name);

     result = nbd_start_negotiate(aio_context, ioc, tlscreds, hostname, outioc,
-                                 info->header_style >= NBD_HEADER_STRUCTURED,
-                                 &zeroes, errp);
+                                 info->header_style, &zeroes, errp);

     info->header_style = NBD_HEADER_SIMPLE;
     info->base_allocation = false;
@@ -1041,8 +1049,10 @@ int nbd_receive_negotiate(AioContext *aio_context, QIOChannel *ioc,
     }

     switch (result) {
+    case 4: /* newstyle, with extended headers */
     case 3: /* newstyle, with structured replies */
-        info->header_style = NBD_HEADER_STRUCTURED;
+        /* Relies on encoding of _STRUCTURED and _EXTENDED */
+        info->header_style = result - 2;
         if (base_allocation) {
             result = nbd_negotiate_simple_meta_context(ioc, info, errp);
             if (result < 0) {
@@ -1151,8 +1161,8 @@ int nbd_receive_export_list(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
     QIOChannel *sioc = NULL;

     *info = NULL;
-    result = nbd_start_negotiate(NULL, ioc, tlscreds, hostname, &sioc, true,
-                                 NULL, errp);
+    result = nbd_start_negotiate(NULL, ioc, tlscreds, hostname, &sioc,
+                                 NBD_HEADER_EXTENDED, NULL, errp);
     if (tlscreds && sioc) {
         ioc = sioc;
     }
@@ -1160,6 +1170,7 @@ int nbd_receive_export_list(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
     switch (result) {
     case 2:
     case 3:
+    case 4:
         /* newstyle - use NBD_OPT_LIST to populate array, then try
          * NBD_OPT_INFO on each array member. If structured replies
          * are enabled, also try NBD_OPT_LIST_META_CONTEXT. */
@@ -1180,8 +1191,9 @@ int nbd_receive_export_list(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
             memset(&array[count - 1], 0, sizeof(*array));
             array[count - 1].name = name;
             array[count - 1].description = desc;
-            array[count - 1].header_style = result == 3 ?
-                NBD_HEADER_STRUCTURED : NBD_HEADER_SIMPLE;
+
+            /* Depends on values of _SIMPLE, _STRUCTURED, and _EXTENDED */
+            array[count - 1].header_style = result - 2;
         }

         for (i = 0; i < count; i++) {
@@ -1197,7 +1209,7 @@ int nbd_receive_export_list(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
                 break;
             }

-            if (result == 3 &&
+            if (result >= 3 &&
                 nbd_list_meta_contexts(ioc, &array[i], errp) < 0) {
                 goto out;
             }
diff --git a/qemu-nbd.c b/qemu-nbd.c
index 6ff45308a9c..8c35442626a 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -238,6 +238,9 @@ static int qemu_nbd_client_list(SocketAddress *saddr, QCryptoTLSCreds *tls,
             printf("  opt block: %u\n", list[i].opt_block);
             printf("  max block: %u\n", list[i].max_block);
         }
+        printf("  transaction size: %s\n",
+               list[i].header_style >= NBD_HEADER_EXTENDED ?
+               "64-bit" : "32-bit");
         if (list[i].n_contexts) {
             printf("  available meta contexts: %d\n", list[i].n_contexts);
             for (j = 0; j < list[i].n_contexts; j++) {
diff --git a/tests/qemu-iotests/223.out b/tests/qemu-iotests/223.out
index 26fb347c5da..b98582c38ea 100644
--- a/tests/qemu-iotests/223.out
+++ b/tests/qemu-iotests/223.out
@@ -87,6 +87,7 @@ exports available: 3
   min block: 1
   opt block: 4096
   max block: 33554432
+  transaction size: 64-bit
   available meta contexts: 2
    base:allocation
    qemu:dirty-bitmap:b
@@ -97,6 +98,7 @@ exports available: 3
   min block: 1
   opt block: 4096
   max block: 33554432
+  transaction size: 64-bit
   available meta contexts: 2
    base:allocation
    qemu:dirty-bitmap:b2
@@ -106,6 +108,7 @@ exports available: 3
   min block: 1
   opt block: 4096
   max block: 33554432
+  transaction size: 64-bit
   available meta contexts: 2
    base:allocation
    qemu:dirty-bitmap:b3
@@ -206,6 +209,7 @@ exports available: 3
   min block: 1
   opt block: 4096
   max block: 33554432
+  transaction size: 64-bit
   available meta contexts: 2
    base:allocation
    qemu:dirty-bitmap:b
@@ -216,6 +220,7 @@ exports available: 3
   min block: 1
   opt block: 4096
   max block: 33554432
+  transaction size: 64-bit
   available meta contexts: 2
    base:allocation
    qemu:dirty-bitmap:b2
@@ -225,6 +230,7 @@ exports available: 3
   min block: 1
   opt block: 4096
   max block: 33554432
+  transaction size: 64-bit
   available meta contexts: 2
    base:allocation
    qemu:dirty-bitmap:b3
diff --git a/tests/qemu-iotests/233.out b/tests/qemu-iotests/233.out
index 237c82767ea..33cb622ecf0 100644
--- a/tests/qemu-iotests/233.out
+++ b/tests/qemu-iotests/233.out
@@ -53,6 +53,11 @@ exports available: 1
  export: ''
   size:  67108864
   min block: 1
+  opt block: 4096
+  max block: 33554432
+  transaction size: 64-bit
+  available meta contexts: 1
+   base:allocation

 == check TLS with different CA fails ==
 qemu-img: Could not open 'driver=nbd,host=127.0.0.1,port=PORT,tls-creds=tls0': The certificate hasn't got a known issuer
diff --git a/tests/qemu-iotests/241.out b/tests/qemu-iotests/241.out
index 88e8cfcd7e2..a9efb876521 100644
--- a/tests/qemu-iotests/241.out
+++ b/tests/qemu-iotests/241.out
@@ -6,6 +6,7 @@ exports available: 1
  export: ''
   size:  1024
   min block: 1
+  transaction size: 64-bit
 [{ "start": 0, "length": 1000, "depth": 0, "present": true, "zero": false, "data": true, "offset": OFFSET},
 { "start": 1000, "length": 24, "depth": 0, "present": true, "zero": true, "data": false, "offset": OFFSET}]
 1 KiB (0x400) bytes     allocated at offset 0 bytes (0x0)
@@ -16,6 +17,7 @@ exports available: 1
  export: ''
   size:  1024
   min block: 512
+  transaction size: 64-bit
 [{ "start": 0, "length": 1024, "depth": 0, "present": true, "zero": false, "data": true, "offset": OFFSET}]
 1 KiB (0x400) bytes     allocated at offset 0 bytes (0x0)
 WARNING: Image format was not specified for 'TEST_DIR/t.raw' and probing guessed raw.
@@ -28,6 +30,7 @@ exports available: 1
  export: ''
   size:  1024
   min block: 1
+  transaction size: 64-bit
 [{ "start": 0, "length": 1000, "depth": 0, "present": true, "zero": false, "data": true, "offset": OFFSET},
 { "start": 1000, "length": 24, "depth": 0, "present": true, "zero": true, "data": false, "offset": OFFSET}]
 1 KiB (0x400) bytes     allocated at offset 0 bytes (0x0)
diff --git a/tests/qemu-iotests/307.out b/tests/qemu-iotests/307.out
index 390f05d1b78..2b9a6a67a1a 100644
--- a/tests/qemu-iotests/307.out
+++ b/tests/qemu-iotests/307.out
@@ -19,6 +19,7 @@ exports available: 1
   min block: XXX
   opt block: XXX
   max block: XXX
+  transaction size: 64-bit
   available meta contexts: 1
    base:allocation

@@ -47,6 +48,7 @@ exports available: 1
   min block: XXX
   opt block: XXX
   max block: XXX
+  transaction size: 64-bit
   available meta contexts: 1
    base:allocation

@@ -78,6 +80,7 @@ exports available: 2
   min block: XXX
   opt block: XXX
   max block: XXX
+  transaction size: 64-bit
   available meta contexts: 1
    base:allocation
  export: 'export1'
@@ -87,6 +90,7 @@ exports available: 2
   min block: XXX
   opt block: XXX
   max block: XXX
+  transaction size: 64-bit
   available meta contexts: 1
    base:allocation

@@ -113,6 +117,7 @@ exports available: 1
   min block: XXX
   opt block: XXX
   max block: XXX
+  transaction size: 64-bit
   available meta contexts: 1
    base:allocation

diff --git a/tests/qemu-iotests/tests/nbd-qemu-allocation.out b/tests/qemu-iotests/tests/nbd-qemu-allocation.out
index 9d938db24e6..659276032b0 100644
--- a/tests/qemu-iotests/tests/nbd-qemu-allocation.out
+++ b/tests/qemu-iotests/tests/nbd-qemu-allocation.out
@@ -21,6 +21,7 @@ exports available: 1
   min block: 1
   opt block: 4096
   max block: 33554432
+  transaction size: 64-bit
   available meta contexts: 2
    base:allocation
    qemu:allocation-depth
-- 
2.40.1




* [PATCH v3 13/14] nbd/server: Prepare for per-request filtering of BLOCK_STATUS
  2023-05-15 19:53 [PATCH v3 00/14] qemu patches for 64-bit NBD extensions Eric Blake
                   ` (11 preceding siblings ...)
  2023-05-15 19:53 ` [PATCH v3 12/14] nbd/client: Request extended headers during negotiation Eric Blake
@ 2023-05-15 19:53 ` Eric Blake
  2023-06-01  9:57   ` Vladimir Sementsov-Ogievskiy
  2023-05-15 19:53 ` [PATCH v3 14/14] nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS Eric Blake
  2023-05-15 21:05 ` [Libguestfs] [PATCH v3 00/14] qemu patches for 64-bit NBD extensions Eric Blake
  14 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2023-05-15 19:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: libguestfs, vsementsov, Kevin Wolf, Hanna Reitz,
	open list:Network Block Dev...

The next commit will add support for NBD_CMD_FLAG_PAYLOAD during
NBD_CMD_BLOCK_STATUS, where the client can request that the server
return only a subset of negotiated contexts, rather than all of them.
To make that task easier, this patch populates the list of contexts to
return on a per-command basis (for now, identical to the full set of
negotiated contexts).

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 include/block/nbd.h |  15 ++++++
 nbd/server.c        | 108 +++++++++++++++++++++++---------------------
 2 files changed, 72 insertions(+), 51 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 865bb4ee2e1..6696d61bd59 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -60,6 +60,20 @@ typedef enum NBDHeaderStyle {
     NBD_HEADER_EXTENDED,    /* NBD_OPT_EXTENDED_HEADERS negotiated */
 } NBDHeaderStyle;

+/*
+ * NBDMetaContexts represents a list of meta contexts in use, as
+ * selected by NBD_OPT_SET_META_CONTEXT. Also used for
+ * NBD_OPT_LIST_META_CONTEXT, and payload filtering in
+ * NBD_CMD_BLOCK_STATUS.
+ */
+typedef struct NBDMetaContexts {
+    size_t count; /* number of negotiated contexts */
+    bool base_allocation; /* export base:allocation context (block status) */
+    bool allocation_depth; /* export qemu:allocation-depth */
+    size_t nr_bitmaps; /* Length of bitmaps array */
+    bool *bitmaps; /* export qemu:dirty-bitmap:<export bitmap name> */
+} NBDMetaContexts;
+
 /*
  * Note: NBDRequest is _NOT_ the same as the network representation of an NBD
  * request!
@@ -70,6 +84,7 @@ typedef struct NBDRequest {
     uint64_t len;   /* Effect length; 32 bit limit without extended headers */
     uint16_t flags; /* NBD_CMD_FLAG_* */
     uint16_t type;  /* NBD_CMD_* */
+    NBDMetaContexts contexts; /* Used by NBD_CMD_BLOCK_STATUS */
 } NBDRequest;

 typedef struct NBDSimpleReply {
diff --git a/nbd/server.c b/nbd/server.c
index 6475a76c1f0..db550c82cd2 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -105,20 +105,6 @@ struct NBDExport {

 static QTAILQ_HEAD(, NBDExport) exports = QTAILQ_HEAD_INITIALIZER(exports);

-/* NBDExportMetaContexts represents a list of contexts to be exported,
- * as selected by NBD_OPT_SET_META_CONTEXT. Also used for
- * NBD_OPT_LIST_META_CONTEXT. */
-typedef struct NBDExportMetaContexts {
-    NBDExport *exp;
-    size_t count; /* number of negotiated contexts */
-    bool base_allocation; /* export base:allocation context (block status) */
-    bool allocation_depth; /* export qemu:allocation-depth */
-    bool *bitmaps; /*
-                    * export qemu:dirty-bitmap:<export bitmap name>,
-                    * sized by exp->nr_export_bitmaps
-                    */
-} NBDExportMetaContexts;
-
 struct NBDClient {
     int refcount;
     void (*close_fn)(NBDClient *client, bool negotiated);
@@ -144,7 +130,8 @@ struct NBDClient {
     uint32_t check_align; /* If non-zero, check for aligned client requests */

     NBDHeaderStyle header_style;
-    NBDExportMetaContexts export_meta;
+    NBDExport *context_exp; /* export of last OPT_SET_META_CONTEXT */
+    NBDMetaContexts contexts; /* Negotiated meta contexts */

     uint32_t opt; /* Current option being negotiated */
     uint32_t optlen; /* remaining length of data in ioc for the option being
@@ -457,8 +444,8 @@ static int nbd_negotiate_handle_list(NBDClient *client, Error **errp)

 static void nbd_check_meta_export(NBDClient *client)
 {
-    if (client->exp != client->export_meta.exp) {
-        client->export_meta.count = 0;
+    if (client->exp != client->context_exp) {
+        client->contexts.count = 0;
     }
 }

@@ -852,7 +839,7 @@ static bool nbd_strshift(const char **str, const char *prefix)
  * Handle queries to 'base' namespace. For now, only the base:allocation
  * context is available.  Return true if @query has been handled.
  */
-static bool nbd_meta_base_query(NBDClient *client, NBDExportMetaContexts *meta,
+static bool nbd_meta_base_query(NBDClient *client, NBDMetaContexts *meta,
                                 const char *query)
 {
     if (!nbd_strshift(&query, "base:")) {
@@ -872,8 +859,8 @@ static bool nbd_meta_base_query(NBDClient *client, NBDExportMetaContexts *meta,
  * and qemu:allocation-depth contexts are available.  Return true if @query
  * has been handled.
  */
-static bool nbd_meta_qemu_query(NBDClient *client, NBDExportMetaContexts *meta,
-                                const char *query)
+static bool nbd_meta_qemu_query(NBDClient *client, NBDExport *exp,
+                                NBDMetaContexts *meta, const char *query)
 {
     size_t i;

@@ -884,9 +871,9 @@ static bool nbd_meta_qemu_query(NBDClient *client, NBDExportMetaContexts *meta,

     if (!*query) {
         if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
-            meta->allocation_depth = meta->exp->allocation_depth;
-            if (meta->exp->nr_export_bitmaps) {
-                memset(meta->bitmaps, 1, meta->exp->nr_export_bitmaps);
+            meta->allocation_depth = exp->allocation_depth;
+            if (meta->nr_bitmaps) {
+                memset(meta->bitmaps, 1, meta->nr_bitmaps);
             }
         }
         trace_nbd_negotiate_meta_query_parse("empty");
@@ -895,7 +882,7 @@ static bool nbd_meta_qemu_query(NBDClient *client, NBDExportMetaContexts *meta,

     if (strcmp(query, "allocation-depth") == 0) {
         trace_nbd_negotiate_meta_query_parse("allocation-depth");
-        meta->allocation_depth = meta->exp->allocation_depth;
+        meta->allocation_depth = exp->allocation_depth;
         return true;
     }

@@ -903,17 +890,17 @@ static bool nbd_meta_qemu_query(NBDClient *client, NBDExportMetaContexts *meta,
         trace_nbd_negotiate_meta_query_parse("dirty-bitmap:");
         if (!*query) {
             if (client->opt == NBD_OPT_LIST_META_CONTEXT &&
-                meta->exp->nr_export_bitmaps) {
-                memset(meta->bitmaps, 1, meta->exp->nr_export_bitmaps);
+                exp->nr_export_bitmaps) {
+                memset(meta->bitmaps, 1, exp->nr_export_bitmaps);
             }
             trace_nbd_negotiate_meta_query_parse("empty");
             return true;
         }

-        for (i = 0; i < meta->exp->nr_export_bitmaps; i++) {
+        for (i = 0; i < meta->nr_bitmaps; i++) {
             const char *bm_name;

-            bm_name = bdrv_dirty_bitmap_name(meta->exp->export_bitmaps[i]);
+            bm_name = bdrv_dirty_bitmap_name(exp->export_bitmaps[i]);
             if (strcmp(bm_name, query) == 0) {
                 meta->bitmaps[i] = true;
                 trace_nbd_negotiate_meta_query_parse(query);
@@ -937,8 +924,8 @@ static bool nbd_meta_qemu_query(NBDClient *client, NBDExportMetaContexts *meta,
  *
  * Return -errno on I/O error, 0 if option was completely handled by
  * sending a reply about inconsistent lengths, or 1 on success. */
-static int nbd_negotiate_meta_query(NBDClient *client,
-                                    NBDExportMetaContexts *meta, Error **errp)
+static int nbd_negotiate_meta_query(NBDClient *client, NBDExport *exp,
+                                    NBDMetaContexts *meta, Error **errp)
 {
     int ret;
     g_autofree char *query = NULL;
@@ -965,7 +952,7 @@ static int nbd_negotiate_meta_query(NBDClient *client,
     if (nbd_meta_base_query(client, meta, query)) {
         return 1;
     }
-    if (nbd_meta_qemu_query(client, meta, query)) {
+    if (nbd_meta_qemu_query(client, exp, meta, query)) {
         return 1;
     }

@@ -977,14 +964,15 @@ static int nbd_negotiate_meta_query(NBDClient *client,
  * Handle NBD_OPT_LIST_META_CONTEXT and NBD_OPT_SET_META_CONTEXT
  *
  * Return -errno on I/O error, or 0 if option was completely handled. */
-static int nbd_negotiate_meta_queries(NBDClient *client,
-                                      NBDExportMetaContexts *meta, Error **errp)
+static int nbd_negotiate_meta_queries(NBDClient *client, Error **errp)
 {
     int ret;
     g_autofree char *export_name = NULL;
     /* Mark unused to work around https://bugs.llvm.org/show_bug.cgi?id=3888 */
     g_autofree G_GNUC_UNUSED bool *bitmaps = NULL;
-    NBDExportMetaContexts local_meta = {0};
+    NBDMetaContexts local_meta = {0};
+    NBDMetaContexts *meta;
+    NBDExport *exp;
     uint32_t nb_queries;
     size_t i;
     size_t count = 0;
@@ -1000,6 +988,9 @@ static int nbd_negotiate_meta_queries(NBDClient *client,
     if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
         /* Only change the caller's meta on SET. */
         meta = &local_meta;
+    } else {
+        meta = &client->contexts;
+        client->context_exp = NULL;
     }

     g_free(meta->bitmaps);
@@ -1010,14 +1001,15 @@ static int nbd_negotiate_meta_queries(NBDClient *client,
         return ret;
     }

-    meta->exp = nbd_export_find(export_name);
-    if (meta->exp == NULL) {
+    exp = nbd_export_find(export_name);
+    if (exp == NULL) {
         g_autofree char *sane_name = nbd_sanitize_name(export_name);

         return nbd_opt_drop(client, NBD_REP_ERR_UNKNOWN, errp,
                             "export '%s' not present", sane_name);
     }
-    meta->bitmaps = g_new0(bool, meta->exp->nr_export_bitmaps);
+    meta->nr_bitmaps = exp->nr_export_bitmaps;
+    meta->bitmaps = g_new0(bool, exp->nr_export_bitmaps);
     if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
         bitmaps = meta->bitmaps;
     }
@@ -1033,13 +1025,13 @@ static int nbd_negotiate_meta_queries(NBDClient *client,
     if (client->opt == NBD_OPT_LIST_META_CONTEXT && !nb_queries) {
         /* enable all known contexts */
         meta->base_allocation = true;
-        meta->allocation_depth = meta->exp->allocation_depth;
-        if (meta->exp->nr_export_bitmaps) {
-            memset(meta->bitmaps, 1, meta->exp->nr_export_bitmaps);
+        meta->allocation_depth = exp->allocation_depth;
+        if (exp->nr_export_bitmaps) {
+            memset(meta->bitmaps, 1, meta->nr_bitmaps);
         }
     } else {
         for (i = 0; i < nb_queries; ++i) {
-            ret = nbd_negotiate_meta_query(client, meta, errp);
+            ret = nbd_negotiate_meta_query(client, exp, meta, errp);
             if (ret <= 0) {
                 return ret;
             }
@@ -1066,7 +1058,7 @@ static int nbd_negotiate_meta_queries(NBDClient *client,
         count++;
     }

-    for (i = 0; i < meta->exp->nr_export_bitmaps; i++) {
+    for (i = 0; i < meta->nr_bitmaps; i++) {
         const char *bm_name;
         g_autofree char *context = NULL;

@@ -1074,7 +1066,7 @@ static int nbd_negotiate_meta_queries(NBDClient *client,
             continue;
         }

-        bm_name = bdrv_dirty_bitmap_name(meta->exp->export_bitmaps[i]);
+        bm_name = bdrv_dirty_bitmap_name(exp->export_bitmaps[i]);
         context = g_strdup_printf("qemu:dirty-bitmap:%s", bm_name);

         ret = nbd_negotiate_send_meta_context(client, context,
@@ -1089,6 +1081,9 @@ static int nbd_negotiate_meta_queries(NBDClient *client,
     ret = nbd_negotiate_send_rep(client, NBD_REP_ACK, errp);
     if (ret == 0) {
         meta->count = count;
+        if (client->opt == NBD_OPT_SET_META_CONTEXT) {
+            client->context_exp = exp;
+        }
     }

     return ret;
@@ -1282,8 +1277,7 @@ static int nbd_negotiate_options(NBDClient *client, Error **errp)

             case NBD_OPT_LIST_META_CONTEXT:
             case NBD_OPT_SET_META_CONTEXT:
-                ret = nbd_negotiate_meta_queries(client, &client->export_meta,
-                                                 errp);
+                ret = nbd_negotiate_meta_queries(client, errp);
                 break;

             case NBD_OPT_EXTENDED_HEADERS:
@@ -1514,7 +1508,7 @@ void nbd_client_put(NBDClient *client)
             QTAILQ_REMOVE(&client->exp->clients, client, next);
             blk_exp_unref(&client->exp->common);
         }
-        g_free(client->export_meta.bitmaps);
+        g_free(client->contexts.bitmaps);
         g_free(client);
     }
 }
@@ -2489,6 +2483,8 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req, NBDRequest *
                 return -ENOMEM;
             }
         }
+    } else if (request->type == NBD_CMD_BLOCK_STATUS) {
+        request->contexts = client->contexts;
     }

     if (payload_len) {
@@ -2715,11 +2711,11 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
         }
         assert(client->header_style >= NBD_HEADER_EXTENDED ||
                request->len <= UINT32_MAX);
-        if (client->export_meta.count) {
+        if (request->contexts.count) {
             bool dont_fragment = request->flags & NBD_CMD_FLAG_REQ_ONE;
-            int contexts_remaining = client->export_meta.count;
+            int contexts_remaining = request->contexts.count;

-            if (client->export_meta.base_allocation) {
+            if (request->contexts.base_allocation) {
                 ret = nbd_co_send_block_status(client, request,
                                                exp->common.blk,
                                                request->from,
@@ -2732,7 +2728,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
                 }
             }

-            if (client->export_meta.allocation_depth) {
+            if (request->contexts.allocation_depth) {
                 ret = nbd_co_send_block_status(client, request,
                                                exp->common.blk,
                                                request->from, request->len,
@@ -2745,8 +2741,10 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
                 }
             }

+            assert(request->contexts.nr_bitmaps ==
+                   client->exp->nr_export_bitmaps);
             for (i = 0; i < client->exp->nr_export_bitmaps; i++) {
-                if (!client->export_meta.bitmaps[i]) {
+                if (!request->contexts.bitmaps[i]) {
                     continue;
                 }
                 ret = nbd_co_send_bitmap(client, request,
@@ -2762,6 +2760,10 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
             assert(!contexts_remaining);

             return 0;
+        } else if (client->contexts.count) {
+            return nbd_send_generic_reply(client, request, -EINVAL,
+                                          "CMD_BLOCK_STATUS payload not valid",
+                                          errp);
         } else {
             return nbd_send_generic_reply(client, request, -EINVAL,
                                           "CMD_BLOCK_STATUS not negotiated",
@@ -2840,6 +2842,10 @@ static coroutine_fn void nbd_trip(void *opaque)
     } else {
         ret = nbd_handle_request(client, &request, req->data, &local_err);
     }
+    if (request.type == NBD_CMD_BLOCK_STATUS &&
+        request.contexts.bitmaps != client->contexts.bitmaps) {
+        g_free(request.contexts.bitmaps);
+    }
     if (ret < 0) {
         error_prepend(&local_err, "Failed to send reply: ");
         goto disconnect;
-- 
2.40.1




* [PATCH v3 14/14] nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS
  2023-05-15 19:53 [PATCH v3 00/14] qemu patches for 64-bit NBD extensions Eric Blake
                   ` (12 preceding siblings ...)
  2023-05-15 19:53 ` [PATCH v3 13/14] nbd/server: Prepare for per-request filtering of BLOCK_STATUS Eric Blake
@ 2023-05-15 19:53 ` Eric Blake
  2023-06-02  9:13   ` Vladimir Sementsov-Ogievskiy
  2023-05-15 21:05 ` [Libguestfs] [PATCH v3 00/14] qemu patches for 64-bit NBD extensions Eric Blake
  14 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2023-05-15 19:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: libguestfs, vsementsov, Kevin Wolf, Hanna Reitz,
	open list:Network Block Dev...

Allow a client to request a subset of negotiated meta contexts.  For
example, a client may ask to use a single connection to learn about
both block status and dirty bitmaps, but where the dirty bitmap
queries only need to be performed on a subset of the disk; forcing the
server to compute that information on block status queries in the rest
of the disk is wasted effort (both at the server, and on the amount of
traffic sent over the wire to be parsed and ignored by the client).

Qemu as an NBD client never requests more than one meta context, so
it has no need to use block status payloads.  Testing this instead
requires support from libnbd, which can access multiple meta contexts
in parallel over a single NBD connection.  An interop test submitted
to the libnbd project at the same time as this patch demonstrates the
feature working, and also exercises some corner cases (for example,
when the payload length is longer than the export length); other
corner cases (like passing the same id twice) require a protocol
fuzzer, because libnbd is not wired up to break the protocol that
badly.

This also includes tweaks to 'qemu-nbd --list' to show when a server
is advertising the capability, and to the testsuite to reflect the
addition to that output.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 docs/interop/nbd.txt                          |   2 +-
 include/block/nbd.h                           |  32 ++++--
 nbd/server.c                                  | 106 +++++++++++++++++-
 qemu-nbd.c                                    |   1 +
 nbd/trace-events                              |   1 +
 tests/qemu-iotests/223.out                    |  12 +-
 tests/qemu-iotests/307.out                    |  10 +-
 .../tests/nbd-qemu-allocation.out             |   2 +-
 8 files changed, 136 insertions(+), 30 deletions(-)

diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
index abaf4c28a96..83d85ce8d13 100644
--- a/docs/interop/nbd.txt
+++ b/docs/interop/nbd.txt
@@ -69,4 +69,4 @@ NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
 NBD_CMD_FLAG_FAST_ZERO
 * 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth"
 * 7.1: NBD_FLAG_CAN_MULTI_CONN for shareable writable exports
-* 8.1: NBD_OPT_EXTENDED_HEADERS
+* 8.1: NBD_OPT_EXTENDED_HEADERS, NBD_FLAG_BLOCK_STATUS_PAYLOAD
diff --git a/include/block/nbd.h b/include/block/nbd.h
index 6696d61bd59..3d8d7150121 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -175,6 +175,12 @@ typedef struct NBDExtentExt {
     uint64_t flags; /* NBD_STATE_* */
 } QEMU_PACKED NBDExtentExt;

+/* Client payload for limiting NBD_CMD_BLOCK_STATUS reply */
+typedef struct NBDBlockStatusPayload {
+    uint64_t effect_length;
+    /* uint32_t ids[] follows, array length implied by header */
+} QEMU_PACKED NBDBlockStatusPayload;
+
 /* Transmission (export) flags: sent from server to client during handshake,
    but describe what will happen during transmission */
 enum {
@@ -191,20 +197,22 @@ enum {
     NBD_FLAG_SEND_RESIZE_BIT        =  9, /* Send resize */
     NBD_FLAG_SEND_CACHE_BIT         = 10, /* Send CACHE (prefetch) */
     NBD_FLAG_SEND_FAST_ZERO_BIT     = 11, /* FAST_ZERO flag for WRITE_ZEROES */
+    NBD_FLAG_BLOCK_STAT_PAYLOAD_BIT = 12, /* PAYLOAD flag for BLOCK_STATUS */
 };

-#define NBD_FLAG_HAS_FLAGS         (1 << NBD_FLAG_HAS_FLAGS_BIT)
-#define NBD_FLAG_READ_ONLY         (1 << NBD_FLAG_READ_ONLY_BIT)
-#define NBD_FLAG_SEND_FLUSH        (1 << NBD_FLAG_SEND_FLUSH_BIT)
-#define NBD_FLAG_SEND_FUA          (1 << NBD_FLAG_SEND_FUA_BIT)
-#define NBD_FLAG_ROTATIONAL        (1 << NBD_FLAG_ROTATIONAL_BIT)
-#define NBD_FLAG_SEND_TRIM         (1 << NBD_FLAG_SEND_TRIM_BIT)
-#define NBD_FLAG_SEND_WRITE_ZEROES (1 << NBD_FLAG_SEND_WRITE_ZEROES_BIT)
-#define NBD_FLAG_SEND_DF           (1 << NBD_FLAG_SEND_DF_BIT)
-#define NBD_FLAG_CAN_MULTI_CONN    (1 << NBD_FLAG_CAN_MULTI_CONN_BIT)
-#define NBD_FLAG_SEND_RESIZE       (1 << NBD_FLAG_SEND_RESIZE_BIT)
-#define NBD_FLAG_SEND_CACHE        (1 << NBD_FLAG_SEND_CACHE_BIT)
-#define NBD_FLAG_SEND_FAST_ZERO    (1 << NBD_FLAG_SEND_FAST_ZERO_BIT)
+#define NBD_FLAG_HAS_FLAGS          (1 << NBD_FLAG_HAS_FLAGS_BIT)
+#define NBD_FLAG_READ_ONLY          (1 << NBD_FLAG_READ_ONLY_BIT)
+#define NBD_FLAG_SEND_FLUSH         (1 << NBD_FLAG_SEND_FLUSH_BIT)
+#define NBD_FLAG_SEND_FUA           (1 << NBD_FLAG_SEND_FUA_BIT)
+#define NBD_FLAG_ROTATIONAL         (1 << NBD_FLAG_ROTATIONAL_BIT)
+#define NBD_FLAG_SEND_TRIM          (1 << NBD_FLAG_SEND_TRIM_BIT)
+#define NBD_FLAG_SEND_WRITE_ZEROES  (1 << NBD_FLAG_SEND_WRITE_ZEROES_BIT)
+#define NBD_FLAG_SEND_DF            (1 << NBD_FLAG_SEND_DF_BIT)
+#define NBD_FLAG_CAN_MULTI_CONN     (1 << NBD_FLAG_CAN_MULTI_CONN_BIT)
+#define NBD_FLAG_SEND_RESIZE        (1 << NBD_FLAG_SEND_RESIZE_BIT)
+#define NBD_FLAG_SEND_CACHE         (1 << NBD_FLAG_SEND_CACHE_BIT)
+#define NBD_FLAG_SEND_FAST_ZERO     (1 << NBD_FLAG_SEND_FAST_ZERO_BIT)
+#define NBD_FLAG_BLOCK_STAT_PAYLOAD (1 << NBD_FLAG_BLOCK_STAT_PAYLOAD_BIT)

 /* New-style handshake (global) flags, sent from server to client, and
    control what will happen during handshake phase. */
diff --git a/nbd/server.c b/nbd/server.c
index db550c82cd2..ce11285c0d7 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -442,9 +442,9 @@ static int nbd_negotiate_handle_list(NBDClient *client, Error **errp)
     return nbd_negotiate_send_rep(client, NBD_REP_ACK, errp);
 }

-static void nbd_check_meta_export(NBDClient *client)
+static void nbd_check_meta_export(NBDClient *client, NBDExport *exp)
 {
-    if (client->exp != client->context_exp) {
+    if (exp != client->context_exp) {
         client->contexts.count = 0;
     }
 }
@@ -491,11 +491,15 @@ static int nbd_negotiate_handle_export_name(NBDClient *client, bool no_zeroes,
         error_setg(errp, "export not found");
         return -EINVAL;
     }
+    nbd_check_meta_export(client, client->exp);

     myflags = client->exp->nbdflags;
     if (client->header_style >= NBD_HEADER_STRUCTURED) {
         myflags |= NBD_FLAG_SEND_DF;
     }
+    if (client->extended_headers && client->contexts.count) {
+        myflags |= NBD_FLAG_BLOCK_STAT_PAYLOAD;
+    }
     trace_nbd_negotiate_new_style_size_flags(client->exp->size, myflags);
     stq_be_p(buf, client->exp->size);
     stw_be_p(buf + 8, myflags);
@@ -508,7 +512,6 @@ static int nbd_negotiate_handle_export_name(NBDClient *client, bool no_zeroes,

     QTAILQ_INSERT_TAIL(&client->exp->clients, client, next);
     blk_exp_ref(&client->exp->common);
-    nbd_check_meta_export(client);

     return 0;
 }
@@ -628,6 +631,9 @@ static int nbd_negotiate_handle_info(NBDClient *client, Error **errp)
                                           errp, "export '%s' not present",
                                           sane_name);
     }
+    if (client->opt == NBD_OPT_GO) {
+        nbd_check_meta_export(client, exp);
+    }

     /* Don't bother sending NBD_INFO_NAME unless client requested it */
     if (sendname) {
@@ -681,6 +687,10 @@ static int nbd_negotiate_handle_info(NBDClient *client, Error **errp)
     if (client->header_style >= NBD_HEADER_STRUCTURED) {
         myflags |= NBD_FLAG_SEND_DF;
     }
+    if (client->extended_headers &&
+        (client->contexts.count || client->opt == NBD_OPT_INFO)) {
+        myflags |= NBD_FLAG_BLOCK_STAT_PAYLOAD;
+    }
     trace_nbd_negotiate_new_style_size_flags(exp->size, myflags);
     stq_be_p(buf, exp->size);
     stw_be_p(buf + 8, myflags);
@@ -716,7 +726,6 @@ static int nbd_negotiate_handle_info(NBDClient *client, Error **errp)
         client->check_align = check_align;
         QTAILQ_INSERT_TAIL(&client->exp->clients, client, next);
         blk_exp_ref(&client->exp->common);
-        nbd_check_meta_export(client);
         rc = 1;
     }
     return rc;
@@ -2415,6 +2424,83 @@ static int coroutine_fn nbd_co_send_bitmap(NBDClient *client,
     return nbd_co_send_extents(client, request, ea, last, context_id, errp);
 }

+/*
+ * nbd_co_block_status_payload_read
+ * Called when a client wants a subset of negotiated contexts via a
+ * BLOCK_STATUS payload.  Check the payload for valid length and
+ * contents.  On success, return 0 with request updated to effective
+ * length.  If request was invalid but payload consumed, return 0 with
+ * request->len and request->contexts.count set to 0 (which will
+ * trigger an appropriate NBD_EINVAL response later on).  On I/O
+ * error, return -EIO.
+ */
+static int
+nbd_co_block_status_payload_read(NBDClient *client, NBDRequest *request,
+                                 Error **errp)
+{
+    int payload_len = request->len;
+    g_autofree char *buf = NULL;
+    g_autofree bool *bitmaps = NULL;
+    size_t count, i;
+    uint32_t id;
+
+    assert(request->len <= NBD_MAX_BUFFER_SIZE);
+    if (payload_len % sizeof(uint32_t) ||
+        payload_len < sizeof(NBDBlockStatusPayload) ||
+        payload_len > (sizeof(NBDBlockStatusPayload) +
+                       sizeof(id) * client->contexts.count)) {
+        goto skip;
+    }
+
+    buf = g_malloc(payload_len);
+    if (nbd_read(client->ioc, buf, payload_len,
+                 "CMD_BLOCK_STATUS data", errp) < 0) {
+        return -EIO;
+    }
+    trace_nbd_co_receive_request_payload_received(request->handle,
+                                                  payload_len);
+    memset(&request->contexts, 0, sizeof(request->contexts));
+    request->contexts.nr_bitmaps = client->context_exp->nr_export_bitmaps;
+    bitmaps = g_new0(bool, request->contexts.nr_bitmaps);
+    count = (payload_len - sizeof(NBDBlockStatusPayload)) / sizeof(id);
+    payload_len = 0;
+
+    for (i = 0; i < count; i++) {
+
+        id = ldl_be_p(buf + sizeof(NBDBlockStatusPayload) + sizeof(id) * i);
+        if (id == NBD_META_ID_BASE_ALLOCATION) {
+            if (request->contexts.base_allocation) {
+                goto skip;
+            }
+            request->contexts.base_allocation = true;
+        } else if (id == NBD_META_ID_ALLOCATION_DEPTH) {
+            if (request->contexts.allocation_depth) {
+                goto skip;
+            }
+            request->contexts.allocation_depth = true;
+        } else {
+            if (id - NBD_META_ID_DIRTY_BITMAP >=
+                request->contexts.nr_bitmaps ||
+                bitmaps[id - NBD_META_ID_DIRTY_BITMAP]) {
+                goto skip;
+            }
+            bitmaps[id - NBD_META_ID_DIRTY_BITMAP] = true;
+        }
+    }
+
+    request->len = ldq_be_p(buf);
+    request->contexts.count = count;
+    request->contexts.bitmaps = bitmaps;
+    bitmaps = NULL;
+    return 0;
+
+ skip:
+    trace_nbd_co_receive_block_status_payload_compliance(request->from,
+                                                         request->len);
+    request->len = request->contexts.count = 0;
+    return nbd_drop(client->ioc, payload_len, errp);
+}
+
 /* nbd_co_receive_request
  * Collect a client request. Return 0 if request looks valid, -EIO to drop
  * connection right away, -EAGAIN to indicate we were interrupted and the
@@ -2461,7 +2547,14 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req, NBDRequest *

         if (request->type == NBD_CMD_WRITE || extended_with_payload) {
             payload_len = request->len;
-            if (request->type != NBD_CMD_WRITE) {
+            if (request->type == NBD_CMD_BLOCK_STATUS) {
+                payload_len = nbd_co_block_status_payload_read(client,
+                                                               request,
+                                                               errp);
+                if (payload_len < 0) {
+                    return -EIO;
+                }
+            } else if (request->type != NBD_CMD_WRITE) {
                 /*
                  * For now, we don't support payloads on other
                  * commands; but we can keep the connection alive.
@@ -2540,6 +2633,9 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req, NBDRequest *
         valid_flags |= NBD_CMD_FLAG_NO_HOLE | NBD_CMD_FLAG_FAST_ZERO;
     } else if (request->type == NBD_CMD_BLOCK_STATUS) {
         valid_flags |= NBD_CMD_FLAG_REQ_ONE;
+        if (client->extended_headers && client->contexts.count) {
+            valid_flags |= NBD_CMD_FLAG_PAYLOAD_LEN;
+        }
     }
     if (request->flags & ~valid_flags) {
         error_setg(errp, "unsupported flags for command %s (got 0x%x)",
diff --git a/qemu-nbd.c b/qemu-nbd.c
index 8c35442626a..b7ab0fdc791 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -222,6 +222,7 @@ static int qemu_nbd_client_list(SocketAddress *saddr, QCryptoTLSCreds *tls,
                 [NBD_FLAG_SEND_RESIZE_BIT]          = "resize",
                 [NBD_FLAG_SEND_CACHE_BIT]           = "cache",
                 [NBD_FLAG_SEND_FAST_ZERO_BIT]       = "fast-zero",
+                [NBD_FLAG_BLOCK_STAT_PAYLOAD_BIT]   = "block-status-payload",
             };

             printf("  size:  %" PRIu64 "\n", list[i].size);
diff --git a/nbd/trace-events b/nbd/trace-events
index c20df33a431..da92fe1b56b 100644
--- a/nbd/trace-events
+++ b/nbd/trace-events
@@ -70,6 +70,7 @@ nbd_co_send_structured_read(uint64_t handle, uint64_t offset, void *data, size_t
 nbd_co_send_structured_read_hole(uint64_t handle, uint64_t offset, size_t size) "Send structured read hole reply: handle = %" PRIu64 ", offset = %" PRIu64 ", len = %zu"
 nbd_co_send_extents(uint64_t handle, unsigned int extents, uint32_t id, uint64_t length, int last) "Send block status reply: handle = %" PRIu64 ", extents = %u, context = %d (extents cover %" PRIu64 " bytes, last chunk = %d)"
 nbd_co_send_structured_error(uint64_t handle, int err, const char *errname, const char *msg) "Send structured error reply: handle = %" PRIu64 ", error = %d (%s), msg = '%s'"
+nbd_co_receive_block_status_payload_compliance(uint64_t from, int len) "client sent unusable block status payload: from=0x%" PRIx64 ", len=0x%x"
 nbd_co_receive_request_decode_type(uint64_t handle, uint16_t type, const char *name) "Decoding type: handle = %" PRIu64 ", type = %" PRIu16 " (%s)"
 nbd_co_receive_request_payload_received(uint64_t handle, uint64_t len) "Payload received: handle = %" PRIu64 ", len = %" PRIu64
 nbd_co_receive_ext_payload_compliance(uint64_t from, uint64_t len) "client sent non-compliant write without payload flag: from=0x%" PRIx64 ", len=0x%" PRIx64
diff --git a/tests/qemu-iotests/223.out b/tests/qemu-iotests/223.out
index b98582c38ea..b38f0b7963b 100644
--- a/tests/qemu-iotests/223.out
+++ b/tests/qemu-iotests/223.out
@@ -83,7 +83,7 @@ exports available: 0
 exports available: 3
  export: 'n'
   size:  4194304
-  flags: 0x58f ( readonly flush fua df multi cache )
+  flags: 0x158f ( readonly flush fua df multi cache block-status-payload )
   min block: 1
   opt block: 4096
   max block: 33554432
@@ -94,7 +94,7 @@ exports available: 3
  export: 'n2'
   description: some text
   size:  4194304
-  flags: 0xded ( flush fua trim zeroes df multi cache fast-zero )
+  flags: 0x1ded ( flush fua trim zeroes df multi cache fast-zero block-status-payload )
   min block: 1
   opt block: 4096
   max block: 33554432
@@ -104,7 +104,7 @@ exports available: 3
    qemu:dirty-bitmap:b2
  export: 'n3'
   size:  4194304
-  flags: 0x58f ( readonly flush fua df multi cache )
+  flags: 0x158f ( readonly flush fua df multi cache block-status-payload )
   min block: 1
   opt block: 4096
   max block: 33554432
@@ -205,7 +205,7 @@ exports available: 0
 exports available: 3
  export: 'n'
   size:  4194304
-  flags: 0x58f ( readonly flush fua df multi cache )
+  flags: 0x158f ( readonly flush fua df multi cache block-status-payload )
   min block: 1
   opt block: 4096
   max block: 33554432
@@ -216,7 +216,7 @@ exports available: 3
  export: 'n2'
   description: some text
   size:  4194304
-  flags: 0xded ( flush fua trim zeroes df multi cache fast-zero )
+  flags: 0x1ded ( flush fua trim zeroes df multi cache fast-zero block-status-payload )
   min block: 1
   opt block: 4096
   max block: 33554432
@@ -226,7 +226,7 @@ exports available: 3
    qemu:dirty-bitmap:b2
  export: 'n3'
   size:  4194304
-  flags: 0x58f ( readonly flush fua df multi cache )
+  flags: 0x158f ( readonly flush fua df multi cache block-status-payload )
   min block: 1
   opt block: 4096
   max block: 33554432
diff --git a/tests/qemu-iotests/307.out b/tests/qemu-iotests/307.out
index 2b9a6a67a1a..f645f3315f8 100644
--- a/tests/qemu-iotests/307.out
+++ b/tests/qemu-iotests/307.out
@@ -15,7 +15,7 @@ wrote 4096/4096 bytes at offset 0
 exports available: 1
  export: 'fmt'
   size:  67108864
-  flags: 0x58f ( readonly flush fua df multi cache )
+  flags: 0x158f ( readonly flush fua df multi cache block-status-payload )
   min block: XXX
   opt block: XXX
   max block: XXX
@@ -44,7 +44,7 @@ exports available: 1
 exports available: 1
  export: 'fmt'
   size:  67108864
-  flags: 0x58f ( readonly flush fua df multi cache )
+  flags: 0x158f ( readonly flush fua df multi cache block-status-payload )
   min block: XXX
   opt block: XXX
   max block: XXX
@@ -76,7 +76,7 @@ exports available: 1
 exports available: 2
  export: 'fmt'
   size:  67108864
-  flags: 0x58f ( readonly flush fua df multi cache )
+  flags: 0x158f ( readonly flush fua df multi cache block-status-payload )
   min block: XXX
   opt block: XXX
   max block: XXX
@@ -86,7 +86,7 @@ exports available: 2
  export: 'export1'
   description: This is the writable second export
   size:  67108864
-  flags: 0xded ( flush fua trim zeroes df multi cache fast-zero )
+  flags: 0x1ded ( flush fua trim zeroes df multi cache fast-zero block-status-payload )
   min block: XXX
   opt block: XXX
   max block: XXX
@@ -113,7 +113,7 @@ exports available: 1
  export: 'export1'
   description: This is the writable second export
   size:  67108864
-  flags: 0xded ( flush fua trim zeroes df multi cache fast-zero )
+  flags: 0x1ded ( flush fua trim zeroes df multi cache fast-zero block-status-payload )
   min block: XXX
   opt block: XXX
   max block: XXX
diff --git a/tests/qemu-iotests/tests/nbd-qemu-allocation.out b/tests/qemu-iotests/tests/nbd-qemu-allocation.out
index 659276032b0..794d1bfce62 100644
--- a/tests/qemu-iotests/tests/nbd-qemu-allocation.out
+++ b/tests/qemu-iotests/tests/nbd-qemu-allocation.out
@@ -17,7 +17,7 @@ wrote 2097152/2097152 bytes at offset 1048576
 exports available: 1
  export: ''
   size:  4194304
-  flags: 0x48f ( readonly flush fua df cache )
+  flags: 0x148f ( readonly flush fua df cache block-status-payload )
   min block: 1
   opt block: 4096
   max block: 33554432
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [Libguestfs] [PATCH v3 00/14] qemu patches for 64-bit NBD extensions
  2023-05-15 19:53 [PATCH v3 00/14] qemu patches for 64-bit NBD extensions Eric Blake
                   ` (13 preceding siblings ...)
  2023-05-15 19:53 ` [PATCH v3 14/14] nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS Eric Blake
@ 2023-05-15 21:05 ` Eric Blake
  14 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2023-05-15 21:05 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: libguestfs, vsementsov


Adding qemu-block for the cover letter (not sure how I missed that the
first time).

On Mon, May 15, 2023 at 02:53:29PM -0500, Eric Blake wrote:
> 
> v2 was here:
> https://lists.gnu.org/archive/html/qemu-devel/2022-11/msg02340.html
> 
> Since then:
>  - upstream NBD has accepted the extension on a branch; once multiple
>    implementations interoperate based on that spec, it will be promoted
>    to mainline (my plan: qemu with this series, libnbd nearly ready to
>    go, nbdkit a bit further out)
>  - rebase to block changes in meantime
>  - drop RFC patches for 64-bit NBD_CMD_READ (NBD spec did not take them)
>  - per upstream spec decision, extended headers now mandates use of
>    NBD_REPLY_TYPE_BLOCK_STATUS_EXT rather than server choice based on
>    reply size, which in turn required rearranging server patches a bit
>  - other changes that I noticed while testing with parallel changes
>    being added to libnbd (link to those patches to follow in the next
>    week or so)

If it helps review, I compared to my v2 posting as follows:

001/14:[0007] [FC] 'nbd/client: Use smarter assert'
002/14:[----] [--] 'nbd/client: Add safety check on chunk payload length'
003/14:[----] [-C] 'nbd/server: Prepare for alternate-size headers'
004/14:[0099] [FC] 'nbd: Prepare for 64-bit request effect lengths'
005/14:[0002] [FC] 'nbd: Add types for extended headers'
006/14:[0012] [FC] 'nbd/server: Refactor handling of request payload'
007/14:[0026] [FC] 'nbd/server: Refactor to pass full request around'
008/14:[0052] [FC] 'nbd/server: Support 64-bit block status'
009/14:[0032] [FC] 'nbd/server: Initial support for extended headers'
010/14:[0020] [FC] 'nbd/client: Initial support for extended headers'
011/14:[0015] [FC] 'nbd/client: Accept 64-bit block status chunks'
012/14:[0042] [FC] 'nbd/client: Request extended headers during negotiation'
013/14:[0005] [FC] 'nbd/server: Prepare for per-request filtering of BLOCK_STATUS'
014/14:[0004] [FC] 'nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS'

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH v3 01/14] nbd/client: Use smarter assert
  2023-05-15 19:53 ` [PATCH v3 01/14] nbd/client: Use smarter assert Eric Blake
@ 2023-05-29  8:20   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2023-05-29  8:20 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: libguestfs, Dr . David Alan Gilbert, open list:Network Block Dev...

On 15.05.23 22:53, Eric Blake wrote:
> Assigning strlen() to a uint32_t and then asserting that it isn't too
> large doesn't catch the case of an input string 4G in length.
> Thankfully, the incoming strings can never be that large: if the
> export name or query is reflecting a string the client got from the
> server, we already guarantee that we dropped the NBD connection if the
> server sent more than 32M in a single reply to our NBD_OPT_* request;
> if the export name is coming from qemu, nbd_receive_negotiate()
> asserted that strlen(info->name) <= NBD_MAX_STRING_SIZE; and
> similarly, a query string via x->dirty_bitmap coming from the user was
> bounds-checked in either qemu-nbd or by the limitations of QMP.
> Still, it doesn't hurt to be more explicit in how we write our
> assertions to not have to analyze whether inadvertent wraparound is
> possible.
> 
> Fixes: 93676c88 ("nbd: Don't send oversize strings", v4.2.0)
> Reported-by: Dr. David Alan Gilbert<dave@treblig.org>
> Signed-off-by: Eric Blake<eblake@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

-- 
Best regards,
Vladimir




* Re: [PATCH v3 02/14] nbd/client: Add safety check on chunk payload length
  2023-05-15 19:53 ` [PATCH v3 02/14] nbd/client: Add safety check on chunk payload length Eric Blake
@ 2023-05-29  8:25   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2023-05-29  8:25 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: libguestfs, open list:Network Block Dev...

On 15.05.23 22:53, Eric Blake wrote:
> Our existing use of structured replies either reads into a qiov capped
> at 32M (NBD_CMD_READ) or caps allocation to 1000 bytes (see
> NBD_MAX_MALLOC_PAYLOAD in block/nbd.c).  But the existing length
> checks are rather late; if we encounter a buggy (or malicious) server
> that sends a super-large payload length, we should drop the connection
> right then rather than assuming the layer on top will be careful.
> This becomes more important when we permit 64-bit lengths which are
> even more likely to have the potential for attempted denial of service
> abuse.
> 
> Signed-off-by: Eric Blake<eblake@redhat.com>


Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

-- 
Best regards,
Vladimir




* Re: [PATCH v3 03/14] nbd/server: Prepare for alternate-size headers
  2023-05-15 19:53 ` [PATCH v3 03/14] nbd/server: Prepare for alternate-size headers Eric Blake
@ 2023-05-29 14:26   ` Vladimir Sementsov-Ogievskiy
  2023-05-30 16:29     ` Eric Blake
  0 siblings, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2023-05-29 14:26 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: libguestfs, Kevin Wolf, Hanna Reitz, open list:Network Block Dev...

On 15.05.23 22:53, Eric Blake wrote:
> Upstream NBD now documents[1] an extension that supports 64-bit effect
> lengths in requests.  As part of that extension, the size of the reply
> headers will change in order to permit a 64-bit length in the reply
> for symmetry[2].  Additionally, where the reply header is currently
> 16 bytes for simple reply, and 20 bytes for structured reply; with the
> extension enabled, there will only be one structured reply type, of 32
> bytes.  Since we are already wired up to use iovecs, it is easiest to
> allow for this change in header size by splitting each structured
> reply across two iovecs, one for the header (which will become
> variable-length in a future patch according to client negotiation),
> and the other for the payload, and removing the header from the
> payload struct definitions.  Interestingly, the client side code never
> utilized the packed types, so only the server code needs to be
> updated.

Actually, it does, since the previous patch :) But it becomes even better now. Anyway, a note somewhere is needed, I think.

> 
> [1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md
> as of NBD commit e6f3b94a934
> 
> [2] Note that on the surface, this is because some future server might
> permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
> transaction.  But even though the extended reply length is widened to
> 64 bits, for now the NBD spec is clear that servers will not reply
> with more than a maximum payload bounded by the 32-bit
> NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
> agree to transactions larger than 4G would require yet another
> extension.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>


Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

> ---
>   include/block/nbd.h |  8 +++---
>   nbd/server.c        | 64 ++++++++++++++++++++++++++++-----------------
>   2 files changed, 44 insertions(+), 28 deletions(-)
> 

[..]

> 
> -static inline void set_be_chunk(NBDStructuredReplyChunk *chunk, uint16_t flags,
> -                                uint16_t type, uint64_t handle, uint32_t length)
> +static inline void set_be_chunk(NBDClient *client, struct iovec *iov,

Suggestion: pass niov here too, and calculate "length" as the sum of the iov lengths starting from the second element automatically.

Also, it's worth documenting that set_be_chunk() expects a half-initialized iov, with .iov_base pointing to an NBDReply (IN parameter) and .iov_len unset (OUT parameter), as that's not the usual practice.

> +                                uint16_t flags, uint16_t type,
> +                                uint64_t handle, uint32_t length)
>   {
> +    NBDStructuredReplyChunk *chunk = iov->iov_base;
> +
> +    iov->iov_len = sizeof(*chunk);
>       stl_be_p(&chunk->magic, NBD_STRUCTURED_REPLY_MAGIC);
>       stw_be_p(&chunk->flags, flags);
>       stw_be_p(&chunk->type, type);

[..]

-- 
Best regards,
Vladimir




* Re: [PATCH v3 04/14] nbd: Prepare for 64-bit request effect lengths
  2023-05-15 19:53 ` [PATCH v3 04/14] nbd: Prepare for 64-bit request effect lengths Eric Blake
@ 2023-05-30 13:05   ` Vladimir Sementsov-Ogievskiy
  2023-05-30 18:23     ` Eric Blake
  0 siblings, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2023-05-30 13:05 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: libguestfs, Kevin Wolf, Hanna Reitz, open list:Network Block Dev...

On 15.05.23 22:53, Eric Blake wrote:
> Widen the length field of NBDRequest to 64-bits, although we can
> assert that all current uses are still under 32 bits.  Move the
> request magic number to nbd.h, to live alongside the reply magic
> number.  Convert 'bool structured_reply' into a tri-state enum that
> will eventually track whether the client successfully negotiated
> extended headers with the server, allowing the nbd driver to pass
> larger requests along where possible; although in this patch the enum
> never surpasses structured replies, for no semantic change yet.
> 
> Signed-off-by: Eric Blake<eblake@redhat.com>

Seems like too much for one patch; could it be split into
- Convert 'bool structured_reply'
- introduce third parameter for nbd_send_request()
- rework len to 64-bit

otherwise, looks good to me

-- 
Best regards,
Vladimir




* Re: [PATCH v3 05/14] nbd: Add types for extended headers
  2023-05-15 19:53 ` [PATCH v3 05/14] nbd: Add types for extended headers Eric Blake
@ 2023-05-30 13:23   ` Vladimir Sementsov-Ogievskiy
  2023-05-30 18:22     ` [Libguestfs] " Eric Blake
  0 siblings, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2023-05-30 13:23 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: libguestfs, Kevin Wolf, Hanna Reitz, open list:Network Block Dev...

On 15.05.23 22:53, Eric Blake wrote:
> Add the constants and structs necessary for later patches to start
> implementing the NBD_OPT_EXTENDED_HEADERS extension in both the client
> and server, matching recent commit e6f3b94a934 in the upstream nbd
> project.  This patch does not change any existing behavior, but merely
> sets the stage.
> 
> This patch does not change the status quo that neither the client nor
> server use a packed-struct representation for the request header.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>


Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

> ---
>   docs/interop/nbd.txt |  1 +
>   include/block/nbd.h  | 74 ++++++++++++++++++++++++++++++++------------
>   nbd/common.c         | 10 +++++-
>   3 files changed, 65 insertions(+), 20 deletions(-)
> 
> diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
> index f5ca25174a6..abaf4c28a96 100644
> --- a/docs/interop/nbd.txt
> +++ b/docs/interop/nbd.txt
> @@ -69,3 +69,4 @@ NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
>   NBD_CMD_FLAG_FAST_ZERO
>   * 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth"
>   * 7.1: NBD_FLAG_CAN_MULTI_CONN for shareable writable exports
> +* 8.1: NBD_OPT_EXTENDED_HEADERS
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 50626ab2744..d753fb8006f 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -87,13 +87,24 @@ typedef struct NBDStructuredReplyChunk {
>       uint32_t length; /* length of payload */
>   } QEMU_PACKED NBDStructuredReplyChunk;
> 

[..]

> -/* Extent chunk for NBD_REPLY_TYPE_BLOCK_STATUS */
> +/* Extent array for NBD_REPLY_TYPE_BLOCK_STATUS */

Why? NBDExtent is one extent, not extent array.

>   typedef struct NBDExtent {
>       uint32_t length;
>       uint32_t flags; /* NBD_STATE_* */
>   } QEMU_PACKED NBDExtent;
> 
> +/* Header of NBD_REPLY_TYPE_BLOCK_STATUS_EXT */



-- 
Best regards,
Vladimir




* Re: [PATCH v3 03/14] nbd/server: Prepare for alternate-size headers
  2023-05-29 14:26   ` Vladimir Sementsov-Ogievskiy
@ 2023-05-30 16:29     ` Eric Blake
  2023-05-31  7:28       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2023-05-30 16:29 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel, libguestfs, Kevin Wolf, Hanna Reitz,
	open list:Network Block Dev...

On Mon, May 29, 2023 at 05:26:50PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 15.05.23 22:53, Eric Blake wrote:
> > Upstream NBD now documents[1] an extension that supports 64-bit effect
> > lengths in requests.  As part of that extension, the size of the reply
> > headers will change in order to permit a 64-bit length in the reply
> > for symmetry[2].  Additionally, where the reply header is currently
> > 16 bytes for simple reply, and 20 bytes for structured reply; with the
> > extension enabled, there will only be one structured reply type, of 32
> > bytes.  Since we are already wired up to use iovecs, it is easiest to
> > allow for this change in header size by splitting each structured
> > reply across two iovecs, one for the header (which will become
> > variable-length in a future patch according to client negotiation),
> > and the other for the payload, and removing the header from the
> > payload struct definitions.  Interestingly, the client side code never
> > utilized the packed types, so only the server code needs to be
> > updated.
> 
> Actually, it does, since previous patch :) But, it becomes even better now? Anyway some note somewhere is needed I think.

Oh, indeed - but only in a sizeof expression for an added assertion
check, and not actually for in-memory storage.

If you are envisioning a comment addition, where are you thinking it
should be placed?

> 
> > 
> > -static inline void set_be_chunk(NBDStructuredReplyChunk *chunk, uint16_t flags,
> > -                                uint16_t type, uint64_t handle, uint32_t length)
> > +static inline void set_be_chunk(NBDClient *client, struct iovec *iov,
> 
> Suggestion: pass niov here too, and caluculate "length" as a sum of iov lengths starting from second extent automatically.

Makes sense.

> 
> Also, worth documenting that set_be_chunk() expects half-initialized iov, with .iov_base pointing to NBDReply (IN parameter) and .iov_len unset (OUT parameter), as that's not usual practice

Yeah, I'm not sure whether there is a better interface; either I'll come
up with one, or at least better document what I landed on.

> 
> > +                                uint16_t flags, uint16_t type,
> > +                                uint64_t handle, uint32_t length)
> >   {
> > +    NBDStructuredReplyChunk *chunk = iov->iov_base;
> > +
> > +    iov->iov_len = sizeof(*chunk);
> >       stl_be_p(&chunk->magic, NBD_STRUCTURED_REPLY_MAGIC);
> >       stw_be_p(&chunk->flags, flags);
> >       stw_be_p(&chunk->type, type);
> 
> [..]
> 
> -- 
> Best regards,
> Vladimir
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




* Re: [Libguestfs] [PATCH v3 05/14] nbd: Add types for extended headers
  2023-05-30 13:23   ` Vladimir Sementsov-Ogievskiy
@ 2023-05-30 18:22     ` Eric Blake
  2023-05-31  7:30       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2023-05-30 18:22 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel, Kevin Wolf, Hanna Reitz,
	open list:Network Block Dev...,
	libguestfs

On Tue, May 30, 2023 at 04:23:46PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 15.05.23 22:53, Eric Blake wrote:
> > Add the constants and structs necessary for later patches to start
> > implementing the NBD_OPT_EXTENDED_HEADERS extension in both the client
> > and server, matching recent commit e6f3b94a934] in the upstream nbd
> > project.  This patch does not change any existing behavior, but merely
> > sets the stage.
> > 
> > This patch does not change the status quo that neither the client nor
> > server use a packed-struct representation for the request header.
> > 
> > Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> 
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> 
> > ---
> >   docs/interop/nbd.txt |  1 +
> >   include/block/nbd.h  | 74 ++++++++++++++++++++++++++++++++------------
> >   nbd/common.c         | 10 +++++-
> >   3 files changed, 65 insertions(+), 20 deletions(-)
> > 
> > diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
> > index f5ca25174a6..abaf4c28a96 100644
> > --- a/docs/interop/nbd.txt
> > +++ b/docs/interop/nbd.txt
> > @@ -69,3 +69,4 @@ NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
> >   NBD_CMD_FLAG_FAST_ZERO
> >   * 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth"
> >   * 7.1: NBD_FLAG_CAN_MULTI_CONN for shareable writable exports
> > +* 8.1: NBD_OPT_EXTENDED_HEADERS
> > diff --git a/include/block/nbd.h b/include/block/nbd.h
> > index 50626ab2744..d753fb8006f 100644
> > --- a/include/block/nbd.h
> > +++ b/include/block/nbd.h
> > @@ -87,13 +87,24 @@ typedef struct NBDStructuredReplyChunk {
> >       uint32_t length; /* length of payload */
> >   } QEMU_PACKED NBDStructuredReplyChunk;
> > 
> 
> [..]
> 
> > -/* Extent chunk for NBD_REPLY_TYPE_BLOCK_STATUS */
> > +/* Extent array for NBD_REPLY_TYPE_BLOCK_STATUS */
> 
> Why? NBDExtent is one extent, not extent array.

It's not the entire chunk either, because that also includes the
header and the metacontext id that are not part of the extent array.
Maybe 'Extent array element', which matches our wire layout of:

<-  chunk                  ->
<- hdr -><- payload        ->
 ...     id  <- array      ->
             ext[0] ext[1]...

> 
> >   typedef struct NBDExtent {
> >       uint32_t length;
> >       uint32_t flags; /* NBD_STATE_* */
> >   } QEMU_PACKED NBDExtent;
> > 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 04/14] nbd: Prepare for 64-bit request effect lengths
  2023-05-30 13:05   ` Vladimir Sementsov-Ogievskiy
@ 2023-05-30 18:23     ` Eric Blake
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2023-05-30 18:23 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel, libguestfs, Kevin Wolf, Hanna Reitz,
	open list:Network Block Dev...

On Tue, May 30, 2023 at 04:05:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 15.05.23 22:53, Eric Blake wrote:
> > Widen the length field of NBDRequest to 64-bits, although we can
> > assert that all current uses are still under 32 bits.  Move the
> > request magic number to nbd.h, to live alongside the reply magic
> > number.  Convert 'bool structured_reply' into a tri-state enum that
> > will eventually track whether the client successfully negotiated
> > extended headers with the server, allowing the nbd driver to pass
> > larger requests along where possible; although in this patch the enum
> > never surpasses structured replies, for no semantic change yet.
> > 
> > Signed-off-by: Eric Blake<eblake@redhat.com>
> 
> Seems too much for one patch; could it be split into:
> - Convert 'bool structured_reply'
> - introduce third parameter for nbd_send_request()
> - rework len to 64bit

Okay, will give that a shot for v4.

> 
> otherwise, looks good to me
> 
> -- 
> Best regards,
> Vladimir
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH v3 03/14] nbd/server: Prepare for alternate-size headers
  2023-05-30 16:29     ` Eric Blake
@ 2023-05-31  7:28       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2023-05-31  7:28 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, libguestfs, Kevin Wolf, Hanna Reitz,
	open list:Network Block Dev...

On 30.05.23 19:29, Eric Blake wrote:
> On Mon, May 29, 2023 at 05:26:50PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> On 15.05.23 22:53, Eric Blake wrote:
>>> Upstream NBD now documents[1] an extension that supports 64-bit effect
>>> lengths in requests.  As part of that extension, the size of the reply
>>> headers will change in order to permit a 64-bit length in the reply
>>> for symmetry[2].  Additionally, where the reply header is currently
>>> 16 bytes for simple reply, and 20 bytes for structured reply; with the
>>> extension enabled, there will only be one structured reply type, of 32
>>> bytes.  Since we are already wired up to use iovecs, it is easiest to
>>> allow for this change in header size by splitting each structured
>>> reply across two iovecs, one for the header (which will become
>>> variable-length in a future patch according to client negotiation),
>>> and the other for the payload, and removing the header from the
>>> payload struct definitions.  Interestingly, the client side code never
>>> utilized the packed types, so only the server code needs to be
>>> updated.
>>
>> Actually, it does, since the previous patch :) But it becomes even better now? Anyway, some note somewhere is needed, I think.
> 
> Oh, indeed - but only in a sizeof expression for an added assertion
> check, and not actually for in-memory storage.
> 
> If you are envisioning a comment addition, where are you thinking it
> should be placed?

Thinking about it again: the check in patch 02 is incorrect as originally written, since it counts NBDStructuredReplyChunk as part of the payload, and with patch 03 it becomes correct. So, swapping commits 02 and 03 would make everything correct with no additional comments needed.

> 
>>
>>>
>>> -static inline void set_be_chunk(NBDStructuredReplyChunk *chunk, uint16_t flags,
>>> -                                uint16_t type, uint64_t handle, uint32_t length)
>>> +static inline void set_be_chunk(NBDClient *client, struct iovec *iov,
>>
>> Suggestion: pass niov here too, and calculate "length" as the sum of the iov lengths starting from the second element automatically.
> 
> Makes sense.
> 
>>
>> Also, it is worth documenting that set_be_chunk() expects a half-initialized iov, with .iov_base pointing to NBDReply (IN parameter) and .iov_len unset (OUT parameter), as that's not the usual practice
> 
> Yeah, I'm not sure if there is a better interface, but either I come
> up with one, or at least better document what I landed on.
> 
>>
>>> +                                uint16_t flags, uint16_t type,
>>> +                                uint64_t handle, uint32_t length)
>>>    {
>>> +    NBDStructuredReplyChunk *chunk = iov->iov_base;
>>> +
>>> +    iov->iov_len = sizeof(*chunk);
>>>        stl_be_p(&chunk->magic, NBD_STRUCTURED_REPLY_MAGIC);
>>>        stw_be_p(&chunk->flags, flags);
>>>        stw_be_p(&chunk->type, type);
>>
>> [..]
>>
>> -- 
>> Best regards,
>> Vladimir
>>
> 

-- 
Best regards,
Vladimir




* Re: [Libguestfs] [PATCH v3 05/14] nbd: Add types for extended headers
  2023-05-30 18:22     ` [Libguestfs] " Eric Blake
@ 2023-05-31  7:30       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2023-05-31  7:30 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, Kevin Wolf, Hanna Reitz,
	open list:Network Block Dev...,
	libguestfs

On 30.05.23 21:22, Eric Blake wrote:
> On Tue, May 30, 2023 at 04:23:46PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> On 15.05.23 22:53, Eric Blake wrote:
>>> Add the constants and structs necessary for later patches to start
>>> implementing the NBD_OPT_EXTENDED_HEADERS extension in both the client
>>> and server, matching recent commit e6f3b94a934 in the upstream nbd
>>> project.  This patch does not change any existing behavior, but merely
>>> sets the stage.
>>>
>>> This patch does not change the status quo that neither the client nor
>>> server use a packed-struct representation for the request header.
>>>
>>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>
>>
>> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>>
>>> ---
>>>    docs/interop/nbd.txt |  1 +
>>>    include/block/nbd.h  | 74 ++++++++++++++++++++++++++++++++------------
>>>    nbd/common.c         | 10 +++++-
>>>    3 files changed, 65 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
>>> index f5ca25174a6..abaf4c28a96 100644
>>> --- a/docs/interop/nbd.txt
>>> +++ b/docs/interop/nbd.txt
>>> @@ -69,3 +69,4 @@ NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
>>>    NBD_CMD_FLAG_FAST_ZERO
>>>    * 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth"
>>>    * 7.1: NBD_FLAG_CAN_MULTI_CONN for shareable writable exports
>>> +* 8.1: NBD_OPT_EXTENDED_HEADERS
>>> diff --git a/include/block/nbd.h b/include/block/nbd.h
>>> index 50626ab2744..d753fb8006f 100644
>>> --- a/include/block/nbd.h
>>> +++ b/include/block/nbd.h
>>> @@ -87,13 +87,24 @@ typedef struct NBDStructuredReplyChunk {
>>>        uint32_t length; /* length of payload */
>>>    } QEMU_PACKED NBDStructuredReplyChunk;
>>>
>>
>> [..]
>>
>>> -/* Extent chunk for NBD_REPLY_TYPE_BLOCK_STATUS */
>>> +/* Extent array for NBD_REPLY_TYPE_BLOCK_STATUS */
>>
>> Why? NBDExtent is one extent, not extent array.
> 
> It's not the entire chunk either, because that also includes the
> header and the metacontext id that are not part of the extent array.
> Maybe 'Extent array element', which matches our wire layout of:

Yes, sounds good

> 
> <-  chunk                  ->
> <- hdr -><- payload        ->
>   ...     id  <- array      ->
>               ext[0] ext[1]...
> 
>>
>>>    typedef struct NBDExtent {
>>>        uint32_t length;
>>>        uint32_t flags; /* NBD_STATE_* */
>>>    } QEMU_PACKED NBDExtent;
>>>
> 

-- 
Best regards,
Vladimir




* Re: [PATCH v3 06/14] nbd/server: Refactor handling of request payload
  2023-05-15 19:53 ` [PATCH v3 06/14] nbd/server: Refactor handling of request payload Eric Blake
@ 2023-05-31  8:04   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2023-05-31  8:04 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: libguestfs, open list:Network Block Dev...

On 15.05.23 22:53, Eric Blake wrote:
> Upcoming additions to support NBD 64-bit effect lengths allow for the
> possibility to distinguish between payload length (capped at 32M) and
> effect length (up to 63 bits).  Without that extension, only the
> NBD_CMD_WRITE request has a payload; but with the extension, it makes
> sense to allow at least NBD_CMD_BLOCK_STATUS to have both a payload
> and effect length (where the payload is a limited-size struct that in
> turn gives the real effect length as well as a subset of known ids
> for which status is requested).  Other future NBD commands may also
> have a request payload, so the 64-bit extension introduces a new
> NBD_CMD_FLAG_PAYLOAD_LEN that distinguishes between whether the header
> length is a payload length or an effect length, rather than
> hard-coding the decision based on the command.  Note that we do not
> support the payload version of BLOCK_STATUS yet.
> 
> For this patch, no semantic change is intended for a compliant client.
> For a non-compliant client, it is possible that the error behavior
> changes (a different message, a change on whether the connection is
> killed or remains alive for the next command, or so forth), but all
> errors should still be handled gracefully.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>   nbd/server.c     | 55 +++++++++++++++++++++++++++++++++---------------
>   nbd/trace-events |  1 +
>   2 files changed, 39 insertions(+), 17 deletions(-)
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index cf38a104d9a..5812a773ace 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -2316,6 +2316,8 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req, NBDRequest *
>                                                  Error **errp)
>   {
>       NBDClient *client = req->client;
> +    bool extended_with_payload;
> +    int payload_len = 0;
>       int valid_flags;
>       int ret;
> 
> @@ -2329,27 +2331,41 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req, NBDRequest *
>       trace_nbd_co_receive_request_decode_type(request->handle, request->type,
>                                                nbd_cmd_lookup(request->type));
> 
> -    if (request->type != NBD_CMD_WRITE) {
> -        /* No payload, we are ready to read the next request.  */
> -        req->complete = true;
> -    }
> -
>       if (request->type == NBD_CMD_DISC) {
>           /* Special case: we're going to disconnect without a reply,
>            * whether or not flags, from, or len are bogus */
> +        req->complete = true;
>           return -EIO;
>       }
> 
> +    /* Payload and buffer handling. */
> +    extended_with_payload = client->header_style >= NBD_HEADER_EXTENDED &&
> +        (request->flags & NBD_CMD_FLAG_PAYLOAD_LEN);
>       if (request->type == NBD_CMD_READ || request->type == NBD_CMD_WRITE ||
> -        request->type == NBD_CMD_CACHE)
> -    {
> +        request->type == NBD_CMD_CACHE || extended_with_payload) {
>           if (request->len > NBD_MAX_BUFFER_SIZE) {
>               error_setg(errp, "len (%" PRIu64" ) is larger than max len (%u)",
>                          request->len, NBD_MAX_BUFFER_SIZE);

Hmm: pre-patch, req->complete is set to true here, except for WRITE requests

>               return -EINVAL;
>           }
> 

the whole big if with its nested sub-ifs becomes really hard to read. At the least, I think there is no reason to keep the following two ifs inside the bigger if, as the small ifs just recheck the same conditions.

> -        if (request->type != NBD_CMD_CACHE) {
> +        if (request->type == NBD_CMD_WRITE || extended_with_payload) {
> +            payload_len = request->len;
> +            if (request->type != NBD_CMD_WRITE) {
> +                /*
> +                 * For now, we don't support payloads on other
> +                 * commands; but we can keep the connection alive.
> +                 */
> +                request->len = 0;
> +            } else if (client->header_style >= NBD_HEADER_EXTENDED &&
> +                       !extended_with_payload) {
> +                /* The client is noncompliant. Trace it, but proceed. */
> +                trace_nbd_co_receive_ext_payload_compliance(request->from,
> +                                                            request->len);
> +            }
> +        }
> +
> +        if (request->type == NBD_CMD_WRITE || request->type == NBD_CMD_READ) {
>               req->data = blk_try_blockalign(client->exp->common.blk,
>                                              request->len);
>               if (req->data == NULL) {
> @@ -2359,18 +2375,20 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req, NBDRequest *
>           }
>       }
> 
> -    if (request->type == NBD_CMD_WRITE) {
> -        assert(request->len <= NBD_MAX_BUFFER_SIZE);
> -        if (nbd_read(client->ioc, req->data, request->len, "CMD_WRITE data",
> -                     errp) < 0)
> -        {
> +    if (payload_len) {
> +        if (req->data) {
> +            ret = nbd_read(client->ioc, req->data, payload_len,
> +                           "CMD_WRITE data", errp);
> +        } else {
> +            ret = nbd_drop(client->ioc, payload_len, errp);
> +        }
> +        if (ret < 0) {
>               return -EIO;
>           }
> -        req->complete = true;
> -
>           trace_nbd_co_receive_request_payload_received(request->handle,
> -                                                      request->len);
> +                                                      payload_len);
>       }
> +    req->complete = true;
> 
>       /* Sanity checks. */
>       if (client->exp->nbdflags & NBD_FLAG_READ_ONLY &&
> @@ -2400,7 +2418,10 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req, NBDRequest *
>                                                 client->check_align);
>       }
>       valid_flags = NBD_CMD_FLAG_FUA;
> -    if (request->type == NBD_CMD_READ &&
> +    if (request->type == NBD_CMD_WRITE &&
> +        client->header_style >= NBD_HEADER_EXTENDED) {
> +        valid_flags |= NBD_CMD_FLAG_PAYLOAD_LEN;
> +    } else if (request->type == NBD_CMD_READ &&
>           client->header_style >= NBD_HEADER_STRUCTURED) {
>           valid_flags |= NBD_CMD_FLAG_DF;
>       } else if (request->type == NBD_CMD_WRITE_ZEROES) {
> diff --git a/nbd/trace-events b/nbd/trace-events
> index e2c1d68688d..adf5666e207 100644
> --- a/nbd/trace-events
> +++ b/nbd/trace-events
> @@ -71,6 +71,7 @@ nbd_co_send_extents(uint64_t handle, unsigned int extents, uint32_t id, uint64_t
>   nbd_co_send_structured_error(uint64_t handle, int err, const char *errname, const char *msg) "Send structured error reply: handle = %" PRIu64 ", error = %d (%s), msg = '%s'"
>   nbd_co_receive_request_decode_type(uint64_t handle, uint16_t type, const char *name) "Decoding type: handle = %" PRIu64 ", type = %" PRIu16 " (%s)"
>   nbd_co_receive_request_payload_received(uint64_t handle, uint64_t len) "Payload received: handle = %" PRIu64 ", len = %" PRIu64
> +nbd_co_receive_ext_payload_compliance(uint64_t from, uint64_t len) "client sent non-compliant write without payload flag: from=0x%" PRIx64 ", len=0x%" PRIx64
>   nbd_co_receive_align_compliance(const char *op, uint64_t from, uint64_t len, uint32_t align) "client sent non-compliant unaligned %s request: from=0x%" PRIx64 ", len=0x%" PRIx64 ", align=0x%" PRIx32
>   nbd_trip(void) "Reading request"
> 

-- 
Best regards,
Vladimir




* Re: [PATCH v3 07/14] nbd/server: Refactor to pass full request around
  2023-05-15 19:53 ` [PATCH v3 07/14] nbd/server: Refactor to pass full request around Eric Blake
@ 2023-05-31  8:13   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2023-05-31  8:13 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: libguestfs, open list:Network Block Dev...

On 15.05.23 22:53, Eric Blake wrote:
> Part of NBD's 64-bit headers extension involves passing the client's
> requested offset back as part of the reply header (one reason for this
> change: converting absolute offsets stored in
> NBD_REPLY_TYPE_OFFSET_DATA to relative offsets within the buffer is
> easier if the absolute offset of the buffer is also available).  This
> is a refactoring patch to pass the full request around the reply
> stack, rather than just the handle, so that later patches can then
> access request->from when extended headers are active.  But for this
> patch, there are no semantic changes.
> 
> Signed-off-by: Eric Blake<eblake@redhat.com>


Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

-- 
Best regards,
Vladimir




* Re: [PATCH v3 08/14] nbd/server: Support 64-bit block status
  2023-05-15 19:53 ` [PATCH v3 08/14] nbd/server: Support 64-bit block status Eric Blake
@ 2023-05-31 14:10   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2023-05-31 14:10 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: libguestfs, open list:Network Block Dev...

On 15.05.23 22:53, Eric Blake wrote:
> The NBD spec states that if the client negotiates extended headers,
> the server must avoid NBD_REPLY_TYPE_BLOCK_STATUS and instead use
> NBD_REPLY_TYPE_BLOCK_STATUS_EXT which supports 64-bit lengths, even if
> the reply does not need more than 32 bits.  As of this patch,
> client->header_style is still never NBD_HEADER_EXTENDED, so the code
> added here does not take effect until the next patch enables
> negotiation.
> 
> For now, all metacontexts that we know how to export never populate
> more than 32 bits of information, so we don't have to worry about
> NBD_REP_ERR_EXT_HEADER_REQD or filtering during handshake, and we
> always send all zeroes for the upper 32 bits of status during
> NBD_CMD_BLOCK_STATUS.
> 
> Note that we previously had some interesting size-juggling on call
> chains, such as:
> 
> nbd_co_send_block_status(uint32_t length)
> -> blockstatus_to_extents(uint32_t bytes)
>    -> bdrv_block_status_above(bytes, &uint64_t num)
>    -> nbd_extent_array_add(uint64_t num)
>      -> store num in 32-bit length
> 
> But we were lucky that it never overflowed: bdrv_block_status_above
> never sets num larger than bytes, and we had previously been capping
> 'bytes' at 32 bits (since the protocol does not allow sending a larger
> request without extended headers).  This patch adds some assertions
> that ensure we continue to avoid overflowing 32 bits for a narrow
> client, while fully utilizing 64-bits all the way through when the
> client understands that.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>   nbd/server.c | 86 +++++++++++++++++++++++++++++++++++++---------------
>   1 file changed, 62 insertions(+), 24 deletions(-)
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index ffab51efd26..b4c15ae1a14 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -2073,7 +2073,15 @@ static int coroutine_fn nbd_co_send_sparse_read(NBDClient *client,
>   }
> 
>   typedef struct NBDExtentArray {
> -    NBDExtent *extents;
> +    NBDHeaderStyle style;           /* 32- or 64-bit extent descriptions */
> +    union {
> +        NBDStructuredMeta id;       /* style == NBD_HEADER_STRUCTURED */
> +        NBDStructuredMetaExt meta;  /* style == NBD_HEADER_EXTENDED */
> +    };
> +    union {
> +        NBDExtent *narrow;          /* style == NBD_HEADER_STRUCTURED */
> +        NBDExtentExt *extents;      /* style == NBD_HEADER_EXTENDED */
> +    };
>       unsigned int nb_alloc;
>       unsigned int count;
>       uint64_t total_length;
> @@ -2081,12 +2089,15 @@ typedef struct NBDExtentArray {
>       bool converted_to_be;
>   } NBDExtentArray;
> 
> -static NBDExtentArray *nbd_extent_array_new(unsigned int nb_alloc)
> +static NBDExtentArray *nbd_extent_array_new(unsigned int nb_alloc,
> +                                            NBDHeaderStyle style)
>   {
>       NBDExtentArray *ea = g_new0(NBDExtentArray, 1);
> 
> +    assert(style >= NBD_HEADER_STRUCTURED);
>       ea->nb_alloc = nb_alloc;
> -    ea->extents = g_new(NBDExtent, nb_alloc);
> +    ea->extents = g_new(NBDExtentExt, nb_alloc);
> +    ea->style = style;
>       ea->can_add = true;
> 
>       return ea;
> @@ -2100,17 +2111,37 @@ static void nbd_extent_array_free(NBDExtentArray *ea)
>   G_DEFINE_AUTOPTR_CLEANUP_FUNC(NBDExtentArray, nbd_extent_array_free)
> 
>   /* Further modifications of the array after conversion are abandoned */
> -static void nbd_extent_array_convert_to_be(NBDExtentArray *ea)
> +static void nbd_extent_array_convert_to_be(NBDExtentArray *ea,
> +                                           uint32_t context_id,
> +                                           struct iovec *iov)
>   {
>       int i;
> 
>       assert(!ea->converted_to_be);
> +    assert(iov[0].iov_base == &ea->meta);
> +    assert(iov[1].iov_base == ea->extents);

Hmm. Maybe just pass an uninitialized iov and set the bases here? That would be a clearer interface.

>       ea->can_add = false;
>       ea->converted_to_be = true;
> 
> -    for (i = 0; i < ea->count; i++) {
> -        ea->extents[i].flags = cpu_to_be32(ea->extents[i].flags);
> -        ea->extents[i].length = cpu_to_be32(ea->extents[i].length);
> +    stl_be_p(&ea->meta.context_id, context_id);
> +    if (ea->style >= NBD_HEADER_EXTENDED) {
> +        stl_be_p(&ea->meta.count, ea->count);
> +        for (i = 0; i < ea->count; i++) {
> +            ea->extents[i].length = cpu_to_be64(ea->extents[i].length);
> +            ea->extents[i].flags = cpu_to_be64(ea->extents[i].flags);
> +        }
> +        iov[0].iov_len = sizeof(ea->meta);
> +        iov[1].iov_len = ea->count * sizeof(ea->extents[0]);
> +    } else {
> +        /* Conversion reduces memory usage, order of iteration matters */
> +        for (i = 0; i < ea->count; i++) {
> +            assert(ea->extents[i].length <= UINT32_MAX);
> +            assert((uint32_t) ea->extents[i].flags == ea->extents[i].flags);
> +            ea->narrow[i].length = cpu_to_be32(ea->extents[i].length);
> +            ea->narrow[i].flags = cpu_to_be32(ea->extents[i].flags);

IMHO, this union magic significantly increases the complexity of the code.

For example, simply swapping these two lines:

            ea->narrow[i].flags = cpu_to_be32(ea->extents[i].flags);
            ea->narrow[i].length = cpu_to_be32(ea->extents[i].length);

that would be a bug, and one that is not simple to find.


I have an idea:

1. rewrite the common logic to work with new extended structures
2. add a separate function which produces the old-style array, allocating it on the fly.

Let me try. Something like this (applied on top of this patch); this way we can avoid both unions and passing half-initialized iovs:

diff --git a/nbd/server.c b/nbd/server.c
index 70cc1808c4..b0075dd1ee 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2073,14 +2073,7 @@ static int coroutine_fn nbd_co_send_sparse_read(NBDClient *client,
  
  typedef struct NBDExtentArray {
      NBDHeaderStyle style;           /* 32- or 64-bit extent descriptions */
-    union {
-        NBDStructuredMeta id;       /* style == NBD_HEADER_STRUCTURED */
-        NBDStructuredMetaExt meta;  /* style == NBD_HEADER_EXTENDED */
-    };
-    union {
-        NBDExtent *narrow;          /* style == NBD_HEADER_STRUCTURED */
-        NBDExtentExt *extents;      /* style == NBD_HEADER_EXTENDED */
-    };
+    NBDExtentExt *extents;
      unsigned int nb_alloc;
      unsigned int count;
      uint64_t total_length;
@@ -2110,38 +2103,35 @@ static void nbd_extent_array_free(NBDExtentArray *ea)
  G_DEFINE_AUTOPTR_CLEANUP_FUNC(NBDExtentArray, nbd_extent_array_free)
  
  /* Further modifications of the array after conversion are abandoned */
-static void nbd_extent_array_convert_to_be(NBDExtentArray *ea,
-                                           uint32_t context_id,
-                                           struct iovec *iov)
+static void nbd_extent_array_convert_to_be(NBDExtentArray *ea)
  {
      int i;
  
      assert(!ea->converted_to_be);
-    assert(iov[0].iov_base == &ea->meta);
-    assert(iov[1].iov_base == ea->extents);
      ea->can_add = false;
      ea->converted_to_be = true;
  
-    stl_be_p(&ea->meta.context_id, context_id);
-    if (ea->style >= NBD_HEADER_EXTENDED) {
-        stl_be_p(&ea->meta.count, ea->count);
-        for (i = 0; i < ea->count; i++) {
-            ea->extents[i].length = cpu_to_be64(ea->extents[i].length);
-            ea->extents[i].flags = cpu_to_be64(ea->extents[i].flags);
-        }
-        iov[0].iov_len = sizeof(ea->meta);
-        iov[1].iov_len = ea->count * sizeof(ea->extents[0]);
-    } else {
-        /* Conversion reduces memory usage, order of iteration matters */
-        for (i = 0; i < ea->count; i++) {
-            assert(ea->extents[i].length <= UINT32_MAX);
-            assert((uint32_t) ea->extents[i].flags == ea->extents[i].flags);
-            ea->narrow[i].length = cpu_to_be32(ea->extents[i].length);
-            ea->narrow[i].flags = cpu_to_be32(ea->extents[i].flags);
-        }
-        iov[0].iov_len = sizeof(ea->id);
-        iov[1].iov_len = ea->count * sizeof(ea->narrow[0]);
+    for (i = 0; i < ea->count; i++) {
+        ea->extents[i].flags = cpu_to_be64(ea->extents[i].flags);
+        ea->extents[i].length = cpu_to_be64(ea->extents[i].length);
+    }
+}
+
+static NBDExtent *nbd_extent_array_make_old_style_extents(NBDExtentArray *ea)
+{
+    int i;
+    NBDExtent *extents = g_new(NBDExtent, ea->count);
+
+    assert(!ea->converted_to_be);
+
+    for (i = 0; i < ea->count; i++) {
+        assert(ea->extents[i].length <= UINT32_MAX);
+        assert((uint32_t) ea->extents[i].flags == ea->extents[i].flags);
+        extents[i].flags = cpu_to_be32(ea->extents[i].flags);
+        extents[i].length = cpu_to_be32(ea->extents[i].length);
      }
+
+    return extents;
  }
  
  /*
@@ -2244,7 +2234,7 @@ static int coroutine_fn blockalloc_to_extents(BlockBackend *blk,
  /*
   * nbd_co_send_extents
   *
- * @ea is converted to BE by the function
+ * @ea may be converted to BE by the function
   * @last controls whether NBD_REPLY_FLAG_DONE is sent.
   */
  static int coroutine_fn
@@ -2252,15 +2242,35 @@ nbd_co_send_extents(NBDClient *client, NBDRequest *request, NBDExtentArray *ea,
                      bool last, uint32_t context_id, Error **errp)
  {
      NBDReply hdr;
-    struct iovec iov[] = {
-        {.iov_base = &hdr},
-        {.iov_base = &ea->meta},
-        {.iov_base = ea->extents}
-    };
-    uint16_t type = client->header_style == NBD_HEADER_EXTENDED ?
-        NBD_REPLY_TYPE_BLOCK_STATUS_EXT : NBD_REPLY_TYPE_BLOCK_STATUS;
+    NBDStructuredMeta meta;
+    NBDStructuredMetaExt meta_ext;
+    g_autofree NBDExtent *extents = NULL;
+    uint16_t type;
+    struct iovec iov[] = { {.iov_base = &hdr}, {0}, {0} };
+
+    if (client->header_style == NBD_HEADER_EXTENDED) {
+        type = NBD_REPLY_TYPE_BLOCK_STATUS_EXT;
+
+        iov[1].iov_base = &meta_ext;
+        iov[1].iov_len = sizeof(meta_ext);
+        stl_be_p(&meta_ext.context_id, context_id);
+        stl_be_p(&meta_ext.count, ea->count);
+
+        nbd_extent_array_convert_to_be(ea);
+        iov[2].iov_base = ea->extents;
+        iov[2].iov_len = ea->count * sizeof(ea->extents[0]);
+    } else {
+        type = NBD_REPLY_TYPE_BLOCK_STATUS;
+
+        iov[1].iov_base = &meta;
+        iov[1].iov_len = sizeof(meta);
+        stl_be_p(&meta.context_id, context_id);
+
+        extents = nbd_extent_array_make_old_style_extents(ea);
+        iov[2].iov_base = extents;
+        iov[2].iov_len = ea->count * sizeof(extents[0]);
+    }
  
-    nbd_extent_array_convert_to_be(ea, context_id, &iov[1]);
  
      trace_nbd_co_send_extents(request->handle, ea->count, context_id,
                                ea->total_length, last);



-- 
Best regards,
Vladimir




* Re: [PATCH v3 09/14] nbd/server: Initial support for extended headers
  2023-05-15 19:53 ` [PATCH v3 09/14] nbd/server: Initial support for extended headers Eric Blake
@ 2023-05-31 14:46   ` Vladimir Sementsov-Ogievskiy
  2023-06-07 11:39     ` Eric Blake
  0 siblings, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2023-05-31 14:46 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: libguestfs, open list:Network Block Dev...

On 15.05.23 22:53, Eric Blake wrote:
> Time to support clients that request extended headers.  Now we can
> finally reach the code added across several previous patches.
> 
> Even though the NBD spec has been altered to allow us to accept
> NBD_CMD_READ larger than the max payload size (provided our response
> is a hole or broken up over more than one data chunk), we are not
> planning to take advantage of that, and continue to cap NBD_CMD_READ
> to 32M regardless of header size.
> 
> For NBD_CMD_WRITE_ZEROES and NBD_CMD_TRIM, the block layer already
> supports 64-bit operations without any effort on our part.  For
> NBD_CMD_BLOCK_STATUS, the client's length is a hint, and the previous
> patch took care of implementing the required
> NBD_REPLY_TYPE_BLOCK_STATUS_EXT.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>   nbd/nbd-internal.h |   5 +-

[..]

> 
>   static inline void set_be_simple_reply(NBDClient *client, struct iovec *iov,
> -                                       uint64_t error, NBDRequest *request)
> +                                       uint32_t error, NBDStructuredError *err,
> +                                       NBDRequest *request)
>   {
> -    NBDSimpleReply *reply = iov->iov_base;
> +    if (client->header_style >= NBD_HEADER_EXTENDED) {
> +        NBDExtendedReplyChunk *chunk = iov->iov_base;
> 
> -    iov->iov_len = sizeof(*reply);
> -    stl_be_p(&reply->magic, NBD_SIMPLE_REPLY_MAGIC);
> -    stl_be_p(&reply->error, error);
> -    stq_be_p(&reply->handle, request->handle);
> +        iov->iov_len = sizeof(*chunk);
> +        stl_be_p(&chunk->magic, NBD_EXTENDED_REPLY_MAGIC);
> +        stw_be_p(&chunk->flags, NBD_REPLY_FLAG_DONE);
> +        stq_be_p(&chunk->handle, request->handle);
> +        stq_be_p(&chunk->offset, request->from);
> +        if (error) {
> +            assert(!iov[1].iov_base);
> +            iov[1].iov_base = err;
> +            iov[1].iov_len = sizeof(*err);
> +            stw_be_p(&chunk->type, NBD_REPLY_TYPE_ERROR);
> +            stq_be_p(&chunk->length, sizeof(*err));
> +            stl_be_p(&err->error, error);
> +            stw_be_p(&err->message_length, 0);
> +        } else {
> +            stw_be_p(&chunk->type, NBD_REPLY_TYPE_NONE);
> +            stq_be_p(&chunk->length, 0);
> +        }
> +    } else {
> +        NBDSimpleReply *reply = iov->iov_base;
> +
> +        iov->iov_len = sizeof(*reply);
> +        stl_be_p(&reply->magic, NBD_SIMPLE_REPLY_MAGIC);
> +        stl_be_p(&reply->error, error);
> +        stq_be_p(&reply->handle, request->handle);
> +    }
>   }
> 
>   static int coroutine_fn nbd_co_send_simple_reply(NBDClient *client,
> @@ -1906,30 +1966,44 @@ static int coroutine_fn nbd_co_send_simple_reply(NBDClient *client,

So, that's not _simple_ anymore; the function should be renamed, as should set_be_simple_reply(). _simple_or_extended_? A bit too long. But continuing to use "simple" clashes with how the specification uses the word "simple".

Probably better to update the callers? The only caller is nbd_send_generic_reply(). So, could we just add an nbd_co_send_single_extended_reply() to call from nbd_send_generic_reply() in the EXTENDED case?

Also, this transformation of set_be_simple_reply() looks like it should really be two separate functions.

>   {
>       NBDReply hdr;
>       int nbd_err = system_errno_to_nbd_errno(error);
> +    NBDStructuredError err;
>       struct iovec iov[] = {
>           {.iov_base = &hdr},
>           {.iov_base = data, .iov_len = len}
>       };
> 
> +    assert(!len || !nbd_err);
>       trace_nbd_co_send_simple_reply(request->handle, nbd_err,
>                                      nbd_err_lookup(nbd_err), len);
> -    set_be_simple_reply(client, &iov[0], nbd_err, request);
> +    set_be_simple_reply(client, &iov[0], nbd_err, &err, request);
> 
> -    return nbd_co_send_iov(client, iov, len ? 2 : 1, errp);
> +    return nbd_co_send_iov(client, iov, iov[1].iov_len ? 2 : 1, errp);
>   }
> 
>   static inline void set_be_chunk(NBDClient *client, struct iovec *iov,
>                                   uint16_t flags, uint16_t type,
>                                   NBDRequest *request, uint32_t length)
>   {
> -    NBDStructuredReplyChunk *chunk = iov->iov_base;
> +    if (client->header_style >= NBD_HEADER_EXTENDED) {
> +        NBDExtendedReplyChunk *chunk = iov->iov_base;
> 
> -    iov->iov_len = sizeof(*chunk);
> -    stl_be_p(&chunk->magic, NBD_STRUCTURED_REPLY_MAGIC);
> -    stw_be_p(&chunk->flags, flags);
> -    stw_be_p(&chunk->type, type);
> -    stq_be_p(&chunk->handle, request->handle);
> -    stl_be_p(&chunk->length, length);
> +        iov->iov_len = sizeof(*chunk);
> +        stl_be_p(&chunk->magic, NBD_EXTENDED_REPLY_MAGIC);
> +        stw_be_p(&chunk->flags, flags);
> +        stw_be_p(&chunk->type, type);
> +        stq_be_p(&chunk->handle, request->handle);
> +        stq_be_p(&chunk->offset, request->from);
> +        stq_be_p(&chunk->length, length);
> +    } else {
> +        NBDStructuredReplyChunk *chunk = iov->iov_base;
> +
> +        iov->iov_len = sizeof(*chunk);
> +        stl_be_p(&chunk->magic, NBD_STRUCTURED_REPLY_MAGIC);
> +        stw_be_p(&chunk->flags, flags);
> +        stw_be_p(&chunk->type, type);
> +        stq_be_p(&chunk->handle, request->handle);
> +        stl_be_p(&chunk->length, length);
> +    }
>   }
> 
>   static int coroutine_fn nbd_co_send_structured_done(NBDClient *client,




-- 
Best regards,
Vladimir



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 10/14] nbd/client: Initial support for extended headers
  2023-05-15 19:53 ` [PATCH v3 10/14] nbd/client: " Eric Blake
@ 2023-05-31 15:26   ` Vladimir Sementsov-Ogievskiy
  2023-06-07 18:22     ` Eric Blake
  0 siblings, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2023-05-31 15:26 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: libguestfs, Kevin Wolf, Hanna Reitz, open list:Network Block Dev...

On 15.05.23 22:53, Eric Blake wrote:
> Update the client code to be able to send an extended request, and
> parse an extended header from the server.  Note that since we reject
> any structured reply with a too-large payload, we can always normalize
> a valid header back into the compact form, so that the caller need not
> deal with two branches of a union.  Still, until a later patch lets
> the client negotiate extended headers, the code added here should not
> be reached.  Note that because of the different magic numbers, it is
> just as easy to trace and then tolerate a non-compliant server sending
> the wrong header reply as it would be to insist that the server is
> compliant.
> 
> The only caller to nbd_receive_reply() always passed NULL for errp;
> since we are changing the signature anyways, I decided to sink the
> decision to ignore errors one layer lower.

This way nbd_receive_simple_reply() and nbd_receive_structured_reply_chunk() are now called only with an explicit NULL last argument, and we start to drop all errors.

Also, we'd actually better add an errp parameter to the caller, nbd_receive_replies(), because its caller (nbd_co_do_receive_one_chunk()) would benefit from it, as it already has an errp.

> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>   include/block/nbd.h |  2 +-
>   block/nbd.c         |  3 +-
>   nbd/client.c        | 86 +++++++++++++++++++++++++++++++--------------
>   nbd/trace-events    |  1 +
>   4 files changed, 63 insertions(+), 29 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index d753fb8006f..865bb4ee2e1 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -371,7 +371,7 @@ int nbd_init(int fd, QIOChannelSocket *sioc, NBDExportInfo *info,
>                Error **errp);
>   int nbd_send_request(QIOChannel *ioc, NBDRequest *request, NBDHeaderStyle hdr);
>   int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc,
> -                                   NBDReply *reply, Error **errp);
> +                                   NBDReply *reply, NBDHeaderStyle hdr);
>   int nbd_client(int fd);
>   int nbd_disconnect(int fd);
>   int nbd_errno_to_system_errno(int err);
> diff --git a/block/nbd.c b/block/nbd.c
> index 6ad6a4f5ecd..d6caea44928 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -458,7 +458,8 @@ static coroutine_fn int nbd_receive_replies(BDRVNBDState *s, uint64_t handle)
> 
>           /* We are under mutex and handle is 0. We have to do the dirty work. */
>           assert(s->reply.handle == 0);
> -        ret = nbd_receive_reply(s->bs, s->ioc, &s->reply, NULL);
> +        ret = nbd_receive_reply(s->bs, s->ioc, &s->reply,
> +                                s->info.header_style);
>           if (ret <= 0) {
>               ret = ret ? ret : -EIO;
>               nbd_channel_error(s, ret);
> diff --git a/nbd/client.c b/nbd/client.c
> index 17d1f57da60..e5db3c8b79d 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -1350,22 +1350,29 @@ int nbd_disconnect(int fd)
> 
>   int nbd_send_request(QIOChannel *ioc, NBDRequest *request, NBDHeaderStyle hdr)
>   {
> -    uint8_t buf[NBD_REQUEST_SIZE];
> +    uint8_t buf[NBD_EXTENDED_REQUEST_SIZE];
> +    size_t len;
> 
> -    assert(hdr < NBD_HEADER_EXTENDED);
> -    assert(request->len <= UINT32_MAX);
>       trace_nbd_send_request(request->from, request->len, request->handle,
>                              request->flags, request->type,
>                              nbd_cmd_lookup(request->type));
> 
> -    stl_be_p(buf, NBD_REQUEST_MAGIC);
>       stw_be_p(buf + 4, request->flags);
>       stw_be_p(buf + 6, request->type);
>       stq_be_p(buf + 8, request->handle);
>       stq_be_p(buf + 16, request->from);
> -    stl_be_p(buf + 24, request->len);
> +    if (hdr >= NBD_HEADER_EXTENDED) {
> +        stl_be_p(buf, NBD_EXTENDED_REQUEST_MAGIC);
> +        stq_be_p(buf + 24, request->len);
> +        len = NBD_EXTENDED_REQUEST_SIZE;
> +    } else {
> +        assert(request->len <= UINT32_MAX);
> +        stl_be_p(buf, NBD_REQUEST_MAGIC);
> +        stl_be_p(buf + 24, request->len);
> +        len = NBD_REQUEST_SIZE;
> +    }
> 
> -    return nbd_write(ioc, buf, sizeof(buf), NULL);
> +    return nbd_write(ioc, buf, len, NULL);
>   }
> 
>   /* nbd_receive_simple_reply
> @@ -1394,28 +1401,34 @@ static int nbd_receive_simple_reply(QIOChannel *ioc, NBDSimpleReply *reply,
> 
>   /* nbd_receive_structured_reply_chunk
>    * Read structured reply chunk except magic field (which should be already
> - * read).
> + * read).  Normalize into the compact form.
>    * Payload is not read.
>    */
> -static int nbd_receive_structured_reply_chunk(QIOChannel *ioc,
> -                                              NBDStructuredReplyChunk *chunk,
> +static int nbd_receive_structured_reply_chunk(QIOChannel *ioc, NBDReply *chunk,
>                                                 Error **errp)

Hmm, _structured_or_extended_? Or we should at least mention this in the comment above the function.

>   {
>       int ret;
> +    size_t len;
> +    uint64_t payload_len;
> 
> -    assert(chunk->magic == NBD_STRUCTURED_REPLY_MAGIC);
> +    if (chunk->magic == NBD_STRUCTURED_REPLY_MAGIC) {
> +        len = sizeof(chunk->structured);
> +    } else {
> +        assert(chunk->magic == NBD_EXTENDED_REPLY_MAGIC);
> +        len = sizeof(chunk->extended);
> +    }
> 
>       ret = nbd_read(ioc, (uint8_t *)chunk + sizeof(chunk->magic),
> -                   sizeof(*chunk) - sizeof(chunk->magic), "structured chunk",

It would be good to print "extended chunk" in the error message for the EXTENDED case.

> +                   len - sizeof(chunk->magic), "structured chunk",
>                      errp);
>       if (ret < 0) {
>           return ret;
>       }
> 
> -    chunk->flags = be16_to_cpu(chunk->flags);
> -    chunk->type = be16_to_cpu(chunk->type);
> -    chunk->handle = be64_to_cpu(chunk->handle);
> -    chunk->length = be32_to_cpu(chunk->length);
> +    /* flags, type, and handle occupy same space between forms */
> +    chunk->structured.flags = be16_to_cpu(chunk->structured.flags);
> +    chunk->structured.type = be16_to_cpu(chunk->structured.type);
> +    chunk->structured.handle = be64_to_cpu(chunk->structured.handle);
> 
>       /*
>        * Because we use BLOCK_STATUS with REQ_ONE, and cap READ requests
> @@ -1423,11 +1436,20 @@ static int nbd_receive_structured_reply_chunk(QIOChannel *ioc,
>        * this.  Even if we stopped using REQ_ONE, sane servers will cap
>        * the number of extents they return for block status.
>        */
> -    if (chunk->length > NBD_MAX_BUFFER_SIZE + sizeof(NBDStructuredReadData)) {
> +    if (chunk->magic == NBD_STRUCTURED_REPLY_MAGIC) {
> +        payload_len = be32_to_cpu(chunk->structured.length);
> +    } else {
> +        /* For now, we are ignoring the extended header offset. */
> +        payload_len = be64_to_cpu(chunk->extended.length);
> +        chunk->magic = NBD_STRUCTURED_REPLY_MAGIC;
> +    }
> +    if (payload_len > NBD_MAX_BUFFER_SIZE + sizeof(NBDStructuredReadData)) {
>           error_setg(errp, "server chunk %" PRIu32 " (%s) payload is too long",
> -                   chunk->type, nbd_rep_lookup(chunk->type));
> +                   chunk->structured.type,
> +                   nbd_rep_lookup(chunk->structured.type));
>           return -EINVAL;
>       }
> +    chunk->structured.length = payload_len;
> 
>       return 0;
>   }
> @@ -1474,30 +1496,35 @@ nbd_read_eof(BlockDriverState *bs, QIOChannel *ioc, void *buffer, size_t size,
> 
>   /* nbd_receive_reply
>    *
> - * Decreases bs->in_flight while waiting for a new reply. This yield is where
> - * we wait indefinitely and the coroutine must be able to be safely reentered
> - * for nbd_client_attach_aio_context().
> + * Wait for a new reply. If this yields, the coroutine must be able to be
> + * safely reentered for nbd_client_attach_aio_context().  @hdr determines
> + * which reply magic we are expecting, although this normalizes the result
> + * so that the caller only has to work with compact headers.
>    *
>    * Returns 1 on success
> - *         0 on eof, when no data was read (errp is not set)
> - *         negative errno on failure (errp is set)
> + *         0 on eof, when no data was read
> + *         negative errno on failure
>    */
>   int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc,
> -                                   NBDReply *reply, Error **errp)
> +                                   NBDReply *reply, NBDHeaderStyle hdr)
>   {
>       int ret;
>       const char *type;
> 
> -    ret = nbd_read_eof(bs, ioc, &reply->magic, sizeof(reply->magic), errp);
> +    ret = nbd_read_eof(bs, ioc, &reply->magic, sizeof(reply->magic), NULL);
>       if (ret <= 0) {
>           return ret;
>       }
> 
>       reply->magic = be32_to_cpu(reply->magic);
> 
> +    /* Diagnose but accept wrong-width header */
>       switch (reply->magic) {
>       case NBD_SIMPLE_REPLY_MAGIC:
> -        ret = nbd_receive_simple_reply(ioc, &reply->simple, errp);
> +        if (hdr >= NBD_HEADER_EXTENDED) {
> +            trace_nbd_receive_wrong_header(reply->magic);

Maybe also trace the expected header style.
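Concretely, that could look something like this in nbd/trace-events (the extra parameter and the wording are only a sketch):

```
nbd_receive_wrong_header(uint32_t magic, const char *expect) "Server sent unexpected magic 0x%" PRIx32 " when expecting %s headers"
```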

> +        }
> +        ret = nbd_receive_simple_reply(ioc, &reply->simple, NULL);
>           if (ret < 0) {
>               break;
>           }
> @@ -1506,7 +1533,12 @@ int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc,
>                                          reply->handle);
>           break;
>       case NBD_STRUCTURED_REPLY_MAGIC:
> -        ret = nbd_receive_structured_reply_chunk(ioc, &reply->structured, errp);
> +    case NBD_EXTENDED_REPLY_MAGIC:
> +        if ((hdr >= NBD_HEADER_EXTENDED) !=
> +            (reply->magic == NBD_EXTENDED_REPLY_MAGIC)) {
> +            trace_nbd_receive_wrong_header(reply->magic);
> +        }
> +        ret = nbd_receive_structured_reply_chunk(ioc, reply, NULL);
>           if (ret < 0) {
>               break;
>           }
> @@ -1517,7 +1549,7 @@ int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc,
>                                                    reply->structured.length);
>           break;
>       default:
> -        error_setg(errp, "invalid magic (got 0x%" PRIx32 ")", reply->magic);
> +        trace_nbd_receive_wrong_header(reply->magic);
>           return -EINVAL;
>       }
>       if (ret < 0) {
> diff --git a/nbd/trace-events b/nbd/trace-events
> index adf5666e207..c20df33a431 100644
> --- a/nbd/trace-events
> +++ b/nbd/trace-events
> @@ -34,6 +34,7 @@ nbd_client_clear_socket(void) "Clearing NBD socket"
>   nbd_send_request(uint64_t from, uint64_t len, uint64_t handle, uint16_t flags, uint16_t type, const char *name) "Sending request to server: { .from = %" PRIu64", .len = %" PRIu64 ", .handle = %" PRIu64 ", .flags = 0x%" PRIx16 ", .type = %" PRIu16 " (%s) }"
>   nbd_receive_simple_reply(int32_t error, const char *errname, uint64_t handle) "Got simple reply: { .error = %" PRId32 " (%s), handle = %" PRIu64" }"
>   nbd_receive_structured_reply_chunk(uint16_t flags, uint16_t type, const char *name, uint64_t handle, uint32_t length) "Got structured reply chunk: { flags = 0x%" PRIx16 ", type = %d (%s), handle = %" PRIu64 ", length = %" PRIu32 " }"
> +nbd_receive_wrong_header(uint32_t magic) "Server sent unexpected magic 0x%" PRIx32
> 
>   # common.c
>   nbd_unknown_error(int err) "Squashing unexpected error %d to EINVAL"

-- 
Best regards,
Vladimir




* Re: [PATCH v3 11/14] nbd/client: Accept 64-bit block status chunks
  2023-05-15 19:53 ` [PATCH v3 11/14] nbd/client: Accept 64-bit block status chunks Eric Blake
@ 2023-05-31 17:00   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2023-05-31 17:00 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: libguestfs, Kevin Wolf, Hanna Reitz, open list:Network Block Dev...

On 15.05.23 22:53, Eric Blake wrote:
> Because we use NBD_CMD_FLAG_REQ_ONE with NBD_CMD_BLOCK_STATUS, a
> client in narrow mode should not be able to provoke a server into
> sending a block status result larger than the client's 32-bit request.
> But in extended mode, a 64-bit status request must be able to handle a
> 64-bit status result, once a future patch enables the client
> requesting extended mode.  We can also tolerate a non-compliant server
> sending the new chunk even when it should not.
> 
> In normal execution, we are only requesting "base:allocation" which
> never exceeds 32 bits. But during testing with x-dirty-bitmap, we can
> force qemu to connect to some other context that might have 64-bit
> status bit; however, we ignore those upper bits (other than mapping
> qemu:allocation-depth into something that 'qemu-img map --output=json'
> can expose), and since it is only testing, we really don't bother with
> checking whether more than the two least-significant bits are set.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>   block/nbd.c        | 39 ++++++++++++++++++++++++++++-----------
>   block/trace-events |  1 +
>   2 files changed, 29 insertions(+), 11 deletions(-)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index d6caea44928..150dfe7170c 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -610,13 +610,16 @@ static int nbd_parse_offset_hole_payload(BDRVNBDState *s,
>    */
>   static int nbd_parse_blockstatus_payload(BDRVNBDState *s,
>                                            NBDStructuredReplyChunk *chunk,
> -                                         uint8_t *payload, uint64_t orig_length,
> -                                         NBDExtent *extent, Error **errp)
> +                                         uint8_t *payload, bool wide,
> +                                         uint64_t orig_length,
> +                                         NBDExtentExt *extent, Error **errp)
>   {
>       uint32_t context_id;
> +    uint32_t count = 0;
> +    size_t len = wide ? sizeof(*extent) : sizeof(NBDExtent);
> 
>       /* The server succeeded, so it must have sent [at least] one extent */
> -    if (chunk->length < sizeof(context_id) + sizeof(*extent)) {
> +    if (chunk->length < sizeof(context_id) + wide * sizeof(count) + len) {
>           error_setg(errp, "Protocol error: invalid payload for "
>                            "NBD_REPLY_TYPE_BLOCK_STATUS");
>           return -EINVAL;
> @@ -631,8 +634,14 @@ static int nbd_parse_blockstatus_payload(BDRVNBDState *s,
>           return -EINVAL;
>       }
> 
> -    extent->length = payload_advance32(&payload);
> -    extent->flags = payload_advance32(&payload);
> +    if (wide) {
> +        count = payload_advance32(&payload);
> +        extent->length = payload_advance64(&payload);
> +        extent->flags = payload_advance64(&payload);
> +    } else {
> +        extent->length = payload_advance32(&payload);
> +        extent->flags = payload_advance32(&payload);
> +    }
> 
>       if (extent->length == 0) {
>           error_setg(errp, "Protocol error: server sent status chunk with "
> @@ -672,7 +681,8 @@ static int nbd_parse_blockstatus_payload(BDRVNBDState *s,
>        * connection; just ignore trailing extents, and clamp things to
>        * the length of our request.
>        */
> -    if (chunk->length > sizeof(context_id) + sizeof(*extent)) {
> +    if (count != wide ||

This is hard to read for me. Could it simply be "count > 1 ||"?

> +        chunk->length > sizeof(context_id) + wide * sizeof(count) + len) {
>           trace_nbd_parse_blockstatus_compliance("more than one extent");
>       }
>       if (extent->length > orig_length) {
> @@ -1117,7 +1127,7 @@ static int coroutine_fn nbd_co_receive_cmdread_reply(BDRVNBDState *s, uint64_t h
> 
>   static int coroutine_fn nbd_co_receive_blockstatus_reply(BDRVNBDState *s,
>                                                            uint64_t handle, uint64_t length,
> -                                                         NBDExtent *extent,
> +                                                         NBDExtentExt *extent,
>                                                            int *request_ret, Error **errp)
>   {
>       NBDReplyChunkIter iter;
> @@ -1125,6 +1135,7 @@ static int coroutine_fn nbd_co_receive_blockstatus_reply(BDRVNBDState *s,
>       void *payload = NULL;
>       Error *local_err = NULL;
>       bool received = false;
> +    bool wide = false;
> 
>       assert(!extent->length);
>       NBD_FOREACH_REPLY_CHUNK(s, iter, handle, false, NULL, &reply, &payload) {
> @@ -1134,7 +1145,13 @@ static int coroutine_fn nbd_co_receive_blockstatus_reply(BDRVNBDState *s,
>           assert(nbd_reply_is_structured(&reply));
> 
>           switch (chunk->type) {
> +        case NBD_REPLY_TYPE_BLOCK_STATUS_EXT:
> +            wide = true;
> +            /* fallthrough */
>           case NBD_REPLY_TYPE_BLOCK_STATUS:
> +            if (s->info.extended_headers != wide) {
> +                trace_nbd_extended_headers_compliance("block_status");

You set wide to true once, on the first NBD_REPLY_TYPE_BLOCK_STATUS_EXT, and then any following NBD_REPLY_TYPE_BLOCK_STATUS chunks are parsed with wide still true.

Should it be:

--- a/block/nbd.c
+++ b/block/nbd.c
@@ -1135,7 +1135,7 @@ static int coroutine_fn nbd_co_receive_blockstatus_reply(BDRVNBDState *s,
      void *payload = NULL;
      Error *local_err = NULL;
      bool received = false;
-    bool wide = false;
+    bool wide;
  
      assert(!extent->length);
      NBD_FOREACH_REPLY_CHUNK(s, iter, handle, false, NULL, &reply, &payload) {
@@ -1146,9 +1146,8 @@ static int coroutine_fn nbd_co_receive_blockstatus_reply(BDRVNBDState *s,
  
          switch (chunk->type) {
          case NBD_REPLY_TYPE_BLOCK_STATUS_EXT:
-            wide = true;
-            /* fallthrough */
          case NBD_REPLY_TYPE_BLOCK_STATUS:
+            wide = chunk->type == NBD_REPLY_TYPE_BLOCK_STATUS_EXT;
              if (s->info.extended_headers != wide) {
                  trace_nbd_extended_headers_compliance("block_status");
              }


> +            }
>               if (received) {
>                   nbd_channel_error(s, -EINVAL);
>                   error_setg(&local_err, "Several BLOCK_STATUS chunks in reply");
> @@ -1142,9 +1159,9 @@ static int coroutine_fn nbd_co_receive_blockstatus_reply(BDRVNBDState *s,
>               }
>               received = true;
> 
> -            ret = nbd_parse_blockstatus_payload(s, &reply.structured,
> -                                                payload, length, extent,
> -                                                &local_err);
> +            ret = nbd_parse_blockstatus_payload(
> +                s, &reply.structured, payload, wide,
> +                length, extent, &local_err);
>               if (ret < 0) {
>                   nbd_channel_error(s, ret);
>                   nbd_iter_channel_error(&iter, ret, &local_err);
> @@ -1374,7 +1391,7 @@ static int coroutine_fn GRAPH_RDLOCK nbd_client_co_block_status(
>           int64_t *pnum, int64_t *map, BlockDriverState **file)
>   {
>       int ret, request_ret;
> -    NBDExtent extent = { 0 };
> +    NBDExtentExt extent = { 0 };
>       BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
>       Error *local_err = NULL;
> 
> diff --git a/block/trace-events b/block/trace-events
> index 48dbf10c66f..afb32fcce5b 100644
> --- a/block/trace-events
> +++ b/block/trace-events
> @@ -168,6 +168,7 @@ iscsi_xcopy(void *src_lun, uint64_t src_off, void *dst_lun, uint64_t dst_off, ui
>   # nbd.c
>   nbd_parse_blockstatus_compliance(const char *err) "ignoring extra data from non-compliant server: %s"
>   nbd_structured_read_compliance(const char *type) "server sent non-compliant unaligned read %s chunk"
> +nbd_extended_headers_compliance(const char *type) "server sent non-compliant %s chunk not matching choice of extended headers"
>   nbd_read_reply_entry_fail(int ret, const char *err) "ret = %d, err: %s"
>   nbd_co_request_fail(uint64_t from, uint32_t len, uint64_t handle, uint16_t flags, uint16_t type, const char *name, int ret, const char *err) "Request failed { .from = %" PRIu64", .len = %" PRIu32 ", .handle = %" PRIu64 ", .flags = 0x%" PRIx16 ", .type = %" PRIu16 " (%s) } ret = %d, err: %s"
>   nbd_client_handshake(const char *export_name) "export '%s'"

-- 
Best regards,
Vladimir




* Re: [PATCH v3 12/14] nbd/client: Request extended headers during negotiation
       [not found]         ` <hbjtjovry4e5kb6oyii4g2hncetfo2uic67r5ipufcikvgyb5x@idenexfxits4>
@ 2023-06-01  8:43           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2023-06-01  8:43 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, libguestfs, Kevin Wolf, Hanna Reitz,
	open list:Network Block Dev...

On 31.05.23 23:26, Eric Blake wrote:
> On Wed, May 31, 2023 at 09:33:20PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> On 31.05.23 20:54, Eric Blake wrote:
>>> On Wed, May 31, 2023 at 08:39:53PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>> On 15.05.23 22:53, Eric Blake wrote:
>>>>> All the pieces are in place for a client to finally request extended
>>>>> headers.  Note that we must not request extended headers when qemu-nbd
>>>>
>>>> why must not? It should gracefully report ENOTSUP? Or not?
>>>
>>> The kernel code does not yet know how to send extended requests; once
>>> extended mode is negotiated, sending a simple request requires the
>>
>> but how it could be negotiated if kernel doesn't support it?
> 
> That's the problem.  The kernel doesn't do the negotiation, userspace

Oh yes, I totally forgot about these mechanics.

> does.  There is an ioctl for the userspace to tell the kernel what
> flags were advertised as part of the negotiation, but that does not
> include a flag for extended operation.  The kernel ONLY takes care of
> NBD_CMD_ operations, it does not do NBD_OPT_ operations.  So when
> qemu-nbd is preparing to connect to /dev/nbdN, it first has to
> negotiate in userspace, avoiding any attempt to use extended headers,
> then call the right ioctls for the kernel to take over command phase
> using the older compact headers.

> 
>>
>> I mean if we request extended headers during negotiation with kernel, the kernel will just say "unsupported option", isn't it?
> 
> If we request extended headers in userspace before calling the ioctl
> to tell the kernel to start transmission, then the kernel's first
> command will use the compact style, which the server is not expecting,
> and while we can hope the server will hang up on the kernel, I didn't
> test what actually happens.
> 
> 
>>
>> Or, in other words, I understand that kernel doesn't support it, I don't understand why you note it here. Is kernel different from other NBD server implementations which doesn't support extended requests at the moment?
> 
> The kernel is an NBD client, not a server.  But if we are about to
> connect an NBD server over to the kernel for /dev/nbdN, we better make
> sure the server is not using any features the kernel doesn't support.
> 

thanks!

-- 
Best regards,
Vladimir




* Re: [PATCH v3 13/14] nbd/server: Prepare for per-request filtering of BLOCK_STATUS
  2023-05-15 19:53 ` [PATCH v3 13/14] nbd/server: Prepare for per-request filtering of BLOCK_STATUS Eric Blake
@ 2023-06-01  9:57   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2023-06-01  9:57 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: libguestfs, Kevin Wolf, Hanna Reitz, open list:Network Block Dev...

On 15.05.23 22:53, Eric Blake wrote:
> The next commit will add support for the new addition of
> NBD_CMD_FLAG_PAYLOAD during NBD_CMD_BLOCK_STATUS, where the client can
> request that the server only return a subset of negotiated contexts,
> rather than all contexts.  To make that task easier, this patch
> populates the list of contexts to return on a per-command basis (for
> now, identical to the full set of negotiated contexts).
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>   include/block/nbd.h |  15 ++++++
>   nbd/server.c        | 108 +++++++++++++++++++++++---------------------
>   2 files changed, 72 insertions(+), 51 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 865bb4ee2e1..6696d61bd59 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -60,6 +60,20 @@ typedef enum NBDHeaderStyle {
>       NBD_HEADER_EXTENDED,    /* NBD_OPT_EXTENDED_HEADERS negotiated */
>   } NBDHeaderStyle;
> 
> +/*
> + * NBDMetaContexts represents a list of meta contexts in use, as
> + * selected by NBD_OPT_SET_META_CONTEXT. Also used for
> + * NBD_OPT_LIST_META_CONTEXT, and payload filtering in
> + * NBD_CMD_BLOCK_STATUS.
> + */
> +typedef struct NBDMetaContexts {
> +    size_t count; /* number of negotiated contexts */
> +    bool base_allocation; /* export base:allocation context (block status) */
> +    bool allocation_depth; /* export qemu:allocation-depth */
> +    size_t nr_bitmaps; /* Length of bitmaps array */
> +    bool *bitmaps; /* export qemu:dirty-bitmap:<export bitmap name> */
> +} NBDMetaContexts;
> +
>   /*
>    * Note: NBDRequest is _NOT_ the same as the network representation of an NBD
>    * request!
> @@ -70,6 +84,7 @@ typedef struct NBDRequest {
>       uint64_t len;   /* Effect length; 32 bit limit without extended headers */
>       uint16_t flags; /* NBD_CMD_FLAG_* */
>       uint16_t type;  /* NBD_CMD_* */
> +    NBDMetaContexts contexts; /* Used by NBD_CMD_BLOCK_STATUS */
>   } NBDRequest;
> 
>   typedef struct NBDSimpleReply {
> diff --git a/nbd/server.c b/nbd/server.c
> index 6475a76c1f0..db550c82cd2 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -105,20 +105,6 @@ struct NBDExport {
> 
>   static QTAILQ_HEAD(, NBDExport) exports = QTAILQ_HEAD_INITIALIZER(exports);
> 
> -/* NBDExportMetaContexts represents a list of contexts to be exported,
> - * as selected by NBD_OPT_SET_META_CONTEXT. Also used for
> - * NBD_OPT_LIST_META_CONTEXT. */
> -typedef struct NBDExportMetaContexts {
> -    NBDExport *exp;
> -    size_t count; /* number of negotiated contexts */
> -    bool base_allocation; /* export base:allocation context (block status) */
> -    bool allocation_depth; /* export qemu:allocation-depth */
> -    bool *bitmaps; /*
> -                    * export qemu:dirty-bitmap:<export bitmap name>,
> -                    * sized by exp->nr_export_bitmaps
> -                    */
> -} NBDExportMetaContexts;
> -
>   struct NBDClient {
>       int refcount;
>       void (*close_fn)(NBDClient *client, bool negotiated);
> @@ -144,7 +130,8 @@ struct NBDClient {
>       uint32_t check_align; /* If non-zero, check for aligned client requests */
> 
>       NBDHeaderStyle header_style;
> -    NBDExportMetaContexts export_meta;
> +    NBDExport *context_exp; /* export of last OPT_SET_META_CONTEXT */
> +    NBDMetaContexts contexts; /* Negotiated meta contexts */
> 
>       uint32_t opt; /* Current option being negotiated */
>       uint32_t optlen; /* remaining length of data in ioc for the option being
> @@ -457,8 +444,8 @@ static int nbd_negotiate_handle_list(NBDClient *client, Error **errp)
> 
>   static void nbd_check_meta_export(NBDClient *client)
>   {
> -    if (client->exp != client->export_meta.exp) {
> -        client->export_meta.count = 0;
> +    if (client->exp != client->context_exp) {
> +        client->contexts.count = 0;
>       }
>   }
> 
> @@ -852,7 +839,7 @@ static bool nbd_strshift(const char **str, const char *prefix)
>    * Handle queries to 'base' namespace. For now, only the base:allocation
>    * context is available.  Return true if @query has been handled.
>    */
> -static bool nbd_meta_base_query(NBDClient *client, NBDExportMetaContexts *meta,
> +static bool nbd_meta_base_query(NBDClient *client, NBDMetaContexts *meta,
>                                   const char *query)
>   {
>       if (!nbd_strshift(&query, "base:")) {
> @@ -872,8 +859,8 @@ static bool nbd_meta_base_query(NBDClient *client, NBDExportMetaContexts *meta,
>    * and qemu:allocation-depth contexts are available.  Return true if @query
>    * has been handled.
>    */
> -static bool nbd_meta_qemu_query(NBDClient *client, NBDExportMetaContexts *meta,
> -                                const char *query)
> +static bool nbd_meta_qemu_query(NBDClient *client, NBDExport *exp,
> +                                NBDMetaContexts *meta, const char *query)
>   {
>       size_t i;
> 
> @@ -884,9 +871,9 @@ static bool nbd_meta_qemu_query(NBDClient *client, NBDExportMetaContexts *meta,
> 
>       if (!*query) {
>           if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
> -            meta->allocation_depth = meta->exp->allocation_depth;
> -            if (meta->exp->nr_export_bitmaps) {
> -                memset(meta->bitmaps, 1, meta->exp->nr_export_bitmaps);
> +            meta->allocation_depth = exp->allocation_depth;
> +            if (meta->nr_bitmaps) {
> +                memset(meta->bitmaps, 1, meta->nr_bitmaps);
>               }
>           }
>           trace_nbd_negotiate_meta_query_parse("empty");
> @@ -895,7 +882,7 @@ static bool nbd_meta_qemu_query(NBDClient *client, NBDExportMetaContexts *meta,
> 
>       if (strcmp(query, "allocation-depth") == 0) {
>           trace_nbd_negotiate_meta_query_parse("allocation-depth");
> -        meta->allocation_depth = meta->exp->allocation_depth;
> +        meta->allocation_depth = exp->allocation_depth;
>           return true;
>       }
> 
> @@ -903,17 +890,17 @@ static bool nbd_meta_qemu_query(NBDClient *client, NBDExportMetaContexts *meta,
>           trace_nbd_negotiate_meta_query_parse("dirty-bitmap:");
>           if (!*query) {
>               if (client->opt == NBD_OPT_LIST_META_CONTEXT &&
> -                meta->exp->nr_export_bitmaps) {
> -                memset(meta->bitmaps, 1, meta->exp->nr_export_bitmaps);
> +                exp->nr_export_bitmaps) {
> +                memset(meta->bitmaps, 1, exp->nr_export_bitmaps);

Sometimes you change s/meta->exp->nr_export_bitmaps/meta->nr_bitmaps/ and sometimes s/meta->exp->nr_export_bitmaps/exp->nr_export_bitmaps/; the inconsistency is confusing.

>               }
>               trace_nbd_negotiate_meta_query_parse("empty");
>               return true;
>           }
> 
> -        for (i = 0; i < meta->exp->nr_export_bitmaps; i++) {
> +        for (i = 0; i < meta->nr_bitmaps; i++) {
>               const char *bm_name;
> 
> -            bm_name = bdrv_dirty_bitmap_name(meta->exp->export_bitmaps[i]);
> +            bm_name = bdrv_dirty_bitmap_name(exp->export_bitmaps[i]);

I don't like this change: we iterate through exp->export_bitmaps, so it seems more correct to use exp->nr_export_bitmaps as the limit.

I understand why you want to duplicate nr_export_bitmaps in NBDMetaContexts, but the result doesn't seem better than just using exp->nr_export_bitmaps.

Probably, if we want to duplicate nr_export_bitmaps, we should duplicate export_bitmaps as well. But that actually leads back to keeping meta->exp as it is.

Hm, maybe just do

size_t nr_bitmaps = meta->nr_bitmaps;
assert(nr_bitmaps == exp->nr_export_bitmaps);

at the top of the function, and use nr_bitmaps everywhere?

But this again makes me doubt that meta.nr_bitmaps is a good idea: we have to pass exp alongside it and check meta.nr_bitmaps == exp->nr_export_bitmaps every time.

So, I understand that

     bool *bitmaps; /*
                     * export qemu:dirty-bitmap:<export bitmap name>,
                     * sized by exp->nr_export_bitmaps
                     */

is not a good design: we have an array sized by a field of another object. But we do have a link to that object (meta->exp).

With your way we fall into bad design too: we have a duplicated field and have to make sure the two never diverge. And we iterate over things in exp using a field of meta.

I looked through the code, and it seems that everywhere we use nr_bitmaps we also have access to nr_export_bitmaps. So maybe drop nr_bitmaps?

And why not keep meta.exp? With it, meta is a self-sufficient object that fully describes the selected contexts.

For per-request filtering, meta.exp just has to always point to the current export. And anyway, it would be good to split the refactoring of the meta structure (if we need it) from the new per-command filtering logic.

>               if (strcmp(bm_name, query) == 0) {
>                   meta->bitmaps[i] = true;
>                   trace_nbd_negotiate_meta_query_parse(query);
> @@ -937,8 +924,8 @@ static bool nbd_meta_qemu_query(NBDClient *client, NBDExportMetaContexts *meta,
>    *
>    * Return -errno on I/O error, 0 if option was completely handled by
>    * sending a reply about inconsistent lengths, or 1 on success. */
> -static int nbd_negotiate_meta_query(NBDClient *client,
> -                                    NBDExportMetaContexts *meta, Error **errp)
> +static int nbd_negotiate_meta_query(NBDClient *client, NBDExport *exp,
> +                                    NBDMetaContexts *meta, Error **errp)
>   {
>       int ret;
>       g_autofree char *query = NULL;
> @@ -965,7 +952,7 @@ static int nbd_negotiate_meta_query(NBDClient *client,
>       if (nbd_meta_base_query(client, meta, query)) {
>           return 1;
>       }
> -    if (nbd_meta_qemu_query(client, meta, query)) {
> +    if (nbd_meta_qemu_query(client, exp, meta, query)) {
>           return 1;
>       }
> 
> @@ -977,14 +964,15 @@ static int nbd_negotiate_meta_query(NBDClient *client,
>    * Handle NBD_OPT_LIST_META_CONTEXT and NBD_OPT_SET_META_CONTEXT
>    *
>    * Return -errno on I/O error, or 0 if option was completely handled. */
> -static int nbd_negotiate_meta_queries(NBDClient *client,
> -                                      NBDExportMetaContexts *meta, Error **errp)
> +static int nbd_negotiate_meta_queries(NBDClient *client, Error **errp)
>   {
>       int ret;
>       g_autofree char *export_name = NULL;
>       /* Mark unused to work around https://bugs.llvm.org/show_bug.cgi?id=3888 */
>       g_autofree G_GNUC_UNUSED bool *bitmaps = NULL;
> -    NBDExportMetaContexts local_meta = {0};
> +    NBDMetaContexts local_meta = {0};
> +    NBDMetaContexts *meta;
> +    NBDExport *exp;
>       uint32_t nb_queries;
>       size_t i;
>       size_t count = 0;
> @@ -1000,6 +988,9 @@ static int nbd_negotiate_meta_queries(NBDClient *client,
>       if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
>           /* Only change the caller's meta on SET. */
>           meta = &local_meta;
> +    } else {
> +        meta = &client->contexts;
> +        client->context_exp = NULL;
>       }
> 
>       g_free(meta->bitmaps);
> @@ -1010,14 +1001,15 @@ static int nbd_negotiate_meta_queries(NBDClient *client,
>           return ret;
>       }
> 
> -    meta->exp = nbd_export_find(export_name);
> -    if (meta->exp == NULL) {
> +    exp = nbd_export_find(export_name);
> +    if (exp == NULL) {
>           g_autofree char *sane_name = nbd_sanitize_name(export_name);
> 
>           return nbd_opt_drop(client, NBD_REP_ERR_UNKNOWN, errp,
>                               "export '%s' not present", sane_name);
>       }
> -    meta->bitmaps = g_new0(bool, meta->exp->nr_export_bitmaps);
> +    meta->nr_bitmaps = exp->nr_export_bitmaps;
> +    meta->bitmaps = g_new0(bool, exp->nr_export_bitmaps);
>       if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
>           bitmaps = meta->bitmaps;
>       }
> @@ -1033,13 +1025,13 @@ static int nbd_negotiate_meta_queries(NBDClient *client,
>       if (client->opt == NBD_OPT_LIST_META_CONTEXT && !nb_queries) {
>           /* enable all known contexts */
>           meta->base_allocation = true;
> -        meta->allocation_depth = meta->exp->allocation_depth;
> -        if (meta->exp->nr_export_bitmaps) {
> -            memset(meta->bitmaps, 1, meta->exp->nr_export_bitmaps);
> +        meta->allocation_depth = exp->allocation_depth;
> +        if (exp->nr_export_bitmaps) {
> +            memset(meta->bitmaps, 1, meta->nr_bitmaps);
>           }
>       } else {
>           for (i = 0; i < nb_queries; ++i) {
> -            ret = nbd_negotiate_meta_query(client, meta, errp);
> +            ret = nbd_negotiate_meta_query(client, exp, meta, errp);
>               if (ret <= 0) {
>                   return ret;
>               }
> @@ -1066,7 +1058,7 @@ static int nbd_negotiate_meta_queries(NBDClient *client,
>           count++;
>       }
> 
> -    for (i = 0; i < meta->exp->nr_export_bitmaps; i++) {
> +    for (i = 0; i < meta->nr_bitmaps; i++) {
>           const char *bm_name;
>           g_autofree char *context = NULL;
> 
> @@ -1074,7 +1066,7 @@ static int nbd_negotiate_meta_queries(NBDClient *client,
>               continue;
>           }
> 
> -        bm_name = bdrv_dirty_bitmap_name(meta->exp->export_bitmaps[i]);
> +        bm_name = bdrv_dirty_bitmap_name(exp->export_bitmaps[i]);
>           context = g_strdup_printf("qemu:dirty-bitmap:%s", bm_name);
> 
>           ret = nbd_negotiate_send_meta_context(client, context,
> @@ -1089,6 +1081,9 @@ static int nbd_negotiate_meta_queries(NBDClient *client,
>       ret = nbd_negotiate_send_rep(client, NBD_REP_ACK, errp);
>       if (ret == 0) {
>           meta->count = count;
> +        if (client->opt == NBD_OPT_SET_META_CONTEXT) {
> +            client->context_exp = exp;
> +        }
>       }
> 
>       return ret;
> @@ -1282,8 +1277,7 @@ static int nbd_negotiate_options(NBDClient *client, Error **errp)
> 
>               case NBD_OPT_LIST_META_CONTEXT:
>               case NBD_OPT_SET_META_CONTEXT:
> -                ret = nbd_negotiate_meta_queries(client, &client->export_meta,
> -                                                 errp);
> +                ret = nbd_negotiate_meta_queries(client, errp);
>                   break;
> 
>               case NBD_OPT_EXTENDED_HEADERS:
> @@ -1514,7 +1508,7 @@ void nbd_client_put(NBDClient *client)
>               QTAILQ_REMOVE(&client->exp->clients, client, next);
>               blk_exp_unref(&client->exp->common);
>           }
> -        g_free(client->export_meta.bitmaps);
> +        g_free(client->contexts.bitmaps);
>           g_free(client);
>       }
>   }
> @@ -2489,6 +2483,8 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req, NBDRequest *
>                   return -ENOMEM;
>               }
>           }
> +    } else if (request->type == NBD_CMD_BLOCK_STATUS) {
> +        request->contexts = client->contexts;
>       }
> 
>       if (payload_len) {
> @@ -2715,11 +2711,11 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
>           }
>           assert(client->header_style >= NBD_HEADER_EXTENDED ||
>                  request->len <= UINT32_MAX);
> -        if (client->export_meta.count) {
> +        if (request->contexts.count) {
>               bool dont_fragment = request->flags & NBD_CMD_FLAG_REQ_ONE;
> -            int contexts_remaining = client->export_meta.count;
> +            int contexts_remaining = request->contexts.count;
> 
> -            if (client->export_meta.base_allocation) {
> +            if (request->contexts.base_allocation) {
>                   ret = nbd_co_send_block_status(client, request,
>                                                  exp->common.blk,
>                                                  request->from,
> @@ -2732,7 +2728,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
>                   }
>               }
> 
> -            if (client->export_meta.allocation_depth) {
> +            if (request->contexts.allocation_depth) {
>                   ret = nbd_co_send_block_status(client, request,
>                                                  exp->common.blk,
>                                                  request->from, request->len,
> @@ -2745,8 +2741,10 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
>                   }
>               }
> 
> +            assert(request->contexts.nr_bitmaps ==
> +                   client->exp->nr_export_bitmaps);
>               for (i = 0; i < client->exp->nr_export_bitmaps; i++) {
> -                if (!client->export_meta.bitmaps[i]) {
> +                if (!request->contexts.bitmaps[i]) {
>                       continue;
>                   }
>                   ret = nbd_co_send_bitmap(client, request,
> @@ -2762,6 +2760,10 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
>               assert(!contexts_remaining);
> 
>               return 0;
> +        } else if (client->contexts.count) {
> +            return nbd_send_generic_reply(client, request, -EINVAL,
> +                                          "CMD_BLOCK_STATUS payload not valid",
> +                                          errp);
>           } else {
>               return nbd_send_generic_reply(client, request, -EINVAL,
>                                             "CMD_BLOCK_STATUS not negotiated",
> @@ -2840,6 +2842,10 @@ static coroutine_fn void nbd_trip(void *opaque)
>       } else {
>           ret = nbd_handle_request(client, &request, req->data, &local_err);
>       }
> +    if (request.type == NBD_CMD_BLOCK_STATUS &&
> +        request.contexts.bitmaps != client->contexts.bitmaps) {
> +        g_free(request.contexts.bitmaps);
> +    }
>       if (ret < 0) {
>           error_prepend(&local_err, "Failed to send reply: ");
>           goto disconnect;

-- 
Best regards,
Vladimir




* Re: [PATCH v3 14/14] nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS
  2023-05-15 19:53 ` [PATCH v3 14/14] nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS Eric Blake
@ 2023-06-02  9:13   ` Vladimir Sementsov-Ogievskiy
  2023-06-02 13:14     ` [Libguestfs] " Eric Blake
  0 siblings, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2023-06-02  9:13 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: libguestfs, Kevin Wolf, Hanna Reitz, open list:Network Block Dev...

On 15.05.23 22:53, Eric Blake wrote:
> Allow a client to request a subset of negotiated meta contexts.  For
> example, a client may ask to use a single connection to learn about
> both block status and dirty bitmaps, but where the dirty bitmap
> queries only need to be performed on a subset of the disk; forcing the
> server to compute that information on block status queries in the rest
> of the disk is wasted effort (both at the server, and on the amount of
> traffic sent over the wire to be parsed and ignored by the client).
> 
> Qemu as an NBD client never requests to use more than one meta
> context, so it has no need to use block status payloads.  Testing this
> instead requires support from libnbd, which CAN access multiple meta
> contexts in parallel from a single NBD connection; an interop test
> submitted to the libnbd project at the same time as this patch
> demonstrates the feature working, as well as testing some corner cases
> (for example, when the payload length is longer than the export
> length), although other corner cases (like passing the same id
> duplicated) requires a protocol fuzzer because libnbd is not wired up
> to break the protocol that badly.
> 
> This also includes tweaks to 'qemu-nbd --list' to show when a server
> is advertising the capability, and to the testsuite to reflect the
> addition to that output.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>   docs/interop/nbd.txt                          |   2 +-
>   include/block/nbd.h                           |  32 ++++--
>   nbd/server.c                                  | 106 +++++++++++++++++-
>   qemu-nbd.c                                    |   1 +
>   nbd/trace-events                              |   1 +
>   tests/qemu-iotests/223.out                    |  12 +-
>   tests/qemu-iotests/307.out                    |  10 +-
>   .../tests/nbd-qemu-allocation.out             |   2 +-
>   8 files changed, 136 insertions(+), 30 deletions(-)
> 
> diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
> index abaf4c28a96..83d85ce8d13 100644
> --- a/docs/interop/nbd.txt
> +++ b/docs/interop/nbd.txt
> @@ -69,4 +69,4 @@ NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
>   NBD_CMD_FLAG_FAST_ZERO
>   * 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth"
>   * 7.1: NBD_FLAG_CAN_MULTI_CONN for shareable writable exports
> -* 8.1: NBD_OPT_EXTENDED_HEADERS
> +* 8.1: NBD_OPT_EXTENDED_HEADERS, NBD_FLAG_BLOCK_STATUS_PAYLOAD
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 6696d61bd59..3d8d7150121 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -175,6 +175,12 @@ typedef struct NBDExtentExt {
>       uint64_t flags; /* NBD_STATE_* */
>   } QEMU_PACKED NBDExtentExt;
> 
> +/* Client payload for limiting NBD_CMD_BLOCK_STATUS reply */
> +typedef struct NBDBlockStatusPayload {
> +    uint64_t effect_length;
> +    /* uint32_t ids[] follows, array length implied by header */
> +} QEMU_PACKED NBDBlockStatusPayload;
> +
>   /* Transmission (export) flags: sent from server to client during handshake,
>      but describe what will happen during transmission */
>   enum {
> @@ -191,20 +197,22 @@ enum {
>       NBD_FLAG_SEND_RESIZE_BIT        =  9, /* Send resize */
>       NBD_FLAG_SEND_CACHE_BIT         = 10, /* Send CACHE (prefetch) */
>       NBD_FLAG_SEND_FAST_ZERO_BIT     = 11, /* FAST_ZERO flag for WRITE_ZEROES */
> +    NBD_FLAG_BLOCK_STAT_PAYLOAD_BIT = 12, /* PAYLOAD flag for BLOCK_STATUS */
>   };
> 
> -#define NBD_FLAG_HAS_FLAGS         (1 << NBD_FLAG_HAS_FLAGS_BIT)
> -#define NBD_FLAG_READ_ONLY         (1 << NBD_FLAG_READ_ONLY_BIT)
> -#define NBD_FLAG_SEND_FLUSH        (1 << NBD_FLAG_SEND_FLUSH_BIT)
> -#define NBD_FLAG_SEND_FUA          (1 << NBD_FLAG_SEND_FUA_BIT)
> -#define NBD_FLAG_ROTATIONAL        (1 << NBD_FLAG_ROTATIONAL_BIT)
> -#define NBD_FLAG_SEND_TRIM         (1 << NBD_FLAG_SEND_TRIM_BIT)
> -#define NBD_FLAG_SEND_WRITE_ZEROES (1 << NBD_FLAG_SEND_WRITE_ZEROES_BIT)
> -#define NBD_FLAG_SEND_DF           (1 << NBD_FLAG_SEND_DF_BIT)
> -#define NBD_FLAG_CAN_MULTI_CONN    (1 << NBD_FLAG_CAN_MULTI_CONN_BIT)
> -#define NBD_FLAG_SEND_RESIZE       (1 << NBD_FLAG_SEND_RESIZE_BIT)
> -#define NBD_FLAG_SEND_CACHE        (1 << NBD_FLAG_SEND_CACHE_BIT)
> -#define NBD_FLAG_SEND_FAST_ZERO    (1 << NBD_FLAG_SEND_FAST_ZERO_BIT)
> +#define NBD_FLAG_HAS_FLAGS          (1 << NBD_FLAG_HAS_FLAGS_BIT)
> +#define NBD_FLAG_READ_ONLY          (1 << NBD_FLAG_READ_ONLY_BIT)
> +#define NBD_FLAG_SEND_FLUSH         (1 << NBD_FLAG_SEND_FLUSH_BIT)
> +#define NBD_FLAG_SEND_FUA           (1 << NBD_FLAG_SEND_FUA_BIT)
> +#define NBD_FLAG_ROTATIONAL         (1 << NBD_FLAG_ROTATIONAL_BIT)
> +#define NBD_FLAG_SEND_TRIM          (1 << NBD_FLAG_SEND_TRIM_BIT)
> +#define NBD_FLAG_SEND_WRITE_ZEROES  (1 << NBD_FLAG_SEND_WRITE_ZEROES_BIT)
> +#define NBD_FLAG_SEND_DF            (1 << NBD_FLAG_SEND_DF_BIT)
> +#define NBD_FLAG_CAN_MULTI_CONN     (1 << NBD_FLAG_CAN_MULTI_CONN_BIT)
> +#define NBD_FLAG_SEND_RESIZE        (1 << NBD_FLAG_SEND_RESIZE_BIT)
> +#define NBD_FLAG_SEND_CACHE         (1 << NBD_FLAG_SEND_CACHE_BIT)
> +#define NBD_FLAG_SEND_FAST_ZERO     (1 << NBD_FLAG_SEND_FAST_ZERO_BIT)
> +#define NBD_FLAG_BLOCK_STAT_PAYLOAD (1 << NBD_FLAG_BLOCK_STAT_PAYLOAD_BIT)
> 
>   /* New-style handshake (global) flags, sent from server to client, and
>      control what will happen during handshake phase. */
> diff --git a/nbd/server.c b/nbd/server.c
> index db550c82cd2..ce11285c0d7 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -442,9 +442,9 @@ static int nbd_negotiate_handle_list(NBDClient *client, Error **errp)
>       return nbd_negotiate_send_rep(client, NBD_REP_ACK, errp);
>   }
> 
> -static void nbd_check_meta_export(NBDClient *client)
> +static void nbd_check_meta_export(NBDClient *client, NBDExport *exp)
>   {
> -    if (client->exp != client->context_exp) {
> +    if (exp != client->context_exp) {
>           client->contexts.count = 0;
>       }
>   }
> @@ -491,11 +491,15 @@ static int nbd_negotiate_handle_export_name(NBDClient *client, bool no_zeroes,
>           error_setg(errp, "export not found");
>           return -EINVAL;
>       }
> +    nbd_check_meta_export(client, client->exp);
> 
>       myflags = client->exp->nbdflags;
>       if (client->header_style >= NBD_HEADER_STRUCTURED) {
>           myflags |= NBD_FLAG_SEND_DF;
>       }
> +    if (client->extended_headers && client->contexts.count) {
> +        myflags |= NBD_FLAG_BLOCK_STAT_PAYLOAD;
> +    }
>       trace_nbd_negotiate_new_style_size_flags(client->exp->size, myflags);
>       stq_be_p(buf, client->exp->size);
>       stw_be_p(buf + 8, myflags);
> @@ -508,7 +512,6 @@ static int nbd_negotiate_handle_export_name(NBDClient *client, bool no_zeroes,
> 
>       QTAILQ_INSERT_TAIL(&client->exp->clients, client, next);
>       blk_exp_ref(&client->exp->common);
> -    nbd_check_meta_export(client);
> 
>       return 0;
>   }
> @@ -628,6 +631,9 @@ static int nbd_negotiate_handle_info(NBDClient *client, Error **errp)
>                                             errp, "export '%s' not present",
>                                             sane_name);
>       }
> +    if (client->opt == NBD_OPT_GO) {
> +        nbd_check_meta_export(client, exp);
> +    }
> 
>       /* Don't bother sending NBD_INFO_NAME unless client requested it */
>       if (sendname) {
> @@ -681,6 +687,10 @@ static int nbd_negotiate_handle_info(NBDClient *client, Error **errp)
>       if (client->header_style >= NBD_HEADER_STRUCTURED) {
>           myflags |= NBD_FLAG_SEND_DF;
>       }
> +    if (client->extended_headers &&
> +        (client->contexts.count || client->opt == NBD_OPT_INFO)) {
> +        myflags |= NBD_FLAG_BLOCK_STAT_PAYLOAD;
> +    }
>       trace_nbd_negotiate_new_style_size_flags(exp->size, myflags);
>       stq_be_p(buf, exp->size);
>       stw_be_p(buf + 8, myflags);
> @@ -716,7 +726,6 @@ static int nbd_negotiate_handle_info(NBDClient *client, Error **errp)
>           client->check_align = check_align;
>           QTAILQ_INSERT_TAIL(&client->exp->clients, client, next);
>           blk_exp_ref(&client->exp->common);
> -        nbd_check_meta_export(client);
>           rc = 1;
>       }
>       return rc;
> @@ -2415,6 +2424,83 @@ static int coroutine_fn nbd_co_send_bitmap(NBDClient *client,
>       return nbd_co_send_extents(client, request, ea, last, context_id, errp);
>   }
> 
> +/*
> + * nbd_co_block_status_payload_read
> + * Called when a client wants a subset of negotiated contexts via a
> + * BLOCK_STATUS payload.  Check the payload for valid length and
> + * contents.  On success, return 0 with request updated to effective
> + * length.  If request was invalid but payload consumed, return 0 with
> + * request->len and request->contexts.count set to 0 (which will
> + * trigger an appropriate NBD_EINVAL response later on). 

Hmm. So this leads to

     case NBD_CMD_BLOCK_STATUS:
         if (!request->len) {
             return nbd_send_generic_reply(client, request, -EINVAL,
                                           "need non-zero length", errp);
         }

EINVAL is OK, but the "need non-zero length" message is not appropriate. I think we need a separate reply for the invalid-payload case.

> On I/O
> + * error, return -EIO.
> + */
> +static int
> +nbd_co_block_status_payload_read(NBDClient *client, NBDRequest *request,
> +                                 Error **errp)
> +{
> +    int payload_len = request->len;
> +    g_autofree char *buf = NULL;
> +    g_autofree bool *bitmaps = NULL;
> +    size_t count, i;
> +    uint32_t id;
> +
> +    assert(request->len <= NBD_MAX_BUFFER_SIZE);
> +    if (payload_len % sizeof(uint32_t) ||
> +        payload_len < sizeof(NBDBlockStatusPayload) ||
> +        payload_len > (sizeof(NBDBlockStatusPayload) +
> +                       sizeof(id) * client->contexts.count)) {
> +        goto skip;
> +    }
> +
> +    buf = g_malloc(payload_len);
> +    if (nbd_read(client->ioc, buf, payload_len,
> +                 "CMD_BLOCK_STATUS data", errp) < 0) {
> +        return -EIO;
> +    }
> +    trace_nbd_co_receive_request_payload_received(request->handle,
> +                                                  payload_len);
> +    memset(&request->contexts, 0, sizeof(request->contexts));
> +    request->contexts.nr_bitmaps = client->context_exp->nr_export_bitmaps;
> +    bitmaps = g_new0(bool, request->contexts.nr_bitmaps);
> +    count = (payload_len - sizeof(NBDBlockStatusPayload)) / sizeof(id);

In the doc we have a MUST: "The payload form MUST occupy 8 + n*4 bytes". Do we really want to forgive and ignore an unaligned tail? Maybe it would be better to "goto skip" in this case, to avoid ambiguity.

> +    payload_len = 0;
> +
> +    for (i = 0; i < count; i++) {
> +
> +        id = ldl_be_p(buf + sizeof(NBDBlockStatusPayload) + sizeof(id) * i);
> +        if (id == NBD_META_ID_BASE_ALLOCATION) {
> +            if (request->contexts.base_allocation) {
> +                goto skip;
> +            }
> +            request->contexts.base_allocation = true;
> +        } else if (id == NBD_META_ID_ALLOCATION_DEPTH) {
> +            if (request->contexts.allocation_depth) {
> +                goto skip;
> +            }
> +            request->contexts.allocation_depth = true;
> +        } else {
> +            if (id - NBD_META_ID_DIRTY_BITMAP >
> +                request->contexts.nr_bitmaps ||
> +                bitmaps[id - NBD_META_ID_DIRTY_BITMAP]) {
> +                goto skip;
> +            }
> +            bitmaps[id - NBD_META_ID_DIRTY_BITMAP] = true;
> +        }
> +    }
> +
> +    request->len = ldq_be_p(buf);
> +    request->contexts.count = count;
> +    request->contexts.bitmaps = bitmaps;
> +    bitmaps = NULL;

Better, I think:

request->contexts.bitmaps = g_steal_pointer(&bitmaps);

  - as the natural pairing with g_autofree.

> +    return 0;
> +
> + skip:
> +    trace_nbd_co_receive_block_status_payload_compliance(request->from,
> +                                                         request->len);
> +    request->len = request->contexts.count = 0;
> +    return nbd_drop(client->ioc, payload_len, errp);
> +}
> +
>   /* nbd_co_receive_request
>    * Collect a client request. Return 0 if request looks valid, -EIO to drop
>    * connection right away, -EAGAIN to indicate we were interrupted and the
> @@ -2461,7 +2547,14 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req, NBDRequest *
> 
>           if (request->type == NBD_CMD_WRITE || extended_with_payload) {
>               payload_len = request->len;
> -            if (request->type != NBD_CMD_WRITE) {
> +            if (request->type == NBD_CMD_BLOCK_STATUS) {
> +                payload_len = nbd_co_block_status_payload_read(client,
> +                                                               request,
> +                                                               errp);
> +                if (payload_len < 0) {
> +                    return -EIO;
> +                }

It seems we could handle all payloads in one "switch" block, instead of handling BLOCK_STATUS here and postponing the WRITE payload (and the dropping) to the end of the block with the help of the payload_len variable.

> +            } else if (request->type != NBD_CMD_WRITE) {
>                   /*
>                    * For now, we don't support payloads on other
>                    * commands; but we can keep the connection alive.
> @@ -2540,6 +2633,9 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req, NBDRequest *
>           valid_flags |= NBD_CMD_FLAG_NO_HOLE | NBD_CMD_FLAG_FAST_ZERO;
>       } else if (request->type == NBD_CMD_BLOCK_STATUS) {
>           valid_flags |= NBD_CMD_FLAG_REQ_ONE;
> +        if (client->extended_headers && client->contexts.count) {
> +            valid_flags |= NBD_CMD_FLAG_PAYLOAD_LEN;
> +        }
>       }
>       if (request->flags & ~valid_flags) {
>           error_setg(errp, "unsupported flags for command %s (got 0x%x)",
> diff --git a/qemu-nbd.c b/qemu-nbd.c
> index 8c35442626a..b7ab0fdc791 100644
> --- a/qemu-nbd.c
> +++ b/qemu-nbd.c
> @@ -222,6 +222,7 @@ static int qemu_nbd_client_list(SocketAddress *saddr, QCryptoTLSCreds *tls,
>                   [NBD_FLAG_SEND_RESIZE_BIT]          = "resize",
>                   [NBD_FLAG_SEND_CACHE_BIT]           = "cache",
>                   [NBD_FLAG_SEND_FAST_ZERO_BIT]       = "fast-zero",
> +                [NBD_FLAG_BLOCK_STAT_PAYLOAD_BIT]   = "block-status-payload",
>               };
> 
>               printf("  size:  %" PRIu64 "\n", list[i].size);
> diff --git a/nbd/trace-events b/nbd/trace-events
> index c20df33a431..da92fe1b56b 100644
> --- a/nbd/trace-events
> +++ b/nbd/trace-events
> @@ -70,6 +70,7 @@ nbd_co_send_structured_read(uint64_t handle, uint64_t offset, void *data, size_t
>   nbd_co_send_structured_read_hole(uint64_t handle, uint64_t offset, size_t size) "Send structured read hole reply: handle = %" PRIu64 ", offset = %" PRIu64 ", len = %zu"
>   nbd_co_send_extents(uint64_t handle, unsigned int extents, uint32_t id, uint64_t length, int last) "Send block status reply: handle = %" PRIu64 ", extents = %u, context = %d (extents cover %" PRIu64 " bytes, last chunk = %d)"
>   nbd_co_send_structured_error(uint64_t handle, int err, const char *errname, const char *msg) "Send structured error reply: handle = %" PRIu64 ", error = %d (%s), msg = '%s'"
> +nbd_co_receive_block_status_payload_compliance(uint64_t from, int len) "client sent unusable block status payload: from=0x%" PRIx64 ", len=0x%x"
>   nbd_co_receive_request_decode_type(uint64_t handle, uint16_t type, const char *name) "Decoding type: handle = %" PRIu64 ", type = %" PRIu16 " (%s)"
>   nbd_co_receive_request_payload_received(uint64_t handle, uint64_t len) "Payload received: handle = %" PRIu64 ", len = %" PRIu64
>   nbd_co_receive_ext_payload_compliance(uint64_t from, uint64_t len) "client sent non-compliant write without payload flag: from=0x%" PRIx64 ", len=0x%" PRIx64
> diff --git a/tests/qemu-iotests/223.out b/tests/qemu-iotests/223.out
> index b98582c38ea..b38f0b7963b 100644
> --- a/tests/qemu-iotests/223.out
> +++ b/tests/qemu-iotests/223.out
> @@ -83,7 +83,7 @@ exports available: 0
>   exports available: 3
>    export: 'n'
>     size:  4194304
> -  flags: 0x58f ( readonly flush fua df multi cache )
> +  flags: 0x158f ( readonly flush fua df multi cache block-status-payload )
>     min block: 1
>     opt block: 4096
>     max block: 33554432
> @@ -94,7 +94,7 @@ exports available: 3
>    export: 'n2'
>     description: some text
>     size:  4194304
> -  flags: 0xded ( flush fua trim zeroes df multi cache fast-zero )
> +  flags: 0x1ded ( flush fua trim zeroes df multi cache fast-zero block-status-payload )
>     min block: 1
>     opt block: 4096
>     max block: 33554432
> @@ -104,7 +104,7 @@ exports available: 3
>      qemu:dirty-bitmap:b2
>    export: 'n3'
>     size:  4194304
> -  flags: 0x58f ( readonly flush fua df multi cache )
> +  flags: 0x158f ( readonly flush fua df multi cache block-status-payload )
>     min block: 1
>     opt block: 4096
>     max block: 33554432
> @@ -205,7 +205,7 @@ exports available: 0
>   exports available: 3
>    export: 'n'
>     size:  4194304
> -  flags: 0x58f ( readonly flush fua df multi cache )
> +  flags: 0x158f ( readonly flush fua df multi cache block-status-payload )
>     min block: 1
>     opt block: 4096
>     max block: 33554432
> @@ -216,7 +216,7 @@ exports available: 3
>    export: 'n2'
>     description: some text
>     size:  4194304
> -  flags: 0xded ( flush fua trim zeroes df multi cache fast-zero )
> +  flags: 0x1ded ( flush fua trim zeroes df multi cache fast-zero block-status-payload )
>     min block: 1
>     opt block: 4096
>     max block: 33554432
> @@ -226,7 +226,7 @@ exports available: 3
>      qemu:dirty-bitmap:b2
>    export: 'n3'
>     size:  4194304
> -  flags: 0x58f ( readonly flush fua df multi cache )
> +  flags: 0x158f ( readonly flush fua df multi cache block-status-payload )
>     min block: 1
>     opt block: 4096
>     max block: 33554432
> diff --git a/tests/qemu-iotests/307.out b/tests/qemu-iotests/307.out
> index 2b9a6a67a1a..f645f3315f8 100644
> --- a/tests/qemu-iotests/307.out
> +++ b/tests/qemu-iotests/307.out
> @@ -15,7 +15,7 @@ wrote 4096/4096 bytes at offset 0
>   exports available: 1
>    export: 'fmt'
>     size:  67108864
> -  flags: 0x58f ( readonly flush fua df multi cache )
> +  flags: 0x158f ( readonly flush fua df multi cache block-status-payload )
>     min block: XXX
>     opt block: XXX
>     max block: XXX
> @@ -44,7 +44,7 @@ exports available: 1
>   exports available: 1
>    export: 'fmt'
>     size:  67108864
> -  flags: 0x58f ( readonly flush fua df multi cache )
> +  flags: 0x158f ( readonly flush fua df multi cache block-status-payload )
>     min block: XXX
>     opt block: XXX
>     max block: XXX
> @@ -76,7 +76,7 @@ exports available: 1
>   exports available: 2
>    export: 'fmt'
>     size:  67108864
> -  flags: 0x58f ( readonly flush fua df multi cache )
> +  flags: 0x158f ( readonly flush fua df multi cache block-status-payload )
>     min block: XXX
>     opt block: XXX
>     max block: XXX
> @@ -86,7 +86,7 @@ exports available: 2
>    export: 'export1'
>     description: This is the writable second export
>     size:  67108864
> -  flags: 0xded ( flush fua trim zeroes df multi cache fast-zero )
> +  flags: 0x1ded ( flush fua trim zeroes df multi cache fast-zero block-status-payload )
>     min block: XXX
>     opt block: XXX
>     max block: XXX
> @@ -113,7 +113,7 @@ exports available: 1
>    export: 'export1'
>     description: This is the writable second export
>     size:  67108864
> -  flags: 0xded ( flush fua trim zeroes df multi cache fast-zero )
> +  flags: 0x1ded ( flush fua trim zeroes df multi cache fast-zero block-status-payload )
>     min block: XXX
>     opt block: XXX
>     max block: XXX
> diff --git a/tests/qemu-iotests/tests/nbd-qemu-allocation.out b/tests/qemu-iotests/tests/nbd-qemu-allocation.out
> index 659276032b0..794d1bfce62 100644
> --- a/tests/qemu-iotests/tests/nbd-qemu-allocation.out
> +++ b/tests/qemu-iotests/tests/nbd-qemu-allocation.out
> @@ -17,7 +17,7 @@ wrote 2097152/2097152 bytes at offset 1048576
>   exports available: 1
>    export: ''
>     size:  4194304
> -  flags: 0x48f ( readonly flush fua df cache )
> +  flags: 0x148f ( readonly flush fua df cache block-status-payload )
>     min block: 1
>     opt block: 4096
>     max block: 33554432

-- 
Best regards,
Vladimir



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Libguestfs] [PATCH v3 14/14] nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS
  2023-06-02  9:13   ` Vladimir Sementsov-Ogievskiy
@ 2023-06-02 13:14     ` Eric Blake
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2023-06-02 13:14 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel, Kevin Wolf, Hanna Reitz,
	open list:Network Block Dev...,
	libguestfs

On Fri, Jun 02, 2023 at 12:13:25PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 15.05.23 22:53, Eric Blake wrote:
> > Allow a client to request a subset of negotiated meta contexts.  For
> > example, a client may ask to use a single connection to learn about
> > both block status and dirty bitmaps, but where the dirty bitmap
> > queries only need to be performed on a subset of the disk; forcing the
> > server to compute that information on block status queries in the rest
> > of the disk is wasted effort (both at the server, and on the amount of
> > traffic sent over the wire to be parsed and ignored by the client).
> > 
...
> > 
> > +/*
> > + * nbd_co_block_status_payload_read
> > + * Called when a client wants a subset of negotiated contexts via a
> > + * BLOCK_STATUS payload.  Check the payload for valid length and
> > + * contents.  On success, return 0 with request updated to effective
> > + * length.  If request was invalid but payload consumed, return 0 with
> > + * request->len and request->contexts.count set to 0 (which will
> > + * trigger an appropriate NBD_EINVAL response later on).
> 
> Hmm. So, it leads to
> 
>     case NBD_CMD_BLOCK_STATUS:
>         if (!request->len) {
>             return nbd_send_generic_reply(client, request, -EINVAL,
>                                           "need non-zero length", errp);
>         }
> 
> EINVAL is OK, but the "need non-zero length" message is not appropriate. I think we need a separate reply for the case of an invalid payload.

Or maybe just reword the error to "unexpected length", which covers a
broader swath of errors (none of which are likely from compliant
clients).

> 
> > On I/O
> > + * error, return -EIO.
> > + */
> > +static int
> > +nbd_co_block_status_payload_read(NBDClient *client, NBDRequest *request,
> > +                                 Error **errp)
> > +{
> > +    int payload_len = request->len;
> > +    g_autofree char *buf = NULL;
> > +    g_autofree bool *bitmaps = NULL;
> > +    size_t count, i;
> > +    uint32_t id;
> > +
> > +    assert(request->len <= NBD_MAX_BUFFER_SIZE);
> > +    if (payload_len % sizeof(uint32_t) ||
> > +        payload_len < sizeof(NBDBlockStatusPayload) ||
> > +        payload_len > (sizeof(NBDBlockStatusPayload) +
> > +                       sizeof(id) * client->contexts.count)) {
> > +        goto skip;
> > +    }
> > +
> > +    buf = g_malloc(payload_len);
> > +    if (nbd_read(client->ioc, buf, payload_len,
> > +                 "CMD_BLOCK_STATUS data", errp) < 0) {
> > +        return -EIO;
> > +    }
> > +    trace_nbd_co_receive_request_payload_received(request->handle,
> > +                                                  payload_len);
> > +    memset(&request->contexts, 0, sizeof(request->contexts));
> > +    request->contexts.nr_bitmaps = client->context_exp->nr_export_bitmaps;
> > +    bitmaps = g_new0(bool, request->contexts.nr_bitmaps);
> > +    count = (payload_len - sizeof(NBDBlockStatusPayload)) / sizeof(id);
> 
> In the doc we have a MUST: "The payload form MUST occupy 8 + n*4 bytes" - do we really want to forgive and ignore an unaligned tail? Maybe better to "goto skip" in this case, to avoid ambiguity.

That's what happened above, when checking that payload_len %
sizeof(uint32_t) was 0.  Or am I misunderstanding your question about
another condition where goto skip would be appropriate?

> 
> > +    payload_len = 0;
> > +
> > +    for (i = 0; i < count; i++) {
> > +
> > +        id = ldl_be_p(buf + sizeof(NBDBlockStatusPayload) + sizeof(id) * i);
> > +        if (id == NBD_META_ID_BASE_ALLOCATION) {
> > +            if (request->contexts.base_allocation) {
> > +                goto skip;
> > +            }
> > +            request->contexts.base_allocation = true;
> > +        } else if (id == NBD_META_ID_ALLOCATION_DEPTH) {
> > +            if (request->contexts.allocation_depth) {
> > +                goto skip;
> > +            }
> > +            request->contexts.allocation_depth = true;
> > +        } else {
> > +            if (id - NBD_META_ID_DIRTY_BITMAP >
> > +                request->contexts.nr_bitmaps ||
> > +                bitmaps[id - NBD_META_ID_DIRTY_BITMAP]) {
> > +                goto skip;
> > +            }
> > +            bitmaps[id - NBD_META_ID_DIRTY_BITMAP] = true;
> > +        }
> > +    }
> > +
> > +    request->len = ldq_be_p(buf);
> > +    request->contexts.count = count;
> > +    request->contexts.bitmaps = bitmaps;
> > +    bitmaps = NULL;
> 
> better I think:
> 
> request->contexts.bitmaps = g_steal_pointer(&bitmaps);
> 
>  - as a counterpart to g_autofree.

Yes, that is shorter.

> 
> > +    return 0;
> > +
> > + skip:
> > +    trace_nbd_co_receive_block_status_payload_compliance(request->from,
> > +                                                         request->len);
> > +    request->len = request->contexts.count = 0;
> > +    return nbd_drop(client->ioc, payload_len, errp);
> > +}
> > +
> >   /* nbd_co_receive_request
> >    * Collect a client request. Return 0 if request looks valid, -EIO to drop
> >    * connection right away, -EAGAIN to indicate we were interrupted and the
> > @@ -2461,7 +2547,14 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req, NBDRequest *
> > 
> >           if (request->type == NBD_CMD_WRITE || extended_with_payload) {
> >               payload_len = request->len;
> > -            if (request->type != NBD_CMD_WRITE) {
> > +            if (request->type == NBD_CMD_BLOCK_STATUS) {
> > +                payload_len = nbd_co_block_status_payload_read(client,
> > +                                                               request,
> > +                                                               errp);
> > +                if (payload_len < 0) {
> > +                    return -EIO;
> > +                }
> 
> Seems we can handle all payload in one "switch" block, instead of handling BLOCK_STATUS here and postponing WRITE payload (and dropping) up to the end of the block with help of payload_len variable.

I can experiment with other ways of representing the logic; there's
enough complexity that I'm trying to balance between repeating
conditionals and avoiding added nesting.

> 
> > +            } else if (request->type != NBD_CMD_WRITE) {
> >                   /*
> >                    * For now, we don't support payloads on other
> >                    * commands; but we can keep the connection alive.
> > @@ -2540,6 +2633,9 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req, NBDRequest *
> >           valid_flags |= NBD_CMD_FLAG_NO_HOLE | NBD_CMD_FLAG_FAST_ZERO;
> >       } else if (request->type == NBD_CMD_BLOCK_STATUS) {
> >           valid_flags |= NBD_CMD_FLAG_REQ_ONE;
> > +        if (client->extended_headers && client->contexts.count) {
> > +            valid_flags |= NBD_CMD_FLAG_PAYLOAD_LEN;
> > +        }
> >       }
> >       if (request->flags & ~valid_flags) {
> >           error_setg(errp, "unsupported flags for command %s (got 0x%x)",

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH v3 09/14] nbd/server: Initial support for extended headers
  2023-05-31 14:46   ` Vladimir Sementsov-Ogievskiy
@ 2023-06-07 11:39     ` Eric Blake
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2023-06-07 11:39 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel, libguestfs, open list:Network Block Dev...

On Wed, May 31, 2023 at 05:46:55PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 15.05.23 22:53, Eric Blake wrote:
> > Time to support clients that request extended headers.  Now we can
> > finally reach the code added across several previous patches.
> > 
> > Even though the NBD spec has been altered to allow us to accept
> > NBD_CMD_READ larger than the max payload size (provided our response
> > is a hole or broken up over more than one data chunk), we are not
> > planning to take advantage of that, and continue to cap NBD_CMD_READ
> > to 32M regardless of header size.
> > 
> > For NBD_CMD_WRITE_ZEROES and NBD_CMD_TRIM, the block layer already
> > supports 64-bit operations without any effort on our part.  For
> > NBD_CMD_BLOCK_STATUS, the client's length is a hint, and the previous
> > patch took care of implementing the required
> > NBD_REPLY_TYPE_BLOCK_STATUS_EXT.
> > 
> > Signed-off-by: Eric Blake <eblake@redhat.com>
> > ---
> >   nbd/nbd-internal.h |   5 +-
> 
> [..]
> 
> > 
> >   static inline void set_be_simple_reply(NBDClient *client, struct iovec *iov,
> > -                                       uint64_t error, NBDRequest *request)
> > +                                       uint32_t error, NBDStructuredError *err,
> > +                                       NBDRequest *request)
> >   {
> > -    NBDSimpleReply *reply = iov->iov_base;
> > +    if (client->header_style >= NBD_HEADER_EXTENDED) {
> > +        NBDExtendedReplyChunk *chunk = iov->iov_base;
> > 
> > -    iov->iov_len = sizeof(*reply);
> > -    stl_be_p(&reply->magic, NBD_SIMPLE_REPLY_MAGIC);
> > -    stl_be_p(&reply->error, error);
> > -    stq_be_p(&reply->handle, request->handle);
> > +        iov->iov_len = sizeof(*chunk);
> > +        stl_be_p(&chunk->magic, NBD_EXTENDED_REPLY_MAGIC);
> > +        stw_be_p(&chunk->flags, NBD_REPLY_FLAG_DONE);
> > +        stq_be_p(&chunk->handle, request->handle);
> > +        stq_be_p(&chunk->offset, request->from);
> > +        if (error) {
> > +            assert(!iov[1].iov_base);
> > +            iov[1].iov_base = err;
> > +            iov[1].iov_len = sizeof(*err);
> > +            stw_be_p(&chunk->type, NBD_REPLY_TYPE_ERROR);
> > +            stq_be_p(&chunk->length, sizeof(*err));
> > +            stl_be_p(&err->error, error);
> > +            stw_be_p(&err->message_length, 0);
> > +        } else {
> > +            stw_be_p(&chunk->type, NBD_REPLY_TYPE_NONE);
> > +            stq_be_p(&chunk->length, 0);
> > +        }
> > +    } else {
> > +        NBDSimpleReply *reply = iov->iov_base;
> > +
> > +        iov->iov_len = sizeof(*reply);
> > +        stl_be_p(&reply->magic, NBD_SIMPLE_REPLY_MAGIC);
> > +        stl_be_p(&reply->error, error);
> > +        stq_be_p(&reply->handle, request->handle);
> > +    }
> >   }
> > 
> >   static int coroutine_fn nbd_co_send_simple_reply(NBDClient *client,
> > @@ -1906,30 +1966,44 @@ static int coroutine_fn nbd_co_send_simple_reply(NBDClient *client,
> 
> So, that's not _simple_ anymore. The function should be renamed, as well as set_be_simple_reply(). _simple_or_extended_? A bit too long. But continuing to use "simple" clashes with how the specification uses the word.

In fact, I added an assertion that set_be_simple_reply() can only be
reached when extended replies are not in use, so none of this
complexity here was needed after all.

> 
> Probably better to update the callers? The only caller is nbd_send_generic_reply(), so could we just add an nbd_co_send_single_extended_reply() to call from nbd_send_generic_reply() in the EXTENDED case?
> 
> Also, transformation of set_be_simple_reply() do look like it should be two separate functions.
> 
> >   {
> >       NBDReply hdr;
> >       int nbd_err = system_errno_to_nbd_errno(error);
> > +    NBDStructuredError err;
> >       struct iovec iov[] = {
> >           {.iov_base = &hdr},
> >           {.iov_base = data, .iov_len = len}
> >       };
> > 
> > +    assert(!len || !nbd_err);
> >       trace_nbd_co_send_simple_reply(request->handle, nbd_err,
> >                                      nbd_err_lookup(nbd_err), len);
> > -    set_be_simple_reply(client, &iov[0], nbd_err, request);
> > +    set_be_simple_reply(client, &iov[0], nbd_err, &err, request);
> > 
> > -    return nbd_co_send_iov(client, iov, len ? 2 : 1, errp);
> > +    return nbd_co_send_iov(client, iov, iov[1].iov_len ? 2 : 1, errp);

Not introduced in this patch, but it turns out that when
iov[1].iov_len == 0, blindly passing niov==2 to nbd_co_send_iov()
still does the right thing, so I can lose the conditional here.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH v3 10/14] nbd/client: Initial support for extended headers
  2023-05-31 15:26   ` Vladimir Sementsov-Ogievskiy
@ 2023-06-07 18:22     ` Eric Blake
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2023-06-07 18:22 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel, libguestfs, Kevin Wolf, Hanna Reitz,
	open list:Network Block Dev...

The subject lines are confusing: 9/14 enables extended headers in the
server, while this one does not yet enable the headers but is merely a
preliminary step.  I'll probably retitle one or the other in v4.

On Wed, May 31, 2023 at 06:26:17PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 15.05.23 22:53, Eric Blake wrote:
> > Update the client code to be able to send an extended request, and
> > parse an extended header from the server.  Note that since we reject
> > any structured reply with a too-large payload, we can always normalize
> > a valid header back into the compact form, so that the caller need not
> > deal with two branches of a union.  Still, until a later patch lets
> > the client negotiate extended headers, the code added here should not
> > be reached.  Note that because of the different magic numbers, it is
> > just as easy to trace and then tolerate a non-compliant server sending
> > the wrong header reply as it would be to insist that the server is
> > compliant.
> > 
> > The only caller to nbd_receive_reply() always passed NULL for errp;
> > since we are changing the signature anyways, I decided to sink the
> > decision to ignore errors one layer lower.
> 
> This way nbd_receive_simple_reply() and nbd_receive_structured_reply_chunk() are now called only with an explicit NULL last argument, and we start to drop all errors.
> 
> Also, actually, we'd better add an errp parameter to the caller - nbd_receive_replies() - because its caller (nbd_co_do_receive_one_chunk()) will benefit from it, as it already has errp.

I can explore plumbing errp back through for v4.

> > @@ -1394,28 +1401,34 @@ static int nbd_receive_simple_reply(QIOChannel *ioc, NBDSimpleReply *reply,
> > 
> >   /* nbd_receive_structured_reply_chunk
> >    * Read structured reply chunk except magic field (which should be already
> > - * read).
> > + * read).  Normalize into the compact form.
> >    * Payload is not read.
> >    */
> > -static int nbd_receive_structured_reply_chunk(QIOChannel *ioc,
> > -                                              NBDStructuredReplyChunk *chunk,
> > +static int nbd_receive_structured_reply_chunk(QIOChannel *ioc, NBDReply *chunk,
> >                                                 Error **errp)
> 
> Hmm, _structured_or_extended_? Or at least the comment above the function should mention this.

I'm going with 'nbd_receive_reply_chunk', since both structured and
extended modes receive chunks.

> 
> >   {
> >       int ret;
> > +    size_t len;
> > +    uint64_t payload_len;
> > 
> > -    assert(chunk->magic == NBD_STRUCTURED_REPLY_MAGIC);
> > +    if (chunk->magic == NBD_STRUCTURED_REPLY_MAGIC) {
> > +        len = sizeof(chunk->structured);
> > +    } else {
> > +        assert(chunk->magic == NBD_EXTENDED_REPLY_MAGIC);
> > +        len = sizeof(chunk->extended);
> > +    }
> > 
> >       ret = nbd_read(ioc, (uint8_t *)chunk + sizeof(chunk->magic),
> > -                   sizeof(*chunk) - sizeof(chunk->magic), "structured chunk",
> 
> Would be good to print "extended chunk" in error message for EXTENDED case.

Or even just "chunk header", which covers both modes.

> >   int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc,
> > -                                   NBDReply *reply, Error **errp)
> > +                                   NBDReply *reply, NBDHeaderStyle hdr)
> >   {
> >       int ret;
> >       const char *type;
> > 
> > -    ret = nbd_read_eof(bs, ioc, &reply->magic, sizeof(reply->magic), errp);
> > +    ret = nbd_read_eof(bs, ioc, &reply->magic, sizeof(reply->magic), NULL);
> >       if (ret <= 0) {
> >           return ret;
> >       }
> > 
> >       reply->magic = be32_to_cpu(reply->magic);
> > 
> > +    /* Diagnose but accept wrong-width header */
> >       switch (reply->magic) {
> >       case NBD_SIMPLE_REPLY_MAGIC:
> > -        ret = nbd_receive_simple_reply(ioc, &reply->simple, errp);
> > +        if (hdr >= NBD_HEADER_EXTENDED) {
> > +            trace_nbd_receive_wrong_header(reply->magic);
> 
> maybe, trace also expected style

Sure, I can give that a shot.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




end of thread, other threads:[~2023-06-07 18:22 UTC | newest]

Thread overview: 38+ messages
-- links below jump to the message on this page --
2023-05-15 19:53 [PATCH v3 00/14] qemu patches for 64-bit NBD extensions Eric Blake
2023-05-15 19:53 ` [PATCH v3 01/14] nbd/client: Use smarter assert Eric Blake
2023-05-29  8:20   ` Vladimir Sementsov-Ogievskiy
2023-05-15 19:53 ` [PATCH v3 02/14] nbd/client: Add safety check on chunk payload length Eric Blake
2023-05-29  8:25   ` Vladimir Sementsov-Ogievskiy
2023-05-15 19:53 ` [PATCH v3 03/14] nbd/server: Prepare for alternate-size headers Eric Blake
2023-05-29 14:26   ` Vladimir Sementsov-Ogievskiy
2023-05-30 16:29     ` Eric Blake
2023-05-31  7:28       ` Vladimir Sementsov-Ogievskiy
2023-05-15 19:53 ` [PATCH v3 04/14] nbd: Prepare for 64-bit request effect lengths Eric Blake
2023-05-30 13:05   ` Vladimir Sementsov-Ogievskiy
2023-05-30 18:23     ` Eric Blake
2023-05-15 19:53 ` [PATCH v3 05/14] nbd: Add types for extended headers Eric Blake
2023-05-30 13:23   ` Vladimir Sementsov-Ogievskiy
2023-05-30 18:22     ` [Libguestfs] " Eric Blake
2023-05-31  7:30       ` Vladimir Sementsov-Ogievskiy
2023-05-15 19:53 ` [PATCH v3 06/14] nbd/server: Refactor handling of request payload Eric Blake
2023-05-31  8:04   ` Vladimir Sementsov-Ogievskiy
2023-05-15 19:53 ` [PATCH v3 07/14] nbd/server: Refactor to pass full request around Eric Blake
2023-05-31  8:13   ` Vladimir Sementsov-Ogievskiy
2023-05-15 19:53 ` [PATCH v3 08/14] nbd/server: Support 64-bit block status Eric Blake
2023-05-31 14:10   ` Vladimir Sementsov-Ogievskiy
2023-05-15 19:53 ` [PATCH v3 09/14] nbd/server: Initial support for extended headers Eric Blake
2023-05-31 14:46   ` Vladimir Sementsov-Ogievskiy
2023-06-07 11:39     ` Eric Blake
2023-05-15 19:53 ` [PATCH v3 10/14] nbd/client: " Eric Blake
2023-05-31 15:26   ` Vladimir Sementsov-Ogievskiy
2023-06-07 18:22     ` Eric Blake
2023-05-15 19:53 ` [PATCH v3 11/14] nbd/client: Accept 64-bit block status chunks Eric Blake
2023-05-31 17:00   ` Vladimir Sementsov-Ogievskiy
2023-05-15 19:53 ` [PATCH v3 12/14] nbd/client: Request extended headers during negotiation Eric Blake
     [not found]   ` <1af7f692-b5de-c767-2568-1fc024a57133@yandex-team.ru>
     [not found]     ` <cqb3yww5ceeinh2pb5nqaljrsllu3ejkjsdueuw32cwcocumsn@okgujto2lzmn>
     [not found]       ` <cd83b0bc-0e6b-fc94-1cc2-9bf00d516140@yandex-team.ru>
     [not found]         ` <hbjtjovry4e5kb6oyii4g2hncetfo2uic67r5ipufcikvgyb5x@idenexfxits4>
2023-06-01  8:43           ` Vladimir Sementsov-Ogievskiy
2023-05-15 19:53 ` [PATCH v3 13/14] nbd/server: Prepare for per-request filtering of BLOCK_STATUS Eric Blake
2023-06-01  9:57   ` Vladimir Sementsov-Ogievskiy
2023-05-15 19:53 ` [PATCH v3 14/14] nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS Eric Blake
2023-06-02  9:13   ` Vladimir Sementsov-Ogievskiy
2023-06-02 13:14     ` [Libguestfs] " Eric Blake
2023-05-15 21:05 ` [Libguestfs] [PATCH v3 00/14] qemu patches for 64-bit NBD extensions Eric Blake
