* RFC for NBD protocol extension: extended headers
@ 2021-12-03 23:13 Eric Blake
  2021-12-03 23:14 ` [PATCH] spec: Add NBD_OPT_EXTENDED_HEADERS Eric Blake
                   ` (2 more replies)
  0 siblings, 3 replies; 46+ messages in thread
From: Eric Blake @ 2021-12-03 23:13 UTC (permalink / raw)
  To: nbd, qemu-devel, qemu-block, vsementsov, libguestfs, nsoffer

In response to this mail, I will be cross-posting a series of patches
to multiple projects as a proof-of-concept implementation and request
for comments on a new NBD protocol extension, called
NBD_OPT_EXTENDED_HEADERS.  With this in place, it will be possible for
clients to request 64-bit zero, trim, cache, and block status
operations when supported by the server.
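
As a rough illustration of what a 64-bit request looks like on the
wire, the sketch below packs a request header in either compact or
extended form, following the field layout given in the spec patch
later in this thread.  The helper names (put_be, nbd_pack_request) are
made up for this example and are not taken from qemu or libnbd; the
only protocol facts assumed are the magic numbers from the spec patch
and that NBD fields are big-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBD_REQUEST_MAGIC     0x25609513u  /* compact request */
#define NBD_REQUEST_EXT_MAGIC 0x21e41c71u  /* extended request */

/* Store the low n bytes of v at p in big-endian (network) order. */
static void put_be(uint8_t *p, uint64_t v, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        p[i] = (uint8_t)(v >> (8 * (n - 1 - i)));
    }
}

/*
 * Hypothetical helper: pack one request header into buf (at least 32
 * bytes) and return the header size: 28 for compact, 32 for extended.
 * Only the width of the length field differs between the two forms.
 */
static size_t nbd_pack_request(uint8_t *buf, int extended,
                               uint16_t flags, uint16_t type,
                               uint64_t handle, uint64_t offset,
                               uint64_t length)
{
    size_t len_bytes = extended ? 8 : 4;

    /* A compact header cannot represent a length of 2^32 or more. */
    assert(extended || length <= UINT32_MAX);

    put_be(buf, extended ? NBD_REQUEST_EXT_MAGIC : NBD_REQUEST_MAGIC, 4);
    put_be(buf + 4, flags, 2);
    put_be(buf + 6, type, 2);
    put_be(buf + 8, handle, 8);
    put_be(buf + 16, offset, 8);
    put_be(buf + 24, length, len_bytes);
    return 24 + len_bytes;
}
```

A caller would pass extended=1 only after the server has acknowledged
NBD_OPT_EXTENDED_HEADERS; otherwise a large zero or trim still has to
be split into several compact requests.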

Not yet complete: an implementation of this in nbdkit.  I also plan to
tweak libnbd's 'nbdinfo --map' and 'nbdcopy' to take advantage of the
larger block status results.  With 64-bit commands, we may also want
to make it easier for servers to advertise the actual maximum size
they are willing to accept for each command (for example, a server may
be happy with a full 64-bit block status, but still want to cap
non-fast zero and cache requests at a smaller size to avoid denial of
service).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




* [PATCH] spec: Add NBD_OPT_EXTENDED_HEADERS
  2021-12-03 23:13 RFC for NBD protocol extension: extended headers Eric Blake
@ 2021-12-03 23:14 ` Eric Blake
  2021-12-06 11:40   ` Vladimir Sementsov-Ogievskiy
                     ` (3 more replies)
  2021-12-03 23:15 ` [PATCH 00/14] qemu patches for NBD_OPT_EXTENDED_HEADERS Eric Blake
  2021-12-03 23:17 ` [libnbd PATCH 00/13] libnbd patches for NBD_OPT_EXTENDED_HEADERS Eric Blake
  2 siblings, 4 replies; 46+ messages in thread
From: Eric Blake @ 2021-12-03 23:14 UTC (permalink / raw)
  To: nbd; +Cc: nsoffer, vsementsov, qemu-devel, qemu-block, libguestfs

Add a new negotiation feature where the client and server agree to use
larger packet headers on every packet sent during transmission phase.
This has two purposes: first, it makes it possible to perform
operations like trim, write zeroes, and block status on more than 2^32
bytes in a single command; this in turn requires that some structured
replies from the server also be extended to match.  The wording chosen
here is careful to permit a server to use either flavor in its reply
(that is, a request shorter than 2^32 bytes can trigger an extended
reply, and conversely a request of 2^32 bytes or more can trigger a
compact reply).

Second, when structured replies are active, clients have to deal with
the difference between 16- and 20-byte headers of simple
vs. structured replies, which impacts performance if the client must
perform multiple syscalls to first read the magic before knowing how
many additional bytes to read.  In extended header mode, all headers
are the same width, so the client can read a full header before
deciding whether the header describes a simple or structured reply.
Similarly, by having extended mode use a power-of-2 sizing, it becomes
easier to manipulate headers within a single cache line, even if it
requires padding bytes sent over the wire.  However, note that this
change only affects the headers; as data payloads can still be
unaligned (for example, a client performing 1-byte reads or writes),
we would need to negotiate yet another extension if we wanted to
ensure that all NBD transmission packets started on an 8-byte boundary
after option haggling has completed.
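
To make the header-size point concrete, here is a small sketch of how
a client might map a reply magic to the total header size.  The magic
values come from the spec text in this patch; the function names
get_be32 and reply_header_size are hypothetical and exist only for
this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBD_SIMPLE_REPLY_MAGIC         0x67446698u
#define NBD_STRUCTURED_REPLY_MAGIC     0x668e33efu
#define NBD_SIMPLE_REPLY_EXT_MAGIC     0x60d12fd6u
#define NBD_STRUCTURED_REPLY_EXT_MAGIC 0x6e8a278cu

/* Decode a 32-bit big-endian value from the wire. */
static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Total reply header size implied by the magic, or 0 if the magic is
 * not recognized.  In compact mode the answer depends on the magic
 * (16 vs. 20 bytes), so the client cannot size its first read to the
 * full header.  In extended mode both reply styles use 32 bytes, so
 * one fixed-size read always yields a complete header.
 */
static size_t reply_header_size(uint32_t magic)
{
    switch (magic) {
    case NBD_SIMPLE_REPLY_MAGIC:
        return 16;
    case NBD_STRUCTURED_REPLY_MAGIC:
        return 20;
    case NBD_SIMPLE_REPLY_EXT_MAGIC:
    case NBD_STRUCTURED_REPLY_EXT_MAGIC:
        return 32;
    default:
        return 0;
    }
}
```

With extended headers, a client could read 32 bytes unconditionally
and only then call something like reply_header_size() to validate the
magic, rather than using the magic to decide how much more to read.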

This spec addition was done in parallel with a proof of concept
implementation in qemu (server and client) and libnbd (client), and I
also have plans to implement it in nbdkit (server).

Signed-off-by: Eric Blake <eblake@redhat.com>
---

Available at https://repo.or.cz/nbd/ericb.git/shortlog/refs/tags/exthdr-v1
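
For reviewers who want to see the proposed
NBD_REPLY_TYPE_BLOCK_STATUS_EXT payload in code form, the following is
a minimal sketch of a client-side validator, assuming the layout in
the diff below (a 32-bit context ID followed by 16-byte descriptors).
The function names are hypothetical, and rejecting nonzero padding on
receipt is a choice made for this sketch; the spec text only requires
the sender to zero it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode an n-byte big-endian unsigned value (n <= 8). */
static uint64_t get_be(const uint8_t *p, size_t n)
{
    uint64_t v = 0;

    for (size_t i = 0; i < n; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/*
 * Hypothetical validator for one extended block status payload.
 * Returns the number of descriptors, or -1 if the payload is
 * malformed.  On success, *context_id holds the metadata context ID
 * and *total the sum of all extent lengths.
 */
static long parse_block_status_ext(const uint8_t *payload, size_t length,
                                   uint32_t *context_id, uint64_t *total)
{
    long count = 0;

    /* length MUST be 4 + (a positive multiple of 16) */
    if (length < 4 + 16 || (length - 4) % 16 != 0) {
        return -1;
    }
    *context_id = (uint32_t)get_be(payload, 4);
    *total = 0;

    for (size_t off = 4; off < length; off += 16) {
        uint64_t extent = get_be(payload + off, 8);
        /* status flags live at off + 8; their meaning is per-context */
        uint32_t padding = (uint32_t)get_be(payload + off + 12, 4);

        if (extent == 0 || padding != 0 || extent > UINT64_MAX - *total) {
            return -1;
        }
        *total += extent;
        count++;
    }
    return count;
}
```

Because the requested length is only a hint, a caller should compare
*total against what it asked for rather than assume the two match.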

 doc/proto.md | 218 +++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 177 insertions(+), 41 deletions(-)

diff --git a/doc/proto.md b/doc/proto.md
index 3a877a9..46560b6 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -295,6 +295,21 @@ reply is also problematic for error handling of the `NBD_CMD_READ`
 request.  Therefore, structured replies can be used to create a
 a context-free server stream; see below.

+The results of client negotiation also determine whether the client
+and server will utilize only compact requests and replies, or whether
+both sides will use only extended packets.  Compact messages are the
+default, but inherently limit single transactions to a 32-bit window
+starting at a 64-bit offset.  Extended messages make it possible to
+perform 64-bit transactions (although typically only for commands that
+do not include a data payload).  Furthermore, when structured replies
+have been negotiated, compact messages require the client to perform
+partial reads to determine which reply packet style (simple or
+structured) is on the wire before knowing the length of the rest of
+the reply, which can reduce client performance.  With extended
+messages, all packet headers have a fixed length of 32 bytes, and
+although this results in more traffic over the network due to padding,
+the resulting layout is friendlier for performance.
+
 Replies need not be sent in the same order as requests (i.e., requests
 may be handled by the server asynchronously), and structured reply
 chunks from one request may be interleaved with reply messages from
@@ -343,7 +358,9 @@ may be useful.

 #### Request message

-The request message, sent by the client, looks as follows:
+The compact request message, sent by the client when extended
+transactions are not negotiated using `NBD_OPT_EXTENDED_HEADERS`,
+looks as follows:

 C: 32 bits, 0x25609513, magic (`NBD_REQUEST_MAGIC`)  
 C: 16 bits, command flags  
@@ -353,14 +370,26 @@ C: 64 bits, offset (unsigned)
 C: 32 bits, length (unsigned)  
 C: (*length* bytes of data if the request is of type `NBD_CMD_WRITE`)  

+If negotiation agreed on extended transactions with
+`NBD_OPT_EXTENDED_HEADERS`, the client instead uses extended requests:
+
+C: 32 bits, 0x21e41c71, magic (`NBD_REQUEST_EXT_MAGIC`)  
+C: 16 bits, command flags  
+C: 16 bits, type  
+C: 64 bits, handle  
+C: 64 bits, offset (unsigned)  
+C: 64 bits, length (unsigned)  
+C: (*length* bytes of data if the request is of type `NBD_CMD_WRITE`)  
+
 #### Simple reply message

 The simple reply message MUST be sent by the server in response to all
 requests if structured replies have not been negotiated using
-`NBD_OPT_STRUCTURED_REPLY`. If structured replies have been negotiated, a simple
-reply MAY be used as a reply to any request other than `NBD_CMD_READ`,
-but only if the reply has no data payload.  The message looks as
-follows:
+`NBD_OPT_STRUCTURED_REPLY`. If structured replies have been
+negotiated, a simple reply MAY be used as a reply to any request other
+than `NBD_CMD_READ`, but only if the reply has no data payload.  If
+extended headers were not negotiated using `NBD_OPT_EXTENDED_HEADERS`,
+the message looks as follows:

 S: 32 bits, 0x67446698, magic (`NBD_SIMPLE_REPLY_MAGIC`; used to be
    `NBD_REPLY_MAGIC`)  
@@ -369,6 +398,16 @@ S: 64 bits, handle
 S: (*length* bytes of data if the request is of type `NBD_CMD_READ` and
     *error* is zero)  

+If extended headers were negotiated using `NBD_OPT_EXTENDED_HEADERS`,
+the message looks like:
+
+S: 32 bits, 0x60d12fd6, magic (`NBD_SIMPLE_REPLY_EXT_MAGIC`)  
+S: 32 bits, error (MAY be zero)  
+S: 64 bits, handle  
+S: 128 bits, padding (MUST be zero)  
+S: (*length* bytes of data if the request is of type `NBD_CMD_READ` and
+    *error* is zero)  
+
 #### Structured reply chunk message

 Some of the major downsides of the default simple reply to
@@ -410,7 +449,9 @@ considered successful only if it did not contain any error chunks,
 although the client MAY be able to determine partial success based
 on the chunks received.

-A structured reply chunk message looks as follows:
+If extended headers were not negotiated using
+`NBD_OPT_EXTENDED_HEADERS`, a structured reply chunk message looks as
+follows:

 S: 32 bits, 0x668e33ef, magic (`NBD_STRUCTURED_REPLY_MAGIC`)  
 S: 16 bits, flags  
@@ -423,6 +464,17 @@ The use of *length* in the reply allows context-free division of
 the overall server traffic into individual reply messages; the
 *type* field describes how to further interpret the payload.

+If extended headers were negotiated using `NBD_OPT_EXTENDED_HEADERS`,
+the message looks like:
+
+S: 32 bits, 0x6e8a278c, magic (`NBD_STRUCTURED_REPLY_EXT_MAGIC`)  
+S: 16 bits, flags  
+S: 16 bits, type  
+S: 64 bits, handle  
+S: 64 bits, length of payload (unsigned)  
+S: 64 bits, padding (MUST be zero)  
+S: *length* bytes of payload data (if *length* is nonzero)  
+
 #### Terminating the transmission phase

 There are two methods of terminating the transmission phase:
@@ -870,15 +922,19 @@ The procedure works as follows:
   server supports.
 - During transmission, a client can then indicate interest in metadata
   for a given region by way of the `NBD_CMD_BLOCK_STATUS` command,
-  where *offset* and *length* indicate the area of interest. The
-  server MUST then respond with the requested information, for all
+  where *offset* and *length* indicate the area of interest.
+- The server MUST then respond with the requested information, for all
   contexts which were selected during negotiation. For every metadata
-  context, the server sends one set of extent chunks, where the sizes
-  of the extents MUST be less than or equal to the length as specified
-  in the request. Each extent comes with a *flags* field, the
-  semantics of which are defined by the metadata context.
-- A server MUST reply to `NBD_CMD_BLOCK_STATUS` with a structured
-  reply of type `NBD_REPLY_TYPE_BLOCK_STATUS`.
+  context, the server sends one set of extent chunks, using
+  `NBD_REPLY_TYPE_BLOCK_STATUS` or `NBD_REPLY_TYPE_BLOCK_STATUS_EXT`
+  (the latter is only possible if the client also negotiated
+  `NBD_OPT_EXTENDED_HEADERS`).  Each extent comes with a *flags*
+  field, the semantics of which are defined by the metadata context.
+
+The client's requested *length* is only a hint to the server, so the
+summed size of extents in the server's reply may be shorter, or in
+some cases longer, than the original request, and may even differ
+between contexts when multiple metadata contexts were negotiated.

 A client MUST NOT use `NBD_CMD_BLOCK_STATUS` unless it selected a
 nonzero number of metadata contexts during negotiation, and used the
@@ -1179,10 +1235,10 @@ of the newstyle negotiation.

     When this command succeeds, the server MUST NOT preserve any
     negotiation state (such as a request for
-    `NBD_OPT_STRUCTURED_REPLY`, or metadata contexts from
-    `NBD_OPT_SET_META_CONTEXT`) issued before this command.  A client
-    SHOULD defer all stateful option requests until after it
-    determines whether encryption is available.
+    `NBD_OPT_STRUCTURED_REPLY` or `NBD_OPT_EXTENDED_HEADERS`, or
+    metadata contexts from `NBD_OPT_SET_META_CONTEXT`) issued before
+    this command.  A client SHOULD defer all stateful option requests
+    until after it determines whether encryption is available.

     See the section on TLS above for further details.

@@ -1460,6 +1516,26 @@ of the newstyle negotiation.
     option does not select any metadata context, provided the client
     then does not attempt to issue `NBD_CMD_BLOCK_STATUS` commands.

+* `NBD_OPT_EXTENDED_HEADERS` (11)
+
+    The client wishes to use extended headers during the transmission
+    phase.  The client MUST NOT send any additional data with the
+    option, and the server SHOULD reject a request that includes data
+    with `NBD_REP_ERR_INVALID`.
+
+    The server replies with the following, or with an error permitted
+    elsewhere in this document:
+
+    - `NBD_REP_ACK`: Extended headers have been negotiated; the client
+      MUST use the 32-byte extended request header, and the server
+      MUST use the 32-byte extended reply header.
+    - For backwards compatibility, clients SHOULD be prepared to also
+      handle `NBD_REP_ERR_UNSUP`; in this case, only the compact
+      transmission headers will be used.
+
+    If the client requests `NBD_OPT_STARTTLS` after this option, it
+    MUST renegotiate extended headers.
+
 #### Option reply types

 These values are used in the "reply type" field, sent by the server
@@ -1713,12 +1789,12 @@ unrecognized flags.

 #### Structured reply types

-These values are used in the "type" field of a structured reply.
-Some chunk types can additionally be categorized by role, such as
-*error chunks* or *content chunks*.  Each type determines how to
-interpret the "length" bytes of payload.  If the client receives
-an unknown or unexpected type, other than an *error chunk*, it
-MUST initiate a hard disconnect.
+These values are used in the "type" field of a structured reply.  Some
+chunk types can additionally be categorized by role, such as *error
+chunks*, *content chunks*, or *status chunks*.  Each type determines
+how to interpret the "length" bytes of payload.  If the client
+receives an unknown or unexpected type, other than an *error chunk*,
+it MUST initiate a hard disconnect.

 * `NBD_REPLY_TYPE_NONE` (0)

@@ -1761,13 +1837,34 @@ MUST initiate a hard disconnect.
   64 bits: offset (unsigned)  
   32 bits: hole size (unsigned, MUST be nonzero)  

+* `NBD_REPLY_TYPE_OFFSET_HOLE_EXT` (3)
+
+  This chunk type is in the content chunk category.  *length* MUST be
+  exactly 16.  The semantics of this chunk mirror those of
+  `NBD_REPLY_TYPE_OFFSET_HOLE`, other than the use of a larger *hole
+  size* field.  This chunk type MUST NOT be used unless extended
+  headers were negotiated with `NBD_OPT_EXTENDED_HEADERS`.
+
+  The payload is structured as:
+
+  64 bits: offset (unsigned)  
+  64 bits: hole size (unsigned, MUST be nonzero)  
+
+  Note that even when extended headers are in use, a server may
+  enforce a maximum block size smaller than 2^32 bytes, in which case
+  no valid `NBD_CMD_READ` will have a *length* large enough to require
+  the use of this chunk type.  However, a client using extended
+  headers MUST be prepared for the server to use either the compact or
+  extended chunk type.
+
 * `NBD_REPLY_TYPE_BLOCK_STATUS` (5)

-  *length* MUST be 4 + (a positive integer multiple of 8).  This reply
-  represents a series of consecutive block descriptors where the sum
-  of the length fields within the descriptors is subject to further
-  constraints documented below. This chunk type MUST appear
-  exactly once per metadata ID in a structured reply.
+  This chunk type is in the status chunk category.  *length* MUST be
+  4 + (a positive integer multiple of 8).  This reply represents a
+  series of consecutive block descriptors where the sum of the length
+  fields within the descriptors is subject to further constraints
+  documented below.  Each negotiated metadata ID MUST have exactly one
+  status chunk in the overall structured reply.

   The payload starts with:

@@ -1796,9 +1893,36 @@ MUST initiate a hard disconnect.
   information to the client, if looking up the information would be
   too resource-intensive for the server, so long as at least one
   extent is returned. Servers should however be aware that most
-  clients implementations will then simply ask for the next extent
+  client implementations will then simply ask for the next extent
   instead.

+* `NBD_REPLY_TYPE_BLOCK_STATUS_EXT` (6)
+
+  This chunk type is in the status chunk category.  *length* MUST be
+  4 + (a positive multiple of 16).  The semantics of this chunk mirror
+  those of `NBD_REPLY_TYPE_BLOCK_STATUS`, other than the use of a
+  larger *extent length* field, as well as added padding to ease
+  alignment.  This chunk type MUST NOT be used unless extended headers
+  were negotiated with `NBD_OPT_EXTENDED_HEADERS`.
+
+  The payload starts with:
+
+  32 bits, metadata context ID  
+
+  and is followed by a list of one or more descriptors, each with this
+  layout:
+
+  64 bits, length of the extent to which the status below
+     applies (unsigned, MUST be nonzero)  
+  32 bits, status flags  
+  32 bits, padding (MUST be zero)  
+
+  Note that even when extended headers are in use, the client MUST be
+  prepared for the server to use either the compact or extended chunk
+  type, regardless of whether the client's hinted length fits in 32
+  bits, but the server MUST use exactly one of the two chunk types per
+  negotiated metacontext ID.
+
 All error chunk types have bit 15 set, and begin with the same
 *error*, *message length*, and optional *message* fields as
 `NBD_REPLY_TYPE_ERROR`.  If nonzero, *message length* indicates
@@ -1812,7 +1936,10 @@ remaining structured fields at the end.
   be at least 6.  This chunk represents that an error occurred,
   and the client MAY NOT make any assumptions about partial
   success. This type SHOULD NOT be used more than once in a
-  structured reply.  Valid as a reply to any request.
+  structured reply.  Valid as a reply to any request.  Note that
+  *message length* MUST NOT exceed the 4096-byte string length limit,
+  and therefore there is no need for a counterpart extended-length
+  error chunk type.

   The payload is structured as:

@@ -1867,7 +1994,8 @@ The following request types exist:

     If structured replies were not negotiated, then a read request
     MUST always be answered by a simple reply, as documented above
-    (using magic 0x67446698 `NBD_SIMPLE_REPLY_MAGIC`, and containing
+    (using `NBD_SIMPLE_REPLY_MAGIC` or `NBD_SIMPLE_REPLY_EXT_MAGIC`
+    according to whether extended headers are in use, and containing
     length bytes of data according to the client's request).

     If an error occurs, the server SHOULD set the appropriate error code
@@ -1883,7 +2011,8 @@ The following request types exist:

     If structured replies are negotiated, then a read request MUST
     result in a structured reply with one or more chunks (each using
-    magic 0x668e33ef `NBD_STRUCTURED_REPLY_MAGIC`), where the final
+    `NBD_STRUCTURED_REPLY_MAGIC` or `NBD_STRUCTURED_REPLY_EXT_MAGIC`
+    according to whether extended headers are in use), where the final
     chunk has the flag `NBD_REPLY_FLAG_DONE`, and with the following
     additional constraints.

@@ -1897,13 +2026,14 @@ The following request types exist:
     chunks that describe data outside the offset and length of the
     request, but MAY send the content chunks in any order (the client
     MUST reassemble content chunks into the correct order), and MAY
-    send additional content chunks even after reporting an error chunk.
-    Note that a request for more than 2^32 - 8 bytes MUST be split
-    into at least two chunks, so as not to overflow the length field
-    of a reply while still allowing space for the offset of each
-    chunk.  When no error is detected, the server MUST send enough
-    data chunks to cover the entire region described by the offset and
-    length of the client's request.
+    send additional content chunks even after reporting an error
+    chunk.  Note that if extended headers are not in use, a request
+    for more than 2^32 - 8 bytes MUST be split into at least two
+    chunks, so as not to overflow the length field of a reply while
+    still allowing space for the offset of each chunk.  When no error
+    is detected, the server MUST send enough data chunks to cover the
+    entire region described by the offset and length of the client's
+    request.

     To minimize traffic, the server MAY use a content or error chunk
     as the final chunk by setting the `NBD_REPLY_FLAG_DONE` flag, but
@@ -2136,13 +2266,19 @@ The following request types exist:
     server returned at least one metadata context without an error.
     This in turn requires the client to first negotiate structured
     replies. For a successful return, the server MUST use a structured
-    reply, containing exactly one chunk of type
+    reply, containing exactly one status chunk of type
     `NBD_REPLY_TYPE_BLOCK_STATUS` per selected context id, where the
     status field of each descriptor is determined by the flags field
     as defined by the metadata context.  The server MAY send chunks in
     a different order than the context ids were assigned in reply to
     `NBD_OPT_SET_META_CONTEXT`.

+    If extended headers were negotiated via
+    `NBD_OPT_EXTENDED_HEADERS`, the server may use
+    `NBD_REPLY_TYPE_BLOCK_STATUS_EXT` instead of
+    `NBD_REPLY_TYPE_BLOCK_STATUS` as the reply chunk for a metacontext
+    id.
+
     The list of block status descriptors within the
     `NBD_REPLY_TYPE_BLOCK_STATUS` chunk represent consecutive portions
     of the file starting from specified *offset*.  If the client used
-- 
2.33.1




* [PATCH 00/14] qemu patches for NBD_OPT_EXTENDED_HEADERS
  2021-12-03 23:13 RFC for NBD protocol extension: extended headers Eric Blake
  2021-12-03 23:14 ` [PATCH] spec: Add NBD_OPT_EXTENDED_HEADERS Eric Blake
@ 2021-12-03 23:15 ` Eric Blake
  2021-12-03 23:15   ` [PATCH 01/14] nbd/server: Minor cleanups Eric Blake
                     ` (13 more replies)
  2021-12-03 23:17 ` [libnbd PATCH 00/13] libnbd patches for NBD_OPT_EXTENDED_HEADERS Eric Blake
  2 siblings, 14 replies; 46+ messages in thread
From: Eric Blake @ 2021-12-03 23:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: nsoffer, vsementsov, libguestfs, qemu-block, nbd

Available at https://repo.or.cz/qemu/ericb.git/shortlog/refs/tags/exthdr-v1

Patch 14 is optional; I'm including it now because I tested with it,
but I'm also okay with dropping it based on RFC discussion.

Eric Blake (14):
  nbd/server: Minor cleanups
  qemu-io: Utilize 64-bit status during map
  qemu-io: Allow larger write zeroes under no fallback
  nbd/client: Add safety check on chunk payload length
  nbd/server: Prepare for alternate-size headers
  nbd: Prepare for 64-bit requests
  nbd: Add types for extended headers
  nbd/server: Initial support for extended headers
  nbd/server: Support 64-bit block status
  nbd/client: Initial support for extended headers
  nbd/client: Accept 64-bit hole chunks
  nbd/client: Accept 64-bit block status chunks
  nbd/client: Request extended headers during negotiation
  do not apply: nbd/server: Send 64-bit hole chunk

 docs/interop/nbd.txt                          |   1 +
 include/block/nbd.h                           |  94 +++++--
 nbd/nbd-internal.h                            |   8 +-
 block/nbd.c                                   | 102 +++++--
 nbd/client-connection.c                       |   1 +
 nbd/client.c                                  | 150 +++++++---
 nbd/common.c                                  |  10 +-
 nbd/server.c                                  | 262 +++++++++++++-----
 qemu-io-cmds.c                                |  16 +-
 qemu-nbd.c                                    |   2 +
 block/trace-events                            |   1 +
 nbd/trace-events                              |   9 +-
 tests/qemu-iotests/223.out                    |   4 +
 tests/qemu-iotests/233.out                    |   1 +
 tests/qemu-iotests/241                        |   8 +-
 tests/qemu-iotests/307                        |   2 +-
 tests/qemu-iotests/307.out                    |   5 +
 .../tests/nbd-qemu-allocation.out             |   1 +
 18 files changed, 486 insertions(+), 191 deletions(-)

-- 
2.33.1




* [PATCH 01/14] nbd/server: Minor cleanups
  2021-12-03 23:15 ` [PATCH 00/14] qemu patches for NBD_OPT_EXTENDED_HEADERS Eric Blake
@ 2021-12-03 23:15   ` Eric Blake
  2021-12-06 12:03     ` Vladimir Sementsov-Ogievskiy
  2021-12-03 23:15   ` [PATCH 02/14] qemu-io: Utilize 64-bit status during map Eric Blake
                     ` (12 subsequent siblings)
  13 siblings, 1 reply; 46+ messages in thread
From: Eric Blake @ 2021-12-03 23:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: nsoffer, vsementsov, libguestfs, qemu-block, nbd

Spelling fixes, grammar improvements and consistent spacing, noticed
while preparing other patches in this file.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 nbd/server.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 4630dd732250..f302e1cbb03e 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2085,11 +2085,10 @@ static void nbd_extent_array_convert_to_be(NBDExtentArray *ea)
  * Add extent to NBDExtentArray. If extent can't be added (no available space),
  * return -1.
  * For safety, when returning -1 for the first time, .can_add is set to false,
- * further call to nbd_extent_array_add() will crash.
- * (to avoid the situation, when after failing to add an extent (returned -1),
- * user miss this failure and add another extent, which is successfully added
- * (array is full, but new extent may be squashed into the last one), then we
- * have invalid array with skipped extent)
+ * and further calls to nbd_extent_array_add() will crash.
+ * (this avoids the situation where a caller ignores failure to add one extent,
+ * where adding another extent that would squash into the last array entry
+ * would result in an incorrect range reported to the client)
  */
 static int nbd_extent_array_add(NBDExtentArray *ea,
                                 uint32_t length, uint32_t flags)
@@ -2288,7 +2287,7 @@ static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request,
     assert(client->recv_coroutine == qemu_coroutine_self());
     ret = nbd_receive_request(client, request, errp);
     if (ret < 0) {
-        return  ret;
+        return ret;
     }

     trace_nbd_co_receive_request_decode_type(request->handle, request->type,
@@ -2648,7 +2647,7 @@ static coroutine_fn void nbd_trip(void *opaque)
     }

     if (ret < 0) {
-        /* It wans't -EIO, so, according to nbd_co_receive_request()
+        /* It wasn't -EIO, so, according to nbd_co_receive_request()
          * semantics, we should return the error to the client. */
         Error *export_err = local_err;

-- 
2.33.1




* [PATCH 02/14] qemu-io: Utilize 64-bit status during map
  2021-12-03 23:15 ` [PATCH 00/14] qemu patches for NBD_OPT_EXTENDED_HEADERS Eric Blake
  2021-12-03 23:15   ` [PATCH 01/14] nbd/server: Minor cleanups Eric Blake
@ 2021-12-03 23:15   ` Eric Blake
  2021-12-06 12:06     ` Vladimir Sementsov-Ogievskiy
  2021-12-03 23:15   ` [PATCH 03/14] qemu-io: Allow larger write zeroes under no fallback Eric Blake
                     ` (11 subsequent siblings)
  13 siblings, 1 reply; 46+ messages in thread
From: Eric Blake @ 2021-12-03 23:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, vsementsov, qemu-block, nbd, nsoffer, Hanna Reitz,
	libguestfs

The block layer has supported 64-bit block status from drivers since
commit 86a3d5c688 ("block: Add .bdrv_co_block_status() callback",
v2.12) and friends, with individual driver callbacks responsible for
capping things where necessary.  Artificially capping things below 2G
in the qemu-io 'map' command, added in commit d6a644bbfe ("block: Make
bdrv_is_allocated() byte-based", v2.10), is thus no longer necessary.

One way to test this is with qemu-nbd as server on a raw file larger
than 4G (the entire file should show as allocated), plus 'qemu-io -f
raw -c map nbd://localhost --trace=nbd_\*' as client.  Prior to this
patch, the NBD_CMD_BLOCK_STATUS requests are fragmented at 0x7ffffe00
distances; with this patch, the fragmenting changes to 0x7fffffff
(since the NBD protocol is currently still limited to 32-bit
transactions - see block/nbd.c:nbd_client_co_block_status).  Then in
later patches, once I add an NBD extension for a 64-bit block status,
the same map command completes with just one NBD_CMD_BLOCK_STATUS.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
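
To make the fragmenting arithmetic above concrete, here is a small
sketch, assuming every reply covers the full requested length (as
happens for a fully allocated raw file).  requests_needed is a
hypothetical helper written for this note, not a qemu function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of NBD_CMD_BLOCK_STATUS requests needed to walk 'total'
 * bytes when each request covers at most 'cap' bytes (rounding up).
 */
static uint64_t requests_needed(uint64_t total, uint64_t cap)
{
    assert(cap > 0);
    return total / cap + (total % cap != 0);
}
```

Under that assumption, a 5 GiB image needs 3 requests with a cap of
either 0x7ffffe00 or 0x7fffffff, and 1 request once the cap is lifted
to 64 bits.
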
 qemu-io-cmds.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index 46593d632d8f..954955c12fb9 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -1993,11 +1993,9 @@ static int map_is_allocated(BlockDriverState *bs, int64_t offset,
                             int64_t bytes, int64_t *pnum)
 {
     int64_t num;
-    int num_checked;
     int ret, firstret;

-    num_checked = MIN(bytes, BDRV_REQUEST_MAX_BYTES);
-    ret = bdrv_is_allocated(bs, offset, num_checked, &num);
+    ret = bdrv_is_allocated(bs, offset, bytes, &num);
     if (ret < 0) {
         return ret;
     }
@@ -2009,8 +2007,7 @@ static int map_is_allocated(BlockDriverState *bs, int64_t offset,
         offset += num;
         bytes -= num;

-        num_checked = MIN(bytes, BDRV_REQUEST_MAX_BYTES);
-        ret = bdrv_is_allocated(bs, offset, num_checked, &num);
+        ret = bdrv_is_allocated(bs, offset, bytes, &num);
         if (ret == firstret && num) {
             *pnum += num;
         } else {
-- 
2.33.1




* [PATCH 03/14] qemu-io: Allow larger write zeroes under no fallback
  2021-12-03 23:15 ` [PATCH 00/14] qemu patches for NBD_OPT_EXTENDED_HEADERS Eric Blake
  2021-12-03 23:15   ` [PATCH 01/14] nbd/server: Minor cleanups Eric Blake
  2021-12-03 23:15   ` [PATCH 02/14] qemu-io: Utilize 64-bit status during map Eric Blake
@ 2021-12-03 23:15   ` Eric Blake
  2021-12-06 12:26     ` Vladimir Sementsov-Ogievskiy
  2021-12-03 23:15   ` [PATCH 04/14] nbd/client: Add safety check on chunk payload length Eric Blake
                     ` (10 subsequent siblings)
  13 siblings, 1 reply; 46+ messages in thread
From: Eric Blake @ 2021-12-03 23:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, vsementsov, qemu-block, nbd, nsoffer, Hanna Reitz,
	libguestfs

When writing zeroes can fall back to a slow write, permitting an
overly large request enables an amplification denial of service
attack, where a small request triggers a large amount of work.  But
the whole point of the no fallback flag is to quickly determine if
writing an entire device to zero can be done quickly (such as when it
is already known that the device started with zero contents); in those
cases, artificially capping things at 2G in qemu-io itself doesn't
help us.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 qemu-io-cmds.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index 954955c12fb9..45a957093369 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -603,10 +603,6 @@ static int do_co_pwrite_zeroes(BlockBackend *blk, int64_t offset,
         .done   = false,
     };

-    if (bytes > INT_MAX) {
-        return -ERANGE;
-    }
-
     co = qemu_coroutine_create(co_pwrite_zeroes_entry, &data);
     bdrv_coroutine_enter(blk_bs(blk), co);
     while (!data.done) {
@@ -1160,8 +1156,9 @@ static int write_f(BlockBackend *blk, int argc, char **argv)
     if (count < 0) {
         print_cvtnum_err(count, argv[optind]);
         return count;
-    } else if (count > BDRV_REQUEST_MAX_BYTES) {
-        printf("length cannot exceed %" PRIu64 ", given %s\n",
+    } else if (count > BDRV_REQUEST_MAX_BYTES &&
+               !(flags & BDRV_REQ_NO_FALLBACK)) {
+        printf("length cannot exceed %" PRIu64 " without -n, given %s\n",
                (uint64_t)BDRV_REQUEST_MAX_BYTES, argv[optind]);
         return -EINVAL;
     }
-- 
2.33.1




* [PATCH 04/14] nbd/client: Add safety check on chunk payload length
  2021-12-03 23:15 ` [PATCH 00/14] qemu patches for NBD_OPT_EXTENDED_HEADERS Eric Blake
                     ` (2 preceding siblings ...)
  2021-12-03 23:15   ` [PATCH 03/14] qemu-io: Allow larger write zeroes under no fallback Eric Blake
@ 2021-12-03 23:15   ` Eric Blake
  2021-12-06 12:33     ` Vladimir Sementsov-Ogievskiy
  2021-12-03 23:15   ` [PATCH 05/14] nbd/server: Prepare for alternate-size headers Eric Blake
                     ` (9 subsequent siblings)
  13 siblings, 1 reply; 46+ messages in thread
From: Eric Blake @ 2021-12-03 23:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: nsoffer, vsementsov, libguestfs, qemu-block, nbd

Our existing use of structured replies either reads into a qiov capped
at 32M (NBD_CMD_READ) or caps allocation to 1000 bytes (see
NBD_MAX_MALLOC_PAYLOAD in block/nbd.c).  But the existing length
checks are rather late; if we encounter a buggy (or malicious) server
that sends a super-large payload length, we should drop the connection
right then rather than assuming the layer on top will be careful.
This becomes more important once we permit 64-bit lengths, which offer
even more potential for attempted denial-of-service abuse.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
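Not part of the commit message: a standalone sketch of the early
payload-length check, decoding the big-endian length from the wire and
rejecting it before any upper layer sees it.  The names and the two
constants are assumptions for illustration only (32M mirrors the READ
cap mentioned above; 28 stands in for sizeof(NBDStructuredReadData) at
this point in the series).

```c
#include <errno.h>
#include <stdint.h>

/* Assumed values for this sketch only */
#define SKETCH_MAX_BUFFER_SIZE  (32u * 1024 * 1024)  /* READ cap of 32M */
#define SKETCH_READ_DATA_HDR    28u  /* stand-in for the read-data header */

/* Decode a big-endian 32-bit field as received from the wire */
static uint32_t sketch_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Return 0 if the advertised chunk payload length is plausible, or
 * -EINVAL if the caller should drop the connection right away.
 */
static int sketch_check_chunk_length(const uint8_t *wire_len)
{
    uint32_t length = sketch_be32(wire_len);

    if (length > SKETCH_MAX_BUFFER_SIZE + SKETCH_READ_DATA_HDR) {
        return -EINVAL;
    }
    return 0;
}
```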
 nbd/client.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/nbd/client.c b/nbd/client.c
index 30d5383cb195..8f137c2320bb 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -1412,6 +1412,18 @@ static int nbd_receive_structured_reply_chunk(QIOChannel *ioc,
     chunk->handle = be64_to_cpu(chunk->handle);
     chunk->length = be32_to_cpu(chunk->length);

+    /*
+     * Because we use BLOCK_STATUS with REQ_ONE, and cap READ requests
+     * at 32M, no valid server should send us payload larger than
+     * this.  Even if we stopped using REQ_ONE, sane servers will cap
+     * the number of extents they return for block status.
+     */
+    if (chunk->length > NBD_MAX_BUFFER_SIZE + sizeof(NBDStructuredReadData)) {
+        error_setg(errp, "server chunk %" PRIu32 " (%s) payload is too long",
+                   chunk->type, nbd_rep_lookup(chunk->type));
+        return -EINVAL;
+    }
+
     return 0;
 }

-- 
2.33.1




* [PATCH 05/14] nbd/server: Prepare for alternate-size headers
  2021-12-03 23:15 ` [PATCH 00/14] qemu patches for NBD_OPT_EXTENDED_HEADERS Eric Blake
                     ` (3 preceding siblings ...)
  2021-12-03 23:15   ` [PATCH 04/14] nbd/client: Add safety check on chunk payload length Eric Blake
@ 2021-12-03 23:15   ` Eric Blake
  2021-12-03 23:15   ` [PATCH 06/14] nbd: Prepare for 64-bit requests Eric Blake
                     ` (8 subsequent siblings)
  13 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2021-12-03 23:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, vsementsov, qemu-block, nbd, nsoffer, Hanna Reitz,
	libguestfs

An upcoming NBD extension wants to add the ability to do 64-bit
requests.  As part of that extension, the size of the reply headers
will change in order to permit a 64-bit length in the reply for
symmetry [*].  Additionally, the reply header is currently 16 bytes for
a simple reply and 20 bytes for a structured reply; with the extension
enabled, both reply header types will be 32 bytes.  Since we
are already wired up to use iovecs, it is easiest to allow for this
change in header size by splitting each structured reply across two
iovecs, one for the header (which will become variable-length in a
future patch according to client negotiation), and the other for the
payload, and removing the header from the payload struct definitions.
Interestingly, the client side code never utilized the packed types,
so only the server code needs to be updated.

[*] Note that on the surface, this is because some server might permit
a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction.  But even though the extended reply length is widened to
64 bits, we will still never send a reply payload larger than just
over 32M (the maximum buffer we allow in NBD_CMD_READ; and we cap the
number of extents we are willing to report in NBD_CMD_BLOCK_STATUS).
Where 64-bit fields really matter in the extension is in a later patch
adding 64-bit support into a counterpart for REPLY_TYPE_BLOCK_STATUS.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
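Not part of the commit message: the header/payload split can be
sketched outside of qemu roughly as follows.  All sketch_* names are
hypothetical; the point is only that the helper which fills in the
header also sets iov_len, so callers no longer bake the header size
into the payload struct.  The 20-byte layout is the current structured
reply header; the 32-byte buffer is an assumption that leaves room for
the larger header to come.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

#define SKETCH_CHUNK_HDR_LEN           20
#define SKETCH_STRUCTURED_REPLY_MAGIC  0x668e33efu

/* Store the low @bytes bytes of @v at @p in big-endian order */
static void sketch_put_be(uint8_t *p, uint64_t v, int bytes)
{
    for (int i = bytes - 1; i >= 0; i--) {
        p[i] = v & 0xff;
        v >>= 8;
    }
}

/* Fill the header iovec; the helper, not the caller, decides iov_len */
static void sketch_set_be_chunk(struct iovec *iov, uint16_t flags,
                                uint16_t type, uint64_t handle,
                                uint32_t length)
{
    uint8_t *hdr = iov->iov_base;

    iov->iov_len = SKETCH_CHUNK_HDR_LEN;
    sketch_put_be(hdr, SKETCH_STRUCTURED_REPLY_MAGIC, 4);
    sketch_put_be(hdr + 4, flags, 2);
    sketch_put_be(hdr + 6, type, 2);
    sketch_put_be(hdr + 8, handle, 8);
    sketch_put_be(hdr + 16, length, 4);
}

/* Build header + payload iovecs; returns total bytes to send */
static size_t sketch_build_chunk(struct iovec iov[2], uint8_t hdr_buf[32],
                                 void *payload, size_t payload_len,
                                 uint16_t type, uint64_t handle)
{
    iov[0].iov_base = hdr_buf;       /* sized for the largest header */
    iov[1].iov_base = payload;
    iov[1].iov_len = payload_len;
    sketch_set_be_chunk(&iov[0], 0, type, handle, (uint32_t)payload_len);
    return iov[0].iov_len + iov[1].iov_len;
}
```

Because only sketch_set_be_chunk() knows the header length, switching
to a different header size later would touch that one helper rather
than every caller.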
 include/block/nbd.h | 10 +++----
 nbd/server.c        | 64 ++++++++++++++++++++++++++++-----------------
 2 files changed, 45 insertions(+), 29 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 78d101b77488..3d0689b69367 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -1,5 +1,5 @@
 /*
- *  Copyright (C) 2016-2020 Red Hat, Inc.
+ *  Copyright (C) 2016-2021 Red Hat, Inc.
  *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
  *
  *  Network Block Device
@@ -95,28 +95,28 @@ typedef union NBDReply {

 /* Header of chunk for NBD_REPLY_TYPE_OFFSET_DATA */
 typedef struct NBDStructuredReadData {
-    NBDStructuredReplyChunk h; /* h.length >= 9 */
+    /* header's length >= 9 */
     uint64_t offset;
     /* At least one byte of data payload follows, calculated from h.length */
 } QEMU_PACKED NBDStructuredReadData;

 /* Complete chunk for NBD_REPLY_TYPE_OFFSET_HOLE */
 typedef struct NBDStructuredReadHole {
-    NBDStructuredReplyChunk h; /* h.length == 12 */
+    /* header's length == 12 */
     uint64_t offset;
     uint32_t length;
 } QEMU_PACKED NBDStructuredReadHole;

 /* Header of all NBD_REPLY_TYPE_ERROR* errors */
 typedef struct NBDStructuredError {
-    NBDStructuredReplyChunk h; /* h.length >= 6 */
+    /* header's length >= 6 */
     uint32_t error;
     uint16_t message_length;
 } QEMU_PACKED NBDStructuredError;

 /* Header of NBD_REPLY_TYPE_BLOCK_STATUS */
 typedef struct NBDStructuredMeta {
-    NBDStructuredReplyChunk h; /* h.length >= 12 (at least one extent) */
+    /* header's length >= 12 (at least one extent) */
     uint32_t context_id;
     /* extents follows */
 } QEMU_PACKED NBDStructuredMeta;
diff --git a/nbd/server.c b/nbd/server.c
index f302e1cbb03e..64845542fd6b 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1869,9 +1869,12 @@ static int coroutine_fn nbd_co_send_iov(NBDClient *client, struct iovec *iov,
     return ret;
 }

-static inline void set_be_simple_reply(NBDSimpleReply *reply, uint64_t error,
-                                       uint64_t handle)
+static inline void set_be_simple_reply(NBDClient *client, struct iovec *iov,
+                                       uint64_t error, uint64_t handle)
 {
+    NBDSimpleReply *reply = iov->iov_base;
+
+    iov->iov_len = sizeof(*reply);
     stl_be_p(&reply->magic, NBD_SIMPLE_REPLY_MAGIC);
     stl_be_p(&reply->error, error);
     stq_be_p(&reply->handle, handle);
@@ -1884,23 +1887,27 @@ static int nbd_co_send_simple_reply(NBDClient *client,
                                     size_t len,
                                     Error **errp)
 {
-    NBDSimpleReply reply;
+    NBDReply hdr;
     int nbd_err = system_errno_to_nbd_errno(error);
     struct iovec iov[] = {
-        {.iov_base = &reply, .iov_len = sizeof(reply)},
+        {.iov_base = &hdr},
         {.iov_base = data, .iov_len = len}
     };

     trace_nbd_co_send_simple_reply(handle, nbd_err, nbd_err_lookup(nbd_err),
                                    len);
-    set_be_simple_reply(&reply, nbd_err, handle);
+    set_be_simple_reply(client, &iov[0], nbd_err, handle);

     return nbd_co_send_iov(client, iov, len ? 2 : 1, errp);
 }

-static inline void set_be_chunk(NBDStructuredReplyChunk *chunk, uint16_t flags,
-                                uint16_t type, uint64_t handle, uint32_t length)
+static inline void set_be_chunk(NBDClient *client, struct iovec *iov,
+                                uint16_t flags, uint16_t type,
+                                uint64_t handle, uint32_t length)
 {
+    NBDStructuredReplyChunk *chunk = iov->iov_base;
+
+    iov->iov_len = sizeof(*chunk);
     stl_be_p(&chunk->magic, NBD_STRUCTURED_REPLY_MAGIC);
     stw_be_p(&chunk->flags, flags);
     stw_be_p(&chunk->type, type);
@@ -1912,13 +1919,14 @@ static int coroutine_fn nbd_co_send_structured_done(NBDClient *client,
                                                     uint64_t handle,
                                                     Error **errp)
 {
-    NBDStructuredReplyChunk chunk;
+    NBDReply hdr;
     struct iovec iov[] = {
-        {.iov_base = &chunk, .iov_len = sizeof(chunk)},
+        {.iov_base = &hdr},
     };

     trace_nbd_co_send_structured_done(handle);
-    set_be_chunk(&chunk, NBD_REPLY_FLAG_DONE, NBD_REPLY_TYPE_NONE, handle, 0);
+    set_be_chunk(client, &iov[0], NBD_REPLY_FLAG_DONE,
+                 NBD_REPLY_TYPE_NONE, handle, 0);

     return nbd_co_send_iov(client, iov, 1, errp);
 }
@@ -1931,20 +1939,21 @@ static int coroutine_fn nbd_co_send_structured_read(NBDClient *client,
                                                     bool final,
                                                     Error **errp)
 {
+    NBDReply hdr;
     NBDStructuredReadData chunk;
     struct iovec iov[] = {
+        {.iov_base = &hdr},
         {.iov_base = &chunk, .iov_len = sizeof(chunk)},
         {.iov_base = data, .iov_len = size}
     };

     assert(size);
     trace_nbd_co_send_structured_read(handle, offset, data, size);
-    set_be_chunk(&chunk.h, final ? NBD_REPLY_FLAG_DONE : 0,
-                 NBD_REPLY_TYPE_OFFSET_DATA, handle,
-                 sizeof(chunk) - sizeof(chunk.h) + size);
+    set_be_chunk(client, &iov[0], final ? NBD_REPLY_FLAG_DONE : 0,
+                 NBD_REPLY_TYPE_OFFSET_DATA, handle, iov[1].iov_len + size);
     stq_be_p(&chunk.offset, offset);

-    return nbd_co_send_iov(client, iov, 2, errp);
+    return nbd_co_send_iov(client, iov, 3, errp);
 }

 static int coroutine_fn nbd_co_send_structured_error(NBDClient *client,
@@ -1953,9 +1962,11 @@ static int coroutine_fn nbd_co_send_structured_error(NBDClient *client,
                                                      const char *msg,
                                                      Error **errp)
 {
+    NBDReply hdr;
     NBDStructuredError chunk;
     int nbd_err = system_errno_to_nbd_errno(error);
     struct iovec iov[] = {
+        {.iov_base = &hdr},
         {.iov_base = &chunk, .iov_len = sizeof(chunk)},
         {.iov_base = (char *)msg, .iov_len = msg ? strlen(msg) : 0},
     };
@@ -1963,12 +1974,12 @@ static int coroutine_fn nbd_co_send_structured_error(NBDClient *client,
     assert(nbd_err);
     trace_nbd_co_send_structured_error(handle, nbd_err,
                                        nbd_err_lookup(nbd_err), msg ? msg : "");
-    set_be_chunk(&chunk.h, NBD_REPLY_FLAG_DONE, NBD_REPLY_TYPE_ERROR, handle,
-                 sizeof(chunk) - sizeof(chunk.h) + iov[1].iov_len);
+    set_be_chunk(client, &iov[0], NBD_REPLY_FLAG_DONE,
+                 NBD_REPLY_TYPE_ERROR, handle, iov[1].iov_len + iov[2].iov_len);
     stl_be_p(&chunk.error, nbd_err);
-    stw_be_p(&chunk.message_length, iov[1].iov_len);
+    stw_be_p(&chunk.message_length, iov[2].iov_len);

-    return nbd_co_send_iov(client, iov, 1 + !!iov[1].iov_len, errp);
+    return nbd_co_send_iov(client, iov, 2 + !!iov[2].iov_len, errp);
 }

 /* Do a sparse read and send the structured reply to the client.
@@ -2006,19 +2017,22 @@ static int coroutine_fn nbd_co_send_sparse_read(NBDClient *client,
         assert(pnum && pnum <= size - progress);
         final = progress + pnum == size;
         if (status & BDRV_BLOCK_ZERO) {
+            NBDReply hdr;
             NBDStructuredReadHole chunk;
             struct iovec iov[] = {
+                {.iov_base = &hdr},
                 {.iov_base = &chunk, .iov_len = sizeof(chunk)},
             };

             trace_nbd_co_send_structured_read_hole(handle, offset + progress,
                                                    pnum);
-            set_be_chunk(&chunk.h, final ? NBD_REPLY_FLAG_DONE : 0,
+            set_be_chunk(client, &iov[0],
+                         final ? NBD_REPLY_FLAG_DONE : 0,
                          NBD_REPLY_TYPE_OFFSET_HOLE,
-                         handle, sizeof(chunk) - sizeof(chunk.h));
+                         handle, iov[1].iov_len);
             stq_be_p(&chunk.offset, offset + progress);
             stl_be_p(&chunk.length, pnum);
-            ret = nbd_co_send_iov(client, iov, 1, errp);
+            ret = nbd_co_send_iov(client, iov, 2, errp);
         } else {
             ret = blk_pread(exp->common.blk, offset + progress,
                             data + progress, pnum);
@@ -2182,8 +2196,10 @@ static int nbd_co_send_extents(NBDClient *client, uint64_t handle,
                                NBDExtentArray *ea,
                                bool last, uint32_t context_id, Error **errp)
 {
+    NBDReply hdr;
     NBDStructuredMeta chunk;
     struct iovec iov[] = {
+        {.iov_base = &hdr},
         {.iov_base = &chunk, .iov_len = sizeof(chunk)},
         {.iov_base = ea->extents, .iov_len = ea->count * sizeof(ea->extents[0])}
     };
@@ -2192,12 +2208,12 @@ static int nbd_co_send_extents(NBDClient *client, uint64_t handle,

     trace_nbd_co_send_extents(handle, ea->count, context_id, ea->total_length,
                               last);
-    set_be_chunk(&chunk.h, last ? NBD_REPLY_FLAG_DONE : 0,
+    set_be_chunk(client, &iov[0], last ? NBD_REPLY_FLAG_DONE : 0,
                  NBD_REPLY_TYPE_BLOCK_STATUS,
-                 handle, sizeof(chunk) - sizeof(chunk.h) + iov[1].iov_len);
+                 handle, iov[1].iov_len + iov[2].iov_len);
     stl_be_p(&chunk.context_id, context_id);

-    return nbd_co_send_iov(client, iov, 2, errp);
+    return nbd_co_send_iov(client, iov, 3, errp);
 }

 /* Get block status from the exported device and send it to the client */
-- 
2.33.1




* [PATCH 06/14] nbd: Prepare for 64-bit requests
  2021-12-03 23:15 ` [PATCH 00/14] qemu patches for NBD_OPT_EXTENDED_HEADERS Eric Blake
                     ` (4 preceding siblings ...)
  2021-12-03 23:15   ` [PATCH 05/14] nbd/server: Prepare for alternate-size headers Eric Blake
@ 2021-12-03 23:15   ` Eric Blake
  2021-12-03 23:15   ` [PATCH 07/14] nbd: Add types for extended headers Eric Blake
                     ` (7 subsequent siblings)
  13 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2021-12-03 23:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, vsementsov, qemu-block, nbd, nsoffer, Hanna Reitz,
	libguestfs

Widen the length field of NBDRequest to 64 bits, although we can
assert that all current uses still fit in 32 bits.  Move the
request magic number to nbd.h, to live alongside the reply magic
number.  Add a bool that will eventually track whether the client
successfully negotiated extended headers with the server, allowing the
nbd driver to pass larger requests along where possible; in this patch
it always remains false, so there is no semantic change yet.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
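Not part of the commit message: a self-contained sketch of what the
widened in-memory length means for the existing 28-byte wire format.
SketchRequest and the sketch_* helpers are hypothetical; the field
offsets follow the ones the server decodes (flags at 4, type at 6,
handle at 8, from at 16, len at 24).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_REQUEST_MAGIC 0x25609513u
#define SKETCH_REQUEST_SIZE  28   /* 4 + 2 + 2 + 8 + 8 + 4 */

typedef struct SketchRequest {
    uint64_t handle;
    uint64_t from;
    uint64_t len;    /* must fit 32 bits unless extended headers negotiated */
    uint16_t flags;
    uint16_t type;
} SketchRequest;

/* Store the low @bytes bytes of @v at @p in big-endian order */
static void sketch_put_be(uint8_t *p, uint64_t v, int bytes)
{
    for (int i = bytes - 1; i >= 0; i--) {
        p[i] = v & 0xff;
        v >>= 8;
    }
}

/* Encode a compact (non-extended) request; returns bytes written */
static int sketch_encode_request(uint8_t buf[SKETCH_REQUEST_SIZE],
                                 const SketchRequest *req, bool ext_hdr)
{
    assert(!ext_hdr);                /* extended layout not sketched here */
    assert(req->len <= UINT32_MAX);  /* 64-bit field, 32-bit wire slot */

    sketch_put_be(buf, SKETCH_REQUEST_MAGIC, 4);
    sketch_put_be(buf + 4, req->flags, 2);
    sketch_put_be(buf + 6, req->type, 2);
    sketch_put_be(buf + 8, req->handle, 8);
    sketch_put_be(buf + 16, req->from, 8);
    sketch_put_be(buf + 24, req->len, 4);
    return SKETCH_REQUEST_SIZE;
}

/* Decode the 32-bit wire length, widening it to 64 bits */
static uint64_t sketch_decode_len(const uint8_t *buf)
{
    return ((uint64_t)buf[24] << 24) | ((uint64_t)buf[25] << 16) |
           ((uint64_t)buf[26] << 8) | (uint64_t)buf[27];
}
```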
 include/block/nbd.h | 19 +++++++++++--------
 nbd/nbd-internal.h  |  3 +--
 block/nbd.c         | 35 ++++++++++++++++++++++++-----------
 nbd/client.c        |  8 +++++---
 nbd/server.c        | 11 ++++++++---
 nbd/trace-events    |  8 ++++----
 6 files changed, 53 insertions(+), 31 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 3d0689b69367..732314aaba11 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -52,17 +52,16 @@ typedef struct NBDOptionReplyMetaContext {

 /* Transmission phase structs
  *
- * Note: these are _NOT_ the same as the network representation of an NBD
- * request and reply!
+ * Note: NBDRequest is _NOT_ the same as the network representation of an NBD
+ * request!
  */
-struct NBDRequest {
+typedef struct NBDRequest {
     uint64_t handle;
     uint64_t from;
-    uint32_t len;
+    uint64_t len;   /* Must fit 32 bits unless extended headers negotiated */
     uint16_t flags; /* NBD_CMD_FLAG_* */
-    uint16_t type; /* NBD_CMD_* */
-};
-typedef struct NBDRequest NBDRequest;
+    uint16_t type;  /* NBD_CMD_* */
+} NBDRequest;

 typedef struct NBDSimpleReply {
     uint32_t magic;  /* NBD_SIMPLE_REPLY_MAGIC */
@@ -235,6 +234,9 @@ enum {
  */
 #define NBD_MAX_STRING_SIZE 4096

+/* Transmission request structure */
+#define NBD_REQUEST_MAGIC           0x25609513
+
 /* Two types of reply structures */
 #define NBD_SIMPLE_REPLY_MAGIC      0x67446698
 #define NBD_STRUCTURED_REPLY_MAGIC  0x668e33ef
@@ -293,6 +295,7 @@ struct NBDExportInfo {
     /* In-out fields, set by client before nbd_receive_negotiate() and
      * updated by server results during nbd_receive_negotiate() */
     bool structured_reply;
+    bool extended_headers;
     bool base_allocation; /* base:allocation context for NBD_CMD_BLOCK_STATUS */

     /* Set by server results during nbd_receive_negotiate() and
@@ -322,7 +325,7 @@ int nbd_receive_export_list(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
                             Error **errp);
 int nbd_init(int fd, QIOChannelSocket *sioc, NBDExportInfo *info,
              Error **errp);
-int nbd_send_request(QIOChannel *ioc, NBDRequest *request);
+int nbd_send_request(QIOChannel *ioc, NBDRequest *request, bool ext_hdr);
 int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc,
                                    NBDReply *reply, Error **errp);
 int nbd_client(int fd);
diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index 1b2141ab4b2d..0016793ff4b1 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -1,7 +1,7 @@
 /*
  * NBD Internal Declarations
  *
- * Copyright (C) 2016 Red Hat, Inc.
+ * Copyright (C) 2016-2021 Red Hat, Inc.
  *
  * This work is licensed under the terms of the GNU GPL, version 2 or later.
  * See the COPYING file in the top-level directory.
@@ -45,7 +45,6 @@
 #define NBD_OLDSTYLE_NEGOTIATE_SIZE (8 + 8 + 8 + 4 + 124)

 #define NBD_INIT_MAGIC              0x4e42444d41474943LL /* ASCII "NBDMAGIC" */
-#define NBD_REQUEST_MAGIC           0x25609513
 #define NBD_OPTS_MAGIC              0x49484156454F5054LL /* ASCII "IHAVEOPT" */
 #define NBD_CLIENT_MAGIC            0x0000420281861253LL
 #define NBD_REP_MAGIC               0x0003e889045565a9LL
diff --git a/block/nbd.c b/block/nbd.c
index 5ef462db1b7f..3e9875241bec 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -2,7 +2,7 @@
  * QEMU Block driver for  NBD
  *
  * Copyright (c) 2019 Virtuozzo International GmbH.
- * Copyright (C) 2016 Red Hat, Inc.
+ * Copyright (C) 2016-2021 Red Hat, Inc.
  * Copyright (C) 2008 Bull S.A.S.
  *     Author: Laurent Vivier <Laurent.Vivier@bull.net>
  *
@@ -300,7 +300,7 @@ int coroutine_fn nbd_co_do_establish_connection(BlockDriverState *bs,
          */
         NBDRequest request = { .type = NBD_CMD_DISC };

-        nbd_send_request(s->ioc, &request);
+        nbd_send_request(s->ioc, &request, s->info.extended_headers);

         yank_unregister_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name),
                                  nbd_yank, bs);
@@ -470,7 +470,7 @@ static int nbd_co_send_request(BlockDriverState *bs,

     if (qiov) {
         qio_channel_set_cork(s->ioc, true);
-        rc = nbd_send_request(s->ioc, request);
+        rc = nbd_send_request(s->ioc, request, s->info.extended_headers);
         if (nbd_client_connected(s) && rc >= 0) {
             if (qio_channel_writev_all(s->ioc, qiov->iov, qiov->niov,
                                        NULL) < 0) {
@@ -481,7 +481,7 @@ static int nbd_co_send_request(BlockDriverState *bs,
         }
         qio_channel_set_cork(s->ioc, false);
     } else {
-        rc = nbd_send_request(s->ioc, request);
+        rc = nbd_send_request(s->ioc, request, s->info.extended_headers);
     }

 err:
@@ -1252,10 +1252,11 @@ static int nbd_client_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
     NBDRequest request = {
         .type = NBD_CMD_WRITE_ZEROES,
         .from = offset,
-        .len = bytes,  /* .len is uint32_t actually */
+        .len = bytes,
     };

-    assert(bytes <= UINT32_MAX); /* rely on max_pwrite_zeroes */
+    /* rely on max_pwrite_zeroes */
+    assert(bytes <= UINT32_MAX || s->info.extended_headers);

     assert(!(s->info.flags & NBD_FLAG_READ_ONLY));
     if (!(s->info.flags & NBD_FLAG_SEND_WRITE_ZEROES)) {
@@ -1302,10 +1303,11 @@ static int nbd_client_co_pdiscard(BlockDriverState *bs, int64_t offset,
     NBDRequest request = {
         .type = NBD_CMD_TRIM,
         .from = offset,
-        .len = bytes, /* len is uint32_t */
+        .len = bytes,
     };

-    assert(bytes <= UINT32_MAX); /* rely on max_pdiscard */
+    /* rely on max_pdiscard */
+    assert(bytes <= UINT32_MAX || s->info.extended_headers);

     assert(!(s->info.flags & NBD_FLAG_READ_ONLY));
     if (!(s->info.flags & NBD_FLAG_SEND_TRIM) || !bytes) {
@@ -1327,8 +1329,7 @@ static int coroutine_fn nbd_client_co_block_status(
     NBDRequest request = {
         .type = NBD_CMD_BLOCK_STATUS,
         .from = offset,
-        .len = MIN(QEMU_ALIGN_DOWN(INT_MAX, bs->bl.request_alignment),
-                   MIN(bytes, s->info.size - offset)),
+        .len = MIN(bytes, s->info.size - offset),
         .flags = NBD_CMD_FLAG_REQ_ONE,
     };

@@ -1338,6 +1339,10 @@ static int coroutine_fn nbd_client_co_block_status(
         *file = bs;
         return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
     }
+    if (!s->info.extended_headers) {
+        request.len = MIN(QEMU_ALIGN_DOWN(INT_MAX, bs->bl.request_alignment),
+                          request.len);
+    }

     /*
      * Work around the fact that the block layer doesn't do
@@ -1415,7 +1420,7 @@ static void nbd_client_close(BlockDriverState *bs)
     NBDRequest request = { .type = NBD_CMD_DISC };

     if (s->ioc) {
-        nbd_send_request(s->ioc, &request);
+        nbd_send_request(s->ioc, &request, s->info.extended_headers);
     }

     nbd_teardown_connection(bs);
@@ -1877,6 +1882,14 @@ static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
     bs->bl.max_pwrite_zeroes = max;
     bs->bl.max_transfer = max;

+    /*
+     * Assume that if the server supports extended headers, it also
+     * supports unlimited size zero and trim commands.
+     */
+    if (s->info.extended_headers) {
+        bs->bl.max_pdiscard = bs->bl.max_pwrite_zeroes = 0;
+    }
+
     if (s->info.opt_block &&
         s->info.opt_block > bs->bl.opt_transfer) {
         bs->bl.opt_transfer = s->info.opt_block;
diff --git a/nbd/client.c b/nbd/client.c
index 8f137c2320bb..aa162b9d08d5 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -1,5 +1,5 @@
 /*
- *  Copyright (C) 2016-2019 Red Hat, Inc.
+ *  Copyright (C) 2016-2021 Red Hat, Inc.
  *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
  *
  *  Network Block Device Client Side
@@ -1221,7 +1221,7 @@ int nbd_receive_export_list(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
         if (nbd_drop(ioc, 124, NULL) == 0) {
             NBDRequest request = { .type = NBD_CMD_DISC };

-            nbd_send_request(ioc, &request);
+            nbd_send_request(ioc, &request, false);
         }
         break;
     default:
@@ -1345,10 +1345,12 @@ int nbd_disconnect(int fd)

 #endif /* __linux__ */

-int nbd_send_request(QIOChannel *ioc, NBDRequest *request)
+int nbd_send_request(QIOChannel *ioc, NBDRequest *request, bool ext_hdr)
 {
     uint8_t buf[NBD_REQUEST_SIZE];

+    assert(!ext_hdr);
+    assert(request->len <= UINT32_MAX);
     trace_nbd_send_request(request->from, request->len, request->handle,
                            request->flags, request->type,
                            nbd_cmd_lookup(request->type));
diff --git a/nbd/server.c b/nbd/server.c
index 64845542fd6b..4306fa7b426c 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1436,7 +1436,7 @@ static int nbd_receive_request(NBDClient *client, NBDRequest *request,
     request->type   = lduw_be_p(buf + 6);
     request->handle = ldq_be_p(buf + 8);
     request->from   = ldq_be_p(buf + 16);
-    request->len    = ldl_be_p(buf + 24);
+    request->len    = ldl_be_p(buf + 24); /* widen 32 to 64 bits */

     trace_nbd_receive_request(magic, request->flags, request->type,
                               request->from, request->len);
@@ -2324,7 +2324,7 @@ static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request,
         request->type == NBD_CMD_CACHE)
     {
         if (request->len > NBD_MAX_BUFFER_SIZE) {
-            error_setg(errp, "len (%" PRIu32" ) is larger than max len (%u)",
+            error_setg(errp, "len (%" PRIu64" ) is larger than max len (%u)",
                        request->len, NBD_MAX_BUFFER_SIZE);
             return -EINVAL;
         }
@@ -2340,6 +2340,7 @@ static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request,
     }

     if (request->type == NBD_CMD_WRITE) {
+        assert(request->len <= UINT32_MAX);
         if (nbd_read(client->ioc, req->data, request->len, "CMD_WRITE data",
                      errp) < 0)
         {
@@ -2361,7 +2362,7 @@ static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request,
     }
     if (request->from > client->exp->size ||
         request->len > client->exp->size - request->from) {
-        error_setg(errp, "operation past EOF; From: %" PRIu64 ", Len: %" PRIu32
+        error_setg(errp, "operation past EOF; From: %" PRIu64 ", Len: %" PRIu64
                    ", Size: %" PRIu64, request->from, request->len,
                    client->exp->size);
         return (request->type == NBD_CMD_WRITE ||
@@ -2424,6 +2425,7 @@ static coroutine_fn int nbd_do_cmd_read(NBDClient *client, NBDRequest *request,
     NBDExport *exp = client->exp;

     assert(request->type == NBD_CMD_READ);
+    assert(request->len <= INT_MAX);

     /* XXX: NBD Protocol only documents use of FUA with WRITE */
     if (request->flags & NBD_CMD_FLAG_FUA) {
@@ -2475,6 +2477,7 @@ static coroutine_fn int nbd_do_cmd_cache(NBDClient *client, NBDRequest *request,
     NBDExport *exp = client->exp;

     assert(request->type == NBD_CMD_CACHE);
+    assert(request->len <= INT_MAX);

     ret = blk_co_preadv(exp->common.blk, request->from, request->len,
                         NULL, BDRV_REQ_COPY_ON_READ | BDRV_REQ_PREFETCH);
@@ -2508,6 +2511,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
         if (request->flags & NBD_CMD_FLAG_FUA) {
             flags |= BDRV_REQ_FUA;
         }
+        assert(request->len <= INT_MAX);
         ret = blk_pwrite(exp->common.blk, request->from, data, request->len,
                          flags);
         return nbd_send_generic_reply(client, request->handle, ret,
@@ -2551,6 +2555,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
             return nbd_send_generic_reply(client, request->handle, -EINVAL,
                                           "need non-zero length", errp);
         }
+        assert(request->len <= UINT32_MAX);
         if (client->export_meta.count) {
             bool dont_fragment = request->flags & NBD_CMD_FLAG_REQ_ONE;
             int contexts_remaining = client->export_meta.count;
diff --git a/nbd/trace-events b/nbd/trace-events
index c4919a2dd581..d18da8b0b743 100644
--- a/nbd/trace-events
+++ b/nbd/trace-events
@@ -31,7 +31,7 @@ nbd_client_loop(void) "Doing NBD loop"
 nbd_client_loop_ret(int ret, const char *error) "NBD loop returned %d: %s"
 nbd_client_clear_queue(void) "Clearing NBD queue"
 nbd_client_clear_socket(void) "Clearing NBD socket"
-nbd_send_request(uint64_t from, uint32_t len, uint64_t handle, uint16_t flags, uint16_t type, const char *name) "Sending request to server: { .from = %" PRIu64", .len = %" PRIu32 ", .handle = %" PRIu64 ", .flags = 0x%" PRIx16 ", .type = %" PRIu16 " (%s) }"
+nbd_send_request(uint64_t from, uint64_t len, uint64_t handle, uint16_t flags, uint16_t type, const char *name) "Sending request to server: { .from = %" PRIu64", .len = %" PRIu64 ", .handle = %" PRIu64 ", .flags = 0x%" PRIx16 ", .type = %" PRIu16 " (%s) }"
 nbd_receive_simple_reply(int32_t error, const char *errname, uint64_t handle) "Got simple reply: { .error = %" PRId32 " (%s), handle = %" PRIu64" }"
 nbd_receive_structured_reply_chunk(uint16_t flags, uint16_t type, const char *name, uint64_t handle, uint32_t length) "Got structured reply chunk: { flags = 0x%" PRIx16 ", type = %d (%s), handle = %" PRIu64 ", length = %" PRIu32 " }"

@@ -60,7 +60,7 @@ nbd_negotiate_options_check_option(uint32_t option, const char *name) "Checking
 nbd_negotiate_begin(void) "Beginning negotiation"
 nbd_negotiate_new_style_size_flags(uint64_t size, unsigned flags) "advertising size %" PRIu64 " and flags 0x%x"
 nbd_negotiate_success(void) "Negotiation succeeded"
-nbd_receive_request(uint32_t magic, uint16_t flags, uint16_t type, uint64_t from, uint32_t len) "Got request: { magic = 0x%" PRIx32 ", .flags = 0x%" PRIx16 ", .type = 0x%" PRIx16 ", from = %" PRIu64 ", len = %" PRIu32 " }"
+nbd_receive_request(uint32_t magic, uint16_t flags, uint16_t type, uint64_t from, uint64_t len) "Got request: { magic = 0x%" PRIx32 ", .flags = 0x%" PRIx16 ", .type = 0x%" PRIx16 ", from = %" PRIu64 ", len = %" PRIu64 " }"
 nbd_blk_aio_attached(const char *name, void *ctx) "Export %s: Attaching clients to AIO context %p"
 nbd_blk_aio_detach(const char *name, void *ctx) "Export %s: Detaching clients from AIO context %p"
 nbd_co_send_simple_reply(uint64_t handle, uint32_t error, const char *errname, int len) "Send simple reply: handle = %" PRIu64 ", error = %" PRIu32 " (%s), len = %d"
@@ -70,6 +70,6 @@ nbd_co_send_structured_read_hole(uint64_t handle, uint64_t offset, size_t size)
 nbd_co_send_extents(uint64_t handle, unsigned int extents, uint32_t id, uint64_t length, int last) "Send block status reply: handle = %" PRIu64 ", extents = %u, context = %d (extents cover %" PRIu64 " bytes, last chunk = %d)"
 nbd_co_send_structured_error(uint64_t handle, int err, const char *errname, const char *msg) "Send structured error reply: handle = %" PRIu64 ", error = %d (%s), msg = '%s'"
 nbd_co_receive_request_decode_type(uint64_t handle, uint16_t type, const char *name) "Decoding type: handle = %" PRIu64 ", type = %" PRIu16 " (%s)"
-nbd_co_receive_request_payload_received(uint64_t handle, uint32_t len) "Payload received: handle = %" PRIu64 ", len = %" PRIu32
-nbd_co_receive_align_compliance(const char *op, uint64_t from, uint32_t len, uint32_t align) "client sent non-compliant unaligned %s request: from=0x%" PRIx64 ", len=0x%" PRIx32 ", align=0x%" PRIx32
+nbd_co_receive_request_payload_received(uint64_t handle, uint64_t len) "Payload received: handle = %" PRIu64 ", len = %" PRIu64
+nbd_co_receive_align_compliance(const char *op, uint64_t from, uint64_t len, uint32_t align) "client sent non-compliant unaligned %s request: from=0x%" PRIx64 ", len=0x%" PRIx64 ", align=0x%" PRIx32
 nbd_trip(void) "Reading request"
-- 
2.33.1




* [PATCH 07/14] nbd: Add types for extended headers
  2021-12-03 23:15 ` [PATCH 00/14] qemu patches for NBD_OPT_EXTENDED_HEADERS Eric Blake
                     ` (5 preceding siblings ...)
  2021-12-03 23:15   ` [PATCH 06/14] nbd: Prepare for 64-bit requests Eric Blake
@ 2021-12-03 23:15   ` Eric Blake
  2021-12-03 23:15   ` [PATCH 08/14] nbd/server: Initial support " Eric Blake
                     ` (6 subsequent siblings)
  13 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2021-12-03 23:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, vsementsov, qemu-block, nbd, nsoffer, Hanna Reitz,
	libguestfs

Add the constants and structs necessary for later patches to start
implementing the NBD_OPT_EXTENDED_HEADERS extension in both the client
and server.  This patch does not change any existing behavior, but
merely sets the stage.

This patch does not change the status quo that neither the client nor
the server uses a packed-struct representation for the request header.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
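Not part of the commit message: the size relationships between the
compact and extended reply headers can be checked at compile time with
a sketch like the one below.  The Sketch* types mirror the field lists
in this patch but are hypothetical copies, and SKETCH_PACKED assumes a
GCC/Clang-style packed attribute in place of QEMU_PACKED.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PACKED __attribute__((packed))

typedef struct SketchSimpleReply {
    uint32_t magic;
    uint32_t error;
    uint64_t handle;
} SKETCH_PACKED SketchSimpleReply;

typedef struct SketchSimpleReplyExt {
    uint32_t magic;
    uint32_t error;
    uint64_t handle;
    uint64_t _pad1;  /* must be 0 */
    uint64_t _pad2;  /* must be 0 */
} SKETCH_PACKED SketchSimpleReplyExt;

typedef struct SketchChunk {
    uint32_t magic;
    uint16_t flags;
    uint16_t type;
    uint64_t handle;
    uint32_t length;
} SKETCH_PACKED SketchChunk;

typedef struct SketchChunkExt {
    uint32_t magic;
    uint16_t flags;
    uint16_t type;
    uint64_t handle;
    uint64_t length;
    uint64_t _pad;   /* must be 0 */
} SKETCH_PACKED SketchChunkExt;

/* Compact headers differ in size; extended headers are both 32 bytes */
_Static_assert(sizeof(SketchSimpleReply) == 16, "simple reply is 16 bytes");
_Static_assert(sizeof(SketchChunk) == 20, "structured chunk is 20 bytes");
_Static_assert(sizeof(SketchSimpleReplyExt) == 32, "ext simple is 32 bytes");
_Static_assert(sizeof(SketchChunkExt) == 32, "ext chunk is 32 bytes");

/* magic and handle sit at the same offsets in every variant */
_Static_assert(offsetof(SketchSimpleReplyExt, handle) ==
               offsetof(SketchChunkExt, handle), "handle offsets match");
```

Having both extended headers at the same fixed size means a receiver
can read 32 bytes first and only then look at the magic to decide how
to interpret the rest.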
 docs/interop/nbd.txt |  1 +
 include/block/nbd.h  | 67 +++++++++++++++++++++++++++++++++++---------
 nbd/common.c         | 10 +++++--
 3 files changed, 62 insertions(+), 16 deletions(-)

diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
index bdb0f2a41aca..6229ea573c04 100644
--- a/docs/interop/nbd.txt
+++ b/docs/interop/nbd.txt
@@ -68,3 +68,4 @@ NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
 * 4.2: NBD_FLAG_CAN_MULTI_CONN for shareable read-only exports,
 NBD_CMD_FLAG_FAST_ZERO
 * 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth"
+* 7.0: NBD_OPT_EXTENDED_HEADERS
diff --git a/include/block/nbd.h b/include/block/nbd.h
index 732314aaba11..5f9d86a86352 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -69,6 +69,14 @@ typedef struct NBDSimpleReply {
     uint64_t handle;
 } QEMU_PACKED NBDSimpleReply;

+typedef struct NBDSimpleReplyExt {
+    uint32_t magic;  /* NBD_SIMPLE_REPLY_EXT_MAGIC */
+    uint32_t error;
+    uint64_t handle;
+    uint64_t _pad1;  /* Must be 0 */
+    uint64_t _pad2;  /* Must be 0 */
+} QEMU_PACKED NBDSimpleReplyExt;
+
 /* Header of all structured replies */
 typedef struct NBDStructuredReplyChunk {
     uint32_t magic;  /* NBD_STRUCTURED_REPLY_MAGIC */
@@ -78,9 +86,20 @@ typedef struct NBDStructuredReplyChunk {
     uint32_t length; /* length of payload */
 } QEMU_PACKED NBDStructuredReplyChunk;

+typedef struct NBDStructuredReplyChunkExt {
+    uint32_t magic;  /* NBD_STRUCTURED_REPLY_EXT_MAGIC */
+    uint16_t flags;  /* combination of NBD_REPLY_FLAG_* */
+    uint16_t type;   /* NBD_REPLY_TYPE_* */
+    uint64_t handle; /* request handle */
+    uint64_t length; /* length of payload */
+    uint64_t _pad;   /* Must be 0 */
+} QEMU_PACKED NBDStructuredReplyChunkExt;
+
 typedef union NBDReply {
     NBDSimpleReply simple;
+    NBDSimpleReplyExt simple_ext;
     NBDStructuredReplyChunk structured;
+    NBDStructuredReplyChunkExt structured_ext;
     struct {
         /* @magic and @handle fields have the same offset and size both in
          * simple reply and structured reply chunk, so let them be accessible
@@ -106,6 +125,13 @@ typedef struct NBDStructuredReadHole {
     uint32_t length;
 } QEMU_PACKED NBDStructuredReadHole;

+/* Complete chunk for NBD_REPLY_TYPE_OFFSET_HOLE_EXT */
+typedef struct NBDStructuredReadHoleExt {
+    /* header's length == 16 */
+    uint64_t offset;
+    uint64_t length;
+} QEMU_PACKED NBDStructuredReadHoleExt;
+
 /* Header of all NBD_REPLY_TYPE_ERROR* errors */
 typedef struct NBDStructuredError {
     /* header's length >= 6 */
@@ -113,19 +139,26 @@ typedef struct NBDStructuredError {
     uint16_t message_length;
 } QEMU_PACKED NBDStructuredError;

-/* Header of NBD_REPLY_TYPE_BLOCK_STATUS */
+/* Header of NBD_REPLY_TYPE_BLOCK_STATUS, NBD_REPLY_TYPE_BLOCK_STATUS_EXT */
 typedef struct NBDStructuredMeta {
-    /* header's length >= 12 (at least one extent) */
+    /* header's length >= 12 narrow, or >= 20 extended (at least one extent) */
     uint32_t context_id;
-    /* extents follows */
+    /* extents[] follows: NBDExtent for narrow, NBDExtentExt for extended */
 } QEMU_PACKED NBDStructuredMeta;

-/* Extent chunk for NBD_REPLY_TYPE_BLOCK_STATUS */
+/* Extent array for NBD_REPLY_TYPE_BLOCK_STATUS */
 typedef struct NBDExtent {
     uint32_t length;
     uint32_t flags; /* NBD_STATE_* */
 } QEMU_PACKED NBDExtent;

+/* Extent array for NBD_REPLY_TYPE_BLOCK_STATUS_EXT */
+typedef struct NBDExtentExt {
+    uint64_t length;
+    uint32_t flags; /* NBD_STATE_* */
+    uint32_t _pad;  /* Must be 0 */
+} QEMU_PACKED NBDExtentExt;
+
 /* Transmission (export) flags: sent from server to client during handshake,
    but describe what will happen during transmission */
 enum {
@@ -178,6 +211,7 @@ enum {
 #define NBD_OPT_STRUCTURED_REPLY  (8)
 #define NBD_OPT_LIST_META_CONTEXT (9)
 #define NBD_OPT_SET_META_CONTEXT  (10)
+#define NBD_OPT_EXTENDED_HEADERS  (11)

 /* Option reply types. */
 #define NBD_REP_ERR(value) ((UINT32_C(1) << 31) | (value))
@@ -234,12 +268,15 @@ enum {
  */
 #define NBD_MAX_STRING_SIZE 4096

-/* Transmission request structure */
+/* Two types of request structures, a given client will only use 1 */
 #define NBD_REQUEST_MAGIC           0x25609513
+#define NBD_REQUEST_EXT_MAGIC       0x21e41c71

-/* Two types of reply structures */
-#define NBD_SIMPLE_REPLY_MAGIC      0x67446698
-#define NBD_STRUCTURED_REPLY_MAGIC  0x668e33ef
+/* Four types of reply structures, a given client will only use 2 */
+#define NBD_SIMPLE_REPLY_MAGIC          0x67446698
+#define NBD_STRUCTURED_REPLY_MAGIC      0x668e33ef
+#define NBD_SIMPLE_REPLY_EXT_MAGIC      0x60d12fd6
+#define NBD_STRUCTURED_REPLY_EXT_MAGIC  0x6e8a278c

 /* Structured reply flags */
 #define NBD_REPLY_FLAG_DONE          (1 << 0) /* This reply-chunk is last */
@@ -247,12 +284,14 @@ enum {
 /* Structured reply types */
 #define NBD_REPLY_ERR(value)         ((1 << 15) | (value))

-#define NBD_REPLY_TYPE_NONE          0
-#define NBD_REPLY_TYPE_OFFSET_DATA   1
-#define NBD_REPLY_TYPE_OFFSET_HOLE   2
-#define NBD_REPLY_TYPE_BLOCK_STATUS  5
-#define NBD_REPLY_TYPE_ERROR         NBD_REPLY_ERR(1)
-#define NBD_REPLY_TYPE_ERROR_OFFSET  NBD_REPLY_ERR(2)
+#define NBD_REPLY_TYPE_NONE              0
+#define NBD_REPLY_TYPE_OFFSET_DATA       1
+#define NBD_REPLY_TYPE_OFFSET_HOLE       2
+#define NBD_REPLY_TYPE_OFFSET_HOLE_EXT   3
+#define NBD_REPLY_TYPE_BLOCK_STATUS      5
+#define NBD_REPLY_TYPE_BLOCK_STATUS_EXT  6
+#define NBD_REPLY_TYPE_ERROR             NBD_REPLY_ERR(1)
+#define NBD_REPLY_TYPE_ERROR_OFFSET      NBD_REPLY_ERR(2)

 /* Extent flags for base:allocation in NBD_REPLY_TYPE_BLOCK_STATUS */
 #define NBD_STATE_HOLE (1 << 0)
diff --git a/nbd/common.c b/nbd/common.c
index ddfe7d118371..08eaed4cb3c2 100644
--- a/nbd/common.c
+++ b/nbd/common.c
@@ -79,6 +79,8 @@ const char *nbd_opt_lookup(uint32_t opt)
         return "list meta context";
     case NBD_OPT_SET_META_CONTEXT:
         return "set meta context";
+    case NBD_OPT_EXTENDED_HEADERS:
+        return "extended headers";
     default:
         return "<unknown>";
     }
@@ -168,9 +170,13 @@ const char *nbd_reply_type_lookup(uint16_t type)
     case NBD_REPLY_TYPE_OFFSET_DATA:
         return "data";
     case NBD_REPLY_TYPE_OFFSET_HOLE:
-        return "hole";
+        return "hole (32-bit)";
+    case NBD_REPLY_TYPE_OFFSET_HOLE_EXT:
+        return "hole (64-bit)";
     case NBD_REPLY_TYPE_BLOCK_STATUS:
-        return "block status";
+        return "block status (32-bit)";
+    case NBD_REPLY_TYPE_BLOCK_STATUS_EXT:
+        return "block status (64-bit)";
     case NBD_REPLY_TYPE_ERROR:
         return "generic error";
     case NBD_REPLY_TYPE_ERROR_OFFSET:
-- 
2.33.1




* [PATCH 08/14] nbd/server: Initial support for extended headers
  2021-12-03 23:15 ` [PATCH 00/14] qemu patches for NBD_OPT_EXTENDED_HEADERS Eric Blake
                     ` (6 preceding siblings ...)
  2021-12-03 23:15   ` [PATCH 07/14] nbd: Add types for extended headers Eric Blake
@ 2021-12-03 23:15   ` Eric Blake
  2021-12-03 23:15   ` [PATCH 09/14] nbd/server: Support 64-bit block status Eric Blake
                     ` (5 subsequent siblings)
  13 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2021-12-03 23:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: nsoffer, vsementsov, libguestfs, qemu-block, nbd

We have no reason to send NBD_REPLY_TYPE_OFFSET_HOLE_EXT since we
already cap NBD_CMD_READ to 32M.  For NBD_CMD_WRITE_ZEROES and
NBD_CMD_TRIM, the block layer already supports 64-bit operations
without any effort on our part.  For NBD_CMD_BLOCK_STATUS, the
client's length is a hint; the easiest approach is to truncate our
answer back to 32 bits, letting us delay the effort of implementing
NBD_REPLY_TYPE_BLOCK_STATUS_EXT to a separate patch.
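
For readers following the wire format, here is a minimal standalone
sketch of decoding a request header in either form.  The magic values and
field offsets are taken from this series; the sketch_* helpers and the
SketchRequest type are hypothetical names used only for illustration and
do not correspond to functions in nbd/server.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_REQUEST_MAGIC      0x25609513u
#define SKETCH_REQUEST_EXT_MAGIC  0x21e41c71u
#define SKETCH_REQUEST_SIZE       (4 + 2 + 2 + 8 + 8 + 4)
#define SKETCH_REQUEST_EXT_SIZE   (4 + 2 + 2 + 8 + 8 + 8)

typedef struct SketchRequest {
    uint16_t flags;
    uint16_t type;
    uint64_t handle;
    uint64_t from;
    uint64_t len;    /* always 64 bits in memory, whatever the wire form */
} SketchRequest;

/* Load an n-byte big-endian unsigned integer (n <= 8). */
static uint64_t sketch_load_be(const uint8_t *p, size_t n)
{
    uint64_t v = 0;

    for (size_t i = 0; i < n; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Number of header bytes to read for the negotiated mode. */
static size_t sketch_request_size(bool extended)
{
    return extended ? SKETCH_REQUEST_EXT_SIZE : SKETCH_REQUEST_SIZE;
}

/*
 * Decode a header of sketch_request_size(extended) bytes.
 * Returns 0 on success, -1 if the magic does not match the mode.
 */
static int sketch_parse_request(const uint8_t *buf, bool extended,
                                SketchRequest *req)
{
    uint32_t magic, expect;

    assert(buf && req);
    magic = (uint32_t)sketch_load_be(buf, 4);
    expect = extended ? SKETCH_REQUEST_EXT_MAGIC : SKETCH_REQUEST_MAGIC;
    if (magic != expect) {
        return -1;
    }
    req->flags  = (uint16_t)sketch_load_be(buf + 4, 2);
    req->type   = (uint16_t)sketch_load_be(buf + 6, 2);
    req->handle = sketch_load_be(buf + 8, 8);
    req->from   = sketch_load_be(buf + 16, 8);
    /* Only the width of the final field differs between the forms. */
    req->len    = sketch_load_be(buf + 24, extended ? 8 : 4);
    return 0;
}
```

Widening the compact length to 64 bits at parse time means the rest of
the request handling sees one representation regardless of what was
negotiated.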

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 nbd/nbd-internal.h |   5 ++-
 nbd/server.c       | 106 ++++++++++++++++++++++++++++++++++-----------
 2 files changed, 85 insertions(+), 26 deletions(-)

diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index 0016793ff4b1..875b6204c28c 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -35,8 +35,11 @@
  * https://github.com/yoe/nbd/blob/master/doc/proto.md
  */

-/* Size of all NBD_OPT_*, without payload */
+/* Size of all compact NBD_CMD_*, without payload */
 #define NBD_REQUEST_SIZE            (4 + 2 + 2 + 8 + 8 + 4)
+/* Size of all extended NBD_CMD_*, without payload */
+#define NBD_REQUEST_EXT_SIZE        (4 + 2 + 2 + 8 + 8 + 8)
+
 /* Size of all NBD_REP_* sent in answer to most NBD_OPT_*, without payload */
 #define NBD_REPLY_SIZE              (4 + 4 + 8)
 /* Size of reply to NBD_OPT_EXPORT_NAME */
diff --git a/nbd/server.c b/nbd/server.c
index 4306fa7b426c..0e496f60ffbd 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -142,6 +142,7 @@ struct NBDClient {
     uint32_t check_align; /* If non-zero, check for aligned client requests */

     bool structured_reply;
+    bool extended_headers;
     NBDExportMetaContexts export_meta;

     uint32_t opt; /* Current option being negotiated */
@@ -1275,6 +1276,19 @@ static int nbd_negotiate_options(NBDClient *client, Error **errp)
                                                  errp);
                 break;

+            case NBD_OPT_EXTENDED_HEADERS:
+                if (length) {
+                    ret = nbd_reject_length(client, false, errp);
+                } else if (client->extended_headers) {
+                    ret = nbd_negotiate_send_rep_err(
+                        client, NBD_REP_ERR_INVALID, errp,
+                        "extended headers already negotiated");
+                } else {
+                    ret = nbd_negotiate_send_rep(client, NBD_REP_ACK, errp);
+                    client->extended_headers = true;
+                }
+                break;
+
             default:
                 ret = nbd_opt_drop(client, NBD_REP_ERR_UNSUP, errp,
                                    "Unsupported option %" PRIu32 " (%s)",
@@ -1410,11 +1424,13 @@ nbd_read_eof(NBDClient *client, void *buffer, size_t size, Error **errp)
 static int nbd_receive_request(NBDClient *client, NBDRequest *request,
                                Error **errp)
 {
-    uint8_t buf[NBD_REQUEST_SIZE];
-    uint32_t magic;
+    uint8_t buf[NBD_REQUEST_EXT_SIZE];
+    uint32_t magic, expect;
     int ret;
+    size_t size = client->extended_headers ? NBD_REQUEST_EXT_SIZE
+        : NBD_REQUEST_SIZE;

-    ret = nbd_read_eof(client, buf, sizeof(buf), errp);
+    ret = nbd_read_eof(client, buf, size, errp);
     if (ret < 0) {
         return ret;
     }
@@ -1422,13 +1438,21 @@ static int nbd_receive_request(NBDClient *client, NBDRequest *request,
         return -EIO;
     }

-    /* Request
-       [ 0 ..  3]   magic   (NBD_REQUEST_MAGIC)
-       [ 4 ..  5]   flags   (NBD_CMD_FLAG_FUA, ...)
-       [ 6 ..  7]   type    (NBD_CMD_READ, ...)
-       [ 8 .. 15]   handle
-       [16 .. 23]   from
-       [24 .. 27]   len
+    /*
+     * Compact request
+     *  [ 0 ..  3]   magic   (NBD_REQUEST_MAGIC)
+     *  [ 4 ..  5]   flags   (NBD_CMD_FLAG_FUA, ...)
+     *  [ 6 ..  7]   type    (NBD_CMD_READ, ...)
+     *  [ 8 .. 15]   handle
+     *  [16 .. 23]   from
+     *  [24 .. 27]   len
+     * Extended request
+     *  [ 0 ..  3]   magic   (NBD_REQUEST_EXT_MAGIC)
+     *  [ 4 ..  5]   flags   (NBD_CMD_FLAG_FUA, ...)
+     *  [ 6 ..  7]   type    (NBD_CMD_READ, ...)
+     *  [ 8 .. 15]   handle
+     *  [16 .. 23]   from
+     *  [24 .. 31]   len
      */

     magic = ldl_be_p(buf);
@@ -1436,12 +1460,18 @@ static int nbd_receive_request(NBDClient *client, NBDRequest *request,
     request->type   = lduw_be_p(buf + 6);
     request->handle = ldq_be_p(buf + 8);
     request->from   = ldq_be_p(buf + 16);
-    request->len    = ldl_be_p(buf + 24); /* widen 32 to 64 bits */
+    if (client->extended_headers) {
+        request->len = ldq_be_p(buf + 24);
+        expect = NBD_REQUEST_EXT_MAGIC;
+    } else {
+        request->len = ldl_be_p(buf + 24); /* widen 32 to 64 bits */
+        expect = NBD_REQUEST_MAGIC;
+    }

     trace_nbd_receive_request(magic, request->flags, request->type,
                               request->from, request->len);

-    if (magic != NBD_REQUEST_MAGIC) {
+    if (magic != expect) {
         error_setg(errp, "invalid magic (got 0x%" PRIx32 ")", magic);
         return -EINVAL;
     }
@@ -1872,12 +1902,22 @@ static int coroutine_fn nbd_co_send_iov(NBDClient *client, struct iovec *iov,
 static inline void set_be_simple_reply(NBDClient *client, struct iovec *iov,
                                        uint64_t error, uint64_t handle)
 {
-    NBDSimpleReply *reply = iov->iov_base;
+    if (client->extended_headers) {
+        NBDSimpleReplyExt *reply = iov->iov_base;

-    iov->iov_len = sizeof(*reply);
-    stl_be_p(&reply->magic, NBD_SIMPLE_REPLY_MAGIC);
-    stl_be_p(&reply->error, error);
-    stq_be_p(&reply->handle, handle);
+        iov->iov_len = sizeof(*reply);
+        stl_be_p(&reply->magic, NBD_SIMPLE_REPLY_EXT_MAGIC);
+        stl_be_p(&reply->error, error);
+        stq_be_p(&reply->handle, handle);
+        reply->_pad1 = reply->_pad2 = 0;
+    } else {
+        NBDSimpleReply *reply = iov->iov_base;
+
+        iov->iov_len = sizeof(*reply);
+        stl_be_p(&reply->magic, NBD_SIMPLE_REPLY_MAGIC);
+        stl_be_p(&reply->error, error);
+        stq_be_p(&reply->handle, handle);
+    }
 }

 static int nbd_co_send_simple_reply(NBDClient *client,
@@ -1905,14 +1945,26 @@ static inline void set_be_chunk(NBDClient *client, struct iovec *iov,
                                 uint16_t flags, uint16_t type,
                                 uint64_t handle, uint32_t length)
 {
-    NBDStructuredReplyChunk *chunk = iov->iov_base;
+    if (client->extended_headers) {
+        NBDStructuredReplyChunkExt *chunk = iov->iov_base;

-    iov->iov_len = sizeof(*chunk);
-    stl_be_p(&chunk->magic, NBD_STRUCTURED_REPLY_MAGIC);
-    stw_be_p(&chunk->flags, flags);
-    stw_be_p(&chunk->type, type);
-    stq_be_p(&chunk->handle, handle);
-    stl_be_p(&chunk->length, length);
+        iov->iov_len = sizeof(*chunk);
+        stl_be_p(&chunk->magic, NBD_STRUCTURED_REPLY_EXT_MAGIC);
+        stw_be_p(&chunk->flags, flags);
+        stw_be_p(&chunk->type, type);
+        stq_be_p(&chunk->handle, handle);
+        stq_be_p(&chunk->length, length);
+        chunk->_pad = 0;
+    } else {
+        NBDStructuredReplyChunk *chunk = iov->iov_base;
+
+        iov->iov_len = sizeof(*chunk);
+        stl_be_p(&chunk->magic, NBD_STRUCTURED_REPLY_MAGIC);
+        stw_be_p(&chunk->flags, flags);
+        stw_be_p(&chunk->type, type);
+        stq_be_p(&chunk->handle, handle);
+        stl_be_p(&chunk->length, length);
+    }
 }

 static int coroutine_fn nbd_co_send_structured_done(NBDClient *client,
@@ -2555,7 +2607,11 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
             return nbd_send_generic_reply(client, request->handle, -EINVAL,
                                           "need non-zero length", errp);
         }
-        assert(request->len <= UINT32_MAX);
+        if (request->len > UINT32_MAX) {
+            /* For now, truncate our response to a 32 bit window */
+            request->len = QEMU_ALIGN_DOWN(BDRV_REQUEST_MAX_BYTES,
+                                           client->check_align ?: 1);
+        }
         if (client->export_meta.count) {
             bool dont_fragment = request->flags & NBD_CMD_FLAG_REQ_ONE;
             int contexts_remaining = client->export_meta.count;
-- 
2.33.1




* [PATCH 09/14] nbd/server: Support 64-bit block status
  2021-12-03 23:15 ` [PATCH 00/14] qemu patches for NBD_OPT_EXTENDED_HEADERS Eric Blake
                     ` (7 preceding siblings ...)
  2021-12-03 23:15   ` [PATCH 08/14] nbd/server: Initial support " Eric Blake
@ 2021-12-03 23:15   ` Eric Blake
  2021-12-03 23:15   ` [PATCH 10/14] nbd/client: Initial support for extended headers Eric Blake
                     ` (4 subsequent siblings)
  13 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2021-12-03 23:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: nsoffer, vsementsov, libguestfs, qemu-block, nbd

The previous patch handled extended headers by truncating large block
status requests from the client back to 32 bits.  But this is not
ideal; for cases where we can truly determine the status of the entire
image quickly (for example, when reporting the entire image as
non-sparse because we lack the ability to probe for holes), this
causes more network traffic for the client to iterate through 4G
chunks than for us to just report the entire image at once.  For ease
of implementation, if extended headers were negotiated, then we always
reply with 64-bit block status replies, even when the result could
have fit in the older 32-bit block status chunk (clients supporting
extended headers have to be prepared for either chunk type, so
temporarily reverting this patch proves whether a client is
compliant).
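
To put a number on the round-trip cost, the following sketch counts how
many block status requests a client needs to walk an image when each
reply can describe at most a given number of bytes.  It assumes the
simplest case (one uniform extent per reply), and
sketch_requests_needed() is a hypothetical helper, not part of qemu.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Requests needed to cover image_size bytes when a single reply can
 * describe at most max_per_reply bytes (ceiling division, written so
 * that it cannot overflow).
 */
static uint64_t sketch_requests_needed(uint64_t image_size,
                                       uint64_t max_per_reply)
{
    assert(max_per_reply > 0);
    return image_size / max_per_reply + (image_size % max_per_reply != 0);
}
```

For a 1 TiB image, sketch_requests_needed(1ULL << 40, UINT32_MAX)
evaluates to 257, whereas a 64-bit reply can cover the same image in a
single request.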

Note that we previously had some interesting size-juggling on call
chains, such as:

nbd_co_send_block_status(uint32_t length)
-> blockstatus_to_extents(uint32_t bytes)
  -> bdrv_block_status_above(bytes, &uint64_t num)
  -> nbd_extent_array_add(uint64_t num)
    -> store num in 32-bit length

But we were lucky that it never overflowed: bdrv_block_status_above
never sets num larger than bytes, and we had previously been capping
'bytes' at 32 bits (either by the protocol, or in the previous patch
with an explicit truncation).  This patch adds some assertions that
ensure we continue to avoid overflowing 32 bits for a narrow client,
while fully utilizing 64 bits all the way through when the client
understands that.
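
The coalescing rule described above can be sketched in isolation as
follows.  This is a simplified model with caller-provided storage and
hypothetical Sketch* names; it follows the logic of
nbd_extent_array_add() in the diff below but omits the can_add and
endianness bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct SketchExtent {
    uint64_t length;
    uint32_t flags;
} SketchExtent;

typedef struct SketchExtentArray {
    SketchExtent *extents;   /* caller-provided storage */
    unsigned int nb_alloc;   /* capacity of extents[] */
    unsigned int count;      /* entries in use */
    uint64_t total_length;
    bool extended;           /* whether 64-bit extents are allowed */
} SketchExtentArray;

/* Returns 0 if the extent was merged or appended, -1 if the array is full. */
static int sketch_extent_add(SketchExtentArray *ea,
                             uint64_t length, uint32_t flags)
{
    if (!length) {
        return 0;
    }
    if (!ea->extended) {
        /* A narrow client must never be handed more than 32 bits. */
        assert(length <= UINT32_MAX);
    }

    /* Extend the previous extent if the flags are the same. */
    if (ea->count > 0 && flags == ea->extents[ea->count - 1].flags) {
        uint64_t sum = length + ea->extents[ea->count - 1].length;

        assert(sum >= length);   /* no 64-bit wraparound */
        if (sum <= UINT32_MAX || ea->extended) {
            ea->extents[ea->count - 1].length = sum;
            ea->total_length += length;
            return 0;
        }
    }

    if (ea->count >= ea->nb_alloc) {
        return -1;
    }
    ea->extents[ea->count++] = (SketchExtent) {
        .length = length, .flags = flags,
    };
    ea->total_length += length;
    return 0;
}
```

With a narrow array, adding UINT32_MAX bytes and then 4096 more bytes
with the same flags yields two extents; with an extended array the same
calls merge into a single extent.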

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 nbd/server.c | 72 ++++++++++++++++++++++++++++++++++------------------
 1 file changed, 48 insertions(+), 24 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 0e496f60ffbd..7e6140350797 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2106,20 +2106,26 @@ static int coroutine_fn nbd_co_send_sparse_read(NBDClient *client,
 }

 typedef struct NBDExtentArray {
-    NBDExtent *extents;
+    union {
+        NBDExtent *narrow;
+        NBDExtentExt *extents;
+    };
     unsigned int nb_alloc;
     unsigned int count;
     uint64_t total_length;
+    bool extended; /* Whether 64-bit extents are allowed */
     bool can_add;
     bool converted_to_be;
 } NBDExtentArray;

-static NBDExtentArray *nbd_extent_array_new(unsigned int nb_alloc)
+static NBDExtentArray *nbd_extent_array_new(unsigned int nb_alloc,
+                                            bool extended)
 {
     NBDExtentArray *ea = g_new0(NBDExtentArray, 1);

     ea->nb_alloc = nb_alloc;
-    ea->extents = g_new(NBDExtent, nb_alloc);
+    ea->extents = g_new(NBDExtentExt, nb_alloc);
+    ea->extended = extended;
     ea->can_add = true;

     return ea;
@@ -2133,17 +2139,31 @@ static void nbd_extent_array_free(NBDExtentArray *ea)
 G_DEFINE_AUTOPTR_CLEANUP_FUNC(NBDExtentArray, nbd_extent_array_free);

 /* Further modifications of the array after conversion are abandoned */
-static void nbd_extent_array_convert_to_be(NBDExtentArray *ea)
+static void nbd_extent_array_convert_to_be(NBDExtentArray *ea,
+                                           struct iovec *iov)
 {
     int i;

     assert(!ea->converted_to_be);
+    assert(iov->iov_base == ea->extents);
     ea->can_add = false;
     ea->converted_to_be = true;

-    for (i = 0; i < ea->count; i++) {
-        ea->extents[i].flags = cpu_to_be32(ea->extents[i].flags);
-        ea->extents[i].length = cpu_to_be32(ea->extents[i].length);
+    if (ea->extended) {
+        for (i = 0; i < ea->count; i++) {
+            ea->extents[i].length = cpu_to_be64(ea->extents[i].length);
+            ea->extents[i].flags = cpu_to_be32(ea->extents[i].flags);
+            assert(ea->extents[i]._pad == 0);
+        }
+        iov->iov_len = ea->count * sizeof(ea->extents[0]);
+    } else {
+        /* Conversion reduces memory usage, order of iteration matters */
+        for (i = 0; i < ea->count; i++) {
+            assert(ea->extents[i].length <= UINT32_MAX);
+            ea->narrow[i].length = cpu_to_be32(ea->extents[i].length);
+            ea->narrow[i].flags = cpu_to_be32(ea->extents[i].flags);
+        }
+        iov->iov_len = ea->count * sizeof(ea->narrow[0]);
     }
 }

@@ -2157,19 +2177,23 @@ static void nbd_extent_array_convert_to_be(NBDExtentArray *ea)
  * would result in an incorrect range reported to the client)
  */
 static int nbd_extent_array_add(NBDExtentArray *ea,
-                                uint32_t length, uint32_t flags)
+                                uint64_t length, uint32_t flags)
 {
     assert(ea->can_add);

     if (!length) {
         return 0;
     }
+    if (!ea->extended) {
+        assert(length <= UINT32_MAX);
+    }

     /* Extend previous extent if flags are the same */
     if (ea->count > 0 && flags == ea->extents[ea->count - 1].flags) {
-        uint64_t sum = (uint64_t)length + ea->extents[ea->count - 1].length;
+        uint64_t sum = length + ea->extents[ea->count - 1].length;

-        if (sum <= UINT32_MAX) {
+        assert(sum >= length);
+        if (sum <= UINT32_MAX || ea->extended) {
             ea->extents[ea->count - 1].length = sum;
             ea->total_length += length;
             return 0;
@@ -2182,7 +2206,7 @@ static int nbd_extent_array_add(NBDExtentArray *ea,
     }

     ea->total_length += length;
-    ea->extents[ea->count] = (NBDExtent) {.length = length, .flags = flags};
+    ea->extents[ea->count] = (NBDExtentExt) {.length = length, .flags = flags};
     ea->count++;

     return 0;
@@ -2253,15 +2277,16 @@ static int nbd_co_send_extents(NBDClient *client, uint64_t handle,
     struct iovec iov[] = {
         {.iov_base = &hdr},
         {.iov_base = &chunk, .iov_len = sizeof(chunk)},
-        {.iov_base = ea->extents, .iov_len = ea->count * sizeof(ea->extents[0])}
+        {.iov_base = ea->extents}
     };

-    nbd_extent_array_convert_to_be(ea);
+    nbd_extent_array_convert_to_be(ea, &iov[2]);

     trace_nbd_co_send_extents(handle, ea->count, context_id, ea->total_length,
                               last);
     set_be_chunk(client, &iov[0], last ? NBD_REPLY_FLAG_DONE : 0,
-                 NBD_REPLY_TYPE_BLOCK_STATUS,
+                 client->extended_headers ? NBD_REPLY_TYPE_BLOCK_STATUS_EXT
+                 : NBD_REPLY_TYPE_BLOCK_STATUS,
                  handle, iov[1].iov_len + iov[2].iov_len);
     stl_be_p(&chunk.context_id, context_id);

@@ -2271,13 +2296,14 @@ static int nbd_co_send_extents(NBDClient *client, uint64_t handle,
 /* Get block status from the exported device and send it to the client */
 static int nbd_co_send_block_status(NBDClient *client, uint64_t handle,
                                     BlockDriverState *bs, uint64_t offset,
-                                    uint32_t length, bool dont_fragment,
+                                    uint64_t length, bool dont_fragment,
                                     bool last, uint32_t context_id,
                                     Error **errp)
 {
     int ret;
     unsigned int nb_extents = dont_fragment ? 1 : NBD_MAX_BLOCK_STATUS_EXTENTS;
-    g_autoptr(NBDExtentArray) ea = nbd_extent_array_new(nb_extents);
+    g_autoptr(NBDExtentArray) ea =
+        nbd_extent_array_new(nb_extents, client->extended_headers);

     if (context_id == NBD_META_ID_BASE_ALLOCATION) {
         ret = blockstatus_to_extents(bs, offset, length, ea);
@@ -2304,7 +2330,8 @@ static void bitmap_to_extents(BdrvDirtyBitmap *bitmap,
     bdrv_dirty_bitmap_lock(bitmap);

     for (start = offset;
-         bdrv_dirty_bitmap_next_dirty_area(bitmap, start, end, INT32_MAX,
+         bdrv_dirty_bitmap_next_dirty_area(bitmap, start, end,
+                                           es->extended ? INT64_MAX : INT32_MAX,
                                            &dirty_start, &dirty_count);
          start = dirty_start + dirty_count)
     {
@@ -2326,11 +2353,12 @@ static void bitmap_to_extents(BdrvDirtyBitmap *bitmap,

 static int nbd_co_send_bitmap(NBDClient *client, uint64_t handle,
                               BdrvDirtyBitmap *bitmap, uint64_t offset,
-                              uint32_t length, bool dont_fragment, bool last,
+                              uint64_t length, bool dont_fragment, bool last,
                               uint32_t context_id, Error **errp)
 {
     unsigned int nb_extents = dont_fragment ? 1 : NBD_MAX_BLOCK_STATUS_EXTENTS;
-    g_autoptr(NBDExtentArray) ea = nbd_extent_array_new(nb_extents);
+    g_autoptr(NBDExtentArray) ea =
+        nbd_extent_array_new(nb_extents, client->extended_headers);

     bitmap_to_extents(bitmap, offset, length, ea);

@@ -2607,11 +2635,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
             return nbd_send_generic_reply(client, request->handle, -EINVAL,
                                           "need non-zero length", errp);
         }
-        if (request->len > UINT32_MAX) {
-            /* For now, truncate our response to a 32 bit window */
-            request->len = QEMU_ALIGN_DOWN(BDRV_REQUEST_MAX_BYTES,
-                                           client->check_align ?: 1);
-        }
+        assert(client->extended_headers || request->len <= UINT32_MAX);
         if (client->export_meta.count) {
             bool dont_fragment = request->flags & NBD_CMD_FLAG_REQ_ONE;
             int contexts_remaining = client->export_meta.count;
-- 
2.33.1




* [PATCH 10/14] nbd/client: Initial support for extended headers
  2021-12-03 23:15 ` [PATCH 00/14] qemu patches for NBD_OPT_EXTENDED_HEADERS Eric Blake
                     ` (8 preceding siblings ...)
  2021-12-03 23:15   ` [PATCH 09/14] nbd/server: Support 64-bit block status Eric Blake
@ 2021-12-03 23:15   ` Eric Blake
  2021-12-03 23:15   ` [PATCH 11/14] nbd/client: Accept 64-bit hole chunks Eric Blake
                     ` (3 subsequent siblings)
  13 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2021-12-03 23:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, vsementsov, qemu-block, nbd, nsoffer, Hanna Reitz,
	libguestfs

Update the client code to be able to send an extended request, and
parse an extended header from the server.  Note that since we reject
any structured reply with a too-large payload, we can always normalize
a valid header back into the compact form, so that the caller need not
deal with two branches of a union.  Still, until a later patch lets
the client negotiate extended headers, the code added here should not
be reached.  Note that because of the different magic numbers, it is
just as easy to trace and then tolerate a non-compliant server sending
the wrong header reply as it would be to insist that the server is
compliant.
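
The normalization idea can be sketched on a raw byte buffer as follows:
accept either structured-reply magic, reject non-zero padding or an
oversized payload, and hand the caller a single compact representation.
The sketch_* names, the SketchChunkHeader type, and the exact
SKETCH_MAX_PAYLOAD value are assumptions for illustration; the real
client code operates on the NBDReply union instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_STRUCTURED_REPLY_MAGIC      0x668e33efu
#define SKETCH_STRUCTURED_REPLY_EXT_MAGIC  0x6e8a278cu
/* Hypothetical payload cap; the real limit is defined by the client. */
#define SKETCH_MAX_PAYLOAD  (32ull * 1024 * 1024 + 8)

typedef struct SketchChunkHeader {
    uint16_t flags;
    uint16_t type;
    uint64_t handle;
    uint32_t length;   /* fits in 32 bits once the cap is enforced */
} SketchChunkHeader;

/* Load an n-byte big-endian unsigned integer (n <= 8). */
static uint64_t sketch_load_be(const uint8_t *p, size_t n)
{
    uint64_t v = 0;

    for (size_t i = 0; i < n; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Returns 0 and fills *out on success, -1 on any malformed header. */
static int sketch_normalize_chunk(const uint8_t *buf, size_t buflen,
                                  SketchChunkHeader *out)
{
    uint32_t magic;
    uint64_t payload_len;

    assert(buf && out);
    if (buflen < 4) {
        return -1;
    }
    magic = (uint32_t)sketch_load_be(buf, 4);
    if (magic == SKETCH_STRUCTURED_REPLY_MAGIC) {
        if (buflen < 20) {
            return -1;
        }
        payload_len = sketch_load_be(buf + 16, 4);
    } else if (magic == SKETCH_STRUCTURED_REPLY_EXT_MAGIC) {
        if (buflen < 32) {
            return -1;
        }
        payload_len = sketch_load_be(buf + 16, 8);
        if (sketch_load_be(buf + 24, 8) != 0) {
            return -1;   /* non-zero padding */
        }
    } else {
        return -1;
    }
    if (payload_len > SKETCH_MAX_PAYLOAD) {
        return -1;       /* too long to normalize into 32 bits */
    }

    /* flags, type, and handle occupy the same bytes in both forms. */
    out->flags  = (uint16_t)sketch_load_be(buf + 4, 2);
    out->type   = (uint16_t)sketch_load_be(buf + 6, 2);
    out->handle = sketch_load_be(buf + 8, 8);
    out->length = (uint32_t)payload_len;
    return 0;
}
```

Because the payload cap is far below 4 GiB, any header that passes the
check can be stored in the compact form without loss.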

The only caller to nbd_receive_reply() always passed NULL for errp;
since we are changing the signature anyway, I decided to sink the
decision to ignore errors one layer lower.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 include/block/nbd.h |   2 +-
 block/nbd.c         |   3 +-
 nbd/client.c        | 112 +++++++++++++++++++++++++++++++-------------
 nbd/trace-events    |   1 +
 4 files changed, 84 insertions(+), 34 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 5f9d86a86352..d489c67d98dc 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -366,7 +366,7 @@ int nbd_init(int fd, QIOChannelSocket *sioc, NBDExportInfo *info,
              Error **errp);
 int nbd_send_request(QIOChannel *ioc, NBDRequest *request, bool ext_hdr);
 int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc,
-                                   NBDReply *reply, Error **errp);
+                                   NBDReply *reply, bool ext_hdrs);
 int nbd_client(int fd);
 int nbd_disconnect(int fd);
 int nbd_errno_to_system_errno(int err);
diff --git a/block/nbd.c b/block/nbd.c
index 3e9875241bec..da5e6ac2d9a5 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -401,7 +401,8 @@ static coroutine_fn int nbd_receive_replies(BDRVNBDState *s, uint64_t handle)

         /* We are under mutex and handle is 0. We have to do the dirty work. */
         assert(s->reply.handle == 0);
-        ret = nbd_receive_reply(s->bs, s->ioc, &s->reply, NULL);
+        ret = nbd_receive_reply(s->bs, s->ioc, &s->reply,
+                                s->info.extended_headers);
         if (ret <= 0) {
             ret = ret ? ret : -EIO;
             nbd_channel_error(s, ret);
diff --git a/nbd/client.c b/nbd/client.c
index aa162b9d08d5..f1aa5256c8bf 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -1347,22 +1347,28 @@ int nbd_disconnect(int fd)

 int nbd_send_request(QIOChannel *ioc, NBDRequest *request, bool ext_hdr)
 {
-    uint8_t buf[NBD_REQUEST_SIZE];
+    uint8_t buf[NBD_REQUEST_EXT_SIZE];
+    size_t len;

-    assert(!ext_hdr);
-    assert(request->len <= UINT32_MAX);
     trace_nbd_send_request(request->from, request->len, request->handle,
                            request->flags, request->type,
                            nbd_cmd_lookup(request->type));

-    stl_be_p(buf, NBD_REQUEST_MAGIC);
+    stl_be_p(buf, ext_hdr ? NBD_REQUEST_EXT_MAGIC : NBD_REQUEST_MAGIC);
     stw_be_p(buf + 4, request->flags);
     stw_be_p(buf + 6, request->type);
     stq_be_p(buf + 8, request->handle);
     stq_be_p(buf + 16, request->from);
-    stl_be_p(buf + 24, request->len);
+    if (ext_hdr) {
+        stq_be_p(buf + 24, request->len);
+        len = NBD_REQUEST_EXT_SIZE;
+    } else {
+        assert(request->len <= UINT32_MAX);
+        stl_be_p(buf + 24, request->len);
+        len = NBD_REQUEST_SIZE;
+    }

-    return nbd_write(ioc, buf, sizeof(buf), NULL);
+    return nbd_write(ioc, buf, len, NULL);
 }

 /* nbd_receive_simple_reply
@@ -1370,49 +1376,69 @@ int nbd_send_request(QIOChannel *ioc, NBDRequest *request, bool ext_hdr)
  * Payload is not read (payload is possible for CMD_READ, but here we even
  * don't know whether it take place or not).
  */
-static int nbd_receive_simple_reply(QIOChannel *ioc, NBDSimpleReply *reply,
+static int nbd_receive_simple_reply(QIOChannel *ioc, NBDReply *reply,
                                     Error **errp)
 {
     int ret;
+    size_t len;

-    assert(reply->magic == NBD_SIMPLE_REPLY_MAGIC);
+    if (reply->magic == NBD_SIMPLE_REPLY_MAGIC) {
+        len = sizeof(reply->simple);
+    } else {
+        assert(reply->magic == NBD_SIMPLE_REPLY_EXT_MAGIC);
+        len = sizeof(reply->simple_ext);
+    }

     ret = nbd_read(ioc, (uint8_t *)reply + sizeof(reply->magic),
-                   sizeof(*reply) - sizeof(reply->magic), "reply", errp);
+                   len - sizeof(reply->magic), "reply", errp);
     if (ret < 0) {
         return ret;
     }

-    reply->error = be32_to_cpu(reply->error);
-    reply->handle = be64_to_cpu(reply->handle);
+    /* error and handle occupy same space between forms */
+    reply->simple.error = be32_to_cpu(reply->simple.error);
+    reply->simple.handle = be64_to_cpu(reply->handle);
+    if (reply->magic == NBD_SIMPLE_REPLY_EXT_MAGIC) {
+        if (reply->simple_ext._pad1 || reply->simple_ext._pad2) {
+            error_setg(errp, "Server used non-zero padding in extended header");
+            return -EINVAL;
+        }
+        reply->magic = NBD_SIMPLE_REPLY_MAGIC;
+    }

     return 0;
 }

 /* nbd_receive_structured_reply_chunk
  * Read structured reply chunk except magic field (which should be already
- * read).
+ * read).  Normalize into the compact form.
  * Payload is not read.
  */
-static int nbd_receive_structured_reply_chunk(QIOChannel *ioc,
-                                              NBDStructuredReplyChunk *chunk,
+static int nbd_receive_structured_reply_chunk(QIOChannel *ioc, NBDReply *chunk,
                                               Error **errp)
 {
     int ret;
+    size_t len;
+    uint64_t payload_len;

-    assert(chunk->magic == NBD_STRUCTURED_REPLY_MAGIC);
+    if (chunk->magic == NBD_STRUCTURED_REPLY_MAGIC) {
+        len = sizeof(chunk->structured);
+    } else {
+        assert(chunk->magic == NBD_STRUCTURED_REPLY_EXT_MAGIC);
+        len = sizeof(chunk->structured_ext);
+    }

     ret = nbd_read(ioc, (uint8_t *)chunk + sizeof(chunk->magic),
-                   sizeof(*chunk) - sizeof(chunk->magic), "structured chunk",
+                   len - sizeof(chunk->magic), "structured chunk",
                    errp);
     if (ret < 0) {
         return ret;
     }

-    chunk->flags = be16_to_cpu(chunk->flags);
-    chunk->type = be16_to_cpu(chunk->type);
-    chunk->handle = be64_to_cpu(chunk->handle);
-    chunk->length = be32_to_cpu(chunk->length);
+    /* flags, type, and handle occupy same space between forms */
+    chunk->structured.flags = be16_to_cpu(chunk->structured.flags);
+    chunk->structured.type = be16_to_cpu(chunk->structured.type);
+    chunk->structured.handle = be64_to_cpu(chunk->structured.handle);

     /*
      * Because we use BLOCK_STATUS with REQ_ONE, and cap READ requests
@@ -1420,11 +1446,23 @@ static int nbd_receive_structured_reply_chunk(QIOChannel *ioc,
      * this.  Even if we stopped using REQ_ONE, sane servers will cap
      * the number of extents they return for block status.
      */
-    if (chunk->length > NBD_MAX_BUFFER_SIZE + sizeof(NBDStructuredReadData)) {
+    if (chunk->magic == NBD_STRUCTURED_REPLY_MAGIC) {
+        payload_len = be32_to_cpu(chunk->structured.length);
+    } else {
+        payload_len = be64_to_cpu(chunk->structured_ext.length);
+        if (chunk->structured_ext._pad) {
+            error_setg(errp, "Server used non-zero padding in extended header");
+            return -EINVAL;
+        }
+        chunk->magic = NBD_STRUCTURED_REPLY_MAGIC;
+    }
+    if (payload_len > NBD_MAX_BUFFER_SIZE + sizeof(NBDStructuredReadData)) {
         error_setg(errp, "server chunk %" PRIu32 " (%s) payload is too long",
-                   chunk->type, nbd_rep_lookup(chunk->type));
+                   chunk->structured.type,
+                   nbd_rep_lookup(chunk->structured.type));
         return -EINVAL;
     }
+    chunk->structured.length = payload_len;

     return 0;
 }
@@ -1471,30 +1509,36 @@ nbd_read_eof(BlockDriverState *bs, QIOChannel *ioc, void *buffer, size_t size,

 /* nbd_receive_reply
  *
- * Decreases bs->in_flight while waiting for a new reply. This yield is where
- * we wait indefinitely and the coroutine must be able to be safely reentered
- * for nbd_client_attach_aio_context().
+ * Wait for a new reply. If this yields, the coroutine must be able to be
+ * safely reentered for nbd_client_attach_aio_context().  @ext_hdrs determines
+ * which reply magic we are expecting, although this normalizes the result
+ * so that the caller only has to work with compact headers.
  *
  * Returns 1 on success
- *         0 on eof, when no data was read (errp is not set)
- *         negative errno on failure (errp is set)
+ *         0 on eof, when no data was read
+ *         negative errno on failure
  */
 int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc,
-                                   NBDReply *reply, Error **errp)
+                                   NBDReply *reply, bool ext_hdrs)
 {
     int ret;
     const char *type;

-    ret = nbd_read_eof(bs, ioc, &reply->magic, sizeof(reply->magic), errp);
+    ret = nbd_read_eof(bs, ioc, &reply->magic, sizeof(reply->magic), NULL);
     if (ret <= 0) {
         return ret;
     }

     reply->magic = be32_to_cpu(reply->magic);

+    /* Diagnose but accept wrong-width header */
     switch (reply->magic) {
     case NBD_SIMPLE_REPLY_MAGIC:
-        ret = nbd_receive_simple_reply(ioc, &reply->simple, errp);
+    case NBD_SIMPLE_REPLY_EXT_MAGIC:
+        if (ext_hdrs != (reply->magic == NBD_SIMPLE_REPLY_EXT_MAGIC)) {
+            trace_nbd_receive_wrong_header(reply->magic);
+        }
+        ret = nbd_receive_simple_reply(ioc, reply, NULL);
         if (ret < 0) {
             break;
         }
@@ -1503,7 +1547,11 @@ int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc,
                                        reply->handle);
         break;
     case NBD_STRUCTURED_REPLY_MAGIC:
-        ret = nbd_receive_structured_reply_chunk(ioc, &reply->structured, errp);
+    case NBD_STRUCTURED_REPLY_EXT_MAGIC:
+        if (ext_hdrs != (reply->magic == NBD_STRUCTURED_REPLY_EXT_MAGIC)) {
+            trace_nbd_receive_wrong_header(reply->magic);
+        }
+        ret = nbd_receive_structured_reply_chunk(ioc, reply, NULL);
         if (ret < 0) {
             break;
         }
@@ -1514,7 +1562,7 @@ int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc,
                                                  reply->structured.length);
         break;
     default:
-        error_setg(errp, "invalid magic (got 0x%" PRIx32 ")", reply->magic);
+        trace_nbd_receive_wrong_header(reply->magic);
         return -EINVAL;
     }
     if (ret < 0) {
diff --git a/nbd/trace-events b/nbd/trace-events
index d18da8b0b743..ad63e565fd6e 100644
--- a/nbd/trace-events
+++ b/nbd/trace-events
@@ -34,6 +34,7 @@ nbd_client_clear_socket(void) "Clearing NBD socket"
 nbd_send_request(uint64_t from, uint64_t len, uint64_t handle, uint16_t flags, uint16_t type, const char *name) "Sending request to server: { .from = %" PRIu64", .len = %" PRIu64 ", .handle = %" PRIu64 ", .flags = 0x%" PRIx16 ", .type = %" PRIu16 " (%s) }"
 nbd_receive_simple_reply(int32_t error, const char *errname, uint64_t handle) "Got simple reply: { .error = %" PRId32 " (%s), handle = %" PRIu64" }"
 nbd_receive_structured_reply_chunk(uint16_t flags, uint16_t type, const char *name, uint64_t handle, uint32_t length) "Got structured reply chunk: { flags = 0x%" PRIx16 ", type = %d (%s), handle = %" PRIu64 ", length = %" PRIu32 " }"
+nbd_receive_wrong_header(uint32_t magic) "Server sent unexpected magic 0x%" PRIx32

 # common.c
 nbd_unknown_error(int err) "Squashing unexpected error %d to EINVAL"
-- 
2.33.1
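
Note on the length normalization in nbd_receive_structured_reply_chunk()
above: the wire length is 32 bits in the compact header and 64 bits in
the extended header, and it is stored back into the compact 32-bit field
only after the cap check has passed.  A minimal standalone sketch of
that step follows.  The names sketch_load_be, sketch_normalize_length,
and SKETCH_MAX_PAYLOAD are hypothetical and not part of qemu; the cap
value stands in for NBD_MAX_BUFFER_SIZE + sizeof(NBDStructuredReadData).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cap, standing in for the limit used by the patch. */
#define SKETCH_MAX_PAYLOAD (32u * 1024 * 1024 + 8)

/* Read a big-endian unsigned integer of n bytes (n is 4 or 8). */
static uint64_t sketch_load_be(const uint8_t *p, unsigned n)
{
    uint64_t v = 0;

    assert(n == 4 || n == 8);
    for (unsigned i = 0; i < n; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/*
 * Decode the wire length of a structured chunk (32-bit when compact,
 * 64-bit when extended) and store it in a 32-bit field only after it
 * has been checked against the cap.  Returns 0 or -EINVAL.
 */
static int sketch_normalize_length(const uint8_t *wire, bool extended,
                                   uint32_t *compact_len)
{
    uint64_t len = sketch_load_be(wire, extended ? 8 : 4);

    if (len > SKETCH_MAX_PAYLOAD) {
        return -EINVAL;             /* would not fit the client buffer */
    }
    *compact_len = (uint32_t)len;   /* safe: len <= cap < 2^32 */
    return 0;
}
```

Checking the 64-bit value before narrowing it is what keeps a large
extended length from being silently truncated into a plausible 32-bit
one.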




* [PATCH 11/14] nbd/client: Accept 64-bit hole chunks
  2021-12-03 23:15 ` [PATCH 00/14] qemu patches for NBD_OPT_EXTENDED_HEADERS Eric Blake
                     ` (9 preceding siblings ...)
  2021-12-03 23:15   ` [PATCH 10/14] nbd/client: Initial support for extended headers Eric Blake
@ 2021-12-03 23:15   ` Eric Blake
  2021-12-03 23:15   ` [PATCH 12/14] nbd/client: Accept 64-bit block status chunks Eric Blake
                     ` (2 subsequent siblings)
  13 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2021-12-03 23:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, vsementsov, qemu-block, nbd, nsoffer, Hanna Reitz,
	libguestfs

Although our read requests are sized such that servers need not send
an extended hole chunk, we must still be prepared to accept one in
order to be fully compliant when we request extended headers.  We can
also tolerate a non-compliant server that sends the new chunk even
when it should not.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
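For readers following along, here is a minimal standalone sketch of the
narrow/wide hole payload handling, assuming the layout this patch
parses: a 64-bit offset followed by a 32-bit (narrow) or 64-bit (wide)
hole size, all big-endian.  The names sketch_be and sketch_parse_hole
are hypothetical and not part of qemu.  The range check is equivalent
in intent to the one in nbd_parse_offset_hole_payload(), but written so
that no addition can overflow.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian unsigned integer of n bytes (n is 4 or 8). */
static uint64_t sketch_be(const uint8_t *p, unsigned n)
{
    uint64_t v = 0;

    assert(n == 4 || n == 8);
    for (unsigned i = 0; i < n; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/*
 * Parse a hole payload and check that the hole lies entirely inside
 * the read request [req_offset, req_offset + req_len).
 * Returns 0 on success, -EINVAL on a malformed or out-of-range hole.
 */
static int sketch_parse_hole(const uint8_t *payload, size_t payload_len,
                             bool wide, uint64_t req_offset,
                             uint64_t req_len, uint64_t *hole_offset,
                             uint64_t *hole_size)
{
    size_t size_bytes = wide ? 8 : 4;
    uint64_t offset, size;

    assert(payload && hole_offset && hole_size);
    if (payload_len != 8 + size_bytes) {
        return -EINVAL;     /* wrong payload size for this chunk type */
    }
    offset = sketch_be(payload, 8);
    size = sketch_be(payload + 8, (unsigned)size_bytes);

    if (size == 0 || offset < req_offset || size > req_len ||
        offset - req_offset > req_len - size) {
        return -EINVAL;     /* hole is empty or outside the request */
    }
    *hole_offset = offset;
    *hole_size = size;
    return 0;
}
```

Because the hole must fit inside the original read request, a wide hole
larger than the request is rejected by the range check rather than by
the size of the field that carried it.
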
 block/nbd.c        | 26 ++++++++++++++++++++------
 block/trace-events |  1 +
 2 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index da5e6ac2d9a5..c5dea864ebb6 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -518,20 +518,26 @@ static inline uint64_t payload_advance64(uint8_t **payload)

 static int nbd_parse_offset_hole_payload(BDRVNBDState *s,
                                          NBDStructuredReplyChunk *chunk,
-                                         uint8_t *payload, uint64_t orig_offset,
+                                         uint8_t *payload, bool wide,
+                                         uint64_t orig_offset,
                                          QEMUIOVector *qiov, Error **errp)
 {
     uint64_t offset;
-    uint32_t hole_size;
+    uint64_t hole_size;
+    size_t len = wide ? sizeof(hole_size) : sizeof(uint32_t);

-    if (chunk->length != sizeof(offset) + sizeof(hole_size)) {
+    if (chunk->length != sizeof(offset) + len) {
         error_setg(errp, "Protocol error: invalid payload for "
                          "NBD_REPLY_TYPE_OFFSET_HOLE");
         return -EINVAL;
     }

     offset = payload_advance64(&payload);
-    hole_size = payload_advance32(&payload);
+    if (wide) {
+        hole_size = payload_advance64(&payload);
+    } else {
+        hole_size = payload_advance32(&payload);
+    }

     if (!hole_size || offset < orig_offset || hole_size > qiov->size ||
         offset > orig_offset + qiov->size - hole_size) {
@@ -544,6 +550,7 @@ static int nbd_parse_offset_hole_payload(BDRVNBDState *s,
         trace_nbd_structured_read_compliance("hole");
     }

+    assert(hole_size <= SIZE_MAX);
     qemu_iovec_memset(qiov, offset - orig_offset, 0, hole_size);

     return 0;
@@ -1037,9 +1044,16 @@ static int nbd_co_receive_cmdread_reply(BDRVNBDState *s, uint64_t handle,
              * in qiov
              */
             break;
+        case NBD_REPLY_TYPE_OFFSET_HOLE_EXT:
+            if (!s->info.extended_headers) {
+                trace_nbd_extended_headers_compliance("hole_ext");
+            }
+            /* fallthrough */
         case NBD_REPLY_TYPE_OFFSET_HOLE:
-            ret = nbd_parse_offset_hole_payload(s, &reply.structured, payload,
-                                                offset, qiov, &local_err);
+            ret = nbd_parse_offset_hole_payload(
+                s, &reply.structured, payload,
+                chunk->type == NBD_REPLY_TYPE_OFFSET_HOLE_EXT,
+                offset, qiov, &local_err);
             if (ret < 0) {
                 nbd_channel_error(s, ret);
                 nbd_iter_channel_error(&iter, ret, &local_err);
diff --git a/block/trace-events b/block/trace-events
index 549090d453e7..ee65da204dde 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -168,6 +168,7 @@ iscsi_xcopy(void *src_lun, uint64_t src_off, void *dst_lun, uint64_t dst_off, ui
 # nbd.c
 nbd_parse_blockstatus_compliance(const char *err) "ignoring extra data from non-compliant server: %s"
 nbd_structured_read_compliance(const char *type) "server sent non-compliant unaligned read %s chunk"
+nbd_extended_headers_compliance(const char *type) "server sent non-compliant %s chunk without extended headers"
 nbd_read_reply_entry_fail(int ret, const char *err) "ret = %d, err: %s"
 nbd_co_request_fail(uint64_t from, uint32_t len, uint64_t handle, uint16_t flags, uint16_t type, const char *name, int ret, const char *err) "Request failed { .from = %" PRIu64", .len = %" PRIu32 ", .handle = %" PRIu64 ", .flags = 0x%" PRIx16 ", .type = %" PRIu16 " (%s) } ret = %d, err: %s"
 nbd_client_handshake(const char *export_name) "export '%s'"
-- 
2.33.1




* [PATCH 12/14] nbd/client: Accept 64-bit block status chunks
  2021-12-03 23:15 ` [PATCH 00/14] qemu patches for NBD_OPT_EXTENDED_HEADERS Eric Blake
                     ` (10 preceding siblings ...)
  2021-12-03 23:15   ` [PATCH 11/14] nbd/client: Accept 64-bit hole chunks Eric Blake
@ 2021-12-03 23:15   ` Eric Blake
  2021-12-03 23:15   ` [PATCH 13/14] nbd/client: Request extended headers during negotiation Eric Blake
  2021-12-03 23:15   ` [PATCH 14/14] do not apply: nbd/server: Send 64-bit hole chunk Eric Blake
  13 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2021-12-03 23:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, vsementsov, qemu-block, nbd, nsoffer, Hanna Reitz,
	libguestfs

Because we use NBD_CMD_FLAG_REQ_ONE with NBD_CMD_BLOCK_STATUS, a
client in narrow mode should not be able to provoke a server into
sending a block status result larger than the client's 32-bit request.
But in extended mode, a client that sends a 64-bit status request must
be able to handle a 64-bit status result, once a future patch lets the
client request extended mode.  We can also tolerate a non-compliant
server that sends the new chunk even when it should not.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
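As an illustration of the two extent layouts, here is a minimal
standalone sketch of parsing the first extent of a block status
payload.  It assumes the layout implied by the parsing order in this
patch: a 32-bit context id, then either a narrow extent (32-bit length,
32-bit flags) or a wide extent (64-bit length, 32-bit flags, 32-bit
padding), all big-endian.  The names SketchExtent, sketch_be, and
sketch_parse_extent are hypothetical and not part of qemu.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t length;
    uint32_t flags;
} SketchExtent;

/* Read a big-endian unsigned integer of n bytes (n is 4 or 8). */
static uint64_t sketch_be(const uint8_t *p, unsigned n)
{
    uint64_t v = 0;

    assert(n == 4 || n == 8);
    for (unsigned i = 0; i < n; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/*
 * Parse the first extent after the context id, ignoring any trailing
 * extents, and clamp its length to the length of the request.
 * Returns 0 on success, -EINVAL on a short payload or empty extent.
 */
static int sketch_parse_extent(const uint8_t *payload, size_t payload_len,
                               bool wide, uint64_t req_len,
                               SketchExtent *ext)
{
    size_t ext_bytes = wide ? 16 : 8;
    const uint8_t *p;

    assert(payload && ext);
    if (payload_len < 4 + ext_bytes) {
        return -EINVAL;     /* must hold at least one extent */
    }
    p = payload + 4;        /* skip the 32-bit context id */
    if (wide) {
        ext->length = sketch_be(p, 8);
        ext->flags = (uint32_t)sketch_be(p + 8, 4);
        /* p + 12 holds 32 bits of padding, not validated here */
    } else {
        ext->length = sketch_be(p, 4);
        ext->flags = (uint32_t)sketch_be(p + 4, 4);
    }
    if (ext->length == 0) {
        return -EINVAL;     /* a zero-length extent makes no progress */
    }
    if (ext->length > req_len) {
        ext->length = req_len;
    }
    return 0;
}
```

Storing the result in a 64-bit length regardless of the wire form is
the same choice the patch makes by switching callers to NBDExtentExt.
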
 block/nbd.c | 38 +++++++++++++++++++++++++++-----------
 1 file changed, 27 insertions(+), 11 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index c5dea864ebb6..bd4a9c407bde 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -563,13 +563,15 @@ static int nbd_parse_offset_hole_payload(BDRVNBDState *s,
  */
 static int nbd_parse_blockstatus_payload(BDRVNBDState *s,
                                          NBDStructuredReplyChunk *chunk,
-                                         uint8_t *payload, uint64_t orig_length,
-                                         NBDExtent *extent, Error **errp)
+                                         uint8_t *payload, bool wide,
+                                         uint64_t orig_length,
+                                         NBDExtentExt *extent, Error **errp)
 {
     uint32_t context_id;
+    size_t len = wide ? sizeof(*extent) : sizeof(NBDExtent);

     /* The server succeeded, so it must have sent [at least] one extent */
-    if (chunk->length < sizeof(context_id) + sizeof(*extent)) {
+    if (chunk->length < sizeof(context_id) + len) {
         error_setg(errp, "Protocol error: invalid payload for "
                          "NBD_REPLY_TYPE_BLOCK_STATUS");
         return -EINVAL;
@@ -584,8 +586,16 @@ static int nbd_parse_blockstatus_payload(BDRVNBDState *s,
         return -EINVAL;
     }

-    extent->length = payload_advance32(&payload);
-    extent->flags = payload_advance32(&payload);
+    if (wide) {
+        extent->length = payload_advance64(&payload);
+        extent->flags = payload_advance32(&payload);
+        if (payload_advance32(&payload) != 0) {
+            trace_nbd_parse_blockstatus_compliance("non-zero extent padding");
+        }
+    } else {
+        extent->length = payload_advance32(&payload);
+        extent->flags = payload_advance32(&payload);
+    }

     if (extent->length == 0) {
         error_setg(errp, "Protocol error: server sent status chunk with "
@@ -625,7 +635,7 @@ static int nbd_parse_blockstatus_payload(BDRVNBDState *s,
      * connection; just ignore trailing extents, and clamp things to
      * the length of our request.
      */
-    if (chunk->length > sizeof(context_id) + sizeof(*extent)) {
+    if (chunk->length > sizeof(context_id) + len) {
         trace_nbd_parse_blockstatus_compliance("more than one extent");
     }
     if (extent->length > orig_length) {
@@ -1081,7 +1091,7 @@ static int nbd_co_receive_cmdread_reply(BDRVNBDState *s, uint64_t handle,

 static int nbd_co_receive_blockstatus_reply(BDRVNBDState *s,
                                             uint64_t handle, uint64_t length,
-                                            NBDExtent *extent,
+                                            NBDExtentExt *extent,
                                             int *request_ret, Error **errp)
 {
     NBDReplyChunkIter iter;
@@ -1098,6 +1108,11 @@ static int nbd_co_receive_blockstatus_reply(BDRVNBDState *s,
         assert(nbd_reply_is_structured(&reply));

         switch (chunk->type) {
+        case NBD_REPLY_TYPE_BLOCK_STATUS_EXT:
+            if (!s->info.extended_headers) {
+                trace_nbd_extended_headers_compliance("block_status_ext");
+            }
+            /* fallthrough */
         case NBD_REPLY_TYPE_BLOCK_STATUS:
             if (received) {
                 nbd_channel_error(s, -EINVAL);
@@ -1106,9 +1121,10 @@ static int nbd_co_receive_blockstatus_reply(BDRVNBDState *s,
             }
             received = true;

-            ret = nbd_parse_blockstatus_payload(s, &reply.structured,
-                                                payload, length, extent,
-                                                &local_err);
+            ret = nbd_parse_blockstatus_payload(
+                s, &reply.structured, payload,
+                chunk->type == NBD_REPLY_TYPE_BLOCK_STATUS_EXT,
+                length, extent, &local_err);
             if (ret < 0) {
                 nbd_channel_error(s, ret);
                 nbd_iter_channel_error(&iter, ret, &local_err);
@@ -1337,7 +1353,7 @@ static int coroutine_fn nbd_client_co_block_status(
         int64_t *pnum, int64_t *map, BlockDriverState **file)
 {
     int ret, request_ret;
-    NBDExtent extent = { 0 };
+    NBDExtentExt extent = { 0 };
     BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
     Error *local_err = NULL;

-- 
2.33.1




* [PATCH 13/14] nbd/client: Request extended headers during negotiation
  2021-12-03 23:15 ` [PATCH 00/14] qemu patches for NBD_OPT_EXTENDED_HEADERS Eric Blake
                     ` (11 preceding siblings ...)
  2021-12-03 23:15   ` [PATCH 12/14] nbd/client: Accept 64-bit block status chunks Eric Blake
@ 2021-12-03 23:15   ` Eric Blake
  2021-12-03 23:15   ` [PATCH 14/14] do not apply: nbd/server: Send 64-bit hole chunk Eric Blake
  13 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2021-12-03 23:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, vsementsov, qemu-block, nbd, nsoffer, Hanna Reitz,
	libguestfs

All the pieces are in place for a client to finally request extended
headers.  Note that we must not request extended headers when qemu-nbd
is used to connect to the kernel module (as nbd.ko does not expect
them), but there is no harm in all other clients requesting them.

Extended headers make no difference to the information collected
during 'qemu-nbd --list', but probing for them gives us one more piece
of information in that output.  Update the iotests affected by the new
line of output.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
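The decision logic added to nbd_start_negotiate() can be sketched in
isolation as follows.  This is only an illustration under assumptions:
the enum, the callback type, and the function names are hypothetical
and not part of qemu.  The callback stands in for
nbd_request_simple_option(), and is assumed to return 1 when the server
acknowledges the option, 0 when it does not support it, and a negative
value on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    SKETCH_OLDSTYLE,
    SKETCH_NEWSTYLE,
    SKETCH_FIXED_NEWSTYLE,
} SketchServerKind;

/* Hypothetical stand-in for sending one simple option request. */
typedef int (*SketchRequestOption)(void *opaque);

/* Example callback that returns a canned reply stored in *opaque. */
static int sketch_canned_reply(void *opaque)
{
    return *(int *)opaque;
}

/*
 * Decide whether extended headers are in effect.  Only a fixed
 * newstyle server can be asked; every other case reports false.
 * Returns 0 on success, -EINVAL if the option request itself failed.
 */
static int sketch_negotiate_ext_hdrs(SketchServerKind kind, bool want,
                                     SketchRequestOption request,
                                     void *opaque, bool *ext_hdrs)
{
    int result;

    assert(request && ext_hdrs);
    *ext_hdrs = false;
    if (kind != SKETCH_FIXED_NEWSTYLE || !want) {
        return 0;           /* nothing to ask, stay in compact mode */
    }
    result = request(opaque);
    if (result < 0) {
        return -EINVAL;
    }
    *ext_hdrs = (result == 1);
    return 0;
}
```

In the patch itself the extended headers request is sent before
NBD_OPT_STRUCTURED_REPLY, and a caller that must stay in compact mode
(such as the nbd.ko case mentioned above) simply does not ask.
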
 nbd/client-connection.c                       |  1 +
 nbd/client.c                                  | 26 ++++++++++++++++---
 qemu-nbd.c                                    |  2 ++
 tests/qemu-iotests/223.out                    |  4 +++
 tests/qemu-iotests/233.out                    |  1 +
 tests/qemu-iotests/241                        |  8 +++---
 tests/qemu-iotests/307                        |  2 +-
 tests/qemu-iotests/307.out                    |  5 ++++
 .../tests/nbd-qemu-allocation.out             |  1 +
 9 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/nbd/client-connection.c b/nbd/client-connection.c
index 695f85575414..d8b9ae230264 100644
--- a/nbd/client-connection.c
+++ b/nbd/client-connection.c
@@ -87,6 +87,7 @@ NBDClientConnection *nbd_client_connection_new(const SocketAddress *saddr,

         .initial_info.request_sizes = true,
         .initial_info.structured_reply = true,
+        .initial_info.extended_headers = true,
         .initial_info.base_allocation = true,
         .initial_info.x_dirty_bitmap = g_strdup(x_dirty_bitmap),
         .initial_info.name = g_strdup(export_name ?: "")
diff --git a/nbd/client.c b/nbd/client.c
index f1aa5256c8bf..0e227255d59b 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -882,8 +882,8 @@ static int nbd_list_meta_contexts(QIOChannel *ioc,
 static int nbd_start_negotiate(AioContext *aio_context, QIOChannel *ioc,
                                QCryptoTLSCreds *tlscreds,
                                const char *hostname, QIOChannel **outioc,
-                               bool structured_reply, bool *zeroes,
-                               Error **errp)
+                               bool structured_reply, bool *ext_hdrs,
+                               bool *zeroes, Error **errp)
 {
     ERRP_GUARD();
     uint64_t magic;
@@ -960,6 +960,15 @@ static int nbd_start_negotiate(AioContext *aio_context, QIOChannel *ioc,
         if (fixedNewStyle) {
             int result = 0;

+            if (ext_hdrs && *ext_hdrs) {
+                result = nbd_request_simple_option(ioc,
+                                                   NBD_OPT_EXTENDED_HEADERS,
+                                                   false, errp);
+                if (result < 0) {
+                    return -EINVAL;
+                }
+                *ext_hdrs = result == 1;
+            }
             if (structured_reply) {
                 result = nbd_request_simple_option(ioc,
                                                    NBD_OPT_STRUCTURED_REPLY,
@@ -970,6 +979,9 @@ static int nbd_start_negotiate(AioContext *aio_context, QIOChannel *ioc,
             }
             return 2 + result;
         } else {
+            if (ext_hdrs) {
+                *ext_hdrs = false;
+            }
             return 1;
         }
     } else if (magic == NBD_CLIENT_MAGIC) {
@@ -977,6 +989,9 @@ static int nbd_start_negotiate(AioContext *aio_context, QIOChannel *ioc,
             error_setg(errp, "Server does not support STARTTLS");
             return -EINVAL;
         }
+        if (ext_hdrs) {
+            *ext_hdrs = false;
+        }
         return 0;
     } else {
         error_setg(errp, "Bad server magic received: 0x%" PRIx64, magic);
@@ -1030,7 +1045,8 @@ int nbd_receive_negotiate(AioContext *aio_context, QIOChannel *ioc,
     trace_nbd_receive_negotiate_name(info->name);

     result = nbd_start_negotiate(aio_context, ioc, tlscreds, hostname, outioc,
-                                 info->structured_reply, &zeroes, errp);
+                                 info->structured_reply,
+                                 &info->extended_headers, &zeroes, errp);

     info->structured_reply = false;
     info->base_allocation = false;
@@ -1147,10 +1163,11 @@ int nbd_receive_export_list(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
     int ret = -1;
     NBDExportInfo *array = NULL;
     QIOChannel *sioc = NULL;
+    bool ext_hdrs;

     *info = NULL;
     result = nbd_start_negotiate(NULL, ioc, tlscreds, hostname, &sioc, true,
-                                 NULL, errp);
+                                 &ext_hdrs, NULL, errp);
     if (tlscreds && sioc) {
         ioc = sioc;
     }
@@ -1179,6 +1196,7 @@ int nbd_receive_export_list(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
             array[count - 1].name = name;
             array[count - 1].description = desc;
             array[count - 1].structured_reply = result == 3;
+            array[count - 1].extended_headers = ext_hdrs;
         }

         for (i = 0; i < count; i++) {
diff --git a/qemu-nbd.c b/qemu-nbd.c
index c6c20df68a4d..ecf47b7e1250 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -237,6 +237,8 @@ static int qemu_nbd_client_list(SocketAddress *saddr, QCryptoTLSCreds *tls,
             printf("  opt block: %u\n", list[i].opt_block);
             printf("  max block: %u\n", list[i].max_block);
         }
+        printf("  transaction size: %s\n",
+               list[i].extended_headers ? "64-bit" : "32-bit");
         if (list[i].n_contexts) {
             printf("  available meta contexts: %d\n", list[i].n_contexts);
             for (j = 0; j < list[i].n_contexts; j++) {
diff --git a/tests/qemu-iotests/223.out b/tests/qemu-iotests/223.out
index e58ea5abbd5a..36de28ccd12f 100644
--- a/tests/qemu-iotests/223.out
+++ b/tests/qemu-iotests/223.out
@@ -76,6 +76,7 @@ exports available: 2
   min block: 1
   opt block: 4096
   max block: 33554432
+  transaction size: 64-bit
   available meta contexts: 2
    base:allocation
    qemu:dirty-bitmap:b
@@ -86,6 +87,7 @@ exports available: 2
   min block: 1
   opt block: 4096
   max block: 33554432
+  transaction size: 64-bit
   available meta contexts: 2
    base:allocation
    qemu:dirty-bitmap:b2
@@ -177,6 +179,7 @@ exports available: 2
   min block: 1
   opt block: 4096
   max block: 33554432
+  transaction size: 64-bit
   available meta contexts: 2
    base:allocation
    qemu:dirty-bitmap:b
@@ -187,6 +190,7 @@ exports available: 2
   min block: 1
   opt block: 4096
   max block: 33554432
+  transaction size: 64-bit
   available meta contexts: 2
    base:allocation
    qemu:dirty-bitmap:b2
diff --git a/tests/qemu-iotests/233.out b/tests/qemu-iotests/233.out
index 4b1f6a0e1513..b04cc38bd11c 100644
--- a/tests/qemu-iotests/233.out
+++ b/tests/qemu-iotests/233.out
@@ -43,6 +43,7 @@ exports available: 1
   min block: 1
   opt block: 4096
   max block: 33554432
+  transaction size: 64-bit
   available meta contexts: 1
    base:allocation

diff --git a/tests/qemu-iotests/241 b/tests/qemu-iotests/241
index c962c8b6075d..23307ca2d829 100755
--- a/tests/qemu-iotests/241
+++ b/tests/qemu-iotests/241
@@ -3,7 +3,7 @@
 #
 # Test qemu-nbd vs. unaligned images
 #
-# Copyright (C) 2018-2019 Red Hat, Inc.
+# Copyright (C) 2018-2021 Red Hat, Inc.
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -58,7 +58,7 @@ echo

 nbd_server_start_unix_socket -f $IMGFMT "$TEST_IMG_FILE"

-$QEMU_NBD_PROG --list -k $nbd_unix_socket | grep '\(size\|min\)'
+$QEMU_NBD_PROG --list -k $nbd_unix_socket | grep '^ *\(size\|min\)'
 $QEMU_IMG map -f raw --output=json "$TEST_IMG" | _filter_qemu_img_map
 $QEMU_IO -f raw -c map "$TEST_IMG"
 nbd_server_stop
@@ -71,7 +71,7 @@ echo
 # sector alignment, here at the server.
 nbd_server_start_unix_socket "$TEST_IMG_FILE" 2> "$TEST_DIR/server.log"

-$QEMU_NBD_PROG --list -k $nbd_unix_socket | grep '\(size\|min\)'
+$QEMU_NBD_PROG --list -k $nbd_unix_socket | grep '^ *\(size\|min\)'
 $QEMU_IMG map -f raw --output=json "$TEST_IMG" | _filter_qemu_img_map
 $QEMU_IO -f raw -c map "$TEST_IMG"
 nbd_server_stop
@@ -84,7 +84,7 @@ echo
 # Now force sector alignment at the client.
 nbd_server_start_unix_socket -f $IMGFMT "$TEST_IMG_FILE"

-$QEMU_NBD_PROG --list -k $nbd_unix_socket | grep '\(size\|min\)'
+$QEMU_NBD_PROG --list -k $nbd_unix_socket | grep '^ *\(size\|min\)'
 $QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
 $QEMU_IO -c map "$TEST_IMG"
 nbd_server_stop
diff --git a/tests/qemu-iotests/307 b/tests/qemu-iotests/307
index b429b5aa50a4..64ce250d82de 100755
--- a/tests/qemu-iotests/307
+++ b/tests/qemu-iotests/307
@@ -1,7 +1,7 @@
 #!/usr/bin/env python3
 # group: rw quick export
 #
-# Copyright (C) 2020 Red Hat, Inc.
+# Copyright (C) 2020-2021 Red Hat, Inc.
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
diff --git a/tests/qemu-iotests/307.out b/tests/qemu-iotests/307.out
index ec8d2be0e0a4..343ddc0a5c16 100644
--- a/tests/qemu-iotests/307.out
+++ b/tests/qemu-iotests/307.out
@@ -19,6 +19,7 @@ exports available: 1
   min block: XXX
   opt block: XXX
   max block: XXX
+  transaction size: 64-bit
   available meta contexts: 1
    base:allocation

@@ -47,6 +48,7 @@ exports available: 1
   min block: XXX
   opt block: XXX
   max block: XXX
+  transaction size: 64-bit
   available meta contexts: 1
    base:allocation

@@ -78,6 +80,7 @@ exports available: 2
   min block: XXX
   opt block: XXX
   max block: XXX
+  transaction size: 64-bit
   available meta contexts: 1
    base:allocation
  export: 'export1'
@@ -87,6 +90,7 @@ exports available: 2
   min block: XXX
   opt block: XXX
   max block: XXX
+  transaction size: 64-bit
   available meta contexts: 1
    base:allocation

@@ -113,6 +117,7 @@ exports available: 1
   min block: XXX
   opt block: XXX
   max block: XXX
+  transaction size: 64-bit
   available meta contexts: 1
    base:allocation

diff --git a/tests/qemu-iotests/tests/nbd-qemu-allocation.out b/tests/qemu-iotests/tests/nbd-qemu-allocation.out
index 0bf1abb06338..f30b4bed2144 100644
--- a/tests/qemu-iotests/tests/nbd-qemu-allocation.out
+++ b/tests/qemu-iotests/tests/nbd-qemu-allocation.out
@@ -21,6 +21,7 @@ exports available: 1
   min block: 1
   opt block: 4096
   max block: 33554432
+  transaction size: 64-bit
   available meta contexts: 2
    base:allocation
    qemu:allocation-depth
-- 
2.33.1




* [PATCH 14/14] do not apply: nbd/server: Send 64-bit hole chunk
  2021-12-03 23:15 ` [PATCH 00/14] qemu patches for NBD_OPT_EXTENDED_HEADERS Eric Blake
                     ` (12 preceding siblings ...)
  2021-12-03 23:15   ` [PATCH 13/14] nbd/client: Request extended headers during negotiation Eric Blake
@ 2021-12-03 23:15   ` Eric Blake
  13 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2021-12-03 23:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: nsoffer, vsementsov, libguestfs, qemu-block, nbd

Since we cap NBD_CMD_READ requests to 32M, we never have a reason to
send a 64-bit chunk type for a hole; but it is worth producing these
for interoperability testing of clients that want extended headers.
---
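For reference, a minimal standalone sketch of the two hole payload
encodings this patch chooses between, assuming a 64-bit offset followed
by a 32-bit length (compact, 12 bytes) or a 64-bit length (extended,
16 bytes), all big-endian.  The names sketch_store_be and
sketch_encode_hole are hypothetical and not part of qemu, which uses
stq_be_p() and stl_be_p() on packed structs instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Store the low n bytes of v at p in big-endian order (n is 4 or 8). */
static void sketch_store_be(uint8_t *p, uint64_t v, unsigned n)
{
    assert(n == 4 || n == 8);
    for (unsigned i = 0; i < n; i++) {
        p[i] = (uint8_t)(v >> (8 * (n - 1 - i)));
    }
}

/*
 * Encode a hole payload into buf.  Returns the number of bytes
 * written, or 0 if buf is too small or the length does not fit the
 * compact form.
 */
static size_t sketch_encode_hole(uint8_t *buf, size_t buf_len,
                                 bool extended, uint64_t offset,
                                 uint64_t length)
{
    size_t need = extended ? 16 : 12;

    assert(buf);
    if (buf_len < need) {
        return 0;
    }
    if (!extended && length > UINT32_MAX) {
        return 0;           /* compact form only carries 32 bits */
    }
    sketch_store_be(buf, offset, 8);
    sketch_store_be(buf + 8, length, extended ? 8 : 4);
    return need;
}
```

The returned size corresponds to the iov_len that the patch passes on
as the chunk's payload length.
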
 nbd/server.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 7e6140350797..4369a9a8ff08 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2071,19 +2071,29 @@ static int coroutine_fn nbd_co_send_sparse_read(NBDClient *client,
         if (status & BDRV_BLOCK_ZERO) {
             NBDReply hdr;
             NBDStructuredReadHole chunk;
+            NBDStructuredReadHoleExt chunk_ext;
             struct iovec iov[] = {
                 {.iov_base = &hdr},
-                {.iov_base = &chunk, .iov_len = sizeof(chunk)},
+                {.iov_base = client->extended_headers ? &chunk_ext
+                 : (void *) &chunk,
+                 .iov_len = client->extended_headers ? sizeof(chunk_ext)
+                 : sizeof(chunk)},
             };

             trace_nbd_co_send_structured_read_hole(handle, offset + progress,
                                                    pnum);
             set_be_chunk(client, &iov[0],
                          final ? NBD_REPLY_FLAG_DONE : 0,
-                         NBD_REPLY_TYPE_OFFSET_HOLE,
+                         client->extended_headers ? NBD_REPLY_TYPE_OFFSET_HOLE_EXT
+                         : NBD_REPLY_TYPE_OFFSET_HOLE,
                          handle, iov[1].iov_len);
-            stq_be_p(&chunk.offset, offset + progress);
-            stl_be_p(&chunk.length, pnum);
+            if (client->extended_headers) {
+                stq_be_p(&chunk_ext.offset, offset + progress);
+                stq_be_p(&chunk_ext.length, pnum);
+            } else {
+                stq_be_p(&chunk.offset, offset + progress);
+                stl_be_p(&chunk.length, pnum);
+            }
             ret = nbd_co_send_iov(client, iov, 2, errp);
         } else {
             ret = blk_pread(exp->common.blk, offset + progress,
-- 
2.33.1




* [libnbd PATCH 00/13] libnbd patches for NBD_OPT_EXTENDED_HEADERS
  2021-12-03 23:13 RFC for NBD protocol extension: extended headers Eric Blake
  2021-12-03 23:14 ` [PATCH] spec: Add NBD_OPT_EXTENDED_HEADERS Eric Blake
  2021-12-03 23:15 ` [PATCH 00/14] qemu patches for NBD_OPT_EXTENDED_HEADERS Eric Blake
@ 2021-12-03 23:17 ` Eric Blake
  2021-12-03 23:17   ` [libnbd PATCH 01/13] golang: Simplify nbd_block_status callback array copy Eric Blake
                     ` (13 more replies)
  2 siblings, 14 replies; 46+ messages in thread
From: Eric Blake @ 2021-12-03 23:17 UTC (permalink / raw)
  To: libguestfs; +Cc: nsoffer, vsementsov, qemu-devel, qemu-block, nbd

Available here: https://repo.or.cz/libnbd/ericb.git/shortlog/refs/tags/exthdr-v1

I also want to do followup patches to teach 'nbdinfo --map' and
'nbdcopy' to utilize 64-bit extents.

Eric Blake (13):
  golang: Simplify nbd_block_status callback array copy
  block_status: Refactor array storage
  protocol: Add definitions for extended headers
  protocol: Prepare to send 64-bit requests
  protocol: Prepare to receive 64-bit replies
  protocol: Accept 64-bit holes during pread
  generator: Add struct nbd_extent in prep for 64-bit extents
  block_status: Track 64-bit extents internally
  block_status: Accept 64-bit extents during block status
  api: Add [aio_]nbd_block_status_64
  api: Add three functions for controlling extended headers
  generator: Actually request extended headers
  interop: Add test of 64-bit block status

 lib/internal.h                                |  31 ++-
 lib/nbd-protocol.h                            |  61 ++++-
 generator/API.ml                              | 237 ++++++++++++++++--
 generator/API.mli                             |   3 +-
 generator/C.ml                                |  24 +-
 generator/GoLang.ml                           |  35 ++-
 generator/Makefile.am                         |   3 +-
 generator/OCaml.ml                            |  20 +-
 generator/Python.ml                           |  29 ++-
 generator/state_machine.ml                    |  52 +++-
 generator/states-issue-command.c              |  31 ++-
 .../states-newstyle-opt-extended-headers.c    |  90 +++++++
 generator/states-newstyle-opt-starttls.c      |  10 +-
 generator/states-reply-structured.c           | 220 ++++++++++++----
 generator/states-reply.c                      |  31 ++-
 lib/handle.c                                  |  27 +-
 lib/rw.c                                      | 105 +++++++-
 python/t/110-defaults.py                      |   3 +-
 python/t/120-set-non-defaults.py              |   4 +-
 python/t/465-block-status-64.py               |  56 +++++
 ocaml/helpers.c                               |  22 +-
 ocaml/nbd-c.h                                 |   3 +-
 ocaml/tests/Makefile.am                       |   5 +-
 ocaml/tests/test_110_defaults.ml              |   4 +-
 ocaml/tests/test_120_set_non_defaults.ml      |   5 +-
 ocaml/tests/test_465_block_status_64.ml       |  58 +++++
 tests/meta-base-allocation.c                  | 111 +++++++-
 interop/Makefile.am                           |   6 +
 interop/large-status.c                        | 186 ++++++++++++++
 interop/large-status.sh                       |  49 ++++
 .gitignore                                    |   1 +
 golang/Makefile.am                            |   3 +-
 golang/handle.go                              |   6 +
 golang/libnbd_110_defaults_test.go            |   8 +
 golang/libnbd_120_set_non_defaults_test.go    |  12 +
 golang/libnbd_465_block_status_64_test.go     | 119 +++++++++
 36 files changed, 1511 insertions(+), 159 deletions(-)
 create mode 100644 generator/states-newstyle-opt-extended-headers.c
 create mode 100644 python/t/465-block-status-64.py
 create mode 100644 ocaml/tests/test_465_block_status_64.ml
 create mode 100644 interop/large-status.c
 create mode 100755 interop/large-status.sh
 create mode 100644 golang/libnbd_465_block_status_64_test.go

-- 
2.33.1




* [libnbd PATCH 01/13] golang: Simplify nbd_block_status callback array copy
  2021-12-03 23:17 ` [libnbd PATCH 00/13] libnbd patches for NBD_OPT_EXTENDED_HEADERS Eric Blake
@ 2021-12-03 23:17   ` Eric Blake
  2021-12-03 23:17   ` [libnbd PATCH 02/13] block_status: Refactor array storage Eric Blake
                     ` (12 subsequent siblings)
  13 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2021-12-03 23:17 UTC (permalink / raw)
  To: libguestfs; +Cc: nsoffer, vsementsov, qemu-devel, qemu-block, nbd

In the block status callback glue code, we need to copy a C uint32_t[]
into a golang []uint32.  The copy is necessary since the lifetime of
the C array is not guaranteed to outlive whatever the Go callback may
have done with what it was handed; copying ensures that the user's Go
code doesn't have to worry about lifetime issues.  But we don't have
to have quite so many casts and pointer additions: since we can assume
C.uint32_t and uint32 occupy the same amount of memory (even though
they are different types), we can exploit Go's ability to treat an
unsafe pointer as if it were an oversized array, take a slice of that
array, and then use idiomatic Go to copy from the slice.

https://github.com/golang/go/wiki/cgo#turning-c-arrays-into-go-slices
---
 generator/GoLang.ml | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/generator/GoLang.ml b/generator/GoLang.ml
index eb3aa263..d3b7dc79 100644
--- a/generator/GoLang.ml
+++ b/generator/GoLang.ml
@@ -1,6 +1,6 @@
 (* hey emacs, this is OCaml code: -*- tuareg -*- *)
 (* nbd client library in userspace: generator
- * Copyright (C) 2013-2020 Red Hat Inc.
+ * Copyright (C) 2013-2021 Red Hat Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public
@@ -514,11 +514,14 @@ let
 /* Closures. */

 func copy_uint32_array (entries *C.uint32_t, count C.size_t) []uint32 {
-    ret := make([]uint32, int (count))
-    for i := 0; i < int (count); i++ {
-       entry := (*C.uint32_t) (unsafe.Pointer(uintptr(unsafe.Pointer(entries)) + (unsafe.Sizeof(*entries) * uintptr(i))))
-       ret[i] = uint32 (*entry)
-    }
+    /* https://github.com/golang/go/wiki/cgo#turning-c-arrays-into-go-slices */
+    unsafePtr := unsafe.Pointer(entries)
+    /* Max structured reply payload is 64M, which is at most 1<<24 uint32
+     * entries, so this array type covers any slice we need to access.
+     */
+    arrayPtr := (*[1 << 24]uint32)(unsafePtr)
+    ret := make([]uint32, count)
+    copy(ret, arrayPtr[:count:count])
     return ret
 }
 ";
-- 
2.33.1




* [libnbd PATCH 02/13] block_status: Refactor array storage
  2021-12-03 23:17 ` [libnbd PATCH 00/13] libnbd patches for NBD_OPT_EXTENDED_HEADERS Eric Blake
  2021-12-03 23:17   ` [libnbd PATCH 01/13] golang: Simplify nbd_block_status callback array copy Eric Blake
@ 2021-12-03 23:17   ` Eric Blake
  2021-12-03 23:17   ` [libnbd PATCH 03/13] protocol: Add definitions for extended headers Eric Blake
                     ` (11 subsequent siblings)
  13 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2021-12-03 23:17 UTC (permalink / raw)
  To: libguestfs; +Cc: nsoffer, vsementsov, qemu-devel, qemu-block, nbd

For 32-bit block status, we were able to cheat and use an array with
an odd number of elements, with array[0] holding the context id, and
passing &array[1] to the user's callback.  But once we have 64-bit
extents, we can no longer abuse array element 0 like that.  Split out
a new state to receive the context id separately from the extents
array.  No behavioral change, other than the rare possibility of
landing in the new state.
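
The split described above can be sketched outside the state machine.
This is only an illustration under stated assumptions, not libnbd's
code: the helper name parse_bs_payload and the flat in-memory payload
buffer are hypothetical.  It reads the 32-bit context id on its own,
then byte-swaps the remaining 32-bit entries into a separate array, so
the entries array no longer needs a reserved element 0.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Read a big-endian 32-bit value from a possibly unaligned buffer. */
static uint32_t
get_be32 (const unsigned char *p)
{
  return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
         ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

/* Hypothetical helper: split a 32-bit block-status payload into its
 * context id and a freshly allocated array of host-order entries.
 * Returns the number of entries, or -1 on a malformed payload or
 * allocation failure.  The caller frees *entries.
 */
static long
parse_bs_payload (const unsigned char *payload, size_t length,
                  uint32_t *context_id, uint32_t **entries)
{
  size_t count, i;

  assert (payload && context_id && entries);

  /* Context id (4 bytes) plus one or more length/flags pairs (8 bytes). */
  if (length < 12 || (length - 4) % 8 != 0)
    return -1;

  *context_id = get_be32 (payload);
  count = (length - 4) / 4;
  *entries = malloc (count * sizeof **entries);
  if (*entries == NULL)
    return -1;
  for (i = 0; i < count; ++i)
    (*entries)[i] = get_be32 (payload + 4 + i * 4);
  return (long) count;
}
```

Keeping the context id out of the array means the element type of the
array can later change (for example to a 64-bit extent struct) without
touching how the id is read.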
---
 lib/internal.h                      |  1 +
 generator/state_machine.ml          | 11 +++++-
 generator/states-reply-structured.c | 58 ++++++++++++++++++++---------
 3 files changed, 51 insertions(+), 19 deletions(-)

diff --git a/lib/internal.h b/lib/internal.h
index 0e205aba..7e96e8e9 100644
--- a/lib/internal.h
+++ b/lib/internal.h
@@ -274,6 +274,7 @@ struct nbd_handle {
   size_t querynum;

   /* When receiving block status, this is used. */
+  uint32_t bs_contextid;
   uint32_t *bs_entries;

   /* Commands which are waiting to be issued [meaning the request
diff --git a/generator/state_machine.ml b/generator/state_machine.ml
index 3bc77f24..99652948 100644
--- a/generator/state_machine.ml
+++ b/generator/state_machine.ml
@@ -1,6 +1,6 @@
 (* hey emacs, this is OCaml code: -*- tuareg -*- *)
 (* nbd client library in userspace: state machine definition
- * Copyright (C) 2013-2020 Red Hat Inc.
+ * Copyright (C) 2013-2021 Red Hat Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public
@@ -862,10 +862,17 @@ and
     external_events = [];
   };

+  State {
+    default_state with
+    name = "RECV_BS_CONTEXTID";
+    comment = "Receive contextid of structured reply block-status payload";
+    external_events = [];
+  };
+
   State {
     default_state with
     name = "RECV_BS_ENTRIES";
-    comment = "Receive a structured reply block-status payload";
+    comment = "Receive entries array of structured reply block-status payload";
     external_events = [];
   };

diff --git a/generator/states-reply-structured.c b/generator/states-reply-structured.c
index 70010474..e1da850d 100644
--- a/generator/states-reply-structured.c
+++ b/generator/states-reply-structured.c
@@ -1,5 +1,5 @@
 /* nbd client library in userspace: state machine
- * Copyright (C) 2013-2019 Red Hat Inc.
+ * Copyright (C) 2013-2021 Red Hat Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public
@@ -185,19 +185,10 @@ STATE_MACHINE {
       set_error (0, "not expecting NBD_REPLY_TYPE_BLOCK_STATUS here");
       return 0;
     }
-    /* We read the context ID followed by all the entries into a
-     * single array and deal with it at the end.
-     */
-    free (h->bs_entries);
-    h->bs_entries = malloc (length);
-    if (h->bs_entries == NULL) {
-      SET_NEXT_STATE (%.DEAD);
-      set_error (errno, "malloc");
-      return 0;
-    }
-    h->rbuf = h->bs_entries;
-    h->rlen = length;
-    SET_NEXT_STATE (%RECV_BS_ENTRIES);
+    /* Start by reading the context ID. */
+    h->rbuf = &h->bs_contextid;
+    h->rlen = sizeof h->bs_contextid;
+    SET_NEXT_STATE (%RECV_BS_CONTEXTID);
     return 0;
   }
   else {
@@ -452,9 +443,41 @@ STATE_MACHINE {
   }
   return 0;

+ REPLY.STRUCTURED_REPLY.RECV_BS_CONTEXTID:
+  struct command *cmd = h->reply_cmd;
+  uint32_t length;
+
+  switch (recv_into_rbuf (h)) {
+  case -1: SET_NEXT_STATE (%.DEAD); return 0;
+  case 1:
+    save_reply_state (h);
+    SET_NEXT_STATE (%.READY);
+    return 0;
+  case 0:
+    length = be32toh (h->sbuf.sr.structured_reply.length);
+
+    assert (cmd); /* guaranteed by CHECK */
+    assert (cmd->type == NBD_CMD_BLOCK_STATUS);
+    assert (length >= 12);
+    length -= sizeof h->bs_contextid;
+
+    free (h->bs_entries);
+    h->bs_entries = malloc (length);
+    if (h->bs_entries == NULL) {
+      SET_NEXT_STATE (%.DEAD);
+      set_error (errno, "malloc");
+      return 0;
+    }
+    h->rbuf = h->bs_entries;
+    h->rlen = length;
+    SET_NEXT_STATE (%RECV_BS_ENTRIES);
+  }
+  return 0;
+
  REPLY.STRUCTURED_REPLY.RECV_BS_ENTRIES:
   struct command *cmd = h->reply_cmd;
   uint32_t length;
+  uint32_t count;
   size_t i;
   uint32_t context_id;
   struct meta_context *meta_context;
@@ -473,15 +496,16 @@ STATE_MACHINE {
     assert (CALLBACK_IS_NOT_NULL (cmd->cb.fn.extent));
     assert (h->bs_entries);
     assert (length >= 12);
+    count = (length - sizeof h->bs_contextid) / sizeof *h->bs_entries;

     /* Need to byte-swap the entries returned, but apart from that we
      * don't validate them.
      */
-    for (i = 0; i < length/4; ++i)
+    for (i = 0; i < count; ++i)
       h->bs_entries[i] = be32toh (h->bs_entries[i]);

     /* Look up the context ID. */
-    context_id = h->bs_entries[0];
+    context_id = be32toh (h->bs_contextid);
     for (meta_context = h->meta_contexts;
          meta_context;
          meta_context = meta_context->next)
@@ -494,7 +518,7 @@ STATE_MACHINE {

       if (CALL_CALLBACK (cmd->cb.fn.extent,
                          meta_context->name, cmd->offset,
-                         &h->bs_entries[1], (length-4) / 4,
+                         h->bs_entries, count,
                          &error) == -1)
         if (cmd->error == 0)
           cmd->error = error ? error : EPROTO;
-- 
2.33.1




* [libnbd PATCH 03/13] protocol: Add definitions for extended headers
  2021-12-03 23:17 ` [libnbd PATCH 00/13] libnbd patches for NBD_OPT_EXTENDED_HEADERS Eric Blake
  2021-12-03 23:17   ` [libnbd PATCH 01/13] golang: Simplify nbd_block_status callback array copy Eric Blake
  2021-12-03 23:17   ` [libnbd PATCH 02/13] block_status: Refactor array storage Eric Blake
@ 2021-12-03 23:17   ` Eric Blake
  2021-12-03 23:17   ` [libnbd PATCH 04/13] protocol: Prepare to send 64-bit requests Eric Blake
                     ` (10 subsequent siblings)
  13 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2021-12-03 23:17 UTC (permalink / raw)
  To: libguestfs; +Cc: nsoffer, vsementsov, qemu-devel, qemu-block, nbd

Add the magic numbers and new structs necessary to implement the NBD
protocol extension of extended headers providing 64-bit lengths.
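
As a quick sanity check of the layouts this patch introduces, the
sketch below copies three of the new structs and verifies their sizes
and field offsets at compile time.  It assumes GCC or Clang for the
packed attribute and a C11 compiler for static_assert; the PACKED
macro is a stand-in for NBD_ATTRIBUTE_PACKED.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PACKED __attribute__((__packed__))  /* assumes GCC or Clang */

struct nbd_request_ext {
  uint32_t magic;
  uint16_t flags;
  uint16_t type;
  uint64_t handle;
  uint64_t offset;
  uint64_t count;
} PACKED;

struct nbd_simple_reply_ext {
  uint32_t magic;
  uint32_t error;
  uint64_t handle;
  uint64_t pad1;
  uint64_t pad2;
} PACKED;

struct nbd_structured_reply_ext {
  uint32_t magic;
  uint16_t flags;
  uint16_t type;
  uint64_t handle;
  uint64_t length;
  uint64_t pad;
} PACKED;

/* All three extended headers are 32 bytes on the wire. */
static_assert (sizeof (struct nbd_request_ext) == 32, "request size");
static_assert (sizeof (struct nbd_simple_reply_ext) == 32, "simple size");
static_assert (sizeof (struct nbd_structured_reply_ext) == 32,
               "structured size");

/* The handle sits at the same offset in both reply types. */
static_assert (offsetof (struct nbd_simple_reply_ext, handle) ==
               offsetof (struct nbd_structured_reply_ext, handle),
               "handle offset");
```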
---
 lib/nbd-protocol.h | 61 ++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 51 insertions(+), 10 deletions(-)

diff --git a/lib/nbd-protocol.h b/lib/nbd-protocol.h
index e5d6404b..7247d775 100644
--- a/lib/nbd-protocol.h
+++ b/lib/nbd-protocol.h
@@ -1,5 +1,5 @@
 /* nbdkit
- * Copyright (C) 2013-2020 Red Hat Inc.
+ * Copyright (C) 2013-2021 Red Hat Inc.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
@@ -124,6 +124,7 @@ struct nbd_fixed_new_option_reply {
 #define NBD_OPT_STRUCTURED_REPLY   8
 #define NBD_OPT_LIST_META_CONTEXT  9
 #define NBD_OPT_SET_META_CONTEXT   10
+#define NBD_OPT_EXTENDED_HEADERS   11

 #define NBD_REP_ERR(val) (0x80000000 | (val))
 #define NBD_REP_IS_ERR(val) (!!((val) & 0x80000000))
@@ -188,6 +189,13 @@ struct nbd_block_descriptor {
   uint32_t status_flags;        /* block type (hole etc) */
 } NBD_ATTRIBUTE_PACKED;

+/* NBD_REPLY_TYPE_BLOCK_STATUS_EXT block descriptor. */
+struct nbd_block_descriptor_ext {
+  uint64_t length;              /* length of block */
+  uint32_t status_flags;        /* block type (hole etc) */
+  uint32_t pad;                 /* must be zero */
+} NBD_ATTRIBUTE_PACKED;
+
 /* Request (client -> server). */
 struct nbd_request {
   uint32_t magic;               /* NBD_REQUEST_MAGIC. */
@@ -197,6 +205,14 @@ struct nbd_request {
   uint64_t offset;              /* Request offset. */
   uint32_t count;               /* Request length. */
 } NBD_ATTRIBUTE_PACKED;
+struct nbd_request_ext {
+  uint32_t magic;               /* NBD_REQUEST_EXT_MAGIC. */
+  uint16_t flags;               /* Request flags. */
+  uint16_t type;                /* Request type. */
+  uint64_t handle;              /* Opaque handle. */
+  uint64_t offset;              /* Request offset. */
+  uint64_t count;               /* Request length. */
+} NBD_ATTRIBUTE_PACKED;

 /* Simple reply (server -> client). */
 struct nbd_simple_reply {
@@ -204,6 +220,13 @@ struct nbd_simple_reply {
   uint32_t error;               /* NBD_SUCCESS or one of NBD_E*. */
   uint64_t handle;              /* Opaque handle. */
 } NBD_ATTRIBUTE_PACKED;
+struct nbd_simple_reply_ext {
+  uint32_t magic;               /* NBD_SIMPLE_REPLY_EXT_MAGIC. */
+  uint32_t error;               /* NBD_SUCCESS or one of NBD_E*. */
+  uint64_t handle;              /* Opaque handle. */
+  uint64_t pad1;                /* Must be 0. */
+  uint64_t pad2;                /* Must be 0. */
+} NBD_ATTRIBUTE_PACKED;

 /* Structured reply (server -> client). */
 struct nbd_structured_reply {
@@ -213,6 +236,14 @@ struct nbd_structured_reply {
   uint64_t handle;              /* Opaque handle. */
   uint32_t length;              /* Length of payload which follows. */
 } NBD_ATTRIBUTE_PACKED;
+struct nbd_structured_reply_ext {
+  uint32_t magic;               /* NBD_STRUCTURED_REPLY_EXT_MAGIC. */
+  uint16_t flags;               /* NBD_REPLY_FLAG_* */
+  uint16_t type;                /* NBD_REPLY_TYPE_* */
+  uint64_t handle;              /* Opaque handle. */
+  uint64_t length;              /* Length of payload which follows. */
+  uint64_t pad;                 /* Must be 0. */
+} NBD_ATTRIBUTE_PACKED;

 struct nbd_structured_reply_offset_data {
   uint64_t offset;              /* offset */
@@ -224,15 +255,23 @@ struct nbd_structured_reply_offset_hole {
   uint32_t length;              /* Length of hole. */
 } NBD_ATTRIBUTE_PACKED;

+struct nbd_structured_reply_offset_hole_ext {
+  uint64_t offset;
+  uint64_t length;              /* Length of hole. */
+} NBD_ATTRIBUTE_PACKED;
+
 struct nbd_structured_reply_error {
   uint32_t error;               /* NBD_E* error number */
   uint16_t len;                 /* Length of human readable error. */
   /* Followed by human readable error string, and possibly more structure. */
 } NBD_ATTRIBUTE_PACKED;

-#define NBD_REQUEST_MAGIC           0x25609513
-#define NBD_SIMPLE_REPLY_MAGIC      0x67446698
-#define NBD_STRUCTURED_REPLY_MAGIC  0x668e33ef
+#define NBD_REQUEST_MAGIC               0x25609513
+#define NBD_REQUEST_EXT_MAGIC           0x21e41c71
+#define NBD_SIMPLE_REPLY_MAGIC          0x67446698
+#define NBD_SIMPLE_REPLY_EXT_MAGIC      0x60d12fd6
+#define NBD_STRUCTURED_REPLY_MAGIC      0x668e33ef
+#define NBD_STRUCTURED_REPLY_EXT_MAGIC  0x6e8a278c

 /* Structured reply flags. */
 #define NBD_REPLY_FLAG_DONE         (1<<0)
@@ -241,12 +280,14 @@ struct nbd_structured_reply_error {
 #define NBD_REPLY_TYPE_IS_ERR(val) (!!((val) & (1<<15)))

 /* Structured reply types. */
-#define NBD_REPLY_TYPE_NONE         0
-#define NBD_REPLY_TYPE_OFFSET_DATA  1
-#define NBD_REPLY_TYPE_OFFSET_HOLE  2
-#define NBD_REPLY_TYPE_BLOCK_STATUS 5
-#define NBD_REPLY_TYPE_ERROR        NBD_REPLY_TYPE_ERR (1)
-#define NBD_REPLY_TYPE_ERROR_OFFSET NBD_REPLY_TYPE_ERR (2)
+#define NBD_REPLY_TYPE_NONE             0
+#define NBD_REPLY_TYPE_OFFSET_DATA      1
+#define NBD_REPLY_TYPE_OFFSET_HOLE      2
+#define NBD_REPLY_TYPE_OFFSET_HOLE_EXT  3
+#define NBD_REPLY_TYPE_BLOCK_STATUS     5
+#define NBD_REPLY_TYPE_BLOCK_STATUS_EXT 6
+#define NBD_REPLY_TYPE_ERROR            NBD_REPLY_TYPE_ERR (1)
+#define NBD_REPLY_TYPE_ERROR_OFFSET     NBD_REPLY_TYPE_ERR (2)

 /* NBD commands. */
 #define NBD_CMD_READ              0
-- 
2.33.1




* [libnbd PATCH 04/13] protocol: Prepare to send 64-bit requests
  2021-12-03 23:17 ` [libnbd PATCH 00/13] libnbd patches for NBD_OPT_EXTENDED_HEADERS Eric Blake
                     ` (2 preceding siblings ...)
  2021-12-03 23:17   ` [libnbd PATCH 03/13] protocol: Add definitions for extended headers Eric Blake
@ 2021-12-03 23:17   ` Eric Blake
  2021-12-03 23:17   ` [libnbd PATCH 05/13] protocol: Prepare to receive 64-bit replies Eric Blake
                     ` (9 subsequent siblings)
  13 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2021-12-03 23:17 UTC (permalink / raw)
  To: libguestfs; +Cc: nsoffer, vsementsov, qemu-devel, qemu-block, nbd

Support sending 64-bit requests if extended headers were negotiated.

At this point, h->extended_headers is permanently false (we can't
enable it until all other aspects of the protocol have likewise been
converted).
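
For illustration only, the choice between the two request layouts can
be sketched as a standalone encoder.  The helper name encode_request
and the byte-wise serialization are assumptions made for this example;
the patch itself fills in a union of packed structs, and asserts
rather than returning an error when a compact request is too large.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NBD_REQUEST_MAGIC      0x25609513
#define NBD_REQUEST_EXT_MAGIC  0x21e41c71

static void
put_be16 (unsigned char *p, uint16_t v)
{
  p[0] = v >> 8;
  p[1] = v & 0xff;
}

static void
put_be32 (unsigned char *p, uint32_t v)
{
  put_be16 (p, v >> 16);
  put_be16 (p + 2, v & 0xffff);
}

static void
put_be64 (unsigned char *p, uint64_t v)
{
  put_be32 (p, v >> 32);
  put_be32 (p + 4, v & 0xffffffff);
}

/* Hypothetical helper: serialize one request header into buf, which
 * must hold at least 32 bytes.  Returns the number of bytes written
 * (28 compact, 32 extended), or 0 if count does not fit in a compact
 * header.
 */
static size_t
encode_request (unsigned char *buf, bool extended, uint16_t flags,
                uint16_t type, uint64_t cookie, uint64_t offset,
                uint64_t count)
{
  assert (buf != NULL);
  if (!extended && count > UINT32_MAX)
    return 0;

  /* These fields sit at the same offsets in both layouts. */
  put_be32 (buf, extended ? NBD_REQUEST_EXT_MAGIC : NBD_REQUEST_MAGIC);
  put_be16 (buf + 4, flags);
  put_be16 (buf + 6, type);
  put_be64 (buf + 8, cookie);
  put_be64 (buf + 16, offset);

  if (extended) {
    put_be64 (buf + 24, count);
    return 32;
  }
  put_be32 (buf + 24, (uint32_t) count);
  return 28;
}
```

Only the magic and the width of the trailing count differ, which is
why the patch can share the assignments for flags, type, handle and
offset between both union members.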
---
 lib/internal.h                      | 12 ++++++++---
 generator/states-issue-command.c    | 31 +++++++++++++++++++----------
 generator/states-reply-structured.c |  2 +-
 lib/rw.c                            | 10 ++++------
 4 files changed, 34 insertions(+), 21 deletions(-)

diff --git a/lib/internal.h b/lib/internal.h
index 7e96e8e9..07378588 100644
--- a/lib/internal.h
+++ b/lib/internal.h
@@ -1,5 +1,5 @@
 /* nbd client library in userspace: internal definitions
- * Copyright (C) 2013-2020 Red Hat Inc.
+ * Copyright (C) 2013-2021 Red Hat Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public
@@ -106,6 +106,9 @@ struct nbd_handle {
   char *tls_username;           /* Username, NULL = use current username */
   char *tls_psk_file;           /* PSK filename, NULL = no PSK */

+  /* Extended headers. */
+  bool extended_headers;        /* If we negotiated NBD_OPT_EXTENDED_HEADERS */
+
   /* Desired metadata contexts. */
   bool request_sr;
   string_vector request_meta_contexts;
@@ -242,7 +245,10 @@ struct nbd_handle {
   /* Issuing a command must use a buffer separate from sbuf, for the
    * case when we interrupt a request to service a reply.
    */
-  struct nbd_request request;
+  union {
+    struct nbd_request request;
+    struct nbd_request_ext request_ext;
+  } req;
   bool in_write_payload;
   bool in_write_shutdown;

@@ -347,7 +353,7 @@ struct command {
   uint16_t type;
   uint64_t cookie;
   uint64_t offset;
-  uint32_t count;
+  uint64_t count;
   void *data; /* Buffer for read/write */
   struct command_cb cb;
   enum state state; /* State to resume with on next POLLIN */
diff --git a/generator/states-issue-command.c b/generator/states-issue-command.c
index a8101144..7b1d6dc7 100644
--- a/generator/states-issue-command.c
+++ b/generator/states-issue-command.c
@@ -1,5 +1,5 @@
 /* nbd client library in userspace: state machine
- * Copyright (C) 2013-2020 Red Hat Inc.
+ * Copyright (C) 2013-2021 Red Hat Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public
@@ -41,14 +41,23 @@ STATE_MACHINE {
     return 0;
   }

-  h->request.magic = htobe32 (NBD_REQUEST_MAGIC);
-  h->request.flags = htobe16 (cmd->flags);
-  h->request.type = htobe16 (cmd->type);
-  h->request.handle = htobe64 (cmd->cookie);
-  h->request.offset = htobe64 (cmd->offset);
-  h->request.count = htobe32 ((uint32_t) cmd->count);
-  h->wbuf = &h->request;
-  h->wlen = sizeof (h->request);
+  /* These fields are coincident between req.request and req.request_ext */
+  h->req.request.flags = htobe16 (cmd->flags);
+  h->req.request.type = htobe16 (cmd->type);
+  h->req.request.handle = htobe64 (cmd->cookie);
+  h->req.request.offset = htobe64 (cmd->offset);
+  if (h->extended_headers) {
+    h->req.request_ext.magic = htobe32 (NBD_REQUEST_EXT_MAGIC);
+    h->req.request_ext.count = htobe64 (cmd->count);
+    h->wlen = sizeof (h->req.request_ext);
+  }
+  else {
+    assert (cmd->count <= UINT32_MAX);
+    h->req.request.magic = htobe32 (NBD_REQUEST_MAGIC);
+    h->req.request.count = htobe32 (cmd->count);
+    h->wlen = sizeof (h->req.request);
+  }
+  h->wbuf = &h->req;
   if (cmd->type == NBD_CMD_WRITE || cmd->next)
     h->wflags = MSG_MORE;
   SET_NEXT_STATE (%SEND_REQUEST);
@@ -73,7 +82,7 @@ STATE_MACHINE {

   assert (h->cmds_to_issue != NULL);
   cmd = h->cmds_to_issue;
-  assert (cmd->cookie == be64toh (h->request.handle));
+  assert (cmd->cookie == be64toh (h->req.request.handle));
   if (cmd->type == NBD_CMD_WRITE) {
     h->wbuf = cmd->data;
     h->wlen = cmd->count;
@@ -119,7 +128,7 @@ STATE_MACHINE {
   assert (!h->wlen);
   assert (h->cmds_to_issue != NULL);
   cmd = h->cmds_to_issue;
-  assert (cmd->cookie == be64toh (h->request.handle));
+  assert (cmd->cookie == be64toh (h->req.request.handle));
   h->cmds_to_issue = cmd->next;
   if (h->cmds_to_issue_tail == cmd)
     h->cmds_to_issue_tail = NULL;
diff --git a/generator/states-reply-structured.c b/generator/states-reply-structured.c
index e1da850d..5524e000 100644
--- a/generator/states-reply-structured.c
+++ b/generator/states-reply-structured.c
@@ -34,7 +34,7 @@ structured_reply_in_bounds (uint64_t offset, uint32_t length,
       offset + length > cmd->offset + cmd->count) {
     set_error (0, "range of structured reply is out of bounds, "
                "offset=%" PRIu64 ", cmd->offset=%" PRIu64 ", "
-               "length=%" PRIu32 ", cmd->count=%" PRIu32 ": "
+               "length=%" PRIu32 ", cmd->count=%" PRIu64 ": "
                "this is likely to be a bug in the NBD server",
                offset, cmd->offset, length, cmd->count);
     return false;
diff --git a/lib/rw.c b/lib/rw.c
index 4ade7508..16c2e848 100644
--- a/lib/rw.c
+++ b/lib/rw.c
@@ -1,5 +1,5 @@
 /* NBD client library in userspace
- * Copyright (C) 2013-2020 Red Hat Inc.
+ * Copyright (C) 2013-2021 Red Hat Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public
@@ -216,13 +216,11 @@ nbd_internal_command_common (struct nbd_handle *h,
     }
     break;

-    /* Other commands are currently limited by the 32 bit field in the
-     * command structure on the wire, but in future we hope to support
-     * 64 bit values here with a change to the NBD protocol which is
-     * being discussed upstream.
+    /* Other commands are limited by the 32 bit field in the command
+     * structure on the wire, unless extended headers were negotiated.
      */
   default:
-    if (count > UINT32_MAX) {
+    if (!h->extended_headers && count > UINT32_MAX) {
       set_error (ERANGE, "request too large: maximum request size is %" PRIu32,
                  UINT32_MAX);
       goto err;
-- 
2.33.1




* [libnbd PATCH 05/13] protocol: Prepare to receive 64-bit replies
  2021-12-03 23:17 ` [libnbd PATCH 00/13] libnbd patches for NBD_OPT_EXTENDED_HEADERS Eric Blake
                     ` (3 preceding siblings ...)
  2021-12-03 23:17   ` [libnbd PATCH 04/13] protocol: Prepare to send 64-bit requests Eric Blake
@ 2021-12-03 23:17   ` Eric Blake
  2021-12-03 23:17   ` [libnbd PATCH 06/13] protocol: Accept 64-bit holes during pread Eric Blake
                     ` (8 subsequent siblings)
  13 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2021-12-03 23:17 UTC (permalink / raw)
  To: libguestfs; +Cc: nsoffer, vsementsov, qemu-devel, qemu-block, nbd

Support receiving headers for 64-bit replies if extended headers were
negotiated.  We already insist that the server not send us too much
payload in one reply, so we can exploit that and merge the 64-bit
length back into a normalized 32-bit field for the rest of the payload
length calculations.  The NBD protocol extension deliberately makes
extended simple and structured replies both occupy 32 bytes, and keeps
the handle field at the same offset in all reply types.
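
A minimal sketch of that normalization, assuming the header bytes are
already in memory: the helper name normalize_reply_length and the
MAX_PAYLOAD value are hypothetical, chosen only to show how a 64-bit
wire length can be validated and then carried in a 32-bit field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_PAYLOAD (64u * 1024 * 1024)  /* assumed cap, for illustration */

static uint32_t
get_be32 (const unsigned char *p)
{
  return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
         ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

static uint64_t
get_be64 (const unsigned char *p)
{
  return ((uint64_t) get_be32 (p) << 32) | get_be32 (p + 4);
}

/* Hypothetical helper: extract the payload length from a structured
 * reply header (20 bytes compact, 32 bytes extended) and normalize it
 * to 32 bits.  Returns false if the server sent an oversized length
 * or, for extended headers, non-zero padding.
 */
static bool
normalize_reply_length (const unsigned char *hdr, bool extended,
                        uint32_t *length)
{
  uint64_t len;

  assert (hdr && length);
  if (extended) {
    len = get_be64 (hdr + 16);
    if (get_be64 (hdr + 24) != 0)   /* pad must be zero */
      return false;
  }
  else
    len = get_be32 (hdr + 16);

  if (len > MAX_PAYLOAD)
    return false;
  *length = (uint32_t) len;         /* safe: bounded by MAX_PAYLOAD */
  return true;
}
```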

Note that if we negotiate extended headers, but a non-compliant server
replies with a non-extended header, we will stall waiting for the
server to send more bytes rather than noticing that the magic number
is wrong.  The alternative would be to read just the first 4 bytes of
magic, then determine how many more bytes to expect; but that would
require more states and syscalls, which is not worth it since the
typical server will be compliant.

At this point, h->extended_headers is permanently false (we can't
enable it until all other aspects of the protocol have likewise been
converted).
---
 lib/internal.h                      |  8 +++-
 generator/states-reply-structured.c | 59 +++++++++++++++++++----------
 generator/states-reply.c            | 31 +++++++++++----
 3 files changed, 68 insertions(+), 30 deletions(-)

diff --git a/lib/internal.h b/lib/internal.h
index 07378588..c9f84441 100644
--- a/lib/internal.h
+++ b/lib/internal.h
@@ -222,8 +222,12 @@ struct nbd_handle {
     }  __attribute__((packed)) or;
     struct nbd_export_name_option_reply export_name_reply;
     struct nbd_simple_reply simple_reply;
+    struct nbd_simple_reply_ext simple_reply_ext;
     struct {
-      struct nbd_structured_reply structured_reply;
+      union {
+        struct nbd_structured_reply structured_reply;
+        struct nbd_structured_reply_ext structured_reply_ext;
+      } hdr;
       union {
         struct nbd_structured_reply_offset_data offset_data;
         struct nbd_structured_reply_offset_hole offset_hole;
@@ -233,7 +237,7 @@ struct nbd_handle {
           uint64_t offset; /* Only used for NBD_REPLY_TYPE_ERROR_OFFSET */
         } __attribute__((packed)) error;
       } payload;
-    }  __attribute__((packed)) sr;
+    } sr;
     uint16_t gflags;
     uint32_t cflags;
     uint32_t len;
diff --git a/generator/states-reply-structured.c b/generator/states-reply-structured.c
index 5524e000..1b675e8d 100644
--- a/generator/states-reply-structured.c
+++ b/generator/states-reply-structured.c
@@ -45,19 +45,23 @@ structured_reply_in_bounds (uint64_t offset, uint32_t length,

 STATE_MACHINE {
  REPLY.STRUCTURED_REPLY.START:
-  /* We've only read the simple_reply.  The structured_reply is longer,
-   * so read the remaining part.
+  /* We've only read the simple_reply.  Unless we have extended headers,
+   * the structured_reply is longer, so read the remaining part.
    */
   if (!h->structured_replies) {
     set_error (0, "server sent unexpected structured reply");
     SET_NEXT_STATE(%.DEAD);
     return 0;
   }
-  h->rbuf = &h->sbuf;
-  h->rbuf += sizeof h->sbuf.simple_reply;
-  h->rlen = sizeof h->sbuf.sr.structured_reply;
-  h->rlen -= sizeof h->sbuf.simple_reply;
-  SET_NEXT_STATE (%RECV_REMAINING);
+  if (h->extended_headers)
+    SET_NEXT_STATE (%CHECK);
+  else {
+    h->rbuf = &h->sbuf;
+    h->rbuf += sizeof h->sbuf.simple_reply;
+    h->rlen = sizeof h->sbuf.sr.hdr.structured_reply;
+    h->rlen -= sizeof h->sbuf.simple_reply;
+    SET_NEXT_STATE (%RECV_REMAINING);
+  }
   return 0;

  REPLY.STRUCTURED_REPLY.RECV_REMAINING:
@@ -75,12 +79,21 @@ STATE_MACHINE {
   struct command *cmd = h->reply_cmd;
   uint16_t flags, type;
   uint64_t cookie;
-  uint32_t length;
+  uint64_t length;

-  flags = be16toh (h->sbuf.sr.structured_reply.flags);
-  type = be16toh (h->sbuf.sr.structured_reply.type);
-  cookie = be64toh (h->sbuf.sr.structured_reply.handle);
-  length = be32toh (h->sbuf.sr.structured_reply.length);
+  flags = be16toh (h->sbuf.sr.hdr.structured_reply.flags);
+  type = be16toh (h->sbuf.sr.hdr.structured_reply.type);
+  cookie = be64toh (h->sbuf.sr.hdr.structured_reply.handle);
+  if (h->extended_headers) {
+    length = be64toh (h->sbuf.sr.hdr.structured_reply_ext.length);
+    if (h->sbuf.sr.hdr.structured_reply_ext.pad) {
+      set_error (0, "server sent non-zero padding in structured reply header");
+      SET_NEXT_STATE(%.DEAD);
+      return 0;
+    }
+  }
+  else
+    length = be32toh (h->sbuf.sr.hdr.structured_reply.length);

   assert (cmd);
   assert (cmd->cookie == cookie);
@@ -97,6 +110,10 @@ STATE_MACHINE {
     SET_NEXT_STATE (%.DEAD);
     return 0;
   }
+  /* For convenience, normalize extended replies into the compact layout;
+   * this is safe because we have validated that length fits in 32 bits.
+   */
+  h->sbuf.sr.hdr.structured_reply.length = length;

   if (NBD_REPLY_TYPE_IS_ERR (type)) {
     if (length < sizeof h->sbuf.sr.payload.error.error) {
@@ -207,7 +224,7 @@ STATE_MACHINE {
     SET_NEXT_STATE (%.READY);
     return 0;
   case 0:
-    length = be32toh (h->sbuf.sr.structured_reply.length);
+    length = h->sbuf.sr.hdr.structured_reply.length; /* normalized in CHECK */
     msglen = be16toh (h->sbuf.sr.payload.error.error.len);
     if (msglen > length - sizeof h->sbuf.sr.payload.error.error ||
         msglen > sizeof h->sbuf.sr.payload.error.msg) {
@@ -233,9 +250,9 @@ STATE_MACHINE {
     SET_NEXT_STATE (%.READY);
     return 0;
   case 0:
-    length = be32toh (h->sbuf.sr.structured_reply.length);
+    length = h->sbuf.sr.hdr.structured_reply.length; /* normalized in CHECK */
     msglen = be16toh (h->sbuf.sr.payload.error.error.len);
-    type = be16toh (h->sbuf.sr.structured_reply.type);
+    type = be16toh (h->sbuf.sr.hdr.structured_reply.type);

     length -= sizeof h->sbuf.sr.payload.error.error + msglen;

@@ -281,7 +298,7 @@ STATE_MACHINE {
     return 0;
   case 0:
     error = be32toh (h->sbuf.sr.payload.error.error.error);
-    type = be16toh (h->sbuf.sr.structured_reply.type);
+    type = be16toh (h->sbuf.sr.hdr.structured_reply.type);

     assert (cmd); /* guaranteed by CHECK */
     error = nbd_internal_errno_of_nbd_error (error);
@@ -339,7 +356,7 @@ STATE_MACHINE {
     SET_NEXT_STATE (%.READY);
     return 0;
   case 0:
-    length = be32toh (h->sbuf.sr.structured_reply.length);
+    length = h->sbuf.sr.hdr.structured_reply.length; /* normalized in CHECK */
     offset = be64toh (h->sbuf.sr.payload.offset_data.offset);

     assert (cmd); /* guaranteed by CHECK */
@@ -377,7 +394,7 @@ STATE_MACHINE {
     SET_NEXT_STATE (%.READY);
     return 0;
   case 0:
-    length = be32toh (h->sbuf.sr.structured_reply.length);
+    length = h->sbuf.sr.hdr.structured_reply.length; /* normalized in CHECK */
     offset = be64toh (h->sbuf.sr.payload.offset_data.offset);

     assert (cmd); /* guaranteed by CHECK */
@@ -454,7 +471,7 @@ STATE_MACHINE {
     SET_NEXT_STATE (%.READY);
     return 0;
   case 0:
-    length = be32toh (h->sbuf.sr.structured_reply.length);
+    length = h->sbuf.sr.hdr.structured_reply.length; /* normalized in CHECK */

     assert (cmd); /* guaranteed by CHECK */
     assert (cmd->type == NBD_CMD_BLOCK_STATUS);
@@ -489,7 +506,7 @@ STATE_MACHINE {
     SET_NEXT_STATE (%.READY);
     return 0;
   case 0:
-    length = be32toh (h->sbuf.sr.structured_reply.length);
+    length = h->sbuf.sr.hdr.structured_reply.length; /* normalized in CHECK */

     assert (cmd); /* guaranteed by CHECK */
     assert (cmd->type == NBD_CMD_BLOCK_STATUS);
@@ -535,7 +552,7 @@ STATE_MACHINE {
  REPLY.STRUCTURED_REPLY.FINISH:
   uint16_t flags;

-  flags = be16toh (h->sbuf.sr.structured_reply.flags);
+  flags = be16toh (h->sbuf.sr.hdr.structured_reply.flags);
   if (flags & NBD_REPLY_FLAG_DONE) {
     SET_NEXT_STATE (%^FINISH_COMMAND);
   }
diff --git a/generator/states-reply.c b/generator/states-reply.c
index 9099a76a..949e982e 100644
--- a/generator/states-reply.c
+++ b/generator/states-reply.c
@@ -1,5 +1,5 @@
 /* nbd client library in userspace: state machine
- * Copyright (C) 2013-2019 Red Hat Inc.
+ * Copyright (C) 2013-2021 Red Hat Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public
@@ -68,7 +68,10 @@ STATE_MACHINE {
   assert (h->rlen == 0);

   h->rbuf = &h->sbuf;
-  h->rlen = sizeof h->sbuf.simple_reply;
+  if (h->extended_headers)
+    h->rlen = sizeof h->sbuf.simple_reply_ext;
+  else
+    h->rlen = sizeof h->sbuf.simple_reply;

   r = h->sock->ops->recv (h, h->sock, h->rbuf, h->rlen);
   if (r == -1) {
@@ -113,13 +116,27 @@ STATE_MACHINE {
   uint64_t cookie;

   magic = be32toh (h->sbuf.simple_reply.magic);
-  if (magic == NBD_SIMPLE_REPLY_MAGIC) {
+  switch (magic) {
+  case NBD_SIMPLE_REPLY_MAGIC:
+  case NBD_SIMPLE_REPLY_EXT_MAGIC:
+    if ((magic == NBD_SIMPLE_REPLY_MAGIC) == h->extended_headers)
+      goto invalid;
+    if (magic == NBD_SIMPLE_REPLY_EXT_MAGIC &&
+        (h->sbuf.simple_reply_ext.pad1 || h->sbuf.simple_reply_ext.pad2)) {
+      set_error (0, "server sent non-zero padding in simple reply header");
+      SET_NEXT_STATE (%.DEAD);
+      return 0;
+    }
     SET_NEXT_STATE (%SIMPLE_REPLY.START);
-  }
-  else if (magic == NBD_STRUCTURED_REPLY_MAGIC) {
+    break;
+  case NBD_STRUCTURED_REPLY_MAGIC:
+  case NBD_STRUCTURED_REPLY_EXT_MAGIC:
+    if ((magic == NBD_STRUCTURED_REPLY_MAGIC) == h->extended_headers)
+      goto invalid;
     SET_NEXT_STATE (%STRUCTURED_REPLY.START);
-  }
-  else {
+    break;
+  default:
+  invalid:
     SET_NEXT_STATE (%.DEAD); /* We've probably lost synchronization. */
     set_error (0, "invalid reply magic");
     return 0;
-- 
2.33.1




* [libnbd PATCH 06/13] protocol: Accept 64-bit holes during pread
  2021-12-03 23:17 ` [libnbd PATCH 00/13] libnbd patches for NBD_OPT_EXTENDED_HEADERS Eric Blake
                     ` (4 preceding siblings ...)
  2021-12-03 23:17   ` [libnbd PATCH 05/13] protocol: Prepare to receive 64-bit replies Eric Blake
@ 2021-12-03 23:17   ` Eric Blake
  2021-12-03 23:17   ` [libnbd PATCH 07/13] generator: Add struct nbd_extent in prep for 64-bit extents Eric Blake
                     ` (7 subsequent siblings)
  13 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2021-12-03 23:17 UTC (permalink / raw)
  To: libguestfs; +Cc: nsoffer, vsementsov, qemu-devel, qemu-block, nbd

Even though we don't allow the user to request NBD_CMD_READ with more
than 64M (and even if we did, our API signature caps us at SIZE_MAX,
which is 32 bits on a 32-bit machine), the NBD extension to allow
64-bit requests implies that for symmetry we have to be able to
support 64-bit holes over the wire.  Note that we don't have to change
the signature of the callback for nbd_pread_structured; nor is it
worth adding a counterpart to LIBNBD_READ_HOLE, because it is unlikely
that a user callback will ever need to distinguish which size was sent
over the wire when the value always fits in 32 bits.

While we cannot guarantee which size structured reply the server will
use, it is easy enough to handle both sizes, even for a non-compliant
server that sends wide replies when extended headers were not
negotiated.  Of course, until a later patch enables extended headers
negotiation, no compliant server will trigger the new code here.
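
The widened bounds check is the subtle part of this change: once length
is 64 bits, computing offset + length can wrap around.  A minimal
sketch of an overflow-safe variant follows.  The struct request_range
type and the reply_in_bounds name are hypothetical stand-ins for
illustration (not libnbd's struct command or the function in the diff),
and the sketch assumes that the request's own offset + count does not
wrap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields of the pending command that a
 * bounds check needs; this is not libnbd's real struct command.
 */
struct request_range {
  uint64_t offset;   /* start of the client's request */
  uint64_t count;    /* number of bytes requested */
};

/* Return true if [offset, offset + length) lies inside the request.
 * Assumes req->offset + req->count itself does not wrap.
 */
static bool
reply_in_bounds (uint64_t offset, uint64_t length,
                 const struct request_range *req)
{
  uint64_t end = req->offset + req->count;

  if (offset < req->offset || offset >= end)
    return false;
  /* offset < end at this point, so end - offset cannot underflow,
   * and no sum involving the server-supplied length is ever formed.
   */
  return length <= end - offset;
}
```

With a request of 512 bytes at offset 4096, a reply claiming offset
4100 and a length of UINT64_MAX is rejected here, whereas a naive
offset + length comparison would wrap to a small value and accept it.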
---
 lib/internal.h                      |  1 +
 generator/states-reply-structured.c | 41 +++++++++++++++++++++++++----
 2 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/lib/internal.h b/lib/internal.h
index c9f84441..06f3a65c 100644
--- a/lib/internal.h
+++ b/lib/internal.h
@@ -231,6 +231,7 @@ struct nbd_handle {
       union {
         struct nbd_structured_reply_offset_data offset_data;
         struct nbd_structured_reply_offset_hole offset_hole;
+        struct nbd_structured_reply_offset_hole_ext offset_hole_ext;
         struct {
           struct nbd_structured_reply_error error;
           char msg[NBD_MAX_STRING]; /* Common to all error types */
diff --git a/generator/states-reply-structured.c b/generator/states-reply-structured.c
index 1b675e8d..a3e0e2ac 100644
--- a/generator/states-reply-structured.c
+++ b/generator/states-reply-structured.c
@@ -26,15 +26,16 @@
  * requesting command.
  */
 static bool
-structured_reply_in_bounds (uint64_t offset, uint32_t length,
+structured_reply_in_bounds (uint64_t offset, uint64_t length,
                             const struct command *cmd)
 {
   if (offset < cmd->offset ||
       offset >= cmd->offset + cmd->count ||
-      offset + length > cmd->offset + cmd->count) {
+      length > cmd->offset + cmd->count ||
+      offset > cmd->offset + cmd->count - length) {
     set_error (0, "range of structured reply is out of bounds, "
                "offset=%" PRIu64 ", cmd->offset=%" PRIu64 ", "
-               "length=%" PRIu32 ", cmd->count=%" PRIu64 ": "
+               "length=%" PRIu64 ", cmd->count=%" PRIu64 ": "
                "this is likely to be a bug in the NBD server",
                offset, cmd->offset, length, cmd->count);
     return false;
@@ -182,6 +183,25 @@ STATE_MACHINE {
     SET_NEXT_STATE (%RECV_OFFSET_HOLE);
     return 0;
   }
+  else if (type == NBD_REPLY_TYPE_OFFSET_HOLE_EXT) {
+    if (cmd->type != NBD_CMD_READ) {
+      SET_NEXT_STATE (%.DEAD);
+      set_error (0, "invalid command for receiving offset-hole chunk, "
+                 "cmd->type=%" PRIu16 ", "
+                 "this is likely to be a bug in the server",
+                 cmd->type);
+      return 0;
+    }
+    if (length != sizeof h->sbuf.sr.payload.offset_hole_ext) {
+      SET_NEXT_STATE (%.DEAD);
+      set_error (0, "invalid length in NBD_REPLY_TYPE_OFFSET_HOLE_EXT");
+      return 0;
+    }
+    h->rbuf = &h->sbuf.sr.payload.offset_hole_ext;
+    h->rlen = sizeof h->sbuf.sr.payload.offset_hole_ext;
+    SET_NEXT_STATE (%RECV_OFFSET_HOLE);
+    return 0;
+  }
   else if (type == NBD_REPLY_TYPE_BLOCK_STATUS) {
     if (cmd->type != NBD_CMD_BLOCK_STATUS) {
       SET_NEXT_STATE (%.DEAD);
@@ -415,7 +435,8 @@ STATE_MACHINE {
  REPLY.STRUCTURED_REPLY.RECV_OFFSET_HOLE:
   struct command *cmd = h->reply_cmd;
   uint64_t offset;
-  uint32_t length;
+  uint64_t length;
+  uint16_t type;

   switch (recv_into_rbuf (h)) {
   case -1: SET_NEXT_STATE (%.DEAD); return 0;
@@ -425,7 +446,14 @@ STATE_MACHINE {
     return 0;
   case 0:
     offset = be64toh (h->sbuf.sr.payload.offset_hole.offset);
-    length = be32toh (h->sbuf.sr.payload.offset_hole.length);
+    type = be16toh (h->sbuf.sr.hdr.structured_reply.type);
+
+    if (type == NBD_REPLY_TYPE_OFFSET_HOLE)
+      length = be32toh (h->sbuf.sr.payload.offset_hole.length);
+    else {
+      /* XXX Insist on h->extended_headers? */
+      length = be64toh (h->sbuf.sr.payload.offset_hole_ext.length);
+    }

     assert (cmd); /* guaranteed by CHECK */

@@ -443,7 +471,10 @@ STATE_MACHINE {
     /* The spec states that 0-length requests are unspecified, but
      * 0-length replies are broken. Still, it's easy enough to support
      * them as an extension, and this works even when length == 0.
+     * Although length is 64 bits, the bounds check above ensures that
+     * it is no larger than the 64M cap we put on NBD_CMD_READ.
      */
+    assert (length <= SIZE_MAX);
     memset (cmd->data + offset, 0, length);
     if (CALLBACK_IS_NOT_NULL (cmd->cb.fn.chunk)) {
       int error = cmd->error;
-- 
2.33.1




* [libnbd PATCH 07/13] generator: Add struct nbd_extent in prep for 64-bit extents
  2021-12-03 23:17 ` [libnbd PATCH 00/13] libnbd patches for NBD_OPT_EXTENDED_HEADERS Eric Blake
                     ` (5 preceding siblings ...)
  2021-12-03 23:17   ` [libnbd PATCH 06/13] protocol: Accept 64-bit holes during pread Eric Blake
@ 2021-12-03 23:17   ` Eric Blake
  2021-12-03 23:17   ` [libnbd PATCH 08/13] block_status: Track 64-bit extents internally Eric Blake
                     ` (6 subsequent siblings)
  13 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2021-12-03 23:17 UTC (permalink / raw)
  To: libguestfs; +Cc: nsoffer, vsementsov, qemu-devel, qemu-block, nbd

The existing nbd_block_status() callback is permanently stuck with an
array of uint32_t pairs (len/2 extents), and exposing 64-bit extents
requires a new API.  Before we get there, we first need a way to
express a struct containing uint64_t length and uint32_t flags across
the various language bindings in the callback that will be used by the
new API.

For the language bindings, we have to construct an array of a similar
struct in the target language's preferred format.  The bindings for
Python and OCaml were relatively straightforward; the Golang bindings
took a bit more effort for me to write.  Temporary unused attributes
are needed to keep the compiler happy until a later patch exposes a
new API using the new callback type.
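
To illustrate the relationship between the two callback shapes, here
is a minimal sketch in C.  The nbd_extent typedef mirrors the one this
patch emits into the generated header; widen_extents is a hypothetical
helper written only for this example and is not part of libnbd.

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as the typedef this patch adds to the generated header. */
typedef struct {
  uint64_t length;
  uint32_t flags;
} nbd_extent;

/* Hypothetical helper (not libnbd API): widen the legacy flat array of
 * (length, flags) uint32_t pairs into nbd_extent structs.  nr_entries
 * counts uint32_t values, so it describes nr_entries / 2 extents, and
 * out must have room for that many elements.  Returns the number of
 * extents written.
 */
static size_t
widen_extents (const uint32_t *entries, size_t nr_entries, nbd_extent *out)
{
  size_t i, n = nr_entries / 2;

  for (i = 0; i < n; ++i) {
    out[i].length = entries[2 * i];      /* even slots hold lengths */
    out[i].flags = entries[2 * i + 1];   /* odd slots hold flags */
  }
  return n;
}
```

Each language binding has to build the equivalent of this array in its
own preferred representation, such as a list of tuples or a slice of
structs.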
---
 generator/API.ml    | 12 +++++++++++-
 generator/API.mli   |  3 ++-
 generator/C.ml      | 24 +++++++++++++++++++++---
 generator/GoLang.ml | 24 ++++++++++++++++++++++++
 generator/OCaml.ml  | 21 +++++++++++++++++----
 generator/Python.ml | 30 ++++++++++++++++++++++++++----
 ocaml/helpers.c     | 22 +++++++++++++++++++++-
 ocaml/nbd-c.h       |  3 ++-
 golang/handle.go    |  6 ++++++
 9 files changed, 130 insertions(+), 15 deletions(-)

diff --git a/generator/API.ml b/generator/API.ml
index cf2e7543..70ae721d 100644
--- a/generator/API.ml
+++ b/generator/API.ml
@@ -42,6 +42,7 @@
 | BytesPersistOut of string * string
 | Closure of closure
 | Enum of string * enum
+| Extent64 of string
 | Fd of string
 | Flags of string * flags
 | Int of string
@@ -142,6 +143,14 @@ let extent_closure =
                             "nr_entries");
              CBMutable (Int "error") ]
 }
+let extent64_closure = {
+  cbname = "extent64";
+  cbargs = [ CBString "metacontext";
+             CBUInt64 "offset";
+             CBArrayAndLen (Extent64 "entries",
+                            "nr_entries");
+             CBMutable (Int "error") ]
+}
 let list_closure = {
   cbname = "list";
   cbargs = [ CBString "name"; CBString "description" ]
@@ -151,7 +160,8 @@ let context_closure =
   cbargs = [ CBString "name" ]
 }
 let all_closures = [ chunk_closure; completion_closure;
-                     debug_closure; extent_closure; list_closure;
+                     debug_closure; extent_closure; extent64_closure;
+                     list_closure;
                      context_closure ]

 (* Enums. *)
diff --git a/generator/API.mli b/generator/API.mli
index d284637f..922d8120 100644
--- a/generator/API.mli
+++ b/generator/API.mli
@@ -1,6 +1,6 @@
 (* hey emacs, this is OCaml code: -*- tuareg -*- *)
 (* nbd client library in userspace: the API
- * Copyright (C) 2013-2020 Red Hat Inc.
+ * Copyright (C) 2013-2021 Red Hat Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public
@@ -52,6 +52,7 @@ and
 | BytesPersistOut of string * string
 | Closure of closure       (** function pointer + void *opaque *)
 | Enum of string * enum    (** enum/union type, int in C *)
+| Extent64 of string       (** extent descriptor, struct nbd_extent in C *)
 | Fd of string             (** file descriptor *)
 | Flags of string * flags  (** flags, uint32_t in C *)
 | Int of string            (** small int *)
diff --git a/generator/C.ml b/generator/C.ml
index 797af531..7b0be583 100644
--- a/generator/C.ml
+++ b/generator/C.ml
@@ -1,6 +1,6 @@
 (* hey emacs, this is OCaml code: -*- tuareg -*- *)
 (* nbd client library in userspace: generate the C API and documentation
- * Copyright (C) 2013-2020 Red Hat Inc.
+ * Copyright (C) 2013-2021 Red Hat Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public
@@ -90,6 +90,7 @@ let
 | Closure { cbname } ->
    [ sprintf "%s_callback" cbname; sprintf "%s_user_data" cbname ]
 | Enum (n, _) -> [n]
+| Extent64 n -> [n]
 | Fd n -> [n]
 | Flags (n, _) -> [n]
 | Int n -> [n]
@@ -152,6 +153,9 @@ and
       | Enum (n, _) ->
          if types then pr "int ";
          pr "%s" n
+      | Extent64 n ->
+         if types then pr "nbd_extent ";
+         pr "%s" n
       | Flags (n, _) ->
          if types then pr "uint32_t ";
          pr "%s" n
@@ -238,6 +242,11 @@ and
          pr "%s, " n;
          if types then pr "size_t ";
          pr "%s" len
+      | CBArrayAndLen (Extent64 n, len) ->
+         if types then pr "nbd_extent *";
+         pr "%s, " n;
+         if types then pr "size_t ";
+         pr "%s" len
       | CBArrayAndLen _ -> assert false
       | CBBytesIn (n, len) ->
          if types then pr "const void *";
@@ -388,6 +397,13 @@ let
   pr "extern int nbd_get_errno (void);\n";
   pr "#define LIBNBD_HAVE_NBD_GET_ERRNO 1\n";
   pr "\n";
+  pr "/* This is used in the callback for nbd_block_status_64.\n";
+  pr " */\n";
+  pr "typedef struct {\n";
+  pr "  uint64_t length;\n";
+  pr "  uint32_t flags;\n";
+  pr "} nbd_extent;\n";
+  pr "\n";
   print_closure_structs ();
   List.iter (
     fun (name, { args; optargs; ret }) ->
@@ -630,7 +646,7 @@ let
          pr "    char *%s_printable =\n" n;
          pr "        nbd_internal_printable_string_list (%s);\n" n
       | BytesOut _ | BytesPersistOut _
-      | Bool _ | Closure _ | Enum _ | Flags _ | Fd _ | Int _
+      | Bool _ | Closure _ | Enum _ | Extent64 _ | Flags _ | Fd _ | Int _
       | Int64 _ | SizeT _
       | SockAddrAndLen _ | UInt _ | UInt32 _ | UInt64 _ | UIntPtr _ -> ()
     ) args;
@@ -645,6 +661,7 @@ let
          pr " %s=\\\"%%s\\\" %s=%%zu" n count
       | Closure { cbname } -> pr " %s=<fun>" cbname
       | Enum (n, _) -> pr " %s=%%d" n
+      | Extent64 _ -> assert false (* only used in extent64_closure *)
       | Flags (n, _) -> pr " %s=0x%%x" n
       | Fd n | Int n -> pr " %s=%%d" n
       | Int64 n -> pr " %s=%%\" PRIi64 \"" n
@@ -674,6 +691,7 @@ let
          pr ", %s_printable ? %s_printable : \"\", %s" n n count
       | Closure { cbname } -> ()
       | Enum (n, _) -> pr ", %s" n
+      | Extent64 _ -> assert false (* only used in extent64_closure *)
       | Flags (n, _) -> pr ", %s" n
       | Fd n | Int n | Int64 n | SizeT n -> pr ", %s" n
       | SockAddrAndLen (_, len) -> pr ", (int) %s" len
@@ -697,7 +715,7 @@ let
       | StringList n ->
          pr "    free (%s_printable);\n" n
       | BytesOut _ | BytesPersistOut _
-      | Bool _ | Closure _ | Enum _ | Flags _ | Fd _ | Int _
+      | Bool _ | Closure _ | Enum _ | Extent64 _ | Flags _ | Fd _ | Int _
       | Int64 _ | SizeT _
       | SockAddrAndLen _ | UInt _ | UInt32 _ | UInt64 _ | UIntPtr _ -> ()
     ) args;
diff --git a/generator/GoLang.ml b/generator/GoLang.ml
index d3b7dc79..7363063f 100644
--- a/generator/GoLang.ml
+++ b/generator/GoLang.ml
@@ -49,6 +49,7 @@ let
   | BytesPersistOut (n, len) -> n
   | Closure { cbname } -> cbname
   | Enum (n, _) -> n
+  | Extent64 n -> n
   | Fd n -> n
   | Flags (n, _) -> n
   | Int n -> n
@@ -71,6 +72,7 @@ let
   | BytesPersistOut _ -> "AioBuffer"
   | Closure { cbname } -> sprintf "%sCallback" (camel_case cbname)
   | Enum (_, { enum_prefix }) -> camel_case enum_prefix
+  | Extent64 _ -> assert false (* only used in extent64_closure *)
   | Fd _ -> "int"
   | Flags (_, { flag_prefix }) -> camel_case flag_prefix
   | Int _ -> "int"
@@ -261,6 +263,7 @@ let
        pr "    c_%s.user_data = C.alloc_cbid(C.long(%s_cbid))\n" cbname cbname
     | Enum (n, _) ->
        pr "    c_%s := C.int (%s)\n" n n
+    | Extent64 _ -> assert false (* only used in extent64_closure *)
     | Fd n ->
        pr "    c_%s := C.int (%s)\n" n n
     | Flags (n, _) ->
@@ -333,6 +336,7 @@ let
     | BytesPersistOut (n, len) ->  pr ", c_%s, c_%s" n len
     | Closure { cbname } ->  pr ", c_%s" cbname
     | Enum (n, _) -> pr ", c_%s" n
+    | Extent64 _ -> assert false (* only used in extent64_closure *)
     | Fd n -> pr ", c_%s" n
     | Flags (n, _) -> pr ", c_%s" n
     | Int n -> pr ", c_%s" n
@@ -524,6 +528,18 @@ let
     copy(ret, arrayPtr[:count:count])
     return ret
 }
+
+func copy_extent_array (entries *C.nbd_extent, count C.size_t) []LibnbdExtent {
+    unsafePtr := unsafe.Pointer(entries)
+    arrayPtr := (*[1 << 20]C.nbd_extent)(unsafePtr)
+    slice := arrayPtr[:count:count]
+    ret := make([]LibnbdExtent, count)
+    for i := 0; i < int (count); i++ {
+      ret[i].Length = uint64 (slice[i].length)
+      ret[i].Flags = uint32 (slice[i].flags)
+    }
+    return ret
+}
 ";

   List.iter (
@@ -537,6 +553,8 @@ let
           match cbarg with
           | CBArrayAndLen (UInt32 n, _) ->
              pr "%s []uint32" n;
+          | CBArrayAndLen (Extent64 n, _) ->
+             pr "%s []LibnbdExtent" n;
           | CBBytesIn (n, len) ->
              pr "%s []byte" n;
           | CBInt n ->
@@ -563,6 +581,8 @@ let
           match cbarg with
           | CBArrayAndLen (UInt32 n, count) ->
              pr "%s *C.uint32_t, %s C.size_t" n count
+          | CBArrayAndLen (Extent64 n, count) ->
+             pr "%s *C.nbd_extent, %s C.size_t" n count
           | CBBytesIn (n, len) ->
              pr "%s unsafe.Pointer, %s C.size_t" n len
           | CBInt n ->
@@ -605,6 +625,8 @@ let
           match cbarg with
           | CBArrayAndLen (UInt32 n, count) ->
              pr "copy_uint32_array (%s, %s)" n count
+          | CBArrayAndLen (Extent64 n, count) ->
+             pr "copy_extent_array (%s, %s)" n count
           | CBBytesIn (n, len) ->
              pr "C.GoBytes (%s, C.int (%s))" n len
           | CBInt n ->
@@ -756,6 +778,8 @@ let
            match cbarg with
            | CBArrayAndLen (UInt32 n, count) ->
               pr "uint32_t *%s, size_t %s" n count
+           | CBArrayAndLen (Extent64 n, count) ->
+              pr "nbd_extent *%s, size_t %s" n count
            | CBBytesIn (n, len) ->
               pr "void *%s, size_t %s" n len
            | CBInt n ->
diff --git a/generator/OCaml.ml b/generator/OCaml.ml
index 1349609b..eac42668 100644
--- a/generator/OCaml.ml
+++ b/generator/OCaml.ml
@@ -1,6 +1,6 @@
 (* hey emacs, this is OCaml code: -*- tuareg -*- *)
 (* nbd client library in userspace: generator
- * Copyright (C) 2013-2020 Red Hat Inc.
+ * Copyright (C) 2013-2021 Red Hat Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public
@@ -44,6 +44,7 @@ and
   | Closure { cbargs } ->
      sprintf "(%s)" (ocaml_closuredecl_to_string cbargs)
   | Enum (_, { enum_prefix }) -> sprintf "%s.t" enum_prefix
+  | Extent64 _ -> "extent"
   | Fd _ -> "Unix.file_descr"
   | Flags (_, { flag_prefix }) -> sprintf "%s.t list" flag_prefix
   | Int _ -> "int"
@@ -100,6 +101,7 @@ let
   | BytesPersistOut (n, len) -> n
   | Closure { cbname } -> cbname
   | Enum (n, _) -> n
+  | Extent64 n -> n
   | Fd n -> n
   | Flags (n, _) -> n
   | Int n -> n
@@ -145,6 +147,9 @@ let

 type cookie = int64

+type extent = int64 * int32
+(** Length and flags of an extent in block_status_64 callback. *)
+
 ";

   List.iter (
@@ -246,6 +251,7 @@ let
 exception Error of string * int
 exception Closed of string
 type cookie = int64
+type extent = int64 * int32

 (* Give the exceptions names so that they can be raised from the C code. *)
 let () =
@@ -469,7 +475,8 @@ let
   let argnames =
     List.map (
       function
-      | CBArrayAndLen (UInt32 n, _) | CBBytesIn (n, _)
+      | CBArrayAndLen (UInt32 n, _) | CBArrayAndLen (Extent64 n, _)
+      | CBBytesIn (n, _)
       | CBInt n | CBInt64 n
       | CBMutable (Int n) | CBString n | CBUInt n | CBUInt64 n ->
          n ^ "v"
@@ -497,6 +504,9 @@ let
     | CBArrayAndLen (UInt32 n, count) ->
        pr "  %sv = nbd_internal_ocaml_alloc_int32_array (%s, %s);\n"
          n n count;
+    | CBArrayAndLen (Extent64 n, count) ->
+       pr "  %sv = nbd_internal_ocaml_alloc_extent64_array (%s, %s);\n"
+         n n count;
     | CBBytesIn (n, len) ->
        pr "  %sv = caml_alloc_initialized_string (%s, %s);\n" n len n
     | CBInt n | CBUInt n ->
@@ -520,7 +530,7 @@ let

   List.iter (
     function
-    | CBArrayAndLen (UInt32 _, _)
+    | CBArrayAndLen (_, _)
     | CBBytesIn _
     | CBInt _
     | CBInt64 _
@@ -529,7 +539,7 @@ let
     | CBUInt64 _ -> ()
     | CBMutable (Int n) ->
        pr "  *%s = Int_val (Field (%sv, 0));\n" n n
-    | CBArrayAndLen _ | CBMutable _ -> assert false
+    | CBMutable _ -> assert false
   ) cbargs;

   pr "  if (Is_exception_result (rv)) {\n";
@@ -544,6 +554,7 @@ let
   pr "}\n";
   pr "\n";
   pr "static int\n";
+  pr "__attribute__((unused)) /* XXX temporary hack */\n";
   pr "%s_wrapper " cbname;
   C.print_cbarg_list ~wrap:true cbargs;
   pr "\n";
@@ -659,6 +670,7 @@ let
        pr "  %s_callback.free = free_user_data;\n" cbname
     | Enum (n, { enum_prefix }) ->
        pr "  int %s = %s_val (%sv);\n" n enum_prefix n
+    | Extent64 _ -> assert false (* only used in extent64_closure *)
     | Fd n ->
        pr "  /* OCaml Unix.file_descr is just an int, at least on Unix. */\n";
        pr "  int %s = Int_val (%sv);\n" n n
@@ -739,6 +751,7 @@ let
     | BytesPersistOut _
     | Closure _
     | Enum _
+    | Extent64 _
     | Fd _
     | Flags _
     | Int _
diff --git a/generator/Python.ml b/generator/Python.ml
index 4ab18f62..4212e2ac 100644
--- a/generator/Python.ml
+++ b/generator/Python.ml
@@ -158,6 +158,7 @@ let
 let print_python_closure_wrapper { cbname; cbargs } =
   pr "/* Wrapper for %s callback. */\n" cbname;
   pr "static int\n";
+  pr "__attribute__((unused)) /* XXX temporary hack */\n";
   pr "%s_wrapper " cbname;
   C.print_cbarg_list ~wrap:true cbargs;
   pr "\n";
@@ -169,7 +170,8 @@ let
   pr "  PyObject *py_args, *py_ret;\n";
   List.iter (
     function
-    | CBArrayAndLen (UInt32 n, _) ->
+    | CBArrayAndLen (UInt32 n, _)
+    | CBArrayAndLen (Extent64 n, _) ->
        pr "  PyObject *py_%s = NULL;\n" n
     | CBMutable (Int n) ->
        pr "  PyObject *py_%s = NULL;\n" n
@@ -187,6 +189,16 @@ let
        pr "    if (!py_e_%s) { PyErr_PrintEx (0); goto out; }\n" n;
        pr "    PyList_SET_ITEM (py_%s, i_%s, py_e_%s);\n" n n n;
        pr "  }\n"
+    | CBArrayAndLen (Extent64 n, len) ->
+       pr "  py_%s = PyList_New (%s);\n" n len;
+       pr "  size_t i_%s;\n" n;
+       pr "  for (i_%s = 0; i_%s < %s; ++i_%s) {\n" n n len n;
+       pr "    PyObject *py_e_%s = Py_BuildValue (\"(KI)\",\n" n;
+       pr "      (unsigned long long) %s[i_%s].length,\n" n n;
+       pr "      (unsigned int) %s[i_%s].flags);\n" n n;
+       pr "    if (!py_e_%s) { PyErr_PrintEx (0); goto out; }\n" n;
+       pr "    PyList_SET_ITEM (py_%s, i_%s, py_e_%s);\n" n n n;
+       pr "  }\n"
     | CBBytesIn _
     | CBInt _
     | CBInt64 _ -> ()
@@ -209,7 +221,7 @@ let
   pr "  py_args = Py_BuildValue (\"(\"";
   List.iter (
     function
-    | CBArrayAndLen (UInt32 n, len) -> pr " \"O\""
+    | CBArrayAndLen (_, len) -> pr " \"O\""
     | CBBytesIn (n, len) -> pr " \"y#\""
     | CBInt n -> pr " \"i\""
     | CBInt64 n -> pr " \"L\""
@@ -217,12 +229,13 @@ let
     | CBString n -> pr " \"s\""
     | CBUInt n -> pr " \"I\""
     | CBUInt64 n -> pr " \"K\""
-    | CBArrayAndLen _ | CBMutable _ -> assert false
+    | CBMutable _ -> assert false
   ) cbargs;
   pr " \")\"";
   List.iter (
     function
     | CBArrayAndLen (UInt32 n, _) -> pr ", py_%s" n
+    | CBArrayAndLen (Extent64 n, _) -> pr ", py_%s" n
     | CBBytesIn (n, len) -> pr ", %s, (int) %s" n len
     | CBMutable (Int n) -> pr ", py_%s" n
     | CBInt n | CBInt64 n
@@ -259,7 +272,8 @@ let
   pr " out:\n";
   List.iter (
     function
-    | CBArrayAndLen (UInt32 n, _) ->
+    | CBArrayAndLen (UInt32 n, _)
+    | CBArrayAndLen (Extent64 n, _) ->
        pr "  Py_XDECREF (py_%s);\n" n
     | CBMutable (Int n) ->
        pr "  if (py_%s) {\n" n;
@@ -307,6 +321,7 @@ let
          cbname cbname cbname;
        pr "                         .free = free_user_data };\n"
     | Enum (n, _) -> pr "  int %s;\n" n
+    | Extent64 _ -> assert false (* only used in extent64_closure *)
     | Flags (n, _) ->
        pr "  uint32_t %s_u32;\n" n;
        pr "  unsigned int %s; /* really uint32_t */\n" n
@@ -360,6 +375,7 @@ let
     | BytesPersistOut (_, count) -> pr " \"O\""
     | Closure _ -> pr " \"O\""
     | Enum _ -> pr " \"i\""
+    | Extent64 _ -> assert false (* only used in extent64_closure *)
     | Flags _ -> pr " \"I\""
     | Fd n | Int n -> pr " \"i\""
     | Int64 n -> pr " \"L\""
@@ -388,6 +404,7 @@ let
     | BytesOut (_, count) -> pr ", &%s" count
     | Closure { cbname } -> pr ", &py_%s_fn" cbname
     | Enum (n, _) -> pr ", &%s" n
+    | Extent64 _ -> assert false
     | Flags (n, _) -> pr ", &%s" n
     | Fd n | Int n | SizeT n | Int64 n -> pr ", &%s" n
     | Path n -> pr ", PyUnicode_FSConverter, &py_%s" n
@@ -452,6 +469,7 @@ let
        pr "  Py_INCREF (py_%s_fn);\n" cbname;
        pr "  %s_user_data->fn = py_%s_fn;\n" cbname cbname
     | Enum _ -> ()
+    | Extent64 _ -> ()
     | Flags (n, _) -> pr "  %s_u32 = %s;\n" n n
     | Fd _ | Int _ -> ()
     | Int64 n -> pr "  %s_i64 = %s;\n" n n
@@ -484,6 +502,7 @@ let
     | BytesPersistOut (n, _) -> pr ", %s_buf->data, %s_buf->len" n n
     | Closure { cbname } -> pr ", %s" cbname
     | Enum (n, _) -> pr ", %s" n
+    | Extent64 _ -> assert false (* only used in extent64_closure *)
     | Flags (n, _) -> pr ", %s_u32" n
     | Fd n | Int n -> pr ", %s" n
     | Int64 n -> pr ", %s_i64" n
@@ -532,6 +551,7 @@ let
     | BytesPersistIn _ | BytesPersistOut _
     | Closure _
     | Enum _
+    | Extent64 _
     | Flags _
     | Fd _ | Int _
     | Int64 _
@@ -577,6 +597,7 @@ let
     | Closure { cbname } ->
        pr "  free_user_data (%s_user_data);\n" cbname
     | Enum _ -> ()
+    | Extent64 _ -> ()
     | Flags _ -> ()
     | Fd _ | Int _ -> ()
     | Int64 _ -> ()
@@ -820,6 +841,7 @@ let
           | BytesPersistOut (n, _) -> n, None, Some (sprintf "%s._o" n)
           | Closure { cbname } -> cbname, None, None
           | Enum (n, _) -> n, None, None
+          | Extent64 _ -> assert false (* only used in extent64_closure *)
           | Flags (n, _) -> n, None, None
           | Fd n | Int n -> n, None, None
           | Int64 n -> n, None, None
diff --git a/ocaml/helpers.c b/ocaml/helpers.c
index 90333cd7..d15ffaf3 100644
--- a/ocaml/helpers.c
+++ b/ocaml/helpers.c
@@ -1,5 +1,5 @@
 /* NBD client library in userspace
- * Copyright (C) 2013-2019 Red Hat Inc.
+ * Copyright (C) 2013-2021 Red Hat Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public
@@ -112,6 +112,26 @@ nbd_internal_ocaml_alloc_int32_array (uint32_t *a, size_t len)
   CAMLreturn (rv);
 }

+value
+nbd_internal_ocaml_alloc_extent64_array (nbd_extent *a, size_t len)
+{
+  CAMLparam0 ();
+  CAMLlocal3 (s, v, rv);
+  size_t i;
+
+  rv = caml_alloc (len, 0);
+  for (i = 0; i < len; ++i) {
+    s = caml_alloc (2, 0);
+    v = caml_copy_int64 (a[i].length);
+    Store_field (s, 0, v);
+    v = caml_copy_int32 (a[i].flags);
+    Store_field (s, 1, v);
+    Store_field (rv, i, s);
+  }
+
+  CAMLreturn (rv);
+}
+
 /* Common code when an exception is raised in an OCaml callback and
  * the wrapper has to deal with it.  Callbacks are not supposed to
  * raise exceptions, so we print it.  We also handle Assert_failure
diff --git a/ocaml/nbd-c.h b/ocaml/nbd-c.h
index 9f362fa1..3b66049d 100644
--- a/ocaml/nbd-c.h
+++ b/ocaml/nbd-c.h
@@ -1,5 +1,5 @@
 /* NBD client library in userspace
- * Copyright (C) 2013-2019 Red Hat Inc.
+ * Copyright (C) 2013-2021 Red Hat Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public
@@ -61,6 +61,7 @@ extern void nbd_internal_ocaml_raise_closed (const char *func) Noreturn;

 extern const char **nbd_internal_ocaml_string_list (value);
 extern value nbd_internal_ocaml_alloc_int32_array (uint32_t *, size_t);
+extern value nbd_internal_ocaml_alloc_extent64_array (nbd_extent *, size_t);
 extern void nbd_internal_ocaml_exception_in_wrapper (const char *, value);

 /* Extract an NBD handle from an OCaml heap value. */
diff --git a/golang/handle.go b/golang/handle.go
index c8d9485d..43a0e489 100644
--- a/golang/handle.go
+++ b/golang/handle.go
@@ -58,6 +58,12 @@ func (h *Libnbd) String() string {
 	return "&Libnbd{}"
 }

+/* Used for block status callback. */
+type LibnbdExtent struct {
+	Length uint64        // length of the extent
+	Flags  uint32        // flags describing properties of the extent
+}
+
 /* All functions (except Close) return ([result,] LibnbdError). */
 type LibnbdError struct {
 	Op     string        // operation which failed
-- 
2.33.1




* [libnbd PATCH 08/13] block_status: Track 64-bit extents internally
  2021-12-03 23:17 ` [libnbd PATCH 00/13] libnbd patches for NBD_OPT_EXTENDED_HEADERS Eric Blake
                     ` (6 preceding siblings ...)
  2021-12-03 23:17   ` [libnbd PATCH 07/13] generator: Add struct nbd_extent in prep for 64-bit extents Eric Blake
@ 2021-12-03 23:17   ` Eric Blake
  2021-12-03 23:17   ` [libnbd PATCH 09/13] block_status: Accept 64-bit extents during block status Eric Blake
                     ` (5 subsequent siblings)
  13 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2021-12-03 23:17 UTC (permalink / raw)
  To: libguestfs; +Cc: nsoffer, vsementsov, qemu-devel, qemu-block, nbd

When extended headers are in use, the server can send us 64-bit
extents, even for a 32-bit query (if the server knows the entire image
is data, for example).  For maximum flexibility, we are thus better
off storing 64-bit lengths internally, even if we have to convert them
back to 32-bit lengths when invoking the user's 32-bit callback.  The
next patch will then add a new API for letting the user access the
full 64-bit extent information.  The goal is to let both APIs work all
the time, regardless of the extent size that the server actually
answered with.

Note that when using the old nbd_block_status() API with a server that
lacks extended headers, we now add a double-conversion speed penalty
(converting the server's 32-bit answer into 64-bit internally and back
to 32-bit for the callback).  But the speed penalty will not be a
problem for applications using the new nbd_block_status_64() API (we
have to give a 64-bit answer no matter what the server answered), and
ideally the situation will become less common as more servers learn
extended headers.  So for now I chose to unconditionally use a 64-bit
internal representation; but if it turns out to have noticeable
degredation, we could tweak things to conditionally retain 32-bit
internal representation for servers lacking extended headers at the
expense of more code maintenance.

One of the trickier aspects of this patch is auditing that both the
user's extent and our malloc'd shim get cleaned up once on all
possible paths, so that there is neither a leak nor a double free.
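
The free-exactly-once property can be sketched in isolation.  The
following is only an illustration under assumed names (struct closure,
closure_move, closure_release and count_free are all hypothetical, not
libnbd macros): ownership moves by copying the closure and clearing
the source, and releasing a cleared closure is a no-op.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical closure: a callback, its data, and a cleanup hook. */
struct closure {
  int (*callback) (void *user_data);
  void *user_data;
  void (*free_fn) (void *user_data);
};

/* Run the cleanup hook (if any), then clear the closure so that a
 * second release does nothing.
 */
static void
closure_release (struct closure *c)
{
  assert (c != NULL);
  if (c->free_fn != NULL)
    c->free_fn (c->user_data);
  c->callback = NULL;
  c->user_data = NULL;
  c->free_fn = NULL;
}

/* Transfer ownership from src to dst; src is left empty. */
static void
closure_move (struct closure *dst, struct closure *src)
{
  *dst = *src;
  src->callback = NULL;
  src->user_data = NULL;
  src->free_fn = NULL;
}

/* Example cleanup hook that counts how often it runs. */
static int frees;

static void
count_free (void *user_data)
{
  frees++;
  free (user_data);
}
```

With this shape, every error path can release whichever closure it
still holds without checking who owns the data at that point.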
---
 lib/internal.h                      |  7 +++-
 generator/states-reply-structured.c | 31 ++++++++++-----
 lib/handle.c                        |  4 +-
 lib/rw.c                            | 59 ++++++++++++++++++++++++++++-
 4 files changed, 85 insertions(+), 16 deletions(-)

diff --git a/lib/internal.h b/lib/internal.h
index 06f3a65c..4800df83 100644
--- a/lib/internal.h
+++ b/lib/internal.h
@@ -75,7 +75,7 @@ struct export {

 struct command_cb {
   union {
-    nbd_extent_callback extent;
+    nbd_extent64_callback extent;
     nbd_chunk_callback chunk;
     nbd_list_callback list;
     nbd_context_callback context;
@@ -286,7 +286,10 @@ struct nbd_handle {

   /* When receiving block status, this is used. */
   uint32_t bs_contextid;
-  uint32_t *bs_entries;
+  union {
+    nbd_extent *normal; /* Our 64-bit preferred internal form */
+    uint32_t *narrow;   /* 32-bit form of NBD_REPLY_TYPE_BLOCK_STATUS */
+  } bs_entries;

   /* Commands which are waiting to be issued [meaning the request
    * packet is sent to the server].  This is used as a simple linked
diff --git a/generator/states-reply-structured.c b/generator/states-reply-structured.c
index a3e0e2ac..71c761e9 100644
--- a/generator/states-reply-structured.c
+++ b/generator/states-reply-structured.c
@@ -494,6 +494,7 @@ STATE_MACHINE {
  REPLY.STRUCTURED_REPLY.RECV_BS_CONTEXTID:
   struct command *cmd = h->reply_cmd;
   uint32_t length;
+  uint32_t count;

   switch (recv_into_rbuf (h)) {
   case -1: SET_NEXT_STATE (%.DEAD); return 0;
@@ -508,15 +509,19 @@ STATE_MACHINE {
     assert (cmd->type == NBD_CMD_BLOCK_STATUS);
     assert (length >= 12);
     length -= sizeof h->bs_contextid;
+    count = length / (2 * sizeof (uint32_t));

-    free (h->bs_entries);
-    h->bs_entries = malloc (length);
-    if (h->bs_entries == NULL) {
+    /* Read raw data into a subset of h->bs_entries, then expand it
+     * into place later later during byte-swapping.
+     */
+    free (h->bs_entries.normal);
+    h->bs_entries.normal = malloc (count * sizeof *h->bs_entries.normal);
+    if (h->bs_entries.normal == NULL) {
       SET_NEXT_STATE (%.DEAD);
       set_error (errno, "malloc");
       return 0;
     }
-    h->rbuf = h->bs_entries;
+    h->rbuf = h->bs_entries.narrow;
     h->rlen = length;
     SET_NEXT_STATE (%RECV_BS_ENTRIES);
   }
@@ -528,6 +533,7 @@ STATE_MACHINE {
   uint32_t count;
   size_t i;
   uint32_t context_id;
+  uint32_t *raw;
   struct meta_context *meta_context;

   switch (recv_into_rbuf (h)) {
@@ -542,15 +548,20 @@ STATE_MACHINE {
     assert (cmd); /* guaranteed by CHECK */
     assert (cmd->type == NBD_CMD_BLOCK_STATUS);
     assert (CALLBACK_IS_NOT_NULL (cmd->cb.fn.extent));
-    assert (h->bs_entries);
+    assert (h->bs_entries.normal);
     assert (length >= 12);
-    count = (length - sizeof h->bs_contextid) / sizeof *h->bs_entries;
+    count = (length - sizeof h->bs_contextid) / (2 * sizeof (uint32_t));

     /* Need to byte-swap the entries returned, but apart from that we
-     * don't validate them.
+     * don't validate them.  Reverse order is essential, since we are
+     * expanding in-place from narrow to wider type.
      */
-    for (i = 0; i < count; ++i)
-      h->bs_entries[i] = be32toh (h->bs_entries[i]);
+    raw = h->bs_entries.narrow;
+    for (i = count; i > 0; ) {
+      --i;
+      h->bs_entries.normal[i].flags = be32toh (raw[i * 2 + 1]);
+      h->bs_entries.normal[i].length = be32toh (raw[i * 2]);
+    }

     /* Look up the context ID. */
     context_id = be32toh (h->bs_contextid);
@@ -566,7 +577,7 @@ STATE_MACHINE {

       if (CALL_CALLBACK (cmd->cb.fn.extent,
                          meta_context->name, cmd->offset,
-                         h->bs_entries, count,
+                         h->bs_entries.normal, count,
                          &error) == -1)
         if (cmd->error == 0)
           cmd->error = error ? error : EPROTO;
diff --git a/lib/handle.c b/lib/handle.c
index cbb37e89..74fe87ec 100644
--- a/lib/handle.c
+++ b/lib/handle.c
@@ -1,5 +1,5 @@
 /* NBD client library in userspace
- * Copyright (C) 2013-2020 Red Hat Inc.
+ * Copyright (C) 2013-2021 Red Hat Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public
@@ -123,7 +123,7 @@ nbd_close (struct nbd_handle *h)
   /* Free user callbacks first. */
   nbd_unlocked_clear_debug_callback (h);

-  free (h->bs_entries);
+  free (h->bs_entries.normal);
   nbd_internal_reset_size_and_flags (h);
   nbd_internal_free_option (h);
   free_cmd_list (h->cmds_to_issue);
diff --git a/lib/rw.c b/lib/rw.c
index 16c2e848..f36f4e15 100644
--- a/lib/rw.c
+++ b/lib/rw.c
@@ -42,6 +42,50 @@ wait_for_command (struct nbd_handle *h, int64_t cookie)
   return r == -1 ? -1 : 0;
 }

+/* Convert from 64-bit to 32-bit extent callback. */
+static int
+nbd_convert_extent (void *data, const char *metacontext, uint64_t offset,
+                    nbd_extent *entries, size_t nr_entries, int *error)
+{
+  nbd_extent_callback *cb = data;
+  uint32_t *array = malloc (nr_entries * 2 * sizeof *array);
+  size_t i;
+  int ret;
+
+  if (array == NULL) {
+    set_error (*error = errno, "malloc");
+    return -1;
+  }
+
+  for (i = 0; i < nr_entries; i++) {
+    array[i * 2] = entries[i].length;
+    array[i * 2 + 1] = entries[i].flags;
+    /* If an extent is larger than 32 bits, silently truncate the rest
+     * of the server's response.  Technically, such a server was
+     * non-compliant if the client did not negotiate extended headers,
+     * but it is easier to let the caller make progress than to make
+     * the call fail.  Rather than track the connection's alignment,
+     * just blindly truncate the large extent to 4G-64M.
+     */
+    if (entries[i].length > UINT32_MAX) {
+      array[i++ * 2] = -MAX_REQUEST_SIZE;
+      break;
+    }
+  }
+
+  ret = CALL_CALLBACK (*cb, metacontext, offset, array, i * 2, error);
+  free (array);
+  return ret;
+}
+
+static void
+nbd_convert_extent_free (void *data)
+{
+  nbd_extent_callback *cb = data;
+  FREE_CALLBACK (*cb);
+  free (cb);
+}
+
 /* Issue a read command and wait for the reply. */
 int
 nbd_unlocked_pread (struct nbd_handle *h, void *buf,
@@ -469,12 +513,23 @@ nbd_unlocked_aio_block_status (struct nbd_handle *h,
                                nbd_completion_callback *completion,
                                uint32_t flags)
 {
-  struct command_cb cb = { .fn.extent = *extent,
+  nbd_extent_callback *shim = malloc (sizeof *shim);
+  struct command_cb cb = { .fn.extent.callback = nbd_convert_extent,
+                           .fn.extent.user_data = shim,
+                           .fn.extent.free = nbd_convert_extent_free,
                            .completion = *completion };

+  if (shim == NULL) {
+    set_error (errno, "malloc");
+    return -1;
+  }
+  *shim = *extent;
+  SET_CALLBACK_TO_NULL (*extent);
+
   if (h->strict & LIBNBD_STRICT_COMMANDS) {
     if (!h->structured_replies) {
       set_error (ENOTSUP, "server does not support structured replies");
+      FREE_CALLBACK (cb.fn.extent);
       return -1;
     }

@@ -482,11 +537,11 @@ nbd_unlocked_aio_block_status (struct nbd_handle *h,
       set_error (ENOTSUP, "did not negotiate any metadata contexts, "
                  "either you did not call nbd_add_meta_context before "
                  "connecting or the server does not support it");
+      FREE_CALLBACK (cb.fn.extent);
       return -1;
     }
   }

-  SET_CALLBACK_TO_NULL (*extent);
   SET_CALLBACK_TO_NULL (*completion);
   return nbd_internal_command_common (h, flags, NBD_CMD_BLOCK_STATUS, offset,
                                       count, EINVAL, NULL, &cb);
-- 
2.33.1




* [libnbd PATCH 09/13] block_status: Accept 64-bit extents during block status
  2021-12-03 23:17 ` [libnbd PATCH 00/13] libnbd patches for NBD_OPT_EXTENDED_HEADERS Eric Blake
                     ` (7 preceding siblings ...)
  2021-12-03 23:17   ` [libnbd PATCH 08/13] block_status: Track 64-bit extents internally Eric Blake
@ 2021-12-03 23:17   ` Eric Blake
  2021-12-03 23:17   ` [libnbd PATCH 10/13] api: Add [aio_]nbd_block_status_64 Eric Blake
                     ` (4 subsequent siblings)
  13 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2021-12-03 23:17 UTC (permalink / raw)
  To: libguestfs; +Cc: nsoffer, vsementsov, qemu-devel, qemu-block, nbd

Support a server giving us a 64-bit extent.  Note that the protocol
says a server should not give a 64-bit answer when extended headers
are not negotiated, but since the client's size is merely a hint, it
is possible for a server to have a 64-bit answer even when the
original query was 32 bits.  At any rate, it is just as easy for us to
always support the new chunk type as it is to complain when it is used
incorrectly by the server, and the user's 32-bit callback doesn't have
to care which size the server's result used (either the server's
result was a 32-bit value, or our shim silently truncates it, but the
user still makes progress).  Of course, until a later patch enables
extended headers negotiation, no compliant server will trigger the new
code here.

Implementation-wise, we don't care if we will be narrowing from the
server's 16-byte extent (including explicit padding) to a 12-byte
struct, or if our 'nbd_extent' type has implicit padding and is thus
also 16 bytes; either way, the order of our byte-swapping traversal is
safe.
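
As a rough sketch of why traversal order matters for the 32-bit wire
form, the following expands big-endian 8-byte length/flags pairs in
place into a wider struct.  The names (struct extent64, load_be32,
widen_in_place) are hypothetical and the byte decoding is done by hand
to stay portable; the patch itself uses be32toh on the union members.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the internal 64-bit extent; 12 or 16
 * bytes depending on the ABI.
 */
struct extent64 {
  uint64_t length;
  uint32_t flags;
};

/* Decode one big-endian 32-bit value from a byte pointer. */
static uint32_t
load_be32 (const unsigned char *p)
{
  return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
         ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

/* buf holds count wire pairs (8 bytes each) at its start and has room
 * for count struct extent64.  Walk from the last entry to the first:
 * entry i is written at offset i * sizeof (struct extent64), which is
 * never below the end of wire pair i - 1, so no unread pair is
 * clobbered.  Both fields are read before the entry is stored.
 */
static void
widen_in_place (void *buf, size_t count)
{
  unsigned char *bytes = buf;
  size_t i;

  assert (buf != NULL || count == 0);
  for (i = count; i > 0; ) {
    struct extent64 e;

    --i;
    memset (&e, 0, sizeof e);
    e.length = load_be32 (bytes + i * 8);
    e.flags = load_be32 (bytes + i * 8 + 4);
    memcpy (bytes + i * sizeof e, &e, sizeof e);
  }
}
```

The 16-byte wire form goes the other way: each wire entry is at least
as large as the struct, so a forward walk never overwrites unread
input.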
---
 lib/internal.h                      |  1 +
 generator/states-reply-structured.c | 75 +++++++++++++++++++++++------
 2 files changed, 60 insertions(+), 16 deletions(-)

diff --git a/lib/internal.h b/lib/internal.h
index 4800df83..97abf4f2 100644
--- a/lib/internal.h
+++ b/lib/internal.h
@@ -289,6 +289,7 @@ struct nbd_handle {
   union {
     nbd_extent *normal; /* Our 64-bit preferred internal form */
     uint32_t *narrow;   /* 32-bit form of NBD_REPLY_TYPE_BLOCK_STATUS */
+    struct nbd_block_descriptor_ext *wide; /* NBD_REPLY_TYPE_BLOCK_STATUS_EXT */
   } bs_entries;

   /* Commands which are waiting to be issued [meaning the request
diff --git a/generator/states-reply-structured.c b/generator/states-reply-structured.c
index 71c761e9..29b1c3d8 100644
--- a/generator/states-reply-structured.c
+++ b/generator/states-reply-structured.c
@@ -22,6 +22,8 @@
 #include <stdint.h>
 #include <inttypes.h>

+#include "minmax.h"
+
 /* Structured reply must be completely inside the bounds of the
  * requesting command.
  */
@@ -202,7 +204,8 @@ STATE_MACHINE {
     SET_NEXT_STATE (%RECV_OFFSET_HOLE);
     return 0;
   }
-  else if (type == NBD_REPLY_TYPE_BLOCK_STATUS) {
+  else if (type == NBD_REPLY_TYPE_BLOCK_STATUS ||
+           type == NBD_REPLY_TYPE_BLOCK_STATUS_EXT) {
     if (cmd->type != NBD_CMD_BLOCK_STATUS) {
       SET_NEXT_STATE (%.DEAD);
       set_error (0, "invalid command for receiving block-status chunk, "
@@ -211,12 +214,19 @@ STATE_MACHINE {
                  cmd->type);
       return 0;
     }
-    /* XXX We should be able to skip the bad reply in these two cases. */
-    if (length < 12 || ((length-4) & 7) != 0) {
+    /* XXX We should be able to skip the bad reply in these cases. */
+    if (type == NBD_REPLY_TYPE_BLOCK_STATUS &&
+        (length < 12 || (length-4) % (2 * sizeof(uint32_t)))) {
       SET_NEXT_STATE (%.DEAD);
       set_error (0, "invalid length in NBD_REPLY_TYPE_BLOCK_STATUS");
       return 0;
     }
+    if (type == NBD_REPLY_TYPE_BLOCK_STATUS_EXT &&
+        (length < 20 || (length-4) % sizeof(struct nbd_block_descriptor_ext))) {
+      SET_NEXT_STATE (%.DEAD);
+      set_error (0, "invalid length in NBD_REPLY_TYPE_BLOCK_STATUS_EXT");
+      return 0;
+    }
     if (CALLBACK_IS_NULL (cmd->cb.fn.extent)) {
       SET_NEXT_STATE (%.DEAD);
       set_error (0, "not expecting NBD_REPLY_TYPE_BLOCK_STATUS here");
@@ -495,6 +505,7 @@ STATE_MACHINE {
   struct command *cmd = h->reply_cmd;
   uint32_t length;
   uint32_t count;
+  uint16_t type;

   switch (recv_into_rbuf (h)) {
   case -1: SET_NEXT_STATE (%.DEAD); return 0;
@@ -504,24 +515,33 @@ STATE_MACHINE {
     return 0;
   case 0:
     length = h->sbuf.sr.hdr.structured_reply.length; /* normalized in CHECK */
+    type = be16toh (h->sbuf.sr.hdr.structured_reply.type);

     assert (cmd); /* guaranteed by CHECK */
     assert (cmd->type == NBD_CMD_BLOCK_STATUS);
     assert (length >= 12);
     length -= sizeof h->bs_contextid;
-    count = length / (2 * sizeof (uint32_t));
+    if (type == NBD_REPLY_TYPE_BLOCK_STATUS)
+      count = length / (2 * sizeof (uint32_t));
+    else {
+      assert (type == NBD_REPLY_TYPE_BLOCK_STATUS_EXT);
+      /* XXX Insist on h->extended_headers? */
+      count = length / sizeof (struct nbd_block_descriptor_ext);
+    }

-    /* Read raw data into a subset of h->bs_entries, then expand it
+    /* Read raw data into an overlap of h->bs_entries, then move it
      * into place later later during byte-swapping.
      */
     free (h->bs_entries.normal);
-    h->bs_entries.normal = malloc (count * sizeof *h->bs_entries.normal);
+    h->bs_entries.normal = malloc (MAX (count * sizeof *h->bs_entries.normal,
+                                        length));
     if (h->bs_entries.normal == NULL) {
       SET_NEXT_STATE (%.DEAD);
       set_error (errno, "malloc");
       return 0;
     }
-    h->rbuf = h->bs_entries.narrow;
+    h->rbuf = type == NBD_REPLY_TYPE_BLOCK_STATUS
+      ? h->bs_entries.narrow : (void *) h->bs_entries.wide;
     h->rlen = length;
     SET_NEXT_STATE (%RECV_BS_ENTRIES);
   }
@@ -533,7 +553,7 @@ STATE_MACHINE {
   uint32_t count;
   size_t i;
   uint32_t context_id;
-  uint32_t *raw;
+  uint16_t type;
   struct meta_context *meta_context;

   switch (recv_into_rbuf (h)) {
@@ -544,23 +564,46 @@ STATE_MACHINE {
     return 0;
   case 0:
     length = h->sbuf.sr.hdr.structured_reply.length; /* normalized in CHECK */
+    type = be16toh (h->sbuf.sr.hdr.structured_reply.type);

     assert (cmd); /* guaranteed by CHECK */
     assert (cmd->type == NBD_CMD_BLOCK_STATUS);
     assert (CALLBACK_IS_NOT_NULL (cmd->cb.fn.extent));
     assert (h->bs_entries.normal);
     assert (length >= 12);
-    count = (length - sizeof h->bs_contextid) / (2 * sizeof (uint32_t));
+    length -= sizeof h->bs_contextid;

     /* Need to byte-swap the entries returned, but apart from that we
-     * don't validate them.  Reverse order is essential, since we are
-     * expanding in-place from narrow to wider type.
+     * don't validate them.
      */
-    raw = h->bs_entries.narrow;
-    for (i = count; i > 0; ) {
-      --i;
-      h->bs_entries.normal[i].flags = be32toh (raw[i * 2 + 1]);
-      h->bs_entries.normal[i].length = be32toh (raw[i * 2]);
+    if (type == NBD_REPLY_TYPE_BLOCK_STATUS) {
+      uint32_t *raw = h->bs_entries.narrow;
+
+      /* Expanding in-place from narrow to wide, must use reverse order. */
+      count = length / (2 * sizeof (uint32_t));
+      for (i = count; i > 0; ) {
+        --i;
+        h->bs_entries.normal[i].flags = be32toh (raw[i * 2 + 1]);
+        h->bs_entries.normal[i].length = be32toh (raw[i * 2]);
+      }
+    }
+    else {
+      struct nbd_block_descriptor_ext *wide = h->bs_entries.wide;
+
+      /* ABI determines whether nbd_extent is 12 or 16 bytes, but the
+       * server sent us 16 bytes, so we must process in forward order.
+       */
+      assert (type == NBD_REPLY_TYPE_BLOCK_STATUS_EXT);
+      count = length / sizeof (struct nbd_block_descriptor_ext);
+      for (i = 0; i < count; i++) {
+        h->bs_entries.normal[i].length = be64toh (wide[i].length);
+        h->bs_entries.normal[i].flags = be32toh (wide[i].status_flags);
+        if (wide[i].pad) {
+          set_error (0, "server sent non-zero padding in block status");
+          SET_NEXT_STATE(%.DEAD);
+          return 0;
+        }
+      }
     }

     /* Look up the context ID. */
-- 
2.33.1




* [libnbd PATCH 10/13] api: Add [aio_]nbd_block_status_64
  2021-12-03 23:17 ` [libnbd PATCH 00/13] libnbd patches for NBD_OPT_EXTENDED_HEADERS Eric Blake
                     ` (8 preceding siblings ...)
  2021-12-03 23:17   ` [libnbd PATCH 09/13] block_status: Accept 64-bit extents during block status Eric Blake
@ 2021-12-03 23:17   ` Eric Blake
  2021-12-03 23:17   ` [libnbd PATCH 11/13] api: Add three functions for controlling extended headers Eric Blake
                     ` (3 subsequent siblings)
  13 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2021-12-03 23:17 UTC (permalink / raw)
  To: libguestfs; +Cc: nsoffer, vsementsov, qemu-devel, qemu-block, nbd

Overcome the inherent 32-bit limitation of our existing
nbd_block_status command by adding a 64-bit variant.  The command sent
to the server does not change, but the user's callback is now handed
64-bit information regardless of whether the server replies with 32-
or 64-bit extents.

Unit tests prove that the new API works in each of C, Python, OCaml,
and Go bindings.  We can also get rid of the temporary hack added to
appease the compiler in an earlier patch.
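
For a feel of what a 64-bit callback might look like on the user's
side, here is a self-contained sketch.  It assumes the callback shape
used by the internal shim earlier in this series (data, metacontext,
offset, entries, nr_entries, error); struct extent64 and STATE_HOLE
are local stand-ins for nbd_extent and LIBNBD_STATE_HOLE so that the
example compiles without libnbd, and struct totals and sum_extents are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins so the sketch builds without libnbd headers. */
struct extent64 {
  uint64_t length;
  uint32_t flags;
};
#define STATE_HOLE 1u

struct totals {
  uint64_t allocated;
  uint64_t holes;
};

/* Sum allocated and hole bytes for base:allocation.  Other contexts
 * are ignored, since the callback can be invoked once per context.
 */
static int
sum_extents (void *data, const char *metacontext, uint64_t offset,
             struct extent64 *entries, size_t nr_entries, int *error)
{
  struct totals *t = data;
  size_t i;

  (void) offset;
  (void) error;
  assert (t != NULL);
  if (strcmp (metacontext, "base:allocation") != 0)
    return 0;
  for (i = 0; i < nr_entries; i++) {
    if (entries[i].flags & STATE_HOLE)
      t->holes += entries[i].length;
    else
      t->allocated += entries[i].length;
  }
  return 0;
}
```

Because the lengths are 64-bit, a single extent larger than 4GiB is
accumulated without the truncation the 32-bit callback would see.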
---
 generator/API.ml                          | 138 +++++++++++++++++++---
 generator/OCaml.ml                        |   1 -
 generator/Python.ml                       |   1 -
 lib/rw.c                                  |  48 ++++++--
 python/t/465-block-status-64.py           |  56 +++++++++
 ocaml/tests/Makefile.am                   |   5 +-
 ocaml/tests/test_465_block_status_64.ml   |  58 +++++++++
 tests/meta-base-allocation.c              | 111 +++++++++++++++--
 golang/Makefile.am                        |   3 +-
 golang/libnbd_465_block_status_64_test.go | 119 +++++++++++++++++++
 10 files changed, 503 insertions(+), 37 deletions(-)
 create mode 100644 python/t/465-block-status-64.py
 create mode 100644 ocaml/tests/test_465_block_status_64.ml
 create mode 100644 golang/libnbd_465_block_status_64_test.go

diff --git a/generator/API.ml b/generator/API.ml
index 70ae721d..1a452a24 100644
--- a/generator/API.ml
+++ b/generator/API.ml
@@ -1071,7 +1071,7 @@   "add_meta_context", {
 During connection libnbd can negotiate zero or more metadata
 contexts with the server.  Metadata contexts are features (such
 as C<\"base:allocation\">) which describe information returned
-by the L<nbd_block_status(3)> command (for C<\"base:allocation\">
+by the L<nbd_block_status_64(3)> command (for C<\"base:allocation\">
 this is whether blocks of data are allocated, zero or sparse).

 This call adds one metadata context to the list to be negotiated.
@@ -1098,7 +1098,7 @@   "add_meta_context", {
 Other metadata contexts are server-specific, but include
 C<\"qemu:dirty-bitmap:...\"> and C<\"qemu:allocation-depth\"> for
 qemu-nbd (see qemu-nbd I<-B> and I<-A> options).";
-    see_also = [Link "block_status"; Link "can_meta_context";
+    see_also = [Link "block_status_64"; Link "can_meta_context";
                 Link "get_nr_meta_contexts"; Link "get_meta_context";
                 Link "clear_meta_contexts"];
   };
@@ -1111,14 +1111,14 @@   "get_nr_meta_contexts", {
 During connection libnbd can negotiate zero or more metadata
 contexts with the server.  Metadata contexts are features (such
 as C<\"base:allocation\">) which describe information returned
-by the L<nbd_block_status(3)> command (for C<\"base:allocation\">
+by the L<nbd_block_status_64(3)> command (for C<\"base:allocation\">
 this is whether blocks of data are allocated, zero or sparse).

 This command returns how many meta contexts have been added to
 the list to request from the server via L<nbd_add_meta_context(3)>.
 The server is not obligated to honor all of the requests; to see
 what it actually supports, see L<nbd_can_meta_context(3)>.";
-    see_also = [Link "block_status"; Link "can_meta_context";
+    see_also = [Link "block_status_64"; Link "can_meta_context";
                 Link "add_meta_context"; Link "get_meta_context";
                 Link "clear_meta_contexts"];
   };
@@ -1131,13 +1131,13 @@   "get_meta_context", {
 During connection libnbd can negotiate zero or more metadata
 contexts with the server.  Metadata contexts are features (such
 as C<\"base:allocation\">) which describe information returned
-by the L<nbd_block_status(3)> command (for C<\"base:allocation\">
+by the L<nbd_block_status_64(3)> command (for C<\"base:allocation\">
 this is whether blocks of data are allocated, zero or sparse).

 This command returns the i'th meta context request, as added by
 L<nbd_add_meta_context(3)>, and bounded by
 L<nbd_get_nr_meta_contexts(3)>.";
-    see_also = [Link "block_status"; Link "can_meta_context";
+    see_also = [Link "block_status_64"; Link "can_meta_context";
                 Link "add_meta_context"; Link "get_nr_meta_contexts";
                 Link "clear_meta_contexts"];
   };
@@ -1151,7 +1151,7 @@   "clear_meta_contexts", {
 During connection libnbd can negotiate zero or more metadata
 contexts with the server.  Metadata contexts are features (such
 as C<\"base:allocation\">) which describe information returned
-by the L<nbd_block_status(3)> command (for C<\"base:allocation\">
+by the L<nbd_block_status_64(3)> command (for C<\"base:allocation\">
 this is whether blocks of data are allocated, zero or sparse).

 This command resets the list of meta contexts to request back to
@@ -1160,7 +1160,7 @@   "clear_meta_contexts", {
 negotiation mode is selected (see L<nbd_set_opt_mode(3)>), for
 altering the list of attempted contexts between subsequent export
 queries.";
-    see_also = [Link "block_status"; Link "can_meta_context";
+    see_also = [Link "block_status_64"; Link "can_meta_context";
                 Link "add_meta_context"; Link "get_nr_meta_contexts";
                 Link "get_meta_context"; Link "set_opt_mode"];
   };
@@ -1727,7 +1727,7 @@   "can_meta_context", {
 ^ non_blocking_test_call_description;
     see_also = [SectionLink "Flag calls"; Link "opt_info";
                 Link "add_meta_context";
-                Link "block_status"; Link "aio_block_status"];
+                Link "block_status_64"; Link "aio_block_status_64"];
   };

   "get_protocol", {
@@ -2124,7 +2124,7 @@   "block_status", {
     optargs = [ OFlags ("flags", cmd_flags, Some ["REQ_ONE"]) ];
     ret = RErr;
     permitted_states = [ Connected ];
-    shortdesc = "send block status command to the NBD server";
+    shortdesc = "send block status command to the NBD server, with 32-bit callback";
     longdesc = "\
 Issue the block status command to the NBD server.  If
 supported by the server, this causes metadata context
@@ -2139,7 +2139,12 @@   "block_status", {
 The NBD protocol does not yet have a way for a client to learn if
 the server will enforce an even smaller maximum block status size,
 although a future extension may add a constraint visible in
-L<nbd_get_block_size(3)>.
+L<nbd_get_block_size(3)>.  Furthermore, this function is inherently
+limited to reporting extents no larger than 32 bits in size.  If the
+server replies with a larger extent, the length of that extent will
+be truncated to just below 32 bits and any further extents from the
+server will be ignored.  To get the full extent information from a
+server that supports 64-bit extents, you must use L<nbd_block_status_64(3)>.

 Depending on which metadata contexts were enabled before
 connecting (see L<nbd_add_meta_context(3)>) and which are
@@ -2182,10 +2187,79 @@   "block_status", {
 does not exceed C<count> bytes; however, libnbd does not
 validate that the server obeyed the flag."
 ^ strict_call_description;
-    see_also = [Link "add_meta_context"; Link "can_meta_context";
+    see_also = [Link "block_status_64";
+                Link "add_meta_context"; Link "can_meta_context";
                 Link "aio_block_status"; Link "set_strict_mode"];
   };

+  "block_status_64", {
+    default_call with
+    args = [ UInt64 "count"; UInt64 "offset"; Closure extent64_closure ];
+    optargs = [ OFlags ("flags", cmd_flags, Some ["REQ_ONE"]) ];
+    ret = RErr;
+    permitted_states = [ Connected ];
+    shortdesc = "send block status command to the NBD server, with 64-bit callback";
+    longdesc = "\
+Issue the block status command to the NBD server.  If
+supported by the server, this causes metadata context
+information about blocks beginning from the specified
+offset to be returned. The C<count> parameter is a hint: the
+server may choose to return less status, or the final block
+may extend beyond the requested range. If multiple contexts
+are supported, the number of blocks and cumulative length
+of those blocks need not be identical between contexts.
+
+Note that not all servers can support a C<count> of 4GiB or larger.
+The NBD protocol does not yet have a way for a client to learn if
+the server will enforce an even smaller maximum block status size,
+although a future extension may add a constraint visible in
+L<nbd_get_block_size(3)>.
+
+Depending on which metadata contexts were enabled before
+connecting (see L<nbd_add_meta_context(3)>) and which are
+supported by the server (see L<nbd_can_meta_context(3)>) this call
+returns information about extents by calling back to the
+C<extent64> function.  The callback cannot call C<nbd_*> APIs on the
+same handle since it holds the handle lock and will
+cause a deadlock.  If the callback returns C<-1>, and no earlier
+error has been detected, then the overall block status command
+will fail with any non-zero value stored into the callback's
+C<error> parameter (with a default of C<EPROTO>); but any further
+contexts will still invoke the callback.
+
+The C<extent64> function is called once per type of metadata available,
+with the C<user_data> passed to this function.  The C<metacontext>
+parameter is a string such as C<\"base:allocation\">.  The C<entries>
+array is an array of B<nbd_extent> structs, containing the length (in bytes)
+of the block and a status/flags field which is specific to the metadata
+context.  (The number of array entries passed to the function is
+C<nr_entries>.)  The NBD protocol document in the section about
+C<NBD_REPLY_TYPE_BLOCK_STATUS_EXT> describes the meaning of this array;
+for contexts known to libnbd, B<E<lt>libnbd.hE<gt>> contains constants
+beginning with C<LIBNBD_STATE_> that may help decipher the values.
+On entry to the callback, the C<error> parameter contains the errno
+value of any previously detected error.
+
+It is possible for the extent function to be called
+more times than you expect (if the server is buggy),
+so always check the C<metacontext> field to ensure you
+are receiving the data you expect.  It is also possible
+that the extent function is not called at all, even for
+metadata contexts that you requested.  This indicates
+either that the server doesn't support the context
+or for some other reason cannot return the data.
+
+The C<flags> parameter may be C<0> for no flags, or may contain
+C<LIBNBD_CMD_FLAG_REQ_ONE> meaning that the server should
+return only one extent per metadata context where that extent
+does not exceed C<count> bytes; however, libnbd does not
+validate that the server obeyed the flag."
+^ strict_call_description;
+    see_also = [Link "block_status";
+                Link "add_meta_context"; Link "can_meta_context";
+                Link "aio_block_status_64"; Link "set_strict_mode"];
+  };
+
   "poll", {
     default_call with
     args = [ Int "timeout" ]; ret = RInt;
@@ -2634,7 +2708,7 @@   "aio_block_status", {
                 OFlags ("flags", cmd_flags, Some ["REQ_ONE"]) ];
     ret = RCookie;
     permitted_states = [ Connected ];
-    shortdesc = "send block status command to the NBD server";
+    shortdesc = "send block status command to the NBD server, with 32-bit callback";
     longdesc = "\
 Send the block status command to the NBD server.

@@ -2642,13 +2716,45 @@   "aio_block_status", {
 Or supply the optional C<completion_callback> which will be invoked
 as described in L<libnbd(3)/Completion callbacks>.

-Other parameters behave as documented in L<nbd_block_status(3)>."
+Other parameters behave as documented in L<nbd_block_status(3)>.
+
+This function is inherently limited to reporting extents no larger
+than 32 bits in size.  If the server replies with a larger extent,
+the length of that extent will be truncated to just below 32 bits
+and any further extents from the server will be ignored.  To get
+the full extent information from a server that supports 64-bit
+extents, you must use L<nbd_aio_block_status_64(3)>.
+"
 ^ strict_call_description;
     see_also = [SectionLink "Issuing asynchronous commands";
+                Link "aio_block_status_64";
                 Link "can_meta_context"; Link "block_status";
                 Link "set_strict_mode"];
   };

+  "aio_block_status_64", {
+    default_call with
+    args = [ UInt64 "count"; UInt64 "offset"; Closure extent64_closure ];
+    optargs = [ OClosure completion_closure;
+                OFlags ("flags", cmd_flags, Some ["REQ_ONE"]) ];
+    ret = RCookie;
+    permitted_states = [ Connected ];
+    shortdesc = "send block status command to the NBD server, with 64-bit callback";
+    longdesc = "\
+Send the block status command to the NBD server.
+
+To check if the command completed, call L<nbd_aio_command_completed(3)>.
+Or supply the optional C<completion_callback> which will be invoked
+as described in L<libnbd(3)/Completion callbacks>.
+
+Other parameters behave as documented in L<nbd_block_status_64(3)>."
+^ strict_call_description;
+    see_also = [SectionLink "Issuing asynchronous commands";
+                Link "aio_block_status";
+                Link "can_meta_context"; Link "block_status_64";
+                Link "set_strict_mode"];
+  };
+
   "aio_get_fd", {
     default_call with
     args = []; ret = RFd;
@@ -3130,6 +3236,10 @@ let first_version =
   "get_private_data", (1, 8);
   "get_uri", (1, 8);

+  (* Added in 1.11.x development cycle, will be stable and supported in 1.12. *)
+  "block_status_64", (1, 12);
+  "aio_block_status_64", (1, 12);
+
   (* These calls are proposed for a future version of libnbd, but
    * have not been added to any released version so far.
   "get_tls_certificates", (1, ??);
diff --git a/generator/OCaml.ml b/generator/OCaml.ml
index eac42668..fd9dfdec 100644
--- a/generator/OCaml.ml
+++ b/generator/OCaml.ml
@@ -554,7 +554,6 @@ let
   pr "}\n";
   pr "\n";
   pr "static int\n";
-  pr "__attribute__((unused)) /* XXX temporary hack */\n";
   pr "%s_wrapper " cbname;
   C.print_cbarg_list ~wrap:true cbargs;
   pr "\n";
diff --git a/generator/Python.ml b/generator/Python.ml
index 4212e2ac..e32270cf 100644
--- a/generator/Python.ml
+++ b/generator/Python.ml
@@ -158,7 +158,6 @@ let
 let print_python_closure_wrapper { cbname; cbargs } =
   pr "/* Wrapper for %s callback. */\n" cbname;
   pr "static int\n";
-  pr "__attribute__((unused)) /* XXX temporary hack */\n";
   pr "%s_wrapper " cbname;
   C.print_cbarg_list ~wrap:true cbargs;
   pr "\n";
diff --git a/lib/rw.c b/lib/rw.c
index f36f4e15..5454adb7 100644
--- a/lib/rw.c
+++ b/lib/rw.c
@@ -194,7 +194,7 @@ nbd_unlocked_zero (struct nbd_handle *h,
   return wait_for_command (h, cookie);
 }

-/* Issue a block status command and wait for the reply. */
+/* Issue a block status command and wait for the reply, 32-bit callback. */
 int
 nbd_unlocked_block_status (struct nbd_handle *h,
                            uint64_t count, uint64_t offset,
@@ -212,6 +212,25 @@ nbd_unlocked_block_status (struct nbd_handle *h,
   return wait_for_command (h, cookie);
 }

+/* Issue a block status command and wait for the reply, 64-bit callback. */
+int
+nbd_unlocked_block_status_64 (struct nbd_handle *h,
+                              uint64_t count, uint64_t offset,
+                              nbd_extent64_callback *extent64,
+                              uint32_t flags)
+{
+  int64_t cookie;
+  nbd_completion_callback c = NBD_NULL_COMPLETION;
+
+  cookie = nbd_unlocked_aio_block_status_64 (h, count, offset, extent64, &c,
+                                             flags);
+  if (cookie == -1)
+    return -1;
+
+  assert (CALLBACK_IS_NULL (*extent64));
+  return wait_for_command (h, cookie);
+}
+
 /* count_err represents the errno to return if bounds check fail */
 int64_t
 nbd_internal_command_common (struct nbd_handle *h,
@@ -514,10 +533,10 @@ nbd_unlocked_aio_block_status (struct nbd_handle *h,
                                uint32_t flags)
 {
   nbd_extent_callback *shim = malloc (sizeof *shim);
-  struct command_cb cb = { .fn.extent.callback = nbd_convert_extent,
-                           .fn.extent.user_data = shim,
-                           .fn.extent.free = nbd_convert_extent_free,
-                           .completion = *completion };
+  nbd_extent64_callback wrapper = { .callback = nbd_convert_extent,
+                                    .user_data = shim,
+                                    .free = nbd_convert_extent_free, };
+  int64_t ret;

   if (shim == NULL) {
     set_error (errno, "malloc");
@@ -526,10 +545,25 @@ nbd_unlocked_aio_block_status (struct nbd_handle *h,
   *shim = *extent;
   SET_CALLBACK_TO_NULL (*extent);

+  ret = nbd_unlocked_aio_block_status_64 (h, count, offset, &wrapper,
+                                          completion, flags);
+  FREE_CALLBACK (wrapper);
+  return ret;
+}
+
+int64_t
+nbd_unlocked_aio_block_status_64 (struct nbd_handle *h,
+                                  uint64_t count, uint64_t offset,
+                                  nbd_extent64_callback *extent64,
+                                  nbd_completion_callback *completion,
+                                  uint32_t flags)
+{
+  struct command_cb cb = { .fn.extent = *extent64,
+                           .completion = *completion };
+
   if (h->strict & LIBNBD_STRICT_COMMANDS) {
     if (!h->structured_replies) {
       set_error (ENOTSUP, "server does not support structured replies");
-      FREE_CALLBACK (cb.fn.extent);
       return -1;
     }

@@ -537,11 +571,11 @@ nbd_unlocked_aio_block_status (struct nbd_handle *h,
       set_error (ENOTSUP, "did not negotiate any metadata contexts, "
                  "either you did not call nbd_add_meta_context before "
                  "connecting or the server does not support it");
-      FREE_CALLBACK (cb.fn.extent);
       return -1;
     }
   }

+  SET_CALLBACK_TO_NULL (*extent64);
   SET_CALLBACK_TO_NULL (*completion);
   return nbd_internal_command_common (h, flags, NBD_CMD_BLOCK_STATUS, offset,
                                       count, EINVAL, NULL, &cb);
diff --git a/python/t/465-block-status-64.py b/python/t/465-block-status-64.py
new file mode 100644
index 00000000..94d7b465
--- /dev/null
+++ b/python/t/465-block-status-64.py
@@ -0,0 +1,56 @@
+# libnbd Python bindings
+# Copyright (C) 2010-2021 Red Hat Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+
+import os
+
+import nbd
+
+script = "%s/../tests/meta-base-allocation.sh" % os.getenv("srcdir", ".")
+
+h = nbd.NBD()
+h.add_meta_context("base:allocation")
+h.connect_command(["nbdkit", "-s", "--exit-with-parent", "-v", "sh", script])
+
+entries = []
+
+
+def f(user_data, metacontext, offset, e, err):
+    global entries
+    assert user_data == 42
+    assert err.value == 0
+    if metacontext != "base:allocation":
+        return
+    entries = e
+
+
+h.block_status_64(65536, 0, lambda *args: f(42, *args))
+print("entries = %r" % entries)
+assert entries == [(8192, 0),
+                   (8192, 1),
+                   (16384, 3),
+                   (16384, 2),
+                   (16384, 0)]
+
+h.block_status_64(1024, 32256, lambda *args: f(42, *args))
+print("entries = %r" % entries)
+assert entries == [(512, 3),
+                   (16384, 2)]
+
+h.block_status_64(1024, 32256, lambda *args: f(42, *args),
+                  nbd.CMD_FLAG_REQ_ONE)
+print("entries = %r" % entries)
+assert entries == [(512, 3)]
diff --git a/ocaml/tests/Makefile.am b/ocaml/tests/Makefile.am
index 6fac8b7c..489b030a 100644
--- a/ocaml/tests/Makefile.am
+++ b/ocaml/tests/Makefile.am
@@ -1,5 +1,5 @@
 # nbd client library in userspace
-# Copyright (C) 2013-2020 Red Hat Inc.
+# Copyright (C) 2013-2021 Red Hat Inc.
 #
 # This library is free software; you can redistribute it and/or
 # modify it under the terms of the GNU Lesser General Public
@@ -35,6 +35,7 @@ EXTRA_DIST = \
 	test_405_pread_structured.ml \
 	test_410_pwrite.ml \
 	test_460_block_status.ml \
+	test_465_block_status_64.ml \
 	test_500_aio_pread.ml \
 	test_505_aio_pread_structured_callback.ml \
 	test_510_aio_pwrite.ml \
@@ -62,6 +63,7 @@ tests_bc = \
 	test_405_pread_structured.bc \
 	test_410_pwrite.bc \
 	test_460_block_status.bc \
+	test_465_block_status_64.bc \
 	test_500_aio_pread.bc \
 	test_505_aio_pread_structured_callback.bc \
 	test_510_aio_pwrite.bc \
@@ -86,6 +88,7 @@ tests_opt = \
 	test_405_pread_structured.opt \
 	test_410_pwrite.opt \
 	test_460_block_status.opt \
+	test_465_block_status_64.opt \
 	test_500_aio_pread.opt \
 	test_505_aio_pread_structured_callback.opt \
 	test_510_aio_pwrite.opt \
diff --git a/ocaml/tests/test_465_block_status_64.ml b/ocaml/tests/test_465_block_status_64.ml
new file mode 100644
index 00000000..a27a8ad4
--- /dev/null
+++ b/ocaml/tests/test_465_block_status_64.ml
@@ -0,0 +1,58 @@
+(* hey emacs, this is OCaml code: -*- tuareg -*- *)
+(* libnbd OCaml test case
+ * Copyright (C) 2013-2021 Red Hat Inc.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ *)
+
+open Printf
+
+let script =
+  try
+    let srcdir = Sys.getenv "srcdir" in
+    sprintf "%s/../../tests/meta-base-allocation.sh" srcdir
+  with
+    Not_found -> failwith "error: srcdir is not defined"
+
+let entries = ref [||]
+let f user_data metacontext offset e err =
+  assert (user_data = 42);
+  assert (!err = 0);
+  if metacontext = "base:allocation" then
+    entries := e;
+  0
+
+let () =
+  let nbd = NBD.create () in
+  NBD.add_meta_context nbd "base:allocation";
+  NBD.connect_command nbd ["nbdkit"; "-s"; "--exit-with-parent"; "-v";
+                           "sh"; script];
+
+  NBD.block_status_64 nbd 65536_L 0_L (f 42);
+  assert (!entries = [|  8192_L, 0_l;
+                         8192_L, 1_l;
+                        16384_L, 3_l;
+                        16384_L, 2_l;
+                        16384_L, 0_l |]);
+
+  NBD.block_status_64 nbd 1024_L 32256_L (f 42);
+  assert (!entries = [|   512_L, 3_l;
+                        16384_L, 2_l |]);
+
+  let flags = let open NBD.CMD_FLAG in [REQ_ONE] in
+  NBD.block_status_64 nbd 1024_L 32256_L (f 42) ~flags;
+  assert (!entries = [|   512_L, 3_l |])
+
+let () = Gc.compact ()
diff --git a/tests/meta-base-allocation.c b/tests/meta-base-allocation.c
index 401c0c88..a2847d6e 100644
--- a/tests/meta-base-allocation.c
+++ b/tests/meta-base-allocation.c
@@ -1,5 +1,5 @@
 /* NBD client library in userspace
- * Copyright (C) 2013-2020 Red Hat Inc.
+ * Copyright (C) 2013-2021 Red Hat Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public
@@ -32,10 +32,13 @@

 #define BOGUS_CONTEXT "x-libnbd:nosuch"

-static int check_extent (void *data,
-                         const char *metacontext,
-                         uint64_t offset,
-                         uint32_t *entries, size_t nr_entries, int *error);
+static int check_extent32 (void *data, const char *metacontext,
+                           uint64_t offset,
+                           uint32_t *entries, size_t nr_entries, int *error);
+
+static int check_extent64 (void *data, const char *metacontext,
+                           uint64_t offset,
+                           nbd_extent *entries, size_t nr_entries, int *error);

 int
 main (int argc, char *argv[])
@@ -149,27 +152,51 @@ main (int argc, char *argv[])
   /* Read the block status. */
   id = 1;
   if (nbd_block_status (nbd, 65536, 0,
-                        (nbd_extent_callback) { .callback = check_extent, .user_data = &id },
+                        (nbd_extent_callback) { .callback = check_extent32,
+                                                .user_data = &id },
                         0) == -1) {
     fprintf (stderr, "%s\n", nbd_get_error ());
     exit (EXIT_FAILURE);
   }
+  if (nbd_block_status_64 (nbd, 65536, 0,
+                           (nbd_extent64_callback) { .callback = check_extent64,
+                                                     .user_data = &id },
+                           0) == -1) {
+    fprintf (stderr, "%s\n", nbd_get_error ());
+    exit (EXIT_FAILURE);
+  }

   id = 2;
   if (nbd_block_status (nbd, 1024, 32768-512,
-                        (nbd_extent_callback) { .callback = check_extent, .user_data = &id },
+                        (nbd_extent_callback) { .callback = check_extent32,
+                                                .user_data = &id },
                         0) == -1) {
     fprintf (stderr, "%s\n", nbd_get_error ());
     exit (EXIT_FAILURE);
   }
+  if (nbd_block_status_64 (nbd, 1024, 32768-512,
+                           (nbd_extent64_callback) { .callback = check_extent64,
+                                                     .user_data = &id },
+                           0) == -1) {
+    fprintf (stderr, "%s\n", nbd_get_error ());
+    exit (EXIT_FAILURE);
+  }

   id = 3;
   if (nbd_block_status (nbd, 1024, 32768-512,
-                        (nbd_extent_callback) { .callback = check_extent, .user_data = &id },
+                        (nbd_extent_callback) { .callback = check_extent32,
+                                                .user_data = &id },
                         LIBNBD_CMD_FLAG_REQ_ONE) == -1) {
     fprintf (stderr, "%s\n", nbd_get_error ());
     exit (EXIT_FAILURE);
   }
+  if (nbd_block_status_64 (nbd, 1024, 32768-512,
+                           (nbd_extent64_callback) { .callback = check_extent64,
+                                                     .user_data = &id },
+                           LIBNBD_CMD_FLAG_REQ_ONE) == -1) {
+    fprintf (stderr, "%s\n", nbd_get_error ());
+    exit (EXIT_FAILURE);
+  }

   if (nbd_shutdown (nbd, 0) == -1) {
     fprintf (stderr, "%s\n", nbd_get_error ());
@@ -181,10 +208,8 @@ main (int argc, char *argv[])
 }

 static int
-check_extent (void *data,
-              const char *metacontext,
-              uint64_t offset,
-              uint32_t *entries, size_t nr_entries, int *error)
+check_extent32 (void *data, const char *metacontext, uint64_t offset,
+                uint32_t *entries, size_t nr_entries, int *error)
 {
   size_t i;
   int id;
@@ -238,3 +263,65 @@ check_extent (void *data,

   return 0;
 }
+
+static int
+check_extent64 (void *data, const char *metacontext, uint64_t offset,
+                nbd_extent *entries, size_t nr_entries, int *error)
+{
+  size_t i;
+  int id;
+
+  id = * (int *)data;
+
+  printf ("extent: id=%d, metacontext=%s, offset=%" PRIu64 ", "
+          "nr_entries=%zu, error=%d\n",
+          id, metacontext, offset, nr_entries, *error);
+
+  assert (*error == 0);
+  if (strcmp (metacontext, LIBNBD_CONTEXT_BASE_ALLOCATION) == 0) {
+    for (i = 0; i < nr_entries; i++) {
+      printf ("\t%zu\tlength=%" PRIu64 ", status=%" PRIu32 "\n",
+              i, entries[i].length, entries[i].flags);
+    }
+    fflush (stdout);
+
+    switch (id) {
+    case 1:
+      assert (nr_entries == 5);
+      assert (entries[0].length == 8192);
+      assert (entries[0].flags == 0);
+      assert (entries[1].length == 8192);
+      assert (entries[1].flags == LIBNBD_STATE_HOLE);
+      assert (entries[2].length == 16384);
+      assert (entries[2].flags == (LIBNBD_STATE_HOLE|LIBNBD_STATE_ZERO));
+      assert (entries[3].length == 16384);
+      assert (entries[3].flags == LIBNBD_STATE_ZERO);
+      assert (entries[4].length == 16384);
+      assert (entries[4].flags == 0);
+      break;
+
+    case 2:
+      assert (nr_entries == 2);
+      assert (entries[0].length == 512);
+      assert (entries[0].flags == (LIBNBD_STATE_HOLE|LIBNBD_STATE_ZERO));
+      assert (entries[1].length == 16384);
+      assert (entries[1].flags == LIBNBD_STATE_ZERO);
+      break;
+
+    case 3:
+      assert (nr_entries == 1);
+      assert (entries[0].length == 512);
+      assert (entries[0].flags == (LIBNBD_STATE_HOLE|LIBNBD_STATE_ZERO));
+      break;
+
+    default:
+      abort ();
+    }
+
+  }
+  else
+    fprintf (stderr, "warning: ignored unexpected meta context %s\n",
+             metacontext);
+
+  return 0;
+}
diff --git a/golang/Makefile.am b/golang/Makefile.am
index 10fb8934..e861f5fa 100644
--- a/golang/Makefile.am
+++ b/golang/Makefile.am
@@ -1,5 +1,5 @@
 # nbd client library in userspace
-# Copyright (C) 2013-2020 Red Hat Inc.
+# Copyright (C) 2013-2021 Red Hat Inc.
 #
 # This library is free software; you can redistribute it and/or
 # modify it under the terms of the GNU Lesser General Public
@@ -39,6 +39,7 @@ source_files = \
 	libnbd_405_pread_structured_test.go \
 	libnbd_410_pwrite_test.go \
 	libnbd_460_block_status_test.go \
+	libnbd_465_block_status_64_test.go \
 	libnbd_500_aio_pread_test.go \
 	libnbd_510_aio_pwrite_test.go \
 	libnbd_590_aio_copy_test.go \
diff --git a/golang/libnbd_465_block_status_64_test.go b/golang/libnbd_465_block_status_64_test.go
new file mode 100644
index 00000000..40635875
--- /dev/null
+++ b/golang/libnbd_465_block_status_64_test.go
@@ -0,0 +1,119 @@
+/* libnbd golang tests
+ * Copyright (C) 2013-2021 Red Hat Inc.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+package libnbd
+
+import (
+	"fmt"
+	"os"
+	"strings"
+	"testing"
+)
+
+var entries64 []LibnbdExtent
+
+func mcf64(metacontext string, offset uint64, e []LibnbdExtent, error *int) int {
+	if *error != 0 {
+		panic("expected *error == 0")
+	}
+	if metacontext == "base:allocation" {
+		entries64 = e
+	}
+	return 0
+}
+
+// Go slices cannot be compared with ==, so compare element by element.
+func mc64_compare(a1 []LibnbdExtent, a2 []LibnbdExtent) bool {
+	if len(a1) != len(a2) {
+		return false
+	}
+	for i := 0; i < len(a1); i++ {
+		if a1[i] != a2[i] {
+			return false
+		}
+	}
+	return true
+}
+
+func mc64_to_string(a []LibnbdExtent) string {
+	ss := make([]string, len(a))
+	for i := 0; i < len(a); i++ {
+		ss[i] = fmt.Sprintf("%#v", a[i])
+	}
+	return strings.Join(ss, ", ")
+}
+
+func Test465BlockStatus64(t *testing.T) {
+	srcdir := os.Getenv("abs_top_srcdir")
+	script := srcdir + "/tests/meta-base-allocation.sh"
+
+	h, err := Create()
+	if err != nil {
+		t.Fatalf("could not create handle: %s", err)
+	}
+	defer h.Close()
+
+	err = h.AddMetaContext("base:allocation")
+	if err != nil {
+		t.Fatalf("%s", err)
+	}
+	err = h.ConnectCommand([]string{
+		"nbdkit", "-s", "--exit-with-parent", "-v",
+		"sh", script,
+	})
+	if err != nil {
+		t.Fatalf("%s", err)
+	}
+
+	err = h.BlockStatus64(65536, 0, mcf64, nil)
+	if err != nil {
+		t.Fatalf("%s", err)
+	}
+	if !mc64_compare(entries64, []LibnbdExtent{
+		{8192, 0},
+		{8192, 1},
+		{16384, 3},
+		{16384, 2},
+		{16384, 0},
+	}) {
+		t.Fatalf("unexpected entries (1): %s", mc64_to_string(entries64))
+	}
+
+	err = h.BlockStatus64(1024, 32256, mcf64, nil)
+	if err != nil {
+		t.Fatalf("%s", err)
+	}
+	if !mc64_compare(entries64, []LibnbdExtent{
+		{512, 3},
+		{16384, 2},
+	}) {
+		t.Fatalf("unexpected entries (2): %s", mc64_to_string(entries64))
+	}
+
+	var optargs BlockStatus64Optargs
+	optargs.FlagsSet = true
+	optargs.Flags = CMD_FLAG_REQ_ONE
+	err = h.BlockStatus64(1024, 32256, mcf64, &optargs)
+	if err != nil {
+		t.Fatalf("%s", err)
+	}
+	if !mc64_compare(entries64, []LibnbdExtent{{512, 3}}) {
+		t.Fatalf("unexpected entries (3): %s", mc64_to_string(entries64))
+	}
+
+}
-- 
2.33.1
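
A note on the 32-bit compatibility path in lib/rw.c above: the old
nbd_unlocked_aio_block_status now wraps the caller's 32-bit callback
in a 64-bit one (nbd_convert_extent) and forwards to the _64 variant.
The body of nbd_convert_extent is not part of this excerpt, so the
following Python sketch is only a guess at the kind of narrowing such
a shim has to perform. The name narrow_extents and the policy of
stopping at the first extent that does not fit in 32 bits are
assumptions made for illustration; they are not taken from the patch.
The sketch assumes the 32-bit callback sees a flat array of
alternating length and flags values, while the 64-bit callback sees
(length, flags) pairs as in the tests above.

```python
# Illustrative sketch only: narrow_extents is a hypothetical helper,
# and the "stop at the first oversized extent" policy is an assumption.

UINT32_MAX = 0xFFFFFFFF


def narrow_extents(extents64):
    """Flatten (length, flags) pairs into [len, flags, len, flags, ...].

    Stops before the first extent whose length does not fit in 32 bits,
    so the caller only ever sees values that the old API can represent.
    """
    flat = []
    for length, flags in extents64:
        if length > UINT32_MAX:
            break
        flat.extend([length, flags])
    return flat


if __name__ == "__main__":
    print(narrow_extents([(8192, 0), (8192, 1), (16384, 3)]))
```

A shim written this way reports a shorter extent list, never a wrong
one, when a server returns an extent of 4GiB or more; other policies
(for example clamping the length) are equally conceivable.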




* [libnbd PATCH 11/13] api: Add three functions for controlling extended headers
  2021-12-03 23:17 ` [libnbd PATCH 00/13] libnbd patches for NBD_OPT_EXTENDED_HEADERS Eric Blake
                     ` (9 preceding siblings ...)
  2021-12-03 23:17   ` [libnbd PATCH 10/13] api: Add [aio_]nbd_block_status_64 Eric Blake
@ 2021-12-03 23:17   ` Eric Blake
  2021-12-03 23:17   ` [libnbd PATCH 12/13] generator: Actually request " Eric Blake
                     ` (2 subsequent siblings)
  13 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2021-12-03 23:17 UTC (permalink / raw)
  To: libguestfs; +Cc: nsoffer, vsementsov, qemu-devel, qemu-block, nbd

The new NBD_OPT_EXTENDED_HEADERS feature is worth using by default,
but there may be cases where the user explicitly wants to stick with
the older 32-bit headers.  nbd_set_request_extended_headers() will let
the client override the default, nbd_get_request_extended_headers()
determines the current state of the request, and
nbd_get_extended_headers_negotiated() determines what the client and
server actually settled on.  These use
nbd_set_request_structured_replies() and friends as a template.

Note that this patch just adds the API but ignores the state variable;
the next one will then tweak the state machine to actually request
extended headers when the state variable is set.
---
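
To illustrate the difference between the request flag and the
negotiated state described above, here is a minimal Python sketch.
It deliberately does not use libnbd: the Handle class, negotiate()
and length_field_max() are hypothetical stand-ins invented for this
note, and the idea that negotiation succeeds exactly when the client
asks and the server agrees is a simplifying assumption.

```python
# Toy model only: Handle, negotiate() and length_field_max() are
# hypothetical names for illustration, not part of libnbd.

UINT32_MAX = 0xFFFFFFFF
UINT64_MAX = 0xFFFFFFFFFFFFFFFF


class Handle:
    def __init__(self):
        self.request_eh = True         # default: ask for extended headers
        self.extended_headers = False  # nothing negotiated yet

    def set_request_extended_headers(self, request):
        self.request_eh = bool(request)

    def get_request_extended_headers(self):
        return self.request_eh

    def get_extended_headers_negotiated(self):
        return self.extended_headers

    def negotiate(self, server_supports_eh):
        # Assumed behaviour: extended headers are in use only if the
        # client asked for them and the server agreed.
        self.extended_headers = self.request_eh and server_supports_eh

    def length_field_max(self):
        # Largest length the request header can carry; a server may
        # still enforce smaller limits of its own.
        return UINT64_MAX if self.extended_headers else UINT32_MAX
```

In this model, clearing the request flag before connecting leaves the
negotiated state false even against a capable server, which is the
integration-testing use case the documentation mentions.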
 lib/internal.h                             |  1 +
 generator/API.ml                           | 89 ++++++++++++++++++++--
 lib/handle.c                               | 23 ++++++
 python/t/110-defaults.py                   |  3 +-
 python/t/120-set-non-defaults.py           |  4 +-
 ocaml/tests/test_110_defaults.ml           |  4 +-
 ocaml/tests/test_120_set_non_defaults.ml   |  5 +-
 golang/libnbd_110_defaults_test.go         |  8 ++
 golang/libnbd_120_set_non_defaults_test.go | 12 +++
 9 files changed, 137 insertions(+), 12 deletions(-)

diff --git a/lib/internal.h b/lib/internal.h
index 97abf4f2..a579e413 100644
--- a/lib/internal.h
+++ b/lib/internal.h
@@ -107,6 +107,7 @@ struct nbd_handle {
   char *tls_psk_file;           /* PSK filename, NULL = no PSK */

   /* Extended headers. */
+  bool request_eh;              /* Whether to request extended headers */
   bool extended_headers;        /* If we negotiated NBD_OPT_EXTENDED_HEADERS */

   /* Desired metadata contexts. */
diff --git a/generator/API.ml b/generator/API.ml
index 1a452a24..e45f0c86 100644
--- a/generator/API.ml
+++ b/generator/API.ml
@@ -675,6 +675,63 @@   "get_tls_psk_file", {
   };
 *)

+  "set_request_extended_headers", {
+    default_call with
+    args = [Bool "request"]; ret = RErr;
+    permitted_states = [ Created ];
+    shortdesc = "control use of extended headers";
+    longdesc = "\
+By default, libnbd tries to negotiate extended headers with the
+server, as this protocol extension permits the use of 64-bit
+zero, trim, and block status actions.  However,
+for integration testing, it can be useful to clear this flag
+rather than find a way to alter the server to fail the negotiation
+request.";
+    see_also = [Link "get_request_extended_headers";
+                Link "set_handshake_flags"; Link "set_strict_mode";
+                Link "get_extended_headers_negotiated";
+                Link "zero"; Link "trim"; Link "cache";
+                Link "block_status_64";
+                Link "set_request_structured_replies"];
+  };
+
+  "get_request_extended_headers", {
+    default_call with
+    args = []; ret = RBool;
+    may_set_error = false;
+    shortdesc = "see if extended headers are attempted";
+    longdesc = "\
+Return the state of the request extended headers flag on this
+handle.
+
+B<Note:> If you want to find out if extended headers were actually
+negotiated on a particular connection use
+L<nbd_get_extended_headers_negotiated(3)> instead.";
+    see_also = [Link "set_request_extended_headers";
+                Link "get_extended_headers_negotiated";
+                Link "get_request_structured_replies"];
+  };
+
+  "get_extended_headers_negotiated", {
+    default_call with
+    args = []; ret = RBool;
+    permitted_states = [ Negotiating; Connected; Closed ];
+    shortdesc = "see if extended headers are in use";
+    longdesc = "\
+After connecting you may call this to find out if the connection is
+using extended headers.  When extended headers are not in use, commands
+are limited to a 32-bit length, even when the libnbd API uses a 64-bit
+variable to express the length.  But even when extended headers are
+supported, the server may enforce other limits, visible through
+L<nbd_get_block_size(3)>.";
+    see_also = [Link "set_request_extended_headers";
+                Link "get_request_extended_headers";
+                Link "zero"; Link "trim"; Link "cache";
+                Link "block_status_64"; Link "get_block_size";
+                Link "get_protocol";
+                Link "get_structured_replies_negotiated"];
+  };
+
   "set_request_structured_replies", {
     default_call with
     args = [Bool "request"]; ret = RErr;
@@ -690,7 +747,8 @@   "set_request_structured_replies", {
     see_also = [Link "get_request_structured_replies";
                 Link "set_handshake_flags"; Link "set_strict_mode";
                 Link "get_structured_replies_negotiated";
-                Link "can_meta_context"; Link "can_df"];
+                Link "can_meta_context"; Link "can_df";
+                Link "set_request_extended_headers"];
   };

   "get_request_structured_replies", {
@@ -706,7 +764,8 @@   "get_request_structured_replies", {
 negotiated on a particular connection use
 L<nbd_get_structured_replies_negotiated(3)> instead.";
     see_also = [Link "set_request_structured_replies";
-                Link "get_structured_replies_negotiated"];
+                Link "get_structured_replies_negotiated";
+                Link "get_request_extended_headers"];
   };

   "get_structured_replies_negotiated", {
@@ -719,7 +778,8 @@   "get_structured_replies_negotiated", {
 using structured replies.";
     see_also = [Link "set_request_structured_replies";
                 Link "get_request_structured_replies";
-                Link "get_protocol"];
+                Link "get_protocol";
+                Link "get_extended_headers_negotiated"];
   };

   "set_handshake_flags", {
@@ -2035,7 +2095,9 @@   "trim", {
 or there is an error.  Note this will generally return an error
 if L<nbd_can_trim(3)> is false or L<nbd_is_read_only(3)> is true.

-Note that not all servers can support a C<count> of 4GiB or larger.
+Note that not all servers can support a C<count> of 4GiB or larger;
+L<nbd_get_extended_headers_negotiated(3)> indicates which servers
+will parse a request larger than 32 bits.
 The NBD protocol does not yet have a way for a client to learn if
 the server will enforce an even smaller maximum trim size, although
 a future extension may add a constraint visible in
@@ -2066,7 +2128,9 @@   "cache", {
 this command.  Note this will generally return an error if
 L<nbd_can_cache(3)> is false.

-Note that not all servers can support a C<count> of 4GiB or larger.
+Note that not all servers can support a C<count> of 4GiB or larger;
+L<nbd_get_extended_headers_negotiated(3)> indicates which servers
+will parse a request larger than 32 bits.
 The NBD protocol does not yet have a way for a client to learn if
 the server will enforce an even smaller maximum cache size, although
 a future extension may add a constraint visible in
@@ -2095,7 +2159,9 @@   "zero", {
 or there is an error.  Note this will generally return an error if
 L<nbd_can_zero(3)> is false or L<nbd_is_read_only(3)> is true.

-Note that not all servers can support a C<count> of 4GiB or larger.
+Note that not all servers can support a C<count> of 4GiB or larger;
+L<nbd_get_extended_headers_negotiated(3)> indicates which servers
+will parse a request larger than 32 bits.
 The NBD protocol does not yet have a way for a client to learn if
 the server will enforce an even smaller maximum zero size, although
 a future extension may add a constraint visible in
@@ -2135,7 +2201,9 @@   "block_status", {
 are supported, the number of blocks and cumulative length
 of those blocks need not be identical between contexts.

-Note that not all servers can support a C<count> of 4GiB or larger.
+Note that not all servers can support a C<count> of 4GiB or larger;
+L<nbd_get_extended_headers_negotiated(3)> indicates which servers
+will parse a request larger than 32 bits.
 The NBD protocol does not yet have a way for a client to learn if
 the server will enforce an even smaller maximum block status size,
 although a future extension may add a constraint visible in
@@ -2209,7 +2277,9 @@   "block_status_64", {
 are supported, the number of blocks and cumulative length
 of those blocks need not be identical between contexts.

-Note that not all servers can support a C<count> of 4GiB or larger.
+Note that not all servers can support a C<count> of 4GiB or larger;
+L<nbd_get_extended_headers_negotiated(3)> indicates which servers
+will parse a request larger than 32 bits.
 The NBD protocol does not yet have a way for a client to learn if
 the server will enforce an even smaller maximum block status size,
 although a future extension may add a constraint visible in
@@ -3239,6 +3309,9 @@ let first_version =
   (* Added in 1.11.x development cycle, will be stable and supported in 1.12. *)
   "block_status_64", (1, 12);
   "aio_block_status_64", (1, 12);
+  "set_request_extended_headers", (1, 12);
+  "get_request_extended_headers", (1, 12);
+  "get_extended_headers_negotiated", (1, 12);

   (* These calls are proposed for a future version of libnbd, but
    * have not been added to any released version so far.
diff --git a/lib/handle.c b/lib/handle.c
index 74fe87ec..9b96c7f7 100644
--- a/lib/handle.c
+++ b/lib/handle.c
@@ -63,6 +63,7 @@ nbd_create (void)

   h->unique = 1;
   h->tls_verify_peer = true;
+  h->request_eh = true;
   h->request_sr = true;

   h->uri_allow_transports = LIBNBD_ALLOW_TRANSPORT_MASK;
@@ -356,6 +357,28 @@ nbd_unlocked_clear_meta_contexts (struct nbd_handle *h)
   return 0;
 }

+
+int
+nbd_unlocked_set_request_extended_headers (struct nbd_handle *h,
+                                           bool request)
+{
+  h->request_eh = request;
+  return 0;
+}
+
+/* NB: may_set_error = false. */
+int
+nbd_unlocked_get_request_extended_headers (struct nbd_handle *h)
+{
+  return h->request_eh;
+}
+
+int
+nbd_unlocked_get_extended_headers_negotiated (struct nbd_handle *h)
+{
+  return h->extended_headers;
+}
+
 int
 nbd_unlocked_set_request_structured_replies (struct nbd_handle *h,
                                              bool request)
diff --git a/python/t/110-defaults.py b/python/t/110-defaults.py
index fb961cfd..ecd4dfda 100644
--- a/python/t/110-defaults.py
+++ b/python/t/110-defaults.py
@@ -1,5 +1,5 @@
 # libnbd Python bindings
-# Copyright (C) 2010-2020 Red Hat Inc.
+# Copyright (C) 2010-2021 Red Hat Inc.
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -21,6 +21,7 @@ h = nbd.NBD()
 assert h.get_export_name() == ""
 assert h.get_full_info() is False
 assert h.get_tls() == nbd.TLS_DISABLE
+assert h.get_request_extended_headers() is True
 assert h.get_request_structured_replies() is True
 assert h.get_handshake_flags() == nbd.HANDSHAKE_FLAG_MASK
 assert h.get_opt_mode() is False
diff --git a/python/t/120-set-non-defaults.py b/python/t/120-set-non-defaults.py
index 3da0c23e..b34fb508 100644
--- a/python/t/120-set-non-defaults.py
+++ b/python/t/120-set-non-defaults.py
@@ -1,5 +1,5 @@
 # libnbd Python bindings
-# Copyright (C) 2010-2020 Red Hat Inc.
+# Copyright (C) 2010-2021 Red Hat Inc.
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -31,6 +31,8 @@ assert h.get_tls() == nbd.TLS_DISABLE
 if h.supports_tls():
     h.set_tls(nbd.TLS_ALLOW)
     assert h.get_tls() == nbd.TLS_ALLOW
+h.set_request_extended_headers(False)
+assert h.get_request_extended_headers() is False
 h.set_request_structured_replies(False)
 assert h.get_request_structured_replies() is False
 try:
diff --git a/ocaml/tests/test_110_defaults.ml b/ocaml/tests/test_110_defaults.ml
index f5886fca..5fe448b6 100644
--- a/ocaml/tests/test_110_defaults.ml
+++ b/ocaml/tests/test_110_defaults.ml
@@ -1,6 +1,6 @@
 (* hey emacs, this is OCaml code: -*- tuareg -*- *)
 (* libnbd OCaml test case
- * Copyright (C) 2013-2020 Red Hat Inc.
+ * Copyright (C) 2013-2021 Red Hat Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public
@@ -25,6 +25,8 @@ let
   assert (info = false);
   let tls = NBD.get_tls nbd in
   assert (tls = NBD.TLS.DISABLE);
+  let eh = NBD.get_request_extended_headers nbd in
+  assert (eh = true);
   let sr = NBD.get_request_structured_replies nbd in
   assert (sr = true);
   let flags = NBD.get_handshake_flags nbd in
diff --git a/ocaml/tests/test_120_set_non_defaults.ml b/ocaml/tests/test_120_set_non_defaults.ml
index b660e5d5..47914d9c 100644
--- a/ocaml/tests/test_120_set_non_defaults.ml
+++ b/ocaml/tests/test_120_set_non_defaults.ml
@@ -1,6 +1,6 @@
 (* hey emacs, this is OCaml code: -*- tuareg -*- *)
 (* libnbd OCaml test case
- * Copyright (C) 2013-2020 Red Hat Inc.
+ * Copyright (C) 2013-2021 Red Hat Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public
@@ -37,6 +37,9 @@ let
     let tls = NBD.get_tls nbd in
     assert (tls = NBD.TLS.ALLOW);
   );
+  NBD.set_request_extended_headers nbd false;
+  let eh = NBD.get_request_extended_headers nbd in
+  assert (eh = false);
   NBD.set_request_structured_replies nbd false;
   let sr = NBD.get_request_structured_replies nbd in
   assert (sr = false);
diff --git a/golang/libnbd_110_defaults_test.go b/golang/libnbd_110_defaults_test.go
index b3ceb45d..659ea18c 100644
--- a/golang/libnbd_110_defaults_test.go
+++ b/golang/libnbd_110_defaults_test.go
@@ -51,6 +51,14 @@ func Test110Defaults(t *testing.T) {
 		t.Fatalf("unexpected tls state")
 	}

+	eh, err := h.GetRequestExtendedHeaders()
+	if err != nil {
+		t.Fatalf("could not get extended headers state: %s", err)
+	}
+	if eh != true {
+		t.Fatalf("unexpected extended headers state")
+	}
+
 	sr, err := h.GetRequestStructuredReplies()
 	if err != nil {
 		t.Fatalf("could not get structured replies state: %s", err)
diff --git a/golang/libnbd_120_set_non_defaults_test.go b/golang/libnbd_120_set_non_defaults_test.go
index f112456c..d27ec5dc 100644
--- a/golang/libnbd_120_set_non_defaults_test.go
+++ b/golang/libnbd_120_set_non_defaults_test.go
@@ -81,6 +81,18 @@ func Test120SetNonDefaults(t *testing.T) {
 		}
 	}

+	err = h.SetRequestExtendedHeaders(false)
+	if err != nil {
+		t.Fatalf("could not set extended headers state: %s", err)
+	}
+	eh, err := h.GetRequestExtendedHeaders()
+	if err != nil {
+		t.Fatalf("could not get extended headers state: %s", err)
+	}
+	if eh != false {
+		t.Fatalf("unexpected extended headers state")
+	}
+
 	err = h.SetRequestStructuredReplies(false)
 	if err != nil {
 		t.Fatalf("could not set structured replies state: %s", err)
-- 
2.33.1



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [libnbd PATCH 12/13] generator: Actually request extended headers
  2021-12-03 23:17 ` [libnbd PATCH 00/13] libnbd patches for NBD_OPT_EXTENDED_HEADERS Eric Blake
                     ` (10 preceding siblings ...)
  2021-12-03 23:17   ` [libnbd PATCH 11/13] api: Add three functions for controlling extended headers Eric Blake
@ 2021-12-03 23:17   ` Eric Blake
  2021-12-03 23:17   ` [libnbd PATCH 13/13] interop: Add test of 64-bit block status Eric Blake
  2021-12-10  8:16   ` [Libguestfs] [libnbd PATCH 00/13] libnbd patches for NBD_OPT_EXTENDED_HEADERS Laszlo Ersek
  13 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2021-12-03 23:17 UTC (permalink / raw)
  To: libguestfs; +Cc: nsoffer, vsementsov, qemu-devel, qemu-block, nbd

This is the culmination of the previous patches' preparation work for
using extended headers when possible.  The new states in the state
machine are copied extensively from our handling of
OPT_STRUCTURED_REPLY.

At the same time I posted this patch, I had patches for qemu-nbd to
support extended headers as server (nbdkit is a bit tougher).  The
interop tests still pass when using a new enough qemu-nbd, showing
that we have cross-project interoperability and therefore an extension
worth standardizing.
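
As a rough illustration of what the new SEND and CHECK_REPLY states put on
and take off the wire, here is a minimal Python sketch. It assumes the
option number 11 proposed in the spec patch of this thread; the helper
names build_option_request and extended_headers_accepted are hypothetical
and not part of libnbd, and any reply payload is ignored for brevity.

```python
import struct

NBD_NEW_VERSION = 0x49484156454F5054   # ASCII "IHAVEOPT"
NBD_REP_MAGIC = 0x0003E889045565A9     # magic of every option reply
NBD_OPT_EXTENDED_HEADERS = 11          # value proposed in this thread
NBD_REP_ACK = 1


def build_option_request(option, payload=b""):
    """Pack a fixed-newstyle option request: magic, option, length, data."""
    header = struct.pack(">QII", NBD_NEW_VERSION, option, len(payload))
    return header + payload


def extended_headers_accepted(reply_header):
    """Return True only if the 20-byte option reply header is an ACK.

    Hypothetical helper: any other reply type is treated as "server does
    not support extended headers", mirroring the default branch above.
    """
    magic, option, reply, _length = struct.unpack(">QIII", reply_header)
    if magic != NBD_REP_MAGIC or option != NBD_OPT_EXTENDED_HEADERS:
        raise ValueError("unexpected option reply")
    return reply == NBD_REP_ACK


request = build_option_request(NBD_OPT_EXTENDED_HEADERS)
ack = struct.pack(">QIII", NBD_REP_MAGIC, NBD_OPT_EXTENDED_HEADERS,
                  NBD_REP_ACK, 0)
unsup = struct.pack(">QIII", NBD_REP_MAGIC, NBD_OPT_EXTENDED_HEADERS,
                    0x80000001, 0)  # NBD_REP_ERR_UNSUP
```

Either outcome lets negotiation continue with the next option; only the
header style used later in the transmission phase differs.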
---
 generator/Makefile.am                         |  3 +-
 generator/state_machine.ml                    | 41 +++++++++
 .../states-newstyle-opt-extended-headers.c    | 90 +++++++++++++++++++
 generator/states-newstyle-opt-starttls.c      | 10 +--
 4 files changed, 138 insertions(+), 6 deletions(-)
 create mode 100644 generator/states-newstyle-opt-extended-headers.c

diff --git a/generator/Makefile.am b/generator/Makefile.am
index 594d23cf..c889eb7f 100644
--- a/generator/Makefile.am
+++ b/generator/Makefile.am
@@ -1,5 +1,5 @@
 # nbd client library in userspace
-# Copyright (C) 2013-2020 Red Hat Inc.
+# Copyright (C) 2013-2021 Red Hat Inc.
 #
 # This library is free software; you can redistribute it and/or
 # modify it under the terms of the GNU Lesser General Public
@@ -30,6 +30,7 @@ states_code = \
 	states-issue-command.c \
 	states-magic.c \
 	states-newstyle-opt-export-name.c \
+	states-newstyle-opt-extended-headers.c \
 	states-newstyle-opt-list.c \
 	states-newstyle-opt-go.c \
 	states-newstyle-opt-meta-context.c \
diff --git a/generator/state_machine.ml b/generator/state_machine.ml
index 99652948..ad8eba5e 100644
--- a/generator/state_machine.ml
+++ b/generator/state_machine.ml
@@ -295,6 +295,7 @@ and
    * NEGOTIATING after OPT_STRUCTURED_REPLY or any failed OPT_GO.
    *)
   Group ("OPT_STARTTLS", newstyle_opt_starttls_state_machine);
+  Group ("OPT_EXTENDED_HEADERS", newstyle_opt_extended_headers_state_machine);
   Group ("OPT_STRUCTURED_REPLY", newstyle_opt_structured_reply_state_machine);
   Group ("OPT_META_CONTEXT", newstyle_opt_meta_context_state_machine);
   Group ("OPT_GO", newstyle_opt_go_state_machine);
@@ -432,6 +433,46 @@ and
   };
 ]

+(* Fixed newstyle NBD_OPT_EXTENDED_HEADERS option.
+ * Implementation: generator/states-newstyle-opt-extended-headers.c
+ *)
+and newstyle_opt_extended_headers_state_machine = [
+  State {
+    default_state with
+    name = "START";
+    comment = "Try to negotiate newstyle NBD_OPT_EXTENDED_HEADERS";
+    external_events = [];
+  };
+
+  State {
+    default_state with
+    name = "SEND";
+    comment = "Send newstyle NBD_OPT_EXTENDED_HEADERS negotiation request";
+    external_events = [ NotifyWrite, "" ];
+  };
+
+  State {
+    default_state with
+    name = "RECV_REPLY";
+    comment = "Receive newstyle NBD_OPT_EXTENDED_HEADERS option reply";
+    external_events = [ NotifyRead, "" ];
+  };
+
+  State {
+    default_state with
+    name = "RECV_REPLY_PAYLOAD";
+    comment = "Receive any newstyle NBD_OPT_EXTENDED_HEADERS reply payload";
+    external_events = [ NotifyRead, "" ];
+  };
+
+  State {
+    default_state with
+    name = "CHECK_REPLY";
+    comment = "Check newstyle NBD_OPT_EXTENDED_HEADERS option reply";
+    external_events = [];
+  };
+]
+
 (* Fixed newstyle NBD_OPT_STRUCTURED_REPLY option.
  * Implementation: generator/states-newstyle-opt-structured-reply.c
  *)
diff --git a/generator/states-newstyle-opt-extended-headers.c b/generator/states-newstyle-opt-extended-headers.c
new file mode 100644
index 00000000..e2c9890e
--- /dev/null
+++ b/generator/states-newstyle-opt-extended-headers.c
@@ -0,0 +1,90 @@
+/* nbd client library in userspace: state machine
+ * Copyright (C) 2013-2021 Red Hat Inc.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/* State machine for negotiating NBD_OPT_EXTENDED_HEADERS. */
+
+STATE_MACHINE {
+ NEWSTYLE.OPT_EXTENDED_HEADERS.START:
+  assert (h->gflags & LIBNBD_HANDSHAKE_FLAG_FIXED_NEWSTYLE);
+  if (!h->request_eh) {
+    SET_NEXT_STATE (%^OPT_STRUCTURED_REPLY.START);
+    return 0;
+  }
+
+  h->sbuf.option.version = htobe64 (NBD_NEW_VERSION);
+  h->sbuf.option.option = htobe32 (NBD_OPT_EXTENDED_HEADERS);
+  h->sbuf.option.optlen = htobe32 (0);
+  h->wbuf = &h->sbuf;
+  h->wlen = sizeof h->sbuf.option;
+  SET_NEXT_STATE (%SEND);
+  return 0;
+
+ NEWSTYLE.OPT_EXTENDED_HEADERS.SEND:
+  switch (send_from_wbuf (h)) {
+  case -1: SET_NEXT_STATE (%.DEAD); return 0;
+  case 0:
+    h->rbuf = &h->sbuf;
+    h->rlen = sizeof h->sbuf.or.option_reply;
+    SET_NEXT_STATE (%RECV_REPLY);
+  }
+  return 0;
+
+ NEWSTYLE.OPT_EXTENDED_HEADERS.RECV_REPLY:
+  switch (recv_into_rbuf (h)) {
+  case -1: SET_NEXT_STATE (%.DEAD); return 0;
+  case 0:
+    if (prepare_for_reply_payload (h, NBD_OPT_EXTENDED_HEADERS) == -1) {
+      SET_NEXT_STATE (%.DEAD);
+      return 0;
+    }
+    SET_NEXT_STATE (%RECV_REPLY_PAYLOAD);
+  }
+  return 0;
+
+ NEWSTYLE.OPT_EXTENDED_HEADERS.RECV_REPLY_PAYLOAD:
+  switch (recv_into_rbuf (h)) {
+  case -1: SET_NEXT_STATE (%.DEAD); return 0;
+  case 0:  SET_NEXT_STATE (%CHECK_REPLY);
+  }
+  return 0;
+
+ NEWSTYLE.OPT_EXTENDED_HEADERS.CHECK_REPLY:
+  uint32_t reply;
+
+  reply = be32toh (h->sbuf.or.option_reply.reply);
+  switch (reply) {
+  case NBD_REP_ACK:
+    debug (h, "negotiated extended headers on this connection");
+    h->extended_headers = true;
+    break;
+  default:
+    if (handle_reply_error (h) == -1) {
+      SET_NEXT_STATE (%.DEAD);
+      return 0;
+    }
+
+    debug (h, "extended headers are not supported by this server");
+    h->extended_headers = false;
+    break;
+  }
+
+  /* Next option. */
+  SET_NEXT_STATE (%^OPT_STRUCTURED_REPLY.START);
+  return 0;
+
+} /* END STATE MACHINE */
diff --git a/generator/states-newstyle-opt-starttls.c b/generator/states-newstyle-opt-starttls.c
index 9eab023b..2aec3f3d 100644
--- a/generator/states-newstyle-opt-starttls.c
+++ b/generator/states-newstyle-opt-starttls.c
@@ -1,5 +1,5 @@
 /* nbd client library in userspace: state machine
- * Copyright (C) 2013-2020 Red Hat Inc.
+ * Copyright (C) 2013-2021 Red Hat Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public
@@ -23,7 +23,7 @@ STATE_MACHINE {
   assert (h->gflags & LIBNBD_HANDSHAKE_FLAG_FIXED_NEWSTYLE);
   /* If TLS was not requested we skip this option and go to the next one. */
   if (h->tls == LIBNBD_TLS_DISABLE) {
-    SET_NEXT_STATE (%^OPT_STRUCTURED_REPLY.START);
+    SET_NEXT_STATE (%^OPT_EXTENDED_HEADERS.START);
     return 0;
   }

@@ -102,7 +102,7 @@ STATE_MACHINE {
     debug (h,
            "server refused TLS (%s), continuing with unencrypted connection",
            reply == NBD_REP_ERR_POLICY ? "policy" : "not supported");
-    SET_NEXT_STATE (%^OPT_STRUCTURED_REPLY.START);
+    SET_NEXT_STATE (%^OPT_EXTENDED_HEADERS.START);
     return 0;
   }
   return 0;
@@ -121,7 +121,7 @@ STATE_MACHINE {
     nbd_internal_crypto_debug_tls_enabled (h);

     /* Continue with option negotiation. */
-    SET_NEXT_STATE (%^OPT_STRUCTURED_REPLY.START);
+    SET_NEXT_STATE (%^OPT_EXTENDED_HEADERS.START);
     return 0;
   }
   /* Continue handshake. */
@@ -144,7 +144,7 @@ STATE_MACHINE {
     debug (h, "connection is using TLS");

     /* Continue with option negotiation. */
-    SET_NEXT_STATE (%^OPT_STRUCTURED_REPLY.START);
+    SET_NEXT_STATE (%^OPT_EXTENDED_HEADERS.START);
     return 0;
   }
   /* Continue handshake. */
-- 
2.33.1



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [libnbd PATCH 13/13] interop: Add test of 64-bit block status
  2021-12-03 23:17 ` [libnbd PATCH 00/13] libnbd patches for NBD_OPT_EXTENDED_HEADERS Eric Blake
                     ` (11 preceding siblings ...)
  2021-12-03 23:17   ` [libnbd PATCH 12/13] generator: Actually request " Eric Blake
@ 2021-12-03 23:17   ` Eric Blake
  2021-12-10  8:16   ` [Libguestfs] [libnbd PATCH 00/13] libnbd patches for NBD_OPT_EXTENDED_HEADERS Laszlo Ersek
  13 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2021-12-03 23:17 UTC (permalink / raw)
  To: libguestfs; +Cc: nsoffer, vsementsov, qemu-devel, qemu-block, nbd

Prove that we can round-trip a block status request larger than 4G
through a new-enough qemu-nbd.  Also serves as a unit test of our shim
for converting internal 64-bit representation back to the older 32-bit
nbd_block_status callback interface.
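
The numbers asserted by the test can be reproduced with a small Python
sketch. The truncation rule here is an assumption inferred only from the
values the test expects (an oversized extent is reported as 4G-64M and
nothing after it is passed on); it is not a description of the actual
libnbd shim, and narrow_extents is a hypothetical name.

```python
LIBNBD_STATE_HOLE = 1
LIBNBD_STATE_ZERO = 2
CAP = 0xFC000000  # 4G - 64M = 4227858432, the length cb32 expects


def narrow_extents(extents64, cap=CAP):
    """Flatten (length, flags) pairs into the older [len, flags, ...] array.

    Assumption: the first extent that does not fit in 32 bits is truncated
    to `cap`, and no later extents are reported.
    """
    flat = []
    for length, flags in extents64:
        if length > 0xFFFFFFFF:
            flat += [cap, flags]
            break
        flat += [length, flags]
    return flat


G5 = 5 * 1024**3
# What cb64 sees: 64k of data then a hole, and a clean run then 64k dirty.
base = [(65536, 0), (G5 - 65536, LIBNBD_STATE_HOLE | LIBNBD_STATE_ZERO)]
dirty = [(G5 - 65536, 0), (65536, 1)]

base32 = narrow_extents(base)
dirty32 = narrow_extents(dirty)
```

Under that assumption base32 has four entries and dirty32 has two, which
is what the cb32 callback checks with its len assertions.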
---
 interop/Makefile.am     |   6 ++
 interop/large-status.c  | 186 ++++++++++++++++++++++++++++++++++++++++
 interop/large-status.sh |  49 +++++++++++
 .gitignore              |   1 +
 4 files changed, 242 insertions(+)
 create mode 100644 interop/large-status.c
 create mode 100755 interop/large-status.sh

diff --git a/interop/Makefile.am b/interop/Makefile.am
index 3a8d5677..96c0a0f6 100644
--- a/interop/Makefile.am
+++ b/interop/Makefile.am
@@ -20,6 +20,7 @@ include $(top_srcdir)/subdir-rules.mk
 EXTRA_DIST = \
 	dirty-bitmap.sh \
 	interop-qemu-storage-daemon.sh \
+	large-status.sh \
 	list-exports-nbd-config \
 	list-exports-test-dir/disk1 \
 	list-exports-test-dir/disk2 \
@@ -129,6 +130,7 @@ check_PROGRAMS += \
 	list-exports-qemu-nbd \
 	socket-activation-qemu-nbd \
 	dirty-bitmap \
+	large-status \
 	structured-read \
 	$(NULL)
 TESTS += \
@@ -138,6 +140,7 @@ TESTS += \
 	list-exports-qemu-nbd \
 	socket-activation-qemu-nbd \
 	dirty-bitmap.sh \
+	large-status.sh \
 	structured-read.sh \
 	$(NULL)

@@ -227,6 +230,9 @@ socket_activation_qemu_nbd_LDADD = $(top_builddir)/lib/libnbd.la
 dirty_bitmap_SOURCES = dirty-bitmap.c
 dirty_bitmap_LDADD = $(top_builddir)/lib/libnbd.la

+large_status_SOURCES = large-status.c
+large_status_LDADD = $(top_builddir)/lib/libnbd.la
+
 structured_read_SOURCES = structured-read.c
 structured_read_LDADD = $(top_builddir)/lib/libnbd.la

diff --git a/interop/large-status.c b/interop/large-status.c
new file mode 100644
index 00000000..3cc040fe
--- /dev/null
+++ b/interop/large-status.c
@@ -0,0 +1,186 @@
+/* NBD client library in userspace
+ * Copyright (C) 2013-2021 Red Hat Inc.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/* Test 64-bit block status with qemu. */
+
+#include <config.h>
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <assert.h>
+#include <stdbool.h>
+#include <errno.h>
+
+#include <libnbd.h>
+
+static const char *bitmap;
+
+struct data {
+  bool req_one;    /* input: true if req_one was passed to request */
+  int count;       /* input: count of expected remaining calls */
+  bool seen_base;  /* output: true if base:allocation encountered */
+  bool seen_dirty; /* output: true if qemu:dirty-bitmap encountered */
+};
+
+static int
+cb32 (void *opaque, const char *metacontext, uint64_t offset,
+      uint32_t *entries, size_t len, int *error)
+{
+  struct data *data = opaque;
+
+  assert (offset == 0);
+  assert (data->count-- > 0);
+
+  if (strcmp (metacontext, LIBNBD_CONTEXT_BASE_ALLOCATION) == 0) {
+    assert (!data->seen_base);
+    data->seen_base = true;
+
+    /* Data block offset 0 size 64k, remainder is hole */
+    assert (len == 4);
+    assert (entries[0] == 65536);
+    assert (entries[1] == 0);
+    /* libnbd had to truncate qemu's >4G answer */
+    assert (entries[2] == 4227858432);
+    assert (entries[3] == (LIBNBD_STATE_HOLE|LIBNBD_STATE_ZERO));
+  }
+  else if (strcmp (metacontext, bitmap) == 0) {
+    assert (!data->seen_dirty);
+    data->seen_dirty = true;
+
+    /* Dirty block at offset 5G-64k, remainder is clean */
+    /* libnbd had to truncate qemu's >4G answer */
+    assert (len == 2);
+    assert (entries[0] == 4227858432);
+    assert (entries[1] == 0);
+  }
+  else {
+    fprintf (stderr, "unexpected context %s\n", metacontext);
+    exit (EXIT_FAILURE);
+  }
+  return 0;
+}
+
+static int
+cb64 (void *opaque, const char *metacontext, uint64_t offset,
+      nbd_extent *entries, size_t len, int *error)
+{
+  struct data *data = opaque;
+
+  assert (offset == 0);
+  assert (data->count-- > 0);
+
+  if (strcmp (metacontext, LIBNBD_CONTEXT_BASE_ALLOCATION) == 0) {
+    assert (!data->seen_base);
+    data->seen_base = true;
+
+    /* Data block offset 0 size 64k, remainder is hole */
+    assert (len == 2);
+    assert (entries[0].length == 65536);
+    assert (entries[0].flags == 0);
+    assert (entries[1].length == 5368643584ULL);
+    assert (entries[1].flags == (LIBNBD_STATE_HOLE|LIBNBD_STATE_ZERO));
+  }
+  else if (strcmp (metacontext, bitmap) == 0) {
+    assert (!data->seen_dirty);
+    data->seen_dirty = true;
+
+    /* Dirty block at offset 5G-64k, remainder is clean */
+    assert (len == 2);
+    assert (entries[0].length == 5368643584ULL);
+    assert (entries[0].flags == 0);
+    assert (entries[1].length == 65536);
+    assert (entries[1].flags == 1);
+  }
+  else {
+    fprintf (stderr, "unexpected context %s\n", metacontext);
+    exit (EXIT_FAILURE);
+  }
+  return 0;
+}
+
+int
+main (int argc, char *argv[])
+{
+  struct nbd_handle *nbd;
+  int64_t exportsize;
+  struct data data;
+
+  if (argc < 3) {
+    fprintf (stderr, "%s bitmap qemu-nbd [args ...]\n", argv[0]);
+    exit (EXIT_FAILURE);
+  }
+  bitmap = argv[1];
+
+  nbd = nbd_create ();
+  if (nbd == NULL) {
+    fprintf (stderr, "%s\n", nbd_get_error ());
+    exit (EXIT_FAILURE);
+  }
+
+  nbd_add_meta_context (nbd, LIBNBD_CONTEXT_BASE_ALLOCATION);
+  nbd_add_meta_context (nbd, bitmap);
+
+  if (nbd_connect_systemd_socket_activation (nbd, &argv[2]) == -1) {
+    fprintf (stderr, "%s\n", nbd_get_error ());
+    exit (EXIT_FAILURE);
+  }
+
+  exportsize = nbd_get_size (nbd);
+  if (exportsize == -1) {
+    fprintf (stderr, "%s\n", nbd_get_error ());
+    exit (EXIT_FAILURE);
+  }
+
+  if (nbd_get_extended_headers_negotiated (nbd) != 1) {
+    fprintf (stderr, "skipping: qemu-nbd lacks extended headers\n");
+    exit (77);
+  }
+
+  /* Prove that we can round-trip a >4G block status request */
+  data = (struct data) { .count = 2, };
+  if (nbd_block_status_64 (nbd, exportsize, 0,
+                           (nbd_extent64_callback) { .callback = cb64,
+                             .user_data = &data },
+                           0) == -1) {
+    fprintf (stderr, "%s\n", nbd_get_error ());
+    exit (EXIT_FAILURE);
+  }
+  assert (data.seen_base && data.seen_dirty);
+
+  /* Check libnbd's handling of a >4G response through older interface */
+  data = (struct data) { .count = 2, };
+  if (nbd_block_status (nbd, exportsize, 0,
+                        (nbd_extent_callback) { .callback = cb32,
+                          .user_data = &data },
+                        0) == -1) {
+    fprintf (stderr, "%s\n", nbd_get_error ());
+    exit (EXIT_FAILURE);
+  }
+  assert (data.seen_base && data.seen_dirty);
+
+  if (nbd_shutdown (nbd, 0) == -1) {
+    fprintf (stderr, "%s\n", nbd_get_error ());
+    exit (EXIT_FAILURE);
+  }
+
+  nbd_close (nbd);
+
+  exit (EXIT_SUCCESS);
+}
diff --git a/interop/large-status.sh b/interop/large-status.sh
new file mode 100755
index 00000000..58fbdd36
--- /dev/null
+++ b/interop/large-status.sh
@@ -0,0 +1,49 @@
+#!/usr/bin/env bash
+# nbd client library in userspace
+# Copyright (C) 2019-2021 Red Hat Inc.
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+
+# Test 64-bit block status with qemu-nbd.
+
+source ../tests/functions.sh
+set -e
+set -x
+
+requires qemu-img bitmap --help
+requires qemu-nbd --version
+
+# This test uses the qemu-nbd -B option.
+if ! qemu-nbd --help | grep -sq -- -B; then
+    echo "$0: skipping because qemu-nbd does not support the -B option"
+    exit 77
+fi
+
+files="large-status.qcow2"
+rm -f $files
+cleanup_fn rm -f $files
+
+# Create mostly-sparse file with intentionally different data vs. dirty areas
+# (64k data, 5G-64k hole,zero; 5G-64k clean, 64k dirty)
+qemu-img create -f qcow2 large-status.qcow2 5G
+qemu-img bitmap --add --enable -f qcow2 large-status.qcow2 bitmap0
+qemu-io -f qcow2 -c "w -z $((5*1024*1024*1024 - 64*1024)) 64k" \
+        large-status.qcow2
+qemu-img bitmap --disable -f qcow2 large-status.qcow2 bitmap0
+qemu-io -f qcow2 -c 'w 0 64k' large-status.qcow2
+
+# Run the test.
+$VG ./large-status qemu:dirty-bitmap:bitmap0 \
+    qemu-nbd -f qcow2 -B bitmap0 large-status.qcow2
diff --git a/.gitignore b/.gitignore
index 3ecdceaf..cbc5b88d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -100,6 +100,7 @@ Makefile.in
 /interop/interop-qemu-nbd
 /interop/interop-qemu-nbd-tls-certs
 /interop/interop-qemu-nbd-tls-psk
+/interop/large-status
 /interop/list-exports-nbd-server
 /interop/list-exports-nbdkit
 /interop/list-exports-qemu-nbd
-- 
2.33.1



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH] spec: Add NBD_OPT_EXTENDED_HEADERS
  2021-12-03 23:14 ` [PATCH] spec: Add NBD_OPT_EXTENDED_HEADERS Eric Blake
@ 2021-12-06 11:40   ` Vladimir Sementsov-Ogievskiy
  2021-12-06 23:00     ` Eric Blake
  2021-12-10 18:16   ` Vladimir Sementsov-Ogievskiy
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 46+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-12-06 11:40 UTC (permalink / raw)
  To: Eric Blake, nbd; +Cc: qemu-devel, qemu-block, libguestfs, nsoffer

04.12.2021 02:14, Eric Blake wrote:
> Add a new negotiation feature where the client and server agree to use
> larger packet headers on every packet sent during transmission phase.
> This has two purposes: first, it makes it possible to perform
> operations like trim, write zeroes, and block status on more than 2^32
> bytes in a single command; this in turn requires that some structured
> replies from the server also be extended to match.  The wording chosen
> here is careful to permit a server to use either flavor in its reply
> (that is, a request less than 32-bits can trigger an extended reply,
> and conversely a request larger than 32-bits can trigger a compact
> reply).
> 
> Second, when structured replies are active, clients have to deal with
> the difference between 16- and 20-byte headers of simple
> vs. structured replies, which impacts performance if the client must
> perform multiple syscalls to first read the magic before knowing how
> many additional bytes to read.  In extended header mode, all headers
> are the same width, so the client can read a full header before
> deciding whether the header describes a simple or structured reply.
> Similarly, by having extended mode use a power-of-2 sizing, it becomes
> easier to manipulate headers within a single cache line, even if it
> requires padding bytes sent over the wire.  However, note that this
> change only affects the headers; as data payloads can still be
> unaligned (for example, a client performing 1-byte reads or writes),
> we would need to negotiate yet another extension if we wanted to
> ensure that all NBD transmission packets started on an 8-byte boundary
> after option haggling has completed.
> 
> This spec addition was done in parallel with a proof of concept
> implementation in qemu (server and client) and libnbd (client), and I
> also have plans to implement it in nbdkit (server).
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> 
> Available at https://repo.or.cz/nbd/ericb.git/shortlog/refs/tags/exthdr-v1
> 
>   doc/proto.md | 218 +++++++++++++++++++++++++++++++++++++++++----------
>   1 file changed, 177 insertions(+), 41 deletions(-)
> 
> diff --git a/doc/proto.md b/doc/proto.md
> index 3a877a9..46560b6 100644
> --- a/doc/proto.md
> +++ b/doc/proto.md
> @@ -295,6 +295,21 @@ reply is also problematic for error handling of the `NBD_CMD_READ`
>   request.  Therefore, structured replies can be used to create a
>   a context-free server stream; see below.
> 
> +The results of client negotiation also determine whether the client
> +and server will utilize only compact requests and replies, or whether
> +both sides will use only extended packets.  Compact messages are the
> +default, but inherently limit single transactions to a 32-bit window
> +starting at a 64-bit offset.  Extended messages make it possible to
> +perform 64-bit transactions (although typically only for commands that
> +do not include a data payload).  Furthermore, when structured replies
> +have been negotiated, compact messages require the client to perform
> +partial reads to determine which reply packet style (simple or
> +structured) is on the wire before knowing the length of the rest of
> +the reply, which can reduce client performance.  With extended
> +messages, all packet headers have a fixed length of 32 bytes, and
> +although this results in more traffic over the network due to padding,
> +the resulting layout is friendlier for performance.
> +
>   Replies need not be sent in the same order as requests (i.e., requests
>   may be handled by the server asynchronously), and structured reply
>   chunks from one request may be interleaved with reply messages from
> @@ -343,7 +358,9 @@ may be useful.
> 
>   #### Request message
> 
> -The request message, sent by the client, looks as follows:
> +The compact request message, sent by the client when extended
> +transactions are not negotiated using `NBD_OPT_EXTENDED_HEADERS`,
> +looks as follows:
> 
>   C: 32 bits, 0x25609513, magic (`NBD_REQUEST_MAGIC`)
>   C: 16 bits, command flags
> @@ -353,14 +370,26 @@ C: 64 bits, offset (unsigned)
>   C: 32 bits, length (unsigned)
>   C: (*length* bytes of data if the request is of type `NBD_CMD_WRITE`)
> 
> +If negotiation agreed on extended transactions with
> +`NBD_OPT_EXTENDED_HEADERS`, the client instead uses extended requests:
> +
> +C: 32 bits, 0x21e41c71, magic (`NBD_REQUEST_EXT_MAGIC`)
> +C: 16 bits, command flags
> +C: 16 bits, type
> +C: 64 bits, handle
> +C: 64 bits, offset (unsigned)
> +C: 64 bits, length (unsigned)
> +C: (*length* bytes of data if the request is of type `NBD_CMD_WRITE`)
> +
>   #### Simple reply message
> 
>   The simple reply message MUST be sent by the server in response to all
>   requests if structured replies have not been negotiated using
> -`NBD_OPT_STRUCTURED_REPLY`. If structured replies have been negotiated, a simple
> -reply MAY be used as a reply to any request other than `NBD_CMD_READ`,
> -but only if the reply has no data payload.  The message looks as
> -follows:
> +`NBD_OPT_STRUCTURED_REPLY`. If structured replies have been
> +negotiated, a simple reply MAY be used as a reply to any request other
> +than `NBD_CMD_READ`, but only if the reply has no data payload.  If
> +extended headers were not negotiated using `NBD_OPT_EXTENDED_HEADERS`,
> +the message looks as follows:
> 
>   S: 32 bits, 0x67446698, magic (`NBD_SIMPLE_REPLY_MAGIC`; used to be
>      `NBD_REPLY_MAGIC`)
> @@ -369,6 +398,16 @@ S: 64 bits, handle
>   S: (*length* bytes of data if the request is of type `NBD_CMD_READ` and
>       *error* is zero)
> 
> +If extended headers were negotiated using `NBD_OPT_EXTENDED_HEADERS`,
> +the message looks like:
> +
> +S: 32 bits, 0x60d12fd6, magic (`NBD_SIMPLE_REPLY_EXT_MAGIC`)
> +S: 32 bits, error (MAY be zero)
> +S: 64 bits, handle
> +S: 128 bits, padding (MUST be zero)
> +S: (*length* bytes of data if the request is of type `NBD_CMD_READ` and
> +    *error* is zero)
> +

If we go this way, let's put payload length into padding: it will help to make the protocol context-independent and less error-prone.

Or, the other way: maybe just forbid the payload for simple 64-bit replies? What's the reason to allow 64-bit requests without structured reply negotiation?
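
To make the first suggestion concrete, here is a minimal Python sketch of the simple reply layouts. The compact and extended formats follow the draft quoted above; the third layout, which carries a 64-bit payload length in the first half of the padding, is only a hypothetical reading of this suggestion and does not appear in the patch.

```python
import struct

NBD_SIMPLE_REPLY_MAGIC = 0x67446698
NBD_SIMPLE_REPLY_EXT_MAGIC = 0x60D12FD6  # proposed in this draft

# magic, error, handle
COMPACT_SIMPLE = struct.Struct(">IIQ")
# magic, error, handle, 128 bits of zero padding (as drafted)
EXTENDED_SIMPLE = struct.Struct(">IIQ16x")
# Hypothetical variant: payload length stored in the first 64 padding bits
EXTENDED_SIMPLE_WITH_LEN = struct.Struct(">IIQQ8x")


def pack_simple_with_length(error, handle, payload_len):
    """Pack the hypothetical extended simple reply that states its length."""
    return EXTENDED_SIMPLE_WITH_LEN.pack(
        NBD_SIMPLE_REPLY_EXT_MAGIC, error, handle, payload_len)


header = pack_simple_with_length(0, 0x1234, 4096)
magic, error, handle, payload_len = EXTENDED_SIMPLE_WITH_LEN.unpack(header)
```

With the length in the header, a reader could skip the payload of a reply without remembering which request the handle belonged to.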

>   #### Structured reply chunk message
> 
>   Some of the major downsides of the default simple reply to
> @@ -410,7 +449,9 @@ considered successful only if it did not contain any error chunks,
>   although the client MAY be able to determine partial success based
>   on the chunks received.
> 
> -A structured reply chunk message looks as follows:
> +If extended headers were not negotiated using
> +`NBD_OPT_EXTENDED_HEADERS`, a structured reply chunk message looks as
> +follows:
> 
>   S: 32 bits, 0x668e33ef, magic (`NBD_STRUCTURED_REPLY_MAGIC`)
>   S: 16 bits, flags
> @@ -423,6 +464,17 @@ The use of *length* in the reply allows context-free division of
>   the overall server traffic into individual reply messages; the
>   *type* field describes how to further interpret the payload.
> 
> +If extended headers were negotiated using `NBD_OPT_EXTENDED_HEADERS`,
> +the message looks like:
> +
> +S: 32 bits, 0x6e8a278c, magic (`NBD_STRUCTURED_REPLY_EXT_MAGIC`)
> +S: 16 bits, flags
> +S: 16 bits, type
> +S: 64 bits, handle
> +S: 64 bits, length of payload (unsigned)

Maybe 64 bits is too much for a payload length, but who knows. And it's good that it's symmetric with the 64-bit length in the request.

> +S: 64 bits, padding (MUST be zero)

Hmm. Extra 8 bytes to be power-of-2. Does 32 bytes really perform better than 24 bytes?

> +S: *length* bytes of payload data (if *length* is nonzero)

Hmm2: we could probably move "handle" to the start of the payload. This way we can keep a 16-byte header for simple replies and a 16-byte header for structured ones, so structured replies are read in two shots: 1. the header, 2. handle + payload. But that means deeper restructuring of the client code, so it seems not worth it.
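
For reference, the header sizes under discussion can be checked with a short Python sketch. The compact and extended layouts follow the field lists in the quoted spec text; the unpadded 24-byte structured header is a hypothetical alternative that exists only in this review comment, and pack_request is an illustrative helper, not an API of any NBD implementation.

```python
import struct

NBD_REQUEST_MAGIC = 0x25609513
NBD_REQUEST_EXT_MAGIC = 0x21E41C71  # proposed in this draft
NBD_CMD_BLOCK_STATUS = 7

# magic, flags, type, handle, offset, length
COMPACT_REQUEST = struct.Struct(">IHHQQI")
EXTENDED_REQUEST = struct.Struct(">IHHQQQ")
# magic, flags, type, handle, length (plus padding when extended)
COMPACT_STRUCTURED = struct.Struct(">IHHQI")
EXTENDED_STRUCTURED = struct.Struct(">IHHQQ8x")
UNPADDED_STRUCTURED = struct.Struct(">IHHQQ")  # hypothetical, no padding


def pack_request(extended, cmd, handle, offset, length, flags=0):
    """Pack a request header; the compact form rejects lengths >= 2**32."""
    if extended:
        return EXTENDED_REQUEST.pack(
            NBD_REQUEST_EXT_MAGIC, flags, cmd, handle, offset, length)
    return COMPACT_REQUEST.pack(
        NBD_REQUEST_MAGIC, flags, cmd, handle, offset, length)


def is_power_of_two(n):
    return n > 0 and n & (n - 1) == 0


big = pack_request(True, NBD_CMD_BLOCK_STATUS, 1, 0, 5 * 1024**3)
```

Only the 32-byte forms are a power of two; whether that measurably beats 24 bytes is exactly the open question here, and the sketch does not answer it.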


> +
>   #### Terminating the transmission phase
> 
>   There are two methods of terminating the transmission phase:
> @@ -870,15 +922,19 @@ The procedure works as follows:
>     server supports.
>   - During transmission, a client can then indicate interest in metadata
>     for a given region by way of the `NBD_CMD_BLOCK_STATUS` command,
> -  where *offset* and *length* indicate the area of interest. The
> -  server MUST then respond with the requested information, for all
> +  where *offset* and *length* indicate the area of interest.
> +- The server MUST then respond with the requested information, for all
>     contexts which were selected during negotiation. For every metadata
> -  context, the server sends one set of extent chunks, where the sizes
> -  of the extents MUST be less than or equal to the length as specified
> -  in the request.

I'm not sure we can simply drop this requirement. It seems like an incompatible change, doesn't it? Maybe we should allow any size of extent only in 64-bit mode?
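
If the requested size really becomes only a hint, a client that cares about the exact range has to clamp the reply itself. A minimal Python sketch of that client-side handling, with clamp_extents as a hypothetical helper and extents modelled as (length, flags) pairs:

```python
def clamp_extents(extents, requested):
    """Trim a reply so the summed extent length never exceeds `requested`.

    Returns the clamped extents and the number of bytes they cover; a
    caller would re-issue the request for any remainder when the reply
    was shorter than what it asked for.
    """
    clamped = []
    covered = 0
    for length, flags in extents:
        if covered >= requested:
            break
        take = min(length, requested - covered)
        clamped.append((take, flags))
        covered += take
    return clamped, covered


# Server answers a 1M request with more than 5G of extents.
long_reply = clamp_extents([(65536, 0), (5368643584, 3)], 1 << 20)
# Server answers an 8k request with a single 4k extent.
short_reply = clamp_extents([(4096, 0)], 8192)
```

An older client that assumes the reply never exceeds the request would have no such step, which is the compatibility concern raised above.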

> Each extent comes with a *flags* field, the
> -  semantics of which are defined by the metadata context.
> -- A server MUST reply to `NBD_CMD_BLOCK_STATUS` with a structured
> -  reply of type `NBD_REPLY_TYPE_BLOCK_STATUS`.
> +  context, the server sends one set of extent chunks, using
> +  `NBD_REPLY_TYPE_BLOCK_STATUS` or `NBD_REPLY_TYPE_BLOCK_STATUS_EXT`
> +  (the latter is only possible if the client also negotiated
> +  `NBD_OPT_EXTENDED_HEADERS`).  Each extent comes with a *flags*
> +  field, the semantics of which are defined by the metadata context.
> +
> +The client's requested *size* is only a hint to the server, so the
> +summed size of extents in the server's reply may be shorter, or in
> +some cases longer, than the original request, and may even differ
> +between contexts when multiple metadata contexts were negotiated.
> 
>   A client MUST NOT use `NBD_CMD_BLOCK_STATUS` unless it selected a
>   nonzero number of metadata contexts during negotiation, and used the
> @@ -1179,10 +1235,10 @@ of the newstyle negotiation.
> 
>       When this command succeeds, the server MUST NOT preserve any
>       negotiation state (such as a request for
> -    `NBD_OPT_STRUCTURED_REPLY`, or metadata contexts from
> -    `NBD_OPT_SET_META_CONTEXT`) issued before this command.  A client
> -    SHOULD defer all stateful option requests until after it
> -    determines whether encryption is available.
> +    `NBD_OPT_STRUCTURED_REPLY` or `NBD_OPT_EXTENDED_HEADERS`, or
> +    metadata contexts from `NBD_OPT_SET_META_CONTEXT`) issued before
> +    this command.  A client SHOULD defer all stateful option requests
> +    until after it determines whether encryption is available.
> 
>       See the section on TLS above for further details.
> 
> @@ -1460,6 +1516,26 @@ of the newstyle negotiation.
>       option does not select any metadata context, provided the client
>       then does not attempt to issue `NBD_CMD_BLOCK_STATUS` commands.
> 
> +* `NBD_OPT_EXTENDED_HEADERS` (11)
> +
> +    The client wishes to use extended headers during the transmission
> +    phase.  The client MUST NOT send any additional data with the
> +    option, and the server SHOULD reject a request that includes data
> +    with `NBD_REP_ERR_INVALID`.
> +
> +    The server replies with the following, or with an error permitted
> +    elsewhere in this document:
> +
> +    - `NBD_REP_ACK`: Extended headers have been negotiated; the client
> +      MUST use the 32-byte extended request header, and the server
> +      MUST use the 32-byte extended reply header.
> +    - For backwards compatibility, clients SHOULD be prepared to also
> +      handle `NBD_REP_ERR_UNSUP`; in this case, only the compact
> +      transmission headers will be used.
> +
> +    If the client requests `NBD_OPT_STARTTLS` after this option, it
> +    MUST renegotiate extended headers.
> +
>   #### Option reply types
> 
>   These values are used in the "reply type" field, sent by the server
> @@ -1713,12 +1789,12 @@ unrecognized flags.
> 
>   #### Structured reply types
> 
> -These values are used in the "type" field of a structured reply.
> -Some chunk types can additionally be categorized by role, such as
> -*error chunks* or *content chunks*.  Each type determines how to
> -interpret the "length" bytes of payload.  If the client receives
> -an unknown or unexpected type, other than an *error chunk*, it
> -MUST initiate a hard disconnect.
> +These values are used in the "type" field of a structured reply.  Some
> +chunk types can additionally be categorized by role, such as *error
> +chunks*, *content chunks*, or *status chunks*.  Each type determines
> +how to interpret the "length" bytes of payload.  If the client
> +receives an unknown or unexpected type, other than an *error chunk*,
> +it MUST initiate a hard disconnect.

Just add "status chunks" to the list. Seems unrelated, better be in a separate patch.

> 
>   * `NBD_REPLY_TYPE_NONE` (0)
> 
> @@ -1761,13 +1837,34 @@ MUST initiate a hard disconnect.
>     64 bits: offset (unsigned)
>     32 bits: hole size (unsigned, MUST be nonzero)
> 
> +* `NBD_REPLY_TYPE_OFFSET_HOLE_EXT` (3)
> +
> +  This chunk type is in the content chunk category.  *length* MUST be
> +  exactly 16.  The semantics of this chunk mirror those of
> +  `NBD_REPLY_TYPE_OFFSET_HOLE`, other than the use of a larger *hole
> +  size* field.  This chunk type MUST NOT be used unless extended
> +  headers were negotiated with `NBD_OPT_EXTENDED_HEADERS`.

Why do you call all such things _EXT, not _64 ? _64 seems more informative.

> +
> +  The payload is structured as:
> +
> +  64 bits: offset (unsigned)
> +  64 bits: hole size (unsigned, MUST be nonzero)
> +
> +  Note that even when extended headers are in use, a server may
> +  enforce a maximum block size that is smaller than 32 bits, in which
> +  case no valid `NBD_CMD_READ` will have a *length* large enough to
s/nc/no/ ? But it is hard to read anyway, as it sounds very similar to "not valid", which breaks the meaning.

Maybe just "in which case valid NBD_CMD_READ will not have"

> +  require the use of this chunk type.  However, a client using
> +  extended headers MUST be prepared for the server to use either the
> +  compact or extended chunk type.
> +
>   * `NBD_REPLY_TYPE_BLOCK_STATUS` (5)
> 
> -  *length* MUST be 4 + (a positive integer multiple of 8).  This reply
> -  represents a series of consecutive block descriptors where the sum
> -  of the length fields within the descriptors is subject to further
> -  constraints documented below. This chunk type MUST appear
> -  exactly once per metadata ID in a structured reply.
> +  This chunk type is in the status chunk category.  *length* MUST be
> +  4 + (a positive integer multiple of 8).  This reply represents a
> +  series of consecutive block descriptors where the sum of the length
> +  fields within the descriptors is subject to further constraints
> +  documented below.  Each negotiated metadata ID must have exactly one
> +  status chunk in the overall structured reply.

just rewording, no semantic changes, yes?

> 
>     The payload starts with:
> 
> @@ -1796,9 +1893,36 @@ MUST initiate a hard disconnect.
>     information to the client, if looking up the information would be
>     too resource-intensive for the server, so long as at least one
>     extent is returned. Servers should however be aware that most
> -  clients implementations will then simply ask for the next extent
> +  client implementations will then simply ask for the next extent
>     instead.

So you keep all restrictions about NBD_CMD_FLAG_REQ_ONE and about the sum of lengths of extents as is here.

> 
> +* `NBD_REPLY_TYPE_BLOCK_STATUS_EXT` (6)
> +
> +  This chunk type is in the status chunk category.  *length* MUST be
> +  4 + (a positive multiple of 16).  The semantics of this chunk mirror
> +  those of `NBD_REPLY_TYPE_BLOCK_STATUS`, other than the use of a
> +  larger *extent length* field, as well as added padding to ease
> +  alignment.

But what about restrictions on chunk lengths and cumulative chunk length?

> +  This chunk type MUST NOT be used unless extended headers
> +  were negotiated with `NBD_OPT_EXTENDED_HEADERS`.
> +
> +  The payload starts with:
> +
> +  32 bits, metadata context ID
> +
> +  and is followed by a list of one or more descriptors, each with this
> +  layout:
> +
> +  64 bits, length of the extent to which the status below
> +     applies (unsigned, MUST be nonzero)
> +  32 bits, status flags
> +  32 bits, padding (MUST be zero)
> +
> +  Note that even when extended headers are in use, the client MUST be
> +  prepared for the server to use either the compact or extended chunk
> +  type, regardless of whether the client's hinted length was more or
> +  less than 32 bits, but the server MUST use exactly one of the two
> +  chunk types per negotiated metacontext ID.

But we already have exactly one chunk per ID in a reply. Or do you mean that the type of reply for the ID should be selected once for the whole session?
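
For illustration, the two length rules quoted above (4 + a positive multiple of 8 for the compact chunk, 4 + a positive multiple of 16 for the extended one) could be checked as in the sketch below. The helper name and the -1 error convention are made up for this example; they are not taken from qemu, libnbd, or nbdkit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the number of extent descriptors in a
 * block status chunk payload of payload_len bytes, or -1 if the length
 * violates the rules quoted above.  is_ext selects the descriptor size:
 * 16 bytes for NBD_REPLY_TYPE_BLOCK_STATUS_EXT, 8 bytes for the compact
 * NBD_REPLY_TYPE_BLOCK_STATUS.
 */
static int64_t block_status_descriptor_count(uint64_t payload_len, int is_ext)
{
    const uint64_t id_size = 4;                 /* 32-bit metadata context ID */
    const uint64_t desc_size = is_ext ? 16 : 8; /* one extent descriptor */

    /* At least one descriptor must follow the context ID. */
    if (payload_len < id_size + desc_size) {
        return -1;
    }
    /* The remainder must be a whole number of descriptors. */
    if ((payload_len - id_size) % desc_size != 0) {
        return -1;
    }
    return (int64_t)((payload_len - id_size) / desc_size);
}
```

Note that this only validates the framing of a single chunk; it says nothing about the open questions on cumulative extent length or NBD_CMD_FLAG_REQ_ONE.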

> +
>   All error chunk types have bit 15 set, and begin with the same
>   *error*, *message length*, and optional *message* fields as
>   `NBD_REPLY_TYPE_ERROR`.  If nonzero, *message length* indicates
> @@ -1812,7 +1936,10 @@ remaining structured fields at the end.
>     be at least 6.  This chunk represents that an error occurred,
>     and the client MAY NOT make any assumptions about partial
>     success. This type SHOULD NOT be used more than once in a
> -  structured reply.  Valid as a reply to any request.
> +  structured reply.  Valid as a reply to any request.  Note that
> +  *message length* MUST NOT exceed the 4096 bytes string length limit,
> +  and therefore there is no need for a counterpart extended-length
> +  error chunk type.
> 
>     The payload is structured as:
> 
> @@ -1867,7 +1994,8 @@ The following request types exist:
> 
>       If structured replies were not negotiated, then a read request
>       MUST always be answered by a simple reply, as documented above
> -    (using magic 0x67446698 `NBD_SIMPLE_REPLY_MAGIC`, and containing
> +    (using `NBD_SIMPLE_REPLY_MAGIC` or `NBD_SIMPLE_REPLY_EXT_MAGIC`
> +    according to whether extended headers are in use, and containing
>       length bytes of data according to the client's request).
> 
>       If an error occurs, the server SHOULD set the appropriate error code
> @@ -1883,7 +2011,8 @@ The following request types exist:
> 
>       If structured replies are negotiated, then a read request MUST
>       result in a structured reply with one or more chunks (each using
> -    magic 0x668e33ef `NBD_STRUCTURED_REPLY_MAGIC`), where the final
> +    `NBD_STRUCTURED_REPLY_MAGIC` or `NBD_STRUCTURED_REPLY_EXT_MAGIC`
> +    according to whether extended headers are in use), where the final
>       chunk has the flag `NBD_REPLY_FLAG_DONE`, and with the following
>       additional constraints.
> 
> @@ -1897,13 +2026,14 @@ The following request types exist:
>       chunks that describe data outside the offset and length of the
>       request, but MAY send the content chunks in any order (the client
>       MUST reassemble content chunks into the correct order), and MAY
> -    send additional content chunks even after reporting an error chunk.
> -    Note that a request for more than 2^32 - 8 bytes MUST be split
> -    into at least two chunks, so as not to overflow the length field
> -    of a reply while still allowing space for the offset of each
> -    chunk.  When no error is detected, the server MUST send enough
> -    data chunks to cover the entire region described by the offset and
> -    length of the client's request.
> +    send additional content chunks even after reporting an error
> +    chunk.  Note that if extended headers are not in use, a request
> +    for more than 2^32 - 8 bytes MUST be split into at least two
> +    chunks, so as not to overflow the length field of a reply while
> +    still allowing space for the offset of each chunk.  When no error
> +    is detected, the server MUST send enough data chunks to cover the
> +    entire region described by the offset and length of the client's
> +    request.
> 
>       To minimize traffic, the server MAY use a content or error chunk
>       as the final chunk by setting the `NBD_REPLY_FLAG_DONE` flag, but
> @@ -2136,13 +2266,19 @@ The following request types exist:
>       server returned at least one metadata context without an error.
>       This in turn requires the client to first negotiate structured
>       replies. For a successful return, the server MUST use a structured
> -    reply, containing exactly one chunk of type
> +    reply, containing exactly one status chunk of type
>       `NBD_REPLY_TYPE_BLOCK_STATUS` per selected context id, where the
>       status field of each descriptor is determined by the flags field
>       as defined by the metadata context.  The server MAY send chunks in
>       a different order than the context ids were assigned in reply to
>       `NBD_OPT_SET_META_CONTEXT`.
> 
> +    If extended headers were negotiated via
> +    `NBD_OPT_EXTENDED_HEADERS`, the server may use
> +    `NBD_REPLY_TYPE_BLOCK_STATUS_EXT` instead of
> +    `NBD_REPLY_TYPE_BLOCK_STATUS` as the reply chunk for a metacontext
> +    id.
> +
>       The list of block status descriptors within the
>       `NBD_REPLY_TYPE_BLOCK_STATUS` chunk represent consecutive portions
>       of the file starting from specified *offset*.  If the client used
> 

Overall, seems good to me.

1. Could we move some fixes / rewordings to a preparation patch?

2. I see you also want to overcome the unpleasant restrictions we had around lengths / cumulative lengths of BLOCK_STATUS replies. I like the idea. But I think it should be clarified that without the 64bit extension negotiated, everything stays as is. And with the 64bit extension negotiated, BLOCK_STATUS works in a slightly new way, so it may return what the server wants, and the original "length" is simply a hint. Or, at least, the new behavior applies only to NBD_REPLY_TYPE_BLOCK_STATUS_EXT. Also, some clarification may be needed around the NBD_CMD_FLAG_REQ_ONE flag: what changes for it? You don't mention it at all in the new version of the BLOCK_STATUS reply.


-- 
Best regards,
Vladimir



* Re: [PATCH 01/14] nbd/server: Minor cleanups
  2021-12-03 23:15   ` [PATCH 01/14] nbd/server: Minor cleanups Eric Blake
@ 2021-12-06 12:03     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 46+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-12-06 12:03 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: nbd, qemu-block, libguestfs, nsoffer

04.12.2021 02:15, Eric Blake wrote:
> Spelling fixes, grammar improvements and consistent spacing, noticed
> while preparing other patches in this file.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

> ---
>   nbd/server.c | 13 ++++++-------
>   1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index 4630dd732250..f302e1cbb03e 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -2085,11 +2085,10 @@ static void nbd_extent_array_convert_to_be(NBDExtentArray *ea)
>    * Add extent to NBDExtentArray. If extent can't be added (no available space),
>    * return -1.
>    * For safety, when returning -1 for the first time, .can_add is set to false,
> - * further call to nbd_extent_array_add() will crash.
> - * (to avoid the situation, when after failing to add an extent (returned -1),
> - * user miss this failure and add another extent, which is successfully added
> - * (array is full, but new extent may be squashed into the last one), then we
> - * have invalid array with skipped extent)
> + * and further calls to nbd_extent_array_add() will crash.
> + * (this avoids the situation where a caller ignores failure to add one extent,
> + * where adding another extent that would squash into the last array entry
> + * would result in an incorrect range reported to the client)
>    */
>   static int nbd_extent_array_add(NBDExtentArray *ea,
>                                   uint32_t length, uint32_t flags)
> @@ -2288,7 +2287,7 @@ static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request,
>       assert(client->recv_coroutine == qemu_coroutine_self());
>       ret = nbd_receive_request(client, request, errp);
>       if (ret < 0) {
> -        return  ret;
> +        return ret;
>       }
> 
>       trace_nbd_co_receive_request_decode_type(request->handle, request->type,
> @@ -2648,7 +2647,7 @@ static coroutine_fn void nbd_trip(void *opaque)
>       }
> 
>       if (ret < 0) {
> -        /* It wans't -EIO, so, according to nbd_co_receive_request()
> +        /* It wasn't -EIO, so, according to nbd_co_receive_request()
>            * semantics, we should return the error to the client. */
>           Error *export_err = local_err;
> 


-- 
Best regards,
Vladimir



* Re: [PATCH 02/14] qemu-io: Utilize 64-bit status during map
  2021-12-03 23:15   ` [PATCH 02/14] qemu-io: Utilize 64-bit status during map Eric Blake
@ 2021-12-06 12:06     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 46+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-12-06 12:06 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: nbd, qemu-block, libguestfs, nsoffer, Kevin Wolf, Hanna Reitz

04.12.2021 02:15, Eric Blake wrote:
> The block layer has supported 64-bit block status from drivers since
> commit 86a3d5c688 ("block: Add .bdrv_co_block_status() callback",
> v2.12) and friends, with individual driver callbacks responsible for
> capping things where necessary.  Artificially capping things below 2G
> in the qemu-io 'map' command, added in commit d6a644bbfe ("block: Make
> bdrv_is_allocated() byte-based", v2.10) is thus no longer necessary.
> 
> One way to test this is with qemu-nbd as server on a raw file larger
> than 4G (the entire file should show as allocated), plus 'qemu-io -f
> raw -c map nbd://localhost --trace=nbd_\*' as client.  Prior to this
> patch, the NBD_CMD_BLOCK_STATUS requests are fragmented at 0x7ffffe00
> distances; with this patch, the fragmenting changes to 0x7fffffff
> (since the NBD protocol is currently still limited to 32-bit
> transactions - see block/nbd.c:nbd_client_co_block_status).  Then in
> later patches, once I add an NBD extension for a 64-bit block status,
> the same map command completes with just one NBD_CMD_BLOCK_STATUS.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir



* Re: [PATCH 03/14] qemu-io: Allow larger write zeroes under no fallback
  2021-12-03 23:15   ` [PATCH 03/14] qemu-io: Allow larger write zeroes under no fallback Eric Blake
@ 2021-12-06 12:26     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 46+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-12-06 12:26 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: nbd, qemu-block, libguestfs, nsoffer, Kevin Wolf, Hanna Reitz

04.12.2021 02:15, Eric Blake wrote:
> When writing zeroes can fall back to a slow write, permitting an
> overly large request can become an amplification denial of service
> attack in triggering a large amount of work from a small request.  But
> the whole point of the no fallback flag is to quickly determine if
> writing an entire device to zero can be done quickly (such as when it
> is already known that the device started with zero contents); in those
> cases, artificially capping things at 2G in qemu-io itself doesn't
> help us.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>   qemu-io-cmds.c | 9 +++------
>   1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
> index 954955c12fb9..45a957093369 100644
> --- a/qemu-io-cmds.c
> +++ b/qemu-io-cmds.c
> @@ -603,10 +603,6 @@ static int do_co_pwrite_zeroes(BlockBackend *blk, int64_t offset,
>           .done   = false,
>       };
> 
> -    if (bytes > INT_MAX) {
> -        return -ERANGE;
> -    }
> -
>       co = qemu_coroutine_create(co_pwrite_zeroes_entry, &data);
>       bdrv_coroutine_enter(blk_bs(blk), co);
>       while (!data.done) {
> @@ -1160,8 +1156,9 @@ static int write_f(BlockBackend *blk, int argc, char **argv)
>       if (count < 0) {
>           print_cvtnum_err(count, argv[optind]);
>           return count;
> -    } else if (count > BDRV_REQUEST_MAX_BYTES) {
> -        printf("length cannot exceed %" PRIu64 ", given %s\n",
> +    } else if (count > BDRV_REQUEST_MAX_BYTES &&
> +               !(flags & BDRV_REQ_NO_FALLBACK)) {
> +        printf("length cannot exceed %" PRIu64 " without -n, given %s\n",

Actually, I don't see the reason to restrict qemu-io, which is mostly a testing tool, in this way. What if I want to test a data request > 2G, why not?

So, we restrict the user in testing, but don't avoid any kind of DoS: a bad guy can always modify the code and rebuild qemu-io to overcome the restriction.

But I don't really mind; the patch is not wrong:

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>


-- 
Best regards,
Vladimir



* Re: [PATCH 04/14] nbd/client: Add safety check on chunk payload length
  2021-12-03 23:15   ` [PATCH 04/14] nbd/client: Add safety check on chunk payload length Eric Blake
@ 2021-12-06 12:33     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 46+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-12-06 12:33 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: nbd, qemu-block, libguestfs, nsoffer

04.12.2021 02:15, Eric Blake wrote:
> Our existing use of structured replies either reads into a qiov capped
> at 32M (NBD_CMD_READ) or caps allocation to 1000 bytes (see
> NBD_MAX_MALLOC_PAYLOAD in block/nbd.c).  But the existing length
> checks are rather late; if we encounter a buggy (or malicious) server
> that sends a super-large payload length, we should drop the connection
> right then rather than assuming the layer on top will be careful.
> This becomes more important when we permit 64-bit lengths which are
> even more likely to have the potential for attempted denial of service
> abuse.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>


Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

> ---
>   nbd/client.c | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
> 
> diff --git a/nbd/client.c b/nbd/client.c
> index 30d5383cb195..8f137c2320bb 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -1412,6 +1412,18 @@ static int nbd_receive_structured_reply_chunk(QIOChannel *ioc,
>       chunk->handle = be64_to_cpu(chunk->handle);
>       chunk->length = be32_to_cpu(chunk->length);
> 
> +    /*
> +     * Because we use BLOCK_STATUS with REQ_ONE, and cap READ requests
> +     * at 32M, no valid server should send us payload larger than
> +     * this.  Even if we stopped using REQ_ONE, sane servers will cap
> +     * the number of extents they return for block status.
> +     */
> +    if (chunk->length > NBD_MAX_BUFFER_SIZE + sizeof(NBDStructuredReadData)) {
> +        error_setg(errp, "server chunk %" PRIu32 " (%s) payload is too long",
> +                   chunk->type, nbd_rep_lookup(chunk->type));
> +        return -EINVAL;
> +    }
> +
>       return 0;
>   }
> 


-- 
Best regards,
Vladimir



* Re: [PATCH] spec: Add NBD_OPT_EXTENDED_HEADERS
  2021-12-06 11:40   ` Vladimir Sementsov-Ogievskiy
@ 2021-12-06 23:00     ` Eric Blake
  2021-12-07  9:08       ` Vladimir Sementsov-Ogievskiy
  2021-12-07 16:14       ` Wouter Verhelst
  0 siblings, 2 replies; 46+ messages in thread
From: Eric Blake @ 2021-12-06 23:00 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: nsoffer, libguestfs, qemu-devel, qemu-block, nbd

On Mon, Dec 06, 2021 at 02:40:45PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> >   #### Simple reply message
> > 
> >   The simple reply message MUST be sent by the server in response to all
> >   requests if structured replies have not been negotiated using
> > -`NBD_OPT_STRUCTURED_REPLY`. If structured replies have been negotiated, a simple
> > -reply MAY be used as a reply to any request other than `NBD_CMD_READ`,
> > -but only if the reply has no data payload.  The message looks as
> > -follows:
> > +`NBD_OPT_STRUCTURED_REPLY`. If structured replies have been
> > +negotiated, a simple reply MAY be used as a reply to any request other
> > +than `NBD_CMD_READ`, but only if the reply has no data payload.  If
> > +extended headers were not negotiated using `NBD_OPT_EXTENDED_HEADERS`,
> > +the message looks as follows:
> > 
> >   S: 32 bits, 0x67446698, magic (`NBD_SIMPLE_REPLY_MAGIC`; used to be
> >      `NBD_REPLY_MAGIC`)
> > @@ -369,6 +398,16 @@ S: 64 bits, handle
> >   S: (*length* bytes of data if the request is of type `NBD_CMD_READ` and
> >       *error* is zero)
> > 
> > +If extended headers were negotiated using `NBD_OPT_EXTENDED_HEADERS`,
> > +the message looks like:
> > +
> > +S: 32 bits, 0x60d12fd6, magic (`NBD_SIMPLE_REPLY_EXT_MAGIC`)
> > +S: 32 bits, error (MAY be zero)
> > +S: 64 bits, handle
> > +S: 128 bits, padding (MUST be zero)
> > +S: (*length* bytes of data if the request is of type `NBD_CMD_READ` and
> > +    *error* is zero)
> > +
> 
> If we go this way, let's put payload length into padding: it will help to make the protocol context-independent and less error-prone.

Easy enough to do (the payload length will be 0 except for
NBD_CMD_READ).

> 
> Or, the other way, maybe just forbid the payload for simple-64bit? What's the reason to allow 64bit requests without structured reply negotiation?

The two happened to be orthogonal enough in my implementation.  It was
easy to demonstrate either one without the other, and it IS easier to
write a client using non-structured replies (structured reads ARE
tougher than simple reads, even if it is less efficient when it comes
to reading zeros).  But you are also right that we could require
structured reads prior to allowing 64-bit operations, and then have
only one supported reply type on the wire when negotiated.  Wouter,
which way do you prefer?

> 
> >   #### Structured reply chunk message
> > 
> >   Some of the major downsides of the default simple reply to
> > @@ -410,7 +449,9 @@ considered successful only if it did not contain any error chunks,
> >   although the client MAY be able to determine partial success based
> >   on the chunks received.
> > 
> > -A structured reply chunk message looks as follows:
> > +If extended headers were not negotiated using
> > +`NBD_OPT_EXTENDED_HEADERS`, a structured reply chunk message looks as
> > +follows:
> > 
> >   S: 32 bits, 0x668e33ef, magic (`NBD_STRUCTURED_REPLY_MAGIC`)
> >   S: 16 bits, flags
> > @@ -423,6 +464,17 @@ The use of *length* in the reply allows context-free division of
> >   the overall server traffic into individual reply messages; the
> >   *type* field describes how to further interpret the payload.
> > 
> > +If extended headers were negotiated using `NBD_OPT_EXTENDED_HEADERS`,
> > +the message looks like:
> > +
> > +S: 32 bits, 0x6e8a278c, magic (`NBD_STRUCTURED_REPLY_EXT_MAGIC`)
> > +S: 16 bits, flags
> > +S: 16 bits, type
> > +S: 64 bits, handle
> > +S: 64 bits, length of payload (unsigned)
> 
> Maybe, 64bits is too much for payload. But who knows. And it's good that it's symmetric to 64bit length in request.

Indeed, both qemu and libnbd implementations explicitly kill the
connection to any server that replies with more than the max buffer
used for NBD_CMD_READ/WRITE (32M for qemu, 64M for libnbd).  And if
the spec is not already clear on the topic, I should add an
independent patch to NBD_CMD_BLOCK_STATUS to make it obvious that a
server cannot reply with too many extents because of such clients.

So none of my proof-of-concept code ever used the full 64-bits of the
reply header length.  On the other hand, there is indeed the symmetry
argument - if someone writes a server willing to accept a 4G
NBD_CMD_WRITE, then it should also support a 4G NBD_CMD_READ, even if
no known existing server or client allows buffers that large.
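
As a rough sketch of that client-side policy applied to the proposed 32-byte extended structured reply header: decode the big-endian fields, insist on zero padding, and refuse any payload length above a local cap. The struct, the function names, the 32M cap, and the choice of errno values are assumptions made for this example, not code from the qemu or libnbd series.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define EXT_REPLY_MAGIC 0x6e8a278cu   /* NBD_STRUCTURED_REPLY_EXT_MAGIC */
#define MAX_PAYLOAD     (32ull << 20) /* assumed local cap: 32M */

/* Hypothetical decoded form of the extended structured reply header. */
struct ext_chunk {
    uint16_t flags;
    uint16_t type;
    uint64_t handle;
    uint64_t length;
};

/* Read an n-byte big-endian unsigned integer (n <= 8). */
static uint64_t get_be(const uint8_t *p, int n)
{
    uint64_t v = 0;
    for (int i = 0; i < n; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/*
 * Parse a 32-byte extended header.  Returns 0 on success, -EINVAL for a
 * malformed header, or -E2BIG if the advertised payload exceeds the cap
 * (at which point a client would drop the connection).
 */
static int parse_ext_chunk(const uint8_t buf[32], struct ext_chunk *out)
{
    if (get_be(buf, 4) != EXT_REPLY_MAGIC) {
        return -EINVAL;
    }
    if (get_be(buf + 24, 8) != 0) {   /* padding MUST be zero */
        return -EINVAL;
    }
    out->flags  = (uint16_t)get_be(buf + 4, 2);
    out->type   = (uint16_t)get_be(buf + 6, 2);
    out->handle = get_be(buf + 8, 8);
    out->length = get_be(buf + 16, 8);
    if (out->length > MAX_PAYLOAD) {
        return -E2BIG;
    }
    return 0;
}
```

The cap is checked before any allocation happens, which is the point of doing it at the header layer rather than trusting the caller.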

> 
> > +S: 64 bits, padding (MUST be zero)
> 
> Hmm. Extra 8 bytes to be power-of-2. Does 32 bytes really perform better than 24 bytes?

Consider:
struct header headers[100];

If sizeof(headers[0]) is a power of 2 <= the cache line size (and the
compiler prefers to start arrays aligned to the cache line), then we
are guaranteed that each array member resides in a single cache
line.  But if it is not a power of 2, some of the array members
straddle two cache lines.

Will there be code that wants to create an array of headers?  Perhaps
so, because that is a logical way (along with scatter/gather to
combine the header with variable-sized payloads) of tracking the
headers for multiple commands issued in parallel.

Do I have actual performance numbers?  No. But there's plenty of
google hits for why sizing structs to a power of 2 is a good idea.
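
To make the argument concrete without claiming any measurement, here is a small sketch that counts how many members of a cache-line-aligned array straddle two lines for a given element size.  The 64-byte line size and the struct field names are assumptions for the example; the layout merely follows the 32-byte extended structured reply header proposed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u /* assumption: 64-byte cache lines */

/* Hypothetical in-memory form of the proposed extended reply header. */
struct ext_reply_hdr {
    uint32_t magic;
    uint16_t flags;
    uint16_t type;
    uint64_t handle;
    uint64_t length;
    uint64_t padding;
};

_Static_assert(sizeof(struct ext_reply_hdr) == 32,
               "extended header is expected to be 32 bytes");

/*
 * Count the elements of an n-element array (elements of elem_size
 * bytes, array base aligned to CACHE_LINE) whose first and last byte
 * fall in different cache lines.
 */
static size_t count_straddlers(size_t elem_size, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        size_t first = i * elem_size;
        size_t last = first + elem_size - 1;

        if (first / CACHE_LINE != last / CACHE_LINE) {
            count++;
        }
    }
    return count;
}
```

Under these assumptions, count_straddlers(32, 100) is 0, while a 24-byte header gives a nonzero count.  Whether that translates into a measurable difference is a separate question.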

> 
> > +S: *length* bytes of payload data (if *length* is nonzero)
> 
> Hmm2: we probably may move "handle" to the start of payload. This way we can keep 16bytes header for simple reply and 16bytes header for structured. So structured are read in two shots: 1. the header, 2. handle + payload.. But that means deeper restructuring of the client code.. So seems not worth it.

Right now, the handle is in the same offset for both simple and
structured replies, and for both normal and extended headers.  My
proof-of-concept for qemu always reads just the magic number and
handle, then decides how many more bytes to read (if any) (1 syscall
for simple compact headers, 2 syscalls for compact structured and for
both extended styles); while my proof-of-concept for libnbd actually
decides up front to only do a 32-byte read if extended headers are in
use for fewer syscalls.  I don't know if one way is better than the
other, but the differences in styles fell out naturally from the rest
of those code bases, and certainly anything that can be done with
fewer syscalls per transaction is going to show a modest improvement.

But you are right that repositioning the handle to live at some other
offset (including forcing it to live in the payload with a 16-byte
header, instead of having a 32-byte header) would be more invasive.
Doable?  Maybe.  That's why this is an RFC.  But unless there is a
compelling reason to try, I'd rather not go to that effort.
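
For reference, the "read a common prefix, then decide" approach can be sketched as below.  It assumes the first read covers the 16 bytes shared by every reply layout (magic, then error or flags+type, then handle) and returns how many header bytes are still outstanding.  The macro and function names are invented for the example; the magic values are the ones from the spec patch.

```c
#include <assert.h>
#include <stdint.h>

#define SIMPLE_REPLY_MAGIC         0x67446698u
#define STRUCTURED_REPLY_MAGIC     0x668e33efu
#define SIMPLE_REPLY_EXT_MAGIC     0x60d12fd6u
#define STRUCTURED_REPLY_EXT_MAGIC 0x6e8a278cu

#define COMMON_PREFIX 16 /* bytes shared by all four header layouts */

/*
 * Hypothetical helper: given the magic found in the first
 * COMMON_PREFIX bytes, return the number of header bytes still to be
 * read before any payload, or -1 for an unknown magic.
 */
static int remaining_header_bytes(uint32_t magic)
{
    switch (magic) {
    case SIMPLE_REPLY_MAGIC:
        return 0;                  /* compact simple header: 16 bytes */
    case STRUCTURED_REPLY_MAGIC:
        return 20 - COMMON_PREFIX; /* plus 32-bit length */
    case SIMPLE_REPLY_EXT_MAGIC:
    case STRUCTURED_REPLY_EXT_MAGIC:
        return 32 - COMMON_PREFIX; /* both extended headers: 32 bytes */
    default:
        return -1;
    }
}
```

A client that already knows extended headers were negotiated can skip this step and issue a single 32-byte read, which is the fewer-syscalls variant described above.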

> 
> 
> > +
> >   #### Terminating the transmission phase
> > 
> >   There are two methods of terminating the transmission phase:
> > @@ -870,15 +922,19 @@ The procedure works as follows:
> >     server supports.
> >   - During transmission, a client can then indicate interest in metadata
> >     for a given region by way of the `NBD_CMD_BLOCK_STATUS` command,
> > -  where *offset* and *length* indicate the area of interest. The
> > -  server MUST then respond with the requested information, for all
> > +  where *offset* and *length* indicate the area of interest.
> > +- The server MUST then respond with the requested information, for all
> >     contexts which were selected during negotiation. For every metadata
> > -  context, the server sends one set of extent chunks, where the sizes
> > -  of the extents MUST be less than or equal to the length as specified
> > -  in the request.
> 
> I'm not sure we can simply drop this requirement. It seems like an incompatible change, doesn't it? Maybe we should allow any size of extent only for 64bit mode?

I'm not dropping the requirement; what was listed here is redundant
with what appears elsewhere under NBD_REPLY_TYPE_BLOCK_STATUS, where
the addition of NBD_REPLY_TYPE_BLOCK_STATUS_EXT made it too wordy to
keep the redundancy here.  But yes, I can try and separate the patch
into minor cleanups separate from new additions.

...
> >   #### Structured reply types
> > 
> > -These values are used in the "type" field of a structured reply.
> > -Some chunk types can additionally be categorized by role, such as
> > -*error chunks* or *content chunks*.  Each type determines how to
> > -interpret the "length" bytes of payload.  If the client receives
> > -an unknown or unexpected type, other than an *error chunk*, it
> > -MUST initiate a hard disconnect.
> > +These values are used in the "type" field of a structured reply.  Some
> > +chunk types can additionally be categorized by role, such as *error
> > +chunks*, *content chunks*, or *status chunks*.  Each type determines
> > +how to interpret the "length" bytes of payload.  If the client
> > +receives an unknown or unexpected type, other than an *error chunk*,
> > +it MUST initiate a hard disconnect.
> 
> Just add "status chunks" to the list. Seems unrelated, better be in a separate patch.

Previously, only NBD_REPLY_TYPE_BLOCK_STATUS counted as a status chunk;
now we have two reply types with that qualification.  But I can indeed
split up the terminology addition from the addition of the second type
of status chunk.

> 
> > 
> >   * `NBD_REPLY_TYPE_NONE` (0)
> > 
> > @@ -1761,13 +1837,34 @@ MUST initiate a hard disconnect.
> >     64 bits: offset (unsigned)
> >     32 bits: hole size (unsigned, MUST be nonzero)
> > 
> > +* `NBD_REPLY_TYPE_OFFSET_HOLE_EXT` (3)
> > +
> > +  This chunk type is in the content chunk category.  *length* MUST be
> > +  exactly 16.  The semantics of this chunk mirror those of
> > +  `NBD_REPLY_TYPE_OFFSET_HOLE`, other than the use of a larger *hole
> > +  size* field.  This chunk type MUST NOT be used unless extended
> > +  headers were negotiated with `NBD_OPT_EXTENDED_HEADERS`.
> 
> Why do you call all such things _EXT, not _64 ? _64 seems more informative.

_64 would be fine with me.  As this is an RFC, the naming is not
locked in stone.

> 
> > +
> > +  The payload is structured as:
> > +
> > +  64 bits: offset (unsigned)
> > +  64 bits: hole size (unsigned, MUST be nonzero)
> > +
> > +  Note that even when extended headers are in use, a server may
> > +  enforce a maximum block size that is smaller than 32 bits, in which
> > +  case no valid `NBD_CMD_READ` will have a *length* large enough to
> s/nc/no/ ? But hard to read any way, as sounds very similar to "not valid", which breaks the meaning.
> 
> may be just "in which case valid NBD_CMD_READ will not have"

I like that.

> 
> > +  require the use of this chunk type.  However, a client using
> > +  extended headers MUST be prepared for the server to use either the
> > +  compact or extended chunk type.
> > +
> >   * `NBD_REPLY_TYPE_BLOCK_STATUS` (5)
> > 
> > -  *length* MUST be 4 + (a positive integer multiple of 8).  This reply
> > -  represents a series of consecutive block descriptors where the sum
> > -  of the length fields within the descriptors is subject to further
> > -  constraints documented below. This chunk type MUST appear
> > -  exactly once per metadata ID in a structured reply.
> > +  This chunk type is in the status chunk category.  *length* MUST be
> > +  4 + (a positive integer multiple of 8).  This reply represents a
> > +  series of consecutive block descriptors where the sum of the length
> > +  fields within the descriptors is subject to further constraints
> > +  documented below.  Each negotiated metadata ID must have exactly one
> > +  status chunk in the overall structured reply.
> 
> just rewording, no semantic changes, yes?

The change is that a reply is no longer required to have exactly one
of these (you can have a BLOCK_STATUS_EXT instead).  True, not much of
a change, but it is because of the new type.  Again, adding the notion
of exactly one status chunk per metadata id (even with only one
possible status chunk) in one patch, then adding the second status
chunk with extended headers, may be easier to review, so I'll try that
for v2.

> 
> > 
> >     The payload starts with:
> > 
> > @@ -1796,9 +1893,36 @@ MUST initiate a hard disconnect.
> >     information to the client, if looking up the information would be
> >     too resource-intensive for the server, so long as at least one
> >     extent is returned. Servers should however be aware that most
> > -  clients implementations will then simply ask for the next extent
> > +  client implementations will then simply ask for the next extent
> >     instead.
> 
> So you keep all restrictions about NBD_CMD_FLAG_REQ_ONE and about the sum of lengths of extents as is here..

Yes.

> 
> > 
> > +* `NBD_REPLY_TYPE_BLOCK_STATUS_EXT` (6)
> > +
> > +  This chunk type is in the status chunk category.  *length* MUST be
> > +  4 + (a positive multiple of 16).  The semantics of this chunk mirror
> > +  those of `NBD_REPLY_TYPE_BLOCK_STATUS`, other than the use of a
> > +  larger *extent length* field, as well as added padding to ease
> > +  alignment.
> 
> But what about restrictions on chunk lengths and cumulative chunk length?

That is supposed to still be in effect.  If I deleted that
restriction, it was unintentional.  That is, the cumulative length
(and thus each individual extent length, since no extent can be larger
than the cumulative length) is not allowed to exceed the client's
length request except in the case of the last extent, and even then
only when REQ_ONE was not in use.
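
A minimal sketch of that restriction, as I read it from the paragraph above.  The function name and the list-of-lengths representation are hypothetical, chosen only for illustration; the REQ_ONE branch (exactly one extent, not longer than the request) follows the published NBD spec rather than anything new in this RFC.

```python
G = 1 << 30  # shorthand for the sizes used in this thread


def extents_respect_request(extent_lengths, request_length, req_one=False):
    """Return True if one status chunk's extents obey the cumulative rule.

    extent_lengths: extent lengths of a single status chunk, in order.
    request_length: the length the client sent in NBD_CMD_BLOCK_STATUS.
    """
    # At least one extent, and every extent length must be nonzero.
    if not extent_lengths or any(n <= 0 for n in extent_lengths):
        return False
    if req_one:
        # With REQ_ONE: a single extent that may not exceed the request.
        return len(extent_lengths) == 1 and extent_lengths[0] <= request_length
    # Without REQ_ONE only the last extent may cross the end of the
    # request, so everything before it must stop short of the end.
    return sum(extent_lengths[:-1]) < request_length
```

For example, `extents_respect_request([2 * G, 2 * G], 3 * G)` is accepted because only the second extent crosses the 3G boundary, while the same call with `req_one=True` is rejected.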

> 
> > +  This chunk type MUST NOT be used unless extended headers
> > +  were negotiated with `NBD_OPT_EXTENDED_HEADERS`.
> > +
> > +  The payload starts with:
> > +
> > +  32 bits, metadata context ID
> > +
> > +  and is followed by a list of one or more descriptors, each with this
> > +  layout:
> > +
> > +  64 bits, length of the extent to which the status below
> > +     applies (unsigned, MUST be nonzero)
> > +  32 bits, status flags
> > +  32 bits, padding (MUST be zero)
> > +
> > +  Note that even when extended headers are in use, the client MUST be
> > +  prepared for the server to use either the compact or extended chunk
> > +  type, regardless of whether the client's hinted length was more or
> > +  less than 32 bits, but the server MUST use exactly one of the two
> > +  chunk types per negotiated metacontext ID.
> 
> But we have anyway one chunk per ID in a reply.. Or you mean that the type of reply for the ID should be selected once for the whole session?

I envision the following as valid:

OPT_SET_META_CONTEXT("base:allocation", "my:extension")
=> id 1: "base:allocation", id 2: "my:extension"
OPT_GO
...

CMD_BLOCK_STATUS(offset=0, length=3G)
=> REPLY_TYPE_BLOCK_STATUS(id=1, extent[2] { length=2G flags=0, length=2G flags=1 })
=> REPLY_TYPE_BLOCK_STATUS_EXT(id=2, extent[1] { length=3G flags=0 })
CMD_BLOCK_STATUS(offset=3G, length=6G)
=> REPLY_TYPE_BLOCK_STATUS_EXT(id=1, extent[1] { length=5G flags=0 })
=> REPLY_TYPE_BLOCK_STATUS(id=2, extent[2] { length=3.5G flags=0, length=3.5G flags=1 })

Note that the first id=1 responded with a cumulative length larger
than the client's request, and the cumulative length (4G) no longer
fits in 32 bits, but the response itself gets away with only 32-bit
extents.  The first id=2 response is < 4G, but the server chose to use
a 64-bit extent anyway.
The second id=1 response has to use the 64-bit response (because even
though 5G is shorter than the client's request for 6G, it is larger
than the 4G maximum of a 32-bit response). The second id=2 is similar
to the first id=1 in that it uses a 32-bit response even though the
cumulative length is >4G.  There is no requirement that the cumulative
lengths of the two ids be identical.  And since REQ_ONE is not in
effect, the last extent of a given extent array can cause the
cumulative value to exceed the client's request.

What is invalid:
CMD_BLOCK_STATUS(offset=0, length=3G)
=> REPLY_TYPE_BLOCK_STATUS(id=1, extent[2] { length=2G flags=0, length=2G flags=1 })
=> REPLY_TYPE_BLOCK_STATUS_EXT(id=1, extent[1] { length=3G flags=0 })

because it used two status chunks both for context id=1.

Maybe I need to add the phrase "within a given NBD_CMD_BLOCK_STATUS
response", to make it clear that exactly one status chunk per id is
chosen, but whether the server chooses a 32- or 64-bit status chunk is
dependent solely on the server's whims, and the client must be
prepared for either, regardless of the length the client used
originally.
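
As a rough sketch of the rule "exactly one status chunk per id within a given response, of either width", a client-side check could look like the following.  The tuple representation and the function name are made up for this example and are not taken from qemu or libnbd.

```python
from collections import Counter

G = 1 << 30
COMPACT = "BLOCK_STATUS"        # 32-bit extent lengths
EXTENDED = "BLOCK_STATUS_EXT"   # 64-bit extent lengths


def check_status_chunks(chunks, negotiated_ids):
    """chunks: list of (chunk_type, context_id, [extent lengths]) received
    for ONE NBD_CMD_BLOCK_STATUS request that completed successfully."""
    seen = Counter(context_id for _, context_id, _ in chunks)
    # Every negotiated context must appear, and appear exactly once,
    # regardless of which of the two chunk types the server picked.
    if set(seen) != set(negotiated_ids) or any(c != 1 for c in seen.values()):
        return False
    for kind, _, lengths in chunks:
        limit = 2**32 if kind == COMPACT else 2**64
        # Each extent must be nonzero and fit the chunk type's length field.
        if not lengths or any(not 0 < n < limit for n in lengths):
            return False
    return True
```

Fed the two valid responses above, this returns True for both; the invalid one is rejected because id=1 shows up twice (and id=2 not at all).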

...
> 
> Overall, seems good to me.

Glad to hear it!

> 
> 1. Could we move some fixes / rewordings to a preparation patch?

Yes, I'll do that for v2.

> 
> 2. I see you want also to overcome unpleasant restrictions we had around lengths / cumulative lengths of BLOCK_STATUS replies. I like the idea. But I think, it should be clarified that without 64bit extension negotiated all stay as is. And with 64bit extension negotiated, BLOCK_STATUS works in a slighter new way, so it may return what server wants, and original "length" is simply a hint. Or, at least that new behavior is only about NBD_REPLY_TYPE_BLOCK_STATUS_EXT.. Also, some clarifications may need around NBD_CMD_FLAG_REQ_ONE flag, what changes for it? You don't mention it at all in new version of BLOCK_STATUS reply.

That's not new behavior.  The client's length has always been a mere
hint to the server, where the only constraints are that the server
must make progress on success, and that if REQ_ONE is in use, the
server may not report more than the length the client asked about.

Or are you proposing that we relax REQ_ONE, and allow a server to
report additional length in 64-bit mode even when REQ_ONE is in use?
The 32-bit limitation on not sending back too much length with compact
structured replies was because qemu as client at one point would abort
if the cumulative length was too long (and qemu has always used
REQ_ONE).  But if extended headers are in use, qemu no longer aborts
on oversize answers, and no other client is new enough to have
extended headers.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH] spec: Add NBD_OPT_EXTENDED_HEADERS
  2021-12-06 23:00     ` Eric Blake
@ 2021-12-07  9:08       ` Vladimir Sementsov-Ogievskiy
  2021-12-10 18:05         ` Vladimir Sementsov-Ogievskiy
  2021-12-07 16:14       ` Wouter Verhelst
  1 sibling, 1 reply; 46+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-12-07  9:08 UTC (permalink / raw)
  To: Eric Blake; +Cc: nbd, qemu-devel, qemu-block, libguestfs, nsoffer

07.12.2021 02:00, Eric Blake wrote:
> On Mon, Dec 06, 2021 at 02:40:45PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>    #### Simple reply message
>>>
>>>    The simple reply message MUST be sent by the server in response to all
>>>    requests if structured replies have not been negotiated using
>>> -`NBD_OPT_STRUCTURED_REPLY`. If structured replies have been negotiated, a simple
>>> -reply MAY be used as a reply to any request other than `NBD_CMD_READ`,
>>> -but only if the reply has no data payload.  The message looks as
>>> -follows:
>>> +`NBD_OPT_STRUCTURED_REPLY`. If structured replies have been
>>> +negotiated, a simple reply MAY be used as a reply to any request other
>>> +than `NBD_CMD_READ`, but only if the reply has no data payload.  If
>>> +extended headers were not negotiated using `NBD_OPT_EXTENDED_HEADERS`,
>>> +the message looks as follows:
>>>
>>>    S: 32 bits, 0x67446698, magic (`NBD_SIMPLE_REPLY_MAGIC`; used to be
>>>       `NBD_REPLY_MAGIC`)
>>> @@ -369,6 +398,16 @@ S: 64 bits, handle
>>>    S: (*length* bytes of data if the request is of type `NBD_CMD_READ` and
>>>        *error* is zero)
>>>
>>> +If extended headers were negotiated using `NBD_OPT_EXTENDED_HEADERS`,
>>> +the message looks like:
>>> +
>>> +S: 32 bits, 0x60d12fd6, magic (`NBD_SIMPLE_REPLY_EXT_MAGIC`)
>>> +S: 32 bits, error (MAY be zero)
>>> +S: 64 bits, handle
>>> +S: 128 bits, padding (MUST be zero)
>>> +S: (*length* bytes of data if the request is of type `NBD_CMD_READ` and
>>> +    *error* is zero)
>>> +
>>
>> If we go this way, let's put payload length into padding: it will help to make the protocol context-independent and less error-prone.
> 
> Easy enough to do (the payload length will be 0 except for
> NBD_CMD_READ).
> 
>>
>> Or, the otherway, may be just forbid the payload for simple-64bit ? What's the reason to allow 64bit requests without structured reply negotiation?
> 
> The two happened to be orthogonal enough in my implementation.  It was
> easy to demonstrate either one without the other, and it IS easier to
> write a client using non-structured replies (structured reads ARE
> tougher than simple reads, even if it is less efficient when it comes
> to reading zeros).  But you are also right that we could require
> structured reads prior to allowing 64-bit operations, and then have
> only one supported reply type on the wire when negotiated.  Wouter,
> which way do you prefer?
> 
>>
>>>    #### Structured reply chunk message
>>>
>>>    Some of the major downsides of the default simple reply to
>>> @@ -410,7 +449,9 @@ considered successful only if it did not contain any error chunks,
>>>    although the client MAY be able to determine partial success based
>>>    on the chunks received.
>>>
>>> -A structured reply chunk message looks as follows:
>>> +If extended headers were not negotiated using
>>> +`NBD_OPT_EXTENDED_HEADERS`, a structured reply chunk message looks as
>>> +follows:
>>>
>>>    S: 32 bits, 0x668e33ef, magic (`NBD_STRUCTURED_REPLY_MAGIC`)
>>>    S: 16 bits, flags
>>> @@ -423,6 +464,17 @@ The use of *length* in the reply allows context-free division of
>>>    the overall server traffic into individual reply messages; the
>>>    *type* field describes how to further interpret the payload.
>>>
>>> +If extended headers were negotiated using `NBD_OPT_EXTENDED_HEADERS`,
>>> +the message looks like:
>>> +
>>> +S: 32 bits, 0x6e8a278c, magic (`NBD_STRUCTURED_REPLY_EXT_MAGIC`)
>>> +S: 16 bits, flags
>>> +S: 16 bits, type
>>> +S: 64 bits, handle
>>> +S: 64 bits, length of payload (unsigned)
>>
>> Maybe, 64bits is too much for payload. But who knows. And it's good that it's symmetric to 64bit length in request.
> 
> Indeed, both qemu and libnbd implementations explicitly kill the
> connection to any server that replies with more than the max buffer
> used for NBD_CMD_READ/WRITE (32M for qemu, 64M for libnbd).  And if
> the spec is not already clear on the topic, I should add an
> independent patch to NBD_CMD_BLOCK_STATUS to make it obvious that a
> server cannot reply with too many extents because of such clients.
> 
> So none of my proof-of-concept code ever used the full 64-bits of the
> reply header length.  On the other hand, there is indeed the symmetry
> argument - if someone writes a server willing to accept a 4G
> NBD_CMD_WRITE, then it should also support a 4G NBD_CMD_READ, even if
> no known existing server or client allows buffers that large..
> 
>>
>>> +S: 64 bits, padding (MUST be zero)
>>
>> Hmm. Extra 8 bytes to be power-of-2. Does 32 bytes really perform better than 24 bytes?
> 
> Consider:
> struct header[100];
> 
> if sizeof(header[0]) is a power of 2 <= the cache line size (and the
> compiler prefers to start arrays aligned to the cache line) then we
> are guaranteed that all array members each reside in a single cache
> line.  But if it is not a power of 2, some of the array members
> straddle two cache lines.
> 
> Will there be code that wants to create an array of headers?  Perhaps
> so, because that is a logical way (along with scatter/gather to
> combine the header with variable-sized payloads) of tracking the
> headers for multiple commands issued in parallel.
> 
> Do I have actual performance numbers?  No. But there's plenty of
> google hits for why sizing structs to a power of 2 is a good idea.
> 
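The alignment argument quoted above can be checked with a little arithmetic.  This is only a sketch: it assumes a 64-byte cache line and an array that starts on a line boundary, and the helper name is hypothetical.

```python
def straddlers(elem_size, count=100, line=64):
    """Count array elements whose first and last byte fall in different
    cache lines, for an array that starts on a cache-line boundary."""
    return sum(
        (i * elem_size) // line != (i * elem_size + elem_size - 1) // line
        for i in range(count)
    )
```

With these assumptions, `straddlers(32)` is 0, while `straddlers(24)` is 25: a quarter of the 24-byte headers in a 100-element array would span two lines.  Whether that shows up in measurements is a separate question.
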
>>
>>> +S: *length* bytes of payload data (if *length* is nonzero)
>>
>> Hmm2: we probably may move "handle" to the start of payload. This way we can keep 16bytes header for simple reply and 16bytes header for structured. So structured are read in two shots: 1. the header, 2. handle + payload.. But that means deeper restructuring of the client code.. So seems not worth it.
> 
> Right now, the handle is in the same offset for both simple and
> structured replies, and for both normal and extended headers.  My
> proof-of-concept for qemu always reads just the magic number and
> handle, then decides how many more bytes to read (if any) (1 syscall
> for simple compact headers, 2 syscalls for compact structured and for
> both extended styles); while my proof-of-concept for libnbd actually
> decides up front to only do a 32-byte read if extended headers are in
> use for fewer syscalls.  I don't know if one way is better than the
> other, but the differences in styles fell out naturally from the rest
> of those code bases, and certainly anything that can be done with
> fewer syscalls per transaction is going to show a modest improvement.
> 
> But you are right that repositioning the handle to live at some other
> offset (including forcing it to live in the payload with a 16-byte
> header, instead of having a 32-byte header) would be more invasive.
> Doable?  Maybe.  That's why this is an RFC.  But unless there is a
> compelling reason to try, I'd rather not go to that effort.
> 
>>
>>
>>> +
>>>    #### Terminating the transmission phase
>>>
>>>    There are two methods of terminating the transmission phase:
>>> @@ -870,15 +922,19 @@ The procedure works as follows:
>>>      server supports.
>>>    - During transmission, a client can then indicate interest in metadata
>>>      for a given region by way of the `NBD_CMD_BLOCK_STATUS` command,
>>> -  where *offset* and *length* indicate the area of interest. The
>>> -  server MUST then respond with the requested information, for all
>>> +  where *offset* and *length* indicate the area of interest.
>>> +- The server MUST then respond with the requested information, for all
>>>      contexts which were selected during negotiation. For every metadata
>>> -  context, the server sends one set of extent chunks, where the sizes
>>> -  of the extents MUST be less than or equal to the length as specified
>>> -  in the request.
>>
>> I'm not sure we can simply drop this requirement.. It seems like an incompatible change, isn't it? May be, we should allow any size of extent only for 64bit mode?
> 
> I'm not dropping the requirement; 

Hmm.

First, the sentence restricts all extents to be no larger than the requested length. But I think we actually don't want to restrict the last extent, so this is just a mistake in the spec. Can we simply drop it, without caring about possible implementations that check that every extent, including the last one, is no larger than the length? This matters in the case where the server can reply with only one extent, which covers more than the requested length, and REQ_ONE is not set.

Second, you add a global statement that "size is only a hint". But formally that's not true: we do have some restrictions.

> what was listed here is redundant
> with what appears elsewhere under NBD_REPLY_TYPE_BLOCK_STATUS, where
> the addition of NBD_REPLY_TYPE_BLOCK_STATUS_EXT made it too wordy to
> keep the redundancy here.  But yes, I can try and separate the patch
> into minor cleanups separate from new additions.
> 
> ...
>>>    #### Structured reply types
>>>
>>> -These values are used in the "type" field of a structured reply.
>>> -Some chunk types can additionally be categorized by role, such as
>>> -*error chunks* or *content chunks*.  Each type determines how to
>>> -interpret the "length" bytes of payload.  If the client receives
>>> -an unknown or unexpected type, other than an *error chunk*, it
>>> -MUST initiate a hard disconnect.
>>> +These values are used in the "type" field of a structured reply.  Some
>>> +chunk types can additionally be categorized by role, such as *error
>>> +chunks*, *content chunks*, or *status chunks*.  Each type determines
>>> +how to interpret the "length" bytes of payload.  If the client
>>> +receives an unknown or unexpected type, other than an *error chunk*,
>>> +it MUST initiate a hard disconnect.
>>
>> Just add "status chunks" to the list. Seems unrelated, better be in a separate patch.
> 
> Previously, only NBD_REPLY_TYPE_BLOCK_STATUS counted as a status chunk;
> now we have two reply types with that qualification.  But I can indeed
> split up the terminology addition from the addition of the second type
> of status chunk.
> 
>>
>>>
>>>    * `NBD_REPLY_TYPE_NONE` (0)
>>>
>>> @@ -1761,13 +1837,34 @@ MUST initiate a hard disconnect.
>>>      64 bits: offset (unsigned)
>>>      32 bits: hole size (unsigned, MUST be nonzero)
>>>
>>> +* `NBD_REPLY_TYPE_OFFSET_HOLE_EXT` (3)
>>> +
>>> +  This chunk type is in the content chunk category.  *length* MUST be
>>> +  exactly 16.  The semantics of this chunk mirror those of
>>> +  `NBD_REPLY_TYPE_OFFSET_HOLE`, other than the use of a larger *hole
>>> +  size* field.  This chunk type MUST NOT be used unless extended
>>> +  headers were negotiated with `NBD_OPT_EXTENDED_HEADERS`.
>>
>> Why do you call all such things _EXT, not _64 ? _64 seems more informative.
> 
> _64 would be fine with me.  As this is an RFC, the naming is not
> locked in stone.
> 
>>
>>> +
>>> +  The payload is structured as:
>>> +
>>> +  64 bits: offset (unsigned)
>>> +  64 bits: hole size (unsigned, MUST be nonzero)
>>> +
>>> +  Note that even when extended headers are in use, a server may
>>> +  enforce a maximum block size that is smaller than 32 bits, in which
>>> +  case no valid `NBD_CMD_READ` will have a *length* large enough to
>> s/nc/no/ ? But hard to read any way, as sounds very similar to "not valid", which breaks the meaning.
>>
>> may be just "in which case valid NBD_CMD_READ will not have"
> 
> I like that.
> 
>>
>>> +  require the use of this chunk type.  However, a client using
>>> +  extended headers MUST be prepared for the server to use either the
>>> +  compact or extended chunk type.
>>> +
>>>    * `NBD_REPLY_TYPE_BLOCK_STATUS` (5)
>>>
>>> -  *length* MUST be 4 + (a positive integer multiple of 8).  This reply
>>> -  represents a series of consecutive block descriptors where the sum
>>> -  of the length fields within the descriptors is subject to further
>>> -  constraints documented below. This chunk type MUST appear
>>> -  exactly once per metadata ID in a structured reply.
>>> +  This chunk type is in the status chunk category.  *length* MUST be
>>> +  4 + (a positive integer multiple of 8).  This reply represents a
>>> +  series of consecutive block descriptors where the sum of the length
>>> +  fields within the descriptors is subject to further constraints
>>> +  documented below.  Each negotiated metadata ID must have exactly one
>>> +  status chunk in the overall structured reply.
>>
>> just rewording, no semantic changes, yes?
> 
> The change is that a reply is no longer required to have exactly one
> of these (you can have a BLOCK_STATUS_EXT instead).  True, not much of
> a change, but it is because of the new type.  Again, adding the notion
> of exactly one status chunk per metadata id (even with only one
> possible status chunk) in one patch, then adding the second status
> chunk with extended headers, may be easier to review, so I'll try that
> for v2.
> 
>>
>>>
>>>      The payload starts with:
>>>
>>> @@ -1796,9 +1893,36 @@ MUST initiate a hard disconnect.
>>>      information to the client, if looking up the information would be
>>>      too resource-intensive for the server, so long as at least one
>>>      extent is returned. Servers should however be aware that most
>>> -  clients implementations will then simply ask for the next extent
>>> +  client implementations will then simply ask for the next extent
>>>      instead.
>>
>> So you keep all restrictions about NBD_CMD_FLAG_REQ_ONE and about the sum of lengths of extents as is here..
> 
> Yes.
> 
>>
>>>
>>> +* `NBD_REPLY_TYPE_BLOCK_STATUS_EXT` (6)
>>> +
>>> +  This chunk type is in the status chunk category.  *length* MUST be
>>> +  4 + (a positive multiple of 16).  The semantics of this chunk mirror
>>> +  those of `NBD_REPLY_TYPE_BLOCK_STATUS`, other than the use of a
>>> +  larger *extent length* field, as well as added padding to ease
>>> +  alignment.
>>
>> But what about restrictions on chunk lengths and cumulative chunk length?
> 
> That is supposed to still be in effect.  If I deleted that
> restriction, it was unintentional.  That is, the cumulative length
> (and thus each individual extent length, since no extent can be larger
> than the cumulative length) is not allowed to exceed the client's
> length request except in the case of the last extent, and even then
> only when REQ_ONE was not in use.
> 
>>
>>> +  This chunk type MUST NOT be used unless extended headers
>>> +  were negotiated with `NBD_OPT_EXTENDED_HEADERS`.
>>> +
>>> +  The payload starts with:
>>> +
>>> +  32 bits, metadata context ID
>>> +
>>> +  and is followed by a list of one or more descriptors, each with this
>>> +  layout:
>>> +
>>> +  64 bits, length of the extent to which the status below
>>> +     applies (unsigned, MUST be nonzero)
>>> +  32 bits, status flags
>>> +  32 bits, padding (MUST be zero)
>>> +
>>> +  Note that even when extended headers are in use, the client MUST be
>>> +  prepared for the server to use either the compact or extended chunk
>>> +  type, regardless of whether the client's hinted length was more or
>>> +  less than 32 bits, but the server MUST use exactly one of the two
>>> +  chunk types per negotiated metacontext ID.
>>
>> But we have anyway one chunk per ID in a reply.. Or you mean that the type of reply for the ID should be selected once for the whole session?
> 
> I envision the following as valid:
> 
> OPT_SET_META_CONTEXT("base:allocation", "my:extension")
> => id 1: "base:allocation", id 2: "my:extension"
> OPT_GO
> ...
> 
> CMD_BLOCK_STATUS(offset=0, length=3G)
> => REPLY_TYPE_BLOCK_STATUS(id=1, extent[2] { length=2G flags=0, length=2G flags=1 })
> => REPLY_TYPE_BLOCK_STATUS_EXT(id=2, extent[1] { length=3G flags=0 })
> CMD_BLOCK_STATUS(offset=3G, length=6G)
> => REPLY_TYPE_BLOCK_STATUS_EXT(id=1, extent[1] { length=5G flags=0 })
> => REPLY_TYPE_BLOCK_STATUS(id=2, extent[2] { length=3.5G flags=0, length=3.5G flags=1 })
> 
> Note that the first id=1 responded with a cumulative length larger
> than the client's request, and the cumulative length (4G) no longer
> fits in 32 bits, but the response itself gets away with only 32-bit
> extents.  The first id=2 response is < 4G, but the server chose to
> use a 64-bit extent anyway.
> The second id=1 response has to use the 64-bit response (because even
> though 5G is shorter than the client's request for 6G, it is larger
> than the 4G maximum of a 32-bit response). The second id=2 is similar
> to the first id=1 in that it uses a 32-bit response even though the
> cumulative length is >4G.  There is no requirement that the cumulative
> lengths of the two ids be identical.  And since REQ_ONE is not in
> effect, the last extent of a given extent array can cause the
> cumulative value to exceed the client's request.
> 
> What is invalid:
> CMD_BLOCK_STATUS(offset=0, length=3G)
> => REPLY_TYPE_BLOCK_STATUS(id=1, extent[2] { length=2G flags=0, length=2G flags=1 })
> => REPLY_TYPE_BLOCK_STATUS_EXT(id=1, extent[1] { length=3G flags=0 })
> 
> because it used two status chunks both for context id=1.
> 
> Maybe I need to add the phrase "within a given NBD_CMD_BLOCK_STATUS
> response", to make it clear that exactly one status chunk per id is
> chosen, but whether the server chooses a 32- or 64-bit status chunk is
> dependent solely on the server's whims, and the client must be
> prepared for either, regardless of the length the client used
> originally.


Ah, I understand, thanks. Somehow I was thinking of REPLY_TYPE_BLOCK_STATUS and REPLY_TYPE_BLOCK_STATUS_EXT as one object, and I thought that the old "This chunk type MUST appear exactly once per metadata ID in a structured reply" of NBD_REPLY_TYPE_BLOCK_STATUS was enough. But yes, REPLY_TYPE_BLOCK_STATUS_EXT is a new chunk type and we need a clarification. Yes, "within a given NBD_CMD_BLOCK_STATUS response" helps.
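
To make the new chunk type concrete, here is a hedged sketch of parsing the REPLY_TYPE_BLOCK_STATUS_EXT payload exactly as laid out in the quoted RFC text (32-bit context ID, then descriptors of 64-bit length, 32-bit flags, 32-bit zero padding).  The function name is hypothetical, and the field order could still change in a later revision of the proposal.

```python
import struct


def parse_block_status_ext(payload):
    """Parse one BLOCK_STATUS_EXT payload into (context_id, extents)."""
    # *length* MUST be 4 + (a positive multiple of 16).
    if len(payload) < 20 or (len(payload) - 4) % 16:
        raise ValueError("payload must be 4 + a positive multiple of 16 bytes")
    # NBD fields are big-endian on the wire.
    (context_id,) = struct.unpack_from(">I", payload, 0)
    extents = []
    for offset in range(4, len(payload), 16):
        length, flags, padding = struct.unpack_from(">QII", payload, offset)
        if length == 0:
            raise ValueError("extent length MUST be nonzero")
        if padding != 0:
            raise ValueError("padding MUST be zero")
        extents.append((length, flags))
    return context_id, extents
```

A compact REPLY_TYPE_BLOCK_STATUS payload would be parsed the same way with 8-byte descriptors (`">II"`), which is why a client that negotiated extended headers needs both code paths.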

> 
> ...
>>
>> Overall, seems good to me.
> 
> Glad to hear it!
> 
>>
>> 1. Could we move some fixes / rewordings to a preparation patch?
> 
> Yes, I'll do that for v2.
> 
>>
>> 2. I see you want also to overcome unpleasant restrictions we had around lengths / cumulative lengths of BLOCK_STATUS replies. I like the idea. But I think, it should be clarified that without 64bit extension negotiated all stay as is. And with 64bit extension negotiated, BLOCK_STATUS works in a slighter new way, so it may return what server wants, and original "length" is simply a hint. Or, at least that new behavior is only about NBD_REPLY_TYPE_BLOCK_STATUS_EXT.. Also, some clarifications may need around NBD_CMD_FLAG_REQ_ONE flag, what changes for it? You don't mention it at all in new version of BLOCK_STATUS reply.
> 
> That's not new behavior.  The client's length has always been a mere
> hint to the server, where the only constraints are that the server
> must make progress on success, and that if REQ_ONE is in use, the
> server may not report more than the length the client asked about.
> 
> Or are you proposing that we relax REQ_ONE, and allow a server to
> report additional length in 64-bit mode even when REQ_ONE is in use?
> The 32-bit limitation on not sending back too much length with compact
> structured replies was because qemu as client at one point would abort
> if the cumulative length was too long (and qemu has always used
> REQ_ONE).  But if extended headers are in use, qemu no longer aborts
> on oversize answers, and no other client is new enough to have
> extended headers.
> 

I mistakenly thought that you wanted to do that :) OK, then we can consider it my proposal: we are going to alter block-status behavior in an incompatible way with the new extension, so why not overcome some inconvenient limitations?

So, what restrictions do we have:

1. The LAST extent must be <= the requested length. This comes from the sentence "For every metadata context, the server sends one set of extent chunks, where the sizes of the extents MUST be less than or equal to the length"

In this patch you silently remove this mistaken limitation. I'm not sure that we can do that. But if we are sure that known clients are not affected, we probably can.
If we can't simply remove it, we can still fix it when extended headers are negotiated.

2. Only the last extent can cross the boundary of the request.

Actually, why do we need it? What if, in some implementation, the server has some more extents available for free? We can drop this restriction when extended headers are negotiated.

3. With REQ_ONE, the returned extent can't cross the boundary.

Recently we had patches and a discussion on the list about overcoming this restriction by requesting more than we actually need; I argued against having this logic in the client.
If we allow the last extent to cross the boundary when the new extension is negotiated, great: we can use this feature in qemu immediately (whereas full support for multi-extent block-status requires a lot more work).

-- 
Best regards,
Vladimir



* Re: [PATCH] spec: Add NBD_OPT_EXTENDED_HEADERS
  2021-12-06 23:00     ` Eric Blake
  2021-12-07  9:08       ` Vladimir Sementsov-Ogievskiy
@ 2021-12-07 16:14       ` Wouter Verhelst
  2022-03-22 15:10         ` Eric Blake
  1 sibling, 1 reply; 46+ messages in thread
From: Wouter Verhelst @ 2021-12-07 16:14 UTC (permalink / raw)
  To: Eric Blake
  Cc: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel, nbd,
	nsoffer, libguestfs

On Mon, Dec 06, 2021 at 05:00:47PM -0600, Eric Blake wrote:
> On Mon, Dec 06, 2021 at 02:40:45PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > >   #### Simple reply message
> > > 
> > >   The simple reply message MUST be sent by the server in response to all
> > >   requests if structured replies have not been negotiated using
> > > -`NBD_OPT_STRUCTURED_REPLY`. If structured replies have been negotiated, a simple
> > > -reply MAY be used as a reply to any request other than `NBD_CMD_READ`,
> > > -but only if the reply has no data payload.  The message looks as
> > > -follows:
> > > +`NBD_OPT_STRUCTURED_REPLY`. If structured replies have been
> > > +negotiated, a simple reply MAY be used as a reply to any request other
> > > +than `NBD_CMD_READ`, but only if the reply has no data payload.  If
> > > +extended headers were not negotiated using `NBD_OPT_EXTENDED_HEADERS`,
> > > +the message looks as follows:
> > > 
> > >   S: 32 bits, 0x67446698, magic (`NBD_SIMPLE_REPLY_MAGIC`; used to be
> > >      `NBD_REPLY_MAGIC`)
> > > @@ -369,6 +398,16 @@ S: 64 bits, handle
> > >   S: (*length* bytes of data if the request is of type `NBD_CMD_READ` and
> > >       *error* is zero)
> > > 
> > > +If extended headers were negotiated using `NBD_OPT_EXTENDED_HEADERS`,
> > > +the message looks like:
> > > +
> > > +S: 32 bits, 0x60d12fd6, magic (`NBD_SIMPLE_REPLY_EXT_MAGIC`)
> > > +S: 32 bits, error (MAY be zero)
> > > +S: 64 bits, handle
> > > +S: 128 bits, padding (MUST be zero)
> > > +S: (*length* bytes of data if the request is of type `NBD_CMD_READ` and
> > > +    *error* is zero)
> > > +
> > 
> > If we go this way, let's put payload length into padding: it will help to make the protocol context-independent and less error-prone.

Agreed.

> Easy enough to do (the payload length will be 0 except for
> NBD_CMD_READ).

Indeed.
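
For reference, a small sketch of the extended simple reply header as posted in the RFC (that is, before the change agreed above to carry a payload length in what is now padding).  The magic value and field widths come from the quoted text; the helper names are hypothetical.

```python
import struct

NBD_SIMPLE_REPLY_EXT_MAGIC = 0x60D12FD6  # value from the RFC text

# Big-endian: 32-bit magic, 32-bit error, 64-bit handle, 128 bits of padding.
_EXT_SIMPLE = struct.Struct(">IIQ16x")


def pack_ext_simple_reply(error, handle):
    """Build the 32-byte header; struct fills the padding with zero bytes."""
    return _EXT_SIMPLE.pack(NBD_SIMPLE_REPLY_EXT_MAGIC, error, handle)


def unpack_ext_simple_reply(buf):
    """Return (error, handle), rejecting a bad magic or nonzero padding."""
    magic, error, handle = _EXT_SIMPLE.unpack(buf)
    if magic != NBD_SIMPLE_REPLY_EXT_MAGIC:
        raise ValueError("not an extended simple reply")
    if any(buf[16:]):
        raise ValueError("padding MUST be zero")
    return error, handle
```

The header is a fixed 32 bytes, the same size as the extended structured reply header.  If the payload length moves into the padding as suggested, only the format string and the padding check would need to change.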

> > Or, the otherway, may be just forbid the payload for simple-64bit ? What's the reason to allow 64bit requests without structured reply negotiation?
> 
> The two happened to be orthogonal enough in my implementation.  It was
> easy to demonstrate either one without the other, and it IS easier to
> write a client using non-structured replies (structured reads ARE
> tougher than simple reads, even if it is less efficient when it comes
> to reading zeros).  But you are also right that we could require
> structured reads prior to allowing 64-bit operations, and then have
> only one supported reply type on the wire when negotiated.  Wouter,
> which way do you prefer?

Given that I *still* haven't gotten around to implementing structured
replies for nbd-server, I'd prefer not to require it, but that's not
really a decent argument IMO :-)

[... I haven't read this in much detail yet, intend to do that later...]

-- 
     w@uter.{be,co.za}
wouter@{grep.be,fosdem.org,debian.org}



* Re: [Libguestfs] [libnbd PATCH 00/13] libnbd patches for NBD_OPT_EXTENDED_HEADERS
  2021-12-03 23:17 ` [libnbd PATCH 00/13] libnbd patches for NBD_OPT_EXTENDED_HEADERS Eric Blake
                     ` (12 preceding siblings ...)
  2021-12-03 23:17   ` [libnbd PATCH 13/13] interop: Add test of 64-bit block status Eric Blake
@ 2021-12-10  8:16   ` Laszlo Ersek
  13 siblings, 0 replies; 46+ messages in thread
From: Laszlo Ersek @ 2021-12-10  8:16 UTC (permalink / raw)
  To: Eric Blake, libguestfs; +Cc: vsementsov, qemu-devel, qemu-block, nbd

On 12/04/21 00:17, Eric Blake wrote:
> Available here: https://repo.or.cz/libnbd/ericb.git/shortlog/refs/tags/exthdr-v1
> 
> I also want to do followup patches to teach 'nbdinfo --map' and
> 'nbdcopy' to utilize 64-bit extents.
> 
> Eric Blake (13):
>   golang: Simplify nbd_block_status callback array copy
>   block_status: Refactor array storage
>   protocol: Add definitions for extended headers
>   protocol: Prepare to send 64-bit requests
>   protocol: Prepare to receive 64-bit replies
>   protocol: Accept 64-bit holes during pread
>   generator: Add struct nbd_extent in prep for 64-bit extents
>   block_status: Track 64-bit extents internally
>   block_status: Accept 64-bit extents during block status
>   api: Add [aio_]nbd_block_status_64
>   api: Add three functions for controlling extended headers
>   generator: Actually request extended headers
>   interop: Add test of 64-bit block status
> 
>  lib/internal.h                                |  31 ++-
>  lib/nbd-protocol.h                            |  61 ++++-
>  generator/API.ml                              | 237 ++++++++++++++++--
>  generator/API.mli                             |   3 +-
>  generator/C.ml                                |  24 +-
>  generator/GoLang.ml                           |  35 ++-
>  generator/Makefile.am                         |   3 +-
>  generator/OCaml.ml                            |  20 +-
>  generator/Python.ml                           |  29 ++-
>  generator/state_machine.ml                    |  52 +++-
>  generator/states-issue-command.c              |  31 ++-
>  .../states-newstyle-opt-extended-headers.c    |  90 +++++++
>  generator/states-newstyle-opt-starttls.c      |  10 +-
>  generator/states-reply-structured.c           | 220 ++++++++++++----
>  generator/states-reply.c                      |  31 ++-
>  lib/handle.c                                  |  27 +-
>  lib/rw.c                                      | 105 +++++++-
>  python/t/110-defaults.py                      |   3 +-
>  python/t/120-set-non-defaults.py              |   4 +-
>  python/t/465-block-status-64.py               |  56 +++++
>  ocaml/helpers.c                               |  22 +-
>  ocaml/nbd-c.h                                 |   3 +-
>  ocaml/tests/Makefile.am                       |   5 +-
>  ocaml/tests/test_110_defaults.ml              |   4 +-
>  ocaml/tests/test_120_set_non_defaults.ml      |   5 +-
>  ocaml/tests/test_465_block_status_64.ml       |  58 +++++
>  tests/meta-base-allocation.c                  | 111 +++++++-
>  interop/Makefile.am                           |   6 +
>  interop/large-status.c                        | 186 ++++++++++++++
>  interop/large-status.sh                       |  49 ++++
>  .gitignore                                    |   1 +
>  golang/Makefile.am                            |   3 +-
>  golang/handle.go                              |   6 +
>  golang/libnbd_110_defaults_test.go            |   8 +
>  golang/libnbd_120_set_non_defaults_test.go    |  12 +
>  golang/libnbd_465_block_status_64_test.go     | 119 +++++++++
>  36 files changed, 1511 insertions(+), 159 deletions(-)
>  create mode 100644 generator/states-newstyle-opt-extended-headers.c
>  create mode 100644 python/t/465-block-status-64.py
>  create mode 100644 ocaml/tests/test_465_block_status_64.ml
>  create mode 100644 interop/large-status.c
>  create mode 100755 interop/large-status.sh
>  create mode 100644 golang/libnbd_465_block_status_64_test.go
> 

I figured I should slowly / gradually review this series, and as a
*pre-requisite* for it, first apply the spec patch, and then read
through the spec with something like

$ git show --color -U1000

In other words, read the whole spec, with just the new additions highlighted.

Now, I see Vladimir has made several comments on the spec patch; will
those comments necessitate a respin of the libnbd series? If so, how
intrusive are the changes going to be? I'm hesitant to start my review
if significant changes are already foreseen.

Thanks,
Laszlo




* Re: [PATCH] spec: Add NBD_OPT_EXTENDED_HEADERS
  2021-12-07  9:08       ` Vladimir Sementsov-Ogievskiy
@ 2021-12-10 18:05         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 46+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-12-10 18:05 UTC (permalink / raw)
  To: Eric Blake; +Cc: nbd, qemu-devel, qemu-block, libguestfs, nsoffer

07.12.2021 12:08, Vladimir Sementsov-Ogievskiy wrote:
> 07.12.2021 02:00, Eric Blake wrote:
>> On Mon, Dec 06, 2021 at 02:40:45PM +0300, Vladimir Sementsov-Ogievskiy wrote:

[..]

>>>
>>>> +S: 64 bits, padding (MUST be zero)
>>>
>>> Hmm. Extra 8 bytes to be power-of-2. Does 32 bytes really perform better than 24 bytes?
>>
>> Consider:
>> struct header[100];
>>
>> if sizeof(header[0]) is a power of 2 <= the cache line size (and the
>> compiler prefers to start arrays aligned to the cache line) then we
>> are guaranteed that all array members each reside in a single cache
>> line.  But if it is not a power of 2, some of the array members
>> straddle two cache lines.
>>
>> Will there be code that wants to create an array of headers?  Perhaps
>> so, because that is a logical way (along with scatter/gather to
>> combine the header with variable-sized payloads) of tracking the
>> headers for multiple commands issued in parallel.
>>
>> Do I have actual performance numbers?  No. But there are plenty of
>> Google hits for why sizing structs to a power of 2 is a good idea.

I have a thought:

If the client stores headers separately, nothing prevents it from adding this padding in RAM. So you can define the header struct with padding. But what is the reason to put the padding in the stream? You can have an array of well-aligned structures and fill only part of each header structure when reading from the socket. Note that you can read only one header per read() call anyway, because you have to check whether it has a payload or not.

So, if we want to improve performance by padding the structures in RAM, that is not a reason for padding them on the wire, keeping in mind that we can't read more than one structure at once.
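
A minimal sketch of that idea in C, not taken from any of the posted patches. The field layout, the 24-byte wire size, and the names `hdr_slot`, `decode_wire_hdr` and `straddles_cache_line` are assumptions made for illustration; only the 24-versus-32-byte question comes from the discussion above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WIRE_HDR_LEN 24u   /* hypothetical unpadded on-wire header size */

/* In-memory slot: 24 bytes of fields plus 8 bytes of RAM-only padding,
 * so that an array of slots has a power-of-2 element size. */
struct hdr_slot {
    uint32_t magic;
    uint16_t flags;
    uint16_t type;
    uint64_t handle;
    uint64_t length;
    uint8_t  ram_pad[8];   /* never sent or received */
};
static_assert(sizeof(struct hdr_slot) == 32, "slot size must be 32 bytes");

/* Read an nbytes-wide big-endian (network order) value starting at p. */
uint64_t load_be(const uint8_t *p, int nbytes)
{
    uint64_t v = 0;
    for (int i = 0; i < nbytes; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Decode the 24 bytes read from the socket into a padded 32-byte slot. */
void decode_wire_hdr(const uint8_t wire[WIRE_HDR_LEN], struct hdr_slot *slot)
{
    slot->magic  = (uint32_t)load_be(wire, 4);
    slot->flags  = (uint16_t)load_be(wire + 4, 2);
    slot->type   = (uint16_t)load_be(wire + 6, 2);
    slot->handle = load_be(wire + 8, 8);
    slot->length = load_be(wire + 16, 8);
    memset(slot->ram_pad, 0, sizeof slot->ram_pad);
}

/* Does element `index` of a line-aligned array cross a cache-line boundary? */
int straddles_cache_line(size_t index, size_t elem_size, size_t line_size)
{
    size_t first = index * elem_size;
    size_t last  = first + elem_size - 1;
    return first / line_size != last / line_size;
}
```

With 64-byte cache lines, element 2 of an array of 24-byte structs covers bytes 48..71 and straddles two lines, while 32-byte slots never do; the sketch gets that property in RAM while reading only 24 bytes per header from the socket.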


-- 
Best regards,
Vladimir



* Re: [PATCH] spec: Add NBD_OPT_EXTENDED_HEADERS
  2021-12-03 23:14 ` [PATCH] spec: Add NBD_OPT_EXTENDED_HEADERS Eric Blake
  2021-12-06 11:40   ` Vladimir Sementsov-Ogievskiy
@ 2021-12-10 18:16   ` Vladimir Sementsov-Ogievskiy
  2022-03-24 17:31   ` Wouter Verhelst
  2022-10-04 21:21   ` Eric Blake
  3 siblings, 0 replies; 46+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-12-10 18:16 UTC (permalink / raw)
  To: Eric Blake, nbd; +Cc: qemu-devel, qemu-block, libguestfs, nsoffer

04.12.2021 02:14, Eric Blake wrote:
> Add a new negotiation feature where the client and server agree to use
> larger packet headers on every packet sent during transmission phase.
> This has two purposes: first, it makes it possible to perform
> operations like trim, write zeroes, and block status on more than 2^32
> bytes in a single command; this in turn requires that some structured
> replies from the server also be extended to match.  The wording chosen
> here is careful to permit a server to use either flavor in its reply
> (that is, a request less than 32-bits can trigger an extended reply,
> and conversely a request larger than 32-bits can trigger a compact
> reply).


About this: isn't it too permissive?

I think that having two very similar ways to do the same thing is usually bad design. We don't want someone to implement logic that sends 32-bit commands/replies for small requests and 64-bit commands/replies for larger ones, do we? Moreover, you don't allow this for commands. So, for symmetry, it may be good to be strict with replies too: in 64-bit mode, only 64-bit replies.

Of course, for now we have to support both the old 32-bit commands and the new 64-bit commands. But maybe we'll want to deprecate 32-bit commands at some point? I'm not sure we can deprecate them in the protocol, but we can at least deprecate them in QEMU. Then, several years later, we can drop the old code and keep support only for 64-bit commands. Fewer code paths, fewer near-duplicate structures, simpler code: I think it is worth it.


-- 
Best regards,
Vladimir



* Re: [PATCH] spec: Add NBD_OPT_EXTENDED_HEADERS
  2021-12-07 16:14       ` Wouter Verhelst
@ 2022-03-22 15:10         ` Eric Blake
  0 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2022-03-22 15:10 UTC (permalink / raw)
  To: Wouter Verhelst
  Cc: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel, nbd,
	nsoffer, libguestfs

On Tue, Dec 07, 2021 at 06:14:23PM +0200, Wouter Verhelst wrote:
> On Mon, Dec 06, 2021 at 05:00:47PM -0600, Eric Blake wrote:
> > On Mon, Dec 06, 2021 at 02:40:45PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > >   #### Simple reply message
> > > > 
> > > >   The simple reply message MUST be sent by the server in response to all
> > > >   requests if structured replies have not been negotiated using
> > > > -`NBD_OPT_STRUCTURED_REPLY`. If structured replies have been negotiated, a simple
> > > > -reply MAY be used as a reply to any request other than `NBD_CMD_READ`,
> > > > -but only if the reply has no data payload.  The message looks as
> > > > -follows:
> > > > +`NBD_OPT_STRUCTURED_REPLY`. If structured replies have been
> > > > +negotiated, a simple reply MAY be used as a reply to any request other
> > > > +than `NBD_CMD_READ`, but only if the reply has no data payload.  If
> > > > +extended headers were not negotiated using `NBD_OPT_EXTENDED_HEADERS`,
> > > > +the message looks as follows:
> > > > 
> > > >   S: 32 bits, 0x67446698, magic (`NBD_SIMPLE_REPLY_MAGIC`; used to be
> > > >      `NBD_REPLY_MAGIC`)
> > > > @@ -369,6 +398,16 @@ S: 64 bits, handle
> > > >   S: (*length* bytes of data if the request is of type `NBD_CMD_READ` and
> > > >       *error* is zero)
> > > > 
> > > > +If extended headers were negotiated using `NBD_OPT_EXTENDED_HEADERS`,
> > > > +the message looks like:
> > > > +
> > > > +S: 32 bits, 0x60d12fd6, magic (`NBD_SIMPLE_REPLY_EXT_MAGIC`)
> > > > +S: 32 bits, error (MAY be zero)
> > > > +S: 64 bits, handle
> > > > +S: 128 bits, padding (MUST be zero)
> > > > +S: (*length* bytes of data if the request is of type `NBD_CMD_READ` and
> > > > +    *error* is zero)
> > > > +
> > > 
> > > If we go this way, let's put payload length into padding: it will help to make the protocol context-independent and less error-prone.
> 
> Agreed.
> 
> > Easy enough to do (the payload length will be 0 except for
> > NBD_CMD_READ).
> 
> Indeed.
> 
> > > Or, the other way, maybe just forbid the payload for simple-64bit? What's the reason to allow 64-bit requests without structured reply negotiation?
> > 
> > The two happened to be orthogonal enough in my implementation.  It was
> > easy to demonstrate either one without the other, and it IS easier to
> > write a client using non-structured replies (structured reads ARE
> > tougher than simple reads, even if it is less efficient when it comes
> > to reading zeros).  But you are also right that we could require
> > structured reads prior to allowing 64-bit operations, and then have
> > only one supported reply type on the wire when negotiated.  Wouter,
> > which way do you prefer?
> 
> Given that I *still* haven't gotten around to implementing structured
> replies for nbd-server, I'd prefer not to require it, but that's not
> really a decent argument IMO :-)
> 
> [... I haven't read this in much detail yet, intend to do that later...]

Ping - any other responses on this thread, before I start working on
version 2 of the cross-project patches?

And repeating a comment from my original cover letter:

> with 64-bit commands, we may want to also make it easier to let
> servers advertise an actual maximum size they are willing to accept
> for the commands in question (for example, a server may be happy with
> a full 64-bit block status, but still want to limit non-fast zero and
> cache to a smaller limit to avoid denial of service).

Is it worth enhancing NBD_OPT_INFO in my v2?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH] spec: Add NBD_OPT_EXTENDED_HEADERS
  2021-12-03 23:14 ` [PATCH] spec: Add NBD_OPT_EXTENDED_HEADERS Eric Blake
  2021-12-06 11:40   ` Vladimir Sementsov-Ogievskiy
  2021-12-10 18:16   ` Vladimir Sementsov-Ogievskiy
@ 2022-03-24 17:31   ` Wouter Verhelst
  2022-03-25  0:00     ` Eric Blake
  2022-10-04 21:21   ` Eric Blake
  3 siblings, 1 reply; 46+ messages in thread
From: Wouter Verhelst @ 2022-03-24 17:31 UTC (permalink / raw)
  To: Eric Blake; +Cc: vsementsov, qemu-block, qemu-devel, nbd, nsoffer, libguestfs

Hi Eric,

Thanks for the ping; it had slipped my mind.

On Fri, Dec 03, 2021 at 05:14:34PM -0600, Eric Blake wrote:
>  #### Request message
> 
> -The request message, sent by the client, looks as follows:
> +The compact request message, sent by the client when extended
> +transactions are not negotiated using `NBD_OPT_EXTENDED_HEADERS`,
> +looks as follows:
> 
>  C: 32 bits, 0x25609513, magic (`NBD_REQUEST_MAGIC`)  
>  C: 16 bits, command flags  
> @@ -353,14 +370,26 @@ C: 64 bits, offset (unsigned)
>  C: 32 bits, length (unsigned)  
>  C: (*length* bytes of data if the request is of type `NBD_CMD_WRITE`)  
> 
> +If negotiation agreed on extended transactions with
> +`NBD_OPT_EXTENDED_HEADERS`, the client instead uses extended requests:
> +
> +C: 32 bits, 0x21e41c71, magic (`NBD_REQUEST_EXT_MAGIC`)  
> +C: 16 bits, command flags  
> +C: 16 bits, type  
> +C: 64 bits, handle  
> +C: 64 bits, offset (unsigned)  
> +C: 64 bits, length (unsigned)  
> +C: (*length* bytes of data if the request is of type `NBD_CMD_WRITE`)  
> +

Perhaps we should decouple the ideas of "effect length" and "payload
length"? As in,

C: 32 bits, 0x21e41c71, magic (`NBD_REQUEST_EXT_MAGIC`)
C: 16 bits, command flags
C: 16 bits, type
C: 64 bits, handle
C: 64 bits, offset
C: 64 bits, effect length
C: 64 bits, payload length
C: (*payload length* bytes of data)

This makes the protocol more context free. With the current set of
commands, only NBD_CMD_WRITE would have payload length be nonzero, but
that doesn't have to remain the case forever; e.g., we could have a
command that extends NBD_CMD_BLOCK_STATUS to only query a subset of the
metadata contexts that we declared (if that is wanted, of course).

Of course, that does have the annoying side effect of no longer fitting
in 32 bytes, requiring a 40-byte header instead. I think it would be
worth it though.

(This is obviously not relevant for reply messages, only for request
messages)
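
As a rough sketch of the 40-byte variant proposed above (an illustration only, not part of any posted patch; the names `EXT40_LEN`, `encode_ext40` and `ext40_payload_len` are made up here), the point is that a receiver can find the payload size at a fixed offset without knowing the command type:

```c
#include <assert.h>
#include <stdint.h>

#define NBD_REQUEST_EXT_MAGIC 0x21e41c71u

/* Field sizes of the proposed layout: magic, flags, type, handle,
 * offset, effect length, payload length. */
enum { EXT40_LEN = 4 + 2 + 2 + 8 + 8 + 8 + 8 };
static_assert(EXT40_LEN == 40, "two 64-bit lengths give a 40-byte header");

/* Store v at p in network byte order (big-endian), using nbytes bytes. */
void put_be(uint8_t *p, uint64_t v, int nbytes)
{
    for (int i = nbytes - 1; i >= 0; i--) {
        p[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/* Read an nbytes-wide big-endian value starting at p. */
uint64_t get_be(const uint8_t *p, int nbytes)
{
    uint64_t v = 0;
    for (int i = 0; i < nbytes; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Serialize a request header with separate effect and payload lengths. */
void encode_ext40(uint8_t out[EXT40_LEN], uint16_t flags, uint16_t type,
                  uint64_t handle, uint64_t offset,
                  uint64_t effect_len, uint64_t payload_len)
{
    put_be(out + 0,  NBD_REQUEST_EXT_MAGIC, 4);
    put_be(out + 4,  flags, 2);
    put_be(out + 6,  type, 2);
    put_be(out + 8,  handle, 8);
    put_be(out + 16, offset, 8);
    put_be(out + 24, effect_len, 8);
    put_be(out + 32, payload_len, 8);
}

/* Receiver side: the number of bytes that follow the header is read
 * from a fixed position, independent of the command type. */
uint64_t ext40_payload_len(const uint8_t hdr[EXT40_LEN])
{
    return get_be(hdr + 32, 8);
}
```

A write of 512 bytes would carry effect length 512 and payload length 512, while a large trim would carry a 64-bit effect length and payload length 0.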

>  #### Simple reply message
> 
>  The simple reply message MUST be sent by the server in response to all
>  requests if structured replies have not been negotiated using
> -`NBD_OPT_STRUCTURED_REPLY`. If structured replies have been negotiated, a simple
> -reply MAY be used as a reply to any request other than `NBD_CMD_READ`,
> -but only if the reply has no data payload.  The message looks as
> -follows:
> +`NBD_OPT_STRUCTURED_REPLY`. If structured replies have been
> +negotiated, a simple reply MAY be used as a reply to any request other
> +than `NBD_CMD_READ`, but only if the reply has no data payload.  If
> +extended headers were not negotiated using `NBD_OPT_EXTENDED_HEADERS`,
> +the message looks as follows:
> 
>  S: 32 bits, 0x67446698, magic (`NBD_SIMPLE_REPLY_MAGIC`; used to be
>     `NBD_REPLY_MAGIC`)  
> @@ -369,6 +398,16 @@ S: 64 bits, handle
>  S: (*length* bytes of data if the request is of type `NBD_CMD_READ` and
>      *error* is zero)  
> 
> +If extended headers were negotiated using `NBD_OPT_EXTENDED_HEADERS`,
> +the message looks like:
> +
> +S: 32 bits, 0x60d12fd6, magic (`NBD_SIMPLE_REPLY_EXT_MAGIC`)  
> +S: 32 bits, error (MAY be zero)  
> +S: 64 bits, handle  
> +S: 128 bits, padding (MUST be zero)  

Should all these requirements about padding not be a SHOULD rather than
a MUST?

[...]
> +* `NBD_OPT_EXTENDED_HEADERS` (11)
> +
> +    The client wishes to use extended headers during the transmission
> +    phase.  The client MUST NOT send any additional data with the
> +    option, and the server SHOULD reject a request that includes data
> +    with `NBD_REP_ERR_INVALID`.
> +
> +    The server replies with the following, or with an error permitted
> +    elsewhere in this document:
> +
> +    - `NBD_REP_ACK`: Extended headers have been negotiated; the client
> +      MUST use the 32-byte extended request header, and the server
> +      MUST use the 32-byte extended reply header.
> +    - For backwards compatibility, clients SHOULD be prepared to also
> +      handle `NBD_REP_ERR_UNSUP`; in this case, only the compact
> +      transmission headers will be used.
> +
> +    If the client requests `NBD_OPT_STARTTLS` after this option, it
> +    MUST renegotiate extended headers.
> +

Two thoughts here:

- We should probably allow NBD_REP_ERR_BLOCK_SIZE_REQD as a reply to
  this message; I could imagine a server might not want to talk 64-bit
  lengths if it doesn't know that block sizes are going to be
  reasonable.
- In the same vein, should we perhaps also add an error message for when
  extended headers are negotiated without structured replies? Perhaps a
  server implementation might not want to implement the "extended
  headers but no structured replies" message format.

On that note, while I know I had said earlier that I would prefer not
making this new extension depend on structured replies, in hindsight
perhaps it *is* a good idea to add that dependency; otherwise we create
an extra message format that is really a degenerate case of "we want to
be modern in one way but not in another", and that screams out to me
"I'm not going to be used much, look at me for security issues!"

Which perhaps is not a very good idea.

[...]
-- 
     w@uter.{be,co.za}
wouter@{grep.be,fosdem.org,debian.org}



* Re: [PATCH] spec: Add NBD_OPT_EXTENDED_HEADERS
  2022-03-24 17:31   ` Wouter Verhelst
@ 2022-03-25  0:00     ` Eric Blake
  0 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2022-03-25  0:00 UTC (permalink / raw)
  To: Wouter Verhelst
  Cc: v.sementsov-og, qemu-block, qemu-devel, nbd, nsoffer, libguestfs

[Updating Vladimir's new preferred address in cc list]

On Thu, Mar 24, 2022 at 07:31:48PM +0200, Wouter Verhelst wrote:
> Hi Eric,
> 
> Thanks for the ping; it had slipped my mind.
> 
> On Fri, Dec 03, 2021 at 05:14:34PM -0600, Eric Blake wrote:
> >  #### Request message
> > 
> > -The request message, sent by the client, looks as follows:
> > +The compact request message, sent by the client when extended
> > +transactions are not negotiated using `NBD_OPT_EXTENDED_HEADERS`,
> > +looks as follows:
> > 
> >  C: 32 bits, 0x25609513, magic (`NBD_REQUEST_MAGIC`)  
> >  C: 16 bits, command flags  
> > @@ -353,14 +370,26 @@ C: 64 bits, offset (unsigned)
> >  C: 32 bits, length (unsigned)  
> >  C: (*length* bytes of data if the request is of type `NBD_CMD_WRITE`)  
> > 
> > +If negotiation agreed on extended transactions with
> > +`NBD_OPT_EXTENDED_HEADERS`, the client instead uses extended requests:
> > +
> > +C: 32 bits, 0x21e41c71, magic (`NBD_REQUEST_EXT_MAGIC`)  
> > +C: 16 bits, command flags  
> > +C: 16 bits, type  
> > +C: 64 bits, handle  
> > +C: 64 bits, offset (unsigned)  
> > +C: 64 bits, length (unsigned)  
> > +C: (*length* bytes of data if the request is of type `NBD_CMD_WRITE`)  
> > +
> 
> Perhaps we should decouple the ideas of "effect length" and "payload
> length"? As in,
> 
> C: 32 bits, 0x21e41c71, magic (`NBD_REQUEST_EXT_MAGIC`)
> C: 16 bits, command flags
> C: 16 bits, type
> C: 64 bits, handle
> C: 64 bits, offset
> C: 64 bits, effect length
> C: 64 bits, payload length
> C: (*payload length* bytes of data)
> 
> This makes the protocol more context free. With the current set of
> commands, only NBD_CMD_WRITE would have payload length be nonzero, but
> that doesn't have to remain the case forever; e.g., we could have a
> command that extends NBD_CMD_BLOCK_STATUS to only query a subset of the
> metadata contexts that we declared (if that is wanted, of course).
> 
> Of course, that does have the annoying side effect of no longer fitting
> in 32 bytes, requiring a 40-byte header instead. I think it would be
> worth it though.

Could we still keep a 32-byte header, by having a new command (or new
command flag to the existing NBD_CMD_BLOCK_STATUS), such that the
payload itself contains the needed extra bytes?

Hmm - right now, the only command with a payload is NBD_CMD_WRITE, and
all other commands use the length field as an effect length.  So maybe
what we do is have a single command flag that says whether the length
field is serving as payload length or as effect length.  NBD_CMD_WRITE
would always set the new flag (if extended headers were negotiated),
and most other NBD_CMD_* would leave the flag unset, but in the case
of BLOCK_STATUS wanting only a subset of id status reported, we could
then have:

HEADER:
C: 32 bits, 0x21e41c71, magic (`NBD_REQUEST_EXT_MAGIC`)
C: 16 bits, command flags, NBD_CMD_FLAG_PAYLOAD
C: 16 bits, type, NBD_CMD_BLOCK_STATUS
C: 64 bits, handle
C: 64 bits, offset
C: 64 bits, payload length = 12 + 4*n
PAYLOAD:
C: 64 bits, effect length (hint on desired range)
C: 32 bits, number of ids = n
C: 32 bits, id[0]
...
C: 32 bits, id[n-1]

vs.

HEADER:
C: 32 bits, 0x21e41c71, magic (`NBD_REQUEST_EXT_MAGIC`)
C: 16 bits, command flags, 0
C: 16 bits, type, NBD_CMD_BLOCK_STATUS
C: 64 bits, handle
C: 64 bits, offset
C: 64 bits, effect length (hint on desired range)

HEADER:
C: 32 bits, 0x21e41c71, magic (`NBD_REQUEST_EXT_MAGIC`)
C: 16 bits, command flags, NBD_CMD_FLAG_PAYLOAD
C: 16 bits, type, NBD_CMD_WRITE
C: 64 bits, handle
C: 64 bits, offset
C: 64 bits, payload length = n
PAYLOAD:
C: n*8 bits data
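
To check the arithmetic in the layouts above, here is a small C sketch of how a client might serialize them. It is only an illustration under assumptions: the bit value chosen for `NBD_CMD_FLAG_PAYLOAD` is a placeholder (no bit is assigned in this mail), the helper names are invented, and the command type values 1 and 7 are the existing `NBD_CMD_WRITE` and `NBD_CMD_BLOCK_STATUS` numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBD_REQUEST_EXT_MAGIC 0x21e41c71u
#define NBD_CMD_WRITE         1u
#define NBD_CMD_BLOCK_STATUS  7u
#define NBD_CMD_FLAG_PAYLOAD  (1u << 5)  /* placeholder bit, assumed here */
#define NBD_EXT_REQUEST_LEN   32         /* 4 + 2 + 2 + 8 + 8 + 8 */

/* Store v at p in network byte order (big-endian), using nbytes bytes. */
void store_be(uint8_t *p, uint64_t v, int nbytes)
{
    for (int i = nbytes - 1; i >= 0; i--) {
        p[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/* Serialize the 32-byte extended request header.  `length` is a payload
 * length when NBD_CMD_FLAG_PAYLOAD is set, else an effect length. */
void encode_ext_request(uint8_t out[NBD_EXT_REQUEST_LEN], uint16_t flags,
                        uint16_t type, uint64_t handle, uint64_t offset,
                        uint64_t length)
{
    store_be(out + 0,  NBD_REQUEST_EXT_MAGIC, 4);
    store_be(out + 4,  flags, 2);
    store_be(out + 6,  type, 2);
    store_be(out + 8,  handle, 8);
    store_be(out + 16, offset, 8);
    store_be(out + 24, length, 8);
}

/* Build the BLOCK_STATUS payload: effect length, id count, then ids.
 * Returns the payload size (12 + 4*n), or 0 if it does not fit in cap. */
size_t encode_status_payload(uint8_t *out, size_t cap, uint64_t effect_length,
                             const uint32_t *ids, uint32_t n)
{
    uint64_t need = 12 + 4 * (uint64_t)n;
    if (need > cap)
        return 0;
    store_be(out, effect_length, 8);
    store_be(out + 8, n, 4);
    for (uint32_t i = 0; i < n; i++)
        store_be(out + 12 + 4 * (size_t)i, ids[i], 4);
    return (size_t)need;
}
```

For two ids the payload is 12 + 4*2 = 20 bytes, and that value goes into the header's length field with the flag set; the header itself stays at 32 bytes.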


> 
> (This is obviously not relevant for reply messages, only for request
> messages)

Staying at a power of 2 may still be worth the expense of a new cmd
flag which must always be set for writes when extended headers are in
use.

> 
> >  #### Simple reply message
> > 
> >  The simple reply message MUST be sent by the server in response to all
> >  requests if structured replies have not been negotiated using
> > -`NBD_OPT_STRUCTURED_REPLY`. If structured replies have been negotiated, a simple
> > -reply MAY be used as a reply to any request other than `NBD_CMD_READ`,
> > -but only if the reply has no data payload.  The message looks as
> > -follows:
> > +`NBD_OPT_STRUCTURED_REPLY`. If structured replies have been
> > +negotiated, a simple reply MAY be used as a reply to any request other
> > +than `NBD_CMD_READ`, but only if the reply has no data payload.  If
> > +extended headers were not negotiated using `NBD_OPT_EXTENDED_HEADERS`,
> > +the message looks as follows:
> > 
> >  S: 32 bits, 0x67446698, magic (`NBD_SIMPLE_REPLY_MAGIC`; used to be
> >     `NBD_REPLY_MAGIC`)  
> > @@ -369,6 +398,16 @@ S: 64 bits, handle
> >  S: (*length* bytes of data if the request is of type `NBD_CMD_READ` and
> >      *error* is zero)  
> > 
> > +If extended headers were negotiated using `NBD_OPT_EXTENDED_HEADERS`,
> > +the message looks like:
> > +
> > +S: 32 bits, 0x60d12fd6, magic (`NBD_SIMPLE_REPLY_EXT_MAGIC`)  
> > +S: 32 bits, error (MAY be zero)  
> > +S: 64 bits, handle  
> > +S: 128 bits, padding (MUST be zero)  
> 
> Should all these requirements about padding not be a SHOULD rather than
> a MUST?

Elsewhere in the thread, we talked about having
NBD_SIMPLE_REPLY_EXT_MAGIC have 64 bits length (only non-zero when
replying to NBD_CMD_READ) and 64 bits pad, instead of 128 bits pad.

For future extensibility, it's probably safest to require the server
to send 0 bits in the pad now, so that we can use them later.  Should
clients then ignore unknown padding bits, or is there a risk that a
future definition of non-zero values in what is now padding bits may
confuse an existing client that merely ignores those bits?

If we don't think extensibility is needed, then using SHOULD instead
of MUST means a non-careful server can leak data through the padding.
But it is certainly less restrictive to use SHOULD instead of MUST
(well-written servers won't leak, sloppy servers might, clients must
ignore the padding, and extension is not possible because of sloppy
servers).
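
The two client policies can be sketched in C as follows. This assumes the variant mentioned above (64 bits of length followed by 64 bits of padding), which is not final; the struct, the function name and the return codes are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NBD_SIMPLE_REPLY_EXT_MAGIC 0x60d12fd6u
#define SIMPLE_EXT_LEN 32   /* magic 4 + error 4 + handle 8 + length 8 + pad 8 */

struct simple_ext_reply {
    uint32_t error;
    uint64_t handle;
    uint64_t length;   /* payload length; nonzero only for NBD_CMD_READ */
};

/* Read an nbytes-wide big-endian (network order) value starting at p. */
uint64_t rd_be(const uint8_t *p, int nbytes)
{
    uint64_t v = 0;
    for (int i = 0; i < nbytes; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Parse one extended simple reply header.
 * strict_padding != 0 models "padding MUST be zero": nonzero bits are
 * rejected, so the field stays available for future use.
 * strict_padding == 0 models "SHOULD": the client ignores the padding.
 * Returns 0 on success, -1 on bad magic, -2 on nonzero padding. */
int parse_simple_ext(const uint8_t buf[SIMPLE_EXT_LEN], int strict_padding,
                     struct simple_ext_reply *out)
{
    if (rd_be(buf, 4) != NBD_SIMPLE_REPLY_EXT_MAGIC)
        return -1;
    if (strict_padding && rd_be(buf + 24, 8) != 0)
        return -2;
    out->error  = (uint32_t)rd_be(buf + 4, 4);
    out->handle = rd_be(buf + 8, 8);
    out->length = rd_be(buf + 16, 8);
    return 0;
}
```

A strict client breaks against a sloppy server today, while a lenient client silently accepts whatever a future extension puts in those bits; that trade-off is the open question.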

> 
> [...]
> > +* `NBD_OPT_EXTENDED_HEADERS` (11)
> > +
> > +    The client wishes to use extended headers during the transmission
> > +    phase.  The client MUST NOT send any additional data with the
> > +    option, and the server SHOULD reject a request that includes data
> > +    with `NBD_REP_ERR_INVALID`.
> > +
> > +    The server replies with the following, or with an error permitted
> > +    elsewhere in this document:
> > +
> > +    - `NBD_REP_ACK`: Extended headers have been negotiated; the client
> > +      MUST use the 32-byte extended request header, and the server
> > +      MUST use the 32-byte extended reply header.
> > +    - For backwards compatibility, clients SHOULD be prepared to also
> > +      handle `NBD_REP_ERR_UNSUP`; in this case, only the compact
> > +      transmission headers will be used.
> > +
> > +    If the client requests `NBD_OPT_STARTTLS` after this option, it
> > +    MUST renegotiate extended headers.
> > +
> 
> Two thoughts here:
> 
> - We should probably allow NBD_REP_ERR_BLOCK_SIZE_REQD as a reply to
>   this message; I could imagine a server might not want to talk 64-bit
>   lengths if it doesn't know that block sizes are going to be
>   reasonable.

Good addition.  I'll include it in v2.

> - In the same vein, should we perhaps also add an error message for when
>   extended headers are negotiated without structured replies? Perhaps a
>   server implementation might not want to implement the "extended
>   headers but no structured replies" message format.

Seems reasonable.

> 
> On that note, while I know I had said earlier that I would prefer not
> making this new extension depend on structured replies, in hindsight
> perhaps it *is* a good idea to add that dependency; otherwise we create
> an extra message format that is really a degenerate case of "we want to
> be modern in one way but not in another", and that screams out to me
> "I'm not going to be used much, look at me for security issues!"
> 
> Which perhaps is not a very good idea.

Yeah, the more I read back over Vladimir's message, the more I am
agreeing that just because we CAN be orthogonal doesn't mean we WANT
to be orthogonal.  Every degree of orthogonality increases the testing
burden.  I'm happy to rework v2 along those lines (structured replies
mandatory, and only one extended reply header, so that only compact
style has two header types).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH] spec: Add NBD_OPT_EXTENDED_HEADERS
  2021-12-03 23:14 ` [PATCH] spec: Add NBD_OPT_EXTENDED_HEADERS Eric Blake
                     ` (2 preceding siblings ...)
  2022-03-24 17:31   ` Wouter Verhelst
@ 2022-10-04 21:21   ` Eric Blake
  3 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2022-10-04 21:21 UTC (permalink / raw)
  To: nbd; +Cc: qemu-devel, qemu-block, libguestfs, Vladimir Sementsov-Ogievskiy

On Fri, Dec 03, 2021 at 05:14:34PM -0600, Eric Blake wrote:
> Add a new negotiation feature where the client and server agree to use
> larger packet headers on every packet sent during transmission phase.
> This has two purposes: first, it makes it possible to perform
> operations like trim, write zeroes, and block status on more than 2^32
> bytes in a single command; this in turn requires that some structured
> replies from the server also be extended to match.  The wording chosen
> here is careful to permit a server to use either flavor in its reply
> (that is, a request less than 32-bits can trigger an extended reply,
> and conversely a request larger than 32-bits can trigger a compact
> reply).

Following up on this original proposal with something that came out of
KVM Forum this year.

> +* `NBD_REPLY_TYPE_BLOCK_STATUS_EXT` (6)
> +
> +  This chunk type is in the status chunk category.  *length* MUST be
> +  4 + (a positive multiple of 16).  The semantics of this chunk mirror
> +  those of `NBD_REPLY_TYPE_BLOCK_STATUS`, other than the use of a
> +  larger *extent length* field, as well as added padding to ease
> +  alignment.  This chunk type MUST NOT be used unless extended headers
> +  were negotiated with `NBD_OPT_EXTENDED_HEADERS`.
> +
> +  The payload starts with:
> +
> +  32 bits, metadata context ID  
> +
> +  and is followed by a list of one or more descriptors, each with this
> +  layout:
> +
> +  64 bits, length of the extent to which the status below
> +     applies (unsigned, MUST be nonzero)  
> +  32 bits, status flags  
> +  32 bits, padding (MUST be zero)

During KVM Forum, I had several conversations about Zoned Block
Devices (https://zonedstorage.io/docs/linux/zbd-api), and what it
would take to expose ZBD information over NBD.  In particular,
NBD_CMD_BLOCK_STATUS sounds like a great way for advertising
information about zones (by adding several metadata contexts that can
be negotiated during NBD_OPT_SET_META_CONTEXT), except for the fact
that a zone might be larger than 32 bits in size.  So Rich Jones asked
me the question of whether my work on 64-bit extensions to the NBD
protocol could also allow for a server to advertise a metadata context
only to clients that support 64-bit extensions, at which point it can
report 64-bit offsets or lengths as needed, rather than being limited
to 32-bit status flags.

The idea definitely has merit, so I'm working on incorporating that
into my next revision for 64-bit extensions in NBD.
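
For reference, the `NBD_REPLY_TYPE_BLOCK_STATUS_EXT` payload quoted above could be validated and decoded roughly like this. It is a sketch against the v1 wording only (the layout may change in the next revision), and the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ext_extent {
    uint64_t length;   /* extent length, nonzero */
    uint32_t flags;    /* status flags */
};

/* Read an nbytes-wide big-endian (network order) value starting at p. */
uint64_t be_read(const uint8_t *p, int nbytes)
{
    uint64_t v = 0;
    for (int i = 0; i < nbytes; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Decode a chunk payload of `len` bytes: a 32-bit metadata context ID
 * followed by 16-byte descriptors (64-bit length, 32-bit flags, 32-bit
 * zero padding).  Returns the number of extents stored in `out`, or -1
 * if the payload is malformed or does not fit in max_out entries. */
long parse_block_status_ext(const uint8_t *buf, size_t len,
                            uint32_t *context_id,
                            struct ext_extent *out, size_t max_out)
{
    /* length MUST be 4 + (a positive multiple of 16) */
    if (len < 4 + 16 || (len - 4) % 16 != 0)
        return -1;
    size_t n = (len - 4) / 16;
    if (n > max_out)
        return -1;

    *context_id = (uint32_t)be_read(buf, 4);
    for (size_t i = 0; i < n; i++) {
        const uint8_t *d = buf + 4 + 16 * i;
        uint64_t extent_len = be_read(d, 8);
        uint32_t pad = (uint32_t)be_read(d + 12, 4);
        if (extent_len == 0 || pad != 0)   /* both are MUSTs in v1 */
            return -1;
        out[i].length = extent_len;
        out[i].flags  = (uint32_t)be_read(d + 8, 4);
    }
    return (long)n;
}
```

Note that only the extent length is 64 bits wide here; the status flags stay at 32 bits, which is exactly the limitation a zone-aware metadata context would run into.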

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Thread overview: 46+ messages
2021-12-03 23:13 RFC for NBD protocol extension: extended headers Eric Blake
2021-12-03 23:14 ` [PATCH] spec: Add NBD_OPT_EXTENDED_HEADERS Eric Blake
2021-12-06 11:40   ` Vladimir Sementsov-Ogievskiy
2021-12-06 23:00     ` Eric Blake
2021-12-07  9:08       ` Vladimir Sementsov-Ogievskiy
2021-12-10 18:05         ` Vladimir Sementsov-Ogievskiy
2021-12-07 16:14       ` Wouter Verhelst
2022-03-22 15:10         ` Eric Blake
2021-12-10 18:16   ` Vladimir Sementsov-Ogievskiy
2022-03-24 17:31   ` Wouter Verhelst
2022-03-25  0:00     ` Eric Blake
2022-10-04 21:21   ` Eric Blake
2021-12-03 23:15 ` [PATCH 00/14] qemu patches for NBD_OPT_EXTENDED_HEADERS Eric Blake
2021-12-03 23:15   ` [PATCH 01/14] nbd/server: Minor cleanups Eric Blake
2021-12-06 12:03     ` Vladimir Sementsov-Ogievskiy
2021-12-03 23:15   ` [PATCH 02/14] qemu-io: Utilize 64-bit status during map Eric Blake
2021-12-06 12:06     ` Vladimir Sementsov-Ogievskiy
2021-12-03 23:15   ` [PATCH 03/14] qemu-io: Allow larger write zeroes under no fallback Eric Blake
2021-12-06 12:26     ` Vladimir Sementsov-Ogievskiy
2021-12-03 23:15   ` [PATCH 04/14] nbd/client: Add safety check on chunk payload length Eric Blake
2021-12-06 12:33     ` Vladimir Sementsov-Ogievskiy
2021-12-03 23:15   ` [PATCH 05/14] nbd/server: Prepare for alternate-size headers Eric Blake
2021-12-03 23:15   ` [PATCH 06/14] nbd: Prepare for 64-bit requests Eric Blake
2021-12-03 23:15   ` [PATCH 07/14] nbd: Add types for extended headers Eric Blake
2021-12-03 23:15   ` [PATCH 08/14] nbd/server: Initial support " Eric Blake
2021-12-03 23:15   ` [PATCH 09/14] nbd/server: Support 64-bit block status Eric Blake
2021-12-03 23:15   ` [PATCH 10/14] nbd/client: Initial support for extended headers Eric Blake
2021-12-03 23:15   ` [PATCH 11/14] nbd/client: Accept 64-bit hole chunks Eric Blake
2021-12-03 23:15   ` [PATCH 12/14] nbd/client: Accept 64-bit block status chunks Eric Blake
2021-12-03 23:15   ` [PATCH 13/14] nbd/client: Request extended headers during negotiation Eric Blake
2021-12-03 23:15   ` [PATCH 14/14] do not apply: nbd/server: Send 64-bit hole chunk Eric Blake
2021-12-03 23:17 ` [libnbd PATCH 00/13] libnbd patches for NBD_OPT_EXTENDED_HEADERS Eric Blake
2021-12-03 23:17   ` [libnbd PATCH 01/13] golang: Simplify nbd_block_status callback array copy Eric Blake
2021-12-03 23:17   ` [libnbd PATCH 02/13] block_status: Refactor array storage Eric Blake
2021-12-03 23:17   ` [libnbd PATCH 03/13] protocol: Add definitions for extended headers Eric Blake
2021-12-03 23:17   ` [libnbd PATCH 04/13] protocol: Prepare to send 64-bit requests Eric Blake
2021-12-03 23:17   ` [libnbd PATCH 05/13] protocol: Prepare to receive 64-bit replies Eric Blake
2021-12-03 23:17   ` [libnbd PATCH 06/13] protocol: Accept 64-bit holes during pread Eric Blake
2021-12-03 23:17   ` [libnbd PATCH 07/13] generator: Add struct nbd_extent in prep for 64-bit extents Eric Blake
2021-12-03 23:17   ` [libnbd PATCH 08/13] block_status: Track 64-bit extents internally Eric Blake
2021-12-03 23:17   ` [libnbd PATCH 09/13] block_status: Accept 64-bit extents during block status Eric Blake
2021-12-03 23:17   ` [libnbd PATCH 10/13] api: Add [aio_]nbd_block_status_64 Eric Blake
2021-12-03 23:17   ` [libnbd PATCH 11/13] api: Add three functions for controlling extended headers Eric Blake
2021-12-03 23:17   ` [libnbd PATCH 12/13] generator: Actually request " Eric Blake
2021-12-03 23:17   ` [libnbd PATCH 13/13] interop: Add test of 64-bit block status Eric Blake
2021-12-10  8:16   ` [Libguestfs] [libnbd PATCH 00/13] libnbd patches for NBD_OPT_EXTENDED_HEADERS Laszlo Ersek
