* [cross-project PATCH v2] NBD 64-bit extensions
@ 2022-11-14 22:41 Eric Blake
  2022-11-14 22:46 ` [PATCH v2 0/6] NBD spec changes for " Eric Blake
                   ` (2 more replies)
  0 siblings, 3 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:41 UTC (permalink / raw)
  To: qemu-devel, qemu-block, libguestfs, nbd

This is a cover letter for a set of multi-project patch series all
designed to implement 64-bit operations in NBD, and demonstrate
interoperability of the new extension between projects.

v1 of the project was attempted nearly a year ago:
https://lists.nongnu.org/archive/html/qemu-devel/2021-12/msg00453.html

Since then, I've landed a lot of preliminary cleanups in libnbd to
make it easier to test things, and incorporated review comments from
v1, including:

- no orthogonality between simple/structured replies and 64-bit mode:
  once extended headers are negotiated, all transmission traffic (both
  from client to server and server to client) uses just one header
  size

- add support for the client to pass a payload on commands to the
  server, and demonstrate it by further implementing a way to pass a
  flag with NBD_CMD_BLOCK_STATUS that says the client is passing a
  payload to request status of just a subset of the negotiated
  contexts, rather than all possible contexts that were earlier
  negotiated during NBD_OPT_SET_META_CONTEXT

- tweaks to the header layouts: block status can now provide 64-bit
  flag values (although "base:allocation" will remain usable with
  just 32-bit flags); the reply header carries an offset rather than
  padding, with fields rearranged for maximal sharing between client
  and server layouts

- word-smithing of the proposed NBD protocol changes; things are now
  split into several smaller patches, so we can choose how much to take

- more unit tests added in qemu and libnbd

- the series ends with RFC patches on whether to support 64-bit hole
  responses to NBD_CMD_READ, even though our current limits say
  servers don't have to accept more than a 32M read request and
  therefore will never have that big of a hole to read

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




* [PATCH v2 0/6] NBD spec changes for 64-bit extensions
  2022-11-14 22:41 [cross-project PATCH v2] NBD 64-bit extensions Eric Blake
@ 2022-11-14 22:46 ` Eric Blake
  2022-11-14 22:46   ` [PATCH v2 1/6] spec: Recommend cap on NBD_REPLY_TYPE_BLOCK_STATUS length Eric Blake
                     ` (5 more replies)
  2022-11-14 22:48 ` [PATCH v2 00/15] qemu patches for 64-bit NBD extensions Eric Blake
  2022-11-14 22:51 ` [libnbd PATCH v2 00/23] libnbd 64-bit NBD extensions Eric Blake
  2 siblings, 6 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:46 UTC (permalink / raw)
  To: nbd; +Cc: qemu-devel, qemu-block, libguestfs

This is the NBD spec series; there are matching qemu and libnbd
patches that implement the changes in this series.  I'm happy to drop
the RFC patches from all three, but wanted to first have the
conversation on whether it makes sense to have 64-bit holes during
NBD_CMD_READ (it
would make more sense if we had a way for a client and server to agree
on a single-transaction buffer limit much larger than 32M).

Eric Blake (6):
  spec: Recommend cap on NBD_REPLY_TYPE_BLOCK_STATUS length
  spec: Tweak description of maximum block size
  spec: Add NBD_OPT_EXTENDED_HEADERS
  spec: Allow 64-bit block status results
  spec: Introduce NBD_FLAG_BLOCK_STATUS_PAYLOAD
  RFC: spec: Introduce NBD_REPLY_TYPE_OFFSET_HOLE_EXT

 doc/proto.md | 698 ++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 521 insertions(+), 177 deletions(-)

-- 
2.38.1




* [PATCH v2 1/6] spec: Recommend cap on NBD_REPLY_TYPE_BLOCK_STATUS length
  2022-11-14 22:46 ` [PATCH v2 0/6] NBD spec changes for " Eric Blake
@ 2022-11-14 22:46   ` Eric Blake
  2022-12-16 19:32     ` Vladimir Sementsov-Ogievskiy
  2022-11-14 22:46   ` [PATCH v2 2/6] spec: Tweak description of maximum block size Eric Blake
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:46 UTC (permalink / raw)
  To: nbd; +Cc: qemu-devel, qemu-block, libguestfs

The spec was silent on how many extents a server could reply with.
However, both qemu and nbdkit (the two server implementations known to
have implemented the NBD_CMD_BLOCK_STATUS extension) implement a hard
cap, and will truncate the number of extents in a reply to avoid
sending a client a reply so large that the client would treat it as a
denial of service attack.  Clients currently have no way during
negotiation to request such a limit of the server, so it is easier to
just document this as a restriction on viable server implementations
than to add yet another round of handshaking.  Also, mentioning
amplification effects is worthwhile.

When qemu first implemented NBD_CMD_BLOCK_STATUS for the
base:allocation context (qemu commit e7b1948d51, Mar 2018), it behaved
as if NBD_CMD_FLAG_REQ_ONE were always passed by the client, and never
responded with more than one extent.  Later, when adding its
qemu:dirty-bitmap:XYZ context extension (qemu commit 3d068aff16, Jun
2018), it added a cap of 128k extents (1M+4 bytes), and that cap was
applied to base:allocation once qemu started sending multiple extents
for that context as well (qemu commit fb7afc797e, Jul 2018).  Qemu
extents are never smaller than 512 bytes (other than an exception at
the end of a file whose size is not aligned to 512), but even so, a
request for just under 4G of block status could produce 8M extents,
resulting in a reply of 64M if it were not capped smaller.

When nbdkit first implemented NBD_CMD_BLOCK_STATUS (nbdkit 4ca66f70a5,
Mar 2019), it did not impose any restriction on the number of extents
in the reply chunk.  But because it allows extents as small as one
byte, it is easy to write a server that can amplify a client's request
for status of 1M of the image into a reply over 8M in size, and it
was very easy to demonstrate that a hard cap was needed to avoid
crashing clients or otherwise killing the connection (a bad server
impacting the client negatively).  So nbdkit enforced a bound of 1M
extents (8M+4 bytes, nbdkit commit 6e0dc839ea, Jun 2019).  [Unrelated
to this patch, but worth noting for history: nbdkit also has to deal
with the fact that it is designed for plugin server implementations;
not capping the number of extents in a reply also posed a problem for
nbdkit as the server, where a plugin could exhaust
memory and kill the server, unrelated to any size constraints enforced
by a client.]

Since the limit chosen by these two implementations is different, and
since nbdkit has versions that were not limited, add this as a SHOULD
NOT instead of MUST NOT constraint on servers implementing block
status.  It does not matter that qemu picked a smaller limit that it
truncates to, since we have already documented that the server may
truncate for other reasons (such as it being inefficient to collect
that many extents in the first place).  But documenting the limit now
becomes even more important in the face of a future addition of 64-bit
requests, where a client's request is no longer bounded to 4G and
could thereby produce even more than 8M extents for the corner case
when every 512 bytes is a new extent, if it were not for this
recommendation.
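
To make the arithmetic above concrete, here is a rough, hypothetical
sketch (not code from qemu or nbdkit; all names are illustrative) of
how a server might clamp one block status reply chunk to the
recommended 2^20 descriptors:

  #include <inttypes.h>
  #include <stdint.h>
  #include <stdio.h>

  #define MAX_BS_DESCRIPTORS (UINT32_C(1) << 20) /* SHOULD NOT exceed this */
  #define BS_DESCRIPTOR_SIZE 8u                  /* 32-bit length + 32-bit flags */
  #define BS_CHUNK_HEADER    4u                  /* 32-bit metadata context ID */

  /* Clamp a descriptor count to the recommendation and return the
   * resulting NBD_REPLY_TYPE_BLOCK_STATUS chunk size in bytes. */
  static uint32_t bs_chunk_bytes(uint64_t ndescriptors)
  {
      if (ndescriptors > MAX_BS_DESCRIPTORS)
          ndescriptors = MAX_BS_DESCRIPTORS;     /* truncate, per the SHOULD NOT */
      return BS_CHUNK_HEADER + (uint32_t)ndescriptors * BS_DESCRIPTOR_SIZE;
  }

  int main(void)
  {
      /* Worst case under the cap: 4 + 8*2^20 = 8,388,612 bytes.  Without
       * the cap, a just-under-4G request with one extent per 512 bytes
       * would need 2^23 descriptors (a 64M reply), and a 1M request to a
       * server using 1-byte extents already needs 2^20 descriptors. */
      printf("capped chunk: %" PRIu32 " bytes\n", bs_chunk_bytes(UINT64_MAX));
      return 0;
  }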

---
v2: Add wording about amplification effect
---
 doc/proto.md | 51 +++++++++++++++++++++++++++++++--------------------
 1 file changed, 31 insertions(+), 20 deletions(-)

diff --git a/doc/proto.md b/doc/proto.md
index 3a96703..8f08583 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -1818,6 +1818,12 @@ MUST initiate a hard disconnect.
   the different contexts need not have the same number of extents or
   cumulative extent length.

+  Servers SHOULD NOT send more than 2^20 extents in a single reply
+  chunk; in other words, the size of
+  `NBD_REPLY_TYPE_BLOCK_STATUS` should not be more than 4 + 8*2^20
+  (8,388,612 bytes), even if this requires that the server truncate
+  the response in relation to the *length* requested by the client.
+
   Even if the client did not use the `NBD_CMD_FLAG_REQ_ONE` flag in
   its request, the server MAY return fewer descriptors in the reply
   than would be required to fully specify the whole range of requested
@@ -2180,26 +2186,31 @@ The following request types exist:
     `NBD_REPLY_TYPE_BLOCK_STATUS` chunk represent consecutive portions
     of the file starting from specified *offset*.  If the client used
     the `NBD_CMD_FLAG_REQ_ONE` flag, each chunk contains exactly one
-    descriptor where the *length* of the descriptor MUST NOT be greater
-    than the *length* of the request; otherwise, a chunk MAY contain
-    multiple descriptors, and the final descriptor MAY extend beyond
-    the original requested size if the server can determine a larger
-    length without additional effort.  On the other hand, the server MAY
-    return less data than requested. However the server MUST return at
-    least one status descriptor (and since each status descriptor has
-    a non-zero length, a client can always make progress on a
-    successful return).  The server SHOULD use different *status*
-    values between consecutive descriptors where feasible, although
-    the client SHOULD be prepared to handle consecutive descriptors
-    with the same *status* value.  The server SHOULD use descriptor
-    lengths that are an integer multiple of 512 bytes where possible
-    (the first and last descriptor of an unaligned query being the
-    most obvious places for an exception), and MUST use descriptor
-    lengths that are an integer multiple of any advertised minimum
-    block size. The status flags are intentionally defined so that a
-    server MAY always safely report a status of 0 for any block,
-    although the server SHOULD return additional status values when
-    they can be easily detected.
+    descriptor where the *length* of the descriptor MUST NOT be
+    greater than the *length* of the request; otherwise, a chunk MAY
+    contain multiple descriptors, and the final descriptor MAY extend
+    beyond the original requested size if the server can determine a
+    larger length without additional effort.  On the other hand, the
+    server MAY return less data than requested.  In particular, a
+    server SHOULD NOT send more than 2^20 status descriptors in a
+    single chunk.  However the server MUST return at least one status
+    descriptor, and since each status descriptor has a non-zero
+    length, a client can always make progress on a successful return.
+
+    The server SHOULD use different *status* values between
+    consecutive descriptors where feasible, although the client SHOULD
+    be prepared to handle consecutive descriptors with the same
+    *status* value.  The server SHOULD use descriptor lengths that are
+    an integer multiple of 512 bytes where possible (the first and
+    last descriptor of an unaligned query being the most obvious
+    places for an exception), in part to avoid an amplification effect
+    where a series of smaller descriptors can cause the server's reply
+    to occupy more bytes than the *length* of the client's request.
+    The server MUST use descriptor lengths that are an integer
+    multiple of any advertised minimum block size. The status flags
+    are intentionally defined so that a server MAY always safely
+    report a status of 0 for any block, although the server SHOULD
+    return additional status values when they can be easily detected.

     If an error occurs, the server SHOULD set the appropriate error
     code in the error field of an error chunk. However, if the error
-- 
2.38.1




* [PATCH v2 2/6] spec: Tweak description of maximum block size
  2022-11-14 22:46 ` [PATCH v2 0/6] NBD spec changes for " Eric Blake
  2022-11-14 22:46   ` [PATCH v2 1/6] spec: Recommend cap on NBD_REPLY_TYPE_BLOCK_STATUS length Eric Blake
@ 2022-11-14 22:46   ` Eric Blake
  2022-12-16 20:22     ` Vladimir Sementsov-Ogievskiy
  2023-02-21 15:21     ` Wouter Verhelst
  2022-11-14 22:46   ` [PATCH v2 3/6] spec: Add NBD_OPT_EXTENDED_HEADERS Eric Blake
                     ` (3 subsequent siblings)
  5 siblings, 2 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:46 UTC (permalink / raw)
  To: nbd; +Cc: qemu-devel, qemu-block, libguestfs

Commit 9f30fedb improved the spec to allow non-payload requests that
exceed any advertised maximum block size.  Take this one step further
by permitting the server to use NBD_EOVERFLOW as a hint to the client
when a request is oversize (while permitting NBD_EINVAL for
back-compat), and by rewording the text to explicitly call out that
what is being advertised is the maximum payload length, not maximum
block size.  This becomes more important when we add 64-bit
extensions, where it becomes possible to extend `NBD_CMD_BLOCK_STATUS`
to have both an effect length (how much of the image the client
wants status on, which may be larger than 32 bits) and an optional
payload length (a way to filter the response to a subset of
negotiated metadata contexts).  In the shorter term, it means that a
server may (but need not) accept a read request larger than the maximum block
size if it can use structured replies to keep each chunk of the
response under the maximum payload limits.
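
For illustration, a hypothetical client-side helper (the names below
are mine, not from qemu or libnbd) that keeps each read request within
the advertised maximum payload length, falling back to the 2^25
interoperability bound when nothing was advertised, might look like:

  #include <stdint.h>

  #define NBD_FALLBACK_MAX_PAYLOAD (UINT32_C(1) << 25)  /* 33,554,432 bytes */

  /* Assumed to exist elsewhere in this sketch: issues one NBD_CMD_READ
   * and returns 0 on success or a negative error (such as a mapping of
   * NBD_EOVERFLOW or NBD_EINVAL). */
  extern int do_read_request(uint64_t offset, uint32_t length, void *buf);

  /* Split a large read into requests no larger than the server's
   * advertised maximum payload length (0 meaning nothing advertised). */
  static int read_split(uint64_t offset, uint64_t length, unsigned char *buf,
                        uint32_t advertised_max_payload)
  {
      uint32_t max = advertised_max_payload ? advertised_max_payload
                                            : NBD_FALLBACK_MAX_PAYLOAD;

      while (length > 0) {
          uint32_t chunk = length > max ? max : (uint32_t)length;
          int r = do_read_request(offset, chunk, buf);
          if (r < 0)
              return r;
          offset += chunk;
          buf += chunk;
          length -= chunk;
      }
      return 0;
  }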
---
 doc/proto.md | 127 +++++++++++++++++++++++++++++----------------------
 1 file changed, 73 insertions(+), 54 deletions(-)

diff --git a/doc/proto.md b/doc/proto.md
index 8f08583..53c334a 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -745,8 +745,8 @@ text unless the client insists on TLS.

 During transmission phase, several operations are constrained by the
 export size sent by the final `NBD_OPT_EXPORT_NAME` or `NBD_OPT_GO`,
-as well as by three block size constraints defined here (minimum,
-preferred, and maximum).
+as well as by three block size constraints defined here (minimum
+block, preferred block, and maximum payload).

 If a client can honour server block size constraints (as set out below
 and under `NBD_INFO_BLOCK_SIZE`), it SHOULD announce this during the
@@ -772,15 +772,15 @@ learn the server's constraints without committing to them.

 If block size constraints have not been advertised or agreed on
 externally, then a server SHOULD support a default minimum block size
-of 1, a preferred block size of 2^12 (4,096), and a maximum block size
-that is effectively unlimited (0xffffffff, or the export size if that
-is smaller), while a client desiring maximum interoperability SHOULD
-constrain its requests to a minimum block size of 2^9 (512), and limit
-`NBD_CMD_READ` and `NBD_CMD_WRITE` commands to a maximum block size of
-2^25 (33,554,432).  A server that wants to enforce block sizes other
-than the defaults specified here MAY refuse to go into transmission
-phase with a client that uses `NBD_OPT_EXPORT_NAME` (via a hard
-disconnect) or which uses `NBD_OPT_GO` without requesting
+of 1, a preferred block size of 2^12 (4,096), and a maximum block
+payload size that is at least 2^25 (33,554,432) (even if the export
+size is smaller); while a client desiring maximum interoperability
+SHOULD constrain its requests to a minimum block size of 2^9 (512),
+and limit `NBD_CMD_READ` and `NBD_CMD_WRITE` commands to a maximum
+block size of 2^25 (33,554,432).  A server that wants to enforce block
+sizes other than the defaults specified here MAY refuse to go into
+transmission phase with a client that uses `NBD_OPT_EXPORT_NAME` (via
+a hard disconnect) or which uses `NBD_OPT_GO` without requesting
 `NBD_INFO_BLOCK_SIZE` (via an error reply of
 `NBD_REP_ERR_BLOCK_SIZE_REQD`); but servers SHOULD NOT refuse clients
 that do not request sizing information when the server supports
@@ -818,17 +818,40 @@ the preferred block size for that export.  The server MAY advertise an
 export size that is not an integer multiple of the preferred block
 size.

-The maximum block size represents the maximum length that the server
-is willing to handle in one request.  If advertised, it MAY be
-something other than a power of 2, but MUST be either an integer
-multiple of the minimum block size or the value 0xffffffff for no
-inherent limit, MUST be at least as large as the smaller of the
+The maximum block payload size represents the maximum payload length
+that the server is willing to handle in one request.  If advertised,
+it MAY be something other than a power of 2, but MUST be either an
+integer multiple of the minimum block size or the value 0xffffffff for
+no inherent limit, MUST be at least as large as the smaller of the
 preferred block size or export size, and SHOULD be at least 2^20
 (1,048,576) if the export is that large.  For convenience, the server
-MAY advertise a maximum block size that is larger than the export
-size, although in that case, the client MUST treat the export size as
-the effective maximum block size (as further constrained by a nonzero
-offset).
+MAY advertise a maximum block payload size that is larger than the
+export size, although in that case, the client MUST treat the export
+size as an effective maximum block size (as further constrained by a
+nonzero offset).  Notwithstanding any maximum block size advertised,
+either the server or the client MAY initiate a hard disconnect if a
+payload length of either a request or a reply would be large enough to
+be deemed a denial of service attack; however, for maximum
+portability, any payload *length* not exceeding 2^25 (33,554,432)
+bytes SHOULD NOT be considered a denial of service attack, even if
+that length is larger than the advertised maximum block payload size.
+
+For commands that require a payload in either direction and where the
+client controls the payload length (`NBD_CMD_WRITE`, or `NBD_CMD_READ`
+without structured replies), the client MUST NOT use a length larger
+than the maximum block size. For replies where the payload length is
+controlled by the server (`NBD_CMD_BLOCK_STATUS` without the flag
+`NBD_CMD_FLAG_REQ_ONE`, or `NBD_CMD_READ`, both when structured
+replies are negotiated), the server MUST NOT send an individual reply
+chunk larger than the maximum payload size, although it may split the
+overall reply among several chunks.  For commands that do not require
+a payload in either direction (such as `NBD_CMD_TRIM`), the client MAY
+request a length larger than the maximum block size; the server SHOULD
+NOT disconnect, but MAY reply with an `NBD_EOVERFLOW` or `NBD_EINVAL`
+error if the oversize request would require more server resources than
+the same command operating on only a maximum block size (such as some
+implementations of `NBD_CMD_WRITE_ZEROES` without the flag
+`NBD_CMD_FLAG_FAST_ZERO`, or `NBD_CMD_CACHE`).

 Where a transmission request can have a nonzero *offset* and/or
 *length* (such as `NBD_CMD_READ`, `NBD_CMD_WRITE`, or `NBD_CMD_TRIM`),
@@ -836,24 +859,9 @@ the client MUST ensure that *offset* and *length* are integer
 multiples of any advertised minimum block size, and SHOULD use integer
 multiples of any advertised preferred block size where possible.  For
 those requests, the client MUST NOT use a *length* which, when added to
-*offset*, would exceed the export size. Also for NBD_CMD_READ,
-NBD_CMD_WRITE, NBD_CMD_CACHE and NBD_CMD_WRITE_ZEROES (except for
-when NBD_CMD_FLAG_FAST_ZERO is set), the client MUST NOT use a *length*
-larger than any advertised maximum block size.
-The server SHOULD report an `NBD_EINVAL` error if
-the client's request is not aligned to advertised minimum block size
-boundaries, or is larger than the advertised maximum block size.
-Notwithstanding any maximum block size advertised, either the server
-or the client MAY initiate a hard disconnect if the payload of an
-`NBD_CMD_WRITE` request or `NBD_CMD_READ` reply would be large enough
-to be deemed a denial of service attack; however, for maximum
-portability, any *length* less than 2^25 (33,554,432) bytes SHOULD NOT
-be considered a denial of service attack (even if the advertised
-maximum block size is smaller).  For all other commands, where the
-*length* is not reflected in the payload (such as `NBD_CMD_TRIM` or
-`NBD_CMD_WRITE_ZEROES`), a server SHOULD merely fail the command with
-an `NBD_EINVAL` error for a client that exceeds the maximum block size,
-rather than initiating a hard disconnect.
+*offset*, would exceed the export size.  The server SHOULD report an
+`NBD_EINVAL` error if the client's request is not aligned to advertised
+minimum block size boundaries or would exceed the export size.

 ## Metadata querying

@@ -1595,7 +1603,7 @@ during option haggling in the fixed newstyle negotiation.
       - 16 bits, `NBD_INFO_BLOCK_SIZE`  
       - 32 bits, minimum block size  
       - 32 bits, preferred block size  
-      - 32 bits, maximum block size  
+      - 32 bits, maximum block payload size  

 * `NBD_REP_META_CONTEXT` (4)

@@ -1743,7 +1751,8 @@ Some chunk types can additionally be categorized by role, such as
 *error chunks* or *content chunks*.  Each type determines how to
 interpret the "length" bytes of payload.  If the client receives
 an unknown or unexpected type, other than an *error chunk*, it
-MUST initiate a hard disconnect.
+MUST initiate a hard disconnect.  A server MUST NOT send a chunk
+larger than any advertised maximum block payload size.

 * `NBD_REPLY_TYPE_NONE` (0)

@@ -1906,7 +1915,9 @@ The following request types exist:
     If structured replies were not negotiated, then a read request
     MUST always be answered by a simple reply, as documented above
     (using magic 0x67446698 `NBD_SIMPLE_REPLY_MAGIC`, and containing
-    length bytes of data according to the client's request).
+    length bytes of data according to the client's request), which in
+    turn means any client request with a length larger than the
+    maximum block payload size will fail.

     If an error occurs, the server SHOULD set the appropriate error code
     in the error field. The server MAY then initiate a hard disconnect.
@@ -1936,13 +1947,18 @@ The following request types exist:
     request, but MAY send the content chunks in any order (the client
     MUST reassemble content chunks into the correct order), and MAY
     send additional content chunks even after reporting an error
-    chunk.  Note that a request for more than 2^32 - 8 bytes (if
-    permitted by block size constraints) MUST be split into at least
-    two chunks, so as not to overflow the length field of a reply
-    while still allowing space for the offset of each chunk.  When no
-    error is detected, the server MUST send enough data chunks to
-    cover the entire region described by the offset and length of the
-    client's request.
+    chunk.  A server MAY support read requests larger than the maximum
+    block payload size by splitting the response across multiple
+    chunks (in particular, a request for more than 2^32 - 8 bytes
+    containing data rather than holes MUST be split to avoid
+    overflowing the `NBD_REPLY_TYPE_OFFSET_DATA` length field);
+    however, the server is also permitted to reject large read
+    requests up front, so a client should be prepared to retry with
+    smaller requests if a large request fails.
+
+    When no error is detected, the server MUST send enough data chunks
+    to cover the entire region described by the offset and length of
+    the client's request.

     To minimize traffic, the server MAY use a content or error chunk
     as the final chunk by setting the `NBD_REPLY_FLAG_DONE` flag, but
@@ -2115,11 +2131,12 @@ The following request types exist:
     If the server advertised `NBD_FLAG_SEND_FAST_ZERO` but
     `NBD_CMD_FLAG_FAST_ZERO` is not set, then the server MUST NOT fail
     with `NBD_ENOTSUP`, even if the operation is no faster than a
-    corresponding `NBD_CMD_WRITE`. Conversely, if
-    `NBD_CMD_FLAG_FAST_ZERO` is set, the server MUST fail quickly with
-    `NBD_ENOTSUP` unless the request can be serviced in less time than
-    a corresponding `NBD_CMD_WRITE`, and SHOULD NOT alter the contents
-    of the export when returning this failure. The server's
+    corresponding `NBD_CMD_WRITE`. Conversely, if `NBD_CMD_FLAG_FAST_ZERO`
+    is set, the server SHOULD NOT fail with `NBD_EOVERFLOW` regardless of
+    the client length, MUST fail quickly with `NBD_ENOTSUP` unless the
+    request can be serviced in less time than a corresponding
+    `NBD_CMD_WRITE`, and SHOULD NOT alter the contents of the export when
+    returning an `NBD_ENOTSUP` failure. The server's
     determination on whether to fail a fast request MAY depend on a
     number of factors, such as whether the request was suitably
     aligned, on whether the `NBD_CMD_FLAG_NO_HOLE` flag was present,
@@ -2272,7 +2289,9 @@ SHOULD return `NBD_EPERM` if it receives a write or trim request on a
 read-only export.

 The server SHOULD NOT return `NBD_EOVERFLOW` except as documented in
-response to `NBD_CMD_READ` when `NBD_CMD_FLAG_DF` is supported.
+response to `NBD_CMD_READ` when `NBD_CMD_FLAG_DF` is supported, or when
+a command without payload requests a length larger than an advertised
+maximum block payload length.

 The server SHOULD NOT return `NBD_ENOTSUP` except as documented in
 response to `NBD_CMD_WRITE_ZEROES` when `NBD_CMD_FLAG_FAST_ZERO` is
-- 
2.38.1




* [PATCH v2 3/6] spec: Add NBD_OPT_EXTENDED_HEADERS
  2022-11-14 22:46 ` [PATCH v2 0/6] NBD spec changes for " Eric Blake
  2022-11-14 22:46   ` [PATCH v2 1/6] spec: Recommend cap on NBD_REPLY_TYPE_BLOCK_STATUS length Eric Blake
  2022-11-14 22:46   ` [PATCH v2 2/6] spec: Tweak description of maximum block size Eric Blake
@ 2022-11-14 22:46   ` Eric Blake
  2022-12-19 18:26     ` Vladimir Sementsov-Ogievskiy
  2023-02-22  9:49     ` Wouter Verhelst
  2022-11-14 22:46   ` [PATCH v2 4/6] spec: Allow 64-bit block status results Eric Blake
                     ` (2 subsequent siblings)
  5 siblings, 2 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:46 UTC (permalink / raw)
  To: nbd; +Cc: qemu-devel, qemu-block, libguestfs

Add a new negotiation feature where the client and server agree to use
larger packet headers on every packet sent during transmission phase.
This has two purposes: first, it makes it possible to perform
operations like trim, write zeroes, and block status on more than 2^32
bytes in a single command.  For NBD_CMD_READ, replies are still
implicitly capped by the maximum block payload limits (generally 32M);
if you want to detect a hole larger than what 32 bits can represent,
you'll use BLOCK_STATUS instead of hoping that a large READ will
either return a hole or report overflow.  But for
NBD_CMD_BLOCK_STATUS, it is very useful to be able to report a status
extent with a size larger than 32 bits, in some cases even if the
client's request was for less than 32 bits (such as when it is
known that the entire image is not sparse).  Thus, the wording chosen
here is careful to permit a server to use either flavor of status
chunk type in its reply, and clients requesting extended headers must be
prepared for both reply types.

Second, when structured replies are active, clients have to deal with
the difference between 16- and 20-byte headers of simple
vs. structured replies, which impacts performance if the client must
perform multiple syscalls to first read the magic before knowing if
there are even additional bytes to read to learn a payload length.  In
extended header mode, all headers are the same width and there are no
simple replies permitted.  The layout of the reply header is more like
the request header; and including the client's offset in the reply
makes it easier to convert between absolute and relative offsets for
replies to NBD_CMD_READ.  Similarly, by having extended mode use a
power-of-2 sizing, it becomes easier to manipulate arrays of headers
without worrying about an individual header crossing a cache line.
However, note that this change only affects the headers; data payloads
can still be unaligned (for example, a client performing 1-byte reads
or writes).  We would need to negotiate yet another extension if we
wanted to ensure that all NBD transmission packets started on an
8-byte boundary after option haggling has completed.
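
As a rough sketch of the fixed 32-byte layouts described in the patch
below (the struct names are illustrative only; a real implementation
would serialize each field in network byte order rather than relying
on in-memory struct layout):

  #include <stdint.h>

  #define NBD_EXTENDED_REQUEST_MAGIC 0x21e41c71
  #define NBD_EXTENDED_REPLY_MAGIC   0x6e8a278c

  struct nbd_extended_request {   /* 32 bytes on the wire */
      uint32_t magic;             /* NBD_EXTENDED_REQUEST_MAGIC */
      uint16_t flags;             /* includes NBD_CMD_FLAG_PAYLOAD_LEN */
      uint16_t type;              /* NBD_CMD_* */
      uint64_t handle;
      uint64_t offset;
      uint64_t length;            /* payload or effect length, per flags */
  };

  struct nbd_extended_reply {     /* 32 bytes on the wire */
      uint32_t magic;             /* NBD_EXTENDED_REPLY_MAGIC */
      uint16_t flags;             /* includes NBD_REPLY_FLAG_DONE */
      uint16_t type;              /* NBD_REPLY_TYPE_* */
      uint64_t handle;            /* matches the request */
      uint64_t offset;            /* matches the request */
      uint64_t length;            /* payload length of this chunk */
  };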

This spec addition was done in parallel with proof of concept
implementations in qemu (server and client), libnbd (client), and
nbdkit (server).

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 doc/proto.md | 481 ++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 358 insertions(+), 123 deletions(-)

diff --git a/doc/proto.md b/doc/proto.md
index 53c334a..fde1e70 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -280,34 +280,53 @@ a soft disconnect.

 ### Transmission

-There are three message types in the transmission phase: the request,
-the simple reply, and the structured reply chunk.  The
+There are two general message types in the transmission phase: the
+request (compact or extended), and the reply (simple, structured, or
+extended).  Which message headers to use is determined during the
+handshaking phase, based on whether
+`NBD_OPT_STRUCTURED_REPLY` or `NBD_OPT_EXTENDED_HEADERS` was requested
+by the client and given a successful response by the server.  The
 transmission phase consists of a series of transactions, where the
 client submits requests and the server sends corresponding replies
 with either a single simple reply or a series of one or more
-structured reply chunks per request.  The phase continues until
-either side terminates transmission; this can be performed cleanly
-only by the client.
+structured or extended reply chunks per request.  The phase continues
+until either side terminates transmission; this can be performed
+cleanly only by the client.

 Note that without client negotiation, the server MUST use only simple
 replies, and that it is impossible to tell by reading the server
 traffic in isolation whether a data field will be present; the simple
 reply is also problematic for error handling of the `NBD_CMD_READ`
-request.  Therefore, structured replies can be used to create a
-a context-free server stream; see below.
+request.  Therefore, structured or extended replies can be used to
+create a context-free server stream; see below.
+
+The results of client negotiation also determine whether the client
+and server will utilize only compact requests and replies, or whether
+both sides will use only extended packets.  Compact messages are the
+default, but inherently limit single transactions to a 32-bit window
+starting at a 64-bit offset.  Extended messages make it possible to
+perform 64-bit transactions (although typically only for commands that
+do not include a data payload).  Furthermore, when only structured
+replies have been negotiated, compact messages require the client to
+perform partial reads to determine which reply packet style (16-byte
+simple or 20-byte structured) is on the wire before knowing the length
+of the rest of the reply, which can reduce client performance.  With
+extended messages, all packet headers have a fixed length of 32 bytes,
+and although this results in more traffic over the network, the
+resulting layout is friendlier for performance.

 Replies need not be sent in the same order as requests (i.e., requests
-may be handled by the server asynchronously), and structured reply
-chunks from one request may be interleaved with reply messages from
-other requests; however, there may be constraints that prevent
-arbitrary reordering of structured reply chunks within a given reply.
+may be handled by the server asynchronously), and structured or
+extended reply chunks from one request may be interleaved with reply
+messages from other requests; however, there may be constraints that
+prevent arbitrary reordering of reply chunks within a given reply.
 Clients SHOULD use a handle that is distinct from all other currently
 pending transactions, but MAY reuse handles that are no longer in
 flight; handles need not be consecutive.  In each reply message
-(whether simple or structured), the server MUST use the same value for
-handle as was sent by the client in the corresponding request.  In
-this way, the client can correlate which request is receiving a
-response.
+(whether simple, structured, or extended), the server MUST use the
+same value for handle as was sent by the client in the corresponding
+request.  In this way, the client can correlate which request is
+receiving a response.

 #### Ordering of messages and writes

@@ -344,7 +363,10 @@ may be useful.

 #### Request message

-The request message, sent by the client, looks as follows:
+The compact request message is sent by the client when extended
+transactions are not in use (either the client did not request
+extended headers during negotiation, or the server responded that
+`NBD_OPT_EXTENDED_HEADERS` is unsupported), and looks as follows:

 C: 32 bits, 0x25609513, magic (`NBD_REQUEST_MAGIC`)  
 C: 16 bits, command flags  
@@ -354,19 +376,54 @@ C: 64 bits, offset (unsigned)
 C: 32 bits, length (unsigned)  
 C: (*length* bytes of data if the request is of type `NBD_CMD_WRITE`)  

+In the compact style, the client MUST NOT use the
+`NBD_CMD_FLAG_PAYLOAD_LEN` flag; and the only command where *length*
+represents payload length instead of effect length is `NBD_CMD_WRITE`.
+
+If negotiation agreed on extended transactions with
+`NBD_OPT_EXTENDED_HEADERS`, the client instead uses extended requests:
+
+C: 32 bits, 0x21e41c71, magic (`NBD_EXTENDED_REQUEST_MAGIC`)  
+C: 16 bits, command flags  
+C: 16 bits, type  
+C: 64 bits, handle  
+C: 64 bits, offset (unsigned)  
+C: 64 bits, payload/effect length (unsigned)  
+C: (*length* bytes of data if *flags* includes `NBD_CMD_FLAG_PAYLOAD_LEN`)  
+
+With extended headers, the meaning of the *length* field depends on
+whether *flags* contains `NBD_CMD_FLAG_PAYLOAD_LEN` (that many
+additional bytes of payload are present), or if the flag is absent
+(there is no payload, and *length* instead is an effect length
+describing how much of the image the request operates on).  The
+command `NBD_CMD_WRITE` MUST use the flag `NBD_CMD_FLAG_PAYLOAD_LEN`
+in this mode, while other commands SHOULD avoid the flag if the
+server has not indicated extension support for payloads on that
+command.  A server SHOULD initiate hard disconnect if a client sets
+the `NBD_CMD_FLAG_PAYLOAD_LEN` flag and uses a *length* larger than
+a server's advertised or default maximum payload length (capped at
+32 bits by the constraints of `NBD_INFO_BLOCK_SIZE`); in all other
+cases, a server SHOULD gracefully consume *length* bytes of payload
+(even if it then replies with an `NBD_EINVAL` failure because the
+particular command was not expecting a payload), and proceed with
+the next client command.  Thus, only when *length* is used as an
+effect length will it utilize a full 64-bit value.
+
 #### Simple reply message

 The simple reply message MUST be sent by the server in response to all
-requests if structured replies have not been negotiated using
-`NBD_OPT_STRUCTURED_REPLY`. If structured replies have been negotiated, a simple
+requests if neither structured replies (`NBD_OPT_STRUCTURED_REPLY`)
+nor extended headers (`NBD_OPT_EXTENDED_HEADERS`) have been
+negotiated.  If structured replies have been negotiated, a simple
 reply MAY be used as a reply to any request other than `NBD_CMD_READ`,
-but only if the reply has no data payload.  The message looks as
-follows:
+but only if the reply has no data payload.  If extended headers have
+been negotiated, a simple reply MUST NOT be used.  The message looks
+as follows:

 S: 32 bits, 0x67446698, magic (`NBD_SIMPLE_REPLY_MAGIC`; used to be
    `NBD_REPLY_MAGIC`)  
 S: 32 bits, error (MAY be zero)  
-S: 64 bits, handle  
+S: 64 bits, handle (MUST match the request)   
 S: (*length* bytes of data if the request is of type `NBD_CMD_READ` and
     *error* is zero)  

@@ -416,9 +473,9 @@ on the chunks received.
 A structured reply chunk message looks as follows:

 S: 32 bits, 0x668e33ef, magic (`NBD_STRUCTURED_REPLY_MAGIC`)  
-S: 16 bits, flags  
+S: 16 bits, reply flags  
 S: 16 bits, type  
-S: 64 bits, handle  
+S: 64 bits, handle (MUST match the request)  
 S: 32 bits, length of payload (unsigned)  
 S: *length* bytes of payload data (if *length* is nonzero)  

@@ -426,6 +483,45 @@ The use of *length* in the reply allows context-free division of
 the overall server traffic into individual reply messages; the
 *type* field describes how to further interpret the payload.

+#### Extended reply chunk message
+
+One implementation drawback of the original structured replies via
+`NBD_OPT_STRUCTURED_REPLY` is that a client must read the magic number
+to know whether it will then be reading a 16-byte or 20-byte header
+from the server.  Another is that an array of 20-byte structures is
+not always cache-line friendly, compared to power-of-2 sizing.
+Therefore, a client may instead attempt to negotiate extended headers
+via `NBD_OPT_EXTENDED_HEADERS` in place of structured replies.
+
+If extended headers are negotiated, all server replies MUST use an
+extended reply; the server MUST NOT use simple replies even when there
+is no payload.  Most fields in the extended reply have the same
+semantics as their counterparts in structured replies (including the
+use of `NBD_REPLY_FLAG_DONE` to end a sequence of one or more chunks
+comprising the overall reply).  The extended reply has an additional
+field *offset* which MUST match the *offset* of the client's request
+(even in reply types such as `NBD_REPLY_TYPE_OFFSET_DATA` where the
+payload itself contains a different offset representing the middle of
+the buffer).
+
+The other difference between extended replies and structured replies
+is that the *length* field is 64 bits to match the layout of the
+extended request header.  However, note that while the request flags
+determine whether a request uses a 32-bit payload length or a 64-bit
+effect length, at this time the reply length is used solely for
+payload lengths and so in practice will never exceed a 32-bit value
+(see `NBD_INFO_BLOCK_SIZE`).
+
+An extended reply chunk message looks as follows:
+
+S: 32 bits, 0x6e8a278c, magic (`NBD_EXTENDED_REPLY_MAGIC`)  
+S: 16 bits, reply flags  
+S: 16 bits, type  
+S: 64 bits, handle (MUST match the request)  
+S: 64 bits, offset (MUST match the request)  
+S: 64 bits, length of payload (unsigned)  
+S: *length* bytes of payload data (if *length* is nonzero)  
+
 #### Terminating the transmission phase

 There are two methods of terminating the transmission phase:
@@ -838,18 +934,19 @@ that length is larger than the advertised maximum block payload size.

 For commands that require a payload in either direction and where the
 client controls the payload length (`NBD_CMD_WRITE`, or `NBD_CMD_READ`
-without structured replies), the client MUST NOT use a length larger
-than the maximum block size. For replies where the payload length is
-controlled by the server (`NBD_CMD_BLOCK_STATUS` without the flag
-`NBD_CMD_FLAG_REQ_ONE`, or `NBD_CMD_READ`, both when structured
-replies are negotiated), the server MUST NOT send an individual reply
-chunk larger than the maximum payload size, although it may split the
-overall reply among several chunks.  For commands that do not require
-a payload in either direction (such as `NBD_CMD_TRIM`), the client MAY
-request a length larger than the maximum block size; the server SHOULD
-NOT disconnect, but MAY reply with an `NBD_EOVERFLOW` or `NBD_EINVAL`
-error if the oversize request would require more server resources than
-the same command operating on only a maximum block size (such as some
+without structured or extended replies), the client MUST NOT use a
+length larger than the maximum block size. For replies where the
+payload length is controlled by the server (`NBD_CMD_BLOCK_STATUS`
+without the flag `NBD_CMD_FLAG_REQ_ONE`, or `NBD_CMD_READ`, both when
+structured replies or extended headers are negotiated), the server
+MUST NOT send an individual reply chunk larger than the maximum
+payload size, although it may split the overall reply among several
+chunks.  For commands that do not require a payload in either
+direction (such as `NBD_CMD_TRIM`), the client MAY request a length
+larger than the maximum block size; the server SHOULD NOT disconnect,
+but MAY reply with an `NBD_EOVERFLOW` or `NBD_EINVAL` error if the
+oversize request would require more server resources than the same
+command operating on only a maximum block size (such as some
 implementations of `NBD_CMD_WRITE_ZEROES` without the flag
 `NBD_CMD_FLAG_FAST_ZERO`, or `NBD_CMD_CACHE`).

@@ -894,12 +991,11 @@ The procedure works as follows:
 - During transmission, a client can then indicate interest in metadata
   for a given region by way of the `NBD_CMD_BLOCK_STATUS` command,
   where *offset* and *length* indicate the area of interest. On
-  success, the server MUST respond with one structured reply chunk of
-  type `NBD_REPLY_TYPE_BLOCK_STATUS` per metadata context selected
-  during negotiation, where each reply chunk is a list of one or more
-  consecutive extents for that context.  Each extent comes with a
-  *flags* field, the semantics of which are defined by the metadata
-  context.
+  success, the server MUST respond with one status type reply chunk
+  per metadata context selected during negotiation, where each reply
+  chunk is a list of one or more consecutive extents for that context.
+  Each extent comes with a *flags* field, the semantics of which are
+  defined by the metadata context.

 The client's requested *length* is only a hint to the server, so the
 cumulative extent length contained in a chunk of the server's reply
@@ -920,9 +1016,10 @@ same export name for the subsequent `NBD_OPT_GO` (or
 sending `NBD_CMD_BLOCK_STATUS` without selecting at least one metadata
 context.

-The reply to the `NBD_CMD_BLOCK_STATUS` request MUST be sent as a
-structured reply; this implies that in order to use metadata querying,
-structured replies MUST be negotiated first.
+The reply to a successful `NBD_CMD_BLOCK_STATUS` request MUST be sent
+via reply chunks; this implies that in order to use metadata querying,
+either structured replies or extended headers MUST be negotiated
+first.

 Metadata contexts are identified by their names. The name MUST consist
 of a namespace, followed by a colon, followed by a leaf-name.  The
@@ -1097,12 +1194,12 @@ The field has the following format:
 - bit 5, `NBD_FLAG_SEND_TRIM`: exposes support for `NBD_CMD_TRIM`.
 - bit 6, `NBD_FLAG_SEND_WRITE_ZEROES`: exposes support for
   `NBD_CMD_WRITE_ZEROES` and `NBD_CMD_FLAG_NO_HOLE`.
-- bit 7, `NBD_FLAG_SEND_DF`: do not fragment a structured reply. The
-  server MUST set this transmission flag to 1 if the
-  `NBD_CMD_READ` request supports the `NBD_CMD_FLAG_DF` flag, and
-  MUST leave this flag clear if structured replies have not been
-  negotiated. Clients MUST NOT set the `NBD_CMD_FLAG_DF` request
-  flag unless this transmission flag is set.
+- bit 7, `NBD_FLAG_SEND_DF`: do not fragment a structured or extended
+  reply. The server MUST set this transmission flag to 1 if the
+  `NBD_CMD_READ` request supports the `NBD_CMD_FLAG_DF` flag, and MUST
+  leave this flag clear if neither structured replies nor extended
+  headers have been negotiated. Clients MUST NOT set the
+  `NBD_CMD_FLAG_DF` request flag unless this transmission flag is set.
 - bit 8, `NBD_FLAG_CAN_MULTI_CONN`: Indicates that the server operates
   entirely without cache, or that the cache it uses is shared among all
   connections to the given device. In particular, if this flag is
@@ -1212,10 +1309,10 @@ of the newstyle negotiation.

     When this command succeeds, the server MUST NOT preserve any
     negotiation state (such as a request for
-    `NBD_OPT_STRUCTURED_REPLY`, or metadata contexts from
-    `NBD_OPT_SET_META_CONTEXT`) issued before this command.  A client
-    SHOULD defer all stateful option requests until after it
-    determines whether encryption is available.
+    `NBD_OPT_STRUCTURED_REPLY` or `NBD_OPT_EXTENDED_HEADERS`, or
+    metadata contexts from `NBD_OPT_SET_META_CONTEXT`) issued before
+    this command.  A client SHOULD defer all stateful option requests
+    until after it determines whether encryption is available.

     See the section on TLS above for further details.

@@ -1296,6 +1393,10 @@ of the newstyle negotiation.
       `NBD_REP_ERR_BLOCK_SIZE_REQD` error SHOULD ensure it first
       sends an `NBD_INFO_BLOCK_SIZE` information reply in order
       to help avoid a potentially unnecessary round trip.
+    - `NBD_REP_ERR_EXT_HEADER_REQD`: The server requires the client to
+      utilize extended headers.  While this may make it easier to
+      implement a server with fewer considerations for backwards
+      compatibility, it limits connections to only recent clients.

     Additionally, if TLS has not been initiated, the server MAY reply
     with `NBD_REP_ERR_TLS_REQD` (instead of `NBD_REP_ERR_UNKNOWN`) to
@@ -1350,6 +1451,9 @@ of the newstyle negotiation.
       server MUST use structured replies to the `NBD_CMD_READ`
       transmission request.  Other extensions that require structured
       replies may now be negotiated.
+    - `NBD_REP_ERR_EXT_HEADER_REQD`: The client has already
+      successfully negotiated extended headers, which takes precedence
+      over this option.
     - For backwards compatibility, clients SHOULD be prepared to also
       handle `NBD_REP_ERR_UNSUP`; in this case, no structured replies
       will be sent.
@@ -1357,9 +1461,10 @@ of the newstyle negotiation.
     It is envisioned that future extensions will add other new
     requests that may require a data payload in the reply.  A server
     that supports such extensions SHOULD NOT advertise those
-    extensions until the client negotiates structured replies; and a
-    client MUST NOT make use of those extensions without first
-    enabling the `NBD_OPT_STRUCTURED_REPLY` extension.
+    extensions until the client has negotiated either structured
+    replies or extended headers; and a client MUST NOT make use of
+    those extensions without first enabling support for reply
+    payloads.

     If the client requests `NBD_OPT_STARTTLS` after this option, it
     MUST renegotiate structured replies and any other dependent
@@ -1370,9 +1475,10 @@ of the newstyle negotiation.
     Return a list of `NBD_REP_META_CONTEXT` replies, one per context,
     followed by an `NBD_REP_ACK` or an error.

-    This option SHOULD NOT be requested unless structured replies have
-    been negotiated first. If a client attempts to do so, a server
-    MAY send `NBD_REP_ERR_INVALID`.
+    This option SHOULD NOT be requested unless structured replies or
+    extended headers have been negotiated first. If a client attempts
+    to do so, a server MAY send `NBD_REP_ERR_INVALID` or
+    `NBD_REP_ERR_EXT_HEADER_REQD`.

     Data:
     - 32 bits, length of export name.  
@@ -1454,9 +1560,10 @@ of the newstyle negotiation.
     they are interested in are selected with the final query that they
     sent.

-    This option MUST NOT be requested unless structured replies have
-    been negotiated first. If a client attempts to do so, a server
-    SHOULD send `NBD_REP_ERR_INVALID`.
+    This option MUST NOT be requested unless structured replies or
+    extended headers have been negotiated first. If a client attempts
+    to do so, a server SHOULD send `NBD_REP_ERR_INVALID` or
+    `NBD_REP_ERR_EXT_HEADER_REQD`.

     A client MUST NOT send `NBD_CMD_BLOCK_STATUS` unless within the
     negotiation phase it sent `NBD_OPT_SET_META_CONTEXT` at least
@@ -1493,6 +1600,46 @@ of the newstyle negotiation.
     option does not select any metadata context, provided the client
     then does not attempt to issue `NBD_CMD_BLOCK_STATUS` commands.

+* `NBD_OPT_EXTENDED_HEADERS` (11)
+
+    The client wishes to use extended headers during the transmission
+    phase.  The client MUST NOT send any additional data with the
+    option, and the server SHOULD reject a request that includes data
+    with `NBD_REP_ERR_INVALID`.
+
+    When successful, this option takes precedence over structured
+    replies.  A client MAY request structured replies first, although
+    a server SHOULD support this option even if structured replies are
+    not negotiated.
+
+    It is envisioned that future extensions will add other new
+    requests that support a data payload in the request or reply.  A
+    server that supports such extensions SHOULD NOT advertise those
+    extensions until the client has negotiated extended headers; and a
+    client MUST NOT make use of those extensions without first
+    enabling support for reply payloads.
+
+    The server replies with the following, or with an error permitted
+    elsewhere in this document:
+
+    - `NBD_REP_ACK`: Extended headers have been negotiated; the client
+      MUST use the 32-byte extended request header, with proper use of
+      `NBD_CMD_FLAG_PAYLOAD_LEN` for all commands sending a payload;
+      and the server MUST use the 32-byte extended reply header.
+    - For backwards compatibility, clients SHOULD be prepared to also
+      handle `NBD_REP_ERR_UNSUP`; in this case, only the compact
+      transmission headers will be used.
+
+    Note that a response of `NBD_REP_ERR_BLOCK_SIZE_REQD` does not
+    make sense in response to this command, but a server MAY fail with
+    that error for a later `NBD_OPT_GO` without a client request for
+    `NBD_INFO_BLOCK_SIZE`, since the use of extended headers provides
+    more incentive for a client to promise to obey block size
+    constraints.
+
+    If the client requests `NBD_OPT_STARTTLS` after this option, it
+    MUST renegotiate extended headers.
+
 #### Option reply types

 These values are used in the "reply type" field, sent by the server
@@ -1672,6 +1819,16 @@ case that data is an error message string suitable for display to the user.

     The request or the reply is too large to process.

+* `NBD_REP_ERR_EXT_HEADER_REQD` (2^31 + 10)
+
+    The server is unwilling to proceed with the given request (such as
+    `NBD_OPT_GO` or `NBD_OPT_SET_META_CONTEXT`) for a client that has
+    not yet requested extended headers (via
+    `NBD_OPT_EXTENDED_HEADERS`).  This error is also used to indicate
+    that the server is unable to downgrade to structured replies (via
+    `NBD_OPT_STRUCTURED_REPLY`) if extended headers have already been
+    enabled.
+
 ### Transmission phase

 #### Flag fields
@@ -1725,6 +1882,17 @@ valid may depend on negotiation during the handshake phase.
   `NBD_CMD_WRITE`, then the server MUST fail quickly with an error of
   `NBD_ENOTSUP`. The client MUST NOT set this unless the server advertised
   `NBD_FLAG_SEND_FAST_ZERO`.
+- bit 5, `NBD_CMD_FLAG_PAYLOAD_LEN`; only valid if extended headers
+  were negotiated via `NBD_OPT_EXTENDED_HEADERS`.  If set, the
+  *length* field of the header describes a (nonzero) payload length of
+  data to be sent alongside the header; if the flag is clear, the
+  header *length* is instead an effect length for the command's
+  interaction with the export, and there is no payload after the
+  header.  With extended headers, the flag MUST be set for
+  `NBD_CMD_WRITE` (as the write command always sends a payload of the
+  bytes to be written); for other commands, the flag will trigger an
+  `NBD_EINVAL` error unless the server has advertised support for an
+  extension payload form for the command.

 ##### Structured reply flags

@@ -1746,13 +1914,15 @@ unrecognized flags.

 #### Structured reply types

-These values are used in the "type" field of a structured reply.
-Some chunk types can additionally be categorized by role, such as
-*error chunks* or *content chunks*.  Each type determines how to
-interpret the "length" bytes of payload.  If the client receives
-an unknown or unexpected type, other than an *error chunk*, it
-MUST initiate a hard disconnect.  A server MUST NOT send a chunk
-larger than any advertised maximum block payload size.
+These values are used in the "type" field of a structured reply.  Some
+chunk types can additionally be categorized by role, such as *error
+chunks*, *content chunks*, or *status chunks*.  Each type determines
+how to interpret the "length" bytes of payload.  If the client
+receives an unknown or unexpected type, other than an *error chunk*,
+it MAY initiate a hard disconnect on the grounds that the client is
+uncertain whether the server handled the request as desired.  A server
+MUST NOT send a chunk larger than any advertised maximum block payload
+size.

 * `NBD_REPLY_TYPE_NONE` (0)

@@ -1795,13 +1965,28 @@ larger than any advertised maximum block payload size.
   64 bits: offset (unsigned)  
   32 bits: hole size (unsigned, MUST be nonzero)  

+  At this time, although servers that support extended headers are
+  permitted to accept client requests for `NBD_CMD_READ` with an
+  effect length larger than any advertised maximum block payload size
+  by splitting the reply into multiple chunks, portable clients SHOULD
+  NOT request a read *length* larger than 32 bits (corresponding to
+  the maximum block payload constraint implied by
+  `NBD_INFO_BLOCK_SIZE`), and therefore a 32-bit constraint on the
+  *hole size* does not represent an arbitrary limitation.  Should a
+  future scenario arise where it can be demonstrated that a client and
+  server would benefit from an extension allowing a maximum block
+  payload size to be larger than 32 bits, that extension would also
+  introduce a counterpart reply type that can express a 64-bit *hole
+  size*.
+
 * `NBD_REPLY_TYPE_BLOCK_STATUS` (5)

-  *length* MUST be 4 + (a positive integer multiple of 8).  This reply
-  represents a series of consecutive block descriptors where the sum
-  of the length fields within the descriptors is subject to further
-  constraints documented below.  A successful block status request MUST
-  have exactly one status chunk per negotiated metadata context ID.
+  This chunk type is in the status chunk category.  *length* MUST be
+  4 + (a positive integer multiple of 8).  This reply represents a
+  series of consecutive block descriptors where the sum of the length
+  fields within the descriptors is subject to further constraints
+  documented below.  A successful block status request MUST have
+  exactly one status chunk per negotiated metadata context ID.

   The payload starts with:

@@ -1843,6 +2028,43 @@ larger than any advertised maximum block payload size.
   extent information at the first offset not covered by a
   reduced-length reply.

+* `NBD_REPLY_TYPE_BLOCK_STATUS_EXT` (6)
+
+  This chunk type is in the status chunk category.  *length* MUST be
+  8 + (a positive multiple of 16).  The semantics of this chunk mirror
+  those of `NBD_REPLY_TYPE_BLOCK_STATUS`, other than the use of a
+  larger *extent length* field, added padding in each descriptor to
+  ease alignment, and the addition of a *descriptor count* field that
+  can be used for easier client processing.  This chunk type MUST NOT
+  be used unless extended headers were negotiated with
+  `NBD_OPT_EXTENDED_HEADERS`.
+
+  If the *descriptor count* field contains 0, the number of subsequent
+  descriptors is determined solely by the *length* field of the reply
+  header.  However, the server MAY populate the *descriptor count*
+  field with the number of descriptors that follow; when doing this,
+  the server MUST ensure that the header *length* is equal to
+  *descriptor count* * 16 + 8.
+
+  The payload starts with:
+
+  32 bits, metadata context ID  
+  32 bits, descriptor count  
+
+  and is followed by a list of one or more descriptors, each with this
+  layout:
+
+  64 bits, length of the extent to which the status below
+     applies (unsigned, MUST be nonzero)  
+  32 bits, padding (MUST be zero)  
+  32 bits, status flags  
+
+  Note that even when extended headers are in use, the client MUST be
+  prepared for the server to use either the compact or extended chunk
+  type, regardless of whether the client's hinted effect length was
+  more or less than 32 bits; but the server MUST use exactly one of
+the two chunk types per negotiated metadata context ID.
+
 All error chunk types have bit 15 set, and begin with the same
 *error*, *message length*, and optional *message* fields as
 `NBD_REPLY_TYPE_ERROR`.  If nonzero, *message length* indicates
@@ -1857,8 +2079,9 @@ remaining structured fields at the end.
   and the client MAY NOT make any assumptions about partial
   success. This type SHOULD NOT be used more than once in a
   structured reply.  Valid as a reply to any request.  Note that
-  *message length* MUST NOT exceed the 4096 bytes string length
-  limit.
+  *message length* MUST NOT exceed the 4096 bytes string length limit,
+  and therefore there is no need for a counterpart extended-length
+  error chunk type.

   The payload is structured as:

@@ -1905,19 +2128,21 @@ The following request types exist:

     A read request. Length and offset define the data to be read. The
     server MUST reply with either a simple reply or a structured
-    reply, according to whether the structured replies have been
-    negotiated using `NBD_OPT_STRUCTURED_REPLY`. The client SHOULD NOT
-    request a read length of 0; the behavior of a server on such a
+    reply, according to whether structured replies
+    (`NBD_OPT_STRUCTURED_REPLY`) or extended headers
+    (`NBD_OPT_EXTENDED_HEADERS`) were negotiated.  The client SHOULD
+    NOT request a read length of 0; the behavior of a server on such a
     request is unspecified although the server SHOULD NOT disconnect.

     *Simple replies*

-    If structured replies were not negotiated, then a read request
-    MUST always be answered by a simple reply, as documented above
-    (using magic 0x67446698 `NBD_SIMPLE_REPLY_MAGIC`, and containing
-    length bytes of data according to the client's request), which in
-    turn means any client request with a length larger than the
-    maximum block payload size will fail.
+    If neither structured replies nor extended headers were
+    negotiated, then a read request MUST always be answered by a
+    simple reply, as documented above (using magic 0x67446698
+    `NBD_SIMPLE_REPLY_MAGIC`, and containing length bytes of data
+    according to the client's request), which in turn means any client
+    request with a length larger than the maximum block payload size
+    will fail.

     If an error occurs, the server SHOULD set the appropriate error code
     in the error field. The server MAY then initiate a hard disconnect.
@@ -1930,13 +2155,15 @@ The following request types exist:

     *Structured replies*

-    If structured replies are negotiated, then a read request MUST
-    result in a structured reply with one or more chunks (each using
-    magic 0x668e33ef `NBD_STRUCTURED_REPLY_MAGIC`), where the final
-    chunk has the flag `NBD_REPLY_FLAG_DONE`, and with the following
-    additional constraints.
+    If structured replies or extended headers are negotiated, then a
+    read request MUST result in a reply with one or more structured
+    chunks (each using `NBD_STRUCTURED_REPLY_MAGIC` or
+    `NBD_EXTENDED_REPLY_MAGIC` according to what was negotiated),
+    where the final chunk has the flag `NBD_REPLY_FLAG_DONE`, and with
+    the following additional constraints.

-    The server MAY split the reply into any number of content chunks;
+    The server MAY split the reply into any number of content chunks
+    (`NBD_REPLY_TYPE_OFFSET_DATA` and `NBD_REPLY_TYPE_OFFSET_HOLE`);
     each chunk MUST describe at least one byte, although to minimize
     overhead, the server SHOULD use chunks with lengths and offsets as
     an integer multiple of 512 bytes, where possible (the first and
@@ -1949,12 +2176,13 @@ The following request types exist:
     send additional content chunks even after reporting an error
     chunk.  A server MAY support read requests larger than the maximum
     block payload size by splitting the response across multiple
-    chunks (in particular, a request for more than 2^32 - 8 bytes
-    containing data rather than holes MUST be split to avoid
-    overflowing the `NBD_REPLY_TYPE_OFFSET_DATA` length field);
-    however, the server is also permitted to reject large read
-    requests up front, so a client should be prepared to retry with
-    smaller requests if a large request fails.
+    chunks (in particular, if extended headers are not in use, a
+    request for more than 2^32 - 8 bytes containing data rather than
+    holes MUST be split to avoid overflowing the 32-bit
+    `NBD_REPLY_TYPE_OFFSET_DATA` length field); however, the server is
+    also permitted to reject large read requests up front, so a client
+    should be prepared to retry with smaller requests if a large
+    request fails.

     When no error is detected, the server MUST send enough data chunks
     to cover the entire region described by the offset and length of
@@ -2180,10 +2408,10 @@ The following request types exist:

 * `NBD_CMD_BLOCK_STATUS` (7)

-    A block status query request. Length and offset define the range
-    of interest.  The client SHOULD NOT request a status length of 0;
-    the behavior of a server on such a request is unspecified although
-    the server SHOULD NOT disconnect.
+    A block status query request. The effect length is a hint to the
+    server about the range of interest.  The client SHOULD NOT request
+    a status length of 0; the behavior of a server on such a request
+    is unspecified although the server SHOULD NOT disconnect.

     A client MUST NOT send `NBD_CMD_BLOCK_STATUS` unless within the
     negotiation phase it sent `NBD_OPT_SET_META_CONTEXT` at least
@@ -2191,28 +2419,29 @@ The following request types exist:
     same export name used to enter transmission phase, and where the
     server returned at least one metadata context without an error.
     This in turn requires the client to first negotiate structured
-    replies. For a successful return, the server MUST use a structured
-    reply, containing exactly one chunk of type
-    `NBD_REPLY_TYPE_BLOCK_STATUS` per selected context id, where the
-    status field of each descriptor is determined by the flags field
-    as defined by the metadata context.  The server MAY send chunks in
-    a different order than the context ids were assigned in reply to
-    `NBD_OPT_SET_META_CONTEXT`.
+    replies or extended headers. For a successful return, the server
+    MUST use one reply chunk per selected context id (only
+    `NBD_REPLY_TYPE_BLOCK_STATUS` for structured replies, and either
+    `NBD_REPLY_TYPE_BLOCK_STATUS` or `NBD_REPLY_TYPE_BLOCK_STATUS_EXT`
+    for extended headers).  The status field of each descriptor is
+    determined by the flags field as defined by the metadata context.
+    The server MAY send chunks in a different order than the context
+    ids were assigned in reply to `NBD_OPT_SET_META_CONTEXT`.

-    The list of block status descriptors within the
-    `NBD_REPLY_TYPE_BLOCK_STATUS` chunk represent consecutive portions
-    of the file starting from specified *offset*.  If the client used
-    the `NBD_CMD_FLAG_REQ_ONE` flag, each chunk contains exactly one
-    descriptor where the *length* of the descriptor MUST NOT be
-    greater than the *length* of the request; otherwise, a chunk MAY
-    contain multiple descriptors, and the final descriptor MAY extend
-    beyond the original requested size if the server can determine a
-    larger length without additional effort.  On the other hand, the
-    server MAY return less data than requested.  In particular, a
-    server SHOULD NOT send more than 2^20 status descriptors in a
-    single chunk.  However the server MUST return at least one status
-    descriptor, and since each status descriptor has a non-zero
-    length, a client can always make progress on a successful return.
+    The list of block status descriptors within a given status chunk
+    represent consecutive portions of the file starting from specified
+    *offset*.  If the client used the `NBD_CMD_FLAG_REQ_ONE` flag,
+    each chunk contains exactly one descriptor where the *length* of
+    the descriptor MUST NOT be greater than the *length* of the
+    request; otherwise, a chunk MAY contain multiple descriptors, and
+    the final descriptor MAY extend beyond the original requested size
+    if the server can determine a larger length without additional
+    effort.  On the other hand, the server MAY return less data than
+    requested.  In particular, a server SHOULD NOT send more than 2^20
+    status descriptors in a single chunk.  However the server MUST
+    return at least one status descriptor, and since each status
+    descriptor has a non-zero length, a client can always make
+    progress on a successful return.

     The server SHOULD use different *status* values between
     consecutive descriptors where feasible, although the client SHOULD
@@ -2412,6 +2641,10 @@ implement the following features:
   Clients that do not implement querying for block size constraints
   SHOULD abide by the rules laid out in the section "Block size
   constraints", above.
+- Servers that implement extended headers but desire interoperability
+  with older client implementations SHOULD NOT use the
+  `NBD_REP_ERR_EXT_HEADER_REQD` error response to `NBD_OPT_GO`, but
+  should gracefully support compact headers.

 ### Future considerations

@@ -2421,6 +2654,8 @@ implementations are not yet ready to support them:

 - Structured replies; the Linux kernel currently does not yet implement
   them.
+- Extended headers; these are similar to structured replies, but are
+  new enough that many clients and servers do not yet implement them.

 ## About this file

-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread
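
To make the chunk rules above concrete, here is a minimal sketch (in
C; all names are illustrative and taken from neither the spec nor any
implementation) of how a client might validate the payload of an
NBD_REPLY_TYPE_BLOCK_STATUS_EXT chunk as defined at this point in the
series, i.e. with a 64-bit extent length, 32-bit padding, and 32-bit
status flags per descriptor; a later patch in the series widens the
flags field.

  #include <stdbool.h>
  #include <stdint.h>

  static uint32_t get_be32(const uint8_t *p)
  {
      return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
             ((uint32_t)p[2] << 8) | p[3];
  }

  static uint64_t get_be64(const uint8_t *p)
  {
      return ((uint64_t)get_be32(p) << 32) | get_be32(p + 4);
  }

  /* Check the constraints quoted above: *length* is 8 plus a positive
   * multiple of 16, the descriptor count is either 0 or matches the
   * number of descriptors implied by *length*, every extent length is
   * nonzero, and every padding field is zero. */
  static bool block_status_ext_wellformed(const uint8_t *payload,
                                          uint32_t length)
  {
      uint32_t count, ndesc, i;

      if (length < 8 + 16 || (length - 8) % 16 != 0) {
          return false;
      }
      ndesc = (length - 8) / 16;
      count = get_be32(payload + 4);    /* follows the 32-bit context id */
      if (count != 0 && count != ndesc) {
          return false;
      }
      for (i = 0; i < ndesc; i++) {
          const uint8_t *desc = payload + 8 + 16 * i;
          if (get_be64(desc) == 0 || get_be32(desc + 8) != 0) {
              return false;
          }
      }
      return true;
  }
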

* [PATCH v2 4/6] spec: Allow 64-bit block status results
  2022-11-14 22:46 ` [PATCH v2 0/6] NBD spec changes for " Eric Blake
                     ` (2 preceding siblings ...)
  2022-11-14 22:46   ` [PATCH v2 3/6] spec: Add NBD_OPT_EXTENDED_HEADERS Eric Blake
@ 2022-11-14 22:46   ` Eric Blake
  2022-11-14 22:46   ` [PATCH v2 5/6] spec: Introduce NBD_FLAG_BLOCK_STATUS_PAYLOAD Eric Blake
  2022-11-14 22:46   ` [PATCH v2 6/6] RFC: spec: Introduce NBD_REPLY_TYPE_OFFSET_HOLE_EXT Eric Blake
  5 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:46 UTC (permalink / raw)
  To: nbd; +Cc: qemu-devel, qemu-block, libguestfs

There are some potential extension metadata contexts that would
benefit from a 64-bit status value.  For example, Zoned Block Devices
(see https://zonedstorage.io/docs/linux/zbd-api) may want to return
the relative offset of where the next write will occur within the
zone, where a zone may be larger than 4G; creating a metacontext
"zbd:offset" that returns a 64-bit offset seems nicer than creating
two metacontexts "zbd:offset_lo" and "zbd:offset_hi" that each return
only 32 bits of the answer.

While the addition of extended headers superficially justified leaving
room in NBD_REPLY_TYPE_BLOCK_STATUS_EXT purely for alignment, widening
that padding into a 64-bit status field also lets extension metadata
contexts make real use of the space (and since network byte order is
big-endian, the old padding already sits where the most-significant
bits of a 64-bit status belong).  To ensure maximum backwards
compatibility, require that all contexts in the "base:" namespace (so
far, just "base:allocation") will only utilize 32-bit status.
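
As a tiny illustration of the awkwardness being avoided ("zbd:offset"
is hypothetical, as above, and the code below is only a sketch): with
only 32-bit status, a 5 GiB write pointer inside an 8 GiB zone would
have to be split across two contexts and reassembled by the client.

  #include <inttypes.h>
  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
      uint64_t write_pointer = 5ULL << 30;   /* 5 GiB into the zone */

      /* What "zbd:offset_lo"/"zbd:offset_hi" would each have to carry: */
      uint32_t lo = (uint32_t)write_pointer;
      uint32_t hi = (uint32_t)(write_pointer >> 32);

      /* ...and what the client would have to stitch back together: */
      uint64_t recombined = ((uint64_t)hi << 32) | lo;

      printf("zbd:offset = %" PRIu64 " (hi=%" PRIu32 ", lo=%" PRIu32 ")\n",
             recombined, hi, lo);
      return 0;
  }
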
---
 doc/proto.md | 62 +++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 49 insertions(+), 13 deletions(-)

diff --git a/doc/proto.md b/doc/proto.md
index fde1e70..14af48d 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -987,7 +987,10 @@ The procedure works as follows:
   during transmission, the client MUST select one or more metadata
   contexts with the `NBD_OPT_SET_META_CONTEXT` command. If needed, the
   client can use `NBD_OPT_LIST_META_CONTEXT` to list contexts that the
-  server supports.
+  server supports.  Most metadata contexts expose no more than 32 bits
+  of information, but some metadata contexts have associated data that
+  is 64 bits in length; using such contexts requires the client to
+  first negotiate extended headers with `NBD_OPT_EXTENDED_HEADERS`.
 - During transmission, a client can then indicate interest in metadata
   for a given region by way of the `NBD_CMD_BLOCK_STATUS` command,
   where *offset* and *length* indicate the area of interest. On
@@ -1045,7 +1048,7 @@ third-party namespaces are currently registered:
 Save in respect of the `base:` namespace described below, this specification
 requires no specific semantics of metadata contexts, except that all the
 information they provide MUST be representable within the flags field as
-defined for `NBD_REPLY_TYPE_BLOCK_STATUS`. Likewise, save in respect of
+defined for `NBD_REPLY_TYPE_BLOCK_STATUS_EXT`. Likewise, save in respect of
 the `base:` namespace, the syntax of query strings is not specified by this
 document, other than the recommendation that the empty leaf-name makes
 sense as a wildcard for a client query during `NBD_OPT_LIST_META_CONTEXT`,
@@ -1112,7 +1115,9 @@ should make no assumption as to its contents or stability.

 For the `base:allocation` context, the remainder of the flags field is
 reserved. Servers SHOULD set it to all-zero; clients MUST ignore
-unknown flags.
+unknown flags.  Because fewer than 32 flags are defined, this metadata
+context does not require the use of `NBD_OPT_EXTENDED_HEADERS`, and a
+server can use `NBD_REPLY_TYPE_BLOCK_STATUS` to return results.

 ## Values

@@ -1480,6 +1485,18 @@ of the newstyle negotiation.
     to do so, a server MAY send `NBD_REP_ERR_INVALID` or
     `NBD_REP_ERR_EXT_HEADER_REQD`.

+    A server MAY support extension contexts that produce status values
+    that require more than 32 bits.  The server MAY advertise such
+    contexts even if the client has not yet negotiated extended
+    headers, although it SHOULD then conclude the overall response
+    with the `NBD_REP_ERR_EXT_HEADER_REQD` error to inform the client
+    that extended headers are required to make full use of all
+    contexts advertised.  However, since none of the contexts defined
+    in the "base:" namespace provide more than 32 bits of status, a
+    server MUST NOT use this failure mode when the response is limited
+    to the "base:" namespace, nor when the client has already
+    negotiated extended headers.
+
     Data:
     - 32 bits, length of export name.  
     - String, name of export for which we wish to list metadata
@@ -1565,6 +1582,13 @@ of the newstyle negotiation.
     to do so, a server SHOULD send `NBD_REP_ERR_INVALID` or
     `NBD_REP_ERR_EXT_HEADER_REQD`.

+    If a client requests a metadata context that utilizes 64-bit
+    status, but has not yet negotiated extended headers, the server
+    MUST either omit that context from its successful reply, or else
+    fail the request with `NBD_REP_ERR_EXT_HEADER_REQD`.  The server
+    MUST NOT use this failure for a client request that is limited to
+    contexts in the "base:" namespace.
+
     A client MUST NOT send `NBD_CMD_BLOCK_STATUS` unless within the
     negotiation phase it sent `NBD_OPT_SET_META_CONTEXT` at least
     once, and where the final time it was sent, it referred to the
@@ -2028,16 +2052,23 @@ size.
   extent information at the first offset not covered by a
   reduced-length reply.

+  For an extension metadata context that documents that the status
+  value may potentially occupy 64 bits, a server MUST NOT use this
+  reply type unless the most-significant 32 bits of all *status*
+  values included in this reply are all zeroes.  Note that if the
+  client did not negotiate extended headers, then the server already
+  guaranteed during the handshake phase that no metadata contexts
+  utilizing a 64-bit status value were negotiated.
+
 * `NBD_REPLY_TYPE_BLOCK_STATUS_EXT` (6)

   This chunk type is in the status chunk category.  *length* MUST be
   8 + (a positive multiple of 16).  The semantics of this chunk mirror
   those of `NBD_REPLY_TYPE_BLOCK_STATUS`, other than the use of a
-  larger *extent length* field, added padding in each descriptor to
-  ease alignment, and the addition of a *descriptor count* field that
-  can be used for easier client processing.  This chunk type MUST NOT
-  be used unless extended headers were negotiated with
-  `NBD_OPT_EXTENDED_HEADERS`.
+  larger *extent length* field and a 64-bit *status* field, and the
+  addition of a *descriptor count* field that can be used for easier
+  client processing.  This chunk type MUST NOT be used unless extended
+  headers were negotiated with `NBD_OPT_EXTENDED_HEADERS`.

   If the *descriptor count* field contains 0, the number of subsequent
   descriptors is determined solely by the *length* field of the reply
@@ -2056,14 +2087,19 @@ size.

   64 bits, length of the extent to which the status below
      applies (unsigned, MUST be nonzero)  
-  32 bits, padding (MUST be zero)  
-  32 bits, status flags  
+  64 bits, status flags  

   Note that even when extended headers are in use, the client MUST be
   prepared for the server to use either the compact or extended chunk
-  type, regardless of whether the client's hinted effect length was
-  more or less than 32 bits; but the server MUST use exactly one of
-  the two chunk types per negotiated metacontext ID.
+  type for metadata contexts, regardless of whether the client's
+  hinted effect length was more or less than 32 bits; but the server
+  MUST use exactly one of the two chunk types per negotiated
+  metacontext ID.  However, the server MUST use the extended chunk
+  type when responding to an extension metadata context that utilizes
+  a 64-bit status code where the resulting *status* value is not
+  representable in 32 bits.  For metadata contexts that only return a
+  32-bit status (including all contexts in the "base:" namespace), the
+  most-significant 32 bits of *status* MUST be all zeroes.

 All error chunk types have bit 15 set, and begin with the same
 *error*, *message length*, and optional *message* fields as
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread
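
A minimal sketch (illustrative names only, not from the spec or qemu)
of one NBD_REPLY_TYPE_BLOCK_STATUS_EXT descriptor after this patch,
and of the check a server could apply to decide whether a compact
chunk is still permitted for a context that may produce 64-bit status:

  #include <stdbool.h>
  #include <stdint.h>

  /* Wire layout of one extended descriptor after this patch: two
   * big-endian 64-bit fields. */
  struct nbd_extent_ext {
      uint64_t length;    /* extent length, MUST be nonzero */
      uint64_t status;    /* 64-bit status flags */
  };

  /* Contexts limited to 32-bit status (all of "base:", e.g.
   * "base:allocation") must keep the upper 32 bits of status clear, and
   * a server must switch to the extended chunk type whenever any status
   * value in the reply fails this test. */
  static bool status_fits_compact_chunk(const struct nbd_extent_ext *e)
  {
      return (e->status >> 32) == 0;
  }
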

* [PATCH v2 5/6] spec: Introduce NBD_FLAG_BLOCK_STATUS_PAYLOAD
  2022-11-14 22:46 ` [PATCH v2 0/6] NBD spec changes for " Eric Blake
                     ` (3 preceding siblings ...)
  2022-11-14 22:46   ` [PATCH v2 4/6] spec: Allow 64-bit block status results Eric Blake
@ 2022-11-14 22:46   ` Eric Blake
  2023-02-22 10:05     ` Wouter Verhelst
  2022-11-14 22:46   ` [PATCH v2 6/6] RFC: spec: Introduce NBD_REPLY_TYPE_OFFSET_HOLE_EXT Eric Blake
  5 siblings, 1 reply; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:46 UTC (permalink / raw)
  To: nbd; +Cc: qemu-devel, qemu-block, libguestfs

NBD_CMD_BLOCK_STATUS currently forces the server to reply to all
metacontext ids that the client negotiated via
NBD_OPT_SET_META_CONTEXT.  But since extended headers make it easy for
the client to pass command payloads, we can allow a client to
negotiate multiple metacontexts up front but express dynamic interest
in varying subsets of those contexts over the life of the connection,
for less wasted effort in responding to NBD_CMD_BLOCK_STATUS.  This
works by having the command payload supply an effect length and a list
of ids the client is currently interested in.
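
A minimal sketch of what such a client payload looks like on the wire,
following the layout in the hunk below (8 + n*4 bytes: a 64-bit effect
length, then n 32-bit metacontext ids, all big-endian); the helper
names are illustrative only:

  #include <stddef.h>
  #include <stdint.h>

  static void put_be32(uint8_t *p, uint32_t v)
  {
      p[0] = v >> 24; p[1] = v >> 16; p[2] = v >> 8; p[3] = v;
  }

  static void put_be64(uint8_t *p, uint64_t v)
  {
      put_be32(p, v >> 32);
      put_be32(p + 4, (uint32_t)v);
  }

  /* Fills buf (which must hold 8 + n*4 bytes) and returns the payload
   * size to place in the request header's *length* field alongside
   * NBD_CMD_FLAG_PAYLOAD_LEN. */
  static size_t build_block_status_payload(uint8_t *buf, uint64_t effect_len,
                                           const uint32_t *ids, size_t n)
  {
      put_be64(buf, effect_len);
      for (size_t i = 0; i < n; i++) {
          put_be32(buf + 8 + 4 * i, ids[i]);
      }
      return 8 + 4 * n;
  }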

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 doc/proto.md | 62 +++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 49 insertions(+), 13 deletions(-)

diff --git a/doc/proto.md b/doc/proto.md
index 14af48d..645a736 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -397,17 +397,20 @@ additional bytes of payload are present), or if the flag is absent
 (there is no payload, and *length* instead is an effect length
 describing how much of the image the request operates on).  The
 command `NBD_CMD_WRITE` MUST use the flag `NBD_CMD_FLAG_PAYLOAD_LEN`
-in this mode; while other commands SHOULD avoid the flag if the
-server has not indicated extension suppport for payloads on that
-command.  A server SHOULD initiate hard disconnect if a client sets
-the `NBD_CMD_FLAG_PAYLOAD_LEN` flag and uses a *length* larger than
-a server's advertised or default maximum payload length (capped at
-32 bits by the constraints of `NBD_INFO_BLOCK_SIZE`); in all other
-cases, a server SHOULD gracefully consume *length* bytes of payload
-(even if it then replies with an `NBD_EINVAL` failure because the
-particular command was not expecting a payload), and proceed with
-the next client command.  Thus, only when *length* is used as an
-effective length will it utilize a full 64-bit value.
+in this mode; most other commands omit it, although some like
+`NBD_CMD_BLOCK_STATUS` optionally support the flag in order to allow
+the client to pass additional information in the payload (where the
+command documents what the payload will contain, including the
+possibility of a separate effect length).  A server SHOULD initiate
+hard disconnect if a client sets the `NBD_CMD_FLAG_PAYLOAD_LEN` flag
+and uses a *length* larger than a server's advertised or default
+maximum payload length (capped at 32 bits by the constraints of
+`NBD_INFO_BLOCK_SIZE`); in all other cases, a server SHOULD gracefully
+consume *length* bytes of payload (even if it then replies with an
+`NBD_EINVAL` failure because the particular command was not expecting
+a payload), and proceed with the next client command.  Thus, only when
+*length* is used as an effect length will it utilize a full 64-bit
+value.

 #### Simple reply message

@@ -1232,6 +1235,19 @@ The field has the following format:
   will be faster than a regular write). Clients MUST NOT set the
   `NBD_CMD_FLAG_FAST_ZERO` request flag unless this transmission flag
   is set.
+- bit 12, `NBD_FLAG_BLOCK_STATUS_PAYLOAD`: Indicates that the server
+  understands the use of the `NBD_CMD_FLAG_PAYLOAD_LEN` flag to
+  `NBD_CMD_BLOCK_STATUS` to allow the client to request that the
+  server filters its response to a specific subset of negotiated
+  metacontext ids passed in via a client payload, rather than the
+  default of replying to all metacontext ids. Servers MUST NOT
+  advertise this bit unless the client successfully negotiates
+  extended headers via `NBD_OPT_EXTENDED_HEADERS`, and SHOULD NOT
+  advertise this bit in response to `NBD_OPT_EXPORT_NAME` or
+  `NBD_OPT_GO` if the client does not negotiate metacontexts with
+  `NBD_OPT_SET_META_CONTEXT`; clients SHOULD NOT set the
+  `NBD_CMD_FLAG_PAYLOAD_LEN` flag for `NBD_CMD_BLOCK_STATUS` unless
+  this transmission flag is set.

 Clients SHOULD ignore unknown flags.

@@ -1915,8 +1931,11 @@ valid may depend on negotiation during the handshake phase.
   header.  With extended headers, the flag MUST be set for
   `NBD_CMD_WRITE` (as the write command always sends a payload of the
   bytes to be written); for other commands, the flag will trigger an
-  `NBD_EINVAL` error unless the server has advertised support for an
-  extension payload form for the command.
+  `NBD_EINVAL` error unless the command documents an optional payload
+  form for the command and the server has implemented that form (an
+  example being `NBD_CMD_BLOCK_STATUS` providing a payload form for
+  restricting the response to a particular metacontext id, when the
+  server advertises `NBD_FLAG_BLOCK_STATUS_PAYLOAD`).

 ##### Structured reply flags

@@ -2464,6 +2483,23 @@ The following request types exist:
     The server MAY send chunks in a different order than the context
     ids were assigned in reply to `NBD_OPT_SET_META_CONTEXT`.

+    If extended headers were negotiated, a server MAY optionally
+    advertise, via the transmission flag
+    `NBD_FLAG_BLOCK_STATUS_PAYLOAD`, that it supports an alternative
+    request form where the client sets `NBD_CMD_FLAG_PAYLOAD_LEN` in
+    order to pass a payload that informs the server to limit its
+    replies to the metacontext id(s) in the client's request payload,
+    rather than giving an answer on all possible metacontext ids.  If
+    the server does not support the payload form, or detects duplicate
+    or unknown metacontext ids in the client's payload, the server
+    MUST gracefully consume the client's payload before failing with
+    `NBD_EINVAL`.  The payload form MUST occupy 8 + n*4 bytes, where n
+    is the number of metacontext ids the client is interested in (as
+    implied by the payload length), laid out as:
+
+    64 bits, effect length  
+    n * 32 bits, list of metacontext ids to use  
+
     The list of block status descriptors within a given status chunk
     represent consecutive portions of the file starting from specified
     *offset*.  If the client used the `NBD_CMD_FLAG_REQ_ONE` flag,
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread
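
A minimal sketch (names are illustrative, not from the spec or any
implementation) of the server-side decision the flag semantics in the
patch above call for when a request arrives with
NBD_CMD_FLAG_PAYLOAD_LEN set:

  #include <stdbool.h>
  #include <stdint.h>

  enum payload_action {
      PAYLOAD_READ,             /* consume and use the payload */
      PAYLOAD_DISCARD_EINVAL,   /* consume the payload, then fail NBD_EINVAL */
      PAYLOAD_HARD_DISCONNECT,  /* drop the connection */
  };

  /* max_payload is the server's advertised or default maximum payload
   * size (capped at 32 bits by NBD_INFO_BLOCK_SIZE). */
  static enum payload_action
  classify_payload(uint64_t length, uint32_t max_payload,
                   bool command_supports_payload)
  {
      if (length > max_payload) {
          return PAYLOAD_HARD_DISCONNECT;  /* SHOULD hard disconnect */
      }
      if (!command_supports_payload) {
          return PAYLOAD_DISCARD_EINVAL;   /* gracefully consume, then fail */
      }
      return PAYLOAD_READ;
  }
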

* [PATCH v2 6/6] RFC: spec: Introduce NBD_REPLY_TYPE_OFFSET_HOLE_EXT
  2022-11-14 22:46 ` [PATCH v2 0/6] NBD spec changes for " Eric Blake
                     ` (4 preceding siblings ...)
  2022-11-14 22:46   ` [PATCH v2 5/6] spec: Introduce NBD_FLAG_BLOCK_STATUS_PAYLOAD Eric Blake
@ 2022-11-14 22:46   ` Eric Blake
  5 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:46 UTC (permalink / raw)
  To: nbd; +Cc: qemu-devel, qemu-block, libguestfs

Rather than requiring all servers and clients to have a 32-bit limit
on maximum NBD_CMD_READ/WRITE sizes, we can choose to standardize
support for a 64-bit single I/O transaction now.
NBD_REPLY_TYPE_OFFSET_DATA can already handle a large reply, but
NBD_REPLY_TYPE_OFFSET_HOLE needs a 64-bit counterpart.

By standardizing this, all clients must be prepared to support both
hole chunk types, even though most server implementations of extended
replies are likely to send only one of them.

---

As this may be a corner case that gets less testing, I have split it
out as a separate optional patch.  I implemented it in my
proof-of-concept, but am happy to drop this patch from what actually
goes upstream.

In particular, if we foresee clients and servers that WANT to support
a payload larger than 4G, it may be worth introducing an NBD_INFO_*
that supplies 64-bit block sizing information, rather than our current
inherent 32-bit limit of NBD_INFO_BLOCK_SIZE, at the same time as we
introduce this reply type.
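
As a sketch of the client obligation this creates (the type values
follow the existing spec and this patch; everything else is
illustrative), a reader of content chunks has to accept either hole
form once extended headers are negotiated:

  #include <stdbool.h>
  #include <stdint.h>

  #define NBD_REPLY_TYPE_OFFSET_HOLE      2
  #define NBD_REPLY_TYPE_OFFSET_HOLE_EXT  3

  static uint64_t get_be(const uint8_t *p, int width)
  {
      uint64_t v = 0;
      for (int i = 0; i < width; i++) {
          v = (v << 8) | p[i];
      }
      return v;
  }

  /* Returns false for a malformed chunk (wrong payload length, unknown
   * type, or a zero-sized hole). */
  static bool parse_hole_chunk(uint16_t type, const uint8_t *payload,
                               uint32_t length, uint64_t *offset,
                               uint64_t *size)
  {
      if (type == NBD_REPLY_TYPE_OFFSET_HOLE && length == 12) {
          *offset = get_be(payload, 8);
          *size = get_be(payload + 8, 4);
      } else if (type == NBD_REPLY_TYPE_OFFSET_HOLE_EXT && length == 16) {
          *offset = get_be(payload, 8);
          *size = get_be(payload + 8, 8);
      } else {
          return false;
      }
      return *size != 0;
  }
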
---
 doc/proto.md | 73 ++++++++++++++++++++++++++++------------------------
 1 file changed, 40 insertions(+), 33 deletions(-)

diff --git a/doc/proto.md b/doc/proto.md
index 645a736..9c04411 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -2008,19 +2008,25 @@ size.
   64 bits: offset (unsigned)  
   32 bits: hole size (unsigned, MUST be nonzero)  

-  At this time, although servers that support extended headers are
-  permitted to accept client requests for `NBD_CMD_READ` with an
-  effect length larger than any advertised maximum block payload size
-  by splitting the reply into multiple chunks, portable clients SHOULD
-  NOT request a read *length* larger than 32 bits (corresponding to
-  the maximum block payload constraint implied by
-  `NBD_INFO_BLOCK_SIZE`), and therefore a 32-bit constraint on the
-  *hole size* does not represent an arbitrary limitation.  Should a
-  future scenario arise where it can be demonstrated that a client and
-  server would benefit from an extension allowing a maximum block
-  payload size to be larger than 32 bits, that extension would also
-  introduce a counterpart reply type that can express a 64-bit *hole
-  size*.
+* `NBD_REPLY_TYPE_OFFSET_HOLE_EXT` (3)
+
+  This chunk type is in the content chunk category.  *length* MUST be
+  exactly 16.  The semantics of this chunk mirror those of
+  `NBD_REPLY_TYPE_OFFSET_HOLE`, other than the use of a larger *hole
+  size* field.  This chunk type MUST NOT be used unless extended
+  headers were negotiated with `NBD_OPT_EXTENDED_HEADERS`.
+
+  The payload is structured as:
+
+  64 bits: offset (unsigned)  
+  64 bits: hole size (unsigned, MUST be nonzero)  
+
+  Note that even though extended headers are in use, a server may
+  enforce a maximum block size that is smaller than 32 bits, in which
+  case no valid `NBD_CMD_READ` will have a *length* large enough to
+  require the use of this chunk type.  However, a client using
+  extended headers MUST be prepared for the server to use either the
+  compact or extended chunk type.

 * `NBD_REPLY_TYPE_BLOCK_STATUS` (5)

@@ -2218,26 +2224,27 @@ The following request types exist:
     the following additional constraints.

     The server MAY split the reply into any number of content chunks
-    (`NBD_REPLY_TYPE_OFFSET_DATA` and `NBD_REPLY_TYPE_OFFSET_HOLE`);
-    each chunk MUST describe at least one byte, although to minimize
-    overhead, the server SHOULD use chunks with lengths and offsets as
-    an integer multiple of 512 bytes, where possible (the first and
-    last chunk of an unaligned read being the most obvious places for
-    an exception).  The server MUST NOT send content chunks that
-    overlap with any earlier content or error chunk, and MUST NOT send
-    chunks that describe data outside the offset and length of the
-    request, but MAY send the content chunks in any order (the client
-    MUST reassemble content chunks into the correct order), and MAY
-    send additional content chunks even after reporting an error
-    chunk.  A server MAY support read requests larger than the maximum
-    block payload size by splitting the response across multiple
-    chunks (in particular, if extended headers are not in use, a
-    request for more than 2^32 - 8 bytes containing data rather than
-    holes MUST be split to avoid overflowing the 32-bit
-    `NBD_REPLY_TYPE_OFFSET_DATA` length field); however, the server is
-    also permitted to reject large read requests up front, so a client
-    should be prepared to retry with smaller requests if a large
-    request fails.
+    (`NBD_REPLY_TYPE_OFFSET_DATA` and `NBD_REPLY_TYPE_OFFSET_HOLE` for
+    structured replies, additionally `NBD_REPLY_TYPE_OFFSET_HOLE_EXT`
+    for extended headers); each chunk MUST describe at least one byte,
+    although to minimize overhead, the server SHOULD use chunks with
+    lengths and offsets as an integer multiple of 512 bytes, where
+    possible (the first and last chunk of an unaligned read being the
+    most obvious places for an exception).  The server MUST NOT send
+    content chunks that overlap with any earlier content or error
+    chunk, and MUST NOT send chunks that describe data outside the
+    offset and length of the request, but MAY send the content chunks
+    in any order (the client MUST reassemble content chunks into the
+    correct order), and MAY send additional content chunks even after
+    reporting an error chunk.  A server MAY support read requests
+    larger than the maximum block payload size by splitting the
+    response across multiple chunks (in particular, if extended
+    headers are not in use, a request for more than 2^32 - 8 bytes
+    containing data rather than holes MUST be split to avoid
+    overflowing the 32-bit `NBD_REPLY_TYPE_OFFSET_DATA` length field);
+    however, the server is also permitted to reject large read
+    requests up front, so a client should be prepared to retry with
+    smaller requests if a large request fails.

     When no error is detected, the server MUST send enough data chunks
     to cover the entire region described by the offset and length of
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v2 00/15] qemu patches for 64-bit NBD extensions
  2022-11-14 22:41 [cross-project PATCH v2] NBD 64-bit extensions Eric Blake
  2022-11-14 22:46 ` [PATCH v2 0/6] NBD spec changes for " Eric Blake
@ 2022-11-14 22:48 ` Eric Blake
  2022-11-14 22:48   ` [PATCH v2 01/15] nbd/client: Add safety check on chunk payload length Eric Blake
                     ` (14 more replies)
  2022-11-14 22:51 ` [libnbd PATCH v2 00/23] libnbd 64-bit NBD extensions Eric Blake
  2 siblings, 15 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, libguestfs, nbd

This series implements the spec changes in a counterpart NBD series,
and has been tested to be interoperable with libnbd implementing the
same spec.  I'm not too happy with the RFC patch at the end, but
implemented it for discussion.  Given the release timing, this would
be qemu 8.0 material if we are happy with the direction the spec is
headed in.

Eric Blake (15):
  nbd/client: Add safety check on chunk payload length
  nbd/server: Prepare for alternate-size headers
  nbd: Prepare for 64-bit request effect lengths
  nbd: Add types for extended headers
  nbd/server: Refactor handling of request payload
  nbd/server: Refactor to pass full request around
  nbd/server: Initial support for extended headers
  nbd/server: Support 64-bit block status
  nbd/client: Initial support for extended headers
  nbd/client: Accept 64-bit block status chunks
  nbd/client: Request extended headers during negotiation
  nbd/server: Prepare for per-request filtering of BLOCK_STATUS
  nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS
  RFC: nbd/client: Accept 64-bit hole chunks
  RFC: nbd/server: Send 64-bit hole chunk

 docs/interop/nbd.txt                          |   1 +
 include/block/nbd.h                           | 163 +++--
 nbd/nbd-internal.h                            |   8 +-
 block/nbd.c                                   | 102 ++-
 nbd/client-connection.c                       |   1 +
 nbd/client.c                                  | 132 +++-
 nbd/common.c                                  |  14 +-
 nbd/server.c                                  | 636 +++++++++++++-----
 qemu-nbd.c                                    |   3 +
 block/trace-events                            |   1 +
 nbd/trace-events                              |  11 +-
 tests/qemu-iotests/223.out                    |  18 +-
 tests/qemu-iotests/233.out                    |   5 +
 tests/qemu-iotests/241.out                    |   3 +
 tests/qemu-iotests/307.out                    |  15 +-
 .../tests/nbd-qemu-allocation.out             |   3 +-
 16 files changed, 797 insertions(+), 319 deletions(-)

-- 
2.38.1



^ permalink raw reply	[flat|nested] 65+ messages in thread

* [PATCH v2 01/15] nbd/client: Add safety check on chunk payload length
  2022-11-14 22:48 ` [PATCH v2 00/15] qemu patches for 64-bit NBD extensions Eric Blake
@ 2022-11-14 22:48   ` Eric Blake
  2022-11-14 22:48   ` [PATCH v2 02/15] nbd/server: Prepare for alternate-size headers Eric Blake
                     ` (13 subsequent siblings)
  14 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, libguestfs, nbd, Vladimir Sementsov-Ogievskiy

Our existing use of structured replies either reads into a qiov capped
at 32M (NBD_CMD_READ) or caps allocation to 1000 bytes (see
NBD_MAX_MALLOC_PAYLOAD in block/nbd.c).  But the existing length
checks are rather late; if we encounter a buggy (or malicious) server
that sends a super-large payload length, we should drop the connection
right then rather than assuming the layer on top will be careful.
This becomes more important when we permit 64-bit lengths, which have
even more potential for denial-of-service abuse.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 nbd/client.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/nbd/client.c b/nbd/client.c
index 90a6b7b38b..cd97a2aa09 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -1412,6 +1412,18 @@ static int nbd_receive_structured_reply_chunk(QIOChannel *ioc,
     chunk->handle = be64_to_cpu(chunk->handle);
     chunk->length = be32_to_cpu(chunk->length);

+    /*
+     * Because we use BLOCK_STATUS with REQ_ONE, and cap READ requests
+     * at 32M, no valid server should send us payload larger than
+     * this.  Even if we stopped using REQ_ONE, sane servers will cap
+     * the number of extents they return for block status.
+     */
+    if (chunk->length > NBD_MAX_BUFFER_SIZE + sizeof(NBDStructuredReadData)) {
+        error_setg(errp, "server chunk %" PRIu32 " (%s) payload is too long",
+                   chunk->type, nbd_reply_type_lookup(chunk->type));
+        return -EINVAL;
+    }
+
     return 0;
 }

-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v2 02/15] nbd/server: Prepare for alternate-size headers
  2022-11-14 22:48 ` [PATCH v2 00/15] qemu patches for 64-bit NBD extensions Eric Blake
  2022-11-14 22:48   ` [PATCH v2 01/15] nbd/client: Add safety check on chunk payload length Eric Blake
@ 2022-11-14 22:48   ` Eric Blake
  2022-11-14 22:48   ` [PATCH v2 03/15] nbd: Prepare for 64-bit request effect lengths Eric Blake
                     ` (12 subsequent siblings)
  14 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, libguestfs, nbd, Vladimir Sementsov-Ogievskiy,
	Kevin Wolf, Hanna Reitz

An upcoming NBD extension wants to add the ability to do 64-bit effect
lengths in requests.  As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry [*].  Additionally, where the reply header is currently
16 bytes for simple reply, and 20 bytes for structured reply; with the
extension enabled, there will only be one structured reply type, of 32
bytes.  Since we are already wired up to use iovecs, it is easiest to
allow for this change in header size by splitting each structured
reply across two iovecs, one for the header (which will become
variable-length in a future patch according to client negotiation),
and the other for the payload, and removing the header from the
payload struct definitions.  Interestingly, the client side code never
utilized the packed types, so only the server code needs to be
updated.

[*] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction.  But even though the extended reply length is widened to
64 bits, at present we will still never send a reply payload larger
than just over 32M (the maximum buffer we allow in NBD_CMD_READ; and
we cap the number of extents we are willing to report in
NBD_CMD_BLOCK_STATUS); for a future server to truly support a 4G read
in one transaction, NBD_OPT_GO would need an extension adding a new
NBD_INFO_ field that provides for a 64-bit maximum transaction length.
Where 64-bit fields really matter in the extension is in a later patch
adding 64-bit support into a counterpart for REPLY_TYPE_BLOCK_STATUS.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 include/block/nbd.h |  8 +++---
 nbd/server.c        | 64 ++++++++++++++++++++++++++++-----------------
 2 files changed, 44 insertions(+), 28 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 4ede3b2bd0..1330dbc18b 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -95,28 +95,28 @@ typedef union NBDReply {

 /* Header of chunk for NBD_REPLY_TYPE_OFFSET_DATA */
 typedef struct NBDStructuredReadData {
-    NBDStructuredReplyChunk h; /* h.length >= 9 */
+    /* header's .length >= 9 */
     uint64_t offset;
     /* At least one byte of data payload follows, calculated from h.length */
 } QEMU_PACKED NBDStructuredReadData;

 /* Complete chunk for NBD_REPLY_TYPE_OFFSET_HOLE */
 typedef struct NBDStructuredReadHole {
-    NBDStructuredReplyChunk h; /* h.length == 12 */
+    /* header's length == 12 */
     uint64_t offset;
     uint32_t length;
 } QEMU_PACKED NBDStructuredReadHole;

 /* Header of all NBD_REPLY_TYPE_ERROR* errors */
 typedef struct NBDStructuredError {
-    NBDStructuredReplyChunk h; /* h.length >= 6 */
+    /* header's length >= 6 */
     uint32_t error;
     uint16_t message_length;
 } QEMU_PACKED NBDStructuredError;

 /* Header of NBD_REPLY_TYPE_BLOCK_STATUS */
 typedef struct NBDStructuredMeta {
-    NBDStructuredReplyChunk h; /* h.length >= 12 (at least one extent) */
+    /* header's length >= 12 (at least one extent) */
     uint32_t context_id;
     /* extents follows */
 } QEMU_PACKED NBDStructuredMeta;
diff --git a/nbd/server.c b/nbd/server.c
index ada16089f3..37f9c21d20 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1888,9 +1888,12 @@ static int coroutine_fn nbd_co_send_iov(NBDClient *client, struct iovec *iov,
     return ret;
 }

-static inline void set_be_simple_reply(NBDSimpleReply *reply, uint64_t error,
-                                       uint64_t handle)
+static inline void set_be_simple_reply(NBDClient *client, struct iovec *iov,
+                                       uint64_t error, uint64_t handle)
 {
+    NBDSimpleReply *reply = iov->iov_base;
+
+    iov->iov_len = sizeof(*reply);
     stl_be_p(&reply->magic, NBD_SIMPLE_REPLY_MAGIC);
     stl_be_p(&reply->error, error);
     stq_be_p(&reply->handle, handle);
@@ -1903,23 +1906,27 @@ static int nbd_co_send_simple_reply(NBDClient *client,
                                     size_t len,
                                     Error **errp)
 {
-    NBDSimpleReply reply;
+    NBDReply hdr;
     int nbd_err = system_errno_to_nbd_errno(error);
     struct iovec iov[] = {
-        {.iov_base = &reply, .iov_len = sizeof(reply)},
+        {.iov_base = &hdr},
         {.iov_base = data, .iov_len = len}
     };

     trace_nbd_co_send_simple_reply(handle, nbd_err, nbd_err_lookup(nbd_err),
                                    len);
-    set_be_simple_reply(&reply, nbd_err, handle);
+    set_be_simple_reply(client, &iov[0], nbd_err, handle);

     return nbd_co_send_iov(client, iov, len ? 2 : 1, errp);
 }

-static inline void set_be_chunk(NBDStructuredReplyChunk *chunk, uint16_t flags,
-                                uint16_t type, uint64_t handle, uint32_t length)
+static inline void set_be_chunk(NBDClient *client, struct iovec *iov,
+                                uint16_t flags, uint16_t type,
+                                uint64_t handle, uint32_t length)
 {
+    NBDStructuredReplyChunk *chunk = iov->iov_base;
+
+    iov->iov_len = sizeof(*chunk);
     stl_be_p(&chunk->magic, NBD_STRUCTURED_REPLY_MAGIC);
     stw_be_p(&chunk->flags, flags);
     stw_be_p(&chunk->type, type);
@@ -1931,13 +1938,14 @@ static int coroutine_fn nbd_co_send_structured_done(NBDClient *client,
                                                     uint64_t handle,
                                                     Error **errp)
 {
-    NBDStructuredReplyChunk chunk;
+    NBDReply hdr;
     struct iovec iov[] = {
-        {.iov_base = &chunk, .iov_len = sizeof(chunk)},
+        {.iov_base = &hdr},
     };

     trace_nbd_co_send_structured_done(handle);
-    set_be_chunk(&chunk, NBD_REPLY_FLAG_DONE, NBD_REPLY_TYPE_NONE, handle, 0);
+    set_be_chunk(client, &iov[0], NBD_REPLY_FLAG_DONE,
+                 NBD_REPLY_TYPE_NONE, handle, 0);

     return nbd_co_send_iov(client, iov, 1, errp);
 }
@@ -1950,20 +1958,21 @@ static int coroutine_fn nbd_co_send_structured_read(NBDClient *client,
                                                     bool final,
                                                     Error **errp)
 {
+    NBDReply hdr;
     NBDStructuredReadData chunk;
     struct iovec iov[] = {
+        {.iov_base = &hdr},
         {.iov_base = &chunk, .iov_len = sizeof(chunk)},
         {.iov_base = data, .iov_len = size}
     };

     assert(size);
     trace_nbd_co_send_structured_read(handle, offset, data, size);
-    set_be_chunk(&chunk.h, final ? NBD_REPLY_FLAG_DONE : 0,
-                 NBD_REPLY_TYPE_OFFSET_DATA, handle,
-                 sizeof(chunk) - sizeof(chunk.h) + size);
+    set_be_chunk(client, &iov[0], final ? NBD_REPLY_FLAG_DONE : 0,
+                 NBD_REPLY_TYPE_OFFSET_DATA, handle, iov[1].iov_len + size);
     stq_be_p(&chunk.offset, offset);

-    return nbd_co_send_iov(client, iov, 2, errp);
+    return nbd_co_send_iov(client, iov, 3, errp);
 }

 static int coroutine_fn nbd_co_send_structured_error(NBDClient *client,
@@ -1972,9 +1981,11 @@ static int coroutine_fn nbd_co_send_structured_error(NBDClient *client,
                                                      const char *msg,
                                                      Error **errp)
 {
+    NBDReply hdr;
     NBDStructuredError chunk;
     int nbd_err = system_errno_to_nbd_errno(error);
     struct iovec iov[] = {
+        {.iov_base = &hdr},
         {.iov_base = &chunk, .iov_len = sizeof(chunk)},
         {.iov_base = (char *)msg, .iov_len = msg ? strlen(msg) : 0},
     };
@@ -1982,12 +1993,12 @@ static int coroutine_fn nbd_co_send_structured_error(NBDClient *client,
     assert(nbd_err);
     trace_nbd_co_send_structured_error(handle, nbd_err,
                                        nbd_err_lookup(nbd_err), msg ? msg : "");
-    set_be_chunk(&chunk.h, NBD_REPLY_FLAG_DONE, NBD_REPLY_TYPE_ERROR, handle,
-                 sizeof(chunk) - sizeof(chunk.h) + iov[1].iov_len);
+    set_be_chunk(client, &iov[0], NBD_REPLY_FLAG_DONE,
+                 NBD_REPLY_TYPE_ERROR, handle, iov[1].iov_len + iov[2].iov_len);
     stl_be_p(&chunk.error, nbd_err);
-    stw_be_p(&chunk.message_length, iov[1].iov_len);
+    stw_be_p(&chunk.message_length, iov[2].iov_len);

-    return nbd_co_send_iov(client, iov, 1 + !!iov[1].iov_len, errp);
+    return nbd_co_send_iov(client, iov, 2 + !!iov[2].iov_len, errp);
 }

 /* Do a sparse read and send the structured reply to the client.
@@ -2025,19 +2036,22 @@ static int coroutine_fn nbd_co_send_sparse_read(NBDClient *client,
         assert(pnum && pnum <= size - progress);
         final = progress + pnum == size;
         if (status & BDRV_BLOCK_ZERO) {
+            NBDReply hdr;
             NBDStructuredReadHole chunk;
             struct iovec iov[] = {
+                {.iov_base = &hdr},
                 {.iov_base = &chunk, .iov_len = sizeof(chunk)},
             };

             trace_nbd_co_send_structured_read_hole(handle, offset + progress,
                                                    pnum);
-            set_be_chunk(&chunk.h, final ? NBD_REPLY_FLAG_DONE : 0,
+            set_be_chunk(client, &iov[0],
+                         final ? NBD_REPLY_FLAG_DONE : 0,
                          NBD_REPLY_TYPE_OFFSET_HOLE,
-                         handle, sizeof(chunk) - sizeof(chunk.h));
+                         handle, iov[1].iov_len);
             stq_be_p(&chunk.offset, offset + progress);
             stl_be_p(&chunk.length, pnum);
-            ret = nbd_co_send_iov(client, iov, 1, errp);
+            ret = nbd_co_send_iov(client, iov, 2, errp);
         } else {
             ret = blk_pread(exp->common.blk, offset + progress, pnum,
                             data + progress, 0);
@@ -2201,8 +2215,10 @@ static int nbd_co_send_extents(NBDClient *client, uint64_t handle,
                                NBDExtentArray *ea,
                                bool last, uint32_t context_id, Error **errp)
 {
+    NBDReply hdr;
     NBDStructuredMeta chunk;
     struct iovec iov[] = {
+        {.iov_base = &hdr},
         {.iov_base = &chunk, .iov_len = sizeof(chunk)},
         {.iov_base = ea->extents, .iov_len = ea->count * sizeof(ea->extents[0])}
     };
@@ -2211,12 +2227,12 @@ static int nbd_co_send_extents(NBDClient *client, uint64_t handle,

     trace_nbd_co_send_extents(handle, ea->count, context_id, ea->total_length,
                               last);
-    set_be_chunk(&chunk.h, last ? NBD_REPLY_FLAG_DONE : 0,
+    set_be_chunk(client, &iov[0], last ? NBD_REPLY_FLAG_DONE : 0,
                  NBD_REPLY_TYPE_BLOCK_STATUS,
-                 handle, sizeof(chunk) - sizeof(chunk.h) + iov[1].iov_len);
+                 handle, iov[1].iov_len + iov[2].iov_len);
     stl_be_p(&chunk.context_id, context_id);

-    return nbd_co_send_iov(client, iov, 2, errp);
+    return nbd_co_send_iov(client, iov, 3, errp);
 }

 /* Get block status from the exported device and send it to the client */
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v2 03/15] nbd: Prepare for 64-bit request effect lengths
  2022-11-14 22:48 ` [PATCH v2 00/15] qemu patches for 64-bit NBD extensions Eric Blake
  2022-11-14 22:48   ` [PATCH v2 01/15] nbd/client: Add safety check on chunk payload length Eric Blake
  2022-11-14 22:48   ` [PATCH v2 02/15] nbd/server: Prepare for alternate-size headers Eric Blake
@ 2022-11-14 22:48   ` Eric Blake
  2022-11-14 22:48   ` [PATCH v2 04/15] nbd: Add types for extended headers Eric Blake
                     ` (11 subsequent siblings)
  14 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, libguestfs, nbd, Vladimir Sementsov-Ogievskiy,
	Kevin Wolf, Hanna Reitz

Widen the length field of NBDRequest to 64-bits, although we can
assert that all current uses are still under 32 bits.  Move the
request magic number to nbd.h, to live alongside the reply magic
number.  Add the necessary bools that will eventually track whether
the client successfully negotiated extended headers with the server,
allowing the nbd driver to pass larger requests along where possible;
although in this patch they always remain false for no semantic change
yet.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 include/block/nbd.h | 21 ++++++++++++---------
 nbd/nbd-internal.h  |  3 +--
 block/nbd.c         | 35 ++++++++++++++++++++++++-----------
 nbd/client.c        |  9 ++++++---
 nbd/server.c        | 12 +++++++++---
 nbd/trace-events    |  8 ++++----
 6 files changed, 56 insertions(+), 32 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 1330dbc18b..e357452a57 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -52,17 +52,16 @@ typedef struct NBDOptionReplyMetaContext {

 /* Transmission phase structs
  *
- * Note: these are _NOT_ the same as the network representation of an NBD
- * request and reply!
+ * Note: NBDRequest is _NOT_ the same as the network representation of an NBD
+ * request!
  */
-struct NBDRequest {
+typedef struct NBDRequest {
     uint64_t handle;
-    uint64_t from;
-    uint32_t len;
+    uint64_t from;  /* Offset touched by the command */
+    uint64_t len;   /* Effect length; 32 bit limit without extended headers */
     uint16_t flags; /* NBD_CMD_FLAG_* */
-    uint16_t type; /* NBD_CMD_* */
-};
-typedef struct NBDRequest NBDRequest;
+    uint16_t type;  /* NBD_CMD_* */
+} NBDRequest;

 typedef struct NBDSimpleReply {
     uint32_t magic;  /* NBD_SIMPLE_REPLY_MAGIC */
@@ -235,6 +234,9 @@ enum {
  */
 #define NBD_MAX_STRING_SIZE 4096

+/* Transmission request structure */
+#define NBD_REQUEST_MAGIC           0x25609513
+
 /* Two types of reply structures */
 #define NBD_SIMPLE_REPLY_MAGIC      0x67446698
 #define NBD_STRUCTURED_REPLY_MAGIC  0x668e33ef
@@ -293,6 +295,7 @@ struct NBDExportInfo {
     /* In-out fields, set by client before nbd_receive_negotiate() and
      * updated by server results during nbd_receive_negotiate() */
     bool structured_reply;
+    bool extended_headers;
     bool base_allocation; /* base:allocation context for NBD_CMD_BLOCK_STATUS */

     /* Set by server results during nbd_receive_negotiate() and
@@ -322,7 +325,7 @@ int nbd_receive_export_list(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
                             Error **errp);
 int nbd_init(int fd, QIOChannelSocket *sioc, NBDExportInfo *info,
              Error **errp);
-int nbd_send_request(QIOChannel *ioc, NBDRequest *request);
+int nbd_send_request(QIOChannel *ioc, NBDRequest *request, bool ext_hdr);
 int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc,
                                    NBDReply *reply, Error **errp);
 int nbd_client(int fd);
diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index 1b2141ab4b..0016793ff4 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -1,7 +1,7 @@
 /*
  * NBD Internal Declarations
  *
- * Copyright (C) 2016 Red Hat, Inc.
+ * Copyright (C) 2016-2021 Red Hat, Inc.
  *
  * This work is licensed under the terms of the GNU GPL, version 2 or later.
  * See the COPYING file in the top-level directory.
@@ -45,7 +45,6 @@
 #define NBD_OLDSTYLE_NEGOTIATE_SIZE (8 + 8 + 8 + 4 + 124)

 #define NBD_INIT_MAGIC              0x4e42444d41474943LL /* ASCII "NBDMAGIC" */
-#define NBD_REQUEST_MAGIC           0x25609513
 #define NBD_OPTS_MAGIC              0x49484156454F5054LL /* ASCII "IHAVEOPT" */
 #define NBD_CLIENT_MAGIC            0x0000420281861253LL
 #define NBD_REP_MAGIC               0x0003e889045565a9LL
diff --git a/block/nbd.c b/block/nbd.c
index 7d485c86d2..32681d2867 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -2,7 +2,7 @@
  * QEMU Block driver for  NBD
  *
  * Copyright (c) 2019 Virtuozzo International GmbH.
- * Copyright (C) 2016 Red Hat, Inc.
+ * Copyright (C) 2016-2022 Red Hat, Inc.
  * Copyright (C) 2008 Bull S.A.S.
  *     Author: Laurent Vivier <Laurent.Vivier@bull.net>
  *
@@ -340,7 +340,7 @@ int coroutine_fn nbd_co_do_establish_connection(BlockDriverState *bs,
          */
         NBDRequest request = { .type = NBD_CMD_DISC };

-        nbd_send_request(s->ioc, &request);
+        nbd_send_request(s->ioc, &request, s->info.extended_headers);

         yank_unregister_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name),
                                  nbd_yank, bs);
@@ -524,14 +524,14 @@ static int coroutine_fn nbd_co_send_request(BlockDriverState *bs,

     if (qiov) {
         qio_channel_set_cork(s->ioc, true);
-        rc = nbd_send_request(s->ioc, request);
+        rc = nbd_send_request(s->ioc, request, s->info.extended_headers);
         if (rc >= 0 && qio_channel_writev_all(s->ioc, qiov->iov, qiov->niov,
                                               NULL) < 0) {
             rc = -EIO;
         }
         qio_channel_set_cork(s->ioc, false);
     } else {
-        rc = nbd_send_request(s->ioc, request);
+        rc = nbd_send_request(s->ioc, request, s->info.extended_headers);
     }
     qemu_co_mutex_unlock(&s->send_mutex);

@@ -1298,10 +1298,11 @@ static int coroutine_fn nbd_client_co_pwrite_zeroes(BlockDriverState *bs, int64_
     NBDRequest request = {
         .type = NBD_CMD_WRITE_ZEROES,
         .from = offset,
-        .len = bytes,  /* .len is uint32_t actually */
+        .len = bytes,
     };

-    assert(bytes <= UINT32_MAX); /* rely on max_pwrite_zeroes */
+    /* rely on max_pwrite_zeroes */
+    assert(bytes <= UINT32_MAX || s->info.extended_headers);

     assert(!(s->info.flags & NBD_FLAG_READ_ONLY));
     if (!(s->info.flags & NBD_FLAG_SEND_WRITE_ZEROES)) {
@@ -1348,10 +1349,11 @@ static int coroutine_fn nbd_client_co_pdiscard(BlockDriverState *bs, int64_t off
     NBDRequest request = {
         .type = NBD_CMD_TRIM,
         .from = offset,
-        .len = bytes, /* len is uint32_t */
+        .len = bytes,
     };

-    assert(bytes <= UINT32_MAX); /* rely on max_pdiscard */
+    /* rely on max_pdiscard */
+    assert(bytes <= UINT32_MAX || s->info.extended_headers);

     assert(!(s->info.flags & NBD_FLAG_READ_ONLY));
     if (!(s->info.flags & NBD_FLAG_SEND_TRIM) || !bytes) {
@@ -1373,8 +1375,7 @@ static int coroutine_fn nbd_client_co_block_status(
     NBDRequest request = {
         .type = NBD_CMD_BLOCK_STATUS,
         .from = offset,
-        .len = MIN(QEMU_ALIGN_DOWN(INT_MAX, bs->bl.request_alignment),
-                   MIN(bytes, s->info.size - offset)),
+        .len = MIN(bytes, s->info.size - offset),
         .flags = NBD_CMD_FLAG_REQ_ONE,
     };

@@ -1384,6 +1385,10 @@ static int coroutine_fn nbd_client_co_block_status(
         *file = bs;
         return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
     }
+    if (!s->info.extended_headers) {
+        request.len = MIN(QEMU_ALIGN_DOWN(INT_MAX, bs->bl.request_alignment),
+                          request.len);
+    }

     /*
      * Work around the fact that the block layer doesn't do
@@ -1462,7 +1467,7 @@ static void nbd_client_close(BlockDriverState *bs)
     NBDRequest request = { .type = NBD_CMD_DISC };

     if (s->ioc) {
-        nbd_send_request(s->ioc, &request);
+        nbd_send_request(s->ioc, &request, s->info.extended_headers);
     }

     nbd_teardown_connection(bs);
@@ -1953,6 +1958,14 @@ static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
     bs->bl.max_pwrite_zeroes = max;
     bs->bl.max_transfer = max;

+    /*
+     * Assume that if the server supports extended headers, it also
+     * supports unlimited size zero and trim commands.
+     */
+    if (s->info.extended_headers) {
+        bs->bl.max_pdiscard = bs->bl.max_pwrite_zeroes = 0;
+    }
+
     if (s->info.opt_block &&
         s->info.opt_block > bs->bl.opt_transfer) {
         bs->bl.opt_transfer = s->info.opt_block;
diff --git a/nbd/client.c b/nbd/client.c
index cd97a2aa09..2480a48ec6 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -1,5 +1,5 @@
 /*
- *  Copyright (C) 2016-2019 Red Hat, Inc.
+ *  Copyright (C) 2016-2022 Red Hat, Inc.
  *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
  *
  *  Network Block Device Client Side
@@ -1033,6 +1033,7 @@ int nbd_receive_negotiate(AioContext *aio_context, QIOChannel *ioc,
                                  info->structured_reply, &zeroes, errp);

     info->structured_reply = false;
+    info->extended_headers = false;
     info->base_allocation = false;
     if (tlscreds && *outioc) {
         ioc = *outioc;
@@ -1221,7 +1222,7 @@ int nbd_receive_export_list(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
         if (nbd_drop(ioc, 124, NULL) == 0) {
             NBDRequest request = { .type = NBD_CMD_DISC };

-            nbd_send_request(ioc, &request);
+            nbd_send_request(ioc, &request, false);
         }
         break;
     default:
@@ -1345,10 +1346,12 @@ int nbd_disconnect(int fd)

 #endif /* __linux__ */

-int nbd_send_request(QIOChannel *ioc, NBDRequest *request)
+int nbd_send_request(QIOChannel *ioc, NBDRequest *request, bool ext_hdr)
 {
     uint8_t buf[NBD_REQUEST_SIZE];

+    assert(!ext_hdr);
+    assert(request->len <= UINT32_MAX);
     trace_nbd_send_request(request->from, request->len, request->handle,
                            request->flags, request->type,
                            nbd_cmd_lookup(request->type));
diff --git a/nbd/server.c b/nbd/server.c
index 37f9c21d20..7738f5f899 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -142,6 +142,7 @@ struct NBDClient {
     uint32_t check_align; /* If non-zero, check for aligned client requests */

     bool structured_reply;
+    bool extended_headers;
     NBDExportMetaContexts export_meta;

     uint32_t opt; /* Current option being negotiated */
@@ -1436,7 +1437,7 @@ static int nbd_receive_request(NBDClient *client, NBDRequest *request,
     request->type   = lduw_be_p(buf + 6);
     request->handle = ldq_be_p(buf + 8);
     request->from   = ldq_be_p(buf + 16);
-    request->len    = ldl_be_p(buf + 24);
+    request->len    = ldl_be_p(buf + 24); /* widen 32 to 64 bits */

     trace_nbd_receive_request(magic, request->flags, request->type,
                               request->from, request->len);
@@ -2343,7 +2344,7 @@ static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request,
         request->type == NBD_CMD_CACHE)
     {
         if (request->len > NBD_MAX_BUFFER_SIZE) {
-            error_setg(errp, "len (%" PRIu32" ) is larger than max len (%u)",
+            error_setg(errp, "len (%" PRIu64" ) is larger than max len (%u)",
                        request->len, NBD_MAX_BUFFER_SIZE);
             return -EINVAL;
         }
@@ -2359,6 +2360,7 @@ static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request,
     }

     if (request->type == NBD_CMD_WRITE) {
+        assert(request->len <= NBD_MAX_BUFFER_SIZE);
         if (nbd_read(client->ioc, req->data, request->len, "CMD_WRITE data",
                      errp) < 0)
         {
@@ -2380,7 +2382,7 @@ static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request,
     }
     if (request->from > client->exp->size ||
         request->len > client->exp->size - request->from) {
-        error_setg(errp, "operation past EOF; From: %" PRIu64 ", Len: %" PRIu32
+        error_setg(errp, "operation past EOF; From: %" PRIu64 ", Len: %" PRIu64
                    ", Size: %" PRIu64, request->from, request->len,
                    client->exp->size);
         return (request->type == NBD_CMD_WRITE ||
@@ -2443,6 +2445,7 @@ static coroutine_fn int nbd_do_cmd_read(NBDClient *client, NBDRequest *request,
     NBDExport *exp = client->exp;

     assert(request->type == NBD_CMD_READ);
+    assert(request->len <= NBD_MAX_BUFFER_SIZE);

     /* XXX: NBD Protocol only documents use of FUA with WRITE */
     if (request->flags & NBD_CMD_FLAG_FUA) {
@@ -2494,6 +2497,7 @@ static coroutine_fn int nbd_do_cmd_cache(NBDClient *client, NBDRequest *request,
     NBDExport *exp = client->exp;

     assert(request->type == NBD_CMD_CACHE);
+    assert(request->len <= NBD_MAX_BUFFER_SIZE);

     ret = blk_co_preadv(exp->common.blk, request->from, request->len,
                         NULL, BDRV_REQ_COPY_ON_READ | BDRV_REQ_PREFETCH);
@@ -2527,6 +2531,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
         if (request->flags & NBD_CMD_FLAG_FUA) {
             flags |= BDRV_REQ_FUA;
         }
+        assert(request->len <= NBD_MAX_BUFFER_SIZE);
         ret = blk_pwrite(exp->common.blk, request->from, request->len, data,
                          flags);
         return nbd_send_generic_reply(client, request->handle, ret,
@@ -2570,6 +2575,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
             return nbd_send_generic_reply(client, request->handle, -EINVAL,
                                           "need non-zero length", errp);
         }
+        assert(request->len <= UINT32_MAX);
         if (client->export_meta.count) {
             bool dont_fragment = request->flags & NBD_CMD_FLAG_REQ_ONE;
             int contexts_remaining = client->export_meta.count;
diff --git a/nbd/trace-events b/nbd/trace-events
index b7032ca277..e2c1d68688 100644
--- a/nbd/trace-events
+++ b/nbd/trace-events
@@ -31,7 +31,7 @@ nbd_client_loop(void) "Doing NBD loop"
 nbd_client_loop_ret(int ret, const char *error) "NBD loop returned %d: %s"
 nbd_client_clear_queue(void) "Clearing NBD queue"
 nbd_client_clear_socket(void) "Clearing NBD socket"
-nbd_send_request(uint64_t from, uint32_t len, uint64_t handle, uint16_t flags, uint16_t type, const char *name) "Sending request to server: { .from = %" PRIu64", .len = %" PRIu32 ", .handle = %" PRIu64 ", .flags = 0x%" PRIx16 ", .type = %" PRIu16 " (%s) }"
+nbd_send_request(uint64_t from, uint64_t len, uint64_t handle, uint16_t flags, uint16_t type, const char *name) "Sending request to server: { .from = %" PRIu64", .len = %" PRIu64 ", .handle = %" PRIu64 ", .flags = 0x%" PRIx16 ", .type = %" PRIu16 " (%s) }"
 nbd_receive_simple_reply(int32_t error, const char *errname, uint64_t handle) "Got simple reply: { .error = %" PRId32 " (%s), handle = %" PRIu64" }"
 nbd_receive_structured_reply_chunk(uint16_t flags, uint16_t type, const char *name, uint64_t handle, uint32_t length) "Got structured reply chunk: { flags = 0x%" PRIx16 ", type = %d (%s), handle = %" PRIu64 ", length = %" PRIu32 " }"

@@ -60,7 +60,7 @@ nbd_negotiate_options_check_option(uint32_t option, const char *name) "Checking
 nbd_negotiate_begin(void) "Beginning negotiation"
 nbd_negotiate_new_style_size_flags(uint64_t size, unsigned flags) "advertising size %" PRIu64 " and flags 0x%x"
 nbd_negotiate_success(void) "Negotiation succeeded"
-nbd_receive_request(uint32_t magic, uint16_t flags, uint16_t type, uint64_t from, uint32_t len) "Got request: { magic = 0x%" PRIx32 ", .flags = 0x%" PRIx16 ", .type = 0x%" PRIx16 ", from = %" PRIu64 ", len = %" PRIu32 " }"
+nbd_receive_request(uint32_t magic, uint16_t flags, uint16_t type, uint64_t from, uint64_t len) "Got request: { magic = 0x%" PRIx32 ", .flags = 0x%" PRIx16 ", .type = 0x%" PRIx16 ", from = %" PRIu64 ", len = %" PRIu64 " }"
 nbd_blk_aio_attached(const char *name, void *ctx) "Export %s: Attaching clients to AIO context %p"
 nbd_blk_aio_detach(const char *name, void *ctx) "Export %s: Detaching clients from AIO context %p"
 nbd_co_send_simple_reply(uint64_t handle, uint32_t error, const char *errname, int len) "Send simple reply: handle = %" PRIu64 ", error = %" PRIu32 " (%s), len = %d"
@@ -70,8 +70,8 @@ nbd_co_send_structured_read_hole(uint64_t handle, uint64_t offset, size_t size)
 nbd_co_send_extents(uint64_t handle, unsigned int extents, uint32_t id, uint64_t length, int last) "Send block status reply: handle = %" PRIu64 ", extents = %u, context = %d (extents cover %" PRIu64 " bytes, last chunk = %d)"
 nbd_co_send_structured_error(uint64_t handle, int err, const char *errname, const char *msg) "Send structured error reply: handle = %" PRIu64 ", error = %d (%s), msg = '%s'"
 nbd_co_receive_request_decode_type(uint64_t handle, uint16_t type, const char *name) "Decoding type: handle = %" PRIu64 ", type = %" PRIu16 " (%s)"
-nbd_co_receive_request_payload_received(uint64_t handle, uint32_t len) "Payload received: handle = %" PRIu64 ", len = %" PRIu32
-nbd_co_receive_align_compliance(const char *op, uint64_t from, uint32_t len, uint32_t align) "client sent non-compliant unaligned %s request: from=0x%" PRIx64 ", len=0x%" PRIx32 ", align=0x%" PRIx32
+nbd_co_receive_request_payload_received(uint64_t handle, uint64_t len) "Payload received: handle = %" PRIu64 ", len = %" PRIu64
+nbd_co_receive_align_compliance(const char *op, uint64_t from, uint64_t len, uint32_t align) "client sent non-compliant unaligned %s request: from=0x%" PRIx64 ", len=0x%" PRIx64 ", align=0x%" PRIx32
 nbd_trip(void) "Reading request"

 # client-connection.c
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v2 04/15] nbd: Add types for extended headers
  2022-11-14 22:48 ` [PATCH v2 00/15] qemu patches for 64-bit NBD extensions Eric Blake
                     ` (2 preceding siblings ...)
  2022-11-14 22:48   ` [PATCH v2 03/15] nbd: Prepare for 64-bit request effect lengths Eric Blake
@ 2022-11-14 22:48   ` Eric Blake
  2022-11-14 22:48   ` [PATCH v2 05/15] nbd/server: Refactor handling of request payload Eric Blake
                     ` (10 subsequent siblings)
  14 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, libguestfs, nbd, Vladimir Sementsov-Ogievskiy,
	Kevin Wolf, Hanna Reitz

Add the constants and structs necessary for later patches to start
implementing the NBD_OPT_EXTENDED_HEADERS extension in both the client
and server, matching recent commit XXX[*] in the upstream nbd project.
This patch does not change any existing behavior, but merely sets the
stage.

This patch does not change the status quo that neither the client nor
the server uses a packed-struct representation for the request header.
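
Purely as an illustration of the new on-the-wire shape (not part of
this patch), a client could serialize the 32-byte extended request
header roughly like this.  The store helpers and the function name are
invented for the example; the magic value and the field offsets match
the extended request layout used later in this series.

  #include <stdint.h>
  #include <string.h>
  #include <arpa/inet.h>   /* htonl(), htons() */

  /* Stand-ins for qemu's st*_be_p() big-endian store helpers. */
  static void store_be16(uint8_t *p, uint16_t v) { v = htons(v); memcpy(p, &v, 2); }
  static void store_be32(uint8_t *p, uint32_t v) { v = htonl(v); memcpy(p, &v, 4); }
  static void store_be64(uint8_t *p, uint64_t v)
  {
      store_be32(p, v >> 32);
      store_be32(p + 4, (uint32_t)v);
  }

  /* Fill one 32-byte extended request: magic, flags, type, handle, from, 64-bit len. */
  static void fill_ext_request(uint8_t buf[32], uint16_t flags, uint16_t type,
                               uint64_t handle, uint64_t from, uint64_t len)
  {
      store_be32(buf +  0, 0x21e41c71);   /* NBD_EXTENDED_REQUEST_MAGIC */
      store_be16(buf +  4, flags);
      store_be16(buf +  6, type);
      store_be64(buf +  8, handle);
      store_be64(buf + 16, from);
      store_be64(buf + 24, len);          /* effect or payload length, 64 bits */
  }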

Signed-off-by: Eric Blake <eblake@redhat.com>

---
[*] tweak commit message once nbd commit id available
---
 docs/interop/nbd.txt |  1 +
 include/block/nbd.h  | 74 ++++++++++++++++++++++++++++++++------------
 nbd/common.c         | 10 +++++-
 3 files changed, 65 insertions(+), 20 deletions(-)

diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
index f5ca25174a..988c072697 100644
--- a/docs/interop/nbd.txt
+++ b/docs/interop/nbd.txt
@@ -69,3 +69,4 @@ NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
 NBD_CMD_FLAG_FAST_ZERO
 * 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth"
 * 7.1: NBD_FLAG_CAN_MULTI_CONN for shareable writable exports
+* 7.2: NBD_OPT_EXTENDED_HEADERS
diff --git a/include/block/nbd.h b/include/block/nbd.h
index e357452a57..357121ce76 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -78,13 +78,24 @@ typedef struct NBDStructuredReplyChunk {
     uint32_t length; /* length of payload */
 } QEMU_PACKED NBDStructuredReplyChunk;

+typedef struct NBDExtendedReplyChunk {
+    uint32_t magic;  /* NBD_EXTENDED_REPLY_MAGIC */
+    uint16_t flags;  /* combination of NBD_REPLY_FLAG_* */
+    uint16_t type;   /* NBD_REPLY_TYPE_* */
+    uint64_t handle; /* request handle */
+    uint64_t offset; /* request offset */
+    uint64_t length; /* length of payload */
+} QEMU_PACKED NBDExtendedReplyChunk;
+
 typedef union NBDReply {
     NBDSimpleReply simple;
     NBDStructuredReplyChunk structured;
+    NBDExtendedReplyChunk extended;
     struct {
-        /* @magic and @handle fields have the same offset and size both in
-         * simple reply and structured reply chunk, so let them be accessible
-         * without ".simple." or ".structured." specification
+        /*
+         * @magic and @handle fields have the same offset and size in all
+         * forms of replies, so let them be accessible without ".simple.",
+         * ".structured.", or ".extended." specifications.
          */
         uint32_t magic;
         uint32_t _skip;
@@ -117,15 +128,29 @@ typedef struct NBDStructuredError {
 typedef struct NBDStructuredMeta {
     /* header's length >= 12 (at least one extent) */
     uint32_t context_id;
-    /* extents follows */
+    /* NBDExtent extents[] follows, array length implied by header */
 } QEMU_PACKED NBDStructuredMeta;

-/* Extent chunk for NBD_REPLY_TYPE_BLOCK_STATUS */
+/* Extent array for NBD_REPLY_TYPE_BLOCK_STATUS */
 typedef struct NBDExtent {
     uint32_t length;
     uint32_t flags; /* NBD_STATE_* */
 } QEMU_PACKED NBDExtent;

+/* Header of NBD_REPLY_TYPE_BLOCK_STATUS_EXT */
+typedef struct NBDStructuredMetaExt {
+    /* header's length >= 24 (at least one extent) */
+    uint32_t context_id;
+    uint32_t count; /* header length must be count * 16 + 8 */
+    /* NBDExtentExt extents[count] follows */
+} QEMU_PACKED NBDStructuredMetaExt;
+
+/* Extent array for NBD_REPLY_TYPE_BLOCK_STATUS_EXT */
+typedef struct NBDExtentExt {
+    uint64_t length;
+    uint64_t flags; /* NBD_STATE_* */
+} QEMU_PACKED NBDExtentExt;
+
 /* Transmission (export) flags: sent from server to client during handshake,
    but describe what will happen during transmission */
 enum {
@@ -178,6 +203,7 @@ enum {
 #define NBD_OPT_STRUCTURED_REPLY  (8)
 #define NBD_OPT_LIST_META_CONTEXT (9)
 #define NBD_OPT_SET_META_CONTEXT  (10)
+#define NBD_OPT_EXTENDED_HEADERS  (11)

 /* Option reply types. */
 #define NBD_REP_ERR(value) ((UINT32_C(1) << 31) | (value))
@@ -195,6 +221,8 @@ enum {
 #define NBD_REP_ERR_UNKNOWN         NBD_REP_ERR(6)  /* Export unknown */
 #define NBD_REP_ERR_SHUTDOWN        NBD_REP_ERR(7)  /* Server shutting down */
 #define NBD_REP_ERR_BLOCK_SIZE_REQD NBD_REP_ERR(8)  /* Need INFO_BLOCK_SIZE */
+#define NBD_REP_ERR_TOO_BIG         NBD_REP_ERR(9)  /* Payload size overflow */
+#define NBD_REP_ERR_EXT_HEADER_REQD NBD_REP_ERR(10) /* Need extended headers */

 /* Info types, used during NBD_REP_INFO */
 #define NBD_INFO_EXPORT         0
@@ -203,12 +231,14 @@ enum {
 #define NBD_INFO_BLOCK_SIZE     3

 /* Request flags, sent from client to server during transmission phase */
-#define NBD_CMD_FLAG_FUA        (1 << 0) /* 'force unit access' during write */
-#define NBD_CMD_FLAG_NO_HOLE    (1 << 1) /* don't punch hole on zero run */
-#define NBD_CMD_FLAG_DF         (1 << 2) /* don't fragment structured read */
-#define NBD_CMD_FLAG_REQ_ONE    (1 << 3) /* only one extent in BLOCK_STATUS
-                                          * reply chunk */
-#define NBD_CMD_FLAG_FAST_ZERO  (1 << 4) /* fail if WRITE_ZEROES is not fast */
+#define NBD_CMD_FLAG_FUA         (1 << 0) /* 'force unit access' during write */
+#define NBD_CMD_FLAG_NO_HOLE     (1 << 1) /* don't punch hole on zero run */
+#define NBD_CMD_FLAG_DF          (1 << 2) /* don't fragment structured read */
+#define NBD_CMD_FLAG_REQ_ONE     (1 << 3) \
+    /* only one extent in BLOCK_STATUS reply chunk */
+#define NBD_CMD_FLAG_FAST_ZERO   (1 << 4) /* fail if WRITE_ZEROES is not fast */
+#define NBD_CMD_FLAG_PAYLOAD_LEN (1 << 5) \
+    /* length describes payload, not effect; only with ext header */

 /* Supported request types */
 enum {
@@ -234,12 +264,17 @@ enum {
  */
 #define NBD_MAX_STRING_SIZE 4096

-/* Transmission request structure */
+/* Two types of request structures, a given client will only use 1 */
 #define NBD_REQUEST_MAGIC           0x25609513
+#define NBD_EXTENDED_REQUEST_MAGIC  0x21e41c71

-/* Two types of reply structures */
+/*
+ * Three types of reply structures, but what a client expects depends
+ * on NBD_OPT_STRUCTURED_REPLY and NBD_OPT_EXTENDED_HEADERS.
+ */
 #define NBD_SIMPLE_REPLY_MAGIC      0x67446698
 #define NBD_STRUCTURED_REPLY_MAGIC  0x668e33ef
+#define NBD_EXTENDED_REPLY_MAGIC    0x6e8a278c

 /* Structured reply flags */
 #define NBD_REPLY_FLAG_DONE          (1 << 0) /* This reply-chunk is last */
@@ -247,12 +282,13 @@ enum {
 /* Structured reply types */
 #define NBD_REPLY_ERR(value)         ((1 << 15) | (value))

-#define NBD_REPLY_TYPE_NONE          0
-#define NBD_REPLY_TYPE_OFFSET_DATA   1
-#define NBD_REPLY_TYPE_OFFSET_HOLE   2
-#define NBD_REPLY_TYPE_BLOCK_STATUS  5
-#define NBD_REPLY_TYPE_ERROR         NBD_REPLY_ERR(1)
-#define NBD_REPLY_TYPE_ERROR_OFFSET  NBD_REPLY_ERR(2)
+#define NBD_REPLY_TYPE_NONE              0
+#define NBD_REPLY_TYPE_OFFSET_DATA       1
+#define NBD_REPLY_TYPE_OFFSET_HOLE       2
+#define NBD_REPLY_TYPE_BLOCK_STATUS      5
+#define NBD_REPLY_TYPE_BLOCK_STATUS_EXT  6
+#define NBD_REPLY_TYPE_ERROR             NBD_REPLY_ERR(1)
+#define NBD_REPLY_TYPE_ERROR_OFFSET      NBD_REPLY_ERR(2)

 /* Extent flags for base:allocation in NBD_REPLY_TYPE_BLOCK_STATUS */
 #define NBD_STATE_HOLE (1 << 0)
diff --git a/nbd/common.c b/nbd/common.c
index ddfe7d1183..137466defd 100644
--- a/nbd/common.c
+++ b/nbd/common.c
@@ -79,6 +79,8 @@ const char *nbd_opt_lookup(uint32_t opt)
         return "list meta context";
     case NBD_OPT_SET_META_CONTEXT:
         return "set meta context";
+    case NBD_OPT_EXTENDED_HEADERS:
+        return "extended headers";
     default:
         return "<unknown>";
     }
@@ -112,6 +114,10 @@ const char *nbd_rep_lookup(uint32_t rep)
         return "server shutting down";
     case NBD_REP_ERR_BLOCK_SIZE_REQD:
         return "block size required";
+    case NBD_REP_ERR_TOO_BIG:
+        return "option payload too big";
+    case NBD_REP_ERR_EXT_HEADER_REQD:
+        return "extended headers required";
     default:
         return "<unknown>";
     }
@@ -170,7 +176,9 @@ const char *nbd_reply_type_lookup(uint16_t type)
     case NBD_REPLY_TYPE_OFFSET_HOLE:
         return "hole";
     case NBD_REPLY_TYPE_BLOCK_STATUS:
-        return "block status";
+        return "block status (32-bit)";
+    case NBD_REPLY_TYPE_BLOCK_STATUS_EXT:
+        return "block status (64-bit)";
     case NBD_REPLY_TYPE_ERROR:
         return "generic error";
     case NBD_REPLY_TYPE_ERROR_OFFSET:
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v2 05/15] nbd/server: Refactor handling of request payload
  2022-11-14 22:48 ` [PATCH v2 00/15] qemu patches for 64-bit NBD extensions Eric Blake
                     ` (3 preceding siblings ...)
  2022-11-14 22:48   ` [PATCH v2 04/15] nbd: Add types for extended headers Eric Blake
@ 2022-11-14 22:48   ` Eric Blake
  2022-11-14 22:48   ` [PATCH v2 06/15] nbd/server: Refactor to pass full request around Eric Blake
                     ` (9 subsequent siblings)
  14 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, libguestfs, nbd, Vladimir Sementsov-Ogievskiy

Upcoming additions to support NBD 64-bit effect lengths make it
possible to distinguish between a payload length (capped at 32M) and an
effect length (up to 63 bits).  Without that extension, only the
NBD_CMD_WRITE request has a payload; but with the extension, it makes
sense to allow at least NBD_CMD_BLOCK_STATUS to have both a payload and
an effect length (where the payload is a limited-size struct that in
turn gives the real effect length as well as a subset of known ids for
which status is requested).  Other future NBD commands may also have a
request payload, so the 64-bit extension introduces a new
NBD_CMD_FLAG_PAYLOAD_LEN flag that distinguishes whether the header
length is a payload length or an effect length, rather than hard-coding
the decision based on the command.  Note that we do not support the
payload version of BLOCK_STATUS yet.

For this patch, no semantic change is intended for a compliant client.
For a non-compliant client, it is possible that the error behavior
changes (a different message, a change in whether the connection is
killed or remains alive for the next command, and so forth), but all
errors should still be handled gracefully.
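
As a rough sketch of that decision (the names below are simplified
stand-ins for the fields the patch actually uses; the real logic lives
in nbd_co_receive_request() below):

  #include <stdbool.h>
  #include <stdint.h>

  #define NBD_CMD_WRITE            1
  #define NBD_CMD_FLAG_PAYLOAD_LEN (1 << 5)

  /*
   * Sketch: how many payload bytes follow the request header?  With
   * extended headers the new flag is authoritative; without them, only
   * NBD_CMD_WRITE carries a payload, and its length doubles as the
   * effect length.
   */
  static uint64_t payload_bytes(bool extended_headers, uint16_t type,
                                uint16_t flags, uint64_t len)
  {
      bool payload_flag = extended_headers &&
                          (flags & NBD_CMD_FLAG_PAYLOAD_LEN);

      if (type == NBD_CMD_WRITE || payload_flag) {
          return len;   /* header length counts payload bytes */
      }
      return 0;         /* header length is an effect length only */
  }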

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 nbd/server.c     | 53 ++++++++++++++++++++++++++++++++----------------
 nbd/trace-events |  1 +
 2 files changed, 37 insertions(+), 17 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 7738f5f899..ad5c2052b5 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2316,6 +2316,8 @@ static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request,
                                   Error **errp)
 {
     NBDClient *client = req->client;
+    bool extended_with_payload;
+    int payload_len = 0;
     int valid_flags;
     int ret;

@@ -2329,27 +2331,40 @@ static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request,
     trace_nbd_co_receive_request_decode_type(request->handle, request->type,
                                              nbd_cmd_lookup(request->type));

-    if (request->type != NBD_CMD_WRITE) {
-        /* No payload, we are ready to read the next request.  */
-        req->complete = true;
-    }
-
     if (request->type == NBD_CMD_DISC) {
         /* Special case: we're going to disconnect without a reply,
          * whether or not flags, from, or len are bogus */
+        req->complete = true;
         return -EIO;
     }

+    /* Payload and buffer handling. */
+    extended_with_payload = client->extended_headers &&
+        (request->flags & NBD_CMD_FLAG_PAYLOAD_LEN);
     if (request->type == NBD_CMD_READ || request->type == NBD_CMD_WRITE ||
-        request->type == NBD_CMD_CACHE)
-    {
+        request->type == NBD_CMD_CACHE || extended_with_payload) {
         if (request->len > NBD_MAX_BUFFER_SIZE) {
             error_setg(errp, "len (%" PRIu64" ) is larger than max len (%u)",
                        request->len, NBD_MAX_BUFFER_SIZE);
             return -EINVAL;
         }

-        if (request->type != NBD_CMD_CACHE) {
+        if (request->type == NBD_CMD_WRITE || extended_with_payload) {
+            payload_len = request->len;
+            if (request->type != NBD_CMD_WRITE) {
+                /*
+                 * For now, we don't support payloads on other
+                 * commands; but we can keep the connection alive.
+                 */
+                request->len = 0;
+            } else if (client->extended_headers && !extended_with_payload) {
+                /* The client is noncompliant. Trace it, but proceed. */
+                trace_nbd_co_receive_ext_payload_compliance(request->from,
+                                                            request->len);
+            }
+        }
+
+        if (request->type == NBD_CMD_WRITE || request->type == NBD_CMD_READ) {
             req->data = blk_try_blockalign(client->exp->common.blk,
                                            request->len);
             if (req->data == NULL) {
@@ -2359,18 +2374,20 @@ static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request,
         }
     }

-    if (request->type == NBD_CMD_WRITE) {
-        assert(request->len <= NBD_MAX_BUFFER_SIZE);
-        if (nbd_read(client->ioc, req->data, request->len, "CMD_WRITE data",
-                     errp) < 0)
-        {
+    if (payload_len) {
+        if (req->data) {
+            ret = nbd_read(client->ioc, req->data, payload_len,
+                           "CMD_WRITE data", errp);
+        } else {
+            ret = nbd_drop(client->ioc, payload_len, errp);
+        }
+        if (ret < 0) {
             return -EIO;
         }
-        req->complete = true;
-
         trace_nbd_co_receive_request_payload_received(request->handle,
-                                                      request->len);
+                                                      payload_len);
     }
+    req->complete = true;

     /* Sanity checks. */
     if (client->exp->nbdflags & NBD_FLAG_READ_ONLY &&
@@ -2400,7 +2417,9 @@ static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request,
                                               client->check_align);
     }
     valid_flags = NBD_CMD_FLAG_FUA;
-    if (request->type == NBD_CMD_READ && client->structured_reply) {
+    if (request->type == NBD_CMD_WRITE && client->extended_headers) {
+        valid_flags |= NBD_CMD_FLAG_PAYLOAD_LEN;
+    } else if (request->type == NBD_CMD_READ && client->structured_reply) {
         valid_flags |= NBD_CMD_FLAG_DF;
     } else if (request->type == NBD_CMD_WRITE_ZEROES) {
         valid_flags |= NBD_CMD_FLAG_NO_HOLE | NBD_CMD_FLAG_FAST_ZERO;
diff --git a/nbd/trace-events b/nbd/trace-events
index e2c1d68688..adf5666e20 100644
--- a/nbd/trace-events
+++ b/nbd/trace-events
@@ -71,6 +71,7 @@ nbd_co_send_extents(uint64_t handle, unsigned int extents, uint32_t id, uint64_t
 nbd_co_send_structured_error(uint64_t handle, int err, const char *errname, const char *msg) "Send structured error reply: handle = %" PRIu64 ", error = %d (%s), msg = '%s'"
 nbd_co_receive_request_decode_type(uint64_t handle, uint16_t type, const char *name) "Decoding type: handle = %" PRIu64 ", type = %" PRIu16 " (%s)"
 nbd_co_receive_request_payload_received(uint64_t handle, uint64_t len) "Payload received: handle = %" PRIu64 ", len = %" PRIu64
+nbd_co_receive_ext_payload_compliance(uint64_t from, uint64_t len) "client sent non-compliant write without payload flag: from=0x%" PRIx64 ", len=0x%" PRIx64
 nbd_co_receive_align_compliance(const char *op, uint64_t from, uint64_t len, uint32_t align) "client sent non-compliant unaligned %s request: from=0x%" PRIx64 ", len=0x%" PRIx64 ", align=0x%" PRIx32
 nbd_trip(void) "Reading request"

-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v2 06/15] nbd/server: Refactor to pass full request around
  2022-11-14 22:48 ` [PATCH v2 00/15] qemu patches for 64-bit NBD extensions Eric Blake
                     ` (4 preceding siblings ...)
  2022-11-14 22:48   ` [PATCH v2 05/15] nbd/server: Refactor handling of request payload Eric Blake
@ 2022-11-14 22:48   ` Eric Blake
  2022-11-14 22:48   ` [PATCH v2 07/15] nbd/server: Initial support for extended headers Eric Blake
                     ` (8 subsequent siblings)
  14 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, libguestfs, nbd, Vladimir Sementsov-Ogievskiy

Part of NBD's 64-bit headers extension involves passing the client's
requested offset back as part of the reply header (one reason for this
change: converting absolute offsets stored in
NBD_REPLY_TYPE_OFFSET_DATA to relative offsets within the buffer is
easier if the absolute offset of the buffer is also available).  This
is a refactoring patch to pass the full request around the reply
stack, rather than just the handle, so that later patches can then
access request->from when extended headers are active.  But for this
patch, there are no semantic changes.
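
As a hedged example of the arithmetic this enables (the helper below is
invented for illustration; only request->from and the chunk's absolute
offset are real quantities from the protocol):

  #include <assert.h>
  #include <stdint.h>

  /*
   * An NBD_REPLY_TYPE_OFFSET_DATA chunk carries an absolute offset.
   * Once the reply path also has the request's own offset
   * (request->from), placing the chunk within the client's buffer is a
   * single subtraction.
   */
  static uint64_t chunk_offset_in_buffer(uint64_t request_from,
                                         uint64_t chunk_absolute_offset)
  {
      assert(chunk_absolute_offset >= request_from);
      return chunk_absolute_offset - request_from;
  }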

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 nbd/server.c | 109 ++++++++++++++++++++++++++-------------------------
 1 file changed, 55 insertions(+), 54 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index ad5c2052b5..4d1400430b 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1890,18 +1890,17 @@ static int coroutine_fn nbd_co_send_iov(NBDClient *client, struct iovec *iov,
 }

 static inline void set_be_simple_reply(NBDClient *client, struct iovec *iov,
-                                       uint64_t error, uint64_t handle)
+                                       uint64_t error, NBDRequest *request)
 {
     NBDSimpleReply *reply = iov->iov_base;

     iov->iov_len = sizeof(*reply);
     stl_be_p(&reply->magic, NBD_SIMPLE_REPLY_MAGIC);
     stl_be_p(&reply->error, error);
-    stq_be_p(&reply->handle, handle);
+    stq_be_p(&reply->handle, request->handle);
 }

-static int nbd_co_send_simple_reply(NBDClient *client,
-                                    uint64_t handle,
+static int nbd_co_send_simple_reply(NBDClient *client, NBDRequest *request,
                                     uint32_t error,
                                     void *data,
                                     size_t len,
@@ -1914,16 +1913,16 @@ static int nbd_co_send_simple_reply(NBDClient *client,
         {.iov_base = data, .iov_len = len}
     };

-    trace_nbd_co_send_simple_reply(handle, nbd_err, nbd_err_lookup(nbd_err),
-                                   len);
-    set_be_simple_reply(client, &iov[0], nbd_err, handle);
+    trace_nbd_co_send_simple_reply(request->handle, nbd_err,
+                                   nbd_err_lookup(nbd_err), len);
+    set_be_simple_reply(client, &iov[0], nbd_err, request);

     return nbd_co_send_iov(client, iov, len ? 2 : 1, errp);
 }

 static inline void set_be_chunk(NBDClient *client, struct iovec *iov,
                                 uint16_t flags, uint16_t type,
-                                uint64_t handle, uint32_t length)
+                                NBDRequest *request, uint32_t length)
 {
     NBDStructuredReplyChunk *chunk = iov->iov_base;

@@ -1931,12 +1930,12 @@ static inline void set_be_chunk(NBDClient *client, struct iovec *iov,
     stl_be_p(&chunk->magic, NBD_STRUCTURED_REPLY_MAGIC);
     stw_be_p(&chunk->flags, flags);
     stw_be_p(&chunk->type, type);
-    stq_be_p(&chunk->handle, handle);
+    stq_be_p(&chunk->handle, request->handle);
     stl_be_p(&chunk->length, length);
 }

 static int coroutine_fn nbd_co_send_structured_done(NBDClient *client,
-                                                    uint64_t handle,
+                                                    NBDRequest *request,
                                                     Error **errp)
 {
     NBDReply hdr;
@@ -1944,15 +1943,15 @@ static int coroutine_fn nbd_co_send_structured_done(NBDClient *client,
         {.iov_base = &hdr},
     };

-    trace_nbd_co_send_structured_done(handle);
+    trace_nbd_co_send_structured_done(request->handle);
     set_be_chunk(client, &iov[0], NBD_REPLY_FLAG_DONE,
-                 NBD_REPLY_TYPE_NONE, handle, 0);
+                 NBD_REPLY_TYPE_NONE, request, 0);

     return nbd_co_send_iov(client, iov, 1, errp);
 }

 static int coroutine_fn nbd_co_send_structured_read(NBDClient *client,
-                                                    uint64_t handle,
+                                                    NBDRequest *request,
                                                     uint64_t offset,
                                                     void *data,
                                                     size_t size,
@@ -1968,16 +1967,16 @@ static int coroutine_fn nbd_co_send_structured_read(NBDClient *client,
     };

     assert(size);
-    trace_nbd_co_send_structured_read(handle, offset, data, size);
+    trace_nbd_co_send_structured_read(request->handle, offset, data, size);
     set_be_chunk(client, &iov[0], final ? NBD_REPLY_FLAG_DONE : 0,
-                 NBD_REPLY_TYPE_OFFSET_DATA, handle, iov[1].iov_len + size);
+                 NBD_REPLY_TYPE_OFFSET_DATA, request, iov[1].iov_len + size);
     stq_be_p(&chunk.offset, offset);

     return nbd_co_send_iov(client, iov, 3, errp);
 }

 static int coroutine_fn nbd_co_send_structured_error(NBDClient *client,
-                                                     uint64_t handle,
+                                                     NBDRequest *request,
                                                      uint32_t error,
                                                      const char *msg,
                                                      Error **errp)
@@ -1992,10 +1991,11 @@ static int coroutine_fn nbd_co_send_structured_error(NBDClient *client,
     };

     assert(nbd_err);
-    trace_nbd_co_send_structured_error(handle, nbd_err,
+    trace_nbd_co_send_structured_error(request->handle, nbd_err,
                                        nbd_err_lookup(nbd_err), msg ? msg : "");
     set_be_chunk(client, &iov[0], NBD_REPLY_FLAG_DONE,
-                 NBD_REPLY_TYPE_ERROR, handle, iov[1].iov_len + iov[2].iov_len);
+                 NBD_REPLY_TYPE_ERROR, request,
+                 iov[1].iov_len + iov[2].iov_len);
     stl_be_p(&chunk.error, nbd_err);
     stw_be_p(&chunk.message_length, iov[2].iov_len);

@@ -2007,7 +2007,7 @@ static int coroutine_fn nbd_co_send_structured_error(NBDClient *client,
  * reported to the client, at which point this function succeeds.
  */
 static int coroutine_fn nbd_co_send_sparse_read(NBDClient *client,
-                                                uint64_t handle,
+                                                NBDRequest *request,
                                                 uint64_t offset,
                                                 uint8_t *data,
                                                 size_t size,
@@ -2029,7 +2029,7 @@ static int coroutine_fn nbd_co_send_sparse_read(NBDClient *client,
             char *msg = g_strdup_printf("unable to check for holes: %s",
                                         strerror(-status));

-            ret = nbd_co_send_structured_error(client, handle, -status, msg,
+            ret = nbd_co_send_structured_error(client, request, -status, msg,
                                                errp);
             g_free(msg);
             return ret;
@@ -2044,12 +2044,12 @@ static int coroutine_fn nbd_co_send_sparse_read(NBDClient *client,
                 {.iov_base = &chunk, .iov_len = sizeof(chunk)},
             };

-            trace_nbd_co_send_structured_read_hole(handle, offset + progress,
+            trace_nbd_co_send_structured_read_hole(request->handle,
+                                                   offset + progress,
                                                    pnum);
             set_be_chunk(client, &iov[0],
                          final ? NBD_REPLY_FLAG_DONE : 0,
-                         NBD_REPLY_TYPE_OFFSET_HOLE,
-                         handle, iov[1].iov_len);
+                         NBD_REPLY_TYPE_OFFSET_HOLE, request, iov[1].iov_len);
             stq_be_p(&chunk.offset, offset + progress);
             stl_be_p(&chunk.length, pnum);
             ret = nbd_co_send_iov(client, iov, 2, errp);
@@ -2060,7 +2060,8 @@ static int coroutine_fn nbd_co_send_sparse_read(NBDClient *client,
                 error_setg_errno(errp, -ret, "reading from file failed");
                 break;
             }
-            ret = nbd_co_send_structured_read(client, handle, offset + progress,
+            ret = nbd_co_send_structured_read(client, request,
+                                              offset + progress,
                                               data + progress, pnum, final,
                                               errp);
         }
@@ -2212,7 +2213,7 @@ static int blockalloc_to_extents(BlockDriverState *bs, uint64_t offset,
  * @ea is converted to BE by the function
  * @last controls whether NBD_REPLY_FLAG_DONE is sent.
  */
-static int nbd_co_send_extents(NBDClient *client, uint64_t handle,
+static int nbd_co_send_extents(NBDClient *client, NBDRequest *request,
                                NBDExtentArray *ea,
                                bool last, uint32_t context_id, Error **errp)
 {
@@ -2226,18 +2227,18 @@ static int nbd_co_send_extents(NBDClient *client, uint64_t handle,

     nbd_extent_array_convert_to_be(ea);

-    trace_nbd_co_send_extents(handle, ea->count, context_id, ea->total_length,
-                              last);
+    trace_nbd_co_send_extents(request->handle, ea->count, context_id,
+                              ea->total_length, last);
     set_be_chunk(client, &iov[0], last ? NBD_REPLY_FLAG_DONE : 0,
                  NBD_REPLY_TYPE_BLOCK_STATUS,
-                 handle, iov[1].iov_len + iov[2].iov_len);
+                 request, iov[1].iov_len + iov[2].iov_len);
     stl_be_p(&chunk.context_id, context_id);

     return nbd_co_send_iov(client, iov, 3, errp);
 }

 /* Get block status from the exported device and send it to the client */
-static int nbd_co_send_block_status(NBDClient *client, uint64_t handle,
+static int nbd_co_send_block_status(NBDClient *client, NBDRequest *request,
                                     BlockDriverState *bs, uint64_t offset,
                                     uint32_t length, bool dont_fragment,
                                     bool last, uint32_t context_id,
@@ -2254,10 +2255,10 @@ static int nbd_co_send_block_status(NBDClient *client, uint64_t handle,
     }
     if (ret < 0) {
         return nbd_co_send_structured_error(
-                client, handle, -ret, "can't get block status", errp);
+                client, request, -ret, "can't get block status", errp);
     }

-    return nbd_co_send_extents(client, handle, ea, last, context_id, errp);
+    return nbd_co_send_extents(client, request, ea, last, context_id, errp);
 }

 /* Populate @ea from a dirty bitmap. */
@@ -2292,7 +2293,7 @@ static void bitmap_to_extents(BdrvDirtyBitmap *bitmap,
     bdrv_dirty_bitmap_unlock(bitmap);
 }

-static int nbd_co_send_bitmap(NBDClient *client, uint64_t handle,
+static int nbd_co_send_bitmap(NBDClient *client, NBDRequest *request,
                               BdrvDirtyBitmap *bitmap, uint64_t offset,
                               uint32_t length, bool dont_fragment, bool last,
                               uint32_t context_id, Error **errp)
@@ -2302,7 +2303,7 @@ static int nbd_co_send_bitmap(NBDClient *client, uint64_t handle,

     bitmap_to_extents(bitmap, offset, length, ea);

-    return nbd_co_send_extents(client, handle, ea, last, context_id, errp);
+    return nbd_co_send_extents(client, request, ea, last, context_id, errp);
 }

 /* nbd_co_receive_request
@@ -2440,16 +2441,16 @@ static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request,
  * Returns 0 if connection is still live, -errno on failure to talk to client
  */
 static coroutine_fn int nbd_send_generic_reply(NBDClient *client,
-                                               uint64_t handle,
+                                               NBDRequest *request,
                                                int ret,
                                                const char *error_msg,
                                                Error **errp)
 {
     if (client->structured_reply && ret < 0) {
-        return nbd_co_send_structured_error(client, handle, -ret, error_msg,
+        return nbd_co_send_structured_error(client, request, -ret, error_msg,
                                             errp);
     } else {
-        return nbd_co_send_simple_reply(client, handle, ret < 0 ? -ret : 0,
+        return nbd_co_send_simple_reply(client, request, ret < 0 ? -ret : 0,
                                         NULL, 0, errp);
     }
 }
@@ -2470,7 +2471,7 @@ static coroutine_fn int nbd_do_cmd_read(NBDClient *client, NBDRequest *request,
     if (request->flags & NBD_CMD_FLAG_FUA) {
         ret = blk_co_flush(exp->common.blk);
         if (ret < 0) {
-            return nbd_send_generic_reply(client, request->handle, ret,
+            return nbd_send_generic_reply(client, request, ret,
                                           "flush failed", errp);
         }
     }
@@ -2478,26 +2479,26 @@ static coroutine_fn int nbd_do_cmd_read(NBDClient *client, NBDRequest *request,
     if (client->structured_reply && !(request->flags & NBD_CMD_FLAG_DF) &&
         request->len)
     {
-        return nbd_co_send_sparse_read(client, request->handle, request->from,
+        return nbd_co_send_sparse_read(client, request, request->from,
                                        data, request->len, errp);
     }

     ret = blk_pread(exp->common.blk, request->from, request->len, data, 0);
     if (ret < 0) {
-        return nbd_send_generic_reply(client, request->handle, ret,
+        return nbd_send_generic_reply(client, request, ret,
                                       "reading from file failed", errp);
     }

     if (client->structured_reply) {
         if (request->len) {
-            return nbd_co_send_structured_read(client, request->handle,
+            return nbd_co_send_structured_read(client, request,
                                                request->from, data,
                                                request->len, true, errp);
         } else {
-            return nbd_co_send_structured_done(client, request->handle, errp);
+            return nbd_co_send_structured_done(client, request, errp);
         }
     } else {
-        return nbd_co_send_simple_reply(client, request->handle, 0,
+        return nbd_co_send_simple_reply(client, request, 0,
                                         data, request->len, errp);
     }
 }
@@ -2521,7 +2522,7 @@ static coroutine_fn int nbd_do_cmd_cache(NBDClient *client, NBDRequest *request,
     ret = blk_co_preadv(exp->common.blk, request->from, request->len,
                         NULL, BDRV_REQ_COPY_ON_READ | BDRV_REQ_PREFETCH);

-    return nbd_send_generic_reply(client, request->handle, ret,
+    return nbd_send_generic_reply(client, request, ret,
                                   "caching data failed", errp);
 }

@@ -2553,7 +2554,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
         assert(request->len <= NBD_MAX_BUFFER_SIZE);
         ret = blk_pwrite(exp->common.blk, request->from, request->len, data,
                          flags);
-        return nbd_send_generic_reply(client, request->handle, ret,
+        return nbd_send_generic_reply(client, request, ret,
                                       "writing to file failed", errp);

     case NBD_CMD_WRITE_ZEROES:
@@ -2569,7 +2570,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
         }
         ret = blk_pwrite_zeroes(exp->common.blk, request->from, request->len,
                                 flags);
-        return nbd_send_generic_reply(client, request->handle, ret,
+        return nbd_send_generic_reply(client, request, ret,
                                       "writing to file failed", errp);

     case NBD_CMD_DISC:
@@ -2578,7 +2579,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,

     case NBD_CMD_FLUSH:
         ret = blk_co_flush(exp->common.blk);
-        return nbd_send_generic_reply(client, request->handle, ret,
+        return nbd_send_generic_reply(client, request, ret,
                                       "flush failed", errp);

     case NBD_CMD_TRIM:
@@ -2586,12 +2587,12 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
         if (ret >= 0 && request->flags & NBD_CMD_FLAG_FUA) {
             ret = blk_co_flush(exp->common.blk);
         }
-        return nbd_send_generic_reply(client, request->handle, ret,
+        return nbd_send_generic_reply(client, request, ret,
                                       "discard failed", errp);

     case NBD_CMD_BLOCK_STATUS:
         if (!request->len) {
-            return nbd_send_generic_reply(client, request->handle, -EINVAL,
+            return nbd_send_generic_reply(client, request, -EINVAL,
                                           "need non-zero length", errp);
         }
         assert(request->len <= UINT32_MAX);
@@ -2600,7 +2601,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
             int contexts_remaining = client->export_meta.count;

             if (client->export_meta.base_allocation) {
-                ret = nbd_co_send_block_status(client, request->handle,
+                ret = nbd_co_send_block_status(client, request,
                                                blk_bs(exp->common.blk),
                                                request->from,
                                                request->len, dont_fragment,
@@ -2613,7 +2614,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
             }

             if (client->export_meta.allocation_depth) {
-                ret = nbd_co_send_block_status(client, request->handle,
+                ret = nbd_co_send_block_status(client, request,
                                                blk_bs(exp->common.blk),
                                                request->from, request->len,
                                                dont_fragment,
@@ -2629,7 +2630,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
                 if (!client->export_meta.bitmaps[i]) {
                     continue;
                 }
-                ret = nbd_co_send_bitmap(client, request->handle,
+                ret = nbd_co_send_bitmap(client, request,
                                          client->exp->export_bitmaps[i],
                                          request->from, request->len,
                                          dont_fragment, !--contexts_remaining,
@@ -2643,7 +2644,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,

             return 0;
         } else {
-            return nbd_send_generic_reply(client, request->handle, -EINVAL,
+            return nbd_send_generic_reply(client, request, -EINVAL,
                                           "CMD_BLOCK_STATUS not negotiated",
                                           errp);
         }
@@ -2651,7 +2652,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
     default:
         msg = g_strdup_printf("invalid request type (%" PRIu32 ") received",
                               request->type);
-        ret = nbd_send_generic_reply(client, request->handle, -EINVAL, msg,
+        ret = nbd_send_generic_reply(client, request, -EINVAL, msg,
                                      errp);
         g_free(msg);
         return ret;
@@ -2712,7 +2713,7 @@ static coroutine_fn void nbd_trip(void *opaque)
         Error *export_err = local_err;

         local_err = NULL;
-        ret = nbd_send_generic_reply(client, request.handle, -EINVAL,
+        ret = nbd_send_generic_reply(client, &request, -EINVAL,
                                      error_get_pretty(export_err), &local_err);
         error_free(export_err);
     } else {
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v2 07/15] nbd/server: Initial support for extended headers
  2022-11-14 22:48 ` [PATCH v2 00/15] qemu patches for 64-bit NBD extensions Eric Blake
                     ` (5 preceding siblings ...)
  2022-11-14 22:48   ` [PATCH v2 06/15] nbd/server: Refactor to pass full request around Eric Blake
@ 2022-11-14 22:48   ` Eric Blake
  2022-11-14 22:48   ` [PATCH v2 08/15] nbd/server: Support 64-bit block status Eric Blake
                     ` (7 subsequent siblings)
  14 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, libguestfs, nbd, Vladimir Sementsov-Ogievskiy

Even though the NBD spec has been altered to allow us to accept
NBD_CMD_READ larger than the max payload size (provided our response
is a hole or broken up over more than one data chunk), we are not
planning to take advantage of that, and continue to cap NBD_CMD_READ
to 32M regardless of header size.

For NBD_CMD_WRITE_ZEROES and NBD_CMD_TRIM, the block layer already
supports 64-bit operations without any effort on our part.  For
NBD_CMD_BLOCK_STATUS, the client's length is a hint; the easiest
approach for now is to truncate our answer back to 32 bits, which lets
us delay the effort of implementing NBD_REPLY_TYPE_BLOCK_STATUS_EXT to
a separate patch.
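
A minimal sketch of that truncation, using stand-in names for qemu's
QEMU_ALIGN_DOWN, BDRV_REQUEST_MAX_BYTES, and the client's check_align
(the actual change is in nbd_handle_request() below):

  #include <stdint.h>

  #define ALIGN_DOWN(n, a)   ((n) / (a) * (a))
  #define REQUEST_MAX_BYTES  (1ULL << 30)   /* stand-in for BDRV_REQUEST_MAX_BYTES */

  /*
   * A 64-bit NBD_CMD_BLOCK_STATUS length is only a hint, so a server
   * that still produces 32-bit extents can clamp it to an aligned
   * 32-bit window before answering.
   */
  static uint64_t clamp_block_status_len(uint64_t len, uint32_t check_align)
  {
      if (len > UINT32_MAX) {
          len = ALIGN_DOWN(REQUEST_MAX_BYTES, check_align ? check_align : 1);
      }
      return len;
  }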

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 nbd/nbd-internal.h |   7 ++-
 nbd/server.c       | 132 +++++++++++++++++++++++++++++++++++----------
 2 files changed, 108 insertions(+), 31 deletions(-)

diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index 0016793ff4..f9fe0b6ce3 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -1,7 +1,7 @@
 /*
  * NBD Internal Declarations
  *
- * Copyright (C) 2016-2021 Red Hat, Inc.
+ * Copyright (C) 2016-2022 Red Hat, Inc.
  *
  * This work is licensed under the terms of the GNU GPL, version 2 or later.
  * See the COPYING file in the top-level directory.
@@ -35,8 +35,11 @@
  * https://github.com/yoe/nbd/blob/master/doc/proto.md
  */

-/* Size of all NBD_OPT_*, without payload */
+/* Size of all compact NBD_CMD_*, without payload */
 #define NBD_REQUEST_SIZE            (4 + 2 + 2 + 8 + 8 + 4)
+/* Size of all extended NBD_CMD_*, without payload */
+#define NBD_EXTENDED_REQUEST_SIZE   (4 + 2 + 2 + 8 + 8 + 8)
+
 /* Size of all NBD_REP_* sent in answer to most NBD_OPT_*, without payload */
 #define NBD_REPLY_SIZE              (4 + 4 + 8)
 /* Size of reply to NBD_OPT_EXPORT_NAME */
diff --git a/nbd/server.c b/nbd/server.c
index 4d1400430b..b46655b4d8 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -141,7 +141,7 @@ struct NBDClient {

     uint32_t check_align; /* If non-zero, check for aligned client requests */

-    bool structured_reply;
+    bool structured_reply; /* also set true if extended_headers is set */
     bool extended_headers;
     NBDExportMetaContexts export_meta;

@@ -1260,6 +1260,10 @@ static int nbd_negotiate_options(NBDClient *client, Error **errp)
             case NBD_OPT_STRUCTURED_REPLY:
                 if (length) {
                     ret = nbd_reject_length(client, false, errp);
+                } else if (client->extended_headers) {
+                    ret = nbd_negotiate_send_rep_err(
+                        client, NBD_REP_ERR_EXT_HEADER_REQD, errp,
+                        "extended headers already negotiated");
                 } else if (client->structured_reply) {
                     ret = nbd_negotiate_send_rep_err(
                         client, NBD_REP_ERR_INVALID, errp,
@@ -1276,6 +1280,19 @@ static int nbd_negotiate_options(NBDClient *client, Error **errp)
                                                  errp);
                 break;

+            case NBD_OPT_EXTENDED_HEADERS:
+                if (length) {
+                    ret = nbd_reject_length(client, false, errp);
+                } else if (client->extended_headers) {
+                    ret = nbd_negotiate_send_rep_err(
+                        client, NBD_REP_ERR_INVALID, errp,
+                        "extended headers already negotiated");
+                } else {
+                    ret = nbd_negotiate_send_rep(client, NBD_REP_ACK, errp);
+                    client->structured_reply = client->extended_headers = true;
+                }
+                break;
+
             default:
                 ret = nbd_opt_drop(client, NBD_REP_ERR_UNSUP, errp,
                                    "Unsupported option %" PRIu32 " (%s)",
@@ -1411,11 +1428,13 @@ nbd_read_eof(NBDClient *client, void *buffer, size_t size, Error **errp)
 static int nbd_receive_request(NBDClient *client, NBDRequest *request,
                                Error **errp)
 {
-    uint8_t buf[NBD_REQUEST_SIZE];
-    uint32_t magic;
+    uint8_t buf[NBD_EXTENDED_REQUEST_SIZE];
+    uint32_t magic, expect;
     int ret;
+    size_t size = client->extended_headers ? NBD_EXTENDED_REQUEST_SIZE
+        : NBD_REQUEST_SIZE;

-    ret = nbd_read_eof(client, buf, sizeof(buf), errp);
+    ret = nbd_read_eof(client, buf, size, errp);
     if (ret < 0) {
         return ret;
     }
@@ -1423,13 +1442,21 @@ static int nbd_receive_request(NBDClient *client, NBDRequest *request,
         return -EIO;
     }

-    /* Request
-       [ 0 ..  3]   magic   (NBD_REQUEST_MAGIC)
-       [ 4 ..  5]   flags   (NBD_CMD_FLAG_FUA, ...)
-       [ 6 ..  7]   type    (NBD_CMD_READ, ...)
-       [ 8 .. 15]   handle
-       [16 .. 23]   from
-       [24 .. 27]   len
+    /*
+     * Compact request
+     *  [ 0 ..  3]   magic   (NBD_REQUEST_MAGIC)
+     *  [ 4 ..  5]   flags   (NBD_CMD_FLAG_FUA, ...)
+     *  [ 6 ..  7]   type    (NBD_CMD_READ, ...)
+     *  [ 8 .. 15]   handle
+     *  [16 .. 23]   from
+     *  [24 .. 27]   len
+     * Extended request
+     *  [ 0 ..  3]   magic   (NBD_EXTENDED_REQUEST_MAGIC)
+     *  [ 4 ..  5]   flags   (NBD_CMD_FLAG_FUA, NBD_CMD_FLAG_PAYLOAD_LEN, ...)
+     *  [ 6 ..  7]   type    (NBD_CMD_READ, ...)
+     *  [ 8 .. 15]   handle
+     *  [16 .. 23]   from
+     *  [24 .. 31]   len
      */

     magic = ldl_be_p(buf);
@@ -1437,12 +1464,18 @@ static int nbd_receive_request(NBDClient *client, NBDRequest *request,
     request->type   = lduw_be_p(buf + 6);
     request->handle = ldq_be_p(buf + 8);
     request->from   = ldq_be_p(buf + 16);
-    request->len    = ldl_be_p(buf + 24); /* widen 32 to 64 bits */
+    if (client->extended_headers) {
+        request->len = ldq_be_p(buf + 24);
+        expect = NBD_EXTENDED_REQUEST_MAGIC;
+    } else {
+        request->len = ldl_be_p(buf + 24); /* widen 32 to 64 bits */
+        expect = NBD_REQUEST_MAGIC;
+    }

     trace_nbd_receive_request(magic, request->flags, request->type,
                               request->from, request->len);

-    if (magic != NBD_REQUEST_MAGIC) {
+    if (magic != expect) {
         error_setg(errp, "invalid magic (got 0x%" PRIx32 ")", magic);
         return -EINVAL;
     }
@@ -1890,14 +1923,37 @@ static int coroutine_fn nbd_co_send_iov(NBDClient *client, struct iovec *iov,
 }

 static inline void set_be_simple_reply(NBDClient *client, struct iovec *iov,
-                                       uint64_t error, NBDRequest *request)
+                                       uint32_t error, NBDStructuredError *err,
+                                       NBDRequest *request)
 {
-    NBDSimpleReply *reply = iov->iov_base;
+    if (client->extended_headers) {
+        NBDExtendedReplyChunk *chunk = iov->iov_base;

-    iov->iov_len = sizeof(*reply);
-    stl_be_p(&reply->magic, NBD_SIMPLE_REPLY_MAGIC);
-    stl_be_p(&reply->error, error);
-    stq_be_p(&reply->handle, request->handle);
+        iov->iov_len = sizeof(*chunk);
+        stl_be_p(&chunk->magic, NBD_EXTENDED_REPLY_MAGIC);
+        stw_be_p(&chunk->flags, NBD_REPLY_FLAG_DONE);
+        stq_be_p(&chunk->handle, request->handle);
+        stq_be_p(&chunk->offset, request->from);
+        if (error) {
+            assert(!iov[1].iov_base);
+            iov[1].iov_base = err;
+            iov[1].iov_len = sizeof(*err);
+            stw_be_p(&chunk->type, NBD_REPLY_TYPE_ERROR);
+            stq_be_p(&chunk->length, sizeof(*err));
+            stl_be_p(&err->error, error);
+            stw_be_p(&err->message_length, 0);
+        } else {
+            stw_be_p(&chunk->type, NBD_REPLY_TYPE_NONE);
+            stq_be_p(&chunk->length, 0);
+        }
+    } else {
+        NBDSimpleReply *reply = iov->iov_base;
+
+        iov->iov_len = sizeof(*reply);
+        stl_be_p(&reply->magic, NBD_SIMPLE_REPLY_MAGIC);
+        stl_be_p(&reply->error, error);
+        stq_be_p(&reply->handle, request->handle);
+    }
 }

 static int nbd_co_send_simple_reply(NBDClient *client, NBDRequest *request,
@@ -1908,30 +1964,44 @@ static int nbd_co_send_simple_reply(NBDClient *client, NBDRequest *request,
 {
     NBDReply hdr;
     int nbd_err = system_errno_to_nbd_errno(error);
+    NBDStructuredError err;
     struct iovec iov[] = {
         {.iov_base = &hdr},
         {.iov_base = data, .iov_len = len}
     };

+    assert(!len || !nbd_err);
     trace_nbd_co_send_simple_reply(request->handle, nbd_err,
                                    nbd_err_lookup(nbd_err), len);
-    set_be_simple_reply(client, &iov[0], nbd_err, request);
+    set_be_simple_reply(client, &iov[0], nbd_err, &err, request);

-    return nbd_co_send_iov(client, iov, len ? 2 : 1, errp);
+    return nbd_co_send_iov(client, iov, iov[1].iov_len ? 2 : 1, errp);
 }

 static inline void set_be_chunk(NBDClient *client, struct iovec *iov,
                                 uint16_t flags, uint16_t type,
                                 NBDRequest *request, uint32_t length)
 {
-    NBDStructuredReplyChunk *chunk = iov->iov_base;
+    if (client->extended_headers) {
+        NBDExtendedReplyChunk *chunk = iov->iov_base;

-    iov->iov_len = sizeof(*chunk);
-    stl_be_p(&chunk->magic, NBD_STRUCTURED_REPLY_MAGIC);
-    stw_be_p(&chunk->flags, flags);
-    stw_be_p(&chunk->type, type);
-    stq_be_p(&chunk->handle, request->handle);
-    stl_be_p(&chunk->length, length);
+        iov->iov_len = sizeof(*chunk);
+        stl_be_p(&chunk->magic, NBD_EXTENDED_REPLY_MAGIC);
+        stw_be_p(&chunk->flags, flags);
+        stw_be_p(&chunk->type, type);
+        stq_be_p(&chunk->handle, request->handle);
+        stq_be_p(&chunk->offset, request->from);
+        stq_be_p(&chunk->length, length);
+    } else {
+        NBDStructuredReplyChunk *chunk = iov->iov_base;
+
+        iov->iov_len = sizeof(*chunk);
+        stl_be_p(&chunk->magic, NBD_STRUCTURED_REPLY_MAGIC);
+        stw_be_p(&chunk->flags, flags);
+        stw_be_p(&chunk->type, type);
+        stq_be_p(&chunk->handle, request->handle);
+        stl_be_p(&chunk->length, length);
+    }
 }

 static int coroutine_fn nbd_co_send_structured_done(NBDClient *client,
@@ -2595,7 +2665,11 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
             return nbd_send_generic_reply(client, request, -EINVAL,
                                           "need non-zero length", errp);
         }
-        assert(request->len <= UINT32_MAX);
+        if (request->len > UINT32_MAX) {
+            /* For now, truncate our response to a 32 bit window */
+            request->len = QEMU_ALIGN_DOWN(BDRV_REQUEST_MAX_BYTES,
+                                           client->check_align ?: 1);
+        }
         if (client->export_meta.count) {
             bool dont_fragment = request->flags & NBD_CMD_FLAG_REQ_ONE;
             int contexts_remaining = client->export_meta.count;
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v2 08/15] nbd/server: Support 64-bit block status
  2022-11-14 22:48 ` [PATCH v2 00/15] qemu patches for 64-bit NBD extensions Eric Blake
                     ` (6 preceding siblings ...)
  2022-11-14 22:48   ` [PATCH v2 07/15] nbd/server: Initial support for extended headers Eric Blake
@ 2022-11-14 22:48   ` Eric Blake
  2022-11-14 22:48   ` [PATCH v2 09/15] nbd/client: Initial support for extended headers Eric Blake
                     ` (6 subsequent siblings)
  14 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, libguestfs, nbd, Vladimir Sementsov-Ogievskiy

The previous patch handled extended headers by truncating large block
status requests from the client back to 32 bits.  But this is not
ideal; for cases where we can truly determine the status of the entire
image quickly (for example, when reporting the entire image as
non-sparse because we lack the ability to probe for holes), this
causes more network traffic for the client to iterate through 4G
chunks than for us to just report the entire image at once.  For ease
of implementation, if extended headers have been negotiated, we always
send 64-bit block status chunks, even when the result could
have fit in the older 32-bit block status chunk (clients supporting
extended headers have to be prepared for either chunk type, so
temporarily reverting this patch proves whether a client is
compliant).

For now, none of the meta contexts that we know how to export populate
more than 32 bits of information, so we don't have to worry about
NBD_REP_ERR_EXT_HEADER_REQD or filtering during handshake, and we
always send all zeroes for the upper 32 bits of status during
NBD_CMD_BLOCK_STATUS.

Note that we previously had some interesting size-juggling on call
chains, such as:

nbd_co_send_block_status(uint32_t length)
-> blockstatus_to_extents(uint32_t bytes)
  -> bdrv_block_status_above(bytes, &uint64_t num)
  -> nbd_extent_array_add(uint64_t num)
    -> store num in 32-bit length

But we were lucky that it never overflowed: bdrv_block_status_above
never sets num larger than bytes, and we had previously been capping
'bytes' at 32 bits (either by the protocol, or in the previous patch
with an explicit truncation).  This patch adds some assertions that
ensure we continue to avoid overflowing 32 bits for a narrow client,
while fully utilizing 64 bits all the way through when the client
understands that.
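
As a rough sketch (not the exact qemu code; the helper name is made up
for illustration), the overflow guard when coalescing adjacent extents
amounts to something like this:

    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    static bool extent_can_merge(uint64_t prev_len, uint64_t add_len,
                                 bool extended)
    {
        uint64_t sum = prev_len + add_len;

        assert(sum >= add_len);                /* no uint64_t wraparound */
        return extended || sum <= UINT32_MAX;  /* narrow clients stay 32-bit */
    }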

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 nbd/server.c | 86 +++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 59 insertions(+), 27 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index b46655b4d8..f21f8098c1 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2145,20 +2145,30 @@ static int coroutine_fn nbd_co_send_sparse_read(NBDClient *client,
 }

 typedef struct NBDExtentArray {
-    NBDExtent *extents;
+    union {
+        NBDStructuredMeta id;
+        NBDStructuredMetaExt meta;
+    };
+    union {
+        NBDExtent *narrow;
+        NBDExtentExt *extents;
+    };
     unsigned int nb_alloc;
     unsigned int count;
     uint64_t total_length;
+    bool extended; /* Whether 64-bit extents are allowed */
     bool can_add;
     bool converted_to_be;
 } NBDExtentArray;

-static NBDExtentArray *nbd_extent_array_new(unsigned int nb_alloc)
+static NBDExtentArray *nbd_extent_array_new(unsigned int nb_alloc,
+                                            bool extended)
 {
     NBDExtentArray *ea = g_new0(NBDExtentArray, 1);

     ea->nb_alloc = nb_alloc;
-    ea->extents = g_new(NBDExtent, nb_alloc);
+    ea->extents = g_new(NBDExtentExt, nb_alloc);
+    ea->extended = extended;
     ea->can_add = true;

     return ea;
@@ -2172,17 +2182,37 @@ static void nbd_extent_array_free(NBDExtentArray *ea)
 G_DEFINE_AUTOPTR_CLEANUP_FUNC(NBDExtentArray, nbd_extent_array_free)

 /* Further modifications of the array after conversion are abandoned */
-static void nbd_extent_array_convert_to_be(NBDExtentArray *ea)
+static void nbd_extent_array_convert_to_be(NBDExtentArray *ea,
+                                           uint32_t context_id,
+                                           struct iovec *iov)
 {
     int i;

     assert(!ea->converted_to_be);
+    assert(iov[0].iov_base == &ea->meta);
+    assert(iov[1].iov_base == ea->extents);
     ea->can_add = false;
     ea->converted_to_be = true;

-    for (i = 0; i < ea->count; i++) {
-        ea->extents[i].flags = cpu_to_be32(ea->extents[i].flags);
-        ea->extents[i].length = cpu_to_be32(ea->extents[i].length);
+    stl_be_p(&ea->meta.context_id, context_id);
+    if (ea->extended) {
+        stl_be_p(&ea->meta.count, ea->count);
+        for (i = 0; i < ea->count; i++) {
+            ea->extents[i].length = cpu_to_be64(ea->extents[i].length);
+            ea->extents[i].flags = cpu_to_be64(ea->extents[i].flags);
+        }
+        iov[0].iov_len = sizeof(ea->meta);
+        iov[1].iov_len = ea->count * sizeof(ea->extents[0]);
+    } else {
+        /* Conversion reduces memory usage, order of iteration matters */
+        for (i = 0; i < ea->count; i++) {
+            assert(ea->extents[i].length <= UINT32_MAX);
+            assert((uint32_t) ea->extents[i].flags == ea->extents[i].flags);
+            ea->narrow[i].length = cpu_to_be32(ea->extents[i].length);
+            ea->narrow[i].flags = cpu_to_be32(ea->extents[i].flags);
+        }
+        iov[0].iov_len = sizeof(ea->id);
+        iov[1].iov_len = ea->count * sizeof(ea->narrow[0]);
     }
 }

@@ -2196,19 +2226,23 @@ static void nbd_extent_array_convert_to_be(NBDExtentArray *ea)
  * would result in an incorrect range reported to the client)
  */
 static int nbd_extent_array_add(NBDExtentArray *ea,
-                                uint32_t length, uint32_t flags)
+                                uint64_t length, uint32_t flags)
 {
     assert(ea->can_add);

     if (!length) {
         return 0;
     }
+    if (!ea->extended) {
+        assert(length <= UINT32_MAX);
+    }

     /* Extend previous extent if flags are the same */
     if (ea->count > 0 && flags == ea->extents[ea->count - 1].flags) {
-        uint64_t sum = (uint64_t)length + ea->extents[ea->count - 1].length;
+        uint64_t sum = length + ea->extents[ea->count - 1].length;

-        if (sum <= UINT32_MAX) {
+        assert(sum >= length);
+        if (sum <= UINT32_MAX || ea->extended) {
             ea->extents[ea->count - 1].length = sum;
             ea->total_length += length;
             return 0;
@@ -2221,7 +2255,7 @@ static int nbd_extent_array_add(NBDExtentArray *ea,
     }

     ea->total_length += length;
-    ea->extents[ea->count] = (NBDExtent) {.length = length, .flags = flags};
+    ea->extents[ea->count] = (NBDExtentExt) {.length = length, .flags = flags};
     ea->count++;

     return 0;
@@ -2288,21 +2322,20 @@ static int nbd_co_send_extents(NBDClient *client, NBDRequest *request,
                                bool last, uint32_t context_id, Error **errp)
 {
     NBDReply hdr;
-    NBDStructuredMeta chunk;
     struct iovec iov[] = {
         {.iov_base = &hdr},
-        {.iov_base = &chunk, .iov_len = sizeof(chunk)},
-        {.iov_base = ea->extents, .iov_len = ea->count * sizeof(ea->extents[0])}
+        {.iov_base = &ea->meta},
+        {.iov_base = ea->extents}
     };

-    nbd_extent_array_convert_to_be(ea);
+    nbd_extent_array_convert_to_be(ea, context_id, &iov[1]);

     trace_nbd_co_send_extents(request->handle, ea->count, context_id,
                               ea->total_length, last);
     set_be_chunk(client, &iov[0], last ? NBD_REPLY_FLAG_DONE : 0,
-                 NBD_REPLY_TYPE_BLOCK_STATUS,
+                 client->extended_headers ? NBD_REPLY_TYPE_BLOCK_STATUS_EXT
+                 : NBD_REPLY_TYPE_BLOCK_STATUS,
                  request, iov[1].iov_len + iov[2].iov_len);
-    stl_be_p(&chunk.context_id, context_id);

     return nbd_co_send_iov(client, iov, 3, errp);
 }
@@ -2310,13 +2343,14 @@ static int nbd_co_send_extents(NBDClient *client, NBDRequest *request,
 /* Get block status from the exported device and send it to the client */
 static int nbd_co_send_block_status(NBDClient *client, NBDRequest *request,
                                     BlockDriverState *bs, uint64_t offset,
-                                    uint32_t length, bool dont_fragment,
+                                    uint64_t length, bool dont_fragment,
                                     bool last, uint32_t context_id,
                                     Error **errp)
 {
     int ret;
     unsigned int nb_extents = dont_fragment ? 1 : NBD_MAX_BLOCK_STATUS_EXTENTS;
-    g_autoptr(NBDExtentArray) ea = nbd_extent_array_new(nb_extents);
+    g_autoptr(NBDExtentArray) ea =
+        nbd_extent_array_new(nb_extents, client->extended_headers);

     if (context_id == NBD_META_ID_BASE_ALLOCATION) {
         ret = blockstatus_to_extents(bs, offset, length, ea);
@@ -2343,7 +2377,8 @@ static void bitmap_to_extents(BdrvDirtyBitmap *bitmap,
     bdrv_dirty_bitmap_lock(bitmap);

     for (start = offset;
-         bdrv_dirty_bitmap_next_dirty_area(bitmap, start, end, INT32_MAX,
+         bdrv_dirty_bitmap_next_dirty_area(bitmap, start, end,
+                                           es->extended ? INT64_MAX : INT32_MAX,
                                            &dirty_start, &dirty_count);
          start = dirty_start + dirty_count)
     {
@@ -2365,11 +2400,12 @@ static void bitmap_to_extents(BdrvDirtyBitmap *bitmap,

 static int nbd_co_send_bitmap(NBDClient *client, NBDRequest *request,
                               BdrvDirtyBitmap *bitmap, uint64_t offset,
-                              uint32_t length, bool dont_fragment, bool last,
+                              uint64_t length, bool dont_fragment, bool last,
                               uint32_t context_id, Error **errp)
 {
     unsigned int nb_extents = dont_fragment ? 1 : NBD_MAX_BLOCK_STATUS_EXTENTS;
-    g_autoptr(NBDExtentArray) ea = nbd_extent_array_new(nb_extents);
+    g_autoptr(NBDExtentArray) ea =
+        nbd_extent_array_new(nb_extents, client->extended_headers);

     bitmap_to_extents(bitmap, offset, length, ea);

@@ -2665,11 +2701,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
             return nbd_send_generic_reply(client, request, -EINVAL,
                                           "need non-zero length", errp);
         }
-        if (request->len > UINT32_MAX) {
-            /* For now, truncate our response to a 32 bit window */
-            request->len = QEMU_ALIGN_DOWN(BDRV_REQUEST_MAX_BYTES,
-                                           client->check_align ?: 1);
-        }
+        assert(client->extended_headers || request->len <= UINT32_MAX);
         if (client->export_meta.count) {
             bool dont_fragment = request->flags & NBD_CMD_FLAG_REQ_ONE;
             int contexts_remaining = client->export_meta.count;
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v2 09/15] nbd/client: Initial support for extended headers
  2022-11-14 22:48 ` [PATCH v2 00/15] qemu patches for 64-bit NBD extensions Eric Blake
                     ` (7 preceding siblings ...)
  2022-11-14 22:48   ` [PATCH v2 08/15] nbd/server: Support 64-bit block status Eric Blake
@ 2022-11-14 22:48   ` Eric Blake
  2022-11-14 22:48   ` [PATCH v2 10/15] nbd/client: Accept 64-bit block status chunks Eric Blake
                     ` (5 subsequent siblings)
  14 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, libguestfs, nbd, Vladimir Sementsov-Ogievskiy,
	Kevin Wolf, Hanna Reitz

Update the client code to be able to send an extended request, and
parse an extended header from the server.  Note that since we reject
any structured reply with a too-large payload, we can always normalize
a valid header back into the compact form, so that the caller need not
deal with two branches of a union.  Still, until a later patch lets
the client negotiate extended headers, the code added here should not
be reached.  Note that because of the different magic numbers, it is
just as easy to trace and then tolerate a non-compliant server sending
the wrong header reply as it would be to insist that the server is
compliant.
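
As a rough sketch (illustrative struct and function names, not the
exact qemu types), folding a validated extended header back into the
compact form looks like this:

    #include <assert.h>
    #include <stdint.h>

    struct compact_chunk  { uint16_t flags, type; uint64_t handle;
                            uint32_t length; };
    struct extended_chunk { uint16_t flags, type; uint64_t handle;
                            uint64_t offset, length; };

    static void normalize_chunk(struct compact_chunk *out,
                                const struct extended_chunk *in)
    {
        assert(in->length <= UINT32_MAX); /* capped by the payload check */
        out->flags  = in->flags;
        out->type   = in->type;
        out->handle = in->handle;
        out->length = (uint32_t)in->length; /* offset is ignored for now */
    }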

The only caller to nbd_receive_reply() always passed NULL for errp;
since we are changing the signature anyway, I decided to sink the
decision to ignore errors one layer lower.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 include/block/nbd.h |  2 +-
 block/nbd.c         |  3 +-
 nbd/client.c        | 84 ++++++++++++++++++++++++++++++---------------
 nbd/trace-events    |  1 +
 4 files changed, 61 insertions(+), 29 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 357121ce76..02e31b2261 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -363,7 +363,7 @@ int nbd_init(int fd, QIOChannelSocket *sioc, NBDExportInfo *info,
              Error **errp);
 int nbd_send_request(QIOChannel *ioc, NBDRequest *request, bool ext_hdr);
 int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc,
-                                   NBDReply *reply, Error **errp);
+                                   NBDReply *reply, bool ext_hdrs);
 int nbd_client(int fd);
 int nbd_disconnect(int fd);
 int nbd_errno_to_system_errno(int err);
diff --git a/block/nbd.c b/block/nbd.c
index 32681d2867..a8b1bc1054 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -457,7 +457,8 @@ static coroutine_fn int nbd_receive_replies(BDRVNBDState *s, uint64_t handle)

         /* We are under mutex and handle is 0. We have to do the dirty work. */
         assert(s->reply.handle == 0);
-        ret = nbd_receive_reply(s->bs, s->ioc, &s->reply, NULL);
+        ret = nbd_receive_reply(s->bs, s->ioc, &s->reply,
+                                s->info.extended_headers);
         if (ret <= 0) {
             ret = ret ? ret : -EIO;
             nbd_channel_error(s, ret);
diff --git a/nbd/client.c b/nbd/client.c
index 2480a48ec6..70f06ce637 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -1348,22 +1348,28 @@ int nbd_disconnect(int fd)

 int nbd_send_request(QIOChannel *ioc, NBDRequest *request, bool ext_hdr)
 {
-    uint8_t buf[NBD_REQUEST_SIZE];
+    uint8_t buf[NBD_EXTENDED_REQUEST_SIZE];
+    size_t len;

-    assert(!ext_hdr);
-    assert(request->len <= UINT32_MAX);
     trace_nbd_send_request(request->from, request->len, request->handle,
                            request->flags, request->type,
                            nbd_cmd_lookup(request->type));

-    stl_be_p(buf, NBD_REQUEST_MAGIC);
+    stl_be_p(buf, ext_hdr ? NBD_EXTENDED_REQUEST_MAGIC : NBD_REQUEST_MAGIC);
     stw_be_p(buf + 4, request->flags);
     stw_be_p(buf + 6, request->type);
     stq_be_p(buf + 8, request->handle);
     stq_be_p(buf + 16, request->from);
-    stl_be_p(buf + 24, request->len);
+    if (ext_hdr) {
+        stq_be_p(buf + 24, request->len);
+        len = NBD_EXTENDED_REQUEST_SIZE;
+    } else {
+        assert(request->len <= UINT32_MAX);
+        stl_be_p(buf + 24, request->len);
+        len = NBD_REQUEST_SIZE;
+    }

-    return nbd_write(ioc, buf, sizeof(buf), NULL);
+    return nbd_write(ioc, buf, len, NULL);
 }

 /* nbd_receive_simple_reply
@@ -1392,28 +1398,34 @@ static int nbd_receive_simple_reply(QIOChannel *ioc, NBDSimpleReply *reply,

 /* nbd_receive_structured_reply_chunk
  * Read structured reply chunk except magic field (which should be already
- * read).
+ * read).  Normalize into the compact form.
  * Payload is not read.
  */
-static int nbd_receive_structured_reply_chunk(QIOChannel *ioc,
-                                              NBDStructuredReplyChunk *chunk,
+static int nbd_receive_structured_reply_chunk(QIOChannel *ioc, NBDReply *chunk,
                                               Error **errp)
 {
     int ret;
+    size_t len;
+    uint64_t payload_len;

-    assert(chunk->magic == NBD_STRUCTURED_REPLY_MAGIC);
+    if (chunk->magic == NBD_STRUCTURED_REPLY_MAGIC) {
+        len = sizeof(chunk->structured);
+    } else {
+        assert(chunk->magic == NBD_EXTENDED_REPLY_MAGIC);
+        len = sizeof(chunk->extended);
+    }

     ret = nbd_read(ioc, (uint8_t *)chunk + sizeof(chunk->magic),
-                   sizeof(*chunk) - sizeof(chunk->magic), "structured chunk",
+                   len - sizeof(chunk->magic), "structured chunk",
                    errp);
     if (ret < 0) {
         return ret;
     }

-    chunk->flags = be16_to_cpu(chunk->flags);
-    chunk->type = be16_to_cpu(chunk->type);
-    chunk->handle = be64_to_cpu(chunk->handle);
-    chunk->length = be32_to_cpu(chunk->length);
+    /* flags, type, and handle occupy same space between forms */
+    chunk->structured.flags = be16_to_cpu(chunk->structured.flags);
+    chunk->structured.type = be16_to_cpu(chunk->structured.type);
+    chunk->structured.handle = be64_to_cpu(chunk->structured.handle);

     /*
      * Because we use BLOCK_STATUS with REQ_ONE, and cap READ requests
@@ -1421,11 +1433,20 @@ static int nbd_receive_structured_reply_chunk(QIOChannel *ioc,
      * this.  Even if we stopped using REQ_ONE, sane servers will cap
      * the number of extents they return for block status.
      */
-    if (chunk->length > NBD_MAX_BUFFER_SIZE + sizeof(NBDStructuredReadData)) {
+    if (chunk->magic == NBD_STRUCTURED_REPLY_MAGIC) {
+        payload_len = be32_to_cpu(chunk->structured.length);
+    } else {
+        /* For now, we are ignoring the extended header offset. */
+        payload_len = be64_to_cpu(chunk->extended.length);
+        chunk->magic = NBD_STRUCTURED_REPLY_MAGIC;
+    }
+    if (payload_len > NBD_MAX_BUFFER_SIZE + sizeof(NBDStructuredReadData)) {
         error_setg(errp, "server chunk %" PRIu32 " (%s) payload is too long",
-                   chunk->type, nbd_rep_lookup(chunk->type));
+                   chunk->structured.type,
+                   nbd_rep_lookup(chunk->structured.type));
         return -EINVAL;
     }
+    chunk->structured.length = payload_len;

     return 0;
 }
@@ -1472,30 +1493,35 @@ nbd_read_eof(BlockDriverState *bs, QIOChannel *ioc, void *buffer, size_t size,

 /* nbd_receive_reply
  *
- * Decreases bs->in_flight while waiting for a new reply. This yield is where
- * we wait indefinitely and the coroutine must be able to be safely reentered
- * for nbd_client_attach_aio_context().
+ * Wait for a new reply. If this yields, the coroutine must be able to be
+ * safely reentered for nbd_client_attach_aio_context().  @ext_hdrs determines
+ * which reply magic we are expecting, although this normalizes the result
+ * so that the caller only has to work with compact headers.
  *
  * Returns 1 on success
- *         0 on eof, when no data was read (errp is not set)
- *         negative errno on failure (errp is set)
+ *         0 on eof, when no data was read
+ *         negative errno on failure
  */
 int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc,
-                                   NBDReply *reply, Error **errp)
+                                   NBDReply *reply, bool ext_hdrs)
 {
     int ret;
     const char *type;

-    ret = nbd_read_eof(bs, ioc, &reply->magic, sizeof(reply->magic), errp);
+    ret = nbd_read_eof(bs, ioc, &reply->magic, sizeof(reply->magic), NULL);
     if (ret <= 0) {
         return ret;
     }

     reply->magic = be32_to_cpu(reply->magic);

+    /* Diagnose but accept wrong-width header */
     switch (reply->magic) {
     case NBD_SIMPLE_REPLY_MAGIC:
-        ret = nbd_receive_simple_reply(ioc, &reply->simple, errp);
+        if (ext_hdrs) {
+            trace_nbd_receive_wrong_header(reply->magic);
+        }
+        ret = nbd_receive_simple_reply(ioc, &reply->simple, NULL);
         if (ret < 0) {
             break;
         }
@@ -1504,7 +1530,11 @@ int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc,
                                        reply->handle);
         break;
     case NBD_STRUCTURED_REPLY_MAGIC:
-        ret = nbd_receive_structured_reply_chunk(ioc, &reply->structured, errp);
+    case NBD_EXTENDED_REPLY_MAGIC:
+        if (ext_hdrs != (reply->magic == NBD_EXTENDED_REPLY_MAGIC)) {
+            trace_nbd_receive_wrong_header(reply->magic);
+        }
+        ret = nbd_receive_structured_reply_chunk(ioc, reply, NULL);
         if (ret < 0) {
             break;
         }
@@ -1515,7 +1545,7 @@ int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc,
                                                  reply->structured.length);
         break;
     default:
-        error_setg(errp, "invalid magic (got 0x%" PRIx32 ")", reply->magic);
+        trace_nbd_receive_wrong_header(reply->magic);
         return -EINVAL;
     }
     if (ret < 0) {
diff --git a/nbd/trace-events b/nbd/trace-events
index adf5666e20..c20df33a43 100644
--- a/nbd/trace-events
+++ b/nbd/trace-events
@@ -34,6 +34,7 @@ nbd_client_clear_socket(void) "Clearing NBD socket"
 nbd_send_request(uint64_t from, uint64_t len, uint64_t handle, uint16_t flags, uint16_t type, const char *name) "Sending request to server: { .from = %" PRIu64", .len = %" PRIu64 ", .handle = %" PRIu64 ", .flags = 0x%" PRIx16 ", .type = %" PRIu16 " (%s) }"
 nbd_receive_simple_reply(int32_t error, const char *errname, uint64_t handle) "Got simple reply: { .error = %" PRId32 " (%s), handle = %" PRIu64" }"
 nbd_receive_structured_reply_chunk(uint16_t flags, uint16_t type, const char *name, uint64_t handle, uint32_t length) "Got structured reply chunk: { flags = 0x%" PRIx16 ", type = %d (%s), handle = %" PRIu64 ", length = %" PRIu32 " }"
+nbd_receive_wrong_header(uint32_t magic) "Server sent unexpected magic 0x%" PRIx32

 # common.c
 nbd_unknown_error(int err) "Squashing unexpected error %d to EINVAL"
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v2 10/15] nbd/client: Accept 64-bit block status chunks
  2022-11-14 22:48 ` [PATCH v2 00/15] qemu patches for 64-bit NBD extensions Eric Blake
                     ` (8 preceding siblings ...)
  2022-11-14 22:48   ` [PATCH v2 09/15] nbd/client: Initial support for extended headers Eric Blake
@ 2022-11-14 22:48   ` Eric Blake
  2022-11-14 22:48   ` [PATCH v2 11/15] nbd/client: Request extended headers during negotiation Eric Blake
                     ` (4 subsequent siblings)
  14 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, libguestfs, nbd, Vladimir Sementsov-Ogievskiy,
	Kevin Wolf, Hanna Reitz

Because we use NBD_CMD_FLAG_REQ_ONE with NBD_CMD_BLOCK_STATUS, a
client in narrow mode should not be able to provoke a server into
sending a block status result larger than the client's 32-bit request.
But in extended mode, a 64-bit status request must be able to handle a
64-bit status result, once a future patch enables the client to
request extended mode.  We can also tolerate a non-compliant server
sending the new chunk even when it should not.

In normal execution, we only request "base:allocation", which
never exceeds 32 bits. But during testing with x-dirty-bitmap, we can
force qemu to connect to some other context that might have 64-bit
status bits; however, we ignore those upper bits (other than mapping
qemu:allocation-depth into something that 'qemu-img map --output=json'
can expose), and since it is only testing, we really don't bother with
checking whether more than the two least-significant bits are set.
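
A rough sketch of the widened parse (minimal big-endian readers are
written out here so the fragment stands alone; the real code uses
qemu's own payload readers):

    #include <stdint.h>

    static uint32_t rd_be32(const uint8_t **p)
    {
        const uint8_t *b = *p;
        *p += 4;
        return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
               ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    }

    static uint64_t rd_be64(const uint8_t **p)
    {
        uint64_t hi = rd_be32(p);
        uint64_t lo = rd_be32(p);
        return (hi << 32) | lo;
    }

    struct extent64 { uint64_t length; uint64_t flags; };

    /* context_id has already been consumed; only the wide form carries
     * a 32-bit extent count before the length/flags pairs */
    static uint32_t parse_first_extent(const uint8_t **p, int wide,
                                       struct extent64 *out)
    {
        uint32_t count = 1;

        if (wide) {
            count = rd_be32(p);
            out->length = rd_be64(p);
            out->flags  = rd_be64(p);
        } else {
            out->length = rd_be32(p);
            out->flags  = rd_be32(p);
        }
        return count;
    }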

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/nbd.c        | 38 +++++++++++++++++++++++++++-----------
 block/trace-events |  1 +
 2 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index a8b1bc1054..44ab5437ea 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -608,13 +608,16 @@ static int nbd_parse_offset_hole_payload(BDRVNBDState *s,
  */
 static int nbd_parse_blockstatus_payload(BDRVNBDState *s,
                                          NBDStructuredReplyChunk *chunk,
-                                         uint8_t *payload, uint64_t orig_length,
-                                         NBDExtent *extent, Error **errp)
+                                         uint8_t *payload, bool wide,
+                                         uint64_t orig_length,
+                                         NBDExtentExt *extent, Error **errp)
 {
     uint32_t context_id;
+    uint32_t count = 0;
+    size_t len = wide ? sizeof(*extent) : sizeof(NBDExtent);

     /* The server succeeded, so it must have sent [at least] one extent */
-    if (chunk->length < sizeof(context_id) + sizeof(*extent)) {
+    if (chunk->length < sizeof(context_id) + wide * sizeof(count) + len) {
         error_setg(errp, "Protocol error: invalid payload for "
                          "NBD_REPLY_TYPE_BLOCK_STATUS");
         return -EINVAL;
@@ -629,8 +632,14 @@ static int nbd_parse_blockstatus_payload(BDRVNBDState *s,
         return -EINVAL;
     }

-    extent->length = payload_advance32(&payload);
-    extent->flags = payload_advance32(&payload);
+    if (wide) {
+        count = payload_advance32(&payload);
+        extent->length = payload_advance64(&payload);
+        extent->flags = payload_advance64(&payload);
+    } else {
+        extent->length = payload_advance32(&payload);
+        extent->flags = payload_advance32(&payload);
+    }

     if (extent->length == 0) {
         error_setg(errp, "Protocol error: server sent status chunk with "
@@ -670,7 +679,8 @@ static int nbd_parse_blockstatus_payload(BDRVNBDState *s,
      * connection; just ignore trailing extents, and clamp things to
      * the length of our request.
      */
-    if (chunk->length > sizeof(context_id) + sizeof(*extent)) {
+    if (count > 1 ||
+        chunk->length > sizeof(context_id) + wide * sizeof(count) + len) {
         trace_nbd_parse_blockstatus_compliance("more than one extent");
     }
     if (extent->length > orig_length) {
@@ -1114,7 +1124,7 @@ static int coroutine_fn nbd_co_receive_cmdread_reply(BDRVNBDState *s, uint64_t h

 static int coroutine_fn nbd_co_receive_blockstatus_reply(BDRVNBDState *s,
                                                          uint64_t handle, uint64_t length,
-                                                         NBDExtent *extent,
+                                                         NBDExtentExt *extent,
                                                          int *request_ret, Error **errp)
 {
     NBDReplyChunkIter iter;
@@ -1131,6 +1141,11 @@ static int coroutine_fn nbd_co_receive_blockstatus_reply(BDRVNBDState *s,
         assert(nbd_reply_is_structured(&reply));

         switch (chunk->type) {
+        case NBD_REPLY_TYPE_BLOCK_STATUS_EXT:
+            if (!s->info.extended_headers) {
+                trace_nbd_extended_headers_compliance("block_status_ext");
+            }
+            /* fallthrough */
         case NBD_REPLY_TYPE_BLOCK_STATUS:
             if (received) {
                 nbd_channel_error(s, -EINVAL);
@@ -1139,9 +1154,10 @@ static int coroutine_fn nbd_co_receive_blockstatus_reply(BDRVNBDState *s,
             }
             received = true;

-            ret = nbd_parse_blockstatus_payload(s, &reply.structured,
-                                                payload, length, extent,
-                                                &local_err);
+            ret = nbd_parse_blockstatus_payload(
+                s, &reply.structured, payload,
+                chunk->type == NBD_REPLY_TYPE_BLOCK_STATUS_EXT,
+                length, extent, &local_err);
             if (ret < 0) {
                 nbd_channel_error(s, ret);
                 nbd_iter_channel_error(&iter, ret, &local_err);
@@ -1369,7 +1385,7 @@ static int coroutine_fn nbd_client_co_block_status(
         int64_t *pnum, int64_t *map, BlockDriverState **file)
 {
     int ret, request_ret;
-    NBDExtent extent = { 0 };
+    NBDExtentExt extent = { 0 };
     BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
     Error *local_err = NULL;

diff --git a/block/trace-events b/block/trace-events
index 48dbf10c66..b84a5155a3 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -168,6 +168,7 @@ iscsi_xcopy(void *src_lun, uint64_t src_off, void *dst_lun, uint64_t dst_off, ui
 # nbd.c
 nbd_parse_blockstatus_compliance(const char *err) "ignoring extra data from non-compliant server: %s"
 nbd_structured_read_compliance(const char *type) "server sent non-compliant unaligned read %s chunk"
+nbd_extended_headers_compliance(const char *type) "server sent non-compliant %s chunk without extended headers"
 nbd_read_reply_entry_fail(int ret, const char *err) "ret = %d, err: %s"
 nbd_co_request_fail(uint64_t from, uint32_t len, uint64_t handle, uint16_t flags, uint16_t type, const char *name, int ret, const char *err) "Request failed { .from = %" PRIu64", .len = %" PRIu32 ", .handle = %" PRIu64 ", .flags = 0x%" PRIx16 ", .type = %" PRIu16 " (%s) } ret = %d, err: %s"
 nbd_client_handshake(const char *export_name) "export '%s'"
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v2 11/15] nbd/client: Request extended headers during negotiation
  2022-11-14 22:48 ` [PATCH v2 00/15] qemu patches for 64-bit NBD extensions Eric Blake
                     ` (9 preceding siblings ...)
  2022-11-14 22:48   ` [PATCH v2 10/15] nbd/client: Accept 64-bit block status chunks Eric Blake
@ 2022-11-14 22:48   ` Eric Blake
  2022-11-14 22:48   ` [PATCH v2 12/15] nbd/server: Prepare for per-request filtering of BLOCK_STATUS Eric Blake
                     ` (3 subsequent siblings)
  14 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, libguestfs, nbd, Vladimir Sementsov-Ogievskiy,
	Kevin Wolf, Hanna Reitz

All the pieces are in place for a client to finally request extended
headers.  Note that we must not request extended headers when qemu-nbd
is used to connect to the kernel module (as nbd.ko does not expect
them), but there is no harm in all other clients requesting them.

Extended headers are not essential to the information collected during
'qemu-nbd --list', but probing for them gives us one more piece of
information in that output.  Update the iotests affected by the new
line of output.
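
A rough sketch of the fallback order (the helper, its error handling,
and the option ids are illustrative placeholders, not NBD wire values;
the 2/3/4 return values mirror the updated nbd_start_negotiate
convention):

    enum { OPT_EXTENDED_HEADERS, OPT_STRUCTURED_REPLY };

    /* try_opt returns >0 if the server acks the option, 0 if it
     * declines, <0 on error */
    static int negotiate_reply_mode(int (*try_opt)(int opt))
    {
        int r = try_opt(OPT_EXTENDED_HEADERS);

        if (r) {
            return r < 0 ? r : 4;   /* 4: extended headers negotiated */
        }
        r = try_opt(OPT_STRUCTURED_REPLY);
        if (r) {
            return r < 0 ? r : 3;   /* 3: structured replies only */
        }
        return 2;                   /* 2: plain newstyle */
    }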

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 nbd/client-connection.c                       |  1 +
 nbd/client.c                                  | 35 +++++++++++++------
 qemu-nbd.c                                    |  2 ++
 tests/qemu-iotests/223.out                    |  6 ++++
 tests/qemu-iotests/233.out                    |  5 +++
 tests/qemu-iotests/241.out                    |  3 ++
 tests/qemu-iotests/307.out                    |  5 +++
 .../tests/nbd-qemu-allocation.out             |  1 +
 8 files changed, 48 insertions(+), 10 deletions(-)

diff --git a/nbd/client-connection.c b/nbd/client-connection.c
index 0c5f917efa..3576190d09 100644
--- a/nbd/client-connection.c
+++ b/nbd/client-connection.c
@@ -93,6 +93,7 @@ NBDClientConnection *nbd_client_connection_new(const SocketAddress *saddr,

         .initial_info.request_sizes = true,
         .initial_info.structured_reply = true,
+        .initial_info.extended_headers = true,
         .initial_info.base_allocation = true,
         .initial_info.x_dirty_bitmap = g_strdup(x_dirty_bitmap),
         .initial_info.name = g_strdup(export_name ?: "")
diff --git a/nbd/client.c b/nbd/client.c
index 70f06ce637..413304f553 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -878,12 +878,13 @@ static int nbd_list_meta_contexts(QIOChannel *ioc,
  *          1: server is newstyle, but can only accept EXPORT_NAME
  *          2: server is newstyle, but lacks structured replies
  *          3: server is newstyle and set up for structured replies
+ *          4: server is newstyle and set up for extended headers
  */
 static int nbd_start_negotiate(AioContext *aio_context, QIOChannel *ioc,
                                QCryptoTLSCreds *tlscreds,
                                const char *hostname, QIOChannel **outioc,
-                               bool structured_reply, bool *zeroes,
-                               Error **errp)
+                               bool structured_reply, bool ext_hdrs,
+                               bool *zeroes, Error **errp)
 {
     ERRP_GUARD();
     uint64_t magic;
@@ -960,15 +961,23 @@ static int nbd_start_negotiate(AioContext *aio_context, QIOChannel *ioc,
         if (fixedNewStyle) {
             int result = 0;

-            if (structured_reply) {
+            if (ext_hdrs) {
+                result = nbd_request_simple_option(ioc,
+                                                   NBD_OPT_EXTENDED_HEADERS,
+                                                   false, errp);
+                if (result) {
+                    return result < 0 ? -EINVAL : 4;
+                }
+            }
+            if (structured_reply && !result) {
                 result = nbd_request_simple_option(ioc,
                                                    NBD_OPT_STRUCTURED_REPLY,
                                                    false, errp);
-                if (result < 0) {
-                    return -EINVAL;
+                if (result) {
+                    return result < 0 ? -EINVAL : 3;
                 }
             }
-            return 2 + result;
+            return 2;
         } else {
             return 1;
         }
@@ -1030,7 +1039,8 @@ int nbd_receive_negotiate(AioContext *aio_context, QIOChannel *ioc,
     trace_nbd_receive_negotiate_name(info->name);

     result = nbd_start_negotiate(aio_context, ioc, tlscreds, hostname, outioc,
-                                 info->structured_reply, &zeroes, errp);
+                                 info->structured_reply,
+                                 info->extended_headers, &zeroes, errp);

     info->structured_reply = false;
     info->extended_headers = false;
@@ -1040,6 +1050,9 @@ int nbd_receive_negotiate(AioContext *aio_context, QIOChannel *ioc,
     }

     switch (result) {
+    case 4: /* newstyle, with extended headers */
+        info->extended_headers = true;
+        /* fall through */
     case 3: /* newstyle, with structured replies */
         info->structured_reply = true;
         if (base_allocation) {
@@ -1151,7 +1164,7 @@ int nbd_receive_export_list(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,

     *info = NULL;
     result = nbd_start_negotiate(NULL, ioc, tlscreds, hostname, &sioc, true,
-                                 NULL, errp);
+                                 true, NULL, errp);
     if (tlscreds && sioc) {
         ioc = sioc;
     }
@@ -1159,6 +1172,7 @@ int nbd_receive_export_list(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
     switch (result) {
     case 2:
     case 3:
+    case 4:
         /* newstyle - use NBD_OPT_LIST to populate array, then try
          * NBD_OPT_INFO on each array member. If structured replies
          * are enabled, also try NBD_OPT_LIST_META_CONTEXT. */
@@ -1179,7 +1193,8 @@ int nbd_receive_export_list(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
             memset(&array[count - 1], 0, sizeof(*array));
             array[count - 1].name = name;
             array[count - 1].description = desc;
-            array[count - 1].structured_reply = result == 3;
+            array[count - 1].structured_reply = result >= 3;
+            array[count - 1].extended_headers = result >= 4;
         }

         for (i = 0; i < count; i++) {
@@ -1195,7 +1210,7 @@ int nbd_receive_export_list(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
                 break;
             }

-            if (result == 3 &&
+            if (result >= 3 &&
                 nbd_list_meta_contexts(ioc, &array[i], errp) < 0) {
                 goto out;
             }
diff --git a/qemu-nbd.c b/qemu-nbd.c
index 0cd5aa6f02..9404164487 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -238,6 +238,8 @@ static int qemu_nbd_client_list(SocketAddress *saddr, QCryptoTLSCreds *tls,
             printf("  opt block: %u\n", list[i].opt_block);
             printf("  max block: %u\n", list[i].max_block);
         }
+        printf("  transaction size: %s\n",
+               list[i].extended_headers ? "64-bit" : "32-bit");
         if (list[i].n_contexts) {
             printf("  available meta contexts: %d\n", list[i].n_contexts);
             for (j = 0; j < list[i].n_contexts; j++) {
diff --git a/tests/qemu-iotests/223.out b/tests/qemu-iotests/223.out
index 26fb347c5d..b98582c38e 100644
--- a/tests/qemu-iotests/223.out
+++ b/tests/qemu-iotests/223.out
@@ -87,6 +87,7 @@ exports available: 3
   min block: 1
   opt block: 4096
   max block: 33554432
+  transaction size: 64-bit
   available meta contexts: 2
    base:allocation
    qemu:dirty-bitmap:b
@@ -97,6 +98,7 @@ exports available: 3
   min block: 1
   opt block: 4096
   max block: 33554432
+  transaction size: 64-bit
   available meta contexts: 2
    base:allocation
    qemu:dirty-bitmap:b2
@@ -106,6 +108,7 @@ exports available: 3
   min block: 1
   opt block: 4096
   max block: 33554432
+  transaction size: 64-bit
   available meta contexts: 2
    base:allocation
    qemu:dirty-bitmap:b3
@@ -206,6 +209,7 @@ exports available: 3
   min block: 1
   opt block: 4096
   max block: 33554432
+  transaction size: 64-bit
   available meta contexts: 2
    base:allocation
    qemu:dirty-bitmap:b
@@ -216,6 +220,7 @@ exports available: 3
   min block: 1
   opt block: 4096
   max block: 33554432
+  transaction size: 64-bit
   available meta contexts: 2
    base:allocation
    qemu:dirty-bitmap:b2
@@ -225,6 +230,7 @@ exports available: 3
   min block: 1
   opt block: 4096
   max block: 33554432
+  transaction size: 64-bit
   available meta contexts: 2
    base:allocation
    qemu:dirty-bitmap:b3
diff --git a/tests/qemu-iotests/233.out b/tests/qemu-iotests/233.out
index 237c82767e..33cb622ecf 100644
--- a/tests/qemu-iotests/233.out
+++ b/tests/qemu-iotests/233.out
@@ -53,6 +53,11 @@ exports available: 1
  export: ''
   size:  67108864
   min block: 1
+  opt block: 4096
+  max block: 33554432
+  transaction size: 64-bit
+  available meta contexts: 1
+   base:allocation

 == check TLS with different CA fails ==
 qemu-img: Could not open 'driver=nbd,host=127.0.0.1,port=PORT,tls-creds=tls0': The certificate hasn't got a known issuer
diff --git a/tests/qemu-iotests/241.out b/tests/qemu-iotests/241.out
index 88e8cfcd7e..a9efb87652 100644
--- a/tests/qemu-iotests/241.out
+++ b/tests/qemu-iotests/241.out
@@ -6,6 +6,7 @@ exports available: 1
  export: ''
   size:  1024
   min block: 1
+  transaction size: 64-bit
 [{ "start": 0, "length": 1000, "depth": 0, "present": true, "zero": false, "data": true, "offset": OFFSET},
 { "start": 1000, "length": 24, "depth": 0, "present": true, "zero": true, "data": false, "offset": OFFSET}]
 1 KiB (0x400) bytes     allocated at offset 0 bytes (0x0)
@@ -16,6 +17,7 @@ exports available: 1
  export: ''
   size:  1024
   min block: 512
+  transaction size: 64-bit
 [{ "start": 0, "length": 1024, "depth": 0, "present": true, "zero": false, "data": true, "offset": OFFSET}]
 1 KiB (0x400) bytes     allocated at offset 0 bytes (0x0)
 WARNING: Image format was not specified for 'TEST_DIR/t.raw' and probing guessed raw.
@@ -28,6 +30,7 @@ exports available: 1
  export: ''
   size:  1024
   min block: 1
+  transaction size: 64-bit
 [{ "start": 0, "length": 1000, "depth": 0, "present": true, "zero": false, "data": true, "offset": OFFSET},
 { "start": 1000, "length": 24, "depth": 0, "present": true, "zero": true, "data": false, "offset": OFFSET}]
 1 KiB (0x400) bytes     allocated at offset 0 bytes (0x0)
diff --git a/tests/qemu-iotests/307.out b/tests/qemu-iotests/307.out
index 390f05d1b7..2b9a6a67a1 100644
--- a/tests/qemu-iotests/307.out
+++ b/tests/qemu-iotests/307.out
@@ -19,6 +19,7 @@ exports available: 1
   min block: XXX
   opt block: XXX
   max block: XXX
+  transaction size: 64-bit
   available meta contexts: 1
    base:allocation

@@ -47,6 +48,7 @@ exports available: 1
   min block: XXX
   opt block: XXX
   max block: XXX
+  transaction size: 64-bit
   available meta contexts: 1
    base:allocation

@@ -78,6 +80,7 @@ exports available: 2
   min block: XXX
   opt block: XXX
   max block: XXX
+  transaction size: 64-bit
   available meta contexts: 1
    base:allocation
  export: 'export1'
@@ -87,6 +90,7 @@ exports available: 2
   min block: XXX
   opt block: XXX
   max block: XXX
+  transaction size: 64-bit
   available meta contexts: 1
    base:allocation

@@ -113,6 +117,7 @@ exports available: 1
   min block: XXX
   opt block: XXX
   max block: XXX
+  transaction size: 64-bit
   available meta contexts: 1
    base:allocation

diff --git a/tests/qemu-iotests/tests/nbd-qemu-allocation.out b/tests/qemu-iotests/tests/nbd-qemu-allocation.out
index 9d938db24e..659276032b 100644
--- a/tests/qemu-iotests/tests/nbd-qemu-allocation.out
+++ b/tests/qemu-iotests/tests/nbd-qemu-allocation.out
@@ -21,6 +21,7 @@ exports available: 1
   min block: 1
   opt block: 4096
   max block: 33554432
+  transaction size: 64-bit
   available meta contexts: 2
    base:allocation
    qemu:allocation-depth
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v2 12/15] nbd/server: Prepare for per-request filtering of BLOCK_STATUS
  2022-11-14 22:48 ` [PATCH v2 00/15] qemu patches for 64-bit NBD extensions Eric Blake
                     ` (10 preceding siblings ...)
  2022-11-14 22:48   ` [PATCH v2 11/15] nbd/client: Request extended headers during negotiation Eric Blake
@ 2022-11-14 22:48   ` Eric Blake
  2022-11-14 22:48   ` [PATCH v2 13/15] nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS Eric Blake
                     ` (2 subsequent siblings)
  14 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, libguestfs, nbd, Vladimir Sementsov-Ogievskiy,
	Kevin Wolf, Hanna Reitz

The next commit will add support for the new NBD_CMD_FLAG_PAYLOAD
flag during NBD_CMD_BLOCK_STATUS, where the client can request that
the server return only a subset of the negotiated contexts rather
than all of them.  To make that task easier, this patch
populates the list of contexts to return on a per-command basis (for
now, identical to the full set of negotiated contexts).
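
A rough sketch of the ownership rule (simplified types; the names are
illustrative, not the exact qemu structs):

    #include <stdbool.h>
    #include <stdlib.h>

    struct meta_contexts { size_t count; size_t nr_bitmaps; bool *bitmaps; };

    /* default: the request shares the negotiated bitmap array */
    static void request_default_contexts(struct meta_contexts *req,
                                         const struct meta_contexts *neg)
    {
        *req = *neg;
    }

    /* after the request completes: free only a private, filtered copy */
    static void request_release_contexts(struct meta_contexts *req,
                                         const struct meta_contexts *neg)
    {
        if (req->bitmaps != neg->bitmaps) {
            free(req->bitmaps);
        }
    }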

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 include/block/nbd.h |  20 +++++++-
 nbd/server.c        | 108 +++++++++++++++++++++++---------------------
 2 files changed, 75 insertions(+), 53 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 02e31b2261..9a8ac1c8a5 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -50,8 +50,23 @@ typedef struct NBDOptionReplyMetaContext {
     /* metadata context name follows */
 } QEMU_PACKED NBDOptionReplyMetaContext;

-/* Transmission phase structs
- *
+/* Transmission phase structs */
+
+/*
+ * NBDMetaContexts represents a list of meta contexts in use, as
+ * selected by NBD_OPT_SET_META_CONTEXT. Also used for
+ * NBD_OPT_LIST_META_CONTEXT, and payload filtering in
+ * NBD_CMD_BLOCK_STATUS.
+ */
+typedef struct NBDMetaContexts {
+    size_t count; /* number of negotiated contexts */
+    bool base_allocation; /* export base:allocation context (block status) */
+    bool allocation_depth; /* export qemu:allocation-depth */
+    size_t nr_bitmaps; /* Length of bitmaps array */
+    bool *bitmaps; /* export qemu:dirty-bitmap:<export bitmap name> */
+} NBDMetaContexts;
+
+/*
  * Note: NBDRequest is _NOT_ the same as the network representation of an NBD
  * request!
  */
@@ -61,6 +76,7 @@ typedef struct NBDRequest {
     uint64_t len;   /* Effect length; 32 bit limit without extended headers */
     uint16_t flags; /* NBD_CMD_FLAG_* */
     uint16_t type;  /* NBD_CMD_* */
+    NBDMetaContexts contexts; /* Used by NBD_CMD_BLOCK_STATUS */
 } NBDRequest;

 typedef struct NBDSimpleReply {
diff --git a/nbd/server.c b/nbd/server.c
index f21f8098c1..1fd1f32028 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -103,20 +103,6 @@ struct NBDExport {

 static QTAILQ_HEAD(, NBDExport) exports = QTAILQ_HEAD_INITIALIZER(exports);

-/* NBDExportMetaContexts represents a list of contexts to be exported,
- * as selected by NBD_OPT_SET_META_CONTEXT. Also used for
- * NBD_OPT_LIST_META_CONTEXT. */
-typedef struct NBDExportMetaContexts {
-    NBDExport *exp;
-    size_t count; /* number of negotiated contexts */
-    bool base_allocation; /* export base:allocation context (block status) */
-    bool allocation_depth; /* export qemu:allocation-depth */
-    bool *bitmaps; /*
-                    * export qemu:dirty-bitmap:<export bitmap name>,
-                    * sized by exp->nr_export_bitmaps
-                    */
-} NBDExportMetaContexts;
-
 struct NBDClient {
     int refcount;
     void (*close_fn)(NBDClient *client, bool negotiated);
@@ -143,7 +129,8 @@ struct NBDClient {

     bool structured_reply; /* also set true if extended_headers is set */
     bool extended_headers;
-    NBDExportMetaContexts export_meta;
+    NBDExport *context_exp; /* export of last OPT_SET_META_CONTEXT */
+    NBDMetaContexts contexts; /* Negotiated meta contexts */

     uint32_t opt; /* Current option being negotiated */
     uint32_t optlen; /* remaining length of data in ioc for the option being
@@ -456,8 +443,8 @@ static int nbd_negotiate_handle_list(NBDClient *client, Error **errp)

 static void nbd_check_meta_export(NBDClient *client)
 {
-    if (client->exp != client->export_meta.exp) {
-        client->export_meta.count = 0;
+    if (client->exp != client->context_exp) {
+        client->contexts.count = 0;
     }
 }

@@ -847,7 +834,7 @@ static bool nbd_strshift(const char **str, const char *prefix)
  * Handle queries to 'base' namespace. For now, only the base:allocation
  * context is available.  Return true if @query has been handled.
  */
-static bool nbd_meta_base_query(NBDClient *client, NBDExportMetaContexts *meta,
+static bool nbd_meta_base_query(NBDClient *client, NBDMetaContexts *meta,
                                 const char *query)
 {
     if (!nbd_strshift(&query, "base:")) {
@@ -867,8 +854,8 @@ static bool nbd_meta_base_query(NBDClient *client, NBDExportMetaContexts *meta,
  * and qemu:allocation-depth contexts are available.  Return true if @query
  * has been handled.
  */
-static bool nbd_meta_qemu_query(NBDClient *client, NBDExportMetaContexts *meta,
-                                const char *query)
+static bool nbd_meta_qemu_query(NBDClient *client, NBDExport *exp,
+                                NBDMetaContexts *meta, const char *query)
 {
     size_t i;

@@ -879,9 +866,9 @@ static bool nbd_meta_qemu_query(NBDClient *client, NBDExportMetaContexts *meta,

     if (!*query) {
         if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
-            meta->allocation_depth = meta->exp->allocation_depth;
-            if (meta->exp->nr_export_bitmaps) {
-                memset(meta->bitmaps, 1, meta->exp->nr_export_bitmaps);
+            meta->allocation_depth = exp->allocation_depth;
+            if (meta->nr_bitmaps) {
+                memset(meta->bitmaps, 1, meta->nr_bitmaps);
             }
         }
         trace_nbd_negotiate_meta_query_parse("empty");
@@ -890,7 +877,7 @@ static bool nbd_meta_qemu_query(NBDClient *client, NBDExportMetaContexts *meta,

     if (strcmp(query, "allocation-depth") == 0) {
         trace_nbd_negotiate_meta_query_parse("allocation-depth");
-        meta->allocation_depth = meta->exp->allocation_depth;
+        meta->allocation_depth = exp->allocation_depth;
         return true;
     }

@@ -898,17 +885,17 @@ static bool nbd_meta_qemu_query(NBDClient *client, NBDExportMetaContexts *meta,
         trace_nbd_negotiate_meta_query_parse("dirty-bitmap:");
         if (!*query) {
             if (client->opt == NBD_OPT_LIST_META_CONTEXT &&
-                meta->exp->nr_export_bitmaps) {
-                memset(meta->bitmaps, 1, meta->exp->nr_export_bitmaps);
+                exp->nr_export_bitmaps) {
+                memset(meta->bitmaps, 1, exp->nr_export_bitmaps);
             }
             trace_nbd_negotiate_meta_query_parse("empty");
             return true;
         }

-        for (i = 0; i < meta->exp->nr_export_bitmaps; i++) {
+        for (i = 0; i < meta->nr_bitmaps; i++) {
             const char *bm_name;

-            bm_name = bdrv_dirty_bitmap_name(meta->exp->export_bitmaps[i]);
+            bm_name = bdrv_dirty_bitmap_name(exp->export_bitmaps[i]);
             if (strcmp(bm_name, query) == 0) {
                 meta->bitmaps[i] = true;
                 trace_nbd_negotiate_meta_query_parse(query);
@@ -932,8 +919,8 @@ static bool nbd_meta_qemu_query(NBDClient *client, NBDExportMetaContexts *meta,
  *
  * Return -errno on I/O error, 0 if option was completely handled by
  * sending a reply about inconsistent lengths, or 1 on success. */
-static int nbd_negotiate_meta_query(NBDClient *client,
-                                    NBDExportMetaContexts *meta, Error **errp)
+static int nbd_negotiate_meta_query(NBDClient *client, NBDExport *exp,
+                                    NBDMetaContexts *meta, Error **errp)
 {
     int ret;
     g_autofree char *query = NULL;
@@ -960,7 +947,7 @@ static int nbd_negotiate_meta_query(NBDClient *client,
     if (nbd_meta_base_query(client, meta, query)) {
         return 1;
     }
-    if (nbd_meta_qemu_query(client, meta, query)) {
+    if (nbd_meta_qemu_query(client, exp, meta, query)) {
         return 1;
     }

@@ -972,14 +959,15 @@ static int nbd_negotiate_meta_query(NBDClient *client,
  * Handle NBD_OPT_LIST_META_CONTEXT and NBD_OPT_SET_META_CONTEXT
  *
  * Return -errno on I/O error, or 0 if option was completely handled. */
-static int nbd_negotiate_meta_queries(NBDClient *client,
-                                      NBDExportMetaContexts *meta, Error **errp)
+static int nbd_negotiate_meta_queries(NBDClient *client, Error **errp)
 {
     int ret;
     g_autofree char *export_name = NULL;
     /* Mark unused to work around https://bugs.llvm.org/show_bug.cgi?id=3888 */
     g_autofree G_GNUC_UNUSED bool *bitmaps = NULL;
-    NBDExportMetaContexts local_meta = {0};
+    NBDMetaContexts local_meta = {0};
+    NBDMetaContexts *meta;
+    NBDExport *exp;
     uint32_t nb_queries;
     size_t i;
     size_t count = 0;
@@ -994,6 +982,9 @@ static int nbd_negotiate_meta_queries(NBDClient *client,
     if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
         /* Only change the caller's meta on SET. */
         meta = &local_meta;
+    } else {
+        meta = &client->contexts;
+        client->context_exp = NULL;
     }

     g_free(meta->bitmaps);
@@ -1004,14 +995,15 @@ static int nbd_negotiate_meta_queries(NBDClient *client,
         return ret;
     }

-    meta->exp = nbd_export_find(export_name);
-    if (meta->exp == NULL) {
+    exp = nbd_export_find(export_name);
+    if (exp == NULL) {
         g_autofree char *sane_name = nbd_sanitize_name(export_name);

         return nbd_opt_drop(client, NBD_REP_ERR_UNKNOWN, errp,
                             "export '%s' not present", sane_name);
     }
-    meta->bitmaps = g_new0(bool, meta->exp->nr_export_bitmaps);
+    meta->nr_bitmaps = exp->nr_export_bitmaps;
+    meta->bitmaps = g_new0(bool, exp->nr_export_bitmaps);
     if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
         bitmaps = meta->bitmaps;
     }
@@ -1027,13 +1019,13 @@ static int nbd_negotiate_meta_queries(NBDClient *client,
     if (client->opt == NBD_OPT_LIST_META_CONTEXT && !nb_queries) {
         /* enable all known contexts */
         meta->base_allocation = true;
-        meta->allocation_depth = meta->exp->allocation_depth;
-        if (meta->exp->nr_export_bitmaps) {
-            memset(meta->bitmaps, 1, meta->exp->nr_export_bitmaps);
+        meta->allocation_depth = exp->allocation_depth;
+        if (exp->nr_export_bitmaps) {
+            memset(meta->bitmaps, 1, meta->nr_bitmaps);
         }
     } else {
         for (i = 0; i < nb_queries; ++i) {
-            ret = nbd_negotiate_meta_query(client, meta, errp);
+            ret = nbd_negotiate_meta_query(client, exp, meta, errp);
             if (ret <= 0) {
                 return ret;
             }
@@ -1060,7 +1052,7 @@ static int nbd_negotiate_meta_queries(NBDClient *client,
         count++;
     }

-    for (i = 0; i < meta->exp->nr_export_bitmaps; i++) {
+    for (i = 0; i < meta->nr_bitmaps; i++) {
         const char *bm_name;
         g_autofree char *context = NULL;

@@ -1068,7 +1060,7 @@ static int nbd_negotiate_meta_queries(NBDClient *client,
             continue;
         }

-        bm_name = bdrv_dirty_bitmap_name(meta->exp->export_bitmaps[i]);
+        bm_name = bdrv_dirty_bitmap_name(exp->export_bitmaps[i]);
         context = g_strdup_printf("qemu:dirty-bitmap:%s", bm_name);

         ret = nbd_negotiate_send_meta_context(client, context,
@@ -1083,6 +1075,9 @@ static int nbd_negotiate_meta_queries(NBDClient *client,
     ret = nbd_negotiate_send_rep(client, NBD_REP_ACK, errp);
     if (ret == 0) {
         meta->count = count;
+        if (client->opt == NBD_OPT_SET_META_CONTEXT) {
+            client->context_exp = exp;
+        }
     }

     return ret;
@@ -1276,8 +1271,7 @@ static int nbd_negotiate_options(NBDClient *client, Error **errp)

             case NBD_OPT_LIST_META_CONTEXT:
             case NBD_OPT_SET_META_CONTEXT:
-                ret = nbd_negotiate_meta_queries(client, &client->export_meta,
-                                                 errp);
+                ret = nbd_negotiate_meta_queries(client, errp);
                 break;

             case NBD_OPT_EXTENDED_HEADERS:
@@ -1508,7 +1502,7 @@ void nbd_client_put(NBDClient *client)
             QTAILQ_REMOVE(&client->exp->clients, client, next);
             blk_exp_unref(&client->exp->common);
         }
-        g_free(client->export_meta.bitmaps);
+        g_free(client->contexts.bitmaps);
         g_free(client);
     }
 }
@@ -2479,6 +2473,8 @@ static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request,
                 return -ENOMEM;
             }
         }
+    } else if (request->type == NBD_CMD_BLOCK_STATUS) {
+        request->contexts = client->contexts;
     }

     if (payload_len) {
@@ -2702,11 +2698,11 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
                                           "need non-zero length", errp);
         }
         assert(client->extended_headers || request->len <= UINT32_MAX);
-        if (client->export_meta.count) {
+        if (request->contexts.count) {
             bool dont_fragment = request->flags & NBD_CMD_FLAG_REQ_ONE;
-            int contexts_remaining = client->export_meta.count;
+            int contexts_remaining = request->contexts.count;

-            if (client->export_meta.base_allocation) {
+            if (request->contexts.base_allocation) {
                 ret = nbd_co_send_block_status(client, request,
                                                blk_bs(exp->common.blk),
                                                request->from,
@@ -2719,7 +2715,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
                 }
             }

-            if (client->export_meta.allocation_depth) {
+            if (request->contexts.allocation_depth) {
                 ret = nbd_co_send_block_status(client, request,
                                                blk_bs(exp->common.blk),
                                                request->from, request->len,
@@ -2732,8 +2728,10 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
                 }
             }

+            assert(request->contexts.nr_bitmaps ==
+                   client->exp->nr_export_bitmaps);
             for (i = 0; i < client->exp->nr_export_bitmaps; i++) {
-                if (!client->export_meta.bitmaps[i]) {
+                if (!request->contexts.bitmaps[i]) {
                     continue;
                 }
                 ret = nbd_co_send_bitmap(client, request,
@@ -2749,6 +2747,10 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
             assert(!contexts_remaining);

             return 0;
+        } else if (client->contexts.count) {
+            return nbd_send_generic_reply(client, request, -EINVAL,
+                                          "CMD_BLOCK_STATUS payload not valid",
+                                          errp);
         } else {
             return nbd_send_generic_reply(client, request, -EINVAL,
                                           "CMD_BLOCK_STATUS not negotiated",
@@ -2825,6 +2827,10 @@ static coroutine_fn void nbd_trip(void *opaque)
     } else {
         ret = nbd_handle_request(client, &request, req->data, &local_err);
     }
+    if (request.type == NBD_CMD_BLOCK_STATUS &&
+        request.contexts.bitmaps != client->contexts.bitmaps) {
+        g_free(request.contexts.bitmaps);
+    }
     if (ret < 0) {
         error_prepend(&local_err, "Failed to send reply: ");
         goto disconnect;
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v2 13/15] nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS
  2022-11-14 22:48 ` [PATCH v2 00/15] qemu patches for 64-bit NBD extensions Eric Blake
                     ` (11 preceding siblings ...)
  2022-11-14 22:48   ` [PATCH v2 12/15] nbd/server: Prepare for per-request filtering of BLOCK_STATUS Eric Blake
@ 2022-11-14 22:48   ` Eric Blake
  2022-11-14 22:48   ` [PATCH v2 14/15] RFC: nbd/client: Accept 64-bit hole chunks Eric Blake
  2022-11-14 22:48   ` [PATCH v2 15/15] RFC: nbd/server: Send 64-bit hole chunk Eric Blake
  14 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, libguestfs, nbd, Vladimir Sementsov-Ogievskiy,
	Kevin Wolf, Hanna Reitz

Allow a client to request a subset of negotiated meta contexts.  For
example, a client may use a single connection to learn about both
block status and dirty bitmaps, but need the dirty bitmap queries only
on a subset of the disk; forcing the server to compute that information
for block status queries over the rest of the disk is wasted effort
(both at the server, and in the amount of traffic sent over the wire
only to be parsed and ignored by the client).

Qemu as an NBD client never requests to use more than one meta
context, so it has no need to use block status payloads.  Testing this
instead requires support from libnbd, which CAN access multiple meta
contexts in parallel from a single NBD connection; an interop test
submitted to the libnbd project at the same time as this patch
demonstrates the feature working, as well as testing some corner cases
(for example, when the payload length is longer than the export
length), although other corner cases (like passing the same id twice)
require a protocol fuzzer, because libnbd is not wired up to break the
protocol that badly.

This also includes tweaks to 'qemu-nbd --list' to show when a server
is advertising the capability, and to the testsuite to reflect the
addition to that output.
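
For illustration, a rough standalone sketch (not part of this patch) of
how a client might lay the new payload out on the wire: an 8-byte
big-endian effect length followed by one 32-bit big-endian id per
requested meta context.  The helper name is made up:

  #include <endian.h>
  #include <stdint.h>
  #include <string.h>

  static size_t build_block_status_payload(uint8_t *buf,
                                           uint64_t effect_length,
                                           const uint32_t *ids,
                                           size_t nr_ids)
  {
      uint64_t len_be = htobe64(effect_length);
      size_t i;

      /* NBDBlockStatusPayload.effect_length */
      memcpy(buf, &len_be, 8);
      /* then one id per requested meta context */
      for (i = 0; i < nr_ids; i++) {
          uint32_t id_be = htobe32(ids[i]);
          memcpy(buf + 8 + 4 * i, &id_be, 4);
      }
      return 8 + 4 * nr_ids;    /* bytes to send as the command payload */
  }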

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 docs/interop/nbd.txt                          |   2 +-
 include/block/nbd.h                           |  32 ++++--
 nbd/server.c                                  | 106 +++++++++++++++++-
 qemu-nbd.c                                    |   1 +
 nbd/trace-events                              |   1 +
 tests/qemu-iotests/223.out                    |  12 +-
 tests/qemu-iotests/307.out                    |  10 +-
 .../tests/nbd-qemu-allocation.out             |   2 +-
 8 files changed, 136 insertions(+), 30 deletions(-)

diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
index 988c072697..b7893043a3 100644
--- a/docs/interop/nbd.txt
+++ b/docs/interop/nbd.txt
@@ -69,4 +69,4 @@ NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
 NBD_CMD_FLAG_FAST_ZERO
 * 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth"
 * 7.1: NBD_FLAG_CAN_MULTI_CONN for shareable writable exports
-* 7.2: NBD_OPT_EXTENDED_HEADERS
+* 7.2: NBD_OPT_EXTENDED_HEADERS, NBD_FLAG_BLOCK_STATUS_PAYLOAD
diff --git a/include/block/nbd.h b/include/block/nbd.h
index 9a8ac1c8a5..2a65c606c9 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -167,6 +167,12 @@ typedef struct NBDExtentExt {
     uint64_t flags; /* NBD_STATE_* */
 } QEMU_PACKED NBDExtentExt;

+/* Client payload for limiting NBD_CMD_BLOCK_STATUS reply */
+typedef struct NBDBlockStatusPayload {
+    uint64_t effect_length;
+    /* uint32_t ids[] follows, array length implied by header */
+} QEMU_PACKED NBDBlockStatusPayload;
+
 /* Transmission (export) flags: sent from server to client during handshake,
    but describe what will happen during transmission */
 enum {
@@ -183,20 +189,22 @@ enum {
     NBD_FLAG_SEND_RESIZE_BIT        =  9, /* Send resize */
     NBD_FLAG_SEND_CACHE_BIT         = 10, /* Send CACHE (prefetch) */
     NBD_FLAG_SEND_FAST_ZERO_BIT     = 11, /* FAST_ZERO flag for WRITE_ZEROES */
+    NBD_FLAG_BLOCK_STAT_PAYLOAD_BIT = 12, /* PAYLOAD flag for BLOCK_STATUS */
 };

-#define NBD_FLAG_HAS_FLAGS         (1 << NBD_FLAG_HAS_FLAGS_BIT)
-#define NBD_FLAG_READ_ONLY         (1 << NBD_FLAG_READ_ONLY_BIT)
-#define NBD_FLAG_SEND_FLUSH        (1 << NBD_FLAG_SEND_FLUSH_BIT)
-#define NBD_FLAG_SEND_FUA          (1 << NBD_FLAG_SEND_FUA_BIT)
-#define NBD_FLAG_ROTATIONAL        (1 << NBD_FLAG_ROTATIONAL_BIT)
-#define NBD_FLAG_SEND_TRIM         (1 << NBD_FLAG_SEND_TRIM_BIT)
-#define NBD_FLAG_SEND_WRITE_ZEROES (1 << NBD_FLAG_SEND_WRITE_ZEROES_BIT)
-#define NBD_FLAG_SEND_DF           (1 << NBD_FLAG_SEND_DF_BIT)
-#define NBD_FLAG_CAN_MULTI_CONN    (1 << NBD_FLAG_CAN_MULTI_CONN_BIT)
-#define NBD_FLAG_SEND_RESIZE       (1 << NBD_FLAG_SEND_RESIZE_BIT)
-#define NBD_FLAG_SEND_CACHE        (1 << NBD_FLAG_SEND_CACHE_BIT)
-#define NBD_FLAG_SEND_FAST_ZERO    (1 << NBD_FLAG_SEND_FAST_ZERO_BIT)
+#define NBD_FLAG_HAS_FLAGS          (1 << NBD_FLAG_HAS_FLAGS_BIT)
+#define NBD_FLAG_READ_ONLY          (1 << NBD_FLAG_READ_ONLY_BIT)
+#define NBD_FLAG_SEND_FLUSH         (1 << NBD_FLAG_SEND_FLUSH_BIT)
+#define NBD_FLAG_SEND_FUA           (1 << NBD_FLAG_SEND_FUA_BIT)
+#define NBD_FLAG_ROTATIONAL         (1 << NBD_FLAG_ROTATIONAL_BIT)
+#define NBD_FLAG_SEND_TRIM          (1 << NBD_FLAG_SEND_TRIM_BIT)
+#define NBD_FLAG_SEND_WRITE_ZEROES  (1 << NBD_FLAG_SEND_WRITE_ZEROES_BIT)
+#define NBD_FLAG_SEND_DF            (1 << NBD_FLAG_SEND_DF_BIT)
+#define NBD_FLAG_CAN_MULTI_CONN     (1 << NBD_FLAG_CAN_MULTI_CONN_BIT)
+#define NBD_FLAG_SEND_RESIZE        (1 << NBD_FLAG_SEND_RESIZE_BIT)
+#define NBD_FLAG_SEND_CACHE         (1 << NBD_FLAG_SEND_CACHE_BIT)
+#define NBD_FLAG_SEND_FAST_ZERO     (1 << NBD_FLAG_SEND_FAST_ZERO_BIT)
+#define NBD_FLAG_BLOCK_STAT_PAYLOAD (1 << NBD_FLAG_BLOCK_STAT_PAYLOAD_BIT)

 /* New-style handshake (global) flags, sent from server to client, and
    control what will happen during handshake phase. */
diff --git a/nbd/server.c b/nbd/server.c
index 1fd1f32028..cd280f1721 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -441,9 +441,9 @@ static int nbd_negotiate_handle_list(NBDClient *client, Error **errp)
     return nbd_negotiate_send_rep(client, NBD_REP_ACK, errp);
 }

-static void nbd_check_meta_export(NBDClient *client)
+static void nbd_check_meta_export(NBDClient *client, NBDExport *exp)
 {
-    if (client->exp != client->context_exp) {
+    if (exp != client->context_exp) {
         client->contexts.count = 0;
     }
 }
@@ -486,11 +486,15 @@ static int nbd_negotiate_handle_export_name(NBDClient *client, bool no_zeroes,
         error_setg(errp, "export not found");
         return -EINVAL;
     }
+    nbd_check_meta_export(client, client->exp);

     myflags = client->exp->nbdflags;
     if (client->structured_reply) {
         myflags |= NBD_FLAG_SEND_DF;
     }
+    if (client->extended_headers && client->contexts.count) {
+        myflags |= NBD_FLAG_BLOCK_STAT_PAYLOAD;
+    }
     trace_nbd_negotiate_new_style_size_flags(client->exp->size, myflags);
     stq_be_p(buf, client->exp->size);
     stw_be_p(buf + 8, myflags);
@@ -503,7 +507,6 @@ static int nbd_negotiate_handle_export_name(NBDClient *client, bool no_zeroes,

     QTAILQ_INSERT_TAIL(&client->exp->clients, client, next);
     blk_exp_ref(&client->exp->common);
-    nbd_check_meta_export(client);

     return 0;
 }
@@ -623,6 +626,9 @@ static int nbd_negotiate_handle_info(NBDClient *client, Error **errp)
                                           errp, "export '%s' not present",
                                           sane_name);
     }
+    if (client->opt == NBD_OPT_GO) {
+        nbd_check_meta_export(client, exp);
+    }

     /* Don't bother sending NBD_INFO_NAME unless client requested it */
     if (sendname) {
@@ -676,6 +682,10 @@ static int nbd_negotiate_handle_info(NBDClient *client, Error **errp)
     if (client->structured_reply) {
         myflags |= NBD_FLAG_SEND_DF;
     }
+    if (client->extended_headers &&
+        (client->contexts.count || client->opt == NBD_OPT_INFO)) {
+        myflags |= NBD_FLAG_BLOCK_STAT_PAYLOAD;
+    }
     trace_nbd_negotiate_new_style_size_flags(exp->size, myflags);
     stq_be_p(buf, exp->size);
     stw_be_p(buf + 8, myflags);
@@ -711,7 +721,6 @@ static int nbd_negotiate_handle_info(NBDClient *client, Error **errp)
         client->check_align = check_align;
         QTAILQ_INSERT_TAIL(&client->exp->clients, client, next);
         blk_exp_ref(&client->exp->common);
-        nbd_check_meta_export(client);
         rc = 1;
     }
     return rc;
@@ -2406,6 +2415,83 @@ static int nbd_co_send_bitmap(NBDClient *client, NBDRequest *request,
     return nbd_co_send_extents(client, request, ea, last, context_id, errp);
 }

+/*
+ * nbd_co_block_status_payload_read
+ * Called when a client wants a subset of negotiated contexts via a
+ * BLOCK_STATUS payload.  Check the payload for valid length and
+ * contents.  On success, return 0 with request updated to effective
+ * length.  If request was invalid but payload consumed, return 0 with
+ * request->len and request->contexts.count set to 0 (which will
+ * trigger an appropriate NBD_EINVAL response later on).  On I/O
+ * error, return -EIO.
+ */
+static int
+nbd_co_block_status_payload_read(NBDClient *client, NBDRequest *request,
+                                 Error **errp)
+{
+    int payload_len = request->len;
+    g_autofree char *buf = NULL;
+    g_autofree bool *bitmaps = NULL;
+    size_t count, i;
+    uint32_t id;
+
+    assert(request->len <= NBD_MAX_BUFFER_SIZE);
+    if (payload_len % sizeof(uint32_t) ||
+        payload_len < sizeof(NBDBlockStatusPayload) ||
+        payload_len > (sizeof(NBDBlockStatusPayload) +
+                       sizeof(id) * client->contexts.count)) {
+        goto skip;
+    }
+
+    buf = g_malloc(payload_len);
+    if (nbd_read(client->ioc, buf, payload_len,
+                 "CMD_BLOCK_STATUS data", errp) < 0) {
+        return -EIO;
+    }
+    trace_nbd_co_receive_request_payload_received(request->handle,
+                                                  payload_len);
+    memset(&request->contexts, 0, sizeof(request->contexts));
+    request->contexts.nr_bitmaps = client->context_exp->nr_export_bitmaps;
+    bitmaps = g_new0(bool, request->contexts.nr_bitmaps);
+    count = (payload_len - sizeof(NBDBlockStatusPayload)) / sizeof(id);
+    payload_len = 0;
+
+    for (i = 0; i < count; i++) {
+
+        id = ldl_be_p(buf + sizeof(NBDBlockStatusPayload) + sizeof(id) * i);
+        if (id == NBD_META_ID_BASE_ALLOCATION) {
+            if (request->contexts.base_allocation) {
+                goto skip;
+            }
+            request->contexts.base_allocation = true;
+        } else if (id == NBD_META_ID_ALLOCATION_DEPTH) {
+            if (request->contexts.allocation_depth) {
+                goto skip;
+            }
+            request->contexts.allocation_depth = true;
+        } else {
+            if (id - NBD_META_ID_DIRTY_BITMAP >
+                request->contexts.nr_bitmaps ||
+                bitmaps[id - NBD_META_ID_DIRTY_BITMAP]) {
+                goto skip;
+            }
+            bitmaps[id - NBD_META_ID_DIRTY_BITMAP] = true;
+        }
+    }
+
+    request->len = ldq_be_p(buf);
+    request->contexts.count = count;
+    request->contexts.bitmaps = bitmaps;
+    bitmaps = NULL;
+    return 0;
+
+ skip:
+    trace_nbd_co_receive_block_status_payload_compliance(request->from,
+                                                         request->len);
+    request->len = request->contexts.count = 0;
+    return nbd_drop(client->ioc, payload_len, errp);
+}
+
 /* nbd_co_receive_request
  * Collect a client request. Return 0 if request looks valid, -EIO to drop
  * connection right away, -EAGAIN to indicate we were interrupted and the
@@ -2452,7 +2538,14 @@ static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request,

         if (request->type == NBD_CMD_WRITE || extended_with_payload) {
             payload_len = request->len;
-            if (request->type != NBD_CMD_WRITE) {
+            if (request->type == NBD_CMD_BLOCK_STATUS) {
+                payload_len = nbd_co_block_status_payload_read(client,
+                                                               request,
+                                                               errp);
+                if (payload_len < 0) {
+                    return -EIO;
+                }
+            } else if (request->type != NBD_CMD_WRITE) {
                 /*
                  * For now, we don't support payloads on other
                  * commands; but we can keep the connection alive.
@@ -2528,6 +2621,9 @@ static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request,
         valid_flags |= NBD_CMD_FLAG_NO_HOLE | NBD_CMD_FLAG_FAST_ZERO;
     } else if (request->type == NBD_CMD_BLOCK_STATUS) {
         valid_flags |= NBD_CMD_FLAG_REQ_ONE;
+        if (client->extended_headers && client->contexts.count) {
+            valid_flags |= NBD_CMD_FLAG_PAYLOAD_LEN;
+        }
     }
     if (request->flags & ~valid_flags) {
         error_setg(errp, "unsupported flags for command %s (got 0x%x)",
diff --git a/qemu-nbd.c b/qemu-nbd.c
index 9404164487..4ae5c5e73e 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -222,6 +222,7 @@ static int qemu_nbd_client_list(SocketAddress *saddr, QCryptoTLSCreds *tls,
                 [NBD_FLAG_SEND_RESIZE_BIT]          = "resize",
                 [NBD_FLAG_SEND_CACHE_BIT]           = "cache",
                 [NBD_FLAG_SEND_FAST_ZERO_BIT]       = "fast-zero",
+                [NBD_FLAG_BLOCK_STAT_PAYLOAD_BIT]   = "block-status-payload",
             };

             printf("  size:  %" PRIu64 "\n", list[i].size);
diff --git a/nbd/trace-events b/nbd/trace-events
index c20df33a43..da92fe1b56 100644
--- a/nbd/trace-events
+++ b/nbd/trace-events
@@ -70,6 +70,7 @@ nbd_co_send_structured_read(uint64_t handle, uint64_t offset, void *data, size_t
 nbd_co_send_structured_read_hole(uint64_t handle, uint64_t offset, size_t size) "Send structured read hole reply: handle = %" PRIu64 ", offset = %" PRIu64 ", len = %zu"
 nbd_co_send_extents(uint64_t handle, unsigned int extents, uint32_t id, uint64_t length, int last) "Send block status reply: handle = %" PRIu64 ", extents = %u, context = %d (extents cover %" PRIu64 " bytes, last chunk = %d)"
 nbd_co_send_structured_error(uint64_t handle, int err, const char *errname, const char *msg) "Send structured error reply: handle = %" PRIu64 ", error = %d (%s), msg = '%s'"
+nbd_co_receive_block_status_payload_compliance(uint64_t from, int len) "client sent unusable block status payload: from=0x%" PRIx64 ", len=0x%x"
 nbd_co_receive_request_decode_type(uint64_t handle, uint16_t type, const char *name) "Decoding type: handle = %" PRIu64 ", type = %" PRIu16 " (%s)"
 nbd_co_receive_request_payload_received(uint64_t handle, uint64_t len) "Payload received: handle = %" PRIu64 ", len = %" PRIu64
 nbd_co_receive_ext_payload_compliance(uint64_t from, uint64_t len) "client sent non-compliant write without payload flag: from=0x%" PRIx64 ", len=0x%" PRIx64
diff --git a/tests/qemu-iotests/223.out b/tests/qemu-iotests/223.out
index b98582c38e..b38f0b7963 100644
--- a/tests/qemu-iotests/223.out
+++ b/tests/qemu-iotests/223.out
@@ -83,7 +83,7 @@ exports available: 0
 exports available: 3
  export: 'n'
   size:  4194304
-  flags: 0x58f ( readonly flush fua df multi cache )
+  flags: 0x158f ( readonly flush fua df multi cache block-status-payload )
   min block: 1
   opt block: 4096
   max block: 33554432
@@ -94,7 +94,7 @@ exports available: 3
  export: 'n2'
   description: some text
   size:  4194304
-  flags: 0xded ( flush fua trim zeroes df multi cache fast-zero )
+  flags: 0x1ded ( flush fua trim zeroes df multi cache fast-zero block-status-payload )
   min block: 1
   opt block: 4096
   max block: 33554432
@@ -104,7 +104,7 @@ exports available: 3
    qemu:dirty-bitmap:b2
  export: 'n3'
   size:  4194304
-  flags: 0x58f ( readonly flush fua df multi cache )
+  flags: 0x158f ( readonly flush fua df multi cache block-status-payload )
   min block: 1
   opt block: 4096
   max block: 33554432
@@ -205,7 +205,7 @@ exports available: 0
 exports available: 3
  export: 'n'
   size:  4194304
-  flags: 0x58f ( readonly flush fua df multi cache )
+  flags: 0x158f ( readonly flush fua df multi cache block-status-payload )
   min block: 1
   opt block: 4096
   max block: 33554432
@@ -216,7 +216,7 @@ exports available: 3
  export: 'n2'
   description: some text
   size:  4194304
-  flags: 0xded ( flush fua trim zeroes df multi cache fast-zero )
+  flags: 0x1ded ( flush fua trim zeroes df multi cache fast-zero block-status-payload )
   min block: 1
   opt block: 4096
   max block: 33554432
@@ -226,7 +226,7 @@ exports available: 3
    qemu:dirty-bitmap:b2
  export: 'n3'
   size:  4194304
-  flags: 0x58f ( readonly flush fua df multi cache )
+  flags: 0x158f ( readonly flush fua df multi cache block-status-payload )
   min block: 1
   opt block: 4096
   max block: 33554432
diff --git a/tests/qemu-iotests/307.out b/tests/qemu-iotests/307.out
index 2b9a6a67a1..f645f3315f 100644
--- a/tests/qemu-iotests/307.out
+++ b/tests/qemu-iotests/307.out
@@ -15,7 +15,7 @@ wrote 4096/4096 bytes at offset 0
 exports available: 1
  export: 'fmt'
   size:  67108864
-  flags: 0x58f ( readonly flush fua df multi cache )
+  flags: 0x158f ( readonly flush fua df multi cache block-status-payload )
   min block: XXX
   opt block: XXX
   max block: XXX
@@ -44,7 +44,7 @@ exports available: 1
 exports available: 1
  export: 'fmt'
   size:  67108864
-  flags: 0x58f ( readonly flush fua df multi cache )
+  flags: 0x158f ( readonly flush fua df multi cache block-status-payload )
   min block: XXX
   opt block: XXX
   max block: XXX
@@ -76,7 +76,7 @@ exports available: 1
 exports available: 2
  export: 'fmt'
   size:  67108864
-  flags: 0x58f ( readonly flush fua df multi cache )
+  flags: 0x158f ( readonly flush fua df multi cache block-status-payload )
   min block: XXX
   opt block: XXX
   max block: XXX
@@ -86,7 +86,7 @@ exports available: 2
  export: 'export1'
   description: This is the writable second export
   size:  67108864
-  flags: 0xded ( flush fua trim zeroes df multi cache fast-zero )
+  flags: 0x1ded ( flush fua trim zeroes df multi cache fast-zero block-status-payload )
   min block: XXX
   opt block: XXX
   max block: XXX
@@ -113,7 +113,7 @@ exports available: 1
  export: 'export1'
   description: This is the writable second export
   size:  67108864
-  flags: 0xded ( flush fua trim zeroes df multi cache fast-zero )
+  flags: 0x1ded ( flush fua trim zeroes df multi cache fast-zero block-status-payload )
   min block: XXX
   opt block: XXX
   max block: XXX
diff --git a/tests/qemu-iotests/tests/nbd-qemu-allocation.out b/tests/qemu-iotests/tests/nbd-qemu-allocation.out
index 659276032b..794d1bfce6 100644
--- a/tests/qemu-iotests/tests/nbd-qemu-allocation.out
+++ b/tests/qemu-iotests/tests/nbd-qemu-allocation.out
@@ -17,7 +17,7 @@ wrote 2097152/2097152 bytes at offset 1048576
 exports available: 1
  export: ''
   size:  4194304
-  flags: 0x48f ( readonly flush fua df cache )
+  flags: 0x148f ( readonly flush fua df cache block-status-payload )
   min block: 1
   opt block: 4096
   max block: 33554432
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v2 14/15] RFC: nbd/client: Accept 64-bit hole chunks
  2022-11-14 22:48 ` [PATCH v2 00/15] qemu patches for 64-bit NBD extensions Eric Blake
                     ` (12 preceding siblings ...)
  2022-11-14 22:48   ` [PATCH v2 13/15] nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS Eric Blake
@ 2022-11-14 22:48   ` Eric Blake
  2022-11-14 22:48   ` [PATCH v2 15/15] RFC: nbd/server: Send 64-bit hole chunk Eric Blake
  14 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, libguestfs, nbd, Vladimir Sementsov-Ogievskiy,
	Kevin Wolf, Hanna Reitz

As part of adding extended headers, there was debate on whether the NBD
spec should also add support for reading 64-bit holes.  That support was
documented in a separate upstream commit XXX[*] to make it easier to
decide whether 64-bit holes should be required of all clients supporting
extended headers, or whether they are an unneeded feature; hence, the
qemu work to support them is also pulled out into a separate commit.

Note that we can also tolerate a non-compliant server sending the new
chunk even when it should not.
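
For illustration, a standalone sketch (not part of this patch) of the
parsing the 'wide' flag below enables: the 32-bit hole chunk carries 12
bytes of payload, the 64-bit variant carries 16, and both share the same
leading offset field.  The helper name is made up:

  #include <endian.h>
  #include <stdbool.h>
  #include <stdint.h>
  #include <string.h>

  static void parse_hole_payload(const uint8_t *payload, bool wide,
                                 uint64_t *offset, uint64_t *length)
  {
      uint64_t v64;
      uint32_t v32;

      memcpy(&v64, payload, 8);          /* 64-bit offset in both forms */
      *offset = be64toh(v64);
      if (wide) {                        /* NBD_REPLY_TYPE_OFFSET_HOLE_EXT */
          memcpy(&v64, payload + 8, 8);
          *length = be64toh(v64);
      } else {                           /* NBD_REPLY_TYPE_OFFSET_HOLE */
          memcpy(&v32, payload + 8, 4);
          *length = be32toh(v32);
      }
  }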

Signed-off-by: Eric Blake <eblake@redhat.com>

---
[*] Fix commit id if we actually go with this idea
---
 include/block/nbd.h |  8 ++++++++
 block/nbd.c         | 26 ++++++++++++++++++++------
 nbd/common.c        |  4 +++-
 3 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 2a65c606c9..18b6bad038 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -133,6 +133,13 @@ typedef struct NBDStructuredReadHole {
     uint32_t length;
 } QEMU_PACKED NBDStructuredReadHole;

+/* Complete chunk for NBD_REPLY_TYPE_OFFSET_HOLE_EXT */
+typedef struct NBDStructuredReadHoleExt {
+    /* header's length == 16 */
+    uint64_t offset;
+    uint64_t length;
+} QEMU_PACKED NBDStructuredReadHoleExt;
+
 /* Header of all NBD_REPLY_TYPE_ERROR* errors */
 typedef struct NBDStructuredError {
     /* header's length >= 6 */
@@ -309,6 +316,7 @@ enum {
 #define NBD_REPLY_TYPE_NONE              0
 #define NBD_REPLY_TYPE_OFFSET_DATA       1
 #define NBD_REPLY_TYPE_OFFSET_HOLE       2
+#define NBD_REPLY_TYPE_OFFSET_HOLE_EXT   3
 #define NBD_REPLY_TYPE_BLOCK_STATUS      5
 #define NBD_REPLY_TYPE_BLOCK_STATUS_EXT  6
 #define NBD_REPLY_TYPE_ERROR             NBD_REPLY_ERR(1)
diff --git a/block/nbd.c b/block/nbd.c
index 44ab5437ea..968d5d8a37 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -570,20 +570,26 @@ static inline uint64_t payload_advance64(uint8_t **payload)

 static int nbd_parse_offset_hole_payload(BDRVNBDState *s,
                                          NBDStructuredReplyChunk *chunk,
-                                         uint8_t *payload, uint64_t orig_offset,
+                                         uint8_t *payload, bool wide,
+                                         uint64_t orig_offset,
                                          QEMUIOVector *qiov, Error **errp)
 {
     uint64_t offset;
-    uint32_t hole_size;
+    uint64_t hole_size;
+    size_t len = wide ? sizeof(hole_size) : sizeof(uint32_t);

-    if (chunk->length != sizeof(offset) + sizeof(hole_size)) {
+    if (chunk->length != sizeof(offset) + len) {
         error_setg(errp, "Protocol error: invalid payload for "
                          "NBD_REPLY_TYPE_OFFSET_HOLE");
         return -EINVAL;
     }

     offset = payload_advance64(&payload);
-    hole_size = payload_advance32(&payload);
+    if (wide) {
+        hole_size = payload_advance64(&payload);
+    } else {
+        hole_size = payload_advance32(&payload);
+    }

     if (!hole_size || offset < orig_offset || hole_size > qiov->size ||
         offset > orig_offset + qiov->size - hole_size) {
@@ -596,6 +602,7 @@ static int nbd_parse_offset_hole_payload(BDRVNBDState *s,
         trace_nbd_structured_read_compliance("hole");
     }

+    assert(hole_size <= SIZE_MAX);
     qemu_iovec_memset(qiov, offset - orig_offset, 0, hole_size);

     return 0;
@@ -1094,9 +1101,16 @@ static int coroutine_fn nbd_co_receive_cmdread_reply(BDRVNBDState *s, uint64_t h
              * in qiov
              */
             break;
+        case NBD_REPLY_TYPE_OFFSET_HOLE_EXT:
+            if (!s->info.extended_headers) {
+                trace_nbd_extended_headers_compliance("hole_ext");
+            }
+            /* fallthrough */
         case NBD_REPLY_TYPE_OFFSET_HOLE:
-            ret = nbd_parse_offset_hole_payload(s, &reply.structured, payload,
-                                                offset, qiov, &local_err);
+            ret = nbd_parse_offset_hole_payload(
+                s, &reply.structured, payload,
+                chunk->type == NBD_REPLY_TYPE_OFFSET_HOLE_EXT,
+                offset, qiov, &local_err);
             if (ret < 0) {
                 nbd_channel_error(s, ret);
                 nbd_iter_channel_error(&iter, ret, &local_err);
diff --git a/nbd/common.c b/nbd/common.c
index 137466defd..54f7d6a4fd 100644
--- a/nbd/common.c
+++ b/nbd/common.c
@@ -174,7 +174,9 @@ const char *nbd_reply_type_lookup(uint16_t type)
     case NBD_REPLY_TYPE_OFFSET_DATA:
         return "data";
     case NBD_REPLY_TYPE_OFFSET_HOLE:
-        return "hole";
+        return "hole (32-bit)";
+    case NBD_REPLY_TYPE_OFFSET_HOLE_EXT:
+        return "hole (64-bit)";
     case NBD_REPLY_TYPE_BLOCK_STATUS:
         return "block status (32-bit)";
     case NBD_REPLY_TYPE_BLOCK_STATUS_EXT:
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v2 15/15] RFC: nbd/server: Send 64-bit hole chunk
  2022-11-14 22:48 ` [PATCH v2 00/15] qemu patches for 64-bit NBD extensions Eric Blake
                     ` (13 preceding siblings ...)
  2022-11-14 22:48   ` [PATCH v2 14/15] RFC: nbd/client: Accept 64-bit hole chunks Eric Blake
@ 2022-11-14 22:48   ` Eric Blake
  14 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, libguestfs, nbd, Vladimir Sementsov-Ogievskiy

Since we cap NBD_CMD_READ requests to 32M, we never have a reason to
send a 64-bit chunk type for a hole; but it is worth producing these
for interoperability testing of clients that want extended headers.
---
 nbd/server.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index cd280f1721..04cb172f97 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2112,9 +2112,13 @@ static int coroutine_fn nbd_co_send_sparse_read(NBDClient *client,
         if (status & BDRV_BLOCK_ZERO) {
             NBDReply hdr;
             NBDStructuredReadHole chunk;
+            NBDStructuredReadHoleExt chunk_ext;
             struct iovec iov[] = {
                 {.iov_base = &hdr},
-                {.iov_base = &chunk, .iov_len = sizeof(chunk)},
+                {.iov_base = client->extended_headers ? &chunk_ext
+                 : (void *) &chunk,
+                 .iov_len = client->extended_headers ? sizeof(chunk_ext)
+                 : sizeof(chunk)},
             };

             trace_nbd_co_send_structured_read_hole(request->handle,
@@ -2122,9 +2126,17 @@ static int coroutine_fn nbd_co_send_sparse_read(NBDClient *client,
                                                    pnum);
             set_be_chunk(client, &iov[0],
                          final ? NBD_REPLY_FLAG_DONE : 0,
-                         NBD_REPLY_TYPE_OFFSET_HOLE, request, iov[1].iov_len);
-            stq_be_p(&chunk.offset, offset + progress);
-            stl_be_p(&chunk.length, pnum);
+                         client->extended_headers
+                         ? NBD_REPLY_TYPE_OFFSET_HOLE_EXT
+                         : NBD_REPLY_TYPE_OFFSET_HOLE,
+                         request, iov[1].iov_len);
+            if (client->extended_headers) {
+                stq_be_p(&chunk_ext.offset, offset + progress);
+                stq_be_p(&chunk_ext.length, pnum);
+            } else {
+                stq_be_p(&chunk.offset, offset + progress);
+                stl_be_p(&chunk.length, pnum);
+            }
             ret = nbd_co_send_iov(client, iov, 2, errp);
         } else {
             ret = blk_pread(exp->common.blk, offset + progress, pnum,
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [libnbd PATCH v2 00/23] libnbd 64-bit NBD extensions
  2022-11-14 22:41 [cross-project PATCH v2] NBD 64-bit extensions Eric Blake
  2022-11-14 22:46 ` [PATCH v2 0/6] NBD spec changes for " Eric Blake
  2022-11-14 22:48 ` [PATCH v2 00/15] qemu patches for 64-bit NBD extensions Eric Blake
@ 2022-11-14 22:51 ` Eric Blake
  2022-11-14 22:51   ` [libnbd PATCH v2 01/23] block_status: Refactor array storage Eric Blake
                     ` (22 more replies)
  2 siblings, 23 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:51 UTC (permalink / raw)
  To: libguestfs; +Cc: qemu-devel, qemu-block, nbd

This series is posted alongside a spec change to NBD, and is
interoperable with changes posted to qemu-nbd/qemu-storage-daemon.
The RFC patch at the end is optional; interoperability with qemu works
only when either both projects omit the RFC patch, or when both
projects include it (if only one of the two projects includes it,
the protocol is incompatible between the two, but at least client and
server gracefully detect the bug rather than SEGV'ing).

Eric Blake (23):
  block_status: Refactor array storage
  internal: Refactor layout of replies in sbuf
  protocol: Add definitions for extended headers
  states: Prepare to send 64-bit requests
  states: Prepare to receive 64-bit replies
  states: Break deadlock if server goofs on extended replies
  generator: Add struct nbd_extent in prep for 64-bit extents
  block_status: Track 64-bit extents internally
  block_status: Accept 64-bit extents during block status
  api: Add [aio_]nbd_block_status_64
  api: Add several functions for controlling extended headers
  copy: Update nbdcopy to use 64-bit block status
  dump: Update nbddump to use 64-bit block status
  info: Expose extended-headers support through nbdinfo
  info: Update nbdinfo --map to use 64-bit block status
  examples: Update copy-libev to use 64-bit block status
  ocaml: Add example for 64-bit extents
  generator: Actually request extended headers
  api: Add nbd_[aio_]opt_extended_headers()
  interop: Add test of 64-bit block status
  api: Add nbd_can_block_status_payload()
  api: Add nbd_[aio_]block_status_filter()
  RFC: pread: Accept 64-bit holes

 docs/libnbd.pod                               |  18 +-
 info/nbdinfo.pod                              |  21 +-
 sh/nbdsh.pod                                  |   2 +-
 lib/internal.h                                |  42 +-
 lib/nbd-protocol.h                            | 120 ++--
 generator/API.ml                              | 532 +++++++++++++++---
 generator/API.mli                             |   1 +
 generator/C.ml                                |  24 +-
 generator/GoLang.ml                           |  24 +
 generator/Makefile.am                         |   3 +-
 generator/OCaml.ml                            |  18 +-
 generator/Python.ml                           |  20 +-
 generator/state_machine.ml                    |  50 +-
 generator/states-issue-command.c              |  33 +-
 .../states-newstyle-opt-extended-headers.c    | 110 ++++
 generator/states-newstyle-opt-starttls.c      |   7 +-
 .../states-newstyle-opt-structured-reply.c    |   3 +-
 generator/states-newstyle.c                   |   3 +
 generator/states-reply-simple.c               |   4 +-
 generator/states-reply-structured.c           | 279 ++++++---
 generator/states-reply.c                      |  57 +-
 lib/aio.c                                     |   7 +-
 lib/flags.c                                   |  11 +
 lib/handle.c                                  |  25 +-
 lib/opt.c                                     |  44 ++
 lib/rw.c                                      | 250 +++++++-
 python/t/110-defaults.py                      |   1 +
 python/t/120-set-non-defaults.py              |   2 +
 python/t/465-block-status-64.py               |  56 ++
 ocaml/examples/Makefile.am                    |   3 +-
 ocaml/examples/extents64.ml                   |  42 ++
 ocaml/helpers.c                               |  20 +
 ocaml/nbd-c.h                                 |   3 +-
 ocaml/tests/Makefile.am                       |   1 +
 ocaml/tests/test_110_defaults.ml              |   2 +
 ocaml/tests/test_120_set_non_defaults.ml      |   3 +
 ocaml/tests/test_465_block_status_64.ml       |  58 ++
 tests/Makefile.am                             |   4 +
 tests/meta-base-allocation.c                  | 111 +++-
 tests/pwrite-extended.c                       | 112 ++++
 examples/copy-libev.c                         |  21 +-
 examples/server-flags.c                       |   7 +-
 interop/Makefile.am                           |  18 +
 interop/block-status-payload.c                | 241 ++++++++
 interop/block-status-payload.sh               |  80 +++
 interop/large-status.c                        | 186 ++++++
 interop/large-status.sh                       |  49 ++
 interop/opt-extended-headers.c                | 153 +++++
 interop/opt-extended-headers.sh               |  29 +
 .gitignore                                    |   4 +
 copy/nbd-ops.c                                |  22 +-
 dump/dump.c                                   |  27 +-
 fuzzing/libnbd-fuzz-wrapper.c                 |  22 +-
 golang/Makefile.am                            |   1 +
 golang/handle.go                              |   8 +-
 golang/libnbd_110_defaults_test.go            |   8 +
 golang/libnbd_120_set_non_defaults_test.go    |  12 +
 golang/libnbd_465_block_status_64_test.go     | 119 ++++
 info/can.c                                    |  14 +
 info/info-can.sh                              |  30 +
 info/info-packets.sh                          |  17 +-
 info/main.c                                   |   7 +-
 info/map.c                                    |  67 +--
 info/show.c                                   |   9 +-
 64 files changed, 2920 insertions(+), 357 deletions(-)
 create mode 100644 generator/states-newstyle-opt-extended-headers.c
 create mode 100644 python/t/465-block-status-64.py
 create mode 100644 ocaml/examples/extents64.ml
 create mode 100644 ocaml/tests/test_465_block_status_64.ml
 create mode 100644 tests/pwrite-extended.c
 create mode 100644 interop/block-status-payload.c
 create mode 100755 interop/block-status-payload.sh
 create mode 100644 interop/large-status.c
 create mode 100755 interop/large-status.sh
 create mode 100644 interop/opt-extended-headers.c
 create mode 100755 interop/opt-extended-headers.sh
 create mode 100644 golang/libnbd_465_block_status_64_test.go

-- 
2.38.1



^ permalink raw reply	[flat|nested] 65+ messages in thread

* [libnbd PATCH v2 01/23] block_status: Refactor array storage
  2022-11-14 22:51 ` [libnbd PATCH v2 00/23] libnbd 64-bit NBD extensions Eric Blake
@ 2022-11-14 22:51   ` Eric Blake
  2022-11-14 22:51   ` [libnbd PATCH v2 02/23] internal: Refactor layout of replies in sbuf Eric Blake
                     ` (21 subsequent siblings)
  22 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:51 UTC (permalink / raw)
  To: libguestfs; +Cc: qemu-devel, qemu-block, nbd

For 32-bit block status, we were able to cheat and use an array with
an odd number of elements, with array[0] holding the context id, and
passing &array[1] to the user's callback.  But once we have 64-bit
extents, we can no longer abuse array element 0 like that, for two
reasons: 64-bit extents contain uint64_t which might not be
alignment-compatible with an array of uint32_t on all architectures,
and the new NBD_REPLY_TYPE_BLOCK_STATUS_EXT adds an additional count
field before the array.

Split out a new state REPLY.STRUCTURED_REPLY.RECV_BS_HEADER to receive
the context id (and eventually the new count field for 64-bit replies)
separately from the extents array, and add another structured_reply
type in the payload section for tracking it.  No behavioral change,
other than the rare possibility of landing in the new state.
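
As a rough illustration (not from this patch; the extent struct below is
hypothetical, the real one arrives later in this series), the alignment
problem with the old trick looks like this:

  #include <stdint.h>

  struct extent64 {                /* hypothetical 64-bit extent */
    uint64_t length;
    uint64_t flags;
  };

  uint32_t entries[1 + 2 * 16];    /* old layout: context id in entries[0] */

  /* On ABIs where uint64_t needs 8-byte alignment, &entries[1] is only
   * guaranteed 4-byte alignment, so this cast may be undefined behavior;
   * it also leaves no room for the extra count field that
   * NBD_REPLY_TYPE_BLOCK_STATUS_EXT places before the array.
   */
  struct extent64 *ext = (struct extent64 *) &entries[1];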
---
 lib/internal.h                      |  1 +
 lib/nbd-protocol.h                  | 19 ++++++----
 generator/state_machine.ml          |  9 ++++-
 generator/states-reply-structured.c | 56 ++++++++++++++++++++---------
 4 files changed, 61 insertions(+), 24 deletions(-)

diff --git a/lib/internal.h b/lib/internal.h
index bbbd2639..fe81f1a0 100644
--- a/lib/internal.h
+++ b/lib/internal.h
@@ -241,6 +241,7 @@ struct nbd_handle {
       union {
         struct nbd_structured_reply_offset_data offset_data;
         struct nbd_structured_reply_offset_hole offset_hole;
+        struct nbd_structured_reply_block_status_hdr bs_hdr;
         struct {
           struct nbd_structured_reply_error error;
           char msg[NBD_MAX_STRING]; /* Common to all error types */
diff --git a/lib/nbd-protocol.h b/lib/nbd-protocol.h
index e5d6404b..4400d3ab 100644
--- a/lib/nbd-protocol.h
+++ b/lib/nbd-protocol.h
@@ -1,5 +1,5 @@
 /* nbdkit
- * Copyright (C) 2013-2020 Red Hat Inc.
+ * Copyright (C) 2013-2022 Red Hat Inc.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
@@ -182,12 +182,6 @@ struct nbd_fixed_new_option_reply_meta_context {
   /* followed by a string */
 } NBD_ATTRIBUTE_PACKED;

-/* NBD_REPLY_TYPE_BLOCK_STATUS block descriptor. */
-struct nbd_block_descriptor {
-  uint32_t length;              /* length of block */
-  uint32_t status_flags;        /* block type (hole etc) */
-} NBD_ATTRIBUTE_PACKED;
-
 /* Request (client -> server). */
 struct nbd_request {
   uint32_t magic;               /* NBD_REQUEST_MAGIC. */
@@ -224,6 +218,17 @@ struct nbd_structured_reply_offset_hole {
   uint32_t length;              /* Length of hole. */
 } NBD_ATTRIBUTE_PACKED;

+/* NBD_REPLY_TYPE_BLOCK_STATUS block descriptor. */
+struct nbd_block_descriptor {
+  uint32_t length;              /* length of block */
+  uint32_t status_flags;        /* block type (hole etc) */
+} NBD_ATTRIBUTE_PACKED;
+
+struct nbd_structured_reply_block_status_hdr {
+  uint32_t context_id;          /* metadata context ID */
+  /* followed by array of nbd_block_descriptor extents */
+} NBD_ATTRIBUTE_PACKED;
+
 struct nbd_structured_reply_error {
   uint32_t error;               /* NBD_E* error number */
   uint16_t len;                 /* Length of human readable error. */
diff --git a/generator/state_machine.ml b/generator/state_machine.ml
index 257cb4f4..d2c326f3 100644
--- a/generator/state_machine.ml
+++ b/generator/state_machine.ml
@@ -871,10 +871,17 @@ and
     external_events = [];
   };

+  State {
+    default_state with
+    name = "RECV_BS_HEADER";
+    comment = "Receive header of structured reply block-status payload";
+    external_events = [];
+  };
+
   State {
     default_state with
     name = "RECV_BS_ENTRIES";
-    comment = "Receive a structured reply block-status payload";
+    comment = "Receive entries array of structured reply block-status payload";
     external_events = [];
   };

diff --git a/generator/states-reply-structured.c b/generator/states-reply-structured.c
index 2456e6da..bbd3de0c 100644
--- a/generator/states-reply-structured.c
+++ b/generator/states-reply-structured.c
@@ -126,19 +126,10 @@  REPLY.STRUCTURED_REPLY.CHECK:
         length < 12 || ((length-4) & 7) != 0)
       goto resync;
     assert (CALLBACK_IS_NOT_NULL (cmd->cb.fn.extent));
-    /* We read the context ID followed by all the entries into a
-     * single array and deal with it at the end.
-     */
-    free (h->bs_entries);
-    h->bs_entries = malloc (length);
-    if (h->bs_entries == NULL) {
-      SET_NEXT_STATE (%.DEAD);
-      set_error (errno, "malloc");
-      break;
-    }
-    h->rbuf = h->bs_entries;
-    h->rlen = length;
-    SET_NEXT_STATE (%RECV_BS_ENTRIES);
+    /* Start by reading the context ID. */
+    h->rbuf = &h->sbuf.sr.payload.bs_hdr;
+    h->rlen = sizeof h->sbuf.sr.payload.bs_hdr;
+    SET_NEXT_STATE (%RECV_BS_HEADER);
     break;

   default:
@@ -424,9 +415,41 @@  REPLY.STRUCTURED_REPLY.RECV_OFFSET_HOLE:
   }
   return 0;

+ REPLY.STRUCTURED_REPLY.RECV_BS_HEADER:
+  struct command *cmd = h->reply_cmd;
+  uint32_t length;
+
+  switch (recv_into_rbuf (h)) {
+  case -1: SET_NEXT_STATE (%.DEAD); return 0;
+  case 1:
+    save_reply_state (h);
+    SET_NEXT_STATE (%.READY);
+    return 0;
+  case 0:
+    length = be32toh (h->sbuf.sr.structured_reply.length);
+
+    assert (cmd); /* guaranteed by CHECK */
+    assert (cmd->type == NBD_CMD_BLOCK_STATUS);
+    assert (length >= 12);
+    length -= sizeof h->sbuf.sr.payload.bs_hdr;
+
+    free (h->bs_entries);
+    h->bs_entries = malloc (length);
+    if (h->bs_entries == NULL) {
+      SET_NEXT_STATE (%.DEAD);
+      set_error (errno, "malloc");
+      return 0;
+    }
+    h->rbuf = h->bs_entries;
+    h->rlen = length;
+    SET_NEXT_STATE (%RECV_BS_ENTRIES);
+  }
+  return 0;
+
  REPLY.STRUCTURED_REPLY.RECV_BS_ENTRIES:
   struct command *cmd = h->reply_cmd;
   uint32_t length;
+  uint32_t count;
   size_t i;
   uint32_t context_id;

@@ -445,15 +468,16 @@  REPLY.STRUCTURED_REPLY.RECV_BS_ENTRIES:
     assert (h->bs_entries);
     assert (length >= 12);
     assert (h->meta_valid);
+    count = (length - sizeof h->sbuf.sr.payload.bs_hdr) / sizeof *h->bs_entries;

     /* Need to byte-swap the entries returned, but apart from that we
      * don't validate them.
      */
-    for (i = 0; i < length/4; ++i)
+    for (i = 0; i < count; ++i)
       h->bs_entries[i] = be32toh (h->bs_entries[i]);

     /* Look up the context ID. */
-    context_id = h->bs_entries[0];
+    context_id = be32toh (h->sbuf.sr.payload.bs_hdr.context_id);
     for (i = 0; i < h->meta_contexts.len; ++i)
       if (context_id == h->meta_contexts.ptr[i].context_id)
         break;
@@ -464,7 +488,7 @@  REPLY.STRUCTURED_REPLY.RECV_BS_ENTRIES:

       if (CALL_CALLBACK (cmd->cb.fn.extent,
                          h->meta_contexts.ptr[i].name, cmd->offset,
-                         &h->bs_entries[1], (length-4) / 4,
+                         h->bs_entries, count,
                          &error) == -1)
         if (cmd->error == 0)
           cmd->error = error ? error : EPROTO;
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [libnbd PATCH v2 02/23] internal: Refactor layout of replies in sbuf
  2022-11-14 22:51 ` [libnbd PATCH v2 00/23] libnbd 64-bit NBD extensions Eric Blake
  2022-11-14 22:51   ` [libnbd PATCH v2 01/23] block_status: Refactor array storage Eric Blake
@ 2022-11-14 22:51   ` Eric Blake
  2022-11-14 22:51   ` [libnbd PATCH v2 03/23] protocol: Add definitions for extended headers Eric Blake
                     ` (20 subsequent siblings)
  22 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:51 UTC (permalink / raw)
  To: libguestfs; +Cc: qemu-devel, qemu-block, nbd

In order to more easily add a third reply type with an even larger
header, but where the payload will look the same for both structured
and extended replies, it is nicer if simple and structured replies are
nested inside the same layer of sbuf.reply.hdr.  While at it, note
that while .or and .sr are structs declared within the overall sbuf
union, we never read into both halves of those structs at the same
time, so it does not matter if their two halves are consecutive.
Dropping the packed notation on those structs means the compiler can
align .payload more naturally, which may slightly improve performance
on some platforms, even if it makes the overall union a few bytes
larger due to padding.

Visually, this patch changes the layout from:

 offset  simple                structured
+------------------------------------------------------------+
|     union sbuf                                             |
|     +---------------------+------------------------------+ |
|     | struct simple_reply | union sr                     | |
|     | +-----------------+ | +--------------------------+ | |
|     | |                 | | | struct structured_reply  | | |
|     | |                 | | | +----------------------+ | | |
|  0  | | uint32_t magic  | | | | uint32_t magic       | | | |
|  4  | | uint32_t error  | | | | uint16_t flags       | | | |
|  6  | |                 | | | | uint16_t type        | | | |
|  8  | | uint64_t handle | | | | uint64_t handle      | | | |
|     | +-----------------+ | | |                      | | | |
| 16  | [padding]           | | | uint32_t length      | | | |
|     |                     | | +----------------------+ | | |
|     |                     | | union payload            | | |
|     |                     | | +-----------+----------+ | | |
| 20  |                     | | | ...       | ...      | | | |
|     |                     | | +-----------+----------+ | | |
|     |                     | +--------------------------+ | |
|     +---------------------+------------------------------+ |
+------------------------------------------------------------+

to:

 offset  simple                structured
+-------------------------------------------------------------+
|     union sbuf                                              |
|     +-----------------------------------------------------+ |
|     | struct reply                                        | |
|     | +-------------------------------------------------+ | |
|     | | union hdr                                       | | |
|     | | +--------------------+------------------------+ | | |
|     | | | struct simple      | struct structured      | | | |
|     | | | +----------------+ | +--------------------+ | | | |
|  0  | | | | uint32_t magic | | | uint32_t magic     | | | | |
|  4  | | | | uint32_t error | | | uint16_t flags     | | | | |
|  6  | | | |                | | | uint16_t type      | | | | |
|  8  | | | | uint64_t handle| | | uint64_t handle    | | | | |
|     | | | +----------------+ | |                    | | | | |
| 16  | | | [padding]          | | uint32_t length    | | | | |
|     | | |                    | +--------------------+ | | | |
| 20  | | |                    | [padding]              | | | |
|     | | +--------------------+------------------------+ | | |
|     | | union payload                                   | | |
|     | | +--------------------+------------------------+ | | |
| 24  | | | ...                | ...                    | | | |
|     | | +--------------------+------------------------+ | | |
|     | +-------------------------------------------------+ | |
|     +-----------------------------------------------------+ |
+-------------------------------------------------------------+

Technically, whether the payload union offset moves to byte 24 (with
20-23 now padding) or stays at 20 depends on compiler ABI; but many
systems prefer that any struct with a uint64_t provide 8-byte
alignment to its containing union.

The commit is largely mechanical, and there should be no semantic
change.
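
As a rough illustration (not from this patch), the padding question can
be probed with a simplified stand-in for the new nesting; the wire
structs are abbreviated and left unpacked here, so the printed offset
will not necessarily match what the real lib/internal.h definition
produces:

  #include <stddef.h>
  #include <stdint.h>
  #include <stdio.h>

  union mini_sbuf {
    struct {
      union {
        struct { uint32_t magic, error; uint64_t handle; } simple;
        struct { uint32_t magic; uint16_t flags, type;
                 uint64_t handle; uint32_t length; } structured;
      } hdr;
      union {
        struct { uint64_t offset; uint32_t length; } offset_hole;
      } payload;
    } reply;
  };

  int
  main (void)
  {
    printf ("payload union starts at byte %zu\n",
            offsetof (union mini_sbuf, reply.payload));
    return 0;
  }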
---
 lib/internal.h                      |  12 ++--
 generator/states-reply-simple.c     |   4 +-
 generator/states-reply-structured.c | 103 ++++++++++++++--------------
 generator/states-reply.c            |  10 +--
 4 files changed, 66 insertions(+), 63 deletions(-)

diff --git a/lib/internal.h b/lib/internal.h
index fe81f1a0..f81c41ba 100644
--- a/lib/internal.h
+++ b/lib/internal.h
@@ -230,14 +230,16 @@ struct nbd_handle {
         struct {
           struct nbd_fixed_new_option_reply_meta_context context;
           char str[NBD_MAX_STRING];
-        }  __attribute__((packed)) context;
+        } __attribute__((packed)) context;
         char err_msg[NBD_MAX_STRING];
       } payload;
-    }  __attribute__((packed)) or;
+    } or;
     struct nbd_export_name_option_reply export_name_reply;
-    struct nbd_simple_reply simple_reply;
     struct {
-      struct nbd_structured_reply structured_reply;
+      union {
+        struct nbd_simple_reply simple;
+        struct nbd_structured_reply structured;
+      } hdr;
       union {
         struct nbd_structured_reply_offset_data offset_data;
         struct nbd_structured_reply_offset_hole offset_hole;
@@ -248,7 +250,7 @@ struct nbd_handle {
           uint64_t offset; /* Only used for NBD_REPLY_TYPE_ERROR_OFFSET */
         } __attribute__((packed)) error;
       } payload;
-    }  __attribute__((packed)) sr;
+    } reply;
     uint16_t gflags;
     uint32_t cflags;
     uint32_t len;
diff --git a/generator/states-reply-simple.c b/generator/states-reply-simple.c
index a43d61dd..f390b5a5 100644
--- a/generator/states-reply-simple.c
+++ b/generator/states-reply-simple.c
@@ -23,7 +23,7 @@  REPLY.SIMPLE_REPLY.START:
   struct command *cmd = h->reply_cmd;
   uint32_t error;

-  error = be32toh (h->sbuf.simple_reply.error);
+  error = be32toh (h->sbuf.reply.hdr.simple.error);

   if (cmd == NULL) {
     /* Unexpected reply.  If error was set or we have structured
@@ -39,7 +39,7 @@  REPLY.SIMPLE_REPLY.START:
     if (error || h->structured_replies)
       SET_NEXT_STATE (%^FINISH_COMMAND);
     else {
-      uint64_t cookie = be64toh (h->sbuf.simple_reply.handle);
+      uint64_t cookie = be64toh (h->sbuf.reply.hdr.simple.handle);
       SET_NEXT_STATE (%.DEAD);
       set_error (EPROTO,
                  "no matching cookie %" PRIu64 " found for server reply, "
diff --git a/generator/states-reply-structured.c b/generator/states-reply-structured.c
index bbd3de0c..7587d856 100644
--- a/generator/states-reply-structured.c
+++ b/generator/states-reply-structured.c
@@ -49,9 +49,9 @@  REPLY.STRUCTURED_REPLY.START:
    * so read the remaining part.
    */
   h->rbuf = &h->sbuf;
-  h->rbuf = (char *) h->rbuf + sizeof h->sbuf.simple_reply;
-  h->rlen = sizeof h->sbuf.sr.structured_reply;
-  h->rlen -= sizeof h->sbuf.simple_reply;
+  h->rbuf = (char *) h->rbuf + sizeof h->sbuf.reply.hdr.simple;
+  h->rlen = sizeof h->sbuf.reply.hdr.structured;
+  h->rlen -= sizeof h->sbuf.reply.hdr.simple;
   SET_NEXT_STATE (%RECV_REMAINING);
   return 0;

@@ -71,9 +71,9 @@  REPLY.STRUCTURED_REPLY.CHECK:
   uint16_t flags, type;
   uint32_t length;

-  flags = be16toh (h->sbuf.sr.structured_reply.flags);
-  type = be16toh (h->sbuf.sr.structured_reply.type);
-  length = be32toh (h->sbuf.sr.structured_reply.length);
+  flags = be16toh (h->sbuf.reply.hdr.structured.flags);
+  type = be16toh (h->sbuf.reply.hdr.structured.type);
+  length = be32toh (h->sbuf.reply.hdr.structured.length);

   /* Reject a server that replies with too much information, but don't
    * reject a single structured reply to NBD_CMD_READ on the largest
@@ -82,7 +82,7 @@  REPLY.STRUCTURED_REPLY.CHECK:
    * oversized reply is going to take long enough to resync that it is
    * not worth keeping the connection alive.
    */
-  if (length > MAX_REQUEST_SIZE + sizeof h->sbuf.sr.payload.offset_data) {
+  if (length > MAX_REQUEST_SIZE + sizeof h->sbuf.reply.payload.offset_data) {
     set_error (0, "invalid server reply length %" PRIu32, length);
     SET_NEXT_STATE (%.DEAD);
     return 0;
@@ -105,19 +105,19 @@  REPLY.STRUCTURED_REPLY.CHECK:
      * them as an extension, so we use < instead of <=.
      */
     if (cmd->type != NBD_CMD_READ ||
-        length < sizeof h->sbuf.sr.payload.offset_data)
+        length < sizeof h->sbuf.reply.payload.offset_data)
       goto resync;
-    h->rbuf = &h->sbuf.sr.payload.offset_data;
-    h->rlen = sizeof h->sbuf.sr.payload.offset_data;
+    h->rbuf = &h->sbuf.reply.payload.offset_data;
+    h->rlen = sizeof h->sbuf.reply.payload.offset_data;
     SET_NEXT_STATE (%RECV_OFFSET_DATA);
     break;

   case NBD_REPLY_TYPE_OFFSET_HOLE:
     if (cmd->type != NBD_CMD_READ ||
-        length != sizeof h->sbuf.sr.payload.offset_hole)
+        length != sizeof h->sbuf.reply.payload.offset_hole)
       goto resync;
-    h->rbuf = &h->sbuf.sr.payload.offset_hole;
-    h->rlen = sizeof h->sbuf.sr.payload.offset_hole;
+    h->rbuf = &h->sbuf.reply.payload.offset_hole;
+    h->rlen = sizeof h->sbuf.reply.payload.offset_hole;
     SET_NEXT_STATE (%RECV_OFFSET_HOLE);
     break;

@@ -127,8 +127,8 @@  REPLY.STRUCTURED_REPLY.CHECK:
       goto resync;
     assert (CALLBACK_IS_NOT_NULL (cmd->cb.fn.extent));
     /* Start by reading the context ID. */
-    h->rbuf = &h->sbuf.sr.payload.bs_hdr;
-    h->rlen = sizeof h->sbuf.sr.payload.bs_hdr;
+    h->rbuf = &h->sbuf.reply.payload.bs_hdr;
+    h->rlen = sizeof h->sbuf.reply.payload.bs_hdr;
     SET_NEXT_STATE (%RECV_BS_HEADER);
     break;

@@ -139,10 +139,10 @@  REPLY.STRUCTURED_REPLY.CHECK:
        * compliant, will favor the wire error over EPROTO during more
        * length checks in RECV_ERROR_MESSAGE and RECV_ERROR_TAIL.
        */
-      if (length < sizeof h->sbuf.sr.payload.error.error.error)
+      if (length < sizeof h->sbuf.reply.payload.error.error.error)
         goto resync;
-      h->rbuf = &h->sbuf.sr.payload.error.error;
-      h->rlen = MIN (length, sizeof h->sbuf.sr.payload.error.error);
+      h->rbuf = &h->sbuf.reply.payload.error.error;
+      h->rlen = MIN (length, sizeof h->sbuf.reply.payload.error.error);
       SET_NEXT_STATE (%RECV_ERROR);
     }
     else
@@ -168,19 +168,19 @@  REPLY.STRUCTURED_REPLY.RECV_ERROR:
     SET_NEXT_STATE (%.READY);
     return 0;
   case 0:
-    length = be32toh (h->sbuf.sr.structured_reply.length);
-    assert (length >= sizeof h->sbuf.sr.payload.error.error.error);
+    length = be32toh (h->sbuf.reply.hdr.structured.length);
+    assert (length >= sizeof h->sbuf.reply.payload.error.error.error);
     assert (cmd);

-    if (length < sizeof h->sbuf.sr.payload.error.error)
+    if (length < sizeof h->sbuf.reply.payload.error.error)
       goto resync;

-    msglen = be16toh (h->sbuf.sr.payload.error.error.len);
-    if (msglen > length - sizeof h->sbuf.sr.payload.error.error ||
-        msglen > sizeof h->sbuf.sr.payload.error.msg)
+    msglen = be16toh (h->sbuf.reply.payload.error.error.len);
+    if (msglen > length - sizeof h->sbuf.reply.payload.error.error ||
+        msglen > sizeof h->sbuf.reply.payload.error.msg)
       goto resync;

-    h->rbuf = h->sbuf.sr.payload.error.msg;
+    h->rbuf = h->sbuf.reply.payload.error.msg;
     h->rlen = msglen;
     SET_NEXT_STATE (%RECV_ERROR_MESSAGE);
   }
@@ -188,11 +188,11 @@  REPLY.STRUCTURED_REPLY.RECV_ERROR:

  resync:
   /* Favor the error packet's errno over RESYNC's EPROTO. */
-  error = be32toh (h->sbuf.sr.payload.error.error.error);
+  error = be32toh (h->sbuf.reply.payload.error.error.error);
   if (cmd->error == 0)
     cmd->error = nbd_internal_errno_of_nbd_error (error);
   h->rbuf = NULL;
-  h->rlen = length - MIN (length, sizeof h->sbuf.sr.payload.error.error);
+  h->rlen = length - MIN (length, sizeof h->sbuf.reply.payload.error.error);
   SET_NEXT_STATE (%RESYNC);
   return 0;

@@ -207,15 +207,15 @@  REPLY.STRUCTURED_REPLY.RECV_ERROR_MESSAGE:
     SET_NEXT_STATE (%.READY);
     return 0;
   case 0:
-    length = be32toh (h->sbuf.sr.structured_reply.length);
-    msglen = be16toh (h->sbuf.sr.payload.error.error.len);
-    type = be16toh (h->sbuf.sr.structured_reply.type);
+    length = be32toh (h->sbuf.reply.hdr.structured.length);
+    msglen = be16toh (h->sbuf.reply.payload.error.error.len);
+    type = be16toh (h->sbuf.reply.hdr.structured.type);

-    length -= sizeof h->sbuf.sr.payload.error.error + msglen;
+    length -= sizeof h->sbuf.reply.payload.error.error + msglen;

     if (msglen)
       debug (h, "structured error server message: %.*s", (int) msglen,
-             h->sbuf.sr.payload.error.msg);
+             h->sbuf.reply.payload.error.msg);

     /* Special case two specific errors; silently ignore tail for all others */
     h->rbuf = NULL;
@@ -227,11 +227,11 @@  REPLY.STRUCTURED_REPLY.RECV_ERROR_MESSAGE:
                "the server may have a bug");
       break;
     case NBD_REPLY_TYPE_ERROR_OFFSET:
-      if (length != sizeof h->sbuf.sr.payload.error.offset)
+      if (length != sizeof h->sbuf.reply.payload.error.offset)
         debug (h, "unable to safely extract error offset, "
                "the server may have a bug");
       else
-        h->rbuf = &h->sbuf.sr.payload.error.offset;
+        h->rbuf = &h->sbuf.reply.payload.error.offset;
       break;
     }
     SET_NEXT_STATE (%RECV_ERROR_TAIL);
@@ -250,8 +250,8 @@  REPLY.STRUCTURED_REPLY.RECV_ERROR_TAIL:
     SET_NEXT_STATE (%.READY);
     return 0;
   case 0:
-    error = be32toh (h->sbuf.sr.payload.error.error.error);
-    type = be16toh (h->sbuf.sr.structured_reply.type);
+    error = be32toh (h->sbuf.reply.payload.error.error.error);
+    type = be16toh (h->sbuf.reply.hdr.structured.type);

     assert (cmd); /* guaranteed by CHECK */

@@ -266,7 +266,7 @@  REPLY.STRUCTURED_REPLY.RECV_ERROR_TAIL:
      * user callback if present.  Ignore the offset if it was bogus.
      */
     if (type == NBD_REPLY_TYPE_ERROR_OFFSET && h->rbuf) {
-      uint64_t offset = be64toh (h->sbuf.sr.payload.error.offset);
+      uint64_t offset = be64toh (h->sbuf.reply.payload.error.offset);
       if (structured_reply_in_bounds (offset, 0, cmd) &&
           cmd->type == NBD_CMD_READ &&
           CALLBACK_IS_NOT_NULL (cmd->cb.fn.chunk)) {
@@ -307,8 +307,8 @@  REPLY.STRUCTURED_REPLY.RECV_OFFSET_DATA:
     SET_NEXT_STATE (%.READY);
     return 0;
   case 0:
-    length = be32toh (h->sbuf.sr.structured_reply.length);
-    offset = be64toh (h->sbuf.sr.payload.offset_data.offset);
+    length = be32toh (h->sbuf.reply.hdr.structured.length);
+    offset = be64toh (h->sbuf.reply.payload.offset_data.offset);

     assert (cmd); /* guaranteed by CHECK */

@@ -346,8 +346,8 @@  REPLY.STRUCTURED_REPLY.RECV_OFFSET_DATA_DATA:
     SET_NEXT_STATE (%.READY);
     return 0;
   case 0:
-    length = be32toh (h->sbuf.sr.structured_reply.length);
-    offset = be64toh (h->sbuf.sr.payload.offset_data.offset);
+    length = be32toh (h->sbuf.reply.hdr.structured.length);
+    offset = be64toh (h->sbuf.reply.payload.offset_data.offset);

     assert (cmd); /* guaranteed by CHECK */
     if (CALLBACK_IS_NOT_NULL (cmd->cb.fn.chunk)) {
@@ -377,8 +377,8 @@  REPLY.STRUCTURED_REPLY.RECV_OFFSET_HOLE:
     SET_NEXT_STATE (%.READY);
     return 0;
   case 0:
-    offset = be64toh (h->sbuf.sr.payload.offset_hole.offset);
-    length = be32toh (h->sbuf.sr.payload.offset_hole.length);
+    offset = be64toh (h->sbuf.reply.payload.offset_hole.offset);
+    length = be32toh (h->sbuf.reply.payload.offset_hole.length);

     assert (cmd); /* guaranteed by CHECK */

@@ -426,12 +426,12 @@  REPLY.STRUCTURED_REPLY.RECV_BS_HEADER:
     SET_NEXT_STATE (%.READY);
     return 0;
   case 0:
-    length = be32toh (h->sbuf.sr.structured_reply.length);
+    length = be32toh (h->sbuf.reply.hdr.structured.length);

     assert (cmd); /* guaranteed by CHECK */
     assert (cmd->type == NBD_CMD_BLOCK_STATUS);
     assert (length >= 12);
-    length -= sizeof h->sbuf.sr.payload.bs_hdr;
+    length -= sizeof h->sbuf.reply.payload.bs_hdr;

     free (h->bs_entries);
     h->bs_entries = malloc (length);
@@ -460,7 +460,7 @@  REPLY.STRUCTURED_REPLY.RECV_BS_ENTRIES:
     SET_NEXT_STATE (%.READY);
     return 0;
   case 0:
-    length = be32toh (h->sbuf.sr.structured_reply.length);
+    length = be32toh (h->sbuf.reply.hdr.structured.length);

     assert (cmd); /* guaranteed by CHECK */
     assert (cmd->type == NBD_CMD_BLOCK_STATUS);
@@ -468,7 +468,8 @@  REPLY.STRUCTURED_REPLY.RECV_BS_ENTRIES:
     assert (h->bs_entries);
     assert (length >= 12);
     assert (h->meta_valid);
-    count = (length - sizeof h->sbuf.sr.payload.bs_hdr) / sizeof *h->bs_entries;
+    count = (length - sizeof h->sbuf.reply.payload.bs_hdr) /
+      sizeof *h->bs_entries;

     /* Need to byte-swap the entries returned, but apart from that we
      * don't validate them.
@@ -477,7 +478,7 @@  REPLY.STRUCTURED_REPLY.RECV_BS_ENTRIES:
       h->bs_entries[i] = be32toh (h->bs_entries[i]);

     /* Look up the context ID. */
-    context_id = be32toh (h->sbuf.sr.payload.bs_hdr.context_id);
+    context_id = be32toh (h->sbuf.reply.payload.bs_hdr.context_id);
     for (i = 0; i < h->meta_contexts.len; ++i)
       if (context_id == h->meta_contexts.ptr[i].context_id)
         break;
@@ -524,8 +525,8 @@  REPLY.STRUCTURED_REPLY.RESYNC:
       SET_NEXT_STATE (%^FINISH_COMMAND);
       return 0;
     }
-    type = be16toh (h->sbuf.sr.structured_reply.type);
-    length = be32toh (h->sbuf.sr.structured_reply.length);
+    type = be16toh (h->sbuf.reply.hdr.structured.type);
+    length = be32toh (h->sbuf.reply.hdr.structured.length);
     debug (h, "unexpected reply type %u or payload length %" PRIu32
            " for cookie %" PRIu64 " and command %" PRIu32
            ", this is probably a server bug",
@@ -539,7 +540,7 @@  REPLY.STRUCTURED_REPLY.RESYNC:
  REPLY.STRUCTURED_REPLY.FINISH:
   uint16_t flags;

-  flags = be16toh (h->sbuf.sr.structured_reply.flags);
+  flags = be16toh (h->sbuf.reply.hdr.structured.flags);
   if (flags & NBD_REPLY_FLAG_DONE) {
     SET_NEXT_STATE (%^FINISH_COMMAND);
   }
diff --git a/generator/states-reply.c b/generator/states-reply.c
index b53409e4..845973a4 100644
--- a/generator/states-reply.c
+++ b/generator/states-reply.c
@@ -69,8 +69,8 @@  REPLY.START:
   assert (h->reply_cmd == NULL);
   assert (h->rlen == 0);

-  h->rbuf = &h->sbuf;
-  h->rlen = sizeof h->sbuf.simple_reply;
+  h->rbuf = &h->sbuf.reply.hdr;
+  h->rlen = sizeof h->sbuf.reply.hdr.simple;

   r = h->sock->ops->recv (h, h->sock, h->rbuf, h->rlen);
   if (r == -1) {
@@ -115,7 +115,7 @@  REPLY.CHECK_SIMPLE_OR_STRUCTURED_REPLY:
   uint32_t magic;
   uint64_t cookie;

-  magic = be32toh (h->sbuf.simple_reply.magic);
+  magic = be32toh (h->sbuf.reply.hdr.simple.magic);
   if (magic == NBD_SIMPLE_REPLY_MAGIC) {
     SET_NEXT_STATE (%SIMPLE_REPLY.START);
   }
@@ -132,7 +132,7 @@  REPLY.CHECK_SIMPLE_OR_STRUCTURED_REPLY:
    * handle (our cookie) is stored at the same offset.
    */
   h->chunks_received++;
-  cookie = be64toh (h->sbuf.simple_reply.handle);
+  cookie = be64toh (h->sbuf.reply.hdr.simple.handle);
   /* Find the command amongst the commands in flight. If the server sends
    * a reply for an unknown cookie, FINISH will diagnose that later.
    */
@@ -151,7 +151,7 @@  REPLY.FINISH_COMMAND:
   /* NB: This works for both simple and structured replies because the
    * handle (our cookie) is stored at the same offset.
    */
-  cookie = be64toh (h->sbuf.simple_reply.handle);
+  cookie = be64toh (h->sbuf.reply.hdr.simple.handle);
   /* Find the command amongst the commands in flight. */
   for (cmd = h->cmds_in_flight, prev_cmd = NULL;
        cmd != NULL;
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [libnbd PATCH v2 03/23] protocol: Add definitions for extended headers
  2022-11-14 22:51 ` [libnbd PATCH v2 00/23] libnbd 64-bit NBD extensions Eric Blake
  2022-11-14 22:51   ` [libnbd PATCH v2 01/23] block_status: Refactor array storage Eric Blake
  2022-11-14 22:51   ` [libnbd PATCH v2 02/23] internal: Refactor layout of replies in sbuf Eric Blake
@ 2022-11-14 22:51   ` Eric Blake
  2022-11-14 22:51   ` [libnbd PATCH v2 04/23] states: Prepare to send 64-bit requests Eric Blake
                     ` (19 subsequent siblings)
  22 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:51 UTC (permalink / raw)
  To: libguestfs; +Cc: qemu-devel, qemu-block, nbd

Add the magic numbers and new structs necessary to implement the NBD
protocol extension of extended headers providing 64-bit lengths.  This
corresponds to upstream nbd commits XXX-XXX[*].
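
As a sanity check on the layout (illustrative only, not part of this
patch), the packed structs below imply the following header sizes: the
compact request stays at 28 bytes, while both extended headers are a
uniform 32 bytes because the count/length field widens to 64 bits and
replies gain a 64-bit offset:

  /* Sketch: sizes implied by the packed structs in lib/nbd-protocol.h.
   * Assumes C11 static_assert via <assert.h>.
   */
  #include <assert.h>
  #include "nbd-protocol.h"

  static_assert (sizeof (struct nbd_request) == 28, "compact request");
  static_assert (sizeof (struct nbd_request_ext) == 32, "extended request");
  static_assert (sizeof (struct nbd_simple_reply) == 16, "simple reply");
  static_assert (sizeof (struct nbd_structured_reply) == 20, "structured reply");
  static_assert (sizeof (struct nbd_extended_reply) == 32, "extended reply");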

---
[*] FIXME update commit ids before pushing
---
 lib/nbd-protocol.h | 66 ++++++++++++++++++++++++++++++++++++----------
 1 file changed, 52 insertions(+), 14 deletions(-)

diff --git a/lib/nbd-protocol.h b/lib/nbd-protocol.h
index 4400d3ab..ac569a11 100644
--- a/lib/nbd-protocol.h
+++ b/lib/nbd-protocol.h
@@ -124,6 +124,7 @@ struct nbd_fixed_new_option_reply {
 #define NBD_OPT_STRUCTURED_REPLY   8
 #define NBD_OPT_LIST_META_CONTEXT  9
 #define NBD_OPT_SET_META_CONTEXT   10
+#define NBD_OPT_EXTENDED_HEADERS   11

 #define NBD_REP_ERR(val) (0x80000000 | (val))
 #define NBD_REP_IS_ERR(val) (!!((val) & 0x80000000))
@@ -141,6 +142,7 @@ struct nbd_fixed_new_option_reply {
 #define NBD_REP_ERR_SHUTDOWN         NBD_REP_ERR (7)
 #define NBD_REP_ERR_BLOCK_SIZE_REQD  NBD_REP_ERR (8)
 #define NBD_REP_ERR_TOO_BIG          NBD_REP_ERR (9)
+#define NBD_REP_ERR_EXT_HEADER_REQD  NBD_REP_ERR (10)

 #define NBD_INFO_EXPORT      0
 #define NBD_INFO_NAME        1
@@ -182,16 +184,26 @@ struct nbd_fixed_new_option_reply_meta_context {
   /* followed by a string */
 } NBD_ATTRIBUTE_PACKED;

-/* Request (client -> server). */
+/* Compact request (client -> server). */
 struct nbd_request {
   uint32_t magic;               /* NBD_REQUEST_MAGIC. */
-  uint16_t flags;               /* Request flags. */
-  uint16_t type;                /* Request type. */
+  uint16_t flags;               /* Request flags: NBD_CMD_FLAG_*. */
+  uint16_t type;                /* Request type: NBD_CMD_*. */
   uint64_t handle;              /* Opaque handle. */
   uint64_t offset;              /* Request offset. */
   uint32_t count;               /* Request length. */
 } NBD_ATTRIBUTE_PACKED;

+/* Extended request (client -> server). */
+struct nbd_request_ext {
+  uint32_t magic;               /* NBD_EXTENDED_REQUEST_MAGIC. */
+  uint16_t flags;               /* Request flags: NBD_CMD_FLAG_*. */
+  uint16_t type;                /* Request type: NBD_CMD_*. */
+  uint64_t handle;              /* Opaque handle. */
+  uint64_t offset;              /* Request offset. */
+  uint64_t count;               /* Request effect or payload length. */
+} NBD_ATTRIBUTE_PACKED;
+
 /* Simple reply (server -> client). */
 struct nbd_simple_reply {
   uint32_t magic;               /* NBD_SIMPLE_REPLY_MAGIC. */
@@ -208,6 +220,16 @@ struct nbd_structured_reply {
   uint32_t length;              /* Length of payload which follows. */
 } NBD_ATTRIBUTE_PACKED;

+/* Extended reply (server -> client). */
+struct nbd_extended_reply {
+  uint32_t magic;               /* NBD_EXTENDED_REPLY_MAGIC. */
+  uint16_t flags;               /* NBD_REPLY_FLAG_* */
+  uint16_t type;                /* NBD_REPLY_TYPE_* */
+  uint64_t handle;              /* Opaque handle. */
+  uint64_t offset;              /* Client's offset. */
+  uint64_t length;              /* Length of payload which follows. */
+} NBD_ATTRIBUTE_PACKED;
+
 struct nbd_structured_reply_offset_data {
   uint64_t offset;              /* offset */
   /* Followed by data. */
@@ -224,11 +246,23 @@ struct nbd_block_descriptor {
   uint32_t status_flags;        /* block type (hole etc) */
 } NBD_ATTRIBUTE_PACKED;

+/* NBD_REPLY_TYPE_BLOCK_STATUS_EXT block descriptor. */
+struct nbd_block_descriptor_ext {
+  uint64_t length;              /* length of block */
+  uint64_t status_flags;        /* block type (hole etc) */
+} NBD_ATTRIBUTE_PACKED;
+
 struct nbd_structured_reply_block_status_hdr {
   uint32_t context_id;          /* metadata context ID */
   /* followed by array of nbd_block_descriptor extents */
 } NBD_ATTRIBUTE_PACKED;

+struct nbd_structured_reply_block_status_ext_hdr {
+  uint32_t context_id;          /* metadata context ID */
+  uint32_t count;               /* 0, or length of following array */
+  /* followed by array of nbd_block_descriptor_ext extents */
+} NBD_ATTRIBUTE_PACKED;
+
 struct nbd_structured_reply_error {
   uint32_t error;               /* NBD_E* error number */
   uint16_t len;                 /* Length of human readable error. */
@@ -236,8 +270,10 @@ struct nbd_structured_reply_error {
 } NBD_ATTRIBUTE_PACKED;

 #define NBD_REQUEST_MAGIC           0x25609513
+#define NBD_EXTENDED_REQUEST_MAGIC  0x21e41c71
 #define NBD_SIMPLE_REPLY_MAGIC      0x67446698
 #define NBD_STRUCTURED_REPLY_MAGIC  0x668e33ef
+#define NBD_EXTENDED_REPLY_MAGIC    0x6e8a278c

 /* Structured reply flags. */
 #define NBD_REPLY_FLAG_DONE         (1<<0)
@@ -246,12 +282,13 @@ struct nbd_structured_reply_error {
 #define NBD_REPLY_TYPE_IS_ERR(val) (!!((val) & (1<<15)))

 /* Structured reply types. */
-#define NBD_REPLY_TYPE_NONE         0
-#define NBD_REPLY_TYPE_OFFSET_DATA  1
-#define NBD_REPLY_TYPE_OFFSET_HOLE  2
-#define NBD_REPLY_TYPE_BLOCK_STATUS 5
-#define NBD_REPLY_TYPE_ERROR        NBD_REPLY_TYPE_ERR (1)
-#define NBD_REPLY_TYPE_ERROR_OFFSET NBD_REPLY_TYPE_ERR (2)
+#define NBD_REPLY_TYPE_NONE             0
+#define NBD_REPLY_TYPE_OFFSET_DATA      1
+#define NBD_REPLY_TYPE_OFFSET_HOLE      2
+#define NBD_REPLY_TYPE_BLOCK_STATUS     5
+#define NBD_REPLY_TYPE_BLOCK_STATUS_EXT 6
+#define NBD_REPLY_TYPE_ERROR            NBD_REPLY_TYPE_ERR (1)
+#define NBD_REPLY_TYPE_ERROR_OFFSET     NBD_REPLY_TYPE_ERR (2)

 /* NBD commands. */
 #define NBD_CMD_READ              0
@@ -263,11 +300,12 @@ struct nbd_structured_reply_error {
 #define NBD_CMD_WRITE_ZEROES      6
 #define NBD_CMD_BLOCK_STATUS      7

-#define NBD_CMD_FLAG_FUA       (1<<0)
-#define NBD_CMD_FLAG_NO_HOLE   (1<<1)
-#define NBD_CMD_FLAG_DF        (1<<2)
-#define NBD_CMD_FLAG_REQ_ONE   (1<<3)
-#define NBD_CMD_FLAG_FAST_ZERO (1<<4)
+#define NBD_CMD_FLAG_FUA         (1<<0)
+#define NBD_CMD_FLAG_NO_HOLE     (1<<1)
+#define NBD_CMD_FLAG_DF          (1<<2)
+#define NBD_CMD_FLAG_REQ_ONE     (1<<3)
+#define NBD_CMD_FLAG_FAST_ZERO   (1<<4)
+#define NBD_CMD_FLAG_PAYLOAD_LEN (1<<5)

 /* NBD error codes. */
 #define NBD_SUCCESS     0
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [libnbd PATCH v2 04/23] states: Prepare to send 64-bit requests
  2022-11-14 22:51 ` [libnbd PATCH v2 00/23] libnbd 64-bit NBD extensions Eric Blake
                     ` (2 preceding siblings ...)
  2022-11-14 22:51   ` [libnbd PATCH v2 03/23] protocol: Add definitions for extended headers Eric Blake
@ 2022-11-14 22:51   ` Eric Blake
  2022-11-14 22:51   ` [libnbd PATCH v2 05/23] states: Prepare to receive 64-bit replies Eric Blake
                     ` (18 subsequent siblings)
  22 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:51 UTC (permalink / raw)
  To: libguestfs; +Cc: qemu-devel, qemu-block, nbd

Support sending 64-bit requests if extended headers were negotiated.
This includes setting NBD_CMD_FLAG_PAYLOAD_LEN any time we send an
extended NBD_CMD_WRITE.  That flag is such a fundamental part of the
protocol that, for now, it is easier to silently ignore whatever value
the user passes in for that bit in the flags parameter of nbd_pwrite,
regardless of the current settings in set_strict_mode, than to force
the user to pass in the correct value to match whether extended mode
was negotiated.  However, when we later add APIs to give the user more
control for interoperability testing, it may be worth adding a new
set_strict_mode control knob to explicitly let the client
intentionally violate the protocol (the testsuite added in this patch
would then be updated to match).

At this point, h->extended_headers is permanently false (we can't
enable it until all other aspects of the protocol have likewise been
converted).

Support for using FLAG_PAYLOAD_LEN with NBD_CMD_BLOCK_STATUS is less
fundamental, and deserves to be in its own patch.
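
As a usage-level sketch of the pwrite behaviour described above (not
part of this patch; it assumes an already-connected handle "h", a
512-byte buffer "buf", and the usual <stdio.h>/<libnbd.h> includes),
both calls behave identically because libnbd reconciles the flag with
what was negotiated:

  /* With or without LIBNBD_CMD_FLAG_PAYLOAD_LEN, nbd_pwrite puts the
   * correct bit on the wire depending on whether extended headers
   * were negotiated.
   */
  if (nbd_pwrite (h, buf, 512, 0, 0) == -1)
    fprintf (stderr, "%s\n", nbd_get_error ());
  if (nbd_pwrite (h, buf, 512, 0, LIBNBD_CMD_FLAG_PAYLOAD_LEN) == -1)
    fprintf (stderr, "%s\n", nbd_get_error ());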
---
 lib/internal.h                      |  10 ++-
 generator/API.ml                    |  20 +++--
 generator/states-issue-command.c    |  29 ++++---
 generator/states-reply-structured.c |   2 +-
 lib/rw.c                            |  17 +++--
 tests/Makefile.am                   |   4 +
 tests/pwrite-extended.c             | 112 ++++++++++++++++++++++++++++
 .gitignore                          |   1 +
 8 files changed, 169 insertions(+), 26 deletions(-)
 create mode 100644 tests/pwrite-extended.c

diff --git a/lib/internal.h b/lib/internal.h
index f81c41ba..e900eca3 100644
--- a/lib/internal.h
+++ b/lib/internal.h
@@ -109,6 +109,9 @@ struct nbd_handle {
   char *tls_username;           /* Username, NULL = use current username */
   char *tls_psk_file;           /* PSK filename, NULL = no PSK */

+  /* Extended headers. */
+  bool extended_headers;        /* If we negotiated NBD_OPT_EXTENDED_HEADERS */
+
   /* Desired metadata contexts. */
   bool request_sr;
   bool request_meta;
@@ -262,7 +265,10 @@ struct nbd_handle {
   /* Issuing a command must use a buffer separate from sbuf, for the
    * case when we interrupt a request to service a reply.
    */
-  struct nbd_request request;
+  union {
+    struct nbd_request compact;
+    struct nbd_request_ext extended;
+  } req;
   bool in_write_payload;
   bool in_write_shutdown;

@@ -363,7 +369,7 @@ struct command {
   uint16_t type;
   uint64_t cookie;
   uint64_t offset;
-  uint32_t count;
+  uint64_t count;
   void *data; /* Buffer for read/write */
   struct command_cb cb;
   bool initialized; /* For read, true if getting a hole may skip memset */
diff --git a/generator/API.ml b/generator/API.ml
index 25a612a2..beb7a2b4 100644
--- a/generator/API.ml
+++ b/generator/API.ml
@@ -198,11 +198,12 @@ let cmd_flags =
   flag_prefix = "CMD_FLAG";
   guard = Some "((h->strict & LIBNBD_STRICT_FLAGS) || flags > UINT16_MAX)";
   flags = [
-    "FUA",       1 lsl 0;
-    "NO_HOLE",   1 lsl 1;
-    "DF",        1 lsl 2;
-    "REQ_ONE",   1 lsl 3;
-    "FAST_ZERO", 1 lsl 4;
+    "FUA",         1 lsl 0;
+    "NO_HOLE",     1 lsl 1;
+    "DF",          1 lsl 2;
+    "REQ_ONE",     1 lsl 3;
+    "FAST_ZERO",   1 lsl 4;
+    "PAYLOAD_LEN", 1 lsl 5;
   ]
 }
 let handshake_flags = {
@@ -2459,7 +2460,7 @@   "pread_structured", {
   "pwrite", {
     default_call with
     args = [ BytesIn ("buf", "count"); UInt64 "offset" ];
-    optargs = [ OFlags ("flags", cmd_flags, Some ["FUA"]) ];
+    optargs = [ OFlags ("flags", cmd_flags, Some ["FUA"; "PAYLOAD_LEN"]) ];
     ret = RErr;
     permitted_states = [ Connected ];
     shortdesc = "write to the NBD server";
@@ -2482,7 +2483,10 @@   "pwrite", {
 C<LIBNBD_CMD_FLAG_FUA> meaning that the server should not
 return until the data has been committed to permanent storage
 (if that is supported - some servers cannot do this, see
-L<nbd_can_fua(3)>)."
+L<nbd_can_fua(3)>).  For convenience, libnbd ignores the presence
+or absence of the flag C<LIBNBD_CMD_FLAG_PAYLOAD_LEN> in C<flags>,
+while correctly using the flag over the wire according to whether
+extended headers were negotiated."
 ^ strict_call_description;
     see_also = [Link "can_fua"; Link "is_read_only";
                 Link "aio_pwrite"; Link "get_block_size";
@@ -3172,7 +3176,7 @@   "aio_pwrite", {
     default_call with
     args = [ BytesPersistIn ("buf", "count"); UInt64 "offset" ];
     optargs = [ OClosure completion_closure;
-                OFlags ("flags", cmd_flags, Some ["FUA"]) ];
+                OFlags ("flags", cmd_flags, Some ["FUA"; "PAYLOAD_LEN"]) ];
     ret = RCookie;
     permitted_states = [ Connected ];
     shortdesc = "write to the NBD server";
diff --git a/generator/states-issue-command.c b/generator/states-issue-command.c
index df9295b5..feea2672 100644
--- a/generator/states-issue-command.c
+++ b/generator/states-issue-command.c
@@ -41,15 +41,24 @@  ISSUE_COMMAND.START:
     return 0;
   }

-  h->request.magic = htobe32 (NBD_REQUEST_MAGIC);
-  h->request.flags = htobe16 (cmd->flags);
-  h->request.type = htobe16 (cmd->type);
-  h->request.handle = htobe64 (cmd->cookie);
-  h->request.offset = htobe64 (cmd->offset);
-  h->request.count = htobe32 ((uint32_t) cmd->count);
+  /* These fields are coincident between req.compact and req.extended */
+  h->req.compact.flags = htobe16 (cmd->flags);
+  h->req.compact.type = htobe16 (cmd->type);
+  h->req.compact.handle = htobe64 (cmd->cookie);
+  h->req.compact.offset = htobe64 (cmd->offset);
+  if (h->extended_headers) {
+    h->req.extended.magic = htobe32 (NBD_EXTENDED_REQUEST_MAGIC);
+    h->req.extended.count = htobe64 (cmd->count);
+    h->wlen = sizeof (h->req.extended);
+  }
+  else {
+    assert (cmd->count <= UINT32_MAX);
+    h->req.compact.magic = htobe32 (NBD_REQUEST_MAGIC);
+    h->req.compact.count = htobe32 (cmd->count);
+    h->wlen = sizeof (h->req.compact);
+  }
   h->chunks_sent++;
-  h->wbuf = &h->request;
-  h->wlen = sizeof (h->request);
+  h->wbuf = &h->req;
   if (cmd->type == NBD_CMD_WRITE || cmd->next)
     h->wflags = MSG_MORE;
   SET_NEXT_STATE (%SEND_REQUEST);
@@ -74,7 +83,7 @@  ISSUE_COMMAND.PREPARE_WRITE_PAYLOAD:

   assert (h->cmds_to_issue != NULL);
   cmd = h->cmds_to_issue;
-  assert (cmd->cookie == be64toh (h->request.handle));
+  assert (cmd->cookie == be64toh (h->req.compact.handle));
   if (cmd->type == NBD_CMD_WRITE) {
     h->wbuf = cmd->data;
     h->wlen = cmd->count;
@@ -120,7 +129,7 @@  ISSUE_COMMAND.FINISH:
   assert (!h->wlen);
   assert (h->cmds_to_issue != NULL);
   cmd = h->cmds_to_issue;
-  assert (cmd->cookie == be64toh (h->request.handle));
+  assert (cmd->cookie == be64toh (h->req.compact.handle));
   h->cmds_to_issue = cmd->next;
   if (h->cmds_to_issue_tail == cmd)
     h->cmds_to_issue_tail = NULL;
diff --git a/generator/states-reply-structured.c b/generator/states-reply-structured.c
index 7587d856..6f187f14 100644
--- a/generator/states-reply-structured.c
+++ b/generator/states-reply-structured.c
@@ -34,7 +34,7 @@ structured_reply_in_bounds (uint64_t offset, uint32_t length,
       offset + length > cmd->offset + cmd->count) {
     set_error (0, "range of structured reply is out of bounds, "
                "offset=%" PRIu64 ", cmd->offset=%" PRIu64 ", "
-               "length=%" PRIu32 ", cmd->count=%" PRIu32 ": "
+               "length=%" PRIu32 ", cmd->count=%" PRIu64 ": "
                "this is likely to be a bug in the NBD server",
                offset, cmd->offset, length, cmd->count);
     return false;
diff --git a/lib/rw.c b/lib/rw.c
index 104d0fb0..81dded3f 100644
--- a/lib/rw.c
+++ b/lib/rw.c
@@ -223,13 +223,11 @@ nbd_internal_command_common (struct nbd_handle *h,
     }
     break;

-    /* Other commands are currently limited by the 32 bit field in the
-     * command structure on the wire, but in future we hope to support
-     * 64 bit values here with a change to the NBD protocol which is
-     * being discussed upstream.
+    /* Other commands are limited by the 32 bit field in the command
+     * structure on the wire, unless extended headers were negotiated.
      */
   default:
-    if (count > UINT32_MAX) {
+    if (!h->extended_headers && count > UINT32_MAX) {
       set_error (ERANGE, "request too large: maximum request size is %" PRIu32,
                  UINT32_MAX);
       goto err;
@@ -358,6 +356,15 @@ nbd_unlocked_aio_pwrite (struct nbd_handle *h, const void *buf,
       return -1;
     }
   }
+  /* It is more convenient to manage PAYLOAD_LEN by what was negotiated
+   * than to require the user to have to set it correctly.
+   * TODO: Add new h->strict bit to allow intentional protocol violation
+   * for interoperability testing.
+   */
+  if (h->extended_headers)
+    flags |= LIBNBD_CMD_FLAG_PAYLOAD_LEN;
+  else
+    flags &= ~LIBNBD_CMD_FLAG_PAYLOAD_LEN;

   SET_CALLBACK_TO_NULL (*completion);
   return nbd_internal_command_common (h, flags, NBD_CMD_WRITE, offset, count,
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 0b9c454e..cb99309c 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -231,6 +231,7 @@ check_PROGRAMS += \
 	meta-base-allocation \
 	closure-lifetimes \
 	pread-initialize \
+  pwrite-extended \
 	$(NULL)

 TESTS += \
@@ -642,6 +643,9 @@ closure_lifetimes_LDADD = $(top_builddir)/lib/libnbd.la
 pread_initialize_SOURCES = pread-initialize.c
 pread_initialize_LDADD = $(top_builddir)/lib/libnbd.la

+pwrite_extended_SOURCES = pwrite-extended.c
+pwrite_extended_LDADD = $(top_builddir)/lib/libnbd.la
+
 #----------------------------------------------------------------------
 # Testing TLS support.

diff --git a/tests/pwrite-extended.c b/tests/pwrite-extended.c
new file mode 100644
index 00000000..eeedf0a1
--- /dev/null
+++ b/tests/pwrite-extended.c
@@ -0,0 +1,112 @@
+/* NBD client library in userspace
+ * Copyright (C) 2013-2022 Red Hat Inc.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/* Check behavior of pwrite with PAYLOAD_LEN flag for extended headers. */
+
+#include <config.h>
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+#include <unistd.h>
+#include <sys/stat.h>
+
+#include <libnbd.h>
+
+static char *progname;
+static char buf[512];
+
+static void
+check (int experr, const char *prefix)
+{
+  const char *msg = nbd_get_error ();
+  int errnum = nbd_get_errno ();
+
+  fprintf (stderr, "error: \"%s\"\n", msg);
+  fprintf (stderr, "errno: %d (%s)\n", errnum, strerror (errnum));
+  if (strncmp (msg, prefix, strlen (prefix)) != 0) {
+    fprintf (stderr, "%s: test failed: missing context prefix: %s\n",
+             progname, msg);
+    exit (EXIT_FAILURE);
+  }
+  if (errnum != experr) {
+    fprintf (stderr, "%s: test failed: "
+             "expected errno = %d (%s), but got %d\n",
+             progname, experr, strerror (experr), errnum);
+    exit (EXIT_FAILURE);
+  }
+}
+
+int
+main (int argc, char *argv[])
+{
+  struct nbd_handle *nbd;
+  const char *cmd[] = {
+    "nbdkit", "-s", "-v", "--exit-with-parent", "memory", "1048576", NULL
+  };
+  uint32_t strict;
+
+  progname = argv[0];
+
+  nbd = nbd_create ();
+  if (nbd == NULL) {
+    fprintf (stderr, "%s\n", nbd_get_error ());
+    exit (EXIT_FAILURE);
+  }
+
+  /* Connect to the server. */
+  if (nbd_connect_command (nbd, (char **) cmd) == -1) {
+    fprintf (stderr, "%s: %s\n", argv[0], nbd_get_error ());
+    exit (EXIT_FAILURE);
+  }
+
+  /* FIXME: future API addition to test if server negotiated extended mode.
+   * Until then, strict flags must ignore the PAYLOAD_LEN flag for pwrite,
+   * even though it is rejected for other commands.
+   */
+  strict = nbd_get_strict_mode (nbd);
+  if (!(strict & LIBNBD_STRICT_FLAGS)) {
+    fprintf (stderr, "%s: test failed: "
+             "nbd_get_strict_mode did not have expected flag set\n",
+             argv[0]);
+    exit (EXIT_FAILURE);
+  }
+  if (nbd_aio_pread (nbd, buf, 512, 0, NBD_NULL_COMPLETION,
+                     LIBNBD_CMD_FLAG_PAYLOAD_LEN) != -1) {
+    fprintf (stderr, "%s: test failed: "
+             "nbd_aio_pread did not fail with unexpected flag\n",
+             argv[0]);
+    exit (EXIT_FAILURE);
+  }
+  check (EINVAL, "nbd_aio_pread: ");
+
+  if (nbd_aio_pwrite (nbd, buf, 512, 0, NBD_NULL_COMPLETION,
+                     LIBNBD_CMD_FLAG_PAYLOAD_LEN) == -1) {
+    fprintf (stderr, "%s: %s\n", argv[0], nbd_get_error ());
+    exit (EXIT_FAILURE);
+  }
+
+  if (nbd_aio_pwrite (nbd, buf, 512, 0, NBD_NULL_COMPLETION, 0) == -1) {
+    fprintf (stderr, "%s: %s\n", argv[0], nbd_get_error ());
+    exit (EXIT_FAILURE);
+  }
+
+  nbd_close (nbd);
+  exit (EXIT_SUCCESS);
+}
diff --git a/.gitignore b/.gitignore
index f4273713..bd002522 100644
--- a/.gitignore
+++ b/.gitignore
@@ -242,6 +242,7 @@ Makefile.in
 /tests/pki/
 /tests/pread-initialize
 /tests/private-data
+/tests/pwrite-extended
 /tests/read-only-flag
 /tests/read-write-flag
 /tests/server-death
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [libnbd PATCH v2 05/23] states: Prepare to receive 64-bit replies
  2022-11-14 22:51 ` [libnbd PATCH v2 00/23] libnbd 64-bit NBD extensions Eric Blake
                     ` (3 preceding siblings ...)
  2022-11-14 22:51   ` [libnbd PATCH v2 04/23] states: Prepare to send 64-bit requests Eric Blake
@ 2022-11-14 22:51   ` Eric Blake
  2022-11-14 22:51   ` [libnbd PATCH v2 06/23] states: Break deadlock if server goofs on extended replies Eric Blake
                     ` (17 subsequent siblings)
  22 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:51 UTC (permalink / raw)
  To: libguestfs; +Cc: qemu-devel, qemu-block, nbd

Support receiving headers for 64-bit replies if extended headers were
negotiated.  We already insist that the server not send us too much
payload in one reply, so we can exploit that and merge the 64-bit
length back into a normalized 32-bit field for the rest of the payload
length calculations.  The NBD protocol specifically documents that
extended mode takes precedence over structured replies, and that there
are no simple replies in extended mode.  We can also take advantage
of the fact that the handle field is at the same offset in all the
various reply types.

Note that if we negotiate extended headers, but a non-compliant server
replies with a non-extended header, this patch will stall waiting for
the server to send more bytes rather than noticing that the magic
number is wrong (for aio operations, you'll get a magic number
mismatch once you send a second command that elicits a reply; but for
blocking operations, we basically deadlock).  The easy alternative
would be to read just the first 4 bytes of magic, then determine how
many more bytes to expect; but that would require more states and
syscalls, and is not worth it since the typical server will be
compliant.
The other alternative is what the next patch implements: teaching
REPLY.RECV_REPLY, on short reads that were at least long enough to
transmit the magic, to specifically look for a magic number mismatch.

At this point, h->extended_headers is permanently false (we can't
enable it until all other aspects of the protocol have likewise been
converted).
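
For reference, the normalization described above boils down to a
bounds check followed by storing the value back into the 32-bit
structured field; a condensed sketch of the CHECK state below
("protocol_error" is a hypothetical stand-in for the real error path):

  /* Sketch: validate the 64-bit extended length against the payload
   * cap, then reuse the 32-bit structured field so later states need
   * no changes.
   */
  uint64_t length = h->extended_headers
    ? be64toh (h->sbuf.reply.hdr.extended.length)
    : be32toh (h->sbuf.reply.hdr.structured.length);
  if (length > MAX_REQUEST_SIZE + sizeof h->sbuf.reply.payload.offset_data)
    return protocol_error ();  /* hypothetical helper; see the real CHECK code */
  h->sbuf.reply.hdr.structured.length = length;  /* now known to fit 32 bits */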
---
 lib/internal.h                      |  1 +
 generator/states-reply-structured.c | 52 +++++++++++++++++++----------
 generator/states-reply.c            | 31 +++++++++++------
 3 files changed, 57 insertions(+), 27 deletions(-)

diff --git a/lib/internal.h b/lib/internal.h
index e900eca3..73fd24c0 100644
--- a/lib/internal.h
+++ b/lib/internal.h
@@ -242,6 +242,7 @@ struct nbd_handle {
       union {
         struct nbd_simple_reply simple;
         struct nbd_structured_reply structured;
+        struct nbd_extended_reply extended;
       } hdr;
       union {
         struct nbd_structured_reply_offset_data offset_data;
diff --git a/generator/states-reply-structured.c b/generator/states-reply-structured.c
index 6f187f14..da9894c6 100644
--- a/generator/states-reply-structured.c
+++ b/generator/states-reply-structured.c
@@ -45,14 +45,20 @@ structured_reply_in_bounds (uint64_t offset, uint32_t length,

 STATE_MACHINE {
  REPLY.STRUCTURED_REPLY.START:
-  /* We've only read the simple_reply.  The structured_reply is longer,
-   * so read the remaining part.
+  /* If we have extended headers, we've already read the entire header.
+   * Otherwise, we've only read enough for a simple_reply; since structured
+   * replies are longer, read the remaining part.
    */
-  h->rbuf = &h->sbuf;
-  h->rbuf = (char *) h->rbuf + sizeof h->sbuf.reply.hdr.simple;
-  h->rlen = sizeof h->sbuf.reply.hdr.structured;
-  h->rlen -= sizeof h->sbuf.reply.hdr.simple;
-  SET_NEXT_STATE (%RECV_REMAINING);
+  if (h->extended_headers) {
+    assert (h->rbuf == sizeof h->sbuf.reply.hdr.extended + (char*) &h->sbuf);
+    SET_NEXT_STATE (%CHECK);
+  }
+  else {
+    assert (h->rbuf == sizeof h->sbuf.reply.hdr.simple + (char*) &h->sbuf);
+    h->rlen = sizeof h->sbuf.reply.hdr.structured -
+      sizeof h->sbuf.reply.hdr.simple;
+    SET_NEXT_STATE (%RECV_REMAINING);
+  }
   return 0;

  REPLY.STRUCTURED_REPLY.RECV_REMAINING:
@@ -69,11 +75,18 @@  REPLY.STRUCTURED_REPLY.RECV_REMAINING:
  REPLY.STRUCTURED_REPLY.CHECK:
   struct command *cmd = h->reply_cmd;
   uint16_t flags, type;
-  uint32_t length;
+  uint64_t length;
+  uint64_t offset = -1;

+  assert (cmd);
   flags = be16toh (h->sbuf.reply.hdr.structured.flags);
   type = be16toh (h->sbuf.reply.hdr.structured.type);
-  length = be32toh (h->sbuf.reply.hdr.structured.length);
+  if (h->extended_headers) {
+    length = be64toh (h->sbuf.reply.hdr.extended.length);
+    offset = be64toh (h->sbuf.reply.hdr.extended.offset);
+  }
+  else
+    length = be32toh (h->sbuf.reply.hdr.structured.length);

   /* Reject a server that replies with too much information, but don't
    * reject a single structured reply to NBD_CMD_READ on the largest
@@ -83,13 +96,18 @@  REPLY.STRUCTURED_REPLY.CHECK:
    * not worth keeping the connection alive.
    */
   if (length > MAX_REQUEST_SIZE + sizeof h->sbuf.reply.payload.offset_data) {
-    set_error (0, "invalid server reply length %" PRIu32, length);
+    set_error (0, "invalid server reply length %" PRIu64, length);
     SET_NEXT_STATE (%.DEAD);
     return 0;
   }
+  /* For convenience, we now normalize extended replies into compact,
+   * doable since we validated length fits in 32 bits.
+   */
+  h->sbuf.reply.hdr.structured.length = length;

   /* Skip an unexpected structured reply, including to an unknown cookie. */
-  if (cmd == NULL || !h->structured_replies)
+  if (cmd == NULL || !h->structured_replies ||
+      (h->extended_headers && offset != cmd->offset))
     goto resync;

   switch (type) {
@@ -168,7 +186,7 @@  REPLY.STRUCTURED_REPLY.RECV_ERROR:
     SET_NEXT_STATE (%.READY);
     return 0;
   case 0:
-    length = be32toh (h->sbuf.reply.hdr.structured.length);
+    length = h->sbuf.reply.hdr.structured.length; /* normalized in CHECK */
     assert (length >= sizeof h->sbuf.reply.payload.error.error.error);
     assert (cmd);

@@ -207,7 +225,7 @@  REPLY.STRUCTURED_REPLY.RECV_ERROR_MESSAGE:
     SET_NEXT_STATE (%.READY);
     return 0;
   case 0:
-    length = be32toh (h->sbuf.reply.hdr.structured.length);
+    length = h->sbuf.reply.hdr.structured.length; /* normalized in CHECK */
     msglen = be16toh (h->sbuf.reply.payload.error.error.len);
     type = be16toh (h->sbuf.reply.hdr.structured.type);

@@ -307,7 +325,7 @@  REPLY.STRUCTURED_REPLY.RECV_OFFSET_DATA:
     SET_NEXT_STATE (%.READY);
     return 0;
   case 0:
-    length = be32toh (h->sbuf.reply.hdr.structured.length);
+    length = h->sbuf.reply.hdr.structured.length; /* normalized in CHECK */
     offset = be64toh (h->sbuf.reply.payload.offset_data.offset);

     assert (cmd); /* guaranteed by CHECK */
@@ -346,7 +364,7 @@  REPLY.STRUCTURED_REPLY.RECV_OFFSET_DATA_DATA:
     SET_NEXT_STATE (%.READY);
     return 0;
   case 0:
-    length = be32toh (h->sbuf.reply.hdr.structured.length);
+    length = h->sbuf.reply.hdr.structured.length; /* normalized in CHECK */
     offset = be64toh (h->sbuf.reply.payload.offset_data.offset);

     assert (cmd); /* guaranteed by CHECK */
@@ -426,7 +444,7 @@  REPLY.STRUCTURED_REPLY.RECV_BS_HEADER:
     SET_NEXT_STATE (%.READY);
     return 0;
   case 0:
-    length = be32toh (h->sbuf.reply.hdr.structured.length);
+    length = h->sbuf.reply.hdr.structured.length; /* normalized in CHECK */

     assert (cmd); /* guaranteed by CHECK */
     assert (cmd->type == NBD_CMD_BLOCK_STATUS);
@@ -460,7 +478,7 @@  REPLY.STRUCTURED_REPLY.RECV_BS_ENTRIES:
     SET_NEXT_STATE (%.READY);
     return 0;
   case 0:
-    length = be32toh (h->sbuf.reply.hdr.structured.length);
+    length = h->sbuf.reply.hdr.structured.length; /* normalized in CHECK */

     assert (cmd); /* guaranteed by CHECK */
     assert (cmd->type == NBD_CMD_BLOCK_STATUS);
diff --git a/generator/states-reply.c b/generator/states-reply.c
index 845973a4..dde23b39 100644
--- a/generator/states-reply.c
+++ b/generator/states-reply.c
@@ -62,15 +62,19 @@  REPLY.START:
    */
   ssize_t r;

-  /* We read all replies initially as if they are simple replies, but
-   * check the magic in CHECK_SIMPLE_OR_STRUCTURED_REPLY below.
-   * This works because the structured_reply header is larger.
+  /* With extended headers, there is only one size to read, so we can do it
+   * all in one syscall.  But for older structured replies, we don't know if
+   * we have a simple or structured reply until we read the magic number,
+   * requiring a two-part read with CHECK_SIMPLE_OR_STRUCTURED_REPLY below.
    */
   assert (h->reply_cmd == NULL);
   assert (h->rlen == 0);

   h->rbuf = &h->sbuf.reply.hdr;
-  h->rlen = sizeof h->sbuf.reply.hdr.simple;
+  if (h->extended_headers)
+    h->rlen = sizeof h->sbuf.reply.hdr.extended;
+  else
+    h->rlen = sizeof h->sbuf.reply.hdr.simple;

   r = h->sock->ops->recv (h, h->sock, h->rbuf, h->rlen);
   if (r == -1) {
@@ -116,15 +120,22 @@  REPLY.CHECK_SIMPLE_OR_STRUCTURED_REPLY:
   uint64_t cookie;

   magic = be32toh (h->sbuf.reply.hdr.simple.magic);
-  if (magic == NBD_SIMPLE_REPLY_MAGIC) {
+  switch (magic) {
+  case NBD_SIMPLE_REPLY_MAGIC:
+    if (h->extended_headers)
+      goto invalid;
     SET_NEXT_STATE (%SIMPLE_REPLY.START);
-  }
-  else if (magic == NBD_STRUCTURED_REPLY_MAGIC) {
+    break;
+  case NBD_STRUCTURED_REPLY_MAGIC:
+  case NBD_EXTENDED_REPLY_MAGIC:
+    if ((magic == NBD_STRUCTURED_REPLY_MAGIC) == h->extended_headers)
+      goto invalid;
     SET_NEXT_STATE (%STRUCTURED_REPLY.START);
-  }
-  else {
+    break;
+  default:
+  invalid:
     SET_NEXT_STATE (%.DEAD); /* We've probably lost synchronization. */
-    set_error (0, "invalid reply magic 0x%" PRIx32, magic);
+    set_error (0, "invalid or unexpected reply magic 0x%" PRIx32, magic);
     return 0;
   }

-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [libnbd PATCH v2 06/23] states: Break deadlock if server goofs on extended replies
  2022-11-14 22:51 ` [libnbd PATCH v2 00/23] libnbd 64-bit NBD extensions Eric Blake
                     ` (4 preceding siblings ...)
  2022-11-14 22:51   ` [libnbd PATCH v2 05/23] states: Prepare to receive 64-bit replies Eric Blake
@ 2022-11-14 22:51   ` Eric Blake
  2022-11-14 22:51   ` [libnbd PATCH v2 07/23] generator: Add struct nbd_extent in prep for 64-bit extents Eric Blake
                     ` (16 subsequent siblings)
  22 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:51 UTC (permalink / raw)
  To: libguestfs; +Cc: qemu-devel, qemu-block, nbd

One of the benefits of extended replies is that we can do a
fixed-length read for the entire header of every server reply, which
is fewer syscalls than the split-read approach required by structured
replies.  But one of the drawbacks of doing a large read is that if
the server is non-compliant (not a problem for normal servers, but
something I hit rather more than I'd like to admit while developing
extended header support in servers), nbd_pwrite() and friends will
deadlock if the server replies with the wrong header.  Add in some
code to catch that failure mode and move the state machine to DEAD
sooner, to make it easier to diagnose the fault in the server.

Unlike in the case of an unexpected simple reply from a structured
server (where we never over-read, and therefore commit b31e7bac can
merely fail the command with EPROTO and successfully move on to
the next server reply), in this case we really do have to move to
DEAD: in addition to having already read the 16 or 20 bytes that the
server sent in its (short) reply for this command, we may have already
read the initial bytes of the server's next reply, but we have no way
to push those extra bytes back onto our read stream for parsing on our
next pass through the state machine.
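
The pointer comparison added below is safe because of the field order
in the extended header; a sketch of the invariant it relies on
(illustrative only, matching the packed struct from the earlier
protocol patch):

  /* In struct nbd_extended_reply the 32-bit magic occupies offsets
   * 0-3 and flags starts at offset 4, so once the receive pointer has
   * advanced to (or past) the flags member, all four magic bytes have
   * been read even though the rest of the 32-byte header has not.
   */
  #include <assert.h>
  #include <stddef.h>
  #include <stdint.h>
  #include "nbd-protocol.h"

  static_assert (offsetof (struct nbd_extended_reply, flags) ==
                 sizeof (uint32_t), "magic immediately precedes flags");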
---
 generator/states-reply.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/generator/states-reply.c b/generator/states-reply.c
index dde23b39..e89e9019 100644
--- a/generator/states-reply.c
+++ b/generator/states-reply.c
@@ -109,7 +109,23 @@  REPLY.START:
  REPLY.RECV_REPLY:
   switch (recv_into_rbuf (h)) {
   case -1: SET_NEXT_STATE (%.DEAD); return 0;
-  case 1: SET_NEXT_STATE (%.READY); return 0;
+  case 1: SET_NEXT_STATE (%.READY);
+    /* Special case: if we have a short read, but got at least far
+     * enough to decode the magic number, we can check if the server
+     * is matching our expectations. This lets us avoid deadlocking if
+     * a buggy server sends only 16 bytes of a simple reply, and is
+     * waiting for our next command, while we are blocked waiting for
+     * the server to send 32 bytes of an extended reply.
+     */
+    if (h->extended_headers &&
+        (char *) h->rbuf >= (char *) &h->sbuf.reply.hdr.extended.flags) {
+      uint32_t magic = be32toh (h->sbuf.reply.hdr.extended.magic);
+      if (magic != NBD_EXTENDED_REPLY_MAGIC) {
+        SET_NEXT_STATE (%.DEAD); /* We've probably lost synchronization. */
+        set_error (0, "invalid or unexpected reply magic 0x%" PRIx32, magic);
+      }
+    }
+    return 0;
   case 0: SET_NEXT_STATE (%CHECK_SIMPLE_OR_STRUCTURED_REPLY);
   }
   return 0;
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [libnbd PATCH v2 07/23] generator: Add struct nbd_extent in prep for 64-bit extents
  2022-11-14 22:51 ` [libnbd PATCH v2 00/23] libnbd 64-bit NBD extensions Eric Blake
                     ` (5 preceding siblings ...)
  2022-11-14 22:51   ` [libnbd PATCH v2 06/23] states: Break deadlock if server goofs on extended replies Eric Blake
@ 2022-11-14 22:51   ` Eric Blake
  2022-11-14 22:51   ` [libnbd PATCH v2 08/23] block_status: Track 64-bit extents internally Eric Blake
                     ` (15 subsequent siblings)
  22 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:51 UTC (permalink / raw)
  To: libguestfs; +Cc: qemu-devel, qemu-block, nbd

The existing nbd_block_status() callback is permanently stuck with an
array of uint32_t pairs (len/2 extents), which constrains both the
maximum extent size (no larger than 4G) and the status flags (they
must fit in 32 bits).  While the "base:allocation" metacontext will never
exceed 32 bits, it is not hard to envision other extension
metacontexts where a 64-bit status would be useful (for example, Zoned
Block Devices expressing a 64-bit offset[1]).  Exposing 64-bit extents
will require a new API; we now have to decide whether to copy the
existing API practice of returning a bare array containing len/2
extent pairs, or to use the saner idea of an array of structs with len
extents.  Returning an array of structs is actually easier to map to
various language bindings, particularly since we know the length field
fits in 63 bits (fallout from the fact that NBD exports are capped in
size by signed off_t), while the status field must represent a
full 64 bits (assuming that the user wants to connect to a metadata
extension that utilizes 64 bits, rather than existing metadata
contexts that only expose 32 bits).

This patch introduces the struct we plan to use in the new API, along
with language bindings.  The bindings for Python and OCaml were
relatively straightforward; the Golang bindings took a bit more effort
for me to write.  Temporary unused attributes are needed to keep the
compiler happy until a later patch exposes a new API using the new
callback type.

Note that 'struct nbd_block_descriptor_ext' in lib/nbd-protocol.h is
exactly the same as what we want to use in C.  But it is easier to
stick a new public type in <libnbd.h> than to figure out how to expose
just part of a header we only want to use internally.

[1] https://zonedstorage.io/docs/linux/zbd-api
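
To make the design choice concrete, here is a rough sketch of how a C
caller might consume the new callback type once a later patch in this
series exposes an API built on it (the name nbd_block_status_64 and
the exact callback signature are assumptions at this point; only the
nbd_extent type and the extent64 closure plumbing exist after this
patch):

  /* Hypothetical usage: extents arrive as structs, not flat uint32_t
   * pairs, so 64-bit lengths and flags need no decoding gymnastics.
   * Assumes <stdio.h>, <inttypes.h> and <libnbd.h>.
   */
  static int
  extent64_cb (void *user_data, const char *metacontext, uint64_t offset,
               nbd_extent *entries, size_t nr_entries, int *error)
  {
    size_t i;

    for (i = 0; i < nr_entries; i++) {
      printf ("%s: offset=%" PRIu64 " len=%" PRIu64 " flags=0x%" PRIx64 "\n",
              metacontext, offset, entries[i].length, entries[i].flags);
      offset += entries[i].length;  /* each extent follows the previous one */
    }
    return 0;
  }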
---
 generator/API.ml    | 12 +++++++++++-
 generator/API.mli   |  1 +
 generator/C.ml      | 24 +++++++++++++++++++++---
 generator/GoLang.ml | 24 ++++++++++++++++++++++++
 generator/OCaml.ml  | 19 ++++++++++++++++---
 generator/Python.ml | 21 ++++++++++++++++++++-
 ocaml/helpers.c     | 20 ++++++++++++++++++++
 ocaml/nbd-c.h       |  3 ++-
 golang/handle.go    |  8 +++++++-
 9 files changed, 122 insertions(+), 10 deletions(-)

diff --git a/generator/API.ml b/generator/API.ml
index beb7a2b4..85509867 100644
--- a/generator/API.ml
+++ b/generator/API.ml
@@ -42,6 +42,7 @@
 | BytesPersistOut of string * string
 | Closure of closure
 | Enum of string * enum
+| Extent64 of string
 | Fd of string
 | Flags of string * flags
 | Int of string
@@ -157,6 +158,14 @@ let extent_closure =
                             "nr_entries");
              CBMutable (Int "error") ]
 }
+let extent64_closure = {
+  cbname = "extent64";
+  cbargs = [ CBString "metacontext";
+             CBUInt64 "offset";
+             CBArrayAndLen (Extent64 "entries",
+                            "nr_entries");
+             CBMutable (Int "error") ]
+}
 let list_closure = {
   cbname = "list";
   cbargs = [ CBString "name"; CBString "description" ]
@@ -166,7 +175,8 @@ let context_closure =
   cbargs = [ CBString "name" ]
 }
 let all_closures = [ chunk_closure; completion_closure;
-                     debug_closure; extent_closure; list_closure;
+                     debug_closure; extent_closure; extent64_closure;
+                     list_closure;
                      context_closure ]

 (* Enums. *)
diff --git a/generator/API.mli b/generator/API.mli
index b0267705..12ad1fb4 100644
--- a/generator/API.mli
+++ b/generator/API.mli
@@ -52,6 +52,7 @@ and
 | BytesPersistOut of string * string
 | Closure of closure       (** function pointer + void *opaque *)
 | Enum of string * enum    (** enum/union type, int in C *)
+| Extent64 of string       (** extent descriptor, struct nbd_extent in C *)
 | Fd of string             (** file descriptor *)
 | Flags of string * flags  (** flags, uint32_t in C *)
 | Int of string            (** small int *)
diff --git a/generator/C.ml b/generator/C.ml
index f9171996..51a4df65 100644
--- a/generator/C.ml
+++ b/generator/C.ml
@@ -93,6 +93,7 @@ let
   | Closure { cbname } ->
      [ sprintf "%s_callback" cbname; sprintf "%s_user_data" cbname ]
   | Enum (n, _) -> [n]
+  | Extent64 n -> [n]
   | Fd n -> [n]
   | Flags (n, _) -> [n]
   | Int n -> [n]
@@ -128,7 +129,7 @@ let
   (* list of strings should be marked as non-null *)
   | StringList n -> [ true ]
   (* numeric and other non-pointer types are not able to be null *)
-  | Bool _ | Closure _ | Enum _ | Fd _ | Flags _
+  | Bool _ | Closure _ | Enum _ | Extent64 _ | Fd _ | Flags _
   | Int _ | Int64 _ | SizeT _
   | UInt _ | UInt32 _ | UInt64 _ | UIntPtr _ -> [ false ]

@@ -182,6 +183,9 @@ and
       | Enum (n, _) ->
          if types then pr "int ";
          pr "%s" n
+      | Extent64 n ->
+         if types then pr "nbd_extent ";
+         pr "%s" n
       | Flags (n, _) ->
          if types then pr "uint32_t ";
          pr "%s" n
@@ -292,6 +296,11 @@ and
          pr "%s, " n;
          if types then pr "size_t ";
          pr "%s" len
+      | CBArrayAndLen (Extent64 n, len) ->
+         if types then pr "nbd_extent *";
+         pr "%s, " n;
+         if types then pr "size_t ";
+         pr "%s" len
       | CBArrayAndLen _ -> assert false
       | CBBytesIn (n, len) ->
          if types then pr "const void *";
@@ -463,6 +472,13 @@ let
   pr "extern int nbd_get_errno (void);\n";
   pr "#define LIBNBD_HAVE_NBD_GET_ERRNO 1\n";
   pr "\n";
+  pr "/* This is used in the callback for nbd_block_status_64.\n";
+  pr " */\n";
+  pr "typedef struct {\n";
+  pr "  uint64_t length;\n";
+  pr "  uint64_t flags;\n";
+  pr "} nbd_extent;\n";
+  pr "\n";
   print_closure_structs ();
   List.iter (
     fun (name, { args; optargs; ret }) ->
@@ -716,7 +732,7 @@ let
          pr "    char *%s_printable =\n" n;
          pr "        nbd_internal_printable_string_list (%s);\n" n
       | BytesOut _ | BytesPersistOut _
-      | Bool _ | Closure _ | Enum _ | Flags _ | Fd _ | Int _
+      | Bool _ | Closure _ | Enum _ | Extent64 _ | Flags _ | Fd _ | Int _
       | Int64 _ | SizeT _
       | SockAddrAndLen _ | UInt _ | UInt32 _ | UInt64 _ | UIntPtr _ -> ()
     ) args;
@@ -735,6 +751,7 @@ let
          pr " %s=\\\"%%s\\\" %s=%%zu" n count
       | Closure { cbname } -> pr " %s=<fun>" cbname
       | Enum (n, _) -> pr " %s=%%d" n
+      | Extent64 _ -> assert false (* only used in extent64_closure *)
       | Flags (n, _) -> pr " %s=0x%%x" n
       | Fd n | Int n -> pr " %s=%%d" n
       | Int64 n -> pr " %s=%%\" PRIi64 \"" n
@@ -764,6 +781,7 @@ let
          pr ", %s_printable ? %s_printable : \"\", %s" n n count
       | Closure { cbname } -> ()
       | Enum (n, _) -> pr ", %s" n
+      | Extent64 _ -> assert false (* only used in extent64_closure *)
       | Flags (n, _) -> pr ", %s" n
       | Fd n | Int n | Int64 n | SizeT n -> pr ", %s" n
       | SockAddrAndLen (_, len) -> pr ", (int) %s" len
@@ -787,7 +805,7 @@ let
       | StringList n ->
          pr "    free (%s_printable);\n" n
       | BytesOut _ | BytesPersistOut _
-      | Bool _ | Closure _ | Enum _ | Flags _ | Fd _ | Int _
+      | Bool _ | Closure _ | Enum _ | Extent64 _ | Flags _ | Fd _ | Int _
       | Int64 _ | SizeT _
       | SockAddrAndLen _ | UInt _ | UInt32 _ | UInt64 _ | UIntPtr _ -> ()
     ) args;
diff --git a/generator/GoLang.ml b/generator/GoLang.ml
index eb7279a4..39b6cc64 100644
--- a/generator/GoLang.ml
+++ b/generator/GoLang.ml
@@ -49,6 +49,7 @@ let
   | BytesPersistOut (n, len) -> n
   | Closure { cbname } -> cbname
   | Enum (n, _) -> n
+  | Extent64 n -> n
   | Fd n -> n
   | Flags (n, _) -> n
   | Int n -> n
@@ -71,6 +72,7 @@ let
   | BytesPersistOut _ -> "AioBuffer"
   | Closure { cbname } -> sprintf "%sCallback" (camel_case cbname)
   | Enum (_, { enum_prefix }) -> camel_case enum_prefix
+  | Extent64 _ -> assert false (* only used in extent64_closure *)
   | Fd _ -> "int"
   | Flags (_, { flag_prefix }) -> camel_case flag_prefix
   | Int _ -> "int"
@@ -262,6 +264,7 @@ let
        pr "    c_%s.user_data = C.alloc_cbid(C.long(%s_cbid))\n" cbname cbname
     | Enum (n, _) ->
        pr "    c_%s := C.int (%s)\n" n n
+    | Extent64 _ -> assert false (* only used in extent64_closure *)
     | Fd n ->
        pr "    c_%s := C.int (%s)\n" n n
     | Flags (n, _) ->
@@ -334,6 +337,7 @@ let
     | BytesPersistOut (n, len) ->  pr ", c_%s, c_%s" n len
     | Closure { cbname } ->  pr ", c_%s" cbname
     | Enum (n, _) -> pr ", c_%s" n
+    | Extent64 _ -> assert false (* only used in extent64_closure *)
     | Fd n -> pr ", c_%s" n
     | Flags (n, _) -> pr ", c_%s" n
     | Int n -> pr ", c_%s" n
@@ -524,6 +528,18 @@ let
     copy(ret, s)
     return ret
 }
+
+func copy_extent_array (entries *C.nbd_extent, count C.size_t) []LibnbdExtent {
+    unsafePtr := unsafe.Pointer(entries)
+    arrayPtr := (*[1 << 20]C.nbd_extent)(unsafePtr)
+    slice := arrayPtr[:count:count]
+    ret := make([]LibnbdExtent, count)
+    for i := 0; i < int (count); i++ {
+      ret[i].Length = uint64 (slice[i].length)
+      ret[i].Flags = uint64 (slice[i].flags)
+    }
+    return ret
+}
 ";

   List.iter (
@@ -537,6 +553,8 @@ let
           match cbarg with
           | CBArrayAndLen (UInt32 n, _) ->
              pr "%s []uint32" n;
+          | CBArrayAndLen (Extent64 n, _) ->
+             pr "%s []LibnbdExtent" n;
           | CBBytesIn (n, len) ->
              pr "%s []byte" n;
           | CBInt n ->
@@ -563,6 +581,8 @@ let
           match cbarg with
           | CBArrayAndLen (UInt32 n, count) ->
              pr "%s *C.uint32_t, %s C.size_t" n count
+          | CBArrayAndLen (Extent64 n, count) ->
+             pr "%s *C.nbd_extent, %s C.size_t" n count
           | CBBytesIn (n, len) ->
              pr "%s unsafe.Pointer, %s C.size_t" n len
           | CBInt n ->
@@ -605,6 +625,8 @@ let
           match cbarg with
           | CBArrayAndLen (UInt32 n, count) ->
              pr "copy_uint32_array (%s, %s)" n count
+          | CBArrayAndLen (Extent64 n, count) ->
+             pr "copy_extent_array (%s, %s)" n count
           | CBBytesIn (n, len) ->
              pr "C.GoBytes (%s, C.int (%s))" n len
           | CBInt n ->
@@ -756,6 +778,8 @@ let
            match cbarg with
            | CBArrayAndLen (UInt32 n, count) ->
               pr "uint32_t *%s, size_t %s" n count
+           | CBArrayAndLen (Extent64 n, count) ->
+              pr "nbd_extent *%s, size_t %s" n count
            | CBBytesIn (n, len) ->
               pr "void *%s, size_t %s" n len
            | CBInt n ->
diff --git a/generator/OCaml.ml b/generator/OCaml.ml
index 6a280b67..66dc6cad 100644
--- a/generator/OCaml.ml
+++ b/generator/OCaml.ml
@@ -44,6 +44,7 @@ and
   | Closure { cbargs } ->
      sprintf "(%s)" (ocaml_closuredecl_to_string cbargs)
   | Enum (_, { enum_prefix }) -> sprintf "%s.t" enum_prefix
+  | Extent64 _ -> "extent"
   | Fd _ -> "Unix.file_descr"
   | Flags (_, { flag_prefix }) -> sprintf "%s.t list" flag_prefix
   | Int _ -> "int"
@@ -102,6 +103,7 @@ let
   | BytesPersistOut (n, len) -> n
   | Closure { cbname } -> cbname
   | Enum (n, _) -> n
+  | Extent64 n -> n
   | Fd n -> n
   | Flags (n, _) -> n
   | Int n -> n
@@ -147,6 +149,9 @@ let

 type cookie = int64

+type extent = int64 * int64
+(** Length and flags of an extent in block_status_64 callback. *)
+
 ";

   List.iter (
@@ -268,6 +273,7 @@ let
 exception Error of string * Unix.error option
 exception Closed of string
 type cookie = int64
+type extent = int64 * int64

 (* Give the exceptions names so that they can be raised from the C code. *)
 let () =
@@ -498,7 +504,8 @@ let
   let argnames =
     List.map (
       function
-      | CBArrayAndLen (UInt32 n, _) | CBBytesIn (n, _)
+      | CBArrayAndLen (UInt32 n, _) | CBArrayAndLen (Extent64 n, _)
+      | CBBytesIn (n, _)
       | CBInt n | CBInt64 n
       | CBMutable (Int n) | CBString n | CBUInt n | CBUInt64 n ->
          n ^ "v"
@@ -526,6 +533,9 @@ let
     | CBArrayAndLen (UInt32 n, count) ->
        pr "  %sv = nbd_internal_ocaml_alloc_int64_from_uint32_array (%s, %s);\n"
          n n count;
+    | CBArrayAndLen (Extent64 n, count) ->
+       pr "  %sv = nbd_internal_ocaml_alloc_extent64_array (%s, %s);\n"
+         n n count;
     | CBBytesIn (n, len) ->
        pr "  %sv = caml_alloc_initialized_string (%s, %s);\n" n len n
     | CBInt n | CBUInt n ->
@@ -549,7 +559,7 @@ let

   List.iter (
     function
-    | CBArrayAndLen (UInt32 _, _)
+    | CBArrayAndLen (_, _)
     | CBBytesIn _
     | CBInt _
     | CBInt64 _
@@ -558,7 +568,7 @@ let
     | CBUInt64 _ -> ()
     | CBMutable (Int n) ->
        pr "  *%s = Int_val (Field (%sv, 0));\n" n n
-    | CBArrayAndLen _ | CBMutable _ -> assert false
+    | CBMutable _ -> assert false
   ) cbargs;

   pr "  if (Is_exception_result (rv)) {\n";
@@ -573,6 +583,7 @@ let
   pr "}\n";
   pr "\n";
   pr "static int\n";
+  pr "__attribute__((unused)) /* XXX temporary hack */\n";
   pr "%s_wrapper " cbname;
   C.print_cbarg_list ~wrap:true cbargs;
   pr "\n";
@@ -688,6 +699,7 @@ let
        pr "  %s_callback.free = free_user_data;\n" cbname
     | Enum (n, { enum_prefix }) ->
        pr "  int %s = %s_val (%sv);\n" n enum_prefix n
+    | Extent64 _ -> assert false (* only used in extent64_closure *)
     | Fd n ->
        pr "  /* OCaml Unix.file_descr is just an int, at least on Unix. */\n";
        pr "  int %s = Int_val (%sv);\n" n n
@@ -773,6 +785,7 @@ let
     | BytesPersistOut _
     | Closure _
     | Enum _
+    | Extent64 _
     | Fd _
     | Flags _
     | Int _
diff --git a/generator/Python.ml b/generator/Python.ml
index 3585955b..980fdcea 100644
--- a/generator/Python.ml
+++ b/generator/Python.ml
@@ -153,6 +153,7 @@ let
 let print_python_closure_wrapper { cbname; cbargs } =
   pr "/* Wrapper for %s callback. */\n" cbname;
   pr "static int\n";
+  pr "__attribute__((unused)) /* XXX temporary hack */\n";
   pr "%s_wrapper " cbname;
   C.print_cbarg_list ~wrap:true cbargs;
   pr "\n";
@@ -165,6 +166,7 @@ let
   List.iter (
     function
     | CBArrayAndLen (UInt32 n, _)
+    | CBArrayAndLen (Extent64 n, _)
     | CBBytesIn (n, _)
     | CBMutable (Int n) ->
        pr "  PyObject *py_%s = NULL;\n" n
@@ -182,6 +184,16 @@ let
        pr "    if (!py_e_%s) { PyErr_PrintEx (0); goto out; }\n" n;
        pr "    PyList_SET_ITEM (py_%s, i_%s, py_e_%s);\n" n n n;
        pr "  }\n"
+    | CBArrayAndLen (Extent64 n, len) ->
+       pr "  py_%s = PyList_New (%s);\n" n len;
+       pr "  size_t i_%s;\n" n;
+       pr "  for (i_%s = 0; i_%s < %s; ++i_%s) {\n" n n len n;
+       pr "    PyObject *py_e_%s = Py_BuildValue (\"OO\",\n" n;
+       pr "      PyLong_FromUnsignedLong (%s[i_%s].length),\n" n n;
+       pr "      PyLong_FromUnsignedLong (%s[i_%s].flags));\n" n n;
+       pr "    if (!py_e_%s) { PyErr_PrintEx (0); goto out; }\n" n;
+       pr "    PyList_SET_ITEM (py_%s, i_%s, py_e_%s);\n" n n n;
+       pr "  }\n"
     | CBBytesIn (n, len) ->
        pr "  py_%s = nbd_internal_py_get_subview (data->view, %s, %s);\n" n n len;
        pr "  if (!py_%s) { PyErr_PrintEx (0); goto out; }\n" n
@@ -201,6 +213,7 @@ let
     List.map (
       function
       | CBArrayAndLen (UInt32 n, _) -> "O", sprintf "py_%s" n
+      | CBArrayAndLen (Extent64 n, _) -> "O", sprintf "py_%s" n
       | CBBytesIn (n, _) -> "O", sprintf "py_%s" n
       | CBInt n -> "i", n
       | CBInt64 n -> "L", n
@@ -249,7 +262,8 @@ let
   pr " out:\n";
   List.iter (
     function
-    | CBArrayAndLen (UInt32 n, _) ->
+    | CBArrayAndLen (UInt32 n, _)
+    | CBArrayAndLen (Extent64 n, _) ->
        pr "  Py_XDECREF (py_%s);\n" n
     | CBBytesIn (n, _) ->
        pr "  Py_XDECREF (py_%s);\n" n
@@ -302,6 +316,7 @@ let
            pr ".callback = %s_wrapper, .free = free_user_data" cbname);
        pr " };\n"
     | Enum (n, _) -> pr "  int %s;\n" n
+    | Extent64 _ -> assert false (* only used in extent64_closure *)
     | Flags (n, _) ->
        pr "  uint32_t %s_u32;\n" n;
        pr "  unsigned int %s; /* really uint32_t */\n" n
@@ -358,6 +373,7 @@ let
          "O", sprintf "&%s" n, sprintf "py_%s->buf, py_%s->len" n n
       | Closure { cbname } -> "O", sprintf "&py_%s_fn" cbname, cbname
       | Enum (n, _) -> "i", sprintf "&%s" n, n
+      | Extent64 _ -> assert false (* only used in extent64_closure *)
       | Flags (n, _) -> "I", sprintf "&%s" n, sprintf "%s_u32" n
       | Fd n | Int n -> "i", sprintf "&%s" n, n
       | Int64 n -> "L", sprintf "&%s" n, sprintf "%s_i64" n
@@ -449,6 +465,7 @@ let
          pr "  if (!chunk_user_data->view) goto out;\n"
        )
     | Enum _ -> ()
+    | Extent64 _ -> ()
     | Flags (n, _) -> pr "  %s_u32 = %s;\n" n n
     | Fd _ | Int _ -> ()
     | Int64 n -> pr "  %s_i64 = %s;\n" n n
@@ -547,6 +564,7 @@ let
     | Closure { cbname } ->
        pr "  free_user_data (%s_user_data);\n" cbname
     | Enum _ -> ()
+    | Extent64 _ -> ()
     | Flags _ -> ()
     | Fd _ | Int _ -> ()
     | Int64 _ -> ()
@@ -834,6 +852,7 @@ let
           | BytesPersistOut (n, _) -> n, None
           | Closure { cbname } -> cbname, None
           | Enum (n, _) -> n, None
+          | Extent64 _ -> assert false (* only used in extent64_closure *)
           | Flags (n, _) -> n, None
           | Fd n | Int n -> n, None
           | Int64 n -> n, None
diff --git a/ocaml/helpers.c b/ocaml/helpers.c
index 6568755b..ac8bb0e2 100644
--- a/ocaml/helpers.c
+++ b/ocaml/helpers.c
@@ -133,6 +133,26 @@ nbd_internal_ocaml_alloc_int64_from_uint32_array (uint32_t *a, size_t len)
   CAMLreturn (rv);
 }

+value
+nbd_internal_ocaml_alloc_extent64_array (nbd_extent *a, size_t len)
+{
+  CAMLparam0 ();
+  CAMLlocal3 (s, v, rv);
+  size_t i;
+
+  rv = caml_alloc (len, 0);
+  for (i = 0; i < len; ++i) {
+    s = caml_alloc (2, 0);
+    v = caml_copy_int64 (a[i].length);
+    Store_field (s, 0, v);
+    v = caml_copy_int64 (a[i].flags);
+    Store_field (s, 1, v);
+    Store_field (rv, i, s);
+  }
+
+  CAMLreturn (rv);
+}
+
 /* Convert a Unix.sockaddr to a C struct sockaddr. */
 void
 nbd_internal_unix_sockaddr_to_sa (value sockaddrv,
diff --git a/ocaml/nbd-c.h b/ocaml/nbd-c.h
index 8b0c088d..9a5a80a2 100644
--- a/ocaml/nbd-c.h
+++ b/ocaml/nbd-c.h
@@ -1,5 +1,5 @@
 /* NBD client library in userspace
- * Copyright (C) 2013-2019 Red Hat Inc.
+ * Copyright (C) 2013-2022 Red Hat Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public
@@ -63,6 +63,7 @@ extern void nbd_internal_ocaml_raise_closed (const char *func) Noreturn;
 extern const char **nbd_internal_ocaml_string_list (value);
 extern value nbd_internal_ocaml_alloc_int64_from_uint32_array (uint32_t *,
                                                                size_t);
+extern value nbd_internal_ocaml_alloc_extent64_array (nbd_extent *, size_t);
 extern void nbd_internal_unix_sockaddr_to_sa (value, struct sockaddr_storage *,
                                               socklen_t *);
 extern void nbd_internal_ocaml_exception_in_wrapper (const char *, value);
diff --git a/golang/handle.go b/golang/handle.go
index c8d9485d..b8388b61 100644
--- a/golang/handle.go
+++ b/golang/handle.go
@@ -1,5 +1,5 @@
 /* libnbd golang handle.
- * Copyright (C) 2013-2021 Red Hat Inc.
+ * Copyright (C) 2013-2022 Red Hat Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public
@@ -58,6 +58,12 @@ func (h *Libnbd) String() string {
 	return "&Libnbd{}"
 }

+/* Used for block status callback. */
+type LibnbdExtent struct {
+	Length uint64        // length of the extent
+	Flags  uint64        // flags describing properties of the extent
+}
+
 /* All functions (except Close) return ([result,] LibnbdError). */
 type LibnbdError struct {
 	Op     string        // operation which failed
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [libnbd PATCH v2 08/23] block_status: Track 64-bit extents internally
  2022-11-14 22:51 ` [libnbd PATCH v2 00/23] libnbd 64-bit NBD extensions Eric Blake
                     ` (6 preceding siblings ...)
  2022-11-14 22:51   ` [libnbd PATCH v2 07/23] generator: Add struct nbd_extent in prep for 64-bit extents Eric Blake
@ 2022-11-14 22:51   ` Eric Blake
  2022-11-14 22:51   ` [libnbd PATCH v2 09/23] block_status: Accept 64-bit extents during block status Eric Blake
                     ` (14 subsequent siblings)
  22 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:51 UTC (permalink / raw)
  To: libguestfs; +Cc: qemu-devel, qemu-block, nbd

When extended headers are in use, the server can send us 64-bit
extents, even for a 32-bit query (if the server knows the entire image
is data, for example, or if the metacontext has a status definition
that uses more than 32 bits).  Also, while most contexts only have
32-bit flags, a server is allowed to negotiate contexts with 64-bit
flags when extended headers are in use.  Thus, for maximum
flexibility, we are better off storing 64-bit data internally, with
the goal of letting the client's existing 32-bit interface work as
much as possible, and of letting a future 64-bit API addition work
even when the server only gave 32-bit results.

For backwards compatibility, a client that never negotiates a 64-bit
status context can be handled without errors by truncating any 64-bit
lengths down to just under 4G, so the old 32-bit interface will
continue to work in most cases.  But status flags cannot be truncated:
if a user requests an extended status context, the 32-bit interface
now reports EOVERFLOW for that context (although that situation can't
happen until a later patch actually turns on the use of extended
headers).

Note that the existing 32-bit nbd_block_status() API is now slightly
slower, particularly when talking with a server that lacks extended
headers: we are doing double size conversions.  But this speed penalty
is likely in the noise compared to the network delays, and ideally
clients will switch over to the new 64-bit interfaces as more and more
servers start supporting extended headers.

One of the trickier aspects of this patch is auditing that both the
user's extent callback and our malloc'd shim get cleaned up exactly
once on all possible paths, so that there is neither a leak nor a
double free.
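
The diff below implements that narrowing in nbd_convert_extent(); the
following standalone sketch (not part of the patch: the helper name and
struct are illustrative, and the 4G-64M clamp constant is taken from
the comment in lib/rw.c) shows the rule in isolation:

#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct ext64 { uint64_t length; uint64_t flags; };

/* Narrow 64-bit extents into the legacy 32-bit pair array: an
 * oversized length is clamped just below 4G and ends the list early;
 * an oversized flags value cannot be represented and becomes
 * EOVERFLOW.  Returns the number of 32-bit entries produced.
 */
static size_t
narrow_extents (const struct ext64 *in, size_t n,
                uint32_t *out /* 2*n slots */, int *error, bool *fail)
{
  const uint64_t clamp = (UINT64_C (1) << 32) - 64 * 1024 * 1024; /* 4G-64M */
  size_t i;

  *fail = false;
  for (i = 0; i < n; i++) {
    out[i * 2] = (uint32_t) in[i].length;
    out[i * 2 + 1] = (uint32_t) in[i].flags;
    if (in[i].length > UINT32_MAX) {
      out[i++ * 2] = (uint32_t) clamp;  /* truncate, keep this entry, stop */
      break;
    }
    if (in[i].flags > UINT32_MAX) {
      *error = EOVERFLOW;               /* flags cannot be narrowed */
      *fail = true;                     /* drop this entry, stop */
      break;
    }
  }
  return i * 2;
}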
---
 lib/internal.h                      |  8 +++-
 generator/states-reply-structured.c | 30 ++++++++-----
 lib/handle.c                        |  2 +-
 lib/rw.c                            | 70 ++++++++++++++++++++++++++++-
 4 files changed, 95 insertions(+), 15 deletions(-)

diff --git a/lib/internal.h b/lib/internal.h
index 73fd24c0..0c23f882 100644
--- a/lib/internal.h
+++ b/lib/internal.h
@@ -80,7 +80,7 @@ struct export {

 struct command_cb {
   union {
-    nbd_extent_callback extent;
+    nbd_extent64_callback extent;
     nbd_chunk_callback chunk;
     nbd_list_callback list;
     nbd_context_callback context;
@@ -302,7 +302,11 @@ struct nbd_handle {
   size_t querynum;

   /* When receiving block status, this is used. */
-  uint32_t *bs_entries;
+  union {
+    char *storage;      /* malloc's view */
+    nbd_extent *normal; /* Our 64-bit preferred internal form; n slots */
+    uint32_t *narrow;   /* 32-bit NBD_REPLY_TYPE_BLOCK_STATUS form; n*2 slots */
+  } bs_entries;

   /* Commands which are waiting to be issued [meaning the request
    * packet is sent to the server].  This is used as a simple linked
diff --git a/generator/states-reply-structured.c b/generator/states-reply-structured.c
index da9894c6..d23e56a9 100644
--- a/generator/states-reply-structured.c
+++ b/generator/states-reply-structured.c
@@ -436,6 +436,7 @@  REPLY.STRUCTURED_REPLY.RECV_OFFSET_HOLE:
  REPLY.STRUCTURED_REPLY.RECV_BS_HEADER:
   struct command *cmd = h->reply_cmd;
   uint32_t length;
+  uint32_t count;

   switch (recv_into_rbuf (h)) {
   case -1: SET_NEXT_STATE (%.DEAD); return 0;
@@ -450,15 +451,19 @@  REPLY.STRUCTURED_REPLY.RECV_BS_HEADER:
     assert (cmd->type == NBD_CMD_BLOCK_STATUS);
     assert (length >= 12);
     length -= sizeof h->sbuf.reply.payload.bs_hdr;
+    count = length / (2 * sizeof (uint32_t));

-    free (h->bs_entries);
-    h->bs_entries = malloc (length);
-    if (h->bs_entries == NULL) {
+    /* Read raw data into a subset of h->bs_entries, then expand it
+     * into place later during byte-swapping.
+     */
+    free (h->bs_entries.storage);
+    h->bs_entries.storage = malloc (count * sizeof *h->bs_entries.normal);
+    if (h->bs_entries.storage == NULL) {
       SET_NEXT_STATE (%.DEAD);
       set_error (errno, "malloc");
       return 0;
     }
-    h->rbuf = h->bs_entries;
+    h->rbuf = h->bs_entries.narrow;
     h->rlen = length;
     SET_NEXT_STATE (%RECV_BS_ENTRIES);
   }
@@ -470,6 +475,7 @@  REPLY.STRUCTURED_REPLY.RECV_BS_ENTRIES:
   uint32_t count;
   size_t i;
   uint32_t context_id;
+  uint32_t *raw;

   switch (recv_into_rbuf (h)) {
   case -1: SET_NEXT_STATE (%.DEAD); return 0;
@@ -483,17 +489,21 @@  REPLY.STRUCTURED_REPLY.RECV_BS_ENTRIES:
     assert (cmd); /* guaranteed by CHECK */
     assert (cmd->type == NBD_CMD_BLOCK_STATUS);
     assert (CALLBACK_IS_NOT_NULL (cmd->cb.fn.extent));
-    assert (h->bs_entries);
+    assert (h->bs_entries.normal);
     assert (length >= 12);
     assert (h->meta_valid);
     count = (length - sizeof h->sbuf.reply.payload.bs_hdr) /
-      sizeof *h->bs_entries;
+      (2 * sizeof (uint32_t));

     /* Need to byte-swap the entries returned, but apart from that we
-     * don't validate them.
+     * don't validate them.  Reverse order is essential, since we are
+     * expanding in-place from narrow to wider type.
      */
-    for (i = 0; i < count; ++i)
-      h->bs_entries[i] = be32toh (h->bs_entries[i]);
+    raw = h->bs_entries.narrow;
+    for (i = count; i-- > 0; ) {
+      h->bs_entries.normal[i].flags = be32toh (raw[i * 2 + 1]);
+      h->bs_entries.normal[i].length = be32toh (raw[i * 2]);
+    }

     /* Look up the context ID. */
     context_id = be32toh (h->sbuf.reply.payload.bs_hdr.context_id);
@@ -507,7 +517,7 @@  REPLY.STRUCTURED_REPLY.RECV_BS_ENTRIES:

       if (CALL_CALLBACK (cmd->cb.fn.extent,
                          h->meta_contexts.ptr[i].name, cmd->offset,
-                         h->bs_entries, count,
+                         h->bs_entries.normal, count,
                          &error) == -1)
         if (cmd->error == 0)
           cmd->error = error ? error : EPROTO;
diff --git a/lib/handle.c b/lib/handle.c
index 4a186f8f..41e442e7 100644
--- a/lib/handle.c
+++ b/lib/handle.c
@@ -130,7 +130,7 @@ nbd_close (struct nbd_handle *h)

   string_vector_iter (&h->querylist, (void *) free);
   free (h->querylist.ptr);
-  free (h->bs_entries);
+  free (h->bs_entries.storage);
   nbd_internal_reset_size_and_flags (h);
   for (i = 0; i < h->meta_contexts.len; ++i)
     free (h->meta_contexts.ptr[i].name);
diff --git a/lib/rw.c b/lib/rw.c
index 81dded3f..6212bf4c 100644
--- a/lib/rw.c
+++ b/lib/rw.c
@@ -42,6 +42,61 @@ wait_for_command (struct nbd_handle *h, int64_t cookie)
   return r == -1 ? -1 : 0;
 }

+/* Convert from 64-bit to 32-bit extent callback. */
+static int
+nbd_convert_extent (void *data, const char *metacontext, uint64_t offset,
+                    nbd_extent *entries, size_t nr_entries, int *error)
+{
+  nbd_extent_callback *cb = data;
+  uint32_t *array = malloc (nr_entries * 2 * sizeof *array);
+  size_t i;
+  int ret;
+  bool fail = false;
+
+  if (array == NULL) {
+    set_error (*error = errno, "malloc");
+    return -1;
+  }
+
+  for (i = 0; i < nr_entries; i++) {
+    array[i * 2] = entries[i].length;
+    array[i * 2 + 1] = entries[i].flags;
+    /* If an extent is larger than 32 bits, silently truncate the rest
+     * of the server's response; the client can then make progress
+     * instead of needing to see failure.  Rather than track the
+     * connection's alignment, just clamp the large extent to 4G-64M.
+     * However, if flags doesn't fit in 32 bits, it's better to inform
+     * the caller of an EOVERFLOW failure.
+     *
+     * Technically, a server with 64-bit answers is non-compliant if
+     * the client did not negotiate extended headers - contexts that
+     * include 64-bit flags should not have been negotiated in that
+     * case.
+     */
+    if (entries[i].length > UINT32_MAX) {
+      array[i++ * 2] = -MAX_REQUEST_SIZE;
+      break;
+    }
+    if (entries[i].flags > UINT32_MAX) {
+      *error = EOVERFLOW;
+      fail = true;
+      break;
+    }
+  }
+
+  ret = CALL_CALLBACK (*cb, metacontext, offset, array, i * 2, error);
+  free (array);
+  return fail ? -1 : ret;
+}
+
+static void
+nbd_convert_extent_free (void *data)
+{
+  nbd_extent_callback *cb = data;
+  FREE_CALLBACK (*cb);
+  free (cb);
+}
+
 /* Issue a read command and wait for the reply. */
 int
 nbd_unlocked_pread (struct nbd_handle *h, void *buf,
@@ -487,12 +542,23 @@ nbd_unlocked_aio_block_status (struct nbd_handle *h,
                                nbd_completion_callback *completion,
                                uint32_t flags)
 {
-  struct command_cb cb = { .fn.extent = *extent,
+  nbd_extent_callback *shim = malloc (sizeof *shim);
+  struct command_cb cb = { .fn.extent.callback = nbd_convert_extent,
+                           .fn.extent.user_data = shim,
+                           .fn.extent.free = nbd_convert_extent_free,
                            .completion = *completion };

+  if (shim == NULL) {
+    set_error (errno, "malloc");
+    return -1;
+  }
+  *shim = *extent;
+  SET_CALLBACK_TO_NULL (*extent);
+
   if (h->strict & LIBNBD_STRICT_COMMANDS) {
     if (!h->structured_replies) {
       set_error (ENOTSUP, "server does not support structured replies");
+      FREE_CALLBACK (cb.fn.extent);
       return -1;
     }

@@ -500,11 +566,11 @@ nbd_unlocked_aio_block_status (struct nbd_handle *h,
       set_error (ENOTSUP, "did not negotiate any metadata contexts, "
                  "either you did not call nbd_add_meta_context before "
                  "connecting or the server does not support it");
+      FREE_CALLBACK (cb.fn.extent);
       return -1;
     }
   }

-  SET_CALLBACK_TO_NULL (*extent);
   SET_CALLBACK_TO_NULL (*completion);
   return nbd_internal_command_common (h, flags, NBD_CMD_BLOCK_STATUS, offset,
                                       count, EINVAL, NULL, &cb);
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [libnbd PATCH v2 09/23] block_status: Accept 64-bit extents during block status
  2022-11-14 22:51 ` [libnbd PATCH v2 00/23] libnbd 64-bit NBD extensions Eric Blake
                     ` (7 preceding siblings ...)
  2022-11-14 22:51   ` [libnbd PATCH v2 08/23] block_status: Track 64-bit extents internally Eric Blake
@ 2022-11-14 22:51   ` Eric Blake
  2022-11-14 22:51   ` [libnbd PATCH v2 10/23] api: Add [aio_]nbd_block_status_64 Eric Blake
                     ` (13 subsequent siblings)
  22 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:51 UTC (permalink / raw)
  To: libguestfs; +Cc: qemu-devel, qemu-block, nbd

Support a server giving us a 64-bit extent.  Note that the protocol
says a server should not give a 64-bit answer when extended headers
are not negotiated; we can handle that by reporting EPROTO but
otherwise accepting the information.  Meanwhile, when extended headers
are in effect, even a 32-bit original query can produce a 64-bit
answer; and likewise, a 64-bit query may have so much information that
the server truncates it to a 32-bit answer, so we must be prepared for
either type of response.  Since we already store 64-bit extents
internally, the user's 32-bit callback doesn't have to care which
reply chunk the server uses (the shim takes care of that, and an
upcoming patch adds new APIs to let the client use a 64-bit callback).
Of course, until a later patch enables extended headers negotiation,
no compliant server will trigger the new code here.

Implementation-wise, 'normal' and 'wide' are two different types but
have the same underlying size; keeping the two names makes it easier
to reason about whether values are still in network byte order from
the server or already native endian for local processing.
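
The RECV_BS_ENTRIES change below does this widening on h->bs_entries;
here is a standalone sketch (not libnbd code: the union and function
name are illustrative) of why the in-place expansion must walk the
array in reverse:

#include <endian.h>   /* be32toh, glibc */
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t length; uint64_t flags; } nbd_extent;

union bs_view {
  char *storage;      /* what malloc returned */
  nbd_extent *normal; /* 64-bit view, count slots */
  uint32_t *narrow;   /* raw 32-bit view, count*2 big-endian words */
};

static void
widen_in_place (union bs_view bs, size_t count)
{
  uint32_t *raw = bs.narrow;
  size_t i;

  /* Reverse order: writing the wide entry for index i clobbers raw
   * words that belong to later (already consumed) entries.  Only the
   * very first wide entry overlaps its own raw pair, which is why
   * flags is read before the length store lands on top of it.
   */
  for (i = count; i-- > 0; ) {
    bs.normal[i].flags = be32toh (raw[i * 2 + 1]);
    bs.normal[i].length = be32toh (raw[i * 2]);
  }
}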
---
 lib/internal.h                      |  3 +
 generator/states-reply-structured.c | 88 ++++++++++++++++++++++-------
 2 files changed, 72 insertions(+), 19 deletions(-)

diff --git a/lib/internal.h b/lib/internal.h
index 0c23f882..b91fe6f6 100644
--- a/lib/internal.h
+++ b/lib/internal.h
@@ -248,6 +248,7 @@ struct nbd_handle {
         struct nbd_structured_reply_offset_data offset_data;
         struct nbd_structured_reply_offset_hole offset_hole;
         struct nbd_structured_reply_block_status_hdr bs_hdr;
+        struct nbd_structured_reply_block_status_ext_hdr bs_ext_hdr;
         struct {
           struct nbd_structured_reply_error error;
           char msg[NBD_MAX_STRING]; /* Common to all error types */
@@ -306,6 +307,8 @@ struct nbd_handle {
     char *storage;      /* malloc's view */
     nbd_extent *normal; /* Our 64-bit preferred internal form; n slots */
     uint32_t *narrow;   /* 32-bit NBD_REPLY_TYPE_BLOCK_STATUS form; n*2 slots */
+    struct nbd_block_descriptor_ext *wide;
+                        /* 64-bit NBD_REPLY_TYPE_BLOCK_STATUS_EXT; n slots */
   } bs_entries;

   /* Commands which are waiting to be issued [meaning the request
diff --git a/generator/states-reply-structured.c b/generator/states-reply-structured.c
index d23e56a9..7e313b5a 100644
--- a/generator/states-reply-structured.c
+++ b/generator/states-reply-structured.c
@@ -22,6 +22,8 @@
 #include <stdint.h>
 #include <inttypes.h>

+#include "minmax.h"
+
 /* Structured reply must be completely inside the bounds of the
  * requesting command.
  */
@@ -147,6 +149,24 @@  REPLY.STRUCTURED_REPLY.CHECK:
     /* Start by reading the context ID. */
     h->rbuf = &h->sbuf.reply.payload.bs_hdr;
     h->rlen = sizeof h->sbuf.reply.payload.bs_hdr;
+    h->sbuf.reply.payload.bs_ext_hdr.count = 0;
+    SET_NEXT_STATE (%RECV_BS_HEADER);
+    break;
+
+  case NBD_REPLY_TYPE_BLOCK_STATUS_EXT:
+    if (cmd->type != NBD_CMD_BLOCK_STATUS ||
+        length < 24 ||
+        (length-8) % sizeof(struct nbd_block_descriptor_ext))
+      goto resync;
+    if (!h->extended_headers) {
+      debug (h, "unexpected 64-bit block status without extended headers, "
+             "this is probably a server bug");
+      if (cmd->error == 0)
+        cmd->error = EPROTO;
+    }
+    /* Start by reading the context ID. */
+    h->rbuf = &h->sbuf.reply.payload.bs_ext_hdr;
+    h->rlen = sizeof h->sbuf.reply.payload.bs_ext_hdr;
     SET_NEXT_STATE (%RECV_BS_HEADER);
     break;

@@ -437,6 +457,7 @@  REPLY.STRUCTURED_REPLY.RECV_BS_HEADER:
   struct command *cmd = h->reply_cmd;
   uint32_t length;
   uint32_t count;
+  uint16_t type;

   switch (recv_into_rbuf (h)) {
   case -1: SET_NEXT_STATE (%.DEAD); return 0;
@@ -446,24 +467,44 @@  REPLY.STRUCTURED_REPLY.RECV_BS_HEADER:
     return 0;
   case 0:
     length = h->sbuf.reply.hdr.structured.length; /* normalized in CHECK */
+    type = be16toh (h->sbuf.reply.hdr.structured.type);

     assert (cmd); /* guaranteed by CHECK */
     assert (cmd->type == NBD_CMD_BLOCK_STATUS);
-    assert (length >= 12);
-    length -= sizeof h->sbuf.reply.payload.bs_hdr;
-    count = length / (2 * sizeof (uint32_t));

-    /* Read raw data into a subset of h->bs_entries, then expand it
+    if (type == NBD_REPLY_TYPE_BLOCK_STATUS) {
+      assert (length >= 12);
+      length -= sizeof h->sbuf.reply.payload.bs_hdr;
+      count = length / (2 * sizeof (uint32_t));
+    }
+    else {
+      assert (type == NBD_REPLY_TYPE_BLOCK_STATUS_EXT);
+      assert (length >= 24);
+      length -= sizeof h->sbuf.reply.payload.bs_ext_hdr;
+      count = length / sizeof (struct nbd_block_descriptor_ext);
+      if (h->sbuf.reply.payload.bs_ext_hdr.count &&
+          count != be32toh (h->sbuf.reply.payload.bs_ext_hdr.count)) {
+        h->rbuf = NULL;
+        h->rlen = length;
+        SET_NEXT_STATE (%RESYNC);
+        return 0;
+      }
+    }
+    /* Normalize count for later use. */
+    h->sbuf.reply.payload.bs_ext_hdr.count = count;
+
+    /* Read raw data into an overlap of h->bs_entries, then move it
      * into place later during byte-swapping.
      */
     free (h->bs_entries.storage);
-    h->bs_entries.storage = malloc (count * sizeof *h->bs_entries.normal);
+    h->bs_entries.storage = malloc (MAX (count * sizeof *h->bs_entries.normal,
+                                         length));
     if (h->bs_entries.storage == NULL) {
       SET_NEXT_STATE (%.DEAD);
       set_error (errno, "malloc");
       return 0;
     }
-    h->rbuf = h->bs_entries.narrow;
+    h->rbuf = h->bs_entries.storage;
     h->rlen = length;
     SET_NEXT_STATE (%RECV_BS_ENTRIES);
   }
@@ -471,11 +512,10 @@  REPLY.STRUCTURED_REPLY.RECV_BS_HEADER:

  REPLY.STRUCTURED_REPLY.RECV_BS_ENTRIES:
   struct command *cmd = h->reply_cmd;
-  uint32_t length;
   uint32_t count;
   size_t i;
   uint32_t context_id;
-  uint32_t *raw;
+  uint16_t type;

   switch (recv_into_rbuf (h)) {
   case -1: SET_NEXT_STATE (%.DEAD); return 0;
@@ -484,25 +524,35 @@  REPLY.STRUCTURED_REPLY.RECV_BS_ENTRIES:
     SET_NEXT_STATE (%.READY);
     return 0;
   case 0:
-    length = h->sbuf.reply.hdr.structured.length; /* normalized in CHECK */
+    type = be16toh (h->sbuf.reply.hdr.structured.type);
+    count = h->sbuf.reply.payload.bs_ext_hdr.count; /* normalized in BS_HEADER */

     assert (cmd); /* guaranteed by CHECK */
     assert (cmd->type == NBD_CMD_BLOCK_STATUS);
     assert (CALLBACK_IS_NOT_NULL (cmd->cb.fn.extent));
-    assert (h->bs_entries.normal);
-    assert (length >= 12);
+    assert (h->bs_entries.normal && count);
     assert (h->meta_valid);
-    count = (length - sizeof h->sbuf.reply.payload.bs_hdr) /
-      (2 * sizeof (uint32_t));

     /* Need to byte-swap the entries returned, but apart from that we
-     * don't validate them.  Reverse order is essential, since we are
-     * expanding in-place from narrow to wider type.
+     * don't validate them.
      */
-    raw = h->bs_entries.narrow;
-    for (i = count; i-- > 0; ) {
-      h->bs_entries.normal[i].flags = be32toh (raw[i * 2 + 1]);
-      h->bs_entries.normal[i].length = be32toh (raw[i * 2]);
+    if (type == NBD_REPLY_TYPE_BLOCK_STATUS) {
+      uint32_t *raw = h->bs_entries.narrow;
+
+      /* Expanding in-place from narrow to wide, must use reverse order. */
+      for (i = count; i-- > 0; ) {
+        h->bs_entries.normal[i].flags = be32toh (raw[i * 2 + 1]);
+        h->bs_entries.normal[i].length = be32toh (raw[i * 2]);
+      }
+    }
+    else {
+      struct nbd_block_descriptor_ext *wide = h->bs_entries.wide;
+
+      assert (type == NBD_REPLY_TYPE_BLOCK_STATUS_EXT);
+      for (i = 0; i < count; i++) {
+        h->bs_entries.normal[i].length = be64toh (wide[i].length);
+        h->bs_entries.normal[i].flags = be64toh (wide[i].status_flags);
+      }
     }

     /* Look up the context ID. */
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [libnbd PATCH v2 10/23] api: Add [aio_]nbd_block_status_64
  2022-11-14 22:51 ` [libnbd PATCH v2 00/23] libnbd 64-bit NBD extensions Eric Blake
                     ` (8 preceding siblings ...)
  2022-11-14 22:51   ` [libnbd PATCH v2 09/23] block_status: Accept 64-bit extents during block status Eric Blake
@ 2022-11-14 22:51   ` Eric Blake
  2022-11-14 22:51   ` [libnbd PATCH v2 11/23] api: Add several functions for controlling extended headers Eric Blake
                     ` (12 subsequent siblings)
  22 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:51 UTC (permalink / raw)
  To: libguestfs; +Cc: qemu-devel, qemu-block, nbd

Overcome the inherent 32-bit limitation of our existing
nbd_block_status command by adding a 64-bit variant.  The command sent
to the server does not change, but the user's callback is now handed
64-bit information regardless of whether the server replies with 32-
or 64-bit extents.

Unit tests prove that the new API works in each of C, Python, OCaml,
and Go bindings.  We can also get rid of the temporary hack added to
appease the compiler in an earlier patch.
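
As a quick illustration, here is a minimal synchronous caller of the
new API (not part of the patch; it is modelled on the
nbd_block_status_64() calls added to tests/meta-base-allocation.c
below, and the URI is just a placeholder for a running server):

#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <libnbd.h>

static int
extent64 (void *data, const char *metacontext, uint64_t offset,
          nbd_extent *entries, size_t nr_entries, int *error)
{
  size_t i;

  printf ("%s: %zu extents starting at offset %" PRIu64 "\n",
          metacontext, nr_entries, offset);
  for (i = 0; i < nr_entries; i++)
    printf ("  length=%" PRIu64 " flags=%" PRIu64 "\n",
            entries[i].length, entries[i].flags);
  return 0;
}

int
main (void)
{
  struct nbd_handle *nbd = nbd_create ();

  if (nbd == NULL ||
      nbd_add_meta_context (nbd, LIBNBD_CONTEXT_BASE_ALLOCATION) == -1 ||
      nbd_connect_uri (nbd, "nbd://localhost") == -1 ||
      nbd_block_status_64 (nbd, 65536, 0,
                           (nbd_extent64_callback) { .callback = extent64 },
                           0) == -1) {
    fprintf (stderr, "%s\n", nbd_get_error ());
    exit (EXIT_FAILURE);
  }
  nbd_shutdown (nbd, 0);
  nbd_close (nbd);
  return 0;
}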
---
 docs/libnbd.pod                           |  18 +--
 sh/nbdsh.pod                              |   2 +-
 generator/API.ml                          | 144 +++++++++++++++++++---
 generator/OCaml.ml                        |   1 -
 generator/Python.ml                       |   1 -
 lib/rw.c                                  |  48 ++++++--
 python/t/465-block-status-64.py           |  56 +++++++++
 ocaml/tests/Makefile.am                   |   1 +
 ocaml/tests/test_465_block_status_64.ml   |  58 +++++++++
 tests/meta-base-allocation.c              | 111 +++++++++++++++--
 fuzzing/libnbd-fuzz-wrapper.c             |  22 +++-
 golang/Makefile.am                        |   1 +
 golang/libnbd_465_block_status_64_test.go | 119 ++++++++++++++++++
 13 files changed, 534 insertions(+), 48 deletions(-)
 create mode 100644 python/t/465-block-status-64.py
 create mode 100644 ocaml/tests/test_465_block_status_64.ml
 create mode 100644 golang/libnbd_465_block_status_64_test.go

diff --git a/docs/libnbd.pod b/docs/libnbd.pod
index 170fc1fa..b57cfa46 100644
--- a/docs/libnbd.pod
+++ b/docs/libnbd.pod
@@ -661,14 +661,14 @@ In order to utilize block status, the client must call
 L<nbd_add_meta_context(3)> prior to connecting, for each meta context
 in which it is interested, then check L<nbd_can_meta_context(3)> after
 connection to see which contexts the server actually supports.  If a
-context is supported, the client can then use L<nbd_block_status(3)>
-with a callback function that will receive an array of 32-bit integer
-pairs describing consecutive extents within a context.  In each pair,
-the first integer is the length of the extent, the second is a bitmask
-description of that extent (for the "base:allocation" context, the
-bitmask may include C<LIBNBD_STATE_HOLE> for unallocated portions of
-the file, and/or C<LIBNBD_STATE_ZERO> for portions of the file known
-to read as zero).
+context is supported, the client can then use
+L<nbd_block_status_64(3)> with a callback function that will receive
+an array of structs describing consecutive extents within a context.
+Each struct gives the length of the extent, then a bitmask description
+of that extent (for the "base:allocation" context, the bitmask may
+include C<LIBNBD_STATE_HOLE> for unallocated portions of the file,
+and/or C<LIBNBD_STATE_ZERO> for portions of the file known to read as
+zero).

 There is a full example of requesting meta context and using block
 status available at
@@ -917,7 +917,7 @@ will tie up resources until L<nbd_close(3)> is eventually reached).
 =head2 Callbacks with C<int *error> parameter

 Some of the high-level commands (L<nbd_pread_structured(3)>,
-L<nbd_block_status(3)>) involve the use of a callback function invoked
+L<nbd_block_status_64(3)>) involve the use of a callback function invoked
 by the state machine at appropriate points in the server's reply
 before the overall command is complete.  These callback functions,
 along with all of the completion callbacks, include a parameter
diff --git a/sh/nbdsh.pod b/sh/nbdsh.pod
index 4d14118c..666a31b6 100644
--- a/sh/nbdsh.pod
+++ b/sh/nbdsh.pod
@@ -59,7 +59,7 @@ Display brief command line help and exit.
 =item B<--base-allocation>

 Request the use of the "base:allocation" meta context, which is the
-most common context used with L<nbd_block_status(3)>.  This is
+most common context used with L<nbd_block_status_64(3)>.  This is
 equivalent to calling
 S<C<h.set_meta_context(nbd.CONTEXT_BASE_ALLOCATION)>> in the shell
 prior to connecting, and works even when combined with C<--uri> (while
diff --git a/generator/API.ml b/generator/API.ml
index 85509867..aeee41fb 100644
--- a/generator/API.ml
+++ b/generator/API.ml
@@ -1565,7 +1565,7 @@   "add_meta_context", {
 During connection libnbd can negotiate zero or more metadata
 contexts with the server.  Metadata contexts are features (such
 as C<\"base:allocation\">) which describe information returned
-by the L<nbd_block_status(3)> command (for C<\"base:allocation\">
+by the L<nbd_block_status_64(3)> command (for C<\"base:allocation\">
 this is whether blocks of data are allocated, zero or sparse).

 This call adds one metadata context to the list to be negotiated.
@@ -1592,7 +1592,7 @@   "add_meta_context", {
 Other metadata contexts are server-specific, but include
 C<\"qemu:dirty-bitmap:...\"> and C<\"qemu:allocation-depth\"> for
 qemu-nbd (see qemu-nbd I<-B> and I<-A> options).";
-    see_also = [Link "block_status"; Link "can_meta_context";
+    see_also = [Link "block_status_64"; Link "can_meta_context";
                 Link "get_nr_meta_contexts"; Link "get_meta_context";
                 Link "clear_meta_contexts"];
   };
@@ -1605,14 +1605,14 @@   "get_nr_meta_contexts", {
 During connection libnbd can negotiate zero or more metadata
 contexts with the server.  Metadata contexts are features (such
 as C<\"base:allocation\">) which describe information returned
-by the L<nbd_block_status(3)> command (for C<\"base:allocation\">
+by the L<nbd_block_status_64(3)> command (for C<\"base:allocation\">
 this is whether blocks of data are allocated, zero or sparse).

 This command returns how many meta contexts have been added to
 the list to request from the server via L<nbd_add_meta_context(3)>.
 The server is not obligated to honor all of the requests; to see
 what it actually supports, see L<nbd_can_meta_context(3)>.";
-    see_also = [Link "block_status"; Link "can_meta_context";
+    see_also = [Link "block_status_64"; Link "can_meta_context";
                 Link "add_meta_context"; Link "get_meta_context";
                 Link "clear_meta_contexts"];
   };
@@ -1625,13 +1625,13 @@   "get_meta_context", {
 During connection libnbd can negotiate zero or more metadata
 contexts with the server.  Metadata contexts are features (such
 as C<\"base:allocation\">) which describe information returned
-by the L<nbd_block_status(3)> command (for C<\"base:allocation\">
+by the L<nbd_block_status_64(3)> command (for C<\"base:allocation\">
 this is whether blocks of data are allocated, zero or sparse).

 This command returns the i'th meta context request, as added by
 L<nbd_add_meta_context(3)>, and bounded by
 L<nbd_get_nr_meta_contexts(3)>.";
-    see_also = [Link "block_status"; Link "can_meta_context";
+    see_also = [Link "block_status_64"; Link "can_meta_context";
                 Link "add_meta_context"; Link "get_nr_meta_contexts";
                 Link "clear_meta_contexts"];
   };
@@ -1645,7 +1645,7 @@   "clear_meta_contexts", {
 During connection libnbd can negotiate zero or more metadata
 contexts with the server.  Metadata contexts are features (such
 as C<\"base:allocation\">) which describe information returned
-by the L<nbd_block_status(3)> command (for C<\"base:allocation\">
+by the L<nbd_block_status_64(3)> command (for C<\"base:allocation\">
 this is whether blocks of data are allocated, zero or sparse).

 This command resets the list of meta contexts to request back to
@@ -1654,7 +1654,7 @@   "clear_meta_contexts", {
 negotiation mode is selected (see L<nbd_set_opt_mode(3)>), for
 altering the list of attempted contexts between subsequent export
 queries.";
-    see_also = [Link "block_status"; Link "can_meta_context";
+    see_also = [Link "block_status_64"; Link "can_meta_context";
                 Link "add_meta_context"; Link "get_nr_meta_contexts";
                 Link "get_meta_context"; Link "set_opt_mode"];
   };
@@ -2233,7 +2233,7 @@   "can_meta_context", {
 ^ non_blocking_test_call_description;
     see_also = [SectionLink "Flag calls"; Link "opt_info";
                 Link "add_meta_context";
-                Link "block_status"; Link "aio_block_status";
+                Link "block_status_64"; Link "aio_block_status_64";
                 Link "set_request_meta_context"; Link "opt_set_meta_context"];
   };

@@ -2664,7 +2664,7 @@   "block_status", {
     optargs = [ OFlags ("flags", cmd_flags, Some ["REQ_ONE"]) ];
     ret = RErr;
     permitted_states = [ Connected ];
-    shortdesc = "send block status command to the NBD server";
+    shortdesc = "send block status command to the NBD server, with 32-bit callback";
     longdesc = "\
 Issue the block status command to the NBD server.  If
 supported by the server, this causes metadata context
@@ -2679,7 +2679,15 @@   "block_status", {
 The NBD protocol does not yet have a way for a client to learn if
 the server will enforce an even smaller maximum block status size,
 although a future extension may add a constraint visible in
-L<nbd_get_block_size(3)>.
+L<nbd_get_block_size(3)>.  Furthermore, this function is inherently
+limited to 32-bit values.  If the server replies with a larger
+extent, the length of that extent will be truncated to just
+below 32 bits and any further extents from the server will be
+ignored.  If the server replies with a status value larger than
+32 bits (only possible when extended headers are in use), the
+callback function will be passed an C<EOVERFLOW> error.  To get
+the full extent information from a server that supports 64-bit
+extents, you must use L<nbd_block_status_64(3)>.

 Depending on which metadata contexts were enabled before
 connecting (see L<nbd_add_meta_context(3)>) and which are
@@ -2699,7 +2707,7 @@   "block_status", {
 array is an array of pairs of integers with the first entry in each
 pair being the length (in bytes) of the block and the second entry
 being a status/flags field which is specific to the metadata context.
-(The number of pairs passed to the function is C<nr_entries/2>.)  The
+The number of pairs passed to the function is C<nr_entries/2>.  The
 NBD protocol document in the section about
 C<NBD_REPLY_TYPE_BLOCK_STATUS> describes the meaning of this array;
 for contexts known to libnbd, B<E<lt>libnbd.hE<gt>> contains constants
@@ -2723,10 +2731,79 @@   "block_status", {
 does not exceed C<count> bytes; however, libnbd does not
 validate that the server obeyed the flag."
 ^ strict_call_description;
-    see_also = [Link "add_meta_context"; Link "can_meta_context";
+    see_also = [Link "block_status_64";
+                Link "add_meta_context"; Link "can_meta_context";
                 Link "aio_block_status"; Link "set_strict_mode"];
   };

+  "block_status_64", {
+    default_call with
+    args = [ UInt64 "count"; UInt64 "offset"; Closure extent64_closure ];
+    optargs = [ OFlags ("flags", cmd_flags, Some ["REQ_ONE"]) ];
+    ret = RErr;
+    permitted_states = [ Connected ];
+    shortdesc = "send block status command to the NBD server, with 64-bit callback";
+    longdesc = "\
+Issue the block status command to the NBD server.  If
+supported by the server, this causes metadata context
+information about blocks beginning from the specified
+offset to be returned. The C<count> parameter is a hint: the
+server may choose to return less status, or the final block
+may extend beyond the requested range. If multiple contexts
+are supported, the number of blocks and cumulative length
+of those blocks need not be identical between contexts.
+
+Note that not all servers can support a C<count> of 4GiB or larger.
+The NBD protocol does not yet have a way for a client to learn if
+the server will enforce an even smaller maximum block status size,
+although a future extension may add a constraint visible in
+L<nbd_get_block_size(3)>.
+
+Depending on which metadata contexts were enabled before
+connecting (see L<nbd_add_meta_context(3)>) and which are
+supported by the server (see L<nbd_can_meta_context(3)>) this call
+returns information about extents by calling back to the
+C<extent64> function.  The callback cannot call C<nbd_*> APIs on the
+same handle since it holds the handle lock and will
+cause a deadlock.  If the callback returns C<-1>, and no earlier
+error has been detected, then the overall block status command
+will fail with any non-zero value stored into the callback's
+C<error> parameter (with a default of C<EPROTO>); but any further
+contexts will still invoke the callback.
+
+The C<extent64> function is called once per type of metadata available,
+with the C<user_data> passed to this function.  The C<metacontext>
+parameter is a string such as C<\"base:allocation\">.  The C<entries>
+array is an array of B<nbd_extent> structs, containing length (in bytes)
+of the block and a status/flags field which is specific to the metadata
+context.  The number of array entries passed to the function is
+C<nr_entries>.  The NBD protocol document in the section about
+C<NBD_REPLY_TYPE_BLOCK_STATUS> describes the meaning of this array;
+for contexts known to libnbd, B<E<lt>libnbd.hE<gt>> contains constants
+beginning with C<LIBNBD_STATE_> that may help decipher the values.
+On entry to the callback, the C<error> parameter contains the errno
+value of any previously detected error.
+
+It is possible for the extent function to be called
+more times than you expect (if the server is buggy),
+so always check the C<metacontext> field to ensure you
+are receiving the data you expect.  It is also possible
+that the extent function is not called at all, even for
+metadata contexts that you requested.  This indicates
+either that the server doesn't support the context
+or for some other reason cannot return the data.
+
+The C<flags> parameter may be C<0> for no flags, or may contain
+C<LIBNBD_CMD_FLAG_REQ_ONE> meaning that the server should
+return only one extent per metadata context where that extent
+does not exceed C<count> bytes; however, libnbd does not
+validate that the server obeyed the flag."
+^ strict_call_description;
+    see_also = [Link "block_status";
+                Link "add_meta_context"; Link "can_meta_context";
+                Link "aio_block_status_64"; Link "set_strict_mode"];
+  };
+
   "poll", {
     default_call with
     args = [ Int "timeout" ]; ret = RInt;
@@ -3321,7 +3398,7 @@   "aio_block_status", {
                 OFlags ("flags", cmd_flags, Some ["REQ_ONE"]) ];
     ret = RCookie;
     permitted_states = [ Connected ];
-    shortdesc = "send block status command to the NBD server";
+    shortdesc = "send block status command to the NBD server, with 32-bit callback";
     longdesc = "\
 Send the block status command to the NBD server.

@@ -3329,13 +3406,48 @@   "aio_block_status", {
 Or supply the optional C<completion_callback> which will be invoked
 as described in L<libnbd(3)/Completion callbacks>.

-Other parameters behave as documented in L<nbd_block_status(3)>."
+Other parameters behave as documented in L<nbd_block_status(3)>.
+
+This function is inherently limited to 32-bit values.  If the
+server replies with a larger extent, the length of that extent
+will be truncated to just below 32 bits and any further extents
+from the server will be ignored.  If the server replies with a
+status value larger than 32 bits (only possible when extended
+headers are in use), the callback function will be passed an
+C<EOVERFLOW> error.  To get the full extent information from a
+server that supports 64-bit extents, you must use
+L<nbd_aio_block_status_64(3)>.
+"
 ^ strict_call_description;
     see_also = [SectionLink "Issuing asynchronous commands";
+                Link "aio_block_status_64";
                 Link "can_meta_context"; Link "block_status";
                 Link "set_strict_mode"];
   };

+  "aio_block_status_64", {
+    default_call with
+    args = [ UInt64 "count"; UInt64 "offset"; Closure extent64_closure ];
+    optargs = [ OClosure completion_closure;
+                OFlags ("flags", cmd_flags, Some ["REQ_ONE"]) ];
+    ret = RCookie;
+    permitted_states = [ Connected ];
+    shortdesc = "send block status command to the NBD server, with 64-bit callback";
+    longdesc = "\
+Send the block status command to the NBD server.
+
+To check if the command completed, call L<nbd_aio_command_completed(3)>.
+Or supply the optional C<completion_callback> which will be invoked
+as described in L<libnbd(3)/Completion callbacks>.
+
+Other parameters behave as documented in L<nbd_block_status_64(3)>."
+^ strict_call_description;
+    see_also = [SectionLink "Issuing asynchronous commands";
+                Link "aio_block_status";
+                Link "can_meta_context"; Link "block_status_64";
+                Link "set_strict_mode"];
+  };
+
   "aio_get_fd", {
     default_call with
     args = []; ret = RFd;
@@ -3858,6 +3970,8 @@ let first_version =
   "aio_opt_structured_reply", (1, 16);
   "opt_starttls", (1, 16);
   "aio_opt_starttls", (1, 16);
+  "block_status_64", (1, 16);
+  "aio_block_status_64", (1, 16);

   (* These calls are proposed for a future version of libnbd, but
    * have not been added to any released version so far.
diff --git a/generator/OCaml.ml b/generator/OCaml.ml
index 66dc6cad..360a75a9 100644
--- a/generator/OCaml.ml
+++ b/generator/OCaml.ml
@@ -583,7 +583,6 @@ let
   pr "}\n";
   pr "\n";
   pr "static int\n";
-  pr "__attribute__((unused)) /* XXX temporary hack */\n";
   pr "%s_wrapper " cbname;
   C.print_cbarg_list ~wrap:true cbargs;
   pr "\n";
diff --git a/generator/Python.ml b/generator/Python.ml
index 980fdcea..482cdbda 100644
--- a/generator/Python.ml
+++ b/generator/Python.ml
@@ -153,7 +153,6 @@ let
 let print_python_closure_wrapper { cbname; cbargs } =
   pr "/* Wrapper for %s callback. */\n" cbname;
   pr "static int\n";
-  pr "__attribute__((unused)) /* XXX temporary hack */\n";
   pr "%s_wrapper " cbname;
   C.print_cbarg_list ~wrap:true cbargs;
   pr "\n";
diff --git a/lib/rw.c b/lib/rw.c
index 6212bf4c..26533836 100644
--- a/lib/rw.c
+++ b/lib/rw.c
@@ -205,7 +205,7 @@ nbd_unlocked_zero (struct nbd_handle *h,
   return wait_for_command (h, cookie);
 }

-/* Issue a block status command and wait for the reply. */
+/* Issue a block status command and wait for the reply, 32-bit callback. */
 int
 nbd_unlocked_block_status (struct nbd_handle *h,
                            uint64_t count, uint64_t offset,
@@ -223,6 +223,25 @@ nbd_unlocked_block_status (struct nbd_handle *h,
   return wait_for_command (h, cookie);
 }

+/* Issue a block status command and wait for the reply, 64-bit callback. */
+int
+nbd_unlocked_block_status_64 (struct nbd_handle *h,
+                              uint64_t count, uint64_t offset,
+                              nbd_extent64_callback *extent64,
+                              uint32_t flags)
+{
+  int64_t cookie;
+  nbd_completion_callback c = NBD_NULL_COMPLETION;
+
+  cookie = nbd_unlocked_aio_block_status_64 (h, count, offset, extent64, &c,
+                                             flags);
+  if (cookie == -1)
+    return -1;
+
+  assert (CALLBACK_IS_NULL (*extent64));
+  return wait_for_command (h, cookie);
+}
+
 /* count_err represents the errno to return if bounds check fail */
 int64_t
 nbd_internal_command_common (struct nbd_handle *h,
@@ -543,10 +562,10 @@ nbd_unlocked_aio_block_status (struct nbd_handle *h,
                                uint32_t flags)
 {
   nbd_extent_callback *shim = malloc (sizeof *shim);
-  struct command_cb cb = { .fn.extent.callback = nbd_convert_extent,
-                           .fn.extent.user_data = shim,
-                           .fn.extent.free = nbd_convert_extent_free,
-                           .completion = *completion };
+  nbd_extent64_callback wrapper = { .callback = nbd_convert_extent,
+                                    .user_data = shim,
+                                    .free = nbd_convert_extent_free, };
+  int ret;

   if (shim == NULL) {
     set_error (errno, "malloc");
@@ -555,10 +574,25 @@ nbd_unlocked_aio_block_status (struct nbd_handle *h,
   *shim = *extent;
   SET_CALLBACK_TO_NULL (*extent);

+  ret = nbd_unlocked_aio_block_status_64 (h, count, offset, &wrapper,
+                                          completion, flags);
+  FREE_CALLBACK (wrapper);
+  return ret;
+}
+
+int64_t
+nbd_unlocked_aio_block_status_64 (struct nbd_handle *h,
+                                  uint64_t count, uint64_t offset,
+                                  nbd_extent64_callback *extent64,
+                                  nbd_completion_callback *completion,
+                                  uint32_t flags)
+{
+  struct command_cb cb = { .fn.extent = *extent64,
+                           .completion = *completion };
+
   if (h->strict & LIBNBD_STRICT_COMMANDS) {
     if (!h->structured_replies) {
       set_error (ENOTSUP, "server does not support structured replies");
-      FREE_CALLBACK (cb.fn.extent);
       return -1;
     }

@@ -566,11 +600,11 @@ nbd_unlocked_aio_block_status (struct nbd_handle *h,
       set_error (ENOTSUP, "did not negotiate any metadata contexts, "
                  "either you did not call nbd_add_meta_context before "
                  "connecting or the server does not support it");
-      FREE_CALLBACK (cb.fn.extent);
       return -1;
     }
   }

+  SET_CALLBACK_TO_NULL (*extent64);
   SET_CALLBACK_TO_NULL (*completion);
   return nbd_internal_command_common (h, flags, NBD_CMD_BLOCK_STATUS, offset,
                                       count, EINVAL, NULL, &cb);
diff --git a/python/t/465-block-status-64.py b/python/t/465-block-status-64.py
new file mode 100644
index 00000000..6fba9c3e
--- /dev/null
+++ b/python/t/465-block-status-64.py
@@ -0,0 +1,56 @@
+# libnbd Python bindings
+# Copyright (C) 2010-2022 Red Hat Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+
+import os
+
+import nbd
+
+script = "%s/../tests/meta-base-allocation.sh" % os.getenv("srcdir", ".")
+
+h = nbd.NBD()
+h.add_meta_context("base:allocation")
+h.connect_command(["nbdkit", "-s", "--exit-with-parent", "-v", "sh", script])
+
+entries = []
+
+
+def f(user_data, metacontext, offset, e, err):
+    global entries
+    assert user_data == 42
+    assert err.value == 0
+    if metacontext != "base:allocation":
+        return
+    entries = e
+
+
+h.block_status_64(65536, 0, lambda *args: f(42, *args))
+print("entries = %r" % entries)
+assert entries == [(8192, 0),
+                   (8192, 1),
+                   (16384, 3),
+                   (16384, 2),
+                   (16384, 0)]
+
+h.block_status_64(1024, 32256, lambda *args: f(42, *args))
+print("entries = %r" % entries)
+assert entries == [(512, 3),
+                   (16384, 2)]
+
+h.block_status_64(1024, 32256, lambda *args: f(42, *args),
+                  nbd.CMD_FLAG_REQ_ONE)
+print("entries = %r" % entries)
+assert entries == [(512, 3)]
diff --git a/ocaml/tests/Makefile.am b/ocaml/tests/Makefile.am
index 2cd36eb0..fbd53ac9 100644
--- a/ocaml/tests/Makefile.am
+++ b/ocaml/tests/Makefile.am
@@ -39,6 +39,7 @@ ML_TESTS = \
 	test_405_pread_structured.ml \
 	test_410_pwrite.ml \
 	test_460_block_status.ml \
+	test_465_block_status_64.ml \
 	test_500_aio_pread.ml \
 	test_505_aio_pread_structured_callback.ml \
 	test_510_aio_pwrite.ml \
diff --git a/ocaml/tests/test_465_block_status_64.ml b/ocaml/tests/test_465_block_status_64.ml
new file mode 100644
index 00000000..2f36a692
--- /dev/null
+++ b/ocaml/tests/test_465_block_status_64.ml
@@ -0,0 +1,58 @@
+(* hey emacs, this is OCaml code: -*- tuareg -*- *)
+(* libnbd OCaml test case
+ * Copyright (C) 2013-2022 Red Hat Inc.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ *)
+
+open Printf
+
+let script =
+  try
+    let srcdir = Sys.getenv "srcdir" in
+    sprintf "%s/../../tests/meta-base-allocation.sh" srcdir
+  with
+    Not_found -> failwith "error: srcdir is not defined"
+
+let entries = ref [||]
+let f user_data metacontext offset e err =
+  assert (user_data = 42);
+  assert (!err = 0);
+  if metacontext = "base:allocation" then
+    entries := e;
+  0
+
+let () =
+  let nbd = NBD.create () in
+  NBD.add_meta_context nbd "base:allocation";
+  NBD.connect_command nbd ["nbdkit"; "-s"; "--exit-with-parent"; "-v";
+                           "sh"; script];
+
+  NBD.block_status_64 nbd 65536_L 0_L (f 42);
+  assert (!entries = [|  8192_L, 0_L;
+                         8192_L, 1_L;
+                        16384_L, 3_L;
+                        16384_L, 2_L;
+                        16384_L, 0_L |]);
+
+  NBD.block_status_64 nbd 1024_L 32256_L (f 42);
+  assert (!entries = [|   512_L, 3_L;
+                        16384_L, 2_L |]);
+
+  let flags = let open NBD.CMD_FLAG in [REQ_ONE] in
+  NBD.block_status_64 nbd 1024_L 32256_L (f 42) ~flags;
+  assert (!entries = [|   512_L, 3_L |])
+
+let () = Gc.compact ()
diff --git a/tests/meta-base-allocation.c b/tests/meta-base-allocation.c
index 401c0c88..c8f14929 100644
--- a/tests/meta-base-allocation.c
+++ b/tests/meta-base-allocation.c
@@ -1,5 +1,5 @@
 /* NBD client library in userspace
- * Copyright (C) 2013-2020 Red Hat Inc.
+ * Copyright (C) 2013-2022 Red Hat Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public
@@ -32,10 +32,13 @@

 #define BOGUS_CONTEXT "x-libnbd:nosuch"

-static int check_extent (void *data,
-                         const char *metacontext,
-                         uint64_t offset,
-                         uint32_t *entries, size_t nr_entries, int *error);
+static int check_extent32 (void *data, const char *metacontext,
+                           uint64_t offset,
+                           uint32_t *entries, size_t nr_entries, int *error);
+
+static int check_extent64 (void *data, const char *metacontext,
+                           uint64_t offset,
+                           nbd_extent *entries, size_t nr_entries, int *error);

 int
 main (int argc, char *argv[])
@@ -149,27 +152,51 @@ main (int argc, char *argv[])
   /* Read the block status. */
   id = 1;
   if (nbd_block_status (nbd, 65536, 0,
-                        (nbd_extent_callback) { .callback = check_extent, .user_data = &id },
+                        (nbd_extent_callback) { .callback = check_extent32,
+                                                .user_data = &id },
                         0) == -1) {
     fprintf (stderr, "%s\n", nbd_get_error ());
     exit (EXIT_FAILURE);
   }
+  if (nbd_block_status_64 (nbd, 65536, 0,
+                           (nbd_extent64_callback) { .callback = check_extent64,
+                                                     .user_data = &id },
+                           0) == -1) {
+    fprintf (stderr, "%s\n", nbd_get_error ());
+    exit (EXIT_FAILURE);
+  }

   id = 2;
   if (nbd_block_status (nbd, 1024, 32768-512,
-                        (nbd_extent_callback) { .callback = check_extent, .user_data = &id },
+                        (nbd_extent_callback) { .callback = check_extent32,
+                                                .user_data = &id },
                         0) == -1) {
     fprintf (stderr, "%s\n", nbd_get_error ());
     exit (EXIT_FAILURE);
   }
+  if (nbd_block_status_64 (nbd, 1024, 32768-512,
+                           (nbd_extent64_callback) { .callback = check_extent64,
+                                                     .user_data = &id },
+                           0) == -1) {
+    fprintf (stderr, "%s\n", nbd_get_error ());
+    exit (EXIT_FAILURE);
+  }

   id = 3;
   if (nbd_block_status (nbd, 1024, 32768-512,
-                        (nbd_extent_callback) { .callback = check_extent, .user_data = &id },
+                        (nbd_extent_callback) { .callback = check_extent32,
+                                                .user_data = &id },
                         LIBNBD_CMD_FLAG_REQ_ONE) == -1) {
     fprintf (stderr, "%s\n", nbd_get_error ());
     exit (EXIT_FAILURE);
   }
+  if (nbd_block_status_64 (nbd, 1024, 32768-512,
+                           (nbd_extent64_callback) { .callback = check_extent64,
+                                                     .user_data = &id },
+                           LIBNBD_CMD_FLAG_REQ_ONE) == -1) {
+    fprintf (stderr, "%s\n", nbd_get_error ());
+    exit (EXIT_FAILURE);
+  }

   if (nbd_shutdown (nbd, 0) == -1) {
     fprintf (stderr, "%s\n", nbd_get_error ());
@@ -181,10 +208,8 @@ main (int argc, char *argv[])
 }

 static int
-check_extent (void *data,
-              const char *metacontext,
-              uint64_t offset,
-              uint32_t *entries, size_t nr_entries, int *error)
+check_extent32 (void *data, const char *metacontext, uint64_t offset,
+                uint32_t *entries, size_t nr_entries, int *error)
 {
   size_t i;
   int id;
@@ -238,3 +263,65 @@ check_extent (void *data,

   return 0;
 }
+
+static int
+check_extent64 (void *data, const char *metacontext, uint64_t offset,
+                nbd_extent *entries, size_t nr_entries, int *error)
+{
+  size_t i;
+  int id;
+
+  id = * (int *)data;
+
+  printf ("extent: id=%d, metacontext=%s, offset=%" PRIu64 ", "
+          "nr_entries=%zu, error=%d\n",
+          id, metacontext, offset, nr_entries, *error);
+
+  assert (*error == 0);
+  if (strcmp (metacontext, LIBNBD_CONTEXT_BASE_ALLOCATION) == 0) {
+    for (i = 0; i < nr_entries; i++) {
+      printf ("\t%zu\tlength=%" PRIu64 ", status=%" PRIu64 "\n",
+              i, entries[i].length, entries[i].flags);
+    }
+    fflush (stdout);
+
+    switch (id) {
+    case 1:
+      assert (nr_entries == 5);
+      assert (entries[0].length == 8192);
+      assert (entries[0].flags == 0);
+      assert (entries[1].length == 8192);
+      assert (entries[1].flags == LIBNBD_STATE_HOLE);
+      assert (entries[2].length == 16384);
+      assert (entries[2].flags == (LIBNBD_STATE_HOLE|LIBNBD_STATE_ZERO));
+      assert (entries[3].length == 16384);
+      assert (entries[3].flags == LIBNBD_STATE_ZERO);
+      assert (entries[4].length == 16384);
+      assert (entries[4].flags == 0);
+      break;
+
+    case 2:
+      assert (nr_entries == 2);
+      assert (entries[0].length == 512);
+      assert (entries[0].flags == (LIBNBD_STATE_HOLE|LIBNBD_STATE_ZERO));
+      assert (entries[1].length == 16384);
+      assert (entries[1].flags == LIBNBD_STATE_ZERO);
+      break;
+
+    case 3:
+      assert (nr_entries == 1);
+      assert (entries[0].length == 512);
+      assert (entries[0].flags == (LIBNBD_STATE_HOLE|LIBNBD_STATE_ZERO));
+      break;
+
+    default:
+      abort ();
+    }
+
+  }
+  else
+    fprintf (stderr, "warning: ignored unexpected meta context %s\n",
+             metacontext);
+
+  return 0;
+}
diff --git a/fuzzing/libnbd-fuzz-wrapper.c b/fuzzing/libnbd-fuzz-wrapper.c
index d8cb7cb0..e7cf7fe9 100644
--- a/fuzzing/libnbd-fuzz-wrapper.c
+++ b/fuzzing/libnbd-fuzz-wrapper.c
@@ -1,5 +1,5 @@
 /* NBD client library in userspace
- * Copyright (C) 2013-2019 Red Hat Inc.
+ * Copyright (C) 2013-2022 Red Hat Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public
@@ -165,6 +165,17 @@ extent_callback (void *user_data,
   return 0;
 }

+/* Block status (extents64) callback, does nothing. */
+static int
+extent64_callback (void *user_data,
+                   const char *metacontext,
+                   uint64_t offset, nbd_extent *entries,
+                   size_t nr_entries, int *error)
+{
+  //fprintf (stderr, "extent64 called, nr_entries = %zu\n", nr_entries);
+  return 0;
+}
+
 /* This is the client (parent process) running libnbd. */
 static char buf[512];
 static char prbuf[65536];
@@ -213,7 +224,7 @@ client (int sock)
                         },
                         0);

-  /* Test block status. */
+  /* Test both sizes of block status. */
   nbd_block_status (nbd, length, 0,
                     (nbd_extent_callback) {
                       .callback = extent_callback,
@@ -221,6 +232,13 @@ client (int sock)
                       .free = NULL
                     },
                     0);
+  nbd_block_status_64 (nbd, length, 0,
+                       (nbd_extent64_callback) {
+                         .callback = extent64_callback,
+                         .user_data = NULL,
+                         .free = NULL
+                       },
+                       0);

   nbd_shutdown (nbd, 0);
 }
diff --git a/golang/Makefile.am b/golang/Makefile.am
index 0808a639..d342b51c 100644
--- a/golang/Makefile.am
+++ b/golang/Makefile.am
@@ -43,6 +43,7 @@ source_files = \
 	libnbd_405_pread_structured_test.go \
 	libnbd_410_pwrite_test.go \
 	libnbd_460_block_status_test.go \
+	libnbd_465_block_status_64_test.go \
 	libnbd_500_aio_pread_test.go \
 	libnbd_510_aio_pwrite_test.go \
 	libnbd_590_aio_copy_test.go \
diff --git a/golang/libnbd_465_block_status_64_test.go b/golang/libnbd_465_block_status_64_test.go
new file mode 100644
index 00000000..066cccbf
--- /dev/null
+++ b/golang/libnbd_465_block_status_64_test.go
@@ -0,0 +1,119 @@
+/* libnbd golang tests
+ * Copyright (C) 2013-2022 Red Hat Inc.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+package libnbd
+
+import (
+	"fmt"
+	"os"
+	"strings"
+	"testing"
+)
+
+var entries64 []LibnbdExtent
+
+func mcf64(metacontext string, offset uint64, e []LibnbdExtent, error *int) int {
+	if *error != 0 {
+		panic("expected *error == 0")
+	}
+	if metacontext == "base:allocation" {
+		entries64 = e
+	}
+	return 0
+}
+
+// Go has no built-in slice equality, so compare element by element.
+func mc64_compare(a1 []LibnbdExtent, a2 []LibnbdExtent) bool {
+	if len(a1) != len(a2) {
+		return false
+	}
+	for i := 0; i < len(a1); i++ {
+		if a1[i] != a2[i] {
+			return false
+		}
+	}
+	return true
+}
+
+func mc64_to_string(a []LibnbdExtent) string {
+	ss := make([]string, len(a))
+	for i := 0; i < len(a); i++ {
+		ss[i] = fmt.Sprintf("%#v", a[i])
+	}
+	return strings.Join(ss, ", ")
+}
+
+func Test465BlockStatus64(t *testing.T) {
+	srcdir := os.Getenv("abs_top_srcdir")
+	script := srcdir + "/tests/meta-base-allocation.sh"
+
+	h, err := Create()
+	if err != nil {
+		t.Fatalf("could not create handle: %s", err)
+	}
+	defer h.Close()
+
+	err = h.AddMetaContext("base:allocation")
+	if err != nil {
+		t.Fatalf("%s", err)
+	}
+	err = h.ConnectCommand([]string{
+		"nbdkit", "-s", "--exit-with-parent", "-v",
+		"sh", script,
+	})
+	if err != nil {
+		t.Fatalf("%s", err)
+	}
+
+	err = h.BlockStatus64(65536, 0, mcf64, nil)
+	if err != nil {
+		t.Fatalf("%s", err)
+	}
+	if !mc64_compare(entries64, []LibnbdExtent{
+		{8192, 0},
+		{8192, 1},
+		{16384, 3},
+		{16384, 2},
+		{16384, 0},
+	}) {
+		t.Fatalf("unexpected entries (1): %s", mc64_to_string(entries64))
+	}
+
+	err = h.BlockStatus64(1024, 32256, mcf64, nil)
+	if err != nil {
+		t.Fatalf("%s", err)
+	}
+	if !mc64_compare(entries64, []LibnbdExtent{
+		{512, 3},
+		{16384, 2},
+	}) {
+		t.Fatalf("unexpected entries (2): %s", mc64_to_string(entries64))
+	}
+
+	var optargs BlockStatus64Optargs
+	optargs.FlagsSet = true
+	optargs.Flags = CMD_FLAG_REQ_ONE
+	err = h.BlockStatus64(1024, 32256, mcf64, &optargs)
+	if err != nil {
+		t.Fatalf("%s", err)
+	}
+	if !mc64_compare(entries64, []LibnbdExtent{{512, 3}}) {
+		t.Fatalf("unexpected entries (3): %s", mc64_to_string(entries64))
+	}
+
+}
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [libnbd PATCH v2 11/23] api: Add several functions for controlling extended headers
  2022-11-14 22:51 ` [libnbd PATCH v2 00/23] libnbd 64-bit NBD extensions Eric Blake
                     ` (9 preceding siblings ...)
  2022-11-14 22:51   ` [libnbd PATCH v2 10/23] api: Add [aio_]nbd_block_status_64 Eric Blake
@ 2022-11-14 22:51   ` Eric Blake
  2022-11-14 22:51   ` [libnbd PATCH v2 12/23] copy: Update nbdcopy to use 64-bit block status Eric Blake
                     ` (11 subsequent siblings)
  22 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:51 UTC (permalink / raw)
  To: libguestfs; +Cc: qemu-devel, qemu-block, nbd

The new NBD_OPT_EXTENDED_HEADERS feature is worth using by default,
but there may be cases where the user explicitly wants to stick with
the older 32-bit headers.  nbd_set_request_extended_headers() will let
the client override the default, nbd_get_request_extended_headers()
determines the current state of the request, and
nbd_get_extended_headers_negotiated() determines what the client and
server actually settled on.  These use
nbd_set_request_structured_replies() and friends as a template.

Note that this patch just adds the API for controlling the defaults,
but ignores the state variable; a later one will then tweak the state
machine to actually request extended headers when the state variable
is set, as well as add nbd_opt_extended_headers() for manual control.

The intent is that because extended headers take priority over
structured replies, the functions will interact as follows during the
automatic handshaking that takes place prior to nbd_set_opt_mode()
relinquishing control to the client in negotiation mode:

1. client default: request_eh=1, request_sr=1
 => try EXTENDED_HEADERS
    - server supports it: nothing further to do, use extended headers,
      but also report structured replies as active
    - server lacks it: behave like case 2
2. request_eh=0 (explicit client downgrade), request_sr=1 (left at default)
 => try STRUCTURED_REPLY
    - server supports it: expect structured replies
    - server lacks it: expect simple replies
3. request_sr=0 (explicit client downgrade), request_eh ignored
 => don't try either handshake
    - expect simple replies

Client code that wants to force simple replies need only disable
structured replies; only code that wants to disable extended headers
while still using structured replies needs the new
nbd_set_request_extended_headers() API.  Until a later patch adds an
explicit nbd_opt_extended_headers() API, there is no way to request
extended headers after structured replies are already negotiated.
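
For illustration, a minimal client sketch exercising the new knobs
might look like this (the socket path is hypothetical, and real code
would also check the setters' return values):

  /* Sketch only: force case 2 from the list above (structured replies
   * without extended headers), then report what was negotiated.
   */
  #include <stdbool.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <libnbd.h>

  int
  main (void)
  {
    struct nbd_handle *nbd = nbd_create ();

    if (nbd == NULL) {
      fprintf (stderr, "%s\n", nbd_get_error ());
      exit (EXIT_FAILURE);
    }

    /* Leave request_sr at its default of true, but do not attempt
     * NBD_OPT_EXTENDED_HEADERS.
     */
    nbd_set_request_extended_headers (nbd, false);

    /* Hypothetical socket path; any NBD server will do. */
    if (nbd_connect_unix (nbd, "/tmp/nbd.sock") == -1) {
      fprintf (stderr, "%s\n", nbd_get_error ());
      exit (EXIT_FAILURE);
    }

    printf ("extended headers: requested=%d negotiated=%d\n",
            nbd_get_request_extended_headers (nbd),
            nbd_get_extended_headers_negotiated (nbd));
    printf ("structured replies negotiated=%d\n",
            nbd_get_structured_replies_negotiated (nbd));

    nbd_close (nbd);
    exit (EXIT_SUCCESS);
  }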
---
 lib/internal.h                                |   1 +
 generator/API.ml                              | 115 ++++++++++++++++--
 generator/states-newstyle-opt-starttls.c      |   1 +
 .../states-newstyle-opt-structured-reply.c    |   3 +-
 lib/handle.c                                  |  23 ++++
 python/t/110-defaults.py                      |   1 +
 python/t/120-set-non-defaults.py              |   2 +
 ocaml/tests/test_110_defaults.ml              |   2 +
 ocaml/tests/test_120_set_non_defaults.ml      |   3 +
 golang/libnbd_110_defaults_test.go            |   8 ++
 golang/libnbd_120_set_non_defaults_test.go    |  12 ++
 11 files changed, 160 insertions(+), 11 deletions(-)

diff --git a/lib/internal.h b/lib/internal.h
index b91fe6f6..1abb21cb 100644
--- a/lib/internal.h
+++ b/lib/internal.h
@@ -110,6 +110,7 @@ struct nbd_handle {
   char *tls_psk_file;           /* PSK filename, NULL = no PSK */

   /* Extended headers. */
+  bool request_eh;              /* Whether to request extended headers */
   bool extended_headers;        /* If we negotiated NBD_OPT_EXTENDED_HEADERS */

   /* Desired metadata contexts. */
diff --git a/generator/API.ml b/generator/API.ml
index aeee41fb..3d0289f6 100644
--- a/generator/API.ml
+++ b/generator/API.ml
@@ -808,6 +808,77 @@   "get_tls_psk_file", {
   };
 *)

+  "set_request_extended_headers", {
+    default_call with
+    args = [Bool "request"]; ret = RErr;
+    permitted_states = [ Created ];
+    shortdesc = "control use of extended headers";
+    longdesc = "\
+By default, libnbd tries to negotiate extended headers with the
+server, as this protocol extension permits the use of 64-bit
+zero, trim, and block status actions.  However,
+for integration testing, it can be useful to clear this flag
+rather than find a way to alter the server to fail the negotiation
+request.
+
+For backwards compatibility, the setting of this knob is ignored
+if L<nbd_set_request_structured_replies(3)> is also set to false,
+since the use of extended headers implies structured replies.";
+    see_also = [Link "get_request_extended_headers";
+                Link "set_handshake_flags"; Link "set_strict_mode";
+                Link "get_extended_headers_negotiated";
+                Link "zero"; Link "trim"; Link "cache";
+                Link "block_status_64";
+                Link "set_request_structured_replies"];
+  };
+
+  "get_request_extended_headers", {
+    default_call with
+    args = []; ret = RBool;
+    may_set_error = false;
+    shortdesc = "see if extended headers are attempted";
+    longdesc = "\
+Return the state of the request extended headers flag on this
+handle.
+
+B<Note:> If you want to find out if extended headers were actually
+negotiated on a particular connection use
+L<nbd_get_extended_headers_negotiated(3)> instead.";
+    see_also = [Link "set_request_extended_headers";
+                Link "get_extended_headers_negotiated";
+                Link "get_request_extended_headers"];
+  };
+
+  "get_extended_headers_negotiated", {
+    default_call with
+    args = []; ret = RBool;
+    permitted_states = [ Negotiating; Connected; Closed ];
+    shortdesc = "see if extended headers are in use";
+    longdesc = "\
+After connecting you may call this to find out if the connection is
+using extended headers.
+
+When extended headers are not in use, commands are limited to a
+32-bit length, even when the libnbd API uses a 64-bit parameter
+to express the length.  But even when extended headers are
+supported, the server may enforce other limits, visible through
+L<nbd_get_block_size(3)>.
+
+Note that when extended headers are negotiated, you should
+prefer the use of L<nbd_block_status_64(3)> instead of
+L<nbd_block_status(3)> if any of the meta contexts you requested
+via L<nbd_add_meta_context(3)> might return 64-bit status
+values; however, all of the well-known meta contexts covered
+by current C<LIBNBD_CONTEXT_*> constants only return 32-bit
+status.";
+    see_also = [Link "set_request_extended_headers";
+                Link "get_request_extended_headers";
+                Link "zero"; Link "trim"; Link "cache";
+                Link "block_status_64"; Link "get_block_size";
+                Link "get_protocol";
+                Link "get_structured_replies_negotiated"];
+  };
+
   "set_request_structured_replies", {
     default_call with
     args = [Bool "request"]; ret = RErr;
@@ -821,12 +892,16 @@   "set_request_structured_replies", {
 rather than find a way to alter the server to fail the negotiation
 request.  It is also useful to set this to false prior to using
 L<nbd_set_opt_mode(3)> if it is desired to control when to send
-L<nbd_opt_structured_reply(3)> during negotiation.";
+L<nbd_opt_structured_reply(3)> during negotiation.
+
+Note that setting this knob to false also disables any automatic
+request for extended headers.";
     see_also = [Link "get_request_structured_replies";
                 Link "set_handshake_flags"; Link "set_strict_mode";
                 Link "opt_structured_reply";
                 Link "get_structured_replies_negotiated";
-                Link "can_meta_context"; Link "can_df"];
+                Link "can_meta_context"; Link "can_df";
+                Link "set_request_extended_headers"];
   };

   "get_request_structured_replies", {
@@ -842,7 +917,8 @@   "get_request_structured_replies", {
 negotiated on a particular connection use
 L<nbd_get_structured_replies_negotiated(3)> instead.";
     see_also = [Link "set_request_structured_replies";
-                Link "get_structured_replies_negotiated"];
+                Link "get_structured_replies_negotiated";
+                Link "get_request_extended_headers"];
   };

   "get_structured_replies_negotiated", {
@@ -854,11 +930,17 @@   "get_structured_replies_negotiated", {
 After connecting you may call this to find out if the connection is
 using structured replies.  Note that this setting is sticky; this
 can return true even after a second L<nbd_opt_structured_reply(3)>
-returns false because the server detected a duplicate request.";
+returns false because the server detected a duplicate request.
+
+Note that if the connection negotiates extended headers, this
+function returns true (as extended headers imply structured
+replies) even if no explicit request for structured replies was
+attempted.";
     see_also = [Link "set_request_structured_replies";
                 Link "get_request_structured_replies";
                 Link "opt_structured_reply";
-                Link "get_protocol"];
+                Link "get_protocol";
+                Link "get_extended_headers_negotiated"];
   };

   "set_request_meta_context", {
@@ -2575,7 +2657,9 @@   "trim", {
 or there is an error.  Note this will generally return an error
 if L<nbd_can_trim(3)> is false or L<nbd_is_read_only(3)> is true.

-Note that not all servers can support a C<count> of 4GiB or larger.
+Note that not all servers can support a C<count> of 4GiB or larger;
+L<nbd_get_extended_headers_negotiated(3)> indicates which servers
+will parse a request larger than 32 bits.
 The NBD protocol does not yet have a way for a client to learn if
 the server will enforce an even smaller maximum trim size, although
 a future extension may add a constraint visible in
@@ -2606,7 +2690,9 @@   "cache", {
 this command.  Note this will generally return an error if
 L<nbd_can_cache(3)> is false.

-Note that not all servers can support a C<count> of 4GiB or larger.
+Note that not all servers can support a C<count> of 4GiB or larger;
+L<nbd_get_extended_headers_negotiated(3)> indicates which servers
+will parse a request larger than 32 bits.
 The NBD protocol does not yet have a way for a client to learn if
 the server will enforce an even smaller maximum cache size, although
 a future extension may add a constraint visible in
@@ -2635,7 +2721,9 @@   "zero", {
 or there is an error.  Note this will generally return an error if
 L<nbd_can_zero(3)> is false or L<nbd_is_read_only(3)> is true.

-Note that not all servers can support a C<count> of 4GiB or larger.
+Note that not all servers can support a C<count> of 4GiB or larger;
+L<nbd_get_extended_headers_negotiated(3)> indicates which servers
+will parse a request larger than 32 bits.
 The NBD protocol does not yet have a way for a client to learn if
 the server will enforce an even smaller maximum zero size, although
 a future extension may add a constraint visible in
@@ -2675,7 +2763,9 @@   "block_status", {
 are supported, the number of blocks and cumulative length
 of those blocks need not be identical between contexts.

-Note that not all servers can support a C<count> of 4GiB or larger.
+Note that not all servers can support a C<count> of 4GiB or larger;
+L<nbd_get_extended_headers_negotiated(3)> indicates which servers
+will parse a request larger than 32 bits.
 The NBD protocol does not yet have a way for a client to learn if
 the server will enforce an even smaller maximum block status size,
 although a future extension may add a constraint visible in
@@ -2753,7 +2843,9 @@   "block_status_64", {
 are supported, the number of blocks and cumulative length
 of those blocks need not be identical between contexts.

-Note that not all servers can support a C<count> of 4GiB or larger.
+Note that not all servers can support a C<count> of 4GiB or larger;
+L<nbd_get_extended_headers_negotiated(3)> indicates which servers
+will parse a request larger than 32 bits.
 The NBD protocol does not yet have a way for a client to learn if
 the server will enforce an even smaller maximum block status size,
 although a future extension may add a constraint visible in
@@ -3972,6 +4064,9 @@ let first_version =
   "aio_opt_starttls", (1, 16);
   "block_status_64", (1, 16);
   "aio_block_status_64", (1, 16);
+  "set_request_extended_headers", (1, 16);
+  "get_request_extended_headers", (1, 16);
+  "get_extended_headers_negotiated", (1, 16);

   (* These calls are proposed for a future version of libnbd, but
    * have not been added to any released version so far.
diff --git a/generator/states-newstyle-opt-starttls.c b/generator/states-newstyle-opt-starttls.c
index 75fce085..6d528672 100644
--- a/generator/states-newstyle-opt-starttls.c
+++ b/generator/states-newstyle-opt-starttls.c
@@ -86,6 +86,7 @@  NEWSTYLE.OPT_STARTTLS.CHECK_REPLY:
     }
     nbd_internal_reset_size_and_flags (h);
     h->structured_replies = false;
+    h->extended_headers = false;
     h->meta_valid = false;
     new_sock = nbd_internal_crypto_create_session (h, h->sock);
     if (new_sock == NULL) {
diff --git a/generator/states-newstyle-opt-structured-reply.c b/generator/states-newstyle-opt-structured-reply.c
index 46b4161b..4608a38a 100644
--- a/generator/states-newstyle-opt-structured-reply.c
+++ b/generator/states-newstyle-opt-structured-reply.c
@@ -25,7 +25,7 @@  NEWSTYLE.OPT_STRUCTURED_REPLY.START:
     assert (h->opt_mode);
   else {
     assert (CALLBACK_IS_NULL (h->opt_cb.completion));
-    if (!h->request_sr) {
+    if (!h->request_sr || h->structured_replies) {
       if (h->opt_mode)
         SET_NEXT_STATE (%.NEGOTIATING);
       else
@@ -84,6 +84,7 @@  NEWSTYLE.OPT_STRUCTURED_REPLY.CHECK_REPLY:
     err = 0;
     break;
   case NBD_REP_ERR_INVALID:
+  case NBD_REP_ERR_EXT_HEADER_REQD:
     err = EINVAL;
     /* fallthrough */
   default:
diff --git a/lib/handle.c b/lib/handle.c
index 41e442e7..5cf7956e 100644
--- a/lib/handle.c
+++ b/lib/handle.c
@@ -63,6 +63,7 @@ nbd_create (void)

   h->unique = 1;
   h->tls_verify_peer = true;
+  h->request_eh = true;
   h->request_sr = true;
   h->request_meta = true;
   h->request_block_size = true;
@@ -384,6 +385,28 @@ nbd_unlocked_clear_meta_contexts (struct nbd_handle *h)
   return 0;
 }

+
+int
+nbd_unlocked_set_request_extended_headers (struct nbd_handle *h,
+                                           bool request)
+{
+  h->request_eh = request;
+  return 0;
+}
+
+/* NB: may_set_error = false. */
+int
+nbd_unlocked_get_request_extended_headers (struct nbd_handle *h)
+{
+  return h->request_eh;
+}
+
+int
+nbd_unlocked_get_extended_headers_negotiated (struct nbd_handle *h)
+{
+  return h->extended_headers;
+}
+
 int
 nbd_unlocked_set_request_structured_replies (struct nbd_handle *h,
                                              bool request)
diff --git a/python/t/110-defaults.py b/python/t/110-defaults.py
index 6b62c8a4..bada47de 100644
--- a/python/t/110-defaults.py
+++ b/python/t/110-defaults.py
@@ -21,6 +21,7 @@
 assert h.get_export_name() == ""
 assert h.get_full_info() is False
 assert h.get_tls() == nbd.TLS_DISABLE
+assert h.get_request_extended_headers() is True
 assert h.get_request_structured_replies() is True
 assert h.get_request_meta_context() is True
 assert h.get_request_block_size() is True
diff --git a/python/t/120-set-non-defaults.py b/python/t/120-set-non-defaults.py
index 8b48b4ad..2a7bd928 100644
--- a/python/t/120-set-non-defaults.py
+++ b/python/t/120-set-non-defaults.py
@@ -31,6 +31,8 @@
 if h.supports_tls():
     h.set_tls(nbd.TLS_ALLOW)
     assert h.get_tls() == nbd.TLS_ALLOW
+h.set_request_extended_headers(False)
+assert h.get_request_extended_headers() is False
 h.set_request_structured_replies(False)
 assert h.get_request_structured_replies() is False
 h.set_request_meta_context(False)
diff --git a/ocaml/tests/test_110_defaults.ml b/ocaml/tests/test_110_defaults.ml
index 768ad636..cc0b492e 100644
--- a/ocaml/tests/test_110_defaults.ml
+++ b/ocaml/tests/test_110_defaults.ml
@@ -26,6 +26,8 @@ let
       assert (not info);
       let tls = NBD.get_tls nbd in
       assert (tls = NBD.TLS.DISABLE);
+      let eh = NBD.get_request_extended_headers nbd in
+      assert (eh = true);
       let sr = NBD.get_request_structured_replies nbd in
       assert sr;
       let meta = NBD.get_request_meta_context nbd in
diff --git a/ocaml/tests/test_120_set_non_defaults.ml b/ocaml/tests/test_120_set_non_defaults.ml
index 76ff82fb..efb6817a 100644
--- a/ocaml/tests/test_120_set_non_defaults.ml
+++ b/ocaml/tests/test_120_set_non_defaults.ml
@@ -39,6 +39,9 @@ let
         let tls = NBD.get_tls nbd in
         assert (tls = NBD.TLS.ALLOW);
       );
+      NBD.set_request_extended_headers nbd false;
+      let eh = NBD.get_request_extended_headers nbd in
+      assert (eh = false);
       NBD.set_request_structured_replies nbd false;
       let sr = NBD.get_request_structured_replies nbd in
       assert (not sr);
diff --git a/golang/libnbd_110_defaults_test.go b/golang/libnbd_110_defaults_test.go
index d7ad319c..bba94df8 100644
--- a/golang/libnbd_110_defaults_test.go
+++ b/golang/libnbd_110_defaults_test.go
@@ -51,6 +51,14 @@ func Test110Defaults(t *testing.T) {
 		t.Fatalf("unexpected tls state")
 	}

+	eh, err := h.GetRequestExtendedHeaders()
+	if err != nil {
+		t.Fatalf("could not get extended headers state: %s", err)
+	}
+	if eh != true {
+		t.Fatalf("unexpected extended headers state")
+	}
+
 	sr, err := h.GetRequestStructuredReplies()
 	if err != nil {
 		t.Fatalf("could not get structured replies state: %s", err)
diff --git a/golang/libnbd_120_set_non_defaults_test.go b/golang/libnbd_120_set_non_defaults_test.go
index 06bb29d5..5b63bc7b 100644
--- a/golang/libnbd_120_set_non_defaults_test.go
+++ b/golang/libnbd_120_set_non_defaults_test.go
@@ -81,6 +81,18 @@ func Test120SetNonDefaults(t *testing.T) {
 		}
 	}

+	err = h.SetRequestExtendedHeaders(false)
+	if err != nil {
+		t.Fatalf("could not set extended headers state: %s", err)
+	}
+	eh, err := h.GetRequestExtendedHeaders()
+	if err != nil {
+		t.Fatalf("could not get extended headers state: %s", err)
+	}
+	if eh != false {
+		t.Fatalf("unexpected extended headers state")
+	}
+
 	err = h.SetRequestStructuredReplies(false)
 	if err != nil {
 		t.Fatalf("could not set structured replies state: %s", err)
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [libnbd PATCH v2 12/23] copy: Update nbdcopy to use 64-bit block status
  2022-11-14 22:51 ` [libnbd PATCH v2 00/23] libnbd 64-bit NBD extensions Eric Blake
                     ` (10 preceding siblings ...)
  2022-11-14 22:51   ` [libnbd PATCH v2 11/23] api: Add several functions for controlling extended headers Eric Blake
@ 2022-11-14 22:51   ` Eric Blake
  2022-11-14 22:51   ` [libnbd PATCH v2 13/23] dump: Update nbddump " Eric Blake
                     ` (10 subsequent siblings)
  22 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:51 UTC (permalink / raw)
  To: libguestfs; +Cc: qemu-devel, qemu-block, nbd

Although our use of "base:allocation" doesn't require the use of the
64-bit API for flags, we might perform slightly faster for a server
that does give us 64-bit extent lengths and honors larger nbd_zero
lengths.
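
As a reminder of what the conversion looks like from a caller's side,
here is a self-contained sketch (not part of this patch; the nbdkit
command line is arbitrary) walking "base:allocation" with the 64-bit
API, where each entry is an nbd_extent rather than two adjacent
uint32_t values:

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <inttypes.h>
  #include <libnbd.h>

  /* One entry per extent, not two array slots per extent. */
  static int
  print_extent64 (void *user_data, const char *metacontext, uint64_t offset,
                  nbd_extent *entries, size_t nr_entries, int *error)
  {
    size_t i;

    if (strcmp (metacontext, LIBNBD_CONTEXT_BASE_ALLOCATION) != 0 || *error)
      return 0;
    for (i = 0; i < nr_entries; i++) {
      printf ("offset=%" PRIu64 " length=%" PRIu64 " zero=%d\n",
              offset, entries[i].length,
              (entries[i].flags & LIBNBD_STATE_ZERO) != 0);
      offset += entries[i].length;
    }
    return 0;
  }

  int
  main (void)
  {
    struct nbd_handle *nbd = nbd_create ();

    if (nbd == NULL) goto err;
    if (nbd_add_meta_context (nbd, LIBNBD_CONTEXT_BASE_ALLOCATION) == -1 ||
        nbd_connect_command (nbd, (char *[]) {
          "nbdkit", "-s", "--exit-with-parent", "memory", "size=1G",
          NULL }) == -1 ||
        nbd_block_status_64 (nbd, 1024 * 1024, 0,
                             (nbd_extent64_callback) {
                               .callback = print_extent64 },
                             0) == -1)
      goto err;
    nbd_close (nbd);
    exit (EXIT_SUCCESS);
   err:
    fprintf (stderr, "%s\n", nbd_get_error ());
    exit (EXIT_FAILURE);
  }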
---
 copy/nbd-ops.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/copy/nbd-ops.c b/copy/nbd-ops.c
index 34ab4857..dad78ea9 100644
--- a/copy/nbd-ops.c
+++ b/copy/nbd-ops.c
@@ -428,7 +428,7 @@ nbd_ops_asynch_notify_write (struct rw *rw, size_t index)
  * request for extents in a single round trip.
  */
 static int add_extent (void *vp, const char *metacontext,
-                       uint64_t offset, uint32_t *entries, size_t nr_entries,
+                       uint64_t offset, nbd_extent *entries, size_t nr_entries,
                        int *error);

 static void
@@ -449,11 +449,11 @@ nbd_ops_get_extents (struct rw *rw, size_t index,
     size_t i;

     exts.len = 0;
-    if (nbd_block_status (nbd, count, offset,
-                          (nbd_extent_callback) {
-                            .user_data = &exts,
-                            .callback = add_extent
-                          }, 0) == -1) {
+    if (nbd_block_status_64 (nbd, count, offset,
+                             (nbd_extent64_callback) {
+                               .user_data = &exts,
+                               .callback = add_extent
+                             }, 0) == -1) {
       /* XXX We could call default_get_extents, but unclear if it's
        * the right thing to do if the server is returning errors.
        */
@@ -493,7 +493,7 @@ nbd_ops_get_extents (struct rw *rw, size_t index,

 static int
 add_extent (void *vp, const char *metacontext,
-            uint64_t offset, uint32_t *entries, size_t nr_entries,
+            uint64_t offset, nbd_extent *entries, size_t nr_entries,
             int *error)
 {
   extent_list *ret = vp;
@@ -502,25 +502,25 @@ add_extent (void *vp, const char *metacontext,
   if (strcmp (metacontext, "base:allocation") != 0 || *error)
     return 0;

-  for (i = 0; i < nr_entries; i += 2) {
+  for (i = 0; i < nr_entries; i++) {
     struct extent e;

     e.offset = offset;
-    e.length = entries[i];
+    e.length = entries[i].length;

     /* Note we deliberately don't care about the HOLE flag.  There is
      * no need to read extent that reads as zeroes.  We will convert
      * to it to a hole or allocated extents based on the command line
      * arguments.
      */
-    e.zero = (entries[i+1] & LIBNBD_STATE_ZERO) != 0;
+    e.zero = (entries[i].flags & LIBNBD_STATE_ZERO) != 0;

     if (extent_list_append (ret, e) == -1) {
       perror ("realloc");
       exit (EXIT_FAILURE);
     }

-    offset += entries[i];
+    offset += entries[i].length;
   }

   return 0;
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [libnbd PATCH v2 13/23] dump: Update nbddump to use 64-bit block status
  2022-11-14 22:51 ` [libnbd PATCH v2 00/23] libnbd 64-bit NBD extensions Eric Blake
                     ` (11 preceding siblings ...)
  2022-11-14 22:51   ` [libnbd PATCH v2 12/23] copy: Update nbdcopy to use 64-bit block status Eric Blake
@ 2022-11-14 22:51   ` Eric Blake
  2022-11-14 22:51   ` [libnbd PATCH v2 14/23] info: Expose extended-headers support through nbdinfo Eric Blake
                     ` (9 subsequent siblings)
  22 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:51 UTC (permalink / raw)
  To: libguestfs; +Cc: qemu-devel, qemu-block, nbd

Although our use of "base:allocation" doesn't require the use of the
64-bit API for flags, we might perform slightly faster for a server
that does give us 64-bit extent lengths.
---
 dump/dump.c | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/dump/dump.c b/dump/dump.c
index bdbc9040..0427ab86 100644
--- a/dump/dump.c
+++ b/dump/dump.c
@@ -38,7 +38,7 @@
 #include "version.h"
 #include "vector.h"

-DEFINE_VECTOR_TYPE (uint32_vector, uint32_t)
+DEFINE_VECTOR_TYPE (uint64_vector, uint64_t)

 static const char *progname;
 static struct nbd_handle *nbd;
@@ -262,10 +262,10 @@ catch_signal (int sig)
 static int
 extent_callback (void *user_data, const char *metacontext,
                  uint64_t offset,
-                 uint32_t *entries, size_t nr_entries,
+                 nbd_extent *entries, size_t nr_entries,
                  int *error)
 {
-  uint32_vector *list = user_data;
+  uint64_vector *list = user_data;
   size_t i;

   if (strcmp (metacontext, LIBNBD_CONTEXT_BASE_ALLOCATION) != 0)
@@ -273,7 +273,8 @@ extent_callback (void *user_data, const char *metacontext,

   /* Just append the entries we got to the list. */
   for (i = 0; i < nr_entries; ++i) {
-    if (uint32_vector_append (list, entries[i]) == -1) {
+    if (uint64_vector_append (list, entries[i].length) == -1 ||
+        uint64_vector_append (list, entries[i].flags) == -1) {
       perror ("realloc");
       exit (EXIT_FAILURE);
     }
@@ -284,7 +285,7 @@ extent_callback (void *user_data, const char *metacontext,
 static bool
 test_all_zeroes (uint64_t offset, size_t count)
 {
-  uint32_vector entries = empty_vector;
+  uint64_vector entries = empty_vector;
   size_t i;
   uint64_t count_read;

@@ -296,22 +297,22 @@ test_all_zeroes (uint64_t offset, size_t count)
    * false, causing the main code to do a full read.  We could be
    * smarter and keep asking the server (XXX).
    */
-  if (nbd_block_status (nbd, count, offset,
-                        (nbd_extent_callback) {
-                          .callback = extent_callback,
-                          .user_data = &entries },
-                        0) == -1) {
+  if (nbd_block_status_64 (nbd, count, offset,
+                           (nbd_extent64_callback) {
+                             .callback = extent_callback,
+                             .user_data = &entries },
+                           0) == -1) {
     fprintf (stderr, "%s: %s\n", progname, nbd_get_error ());
     exit (EXIT_FAILURE);
   }

   count_read = 0;
   for (i = 0; i < entries.len; i += 2) {
-    uint32_t len = entries.ptr[i];
-    uint32_t type = entries.ptr[i+1];
+    uint64_t len = entries.ptr[i];
+    uint64_t type = entries.ptr[i+1];

     count_read += len;
-    if (!(type & 2))            /* not zero */
+    if (!(type & LIBNBD_STATE_ZERO))            /* not zero */
       return false;
   }

-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [libnbd PATCH v2 14/23] info: Expose extended-headers support through nbdinfo
  2022-11-14 22:51 ` [libnbd PATCH v2 00/23] libnbd 64-bit NBD extensions Eric Blake
                     ` (12 preceding siblings ...)
  2022-11-14 22:51   ` [libnbd PATCH v2 13/23] dump: Update nbddump " Eric Blake
@ 2022-11-14 22:51   ` Eric Blake
  2022-11-14 22:51   ` [libnbd PATCH v2 15/23] info: Update nbdinfo --map to use 64-bit block status Eric Blake
                     ` (8 subsequent siblings)
  22 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:51 UTC (permalink / raw)
  To: libguestfs; +Cc: qemu-devel, qemu-block, nbd

Add another bit of overall server information, as well as a '--can
extended-headers' silent query.  For now, the testsuite is written
assuming that when nbdkit finally adds extended headers support, it
will also add a --no-eh kill switch comparable to its existing --no-sr
switch.
---
 info/nbdinfo.pod     | 11 ++++++++++-
 info/can.c           |  9 +++++++++
 info/info-can.sh     | 27 +++++++++++++++++++++++++++
 info/info-packets.sh | 17 ++++++++++++++++-
 info/main.c          |  7 ++++++-
 5 files changed, 68 insertions(+), 3 deletions(-)

diff --git a/info/nbdinfo.pod b/info/nbdinfo.pod
index c47e5175..2455e1c0 100644
--- a/info/nbdinfo.pod
+++ b/info/nbdinfo.pod
@@ -86,6 +86,7 @@ the I<--json> parameter:
    "protocol": "newstyle-fixed",
    "TLS": false,
    "structured": true,
+   "extended": false,
    "exports": [
      {
        "export-name": "",
@@ -165,6 +166,11 @@ Test if the NBD URI connection is using TLS.
 Test if server can respond with structured replies (a prerequisite
 for supporting block status commands).

+=item nbdinfo --can extended-headers URI
+
+Test if server supports extended headers (a prerequisite for
+supporting 64-bit commands; implies structured replies as well).
+
 =item nbdinfo --is rotational URI

 Test if the server export is backed by something which behaves like a
@@ -312,6 +318,8 @@ Display brief command line help and exit.

 =item B<--can df>

+=item B<--can extended-headers>
+
 =item B<--can fast-zero>

 =item B<--can flush>
@@ -341,7 +349,8 @@ and the following libnbd functions: L<nbd_can_cache(3)>,
 L<nbd_can_df(3)>, L<nbd_can_fast_zero(3)>, L<nbd_can_flush(3)>,
 L<nbd_can_fua(3)>, L<nbd_can_multi_conn(3)>, L<nbd_can_trim(3)>,
 L<nbd_can_zero(3)>, L<nbd_is_read_only(3)>,
-L<nbd_get_structured_replies_negotiated(3)>.
+L<nbd_get_structured_replies_negotiated(3)>,
+L<nbd_get_extended_headers_negotiated(3)>.

 =item B<--color>

diff --git a/info/can.c b/info/can.c
index 08d6bcd5..f602ffce 100644
--- a/info/can.c
+++ b/info/can.c
@@ -50,6 +50,15 @@ do_can (void)
            strcasecmp (can, "structured_replies") == 0)
     feature = nbd_get_structured_replies_negotiated (nbd);

+  else if (strcasecmp (can, "eh") == 0 ||
+           strcasecmp (can, "extended header") == 0 ||
+           strcasecmp (can, "extended-header") == 0 ||
+           strcasecmp (can, "extended_header") == 0 ||
+           strcasecmp (can, "extended headers") == 0 ||
+           strcasecmp (can, "extended-headers") == 0 ||
+           strcasecmp (can, "extended_headers") == 0)
+    feature = nbd_get_extended_headers_negotiated (nbd);
+
   else if (strcasecmp (can, "readonly") == 0 ||
            strcasecmp (can, "read-only") == 0 ||
            strcasecmp (can, "read_only") == 0)
diff --git a/info/info-can.sh b/info/info-can.sh
index 3edc3948..e5f6a44b 100755
--- a/info/info-can.sh
+++ b/info/info-can.sh
@@ -61,6 +61,33 @@ esac
 EOF
 test $st = 2

+# --can extended-headers cannot be positively tested until nbdkit gains
+# --no-eh support.  Otherwise, it is similar to --can structured-reply.
+
+no_eh=
+if nbdkit --no-eh --help >/dev/null 2>/dev/null; then
+    no_eh=--no-eh
+    nbdkit -v -U - sh - \
+           --run '$VG nbdinfo --can extended-headers "nbd+unix:///?socket=$unixsocket"' <<'EOF'
+case "$1" in
+  get_size) echo 1024 ;;
+  pread) ;;
+  *) exit 2 ;;
+esac
+EOF
+fi
+
+st=0
+nbdkit -v -U - $no_eh sh - \
+       --run '$VG nbdinfo --can extended-headers "nbd+unix:///?socket=$unixsocket"' <<'EOF' || st=$?
+case "$1" in
+  get_size) echo 1024 ;;
+  pread) ;;
+  *) exit 2 ;;
+esac
+EOF
+test $st = 2
+
 # --can cache and --can fua require special handling because in
 # nbdkit-sh-plugin we must print "native" or "none".  Also the can_fua
 # flag is only sent if the export is writable (hence can_write below).
diff --git a/info/info-packets.sh b/info/info-packets.sh
index 82bb526c..a6b307a0 100755
--- a/info/info-packets.sh
+++ b/info/info-packets.sh
@@ -27,12 +27,27 @@ requires nbdkit --no-sr memory --version
 out=info-packets.out
 cleanup_fn rm -f $out

+# Older nbdkit does not support extended headers; --no-eh is a reliable
+# witness of whether nbdkit is new enough.
+
+no_eh=
+if nbdkit --no-eh --help >/dev/null 2>/dev/null; then
+    no_eh=--no-eh
+fi
+
 nbdkit --no-sr -U - memory size=1M \
        --run '$VG nbdinfo "nbd+unix:///?socket=$unixsocket"' > $out
 cat $out
 grep "protocol: .*using simple packets" $out

-nbdkit -U - memory size=1M \
+nbdkit $no_eh -U - memory size=1M \
        --run '$VG nbdinfo "nbd+unix:///?socket=$unixsocket"' > $out
 cat $out
 grep "protocol: .*using structured packets" $out
+
+if test x != "x$no_eh"; then
+    nbdkit -U - memory size=1M \
+           --run '$VG nbdinfo "nbd+unix:///?socket=$unixsocket"' > $out
+    cat $out
+    grep "protocol: .*using extended packets" $out
+fi
diff --git a/info/main.c b/info/main.c
index 5cd91fe1..9794c109 100644
--- a/info/main.c
+++ b/info/main.c
@@ -302,11 +302,13 @@ main (int argc, char *argv[])
     const char *protocol;
     int tls_negotiated;
     int sr_negotiated;
+    int eh_negotiated;

     /* Print per-connection fields. */
     protocol = nbd_get_protocol (nbd);
     tls_negotiated = nbd_get_tls_negotiated (nbd);
     sr_negotiated = nbd_get_structured_replies_negotiated (nbd);
+    eh_negotiated = nbd_get_extended_headers_negotiated (nbd);

     if (!json_output) {
       if (protocol) {
@@ -314,8 +316,9 @@ main (int argc, char *argv[])
         fprintf (fp, "protocol: %s", protocol);
         if (tls_negotiated >= 0)
           fprintf (fp, " %s TLS", tls_negotiated ? "with" : "without");
-        if (sr_negotiated >= 0)
+        if (eh_negotiated >= 0 && sr_negotiated >= 0)
           fprintf (fp, ", using %s packets",
+                   eh_negotiated ? "extended" :
                    sr_negotiated ? "structured" : "simple");
         fprintf (fp, "\n");
         ansi_restore (fp);
@@ -333,6 +336,8 @@ main (int argc, char *argv[])
         fprintf (fp, "\"TLS\": %s,\n", tls_negotiated ? "true" : "false");
       if (sr_negotiated >= 0)
         fprintf (fp, "\"structured\": %s,\n", sr_negotiated ? "true" : "false");
+      if (eh_negotiated >= 0)
+        fprintf (fp, "\"extended\": %s,\n", eh_negotiated ? "true" : "false");
     }

     if (!list_all)
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [libnbd PATCH v2 15/23] info: Update nbdinfo --map to use 64-bit block status
  2022-11-14 22:51 ` [libnbd PATCH v2 00/23] libnbd 64-bit NBD extensions Eric Blake
                     ` (13 preceding siblings ...)
  2022-11-14 22:51   ` [libnbd PATCH v2 14/23] info: Expose extended-headers support through nbdinfo Eric Blake
@ 2022-11-14 22:51   ` Eric Blake
  2022-11-14 22:51   ` [libnbd PATCH v2 16/23] examples: Update copy-libev " Eric Blake
                     ` (7 subsequent siblings)
  22 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:51 UTC (permalink / raw)
  To: libguestfs; +Cc: qemu-devel, qemu-block, nbd

Although we usually map "base:allocation" which doesn't require the
use of the 64-bit API for flags, this application IS intended to map
out other metacontexts that might have 64-bit flags.  And when
extended headers are in use, we might as well ask for the server to
give us extents as large as it wants, rather than breaking things up
at 4G boundaries.
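
To make the 4G point concrete, here is a stand-alone sketch (not the
nbdinfo code itself; the nbdkit invocation and the 2G fallback cap are
arbitrary choices) of how a client can drop the chunking once extended
headers are negotiated:

  /* Sketch only: without extended headers, keep each NBD_CMD_BLOCK_STATUS
   * request comfortably under 32 bits; with them, ask for the whole
   * remainder of the export at once.
   */
  #include <stdio.h>
  #include <stdlib.h>
  #include <inttypes.h>
  #include <libnbd.h>

  static int
  count_entries (void *user_data, const char *metacontext, uint64_t offset,
                 nbd_extent *entries, size_t nr_entries, int *error)
  {
    *(uint64_t *) user_data += nr_entries;
    return 0;
  }

  int
  main (void)
  {
    uint64_t total = 0, offset, max_len;
    int64_t size;
    struct nbd_handle *nbd = nbd_create ();

    if (nbd == NULL) goto err;
    if (nbd_add_meta_context (nbd, LIBNBD_CONTEXT_BASE_ALLOCATION) == -1 ||
        nbd_connect_command (nbd, (char *[]) {
          "nbdkit", "-s", "--exit-with-parent", "memory", "size=8G",
          NULL }) == -1)
      goto err;

    size = nbd_get_size (nbd);
    if (size == -1) goto err;

    max_len = 0x80000000;                      /* arbitrary sub-4G cap */
    if (nbd_get_extended_headers_negotiated (nbd) == 1)
      max_len = size;                          /* no need to chunk */

    for (offset = 0; offset < (uint64_t) size; ) {
      uint64_t len = (uint64_t) size - offset;
      if (len > max_len)
        len = max_len;
      if (nbd_block_status_64 (nbd, len, offset,
                               (nbd_extent64_callback) {
                                 .callback = count_entries,
                                 .user_data = &total },
                               0) == -1)
        goto err;
      offset += len;   /* simplistic; a server may describe less than asked */
    }
    printf ("%" PRIu64 " extent entries seen\n", total);
    nbd_close (nbd);
    exit (EXIT_SUCCESS);
   err:
    fprintf (stderr, "%s\n", nbd_get_error ());
    exit (EXIT_FAILURE);
  }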
---
 info/map.c | 67 ++++++++++++++++++++++++++++--------------------------
 1 file changed, 35 insertions(+), 32 deletions(-)

diff --git a/info/map.c b/info/map.c
index a5aad955..ffa53b81 100644
--- a/info/map.c
+++ b/info/map.c
@@ -1,5 +1,5 @@
 /* NBD client library in userspace
- * Copyright (C) 2020-2021 Red Hat Inc.
+ * Copyright (C) 2020-2022 Red Hat Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public
@@ -36,13 +36,13 @@

 #include "nbdinfo.h"

-DEFINE_VECTOR_TYPE (uint32_vector, uint32_t)
+DEFINE_VECTOR_TYPE (uint64_vector, uint64_t)

-static void print_extents (uint32_vector *entries);
-static void print_totals (uint32_vector *entries, int64_t size);
+static void print_extents (uint64_vector *entries);
+static void print_totals (uint64_vector *entries, int64_t size);
 static int extent_callback (void *user_data, const char *metacontext,
                             uint64_t offset,
-                            uint32_t *entries, size_t nr_entries,
+                            nbd_extent *entries, size_t nr_entries,
                             int *error);

 void
@@ -50,7 +50,7 @@ do_map (void)
 {
   size_t i;
   int64_t size;
-  uint32_vector entries = empty_vector;
+  uint64_vector entries = empty_vector;
   uint64_t offset, align, max_len;
   size_t prev_entries_size;

@@ -69,14 +69,16 @@ do_map (void)
     fprintf (stderr, "%s: %s\n", progname, nbd_get_error ());
     exit (EXIT_FAILURE);
   }
+  if (nbd_get_extended_headers_negotiated (nbd) == 1)
+    max_len = size;

   for (offset = 0; offset < size;) {
     prev_entries_size = entries.len;
-    if (nbd_block_status (nbd, MIN (size - offset, max_len), offset,
-                          (nbd_extent_callback) {
-                            .callback = extent_callback,
-                            .user_data = &entries },
-                          0) == -1) {
+    if (nbd_block_status_64 (nbd, MIN (size - offset, max_len), offset,
+                             (nbd_extent64_callback) {
+                               .callback = extent_callback,
+                               .user_data = &entries },
+                             0) == -1) {
       fprintf (stderr, "%s: %s\n", progname, nbd_get_error ());
       exit (EXIT_FAILURE);
     }
@@ -99,18 +101,18 @@ do_map (void)
 }

 /* Callback handling --map. */
-static void print_one_extent (uint64_t offset, uint64_t len, uint32_t type);
-static void extent_description (const char *metacontext, uint32_t type,
+static void print_one_extent (uint64_t offset, uint64_t len, uint64_t type);
+static void extent_description (const char *metacontext, uint64_t type,
                                 char **descr, bool *free_descr,
                                 const char **fg, const char **bg);

 static int
 extent_callback (void *user_data, const char *metacontext,
                  uint64_t offset,
-                 uint32_t *entries, size_t nr_entries,
+                 nbd_extent *entries, size_t nr_entries,
                  int *error)
 {
-  uint32_vector *list = user_data;
+  uint64_vector *list = user_data;
   size_t i;

   if (strcmp (metacontext, map) != 0)
@@ -120,7 +122,8 @@ extent_callback (void *user_data, const char *metacontext,
    * print_extents below.
    */
   for (i = 0; i < nr_entries; ++i) {
-    if (uint32_vector_append (list, entries[i]) == -1) {
+    if (uint64_vector_append (list, entries[i].length) == -1 ||
+        uint64_vector_append (list, entries[i].flags) == -1) {
       perror ("realloc");
       exit (EXIT_FAILURE);
     }
@@ -129,7 +132,7 @@ extent_callback (void *user_data, const char *metacontext,
 }

 static void
-print_extents (uint32_vector *entries)
+print_extents (uint64_vector *entries)
 {
   size_t i, j;
   uint64_t offset = 0;          /* end of last extent printed + 1 */
@@ -138,7 +141,7 @@ print_extents (uint32_vector *entries)
   if (json_output) fprintf (fp, "[\n");

   for (i = 0; i < entries->len; i += 2) {
-    uint32_t type = entries->ptr[last+1];
+    uint64_t type = entries->ptr[last+1];

     /* If we're coalescing and the current type is different from the
      * previous one then we should print everything up to this entry.
@@ -157,7 +160,7 @@ print_extents (uint32_vector *entries)

   /* Print the last extent if there is one. */
   if (last != i) {
-    uint32_t type = entries->ptr[last+1];
+    uint64_t type = entries->ptr[last+1];
     uint64_t len;

     for (j = last, len = 0; j < i; j += 2)
@@ -169,7 +172,7 @@ print_extents (uint32_vector *entries)
 }

 static void
-print_one_extent (uint64_t offset, uint64_t len, uint32_t type)
+print_one_extent (uint64_t offset, uint64_t len, uint64_t type)
 {
   static bool comma = false;
   char *descr;
@@ -185,7 +188,7 @@ print_one_extent (uint64_t offset, uint64_t len, uint32_t type)
       ansi_colour (bg, fp);
     fprintf (fp, "%10" PRIu64 "  "
              "%10" PRIu64 "  "
-             "%3" PRIu32,
+             "%3" PRIu64,
              offset, len, type);
     if (descr)
       fprintf (fp, "  %s", descr);
@@ -199,7 +202,7 @@ print_one_extent (uint64_t offset, uint64_t len, uint32_t type)

     fprintf (fp, "{ \"offset\": %" PRIu64 ", "
              "\"length\": %" PRIu64 ", "
-             "\"type\": %" PRIu32,
+             "\"type\": %" PRIu64,
              offset, len, type);
     if (descr) {
       fprintf (fp, ", \"description\": ");
@@ -215,9 +218,9 @@ print_one_extent (uint64_t offset, uint64_t len, uint32_t type)

 /* --map --totals suboption */
 static void
-print_totals (uint32_vector *entries, int64_t size)
+print_totals (uint64_vector *entries, int64_t size)
 {
-  uint32_t type;
+  uint64_t type;
   bool comma = false;

   /* This is necessary to avoid a divide by zero below, but if the
@@ -237,16 +240,16 @@ print_totals (uint32_vector *entries, int64_t size)
    */
   type = 0;
   for (;;) {
-    uint64_t next_type = (uint64_t)UINT32_MAX + 1;
+    uint64_t next_type = 0;
     uint64_t c = 0;
     size_t i;

     for (i = 0; i < entries->len; i += 2) {
-      uint32_t t = entries->ptr[i+1];
+      uint64_t t = entries->ptr[i+1];

       if (t == type)
         c += entries->ptr[i];
-      else if (type < t && t < next_type)
+      else if (type < t && (next_type == 0 || t < next_type))
         next_type = t;
     }

@@ -263,7 +266,7 @@ print_totals (uint32_vector *entries, int64_t size)
           ansi_colour (fg, fp);
         if (bg)
           ansi_colour (bg, fp);
-        fprintf (fp, "%10" PRIu64 " %5.1f%% %3" PRIu32,
+        fprintf (fp, "%10" PRIu64 " %5.1f%% %3" PRIu64,
                  c, percent, type);
         if (descr)
           fprintf (fp, " %s", descr);
@@ -278,7 +281,7 @@ print_totals (uint32_vector *entries, int64_t size)
         fprintf (fp,
                  "{ \"size\": %" PRIu64 ", "
                  "\"percent\": %g, "
-                 "\"type\": %" PRIu32,
+                 "\"type\": %" PRIu64,
                  c, percent, type);
         if (descr) {
           fprintf (fp, ", \"description\": ");
@@ -292,7 +295,7 @@ print_totals (uint32_vector *entries, int64_t size)
         free (descr);
     }

-    if (next_type == (uint64_t)UINT32_MAX + 1)
+    if (next_type == 0)
       break;
     type = next_type;
   }
@@ -301,7 +304,7 @@ print_totals (uint32_vector *entries, int64_t size)
 }

 static void
-extent_description (const char *metacontext, uint32_t type,
+extent_description (const char *metacontext, uint64_t type,
                     char **descr, bool *free_descr,
                     const char **fg, const char **bg)
 {
@@ -348,7 +351,7 @@ extent_description (const char *metacontext, uint32_t type,
       *fg = ANSI_FG_BRIGHT_WHITE; *bg = ANSI_BG_BLACK;
       return;
     default:
-      if (asprintf (descr, "backing depth %u", type) == -1) {
+      if (asprintf (descr, "backing depth %" PRIu64, type) == -1) {
         perror ("asprintf");
         exit (EXIT_FAILURE);
       }
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [libnbd PATCH v2 16/23] examples: Update copy-libev to use 64-bit block status
  2022-11-14 22:51 ` [libnbd PATCH v2 00/23] libnbd 64-bit NBD extensions Eric Blake
                     ` (14 preceding siblings ...)
  2022-11-14 22:51   ` [libnbd PATCH v2 15/23] info: Update nbdinfo --map to use 64-bit block status Eric Blake
@ 2022-11-14 22:51   ` Eric Blake
  2022-11-14 22:51   ` [libnbd PATCH v2 17/23] ocaml: Add example for 64-bit extents Eric Blake
                     ` (6 subsequent siblings)
  22 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:51 UTC (permalink / raw)
  To: libguestfs; +Cc: qemu-devel, qemu-block, nbd

Although our use of "base:allocation" doesn't require the use of the
64-bit API for flags, we might perform slightly faster for a server
that does give us 64-bit extent lengths and honors larger nbd_zero
lengths.
---
 examples/copy-libev.c | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/examples/copy-libev.c b/examples/copy-libev.c
index 418d99f1..d8b45d87 100644
--- a/examples/copy-libev.c
+++ b/examples/copy-libev.c
@@ -94,7 +94,7 @@ struct request {
 };

 struct extent {
-    uint32_t length;
+    uint64_t length;
     bool zero;
 };

@@ -182,7 +182,7 @@ get_events(struct connection *c)

 static int
 extent_callback (void *user_data, const char *metacontext, uint64_t offset,
-                 uint32_t *entries, size_t nr_entries, int *error)
+                 nbd_extent *entries, size_t nr_entries, int *error)
 {
     struct request *r = user_data;

@@ -197,22 +197,21 @@ extent_callback (void *user_data, const char *metacontext, uint64_t offset,
         return 1;
     }

-    /* Libnbd returns uint32_t pair (length, flags) for each extent. */
-    extents_len = nr_entries / 2;
+    extents_len = nr_entries;

     extents = malloc (extents_len * sizeof *extents);
     if (extents == NULL)
         FAIL ("Cannot allocated extents: %s", strerror (errno));

     /* Copy libnbd entries to extents array. */
-    for (int i = 0, j = 0; i < extents_len; i++, j=i*2) {
-        extents[i].length = entries[j];
+    for (int i = 0; i < extents_len; i++) {
+        extents[i].length = entries[i].length;

         /* Libnbd exposes both ZERO and HOLE flags. We care only about
          * ZERO status, meaning we can copy this extent using efficinet
          * zero method.
          */
-        extents[i].zero = (entries[j + 1] & LIBNBD_STATE_ZERO) != 0;
+        extents[i].zero = (entries[i].flags & LIBNBD_STATE_ZERO) != 0;
     }

     DEBUG ("r%zu: received %zu extents for %s",
@@ -284,10 +283,10 @@ start_extents (struct request *r)
     DEBUG ("r%zu: start extents offset=%" PRIi64 " count=%zu",
            r->index, offset, count);

-    cookie = nbd_aio_block_status (
+    cookie = nbd_aio_block_status_64 (
         src.nbd, count, offset,
-        (nbd_extent_callback) { .callback=extent_callback,
-                                .user_data=r },
+        (nbd_extent64_callback) { .callback=extent_callback,
+                                  .user_data=r },
         (nbd_completion_callback) { .callback=extents_completed,
                                     .user_data=r },
         0);
@@ -322,7 +321,7 @@ next_extent (struct request *r)
         limit = MIN (REQUEST_SIZE, size - offset);

     while (length < limit) {
-        DEBUG ("e%zu: offset=%" PRIi64 " len=%" PRIu32 " zero=%d",
+        DEBUG ("e%zu: offset=%" PRIi64 " len=%" PRIu64 " zero=%d",
                extents_pos, offset, extents[extents_pos].length, is_zero);

         /* If this extent is too large, steal some data from it to
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [libnbd PATCH v2 17/23] ocaml: Add example for 64-bit extents
  2022-11-14 22:51 ` [libnbd PATCH v2 00/23] libnbd 64-bit NBD extensions Eric Blake
                     ` (15 preceding siblings ...)
  2022-11-14 22:51   ` [libnbd PATCH v2 16/23] examples: Update copy-libev " Eric Blake
@ 2022-11-14 22:51   ` Eric Blake
  2022-11-14 22:51   ` [libnbd PATCH v2 18/23] generator: Actually request extended headers Eric Blake
                     ` (5 subsequent siblings)
  22 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:51 UTC (permalink / raw)
  To: libguestfs; +Cc: qemu-devel, qemu-block, nbd

Since our example program for 32-bit extents is inherently limited to
32-bit lengths, it is also worth demonstrating the 64-bit extent API,
including its saner array indexing.
---
 ocaml/examples/Makefile.am  |  3 ++-
 ocaml/examples/extents64.ml | 42 +++++++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+), 1 deletion(-)
 create mode 100644 ocaml/examples/extents64.ml

diff --git a/ocaml/examples/Makefile.am b/ocaml/examples/Makefile.am
index 5ee6dd63..c6f4989d 100644
--- a/ocaml/examples/Makefile.am
+++ b/ocaml/examples/Makefile.am
@@ -1,5 +1,5 @@
 # nbd client library in userspace
-# Copyright (C) 2013-2019 Red Hat Inc.
+# Copyright (C) 2013-2022 Red Hat Inc.
 #
 # This library is free software; you can redistribute it and/or
 # modify it under the terms of the GNU Lesser General Public
@@ -20,6 +20,7 @@ include $(top_srcdir)/subdir-rules.mk
 ml_examples = \
 	asynch_copy.ml \
 	extents.ml \
+	extents64.ml \
 	get_size.ml \
 	open_qcow2.ml \
 	server_flags.ml \
diff --git a/ocaml/examples/extents64.ml b/ocaml/examples/extents64.ml
new file mode 100644
index 00000000..8ee7e218
--- /dev/null
+++ b/ocaml/examples/extents64.ml
@@ -0,0 +1,42 @@
+open Printf
+
+let () =
+  NBD.with_handle (
+    fun nbd ->
+      NBD.add_meta_context nbd "base:allocation";
+      NBD.connect_command nbd
+                          ["nbdkit"; "-s"; "--exit-with-parent"; "-r";
+                           "sparse-random"; "8G"];
+
+      (* Read the extents and print them. *)
+      let size = NBD.get_size nbd in
+      let cap =
+        match NBD.get_extended_headers_negotiated nbd with
+        | true -> size
+        | false -> 0x8000_0000_L in
+      let fetch_offset = ref 0_L in
+      while !fetch_offset < size do
+        let remaining = Int64.sub size !fetch_offset in
+        let fetch_size = min remaining cap in
+        NBD.block_status_64 nbd fetch_size !fetch_offset (
+          fun meta _ entries err ->
+            printf "nbd_block_status callback: meta=%s err=%d\n" meta !err;
+            if meta = "base:allocation" then (
+              printf "index\t%16s %16s %s\n" "offset" "length" "flags";
+              for i = 0 to Array.length entries - 1 do
+                let len = fst entries.(i)
+                and flags =
+                  match snd entries.(i) with
+                  | 0_L -> "data"
+                  | 1_L -> "hole"
+                  | 2_L -> "zero"
+                  | 3_L -> "hole+zero"
+                  | unknown -> sprintf "unknown (%Ld)" unknown in
+                printf "%d:\t%16Ld %16Ld %s\n" i !fetch_offset len flags;
+                fetch_offset := Int64.add !fetch_offset len
+              done;
+            );
+            0
+        ) (* NBD.block_status *)
+      done
+  )
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [libnbd PATCH v2 18/23] generator: Actually request extended headers
  2022-11-14 22:51 ` [libnbd PATCH v2 00/23] libnbd 64-bit NBD extensions Eric Blake
                     ` (16 preceding siblings ...)
  2022-11-14 22:51   ` [libnbd PATCH v2 17/23] ocaml: Add example for 64-bit extents Eric Blake
@ 2022-11-14 22:51   ` Eric Blake
  2022-11-14 22:51   ` [libnbd PATCH v2 19/23] api: Add nbd_[aio_]opt_extended_headers() Eric Blake
                     ` (4 subsequent siblings)
  22 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:51 UTC (permalink / raw)
  To: libguestfs; +Cc: qemu-devel, qemu-block, nbd

This is the culmination of the previous patches' preparation work for
using extended headers when possible.  The new states in the state
machine are copied extensively from our handling of
OPT_STRUCTURED_REPLY.  The next patch will then expose a new API
nbd_opt_extended_headers() for manual control.

At the same time as posting this patch, I posted patches for qemu-nbd
to support extended headers as a server (nbdkit is a bit tougher).
The next patches will add interop tests that pass when using a new
enough qemu-nbd, showing that we have cross-project interoperability
and therefore an extension worth standardizing.
---
 generator/API.ml                              | 87 ++++++++---------
 generator/Makefile.am                         |  3 +-
 generator/state_machine.ml                    | 41 ++++++++
 .../states-newstyle-opt-extended-headers.c    | 94 +++++++++++++++++++
 generator/states-newstyle-opt-starttls.c      |  6 +-
 5 files changed, 185 insertions(+), 46 deletions(-)
 create mode 100644 generator/states-newstyle-opt-extended-headers.c

diff --git a/generator/API.ml b/generator/API.ml
index 3d0289f6..570f15f4 100644
--- a/generator/API.ml
+++ b/generator/API.ml
@@ -953,23 +953,24 @@   "set_request_meta_context", {
 (all C<nbd_connect_*> calls when L<nbd_set_opt_mode(3)> is false,
 or L<nbd_opt_go(3)> and L<nbd_opt_info(3)> when option mode is
 enabled) will also try to issue NBD_OPT_SET_META_CONTEXT when
-the server supports structured replies and any contexts were
-registered by L<nbd_add_meta_context(3)>.  The default setting
-is true; however the extra step of negotiating meta contexts is
-not always desirable: performing both info and go on the same
-export works without needing to re-negotiate contexts on the
-second call; integration testing of other servers may benefit
-from manual invocation of L<nbd_opt_set_meta_context(3)> at
-other times in the negotiation sequence; and even when using just
-L<nbd_opt_info(3)>, it can be faster to collect the server's
+the server supports structured replies or extended headers and
+any contexts were registered by L<nbd_add_meta_context(3)>.  The
+default setting is true; however the extra step of negotiating
+meta contexts is not always desirable: performing both info and
+go on the same export works without needing to re-negotiate
+contexts on the second call; integration testing of other servers
+may benefit from manual invocation of L<nbd_opt_set_meta_context(3)>
+at other times in the negotiation sequence; and even when using
+just L<nbd_opt_info(3)>, it can be faster to collect the server's
 results by relying on the callback function passed to
 L<nbd_opt_list_meta_context(3)> than a series of post-process
 calls to L<nbd_can_meta_context(3)>.

 Note that this control has no effect if the server does not
-negotiate structured replies, or if the client did not request
-any contexts via L<nbd_add_meta_context(3)>.  Setting this
-control to false may cause L<nbd_block_status(3)> to fail.";
+negotiate structured replies or extended headers, or if the
+client did not request any contexts via L<nbd_add_meta_context(3)>.
+Setting this control to false may cause L<nbd_block_status(3)>
+to fail.";
     see_also = [Link "set_opt_mode"; Link "opt_go"; Link "opt_info";
                 Link "opt_list_meta_context"; Link "opt_set_meta_context";
                 Link "get_structured_replies_negotiated";
@@ -1404,11 +1405,11 @@   "opt_info", {
 If successful, functions like L<nbd_is_read_only(3)> and
 L<nbd_get_size(3)> will report details about that export.  If
 L<nbd_set_request_meta_context(3)> is set (the default) and
-structured replies were negotiated, it is also valid to use
-L<nbd_can_meta_context(3)> after this call.  However, it may be
-more efficient to clear that setting and manually utilize
-L<nbd_opt_list_meta_context(3)> with its callback approach, for
-learning which contexts an export supports.  In general, if
+structured replies or extended headers were negotiated, it is also
+valid to use L<nbd_can_meta_context(3)> after this call.  However,
+it may be more efficient to clear that setting and manually
+utilize L<nbd_opt_list_meta_context(3)> with its callback approach,
+for learning which contexts an export supports.  In general, if
 L<nbd_opt_go(3)> is called next, that call will likely succeed
 with the details remaining the same, although this is not
 guaranteed by all servers.
@@ -1538,12 +1539,12 @@   "opt_set_meta_context", {
 recent L<nbd_set_export_name(3)> or L<nbd_connect_uri(3)>.
 This can only be used if L<nbd_set_opt_mode(3)> enabled option
 mode.  Normally, this function is redundant, as L<nbd_opt_go(3)>
-automatically does the same task if structured replies have
-already been negotiated.  But manual control over meta context
-requests can be useful for fine-grained testing of how a server
-handles unusual negotiation sequences.  Often, use of this
-function is coupled with L<nbd_set_request_meta_context(3)> to
-bypass the automatic context request normally performed by
+automatically does the same task if structured replies or extended
+headers have already been negotiated.  But manual control over
+meta context requests can be useful for fine-grained testing of
+how a server handles unusual negotiation sequences.  Often, use
+of this function is coupled with L<nbd_set_request_meta_context(3)>
+to bypass the automatic context request normally performed by
 L<nbd_opt_go(3)>.

 The NBD protocol allows a client to decide how many queries to ask
@@ -1597,12 +1598,13 @@   "opt_set_meta_context_queries", {
 or L<nbd_connect_uri(3)>.  This can only be used if
 L<nbd_set_opt_mode(3)> enabled option mode.  Normally, this
 function is redundant, as L<nbd_opt_go(3)> automatically does
-the same task if structured replies have already been
-negotiated.  But manual control over meta context requests can
-be useful for fine-grained testing of how a server handles
-unusual negotiation sequences.  Often, use of this function is
-coupled with L<nbd_set_request_meta_context(3)> to bypass the
-automatic context request normally performed by L<nbd_opt_go(3)>.
+the same task if structured replies or extended headers have
+already been negotiated.  But manual control over meta context
+requests can be useful for fine-grained testing of how a server
+handles unusual negotiation sequences.  Often, use of this
+function is coupled with L<nbd_set_request_meta_context(3)> to
+bypass the automatic context request normally performed by
+L<nbd_opt_go(3)>.

 The NBD protocol allows a client to decide how many queries to ask
 the server.  This function takes an explicit list of queries; to
@@ -3233,13 +3235,13 @@   "aio_opt_set_meta_context", {
 recent L<nbd_set_export_name(3)> or L<nbd_connect_uri(3)>.
 This can only be used if L<nbd_set_opt_mode(3)> enabled option
 mode.  Normally, this function is redundant, as L<nbd_opt_go(3)>
-automatically does the same task if structured replies have
-already been negotiated.  But manual control over meta context
-requests can be useful for fine-grained testing of how a server
-handles unusual negotiation sequences.  Often, use of this
-function is coupled with L<nbd_set_request_meta_context(3)> to
-bypass the automatic context request normally performed by
-L<nbd_opt_go(3)>.
+automatically does the same task if structured replies or
+extended headers have already been negotiated.  But manual
+control over meta context requests can be useful for fine-grained
+testing of how a server handles unusual negotiation sequences.
+Often, use of this function is coupled with
+L<nbd_set_request_meta_context(3)> to bypass the automatic
+context request normally performed by L<nbd_opt_go(3)>.

 To determine when the request completes, wait for
 L<nbd_aio_is_connecting(3)> to return false.  Or supply the optional
@@ -3266,12 +3268,13 @@   "aio_opt_set_meta_context_queries", {
 or L<nbd_connect_uri(3)>.  This can only be used
 if L<nbd_set_opt_mode(3)> enabled option mode.  Normally, this
 function is redundant, as L<nbd_opt_go(3)> automatically does
-the same task if structured replies have already been
-negotiated.  But manual control over meta context requests can
-be useful for fine-grained testing of how a server handles
-unusual negotiation sequences.  Often, use of this function is
-coupled with L<nbd_set_request_meta_context(3)> to bypass the
-automatic context request normally performed by L<nbd_opt_go(3)>.
+the same task if structured replies or extended headers have
+already been negotiated.  But manual control over meta context
+requests can be useful for fine-grained testing of how a server
+handles unusual negotiation sequences.  Often, use of this
+function is coupled with L<nbd_set_request_meta_context(3)> to
+bypass the automatic context request normally performed by
+L<nbd_opt_go(3)>.

 To determine when the request completes, wait for
 L<nbd_aio_is_connecting(3)> to return false.  Or supply the optional
diff --git a/generator/Makefile.am b/generator/Makefile.am
index 3193c733..09739449 100644
--- a/generator/Makefile.am
+++ b/generator/Makefile.am
@@ -1,5 +1,5 @@
 # nbd client library in userspace
-# Copyright (C) 2013-2020 Red Hat Inc.
+# Copyright (C) 2013-2022 Red Hat Inc.
 #
 # This library is free software; you can redistribute it and/or
 # modify it under the terms of the GNU Lesser General Public
@@ -30,6 +30,7 @@ states_code = \
 	states-issue-command.c \
 	states-magic.c \
 	states-newstyle-opt-export-name.c \
+	states-newstyle-opt-extended-headers.c \
 	states-newstyle-opt-list.c \
 	states-newstyle-opt-go.c \
 	states-newstyle-opt-meta-context.c \
diff --git a/generator/state_machine.ml b/generator/state_machine.ml
index d2c326f3..c9e38a5e 100644
--- a/generator/state_machine.ml
+++ b/generator/state_machine.ml
@@ -297,6 +297,7 @@ and
    * NEGOTIATING after OPT_STRUCTURED_REPLY or any failed OPT_GO.
    *)
   Group ("OPT_STARTTLS", newstyle_opt_starttls_state_machine);
+  Group ("OPT_EXTENDED_HEADERS", newstyle_opt_extended_headers_state_machine);
   Group ("OPT_STRUCTURED_REPLY", newstyle_opt_structured_reply_state_machine);
   Group ("OPT_META_CONTEXT", newstyle_opt_meta_context_state_machine);
   Group ("OPT_GO", newstyle_opt_go_state_machine);
@@ -441,6 +442,46 @@ and
   };
 ]

+(* Fixed newstyle NBD_OPT_EXTENDED_HEADERS option.
+ * Implementation: generator/states-newstyle-opt-extended-headers.c
+ *)
+and newstyle_opt_extended_headers_state_machine = [
+  State {
+    default_state with
+    name = "START";
+    comment = "Try to negotiate newstyle NBD_OPT_EXTENDED_HEADERS";
+    external_events = [];
+  };
+
+  State {
+    default_state with
+    name = "SEND";
+    comment = "Send newstyle NBD_OPT_EXTENDED_HEADERS negotiation request";
+    external_events = [ NotifyWrite, "" ];
+  };
+
+  State {
+    default_state with
+    name = "RECV_REPLY";
+    comment = "Receive newstyle NBD_OPT_EXTENDED_HEADERS option reply";
+    external_events = [ NotifyRead, "" ];
+  };
+
+  State {
+    default_state with
+    name = "RECV_REPLY_PAYLOAD";
+    comment = "Receive any newstyle NBD_OPT_EXTENDED_HEADERS reply payload";
+    external_events = [ NotifyRead, "" ];
+  };
+
+  State {
+    default_state with
+    name = "CHECK_REPLY";
+    comment = "Check newstyle NBD_OPT_EXTENDED_HEADERS option reply";
+    external_events = [];
+  };
+]
+
 (* Fixed newstyle NBD_OPT_STRUCTURED_REPLY option.
  * Implementation: generator/states-newstyle-opt-structured-reply.c
  *)
diff --git a/generator/states-newstyle-opt-extended-headers.c b/generator/states-newstyle-opt-extended-headers.c
new file mode 100644
index 00000000..151787bf
--- /dev/null
+++ b/generator/states-newstyle-opt-extended-headers.c
@@ -0,0 +1,94 @@
+/* nbd client library in userspace: state machine
+ * Copyright (C) 2013-2022 Red Hat Inc.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/* State machine for negotiating NBD_OPT_EXTENDED_HEADERS. */
+
+STATE_MACHINE {
+ NEWSTYLE.OPT_EXTENDED_HEADERS.START:
+  assert (h->gflags & LIBNBD_HANDSHAKE_FLAG_FIXED_NEWSTYLE);
+  assert (h->opt_current != NBD_OPT_EXTENDED_HEADERS);
+  assert (CALLBACK_IS_NULL (h->opt_cb.completion));
+  if (!h->request_eh || !h->request_sr) {
+    SET_NEXT_STATE (%^OPT_STRUCTURED_REPLY.START);
+    return 0;
+  }
+
+  h->sbuf.option.version = htobe64 (NBD_NEW_VERSION);
+  h->sbuf.option.option = htobe32 (NBD_OPT_EXTENDED_HEADERS);
+  h->sbuf.option.optlen = htobe32 (0);
+  h->chunks_sent++;
+  h->wbuf = &h->sbuf;
+  h->wlen = sizeof h->sbuf.option;
+  SET_NEXT_STATE (%SEND);
+  return 0;
+
+ NEWSTYLE.OPT_EXTENDED_HEADERS.SEND:
+  switch (send_from_wbuf (h)) {
+  case -1: SET_NEXT_STATE (%.DEAD); return 0;
+  case 0:
+    h->rbuf = &h->sbuf;
+    h->rlen = sizeof h->sbuf.or.option_reply;
+    SET_NEXT_STATE (%RECV_REPLY);
+  }
+  return 0;
+
+ NEWSTYLE.OPT_EXTENDED_HEADERS.RECV_REPLY:
+  switch (recv_into_rbuf (h)) {
+  case -1: SET_NEXT_STATE (%.DEAD); return 0;
+  case 0:
+    if (prepare_for_reply_payload (h, NBD_OPT_EXTENDED_HEADERS) == -1) {
+      SET_NEXT_STATE (%.DEAD);
+      return 0;
+    }
+    SET_NEXT_STATE (%RECV_REPLY_PAYLOAD);
+  }
+  return 0;
+
+ NEWSTYLE.OPT_EXTENDED_HEADERS.RECV_REPLY_PAYLOAD:
+  switch (recv_into_rbuf (h)) {
+  case -1: SET_NEXT_STATE (%.DEAD); return 0;
+  case 0:  SET_NEXT_STATE (%CHECK_REPLY);
+  }
+  return 0;
+
+ NEWSTYLE.OPT_EXTENDED_HEADERS.CHECK_REPLY:
+  uint32_t reply;
+
+  reply = be32toh (h->sbuf.or.option_reply.reply);
+  switch (reply) {
+  case NBD_REP_ACK:
+    debug (h, "negotiated extended headers on this connection");
+    h->extended_headers = true;
+    /* Extended headers trump structured replies, so skip ahead. */
+    h->structured_replies = true;
+    break;
+  default:
+    if (handle_reply_error (h) == -1) {
+      SET_NEXT_STATE (%.DEAD);
+      return 0;
+    }
+
+    debug (h, "extended headers are not supported by this server");
+    break;
+  }
+
+  /* Next option. */
+  SET_NEXT_STATE (%^OPT_STRUCTURED_REPLY.START);
+  return 0;
+
+} /* END STATE MACHINE */
diff --git a/generator/states-newstyle-opt-starttls.c b/generator/states-newstyle-opt-starttls.c
index 6d528672..3b509841 100644
--- a/generator/states-newstyle-opt-starttls.c
+++ b/generator/states-newstyle-opt-starttls.c
@@ -26,7 +26,7 @@  NEWSTYLE.OPT_STARTTLS.START:
   else {
     /* If TLS was not requested we skip this option and go to the next one. */
     if (h->tls == LIBNBD_TLS_DISABLE) {
-      SET_NEXT_STATE (%^OPT_STRUCTURED_REPLY.START);
+      SET_NEXT_STATE (%^OPT_EXTENDED_HEADERS.START);
       return 0;
     }
     assert (CALLBACK_IS_NULL (h->opt_cb.completion));
@@ -128,7 +128,7 @@  NEWSTYLE.OPT_STARTTLS.CHECK_REPLY:
       SET_NEXT_STATE (%.NEGOTIATING);
     else {
       debug (h, "continuing with unencrypted connection");
-      SET_NEXT_STATE (%^OPT_STRUCTURED_REPLY.START);
+      SET_NEXT_STATE (%^OPT_EXTENDED_HEADERS.START);
     }
     return 0;
   }
@@ -185,7 +185,7 @@  NEWSTYLE.OPT_STARTTLS.TLS_HANDSHAKE_DONE:
   if (h->opt_current == NBD_OPT_STARTTLS)
     SET_NEXT_STATE (%.NEGOTIATING);
   else
-    SET_NEXT_STATE (%^OPT_STRUCTURED_REPLY.START);
+    SET_NEXT_STATE (%^OPT_EXTENDED_HEADERS.START);
   return 0;

 } /* END STATE MACHINE */
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [libnbd PATCH v2 19/23] api: Add nbd_[aio_]opt_extended_headers()
  2022-11-14 22:51 ` [libnbd PATCH v2 00/23] libnbd 64-bit NBD extensions Eric Blake
                     ` (17 preceding siblings ...)
  2022-11-14 22:51   ` [libnbd PATCH v2 18/23] generator: Actually request extended headers Eric Blake
@ 2022-11-14 22:51   ` Eric Blake
  2022-11-14 22:51   ` [libnbd PATCH v2 20/23] interop: Add test of 64-bit block status Eric Blake
                     ` (3 subsequent siblings)
  22 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:51 UTC (permalink / raw)
  To: libguestfs; +Cc: qemu-devel, qemu-block, nbd

This is very similar to the recent addition of nbd_opt_structured_reply,
giving us fine-grained control over an extended headers request.

Because nbdkit does not yet support extended headers, testsuite
coverage is limited to interop testing with qemu-nbd.  It shows that
extended headers imply structured replies, regardless of which order
the two are manually negotiated in.
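
As an illustration only (not in this patch): with this API, manual
negotiation in option mode might look roughly like the following,
where nbd://localhost is a placeholder and error handling is
abbreviated:

  #include <stdbool.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <libnbd.h>

  int
  main (void)
  {
    struct nbd_handle *nbd = nbd_create ();
    if (nbd == NULL) exit (EXIT_FAILURE);

    nbd_set_opt_mode (nbd, true);
    /* Suppress the automatic requests so we can negotiate by hand. */
    nbd_set_request_extended_headers (nbd, false);
    nbd_set_request_structured_replies (nbd, false);
    if (nbd_connect_uri (nbd, "nbd://localhost") == -1)  /* placeholder */
      exit (EXIT_FAILURE);

    if (nbd_opt_extended_headers (nbd) == 1)
      printf ("server switched to extended headers\n");
    /* A successful request implies structured replies as well. */
    printf ("extended headers: %d, structured replies: %d\n",
            nbd_get_extended_headers_negotiated (nbd),
            nbd_get_structured_replies_negotiated (nbd));

    nbd_opt_go (nbd);
    nbd_shutdown (nbd, 0);
    nbd_close (nbd);
    exit (EXIT_SUCCESS);
  }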
---
 generator/API.ml                              |  79 +++++++--
 .../states-newstyle-opt-extended-headers.c    |  30 +++-
 generator/states-newstyle.c                   |   3 +
 lib/opt.c                                     |  44 +++++
 interop/Makefile.am                           |   6 +
 interop/opt-extended-headers.c                | 153 ++++++++++++++++++
 interop/opt-extended-headers.sh               |  29 ++++
 .gitignore                                    |   1 +
 8 files changed, 329 insertions(+), 16 deletions(-)
 create mode 100644 interop/opt-extended-headers.c
 create mode 100755 interop/opt-extended-headers.sh

diff --git a/generator/API.ml b/generator/API.ml
index 570f15f4..c4d15e3a 100644
--- a/generator/API.ml
+++ b/generator/API.ml
@@ -825,6 +825,7 @@   "set_request_extended_headers", {
 if L<nbd_set_request_structured_replies(3)> is also set to false,
 since the use of extended headers implies structured replies.";
     see_also = [Link "get_request_extended_headers";
+                Link "opt_extended_headers";
                 Link "set_handshake_flags"; Link "set_strict_mode";
                 Link "get_extended_headers_negotiated";
                 Link "zero"; Link "trim"; Link "cache";
@@ -856,7 +857,9 @@   "get_extended_headers_negotiated", {
     shortdesc = "see if extended headers are in use";
     longdesc = "\
 After connecting you may call this to find out if the connection is
-using extended headers.
+using extended headers.  Note that this setting is sticky; this
+can return true even after a second L<nbd_opt_extended_headers(3)>
+returns false because the server detected a duplicate request.

 When extended headers are not in use, commands are limited to a
 32-bit length, even when the libnbd API uses a 64-bit parameter
@@ -938,7 +941,7 @@   "get_structured_replies_negotiated", {
 attempted.";
     see_also = [Link "set_request_structured_replies";
                 Link "get_request_structured_replies";
-                Link "opt_structured_reply";
+                Link "opt_structured_reply"; Link "opt_extended_headers";
                 Link "get_protocol";
                 Link "get_extended_headers_negotiated"];
   };
@@ -1211,12 +1214,13 @@   "set_opt_mode", {
 newstyle server.  This setting has no effect when connecting to an
 oldstyle server.

-Note that libnbd defaults to attempting C<NBD_OPT_STARTTLS> and
-C<NBD_OPT_STRUCTURED_REPLY> before letting you control remaining
-negotiation steps; if you need control over these steps as well,
-first set L<nbd_set_tls(3)> to C<LIBNBD_TLS_DISABLE> and
-L<nbd_set_request_structured_replies(3)> to false before starting
-the connection attempt.
+Note that libnbd defaults to attempting C<NBD_OPT_STARTTLS>,
+C<NBD_OPT_EXTENDED_HEADERS>, and C<NBD_OPT_STRUCTURED_REPLY>
+before letting you control remaining negotiation steps; if you
+need control over these steps as well, first set L<nbd_set_tls(3)>
+to C<LIBNBD_TLS_DISABLE>, and L<nbd_set_request_extended_headers(3)>
+or L<nbd_set_request_structured_replies(3)> to false, before
+starting the connection attempt.

 When option mode is enabled, you have fine-grained control over which
 options are negotiated, compared to the default of the server
@@ -1324,6 +1328,35 @@   "opt_starttls", {
                 Link "supports_tls"]
   };

+  "opt_extended_headers", {
+    default_call with
+    args = []; ret = RBool;
+    permitted_states = [ Negotiating ];
+    shortdesc = "request the server to enable extended headers";
+    longdesc = "\
+Request that the server use extended headers, by sending
+C<NBD_OPT_EXTENDED_HEADERS>.  This can only be used if
+L<nbd_set_opt_mode(3)> enabled option mode; furthermore, libnbd
+defaults to automatically requesting this unless you use
+L<nbd_set_request_extended_headers(3)> or
+L<nbd_set_request_structured_replies(3)> prior to connecting.
+This function is mainly useful for integration testing of corner
+cases in server handling.
+
+This function returns true if the server replies with success,
+false if the server replies with an error, and fails only if
+the server does not reply (such as for a loss of connection).
+Note that some servers fail a second request as redundant;
+libnbd assumes that once one request has succeeded, then
+extended headers are supported (as visible by
+L<nbd_get_extended_headers_negotiated(3)>) regardless if
+later calls to this function return false.  If this function
+returns true, the use of structured replies is implied.";
+    see_also = [Link "set_opt_mode"; Link "aio_opt_extended_headers";
+                Link "opt_go"; Link "set_request_extended_headers";
+                Link "set_request_structured_replies"]
+  };
+
   "opt_structured_reply", {
     default_call with
     args = []; ret = RBool;
@@ -1345,7 +1378,9 @@   "opt_structured_reply", {
 libnbd assumes that once one request has succeeded, then
 structured replies are supported (as visible by
 L<nbd_get_structured_replies_negotiated(3)>) regardless if
-later calls to this function return false.";
+later calls to this function return false.  Similarly, a
+server may fail this request if extended headers are already
+negotiated, since extended headers take priority.";
     see_also = [Link "set_opt_mode"; Link "aio_opt_structured_reply";
                 Link "opt_go"; Link "set_request_structured_replies"]
   };
@@ -3098,6 +3133,30 @@   "aio_opt_starttls", {
     see_also = [Link "set_opt_mode"; Link "opt_starttls"];
   };

+  "aio_opt_extended_headers", {
+    default_call with
+    args = [];
+    optargs = [ OClosure completion_closure ];
+    ret = RErr;
+    permitted_states = [ Negotiating ];
+    shortdesc = "request the server to enable extended headers";
+    longdesc = "\
+Request that the server use extended headers, by sending
+C<NBD_OPT_EXTENDED_HEADERS>.  This behaves like the synchronous
+counterpart L<nbd_opt_extended_headers(3)>, except that it does
+not wait for the server's response.
+
+To determine when the request completes, wait for
+L<nbd_aio_is_connecting(3)> to return false.  Or supply the optional
+C<completion_callback> which will be invoked as described in
+L<libnbd(3)/Completion callbacks>, except that it is automatically
+retired regardless of return value.  Note that detecting whether the
+server returns an error (as is done by the return value of the
+synchronous counterpart) is only possible with a completion
+callback.";
+    see_also = [Link "set_opt_mode"; Link "opt_extended_headers"];
+  };
+
   "aio_opt_structured_reply", {
     default_call with
     args = [];
@@ -4070,6 +4129,8 @@ let first_version =
   "set_request_extended_headers", (1, 16);
   "get_request_extended_headers", (1, 16);
   "get_extended_headers_negotiated", (1, 16);
+  "opt_extended_headers", (1, 16);
+  "aio_opt_extended_headers", (1, 16);

   (* These calls are proposed for a future version of libnbd, but
    * have not been added to any released version so far.
diff --git a/generator/states-newstyle-opt-extended-headers.c b/generator/states-newstyle-opt-extended-headers.c
index 151787bf..d73d1b5f 100644
--- a/generator/states-newstyle-opt-extended-headers.c
+++ b/generator/states-newstyle-opt-extended-headers.c
@@ -21,11 +21,14 @@
 STATE_MACHINE {
  NEWSTYLE.OPT_EXTENDED_HEADERS.START:
   assert (h->gflags & LIBNBD_HANDSHAKE_FLAG_FIXED_NEWSTYLE);
-  assert (h->opt_current != NBD_OPT_EXTENDED_HEADERS);
-  assert (CALLBACK_IS_NULL (h->opt_cb.completion));
-  if (!h->request_eh || !h->request_sr) {
-    SET_NEXT_STATE (%^OPT_STRUCTURED_REPLY.START);
-    return 0;
+  if (h->opt_current == NBD_OPT_EXTENDED_HEADERS)
+    assert (h->opt_mode);
+  else {
+    assert (CALLBACK_IS_NULL (h->opt_cb.completion));
+    if (!h->request_eh || !h->request_sr) {
+      SET_NEXT_STATE (%^OPT_STRUCTURED_REPLY.START);
+      return 0;
+    }
   }

   h->sbuf.option.version = htobe64 (NBD_NEW_VERSION);
@@ -68,6 +71,7 @@  NEWSTYLE.OPT_EXTENDED_HEADERS.RECV_REPLY_PAYLOAD:

  NEWSTYLE.OPT_EXTENDED_HEADERS.CHECK_REPLY:
   uint32_t reply;
+  int err = ENOTSUP;

   reply = be32toh (h->sbuf.or.option_reply.reply);
   switch (reply) {
@@ -76,19 +80,31 @@  NEWSTYLE.OPT_EXTENDED_HEADERS.CHECK_REPLY:
     h->extended_headers = true;
     /* Extended headers trump structured replies, so skip ahead. */
     h->structured_replies = true;
+    err = 0;
     break;
+  case NBD_REP_ERR_INVALID:
+    err = EINVAL;
+    /* fallthrough */
   default:
     if (handle_reply_error (h) == -1) {
       SET_NEXT_STATE (%.DEAD);
       return 0;
     }

-    debug (h, "extended headers are not supported by this server");
+    if (h->extended_headers)
+      debug (h, "extended headers already negotiated");
+    else
+      debug (h, "extended headers are not supported by this server");
     break;
   }

   /* Next option. */
-  SET_NEXT_STATE (%^OPT_STRUCTURED_REPLY.START);
+  if (h->opt_current == NBD_OPT_EXTENDED_HEADERS)
+    SET_NEXT_STATE (%.NEGOTIATING);
+  else
+    SET_NEXT_STATE (%^OPT_STRUCTURED_REPLY.START);
+  CALL_CALLBACK (h->opt_cb.completion, &err);
+  nbd_internal_free_option (h);
   return 0;

 } /* END STATE MACHINE */
diff --git a/generator/states-newstyle.c b/generator/states-newstyle.c
index c88430e2..66bbd86a 100644
--- a/generator/states-newstyle.c
+++ b/generator/states-newstyle.c
@@ -146,6 +146,9 @@  NEWSTYLE.START:
     case NBD_OPT_STRUCTURED_REPLY:
       SET_NEXT_STATE (%OPT_STRUCTURED_REPLY.START);
       return 0;
+    case NBD_OPT_EXTENDED_HEADERS:
+      SET_NEXT_STATE (%OPT_EXTENDED_HEADERS.START);
+      return 0;
     case NBD_OPT_STARTTLS:
       SET_NEXT_STATE (%OPT_STARTTLS.START);
       return 0;
diff --git a/lib/opt.c b/lib/opt.c
index 776a3612..bcdd5741 100644
--- a/lib/opt.c
+++ b/lib/opt.c
@@ -164,6 +164,31 @@ nbd_unlocked_opt_starttls (struct nbd_handle *h)
   return r;
 }

+/* Issue NBD_OPT_EXTENDED_HEADERS and wait for the reply. */
+int
+nbd_unlocked_opt_extended_headers (struct nbd_handle *h)
+{
+  int err;
+  nbd_completion_callback c = { .callback = go_complete, .user_data = &err };
+  int r = nbd_unlocked_aio_opt_extended_headers (h, &c);
+
+  if (r == -1)
+    return r;
+
+  r = wait_for_option (h);
+  if (r == 0) {
+    if (nbd_internal_is_state_negotiating (get_next_state (h)))
+      r = err == 0;
+    else {
+      assert (nbd_internal_is_state_dead (get_next_state (h)));
+      set_error (err,
+                 "failed to get response to opt_extended_headers request");
+      r = -1;
+    }
+  }
+  return r;
+}
+
 /* Issue NBD_OPT_STRUCTURED_REPLY and wait for the reply. */
 int
 nbd_unlocked_opt_structured_reply (struct nbd_handle *h)
@@ -386,6 +411,25 @@ nbd_unlocked_aio_opt_starttls (struct nbd_handle *h,
 #endif
 }

+/* Issue NBD_OPT_EXTENDED_HEADERS without waiting. */
+int
+nbd_unlocked_aio_opt_extended_headers (struct nbd_handle *h,
+                                       nbd_completion_callback *complete)
+{
+  if ((h->gflags & LIBNBD_HANDSHAKE_FLAG_FIXED_NEWSTYLE) == 0) {
+    set_error (ENOTSUP, "server is not using fixed newstyle protocol");
+    return -1;
+  }
+
+  h->opt_current = NBD_OPT_EXTENDED_HEADERS;
+  h->opt_cb.completion = *complete;
+  SET_CALLBACK_TO_NULL (*complete);
+
+  if (nbd_internal_run (h, cmd_issue) == -1)
+    debug (h, "option queued, ignoring state machine failure");
+  return 0;
+}
+
 /* Issue NBD_OPT_STRUCTURED_REPLY without waiting. */
 int
 nbd_unlocked_aio_opt_structured_reply (struct nbd_handle *h,
diff --git a/interop/Makefile.am b/interop/Makefile.am
index 75716eb9..5e0ea1ed 100644
--- a/interop/Makefile.am
+++ b/interop/Makefile.am
@@ -25,6 +25,7 @@ EXTRA_DIST = \
 	list-exports-test-dir/disk1 \
 	list-exports-test-dir/disk2 \
 	structured-read.sh \
+	opt-extended-headers.sh \
 	$(NULL)

 TESTS_ENVIRONMENT = \
@@ -134,6 +135,7 @@ check_PROGRAMS += \
 	socket-activation-qemu-nbd \
 	dirty-bitmap \
 	structured-read \
+	opt-extended-headers \
 	$(NULL)
 TESTS += \
 	interop-qemu-nbd \
@@ -144,6 +146,7 @@ TESTS += \
 	dirty-bitmap.sh \
 	structured-read.sh \
 	interop-qemu-block-size.sh \
+	opt-extended-headers.sh \
 	$(NULL)

 interop_qemu_nbd_SOURCES = \
@@ -235,6 +238,9 @@ dirty_bitmap_LDADD = $(top_builddir)/lib/libnbd.la
 structured_read_SOURCES = structured-read.c
 structured_read_LDADD = $(top_builddir)/lib/libnbd.la

+opt_extended_headers_SOURCES = opt-extended-headers.c
+opt_extended_headers_LDADD = $(top_builddir)/lib/libnbd.la
+
 endif HAVE_QEMU_NBD

 #----------------------------------------------------------------------
diff --git a/interop/opt-extended-headers.c b/interop/opt-extended-headers.c
new file mode 100644
index 00000000..d8ca102d
--- /dev/null
+++ b/interop/opt-extended-headers.c
@@ -0,0 +1,153 @@
+/* NBD client library in userspace
+ * Copyright (C) 2013-2022 Red Hat Inc.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/* Demonstrate low-level use of nbd_opt_extended_headers(). */
+
+#include <config.h>
+
+#include <inttypes.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+#include <unistd.h>
+#include <sys/stat.h>
+
+#include <libnbd.h>
+
+#define check(got, exp) do_check (#got, got, exp)
+
+static void
+do_check (const char *act, int64_t got, int64_t exp)
+{
+  fprintf (stderr, "trying %s\n", act);
+  if (got == -1)
+    fprintf (stderr, "%s\n", nbd_get_error ());
+  else
+    fprintf (stderr, "succeeded, result %" PRId64 "\n", got);
+  if (got != exp) {
+    fprintf (stderr, "got %" PRId64 ", but expected %" PRId64 "\n", got, exp);
+    exit (EXIT_FAILURE);
+  }
+}
+
+static int
+cb (void *data, const char *metacontext, uint64_t offset,
+         nbd_extent *entries, size_t nr_entries, int *error)
+{
+  /* If we got here, extents worked, implying at least structured replies */
+  bool *seen = data;
+
+  *seen = true;
+  return 0;
+}
+
+struct nbd_handle *
+prep (bool sr, bool eh, char **cmd)
+{
+  struct nbd_handle *nbd;
+
+  nbd = nbd_create ();
+  if (nbd == NULL) {
+    fprintf (stderr, "%s\n", nbd_get_error ());
+    exit (EXIT_FAILURE);
+  }
+
+  /* Connect to the server in opt mode, disable client-side failsafes so
+   * that we are testing server response even when client breaks protocol.
+   */
+  check (nbd_set_opt_mode (nbd, true), 0);
+  check (nbd_set_strict_mode (nbd, 0), 0);
+  check (nbd_add_meta_context (nbd, LIBNBD_CONTEXT_BASE_ALLOCATION), 0);
+  check (nbd_set_request_structured_replies (nbd, sr), 0);
+  check (nbd_set_request_extended_headers (nbd, eh), 0);
+  check (nbd_connect_systemd_socket_activation (nbd, cmd), 0);
+
+  return nbd;
+}
+
+void
+cleanup (struct nbd_handle *nbd, bool extents_exp)
+{
+  bool extents = false;
+
+  check (nbd_opt_go (nbd), 0);
+  check (nbd_can_meta_context (nbd, LIBNBD_CONTEXT_BASE_ALLOCATION),
+         extents_exp);
+  check (nbd_block_status_64 (nbd, 512, 0,
+                              (nbd_extent64_callback) { .callback = cb,
+                                                        .user_data = &extents },
+                              0), extents_exp ? 0 : -1);
+  check (extents, extents_exp);
+  nbd_close (nbd);
+}
+
+int
+main (int argc, char *argv[])
+{
+  struct nbd_handle *nbd;
+  int64_t bytes_sent;
+
+  if (argc < 2) {
+    fprintf (stderr, "%s qemu-nbd [args ...]\n", argv[0]);
+    exit (EXIT_FAILURE);
+  }
+
+  /* Default setup tries eh first, and skips sr request when eh works... */
+  nbd = prep (true, true, &argv[1]);
+  bytes_sent = nbd_stats_bytes_sent (nbd);
+  check (nbd_get_extended_headers_negotiated (nbd), true);
+  check (nbd_get_structured_replies_negotiated (nbd), true);
+  /* Duplicate eh request fails as redundant, but does not change state */
+  check (nbd_opt_extended_headers (nbd), false);
+  /* Trying sr after eh also fails, but does not change state */
+  check (nbd_opt_structured_reply (nbd), false);
+  check (nbd_get_extended_headers_negotiated (nbd), true);
+  check (nbd_get_structured_replies_negotiated (nbd), true);
+  cleanup (nbd, true);
+
+  /* ...which should result in the same amount of initial negotiation
+   * traffic as explicitly requesting just structured replies, albeit
+   * with different results on what got negotiated.
+   */
+  nbd = prep (true, false, &argv[1]);
+  check (nbd_stats_bytes_sent (nbd), bytes_sent);
+  check (nbd_get_extended_headers_negotiated (nbd), false);
+  check (nbd_get_structured_replies_negotiated (nbd), true);
+  cleanup (nbd, true);
+
+  /* request_eh is ignored if request_sr is false. */
+  nbd = prep (false, true, &argv[1]);
+  check (nbd_get_extended_headers_negotiated (nbd), false);
+  check (nbd_get_structured_replies_negotiated (nbd), false);
+  cleanup (nbd, false);
+
+  /* Swap order, requesting structured replies before extended headers */
+  nbd = prep (false, false, &argv[1]);
+  check (nbd_get_extended_headers_negotiated (nbd), false);
+  check (nbd_get_structured_replies_negotiated (nbd), false);
+  check (nbd_opt_structured_reply (nbd), true);
+  check (nbd_get_extended_headers_negotiated (nbd), false);
+  check (nbd_get_structured_replies_negotiated (nbd), true);
+  check (nbd_opt_extended_headers (nbd), true);
+  check (nbd_get_extended_headers_negotiated (nbd), true);
+  check (nbd_get_structured_replies_negotiated (nbd), true);
+  cleanup (nbd, true);
+
+  exit (EXIT_SUCCESS);
+}
diff --git a/interop/opt-extended-headers.sh b/interop/opt-extended-headers.sh
new file mode 100755
index 00000000..491770d9
--- /dev/null
+++ b/interop/opt-extended-headers.sh
@@ -0,0 +1,29 @@
+#!/usr/bin/env bash
+# nbd client library in userspace
+# Copyright (C) 2019-2022 Red Hat Inc.
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+
+# Test low-level nbd_opt_extended_headers() details with qemu-nbd
+
+source ../tests/functions.sh
+set -e
+set -x
+
+requires qemu-nbd --version
+requires nbdinfo --can extended-headers -- [ qemu-nbd -r -f raw "$0" ]
+
+# Run the test.
+$VG ./opt-extended-headers qemu-nbd -r -f raw "$0"
diff --git a/.gitignore b/.gitignore
index bd002522..35e65335 100644
--- a/.gitignore
+++ b/.gitignore
@@ -113,6 +113,7 @@ Makefile.in
 /interop/list-exports-nbdkit
 /interop/list-exports-qemu-nbd
 /interop/nbd-server-tls.conf
+/interop/opt-extended-headers
 /interop/requires.c
 /interop/socket-activation-nbdkit
 /interop/socket-activation-qemu-nbd
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [libnbd PATCH v2 20/23] interop: Add test of 64-bit block status
  2022-11-14 22:51 ` [libnbd PATCH v2 00/23] libnbd 64-bit NBD extensions Eric Blake
                     ` (18 preceding siblings ...)
  2022-11-14 22:51   ` [libnbd PATCH v2 19/23] api: Add nbd_[aio_]opt_extended_headers() Eric Blake
@ 2022-11-14 22:51   ` Eric Blake
  2022-11-14 22:51   ` [libnbd PATCH v2 21/23] api: Add nbd_can_block_status_payload() Eric Blake
                     ` (2 subsequent siblings)
  22 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:51 UTC (permalink / raw)
  To: libguestfs; +Cc: qemu-devel, qemu-block, nbd

Prove that we can round-trip a block status request larger than 4G
through a new-enough qemu-nbd.  Also serves as a unit test of our shim
for converting the internal 64-bit representation back to the older 32-bit
nbd_block_status callback interface.
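
Not the test itself, but a quick sketch of the 64-bit API being
exercised here (nbd://localhost is a placeholder):

  #include <stdio.h>
  #include <stdlib.h>
  #include <inttypes.h>
  #include <libnbd.h>

  static int
  extent64_cb (void *opaque, const char *metacontext, uint64_t offset,
               nbd_extent *entries, size_t nr, int *error)
  {
    uint64_t pos = offset;
    for (size_t i = 0; i < nr; i++) {
      /* Each entry carries a 64-bit length and 64-bit flags. */
      printf ("%s: %" PRIu64 " + %" PRIu64 " (flags 0x%" PRIx64 ")\n",
              metacontext, pos, entries[i].length,
              (uint64_t) entries[i].flags);
      pos += entries[i].length;
    }
    return 0;
  }

  int
  main (void)
  {
    struct nbd_handle *nbd = nbd_create ();
    nbd_extent64_callback cb = { .callback = extent64_cb };
    int64_t size;

    if (nbd == NULL) exit (EXIT_FAILURE);
    nbd_add_meta_context (nbd, LIBNBD_CONTEXT_BASE_ALLOCATION);
    if (nbd_connect_uri (nbd, "nbd://localhost") == -1)  /* placeholder */
      exit (EXIT_FAILURE);
    size = nbd_get_size (nbd);
    if (size == -1 || nbd_block_status_64 (nbd, size, 0, cb, 0) == -1)
      fprintf (stderr, "%s\n", nbd_get_error ());
    nbd_shutdown (nbd, 0);
    nbd_close (nbd);
    exit (EXIT_SUCCESS);
  }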
---
 interop/Makefile.am     |   6 ++
 interop/large-status.c  | 186 ++++++++++++++++++++++++++++++++++++++++
 interop/large-status.sh |  49 +++++++++++
 .gitignore              |   1 +
 4 files changed, 242 insertions(+)
 create mode 100644 interop/large-status.c
 create mode 100755 interop/large-status.sh

diff --git a/interop/Makefile.am b/interop/Makefile.am
index 5e0ea1ed..84c8fbbb 100644
--- a/interop/Makefile.am
+++ b/interop/Makefile.am
@@ -21,6 +21,7 @@ EXTRA_DIST = \
 	dirty-bitmap.sh \
 	interop-qemu-storage-daemon.sh \
 	interop-qemu-block-size.sh \
+	large-status.sh \
 	list-exports-nbd-config \
 	list-exports-test-dir/disk1 \
 	list-exports-test-dir/disk2 \
@@ -134,6 +135,7 @@ check_PROGRAMS += \
 	list-exports-qemu-nbd \
 	socket-activation-qemu-nbd \
 	dirty-bitmap \
+	large-status \
 	structured-read \
 	opt-extended-headers \
 	$(NULL)
@@ -144,6 +146,7 @@ TESTS += \
 	list-exports-qemu-nbd \
 	socket-activation-qemu-nbd \
 	dirty-bitmap.sh \
+	large-status.sh \
 	structured-read.sh \
 	interop-qemu-block-size.sh \
 	opt-extended-headers.sh \
@@ -235,6 +238,9 @@ socket_activation_qemu_nbd_LDADD = $(top_builddir)/lib/libnbd.la
 dirty_bitmap_SOURCES = dirty-bitmap.c
 dirty_bitmap_LDADD = $(top_builddir)/lib/libnbd.la

+large_status_SOURCES = large-status.c
+large_status_LDADD = $(top_builddir)/lib/libnbd.la
+
 structured_read_SOURCES = structured-read.c
 structured_read_LDADD = $(top_builddir)/lib/libnbd.la

diff --git a/interop/large-status.c b/interop/large-status.c
new file mode 100644
index 00000000..44a911d7
--- /dev/null
+++ b/interop/large-status.c
@@ -0,0 +1,186 @@
+/* NBD client library in userspace
+ * Copyright (C) 2013-2022 Red Hat Inc.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/* Test 64-bit block status with qemu. */
+
+#include <config.h>
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <assert.h>
+#include <stdbool.h>
+#include <errno.h>
+
+#include <libnbd.h>
+
+static const char *bitmap;
+
+struct data {
+  bool req_one;    /* input: true if req_one was passed to request */
+  int count;       /* input: count of expected remaining calls */
+  bool seen_base;  /* output: true if base:allocation encountered */
+  bool seen_dirty; /* output: true if qemu:dirty-bitmap encountered */
+};
+
+static int
+cb32 (void *opaque, const char *metacontext, uint64_t offset,
+      uint32_t *entries, size_t len, int *error)
+{
+  struct data *data = opaque;
+
+  assert (offset == 0);
+  assert (data->count-- > 0);
+
+  if (strcmp (metacontext, LIBNBD_CONTEXT_BASE_ALLOCATION) == 0) {
+    assert (!data->seen_base);
+    data->seen_base = true;
+
+    /* Data block offset 0 size 64k, remainder is hole */
+    assert (len == 4);
+    assert (entries[0] == 65536);
+    assert (entries[1] == 0);
+    /* libnbd had to truncate qemu's >4G answer */
+    assert (entries[2] == 4227858432);
+    assert (entries[3] == (LIBNBD_STATE_HOLE|LIBNBD_STATE_ZERO));
+  }
+  else if (strcmp (metacontext, bitmap) == 0) {
+    assert (!data->seen_dirty);
+    data->seen_dirty = true;
+
+    /* Dirty block at offset 5G-64k, remainder is clean */
+    /* libnbd had to truncate qemu's >4G answer */
+    assert (len == 2);
+    assert (entries[0] == 4227858432);
+    assert (entries[1] == 0);
+  }
+  else {
+    fprintf (stderr, "unexpected context %s\n", metacontext);
+    exit (EXIT_FAILURE);
+  }
+  return 0;
+}
+
+static int
+cb64 (void *opaque, const char *metacontext, uint64_t offset,
+      nbd_extent *entries, size_t len, int *error)
+{
+  struct data *data = opaque;
+
+  assert (offset == 0);
+  assert (data->count-- > 0);
+
+  if (strcmp (metacontext, LIBNBD_CONTEXT_BASE_ALLOCATION) == 0) {
+    assert (!data->seen_base);
+    data->seen_base = true;
+
+    /* Data block offset 0 size 64k, remainder is hole */
+    assert (len == 2);
+    assert (entries[0].length == 65536);
+    assert (entries[0].flags == 0);
+    assert (entries[1].length == 5368643584ULL);
+    assert (entries[1].flags == (LIBNBD_STATE_HOLE|LIBNBD_STATE_ZERO));
+  }
+  else if (strcmp (metacontext, bitmap) == 0) {
+    assert (!data->seen_dirty);
+    data->seen_dirty = true;
+
+    /* Dirty block at offset 5G-64k, remainder is clean */
+    assert (len == 2);
+    assert (entries[0].length == 5368643584ULL);
+    assert (entries[0].flags == 0);
+    assert (entries[1].length == 65536);
+    assert (entries[1].flags == 1);
+  }
+  else {
+    fprintf (stderr, "unexpected context %s\n", metacontext);
+    exit (EXIT_FAILURE);
+  }
+  return 0;
+}
+
+int
+main (int argc, char *argv[])
+{
+  struct nbd_handle *nbd;
+  int64_t exportsize;
+  struct data data;
+
+  if (argc < 3) {
+    fprintf (stderr, "%s bitmap qemu-nbd [args ...]\n", argv[0]);
+    exit (EXIT_FAILURE);
+  }
+  bitmap = argv[1];
+
+  nbd = nbd_create ();
+  if (nbd == NULL) {
+    fprintf (stderr, "%s\n", nbd_get_error ());
+    exit (EXIT_FAILURE);
+  }
+
+  nbd_add_meta_context (nbd, LIBNBD_CONTEXT_BASE_ALLOCATION);
+  nbd_add_meta_context (nbd, bitmap);
+
+  if (nbd_connect_systemd_socket_activation (nbd, &argv[2]) == -1) {
+    fprintf (stderr, "%s\n", nbd_get_error ());
+    exit (EXIT_FAILURE);
+  }
+
+  exportsize = nbd_get_size (nbd);
+  if (exportsize == -1) {
+    fprintf (stderr, "%s\n", nbd_get_error ());
+    exit (EXIT_FAILURE);
+  }
+
+  if (nbd_get_extended_headers_negotiated (nbd) != 1) {
+    fprintf (stderr, "skipping: qemu-nbd lacks extended headers\n");
+    exit (77);
+  }
+
+  /* Prove that we can round-trip a >4G block status request */
+  data = (struct data) { .count = 2, };
+  if (nbd_block_status_64 (nbd, exportsize, 0,
+                           (nbd_extent64_callback) { .callback = cb64,
+                             .user_data = &data },
+                           0) == -1) {
+    fprintf (stderr, "%s\n", nbd_get_error ());
+    exit (EXIT_FAILURE);
+  }
+  assert (data.seen_base && data.seen_dirty);
+
+  /* Check libnbd's handling of a >4G response through older interface  */
+  data = (struct data) { .count = 2, };
+  if (nbd_block_status (nbd, exportsize, 0,
+                        (nbd_extent_callback) { .callback = cb32,
+                          .user_data = &data },
+                        0) == -1) {
+    fprintf (stderr, "%s\n", nbd_get_error ());
+    exit (EXIT_FAILURE);
+  }
+  assert (data.seen_base && data.seen_dirty);
+
+  if (nbd_shutdown (nbd, 0) == -1) {
+    fprintf (stderr, "%s\n", nbd_get_error ());
+    exit (EXIT_FAILURE);
+  }
+
+  nbd_close (nbd);
+
+  exit (EXIT_SUCCESS);
+}
diff --git a/interop/large-status.sh b/interop/large-status.sh
new file mode 100755
index 00000000..86271a41
--- /dev/null
+++ b/interop/large-status.sh
@@ -0,0 +1,49 @@
+#!/usr/bin/env bash
+# nbd client library in userspace
+# Copyright (C) 2019-2022 Red Hat Inc.
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+
+# Test 64-bit block status (>4G extents plus a qemu dirty bitmap).
+
+source ../tests/functions.sh
+set -e
+set -x
+
+requires qemu-img bitmap --help
+requires qemu-nbd --version
+
+# This test uses the qemu-nbd -B option.
+if ! qemu-nbd --help | grep -sq -- -B; then
+    echo "$0: skipping because qemu-nbd does not support the -B option"
+    exit 77
+fi
+
+files="large-status.qcow2"
+rm -f $files
+cleanup_fn rm -f $files
+
+# Create mostly-sparse file with intentionally different data vs. dirty areas
+# (64k data, 5G-64k hole+zero; 5G-64k clean, 64k dirty)
+qemu-img create -f qcow2 large-status.qcow2 5G
+qemu-img bitmap --add --enable -f qcow2 large-status.qcow2 bitmap0
+qemu-io -f qcow2 -c "w -z $((5*1024*1024*1024 - 64*1024)) 64k" \
+        large-status.qcow2
+qemu-img bitmap --disable -f qcow2 large-status.qcow2 bitmap0
+qemu-io -f qcow2 -c 'w 0 64k' large-status.qcow2
+
+# Run the test.
+$VG ./large-status qemu:dirty-bitmap:bitmap0 \
+    qemu-nbd -f qcow2 -B bitmap0 large-status.qcow2
diff --git a/.gitignore b/.gitignore
index 35e65335..67bcee58 100644
--- a/.gitignore
+++ b/.gitignore
@@ -109,6 +109,7 @@ Makefile.in
 /interop/interop-qemu-nbd
 /interop/interop-qemu-nbd-tls-certs
 /interop/interop-qemu-nbd-tls-psk
+/interop/large-status
 /interop/list-exports-nbd-server
 /interop/list-exports-nbdkit
 /interop/list-exports-qemu-nbd
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [libnbd PATCH v2 21/23] api: Add nbd_can_block_status_payload()
  2022-11-14 22:51 ` [libnbd PATCH v2 00/23] libnbd 64-bit NBD extensions Eric Blake
                     ` (19 preceding siblings ...)
  2022-11-14 22:51   ` [libnbd PATCH v2 20/23] interop: Add test of 64-bit block status Eric Blake
@ 2022-11-14 22:51   ` Eric Blake
  2022-11-14 22:51   ` [libnbd PATCH v2 22/23] api: Add nbd_[aio_]block_status_filter() Eric Blake
  2022-11-14 22:51   ` [libnbd PATCH v2 23/23] RFC: pread: Accept 64-bit holes Eric Blake
  22 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:51 UTC (permalink / raw)
  To: libguestfs; +Cc: qemu-devel, qemu-block, nbd

In the recent NBD protocol extensions to add 64-bit commands, an
additional option was added to allow NBD_CMD_BLOCK_STATUS to pass a
client payload instructing the server to filter its answers (mainly
useful when the client requests more than one meta context with
NBD_OPT_SET_META_CONTEXT).  This patch lays the groundwork by exposing
whether a server advertises this capability, although libnbd does not
actually utilize it until the next patch.

At the time this patch was written, qemu-nbd was also patched to
provide such support; hence, an interop/ test shows the API in action.
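
For reference (not in this patch), probing the new bit from a client
might look like this, with nbd://localhost standing in for a real
server:

  #include <stdio.h>
  #include <stdlib.h>
  #include <libnbd.h>

  int
  main (void)
  {
    struct nbd_handle *nbd = nbd_create ();
    if (nbd == NULL) exit (EXIT_FAILURE);
    if (nbd_connect_uri (nbd, "nbd://localhost") == -1)  /* placeholder */
      exit (EXIT_FAILURE);
    switch (nbd_can_block_status_payload (nbd)) {
    case 1:
      printf ("server accepts filtered block status requests\n");
      break;
    case 0:
      printf ("server lacks block status payload support\n");
      break;
    default:
      fprintf (stderr, "%s\n", nbd_get_error ());
    }
    nbd_shutdown (nbd, 0);
    nbd_close (nbd);
    exit (EXIT_SUCCESS);
  }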
---
 info/nbdinfo.pod                |  10 ++-
 lib/nbd-protocol.h              |  29 +++++---
 generator/API.ml                |  18 +++++
 lib/flags.c                     |  11 +++
 examples/server-flags.c         |   7 +-
 interop/Makefile.am             |   6 ++
 interop/block-status-payload.c  | 126 ++++++++++++++++++++++++++++++++
 interop/block-status-payload.sh |  68 +++++++++++++++++
 .gitignore                      |   1 +
 info/can.c                      |   5 ++
 info/show.c                     |   9 ++-
 11 files changed, 273 insertions(+), 17 deletions(-)
 create mode 100644 interop/block-status-payload.c
 create mode 100755 interop/block-status-payload.sh

diff --git a/info/nbdinfo.pod b/info/nbdinfo.pod
index 2455e1c0..c3a97c3b 100644
--- a/info/nbdinfo.pod
+++ b/info/nbdinfo.pod
@@ -178,6 +178,8 @@ rotating disk: accessing nearby blocks may be faster than random
 access and requests should be sorted to improve performance.  Many
 servers do not or cannot report this accurately.

+=item nbdinfo --can block-status-payload URI
+
 =item nbdinfo --can cache URI

 =item nbdinfo --can df URI
@@ -345,10 +347,10 @@ The command does not print anything.  Instead it exits with success

 For further information see the L<NBD
 protocol|https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md>
-and the following libnbd functions: L<nbd_can_cache(3)>,
-L<nbd_can_df(3)>, L<nbd_can_fast_zero(3)>, L<nbd_can_flush(3)>,
-L<nbd_can_fua(3)>, L<nbd_can_multi_conn(3)>, L<nbd_can_trim(3)>,
-L<nbd_can_zero(3)>, L<nbd_is_read_only(3)>,
+and the following libnbd functions: L<nbd_can_block_status_payload(3)>,
+L<nbd_can_cache(3)>, L<nbd_can_df(3)>, L<nbd_can_fast_zero(3)>,
+L<nbd_can_flush(3)>, L<nbd_can_fua(3)>, L<nbd_can_multi_conn(3)>,
+L<nbd_can_trim(3)>, L<nbd_can_zero(3)>, L<nbd_is_read_only(3)>,
 L<nbd_get_structured_replies_negotiated(3)>,
 L<nbd_get_extended_headers_negotiated(3)>.

diff --git a/lib/nbd-protocol.h b/lib/nbd-protocol.h
index ac569a11..2d1fabd0 100644
--- a/lib/nbd-protocol.h
+++ b/lib/nbd-protocol.h
@@ -102,17 +102,18 @@ struct nbd_fixed_new_option_reply {
 #define NBD_FLAG_NO_ZEROES         (1 << 1)

 /* Per-export flags. */
-#define NBD_FLAG_HAS_FLAGS         (1 << 0)
-#define NBD_FLAG_READ_ONLY         (1 << 1)
-#define NBD_FLAG_SEND_FLUSH        (1 << 2)
-#define NBD_FLAG_SEND_FUA          (1 << 3)
-#define NBD_FLAG_ROTATIONAL        (1 << 4)
-#define NBD_FLAG_SEND_TRIM         (1 << 5)
-#define NBD_FLAG_SEND_WRITE_ZEROES (1 << 6)
-#define NBD_FLAG_SEND_DF           (1 << 7)
-#define NBD_FLAG_CAN_MULTI_CONN    (1 << 8)
-#define NBD_FLAG_SEND_CACHE        (1 << 10)
-#define NBD_FLAG_SEND_FAST_ZERO    (1 << 11)
+#define NBD_FLAG_HAS_FLAGS            (1 << 0)
+#define NBD_FLAG_READ_ONLY            (1 << 1)
+#define NBD_FLAG_SEND_FLUSH           (1 << 2)
+#define NBD_FLAG_SEND_FUA             (1 << 3)
+#define NBD_FLAG_ROTATIONAL           (1 << 4)
+#define NBD_FLAG_SEND_TRIM            (1 << 5)
+#define NBD_FLAG_SEND_WRITE_ZEROES    (1 << 6)
+#define NBD_FLAG_SEND_DF              (1 << 7)
+#define NBD_FLAG_CAN_MULTI_CONN       (1 << 8)
+#define NBD_FLAG_SEND_CACHE           (1 << 10)
+#define NBD_FLAG_SEND_FAST_ZERO       (1 << 11)
+#define NBD_FLAG_BLOCK_STATUS_PAYLOAD (1 << 12)

 /* NBD options (new style handshake only). */
 #define NBD_OPT_EXPORT_NAME        1
@@ -204,6 +205,12 @@ struct nbd_request_ext {
   uint64_t count;               /* Request effect or payload length. */
 } NBD_ATTRIBUTE_PACKED;

+/* Extended request payload for NBD_CMD_BLOCK_STATUS, when supported. */
+struct nbd_block_status_payload {
+  uint64_t length;              /* Effective length of client request */
+  /* followed by array of uint32_t ids */
+} NBD_ATTRIBUTE_PACKED;
+
 /* Simple reply (server -> client). */
 struct nbd_simple_reply {
   uint32_t magic;               /* NBD_SIMPLE_REPLY_MAGIC. */
diff --git a/generator/API.ml b/generator/API.ml
index c4d15e3a..bbf7c0bb 100644
--- a/generator/API.ml
+++ b/generator/API.ml
@@ -2279,6 +2279,23 @@   "can_fast_zero", {
     example = Some "examples/server-flags.c";
   };

+  "can_block_status_payload", {
+    default_call with
+    args = []; ret = RBool;
+    permitted_states = [ Negotiating; Connected; Closed ];
+    shortdesc = "does the server support the block status payload flag?";
+    longdesc = "\
+Returns true if the server supports the use of the
+C<LIBNBD_CMD_FLAG_PAYLOAD_LEN> flag to allow filtering of the
+block status command.  Returns
+false if the server does not.  Note that this will never return
+true if L<nbd_get_extended_headers_negotiated(3)> is false."
+^ non_blocking_test_call_description;
+    see_also = [SectionLink "Flag calls"; Link "opt_info";
+                Link "get_extended_headers_negotiated"];
+    example = Some "examples/server-flags.c";
+  };
+
   "can_df", {
     default_call with
     args = []; ret = RBool;
@@ -4131,6 +4148,7 @@ let first_version =
   "get_extended_headers_negotiated", (1, 16);
   "opt_extended_headers", (1, 16);
   "aio_opt_extended_headers", (1, 16);
+  "can_block_status_payload", (1, 16);

   (* These calls are proposed for a future version of libnbd, but
    * have not been added to any released version so far.
diff --git a/lib/flags.c b/lib/flags.c
index eaf4ff85..7d3ce4c7 100644
--- a/lib/flags.c
+++ b/lib/flags.c
@@ -66,6 +66,11 @@ nbd_internal_set_size_and_flags (struct nbd_handle *h,
     eflags &= ~NBD_FLAG_SEND_DF;
   }

+  if (eflags & NBD_FLAG_BLOCK_STATUS_PAYLOAD && !h->extended_headers) {
+    debug (h, "server lacks extended headers, ignoring claim of block status payload");
+    eflags &= ~NBD_FLAG_BLOCK_STATUS_PAYLOAD;
+  }
+
   if (eflags & NBD_FLAG_SEND_FAST_ZERO &&
       !(eflags & NBD_FLAG_SEND_WRITE_ZEROES)) {
     debug (h, "server lacks write zeroes, ignoring claim of fast zero");
@@ -213,6 +218,12 @@ nbd_unlocked_can_cache (struct nbd_handle *h)
   return get_flag (h, NBD_FLAG_SEND_CACHE);
 }

+int
+nbd_unlocked_can_block_status_payload (struct nbd_handle *h)
+{
+  return get_flag (h, NBD_FLAG_BLOCK_STATUS_PAYLOAD);
+}
+
 int
 nbd_unlocked_can_meta_context (struct nbd_handle *h, const char *name)
 {
diff --git a/examples/server-flags.c b/examples/server-flags.c
index d156aced..c826f146 100644
--- a/examples/server-flags.c
+++ b/examples/server-flags.c
@@ -78,8 +78,13 @@ main (int argc, char *argv[])
   PRINT_FLAG (nbd_can_multi_conn);
   PRINT_FLAG (nbd_can_trim);
   PRINT_FLAG (nbd_can_zero);
-#if LIBNBD_HAVE_NBD_CAN_FAST_ZERO /* Added in 1.2 */
+#if LIBNBD_HAVE_NBD_CAN_FAST_ZERO
+  /* Added in 1.2 */
   PRINT_FLAG (nbd_can_fast_zero);
+#endif
+#if LIBNBD_HAVE_NBD_CAN_BLOCK_STATUS_PAYLOAD
+  /* Added in 1.16 */
+  PRINT_FLAG (nbd_can_block_status_payload);
 #endif
   PRINT_FLAG (nbd_is_read_only);
   PRINT_FLAG (nbd_is_rotational);
diff --git a/interop/Makefile.am b/interop/Makefile.am
index 84c8fbbb..8a7bb17e 100644
--- a/interop/Makefile.am
+++ b/interop/Makefile.am
@@ -27,6 +27,7 @@ EXTRA_DIST = \
 	list-exports-test-dir/disk2 \
 	structured-read.sh \
 	opt-extended-headers.sh \
+	block-status-payload.sh \
 	$(NULL)

 TESTS_ENVIRONMENT = \
@@ -138,6 +139,7 @@ check_PROGRAMS += \
 	large-status \
 	structured-read \
 	opt-extended-headers \
+	block-status-payload \
 	$(NULL)
 TESTS += \
 	interop-qemu-nbd \
@@ -150,6 +152,7 @@ TESTS += \
 	structured-read.sh \
 	interop-qemu-block-size.sh \
 	opt-extended-headers.sh \
+	block-status-payload.sh \
 	$(NULL)

 interop_qemu_nbd_SOURCES = \
@@ -247,6 +250,9 @@ structured_read_LDADD = $(top_builddir)/lib/libnbd.la
 opt_extended_headers_SOURCES = opt-extended-headers.c
 opt_extended_headers_LDADD = $(top_builddir)/lib/libnbd.la

+block_status_payload_SOURCES = block-status-payload.c
+block_status_payload_LDADD = $(top_builddir)/lib/libnbd.la
+
 endif HAVE_QEMU_NBD

 #----------------------------------------------------------------------
diff --git a/interop/block-status-payload.c b/interop/block-status-payload.c
new file mode 100644
index 00000000..cdb0de7c
--- /dev/null
+++ b/interop/block-status-payload.c
@@ -0,0 +1,126 @@
+/* NBD client library in userspace
+ * Copyright (C) 2013-2022 Red Hat Inc.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/* Test interaction with qemu using block status payload filtering. */
+
+#include <config.h>
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <assert.h>
+#include <stdbool.h>
+#include <errno.h>
+
+#include <libnbd.h>
+
+#include "array-size.h"
+
+static const char *contexts[] = {
+  "base:allocation",
+  "qemu:allocation-depth",
+  "qemu:dirty-bitmap:bitmap0",
+  "qemu:dirty-bitmap:bitmap1",
+};
+
+static int
+cb (void *opaque, const char *metacontext, uint64_t offset,
+    nbd_extent *entries, size_t len, int *error)
+{
+  /* Adjust seen according to which context was visited */
+  unsigned int *seen = opaque;
+  size_t i;
+
+  for (i = 0; i < ARRAY_SIZE (contexts); i++)
+    if (strcmp (contexts[i], metacontext) == 0)
+      break;
+  *seen |= 1 << i;
+  return 0;
+}
+
+int
+main (int argc, char *argv[])
+{
+  struct nbd_handle *nbd;
+  int64_t exportsize;
+  unsigned int seen;
+  size_t i;
+  int r;
+
+  if (argc < 2) {
+    fprintf (stderr, "%s qemu-nbd [args ...]\n", argv[0]);
+    exit (EXIT_FAILURE);
+  }
+
+  nbd = nbd_create ();
+  if (nbd == NULL) {
+    fprintf (stderr, "%s\n", nbd_get_error ());
+    exit (EXIT_FAILURE);
+  }
+
+  assert (ARRAY_SIZE (contexts) == 4);
+  for (i = 0; i < ARRAY_SIZE (contexts); i++) {
+    if (nbd_add_meta_context (nbd, contexts[i]) == -1) {
+      fprintf (stderr, "%s\n", nbd_get_error ());
+      exit (EXIT_FAILURE);
+    }
+  }
+
+  if (nbd_connect_systemd_socket_activation (nbd, &argv[1]) == -1) {
+    fprintf (stderr, "%s\n", nbd_get_error ());
+    exit (EXIT_FAILURE);
+  }
+
+  r = nbd_can_block_status_payload (nbd);
+  if (r == -1) {
+    fprintf (stderr, "%s\n", nbd_get_error ());
+    exit (EXIT_FAILURE);
+  }
+  if (r != 1) {
+    fprintf (stderr, "expecting block status payload support from qemu\n");
+    exit (EXIT_FAILURE);
+  }
+
+  exportsize = nbd_get_size (nbd);
+  if (exportsize == -1) {
+    fprintf (stderr, "%s\n", nbd_get_error ());
+    exit (EXIT_FAILURE);
+  }
+
+  /* An unfiltered call should see all four contexts */
+  seen = 0;
+  if (nbd_block_status_64 (nbd, exportsize, 0,
+                           (nbd_extent64_callback) { .callback = cb,
+                                                     .user_data = &seen },
+                           0) == -1) {
+    fprintf (stderr, "%s\n", nbd_get_error ());
+    exit (EXIT_FAILURE);
+  }
+  assert (seen == 0xf);
+
+  /* FIXME: Test filtered calls once the API is added */
+  if (nbd_shutdown (nbd, 0) == -1) {
+    fprintf (stderr, "%s\n", nbd_get_error ());
+    exit (EXIT_FAILURE);
+  }
+
+  nbd_close (nbd);
+
+  exit (EXIT_SUCCESS);
+}
diff --git a/interop/block-status-payload.sh b/interop/block-status-payload.sh
new file mode 100755
index 00000000..a178583b
--- /dev/null
+++ b/interop/block-status-payload.sh
@@ -0,0 +1,68 @@
+#!/usr/bin/env bash
+# nbd client library in userspace
+# Copyright (C) 2019-2022 Red Hat Inc.
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+
+# Test use of block status payload for server filtering
+
+source ../tests/functions.sh
+set -e
+set -x
+
+requires qemu-img bitmap --help
+# This test uses the qemu-nbd -A and -B options.
+requires qemu-nbd -A -BA --version
+
+file="block-status-payload.qcow2"
+rm -f $file
+cleanup_fn rm -f $file
+
+# Create sparse file with two bitmaps.
+qemu-img create -f qcow2 $file 1M
+qemu-img bitmap --add --enable -f qcow2 $file bitmap0
+qemu-img bitmap --add --enable -f qcow2 $file bitmap1
+
+# Unconditional part of test: qemu should not advertise block status payload
+# support if extended headers are not in use
+nbdsh -c '
+h.set_request_extended_headers(False)
+h.add_meta_context("base:allocation")
+h.add_meta_context("qemu:allocation-depth")
+h.add_meta_context("qemu:dirty-bitmap:bitmap0")
+h.add_meta_context("qemu:dirty-bitmap:bitmap1")
+h.set_opt_mode(True)
+args = ["qemu-nbd", "-f", "qcow2", "-A", "-B", "bitmap0", "-B", "bitmap1",
+        "'"$file"'"]
+h.connect_systemd_socket_activation(args)
+assert h.aio_is_negotiating() is True
+assert h.get_extended_headers_negotiated() is False
+# Flag not available until info or go
+try:
+  h.can_block_status_payload()
+  assert False
+except nbd.Error:
+  pass
+h.opt_info()
+assert h.can_block_status_payload() is False
+assert h.can_meta_context("base:allocation") is True
+h.opt_abort()
+'
+
+# Conditional part of test: if qemu is new enough to support extended
+# headers, we assume it can also support block status payload.
+requires nbdinfo --can extended-headers -- [ qemu-nbd -r -f qcow2 "$file" ]
+$VG ./block-status-payload \
+    qemu-nbd -f qcow2 -A -B bitmap0 -B bitmap1 $file
diff --git a/.gitignore b/.gitignore
index 67bcee58..a61aa00e 100644
--- a/.gitignore
+++ b/.gitignore
@@ -96,6 +96,7 @@ Makefile.in
 /info/nbdinfo
 /info/nbdinfo.1
 /install-sh
+/interop/block-status-payload
 /interop/dirty-bitmap
 /interop/interop-nbd-server
 /interop/interop-nbd-server-tls
diff --git a/info/can.c b/info/can.c
index f602ffce..daa7d2af 100644
--- a/info/can.c
+++ b/info/can.c
@@ -72,6 +72,11 @@ do_can (void)
   else if (strcasecmp (can, "rotational") == 0)
     feature = nbd_is_rotational (nbd);

+  else if (strcasecmp (can, "block status payload") == 0 ||
+           strcasecmp (can, "block-status-payload") == 0 ||
+           strcasecmp (can, "block_status_payload") == 0)
+    feature = nbd_can_block_status_payload (nbd);
+
   else if (strcasecmp (can, "cache") == 0)
     feature = nbd_can_cache (nbd);

diff --git a/info/show.c b/info/show.c
index 4bf59671..113b1b50 100644
--- a/info/show.c
+++ b/info/show.c
@@ -54,7 +54,7 @@ show_one_export (struct nbd_handle *nbd, const char *desc,
   char *uri = NULL;
   int is_rotational, is_read_only;
   int can_cache, can_df, can_fast_zero, can_flush, can_fua,
-    can_multi_conn, can_trim, can_zero;
+    can_multi_conn, can_trim, can_zero, can_block_status_payload;
   int64_t block_minimum, block_preferred, block_maximum;
   string_vector contexts = empty_vector;
   bool show_context = false;
@@ -120,6 +120,7 @@ show_one_export (struct nbd_handle *nbd, const char *desc,
   can_multi_conn = nbd_can_multi_conn (nbd);
   can_trim = nbd_can_trim (nbd);
   can_zero = nbd_can_zero (nbd);
+  can_block_status_payload = nbd_can_block_status_payload (nbd);
   block_minimum = nbd_get_block_size (nbd, LIBNBD_SIZE_MINIMUM);
   block_preferred = nbd_get_block_size (nbd, LIBNBD_SIZE_PREFERRED);
   block_maximum = nbd_get_block_size (nbd, LIBNBD_SIZE_MAXIMUM);
@@ -161,6 +162,8 @@ show_one_export (struct nbd_handle *nbd, const char *desc,
     if (is_read_only >= 0)
       fprintf (fp, "\t%s: %s\n", "is_read_only",
                is_read_only ? "true" : "false");
+    if (can_block_status_payload >= 0)
+      show_boolean ("can_block_status_payload", can_block_status_payload);
     if (can_cache >= 0)
       show_boolean ("can_cache", can_cache);
     if (can_df >= 0)
@@ -230,6 +233,10 @@ show_one_export (struct nbd_handle *nbd, const char *desc,
     if (is_read_only >= 0)
       fprintf (fp, "\t\"%s\": %s,\n",
               "is_read_only", is_read_only ? "true" : "false");
+    if (can_block_status_payload >= 0)
+      fprintf (fp, "\t\"%s\": %s,\n",
+              "can_block_status_payload",
+               can_block_status_payload ? "true" : "false");
     if (can_cache >= 0)
       fprintf (fp, "\t\"%s\": %s,\n",
               "can_cache", can_cache ? "true" : "false");
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [libnbd PATCH v2 22/23] api: Add nbd_[aio_]block_status_filter()
  2022-11-14 22:51 ` [libnbd PATCH v2 00/23] libnbd 64-bit NBD extensions Eric Blake
                     ` (20 preceding siblings ...)
  2022-11-14 22:51   ` [libnbd PATCH v2 21/23] api: Add nbd_can_block_status_payload() Eric Blake
@ 2022-11-14 22:51   ` Eric Blake
  2022-11-14 22:51   ` [libnbd PATCH v2 23/23] RFC: pread: Accept 64-bit holes Eric Blake
  22 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:51 UTC (permalink / raw)
  To: libguestfs; +Cc: qemu-devel, qemu-block, nbd

As part of extending NBD to support 64-bit lengths, the protocol also
added an option for servers to allow clients to request filtered
responses to NBD_CMD_BLOCK_STATUS when more than one meta-context is
negotiated (see NBD commit XXX[*]).  At the same time as this patch,
qemu-nbd was taught to support and advertise this feature as a server,
but does not utilize it as a client (qemu doesn't yet need to connect
to multiple contexts at once).  Thus, adding generic client support
and enhancing the interop/ test in libnbd are needed to prove that the
feature is viable and worth standardizing.
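
For illustration only, a minimal client-side sketch of the new call
might look like this (the connected handle nbd, exportsize, and the
extent_cb name are assumed here and are not part of the patch; the
interop test in the diff below exercises the real cases):

  static int
  extent_cb (void *opaque, const char *metacontext, uint64_t offset,
             nbd_extent *entries, size_t len, int *error)
  {
    /* Only extents for the contexts named in the filter are reported. */
    return 0;
  }

  /* Request status for just one of the negotiated contexts. */
  char *only_dirty[] = { (char *) "qemu:dirty-bitmap:bitmap0", NULL };
  if (nbd_block_status_filter (nbd, exportsize, 0, only_dirty,
                               (nbd_extent64_callback) { .callback = extent_cb },
                               0) == -1)
    fprintf (stderr, "%s\n", nbd_get_error ());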

---
[*] FIXME with actual commit id
---
 lib/internal.h                   |   5 +-
 generator/API.ml                 |  71 +++++++++++++++--
 generator/states-issue-command.c |   4 +-
 lib/aio.c                        |   7 +-
 lib/rw.c                         | 127 ++++++++++++++++++++++++++++++-
 interop/block-status-payload.c   | 117 +++++++++++++++++++++++++++-
 interop/block-status-payload.sh  |  14 +++-
 info/info-can.sh                 |   3 +
 8 files changed, 336 insertions(+), 12 deletions(-)

diff --git a/lib/internal.h b/lib/internal.h
index 1abb21cb..ac8d99c4 100644
--- a/lib/internal.h
+++ b/lib/internal.h
@@ -73,6 +73,8 @@ struct meta_context {
 };
 DEFINE_VECTOR_TYPE(meta_vector, struct meta_context);

+DEFINE_VECTOR_TYPE(uint32_vector, uint32_t);
+
 struct export {
   char *name;
   char *description;
@@ -379,7 +381,8 @@ struct command {
   uint64_t cookie;
   uint64_t offset;
   uint64_t count;
-  void *data; /* Buffer for read/write */
+  void *data; /* Buffer for read/write, uint32_vector* for status payload */
+  uint32_vector *ids; /* For block status with payload */
   struct command_cb cb;
   bool initialized; /* For read, true if getting a hole may skip memset */
   uint32_t data_seen; /* For read, cumulative size of data chunks seen */
diff --git a/generator/API.ml b/generator/API.ml
index bbf7c0bb..6bf67de0 100644
--- a/generator/API.ml
+++ b/generator/API.ml
@@ -2287,12 +2287,13 @@   "can_block_status_payload", {
     longdesc = "\
 Returns true if the server supports the use of the
 C<LIBNBD_CMD_FLAG_PAYLOAD_LEN> flag to allow filtering of the
-block status command.  Returns
+block status command (see L<nbd_block_status_filter(3)>).  Returns
 false if the server does not.  Note that this will never return
 true if L<nbd_get_extended_headers_negotiated(3)> is false."
 ^ non_blocking_test_call_description;
     see_also = [SectionLink "Flag calls"; Link "opt_info";
-                Link "get_extended_headers_negotiated"];
+                Link "get_extended_headers_negotiated";
+                Link "block_status_filter"];
     example = Some "examples/server-flags.c";
   };

@@ -2361,6 +2362,10 @@   "can_meta_context", {
 meta contexts were requested but there is a missing or failed
 attempt at NBD_OPT_SET_META_CONTEXT during option negotiation.

+If the server supports block status filtering (see
+L<nbd_can_block_status_payload(3)>), this function must return
+true for any filter name passed to L<nbd_block_status_filter(3)>.
+
 The single parameter is the name of the metadata context,
 for example C<LIBNBD_CONTEXT_BASE_ALLOCATION>.
 B<E<lt>libnbd.hE<gt>> includes defined constants for well-known
@@ -2893,9 +2898,12 @@   "block_status_64", {
 information about blocks beginning from the specified
 offset to be returned. The C<count> parameter is a hint: the
 server may choose to return less status, or the final block
-may extend beyond the requested range. If multiple contexts
+may extend beyond the requested range. When multiple contexts
 are supported, the number of blocks and cumulative length
-of those blocks need not be identical between contexts.
+of those blocks need not be identical between contexts; this
+command generally returns the status of all negotiated contexts,
+while some servers also support a filtered request (see
+L<nbd_can_block_status_payload(3)>, L<nbd_block_status_filter(3)>).

 Note that not all servers can support a C<count> of 4GiB or larger;
 L<nbd_get_extended_headers_negotiated(3)> indicates which servers
@@ -2945,11 +2953,38 @@   "block_status_64", {
 does not exceed C<count> bytes; however, libnbd does not
 validate that the server obeyed the flag."
 ^ strict_call_description;
-    see_also = [Link "block_status";
+    see_also = [Link "block_status"; Link "block_status_filter";
                 Link "add_meta_context"; Link "can_meta_context";
                 Link "aio_block_status_64"; Link "set_strict_mode"];
   };

+  "block_status_filter", {
+    default_call with
+    args = [ UInt64 "count"; UInt64 "offset"; StringList "contexts";
+             Closure extent64_closure ];
+    optargs = [ OFlags ("flags", cmd_flags, Some ["REQ_ONE"; "PAYLOAD_LEN"]) ];
+    ret = RErr;
+    permitted_states = [ Connected ];
+    shortdesc = "send filtered block status command, with 64-bit callback";
+    longdesc = "\
+Issue a filtered block status command to the NBD server.  If
+supported by the server (see L<nbd_can_block_status_payload(3)>),
+this causes metadata context information about blocks beginning
+from the specified offset to be returned, and with the result
+limited to just the contexts specified in C<filter>.  Note that
+all strings in C<filter> must be supported by
+L<nbd_can_meta_context(3)>.
+
+All other parameters to this function have the same semantics
+as in L<nbd_block_status_64(3)>; except that for convenience,
+the C<flags> parameter may additionally contain or omit
+C<LIBNBD_CMD_FLAG_PAYLOAD_LEN>."
+^ strict_call_description;
+    see_also = [Link "block_status_64";
+                Link "can_block_status_payload"; Link "can_meta_context";
+                Link "aio_block_status_filter"; Link "set_strict_mode"];
+  };
+
   "poll", {
     default_call with
     args = [ Int "timeout" ]; ret = RInt;
@@ -3619,6 +3654,30 @@   "aio_block_status_64", {
                 Link "set_strict_mode"];
   };

+  "aio_block_status_filter", {
+    default_call with
+    args = [ UInt64 "count"; UInt64 "offset"; StringList "contexts";
+             Closure extent64_closure ];
+    optargs = [ OClosure completion_closure;
+                OFlags ("flags", cmd_flags, Some ["REQ_ONE"; "PAYLOAD_LEN"]) ];
+    ret = RCookie;
+    permitted_states = [ Connected ];
+    shortdesc = "send filtered block status command to the NBD server";
+    longdesc = "\
+Send a filtered block status command to the NBD server.
+
+To check if the command completed, call L<nbd_aio_command_completed(3)>.
+Or supply the optional C<completion_callback> which will be invoked
+as described in L<libnbd(3)/Completion callbacks>.
+
+Other parameters behave as documented in L<nbd_block_status_filter(3)>."
+^ strict_call_description;
+    see_also = [SectionLink "Issuing asynchronous commands";
+                Link "aio_block_status_64"; Link "block_status_filter";
+                Link "can_meta_context"; Link "can_block_status_payload";
+                Link "set_strict_mode"];
+  };
+
   "aio_get_fd", {
     default_call with
     args = []; ret = RFd;
@@ -4149,6 +4208,8 @@ let first_version =
   "opt_extended_headers", (1, 16);
   "aio_opt_extended_headers", (1, 16);
   "can_block_status_payload", (1, 16);
+  "block_status_filter", (1, 16);
+  "aio_block_status_filter", (1, 16);

   (* These calls are proposed for a future version of libnbd, but
    * have not been added to any released version so far.
diff --git a/generator/states-issue-command.c b/generator/states-issue-command.c
index feea2672..da898aef 100644
--- a/generator/states-issue-command.c
+++ b/generator/states-issue-command.c
@@ -84,7 +84,9 @@  ISSUE_COMMAND.PREPARE_WRITE_PAYLOAD:
   assert (h->cmds_to_issue != NULL);
   cmd = h->cmds_to_issue;
   assert (cmd->cookie == be64toh (h->req.compact.handle));
-  if (cmd->type == NBD_CMD_WRITE) {
+  if (cmd->type == NBD_CMD_WRITE ||
+      (h->extended_headers && cmd->type == NBD_CMD_BLOCK_STATUS &&
+       cmd->flags & NBD_CMD_FLAG_PAYLOAD_LEN)) {
     h->wbuf = cmd->data;
     h->wlen = cmd->count;
     if (cmd->next && cmd->count < 64 * 1024)
diff --git a/lib/aio.c b/lib/aio.c
index 1188a328..fa142881 100644
--- a/lib/aio.c
+++ b/lib/aio.c
@@ -32,8 +32,13 @@ void
 nbd_internal_retire_and_free_command (struct command *cmd)
 {
   /* Free the callbacks. */
-  if (cmd->type == NBD_CMD_BLOCK_STATUS)
+  if (cmd->type == NBD_CMD_BLOCK_STATUS) {
+    if (cmd->ids) {
+      uint32_vector_reset (cmd->ids);
+      free (cmd->ids);
+    }
     FREE_CALLBACK (cmd->cb.fn.extent);
+  }
   if (cmd->type == NBD_CMD_READ)
     FREE_CALLBACK (cmd->cb.fn.chunk);
   FREE_CALLBACK (cmd->cb.completion);
diff --git a/lib/rw.c b/lib/rw.c
index 26533836..310c1285 100644
--- a/lib/rw.c
+++ b/lib/rw.c
@@ -242,6 +242,26 @@ nbd_unlocked_block_status_64 (struct nbd_handle *h,
   return wait_for_command (h, cookie);
 }

+/* Issue a filtered block status command and wait for the reply. */
+int
+nbd_unlocked_block_status_filter (struct nbd_handle *h,
+                                  uint64_t count, uint64_t offset,
+                                  char **filter,
+                                  nbd_extent64_callback *extent64,
+                                  uint32_t flags)
+{
+  int64_t cookie;
+  nbd_completion_callback c = NBD_NULL_COMPLETION;
+
+  cookie = nbd_unlocked_aio_block_status_filter (h, count, offset, filter,
+                                                 extent64, &c, flags);
+  if (cookie == -1)
+    return -1;
+
+  assert (CALLBACK_IS_NULL (*extent64));
+  return wait_for_command (h, cookie);
+}
+
 /* count_err represents the errno to return if bounds check fail */
 int64_t
 nbd_internal_command_common (struct nbd_handle *h,
@@ -250,6 +270,7 @@ nbd_internal_command_common (struct nbd_handle *h,
                              void *data, struct command_cb *cb)
 {
   struct command *cmd;
+  uint32_vector *ids = NULL;

   if (h->disconnect_request) {
       set_error (EINVAL, "cannot request more commands after NBD_CMD_DISC");
@@ -297,10 +318,23 @@ nbd_internal_command_common (struct nbd_handle *h,
     }
     break;

+  case NBD_CMD_BLOCK_STATUS:
+    if (data) {
+      ids = data;
+      count = ids->len * sizeof (uint32_t);
+      data = ids->ptr;
+      if (count > MAX_REQUEST_SIZE ||
+          (h->strict & LIBNBD_STRICT_PAYLOAD && count > h->payload_maximum)) {
+        set_error (ERANGE, "filter set too large");
+        goto err;
+      }
+      break;
+    }
+    /* fallthrough */
+  default:
     /* Other commands are limited by the 32 bit field in the command
      * structure on the wire, unless extended headers were negotiated.
      */
-  default:
     if (!h->extended_headers && count > UINT32_MAX) {
       set_error (ERANGE, "request too large: maximum request size is %" PRIu32,
                  UINT32_MAX);
@@ -320,6 +354,7 @@ nbd_internal_command_common (struct nbd_handle *h,
   cmd->offset = offset;
   cmd->count = count;
   cmd->data = data;
+  cmd->ids = ids;
   if (cb)
     cmd->cb = *cb;

@@ -364,8 +399,13 @@ nbd_internal_command_common (struct nbd_handle *h,
  err:
   /* Since we did not queue the command, we must free the callbacks. */
   if (cb) {
-    if (type == NBD_CMD_BLOCK_STATUS)
+    if (type == NBD_CMD_BLOCK_STATUS) {
+      if (ids) {
+        uint32_vector_reset (ids);
+        free (ids);
+      }
       FREE_CALLBACK (cb->fn.extent);
+    }
     if (type == NBD_CMD_READ)
       FREE_CALLBACK (cb->fn.chunk);
     FREE_CALLBACK (cb->completion);
@@ -609,3 +649,86 @@ nbd_unlocked_aio_block_status_64 (struct nbd_handle *h,
   return nbd_internal_command_common (h, flags, NBD_CMD_BLOCK_STATUS, offset,
                                       count, EINVAL, NULL, &cb);
 }
+
+int64_t
+nbd_unlocked_aio_block_status_filter (struct nbd_handle *h,
+                                      uint64_t count, uint64_t offset,
+                                      char **filter,
+                                      nbd_extent64_callback *extent64,
+                                      nbd_completion_callback *completion,
+                                      uint32_t flags)
+{
+  struct command_cb cb = { .fn.extent = *extent64,
+                           .completion = *completion };
+  uint32_vector *ids;
+  char *name;
+  size_t i;
+
+  /* Because this affects wire format, it is more convenient to manage
+   * PAYLOAD_LEN by what was negotiated than to require the user to
+   * have to set it correctly.
+   */
+  if (!h->extended_headers) {
+    set_error (ENOTSUP, "server does not support extended headers");
+    return -1;
+  }
+  flags |= LIBNBD_CMD_FLAG_PAYLOAD_LEN;
+
+  if (h->strict & LIBNBD_STRICT_COMMANDS) {
+    if (nbd_unlocked_can_block_status_payload (h) != 1) {
+      set_error (EINVAL,
+                 "server does not support the block status payload flag");
+      return -1;
+    }
+
+    if (!h->meta_valid || h->meta_contexts.len == 0) {
+      set_error (ENOTSUP, "did not negotiate any metadata contexts, "
+                 "either you did not call nbd_add_meta_context before "
+                 "connecting or the server does not support it");
+      return -1;
+    }
+  }
+
+  ids = calloc (1, sizeof *ids);
+  if (ids == NULL) {
+    set_error (errno, "calloc");
+    return -1;
+  }
+  if (uint32_vector_append (ids, htobe32 (count >> 32)) == -1 ||
+      uint32_vector_append (ids, htobe32 (count)) == -1) {
+    set_error (errno, "realloc");
+    goto fail;
+  }
+
+  /* O(n^2) search - hopefully filter and negotiated contexts are both small */
+  for ( ; (name = *filter) != NULL; filter++) {
+    if (!h->meta_valid) {
+      set_error (EINVAL, "context %s not negotiated", name);
+      goto fail;
+    }
+    for (i = 0; i < h->meta_contexts.len; i++) {
+      struct meta_context *meta = &h->meta_contexts.ptr[i];
+      if (strcmp (name, meta->name) == 0) {
+        if (uint32_vector_append (ids, htobe32 (meta->context_id)) == -1) {
+          set_error (errno, "realloc");
+          goto fail;
+        }
+        break;
+      }
+    }
+    if (i == h->meta_contexts.len) {
+      set_error (EINVAL, "context %s not negotiated", name);
+      goto fail;
+    }
+  }
+
+  SET_CALLBACK_TO_NULL (*extent64);
+  SET_CALLBACK_TO_NULL (*completion);
+  return nbd_internal_command_common (h, flags, NBD_CMD_BLOCK_STATUS, offset,
+                                      count, EINVAL, ids, &cb);
+
+ fail:
+  uint32_vector_reset (ids);
+  free (ids);
+  return -1;
+}
diff --git a/interop/block-status-payload.c b/interop/block-status-payload.c
index cdb0de7c..0c0348ba 100644
--- a/interop/block-status-payload.c
+++ b/interop/block-status-payload.c
@@ -54,11 +54,26 @@ cb (void *opaque, const char *metacontext, uint64_t offset,
   return 0;
 }

+static char **
+list (unsigned int use)
+{
+  static const char *array[ARRAY_SIZE (contexts) + 1];
+  size_t i, j;
+
+  assert (use < 1 << ARRAY_SIZE (contexts));
+  for (i = j = 0; i < ARRAY_SIZE (contexts); i++)
+    if (use & (1 << i))
+      array[j++] = contexts[i];
+  array[j] = NULL;
+  return (char **) array;
+}
+
 int
 main (int argc, char *argv[])
 {
   struct nbd_handle *nbd;
   int64_t exportsize;
+  uint64_t bytes_sent;
   unsigned int seen;
   size_t i;
   int r;
@@ -114,7 +129,107 @@ main (int argc, char *argv[])
   }
   assert (seen == 0xf);

-  /* FIXME: Test filtered calls once the API is added */
+  /* Filtering with all contexts listed, same effect as unfiltered call */
+  seen = 0;
+  if (nbd_block_status_filter (nbd, exportsize, 0, list (0xf),
+                               (nbd_extent64_callback) { .callback = cb,
+                                                         .user_data = &seen },
+                               0) == -1) {
+    fprintf (stderr, "%s\n", nbd_get_error ());
+    exit (EXIT_FAILURE);
+  }
+  assert (seen == 0xf);
+
+  /* Filtering with just two out of four contexts; test optional flag */
+  seen = 0;
+  if (nbd_block_status_filter (nbd, exportsize, 0, list (0x5),
+                               (nbd_extent64_callback) { .callback = cb,
+                                                         .user_data = &seen },
+                               LIBNBD_CMD_FLAG_PAYLOAD_LEN) == -1) {
+    fprintf (stderr, "%s\n", nbd_get_error ());
+    exit (EXIT_FAILURE);
+  }
+  assert (seen == 0x5);
+
+  /* Filtering with one context, near end of file (to make sure the
+   * payload length isn't confused with the effect length)
+   */
+  seen = 0;
+  if (nbd_block_status_filter (nbd, 1, exportsize - 1, list (0x2),
+                               (nbd_extent64_callback) { .callback = cb,
+                                                         .user_data = &seen },
+                               0) == -1) {
+    fprintf (stderr, "%s\n", nbd_get_error ());
+    exit (EXIT_FAILURE);
+  }
+  assert (seen == 0x2);
+
+  /* Filtering with no contexts - pointless, so qemu rejects it */
+  bytes_sent = nbd_stats_bytes_sent (nbd);
+  seen = 0;
+  if (nbd_block_status_filter (nbd, exportsize, 0, list (0x0),
+                               (nbd_extent64_callback) { .callback = cb,
+                                                         .user_data = &seen },
+                               0) != -1) {
+    fprintf (stderr, "expecting block status failure\n");
+    exit (EXIT_FAILURE);
+  }
+  assert (seen == 0x0);
+  if (nbd_get_errno () != EINVAL) {
+    fprintf (stderr, "expecting EINVAL after block status failure\n");
+    exit (EXIT_FAILURE);
+  }
+  if (nbd_stats_bytes_sent (nbd) <= bytes_sent) {
+    fprintf (stderr, "expecting server-side rejection of bad request\n");
+    exit (EXIT_FAILURE);
+  }
+
+  /* Giving unknown string triggers EINVAL from libnbd */
+  bytes_sent = nbd_stats_bytes_sent (nbd);
+  seen = 0;
+  {
+    const char *bogus[] = { "qemu:dirty-bitmap:bitmap2", NULL };
+    if (nbd_block_status_filter (nbd, exportsize, 0, (char **) bogus,
+                                 (nbd_extent64_callback) { .callback = cb,
+                                                           .user_data = &seen },
+                                 0) != -1) {
+      fprintf (stderr, "expecting block status failure\n");
+      exit (EXIT_FAILURE);
+    }
+  }
+  if (nbd_get_errno () != EINVAL) {
+    fprintf (stderr, "expecting EINVAL after block status failure\n");
+    exit (EXIT_FAILURE);
+  }
+  assert (seen == 0x0);
+  if (nbd_stats_bytes_sent (nbd) != bytes_sent) {
+    fprintf (stderr, "expecting client-side rejection of bad request\n");
+    exit (EXIT_FAILURE);
+  }
+
+  /* Giving same string twice triggers EINVAL from qemu */
+  seen = 0;
+  {
+    const char *dupes[] = { "base:allocation", "base:allocation", NULL };
+    if (nbd_block_status_filter (nbd, exportsize, 0, (char **) dupes,
+                                 (nbd_extent64_callback) { .callback = cb,
+                                                           .user_data = &seen },
+                                 0) != -1) {
+      fprintf (stderr, "expecting block status failure\n");
+      exit (EXIT_FAILURE);
+    }
+  }
+  if (nbd_get_errno () != EINVAL) {
+    fprintf (stderr, "expecting EINVAL after block status failure\n");
+    exit (EXIT_FAILURE);
+  }
+  assert (seen == 0x0);
+  if (nbd_stats_bytes_sent (nbd) <= bytes_sent) {
+    fprintf (stderr, "expecting server-side rejection of bad request\n");
+    exit (EXIT_FAILURE);
+  }
+
+  /* Done */
   if (nbd_shutdown (nbd, 0) == -1) {
     fprintf (stderr, "%s\n", nbd_get_error ());
     exit (EXIT_FAILURE);
diff --git a/interop/block-status-payload.sh b/interop/block-status-payload.sh
index a178583b..e42e30e9 100755
--- a/interop/block-status-payload.sh
+++ b/interop/block-status-payload.sh
@@ -49,6 +49,7 @@ args = ["qemu-nbd", "-f", "qcow2", "-A", "-B", "bitmap0", "-B", "bitmap1",
 h.connect_systemd_socket_activation(args)
 assert h.aio_is_negotiating() is True
 assert h.get_extended_headers_negotiated() is False
+
 # Flag not available until info or go
 try:
   h.can_block_status_payload()
@@ -58,7 +59,18 @@ except nbd.Error:
 h.opt_info()
 assert h.can_block_status_payload() is False
 assert h.can_meta_context("base:allocation") is True
-h.opt_abort()
+
+# Filter request not allowed if not advertised
+def f():
+  assert False
+h.opt_go()
+assert h.can_block_status_payload() is False
+try:
+  h.block_status_filter(0, 512, ["base:allocation"], f)
+  assert False
+except nbd.Error:
+  pass
+h.shutdown()
 '

 # Conditional part of test: if qemu is new enough to support extended
diff --git a/info/info-can.sh b/info/info-can.sh
index e5f6a44b..111b0be2 100755
--- a/info/info-can.sh
+++ b/info/info-can.sh
@@ -38,6 +38,9 @@ requires bash -c "nbdkit sh --dump-plugin | grep has_can_cache=1"
 # and oldstyle never, but that feels like depending a bit too much on
 # the implementation.

+# --can block-status-payload is not supported by nbdkit yet. Testing
+# is done during interop with new-enough qemu.
+
 # --can structured-reply is not a per-export setting, but rather
 # something set on the server as a whole.

-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [libnbd PATCH v2 23/23] RFC: pread: Accept 64-bit holes
  2022-11-14 22:51 ` [libnbd PATCH v2 00/23] libnbd 64-bit NBD extensions Eric Blake
                     ` (21 preceding siblings ...)
  2022-11-14 22:51   ` [libnbd PATCH v2 22/23] api: Add nbd_[aio_]block_status_filter() Eric Blake
@ 2022-11-14 22:51   ` Eric Blake
  22 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2022-11-14 22:51 UTC (permalink / raw)
  To: libguestfs; +Cc: qemu-devel, qemu-block, nbd

Even though we don't currently allow the user to request NBD_CMD_READ
with more than 64M (and even if we did, our API signature caps us at
SIZE_MAX, which is 32 bits on a 32-bit machine), upstream NBD commit
XXX[*] states that for symmetry with 64-bit requests, extended header
clients must be prepared for a server response with a 64-bit hole,
even if the client never makes a read request that large.  Note that
we don't have to change the signature of the callback for
nbd_pread_structured; nor is it worth adding a 64-bit counterpart to
LIBNBD_READ_HOLE, because it is unlikely that a user callback will
ever need to distinguish between which size was sent over the wire,
when the value is always less than 32 bits.  Also note that the recent
NBD spec changes to add 64-bit lengths did state that servers may allow
clients to request a read larger than the max block size, but if
such a read is not rejected with EOVERFLOW or EINVAL, then the reply
will be divided into chunks so that no chunk sends more than a max
block size payload.

While we cannot guarantee which size of structured reply the server
will use, it is easy enough to handle both sizes, while tagging the
read with EPROTO for a non-compliant server that sends wide replies
when extended headers were not negotiated.  A more likely reason for a
server to send 64-bit hole chunks even for a 32-bit hole is that the
extended hole payload has nicer power-of-2 sizing.
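
As a sketch of why the nbd_pread_structured callback signature can stay
unchanged (chunk_cb and the connected handle nbd are assumed here, not
part of the patch), a client sees the same LIBNBD_READ_HOLE status no
matter which hole chunk type the server picked:

  static int
  chunk_cb (void *user_data, const void *subbuf, size_t count,
            uint64_t offset, unsigned status, int *error)
  {
    if (status == LIBNBD_READ_HOLE) {
      /* Reached for both NBD_REPLY_TYPE_OFFSET_HOLE and the new _EXT
       * variant; count never exceeds the original request length. */
    }
    return 0;
  }

  char buf[65536];
  if (nbd_pread_structured (nbd, buf, sizeof buf, 0,
                            (nbd_chunk_callback) { .callback = chunk_cb },
                            0) == -1)
    fprintf (stderr, "%s\n", nbd_get_error ());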

---

[*] FIXME: Update this with the actual commit id, if upstream NBD even
goes with this option.
---
 lib/internal.h                      |  1 +
 lib/nbd-protocol.h                  |  6 +++++
 generator/states-reply-structured.c | 36 ++++++++++++++++++++++++-----
 3 files changed, 37 insertions(+), 6 deletions(-)

diff --git a/lib/internal.h b/lib/internal.h
index ac8d99c4..64d1941c 100644
--- a/lib/internal.h
+++ b/lib/internal.h
@@ -250,6 +250,7 @@ struct nbd_handle {
       union {
         struct nbd_structured_reply_offset_data offset_data;
         struct nbd_structured_reply_offset_hole offset_hole;
+        struct nbd_structured_reply_offset_hole_ext offset_hole_ext;
         struct nbd_structured_reply_block_status_hdr bs_hdr;
         struct nbd_structured_reply_block_status_ext_hdr bs_ext_hdr;
         struct {
diff --git a/lib/nbd-protocol.h b/lib/nbd-protocol.h
index 2d1fabd0..2d1a3202 100644
--- a/lib/nbd-protocol.h
+++ b/lib/nbd-protocol.h
@@ -247,6 +247,11 @@ struct nbd_structured_reply_offset_hole {
   uint32_t length;              /* Length of hole. */
 } NBD_ATTRIBUTE_PACKED;

+struct nbd_structured_reply_offset_hole_ext {
+  uint64_t offset;
+  uint64_t length;              /* Length of hole. */
+} NBD_ATTRIBUTE_PACKED;
+
 /* NBD_REPLY_TYPE_BLOCK_STATUS block descriptor. */
 struct nbd_block_descriptor {
   uint32_t length;              /* length of block */
@@ -292,6 +297,7 @@ struct nbd_structured_reply_error {
 #define NBD_REPLY_TYPE_NONE             0
 #define NBD_REPLY_TYPE_OFFSET_DATA      1
 #define NBD_REPLY_TYPE_OFFSET_HOLE      2
+#define NBD_REPLY_TYPE_OFFSET_HOLE_EXT  3
 #define NBD_REPLY_TYPE_BLOCK_STATUS     5
 #define NBD_REPLY_TYPE_BLOCK_STATUS_EXT 6
 #define NBD_REPLY_TYPE_ERROR            NBD_REPLY_TYPE_ERR (1)
diff --git a/generator/states-reply-structured.c b/generator/states-reply-structured.c
index 7e313b5a..e338bf74 100644
--- a/generator/states-reply-structured.c
+++ b/generator/states-reply-structured.c
@@ -28,15 +28,16 @@
  * requesting command.
  */
 static bool
-structured_reply_in_bounds (uint64_t offset, uint32_t length,
+structured_reply_in_bounds (uint64_t offset, uint64_t length,
                             const struct command *cmd)
 {
   if (offset < cmd->offset ||
       offset >= cmd->offset + cmd->count ||
-      offset + length > cmd->offset + cmd->count) {
+      length > cmd->offset + cmd->count ||
+      offset > cmd->offset + cmd->count - length) {
     set_error (0, "range of structured reply is out of bounds, "
                "offset=%" PRIu64 ", cmd->offset=%" PRIu64 ", "
-               "length=%" PRIu32 ", cmd->count=%" PRIu64 ": "
+               "length=%" PRIu64 ", cmd->count=%" PRIu64 ": "
                "this is likely to be a bug in the NBD server",
                offset, cmd->offset, length, cmd->count);
     return false;
@@ -141,6 +142,21 @@  REPLY.STRUCTURED_REPLY.CHECK:
     SET_NEXT_STATE (%RECV_OFFSET_HOLE);
     break;

+  case NBD_REPLY_TYPE_OFFSET_HOLE_EXT:
+    if (cmd->type != NBD_CMD_READ ||
+        length != sizeof h->sbuf.reply.payload.offset_hole_ext)
+      goto resync;
+    if (!h->extended_headers) {
+      debug (h, "unexpected 64-bit hole without extended headers, "
+             "this is probably a server bug");
+      if (cmd->error == 0)
+        cmd->error = EPROTO;
+    }
+    h->rbuf = &h->sbuf.reply.payload.offset_hole_ext;
+    h->rlen = sizeof h->sbuf.reply.payload.offset_hole_ext;
+    SET_NEXT_STATE (%RECV_OFFSET_HOLE);
+    break;
+
   case NBD_REPLY_TYPE_BLOCK_STATUS:
     if (cmd->type != NBD_CMD_BLOCK_STATUS ||
         length < 12 || ((length-4) & 7) != 0)
@@ -406,7 +422,8 @@  REPLY.STRUCTURED_REPLY.RECV_OFFSET_DATA_DATA:
  REPLY.STRUCTURED_REPLY.RECV_OFFSET_HOLE:
   struct command *cmd = h->reply_cmd;
   uint64_t offset;
-  uint32_t length;
+  uint64_t length;
+  uint16_t type;

   switch (recv_into_rbuf (h)) {
   case -1: SET_NEXT_STATE (%.DEAD); return 0;
@@ -416,10 +433,14 @@  REPLY.STRUCTURED_REPLY.RECV_OFFSET_HOLE:
     return 0;
   case 0:
     offset = be64toh (h->sbuf.reply.payload.offset_hole.offset);
-    length = be32toh (h->sbuf.reply.payload.offset_hole.length);
+    type = be16toh (h->sbuf.reply.hdr.structured.type);
+
+    if (type == NBD_REPLY_TYPE_OFFSET_HOLE)
+      length = be32toh (h->sbuf.reply.payload.offset_hole.length);
+    else
+      length = be64toh (h->sbuf.reply.payload.offset_hole_ext.length);

     assert (cmd); /* guaranteed by CHECK */
-
     assert (cmd->data && cmd->type == NBD_CMD_READ);

     /* Is the data within bounds? */
@@ -435,7 +456,10 @@  REPLY.STRUCTURED_REPLY.RECV_OFFSET_HOLE:
     /* The spec states that 0-length requests are unspecified, but
      * 0-length replies are broken. Still, it's easy enough to support
      * them as an extension, and this works even when length == 0.
+     * Although length is 64 bits, the bounds check above ensures that
+     * it is no larger than the 64M cap we put on NBD_CMD_READ.
      */
+    assert (length <= SIZE_MAX);
     if (!cmd->initialized)
       memset ((char *) cmd->data + offset, 0, length);
     if (CALLBACK_IS_NOT_NULL (cmd->cb.fn.chunk)) {
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 1/6] spec: Recommend cap on NBD_REPLY_TYPE_BLOCK_STATUS length
  2022-11-14 22:46   ` [PATCH v2 1/6] spec: Recommend cap on NBD_REPLY_TYPE_BLOCK_STATUS length Eric Blake
@ 2022-12-16 19:32     ` Vladimir Sementsov-Ogievskiy
  2023-03-03 22:17       ` Eric Blake
  0 siblings, 1 reply; 65+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-12-16 19:32 UTC (permalink / raw)
  To: Eric Blake, nbd; +Cc: qemu-devel, qemu-block, libguestfs

On 11/15/22 01:46, Eric Blake wrote:
> The spec was silent on how many extents a server could reply with.
> However, both qemu and nbdkit (the two server implementations known to
> have implemented the NBD_CMD_BLOCK_STATUS extension) implement a hard
> cap, and will truncate the amount of extents in a reply to avoid
> sending a client a reply so large that the client would treat it as a
> denial of service attack.  Clients currently have no way during
> negotiation to request such a limit of the server, so it is easier to
> just document this as a restriction on viable server implementations
> than to add yet another round of handshaking.  Also, mentioning
> amplification effects is worthwhile.
> 
> When qemu first implemented NBD_CMD_BLOCK_STATUS for the
> base:allocation context (qemu commit e7b1948d51, Mar 2018), it behaved
> as if NBD_CMD_FLAG_REQ_ONE were always passed by the client, and never
> responded with more than one extent.  Later, when adding its
> qemu:dirty-bitmap:XYZ context extension (qemu commit 3d068aff16, Jun
> 2018), it added a cap to 128k extents (1M+4 bytes), and that cap was
> applied to base:allocation once qemu started sending multiple extents
> for that context as well (qemu commit fb7afc797e, Jul 2018).  Qemu
> extents are never smaller than 512 bytes (other than an exception at
> the end of a file whose size is not aligned to 512), but even so, a
> request for just under 4G of block status could produce 8M extents,
> resulting in a reply of 64M if it were not capped smaller.
> 
> When nbdkit first implemented NBD_CMD_BLOCK_STATUS (nbdkit 4ca66f70a5,
> Mar 2019), it did not impose any restriction on the number of extents
> in the reply chunk.  But because it allows extents as small as one
> byte, it is easy to write a server that can amplify a client's request
> of status over 1M of the image into a reply over 8M in size, and it
> was very easy to demonstrate that a hard cap was needed to avoid
> crashing clients or otherwise killing the connection (a bad server
> impacting the client negatively).  So nbdkit enforced a bound of 1M
> extents (8M+4 bytes, nbdkit commit 6e0dc839ea, Jun 2019).  [Unrelated
> to this patch, but worth noting for history: nbdkit's situation also
> has to deal with the fact that it is designed for plugin server
> implementations; and not capping the number of extents in a reply also
> posed a problem to nbdkit as the server, where a plugin could exhaust
> memory and kill the server, unrelated to any size constraints enforced
> by a client.]
> 
> Since the limit chosen by these two implementations is different, and
> since nbdkit has versions that were not limited, add this as a SHOULD
> NOT instead of MUST NOT constraint on servers implementing block
> status.  It does not matter that qemu picked a smaller limit that it
> truncates to, since we have already documented that the server may
> truncate for other reasons (such as it being inefficient to collect
> that many extents in the first place).  But documenting the limit now
> becomes even more important in the face of a future addition of 64-bit
> requests, where a client's request is no longer bounded to 4G and
> could thereby produce even more than 8M extents for the corner case
> when every 512 bytes is a new extent, if it were not for this
> recommendation.

Signed-off-by line is missing.


Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

-- 
Best regards,
Vladimir



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 2/6] spec: Tweak description of maximum block size
  2022-11-14 22:46   ` [PATCH v2 2/6] spec: Tweak description of maximum block size Eric Blake
@ 2022-12-16 20:22     ` Vladimir Sementsov-Ogievskiy
  2023-03-03 22:20       ` Eric Blake
  2023-02-21 15:21     ` Wouter Verhelst
  1 sibling, 1 reply; 65+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-12-16 20:22 UTC (permalink / raw)
  To: Eric Blake, nbd; +Cc: qemu-devel, qemu-block, libguestfs

On 11/15/22 01:46, Eric Blake wrote:
> Commit 9f30fedb improved the spec to allow non-payload requests that
> exceed any advertised maximum block size.  Take this one step further
> by permitting the server to use NBD_EOVERFLOW as a hint to the client
> when a request is oversize (while permitting NBD_EINVAL for
> back-compat), and by rewording the text to explicitly call out that
> what is being advertised is the maximum payload length, not maximum
> block size.  This becomes more important when we add 64-bit
> extensions, where it becomes possible to extend `NBD_CMD_BLOCK_STATUS`
> to have both an effect length (how much of the image does the client
> want status on - may be larger than 32 bits) and an optional payload
> length (a way to filter the response to a subset of negotiated
> metadata contexts).  In the shorter term, it means that a server may
> (but not must) accept a read request larger than the maximum block
> size if it can use structured replies to keep each chunk of the
> response under the maximum payload limits.
> ---
>   doc/proto.md | 127 +++++++++++++++++++++++++++++----------------------
>   1 file changed, 73 insertions(+), 54 deletions(-)
> 
> diff --git a/doc/proto.md b/doc/proto.md
> index 8f08583..53c334a 100644
> --- a/doc/proto.md
> +++ b/doc/proto.md
> @@ -745,8 +745,8 @@ text unless the client insists on TLS.
> 
>   During transmission phase, several operations are constrained by the
>   export size sent by the final `NBD_OPT_EXPORT_NAME` or `NBD_OPT_GO`,
> -as well as by three block size constraints defined here (minimum,
> -preferred, and maximum).
> +as well as by three block size constraints defined here (minimum
> +block, preferred block, and maximum payload).
> 
>   If a client can honour server block size constraints (as set out below
>   and under `NBD_INFO_BLOCK_SIZE`), it SHOULD announce this during the
> @@ -772,15 +772,15 @@ learn the server's constraints without committing to them.
> 
>   If block size constraints have not been advertised or agreed on
>   externally, then a server SHOULD support a default minimum block size
> -of 1, a preferred block size of 2^12 (4,096), and a maximum block size
> -that is effectively unlimited (0xffffffff, or the export size if that
> -is smaller), while a client desiring maximum interoperability SHOULD
> -constrain its requests to a minimum block size of 2^9 (512), and limit
> -`NBD_CMD_READ` and `NBD_CMD_WRITE` commands to a maximum block size of
> -2^25 (33,554,432).  A server that wants to enforce block sizes other
> -than the defaults specified here MAY refuse to go into transmission
> -phase with a client that uses `NBD_OPT_EXPORT_NAME` (via a hard
> -disconnect) or which uses `NBD_OPT_GO` without requesting
> +of 1, a preferred block size of 2^12 (4,096), and a maximum block
> +payload size that is at least 2^25 (33,554,432) (even if the export

I'm not sure about the term "block payload size"... What's a "block payload"? Maybe "reply payload size" or something like this?

Or should we simply talk about the simple-reply / structured-reply-chunk total size limit?

> +size is smaller); while a client desiring maximum interoperability
> +SHOULD constrain its requests to a minimum block size of 2^9 (512),
> +and limit `NBD_CMD_READ` and `NBD_CMD_WRITE` commands to a maximum
> +block size of 2^25 (33,554,432).  A server that wants to enforce block
> +sizes other than the defaults specified here MAY refuse to go into
> +transmission phase with a client that uses `NBD_OPT_EXPORT_NAME` (via
> +a hard disconnect) or which uses `NBD_OPT_GO` without requesting
>   `NBD_INFO_BLOCK_SIZE` (via an error reply of
>   `NBD_REP_ERR_BLOCK_SIZE_REQD`); but servers SHOULD NOT refuse clients
>   that do not request sizing information when the server supports
> @@ -818,17 +818,40 @@ the preferred block size for that export.  The server MAY advertise an
>   export size that is not an integer multiple of the preferred block
>   size.
> 
> -The maximum block size represents the maximum length that the server

The term "maximum block size" is not defined anymore (and removed from NBD_INFO_BLOCK_SIZE)...

> -is willing to handle in one request.  If advertised, it MAY be
> -something other than a power of 2, but MUST be either an integer
> -multiple of the minimum block size or the value 0xffffffff for no
> -inherent limit, MUST be at least as large as the smaller of the
> +The maximum block payload size represents the maximum payload length
> +that the server is willing to handle in one request.  If advertised,
> +it MAY be something other than a power of 2, but MUST be either an
> +integer multiple of the minimum block size or the value 0xffffffff for
> +no inherent limit, MUST be at least as large as the smaller of the
>   preferred block size or export size, and SHOULD be at least 2^20
>   (1,048,576) if the export is that large.  For convenience, the server
> -MAY advertise a maximum block size that is larger than the export
> -size, although in that case, the client MUST treat the export size as
> -the effective maximum block size (as further constrained by a nonzero
> -offset).
> +MAY advertise a maximum block payload size that is larger than the
> +export size, although in that case, the client MUST treat the export
> +size as an effective maximum block size (as further constrained by a

... but still used.


> +nonzero offset).  Notwithstanding any maximum block size advertised,
> +either the server or the client MAY initiate a hard disconnect if a
> +payload length of either a request or a reply would be large enough to
> +be deemed a denial of service attack; however, for maximum
> +portability, any payload *length* not exceeding 2^25 (33,554,432)
> +bytes SHOULD NOT be considered a denial of service attack, even if
> +that length is larger than the advertised maximum block payload size.
> +
> +For commands that require a payload in either direction and where the
> +client controls the payload length (`NBD_CMD_WRITE`, or `NBD_CMD_READ`
> +without structured replies), the client MUST NOT use a length larger
> +than the maximum block size. For replies where the payload length is
> +controlled by the server (`NBD_CMD_BLOCK_STATUS` without the flag
> +`NBD_CMD_FLAG_REQ_ONE`, or `NBD_CMD_READ`, both when structured
> +replies are negotiated), the server MUST NOT send an individual reply
> +chunk larger than the maximum payload size, although it may split the
> +overall reply among several chunks.  For commands that do not require
> +a payload in either direction (such as `NBD_CMD_TRIM`), the client MAY
> +request a length larger than the maximum block size; the server SHOULD
> +NOT disconnect, but MAY reply with an `NBD_EOVERFLOW` or `NBD_EINVAL`
> +error if the oversize request would require more server resources than
> +the same command operating on only a maximum block size (such as some
> +implementations of `NBD_CMD_WRITE_ZEROES` without the flag
> +`NBD_CMD_FLAG_FAST_ZERO`, or `NBD_CMD_CACHE`).

[..]

-- 
Best regards,
Vladimir



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 3/6] spec: Add NBD_OPT_EXTENDED_HEADERS
  2022-11-14 22:46   ` [PATCH v2 3/6] spec: Add NBD_OPT_EXTENDED_HEADERS Eric Blake
@ 2022-12-19 18:26     ` Vladimir Sementsov-Ogievskiy
  2023-02-22  9:49     ` Wouter Verhelst
  1 sibling, 0 replies; 65+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-12-19 18:26 UTC (permalink / raw)
  To: Eric Blake, nbd; +Cc: qemu-devel, qemu-block, libguestfs

On 11/15/22 01:46, Eric Blake wrote:
> Add a new negotiation feature where the client and server agree to use
> larger packet headers on every packet sent during transmission phase.
> This has two purposes: first, it makes it possible to perform
> operations like trim, write zeroes, and block status on more than 2^32
> bytes in a single command.  For NBD_CMD_READ, replies are still
> implicitly capped by the maximum block payload limits (generally 32M);
> if you want to know if a hole larger than 32 bits can represent,
> you'll use BLOCK_STATUS instead of hoping that a large READ will
> either return a hole or report overflow.  But for
> NBD_CMD_BLOCK_STATUS, it is very useful to be able to report a status
> extent with a size larger than 32-bits, in some cases even if the
> client's request was for smaller than 32-bits (such as when it is
> known that the entire image is not sparse).  Thus, the wording chosen
> here is careful to permit a server to use either flavor status chunk
> type in its reply, and clients requesting extended headers must be
> prepared for both reply types.
> 
> Second, when structured replies are active, clients have to deal with
> the difference between 16- and 20-byte headers of simple
> vs. structured replies, which impacts performance if the client must
> perform multiple syscalls to first read the magic before knowing if
> there are even additional bytes to read to learn a payload length.  In
> extended header mode, all headers are the same width and there are no
> simple replies permitted.  The layout of the reply header is more like
> the request header; and including the client's offset in the reply
> makes it easier to convert between absolute and relative offsets for
> replies to NBD_CMD_READ.

Hmm... I can't quite picture it; I will probably see it in the code patches.

>  Similarly, by having extended mode use a
> power-of-2 sizing, it becomes easier to manipulate arrays of headers
> without worrying about an individual header crossing a cache line.

Do we really manipulate such arrays in Qemu or Libnbd or Nbdkit?

> However, note that this change only affects the headers; data payloads
> can still be unaligned (for example, a client performing 1-byte reads
> or writes).  We would need to negotiate yet another extension if we
> wanted to ensure that all NBD transmission packets started on an
> 8-byte boundary after option haggling has completed.
> 
> This spec addition was done in parallel with proof of concept
> implementations in qemu (server and client), libnbd (client), and
> nbdkit (server).
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

In general this looks good to me.

I'm not sure about the extra bytes to get 8-byte alignment... Maybe that's the right way.

> ---
>   doc/proto.md | 481 ++++++++++++++++++++++++++++++++++++++-------------
>   1 file changed, 358 insertions(+), 123 deletions(-)
> 
> diff --git a/doc/proto.md b/doc/proto.md
> index 53c334a..fde1e70 100644
> --- a/doc/proto.md
> +++ b/doc/proto.md
> @@ -280,34 +280,53 @@ a soft disconnect.
> 
>   ### Transmission
> 
> -There are three message types in the transmission phase: the request,
> -the simple reply, and the structured reply chunk.  The
> +There are two general message types in the transmission phase: the
> +request (simple or extended), and the reply (simple, structured, or
> +extended).  The determination of which message headers to use is
> +determined during handshaking phase, based on whether

"determination is determined" sounds strange

should we also define "compact" in this paragraph? [*]

> +`NBD_OPT_STRUCTURED_REPLY` or `NBD_OPT_EXTENDED_HEADERS` was requested
> +by the client and given a successful response by the server.  The
>   transmission phase consists of a series of transactions, where the
>   client submits requests and the server sends corresponding replies
>   with either a single simple reply or a series of one or more
> -structured reply chunks per request.  The phase continues until
> -either side terminates transmission; this can be performed cleanly
> -only by the client.
> +structured or extended reply chunks per request.  The phase continues
> +until either side terminates transmission; this can be performed
> +cleanly only by the client.
> 
>   Note that without client negotiation, the server MUST use only simple
>   replies, and that it is impossible to tell by reading the server
>   traffic in isolation whether a data field will be present; the simple
>   reply is also problematic for error handling of the `NBD_CMD_READ`
> -request.  Therefore, structured replies can be used to create a
> -a context-free server stream; see below.
> +request.  Therefore, structured or extended replies can be used to
> +create a a context-free server stream; see below.

s/a a/a/

> +
> +The results of client negotiation also determine whether the client
> +and server will utilize only compact requests and replies, or whether

This is the first place where the "compact" term appears, and it is not defined. [*]

> +both sides will use only extended packets.  Compact messages are the
> +default, but inherently limit single transactions to a 32-bit window
> +starting at a 64-bit offset.  Extended messages make it possible to
> +perform 64-bit transactions (although typically only for commands that
> +do not include a data payload).  Furthermore, when only structured
> +replies have been negotiated, compact messages require the client to
> +perform partial reads to determine which reply packet style (16-byte
> +simple or 20-byte structured) is on the wire before knowing the length
> +of the rest of the reply, which can reduce client performance.  With
> +extended messages, all packet headers have a fixed length of 32 bytes,
> +and although this results in more traffic over the network, the
> +resulting layout is friendlier for performance.
> 
>   Replies need not be sent in the same order as requests (i.e., requests
> -may be handled by the server asynchronously), and structured reply
> -chunks from one request may be interleaved with reply messages from
> -other requests; however, there may be constraints that prevent
> -arbitrary reordering of structured reply chunks within a given reply.
> +may be handled by the server asynchronously), and structured or
> +extended reply chunks from one request may be interleaved with reply
> +messages from other requests; however, there may be constraints that
> +prevent arbitrary reordering of reply chunks within a given reply.
>   Clients SHOULD use a handle that is distinct from all other currently
>   pending transactions, but MAY reuse handles that are no longer in
>   flight; handles need not be consecutive.  In each reply message
> -(whether simple or structured), the server MUST use the same value for
> -handle as was sent by the client in the corresponding request.  In
> -this way, the client can correlate which request is receiving a
> -response.
> +(whether simple, structured, or extended), the server MUST use the
> +same value for handle as was sent by the client in the corresponding
> +request.  In this way, the client can correlate which request is
> +receiving a response.
> 
>   #### Ordering of messages and writes
> 
> @@ -344,7 +363,10 @@ may be useful.
> 
>   #### Request message
> 
> -The request message, sent by the client, looks as follows:
> +The compact request message is sent by the client when extended
> +transactions are not in use (either the client did not request
> +extended headers during negotiation, or the server responded that
> +`NBD_OPT_EXTENDED_HEADERS` is unsupported), and looks as follows:
> 
>   C: 32 bits, 0x25609513, magic (`NBD_REQUEST_MAGIC`)
>   C: 16 bits, command flags
> @@ -354,19 +376,54 @@ C: 64 bits, offset (unsigned)
>   C: 32 bits, length (unsigned)
>   C: (*length* bytes of data if the request is of type `NBD_CMD_WRITE`)
> 
> +In the compact style, the client MUST NOT use the
> +`NBD_CMD_FLAG_PAYLOAD_LEN` flag; and the only command where *length*
> +represents payload length instead of effect length is `NBD_CMD_WRITE`.

I'd say it represents both the payload length and the effect length in this case.

> +
> +If negotiation agreed on extended transactions with
> +`NBD_OPT_EXTENDED_HEADERS`, the client instead uses extended requests:
> +
> +C: 32 bits, 0x21e41c71, magic (`NBD_EXTENDED_REQUEST_MAGIC`)
> +C: 16 bits, command flags
> +C: 16 bits, type
> +C: 64 bits, handle
> +C: 64 bits, offset (unsigned)
> +C: 64 bits, payload/effect length (unsigned)
> +C: (*length* bytes of data if *flags* includes `NBD_CMD_FLAG_PAYLOAD_LEN`)
> +
> +With extended headers, the meaning of the *length* field depends on
> +whether *flags* contains `NBD_CMD_FLAG_PAYLOAD_LEN` (that many
> +additional bytes of payload are present), or if the flag is absent
> +(there is no payload, and *length* instead is an effect length
> +describing how much of the image the request operates on).  The
> +command `NBD_CMD_WRITE` MUST use the flag `NBD_CMD_FLAG_PAYLOAD_LEN`
> +in this mode; while other commands SHOULD avoid the flag if the
> +server has not indicated extension suppport for payloads on that
> +command.  A server SHOULD initiate hard disconnect if a client sets
> +the `NBD_CMD_FLAG_PAYLOAD_LEN` flag and uses a *length* larger than
> +a server's advertised or default maximum payload length (capped at
> +32 bits by the constraints of `NBD_INFO_BLOCK_SIZE`); in all other
> +cases, a server SHOULD gracefully consume *length* bytes of payload
> +(even if it then replies with an `NBD_EINVAL` failure because the
> +particular command was not expecting a payload), and proceed with
> +the next client command.  Thus, only when *length* is used as an
> +effect length will it utilize a full 64-bit value.
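
Purely as an illustration (not part of the patch): the 32-byte extended
request header maps onto a naturally packed C struct along these lines.
The field names are mine, not the spec's, and a real implementation
would convert each field to network byte order before sending:

#include <stdint.h>

/* Sketch of the proposed 32-byte extended request header.  4+2+2+8+8+8
 * bytes with natural alignment, so no padding on common ABIs; all
 * fields are big-endian on the wire. */
struct nbd_extended_request {
    uint32_t magic;    /* NBD_EXTENDED_REQUEST_MAGIC, 0x21e41c71 */
    uint16_t flags;    /* command flags, e.g. NBD_CMD_FLAG_PAYLOAD_LEN */
    uint16_t type;     /* command type, e.g. NBD_CMD_WRITE */
    uint64_t handle;   /* echoed back verbatim in every reply chunk */
    uint64_t offset;   /* offset into the export */
    uint64_t length;   /* payload length if NBD_CMD_FLAG_PAYLOAD_LEN is
                          set, otherwise the 64-bit effect length */
};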
> +
>   #### Simple reply message
> 
>   The simple reply message MUST be sent by the server in response to all
> -requests if structured replies have not been negotiated using
> -`NBD_OPT_STRUCTURED_REPLY`. If structured replies have been negotiated, a simple
> +requests if neither structured replies (`NBD_OPT_STRUCTURED_REPLY`)
> +nor extended headers (`NBD_OPT_EXTENDED_HEADERS`) have been
> +negotiated.  If structured replies have been negotiated, a simple
>   reply MAY be used as a reply to any request other than `NBD_CMD_READ`,
> -but only if the reply has no data payload.  The message looks as
> -follows:
> +but only if the reply has no data payload.  If extended headers have
> +been negotiated, a simple reply MUST NOT be used.  The message looks
> +as follows:
> 
>   S: 32 bits, 0x67446698, magic (`NBD_SIMPLE_REPLY_MAGIC`; used to be
>      `NBD_REPLY_MAGIC`)
>   S: 32 bits, error (MAY be zero)
> -S: 64 bits, handle
> +S: 64 bits, handle (MUST match the request)
>   S: (*length* bytes of data if the request is of type `NBD_CMD_READ` and
>       *error* is zero)
> 
> @@ -416,9 +473,9 @@ on the chunks received.
>   A structured reply chunk message looks as follows:
> 
>   S: 32 bits, 0x668e33ef, magic (`NBD_STRUCTURED_REPLY_MAGIC`)
> -S: 16 bits, flags
> +S: 16 bits, reply flags
>   S: 16 bits, type
> -S: 64 bits, handle
> +S: 64 bits, handle (MUST match the request)
>   S: 32 bits, length of payload (unsigned)
>   S: *length* bytes of payload data (if *length* is nonzero)
> 
> @@ -426,6 +483,45 @@ The use of *length* in the reply allows context-free division of
>   the overall server traffic into individual reply messages; the
>   *type* field describes how to further interpret the payload.
> 
> +#### Extended reply chunk message
> +
> +One implementation drawback of the original structured replies via
> +`NBD_OPT_STRUCTURED_REPLY` is that a client must read the magic number
> +to know whether it will then be reading a 16-byte or 20-byte header
> +from the server.  Another is that an array of 20-byte structures is
> +not always cache-line friendly, compared to power-of-2 sizing.
> +Therefore, a client may instead attempt to negotiate extended headers
> +via `NBD_OPT_EXTENDED_HEADERS` in place of structured replies.
> +
> +If extended headers are negotiated, all server replies MUST use an
> +extended reply; the server MUST NOT use simple replies even when there
> +is no payload.  Most fields in the extended reply have the same
> +semantics as their counterparts in structured replies (including the
> +use of `NBD_REPLY_FLAG_DONE` to end a sequence of one or more chunks
> +comprising the overall reply).  The extended reply has an additional
> +field *offset* which MUST match the *offset* of the client's request
> +(even in reply types such as `NBD_REPLY_TYPE_OFFSET_DATA` where the
> +payload itself contains a different offset representing the middle of
> +the buffer).
> +
> +The other difference between extended replies and structured replies
> +is that the *length* field is 64 bits to match the layout of the
> +extended request header.  However, note that while the request flags
> +determine whether a request uses a 32-bit payload length or a 64-bit
> +effect length, at this time the reply length is used solely for
> +payload lengths and so in practice will never exceed a 32-bit value
> +(see `NBD_INFO_BLOCK_SIZE`).
> +
> +An extended reply chunk message looks as follows:
> +
> +S: 32 bits, 0x6e8a278c, magic (`NBD_EXTENDED_REPLY_MAGIC`)
> +S: 16 bits, reply flags
> +S: 16 bits, type
> +S: 64 bits, handle (MUST match the request)
> +S: 64 bits, offset (MUST match the request)
> +S: 64 bits, length of payload (unsigned)
> +S: *length* bytes of payload data (if *length* is nonzero)
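
Only as a sketch of the practical benefit of a fixed header size: with
extended headers a client can read every reply header with one
fixed-length read instead of first peeking at the magic to pick between
the 16- and 20-byte layouts.  The helper name is invented and be16toh()
and friends are glibc-isms, so treat this as illustration only:

#include <endian.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Read one 32-byte extended reply header and convert it to host order. */
static int read_extended_reply(int fd, uint16_t *flags, uint16_t *type,
                               uint64_t *handle, uint64_t *offset,
                               uint64_t *length)
{
    unsigned char buf[32];
    size_t got = 0;

    while (got < sizeof buf) {
        ssize_t r = read(fd, buf + got, sizeof buf - got);
        if (r <= 0)
            return -1;                  /* error or unexpected EOF */
        got += r;
    }

    uint32_t magic;
    uint16_t f, t;
    uint64_t h, o, l;

    memcpy(&magic, buf, 4);
    if (be32toh(magic) != 0x6e8a278c)   /* NBD_EXTENDED_REPLY_MAGIC */
        return -1;
    memcpy(&f, buf + 4, 2);  *flags  = be16toh(f);
    memcpy(&t, buf + 6, 2);  *type   = be16toh(t);
    memcpy(&h, buf + 8, 8);  *handle = be64toh(h);
    memcpy(&o, buf + 16, 8); *offset = be64toh(o);
    memcpy(&l, buf + 24, 8); *length = be64toh(l);
    return 0;
}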
> +
>   #### Terminating the transmission phase
> 
>   There are two methods of terminating the transmission phase:
> @@ -838,18 +934,19 @@ that length is larger than the advertised maximum block payload size.
> 
>   For commands that require a payload in either direction and where the
>   client controls the payload length (`NBD_CMD_WRITE`, or `NBD_CMD_READ`
> -without structured replies), the client MUST NOT use a length larger
> -than the maximum block size. For replies where the payload length is
> -controlled by the server (`NBD_CMD_BLOCK_STATUS` without the flag
> -`NBD_CMD_FLAG_REQ_ONE`, or `NBD_CMD_READ`, both when structured
> -replies are negotiated), the server MUST NOT send an individual reply
> -chunk larger than the maximum payload size, although it may split the
> -overall reply among several chunks.  For commands that do not require
> -a payload in either direction (such as `NBD_CMD_TRIM`), the client MAY
> -request a length larger than the maximum block size; the server SHOULD
> -NOT disconnect, but MAY reply with an `NBD_EOVERFLOW` or `NBD_EINVAL`
> -error if the oversize request would require more server resources than
> -the same command operating on only a maximum block size (such as some
> +without structured or extended replies), the client MUST NOT use a
> +length larger than the maximum block size. For replies where the
> +payload length is controlled by the server (`NBD_CMD_BLOCK_STATUS`
> +without the flag `NBD_CMD_FLAG_REQ_ONE`, or `NBD_CMD_READ`, both when
> +structured replies or extended headers are negotiated), the server
> +MUST NOT send an individual reply chunk larger than the maximum
> +payload size, although it may split the overall reply among several
> +chunks.  For commands that do not require a payload in either
> +direction (such as `NBD_CMD_TRIM`), the client MAY request a length
> +larger than the maximum block size; the server SHOULD NOT disconnect,
> +but MAY reply with an `NBD_EOVERFLOW` or `NBD_EINVAL` error if the
> +oversize request would require more server resources than the same
> +command operating on only a maximum block size (such as some
>   implementations of `NBD_CMD_WRITE_ZEROES` without the flag
>   `NBD_CMD_FLAG_FAST_ZERO`, or `NBD_CMD_CACHE`).
> 
> @@ -894,12 +991,11 @@ The procedure works as follows:
>   - During transmission, a client can then indicate interest in metadata
>     for a given region by way of the `NBD_CMD_BLOCK_STATUS` command,
>     where *offset* and *length* indicate the area of interest. On
> -  success, the server MUST respond with one structured reply chunk of
> -  type `NBD_REPLY_TYPE_BLOCK_STATUS` per metadata context selected
> -  during negotiation, where each reply chunk is a list of one or more
> -  consecutive extents for that context.  Each extent comes with a
> -  *flags* field, the semantics of which are defined by the metadata
> -  context.
> +  success, the server MUST respond with one status type reply chunk
> +  per metadata context selected during negotiation, where each reply
> +  chunk is a list of one or more consecutive extents for that context.
> +  Each extent comes with a *flags* field, the semantics of which are
> +  defined by the metadata context.
> 
>   The client's requested *length* is only a hint to the server, so the
>   cumulative extent length contained in a chunk of the server's reply
> @@ -920,9 +1016,10 @@ same export name for the subsequent `NBD_OPT_GO` (or
>   sending `NBD_CMD_BLOCK_STATUS` without selecting at least one metadata
>   context.
> 
> -The reply to the `NBD_CMD_BLOCK_STATUS` request MUST be sent as a
> -structured reply; this implies that in order to use metadata querying,
> -structured replies MUST be negotiated first.
> +The reply to a successful `NBD_CMD_BLOCK_STATUS` request MUST be sent
> +via reply chunks; this implies that in order to use metadata querying,
> +either structured replies or extended headers MUST be negotiated
> +first.
> 
>   Metadata contexts are identified by their names. The name MUST consist
>   of a namespace, followed by a colon, followed by a leaf-name.  The
> @@ -1097,12 +1194,12 @@ The field has the following format:
>   - bit 5, `NBD_FLAG_SEND_TRIM`: exposes support for `NBD_CMD_TRIM`.
>   - bit 6, `NBD_FLAG_SEND_WRITE_ZEROES`: exposes support for
>     `NBD_CMD_WRITE_ZEROES` and `NBD_CMD_FLAG_NO_HOLE`.
> -- bit 7, `NBD_FLAG_SEND_DF`: do not fragment a structured reply. The
> -  server MUST set this transmission flag to 1 if the
> -  `NBD_CMD_READ` request supports the `NBD_CMD_FLAG_DF` flag, and
> -  MUST leave this flag clear if structured replies have not been
> -  negotiated. Clients MUST NOT set the `NBD_CMD_FLAG_DF` request
> -  flag unless this transmission flag is set.
> +- bit 7, `NBD_FLAG_SEND_DF`: do not fragment a structured or extended
> +  reply. The server MUST set this transmission flag to 1 if the
> +  `NBD_CMD_READ` request supports the `NBD_CMD_FLAG_DF` flag, and MUST
> +  leave this flag clear if neither structured replies nor extended
> +  headers have been negotiated. Clients MUST NOT set the
> +  `NBD_CMD_FLAG_DF` request flag unless this transmission flag is set.
>   - bit 8, `NBD_FLAG_CAN_MULTI_CONN`: Indicates that the server operates
>     entirely without cache, or that the cache it uses is shared among all
>     connections to the given device. In particular, if this flag is
> @@ -1212,10 +1309,10 @@ of the newstyle negotiation.
> 
>       When this command succeeds, the server MUST NOT preserve any
>       negotiation state (such as a request for
> -    `NBD_OPT_STRUCTURED_REPLY`, or metadata contexts from
> -    `NBD_OPT_SET_META_CONTEXT`) issued before this command.  A client
> -    SHOULD defer all stateful option requests until after it
> -    determines whether encryption is available.
> +    `NBD_OPT_STRUCTURED_REPLY` or `NBD_OPT_EXTENDED_HEADERS`, or
> +    metadata contexts from `NBD_OPT_SET_META_CONTEXT`) issued before
> +    this command.  A client SHOULD defer all stateful option requests
> +    until after it determines whether encryption is available.
> 
>       See the section on TLS above for further details.
> 
> @@ -1296,6 +1393,10 @@ of the newstyle negotiation.
>         `NBD_REP_ERR_BLOCK_SIZE_REQD` error SHOULD ensure it first
>         sends an `NBD_INFO_BLOCK_SIZE` information reply in order
>         to help avoid a potentially unnecessary round trip.
> +    - `NBD_REP_ERR_EXT_HEADER_REQD`: The server requires the client to
> +      utilize extended headers.  While this may make it easier to
> +      implement a server with fewer considerations for backwards
> +      compatibility, it limits connections to only recent clients.
> 
>       Additionally, if TLS has not been initiated, the server MAY reply
>       with `NBD_REP_ERR_TLS_REQD` (instead of `NBD_REP_ERR_UNKNOWN`) to
> @@ -1350,6 +1451,9 @@ of the newstyle negotiation.
>         server MUST use structured replies to the `NBD_CMD_READ`
>         transmission request.  Other extensions that require structured
>         replies may now be negotiated.
> +    - `NBD_REP_ERR_EXT_HEADER_REQD`: The client has already
> +      successfully negotiated extended headers, which takes precedence
> +      over this option.
>       - For backwards compatibility, clients SHOULD be prepared to also
>         handle `NBD_REP_ERR_UNSUP`; in this case, no structured replies
>         will be sent.
> @@ -1357,9 +1461,10 @@ of the newstyle negotiation.
>       It is envisioned that future extensions will add other new
>       requests that may require a data payload in the reply.  A server
>       that supports such extensions SHOULD NOT advertise those
> -    extensions until the client negotiates structured replies; and a
> -    client MUST NOT make use of those extensions without first
> -    enabling the `NBD_OPT_STRUCTURED_REPLY` extension.
> +    extensions until the client has negotiated either structured
> +    replies or extended headers; and a client MUST NOT make use of
> +    those extensions without first enabling support for reply
> +    payloads.
> 
>       If the client requests `NBD_OPT_STARTTLS` after this option, it
>       MUST renegotiate structured replies and any other dependent
> @@ -1370,9 +1475,10 @@ of the newstyle negotiation.
>       Return a list of `NBD_REP_META_CONTEXT` replies, one per context,
>       followed by an `NBD_REP_ACK` or an error.
> 
> -    This option SHOULD NOT be requested unless structured replies have
> -    been negotiated first. If a client attempts to do so, a server
> -    MAY send `NBD_REP_ERR_INVALID`.
> +    This option SHOULD NOT be requested unless structured replies or
> +    extended headers have been negotiated first. If a client attempts
> +    to do so, a server MAY send `NBD_REP_ERR_INVALID` or
> +    `NBD_REP_ERR_EXT_HEADER_REQD`.
> 
>       Data:
>       - 32 bits, length of export name.
> @@ -1454,9 +1560,10 @@ of the newstyle negotiation.
>       they are interested in are selected with the final query that they
>       sent.
> 
> -    This option MUST NOT be requested unless structured replies have
> -    been negotiated first. If a client attempts to do so, a server
> -    SHOULD send `NBD_REP_ERR_INVALID`.
> +    This option MUST NOT be requested unless structured replies or
> +    extended headers have been negotiated first. If a client attempts

It's not great that we have to fix the documentation like this everywhere structured replies are mentioned..

And we will probably have to do the same for future extensions.


> +    to do so, a server SHOULD send `NBD_REP_ERR_INVALID` or
> +    `NBD_REP_ERR_EXT_HEADER_REQD`.
> 
>       A client MUST NOT send `NBD_CMD_BLOCK_STATUS` unless within the
>       negotiation phase it sent `NBD_OPT_SET_META_CONTEXT` at least
> @@ -1493,6 +1600,46 @@ of the newstyle negotiation.
>       option does not select any metadata context, provided the client
>       then does not attempt to issue `NBD_CMD_BLOCK_STATUS` commands.
> 
> +* `NBD_OPT_EXTENDED_HEADERS` (11)
> +
> +    The client wishes to use extended headers during the transmission
> +    phase.  The client MUST NOT send any additional data with the
> +    option, and the server SHOULD reject a request that includes data
> +    with `NBD_REP_ERR_INVALID`.
> +
> +    When successful, this option takes precedence over structured
> +    replies.  A client MAY request structured replies first, although
> +    a server SHOULD support this option even if structured replies are
> +    not negotiated.
> +
> +    It is envisioned that future extensions will add other new

Hm. We should not forget to replace 'future' with something concrete once these future extensions are actually introduced..

Right now I understand what 'future' means, but what will it mean to someone reading the whole documentation for the first time?

> +    requests that support a data payload in the request or reply.  A
> +    server that supports such extensions SHOULD NOT advertise those
> +    extensions until the client has negotiated extended headers; and a
> +    client MUST NOT make use of those extensions without first
> +    enabling support for reply payloads.
> +
> +    The server replies with the following, or with an error permitted
> +    elsewhere in this document:
> +
> +    - `NBD_REP_ACK`: Extended headers have been negotiated; the client
> +      MUST use the 32-byte extended request header, with proper use of
> +      `NBD_CMD_FLAG_PAYLOAD_LEN` for all commands sending a payload;

As I understand it, this should be s/for all commands sending a payload/for all commands/.. Or is it allowed to send both types of request?

> +      and the server MUST use the 32-byte extended reply header.
> +    - For backwards compatibility, clients SHOULD be prepared to also
> +      handle `NBD_REP_ERR_UNSUP`; in this case, only the compact
> +      transmission headers will be used.
> +
> +    Note that a response of `NBD_REP_ERR_BLOCK_SIZE_REQD` does not
> +    make sense in response to this command, but a server MAY fail with
> +    that error for a later `NBD_OPT_GO` without a client request for
> +    `NBD_INFO_BLOCK_SIZE`, since the use of extended headers provides
> +    more incentive for a client to promise to obey block size
> +    constraints.
> +
> +    If the client requests `NBD_OPT_STARTTLS` after this option, it
> +    MUST renegotiate extended headers.
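
For what it's worth, requesting this option on the wire is the same as
any other newstyle option; the only new ingredient is the option number
11.  A minimal non-normative sketch (helper name invented, htobe*()
from glibc's <endian.h>):

#include <endian.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Send NBD_OPT_EXTENDED_HEADERS: the standard 'IHAVEOPT' option framing
 * with option number 11 and no option data. */
static int send_opt_extended_headers(int fd)
{
    unsigned char buf[16];
    uint64_t magic  = htobe64(0x49484156454F5054ULL);  /* "IHAVEOPT" */
    uint32_t option = htobe32(11);                     /* NBD_OPT_EXTENDED_HEADERS */
    uint32_t len    = htobe32(0);                      /* no data */

    memcpy(buf, &magic, 8);
    memcpy(buf + 8, &option, 4);
    memcpy(buf + 12, &len, 4);
    return write(fd, buf, sizeof buf) == (ssize_t)sizeof buf ? 0 : -1;
}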
> +
>   #### Option reply types
> 
>   These values are used in the "reply type" field, sent by the server
> @@ -1672,6 +1819,16 @@ case that data is an error message string suitable for display to the user.
> 
>       The request or the reply is too large to process.
> 
> +* `NBD_REP_ERR_EXT_HEADER_REQD` (2^31 + 10)
> +
> +    The server is unwilling to proceed with the given request (such as
> +    `NBD_OPT_GO` or `NBD_OPT_SET_META_CONTEXT`) for a client that has
> +    not yet requested extended headers (via
> +    `NBD_OPT_EXTENDED_HEADERS`).  This error is also used to indicate
> +    that the server is unable to downgrade to structured replies (via
> +    `NBD_OPT_STRUCTURED_REPLY`) if extended headers have already been
> +    enabled.
> +
>   ### Transmission phase
> 
>   #### Flag fields
> @@ -1725,6 +1882,17 @@ valid may depend on negotiation during the handshake phase.
>     `NBD_CMD_WRITE`, then the server MUST fail quickly with an error of
>     `NBD_ENOTSUP`. The client MUST NOT set this unless the server advertised
>     `NBD_FLAG_SEND_FAST_ZERO`.
> +- bit 5, `NBD_CMD_FLAG_PAYLOAD_LEN`; only valid if extended headers
> +  were negotiated via `NBD_OPT_EXTENDED_HEADERS`.  If set, the
> +  *length* field of the header describes a (nonzero) payload length of
> +  data to be sent alongside the header; if the flag is clear, the
> +  header *length* is instead an effect length for the command's
> +  interaction with the export, and there is no payload after the
> +  header.  With extended headers, the flag MUST be set for
> +  `NBD_CMD_WRITE` (as the write command always sends a payload of the
> +  bytes to be written); for other commands, the flag will trigger an
> +  `NBD_EINVAL` error unless the server has advertised support for an
> +  extension payload form for the command.
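
To make the flag semantics concrete, here is a tiny non-normative
sketch of how a client would pick the flag and the meaning of *length*
per command (helper name invented; command numbers are the existing
ones from the spec, flag bit 5 as proposed here):

#include <stdint.h>

#define NBD_CMD_FLAG_PAYLOAD_LEN  (1u << 5)   /* bit 5, as proposed */
#define NBD_CMD_WRITE  1
#define NBD_CMD_TRIM   4

static void set_extended_length(uint16_t cmd, uint64_t effect_len,
                                uint64_t payload_len,
                                uint16_t *flags, uint64_t *length)
{
    if (cmd == NBD_CMD_WRITE) {
        /* a write always ships a payload: the flag is mandatory and
         * *length* counts the payload bytes */
        *flags |= NBD_CMD_FLAG_PAYLOAD_LEN;
        *length = payload_len;
    } else {
        /* e.g. NBD_CMD_TRIM: no payload, *length* is the 64-bit effect
         * length, and the flag stays clear unless the server advertised
         * a payload extension for that command */
        *length = effect_len;
    }
}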
> 
>   ##### Structured reply flags
> 
> @@ -1746,13 +1914,15 @@ unrecognized flags.
> 
>   #### Structured reply types
> 
> -These values are used in the "type" field of a structured reply.
> -Some chunk types can additionally be categorized by role, such as
> -*error chunks* or *content chunks*.  Each type determines how to
> -interpret the "length" bytes of payload.  If the client receives
> -an unknown or unexpected type, other than an *error chunk*, it
> -MUST initiate a hard disconnect.  A server MUST NOT send a chunk
> -larger than any advertised maximum block payload size.
> +These values are used in the "type" field of a structured reply.  Some
> +chunk types can additionally be categorized by role, such as *error
> +chunks*, *content chunks*, or *status chunks*.  Each type determines
> +how to interpret the "length" bytes of payload.  If the client
> +receives an unknown or unexpected type, other than an *error chunk*,
> +it MAY initiate a hard disconnect on the grounds that the client is
> +uncertain whether the server handled the request as desired.  A server
> +MUST NOT send a chunk larger than any advertised maximum block payload
> +size.
> 
>   * `NBD_REPLY_TYPE_NONE` (0)
> 
> @@ -1795,13 +1965,28 @@ larger than any advertised maximum block payload size.
>     64 bits: offset (unsigned)
>     32 bits: hole size (unsigned, MUST be nonzero)
> 
> +  At this time, although servers that support extended headers are

"At this time" is like "future". Not sure that it's good for documentation.

> +  permitted to accept client requests for `NBD_CMD_READ` with an
> +  effect length larger than any advertised maximum block payload size
> +  by splitting the reply into multiple chunks, portable clients SHOULD
> +  NOT request a read *length* larger than 32 bits (corresponding to
> +  the maximum block payload constraint implied by
> +  `NBD_INFO_BLOCK_SIZE`), and therefore a 32-bit constraint on the
> +  *hole size* does not represent an arbitrary limitation.  Should a
> +  future scenario arise where it can be demonstrated that a client and
> +  server would benefit from an extension allowing a maximum block
> +  payload size to be larger than 32 bits, that extension would also
> +  introduce a counterpart reply type that can express a 64-bit *hole
> +  size*.
> +
>   * `NBD_REPLY_TYPE_BLOCK_STATUS` (5)
> 
> -  *length* MUST be 4 + (a positive integer multiple of 8).  This reply
> -  represents a series of consecutive block descriptors where the sum
> -  of the length fields within the descriptors is subject to further
> -  constraints documented below.  A successful block status request MUST
> -  have exactly one status chunk per negotiated metadata context ID.
> +  This chunk type is in the status chunk category.  *length* MUST be
> +  4 + (a positive integer multiple of 8).  This reply represents a
> +  series of consecutive block descriptors where the sum of the length
> +  fields within the descriptors is subject to further constraints
> +  documented below.  A successful block status request MUST have
> +  exactly one status chunk per negotiated metadata context ID.
> 
>     The payload starts with:
> 
> @@ -1843,6 +2028,43 @@ larger than any advertised maximum block payload size.
>     extent information at the first offset not covered by a
>     reduced-length reply.
> 
> +* `NBD_REPLY_TYPE_BLOCK_STATUS_EXT` (6)
> +
> +  This chunk type is in the status chunk category.  *length* MUST be
> +  8 + (a positive multiple of 16).  The semantics of this chunk mirror
> +  those of `NBD_REPLY_TYPE_BLOCK_STATUS`, other than the use of a
> +  larger *extent length* field, added padding in each descriptor to
> +  ease alignment, and the addition of a *descriptor count* field that
> +  can be used for easier client processing.  This chunk type MUST NOT
> +  be used unless extended headers were negotiated with
> +  `NBD_OPT_EXTENDED_HEADERS`.
> +
> +  If the *descriptor count* field contains 0, the number of subsequent
> +  descriptors is determined solely by the *length* field of the reply
> +  header.  However, the server MAY populate the *descriptor count*
> +  field with the number of descriptors that follow; when doing this,
> +  the server MUST ensure that the header *length* is equal to
> +  *descriptor count* * 16 + 8.
> +
> +  The payload starts with:
> +
> +  32 bits, metadata context ID
> +  32 bits, descriptor count
> +
> +  and is followed by a list of one or more descriptors, each with this
> +  layout:
> +
> +  64 bits, length of the extent to which the status below
> +     applies (unsigned, MUST be nonzero)
> +  32 bits, padding (MUST be zero)
> +  32 bits, status flags
> +
> +  Note that even when extended headers are in use, the client MUST be
> +  prepared for the server to use either the compact or extended chunk
> +  type, regardless of whether the client's hinted effect length was
> +  more or less than 32 bits; but the server MUST use exactly one of
> +  the two chunk types per negotiated metacontext ID.

Maybe it would be better to negotiate this with an additional negotiation option, so that the client knows in advance what kind of replies to expect?
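
For reference while reviewing the layout above, a non-normative C
sketch of the proposed chunk payload plus the length consistency check
implied by the *descriptor count* paragraph (names invented, wire
fields big-endian):

#include <stdint.h>

struct nbd_block_status_ext_hdr {
    uint32_t context_id;    /* metadata context ID */
    uint32_t count;         /* descriptor count, or 0 if left unspecified */
};

struct nbd_block_descriptor_ext {
    uint64_t length;        /* extent length, MUST be nonzero */
    uint32_t padding;       /* MUST be zero */
    uint32_t status_flags;  /* semantics defined by the metadata context */
};

/* A client-side sanity check: the payload length must be 8 + a positive
 * multiple of 16, and if count is nonzero it must match exactly. */
static int block_status_ext_len_ok(uint64_t payload_len, uint32_t count)
{
    if (payload_len < 8 + 16 || (payload_len - 8) % 16 != 0)
        return 0;
    if (count != 0 && payload_len != 8 + (uint64_t)count * 16)
        return 0;
    return 1;
}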

> +
>   All error chunk types have bit 15 set, and begin with the same
>   *error*, *message length*, and optional *message* fields as
>   `NBD_REPLY_TYPE_ERROR`.  If nonzero, *message length* indicates
> @@ -1857,8 +2079,9 @@ remaining structured fields at the end.
>     and the client MAY NOT make any assumptions about partial
>     success. This type SHOULD NOT be used more than once in a
>     structured reply.  Valid as a reply to any request.  Note that
> -  *message length* MUST NOT exceed the 4096 bytes string length
> -  limit.
> +  *message length* MUST NOT exceed the 4096 bytes string length limit,
> +  and therefore there is no need for a counterpart extended-length
> +  error chunk type.
> 
>     The payload is structured as:
> 
> @@ -1905,19 +2128,21 @@ The following request types exist:
> 
>       A read request. Length and offset define the data to be read. The
>       server MUST reply with either a simple reply or a structured
> -    reply, according to whether the structured replies have been
> -    negotiated using `NBD_OPT_STRUCTURED_REPLY`. The client SHOULD NOT
> -    request a read length of 0; the behavior of a server on such a
> +    reply, according to whether structured replies
> +    (`NBD_OPT_STRUCTURED_REPLY`) or extended headers
> +    (`NBD_OPT_EXTENDED_HEADERS`) were negotiated.  The client SHOULD
> +    NOT request a read length of 0; the behavior of a server on such a
>       request is unspecified although the server SHOULD NOT disconnect.
> 
>       *Simple replies*
> 
> -    If structured replies were not negotiated, then a read request
> -    MUST always be answered by a simple reply, as documented above
> -    (using magic 0x67446698 `NBD_SIMPLE_REPLY_MAGIC`, and containing
> -    length bytes of data according to the client's request), which in
> -    turn means any client request with a length larger than the
> -    maximum block payload size will fail.
> +    If neither structured replies nor extended headers were
> +    negotiated, then a read request MUST always be answered by a
> +    simple reply, as documented above (using magic 0x67446698
> +    `NBD_SIMPLE_REPLY_MAGIC`, and containing length bytes of data
> +    according to the client's request), which in turn means any client
> +    request with a length larger than the maximum block payload size
> +    will fail.
> 
>       If an error occurs, the server SHOULD set the appropriate error code
>       in the error field. The server MAY then initiate a hard disconnect.
> @@ -1930,13 +2155,15 @@ The following request types exist:
> 
>       *Structured replies*
> 
> -    If structured replies are negotiated, then a read request MUST
> -    result in a structured reply with one or more chunks (each using
> -    magic 0x668e33ef `NBD_STRUCTURED_REPLY_MAGIC`), where the final
> -    chunk has the flag `NBD_REPLY_FLAG_DONE`, and with the following
> -    additional constraints.
> +    If structured replies or extended headers are negotiated, then a
> +    read request MUST result in a reply with one or more structured
> +    chunks (each using `NBD_STRUCTURED_REPLY_MAGIC` or
> +    `NBD_EXTENDED_REPLY_MAGIC` according to what was negotiated),
> +    where the final chunk has the flag `NBD_REPLY_FLAG_DONE`, and with
> +    the following additional constraints.
> 
> -    The server MAY split the reply into any number of content chunks;
> +    The server MAY split the reply into any number of content chunks
> +    (`NBD_REPLY_TYPE_OFFSET_DATA` and `NBD_REPLY_TYPE_OFFSET_HOLE`);
>       each chunk MUST describe at least one byte, although to minimize
>       overhead, the server SHOULD use chunks with lengths and offsets as
>       an integer multiple of 512 bytes, where possible (the first and
> @@ -1949,12 +2176,13 @@ The following request types exist:
>       send additional content chunks even after reporting an error
>       chunk.  A server MAY support read requests larger than the maximum
>       block payload size by splitting the response across multiple
> -    chunks (in particular, a request for more than 2^32 - 8 bytes
> -    containing data rather than holes MUST be split to avoid
> -    overflowing the `NBD_REPLY_TYPE_OFFSET_DATA` length field);
> -    however, the server is also permitted to reject large read
> -    requests up front, so a client should be prepared to retry with
> -    smaller requests if a large request fails.
> +    chunks (in particular, if extended headers are not in use, a
> +    request for more than 2^32 - 8 bytes containing data rather than
> +    holes MUST be split to avoid overflowing the 32-bit
> +    `NBD_REPLY_TYPE_OFFSET_DATA` length field); however, the server is
> +    also permitted to reject large read requests up front, so a client
> +    should be prepared to retry with smaller requests if a large
> +    request fails.
> 
>       When no error is detected, the server MUST send enough data chunks
>       to cover the entire region described by the offset and length of
> @@ -2180,10 +2408,10 @@ The following request types exist:
> 
>   * `NBD_CMD_BLOCK_STATUS` (7)
> 
> -    A block status query request. Length and offset define the range
> -    of interest.  The client SHOULD NOT request a status length of 0;
> -    the behavior of a server on such a request is unspecified although
> -    the server SHOULD NOT disconnect.
> +    A block status query request. The effect length is a hint to the
> +    server about the range of interest.  The client SHOULD NOT request
> +    a status length of 0; the behavior of a server on such a request
> +    is unspecified although the server SHOULD NOT disconnect.
> 
>       A client MUST NOT send `NBD_CMD_BLOCK_STATUS` unless within the
>       negotiation phase it sent `NBD_OPT_SET_META_CONTEXT` at least
> @@ -2191,28 +2419,29 @@ The following request types exist:
>       same export name used to enter transmission phase, and where the
>       server returned at least one metadata context without an error.
>       This in turn requires the client to first negotiate structured
> -    replies. For a successful return, the server MUST use a structured
> -    reply, containing exactly one chunk of type
> -    `NBD_REPLY_TYPE_BLOCK_STATUS` per selected context id, where the
> -    status field of each descriptor is determined by the flags field
> -    as defined by the metadata context.  The server MAY send chunks in
> -    a different order than the context ids were assigned in reply to
> -    `NBD_OPT_SET_META_CONTEXT`.
> +    replies or extended headers. For a successful return, the server
> +    MUST use one reply chunk per selected context id (only
> +    `NBD_REPLY_TYPE_BLOCK_STATUS` for structured replies, and either
> +    `NBD_REPLY_TYPE_BLOCK_STATUS` or `NBD_REPLY_TYPE_BLOCK_STATUS_EXT`
> +    for extended headers).  The status field of each descriptor is
> +    determined by the flags field as defined by the metadata context.
> +    The server MAY send chunks in a different order than the context
> +    ids were assigned in reply to `NBD_OPT_SET_META_CONTEXT`.
> 
> -    The list of block status descriptors within the
> -    `NBD_REPLY_TYPE_BLOCK_STATUS` chunk represent consecutive portions
> -    of the file starting from specified *offset*.  If the client used
> -    the `NBD_CMD_FLAG_REQ_ONE` flag, each chunk contains exactly one
> -    descriptor where the *length* of the descriptor MUST NOT be
> -    greater than the *length* of the request; otherwise, a chunk MAY
> -    contain multiple descriptors, and the final descriptor MAY extend
> -    beyond the original requested size if the server can determine a
> -    larger length without additional effort.  On the other hand, the
> -    server MAY return less data than requested.  In particular, a
> -    server SHOULD NOT send more than 2^20 status descriptors in a
> -    single chunk.  However the server MUST return at least one status
> -    descriptor, and since each status descriptor has a non-zero
> -    length, a client can always make progress on a successful return.
> +    The list of block status descriptors within a given status chunk
> +    represent consecutive portions of the file starting from specified
> +    *offset*.  If the client used the `NBD_CMD_FLAG_REQ_ONE` flag,
> +    each chunk contains exactly one descriptor where the *length* of
> +    the descriptor MUST NOT be greater than the *length* of the
> +    request; otherwise, a chunk MAY contain multiple descriptors, and
> +    the final descriptor MAY extend beyond the original requested size
> +    if the server can determine a larger length without additional
> +    effort.  On the other hand, the server MAY return less data than
> +    requested.  In particular, a server SHOULD NOT send more than 2^20
> +    status descriptors in a single chunk.  However the server MUST
> +    return at least one status descriptor, and since each status
> +    descriptor has a non-zero length, a client can always make
> +    progress on a successful return.
> 
>       The server SHOULD use different *status* values between
>       consecutive descriptors where feasible, although the client SHOULD
> @@ -2412,6 +2641,10 @@ implement the following features:
>     Clients that do not implement querying for block size constraints
>     SHOULD abide by the rules laid out in the section "Block size
>     constraints", above.
> +- Servers that implement extended headers but desire interoperability
> +  with older client implementations SHOULD NOT use the
> +  `NBD_REP_ERR_EXT_HEADER_REQD` error response to `NBD_OPT_GO`, but
> +  should gracefully support compact headers.
> 
>   ### Future considerations
> 
> @@ -2421,6 +2654,8 @@ implementations are not yet ready to support them:
> 
>   - Structured replies; the Linux kernel currently does not yet implement
>     them.
> +- Extended headers; these are similar to structured replies, but are
> +  new enough that many clients and servers do not yet implement them.

"External headers" looks like improved (if not say "fixed") "Structured replies", not just "similar".. Maybe, mention it here or somewhere?


> 
>   ## About this file
> 

-- 
Best regards,
Vladimir



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 2/6] spec: Tweak description of maximum block size
  2022-11-14 22:46   ` [PATCH v2 2/6] spec: Tweak description of maximum block size Eric Blake
  2022-12-16 20:22     ` Vladimir Sementsov-Ogievskiy
@ 2023-02-21 15:21     ` Wouter Verhelst
  2023-03-03 22:26       ` Eric Blake
  1 sibling, 1 reply; 65+ messages in thread
From: Wouter Verhelst @ 2023-02-21 15:21 UTC (permalink / raw)
  To: Eric Blake; +Cc: nbd, qemu-devel, qemu-block, libguestfs

Hi Eric,

Busy days, busy times. Sorry about the insane delays here.

On Mon, Nov 14, 2022 at 04:46:51PM -0600, Eric Blake wrote:
> Commit 9f30fedb improved the spec to allow non-payload requests that
> exceed any advertised maximum block size.  Take this one step further
> by permitting the server to use NBD_EOVERFLOW as a hint to the client
> when a request is oversize (while permitting NBD_EINVAL for
> back-compat), and by rewording the text to explicitly call out that
> what is being advertised is the maximum payload length, not maximum
> block size.  This becomes more important when we add 64-bit
> extensions, where it becomes possible to extend `NBD_CMD_BLOCK_STATUS`
> to have both an effect length (how much of the image does the client
> want status on - may be larger than 32 bits) and an optional payload
> length (a way to filter the response to a subset of negotiated
> metadata contexts).  In the shorter term, it means that a server may
> (but not must) accept a read request larger than the maximum block
> size if it can use structured replies to keep each chunk of the
> response under the maximum payload limits.
> ---
>  doc/proto.md | 127 +++++++++++++++++++++++++++++----------------------
>  1 file changed, 73 insertions(+), 54 deletions(-)
> 
> diff --git a/doc/proto.md b/doc/proto.md
> index 8f08583..53c334a 100644
> --- a/doc/proto.md
> +++ b/doc/proto.md
> @@ -745,8 +745,8 @@ text unless the client insists on TLS.
> 
>  During transmission phase, several operations are constrained by the
>  export size sent by the final `NBD_OPT_EXPORT_NAME` or `NBD_OPT_GO`,
> -as well as by three block size constraints defined here (minimum,
> -preferred, and maximum).
> +as well as by three block size constraints defined here (minimum
> +block, preferred block, and maximum payload).
> 
>  If a client can honour server block size constraints (as set out below
>  and under `NBD_INFO_BLOCK_SIZE`), it SHOULD announce this during the
> @@ -772,15 +772,15 @@ learn the server's constraints without committing to them.
> 
>  If block size constraints have not been advertised or agreed on
>  externally, then a server SHOULD support a default minimum block size
> -of 1, a preferred block size of 2^12 (4,096), and a maximum block size
> -that is effectively unlimited (0xffffffff, or the export size if that
> -is smaller), while a client desiring maximum interoperability SHOULD
> -constrain its requests to a minimum block size of 2^9 (512), and limit
> -`NBD_CMD_READ` and `NBD_CMD_WRITE` commands to a maximum block size of
> -2^25 (33,554,432).  A server that wants to enforce block sizes other
> -than the defaults specified here MAY refuse to go into transmission
> -phase with a client that uses `NBD_OPT_EXPORT_NAME` (via a hard
> -disconnect) or which uses `NBD_OPT_GO` without requesting
> +of 1, a preferred block size of 2^12 (4,096), and a maximum block
> +payload size that is at least 2^25 (33,554,432) (even if the export
> +size is smaller); while a client desiring maximum interoperability
> +SHOULD constrain its requests to a minimum block size of 2^9 (512),
> +and limit `NBD_CMD_READ` and `NBD_CMD_WRITE` commands to a maximum
> +block size of 2^25 (33,554,432).  A server that wants to enforce block
> +sizes other than the defaults specified here MAY refuse to go into
> +transmission phase with a client that uses `NBD_OPT_EXPORT_NAME` (via
> +a hard disconnect) or which uses `NBD_OPT_GO` without requesting

This does more than what the commit message says: it also limits payload
size from 0xffffffff to 2^25. We already have a "A server that desires
maximum interoperability" clause that mentions the 2^25, so I'm not
entirely sure why we need to restrict that for the default case.

Remember, apart from specifying how something should be implemented, the
spec also documents current and historic behavior. I am probably
convinced that it makes more sense to steer people towards limiting to
2^25, but it should be done in such a way that servers which accept a
0xffffffff block size are not suddenly noncompliant. I don't think this
does that.

[no further comments on this one]
-- 
     w@uter.{be,co.za}
wouter@{grep.be,fosdem.org,debian.org}

I will have a Tin-Actinium-Potassium mixture, thanks.


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 3/6] spec: Add NBD_OPT_EXTENDED_HEADERS
  2022-11-14 22:46   ` [PATCH v2 3/6] spec: Add NBD_OPT_EXTENDED_HEADERS Eric Blake
  2022-12-19 18:26     ` Vladimir Sementsov-Ogievskiy
@ 2023-02-22  9:49     ` Wouter Verhelst
  2023-03-03 22:36       ` Eric Blake
  1 sibling, 1 reply; 65+ messages in thread
From: Wouter Verhelst @ 2023-02-22  9:49 UTC (permalink / raw)
  To: Eric Blake; +Cc: nbd, qemu-devel, qemu-block, libguestfs

On Mon, Nov 14, 2022 at 04:46:52PM -0600, Eric Blake wrote:
[...]
> @@ -1370,9 +1475,10 @@ of the newstyle negotiation.
>      Return a list of `NBD_REP_META_CONTEXT` replies, one per context,
>      followed by an `NBD_REP_ACK` or an error.
> 
> -    This option SHOULD NOT be requested unless structured replies have
> -    been negotiated first. If a client attempts to do so, a server
> -    MAY send `NBD_REP_ERR_INVALID`.
> +    This option SHOULD NOT be requested unless structured replies or
> +    extended headers have been negotiated first. If a client attempts
> +    to do so, a server MAY send `NBD_REP_ERR_INVALID` or
> +    `NBD_REP_ERR_EXT_HEADER_REQD`.

Is it the intent that NBD_REP_ERR_EXT_HEADER_REQD means structured
replies are not supported by this server? I think that could be
clarified here.

(this occurs twice)

[...]
> +* `NBD_OPT_EXTENDED_HEADERS` (11)
> +
> +    The client wishes to use extended headers during the transmission
> +    phase.  The client MUST NOT send any additional data with the
> +    option, and the server SHOULD reject a request that includes data
> +    with `NBD_REP_ERR_INVALID`.
> +
> +    When successful, this option takes precedence over structured
> +    replies.  A client MAY request structured replies first, although
> +    a server SHOULD support this option even if structured replies are
> +    not negotiated.
> +
> +    It is envisioned that future extensions will add other new
> +    requests that support a data payload in the request or reply.  A
> +    server that supports such extensions SHOULD NOT advertise those
> +    extensions until the client has negotiated extended headers; and a
> +    client MUST NOT make use of those extensions without first
> +    enabling support for reply payloads.
> +
> +    The server replies with the following, or with an error permitted
> +    elsewhere in this document:
> +
> +    - `NBD_REP_ACK`: Extended headers have been negotiated; the client
> +      MUST use the 32-byte extended request header, with proper use of
> +      `NBD_CMD_FLAG_PAYLOAD_LEN` for all commands sending a payload;
> +      and the server MUST use the 32-byte extended reply header.
> +    - For backwards compatibility, clients SHOULD be prepared to also
> +      handle `NBD_REP_ERR_UNSUP`; in this case, only the compact
> +      transmission headers will be used.
> +
> +    Note that a response of `NBD_REP_ERR_BLOCK_SIZE_REQD` does not
> +    make sense in response to this command, but a server MAY fail with
> +    that error for a later `NBD_OPT_GO` without a client request for
> +    `NBD_INFO_BLOCK_SIZE`, since the use of extended headers provides
> +    more incentive for a client to promise to obey block size
> +    constraints.
> +
> +    If the client requests `NBD_OPT_STARTTLS` after this option, it
> +    MUST renegotiate extended headers.
> +

Does it make sense here to also forbid use of NBD_OPT_EXPORT_NAME? I
think the sooner we get rid of that, the better ;-)

[...]
> @@ -1746,13 +1914,15 @@ unrecognized flags.
> 
>  #### Structured reply types
> 
> -These values are used in the "type" field of a structured reply.
> -Some chunk types can additionally be categorized by role, such as
> -*error chunks* or *content chunks*.  Each type determines how to
> -interpret the "length" bytes of payload.  If the client receives
> -an unknown or unexpected type, other than an *error chunk*, it
> -MUST initiate a hard disconnect.  A server MUST NOT send a chunk
> -larger than any advertised maximum block payload size.
> +These values are used in the "type" field of a structured reply.  Some
> +chunk types can additionally be categorized by role, such as *error
> +chunks*, *content chunks*, or *status chunks*.  Each type determines
> +how to interpret the "length" bytes of payload.  If the client
> +receives an unknown or unexpected type, other than an *error chunk*,
> +it MAY initiate a hard disconnect on the grounds that the client is
> +uncertain whether the server handled the request as desired.  A server
> +MUST NOT send a chunk larger than any advertised maximum block payload
> +size.

Why do we make this a MAY rather than a MUST?

Also, should this section say "structured or extended reply"? We use the
same types for both.

[...]
> +* `NBD_REPLY_TYPE_BLOCK_STATUS_EXT` (6)
> +
> +  This chunk type is in the status chunk category.  *length* MUST be
> +  8 + (a positive multiple of 16).  The semantics of this chunk mirror
> +  those of `NBD_REPLY_TYPE_BLOCK_STATUS`, other than the use of a
> +  larger *extent length* field, added padding in each descriptor to
> +  ease alignment, and the addition of a *descriptor count* field that
> +  can be used for easier client processing.  This chunk type MUST NOT
> +  be used unless extended headers were negotiated with
> +  `NBD_OPT_EXTENDED_HEADERS`.
> +
> +  If the *descriptor count* field contains 0, the number of subsequent
> +  descriptors is determined solely by the *length* field of the reply
> +  header.  However, the server MAY populate the *descriptor count*
> +  field with the number of descriptors that follow; when doing this,
> +  the server MUST ensure that the header *length* is equal to
> +  *descriptor count* * 16 + 8.
> +
> +  The payload starts with:
> +
> +  32 bits, metadata context ID  
> +  32 bits, descriptor count  
> +
> +  and is followed by a list of one or more descriptors, each with this
> +  layout:
> +
> +  64 bits, length of the extent to which the status below
> +     applies (unsigned, MUST be nonzero)  
> +  32 bits, padding (MUST be zero)  
> +  32 bits, status flags  
> +
> +  Note that even when extended headers are in use, the client MUST be
> +  prepared for the server to use either the compact or extended chunk
> +  type, regardless of whether the client's hinted effect length was
> +  more or less than 32 bits; but the server MUST use exactly one of
> +  the two chunk types per negotiated metacontext ID.

Is this last paragraph really a good idea? I would think it makes more
sense to require the new format if we're already required to support it
on both sides anyway.

[...]
> -    The list of block status descriptors within the
> -    `NBD_REPLY_TYPE_BLOCK_STATUS` chunk represent consecutive portions
> -    of the file starting from specified *offset*.  If the client used

I know this was in the old text (hence me mentioning it here), but this
should probably say "export" rather than "file". NBD does not deal
(conceptually) with files...

> -    the `NBD_CMD_FLAG_REQ_ONE` flag, each chunk contains exactly one
> -    descriptor where the *length* of the descriptor MUST NOT be
> -    greater than the *length* of the request; otherwise, a chunk MAY
> -    contain multiple descriptors, and the final descriptor MAY extend
> -    beyond the original requested size if the server can determine a
> -    larger length without additional effort.  On the other hand, the
> -    server MAY return less data than requested.  In particular, a
> -    server SHOULD NOT send more than 2^20 status descriptors in a
> -    single chunk.  However the server MUST return at least one status
> -    descriptor, and since each status descriptor has a non-zero
> -    length, a client can always make progress on a successful return.
> +    The list of block status descriptors within a given status chunk
> +    represent consecutive portions of the file starting from specified
> +    *offset*.  If the client used the `NBD_CMD_FLAG_REQ_ONE` flag,
> +    each chunk contains exactly one descriptor where the *length* of
> +    the descriptor MUST NOT be greater than the *length* of the
> +    request; otherwise, a chunk MAY contain multiple descriptors, and
> +    the final descriptor MAY extend beyond the original requested size
> +    if the server can determine a larger length without additional
> +    effort.  On the other hand, the server MAY return less data than
> +    requested.  In particular, a server SHOULD NOT send more than 2^20
> +    status descriptors in a single chunk.  However the server MUST
> +    return at least one status descriptor, and since each status
> +    descriptor has a non-zero length, a client can always make
> +    progress on a successful return.

Other than that, no comments on this one.

-- 
     w@uter.{be,co.za}
wouter@{grep.be,fosdem.org,debian.org}

I will have a Tin-Actinium-Potassium mixture, thanks.


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 5/6] spec: Introduce NBD_FLAG_BLOCK_STATUS_PAYLOAD
  2022-11-14 22:46   ` [PATCH v2 5/6] spec: Introduce NBD_FLAG_BLOCK_STATUS_PAYLOAD Eric Blake
@ 2023-02-22 10:05     ` Wouter Verhelst
  2023-03-03 22:40       ` Eric Blake
  0 siblings, 1 reply; 65+ messages in thread
From: Wouter Verhelst @ 2023-02-22 10:05 UTC (permalink / raw)
  To: Eric Blake; +Cc: nbd, qemu-devel, qemu-block, libguestfs

On Mon, Nov 14, 2022 at 04:46:54PM -0600, Eric Blake wrote:
>  #### Simple reply message
> 
> @@ -1232,6 +1235,19 @@ The field has the following format:
>    will be faster than a regular write). Clients MUST NOT set the
>    `NBD_CMD_FLAG_FAST_ZERO` request flag unless this transmission flag
>    is set.
> +- bit 12, `NBD_FLAG_BLOCK_STATUS_PAYLOAD`: Indicates that the server
> +  understands the use of the `NBD_CMD_FLAG_PAYLOAD_LEN` flag to
> +  `NBD_CMD_BLOCK_STATUS` to allow the client to request that the
> +  server filters its response to a specific subset of negotiated
> +  metacontext ids passed in via a client payload, rather than the
> +  default of replying to all metacontext ids. Servers MUST NOT
> +  advertise this bit unless the client successfully negotiates
> +  extended headers via `NBD_OPT_EXTENDED_HEADERS`, and SHOULD NOT
> +  advertise this bit in response to `NBD_OPT_EXPORT_NAME` or
> +  `NBD_OPT_GO` if the client does not negotiate metacontexts with
> +  `NBD_OPT_SET_META_CONTEXT`; clients SHOULD NOT set the
> +  `NBD_CMD_FLAG_PAYLOAD_LEN` flag for `NBD_CMD_BLOCK_STATUS` unless
> +  this transmission flag is set.

Given that we are introducing this at the same time as the extended
headers (and given that we're running out of available flags in this
16-bit field), isn't it better to make the ability to understand
PAYLOAD_LEN be implied by extended headers? Or is there a use case for
implementing extended headers but not a payload length on
CMD_BLOCK_STATUS that I'm missing?

>  Clients SHOULD ignore unknown flags.
> 
> @@ -1915,8 +1931,11 @@ valid may depend on negotiation during the handshake phase.
>    header.  With extended headers, the flag MUST be set for
>    `NBD_CMD_WRITE` (as the write command always sends a payload of the
>    bytes to be written); for other commands, the flag will trigger an
> -  `NBD_EINVAL` error unless the server has advertised support for an
> -  extension payload form for the command.
> +  `NBD_EINVAL` error unless the command documents an optional payload
> +  form for the command and the server has implemented that form (an
> +  example being `NBD_CMD_BLOCK_STATUS` providing a payload form for
> +  restricting the response to a particular metacontext id, when the
> +  server advertises `NBD_FLAG_BLOCK_STATUS_PAYLOAD`).
> 
>  ##### Structured reply flags
> 
> @@ -2464,6 +2483,23 @@ The following request types exist:
>      The server MAY send chunks in a different order than the context
>      ids were assigned in reply to `NBD_OPT_SET_META_CONTEXT`.
> 
> +    If extended headers were negotiated, a server MAY optionally
> +    advertise, via the transmission flag
> +    `NBD_FLAG_BLOCK_STATUS_PAYLOAD`, that it supports an alternative
> +    request form where the client sets `NBD_CMD_FLAG_PAYLOAD_LEN` in
> +    order to pass a payload that informs the server to limit its
> +    replies to the metacontext id(s) in the client's request payload,
> +    rather than giving an answer on all possible metacontext ids.  If
> +    the server does not support the payload form, or detects duplicate
> +    or unknown metacontext ids in the client's payload, the server
> +    MUST gracefully consume the client's payload before failing with
> +    `NBD_EINVAL`.  The payload form MUST occupy 8 + n*4 bytes, where n
> +    is the number of metacontext ids the client is interested in (as
> +    implied by the payload length), laid out as:
> +
> +    64 bits, effect length  
> +    n * 32 bits, list of metacontext ids to use  
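
As an aside, serializing that payload is straightforward; a
non-normative sketch (helper name invented, htobe*() from glibc's
<endian.h>):

#include <endian.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Build the 8 + n*4 byte NBD_CMD_BLOCK_STATUS filter payload: a 64-bit
 * effect length followed by the metacontext ids of interest, all
 * big-endian.  Caller frees the returned buffer. */
static unsigned char *build_status_payload(uint64_t effect_len,
                                           const uint32_t *ids, size_t n,
                                           size_t *out_len)
{
    size_t len = 8 + n * 4;
    unsigned char *buf = malloc(len);
    uint64_t be_len = htobe64(effect_len);

    if (!buf)
        return NULL;
    memcpy(buf, &be_len, 8);
    for (size_t i = 0; i < n; i++) {
        uint32_t be_id = htobe32(ids[i]);
        memcpy(buf + 8 + i * 4, &be_id, 4);
    }
    *out_len = len;
    return buf;
}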
> +
>      The list of block status descriptors within a given status chunk
>      represent consecutive portions of the file starting from specified
>      *offset*.  If the client used the `NBD_CMD_FLAG_REQ_ONE` flag,

-- 
     w@uter.{be,co.za}
wouter@{grep.be,fosdem.org,debian.org}

I will have a Tin-Actinium-Potassium mixture, thanks.


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 1/6] spec: Recommend cap on NBD_REPLY_TYPE_BLOCK_STATUS length
  2022-12-16 19:32     ` Vladimir Sementsov-Ogievskiy
@ 2023-03-03 22:17       ` Eric Blake
  2023-03-05  8:41         ` Wouter Verhelst
  0 siblings, 1 reply; 65+ messages in thread
From: Eric Blake @ 2023-03-03 22:17 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy; +Cc: nbd, qemu-devel, qemu-block, libguestfs

On Fri, Dec 16, 2022 at 10:32:01PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 11/15/22 01:46, Eric Blake wrote:
> > The spec was silent on how many extents a server could reply with.
> > However, both qemu and nbdkit (the two server implementations known to
> > have implemented the NBD_CMD_BLOCK_STATUS extension) implement a hard
> > cap, and will truncate the amount of extents in a reply to avoid
> > sending a client a reply so large that the client would treat it as a
> > denial of service attack.  Clients currently have no way during
> > negotiation to request such a limit of the server, so it is easier to
> > just document this as a restriction on viable server implementations
> > than to add yet another round of handshaking.  Also, mentioning
> > amplification effects is worthwhile.
> > 
> > When qemu first implemented NBD_CMD_BLOCK_STATUS for the
> > base:allocation context (qemu commit e7b1948d51, Mar 2018), it behaved
> > as if NBD_CMD_FLAG_REQ_ONE were always passed by the client, and never
> > responded with more than one extent.  Later, when adding its
> > qemu:dirty-bitmap:XYZ context extension (qemu commit 3d068aff16, Jun
> > 2018), it added a cap to 128k extents (1M+4 bytes), and that cap was
> > applied to base:allocation once qemu started sending multiple extents
> > for that context as well (qemu commit fb7afc797e, Jul 2018).  Qemu
> > extents are never smaller than 512 bytes (other than an exception at
> > the end of a file whose size is not aligned to 512), but even so, a
> > request for just under 4G of block status could produce 8M extents,
> > resulting in a reply of 64M if it were not capped smaller.
> > 
> > When nbdkit first implemented NBD_CMD_BLOCK_STATUS (nbdkit 4ca66f70a5,
> > Mar 2019), it did not impose any restriction on the number of extents
> > in the reply chunk.  But because it allows extents as small as one
> > byte, it is easy to write a server that can amplify a client's request
> > of status over 1M of the image into a reply over 8M in size, and it
> > was very easy to demonstrate that a hard cap was needed to avoid
> > crashing clients or otherwise killing the connection (a bad server
> > impacting the client negatively).  So nbdkit enforced a bound of 1M
> > extents (8M+4 bytes, nbdkit commit 6e0dc839ea, Jun 2019).  [Unrelated
> > to this patch, but worth noting for history: nbdkit's situation also
> > has to deal with the fact that it is designed for plugin server
> > implementations; and not capping the number of extents in a reply also
> > posed a problem to nbdkit as the server, where a plugin could exhaust
> > memory and kill the server, unrelated to any size constraints enforced
> > by a client.]
> > 
> > Since the limit chosen by these two implementations is different, and
> > since nbdkit has versions that were not limited, add this as a SHOULD
> > NOT instead of MUST NOT constraint on servers implementing block
> > status.  It does not matter that qemu picked a smaller limit that it
> > truncates to, since we have already documented that the server may
> > truncate for other reasons (such as it being inefficient to collect
> > that many extents in the first place).  But documenting the limit now
> > becomes even more important in the face of a future addition of 64-bit
> > requests, where a client's request is no longer bounded to 4G and
> > could thereby produce even more than 8M extents for the corner case
> > when every 512 bytes is a new extent, if it were not for this
> > recommendation.
> 
> s-o-b line missed.

I'm not sure if the NBD project has a strict policy on including one,
but I don't mind adding it.

> 
> 
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> 
> -- 
> Best regards,
> Vladimir
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 2/6] spec: Tweak description of maximum block size
  2022-12-16 20:22     ` Vladimir Sementsov-Ogievskiy
@ 2023-03-03 22:20       ` Eric Blake
  0 siblings, 0 replies; 65+ messages in thread
From: Eric Blake @ 2023-03-03 22:20 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy; +Cc: nbd, qemu-devel, qemu-block, libguestfs

On Fri, Dec 16, 2022 at 11:22:49PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 11/15/22 01:46, Eric Blake wrote:
> > Commit 9f30fedb improved the spec to allow non-payload requests that
> > exceed any advertised maximum block size.  Take this one step further
> > by permitting the server to use NBD_EOVERFLOW as a hint to the client
> > when a request is oversize (while permitting NBD_EINVAL for
> > back-compat), and by rewording the text to explicitly call out that
> > what is being advertised is the maximum payload length, not maximum
> > block size.  This becomes more important when we add 64-bit
> > extensions, where it becomes possible to extend `NBD_CMD_BLOCK_STATUS`
> > to have both an effect length (how much of the image does the client
> > want status on - may be larger than 32 bits) and an optional payload
> > length (a way to filter the response to a subset of negotiated
> > metadata contexts).  In the shorter term, it means that a server may
> > (but not must) accept a read request larger than the maximum block
> > size if it can use structured replies to keep each chunk of the
> > response under the maximum payload limits.
> > ---
> >   doc/proto.md | 127 +++++++++++++++++++++++++++++----------------------
> >   1 file changed, 73 insertions(+), 54 deletions(-)
> > 
> > diff --git a/doc/proto.md b/doc/proto.md
> > index 8f08583..53c334a 100644
> > --- a/doc/proto.md
> > +++ b/doc/proto.md
> > @@ -745,8 +745,8 @@ text unless the client insists on TLS.
> > 
> >   During transmission phase, several operations are constrained by the
> >   export size sent by the final `NBD_OPT_EXPORT_NAME` or `NBD_OPT_GO`,
> > -as well as by three block size constraints defined here (minimum,
> > -preferred, and maximum).
> > +as well as by three block size constraints defined here (minimum
> > +block, preferred block, and maximum payload).
> > 
> >   If a client can honour server block size constraints (as set out below
> >   and under `NBD_INFO_BLOCK_SIZE`), it SHOULD announce this during the
> > @@ -772,15 +772,15 @@ learn the server's constraints without committing to them.
> > 
> >   If block size constraints have not been advertised or agreed on
> >   externally, then a server SHOULD support a default minimum block size
> > -of 1, a preferred block size of 2^12 (4,096), and a maximum block size
> > -that is effectively unlimited (0xffffffff, or the export size if that
> > -is smaller), while a client desiring maximum interoperability SHOULD
> > -constrain its requests to a minimum block size of 2^9 (512), and limit
> > -`NBD_CMD_READ` and `NBD_CMD_WRITE` commands to a maximum block size of
> > -2^25 (33,554,432).  A server that wants to enforce block sizes other
> > -than the defaults specified here MAY refuse to go into transmission
> > -phase with a client that uses `NBD_OPT_EXPORT_NAME` (via a hard
> > -disconnect) or which uses `NBD_OPT_GO` without requesting
> > +of 1, a preferred block size of 2^12 (4,096), and a maximum block
> > +payload size that is at least 2^25 (33,554,432) (even if the export
> 
> I'm not sure about the term "block payload size"... What is a "block payload"? Maybe "reply payload size" or something like that?
> 
> Or should we simply talk about a simple-reply / structured-reply-chunk total size limit?

Yes, I could live with 'reply payload size limit'; a simple-reply
always has one payload, while a structured-reply can be divided across
multiple chunks where each chunk payload fits that limit.
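
To make that concrete, here is a minimal sketch (the constant and
function names are mine, not from any NBD implementation) of how a
server might split one large read into several data chunks that each
stay under such a limit:

#include <stdint.h>
#include <stdio.h>

/* Illustrative cap only; a real server advertises its own limit. */
#define MAX_CHUNK_PAYLOAD (32u * 1024 * 1024)   /* 2^25 bytes */

/* Plan the NBD_REPLY_TYPE_OFFSET_DATA chunks for one read request: each
 * chunk carries at most MAX_CHUNK_PAYLOAD bytes of data, however large
 * the client's requested length was. */
static void plan_read_chunks(uint64_t offset, uint64_t len)
{
    while (len > 0) {
        uint64_t piece = len < MAX_CHUNK_PAYLOAD ? len : MAX_CHUNK_PAYLOAD;
        printf("chunk at offset %llu, length %llu\n",
               (unsigned long long)offset, (unsigned long long)piece);
        offset += piece;
        len -= piece;
    }
}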

> 
> > +size is smaller); while a client desiring maximum interoperability
> > +SHOULD constrain its requests to a minimum block size of 2^9 (512),
> > +and limit `NBD_CMD_READ` and `NBD_CMD_WRITE` commands to a maximum
> > +block size of 2^25 (33,554,432).  A server that wants to enforce block
> > +sizes other than the defaults specified here MAY refuse to go into
> > +transmission phase with a client that uses `NBD_OPT_EXPORT_NAME` (via
> > +a hard disconnect) or which uses `NBD_OPT_GO` without requesting
> >   `NBD_INFO_BLOCK_SIZE` (via an error reply of
> >   `NBD_REP_ERR_BLOCK_SIZE_REQD`); but servers SHOULD NOT refuse clients
> >   that do not request sizing information when the server supports
> > @@ -818,17 +818,40 @@ the preferred block size for that export.  The server MAY advertise an
> >   export size that is not an integer multiple of the preferred block
> >   size.
> > 
> > -The maximum block size represents the maximum length that the server
> 
> The term "maximum block size" is not defined anymore (and removed from NBD_INFO_BLOCK_SIZE)...
> 
> > -is willing to handle in one request.  If advertised, it MAY be
> > -something other than a power of 2, but MUST be either an integer
> > -multiple of the minimum block size or the value 0xffffffff for no
> > -inherent limit, MUST be at least as large as the smaller of the
> > +The maximum block payload size represents the maximum payload length
> > +that the server is willing to handle in one request.  If advertised,
> > +it MAY be something other than a power of 2, but MUST be either an
> > +integer multiple of the minimum block size or the value 0xffffffff for
> > +no inherent limit, MUST be at least as large as the smaller of the
> >   preferred block size or export size, and SHOULD be at least 2^20
> >   (1,048,576) if the export is that large.  For convenience, the server
> > -MAY advertise a maximum block size that is larger than the export
> > -size, although in that case, the client MUST treat the export size as
> > -the effective maximum block size (as further constrained by a nonzero
> > -offset).
> > +MAY advertise a maximum block payload size that is larger than the
> > +export size, although in that case, the client MUST treat the export
> > +size as an effective maximum block size (as further constrained by a
> 
> ... but still used.

Hmm, I'll have to make sure I get my wording precise on v3.

> 
> 
> > +nonzero offset).  Notwithstanding any maximum block size advertised,
> > +either the server or the client MAY initiate a hard disconnect if a
> > +payload length of either a request or a reply would be large enough to
> > +be deemed a denial of service attack; however, for maximum
> > +portability, any payload *length* not exceeding 2^25 (33,554,432)
> > +bytes SHOULD NOT be considered a denial of service attack, even if
> > +that length is larger than the advertised maximum block payload size.
> > +
> > +For commands that require a payload in either direction and where the
> > +client controls the payload length (`NBD_CMD_WRITE`, or `NBD_CMD_READ`
> > +without structured replies), the client MUST NOT use a length larger
> > +than the maximum block size. For replies where the payload length is
> > +controlled by the server (`NBD_CMD_BLOCK_STATUS` without the flag
> > +`NBD_CMD_FLAG_REQ_ONE`, or `NBD_CMD_READ`, both when structured
> > +replies are negotiated), the server MUST NOT send an individual reply
> > +chunk larger than the maximum payload size, although it may split the
> > +overall reply among several chunks.  For commands that do not require
> > +a payload in either direction (such as `NBD_CMD_TRIM`), the client MAY
> > +request a length larger than the maximum block size; the server SHOULD
> > +NOT disconnect, but MAY reply with an `NBD_EOVERFLOW` or `NBD_EINVAL`
> > +error if the oversize request would require more server resources than
> > +the same command operating on only a maximum block size (such as some
> > +implementations of `NBD_CMD_WRITE_ZEROES` without the flag
> > +`NBD_CMD_FLAG_FAST_ZERO`, or `NBD_CMD_CACHE`).
> 
> [..]
> 
> -- 
> Best regards,
> Vladimir
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 2/6] spec: Tweak description of maximum block size
  2023-02-21 15:21     ` Wouter Verhelst
@ 2023-03-03 22:26       ` Eric Blake
  2023-03-05  8:45         ` Wouter Verhelst
  0 siblings, 1 reply; 65+ messages in thread
From: Eric Blake @ 2023-03-03 22:26 UTC (permalink / raw)
  To: Wouter Verhelst; +Cc: nbd, qemu-devel, qemu-block, libguestfs

On Tue, Feb 21, 2023 at 05:21:37PM +0200, Wouter Verhelst wrote:
> Hi Eric,
> 
> Busy days, busy times. Sorry about the insane delays here.

No problem; I've been tackling other things in the meantime too, so
this extension has taken far longer than I planned for more reasons
than just slow review time.

> 
> On Mon, Nov 14, 2022 at 04:46:51PM -0600, Eric Blake wrote:
> > Commit 9f30fedb improved the spec to allow non-payload requests that
> > exceed any advertised maximum block size.  Take this one step further
> > by permitting the server to use NBD_EOVERFLOW as a hint to the client
> > when a request is oversize (while permitting NBD_EINVAL for
> > back-compat), and by rewording the text to explicitly call out that
> > what is being advertised is the maximum payload length, not maximum
> > block size.  This becomes more important when we add 64-bit
> > extensions, where it becomes possible to extend `NBD_CMD_BLOCK_STATUS`
> > to have both an effect length (how much of the image does the client
> > want status on - may be larger than 32 bits) and an optional payload
> > length (a way to filter the response to a subset of negotiated
> > metadata contexts).  In the shorter term, it means that a server may
> > (but not must) accept a read request larger than the maximum block
> > size if it can use structured replies to keep each chunk of the
> > response under the maximum payload limits.
> > ---
> >  doc/proto.md | 127 +++++++++++++++++++++++++++++----------------------
> >  1 file changed, 73 insertions(+), 54 deletions(-)
> > 
> > diff --git a/doc/proto.md b/doc/proto.md
> > index 8f08583..53c334a 100644
> > --- a/doc/proto.md
> > +++ b/doc/proto.md
> > @@ -745,8 +745,8 @@ text unless the client insists on TLS.
> > 
> >  During transmission phase, several operations are constrained by the
> >  export size sent by the final `NBD_OPT_EXPORT_NAME` or `NBD_OPT_GO`,
> > -as well as by three block size constraints defined here (minimum,
> > -preferred, and maximum).
> > +as well as by three block size constraints defined here (minimum
> > +block, preferred block, and maximum payload).
> > 
> >  If a client can honour server block size constraints (as set out below
> >  and under `NBD_INFO_BLOCK_SIZE`), it SHOULD announce this during the
> > @@ -772,15 +772,15 @@ learn the server's constraints without committing to them.
> > 
> >  If block size constraints have not been advertised or agreed on
> >  externally, then a server SHOULD support a default minimum block size
> > -of 1, a preferred block size of 2^12 (4,096), and a maximum block size
> > -that is effectively unlimited (0xffffffff, or the export size if that
> > -is smaller), while a client desiring maximum interoperability SHOULD
> > -constrain its requests to a minimum block size of 2^9 (512), and limit
> > -`NBD_CMD_READ` and `NBD_CMD_WRITE` commands to a maximum block size of
> > -2^25 (33,554,432).  A server that wants to enforce block sizes other
> > -than the defaults specified here MAY refuse to go into transmission
> > -phase with a client that uses `NBD_OPT_EXPORT_NAME` (via a hard
> > -disconnect) or which uses `NBD_OPT_GO` without requesting
> > +of 1, a preferred block size of 2^12 (4,096), and a maximum block
> > +payload size that is at least 2^25 (33,554,432) (even if the export
> > +size is smaller); while a client desiring maximum interoperability
> > +SHOULD constrain its requests to a minimum block size of 2^9 (512),
> > +and limit `NBD_CMD_READ` and `NBD_CMD_WRITE` commands to a maximum
> > +block size of 2^25 (33,554,432).  A server that wants to enforce block
> > +sizes other than the defaults specified here MAY refuse to go into
> > +transmission phase with a client that uses `NBD_OPT_EXPORT_NAME` (via
> > +a hard disconnect) or which uses `NBD_OPT_GO` without requesting
> 
> This does more than what the commit message says: it also limits payload
> size from 0xffffffff to 2^25. We already have a "A server that desires
> maximum interoperability" clause that mentions the 2^25, so I'm not
> entirely sure why we need to restrict that for the default case.
> 
> Remember, apart from specifying how something should be implemented, the
> spec also documents current and historic behavior. I am probably
> convinced that it makes more sense to steer people towards limiting to
> 2^25, but it should be done in such a way that servers which accept a
> 0xffffffff block size are not suddenly noncompliant. I don't think this
> does that.

I'll have to play more with the wording here.  A client shouldn't send
larger write requests to a server without knowing the server will
accept them (because traditional servers disconnect instead - the
alternative is to read the client's entire payload, and the larger the
payload, the longer that takes - the client starves other clients of
the server's attention by making it chew through data).  But sending
larger read requests MAY be tolerated by a server that sends a
graceful error message ("you requested more than I'm willing to send,
but my error response is short so the connection can stay open"), even
if several historical implementations have also hung up at that point
(at least qemu performs a malloc() for both reads and writes: writes
populate the buffer from the client, reads populate the buffer to send
back to the client - and the connection is aborted when the malloc()
is not even attempted because the length exceeds 2^25 bytes).
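
To make the trade-off concrete, a rough sketch of the kind of policy
check being described (the constant and helper name are mine, not
taken from qemu or nbd-server):

#include <stdbool.h>
#include <stdint.h>

#define NBD_CMD_WRITE  1                      /* command number from the spec */
#define MAX_PAYLOAD    (32u * 1024 * 1024)    /* 2^25 interoperability limit */

/* Hypothetical policy helper: true means "hard disconnect".  An oversize
 * write already has its payload on the wire, so a server that refuses to
 * slurp it can only drop the connection; an oversize read can instead get
 * a graceful NBD_EOVERFLOW or NBD_EINVAL error reply. */
static bool oversize_needs_disconnect(uint16_t type, uint64_t len)
{
    if (len <= MAX_PAYLOAD)
        return false;               /* within limits: handle normally */
    return type == NBD_CMD_WRITE;
}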

But I also see your point about not declaring a server non-compliant
merely because it allows for a larger payload, nor a client that sends
larger payloads to a known-happy server that accepts those payloads.
nbdkit historically chose 64M as its limit instead of qemu's 32M.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 3/6] spec: Add NBD_OPT_EXTENDED_HEADERS
  2023-02-22  9:49     ` Wouter Verhelst
@ 2023-03-03 22:36       ` Eric Blake
  2023-03-05  8:49         ` Wouter Verhelst
  0 siblings, 1 reply; 65+ messages in thread
From: Eric Blake @ 2023-03-03 22:36 UTC (permalink / raw)
  To: Wouter Verhelst; +Cc: nbd, qemu-devel, qemu-block, libguestfs

On Wed, Feb 22, 2023 at 11:49:18AM +0200, Wouter Verhelst wrote:
> On Mon, Nov 14, 2022 at 04:46:52PM -0600, Eric Blake wrote:
> [...]
> > @@ -1370,9 +1475,10 @@ of the newstyle negotiation.
> >      Return a list of `NBD_REP_META_CONTEXT` replies, one per context,
> >      followed by an `NBD_REP_ACK` or an error.
> > 
> > -    This option SHOULD NOT be requested unless structured replies have
> > -    been negotiated first. If a client attempts to do so, a server
> > -    MAY send `NBD_REP_ERR_INVALID`.
> > +    This option SHOULD NOT be requested unless structured replies or
> > +    extended headers have been negotiated first. If a client attempts
> > +    to do so, a server MAY send `NBD_REP_ERR_INVALID` or
> > +    `NBD_REP_ERR_EXT_HEADER_REQD`.
> 
> Is it the intent that NBD_REP_ERR_EXT_HEADER_REQD means structured
> replies are not supported by this server? I think that could be
> clarified here.

My intent here was that a newer server that ONLY wants to talk to
clients that understand extended headers can use
NBD_REP_ERR_EXT_HEADER_REQD as its error message to clients that have
not requested extended headers yet.  Older clients may not know what
that error code means, but our spec already has reasonable constraints
ensuring the client will at least recognize it as an error code.

That way, a server only has to implement extended headers, rather than
maintaining the structured reply code in parallel.
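
A compact sketch of what that could look like in a server's option
dispatch (SET_META_CONTEXT and the NBD_REP_* names are from the spec
or this proposal, but the EXT_HEADER_REQD value shown is only
illustrative, and the function is invented for the example):

#include <stdbool.h>
#include <stdint.h>

enum {
    NBD_OPT_SET_META_CONTEXT = 10,
    NBD_OPT_EXTENDED_HEADERS = 11,
};
#define NBD_REP_ACK                  1u
#define NBD_REP_ERR_UNSUP            ((1u << 31) + 1)
#define NBD_REP_ERR_EXT_HEADER_REQD  ((1u << 31) + 10)  /* assumed value */

/* Choose the reply for a server that implements only extended headers. */
static uint32_t choose_option_reply(uint32_t opt, bool *ext_hdr)
{
    switch (opt) {
    case NBD_OPT_EXTENDED_HEADERS:
        *ext_hdr = true;
        return NBD_REP_ACK;
    case NBD_OPT_SET_META_CONTEXT:
        if (!*ext_hdr)
            return NBD_REP_ERR_EXT_HEADER_REQD; /* "negotiate headers first" */
        return NBD_REP_ACK;                     /* context matching elided */
    default:
        return NBD_REP_ERR_UNSUP;
    }
}
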

> 
> (this occurs twice)
> 
> [...]
> > +* `NBD_OPT_EXTENDED_HEADERS` (11)
> > +
> > +    The client wishes to use extended headers during the transmission
> > +    phase.  The client MUST NOT send any additional data with the
> > +    option, and the server SHOULD reject a request that includes data
> > +    with `NBD_REP_ERR_INVALID`.
> > +
> > +    When successful, this option takes precedence over structured
> > +    replies.  A client MAY request structured replies first, although
> > +    a server SHOULD support this option even if structured replies are
> > +    not negotiated.
> > +
> > +    It is envisioned that future extensions will add other new
> > +    requests that support a data payload in the request or reply.  A
> > +    server that supports such extensions SHOULD NOT advertise those
> > +    extensions until the client has negotiated extended headers; and a
> > +    client MUST NOT make use of those extensions without first
> > +    enabling support for reply payloads.
> > +
> > +    The server replies with the following, or with an error permitted
> > +    elsewhere in this document:
> > +
> > +    - `NBD_REP_ACK`: Extended headers have been negotiated; the client
> > +      MUST use the 32-byte extended request header, with proper use of
> > +      `NBD_CMD_FLAG_PAYLOAD_LEN` for all commands sending a payload;
> > +      and the server MUST use the 32-byte extended reply header.
> > +    - For backwards compatibility, clients SHOULD be prepared to also
> > +      handle `NBD_REP_ERR_UNSUP`; in this case, only the compact
> > +      transmission headers will be used.
> > +
> > +    Note that a response of `NBD_REP_ERR_BLOCK_SIZE_REQD` does not
> > +    make sense in response to this command, but a server MAY fail with
> > +    that error for a later `NBD_OPT_GO` without a client request for
> > +    `NBD_INFO_BLOCK_SIZE`, since the use of extended headers provides
> > +    more incentive for a client to promise to obey block size
> > +    constraints.
> > +
> > +    If the client requests `NBD_OPT_STARTTLS` after this option, it
> > +    MUST renegotiate extended headers.
> > +
> 
> Does it make sense here to also forbid use of NBD_OPT_EXPORT_NAME? I
> think the sooner we get rid of that, the better ;-)

I hadn't thought of that, but it does indeed sound desirable.  I can
touch up the spec to state that NBD_OPT_EXPORT_NAME MUST NOT be used
by a client that requested extended headers.

> 
> [...]
> > @@ -1746,13 +1914,15 @@ unrecognized flags.
> > 
> >  #### Structured reply types
> > 
> > -These values are used in the "type" field of a structured reply.
> > -Some chunk types can additionally be categorized by role, such as
> > -*error chunks* or *content chunks*.  Each type determines how to
> > -interpret the "length" bytes of payload.  If the client receives
> > -an unknown or unexpected type, other than an *error chunk*, it
> > -MUST initiate a hard disconnect.  A server MUST NOT send a chunk
> > -larger than any advertised maximum block payload size.
> > +These values are used in the "type" field of a structured reply.  Some
> > +chunk types can additionally be categorized by role, such as *error
> > +chunks*, *content chunks*, or *status chunks*.  Each type determines
> > +how to interpret the "length" bytes of payload.  If the client
> > +receives an unknown or unexpected type, other than an *error chunk*,
> > +it MAY initiate a hard disconnect on the grounds that the client is
> > +uncertain whether the server handled the request as desired.  A server
> > +MUST NOT send a chunk larger than any advertised maximum block payload
> > +size.
> 
> Why do we make this a MAY rather than a MUST?

Hmm, now I'm trying to remember why I relaxed this from MUST to MAY on
the client side.  Keeping it at MUST makes sense, because a
well-formed server won't be sending unknown reply types.
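
Either way the client-side check is tiny; a sketch (the macro name is
mine, but the "error chunk types have bit 15 set" convention is from
the spec):

#include <stdbool.h>
#include <stdint.h>

/* All error chunk types have bit 15 set, so even an unrecognized error
 * chunk can safely be treated as "this request failed". */
#define CHUNK_TYPE_IS_ERROR(t)  (((t) & (1u << 15)) != 0)

/* True when an unknown chunk type leaves the client no safe option but a
 * hard disconnect: it cannot tell whether the request succeeded. */
static bool unknown_chunk_forces_disconnect(uint16_t type)
{
    return !CHUNK_TYPE_IS_ERROR(type);
}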

> 
> Also, should this section say "structured or extended reply"? We use the
> same types for both.

Makes sense.

> 
> [...]
> > +* `NBD_REPLY_TYPE_BLOCK_STATUS_EXT` (6)
> > +
> > +  This chunk type is in the status chunk category.  *length* MUST be
> > +  8 + (a positive multiple of 16).  The semantics of this chunk mirror
> > +  those of `NBD_REPLY_TYPE_BLOCK_STATUS`, other than the use of a
> > +  larger *extent length* field, added padding in each descriptor to
> > +  ease alignment, and the addition of a *descriptor count* field that
> > +  can be used for easier client processing.  This chunk type MUST NOT
> > +  be used unless extended headers were negotiated with
> > +  `NBD_OPT_EXTENDED_HEADERS`.
> > +
> > +  If the *descriptor count* field contains 0, the number of subsequent
> > +  descriptors is determined solely by the *length* field of the reply
> > +  header.  However, the server MAY populate the *descriptor count*
> > +  field with the number of descriptors that follow; when doing this,
> > +  the server MUST ensure that the header *length* is equal to
> > +  *descriptor count* * 16 + 8.
> > +
> > +  The payload starts with:
> > +
> > +  32 bits, metadata context ID  
> > +  32 bits, descriptor count  
> > +
> > +  and is followed by a list of one or more descriptors, each with this
> > +  layout:
> > +
> > +  64 bits, length of the extent to which the status below
> > +     applies (unsigned, MUST be nonzero)  
> > +  32 bits, padding (MUST be zero)  
> > +  32 bits, status flags  
> > +
> > +  Note that even when extended headers are in use, the client MUST be
> > +  prepared for the server to use either the compact or extended chunk
> > +  type, regardless of whether the client's hinted effect length was
> > +  more or less than 32 bits; but the server MUST use exactly one of
> > +  the two chunk types per negotiated metacontext ID.
> 
> Is this last paragraph really a good idea? I would think it makes more
> sense to require the new format if we're already required to support it
> on both sides anyway.

My proof-of-concept implementation was easier to code when I didn't
have to resize the block status reply in the same patch that added the
64-bit headers.  But if you think the 64-bit reply type should always
be required (and the 32-bit reply forbidden) when extended headers are
in force, that's also possible.
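
For reference, the extended chunk layout quoted above maps onto
something like the following (struct and field names are illustrative,
not from qemu or libnbd; all fields are big-endian on the wire):

#include <stdint.h>

/* Payload of an NBD_REPLY_TYPE_BLOCK_STATUS_EXT chunk: an 8-byte header
 * followed by one or more 16-byte descriptors, so the chunk *length* is
 * always 8 + 16 * n for some n >= 1. */
struct block_status_ext_header {
    uint32_t context_id;   /* negotiated metadata context ID */
    uint32_t count;        /* descriptor count, or 0 if left unspecified */
};

struct block_status_ext_descriptor {
    uint64_t length;       /* extent length; MUST be nonzero */
    uint32_t pad;          /* MUST be zero */
    uint32_t flags;        /* status flags for this extent */
};

/* When 'count' is nonzero, the server must keep count * 16 + 8 equal to
 * the chunk's *length* field. */
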
> 
> [...]
> > -    The list of block status descriptors within the
> > -    `NBD_REPLY_TYPE_BLOCK_STATUS` chunk represent consecutive portions
> > -    of the file starting from specified *offset*.  If the client used
> 
> I know this was in the old text (hence me mentioning it here), but this
> should probably say "export" rarher than "file". NBD does not deal
> (conceptually) with files...

Agreed - we have been fairly consistent on the term 'export' (whether
that export be a file or some other source of data).  Will fix
(perhaps separately and just push it as trivial).

> 
> > -    the `NBD_CMD_FLAG_REQ_ONE` flag, each chunk contains exactly one
> > -    descriptor where the *length* of the descriptor MUST NOT be
> > -    greater than the *length* of the request; otherwise, a chunk MAY
> > -    contain multiple descriptors, and the final descriptor MAY extend
> > -    beyond the original requested size if the server can determine a
> > -    larger length without additional effort.  On the other hand, the
> > -    server MAY return less data than requested.  In particular, a
> > -    server SHOULD NOT send more than 2^20 status descriptors in a
> > -    single chunk.  However the server MUST return at least one status
> > -    descriptor, and since each status descriptor has a non-zero
> > -    length, a client can always make progress on a successful return.
> > +    The list of block status descriptors within a given status chunk
> > +    represent consecutive portions of the file starting from specified
> > +    *offset*.  If the client used the `NBD_CMD_FLAG_REQ_ONE` flag,
> > +    each chunk contains exactly one descriptor where the *length* of
> > +    the descriptor MUST NOT be greater than the *length* of the
> > +    request; otherwise, a chunk MAY contain multiple descriptors, and
> > +    the final descriptor MAY extend beyond the original requested size
> > +    if the server can determine a larger length without additional
> > +    effort.  On the other hand, the server MAY return less data than
> > +    requested.  In particular, a server SHOULD NOT send more than 2^20
> > +    status descriptors in a single chunk.  However the server MUST
> > +    return at least one status descriptor, and since each status
> > +    descriptor has a non-zero length, a client can always make
> > +    progress on a successful return.
> 
> Other than that, no comments on this one.
> 
> -- 
>      w@uter.{be,co.za}
> wouter@{grep.be,fosdem.org,debian.org}
> 
> I will have a Tin-Actinium-Potassium mixture, thanks.
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 5/6] spec: Introduce NBD_FLAG_BLOCK_STATUS_PAYLOAD
  2023-02-22 10:05     ` Wouter Verhelst
@ 2023-03-03 22:40       ` Eric Blake
  2023-03-05  8:50         ` Wouter Verhelst
  0 siblings, 1 reply; 65+ messages in thread
From: Eric Blake @ 2023-03-03 22:40 UTC (permalink / raw)
  To: Wouter Verhelst; +Cc: nbd, qemu-devel, qemu-block, libguestfs

On Wed, Feb 22, 2023 at 12:05:44PM +0200, Wouter Verhelst wrote:
> On Mon, Nov 14, 2022 at 04:46:54PM -0600, Eric Blake wrote:
> >  #### Simple reply message
> > 
> > @@ -1232,6 +1235,19 @@ The field has the following format:
> >    will be faster than a regular write). Clients MUST NOT set the
> >    `NBD_CMD_FLAG_FAST_ZERO` request flag unless this transmission flag
> >    is set.
> > +- bit 12, `NBD_FLAG_BLOCK_STATUS_PAYLOAD`: Indicates that the server
> > +  understands the use of the `NBD_CMD_FLAG_PAYLOAD_LEN` flag to
> > +  `NBD_CMD_BLOCK_STATUS` to allow the client to request that the
> > +  server filters its response to a specific subset of negotiated
> > +  metacontext ids passed in via a client payload, rather than the
> > +  default of replying to all metacontext ids. Servers MUST NOT
> > +  advertise this bit unless the client successfully negotiates
> > +  extended headers via `NBD_OPT_EXTENDED_HEADERS`, and SHOULD NOT
> > +  advertise this bit in response to `NBD_OPT_EXPORT_NAME` or
> > +  `NBD_OPT_GO` if the client does not negotiate metacontexts with
> > +  `NBD_OPT_SET_META_CONTEXT`; clients SHOULD NOT set the
> > +  `NBD_CMD_FLAG_PAYLOAD_LEN` flag for `NBD_CMD_BLOCK_STATUS` unless
> > +  this transmission flag is set.
> 
> Given that we are introducing this at the same time as the extended
> headers (and given that we're running out of available flags in this
> 16-bit field), isn't it better to make the ability to understand
> PAYLOAD_LEN be implied by extended headers? Or is there a use case for
> implementing extended headers but not a payload length on
> CMD_BLOCK_STATUS that I'm missing?

In my proof-of-concept implementation, this was a distinct feature
addition on top of 64-bit headers.

It is easy to write a server that does not want to deal with a client
payload on a block status request (for example, a server that only
knows about one metacontext, and therefore never needs to deal with a
client wanting a subset of negotiated metacontexts).  Forcing a server
to have to support a client-side filtering request on block status
commands, merely because the server now supports 64-bit lengths,
seemed like a stretch too far, which is why I decided to use a feature
bit for this one.
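
On the client side the gate is just one transmission-flag test; a
sketch (the bit value is from this patch, the helper name is mine):

#include <stdbool.h>
#include <stdint.h>

#define NBD_FLAG_BLOCK_STATUS_PAYLOAD (1u << 12)   /* bit 12, per this patch */

/* A client must only set NBD_CMD_FLAG_PAYLOAD_LEN on NBD_CMD_BLOCK_STATUS
 * (to pass a filtering payload of metacontext IDs) when the server has
 * advertised this transmission flag. */
static bool can_send_block_status_filter(uint16_t transmission_flags)
{
    return (transmission_flags & NBD_FLAG_BLOCK_STATUS_PAYLOAD) != 0;
}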


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 1/6] spec: Recommend cap on NBD_REPLY_TYPE_BLOCK_STATUS length
  2023-03-03 22:17       ` Eric Blake
@ 2023-03-05  8:41         ` Wouter Verhelst
  2023-03-06  8:48           ` [Libguestfs] " Laszlo Ersek
  2023-03-06 13:48           ` Nir Soffer
  0 siblings, 2 replies; 65+ messages in thread
From: Wouter Verhelst @ 2023-03-05  8:41 UTC (permalink / raw)
  To: Eric Blake
  Cc: Vladimir Sementsov-Ogievskiy, nbd, qemu-devel, qemu-block, libguestfs

On Fri, Mar 03, 2023 at 04:17:40PM -0600, Eric Blake wrote:
> On Fri, Dec 16, 2022 at 10:32:01PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > s-o-b line missed.
> 
> I'm not sure if the NBD project has a strict policy on including one,
> but I don't mind adding it.

I've never required it, mostly because it's something that I myself
always forget, too, so, *shrug*.

(if there were a way in git to make it add that automatically, that
would help; I've looked but haven't found it)

-- 
     w@uter.{be,co.za}
wouter@{grep.be,fosdem.org,debian.org}

I will have a Tin-Actinium-Potassium mixture, thanks.


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 2/6] spec: Tweak description of maximum block size
  2023-03-03 22:26       ` Eric Blake
@ 2023-03-05  8:45         ` Wouter Verhelst
  0 siblings, 0 replies; 65+ messages in thread
From: Wouter Verhelst @ 2023-03-05  8:45 UTC (permalink / raw)
  To: Eric Blake; +Cc: nbd, qemu-devel, qemu-block, libguestfs

On Fri, Mar 03, 2023 at 04:26:53PM -0600, Eric Blake wrote:
> On Tue, Feb 21, 2023 at 05:21:37PM +0200, Wouter Verhelst wrote:
> > Hi Eric,
> > 
> > Busy days, busy times. Sorry about the insane delays here.
> 
> No problem; I've been tackling other things in the meantime too, so
> this extension has taken far longer than I planned for more reasons
> than just slow review time.
> 
> > 
> > On Mon, Nov 14, 2022 at 04:46:51PM -0600, Eric Blake wrote:
> > > Commit 9f30fedb improved the spec to allow non-payload requests that
> > > exceed any advertised maximum block size.  Take this one step further
> > > by permitting the server to use NBD_EOVERFLOW as a hint to the client
> > > when a request is oversize (while permitting NBD_EINVAL for
> > > back-compat), and by rewording the text to explicitly call out that
> > > what is being advertised is the maximum payload length, not maximum
> > > block size.  This becomes more important when we add 64-bit
> > > extensions, where it becomes possible to extend `NBD_CMD_BLOCK_STATUS`
> > > to have both an effect length (how much of the image does the client
> > > want status on - may be larger than 32 bits) and an optional payload
> > > length (a way to filter the response to a subset of negotiated
> > > metadata contexts).  In the shorter term, it means that a server may
> > > (but not must) accept a read request larger than the maximum block
> > > size if it can use structured replies to keep each chunk of the
> > > response under the maximum payload limits.
> > > ---
> > >  doc/proto.md | 127 +++++++++++++++++++++++++++++----------------------
> > >  1 file changed, 73 insertions(+), 54 deletions(-)
> > > 
> > > diff --git a/doc/proto.md b/doc/proto.md
> > > index 8f08583..53c334a 100644
> > > --- a/doc/proto.md
> > > +++ b/doc/proto.md
> > > @@ -745,8 +745,8 @@ text unless the client insists on TLS.
> > > 
> > >  During transmission phase, several operations are constrained by the
> > >  export size sent by the final `NBD_OPT_EXPORT_NAME` or `NBD_OPT_GO`,
> > > -as well as by three block size constraints defined here (minimum,
> > > -preferred, and maximum).
> > > +as well as by three block size constraints defined here (minimum
> > > +block, preferred block, and maximum payload).
> > > 
> > >  If a client can honour server block size constraints (as set out below
> > >  and under `NBD_INFO_BLOCK_SIZE`), it SHOULD announce this during the
> > > @@ -772,15 +772,15 @@ learn the server's constraints without committing to them.
> > > 
> > >  If block size constraints have not been advertised or agreed on
> > >  externally, then a server SHOULD support a default minimum block size
> > > -of 1, a preferred block size of 2^12 (4,096), and a maximum block size
> > > -that is effectively unlimited (0xffffffff, or the export size if that
> > > -is smaller), while a client desiring maximum interoperability SHOULD
> > > -constrain its requests to a minimum block size of 2^9 (512), and limit
> > > -`NBD_CMD_READ` and `NBD_CMD_WRITE` commands to a maximum block size of
> > > -2^25 (33,554,432).  A server that wants to enforce block sizes other
> > > -than the defaults specified here MAY refuse to go into transmission
> > > -phase with a client that uses `NBD_OPT_EXPORT_NAME` (via a hard
> > > -disconnect) or which uses `NBD_OPT_GO` without requesting
> > > +of 1, a preferred block size of 2^12 (4,096), and a maximum block
> > > +payload size that is at least 2^25 (33,554,432) (even if the export
> > > +size is smaller); while a client desiring maximum interoperability
> > > +SHOULD constrain its requests to a minimum block size of 2^9 (512),
> > > +and limit `NBD_CMD_READ` and `NBD_CMD_WRITE` commands to a maximum
> > > +block size of 2^25 (33,554,432).  A server that wants to enforce block
> > > +sizes other than the defaults specified here MAY refuse to go into
> > > +transmission phase with a client that uses `NBD_OPT_EXPORT_NAME` (via
> > > +a hard disconnect) or which uses `NBD_OPT_GO` without requesting
> > 
> > This does more than what the commit message says: it also limits payload
> > size from 0xffffffff to 2^25. We already have a "A server that desires
> > maximum interoperability" clause that mentions the 2^25, so I'm not
> > entirely sure why we need to restrict that for the default case.
> > 
> > Remember, apart from specifying how something should be implemented, the
> > spec also documents current and historic behavior. I am probably
> > convinced that it makes more sense to steer people towards limiting to
> > 2^25, but it should be done in such a way that servers which accept a
> > 0xffffffff block size are not suddenly noncompliant. I don't think this
> > does that.
> 
> I'll have to play more with the wording here.  A client shouldn't send
> larger write requests to a server without knowing the server will
> accept them (because traditional servers disconnect instead - the
> alternative is to read the client's entire payload, and the larger the
> payload, the longer that takes - the client starves other clients of
> the server's attention by making it chew through data).

Well, in the case of a preforking server (like nbd-server), that isn't
really the case; the client is instead only starving itself. But yeah.

>  But sending larger read requests MAY be tolerated by a server that
>  sends a graceful error message ("you requested more than I'm willing
>  to send, but my error response is short so the connection can stay
>  open"), even if several historical implementations have also hung up
>  at that point (at least qemu performs a malloc() for both reads and
>  writes: writes populate the buffer from the client, reads populate
>  the buffer to send back to the client - and the connection is aborted
>  when the malloc() is not even attempted because the length exceeds
>  2^25 bytes).

Yeah, nbd-server does that too, except for aborting the connection. At least intentionally
so; again, since it's a preforking server, the worst that can happen here is
that it gets an ENOMEM, at which point it will just die, which causes the
client to lose its connection and nothing else.

> But I also see your point about not declaring a server non-compliant
> merely because it allows for a larger payload, nor a client that sends
> larger payloads to a known-happy server that accepts those payloads.
> nbdkit historically chose 64M as its limits instead of qemu's 32M.

Right, thanks.

-- 
     w@uter.{be,co.za}
wouter@{grep.be,fosdem.org,debian.org}

I will have a Tin-Actinium-Potassium mixture, thanks.


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 3/6] spec: Add NBD_OPT_EXTENDED_HEADERS
  2023-03-03 22:36       ` Eric Blake
@ 2023-03-05  8:49         ` Wouter Verhelst
  0 siblings, 0 replies; 65+ messages in thread
From: Wouter Verhelst @ 2023-03-05  8:49 UTC (permalink / raw)
  To: Eric Blake; +Cc: nbd, qemu-devel, qemu-block, libguestfs

On Fri, Mar 03, 2023 at 04:36:41PM -0600, Eric Blake wrote:
> On Wed, Feb 22, 2023 at 11:49:18AM +0200, Wouter Verhelst wrote:
> > On Mon, Nov 14, 2022 at 04:46:52PM -0600, Eric Blake wrote:
[...]
> > > +  Note that even when extended headers are in use, the client MUST be
> > > +  prepared for the server to use either the compact or extended chunk
> > > +  type, regardless of whether the client's hinted effect length was
> > > +  more or less than 32 bits; but the server MUST use exactly one of
> > > +  the two chunk types per negotiated metacontext ID.
> > 
> > Is this last paragraph really a good idea? I would think it makes more
> > sense to require the new format if we're already required to support it
> > on both sides anyway.
> 
> My proof-of-concept implementation was easier to code when I didn't
> have to resize the block status reply in the same patch that added the
> 64-bit headers.  But if you think the 64-bit reply type should always
> be required (and the 32-bit reply forbidden) when extended headers are
> in force, that's also possible.

Intuitively, this sounds off. It would seem to me that it's easier to do
if you don't have to have a conditional on each received data packet.
But maybe I'm missing something -- I haven't done an implementation yet,
anyway.

-- 
     w@uter.{be,co.za}
wouter@{grep.be,fosdem.org,debian.org}

I will have a Tin-Actinium-Potassium mixture, thanks.


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 5/6] spec: Introduce NBD_FLAG_BLOCK_STATUS_PAYLOAD
  2023-03-03 22:40       ` Eric Blake
@ 2023-03-05  8:50         ` Wouter Verhelst
  0 siblings, 0 replies; 65+ messages in thread
From: Wouter Verhelst @ 2023-03-05  8:50 UTC (permalink / raw)
  To: Eric Blake; +Cc: nbd, qemu-devel, qemu-block, libguestfs

On Fri, Mar 03, 2023 at 04:40:38PM -0600, Eric Blake wrote:
> On Wed, Feb 22, 2023 at 12:05:44PM +0200, Wouter Verhelst wrote:
> > On Mon, Nov 14, 2022 at 04:46:54PM -0600, Eric Blake wrote:
> > >  #### Simple reply message
> > > 
> > > @@ -1232,6 +1235,19 @@ The field has the following format:
> > >    will be faster than a regular write). Clients MUST NOT set the
> > >    `NBD_CMD_FLAG_FAST_ZERO` request flag unless this transmission flag
> > >    is set.
> > > +- bit 12, `NBD_FLAG_BLOCK_STATUS_PAYLOAD`: Indicates that the server
> > > +  understands the use of the `NBD_CMD_FLAG_PAYLOAD_LEN` flag to
> > > +  `NBD_CMD_BLOCK_STATUS` to allow the client to request that the
> > > +  server filters its response to a specific subset of negotiated
> > > +  metacontext ids passed in via a client payload, rather than the
> > > +  default of replying to all metacontext ids. Servers MUST NOT
> > > +  advertise this bit unless the client successfully negotiates
> > > +  extended headers via `NBD_OPT_EXTENDED_HEADERS`, and SHOULD NOT
> > > +  advertise this bit in response to `NBD_OPT_EXPORT_NAME` or
> > > +  `NBD_OPT_GO` if the client does not negotiate metacontexts with
> > > +  `NBD_OPT_SET_META_CONTEXT`; clients SHOULD NOT set the
> > > +  `NBD_CMD_FLAG_PAYLOAD_LEN` flag for `NBD_CMD_BLOCK_STATUS` unless
> > > +  this transmission flag is set.
> > 
> > Given that we are introducing this at the same time as the extended
> > headers (and given that we're running out of available flags in this
> > 16-bit field), isn't it better to make the ability to understand
> > PAYLOAD_LEN be implied by extended headers? Or is there a use case for
> > implementing extended headers but not a payload length on
> > CMD_BLOCK_STATUS that I'm missing?
> 
> In my proof-of-concept implementation, this was a distinct feature
> addition on top of 64-bit headers.
> 
> It is easy to write a server that does not want to deal with a client
> payload on a block status request (for example, a server that only
> knows about one metacontext, and therefore never needs to deal with a
> client wanting a subset of negotiated metacontexts).  Forcing a server
> to have to support a client-side filtering request on block status
> commands, merely because the server now supports 64-bit lengths,
> seemed like a stretch too far, which is why I decided to use a feature
> bit for this one.

Okay, yes, that makes sense. Thanks.

-- 
     w@uter.{be,co.za}
wouter@{grep.be,fosdem.org,debian.org}

I will have a Tin-Actinium-Potassium mixture, thanks.


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Libguestfs] [PATCH v2 1/6] spec: Recommend cap on NBD_REPLY_TYPE_BLOCK_STATUS length
  2023-03-05  8:41         ` Wouter Verhelst
@ 2023-03-06  8:48           ` Laszlo Ersek
  2023-03-06 13:48           ` Nir Soffer
  1 sibling, 0 replies; 65+ messages in thread
From: Laszlo Ersek @ 2023-03-06  8:48 UTC (permalink / raw)
  To: Wouter Verhelst, Eric Blake
  Cc: libguestfs, nbd, qemu-devel, qemu-block, Vladimir Sementsov-Ogievskiy

On 3/5/23 09:41, Wouter Verhelst wrote:
> On Fri, Mar 03, 2023 at 04:17:40PM -0600, Eric Blake wrote:
>> On Fri, Dec 16, 2022 at 10:32:01PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>> s-o-b line missed.
>>
>> I'm not sure if the NBD project has a strict policy on including one,
>> but I don't mind adding it.
> 
> I've never required it, mostly because it's something that I myself
> always forget, too, so, *shrug*.
> 
> (if there were a way in git to make it add that automatically, that
> would help; I've looked but haven't found it)
> 

You can point the "commit.template" git-config knob to a particular
file, and then include your S-o-b in that file.

There is also the "-s" ("--signoff") option for git-commit, but I don't
see a git-config knob to make that permanent. (You can always introduce
a git alias for it, though.)

Laszlo



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Libguestfs] [PATCH v2 1/6] spec: Recommend cap on NBD_REPLY_TYPE_BLOCK_STATUS length
  2023-03-05  8:41         ` Wouter Verhelst
  2023-03-06  8:48           ` [Libguestfs] " Laszlo Ersek
@ 2023-03-06 13:48           ` Nir Soffer
  1 sibling, 0 replies; 65+ messages in thread
From: Nir Soffer @ 2023-03-06 13:48 UTC (permalink / raw)
  To: Wouter Verhelst
  Cc: Eric Blake, libguestfs, nbd, qemu-devel, qemu-block,
	Vladimir Sementsov-Ogievskiy

On Sun, Mar 5, 2023 at 10:42 AM Wouter Verhelst <w@uter.be> wrote:
>
> On Fri, Mar 03, 2023 at 04:17:40PM -0600, Eric Blake wrote:
> > On Fri, Dec 16, 2022 at 10:32:01PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > s-o-b line missed.
> >
> > I'm not sure if the NBD project has a strict policy on including one,
> > but I don't mind adding it.
>
> I've never required it, mostly because it's something that I myself
> always forget, too, so, *shrug*.
>
> (if there were a way in git to make it add that automatically, that
> would help; I've looked but haven't found it)

What I'm using in all projects that require signed-off-by is:

$ cat .git/hooks/commit-msg
#!/bin/sh

# Add Signed-off-by trailer.
sob=$(git var GIT_AUTHOR_IDENT | sed -n 's/^\(.*>\).*$/Signed-off-by: \1/p')
git interpret-trailers --in-place --trailer "$sob" "$1"

You can also use a pre-commit hook but the commit-msg hook is more
convenient.

And in github you can add the DCO application to the project:
https://github.com/apps/dco

Once installed it will check that all commits are signed off, and
provide helpful error
messages to contributors.

Nir



^ permalink raw reply	[flat|nested] 65+ messages in thread

end of thread, other threads:[~2023-03-06 13:49 UTC | newest]

Thread overview: 65+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-14 22:41 [cross-project PATCH v2] NBD 64-bit extensions Eric Blake
2022-11-14 22:46 ` [PATCH v2 0/6] NBD spec changes for " Eric Blake
2022-11-14 22:46   ` [PATCH v2 1/6] spec: Recommend cap on NBD_REPLY_TYPE_BLOCK_STATUS length Eric Blake
2022-12-16 19:32     ` Vladimir Sementsov-Ogievskiy
2023-03-03 22:17       ` Eric Blake
2023-03-05  8:41         ` Wouter Verhelst
2023-03-06  8:48           ` [Libguestfs] " Laszlo Ersek
2023-03-06 13:48           ` Nir Soffer
2022-11-14 22:46   ` [PATCH v2 2/6] spec: Tweak description of maximum block size Eric Blake
2022-12-16 20:22     ` Vladimir Sementsov-Ogievskiy
2023-03-03 22:20       ` Eric Blake
2023-02-21 15:21     ` Wouter Verhelst
2023-03-03 22:26       ` Eric Blake
2023-03-05  8:45         ` Wouter Verhelst
2022-11-14 22:46   ` [PATCH v2 3/6] spec: Add NBD_OPT_EXTENDED_HEADERS Eric Blake
2022-12-19 18:26     ` Vladimir Sementsov-Ogievskiy
2023-02-22  9:49     ` Wouter Verhelst
2023-03-03 22:36       ` Eric Blake
2023-03-05  8:49         ` Wouter Verhelst
2022-11-14 22:46   ` [PATCH v2 4/6] spec: Allow 64-bit block status results Eric Blake
2022-11-14 22:46   ` [PATCH v2 5/6] spec: Introduce NBD_FLAG_BLOCK_STATUS_PAYLOAD Eric Blake
2023-02-22 10:05     ` Wouter Verhelst
2023-03-03 22:40       ` Eric Blake
2023-03-05  8:50         ` Wouter Verhelst
2022-11-14 22:46   ` [PATCH v2 6/6] RFC: spec: Introduce NBD_REPLY_TYPE_OFFSET_HOLE_EXT Eric Blake
2022-11-14 22:48 ` [PATCH v2 00/15] qemu patches for 64-bit NBD extensions Eric Blake
2022-11-14 22:48   ` [PATCH v2 01/15] nbd/client: Add safety check on chunk payload length Eric Blake
2022-11-14 22:48   ` [PATCH v2 02/15] nbd/server: Prepare for alternate-size headers Eric Blake
2022-11-14 22:48   ` [PATCH v2 03/15] nbd: Prepare for 64-bit request effect lengths Eric Blake
2022-11-14 22:48   ` [PATCH v2 04/15] nbd: Add types for extended headers Eric Blake
2022-11-14 22:48   ` [PATCH v2 05/15] nbd/server: Refactor handling of request payload Eric Blake
2022-11-14 22:48   ` [PATCH v2 06/15] nbd/server: Refactor to pass full request around Eric Blake
2022-11-14 22:48   ` [PATCH v2 07/15] nbd/server: Initial support for extended headers Eric Blake
2022-11-14 22:48   ` [PATCH v2 08/15] nbd/server: Support 64-bit block status Eric Blake
2022-11-14 22:48   ` [PATCH v2 09/15] nbd/client: Initial support for extended headers Eric Blake
2022-11-14 22:48   ` [PATCH v2 10/15] nbd/client: Accept 64-bit block status chunks Eric Blake
2022-11-14 22:48   ` [PATCH v2 11/15] nbd/client: Request extended headers during negotiation Eric Blake
2022-11-14 22:48   ` [PATCH v2 12/15] nbd/server: Prepare for per-request filtering of BLOCK_STATUS Eric Blake
2022-11-14 22:48   ` [PATCH v2 13/15] nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS Eric Blake
2022-11-14 22:48   ` [PATCH v2 14/15] RFC: nbd/client: Accept 64-bit hole chunks Eric Blake
2022-11-14 22:48   ` [PATCH v2 15/15] RFC: nbd/server: Send 64-bit hole chunk Eric Blake
2022-11-14 22:51 ` [libnbd PATCH v2 00/23] libnbd 64-bit NBD extensions Eric Blake
2022-11-14 22:51   ` [libnbd PATCH v2 01/23] block_status: Refactor array storage Eric Blake
2022-11-14 22:51   ` [libnbd PATCH v2 02/23] internal: Refactor layout of replies in sbuf Eric Blake
2022-11-14 22:51   ` [libnbd PATCH v2 03/23] protocol: Add definitions for extended headers Eric Blake
2022-11-14 22:51   ` [libnbd PATCH v2 04/23] states: Prepare to send 64-bit requests Eric Blake
2022-11-14 22:51   ` [libnbd PATCH v2 05/23] states: Prepare to receive 64-bit replies Eric Blake
2022-11-14 22:51   ` [libnbd PATCH v2 06/23] states: Break deadlock if server goofs on extended replies Eric Blake
2022-11-14 22:51   ` [libnbd PATCH v2 07/23] generator: Add struct nbd_extent in prep for 64-bit extents Eric Blake
2022-11-14 22:51   ` [libnbd PATCH v2 08/23] block_status: Track 64-bit extents internally Eric Blake
2022-11-14 22:51   ` [libnbd PATCH v2 09/23] block_status: Accept 64-bit extents during block status Eric Blake
2022-11-14 22:51   ` [libnbd PATCH v2 10/23] api: Add [aio_]nbd_block_status_64 Eric Blake
2022-11-14 22:51   ` [libnbd PATCH v2 11/23] api: Add several functions for controlling extended headers Eric Blake
2022-11-14 22:51   ` [libnbd PATCH v2 12/23] copy: Update nbdcopy to use 64-bit block status Eric Blake
2022-11-14 22:51   ` [libnbd PATCH v2 13/23] dump: Update nbddump " Eric Blake
2022-11-14 22:51   ` [libnbd PATCH v2 14/23] info: Expose extended-headers support through nbdinfo Eric Blake
2022-11-14 22:51   ` [libnbd PATCH v2 15/23] info: Update nbdinfo --map to use 64-bit block status Eric Blake
2022-11-14 22:51   ` [libnbd PATCH v2 16/23] examples: Update copy-libev " Eric Blake
2022-11-14 22:51   ` [libnbd PATCH v2 17/23] ocaml: Add example for 64-bit extents Eric Blake
2022-11-14 22:51   ` [libnbd PATCH v2 18/23] generator: Actually request extended headers Eric Blake
2022-11-14 22:51   ` [libnbd PATCH v2 19/23] api: Add nbd_[aio_]opt_extended_headers() Eric Blake
2022-11-14 22:51   ` [libnbd PATCH v2 20/23] interop: Add test of 64-bit block status Eric Blake
2022-11-14 22:51   ` [libnbd PATCH v2 21/23] api: Add nbd_can_block_status_payload() Eric Blake
2022-11-14 22:51   ` [libnbd PATCH v2 22/23] api: Add nbd_[aio_]block_status_filter() Eric Blake
2022-11-14 22:51   ` [libnbd PATCH v2 23/23] RFC: pread: Accept 64-bit holes Eric Blake
