* [Qemu-devel] [RFC PATCH 00/18] NBD protocol additions
@ 2016-04-08 22:05 Eric Blake
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 01/18] nbd: Don't kill server on client that doesn't request TLS Eric Blake
                   ` (18 more replies)
  0 siblings, 19 replies; 48+ messages in thread
From: Eric Blake @ 2016-04-08 22:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex, nbd-general

This series is for qemu 2.7, and will probably need some rework
especially since some of it is trying to implement features
that are still marked experimental in upstream NBD.

Included are some interoperability bug fixes and code cleanups,
followed by new client-side and server-side support for:
NBD_FLAG_C_NO_ZEROES
NBD_CMD_WRITE_ZEROES
NBD_CMD_CLOSE
NBD_OPT_INFO
NBD_OPT_GO

Still to come:
improvements to NBD_CMD_WRITE_ZEROES
support for NBD_OPT_STRUCTURED_REPLY
strawman implementations to help with discussions towards
 NBD_CMD_BLOCK_STATUS
proposal I'm still working up to teach NBD servers to advertise
 minimum/preferred/maximum block sizes

This posting is tied to this particular version of the NBD protocol:
https://github.com/yoe/nbd/blob/18918eb/doc/proto.md
plus this email about NBD_CMD_CLOSE:
https://sourceforge.net/p/nbd/mailman/message/35000466/

I performed testing by temporarily turning on DEBUG_NBD while
compiling, then connecting variations on:
  ./qemu-nbd -f raw -x foo file
  ./qemu-io -f raw nbd://localhost:10809/foo
and watching the traces on both screens (both for startup negotiation,
and for various 'write -z', 'discard', and 'q' commands in qemu-io).
I intentionally tested all three combinations of:
old client, new server
new client, old server
new client, new server
to make sure that either side gracefully handles unknown
advertisements when the other side is newer, and correctly falls
back to older usage when the other side is too old.

I'm posting now so that others may compile my work and help with
cross-project testing (such as qemu client to Alex's NBDGO
server), which in turn will help us move experimental extensions
into final form in the NBD protocol.

Also available as a tag at this location:
git fetch git://repo.or.cz/qemu/ericb.git nbd-flags-v2

Tag is named v2 because patches 1 and 2 in this grouping have
been previously posted for inclusion in qemu 2.6.

Eric Blake (18):
  nbd: Don't kill server on client that doesn't request TLS
  nbd: Don't fail handshake on NBD_OPT_LIST descriptions
  nbd: More debug typo fixes, use correct formats
  nbd: Detect servers that send unexpected error values
  nbd: Reject unknown request flags
  nbd: Avoid magic number for NBD max name size
  nbd: Treat flags vs. command type as separate fields
  nbd: Limit nbdflags to 16 bits
  nbd: Share common reply-sending code in server
  nbd: Share common option-sending code in client
  nbd: Let client skip portions of server reply
  nbd: Less allocation during NBD_OPT_LIST
  nbd: Support shorter handshake
  nbd: Implement NBD_OPT_GO on client
  nbd: Implement NBD_OPT_GO on server
  nbd: Support NBD_CMD_CLOSE
  nbd: Implement NBD_CMD_WRITE_ZEROES on server
  nbd: Implement NBD_CMD_WRITE_ZEROES on client

 block/nbd-client.h  |   2 +
 include/block/nbd.h |  61 ++++--
 nbd/nbd-internal.h  |  13 +-
 block/nbd-client.c  |  88 ++++++++-
 block/nbd.c         |  23 +++
 nbd/client.c        | 522 ++++++++++++++++++++++++++++++----------------------
 nbd/server.c        | 356 ++++++++++++++++++++++++++---------
 qemu-nbd.c          |   2 +-
 8 files changed, 744 insertions(+), 323 deletions(-)

-- 
2.5.5


* [Qemu-devel] [PATCH 01/18] nbd: Don't kill server on client that doesn't request TLS
  2016-04-08 22:05 [Qemu-devel] [RFC PATCH 00/18] NBD protocol additions Eric Blake
@ 2016-04-08 22:05 ` Eric Blake
  2016-04-09 10:28   ` Alex Bligh
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 02/18] nbd: Don't fail handshake on NBD_OPT_LIST descriptions Eric Blake
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 48+ messages in thread
From: Eric Blake @ 2016-04-08 22:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex, Paolo Bonzini

Upstream NBD is documenting that servers MAY choose to operate
in a conditional mode, where it is up to the client whether to
use TLS.  For qemu's case, we want to always be in FORCEDTLS
mode, because of the risk of man-in-the-middle attacks, and since
we never export more than one device; likewise, the qemu client
will ALWAYS send NBD_OPT_STARTTLS as its first option.  But now
that SELECTIVETLS servers exist, it is feasible to encounter a
(non-qemu) client that does not do NBD_OPT_STARTTLS first, but
rather wants to take advantage of the conditional modes it might
find elsewhere.

Since we require TLS, we are within our rights to drop connections
on any client that doesn't negotiate it right away, or which
attempts to negotiate it incorrectly, without violating the intent
of the NBD Protocol.  However, it's better to allow the client to
continue trying, on the grounds that maybe the client will get the
hint to send NBD_OPT_STARTTLS.
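
As a rough illustration of the "drain the payload, reply with an
error, keep listening" idea, here is a standalone sketch.  It is not
the code in nbd/server.c: struct stream, stream_take() and
wait_for_starttls() are hypothetical names, the socket is replaced by
an in-memory buffer, the per-option magic is omitted, and the error
reply is only counted rather than sent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OPT_STARTTLS 5u   /* value of NBD_OPT_STARTTLS in the protocol doc */

/* Hypothetical in-memory stand-in for the client socket. */
struct stream {
    const unsigned char *data;
    size_t size;
    size_t pos;
};

/* Consume n bytes (copying them if out is non-NULL); -1 if too short. */
static int stream_take(struct stream *s, unsigned char *out, size_t n)
{
    if (s->size - s->pos < n) {
        return -1;
    }
    if (out) {
        memcpy(out, s->data + s->pos, n);
    }
    s->pos += n;
    return 0;
}

/* Read a big-endian 32-bit value from a byte buffer. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Read options until NBD_OPT_STARTTLS arrives.  Any other option has its
 * payload dropped and is counted as rejected (the point where a real
 * server would send NBD_REP_ERR_TLS_REQD) instead of ending the
 * connection.  Returns 0 once STARTTLS is seen, -1 if the stream ends.
 */
static int wait_for_starttls(struct stream *s, unsigned *rejected)
{
    unsigned char hdr[8];

    assert(s && rejected);
    *rejected = 0;
    for (;;) {
        uint32_t opt, length;

        if (stream_take(s, hdr, sizeof(hdr)) < 0) {
            return -1;
        }
        opt = load_be32(hdr);
        length = load_be32(hdr + 4);
        if (stream_take(s, NULL, length) < 0) {
            return -1;          /* could not drain the payload */
        }
        if (opt == OPT_STARTTLS) {
            return 0;
        }
        (*rejected)++;          /* error reply goes here, then loop again */
    }
}
```

Draining the payload before replying is what keeps the two sides in
sync: without it, the next header read would start in the middle of
the rejected option's data.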

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 nbd/server.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 2a4dd10..e7e4881 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -451,9 +451,12 @@ static int nbd_negotiate_options(NBDClient *client)

             default:
                 TRACE("Option 0x%x not permitted before TLS", clientflags);
+                if (nbd_negotiate_drop_sync(client->ioc, length) != length) {
+                    return -EIO;
+                }
                 nbd_negotiate_send_rep(client->ioc, NBD_REP_ERR_TLS_REQD,
                                        clientflags);
-                return -EINVAL;
+                break;
             }
         } else if (fixedNewstyle) {
             switch (clientflags) {
@@ -471,6 +474,9 @@ static int nbd_negotiate_options(NBDClient *client)
                 return nbd_negotiate_handle_export_name(client, length);

             case NBD_OPT_STARTTLS:
+                if (nbd_negotiate_drop_sync(client->ioc, length) != length) {
+                    return -EIO;
+                }
                 if (client->tlscreds) {
                     TRACE("TLS already enabled");
                     nbd_negotiate_send_rep(client->ioc, NBD_REP_ERR_INVALID,
@@ -480,7 +486,7 @@ static int nbd_negotiate_options(NBDClient *client)
                     nbd_negotiate_send_rep(client->ioc, NBD_REP_ERR_POLICY,
                                            clientflags);
                 }
-                return -EINVAL;
+                break;
             default:
                 TRACE("Unsupported option 0x%x", clientflags);
                 if (nbd_negotiate_drop_sync(client->ioc, length) != length) {
-- 
2.5.5


* [Qemu-devel] [PATCH 02/18] nbd: Don't fail handshake on NBD_OPT_LIST descriptions
  2016-04-08 22:05 [Qemu-devel] [RFC PATCH 00/18] NBD protocol additions Eric Blake
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 01/18] nbd: Don't kill server on client that doesn't request TLS Eric Blake
@ 2016-04-08 22:05 ` Eric Blake
  2016-04-09 10:30   ` Alex Bligh
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 03/18] nbd: More debug typo fixes, use correct formats Eric Blake
                   ` (16 subsequent siblings)
  18 siblings, 1 reply; 48+ messages in thread
From: Eric Blake @ 2016-04-08 22:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex, Paolo Bonzini

The NBD Protocol states that NBD_REP_SERVER may set
'length > sizeof(namelen) + namelen'; in which case the rest
of the packet is a UTF-8 description of the export.  While we
don't know of any NBD servers that send this description yet,
we had better consume the data so we don't choke when we start
to talk to such a server.

Also, a (buggy/malicious) server that replies with length <
sizeof(namelen) would cause us to block waiting for bytes that
the server is not sending, and one that replies with super-huge
lengths could cause us to temporarily allocate up to 4G memory.
Sanity check the lengths before reading, rather than trusting them blindly.
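
The length rules can be sketched on an in-memory payload as follows.
This is only an illustration under assumptions, not the patch itself:
parse_rep_server(), MAX_NAME_SIZE and MAX_PAYLOAD are hypothetical
names, and the real client reads from a QIOChannel rather than a
buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_NAME_SIZE 255u               /* assumed cap on the export name */
#define MAX_PAYLOAD (32u * 1024 * 1024)  /* assumed cap on a reply payload */

/* Read a big-endian 32-bit value from a byte buffer. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Split an NBD_REP_SERVER payload of 'len' bytes into a NUL-terminated
 * name and the length of the trailing description.  Returns 0 on success,
 * -1 if the lengths are inconsistent.
 */
static int parse_rep_server(const unsigned char *payload, uint32_t len,
                            char name[MAX_NAME_SIZE + 1], uint32_t *desc_len)
{
    uint32_t namelen;

    assert(payload && name && desc_len);
    if (len < sizeof(namelen) || len > MAX_PAYLOAD) {
        return -1;              /* too short for namelen, or absurdly big */
    }
    namelen = load_be32(payload);
    len -= sizeof(namelen);
    if (len < namelen || namelen > MAX_NAME_SIZE) {
        return -1;              /* name would run past the payload */
    }
    memcpy(name, payload + sizeof(namelen), namelen);
    name[namelen] = '\0';
    *desc_len = len - namelen;  /* whatever remains is the description */
    return 0;
}
```

A payload of 00 00 00 03 'f' 'o' 'o' 'h' 'i' yields the name "foo"
and a 2-byte description; truncating the same payload to 3 or 6 bytes
is rejected.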

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 nbd/client.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index 6777e58..48f2a21 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -192,13 +192,18 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
             return -1;
         }
     } else if (type == NBD_REP_SERVER) {
+        if (len < sizeof(namelen) || len > NBD_MAX_BUFFER_SIZE) {
+            error_setg(errp, "incorrect option length");
+            return -1;
+        }
         if (read_sync(ioc, &namelen, sizeof(namelen)) != sizeof(namelen)) {
             error_setg(errp, "failed to read option name length");
             return -1;
         }
         namelen = be32_to_cpu(namelen);
-        if (len != (namelen + sizeof(namelen))) {
-            error_setg(errp, "incorrect option mame length");
+        len -= sizeof(namelen);
+        if (len < namelen) {
+            error_setg(errp, "incorrect option name length");
             return -1;
         }
         if (namelen > 255) {
@@ -214,6 +219,20 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
             return -1;
         }
         (*name)[namelen] = '\0';
+        len -= namelen;
+        if (len) {
+            char *buf = g_malloc(len + 1);
+            if (read_sync(ioc, buf, len) != len) {
+                error_setg(errp, "failed to read export description");
+                g_free(*name);
+                g_free(buf);
+                *name = NULL;
+                return -1;
+            }
+            buf[len] = '\0';
+            TRACE("Ignoring export description: %s", buf);
+            g_free(buf);
+        }
     } else {
         error_setg(errp, "Unexpected reply type %x expected %x",
                    type, NBD_REP_SERVER);
-- 
2.5.5


* [Qemu-devel] [PATCH 03/18] nbd: More debug typo fixes, use correct formats
  2016-04-08 22:05 [Qemu-devel] [RFC PATCH 00/18] NBD protocol additions Eric Blake
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 01/18] nbd: Don't kill server on client that doesn't request TLS Eric Blake
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 02/18] nbd: Don't fail handshake on NBD_OPT_LIST descriptions Eric Blake
@ 2016-04-08 22:05 ` Eric Blake
  2016-04-09 10:30   ` Alex Bligh
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 04/18] nbd: Detect servers that send unexpected error values Eric Blake
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 48+ messages in thread
From: Eric Blake @ 2016-04-08 22:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex, Paolo Bonzini

Clean up some debug message oddities missed earlier; this includes
both typos, and recognizing that %d is not necessarily compatible
with uint32_t.
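
For anyone unfamiliar with the <inttypes.h> macros, a minimal
standalone sketch of the pattern follows.  format_option() is a
hypothetical helper for illustration only; the patch applies the same
idea to the existing TRACE(), LOG() and error_setg() calls.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a 32-bit option code and length.  PRIx32/PRIu32 expand to the
 * correct conversion for uint32_t on every platform, whereas %d assumes
 * a signed int.
 */
static int format_option(char *buf, size_t size, uint32_t opt, uint32_t len)
{
    assert(buf && size > 0);
    return snprintf(buf, size, "option 0x%" PRIx32 ", length %" PRIu32,
                    opt, len);
}
```

Values with the top bit set, such as the NBD_REP_ERR_* codes, are
where %d would otherwise print a misleading negative number.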

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 nbd/client.c | 41 ++++++++++++++++++++++-------------------
 nbd/server.c | 44 +++++++++++++++++++++++---------------------
 2 files changed, 45 insertions(+), 40 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index 48f2a21..42e4e52 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -109,25 +109,27 @@ static int nbd_handle_reply_err(QIOChannel *ioc, uint32_t opt, uint32_t type,

     switch (type) {
     case NBD_REP_ERR_UNSUP:
-        TRACE("server doesn't understand request %d, attempting fallback",
-              opt);
+        TRACE("server doesn't understand request %" PRIx32
+              ", attempting fallback", opt);
         result = 0;
         goto cleanup;

     case NBD_REP_ERR_POLICY:
-        error_setg(errp, "Denied by server for option %x", opt);
+        error_setg(errp, "Denied by server for option %" PRIx32, opt);
         break;

     case NBD_REP_ERR_INVALID:
-        error_setg(errp, "Invalid data length for option %x", opt);
+        error_setg(errp, "Invalid data length for option %" PRIx32, opt);
         break;

     case NBD_REP_ERR_TLS_REQD:
-        error_setg(errp, "TLS negotiation required before option %x", opt);
+        error_setg(errp, "TLS negotiation required before option %" PRIx32,
+                   opt);
         break;

     default:
-        error_setg(errp, "Unknown error code when asking for option %x", opt);
+        error_setg(errp, "Unknown error code when asking for option %" PRIx32,
+                   opt);
         break;
     }

@@ -165,7 +167,7 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
     }
     opt = be32_to_cpu(opt);
     if (opt != NBD_OPT_LIST) {
-        error_setg(errp, "Unexpected option type %x expected %x",
+        error_setg(errp, "Unexpected option type %" PRIx32 " expected %x",
                    opt, NBD_OPT_LIST);
         return -1;
     }
@@ -207,7 +209,7 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
             return -1;
         }
         if (namelen > 255) {
-            error_setg(errp, "export name length too long %d", namelen);
+            error_setg(errp, "export name length too long %" PRIu32, namelen);
             return -1;
         }

@@ -234,7 +236,7 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
             g_free(buf);
         }
     } else {
-        error_setg(errp, "Unexpected reply type %x expected %x",
+        error_setg(errp, "Unexpected reply type %" PRIx32 " expected %x",
                    type, NBD_REP_SERVER);
         return -1;
     }
@@ -349,7 +351,7 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
     }
     opt = be32_to_cpu(opt);
     if (opt != NBD_OPT_STARTTLS) {
-        error_setg(errp, "Unexpected option type %x expected %x",
+        error_setg(errp, "Unexpected option type %" PRIx32 " expected %x",
                    opt, NBD_OPT_STARTTLS);
         return NULL;
     }
@@ -361,7 +363,7 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
     }
     type = be32_to_cpu(type);
     if (type != NBD_REP_ACK) {
-        error_setg(errp, "Server rejected request to start TLS %x",
+        error_setg(errp, "Server rejected request to start TLS %" PRIx32,
                    type);
         return NULL;
     }
@@ -373,7 +375,7 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
     }
     length = be32_to_cpu(length);
     if (length != 0) {
-        error_setg(errp, "Start TLS reponse was not zero %x",
+        error_setg(errp, "Start TLS response was not zero %" PRIu32,
                    length);
         return NULL;
     }
@@ -384,7 +386,7 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
         return NULL;
     }
     data.loop = g_main_loop_new(g_main_context_default(), FALSE);
-    TRACE("Starting TLS hanshake");
+    TRACE("Starting TLS handshake");
     qio_channel_tls_handshake(tioc,
                               nbd_tls_handshake,
                               &data,
@@ -474,7 +476,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
         }
         globalflags = be16_to_cpu(globalflags);
         *flags = globalflags << 16;
-        TRACE("Global flags are %x", globalflags);
+        TRACE("Global flags are %" PRIx32, globalflags);
         if (globalflags & NBD_FLAG_FIXED_NEWSTYLE) {
             fixedNewStyle = true;
             TRACE("Server supports fixed new style");
@@ -550,7 +552,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
         }
         exportflags = be16_to_cpu(exportflags);
         *flags |= exportflags;
-        TRACE("Export flags are %x", exportflags);
+        TRACE("Export flags are %" PRIx16, exportflags);
     } else if (magic == NBD_CLIENT_MAGIC) {
         if (name) {
             error_setg(errp, "Server does not support export names");
@@ -683,7 +685,8 @@ ssize_t nbd_send_request(QIOChannel *ioc, struct nbd_request *request)
     ssize_t ret;

     TRACE("Sending request to server: "
-          "{ .from = %" PRIu64", .len = %u, .handle = %" PRIu64", .type=%i}",
+          "{ .from = %" PRIu64", .len = %" PRIu32 ", .handle = %" PRIu64
+          ", .type=%" PRIu16 " }",
           request->from, request->len, request->handle, request->type);

     cpu_to_be32w((uint32_t*)buf, NBD_REQUEST_MAGIC);
@@ -732,12 +735,12 @@ ssize_t nbd_receive_reply(QIOChannel *ioc, struct nbd_reply *reply)

     reply->error = nbd_errno_to_system_errno(reply->error);

-    TRACE("Got reply: "
-          "{ magic = 0x%x, .error = %d, handle = %" PRIu64" }",
+    TRACE("Got reply: { magic = 0x%" PRIx32 ", .error = %" PRId32
+          ", handle = %" PRIu64" }",
           magic, reply->error, reply->handle);

     if (magic != NBD_REPLY_MAGIC) {
-        LOG("invalid magic (got 0x%x)", magic);
+        LOG("invalid magic (got 0x%" PRIx32 ")", magic);
         return -EINVAL;
     }
     return 0;
diff --git a/nbd/server.c b/nbd/server.c
index e7e4881..81afae2 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -196,7 +196,7 @@ static int nbd_negotiate_send_rep(QIOChannel *ioc, uint32_t type, uint32_t opt)
     uint64_t magic;
     uint32_t len;

-    TRACE("Reply opt=%x type=%x", type, opt);
+    TRACE("Reply opt=%" PRIx32 " type=%" PRIx32, opt, type);

     magic = cpu_to_be64(NBD_REP_MAGIC);
     if (nbd_negotiate_write(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
@@ -226,7 +226,7 @@ static int nbd_negotiate_send_rep_list(QIOChannel *ioc, NBDExport *exp)
     uint64_t magic, name_len;
     uint32_t opt, type, len;

-    TRACE("Advertizing export name '%s'", exp->name ? exp->name : "");
+    TRACE("Advertising export name '%s'", exp->name ? exp->name : "");
     name_len = strlen(exp->name);
     magic = cpu_to_be64(NBD_REP_MAGIC);
     if (nbd_negotiate_write(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
@@ -392,12 +392,12 @@ static int nbd_negotiate_options(NBDClient *client)
     TRACE("Checking client flags");
     be32_to_cpus(&flags);
     if (flags & NBD_FLAG_C_FIXED_NEWSTYLE) {
-        TRACE("Support supports fixed newstyle handshake");
+        TRACE("Client supports fixed newstyle handshake");
         fixedNewstyle = true;
         flags &= ~NBD_FLAG_C_FIXED_NEWSTYLE;
     }
     if (flags != 0) {
-        TRACE("Unknown client flags 0x%x received", flags);
+        TRACE("Unknown client flags 0x%" PRIx32 " received", flags);
         return -EIO;
     }

@@ -431,12 +431,12 @@ static int nbd_negotiate_options(NBDClient *client)
         }
         length = be32_to_cpu(length);

-        TRACE("Checking option 0x%x", clientflags);
+        TRACE("Checking option 0x%" PRIx32, clientflags);
         if (client->tlscreds &&
             client->ioc == (QIOChannel *)client->sioc) {
             QIOChannel *tioc;
             if (!fixedNewstyle) {
-                TRACE("Unsupported option 0x%x", clientflags);
+                TRACE("Unsupported option 0x%" PRIx32, clientflags);
                 return -EINVAL;
             }
             switch (clientflags) {
@@ -450,7 +450,8 @@ static int nbd_negotiate_options(NBDClient *client)
                 break;

             default:
-                TRACE("Option 0x%x not permitted before TLS", clientflags);
+                TRACE("Option 0x%" PRIx32 " not permitted before TLS",
+                      clientflags);
                 if (nbd_negotiate_drop_sync(client->ioc, length) != length) {
                     return -EIO;
                 }
@@ -488,7 +489,7 @@ static int nbd_negotiate_options(NBDClient *client)
                 }
                 break;
             default:
-                TRACE("Unsupported option 0x%x", clientflags);
+                TRACE("Unsupported option 0x%" PRIx32, clientflags);
                 if (nbd_negotiate_drop_sync(client->ioc, length) != length) {
                     return -EIO;
                 }
@@ -506,7 +507,7 @@ static int nbd_negotiate_options(NBDClient *client)
                 return nbd_negotiate_handle_export_name(client, length);

             default:
-                TRACE("Unsupported option 0x%x", clientflags);
+                TRACE("Unsupported option 0x%" PRIx32, clientflags);
                 return -EINVAL;
             }
         }
@@ -647,12 +648,12 @@ static ssize_t nbd_receive_request(QIOChannel *ioc, struct nbd_request *request)
     request->from  = be64_to_cpup((uint64_t*)(buf + 16));
     request->len   = be32_to_cpup((uint32_t*)(buf + 24));

-    TRACE("Got request: "
-          "{ magic = 0x%x, .type = %d, from = %" PRIu64" , len = %u }",
+    TRACE("Got request: { magic = 0x%" PRIx32 ", .type = %" PRIx32
+          ", from = %" PRIu64 " , len = %" PRIu32 " }",
           magic, request->type, request->from, request->len);

     if (magic != NBD_REQUEST_MAGIC) {
-        LOG("invalid magic (got 0x%x)", magic);
+        LOG("invalid magic (got 0x%" PRIx32 ")", magic);
         return -EINVAL;
     }
     return 0;
@@ -665,7 +666,8 @@ static ssize_t nbd_send_reply(QIOChannel *ioc, struct nbd_reply *reply)

     reply->error = system_errno_to_nbd_errno(reply->error);

-    TRACE("Sending response to client: { .error = %d, handle = %" PRIu64 " }",
+    TRACE("Sending response to client: { .error = %" PRId32
+          ", handle = %" PRIu64 " }",
           reply->error, reply->handle);

     /* Reply
@@ -994,7 +996,7 @@ static ssize_t nbd_co_receive_request(NBDRequest *req, struct nbd_request *reque
     command = request->type & NBD_CMD_MASK_COMMAND;
     if (command == NBD_CMD_READ || command == NBD_CMD_WRITE) {
         if (request->len > NBD_MAX_BUFFER_SIZE) {
-            LOG("len (%u) is larger than max len (%u)",
+            LOG("len (%" PRIu32 ") is larger than max len (%u)",
                 request->len, NBD_MAX_BUFFER_SIZE);
             rc = -EINVAL;
             goto out;
@@ -1007,7 +1009,7 @@ static ssize_t nbd_co_receive_request(NBDRequest *req, struct nbd_request *reque
         }
     }
     if (command == NBD_CMD_WRITE) {
-        TRACE("Reading %u byte(s)", request->len);
+        TRACE("Reading %" PRIu32 " byte(s)", request->len);

         if (read_sync(client->ioc, req->data, request->len) != request->len) {
             LOG("reading from socket failed");
@@ -1057,10 +1059,10 @@ static void nbd_trip(void *opaque)
     }
     command = request.type & NBD_CMD_MASK_COMMAND;
     if (command != NBD_CMD_DISC && (request.from + request.len) > exp->size) {
-            LOG("From: %" PRIu64 ", Len: %u, Size: %" PRIu64
-            ", Offset: %" PRIu64 "\n",
-                    request.from, request.len,
-                    (uint64_t)exp->size, (uint64_t)exp->dev_offset);
+            LOG("From: %" PRIu64 ", Len: %" PRIu32", Size: %" PRIu64
+                ", Offset: %" PRIu64 "\n",
+                request.from, request.len,
+                (uint64_t)exp->size, (uint64_t)exp->dev_offset);
         LOG("requested operation past EOF--bad client?");
         goto invalid_request;
     }
@@ -1095,7 +1097,7 @@ static void nbd_trip(void *opaque)
             goto error_reply;
         }

-        TRACE("Read %u byte(s)", request.len);
+        TRACE("Read %" PRIu32" byte(s)", request.len);
         if (nbd_co_send_reply(req, &reply, request.len) < 0)
             goto out;
         break;
@@ -1162,7 +1164,7 @@ static void nbd_trip(void *opaque)
         }
         break;
     default:
-        LOG("invalid request type (%u) received", request.type);
+        LOG("invalid request type (%" PRIu32 ") received", request.type);
     invalid_request:
         reply.error = EINVAL;
     error_reply:
-- 
2.5.5


* [Qemu-devel] [PATCH 04/18] nbd: Detect servers that send unexpected error values
  2016-04-08 22:05 [Qemu-devel] [RFC PATCH 00/18] NBD protocol additions Eric Blake
                   ` (2 preceding siblings ...)
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 03/18] nbd: More debug typo fixes, use correct formats Eric Blake
@ 2016-04-08 22:05 ` Eric Blake
  2016-04-09 10:31   ` Alex Bligh
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 05/18] nbd: Reject unknown request flags Eric Blake
                   ` (14 subsequent siblings)
  18 siblings, 1 reply; 48+ messages in thread
From: Eric Blake @ 2016-04-08 22:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex, Paolo Bonzini

Add some debugging to flag servers that are not compliant with
the NBD protocol.
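
The shape of the mapping, as a standalone sketch: the WIRE_* names
and the 'unexpected' out-parameter are hypothetical (the patch emits
a TRACE() instead), and the numeric values are the ones listed in the
NBD protocol document.

```c
#include <assert.h>
#include <errno.h>

/* Error values as sent on the wire, per the NBD protocol document. */
enum {
    WIRE_EPERM  = 1,
    WIRE_EIO    = 5,
    WIRE_ENOMEM = 12,
    WIRE_EINVAL = 22,
    WIRE_ENOSPC = 28,
};

/*
 * Map a wire error to a host errno.  Unknown values are still squashed
 * to EINVAL, but *unexpected is set so the caller can log the event.
 */
static int wire_errno_to_host(int err, int *unexpected)
{
    assert(unexpected);
    *unexpected = 0;
    switch (err) {
    case 0:
        return 0;
    case WIRE_EPERM:
        return EPERM;
    case WIRE_EIO:
        return EIO;
    case WIRE_ENOMEM:
        return ENOMEM;
    case WIRE_ENOSPC:
        return ENOSPC;
    default:
        *unexpected = 1;
        /* fallthrough */
    case WIRE_EINVAL:
        return EINVAL;
    }
}
```

Placing 'default' ahead of the EINVAL case lets the unknown values
share the return statement while still being distinguishable.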

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 nbd/client.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/nbd/client.c b/nbd/client.c
index 42e4e52..c834587 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -33,8 +33,10 @@ static int nbd_errno_to_system_errno(int err)
         return ENOMEM;
     case NBD_ENOSPC:
         return ENOSPC;
+    default:
+        TRACE("Squashing unexpected error %d to EINVAL", err);
+        /* fallthrough */
     case NBD_EINVAL:
-    default:
         return EINVAL;
     }
 }
-- 
2.5.5


* [Qemu-devel] [PATCH 05/18] nbd: Reject unknown request flags
  2016-04-08 22:05 [Qemu-devel] [RFC PATCH 00/18] NBD protocol additions Eric Blake
                   ` (3 preceding siblings ...)
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 04/18] nbd: Detect servers that send unexpected error values Eric Blake
@ 2016-04-08 22:05 ` Eric Blake
  2016-04-09 10:32   ` Alex Bligh
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 06/18] nbd: Avoid magic number for NBD max name size Eric Blake
                   ` (13 subsequent siblings)
  18 siblings, 1 reply; 48+ messages in thread
From: Eric Blake @ 2016-04-08 22:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex, Paolo Bonzini

The NBD protocol says that clients should not send a command flag
that has not been negotiated (whether by the client requesting an
option during a handshake, or because we advertise support for the
flag in response to NBD_OPT_EXPORT_NAME), and that servers should
reject invalid flags with EINVAL.  We were silently ignoring the
flags instead.  The client can't rely on our behavior, since it is
their fault for passing the bad flag in the first place, but it's
better to be robust up front than to possibly behave differently
than the client was expecting with the attempted flag.
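
The check itself is a pair of masks.  A standalone sketch, using the
32-bit type layout that nbd.h has at this point in the series
(check_request_flags() and the unprefixed macro names are
hypothetical):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define CMD_MASK_COMMAND 0x0000ffffu  /* low 16 bits carry the command */
#define CMD_FLAG_FUA     (1u << 16)   /* the only flag understood so far */

/* Return 0 if every flag bit in 'type' is known, -EINVAL otherwise. */
static int check_request_flags(uint32_t type)
{
    uint32_t unknown = type & ~CMD_MASK_COMMAND & ~CMD_FLAG_FUA;

    /* A flag must never overlap the command bits. */
    assert((CMD_MASK_COMMAND & CMD_FLAG_FUA) == 0);
    return unknown ? -EINVAL : 0;
}
```

A write with FUA set passes; the same write with bit 17 set is
rejected rather than silently accepted.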

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 nbd/server.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/nbd/server.c b/nbd/server.c
index 81afae2..a10294e 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -984,6 +984,11 @@ static ssize_t nbd_co_receive_request(NBDRequest *req, struct nbd_request *reque
         goto out;
     }

+    if (request->type & ~NBD_CMD_MASK_COMMAND & ~NBD_CMD_FLAG_FUA) {
+        LOG("unsupported flags (got 0x%x)",
+            request->type & ~NBD_CMD_MASK_COMMAND);
+        return -EINVAL;
+    }
     if ((request->from + request->len) < request->from) {
         LOG("integer overflow detected! "
             "you're probably being attacked");
-- 
2.5.5


* [Qemu-devel] [PATCH 06/18] nbd: Avoid magic number for NBD max name size
  2016-04-08 22:05 [Qemu-devel] [RFC PATCH 00/18] NBD protocol additions Eric Blake
                   ` (4 preceding siblings ...)
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 05/18] nbd: Reject unknown request flags Eric Blake
@ 2016-04-08 22:05 ` Eric Blake
  2016-04-09 10:35   ` Alex Bligh
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 07/18] nbd: Treat flags vs. command type as separate fields Eric Blake
                   ` (12 subsequent siblings)
  18 siblings, 1 reply; 48+ messages in thread
From: Eric Blake @ 2016-04-08 22:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex, Paolo Bonzini, Kevin Wolf, open list:Block layer core

Declare a constant and use that when determining if an export
name fits within the constraints we are willing to support.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 include/block/nbd.h | 2 ++
 nbd/client.c        | 2 +-
 nbd/server.c        | 4 ++--
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index b86a976..3f047bf 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -76,6 +76,8 @@ enum {

 /* Maximum size of a single READ/WRITE data buffer */
 #define NBD_MAX_BUFFER_SIZE (32 * 1024 * 1024)
+/* Maximum size of an export name */
+#define NBD_MAX_NAME_SIZE 255

 ssize_t nbd_wr_syncv(QIOChannel *ioc,
                      struct iovec *iov,
diff --git a/nbd/client.c b/nbd/client.c
index c834587..00f9244 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -210,7 +210,7 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
             error_setg(errp, "incorrect option name length");
             return -1;
         }
-        if (namelen > 255) {
+        if (namelen > NBD_MAX_NAME_SIZE) {
             error_setg(errp, "export name length too long %" PRIu32, namelen);
             return -1;
         }
diff --git a/nbd/server.c b/nbd/server.c
index a10294e..5414c49 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -285,13 +285,13 @@ static int nbd_negotiate_handle_list(NBDClient *client, uint32_t length)
 static int nbd_negotiate_handle_export_name(NBDClient *client, uint32_t length)
 {
     int rc = -EINVAL;
-    char name[256];
+    char name[NBD_MAX_NAME_SIZE + 1];

     /* Client sends:
         [20 ..  xx]   export name (length bytes)
      */
     TRACE("Checking length");
-    if (length > 255) {
+    if (length >= sizeof(name)) {
         LOG("Bad length received");
         goto fail;
     }
-- 
2.5.5


* [Qemu-devel] [PATCH 07/18] nbd: Treat flags vs. command type as separate fields
  2016-04-08 22:05 [Qemu-devel] [RFC PATCH 00/18] NBD protocol additions Eric Blake
                   ` (5 preceding siblings ...)
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 06/18] nbd: Avoid magic number for NBD max name size Eric Blake
@ 2016-04-08 22:05 ` Eric Blake
  2016-04-09 10:37   ` Alex Bligh
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 08/18] nbd: Limit nbdflags to 16 bits Eric Blake
                   ` (11 subsequent siblings)
  18 siblings, 1 reply; 48+ messages in thread
From: Eric Blake @ 2016-04-08 22:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex, Paolo Bonzini, Kevin Wolf, open list:Block layer core

Current upstream NBD documents that requests have a 16-bit flags
field followed by a 16-bit type integer, although older versions
mentioned only a 32-bit field with masking to find flags.  Since the protocol
is in network order (big-endian over the wire), the ABI is unchanged;
but dealing with the flags as a separate field rather than masking
will make it easier to add support for upcoming NBD extensions that
increase the number of both flags and commands.
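
To see why the wire format is unchanged, compare the bytes produced by
both views.  This is a standalone sketch; encode_old(), encode_new()
and the store_be*() helpers are hypothetical and not part of the
patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store a 16-bit value in network (big-endian) byte order. */
static void store_be16(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)(v >> 8);
    p[1] = (unsigned char)(v & 0xff);
}

/* Store a 32-bit value in network (big-endian) byte order. */
static void store_be32(unsigned char *p, uint32_t v)
{
    store_be16(p, (uint16_t)(v >> 16));
    store_be16(p + 2, (uint16_t)(v & 0xffff));
}

/* Old view: one 32-bit field, flags live in the upper 16 bits. */
static void encode_old(unsigned char out[4], uint32_t type)
{
    assert(out);
    store_be32(out, type);
}

/* New view: a 16-bit flags field followed by a 16-bit type field. */
static void encode_new(unsigned char out[4], uint16_t flags, uint16_t type)
{
    assert(out);
    store_be16(out, flags);
    store_be16(out + 2, type);
}
```

A write (type 1) with FUA is 00 01 00 01 either way: the old
(1 << 16) | 1 and the new flags = 1 << 0, type = 1 serialize to the
same four bytes.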

Improve some comments in nbd.h based on the current upstream
NBD protocol (https://github.com/yoe/nbd/blob/master/doc/proto.md),
and touch some nearby code to keep checkpatch.pl happy.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 include/block/nbd.h | 18 ++++++++++++------
 nbd/nbd-internal.h  |  4 ++--
 block/nbd-client.c  |  9 +++------
 nbd/client.c        | 17 ++++++++++-------
 nbd/server.c        | 35 +++++++++++++++++++----------------
 5 files changed, 46 insertions(+), 37 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 3f047bf..2c61901 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -1,4 +1,5 @@
 /*
+ *  Copyright (C) 2016 Red Hat, Inc.
  *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
  *
  *  Network Block Device
@@ -27,7 +28,8 @@

 struct nbd_request {
     uint32_t magic;
-    uint32_t type;
+    uint16_t flags;
+    uint16_t type;
     uint64_t handle;
     uint64_t from;
     uint32_t len;
@@ -39,6 +41,8 @@ struct nbd_reply {
     uint64_t handle;
 } QEMU_PACKED;

+/* Transmission (export) flags: sent from server to client during handshake,
+   but describe what will happen during transmission */
 #define NBD_FLAG_HAS_FLAGS      (1 << 0)        /* Flags are there */
 #define NBD_FLAG_READ_ONLY      (1 << 1)        /* Device is read-only */
 #define NBD_FLAG_SEND_FLUSH     (1 << 2)        /* Send FLUSH */
@@ -46,10 +50,12 @@ struct nbd_reply {
 #define NBD_FLAG_ROTATIONAL     (1 << 4)        /* Use elevator algorithm - rotational media */
 #define NBD_FLAG_SEND_TRIM      (1 << 5)        /* Send TRIM (discard) */

-/* New-style global flags. */
+/* New-style handshake (global) flags, sent from server to client, and
+   control what will happen during handshake phase. */
 #define NBD_FLAG_FIXED_NEWSTYLE     (1 << 0)    /* Fixed newstyle protocol. */

-/* New-style client flags. */
+/* New-style client flags, sent from client to server to control what happens
+   during handshake phase. */
 #define NBD_FLAG_C_FIXED_NEWSTYLE   (1 << 0)    /* Fixed newstyle protocol. */

 /* Reply types. */
@@ -60,10 +66,10 @@ struct nbd_reply {
 #define NBD_REP_ERR_INVALID     ((UINT32_C(1) << 31) | 3) /* Invalid length. */
 #define NBD_REP_ERR_TLS_REQD    ((UINT32_C(1) << 31) | 5) /* TLS required */

+/* Request flags, sent from client to server during transmission phase */
+#define NBD_CMD_FLAG_FUA        (1 << 0)

-#define NBD_CMD_MASK_COMMAND	0x0000ffff
-#define NBD_CMD_FLAG_FUA	(1 << 16)
-
+/* Supported request types */
 enum {
     NBD_CMD_READ = 0,
     NBD_CMD_WRITE = 1,
diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index 3791535..b663bf3 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -52,10 +52,10 @@
 /* This is all part of the "official" NBD API.
  *
  * The most up-to-date documentation is available at:
- * https://github.com/yoe/nbd/blob/master/doc/proto.txt
+ * https://github.com/yoe/nbd/blob/master/doc/proto.md
  */

-#define NBD_REQUEST_SIZE        (4 + 4 + 8 + 8 + 4)
+#define NBD_REQUEST_SIZE        (4 + 2 + 2 + 8 + 8 + 4)
 #define NBD_REPLY_SIZE          (4 + 4 + 8)
 #define NBD_REQUEST_MAGIC       0x25609513
 #define NBD_REPLY_MAGIC         0x67446698
diff --git a/block/nbd-client.c b/block/nbd-client.c
index 878e879..285025d 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -1,6 +1,7 @@
 /*
  * QEMU Block driver for  NBD
  *
+ * Copyright (C) 2016 Red Hat, Inc.
  * Copyright (C) 2008 Bull S.A.S.
  *     Author: Laurent Vivier <Laurent.Vivier@bull.net>
  *
@@ -252,7 +253,7 @@ static int nbd_co_writev_1(BlockDriverState *bs, int64_t sector_num,

     if ((*flags & BDRV_REQ_FUA) && (client->nbdflags & NBD_FLAG_SEND_FUA)) {
         *flags &= ~BDRV_REQ_FUA;
-        request.type |= NBD_CMD_FLAG_FUA;
+        request.flags |= NBD_CMD_FLAG_FUA;
     }

     request.from = sector_num * 512;
@@ -376,11 +377,7 @@ void nbd_client_attach_aio_context(BlockDriverState *bs,
 void nbd_client_close(BlockDriverState *bs)
 {
     NbdClientSession *client = nbd_get_client_session(bs);
-    struct nbd_request request = {
-        .type = NBD_CMD_DISC,
-        .from = 0,
-        .len = 0
-    };
+    struct nbd_request request = { .type = NBD_CMD_DISC };

     if (client->ioc == NULL) {
         return;
diff --git a/nbd/client.c b/nbd/client.c
index 00f9244..7fd6059 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -1,4 +1,5 @@
 /*
+ *  Copyright (C) 2016 Red Hat, Inc.
  *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
  *
  *  Network Block Device Client Side
@@ -688,14 +689,16 @@ ssize_t nbd_send_request(QIOChannel *ioc, struct nbd_request *request)

     TRACE("Sending request to server: "
           "{ .from = %" PRIu64", .len = %" PRIu32 ", .handle = %" PRIu64
-          ", .type=%" PRIu16 " }",
-          request->from, request->len, request->handle, request->type);
+          ", .flags=%" PRIx16 ", .type=%" PRIu16 " }",
+          request->from, request->len, request->handle,
+          request->flags, request->type);

-    cpu_to_be32w((uint32_t*)buf, NBD_REQUEST_MAGIC);
-    cpu_to_be32w((uint32_t*)(buf + 4), request->type);
-    cpu_to_be64w((uint64_t*)(buf + 8), request->handle);
-    cpu_to_be64w((uint64_t*)(buf + 16), request->from);
-    cpu_to_be32w((uint32_t*)(buf + 24), request->len);
+    cpu_to_be32w((uint32_t *)buf, NBD_REQUEST_MAGIC);
+    cpu_to_be16w((uint16_t *)(buf + 4), request->flags);
+    cpu_to_be16w((uint16_t *)(buf + 6), request->type);
+    cpu_to_be64w((uint64_t *)(buf + 8), request->handle);
+    cpu_to_be64w((uint64_t *)(buf + 16), request->from);
+    cpu_to_be32w((uint32_t *)(buf + 24), request->len);

     ret = write_sync(ioc, buf, sizeof(buf));
     if (ret < 0) {
diff --git a/nbd/server.c b/nbd/server.c
index 5414c49..93c077e 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1,4 +1,5 @@
 /*
+ *  Copyright (C) 2016 Red Hat, Inc.
  *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
  *
  *  Network Block Device Server Side
@@ -636,21 +637,23 @@ static ssize_t nbd_receive_request(QIOChannel *ioc, struct nbd_request *request)

     /* Request
        [ 0 ..  3]   magic   (NBD_REQUEST_MAGIC)
-       [ 4 ..  7]   type    (0 == READ, 1 == WRITE)
+       [ 4 ..  5]   flags   (NBD_CMD_FLAG_FUA, ...)
+       [ 6 ..  7]   type    (NBD_CMD_READ, ...)
        [ 8 .. 15]   handle
        [16 .. 23]   from
        [24 .. 27]   len
      */

-    magic = be32_to_cpup((uint32_t*)buf);
-    request->type  = be32_to_cpup((uint32_t*)(buf + 4));
-    request->handle = be64_to_cpup((uint64_t*)(buf + 8));
-    request->from  = be64_to_cpup((uint64_t*)(buf + 16));
-    request->len   = be32_to_cpup((uint32_t*)(buf + 24));
+    magic = be32_to_cpup((uint32_t *)buf);
+    request->flags = be16_to_cpup((uint16_t *)(buf + 4));
+    request->type  = be16_to_cpup((uint16_t *)(buf + 6));
+    request->handle = be64_to_cpup((uint64_t *)(buf + 8));
+    request->from  = be64_to_cpup((uint64_t *)(buf + 16));
+    request->len   = be32_to_cpup((uint32_t *)(buf + 24));

-    TRACE("Got request: { magic = 0x%" PRIx32 ", .type = %" PRIx32
-          ", from = %" PRIu64 " , len = %" PRIu32 " }",
-          magic, request->type, request->from, request->len);
+    TRACE("Got request: { magic = 0x%" PRIx32 ", .flags = %" PRIx16
+          ", .type = %" PRIx16 ", from = %" PRIu64 " , len = %" PRIu32 " }",
+          magic, request->flags, request->type, request->from, request->len);

     if (magic != NBD_REQUEST_MAGIC) {
         LOG("invalid magic (got 0x%" PRIx32 ")", magic);
@@ -984,9 +987,8 @@ static ssize_t nbd_co_receive_request(NBDRequest *req, struct nbd_request *reque
         goto out;
     }

-    if (request->type & ~NBD_CMD_MASK_COMMAND & ~NBD_CMD_FLAG_FUA) {
-        LOG("unsupported flags (got 0x%x)",
-            request->type & ~NBD_CMD_MASK_COMMAND);
+    if (request->flags & ~NBD_CMD_FLAG_FUA) {
+        LOG("unsupported flags (got 0x%x)", request->flags);
         return -EINVAL;
     }
     if ((request->from + request->len) < request->from) {
@@ -998,7 +1000,7 @@ static ssize_t nbd_co_receive_request(NBDRequest *req, struct nbd_request *reque

     TRACE("Decoding type");

-    command = request->type & NBD_CMD_MASK_COMMAND;
+    command = request->type;
     if (command == NBD_CMD_READ || command == NBD_CMD_WRITE) {
         if (request->len > NBD_MAX_BUFFER_SIZE) {
             LOG("len (%" PRIu32" ) is larger than max len (%u)",
@@ -1062,7 +1064,7 @@ static void nbd_trip(void *opaque)
         reply.error = -ret;
         goto error_reply;
     }
-    command = request.type & NBD_CMD_MASK_COMMAND;
+    command = request.type;
     if (command != NBD_CMD_DISC && (request.from + request.len) > exp->size) {
             LOG("From: %" PRIu64 ", Len: %" PRIu32", Size: %" PRIu64
                 ", Offset: %" PRIu64 "\n",
@@ -1084,7 +1086,8 @@ static void nbd_trip(void *opaque)
     case NBD_CMD_READ:
         TRACE("Request type is READ");

-        if (request.type & NBD_CMD_FLAG_FUA) {
+        /* XXX: NBD Protocol only documents use of FUA with WRITE */
+        if (request.flags & NBD_CMD_FLAG_FUA) {
             ret = blk_co_flush(exp->blk);
             if (ret < 0) {
                 LOG("flush failed");
@@ -1126,7 +1129,7 @@ static void nbd_trip(void *opaque)
             goto error_reply;
         }

-        if (request.type & NBD_CMD_FLAG_FUA) {
+        if (request.flags & NBD_CMD_FLAG_FUA) {
             ret = blk_co_flush(exp->blk);
             if (ret < 0) {
                 LOG("flush failed");
-- 
2.5.5


* [Qemu-devel] [PATCH 08/18] nbd: Limit nbdflags to 16 bits
  2016-04-08 22:05 [Qemu-devel] [RFC PATCH 00/18] NBD protocol additions Eric Blake
                   ` (6 preceding siblings ...)
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 07/18] nbd: Treat flags vs. command type as separate fields Eric Blake
@ 2016-04-08 22:05 ` Eric Blake
  2016-04-09 10:37   ` Alex Bligh
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 09/18] nbd: Share common reply-sending code in server Eric Blake
                   ` (10 subsequent siblings)
  18 siblings, 1 reply; 48+ messages in thread
From: Eric Blake @ 2016-04-08 22:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex, Paolo Bonzini, Kevin Wolf, open list:Block layer core

Rather than asserting that nbdflags is within range, just give
it the correct type to begin with :)  nbdflags corresponds to
the per-export portion of NBD Protocol "transmission flags", which
is 16 bits in response to NBD_OPT_EXPORT_NAME and NBD_OPT_GO.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 include/block/nbd.h |  2 +-
 nbd/server.c        | 10 ++++------
 qemu-nbd.c          |  2 +-
 3 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 2c61901..42fd670 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -105,7 +105,7 @@ typedef struct NBDExport NBDExport;
 typedef struct NBDClient NBDClient;

 NBDExport *nbd_export_new(BlockBackend *blk, off_t dev_offset, off_t size,
-                          uint32_t nbdflags, void (*close)(NBDExport *),
+                          uint16_t nbdflags, void (*close)(NBDExport *),
                           Error **errp);
 void nbd_export_close(NBDExport *exp);
 void nbd_export_get(NBDExport *exp);
diff --git a/nbd/server.c b/nbd/server.c
index 93c077e..c8666ab 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -63,7 +63,7 @@ struct NBDExport {
     char *name;
     off_t dev_offset;
     off_t size;
-    uint32_t nbdflags;
+    uint16_t nbdflags;
     QTAILQ_HEAD(, NBDClient) clients;
     QTAILQ_ENTRY(NBDExport) next;

@@ -525,8 +525,8 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
     NBDClient *client = data->client;
     char buf[8 + 8 + 8 + 128];
     int rc;
-    const int myflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
-                         NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA);
+    const uint16_t myflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
+                              NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA);
     bool oldStyle;

     /* Old style negotiation header without options
@@ -556,7 +556,6 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)

     oldStyle = client->exp != NULL && !client->tlscreds;
     if (oldStyle) {
-        assert ((client->exp->nbdflags & ~65535) == 0);
         stq_be_p(buf + 8, NBD_CLIENT_MAGIC);
         stq_be_p(buf + 16, client->exp->size);
         stw_be_p(buf + 26, client->exp->nbdflags | myflags);
@@ -585,7 +584,6 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
             goto fail;
         }

-        assert ((client->exp->nbdflags & ~65535) == 0);
         stq_be_p(buf + 18, client->exp->size);
         stw_be_p(buf + 26, client->exp->nbdflags | myflags);
         if (nbd_negotiate_write(client->ioc, buf + 18, sizeof(buf) - 18) !=
@@ -807,7 +805,7 @@ static void nbd_eject_notifier(Notifier *n, void *data)
 }

 NBDExport *nbd_export_new(BlockBackend *blk, off_t dev_offset, off_t size,
-                          uint32_t nbdflags, void (*close)(NBDExport *),
+                          uint16_t nbdflags, void (*close)(NBDExport *),
                           Error **errp)
 {
     NBDExport *exp = g_malloc0(sizeof(NBDExport));
diff --git a/qemu-nbd.c b/qemu-nbd.c
index c2e4d3f..8880ac3 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -454,7 +454,7 @@ int main(int argc, char **argv)
     BlockBackend *blk;
     BlockDriverState *bs;
     off_t dev_offset = 0;
-    uint32_t nbdflags = 0;
+    uint16_t nbdflags = 0;
     bool disconnect = false;
     const char *bindto = "0.0.0.0";
     const char *port = NULL;
-- 
2.5.5


* [Qemu-devel] [PATCH 09/18] nbd: Share common reply-sending code in server
  2016-04-08 22:05 [Qemu-devel] [RFC PATCH 00/18] NBD protocol additions Eric Blake
                   ` (7 preceding siblings ...)
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 08/18] nbd: Limit nbdflags to 16 bits Eric Blake
@ 2016-04-08 22:05 ` Eric Blake
  2016-04-09 10:38   ` Alex Bligh
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 10/18] nbd: Share common option-sending code in client Eric Blake
                   ` (9 subsequent siblings)
  18 siblings, 1 reply; 48+ messages in thread
From: Eric Blake @ 2016-04-08 22:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex, Paolo Bonzini

Rather than open-coding NBD_REP_SERVER, reuse the code we
already have by adding a length parameter.  The code gets
longer because of added comments, but the refactoring will
make adding NBD_OPT_GO in a later patch easier.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
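For readers who want to see the resulting wire layout in one place,
here is a standalone sketch that builds an NBD_REP_SERVER reply into
a buffer: a 20-byte header (magic, option, reply type, length)
followed by a 32-bit name length and the name itself.  It assumes
NBD_OPT_LIST is 3 and NBD_REP_SERVER is 2, as in the NBD protocol
document.  The function names and buffer-based interface are
hypothetical, and QEMU itself writes to a QIOChannel instead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define REP_MAGIC   UINT64_C(0x0003e889045565a9) /* NBD_REP_MAGIC */
#define OPT_LIST    3                            /* NBD_OPT_LIST */
#define REP_SERVER  2                            /* NBD_REP_SERVER */
#define REP_HDR_LEN (8 + 4 + 4 + 4)

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

static void put_be64(uint8_t *p, uint64_t v)
{
    put_be32(p, (uint32_t)(v >> 32));
    put_be32(p + 4, (uint32_t)v);
}

/* Header only: magic, option, reply type, payload length. */
static size_t build_rep_header(uint8_t *buf, uint32_t opt, uint32_t type,
                               uint32_t len)
{
    assert(buf != NULL);
    put_be64(buf, REP_MAGIC);
    put_be32(buf + 8, opt);
    put_be32(buf + 12, type);
    put_be32(buf + 16, len);
    return REP_HDR_LEN;
}

/* Header plus payload (32-bit name length, then the name bytes).
 * Returns the total number of bytes written, or 0 if buf is too small. */
static size_t build_rep_server(uint8_t *buf, size_t cap, const char *name)
{
    size_t name_len = strlen(name);
    size_t off;

    if (name_len > UINT32_MAX - 4 || cap < REP_HDR_LEN + 4 + name_len) {
        return 0;
    }
    off = build_rep_header(buf, OPT_LIST, REP_SERVER,
                           (uint32_t)(4 + name_len));
    put_be32(buf + off, (uint32_t)name_len);
    memcpy(buf + off + 4, name, name_len);
    return off + 4 + name_len;
}
```

The header length field covers the whole payload, so for an export
named "foo" it is 7 (4 bytes of name length plus 3 bytes of name),
which mirrors the "len + sizeof(len)" argument in the patch.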
 nbd/server.c | 59 +++++++++++++++++++++++++++++------------------------------
 1 file changed, 29 insertions(+), 30 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index c8666ab..69724c9 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -192,12 +192,15 @@ static ssize_t nbd_negotiate_drop_sync(QIOChannel *ioc, size_t size)

 */

-static int nbd_negotiate_send_rep(QIOChannel *ioc, uint32_t type, uint32_t opt)
+/* Send a reply header, including length, but no payload.
+ * Return -errno to kill connection, 0 to continue negotiation */
+static int nbd_negotiate_send_rep_len(QIOChannel *ioc, uint32_t type,
+                                      uint32_t opt, uint32_t len)
 {
     uint64_t magic;
-    uint32_t len;

-    TRACE("Reply opt=%" PRIx32 " type=%" PRIx32, type, opt);
+    TRACE("Reply opt=%" PRIx32 " type=%" PRIx32 " len=%" PRIu32,
+          opt, type, len);

     magic = cpu_to_be64(NBD_REP_MAGIC);
     if (nbd_negotiate_write(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
@@ -214,7 +217,7 @@ static int nbd_negotiate_send_rep(QIOChannel *ioc, uint32_t type, uint32_t opt)
         LOG("write failed (rep type)");
         return -EINVAL;
     }
-    len = cpu_to_be32(0);
+    len = cpu_to_be32(len);
     if (nbd_negotiate_write(ioc, &len, sizeof(len)) != sizeof(len)) {
         LOG("write failed (rep data length)");
         return -EINVAL;
@@ -222,39 +225,35 @@ static int nbd_negotiate_send_rep(QIOChannel *ioc, uint32_t type, uint32_t opt)
     return 0;
 }

+/* Send a reply header with default 0 length.
+ * Return -errno to kill connection, 0 to continue negotiation */
+static int nbd_negotiate_send_rep(QIOChannel *ioc, uint32_t type, uint32_t opt)
+{
+    return nbd_negotiate_send_rep_len(ioc, type, opt, 0);
+}
+
+/* Send an NBD_REP_SERVER reply to NBD_OPT_LIST, including payload.
+ * Return -errno to kill connection, 0 to continue negotiation */
 static int nbd_negotiate_send_rep_list(QIOChannel *ioc, NBDExport *exp)
 {
-    uint64_t magic, name_len;
-    uint32_t opt, type, len;
+    uint32_t len;
+    int rc;

     TRACE("Advertising export name '%s'", exp->name ? exp->name : "");
-    name_len = strlen(exp->name);
-    magic = cpu_to_be64(NBD_REP_MAGIC);
-    if (nbd_negotiate_write(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
-        LOG("write failed (magic)");
-        return -EINVAL;
-     }
-    opt = cpu_to_be32(NBD_OPT_LIST);
-    if (nbd_negotiate_write(ioc, &opt, sizeof(opt)) != sizeof(opt)) {
-        LOG("write failed (opt)");
-        return -EINVAL;
-    }
-    type = cpu_to_be32(NBD_REP_SERVER);
-    if (nbd_negotiate_write(ioc, &type, sizeof(type)) != sizeof(type)) {
-        LOG("write failed (reply type)");
-        return -EINVAL;
-    }
-    len = cpu_to_be32(name_len + sizeof(len));
-    if (nbd_negotiate_write(ioc, &len, sizeof(len)) != sizeof(len)) {
-        LOG("write failed (length)");
-        return -EINVAL;
-    }
-    len = cpu_to_be32(name_len);
+    len = strlen(exp->name);
+    rc = nbd_negotiate_send_rep_len(ioc, NBD_REP_SERVER, NBD_OPT_LIST,
+                                    len + sizeof(len));
+    if (rc < 0) {
+        return rc;
+    }
+
+    len = cpu_to_be32(len);
     if (nbd_negotiate_write(ioc, &len, sizeof(len)) != sizeof(len)) {
-        LOG("write failed (length)");
+        LOG("write failed (name length)");
         return -EINVAL;
     }
-    if (nbd_negotiate_write(ioc, exp->name, name_len) != name_len) {
+    len = be32_to_cpu(len);
+    if (nbd_negotiate_write(ioc, exp->name, len) != len) {
         LOG("write failed (buffer)");
         return -EINVAL;
     }
-- 
2.5.5


* [Qemu-devel] [PATCH 10/18] nbd: Share common option-sending code in client
  2016-04-08 22:05 [Qemu-devel] [RFC PATCH 00/18] NBD protocol additions Eric Blake
                   ` (8 preceding siblings ...)
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 09/18] nbd: Share common reply-sending code in server Eric Blake
@ 2016-04-08 22:05 ` Eric Blake
  2016-04-09 10:38   ` Alex Bligh
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 11/18] nbd: Let client skip portions of server reply Eric Blake
                   ` (8 subsequent siblings)
  18 siblings, 1 reply; 48+ messages in thread
From: Eric Blake @ 2016-04-08 22:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex, Paolo Bonzini, Kevin Wolf, open list:Block layer core

Rather than open-coding each option request, it's easier to
have common helper functions do the work.  That in turn requires
having convenient packed types for handling option requests
and replies.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
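As an illustration of the packed-struct approach, the sketch below
fills a 16-byte option request header in network byte order.  It
uses the GCC/Clang packed attribute as a stand-in for QEMU_PACKED and
a byte-wise store in place of stq_be_p()/stl_be_p().  The struct and
function names are hypothetical and not part of QEMU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same field layout as struct nbd_option in the patch. */
struct opt_header {
    uint64_t magic;   /* NBD_OPTS_MAGIC */
    uint32_t option;  /* NBD_OPT_* */
    uint32_t length;  /* payload bytes that follow the header */
} __attribute__((packed));

/* Compile-time size check, similar in spirit to QEMU_BUILD_BUG_ON(). */
_Static_assert(sizeof(struct opt_header) == 16, "header must be 16 bytes");

/* Store the low n bytes of v at dst in big-endian order. */
static void store_be(uint8_t *dst, uint64_t v, size_t n)
{
    size_t i;

    assert(n <= 8);
    for (i = 0; i < n; i++) {
        dst[n - 1 - i] = (uint8_t)(v >> (8 * i));
    }
}

/* Fill the header so that its in-memory bytes can be written as-is. */
static void fill_opt_header(struct opt_header *h, uint32_t opt, uint32_t len)
{
    uint8_t *p = (uint8_t *)h;

    store_be(p + offsetof(struct opt_header, magic),
             UINT64_C(0x49484156454F5054), 8);
    store_be(p + offsetof(struct opt_header, option), opt, 4);
    store_be(p + offsetof(struct opt_header, length), len, 4);
}
```

After fill_opt_header(), the first eight bytes of the struct spell
"IHAVEOPT", and the struct can be handed to a single write call
rather than three separate ones.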
 include/block/nbd.h |  29 +++++-
 nbd/nbd-internal.h  |   2 +-
 nbd/client.c        | 250 ++++++++++++++++++++++------------------------------
 3 files changed, 129 insertions(+), 152 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 42fd670..155196e 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -26,20 +26,41 @@
 #include "io/channel-socket.h"
 #include "crypto/tlscreds.h"

+/* Handshake phase structs */
+
+struct nbd_option {
+    uint64_t magic; /* NBD_OPTS_MAGIC */
+    uint32_t option; /* NBD_OPT_* */
+    uint32_t length;
+} QEMU_PACKED;
+typedef struct nbd_option nbd_option;
+
+struct nbd_opt_reply {
+    uint64_t magic; /* NBD_REP_MAGIC */
+    uint32_t option; /* NBD_OPT_* */
+    uint32_t type; /* NBD_REP_* */
+    uint32_t length;
+} QEMU_PACKED;
+typedef struct nbd_opt_reply nbd_opt_reply;
+
+/* Transmission phase structs */
+
 struct nbd_request {
-    uint32_t magic;
-    uint16_t flags;
-    uint16_t type;
+    uint32_t magic; /* NBD_REQUEST_MAGIC */
+    uint16_t flags; /* NBD_CMD_FLAG_* */
+    uint16_t type; /* NBD_CMD_* */
     uint64_t handle;
     uint64_t from;
     uint32_t len;
 } QEMU_PACKED;
+typedef struct nbd_request nbd_request;

 struct nbd_reply {
-    uint32_t magic;
+    uint32_t magic; /* NBD_REPLY_MAGIC */
     uint32_t error;
     uint64_t handle;
 } QEMU_PACKED;
+typedef struct nbd_reply nbd_reply;

 /* Transmission (export) flags: sent from server to client during handshake,
    but describe what will happen during transmission */
diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index b663bf3..b78d249 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -61,7 +61,7 @@
 #define NBD_REPLY_MAGIC         0x67446698
 #define NBD_OPTS_MAGIC          0x49484156454F5054LL
 #define NBD_CLIENT_MAGIC        0x0000420281861253LL
-#define NBD_REP_MAGIC           0x3e889045565a9LL
+#define NBD_REP_MAGIC           0x0003e889045565a9LL

 #define NBD_SET_SOCK            _IO(0xab, 0)
 #define NBD_SET_BLKSIZE         _IO(0xab, 1)
diff --git a/nbd/client.c b/nbd/client.c
index 7fd6059..07b8d2e 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -75,64 +75,123 @@ static QTAILQ_HEAD(, NBDExport) exports = QTAILQ_HEAD_INITIALIZER(exports);

 */

+/* Send an option request. Return 0 if successful, -1 with errp set if
+ * it is impossible to continue. */
+static int nbd_send_option_request(QIOChannel *ioc, uint32_t opt,
+                                   uint32_t len, const char *data,
+                                   Error **errp)
+{
+    nbd_option req;
+    QEMU_BUILD_BUG_ON(sizeof(req) != 16);

-/* If type represents success, return 1 without further action.
- * If type represents an error reply, consume the rest of the packet on ioc.
- * Then return 0 for unsupported (so the client can fall back to
- * other approaches), or -1 with errp set for other errors.
+    if (len == -1) {
+        req.length = len = strlen(data);
+    }
+    TRACE("Sending option request %"PRIu32", len %"PRIu32, opt, len);
+
+    stq_be_p(&req.magic, NBD_OPTS_MAGIC);
+    stl_be_p(&req.option, opt);
+    stl_be_p(&req.length, len);
+
+    if (write_sync(ioc, &req, sizeof(req)) != sizeof(req)) {
+        error_setg(errp, "Failed to send option request header");
+        return -1;
+    }
+
+    if (len && write_sync(ioc, (char *) data, len) != len) {
+        error_setg(errp, "Failed to send option request data");
+        return -1;
+    }
+
+    return 0;
+}
+
+/* Receive the header of an option reply, which should match the given
+ * opt.  Read through the length field, but NOT the length bytes of
+ * payload. Return 0 if successful, -1 with errp set if it is
+ * impossible to continue. */
+static int nbd_receive_option_reply(QIOChannel *ioc, uint32_t opt,
+                                    nbd_opt_reply *reply, Error **errp)
+{
+    QEMU_BUILD_BUG_ON(sizeof(*reply) != 20);
+    if (read_sync(ioc, reply, sizeof(*reply)) != sizeof(*reply)) {
+        error_setg(errp, "failed to read option reply");
+        return -1;
+    }
+    be64_to_cpus(&reply->magic);
+    be32_to_cpus(&reply->option);
+    be32_to_cpus(&reply->type);
+    be32_to_cpus(&reply->length);
+
+    TRACE("Received option reply %"PRIu32", type %"PRIu32", len %"PRIu32,
+          reply->option, reply->type, reply->length);
+
+    if (reply->magic != NBD_REP_MAGIC) {
+        error_setg(errp, "Unexpected option reply magic");
+        return -1;
+    }
+    if (reply->option != opt) {
+        error_setg(errp, "Unexpected option type %x expected %x",
+                   reply->option, opt);
+        return -1;
+    }
+    return 0;
+}
+
+/* If reply represents success, return 1 without further action.
+ * If reply represents an error, consume the optional payload of
+ * the packet on ioc.  Then return 0 for unsupported (so the client
+ * can fall back to other approaches), or -1 with errp set for other
+ * errors.
  */
-static int nbd_handle_reply_err(QIOChannel *ioc, uint32_t opt, uint32_t type,
+static int nbd_handle_reply_err(QIOChannel *ioc, nbd_opt_reply *reply,
                                 Error **errp)
 {
-    uint32_t len;
     char *msg = NULL;
     int result = -1;

-    if (!(type & (1 << 31))) {
+    if (!(reply->type & (1 << 31))) {
         return 1;
     }

-    if (read_sync(ioc, &len, sizeof(len)) != sizeof(len)) {
-        error_setg(errp, "failed to read option length");
-        return -1;
-    }
-    len = be32_to_cpu(len);
-    if (len) {
-        if (len > NBD_MAX_BUFFER_SIZE) {
+    if (reply->length) {
+        if (reply->length > NBD_MAX_BUFFER_SIZE) {
             error_setg(errp, "server's error message is too long");
             goto cleanup;
         }
-        msg = g_malloc(len + 1);
-        if (read_sync(ioc, msg, len) != len) {
+        msg = g_malloc(reply->length + 1);
+        if (read_sync(ioc, msg, reply->length) != reply->length) {
             error_setg(errp, "failed to read option error message");
             goto cleanup;
         }
-        msg[len] = '\0';
+        msg[reply->length] = '\0';
     }

-    switch (type) {
+    switch (reply->type) {
     case NBD_REP_ERR_UNSUP:
         TRACE("server doesn't understand request %" PRIx32
-              ", attempting fallback", opt);
+              ", attempting fallback", reply->option);
         result = 0;
         goto cleanup;

     case NBD_REP_ERR_POLICY:
-        error_setg(errp, "Denied by server for option %" PRIx32, opt);
+        error_setg(errp, "Denied by server for option %" PRIx32,
+                   reply->option);
         break;

     case NBD_REP_ERR_INVALID:
-        error_setg(errp, "Invalid data length for option %" PRIx32, opt);
+        error_setg(errp, "Invalid data length for option %" PRIx32,
+                   reply->option);
         break;

     case NBD_REP_ERR_TLS_REQD:
         error_setg(errp, "TLS negotiation required before option %" PRIx32,
-                   opt);
+                   reply->option);
         break;

     default:
         error_setg(errp, "Unknown error code when asking for option %" PRIx32,
-                   opt);
+                   reply->option);
         break;
     }

@@ -147,58 +206,29 @@ static int nbd_handle_reply_err(QIOChannel *ioc, uint32_t opt, uint32_t type,

 static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
 {
-    uint64_t magic;
-    uint32_t opt;
-    uint32_t type;
+    nbd_opt_reply reply;
     uint32_t len;
     uint32_t namelen;
     int error;

     *name = NULL;
-    if (read_sync(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
-        error_setg(errp, "failed to read list option magic");
+    if (nbd_receive_option_reply(ioc, NBD_OPT_LIST, &reply, errp) < 0) {
         return -1;
     }
-    magic = be64_to_cpu(magic);
-    if (magic != NBD_REP_MAGIC) {
-        error_setg(errp, "Unexpected option list magic");
-        return -1;
-    }
-    if (read_sync(ioc, &opt, sizeof(opt)) != sizeof(opt)) {
-        error_setg(errp, "failed to read list option");
-        return -1;
-    }
-    opt = be32_to_cpu(opt);
-    if (opt != NBD_OPT_LIST) {
-        error_setg(errp, "Unexpected option type %" PRIx32 " expected %x",
-                   opt, NBD_OPT_LIST);
-        return -1;
-    }
-
-    if (read_sync(ioc, &type, sizeof(type)) != sizeof(type)) {
-        error_setg(errp, "failed to read list option type");
-        return -1;
-    }
-    type = be32_to_cpu(type);
-    error = nbd_handle_reply_err(ioc, opt, type, errp);
+    error = nbd_handle_reply_err(ioc, &reply, errp);
     if (error <= 0) {
         return error;
     }
+    len = reply.length;

-    if (read_sync(ioc, &len, sizeof(len)) != sizeof(len)) {
-        error_setg(errp, "failed to read option length");
-        return -1;
-    }
-    len = be32_to_cpu(len);
-
-    if (type == NBD_REP_ACK) {
+    if (reply.type == NBD_REP_ACK) {
         if (len != 0) {
             error_setg(errp, "length too long for option end");
             return -1;
         }
-    } else if (type == NBD_REP_SERVER) {
+    } else if (reply.type == NBD_REP_SERVER) {
         if (len < sizeof(namelen) || len > NBD_MAX_BUFFER_SIZE) {
-            error_setg(errp, "incorrect option length");
+            error_setg(errp, "incorrect option length %"PRIu32, len);
             return -1;
         }
         if (read_sync(ioc, &namelen, sizeof(namelen)) != sizeof(namelen)) {
@@ -240,7 +270,7 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
         }
     } else {
         error_setg(errp, "Unexpected reply type %" PRIx32 " expected %x",
-                   type, NBD_REP_SERVER);
+                   reply.type, NBD_REP_SERVER);
         return -1;
     }
     return 1;
@@ -251,24 +281,10 @@ static int nbd_receive_query_exports(QIOChannel *ioc,
                                      const char *wantname,
                                      Error **errp)
 {
-    uint64_t magic = cpu_to_be64(NBD_OPTS_MAGIC);
-    uint32_t opt = cpu_to_be32(NBD_OPT_LIST);
-    uint32_t length = 0;
     bool foundExport = false;

     TRACE("Querying export list");
-    if (write_sync(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
-        error_setg(errp, "Failed to send list option magic");
-        return -1;
-    }
-
-    if (write_sync(ioc, &opt, sizeof(opt)) != sizeof(opt)) {
-        error_setg(errp, "Failed to send list option number");
-        return -1;
-    }
-
-    if (write_sync(ioc, &length, sizeof(length)) != sizeof(length)) {
-        error_setg(errp, "Failed to send list option length");
+    if (nbd_send_option_request(ioc, NBD_OPT_LIST, 0, NULL, errp) < 0) {
         return -1;
     }

@@ -314,72 +330,29 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
                                         QCryptoTLSCreds *tlscreds,
                                         const char *hostname, Error **errp)
 {
-    uint64_t magic = cpu_to_be64(NBD_OPTS_MAGIC);
-    uint32_t opt = cpu_to_be32(NBD_OPT_STARTTLS);
-    uint32_t length = 0;
-    uint32_t type;
+    nbd_opt_reply reply;
     QIOChannelTLS *tioc;
     struct NBDTLSHandshakeData data = { 0 };

     TRACE("Requesting TLS from server");
-    if (write_sync(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
-        error_setg(errp, "Failed to send option magic");
-        return NULL;
-    }
-
-    if (write_sync(ioc, &opt, sizeof(opt)) != sizeof(opt)) {
-        error_setg(errp, "Failed to send option number");
-        return NULL;
-    }
-
-    if (write_sync(ioc, &length, sizeof(length)) != sizeof(length)) {
-        error_setg(errp, "Failed to send option length");
-        return NULL;
-    }
-
-    TRACE("Getting TLS reply from server1");
-    if (read_sync(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
-        error_setg(errp, "failed to read option magic");
-        return NULL;
-    }
-    magic = be64_to_cpu(magic);
-    if (magic != NBD_REP_MAGIC) {
-        error_setg(errp, "Unexpected option magic");
-        return NULL;
-    }
-    TRACE("Getting TLS reply from server2");
-    if (read_sync(ioc, &opt, sizeof(opt)) != sizeof(opt)) {
-        error_setg(errp, "failed to read option");
-        return NULL;
-    }
-    opt = be32_to_cpu(opt);
-    if (opt != NBD_OPT_STARTTLS) {
-        error_setg(errp, "Unexpected option type %" PRIx32 " expected %x",
-                   opt, NBD_OPT_STARTTLS);
+    if (nbd_send_option_request(ioc, NBD_OPT_STARTTLS, 0, NULL, errp) < 0) {
         return NULL;
     }

     TRACE("Getting TLS reply from server");
-    if (read_sync(ioc, &type, sizeof(type)) != sizeof(type)) {
-        error_setg(errp, "failed to read option type");
+    if (nbd_receive_option_reply(ioc, NBD_OPT_STARTTLS, &reply, errp) < 0) {
         return NULL;
     }
-    type = be32_to_cpu(type);
-    if (type != NBD_REP_ACK) {
+
+    if (reply.type != NBD_REP_ACK) {
         error_setg(errp, "Server rejected request to start TLS %" PRIx32,
-                   type);
+                   reply.type);
         return NULL;
     }

-    TRACE("Getting TLS reply from server");
-    if (read_sync(ioc, &length, sizeof(length)) != sizeof(length)) {
-        error_setg(errp, "failed to read option length");
-        return NULL;
-    }
-    length = be32_to_cpu(length);
-    if (length != 0) {
+    if (reply.length != 0) {
         error_setg(errp, "Start TLS response was not zero %" PRIu32,
-                   length);
+                   reply.length);
         return NULL;
     }

@@ -466,8 +439,6 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,

     if (magic == NBD_OPTS_MAGIC) {
         uint32_t clientflags = 0;
-        uint32_t opt;
-        uint32_t namesize;
         uint16_t globalflags;
         uint16_t exportflags;
         bool fixedNewStyle = false;
@@ -519,28 +490,13 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
                 goto fail;
             }
         }
-        /* write the export name */
-        magic = cpu_to_be64(magic);
-        if (write_sync(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
-            error_setg(errp, "Failed to send export name magic");
-            goto fail;
-        }
-        opt = cpu_to_be32(NBD_OPT_EXPORT_NAME);
-        if (write_sync(ioc, &opt, sizeof(opt)) != sizeof(opt)) {
-            error_setg(errp, "Failed to send export name option number");
-            goto fail;
-        }
-        namesize = cpu_to_be32(strlen(name));
-        if (write_sync(ioc, &namesize, sizeof(namesize)) !=
-            sizeof(namesize)) {
-            error_setg(errp, "Failed to send export name length");
-            goto fail;
-        }
-        if (write_sync(ioc, (char *)name, strlen(name)) != strlen(name)) {
-            error_setg(errp, "Failed to send export name");
+        /* write the export name request */
+        if (nbd_send_option_request(ioc, NBD_OPT_EXPORT_NAME, -1, name,
+                                    errp) < 0) {
             goto fail;
         }

+        /* Read the response */
         if (read_sync(ioc, &s, sizeof(s)) != sizeof(s)) {
             error_setg(errp, "Failed to read export length");
             goto fail;
-- 
2.5.5


* [Qemu-devel] [PATCH 11/18] nbd: Let client skip portions of server reply
  2016-04-08 22:05 [Qemu-devel] [RFC PATCH 00/18] NBD protocol additions Eric Blake
                   ` (9 preceding siblings ...)
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 10/18] nbd: Share common option-sending code in client Eric Blake
@ 2016-04-08 22:05 ` Eric Blake
  2016-04-09 10:39   ` Alex Bligh
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 12/18] nbd: Less allocation during NBD_OPT_LIST Eric Blake
                   ` (7 subsequent siblings)
  18 siblings, 1 reply; 48+ messages in thread
From: Eric Blake @ 2016-04-08 22:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex, Paolo Bonzini

The server has a nice helper function nbd_negotiate_drop_sync()
which lets it easily ignore fluff from the client (such as the
payload to an unknown option request).  We can't quite make it
common, since it depends on nbd_negotiate_read() which handles
coroutine magic, but we can copy the idea into the client where
we have places where we want to ignore data (such as the
description tacked on the end of NBD_REP_SERVER).
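
As a rough illustration of the idea (not part of this patch), discarding
a known number of bytes with a small fixed scratch buffer might look like
the sketch below. The names mem_source, mem_read() and drop_bytes() are
hypothetical stand-ins for QIOChannel, read_sync() and drop_sync(), and
the in-memory source exists only so the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical in-memory stand-in for a channel. */
struct mem_source {
    const unsigned char *data;
    size_t len;
    size_t pos;
};

/* Hypothetical stand-in for a synchronous read: copies up to 'want'
 * bytes and returns the number copied (0 once the source is empty). */
static ssize_t mem_read(struct mem_source *src, void *buf, size_t want)
{
    size_t avail = src->len - src->pos;
    size_t n = want < avail ? want : avail;

    memcpy(buf, src->data + src->pos, n);
    src->pos += n;
    return (ssize_t)n;
}

/* Discard 'size' bytes.  Returns the number of bytes discarded, or -1
 * if the source runs dry first. */
static ssize_t drop_bytes(struct mem_source *src, size_t size)
{
    unsigned char scratch[64];
    size_t remaining = size;

    while (remaining > 0) {
        /* Never ask for more than the scratch buffer can hold. */
        size_t chunk = remaining < sizeof(scratch) ? remaining
                                                   : sizeof(scratch);
        ssize_t ret = mem_read(src, scratch, chunk);

        if (ret <= 0) {
            return -1;
        }
        assert((size_t)ret <= remaining);
        remaining -= (size_t)ret;
    }
    return (ssize_t)size;
}
```

The point of the sketch is that each read is clamped to the size of the
buffer actually in use, so the amount to skip never dictates how much
memory is needed.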

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 nbd/client.c | 45 ++++++++++++++++++++++++++++++++-------------
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index 07b8d2e..b2dfc11 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -75,6 +75,32 @@ static QTAILQ_HEAD(, NBDExport) exports = QTAILQ_HEAD_INITIALIZER(exports);

 */

+/* Discard size bytes from channel.  Return -errno on failure, or
+ * the number of bytes consumed. */
+static ssize_t drop_sync(QIOChannel *ioc, size_t size)
+{
+    ssize_t ret, dropped = size;
+    char small[1024];
+    char *buffer;
+
+    buffer = size <= sizeof(small) ? small : g_malloc(MIN(65536, size));
+    while (size > 0) {
+        ret = read_sync(ioc, buffer, MIN(65536, size));
+        if (ret < 0) {
+            goto cleanup;
+        }
+        assert(ret <= size);
+        size -= ret;
+    }
+    ret = dropped;
+
+ cleanup:
+    if (buffer != small) {
+        g_free(buffer);
+    }
+    return ret;
+}
+
 /* Send an option request. Return 0 if successful, -1 with errp set if
  * it is impossible to continue. */
 static int nbd_send_option_request(QIOChannel *ioc, uint32_t opt,
@@ -255,18 +281,11 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
         }
         (*name)[namelen] = '\0';
         len -= namelen;
-        if (len) {
-            char *buf = g_malloc(len + 1);
-            if (read_sync(ioc, buf, len) != len) {
-                error_setg(errp, "failed to read export description");
-                g_free(*name);
-                g_free(buf);
-                *name = NULL;
-                return -1;
-            }
-            buf[len] = '\0';
-            TRACE("Ignoring export description: %s", buf);
-            g_free(buf);
+        if (drop_sync(ioc, len) != len) {
+            error_setg(errp, "failed to read export description");
+            g_free(*name);
+            *name = NULL;
+            return -1;
         }
     } else {
         error_setg(errp, "Unexpected reply type %" PRIx32 " expected %x",
@@ -539,7 +558,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
         goto fail;
     }

-    if (read_sync(ioc, &buf, 124) != 124) {
+    if (drop_sync(ioc, 124) != 124) {
         error_setg(errp, "Failed to read reserved block");
         goto fail;
     }
-- 
2.5.5


* [Qemu-devel] [PATCH 12/18] nbd: Less allocation during NBD_OPT_LIST
  2016-04-08 22:05 [Qemu-devel] [RFC PATCH 00/18] NBD protocol additions Eric Blake
                   ` (10 preceding siblings ...)
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 11/18] nbd: Let client skip portions of server reply Eric Blake
@ 2016-04-08 22:05 ` Eric Blake
  2016-04-09 10:41   ` Alex Bligh
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 13/18] nbd: Support shorter handshake Eric Blake
                   ` (6 subsequent siblings)
  18 siblings, 1 reply; 48+ messages in thread
From: Eric Blake @ 2016-04-08 22:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex, Paolo Bonzini

Since we know that the maximum name we are willing to accept
is small enough to stack-allocate, rework the iteration over
NBD_OPT_LIST responses to reuse a stack buffer rather than
allocating every time.  Furthermore, we don't even have to
allocate if we know the server's length doesn't match what
we are searching for.

Not fixed here: Upstream NBD Protocol recently added this
clarification:
https://github.com/yoe/nbd/blob/18918eb/doc/proto.md#conventions

 Where this document refers to a string, then unless otherwise
 stated, that string is a sequence of UTF-8 code points, which
 is not NUL terminated, MUST NOT contain NUL characters, SHOULD
 be no longer than 256 bytes and MUST be no longer than 4096
 bytes. This applies to export names and error messages (amongst
 others).

To be fully compliant with that, we need to bump our export name
limit from 255 to at least 256, and need to decide whether we
can bump it higher (bumping it all the way to 4096 is annoying
in that we could no longer safely stack-allocate a worst-case
string, so we may still want to take the leeway offered by SHOULD
to force a reasonable smaller limit).
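
For illustration only (not part of this patch), the length-first
comparison can be sketched against an in-memory NBD_REP_SERVER payload:
a 4-byte big-endian name length, the name, then an optional description.
The function name match_export() and the SKETCH_MAX_NAME limit are
assumptions made for the example; the real code reads from a QIOChannel
rather than a buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_NAME 255 /* assumed limit, matching the 255 above */

/* Returns -1 if the payload is malformed, 0 if this entry is not the
 * wanted export, and 1 if it matches. */
static int match_export(const unsigned char *payload, size_t len,
                        const char *want)
{
    char name[SKETCH_MAX_NAME + 1]; /* stack buffer, no allocation */
    uint32_t namelen;
    size_t wantlen = strlen(want);

    if (len < 4) {
        return -1;
    }
    namelen = ((uint32_t)payload[0] << 24) | ((uint32_t)payload[1] << 16) |
              ((uint32_t)payload[2] << 8) | (uint32_t)payload[3];
    if (namelen > len - 4) {
        return -1;
    }
    /* A length mismatch can never be a match, so skip the entry
     * without copying the name at all. */
    if (namelen != wantlen || namelen > SKETCH_MAX_NAME) {
        return 0;
    }
    memcpy(name, payload + 4, namelen);
    name[namelen] = '\0';
    return strcmp(name, want) == 0;
}
```

Anything after the name (the description) is simply ignored here, which
corresponds to the bytes the client would drop.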

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 nbd/client.c | 130 +++++++++++++++++++++++++++++------------------------------
 1 file changed, 65 insertions(+), 65 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index b2dfc11..d4e37d5 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -230,14 +230,17 @@ static int nbd_handle_reply_err(QIOChannel *ioc, nbd_opt_reply *reply,
     return result;
 }

-static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
+/* Return -1 if unrecoverable error occurs, 0 if NBD_OPT_LIST is
+ * unsupported, 1 if iteration is done, 2 to keep looking, and 3 if
+ * this entry matches want. */
+static int nbd_receive_list(QIOChannel *ioc, const char *want, Error **errp)
 {
     nbd_opt_reply reply;
     uint32_t len;
     uint32_t namelen;
+    char name[NBD_MAX_NAME_SIZE + 1];
     int error;

-    *name = NULL;
     if (nbd_receive_option_reply(ioc, NBD_OPT_LIST, &reply, errp) < 0) {
         return -1;
     }
@@ -252,97 +255,94 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
             error_setg(errp, "length too long for option end");
             return -1;
         }
-    } else if (reply.type == NBD_REP_SERVER) {
-        if (len < sizeof(namelen) || len > NBD_MAX_BUFFER_SIZE) {
-            error_setg(errp, "incorrect option length %"PRIu32, len);
-            return -1;
-        }
-        if (read_sync(ioc, &namelen, sizeof(namelen)) != sizeof(namelen)) {
-            error_setg(errp, "failed to read option name length");
-            return -1;
-        }
-        namelen = be32_to_cpu(namelen);
-        len -= sizeof(namelen);
-        if (len < namelen) {
-            error_setg(errp, "incorrect option name length");
-            return -1;
-        }
-        if (namelen > NBD_MAX_NAME_SIZE) {
-            error_setg(errp, "export name length too long %" PRIu32, namelen);
-            return -1;
-        }
-
-        *name = g_new0(char, namelen + 1);
-        if (read_sync(ioc, *name, namelen) != namelen) {
-            error_setg(errp, "failed to read export name");
-            g_free(*name);
-            *name = NULL;
-            return -1;
-        }
-        (*name)[namelen] = '\0';
-        len -= namelen;
-        if (drop_sync(ioc, len) != len) {
-            error_setg(errp, "failed to read export description");
-            g_free(*name);
-            *name = NULL;
-            return -1;
-        }
-    } else {
+        return 1;
+    } else if (reply.type != NBD_REP_SERVER) {
         error_setg(errp, "Unexpected reply type %" PRIx32 " expected %x",
                    reply.type, NBD_REP_SERVER);
         return -1;
     }
-    return 1;
+
+    if (len < sizeof(namelen) || len > NBD_MAX_BUFFER_SIZE) {
+        error_setg(errp, "incorrect option length %"PRIu32, len);
+        return -1;
+    }
+    if (read_sync(ioc, &namelen, sizeof(namelen)) != sizeof(namelen)) {
+        error_setg(errp, "failed to read option name length");
+        return -1;
+    }
+    namelen = be32_to_cpu(namelen);
+    len -= sizeof(namelen);
+    if (len < namelen) {
+        error_setg(errp, "incorrect option name length");
+        return -1;
+    }
+    if (namelen != strlen(want)) {
+        if (drop_sync(ioc, len) != len) {
+            error_setg(errp, "failed to skip export name with wrong length");
+            return -1;
+        }
+        return 2;
+    }
+
+    assert(namelen < sizeof(name));
+    if (read_sync(ioc, name, namelen) != namelen) {
+        error_setg(errp, "failed to read export name");
+        return -1;
+    }
+    name[namelen] = '\0';
+    len -= namelen;
+    if (drop_sync(ioc, len) != len) {
+        error_setg(errp, "failed to read export description");
+        return -1;
+    }
+    return strcmp(name, want) == 0 ? 3 : 2;
 }


+/* Return -1 on failure, 0 if wantname is an available export. */
 static int nbd_receive_query_exports(QIOChannel *ioc,
                                      const char *wantname,
                                      Error **errp)
 {
     bool foundExport = false;

-    TRACE("Querying export list");
+    TRACE("Querying export list for '%s'", wantname);
     if (nbd_send_option_request(ioc, NBD_OPT_LIST, 0, NULL, errp) < 0) {
         return -1;
     }

     TRACE("Reading available export names");
     while (1) {
-        char *name = NULL;
-        int ret = nbd_receive_list(ioc, &name, errp);
+        int ret = nbd_receive_list(ioc, wantname, errp);

-        if (ret < 0) {
-            g_free(name);
-            name = NULL;
+        switch (ret) {
+        default:
+            /* Server gave unexpected reply */
+            assert(ret < 0);
             return -1;
-        }
-        if (ret == 0) {
+        case 0:
             /* Server doesn't support export listing, so
              * we will just assume an export with our
              * wanted name exists */
-            foundExport = true;
-            break;
-        }
-        if (name == NULL) {
-            TRACE("End of export name list");
+            return 0;
+        case 1:
+            /* Done iterating. */
+            if (!foundExport) {
+                error_setg(errp, "No export with name '%s' available",
+                           wantname);
+                return -1;
+            }
+            return 0;
+        case 2:
+            /* Wasn't this one, keep going. */
             break;
-        }
-        if (g_str_equal(name, wantname)) {
+        case 3:
+            /* Found a match, but must finish parsing reply. */
+            TRACE("Found desired export name '%s'", wantname);
             foundExport = true;
-            TRACE("Found desired export name '%s'", name);
-        } else {
-            TRACE("Ignored export name '%s'", name);
+            break;
         }
-        g_free(name);
-    }
-
-    if (!foundExport) {
-        error_setg(errp, "No export with name '%s' available", wantname);
-        return -1;
     }
-
-    return 0;
 }

 static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
-- 
2.5.5


* [Qemu-devel] [PATCH 13/18] nbd: Support shorter handshake
  2016-04-08 22:05 [Qemu-devel] [RFC PATCH 00/18] NBD protocol additions Eric Blake
                   ` (11 preceding siblings ...)
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 12/18] nbd: Less allocation during NBD_OPT_LIST Eric Blake
@ 2016-04-08 22:05 ` Eric Blake
  2016-04-09 10:42   ` Alex Bligh
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 14/18] nbd: Implement NBD_OPT_GO on client Eric Blake
                   ` (5 subsequent siblings)
  18 siblings, 1 reply; 48+ messages in thread
From: Eric Blake @ 2016-04-08 22:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex, Paolo Bonzini, Kevin Wolf, open list:Block layer core

The NBD Protocol allows the server and client to mutually agree
on a shorter handshake (omit the 124 bytes of reserved 0), via
the server advertising NBD_FLAG_NO_ZEROES and the client
acknowledging with NBD_FLAG_C_NO_ZEROES (only possible in
newstyle, whether or not it is fixed newstyle).  It doesn't
shave much off the wire, but we might as well implement it.
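
A minimal sketch of the agreement, for illustration only: the flag
values are the ones this patch adds to include/block/nbd.h, while the
helper names sketch_client_flags() and sketch_export_reply_len() are
hypothetical. The client echoes the flag only when the server offered
it, and the tail of the export reply shrinks from 8 + 2 + 124 bytes to
8 + 2 bytes once the client has acknowledged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values as defined in this patch. */
#define NBD_FLAG_NO_ZEROES   (1 << 1) /* server -> client */
#define NBD_FLAG_C_NO_ZEROES (1 << 1) /* client -> server */

/* Client side: acknowledge the shorter handshake only if the server
 * advertised it in its global flags. */
static uint32_t sketch_client_flags(uint16_t globalflags)
{
    uint32_t clientflags = 0;

    if (globalflags & NBD_FLAG_NO_ZEROES) {
        clientflags |= NBD_FLAG_C_NO_ZEROES;
    }
    return clientflags;
}

/* Server side: size (8) + export flags (2), plus the 124 reserved
 * zero bytes unless the client acknowledged NO_ZEROES. */
static size_t sketch_export_reply_len(uint32_t clientflags)
{
    bool no_zeroes = (clientflags & NBD_FLAG_C_NO_ZEROES) != 0;

    return 8 + 2 + (no_zeroes ? 0 : 124);
}
```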

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 include/block/nbd.h |  6 ++++--
 nbd/client.c        |  8 +++++++-
 nbd/server.c        | 15 +++++++++++----
 3 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 155196e..35c0ea3 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -73,11 +73,13 @@ typedef struct nbd_reply nbd_reply;

 /* New-style handshake (global) flags, sent from server to client, and
    control what will happen during handshake phase. */
-#define NBD_FLAG_FIXED_NEWSTYLE     (1 << 0)    /* Fixed newstyle protocol. */
+#define NBD_FLAG_FIXED_NEWSTYLE   (1 << 0) /* Fixed newstyle protocol. */
+#define NBD_FLAG_NO_ZEROES        (1 << 1) /* End handshake without zeroes. */

 /* New-style client flags, sent from client to server to control what happens
    during handshake phase. */
-#define NBD_FLAG_C_FIXED_NEWSTYLE   (1 << 0)    /* Fixed newstyle protocol. */
+#define NBD_FLAG_C_FIXED_NEWSTYLE (1 << 0) /* Fixed newstyle protocol. */
+#define NBD_FLAG_C_NO_ZEROES      (1 << 1) /* End handshake without zeroes. */

 /* Reply types. */
 #define NBD_REP_ACK             (1)             /* Data sending finished. */
diff --git a/nbd/client.c b/nbd/client.c
index d4e37d5..507ddc1 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -409,6 +409,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
     char buf[256];
     uint64_t magic, s;
     int rc;
+    bool zeroes = true;

     TRACE("Receiving negotiation tlscreds=%p hostname=%s.",
           tlscreds, hostname ? hostname : "<null>");
@@ -475,6 +476,11 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
             TRACE("Server supports fixed new style");
             clientflags |= NBD_FLAG_C_FIXED_NEWSTYLE;
         }
+        if (globalflags & NBD_FLAG_NO_ZEROES) {
+            zeroes = false;
+            TRACE("Server supports no zeroes");
+            clientflags |= NBD_FLAG_C_NO_ZEROES;
+        }
         /* client requested flags */
         clientflags = cpu_to_be32(clientflags);
         if (write_sync(ioc, &clientflags, sizeof(clientflags)) !=
@@ -558,7 +564,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
         goto fail;
     }

-    if (drop_sync(ioc, 124) != 124) {
+    if (zeroes && drop_sync(ioc, 124) != 124) {
         error_setg(errp, "Failed to read reserved block");
         goto fail;
     }
diff --git a/nbd/server.c b/nbd/server.c
index 69724c9..379df8c 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -78,6 +78,7 @@ struct NBDClient {
     int refcount;
     void (*close)(NBDClient *client);

+    bool no_zeroes;
     NBDExport *exp;
     QCryptoTLSCreds *tlscreds;
     char *tlsaclname;
@@ -396,6 +397,11 @@ static int nbd_negotiate_options(NBDClient *client)
         fixedNewstyle = true;
         flags &= ~NBD_FLAG_C_FIXED_NEWSTYLE;
     }
+    if (flags & NBD_FLAG_C_NO_ZEROES) {
+        TRACE("Client supports no zeroes at handshake end");
+        client->no_zeroes = true;
+        flags &= ~NBD_FLAG_C_NO_ZEROES;
+    }
     if (flags != 0) {
         TRACE("Unknown client flags 0x%" PRIx32 " received", flags);
         return -EIO;
@@ -527,6 +533,7 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
     const uint16_t myflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
                               NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA);
     bool oldStyle;
+    size_t len;

     /* Old style negotiation header without options
         [ 0 ..   7]   passwd       ("NBDMAGIC")
@@ -543,7 +550,7 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
         ....options sent....
         [18 ..  25]   size
         [26 ..  27]   export flags
-        [28 .. 151]   reserved     (0)
+        [28 .. 151]   reserved     (0, omit if no_zeroes)
      */

     qio_channel_set_blocking(client->ioc, false, NULL);
@@ -560,7 +567,7 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
         stw_be_p(buf + 26, client->exp->nbdflags | myflags);
     } else {
         stq_be_p(buf + 8, NBD_OPTS_MAGIC);
-        stw_be_p(buf + 16, NBD_FLAG_FIXED_NEWSTYLE);
+        stw_be_p(buf + 16, NBD_FLAG_FIXED_NEWSTYLE | NBD_FLAG_NO_ZEROES);
     }

     if (oldStyle) {
@@ -585,8 +592,8 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)

         stq_be_p(buf + 18, client->exp->size);
         stw_be_p(buf + 26, client->exp->nbdflags | myflags);
-        if (nbd_negotiate_write(client->ioc, buf + 18, sizeof(buf) - 18) !=
-            sizeof(buf) - 18) {
+        len = client->no_zeroes ? 10 : sizeof(buf) - 18;
+        if (nbd_negotiate_write(client->ioc, buf + 18, len) != len) {
             LOG("write failed");
             goto fail;
         }
-- 
2.5.5


* [Qemu-devel] [PATCH 14/18] nbd: Implement NBD_OPT_GO on client
  2016-04-08 22:05 [Qemu-devel] [RFC PATCH 00/18] NBD protocol additions Eric Blake
                   ` (12 preceding siblings ...)
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 13/18] nbd: Support shorter handshake Eric Blake
@ 2016-04-08 22:05 ` Eric Blake
  2016-04-09 10:47   ` Alex Bligh
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 15/18] nbd: Implement NBD_OPT_GO on server Eric Blake
                   ` (4 subsequent siblings)
  18 siblings, 1 reply; 48+ messages in thread
From: Eric Blake @ 2016-04-08 22:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex, Paolo Bonzini, Kevin Wolf, open list:Block layer core

NBD_OPT_EXPORT_NAME is lousy: it doesn't have any sane error
reporting.  Upstream NBD recently added NBD_OPT_GO as the
improved version of the option that does what we want: it
reports sane errors on failures (including when a server
requires TLS but does not implement NBD_OPT_GO!), and on success
it concludes with the same data as NBD_OPT_EXPORT_NAME sends.
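
For illustration only, the success payload that the client in this
patch expects (a 4-byte big-endian name length, the name, then the same
8-byte size and 2-byte flags that NBD_OPT_EXPORT_NAME would produce) can
be validated as in the sketch below. This mirrors the layout used in
this RFC series, which tracks an experimental protocol draft and may not
match later revisions of the NBD specification. The helper name
sketch_parse_go_reply() and the SKETCH_* constants are assumptions made
for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_NAME   255u      /* assumed name limit */
#define SKETCH_FINAL_SIZE (8u + 2u) /* export size + export flags */

/* Parse an in-memory reply payload.  Returns 0 and fills in *size and
 * *flags on success, or -1 if the advertised lengths are inconsistent. */
static int sketch_parse_go_reply(const unsigned char *p, size_t len,
                                 uint64_t *size, uint16_t *flags)
{
    uint32_t namelen;
    int i;

    if (len < 4u + SKETCH_FINAL_SIZE) {
        return -1;
    }
    namelen = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
              ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    /* The total must be exactly header + name + trailing fields. */
    if (namelen > SKETCH_MAX_NAME ||
        len != 4u + (size_t)namelen + SKETCH_FINAL_SIZE) {
        return -1;
    }
    p += 4u + namelen;

    *size = 0;
    for (i = 0; i < 8; i++) {
        *size = (*size << 8) | p[i];
    }
    *flags = (uint16_t)(((unsigned)p[8] << 8) | p[9]);
    return 0;
}
```

Checking the total length against the name length before touching the
trailing fields is what lets the caller treat any mismatch as a hard
protocol error rather than guessing where the size field starts.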

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 include/block/nbd.h |  1 +
 nbd/nbd-internal.h  |  7 +++++
 nbd/client.c        | 86 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 92 insertions(+), 2 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 35c0ea3..d261dbc 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -88,6 +88,7 @@ typedef struct nbd_reply nbd_reply;
 #define NBD_REP_ERR_POLICY      ((UINT32_C(1) << 31) | 2) /* Server denied */
 #define NBD_REP_ERR_INVALID     ((UINT32_C(1) << 31) | 3) /* Invalid length. */
 #define NBD_REP_ERR_TLS_REQD    ((UINT32_C(1) << 31) | 5) /* TLS required */
+#define NBD_REP_ERR_UNKNOWN     ((UINT32_C(1) << 31) | 6) /* Export unknown */

 /* Request flags, sent from client to server during transmission phase */
 #define NBD_CMD_FLAG_FUA        (1 << 0)
diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index b78d249..ddba1d0 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -55,8 +55,13 @@
  * https://github.com/yoe/nbd/blob/master/doc/proto.md
  */

+/* Size of a transmission-phase request (NBD_CMD_*), without payload */
 #define NBD_REQUEST_SIZE        (4 + 2 + 2 + 8 + 8 + 4)
+/* Size of a transmission-phase reply, without payload */
 #define NBD_REPLY_SIZE          (4 + 4 + 8)
+/* Size of reply to NBD_OPT_EXPORT_NAME, without trailing zeroes */
+#define NBD_FINAL_REPLY_SIZE    (8 + 2)
+
 #define NBD_REQUEST_MAGIC       0x25609513
 #define NBD_REPLY_MAGIC         0x67446698
 #define NBD_OPTS_MAGIC          0x49484156454F5054LL
@@ -80,6 +85,8 @@
 #define NBD_OPT_LIST            (3)
 #define NBD_OPT_PEEK_EXPORT     (4)
 #define NBD_OPT_STARTTLS        (5)
+#define NBD_OPT_INFO            (6)
+#define NBD_OPT_GO              (7)

 /* NBD errors are based on errno numbers, so there is a 1:1 mapping,
  * but only a limited set of errno values is specified in the protocol.
diff --git a/nbd/client.c b/nbd/client.c
index 507ddc1..af17d4c 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -215,6 +215,11 @@ static int nbd_handle_reply_err(QIOChannel *ioc, nbd_opt_reply *reply,
                    reply->option);
         break;

+    case NBD_REP_ERR_UNKNOWN:
+        error_setg(errp, "Requested export not available for option %" PRIx32,
+                   reply->option);
+        break;
+
     default:
         error_setg(errp, "Unknown error code when asking for option %" PRIx32,
                    reply->option);
@@ -299,6 +304,67 @@ static int nbd_receive_list(QIOChannel *ioc, const char *want, Error **errp)
 }


+/* Returns -1 if NBD_OPT_GO proves the export cannot be used, 0 if
+ * NBD_OPT_GO is unsupported (fall back to NBD_OPT_LIST and
+ * NBD_OPT_EXPORT_NAME in that case), and > 0 if the export is good to
+ * go (with the server data at the same point as it would be right
+ * after sending NBD_OPT_EXPORT_NAME). */
+static int nbd_opt_go(QIOChannel *ioc, const char *wantname, Error **errp)
+{
+    nbd_opt_reply reply;
+    uint32_t len;
+    uint32_t namelen;
+    int error;
+    char buf[NBD_MAX_NAME_SIZE];
+
+    TRACE("Attempting NBD_OPT_GO for export '%s'", wantname);
+    if (nbd_send_option_request(ioc, NBD_OPT_GO, -1, wantname, errp) < 0) {
+        return -1;
+    }
+
+    TRACE("Reading export info");
+    if (nbd_receive_option_reply(ioc, NBD_OPT_GO, &reply, errp) < 0) {
+        return -1;
+    }
+    error = nbd_handle_reply_err(ioc, &reply, errp);
+    if (error <= 0) {
+        return error;
+    }
+    len = reply.length;
+
+    if (reply.type != NBD_REP_SERVER) {
+        error_setg(errp, "unexpected reply type %" PRIx32 ", expected %x",
+                   reply.type, NBD_REP_SERVER);
+        return -1;
+    }
+
+    if (len < sizeof(namelen) + NBD_FINAL_REPLY_SIZE ||
+        len > sizeof(namelen) + sizeof(buf) + NBD_FINAL_REPLY_SIZE) {
+        error_setg(errp, "reply with %" PRIu32 " bytes is unexpected size",
+                   len);
+        return -1;
+    }
+
+    if (read_sync(ioc, &namelen, sizeof(namelen)) != sizeof(namelen)) {
+        error_setg(errp, "failed to read namelen");
+        return -1;
+    }
+    be32_to_cpus(&namelen);
+    if (len != sizeof(namelen) + namelen + NBD_FINAL_REPLY_SIZE) {
+        error_setg(errp, "namelen %" PRIu32 " is unexpected size",
+                   namelen);
+        return -1;
+    }
+
+    if (read_sync(ioc, buf, namelen) != namelen) {
+        error_setg(errp, "failed to read name");
+        return -1;
+    }
+
+    TRACE("export is good to go");
+    return 1;
+}
+
 /* Return -1 on failure, 0 if wantname is an available export. */
 static int nbd_receive_query_exports(QIOChannel *ioc,
                                      const char *wantname,
@@ -505,11 +571,26 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
             name = "";
         }
         if (fixedNewStyle) {
+            int result;
+
+            /* Try NBD_OPT_GO first - if it works, we are done (it
+             * also gives us a good message if the server requires
+             * TLS).  If it is not available, fall back to
+             * NBD_OPT_LIST for nicer error messages about a missing
+             * export, then use NBD_OPT_EXPORT_NAME.  */
+            result = nbd_opt_go(ioc, name, errp);
+            if (result < 0) {
+                goto fail;
+            }
+            if (result > 0) {
+                zeroes = false;
+                goto success;
+            }
             /* Check our desired export is present in the
              * server export list. Since NBD_OPT_EXPORT_NAME
              * cannot return an error message, running this
-             * query gives us good error reporting if the
-             * server required TLS
+             * query gives us better error reporting if the
+             * export name is not available.
              */
             if (nbd_receive_query_exports(ioc, name, errp) < 0) {
                 goto fail;
@@ -522,6 +603,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
         }

         /* Read the response */
+    success:
         if (read_sync(ioc, &s, sizeof(s)) != sizeof(s)) {
             error_setg(errp, "Failed to read export length");
             goto fail;
-- 
2.5.5


* [Qemu-devel] [PATCH 15/18] nbd: Implement NBD_OPT_GO on server
  2016-04-08 22:05 [Qemu-devel] [RFC PATCH 00/18] NBD protocol additions Eric Blake
                   ` (13 preceding siblings ...)
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 14/18] nbd: Implement NBD_OPT_GO on client Eric Blake
@ 2016-04-08 22:05 ` Eric Blake
  2016-04-09 10:48   ` Alex Bligh
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 16/18] nbd: Support NBD_CMD_CLOSE Eric Blake
                   ` (3 subsequent siblings)
  18 siblings, 1 reply; 48+ messages in thread
From: Eric Blake @ 2016-04-08 22:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex, Paolo Bonzini

NBD_OPT_EXPORT_NAME is lousy: it requires us to close the connection
rather than report an error.  Upstream NBD recently added NBD_OPT_GO
as the improved version of the option that does what we want, along
with NBD_OPT_INFO that returns the same information but does not
transition to transmission phase.
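
As a hedged sketch of the distinction (not the code in this patch), a
single handler can serve both options and use its return value to say
whether negotiation continues. The option and reply numbers below are
taken from this series and from the NBD protocol; the export table, the
SKETCH_MAX_NAME limit and the function sketch_handle_info() are
hypothetical and exist only to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NBD_OPT_INFO        6
#define NBD_OPT_GO          7
#define NBD_REP_SERVER      2
#define NBD_REP_ERR_INVALID ((UINT32_C(1) << 31) | 3)
#define NBD_REP_ERR_UNKNOWN ((UINT32_C(1) << 31) | 6)
#define SKETCH_MAX_NAME     255 /* assumed name limit */

/* Hypothetical export table entry. */
struct sketch_export {
    const char *name;
    uint64_t size;
};

/* Decide how to answer NBD_OPT_INFO or NBD_OPT_GO.  Stores the reply
 * type to send in *reply_type.  Returns 0 to keep negotiating, or 1 to
 * move into transmission phase (only after a successful NBD_OPT_GO). */
static int sketch_handle_info(const struct sketch_export *table,
                              size_t count, const char *name,
                              size_t namelen, uint32_t opt,
                              uint32_t *reply_type)
{
    size_t i;

    assert(opt == NBD_OPT_INFO || opt == NBD_OPT_GO);
    if (namelen > SKETCH_MAX_NAME) {
        *reply_type = NBD_REP_ERR_INVALID;
        return 0;
    }
    for (i = 0; i < count; i++) {
        if (strlen(table[i].name) == namelen &&
            memcmp(table[i].name, name, namelen) == 0) {
            *reply_type = NBD_REP_SERVER;
            return opt == NBD_OPT_GO ? 1 : 0;
        }
    }
    /* Unlike NBD_OPT_EXPORT_NAME, an unknown export is reported to
     * the client and the connection stays open. */
    *reply_type = NBD_REP_ERR_UNKNOWN;
    return 0;
}
```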

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 nbd/server.c | 122 ++++++++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 109 insertions(+), 13 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 379df8c..e68e83c 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -233,17 +233,19 @@ static int nbd_negotiate_send_rep(QIOChannel *ioc, uint32_t type, uint32_t opt)
     return nbd_negotiate_send_rep_len(ioc, type, opt, 0);
 }

-/* Send an NBD_REP_SERVER reply to NBD_OPT_LIST, including payload.
+/* Send the common part of an NBD_REP_SERVER reply for the given option,
+ * and include extra_len in the advertised payload.
  * Return -errno to kill connection, 0 to continue negotiation */
-static int nbd_negotiate_send_rep_list(QIOChannel *ioc, NBDExport *exp)
+static int nbd_negotiate_send_rep_server(QIOChannel *ioc, NBDExport *exp,
+                                         uint32_t opt, uint32_t extra_len)
 {
     uint32_t len;
     int rc;

     TRACE("Advertising export name '%s'", exp->name ? exp->name : "");
     len = strlen(exp->name);
-    rc = nbd_negotiate_send_rep_len(ioc, NBD_REP_SERVER, NBD_OPT_LIST,
-                                    len + sizeof(len));
+    rc = nbd_negotiate_send_rep_len(ioc, NBD_REP_SERVER, opt,
+                                    len + sizeof(len) + extra_len);
     if (rc < 0) {
         return rc;
     }
@@ -261,6 +263,15 @@ static int nbd_negotiate_send_rep_list(QIOChannel *ioc, NBDExport *exp)
     return 0;
 }

+/* Send an NBD_REP_SERVER reply to NBD_OPT_LIST, including payload.
+ * Return -errno to kill connection, 0 to continue negotiation. */
+static int nbd_negotiate_send_rep_list(QIOChannel *ioc, NBDExport *exp)
+{
+    return nbd_negotiate_send_rep_server(ioc, exp, NBD_OPT_LIST, 0);
+}
+
+/* Send a sequence of replies to NBD_OPT_LIST.
+ * Return -errno to kill connection, 0 to continue negotiation. */
 static int nbd_negotiate_handle_list(NBDClient *client, uint32_t length)
 {
     NBDExport *exp;
@@ -283,6 +294,8 @@ static int nbd_negotiate_handle_list(NBDClient *client, uint32_t length)
     return nbd_negotiate_send_rep(client->ioc, NBD_REP_ACK, NBD_OPT_LIST);
 }

+/* Send a reply to NBD_OPT_EXPORT_NAME.
+ * Return -errno to kill connection, 0 to end negotiation. */
 static int nbd_negotiate_handle_export_name(NBDClient *client, uint32_t length)
 {
     int rc = -EINVAL;
@@ -318,6 +331,73 @@ fail:
 }


+/* Handle NBD_OPT_INFO and NBD_OPT_GO.
+ * Return -errno to kill connection, 0 if ready for next option, and 1
+ * to move into transmission phase.  */
+static int nbd_negotiate_handle_info(NBDClient *client, uint32_t length,
+                                     uint32_t opt, uint16_t myflags)
+{
+    int rc;
+    char name[NBD_MAX_NAME_SIZE + 1];
+    NBDExport *exp;
+    uint64_t size;
+    uint16_t flags;
+
+    /* Client sends:
+        [20 ..  xx]   export name (length bytes)
+     */
+    TRACE("Checking length");
+    if (length >= sizeof(name)) {
+        if (nbd_negotiate_drop_sync(client->ioc, length) != length) {
+            return -EIO;
+        }
+        return nbd_negotiate_send_rep(client->ioc, NBD_REP_ERR_INVALID, opt);
+    }
+    if (nbd_negotiate_read(client->ioc, name, length) != length) {
+        LOG("read failed");
+        return -EIO;
+    }
+    name[length] = '\0';
+
+    TRACE("Client requested info on export '%s'", name);
+
+    exp = nbd_export_find(name);
+    if (!exp) {
+        return nbd_negotiate_send_rep(client->ioc, NBD_REP_ERR_UNKNOWN, opt);
+    }
+
+    QEMU_BUILD_BUG_ON(NBD_FINAL_REPLY_SIZE != sizeof(size) + sizeof(flags));
+    rc = nbd_negotiate_send_rep_server(client->ioc, exp, opt,
+                                       NBD_FINAL_REPLY_SIZE);
+    if (rc < 0) {
+        return rc;
+    }
+
+    assert((exp->nbdflags & ~65535) == 0);
+    size = cpu_to_be64(exp->size);
+    flags = cpu_to_be16(exp->nbdflags | myflags);
+
+    if (nbd_negotiate_write(client->ioc, &size, sizeof(size)) !=
+        sizeof(size)) {
+        LOG("write failed");
+        return -EIO;
+    }
+    if (nbd_negotiate_write(client->ioc, &flags, sizeof(flags)) !=
+        sizeof(flags)) {
+        LOG("write failed");
+        return -EIO;
+    }
+
+    if (opt == NBD_OPT_GO) {
+        client->exp = exp;
+        QTAILQ_INSERT_TAIL(&client->exp->clients, client, next);
+        nbd_export_get(client->exp);
+        rc = 1;
+    }
+    return rc;
+}
+
+
 static QIOChannel *nbd_negotiate_handle_starttls(NBDClient *client,
                                                  uint32_t length)
 {
@@ -366,7 +446,10 @@ static QIOChannel *nbd_negotiate_handle_starttls(NBDClient *client,
 }


-static int nbd_negotiate_options(NBDClient *client)
+/* Loop over all client options, during fixed newstyle negotiation.
+ * Return -errno to kill connection, 0 on successful NBD_OPT_EXPORT_NAME,
+ * 1 on successful NBD_OPT_GO.  */
+static int nbd_negotiate_options(NBDClient *client, uint16_t myflags)
 {
     uint32_t flags;
     bool fixedNewstyle = false;
@@ -480,6 +563,16 @@ static int nbd_negotiate_options(NBDClient *client)
             case NBD_OPT_EXPORT_NAME:
                 return nbd_negotiate_handle_export_name(client, length);

+            case NBD_OPT_INFO:
+            case NBD_OPT_GO:
+                ret = nbd_negotiate_handle_info(client, length, clientflags,
+                                                myflags);
+                if (ret) {
+                    assert(ret < 0 || clientflags == NBD_OPT_GO);
+                    return ret;
+                }
+                break;
+
             case NBD_OPT_STARTTLS:
                 if (nbd_negotiate_drop_sync(client->ioc, length) != length) {
                     return -EIO;
@@ -584,18 +677,21 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
             LOG("write failed");
             goto fail;
         }
-        rc = nbd_negotiate_options(client);
-        if (rc != 0) {
+        rc = nbd_negotiate_options(client, myflags);
+        if (rc < 0) {
             LOG("option negotiation failed");
             goto fail;
         }

-        stq_be_p(buf + 18, client->exp->size);
-        stw_be_p(buf + 26, client->exp->nbdflags | myflags);
-        len = client->no_zeroes ? 10 : sizeof(buf) - 18;
-        if (nbd_negotiate_write(client->ioc, buf + 18, len) != len) {
-            LOG("write failed");
-            goto fail;
+        if (!rc) {
+            /* If options ended with NBD_OPT_GO, we already sent this. */
+            stq_be_p(buf + 18, client->exp->size);
+            stw_be_p(buf + 26, client->exp->nbdflags | myflags);
+            len = client->no_zeroes ? 10 : sizeof(buf) - 18;
+            if (nbd_negotiate_write(client->ioc, buf + 18, len) != len) {
+                LOG("write failed");
+                goto fail;
+            }
         }
     }

-- 
2.5.5
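
For readers following the wire format: the payload that the patch above
sends for NBD_OPT_INFO/NBD_OPT_GO is a 64-bit export size followed by
16-bit transmission flags, both big-endian (10 bytes total). A minimal
sketch of that layout follows; encode_info_payload() and
INFO_PAYLOAD_SIZE are hypothetical names for illustration, not QEMU
functions or macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 8-byte size plus 2-byte flags, matching the build-time check in the patch. */
#define INFO_PAYLOAD_SIZE 10

/* Hypothetical helper (not in QEMU): serialize the export size and the
 * transmission flags into buf in network byte order, the same layout the
 * patch produces with cpu_to_be64() and cpu_to_be16(). */
static void encode_info_payload(uint8_t *buf, uint64_t size, uint16_t flags)
{
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < 8; i++) {
        /* Most significant byte first. */
        buf[i] = (uint8_t)(size >> (56 - 8 * i));
    }
    buf[8] = (uint8_t)(flags >> 8);
    buf[9] = (uint8_t)(flags & 0xff);
}
```

Writing the bytes one at a time avoids depending on host endianness,
which is what the cpu_to_be*() helpers hide in the real code.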

* [Qemu-devel] [PATCH 16/18] nbd: Support NBD_CMD_CLOSE
  2016-04-08 22:05 [Qemu-devel] [RFC PATCH 00/18] NBD protocol additions Eric Blake
                   ` (14 preceding siblings ...)
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 15/18] nbd: Implement NBD_OPT_GO on server Eric Blake
@ 2016-04-08 22:05 ` Eric Blake
  2016-04-09 10:50   ` Alex Bligh
  2016-04-08 22:05 ` [Qemu-devel] [RFC PATCH 17/18] nbd: Implement NBD_CMD_WRITE_ZEROES on server Eric Blake
                   ` (2 subsequent siblings)
  18 siblings, 1 reply; 48+ messages in thread
From: Eric Blake @ 2016-04-08 22:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex, Paolo Bonzini, Kevin Wolf, open list:Block layer core

NBD_CMD_DISC is annoying: the server is not required to reply,
so the client has no choice but to disconnect once it has sent
the message; but depending on timing, the server can see the
disconnect prior to reading the request, and treat things as
an abrupt exit rather than a clean shutdown (which may affect
whether the server properly fsync()s data to disk, and so on).
The new NBD_CMD_CLOSE adds another round of handshake, where
the client waits for the server's reply before closing, to
make sure both parties know that it was a clean close rather
than an accidental early disconnect.

In nbd-client.c, nbd_client_close() is called after we have
already exited the normal coroutine context used by all the
other transmission phase handlers, so the code is a bit more
complex to build up a coroutine just for the purpose of waiting
for the server's response.
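
The client-side choice described above can be sketched roughly as
follows. The flag and command values are copied from the
include/block/nbd.h hunk in this patch; ShutdownPlan and
plan_shutdown() are hypothetical names used only for illustration and
say nothing about the coroutine plumbing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the include/block/nbd.h hunk in this patch. */
#define NBD_FLAG_SEND_CLOSE (1 << 8)
#define NBD_CMD_DISC  2
#define NBD_CMD_CLOSE 7

/* Hypothetical summary of what the client does at shutdown. */
typedef struct ShutdownPlan {
    uint32_t command;     /* request type to send */
    bool wait_for_reply;  /* true only if the server advertised CLOSE */
} ShutdownPlan;

static ShutdownPlan plan_shutdown(uint16_t nbdflags)
{
    ShutdownPlan plan;

    if (nbdflags & NBD_FLAG_SEND_CLOSE) {
        /* Newer server: send CLOSE and wait for the reply. */
        plan.command = NBD_CMD_CLOSE;
        plan.wait_for_reply = true;
    } else {
        /* Older server: send DISC; no reply will come. */
        plan.command = NBD_CMD_DISC;
        plan.wait_for_reply = false;
    }
    return plan;
}
```

Either way the connection is torn down afterwards; the flag only
decides whether the client waits first.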

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 include/block/nbd.h |  4 +++-
 block/nbd-client.c  | 45 ++++++++++++++++++++++++++++++++++++++++++++-
 nbd/server.c        | 19 +++++++++++++++++--
 3 files changed, 64 insertions(+), 4 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index d261dbc..4c57754 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -70,6 +70,7 @@ typedef struct nbd_reply nbd_reply;
 #define NBD_FLAG_SEND_FUA       (1 << 3)        /* Send FUA (Force Unit Access) */
 #define NBD_FLAG_ROTATIONAL     (1 << 4)        /* Use elevator algorithm - rotational media */
 #define NBD_FLAG_SEND_TRIM      (1 << 5)        /* Send TRIM (discard) */
+#define NBD_FLAG_SEND_CLOSE     (1 << 8)        /* Send CLOSE */

 /* New-style handshake (global) flags, sent from server to client, and
    control what will happen during handshake phase. */
@@ -99,7 +100,8 @@ enum {
     NBD_CMD_WRITE = 1,
     NBD_CMD_DISC = 2,
     NBD_CMD_FLUSH = 3,
-    NBD_CMD_TRIM = 4
+    NBD_CMD_TRIM = 4,
+    NBD_CMD_CLOSE = 7,
 };

 #define NBD_DEFAULT_PORT	10809
diff --git a/block/nbd-client.c b/block/nbd-client.c
index 285025d..f013084 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -374,6 +374,29 @@ void nbd_client_attach_aio_context(BlockDriverState *bs,
                        false, nbd_reply_ready, NULL, bs);
 }

+typedef struct NbdCloseCo {
+    BlockDriverState *bs;
+    nbd_request request;
+    nbd_reply reply;
+    bool done;
+} NbdCloseCo;
+
+static void coroutine_fn nbd_client_close_co(void *opaque)
+{
+    NbdCloseCo *closeco = opaque;
+    NbdClientSession *client = nbd_get_client_session(closeco->bs);
+    ssize_t ret;
+
+    nbd_coroutine_start(client, &closeco->request);
+    ret = nbd_co_send_request(closeco->bs, &closeco->request, NULL, 0);
+    if (ret >= 0) {
+        nbd_co_receive_reply(client, &closeco->request, &closeco->reply,
+                             NULL, 0);
+    }
+    nbd_coroutine_end(client, &closeco->request);
+    closeco->done = true;
+}
+
 void nbd_client_close(BlockDriverState *bs)
 {
     NbdClientSession *client = nbd_get_client_session(bs);
@@ -383,8 +406,28 @@ void nbd_client_close(BlockDriverState *bs)
         return;
     }

-    nbd_send_request(client->ioc, &request);
+    if (client->nbdflags & NBD_FLAG_SEND_CLOSE) {
+        /* Newer server, wants us to wait for reply before we close */
+        Coroutine *co;
+        NbdCloseCo closeco = {
+            .bs = bs,
+            .request = { .type = NBD_CMD_CLOSE },
+        };
+        AioContext *aio_context;

+        g_assert(!qemu_in_coroutine());
+        aio_context = bdrv_get_aio_context(bs);
+        co = qemu_coroutine_create(nbd_client_close_co);
+        qemu_coroutine_enter(co, &closeco);
+        while (!closeco.done) {
+            aio_poll(aio_context, true);
+        }
+    } else {
+        /* Older server, send request, but no reply will come */
+        nbd_send_request(client->ioc, &request);
+    }
+
+    /* Regardless of any received errors, the connection is done. */
     nbd_teardown_connection(bs);
 }

diff --git a/nbd/server.c b/nbd/server.c
index e68e83c..2a6eaf2 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -624,7 +624,8 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
     char buf[8 + 8 + 8 + 128];
     int rc;
     const uint16_t myflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
-                              NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA);
+                              NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA |
+                              NBD_FLAG_SEND_CLOSE);
     bool oldStyle;
     size_t len;

@@ -1244,7 +1245,21 @@ static void nbd_trip(void *opaque)
         break;
     case NBD_CMD_DISC:
         TRACE("Request type is DISCONNECT");
-        errno = 0;
+        goto out;
+    case NBD_CMD_CLOSE:
+        TRACE("Request type is CLOSE");
+        if (request.flags || request.from || request.len) {
+            LOG("bad parameters, skipping flush");
+            reply.error = EINVAL;
+        } else {
+            ret = blk_co_flush(exp->blk);
+            if (ret < 0) {
+                LOG("flush failed");
+                reply.error = -ret;
+            }
+        }
+        /* Attempt to send reply, but even if it fails, we are done */
+        nbd_co_send_reply(req, &reply, 0);
         goto out;
     case NBD_CMD_FLUSH:
         TRACE("Request type is FLUSH");
-- 
2.5.5

* [Qemu-devel] [RFC PATCH 17/18] nbd: Implement NBD_CMD_WRITE_ZEROES on server
  2016-04-08 22:05 [Qemu-devel] [RFC PATCH 00/18] NBD protocol additions Eric Blake
                   ` (15 preceding siblings ...)
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 16/18] nbd: Support NBD_CMD_CLOSE Eric Blake
@ 2016-04-08 22:05 ` Eric Blake
  2016-04-09  9:39   ` Pavel Borzenkov
  2016-04-09 10:54   ` Alex Bligh
  2016-04-08 22:05 ` [Qemu-devel] [RFC PATCH 18/18] nbd: Implement NBD_CMD_WRITE_ZEROES on client Eric Blake
  2016-04-09 10:21 ` [Qemu-devel] [Nbd] [RFC PATCH 00/18] NBD protocol additions Wouter Verhelst
  18 siblings, 2 replies; 48+ messages in thread
From: Eric Blake @ 2016-04-08 22:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex, Paolo Bonzini, Kevin Wolf, open list:Block layer core

RFC because there is still discussion on the NBD list about
adding an NBD_OPT_ to let the client suggest server defaults
related to scanning for zeroes during NBD_CMD_WRITE, which may
require changes to this patch.

Upstream NBD protocol recently added the ability to efficiently
write zeroes without having to send the zeroes over the wire,
along with a flag to control whether the client wants a hole.
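
As a rough sketch of how the server side maps the request flags onto
block-layer flags: the NBD_CMD_FLAG_* values are the ones from this
patch, while SKETCH_REQ_FUA and SKETCH_REQ_MAY_UNMAP are placeholder
stand-ins for BDRV_REQ_FUA and BDRV_REQ_MAY_UNMAP (their numeric values
here are made up), and write_zeroes_flags() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Values from the include/block/nbd.h hunk in this patch. */
#define NBD_CMD_FLAG_FUA     (1 << 0)
#define NBD_CMD_FLAG_NO_HOLE (1 << 1)

/* Stand-ins for QEMU's BdrvRequestFlags bits; the numeric values are
 * placeholders for this sketch, not taken from QEMU. */
#define SKETCH_REQ_MAY_UNMAP (1 << 0)
#define SKETCH_REQ_FUA       (1 << 1)

/* Hypothetical helper: translate NBD request flags into the flags
 * passed along with a write-zeroes request. */
static int write_zeroes_flags(uint16_t request_flags)
{
    int flags = 0;

    if (request_flags & NBD_CMD_FLAG_FUA) {
        flags |= SKETCH_REQ_FUA;
    }
    /* Unmapping is permitted unless the client forbids holes. */
    if (!(request_flags & NBD_CMD_FLAG_NO_HOLE)) {
        flags |= SKETCH_REQ_MAY_UNMAP;
    }
    return flags;
}
```

Note the inverted sense: the wire flag says "no hole", while the
block-layer flag says "may unmap", so the absence of one implies the
other.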

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 include/block/nbd.h |  5 ++++-
 nbd/server.c        | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 64 insertions(+), 4 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 4c57754..a1d955c 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -70,6 +70,7 @@ typedef struct nbd_reply nbd_reply;
 #define NBD_FLAG_SEND_FUA       (1 << 3)        /* Send FUA (Force Unit Access) */
 #define NBD_FLAG_ROTATIONAL     (1 << 4)        /* Use elevator algorithm - rotational media */
 #define NBD_FLAG_SEND_TRIM      (1 << 5)        /* Send TRIM (discard) */
+#define NBD_FLAG_SEND_WRITE_ZEROES (1 << 6)     /* Send WRITE_ZEROES */
 #define NBD_FLAG_SEND_CLOSE     (1 << 8)        /* Send CLOSE */

 /* New-style handshake (global) flags, sent from server to client, and
@@ -92,7 +93,8 @@ typedef struct nbd_reply nbd_reply;
 #define NBD_REP_ERR_UNKNOWN     ((UINT32_C(1) << 31) | 6) /* Export unknown */

 /* Request flags, sent from client to server during transmission phase */
-#define NBD_CMD_FLAG_FUA        (1 << 0)
+#define NBD_CMD_FLAG_FUA        (1 << 0) /* 'force unit access' during write */
+#define NBD_CMD_FLAG_NO_HOLE    (1 << 1) /* don't punch hole on zero run */

 /* Supported request types */
 enum {
@@ -101,6 +103,7 @@ enum {
     NBD_CMD_DISC = 2,
     NBD_CMD_FLUSH = 3,
     NBD_CMD_TRIM = 4,
+    NBD_CMD_WRITE_ZEROES = 5,
     NBD_CMD_CLOSE = 7,
 };

diff --git a/nbd/server.c b/nbd/server.c
index 2a6eaf2..09af915 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -625,7 +625,8 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
     int rc;
     const uint16_t myflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
                               NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA |
-                              NBD_FLAG_SEND_CLOSE);
+                              NBD_FLAG_SEND_CLOSE |
+                              NBD_FLAG_SEND_WRITE_ZEROES);
     bool oldStyle;
     size_t len;

@@ -1088,7 +1089,7 @@ static ssize_t nbd_co_receive_request(NBDRequest *req, struct nbd_request *reque
         goto out;
     }

-    if (request->flags & ~NBD_CMD_FLAG_FUA) {
+    if (request->flags & ~(NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_NO_HOLE)) {
         LOG("unsupported flags (got 0x%x)", request->flags);
         return -EINVAL;
     }
@@ -1102,7 +1103,13 @@ static ssize_t nbd_co_receive_request(NBDRequest *req, struct nbd_request *reque
     TRACE("Decoding type");

     command = request->type;
-    if (command == NBD_CMD_READ || command == NBD_CMD_WRITE) {
+    if (request->flags & NBD_CMD_FLAG_NO_HOLE &&
+        !(command == NBD_CMD_WRITE || command == NBD_CMD_WRITE_ZEROES)) {
+        LOG("NO_HOLE flag valid only with write operation");
+        return -EINVAL;
+    }
+    if (command == NBD_CMD_READ || command == NBD_CMD_WRITE ||
+        command == NBD_CMD_WRITE_ZEROES) {
         if (request->len > NBD_MAX_BUFFER_SIZE) {
             LOG("len (%" PRIu32" ) is larger than max len (%u)",
                 request->len, NBD_MAX_BUFFER_SIZE);
@@ -1143,6 +1150,7 @@ static void nbd_trip(void *opaque)
     struct nbd_reply reply;
     ssize_t ret;
     uint32_t command;
+    int flags;

     TRACE("Reading request.");
     if (client->closing) {
@@ -1221,6 +1229,9 @@ static void nbd_trip(void *opaque)

         TRACE("Writing to device");

+        /* FIXME: if the client passes NBD_CMD_FLAG_NO_HOLE, can we
+         * make that override a server that is set to look for
+         * holes? */
         ret = blk_write(exp->blk,
                         (request.from + exp->dev_offset) / BDRV_SECTOR_SIZE,
                         req->data, request.len / BDRV_SECTOR_SIZE);
@@ -1243,6 +1254,52 @@ static void nbd_trip(void *opaque)
             goto out;
         }
         break;
+    case NBD_CMD_WRITE_ZEROES:
+        TRACE("Request type is WRITE_ZEROES");
+
+        if (exp->nbdflags & NBD_FLAG_READ_ONLY) {
+            TRACE("Server is read-only, return error");
+            reply.error = EROFS;
+            goto error_reply;
+        }
+
+        TRACE("Writing to device");
+
+        flags = 0;
+        if (request.flags & NBD_CMD_FLAG_FUA) {
+            flags |= BDRV_REQ_FUA;
+        }
+        if (!(request.flags & NBD_CMD_FLAG_NO_HOLE)) {
+            /* FIXME: should this depend on whether the server is set to
+               look for holes? */
+            flags |= BDRV_REQ_MAY_UNMAP;
+        }
+        ret = blk_write_zeroes(exp->blk,
+                               ((request.from + exp->dev_offset) /
+                                BDRV_SECTOR_SIZE),
+                               request.len / BDRV_SECTOR_SIZE,
+                               flags);
+        if (ret < 0) {
+            LOG("writing to file failed");
+            reply.error = -ret;
+            goto error_reply;
+        }
+
+        /* FIXME: do we need FUA flush here, if we also passed it to
+         * blk_write_zeroes? */
+        if (request.flags & NBD_CMD_FLAG_FUA) {
+            ret = blk_co_flush(exp->blk);
+            if (ret < 0) {
+                LOG("flush failed");
+                reply.error = -ret;
+                goto error_reply;
+            }
+        }
+
+        if (nbd_co_send_reply(req, &reply, 0) < 0) {
+            goto out;
+        }
+        break;
     case NBD_CMD_DISC:
         TRACE("Request type is DISCONNECT");
         goto out;
-- 
2.5.5

* [Qemu-devel] [RFC PATCH 18/18] nbd: Implement NBD_CMD_WRITE_ZEROES on client
  2016-04-08 22:05 [Qemu-devel] [RFC PATCH 00/18] NBD protocol additions Eric Blake
                   ` (16 preceding siblings ...)
  2016-04-08 22:05 ` [Qemu-devel] [RFC PATCH 17/18] nbd: Implement NBD_CMD_WRITE_ZEROES on server Eric Blake
@ 2016-04-08 22:05 ` Eric Blake
  2016-04-09 10:57   ` Alex Bligh
  2016-04-09 10:21 ` [Qemu-devel] [Nbd] [RFC PATCH 00/18] NBD protocol additions Wouter Verhelst
  18 siblings, 1 reply; 48+ messages in thread
From: Eric Blake @ 2016-04-08 22:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex, Paolo Bonzini, Kevin Wolf, open list:Block layer core

RFC because there is still discussion on the NBD list about
adding an NBD_OPT_ to let the client suggest server defaults
related to scanning for zeroes during NBD_CMD_WRITE, which may
require changes to this patch.

Upstream NBD protocol recently added the ability to efficiently
write zeroes without having to send the zeroes over the wire,
along with a flag to control whether the client wants a hole.

The generic block code takes care of falling back to the obvious
approach of writing lots of zeroes if we return -ENOTSUP because
the server does not support WRITE_ZEROES.
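
The client-side decisions can be sketched as below. The NBD_* values
come from the headers touched by this series; SKETCH_REQ_FUA and
SKETCH_REQ_MAY_UNMAP are placeholder stand-ins for the BDRV_REQ_* bits
(values made up), and prepare_write_zeroes() is a hypothetical helper
that only computes flags and does no I/O.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Values from include/block/nbd.h as modified by this series. */
#define NBD_FLAG_SEND_FUA          (1 << 3)
#define NBD_FLAG_SEND_WRITE_ZEROES (1 << 6)
#define NBD_CMD_FLAG_FUA           (1 << 0)
#define NBD_CMD_FLAG_NO_HOLE       (1 << 1)

/* Placeholder stand-ins for BDRV_REQ_* bits; values are not QEMU's. */
#define SKETCH_REQ_MAY_UNMAP (1 << 0)
#define SKETCH_REQ_FUA       (1 << 1)

/* Hypothetical helper: returns -ENOTSUP when the server did not
 * advertise WRITE_ZEROES (the caller then falls back to plain writes).
 * Otherwise fills *request_flags and clears FUA from *flags if the
 * server will honor it, so that a FUA bit left in *flags tells the
 * caller to emulate it with an explicit flush. */
static int prepare_write_zeroes(uint16_t nbdflags, int *flags,
                                uint16_t *request_flags)
{
    *request_flags = 0;
    if (!(nbdflags & NBD_FLAG_SEND_WRITE_ZEROES)) {
        return -ENOTSUP;
    }
    if ((*flags & SKETCH_REQ_FUA) && (nbdflags & NBD_FLAG_SEND_FUA)) {
        *flags &= ~SKETCH_REQ_FUA;
        *request_flags |= NBD_CMD_FLAG_FUA;
    }
    if (!(*flags & SKETCH_REQ_MAY_UNMAP)) {
        *request_flags |= NBD_CMD_FLAG_NO_HOLE;
    }
    return 0;
}
```

Passing the flags by pointer mirrors the patch: whatever remains set
after the call is what the caller still has to handle itself.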

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/nbd-client.h |  2 ++
 block/nbd-client.c | 34 ++++++++++++++++++++++++++++++++++
 block/nbd.c        | 23 +++++++++++++++++++++++
 3 files changed, 59 insertions(+)

diff --git a/block/nbd-client.h b/block/nbd-client.h
index bc7aec0..2fe6654 100644
--- a/block/nbd-client.h
+++ b/block/nbd-client.h
@@ -47,6 +47,8 @@ void nbd_client_close(BlockDriverState *bs);
 int nbd_client_co_discard(BlockDriverState *bs, int64_t sector_num,
                           int nb_sectors);
 int nbd_client_co_flush(BlockDriverState *bs);
+int nbd_client_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
+                               int nb_sectors, int *flags);
 int nbd_client_co_writev(BlockDriverState *bs, int64_t sector_num,
                          int nb_sectors, QEMUIOVector *qiov, int *flags);
 int nbd_client_co_readv(BlockDriverState *bs, int64_t sector_num,
diff --git a/block/nbd-client.c b/block/nbd-client.c
index f013084..4be83a8 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -291,6 +291,40 @@ int nbd_client_co_readv(BlockDriverState *bs, int64_t sector_num,
     return nbd_co_readv_1(bs, sector_num, nb_sectors, qiov, offset);
 }

+int nbd_client_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
+                               int nb_sectors, int *flags)
+{
+    ssize_t ret;
+    NbdClientSession *client = nbd_get_client_session(bs);
+    struct nbd_request request = { .type = NBD_CMD_WRITE_ZEROES };
+    struct nbd_reply reply;
+
+    if (!(client->nbdflags & NBD_FLAG_SEND_WRITE_ZEROES)) {
+        return -ENOTSUP;
+    }
+
+    if ((*flags & BDRV_REQ_FUA) && (client->nbdflags & NBD_FLAG_SEND_FUA)) {
+        *flags &= ~BDRV_REQ_FUA;
+        request.flags |= NBD_CMD_FLAG_FUA;
+    }
+    if (!(*flags & BDRV_REQ_MAY_UNMAP)) {
+        request.flags |= NBD_CMD_FLAG_NO_HOLE;
+    }
+
+    request.from = sector_num * 512;
+    request.len = nb_sectors * 512;
+
+    nbd_coroutine_start(client, &request);
+    ret = nbd_co_send_request(bs, &request, NULL, 0);
+    if (ret < 0) {
+        reply.error = -ret;
+    } else {
+        nbd_co_receive_reply(client, &request, &reply, NULL, 0);
+    }
+    nbd_coroutine_end(client, &request);
+    return -reply.error;
+}
+
 int nbd_client_co_writev(BlockDriverState *bs, int64_t sector_num,
                          int nb_sectors, QEMUIOVector *qiov, int *flags)
 {
diff --git a/block/nbd.c b/block/nbd.c
index f7ea3b3..f5119c0 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -355,6 +355,26 @@ static int nbd_co_readv(BlockDriverState *bs, int64_t sector_num,
     return nbd_client_co_readv(bs, sector_num, nb_sectors, qiov);
 }

+static int nbd_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
+                               int nb_sectors, BdrvRequestFlags orig_flags)
+{
+    int flags = orig_flags;
+    int ret;
+
+    ret = nbd_client_co_write_zeroes(bs, sector_num, nb_sectors, &flags);
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* The flag wasn't sent to the server, so we need to emulate it with an
+     * explicit flush */
+    if (flags & BDRV_REQ_FUA) {
+        ret = nbd_client_co_flush(bs);
+    }
+
+    return ret;
+}
+
 static int nbd_co_writev_flags(BlockDriverState *bs, int64_t sector_num,
                                int nb_sectors, QEMUIOVector *qiov, int flags)
 {
@@ -476,6 +496,7 @@ static BlockDriver bdrv_nbd = {
     .bdrv_parse_filename        = nbd_parse_filename,
     .bdrv_file_open             = nbd_open,
     .bdrv_co_readv              = nbd_co_readv,
+    .bdrv_co_write_zeroes       = nbd_co_write_zeroes,
     .bdrv_co_writev             = nbd_co_writev,
     .bdrv_co_writev_flags       = nbd_co_writev_flags,
     .supported_write_flags      = BDRV_REQ_FUA,
@@ -496,6 +517,7 @@ static BlockDriver bdrv_nbd_tcp = {
     .bdrv_parse_filename        = nbd_parse_filename,
     .bdrv_file_open             = nbd_open,
     .bdrv_co_readv              = nbd_co_readv,
+    .bdrv_co_write_zeroes       = nbd_co_write_zeroes,
     .bdrv_co_writev             = nbd_co_writev,
     .bdrv_co_writev_flags       = nbd_co_writev_flags,
     .supported_write_flags      = BDRV_REQ_FUA,
@@ -516,6 +538,7 @@ static BlockDriver bdrv_nbd_unix = {
     .bdrv_parse_filename        = nbd_parse_filename,
     .bdrv_file_open             = nbd_open,
     .bdrv_co_readv              = nbd_co_readv,
+    .bdrv_co_write_zeroes       = nbd_co_write_zeroes,
     .bdrv_co_writev             = nbd_co_writev,
     .bdrv_co_writev_flags       = nbd_co_writev_flags,
     .supported_write_flags      = BDRV_REQ_FUA,
-- 
2.5.5

* Re: [Qemu-devel] [RFC PATCH 17/18] nbd: Implement NBD_CMD_WRITE_ZEROES on server
  2016-04-08 22:05 ` [Qemu-devel] [RFC PATCH 17/18] nbd: Implement NBD_CMD_WRITE_ZEROES on server Eric Blake
@ 2016-04-09  9:39   ` Pavel Borzenkov
  2016-04-09 10:54   ` Alex Bligh
  1 sibling, 0 replies; 48+ messages in thread
From: Pavel Borzenkov @ 2016-04-09  9:39 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, Kevin Wolf, Paolo Bonzini, open list:Block layer core, alex

On Fri, Apr 08, 2016 at 04:05:57PM -0600, Eric Blake wrote:
> RFC because there is still discussion on the NBD list about
> adding an NBD_OPT_ to let the client suggest server defaults
> related to scanning for zeroes during NBD_CMD_WRITE, which may
> require changes to this patch.
> 
> Upstream NBD protocol recently added the ability to efficiently
> write zeroes without having to send the zeroes over the wire,
> along with a flag to control whether the client wants a hole.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>  include/block/nbd.h |  5 ++++-
>  nbd/server.c        | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++---
>  2 files changed, 64 insertions(+), 4 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 4c57754..a1d955c 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -70,6 +70,7 @@ typedef struct nbd_reply nbd_reply;
>  #define NBD_FLAG_SEND_FUA       (1 << 3)        /* Send FUA (Force Unit Access) */
>  #define NBD_FLAG_ROTATIONAL     (1 << 4)        /* Use elevator algorithm - rotational media */
>  #define NBD_FLAG_SEND_TRIM      (1 << 5)        /* Send TRIM (discard) */
> +#define NBD_FLAG_SEND_WRITE_ZEROES (1 << 6)     /* Send WRITE_ZEROES */
>  #define NBD_FLAG_SEND_CLOSE     (1 << 8)        /* Send CLOSE */
> 
>  /* New-style handshake (global) flags, sent from server to client, and
> @@ -92,7 +93,8 @@ typedef struct nbd_reply nbd_reply;
>  #define NBD_REP_ERR_UNKNOWN     ((UINT32_C(1) << 31) | 6) /* Export unknown */
> 
>  /* Request flags, sent from client to server during transmission phase */
> -#define NBD_CMD_FLAG_FUA        (1 << 0)
> +#define NBD_CMD_FLAG_FUA        (1 << 0) /* 'force unit access' during write */
> +#define NBD_CMD_FLAG_NO_HOLE    (1 << 1) /* don't punch hole on zero run */
> 
>  /* Supported request types */
>  enum {
> @@ -101,6 +103,7 @@ enum {
>      NBD_CMD_DISC = 2,
>      NBD_CMD_FLUSH = 3,
>      NBD_CMD_TRIM = 4,
> +    NBD_CMD_WRITE_ZEROES = 5,
>      NBD_CMD_CLOSE = 7,
>  };
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index 2a6eaf2..09af915 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -625,7 +625,8 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
>      int rc;
>      const uint16_t myflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
>                                NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA |
> -                              NBD_FLAG_SEND_CLOSE);
> +                              NBD_FLAG_SEND_CLOSE |
> +                              NBD_FLAG_SEND_WRITE_ZEROES);
>      bool oldStyle;
>      size_t len;
> 
> @@ -1088,7 +1089,7 @@ static ssize_t nbd_co_receive_request(NBDRequest *req, struct nbd_request *reque
>          goto out;
>      }
> 
> -    if (request->flags & ~NBD_CMD_FLAG_FUA) {
> +    if (request->flags & ~(NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_NO_HOLE)) {
>          LOG("unsupported flags (got 0x%x)", request->flags);
>          return -EINVAL;
>      }
> @@ -1102,7 +1103,13 @@ static ssize_t nbd_co_receive_request(NBDRequest *req, struct nbd_request *reque
>      TRACE("Decoding type");
> 
>      command = request->type;
> -    if (command == NBD_CMD_READ || command == NBD_CMD_WRITE) {
> +    if (request->flags & NBD_CMD_FLAG_NO_HOLE &&
> +        !(command == NBD_CMD_WRITE || command == NBD_CMD_WRITE_ZEROES)) {
> +        LOG("NO_HOLE flag valid only with write operation");
> +        return -EINVAL;
> +    }
> +    if (command == NBD_CMD_READ || command == NBD_CMD_WRITE ||
> +        command == NBD_CMD_WRITE_ZEROES) {
>          if (request->len > NBD_MAX_BUFFER_SIZE) {
>              LOG("len (%" PRIu32" ) is larger than max len (%u)",
>                  request->len, NBD_MAX_BUFFER_SIZE);
> @@ -1143,6 +1150,7 @@ static void nbd_trip(void *opaque)
>      struct nbd_reply reply;
>      ssize_t ret;
>      uint32_t command;
> +    int flags;
> 
>      TRACE("Reading request.");
>      if (client->closing) {
> @@ -1221,6 +1229,9 @@ static void nbd_trip(void *opaque)
> 
>          TRACE("Writing to device");
> 
> +        /* FIXME: if the client passes NBD_CMD_FLAG_NO_HOLE, can we
> +         * make that override a server that is set to look for
> +         * holes? */
>          ret = blk_write(exp->blk,
>                          (request.from + exp->dev_offset) / BDRV_SECTOR_SIZE,
>                          req->data, request.len / BDRV_SECTOR_SIZE);
> @@ -1243,6 +1254,52 @@ static void nbd_trip(void *opaque)
>              goto out;
>          }
>          break;
> +    case NBD_CMD_WRITE_ZEROES:
> +        TRACE("Request type is WRITE_ZEROES");
> +
> +        if (exp->nbdflags & NBD_FLAG_READ_ONLY) {
> +            TRACE("Server is read-only, return error");
> +            reply.error = EROFS;
> +            goto error_reply;
> +        }
> +
> +        TRACE("Writing to device");
> +
> +        flags = 0;
> +        if (request.flags & NBD_CMD_FLAG_FUA) {
> +            flags |= BDRV_REQ_FUA;
> +        }
> +        if (!(request.flags & NBD_CMD_FLAG_NO_HOLE)) {
> +            /* FIXME: should this depend on whether the server is set to
> +               look for holes? */
> +            flags |= BDRV_REQ_MAY_UNMAP;
> +        }
> +        ret = blk_write_zeroes(exp->blk,
> +                               ((request.from + exp->dev_offset) /
> +                                BDRV_SECTOR_SIZE),
> +                               request.len / BDRV_SECTOR_SIZE,
> +                               flags);

blk_write_zeroes() ignores flags right now. Something like this is
required for this code to work:

diff --git a/block/block-backend.c b/block/block-backend.c
index f0470bc..3ae4157 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -827,7 +827,7 @@ int blk_write_zeroes(BlockBackend *blk, int64_t sector_num,
                      int nb_sectors, BdrvRequestFlags flags)
 {
     return blk_rw(blk, sector_num, NULL, nb_sectors, blk_write_entry,
-                  BDRV_REQ_ZERO_WRITE);
+                  BDRV_REQ_ZERO_WRITE | flags);
 }
 
 static void error_callback_bh(void *opaque)

> +        if (ret < 0) {
> +            LOG("writing to file failed");
> +            reply.error = -ret;
> +            goto error_reply;
> +        }
> +
> +        /* FIXME: do we need FUA flush here, if we also passed it to
> +         * blk_write_zeroes? */
> +        if (request.flags & NBD_CMD_FLAG_FUA) {
> +            ret = blk_co_flush(exp->blk);
> +            if (ret < 0) {
> +                LOG("flush failed");
> +                reply.error = -ret;
> +                goto error_reply;
> +            }
> +        }
> +
> +        if (nbd_co_send_reply(req, &reply, 0) < 0) {
> +            goto out;
> +        }
> +        break;
>      case NBD_CMD_DISC:
>          TRACE("Request type is DISCONNECT");
>          goto out;
> -- 
> 2.5.5
> 
> 

* Re: [Qemu-devel] [Nbd] [RFC PATCH 00/18] NBD protocol additions
  2016-04-08 22:05 [Qemu-devel] [RFC PATCH 00/18] NBD protocol additions Eric Blake
                   ` (17 preceding siblings ...)
  2016-04-08 22:05 ` [Qemu-devel] [RFC PATCH 18/18] nbd: Implement NBD_CMD_WRITE_ZEROES on client Eric Blake
@ 2016-04-09 10:21 ` Wouter Verhelst
  18 siblings, 0 replies; 48+ messages in thread
From: Wouter Verhelst @ 2016-04-09 10:21 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, nbd-general

On Fri, Apr 08, 2016 at 04:05:40PM -0600, Eric Blake wrote:
> This series is for qemu 2.7, and will probably need some rework
> especially since some of it is trying to implement features
> that are still marked experimental in upstream NBD.
> 
> Included are some interoperability bug fixes, code cleanups, then
> added support both client-side and server-side for:
> NBD_FLAG_C_NO_ZEROES
> NBD_CMD_WRITE_ZEROES
> NBD_CMD_CLOSE

Please hold off with the CLOSE implementation until the discussion on
that has finalized.

(not commenting on the rest of your series, since that's mostly
qemu-specific)

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
       people in the world who think they really understand all of its rules,
       and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12

* Re: [Qemu-devel] [PATCH 01/18] nbd: Don't kill server on client that doesn't request TLS
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 01/18] nbd: Don't kill server on client that doesn't request TLS Eric Blake
@ 2016-04-09 10:28   ` Alex Bligh
  0 siblings, 0 replies; 48+ messages in thread
From: Alex Bligh @ 2016-04-09 10:28 UTC (permalink / raw)
  To: Eric Blake; +Cc: Alex Bligh, qemu-devel, Paolo Bonzini


On 8 Apr 2016, at 23:05, Eric Blake <eblake@redhat.com> wrote:

> Upstream NBD is documenting that servers MAY choose to operate
> in a conditional mode, where it is up to the client whether to
> use TLS.  For qemu's case, we want to always be in FORCEDTLS
> mode, because of the risk of man-in-the-middle attacks, and since
> we never export more than one device; likewise, the qemu client
> will ALWAYS send NBD_OPT_STARTTLS as its first option.  But now
> that SELECTIVETLS servers exist, it is feasible to encounter a
> (non-qemu) client that does not do NBD_OPT_STARTTLS first, but
> rather wants to take advantage of the conditional modes it might
> find elsewhere.
> 
> Since we require TLS, we are within our rights to drop connections
> on any client that doesn't negotiate it right away, or which
> attempts to negotiate it incorrectly, without violating the intent
> of the NBD Protocol.  However, it's better to allow the client to
> continue trying, on the grounds that maybe the client will get the
> hint to send NBD_OPT_STARTTLS.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Alex Bligh <alex@alex.org.uk>

> ---
> nbd/server.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index 2a4dd10..e7e4881 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -451,9 +451,12 @@ static int nbd_negotiate_options(NBDClient *client)
> 
>             default:
>                 TRACE("Option 0x%x not permitted before TLS", clientflags);
> +                if (nbd_negotiate_drop_sync(client->ioc, length) != length) {
> +                    return -EIO;
> +                }
>                 nbd_negotiate_send_rep(client->ioc, NBD_REP_ERR_TLS_REQD,
>                                        clientflags);
> -                return -EINVAL;
> +                break;
>             }
>         } else if (fixedNewstyle) {
>             switch (clientflags) {
> @@ -471,6 +474,9 @@ static int nbd_negotiate_options(NBDClient *client)
>                 return nbd_negotiate_handle_export_name(client, length);
> 
>             case NBD_OPT_STARTTLS:
> +                if (nbd_negotiate_drop_sync(client->ioc, length) != length) {
> +                    return -EIO;
> +                }
>                 if (client->tlscreds) {
>                     TRACE("TLS already enabled");
>                     nbd_negotiate_send_rep(client->ioc, NBD_REP_ERR_INVALID,
> @@ -480,7 +486,7 @@ static int nbd_negotiate_options(NBDClient *client)
>                     nbd_negotiate_send_rep(client->ioc, NBD_REP_ERR_POLICY,
>                                            clientflags);
>                 }
> -                return -EINVAL;
> +                break;
>             default:
>                 TRACE("Unsupported option 0x%x", clientflags);
>                 if (nbd_negotiate_drop_sync(client->ioc, length) != length) {
> -- 
> 2.5.5
> 
> 

-- 
Alex Bligh

* Re: [Qemu-devel] [PATCH 02/18] nbd: Don't fail handshake on NBD_OPT_LIST descriptions
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 02/18] nbd: Don't fail handshake on NBD_OPT_LIST descriptions Eric Blake
@ 2016-04-09 10:30   ` Alex Bligh
  0 siblings, 0 replies; 48+ messages in thread
From: Alex Bligh @ 2016-04-09 10:30 UTC (permalink / raw)
  To: Eric Blake; +Cc: Alex Bligh, qemu-devel, Paolo Bonzini


On 8 Apr 2016, at 23:05, Eric Blake <eblake@redhat.com> wrote:

> The NBD Protocol states that NBD_REP_SERVER may set
> 'length > sizeof(namelen) + namelen'; in which case the rest
> of the packet is a UTF-8 description of the export.  While we
> don't know of any NBD servers that send this description yet,
> we had better consume the data so we don't choke when we start
> to talk to such a server.
> 
> Also, a (buggy/malicious) server that replies with length <
> sizeof(namelen) would cause us to block waiting for bytes that
> the server is not sending, and one that replies with super-huge
> lengths could cause us to temporarily allocate up to 4G memory.
> Sanity check the lengths before reading any payload.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>


Reviewed-by: Alex Bligh <alex@alex.org.uk>
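
The length checks described in the commit message can be sketched on an in-memory payload as follows. This is a simplified illustration under stated assumptions, not the patch itself: split_rep_server() is a hypothetical helper, MAX_REPLY_LEN stands in for NBD_MAX_BUFFER_SIZE, and the payload is assumed to be fully buffered rather than read from a channel.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_NAME_LEN   255u                   /* name limit used by the client */
#define MAX_REPLY_LEN  (32u * 1024u * 1024u)  /* stand-in for NBD_MAX_BUFFER_SIZE */

/*
 * Hypothetical helper: split an NBD_REP_SERVER payload of 'len' bytes into
 * a name length and a trailing description length.
 * Returns 0 on success, -1 if the lengths are inconsistent.
 */
static int split_rep_server(const uint8_t *payload, uint32_t len,
                            uint32_t *namelen, uint32_t *desclen)
{
    uint32_t n;

    /* Too short to hold the name length, or implausibly large. */
    if (len < sizeof(n) || len > MAX_REPLY_LEN) {
        return -1;
    }
    /* The name length is a 32-bit big-endian integer. */
    n = ((uint32_t)payload[0] << 24) | ((uint32_t)payload[1] << 16) |
        ((uint32_t)payload[2] << 8) | (uint32_t)payload[3];
    len -= sizeof(n);
    /* The name must fit in what remains and respect the name limit. */
    if (len < n || n > MAX_NAME_LEN) {
        return -1;
    }
    *namelen = n;
    *desclen = len - n;  /* anything left over is the description */
    return 0;
}
```

A reply whose length exactly covers the name yields a description length of zero, which matches the older behaviour.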

> ---
> nbd/client.c | 23 +++++++++++++++++++++--
> 1 file changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/nbd/client.c b/nbd/client.c
> index 6777e58..48f2a21 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -192,13 +192,18 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
>             return -1;
>         }
>     } else if (type == NBD_REP_SERVER) {
> +        if (len < sizeof(namelen) || len > NBD_MAX_BUFFER_SIZE) {
> +            error_setg(errp, "incorrect option length");
> +            return -1;
> +        }
>         if (read_sync(ioc, &namelen, sizeof(namelen)) != sizeof(namelen)) {
>             error_setg(errp, "failed to read option name length");
>             return -1;
>         }
>         namelen = be32_to_cpu(namelen);
> -        if (len != (namelen + sizeof(namelen))) {
> -            error_setg(errp, "incorrect option mame length");
> +        len -= sizeof(namelen);
> +        if (len < namelen) {
> +            error_setg(errp, "incorrect option name length");
>             return -1;
>         }
>         if (namelen > 255) {
> @@ -214,6 +219,20 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
>             return -1;
>         }
>         (*name)[namelen] = '\0';
> +        len -= namelen;
> +        if (len) {
> +            char *buf = g_malloc(len + 1);
> +            if (read_sync(ioc, buf, len) != len) {
> +                error_setg(errp, "failed to read export description");
> +                g_free(*name);
> +                g_free(buf);
> +                *name = NULL;
> +                return -1;
> +            }
> +            buf[len] = '\0';
> +            TRACE("Ignoring export description: %s", buf);
> +            g_free(buf);
> +        }
>     } else {
>         error_setg(errp, "Unexpected reply type %x expected %x",
>                    type, NBD_REP_SERVER);
> -- 
> 2.5.5
> 
> 

-- 
Alex Bligh

* Re: [Qemu-devel] [PATCH 03/18] nbd: More debug typo fixes, use correct formats
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 03/18] nbd: More debug typo fixes, use correct formats Eric Blake
@ 2016-04-09 10:30   ` Alex Bligh
  0 siblings, 0 replies; 48+ messages in thread
From: Alex Bligh @ 2016-04-09 10:30 UTC (permalink / raw)
  To: Eric Blake; +Cc: Alex Bligh, qemu-devel, Paolo Bonzini


On 8 Apr 2016, at 23:05, Eric Blake <eblake@redhat.com> wrote:

> Clean up some debug message oddities missed earlier; this includes
> both typos, and recognizing that %d is not necessarily compatible
> with uint32_t.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>


Reviewed-by: Alex Bligh <alex@alex.org.uk>
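
As a small illustration of the format issue the commit message mentions, the <inttypes.h> macros expand to the right conversion for each fixed-width type, whereas %d and %u assume int. The helper name format_opt() below is hypothetical and exists only for this sketch.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical helper: format a 32-bit option and a 64-bit handle portably. */
static int format_opt(char *out, size_t outlen, uint32_t opt, uint64_t handle)
{
    /* PRIx32 and PRIu64 are string literals that concatenate with the rest. */
    return snprintf(out, outlen, "option 0x%" PRIx32 ", handle %" PRIu64,
                    opt, handle);
}
```

Because the macros are plain string literals, a stray space next to one of them ends up inside the format string, which is easy to miss in review.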

> ---
> nbd/client.c | 41 ++++++++++++++++++++++-------------------
> nbd/server.c | 44 +++++++++++++++++++++++---------------------
> 2 files changed, 45 insertions(+), 40 deletions(-)
> 
> diff --git a/nbd/client.c b/nbd/client.c
> index 48f2a21..42e4e52 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -109,25 +109,27 @@ static int nbd_handle_reply_err(QIOChannel *ioc, uint32_t opt, uint32_t type,
> 
>     switch (type) {
>     case NBD_REP_ERR_UNSUP:
> -        TRACE("server doesn't understand request %d, attempting fallback",
> -              opt);
> +        TRACE("server doesn't understand request %" PRIx32
> +              ", attempting fallback", opt);
>         result = 0;
>         goto cleanup;
> 
>     case NBD_REP_ERR_POLICY:
> -        error_setg(errp, "Denied by server for option %x", opt);
> +        error_setg(errp, "Denied by server for option %" PRIx32, opt);
>         break;
> 
>     case NBD_REP_ERR_INVALID:
> -        error_setg(errp, "Invalid data length for option %x", opt);
> +        error_setg(errp, "Invalid data length for option %" PRIx32, opt);
>         break;
> 
>     case NBD_REP_ERR_TLS_REQD:
> -        error_setg(errp, "TLS negotiation required before option %x", opt);
> +        error_setg(errp, "TLS negotiation required before option %" PRIx32,
> +                   opt);
>         break;
> 
>     default:
> -        error_setg(errp, "Unknown error code when asking for option %x", opt);
> +        error_setg(errp, "Unknown error code when asking for option %" PRIx32,
> +                   opt);
>         break;
>     }
> 
> @@ -165,7 +167,7 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
>     }
>     opt = be32_to_cpu(opt);
>     if (opt != NBD_OPT_LIST) {
> -        error_setg(errp, "Unexpected option type %x expected %x",
> +        error_setg(errp, "Unexpected option type %" PRIx32 " expected %x",
>                    opt, NBD_OPT_LIST);
>         return -1;
>     }
> @@ -207,7 +209,7 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
>             return -1;
>         }
>         if (namelen > 255) {
> -            error_setg(errp, "export name length too long %d", namelen);
> +            error_setg(errp, "export name length too long %" PRIu32, namelen);
>             return -1;
>         }
> 
> @@ -234,7 +236,7 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
>             g_free(buf);
>         }
>     } else {
> -        error_setg(errp, "Unexpected reply type %x expected %x",
> +        error_setg(errp, "Unexpected reply type %" PRIx32 " expected %x",
>                    type, NBD_REP_SERVER);
>         return -1;
>     }
> @@ -349,7 +351,7 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
>     }
>     opt = be32_to_cpu(opt);
>     if (opt != NBD_OPT_STARTTLS) {
> -        error_setg(errp, "Unexpected option type %x expected %x",
> +        error_setg(errp, "Unexpected option type %" PRIx32 " expected %x",
>                    opt, NBD_OPT_STARTTLS);
>         return NULL;
>     }
> @@ -361,7 +363,7 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
>     }
>     type = be32_to_cpu(type);
>     if (type != NBD_REP_ACK) {
> -        error_setg(errp, "Server rejected request to start TLS %x",
> +        error_setg(errp, "Server rejected request to start TLS %" PRIx32,
>                    type);
>         return NULL;
>     }
> @@ -373,7 +375,7 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
>     }
>     length = be32_to_cpu(length);
>     if (length != 0) {
> -        error_setg(errp, "Start TLS reponse was not zero %x",
> +        error_setg(errp, "Start TLS response was not zero %" PRIu32,
>                    length);
>         return NULL;
>     }
> @@ -384,7 +386,7 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
>         return NULL;
>     }
>     data.loop = g_main_loop_new(g_main_context_default(), FALSE);
> -    TRACE("Starting TLS hanshake");
> +    TRACE("Starting TLS handshake");
>     qio_channel_tls_handshake(tioc,
>                               nbd_tls_handshake,
>                               &data,
> @@ -474,7 +476,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
>         }
>         globalflags = be16_to_cpu(globalflags);
>         *flags = globalflags << 16;
> -        TRACE("Global flags are %x", globalflags);
> +        TRACE("Global flags are %" PRIx32, globalflags);
>         if (globalflags & NBD_FLAG_FIXED_NEWSTYLE) {
>             fixedNewStyle = true;
>             TRACE("Server supports fixed new style");
> @@ -550,7 +552,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
>         }
>         exportflags = be16_to_cpu(exportflags);
>         *flags |= exportflags;
> -        TRACE("Export flags are %x", exportflags);
> +        TRACE("Export flags are %" PRIx16, exportflags);
>     } else if (magic == NBD_CLIENT_MAGIC) {
>         if (name) {
>             error_setg(errp, "Server does not support export names");
> @@ -683,7 +685,8 @@ ssize_t nbd_send_request(QIOChannel *ioc, struct nbd_request *request)
>     ssize_t ret;
> 
>     TRACE("Sending request to server: "
> -          "{ .from = %" PRIu64", .len = %u, .handle = %" PRIu64", .type=%i}",
> +          "{ .from = %" PRIu64", .len = %" PRIu32 ", .handle = %" PRIu64
> +          ", .type=%" PRIu16 " }",
>           request->from, request->len, request->handle, request->type);
> 
>     cpu_to_be32w((uint32_t*)buf, NBD_REQUEST_MAGIC);
> @@ -732,12 +735,12 @@ ssize_t nbd_receive_reply(QIOChannel *ioc, struct nbd_reply *reply)
> 
>     reply->error = nbd_errno_to_system_errno(reply->error);
> 
> -    TRACE("Got reply: "
> -          "{ magic = 0x%x, .error = %d, handle = %" PRIu64" }",
> +    TRACE("Got reply: { magic = 0x%" PRIx32 ", .error = %" PRId32
> +          ", handle = %" PRIu64" }",
>           magic, reply->error, reply->handle);
> 
>     if (magic != NBD_REPLY_MAGIC) {
> -        LOG("invalid magic (got 0x%x)", magic);
> +        LOG("invalid magic (got 0x%" PRIx32 ")", magic);
>         return -EINVAL;
>     }
>     return 0;
> diff --git a/nbd/server.c b/nbd/server.c
> index e7e4881..81afae2 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -196,7 +196,7 @@ static int nbd_negotiate_send_rep(QIOChannel *ioc, uint32_t type, uint32_t opt)
>     uint64_t magic;
>     uint32_t len;
> 
> -    TRACE("Reply opt=%x type=%x", type, opt);
> +    TRACE("Reply opt=%" PRIx32 " type=%" PRIx32, opt, type);
> 
>     magic = cpu_to_be64(NBD_REP_MAGIC);
>     if (nbd_negotiate_write(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
> @@ -226,7 +226,7 @@ static int nbd_negotiate_send_rep_list(QIOChannel *ioc, NBDExport *exp)
>     uint64_t magic, name_len;
>     uint32_t opt, type, len;
> 
> -    TRACE("Advertizing export name '%s'", exp->name ? exp->name : "");
> +    TRACE("Advertising export name '%s'", exp->name ? exp->name : "");
>     name_len = strlen(exp->name);
>     magic = cpu_to_be64(NBD_REP_MAGIC);
>     if (nbd_negotiate_write(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
> @@ -392,12 +392,12 @@ static int nbd_negotiate_options(NBDClient *client)
>     TRACE("Checking client flags");
>     be32_to_cpus(&flags);
>     if (flags & NBD_FLAG_C_FIXED_NEWSTYLE) {
> -        TRACE("Support supports fixed newstyle handshake");
> +        TRACE("Client supports fixed newstyle handshake");
>         fixedNewstyle = true;
>         flags &= ~NBD_FLAG_C_FIXED_NEWSTYLE;
>     }
>     if (flags != 0) {
> -        TRACE("Unknown client flags 0x%x received", flags);
> +        TRACE("Unknown client flags 0x%" PRIx32 " received", flags);
>         return -EIO;
>     }
> 
> @@ -431,12 +431,12 @@ static int nbd_negotiate_options(NBDClient *client)
>         }
>         length = be32_to_cpu(length);
> 
> -        TRACE("Checking option 0x%x", clientflags);
> +        TRACE("Checking option 0x%" PRIx32, clientflags);
>         if (client->tlscreds &&
>             client->ioc == (QIOChannel *)client->sioc) {
>             QIOChannel *tioc;
>             if (!fixedNewstyle) {
> -                TRACE("Unsupported option 0x%x", clientflags);
> +                TRACE("Unsupported option 0x%" PRIx32, clientflags);
>                 return -EINVAL;
>             }
>             switch (clientflags) {
> @@ -450,7 +450,8 @@ static int nbd_negotiate_options(NBDClient *client)
>                 break;
> 
>             default:
> -                TRACE("Option 0x%x not permitted before TLS", clientflags);
> +                TRACE("Option 0x%" PRIx32 " not permitted before TLS",
> +                      clientflags);
>                 if (nbd_negotiate_drop_sync(client->ioc, length) != length) {
>                     return -EIO;
>                 }
> @@ -488,7 +489,7 @@ static int nbd_negotiate_options(NBDClient *client)
>                 }
>                 break;
>             default:
> -                TRACE("Unsupported option 0x%x", clientflags);
> +                TRACE("Unsupported option 0x%" PRIx32, clientflags);
>                 if (nbd_negotiate_drop_sync(client->ioc, length) != length) {
>                     return -EIO;
>                 }
> @@ -506,7 +507,7 @@ static int nbd_negotiate_options(NBDClient *client)
>                 return nbd_negotiate_handle_export_name(client, length);
> 
>             default:
> -                TRACE("Unsupported option 0x%x", clientflags);
> +                TRACE("Unsupported option 0x%" PRIx32, clientflags);
>                 return -EINVAL;
>             }
>         }
> @@ -647,12 +648,12 @@ static ssize_t nbd_receive_request(QIOChannel *ioc, struct nbd_request *request)
>     request->from  = be64_to_cpup((uint64_t*)(buf + 16));
>     request->len   = be32_to_cpup((uint32_t*)(buf + 24));
> 
> -    TRACE("Got request: "
> -          "{ magic = 0x%x, .type = %d, from = %" PRIu64" , len = %u }",
> +    TRACE("Got request: { magic = 0x%" PRIx32 ", .type = %" PRIx32
> +          ", from = %" PRIu64 " , len = %" PRIu32 " }",
>           magic, request->type, request->from, request->len);
> 
>     if (magic != NBD_REQUEST_MAGIC) {
> -        LOG("invalid magic (got 0x%x)", magic);
> +        LOG("invalid magic (got 0x%" PRIx32 ")", magic);
>         return -EINVAL;
>     }
>     return 0;
> @@ -665,7 +666,8 @@ static ssize_t nbd_send_reply(QIOChannel *ioc, struct nbd_reply *reply)
> 
>     reply->error = system_errno_to_nbd_errno(reply->error);
> 
> -    TRACE("Sending response to client: { .error = %d, handle = %" PRIu64 " }",
> +    TRACE("Sending response to client: { .error = %" PRId32
> +          ", handle = %" PRIu64 " }",
>           reply->error, reply->handle);
> 
>     /* Reply
> @@ -994,7 +996,7 @@ static ssize_t nbd_co_receive_request(NBDRequest *req, struct nbd_request *reque
>     command = request->type & NBD_CMD_MASK_COMMAND;
>     if (command == NBD_CMD_READ || command == NBD_CMD_WRITE) {
>         if (request->len > NBD_MAX_BUFFER_SIZE) {
> -            LOG("len (%u) is larger than max len (%u)",
> +            LOG("len (%" PRIu32 ") is larger than max len (%u)",
>                 request->len, NBD_MAX_BUFFER_SIZE);
>             rc = -EINVAL;
>             goto out;
> @@ -1007,7 +1009,7 @@ static ssize_t nbd_co_receive_request(NBDRequest *req, struct nbd_request *reque
>         }
>     }
>     if (command == NBD_CMD_WRITE) {
> -        TRACE("Reading %u byte(s)", request->len);
> +        TRACE("Reading %" PRIu32 " byte(s)", request->len);
> 
>         if (read_sync(client->ioc, req->data, request->len) != request->len) {
>             LOG("reading from socket failed");
> @@ -1057,10 +1059,10 @@ static void nbd_trip(void *opaque)
>     }
>     command = request.type & NBD_CMD_MASK_COMMAND;
>     if (command != NBD_CMD_DISC && (request.from + request.len) > exp->size) {
> -            LOG("From: %" PRIu64 ", Len: %u, Size: %" PRIu64
> -            ", Offset: %" PRIu64 "\n",
> -                    request.from, request.len,
> -                    (uint64_t)exp->size, (uint64_t)exp->dev_offset);
> +            LOG("From: %" PRIu64 ", Len: %" PRIu32", Size: %" PRIu64
> +                ", Offset: %" PRIu64 "\n",
> +                request.from, request.len,
> +                (uint64_t)exp->size, (uint64_t)exp->dev_offset);
>         LOG("requested operation past EOF--bad client?");
>         goto invalid_request;
>     }
> @@ -1095,7 +1097,7 @@ static void nbd_trip(void *opaque)
>             goto error_reply;
>         }
> 
> -        TRACE("Read %u byte(s)", request.len);
> +        TRACE("Read %" PRIu32" byte(s)", request.len);
>         if (nbd_co_send_reply(req, &reply, request.len) < 0)
>             goto out;
>         break;
> @@ -1162,7 +1164,7 @@ static void nbd_trip(void *opaque)
>         }
>         break;
>     default:
> -        LOG("invalid request type (%u) received", request.type);
> +        LOG("invalid request type (%" PRIu32 ") received", request.type);
>     invalid_request:
>         reply.error = EINVAL;
>     error_reply:
> -- 
> 2.5.5
> 
> 

-- 
Alex Bligh

* Re: [Qemu-devel] [PATCH 04/18] nbd: Detect servers that send unexpected error values
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 04/18] nbd: Detect servers that send unexpected error values Eric Blake
@ 2016-04-09 10:31   ` Alex Bligh
  0 siblings, 0 replies; 48+ messages in thread
From: Alex Bligh @ 2016-04-09 10:31 UTC (permalink / raw)
  To: Eric Blake; +Cc: Alex Bligh, qemu-devel, Paolo Bonzini


On 8 Apr 2016, at 23:05, Eric Blake <eblake@redhat.com> wrote:

> Add some debugging to flag servers that are not compliant with
> the NBD protocol.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>


Reviewed-by: Alex Bligh <alex@alex.org.uk>
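
The idea in the commit message, mapping unknown wire errors to EINVAL while noting that it happened, can be sketched like this. It is a standalone approximation: the NBD_E* values are assumptions based on the NBD protocol document, nbd_err_to_errno() is a hypothetical name, and the unexpected_seen counter plus fprintf() stand in for qemu's TRACE().

```c
#include <errno.h>
#include <stdio.h>

/* Assumed wire values, per the NBD protocol document. */
enum {
    NBD_EPERM  = 1,
    NBD_EIO    = 5,
    NBD_ENOMEM = 12,
    NBD_EINVAL = 22,
    NBD_ENOSPC = 28
};

static int unexpected_seen;  /* hypothetical counter, stands in for TRACE() */

/* Map an NBD wire error to a local errno, flagging non-compliant values. */
static int nbd_err_to_errno(int err)
{
    switch (err) {
    case 0:
        return 0;
    case NBD_EPERM:
        return EPERM;
    case NBD_EIO:
        return EIO;
    case NBD_ENOMEM:
        return ENOMEM;
    case NBD_ENOSPC:
        return ENOSPC;
    default:
        unexpected_seen++;
        fprintf(stderr, "Squashing unexpected error %d to EINVAL\n", err);
        /* fallthrough */
    case NBD_EINVAL:
        return EINVAL;
    }
}
```

Placing default before the NBD_EINVAL case keeps a single return for both paths, while only the unexpected path produces a diagnostic.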

> ---
> nbd/client.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/nbd/client.c b/nbd/client.c
> index 42e4e52..c834587 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -33,8 +33,10 @@ static int nbd_errno_to_system_errno(int err)
>         return ENOMEM;
>     case NBD_ENOSPC:
>         return ENOSPC;
> +    default:
> +        TRACE("Squashing unexpected error %d to EINVAL", err);
> +        /* fallthrough */
>     case NBD_EINVAL:
> -    default:
>         return EINVAL;
>     }
> }
> -- 
> 2.5.5
> 
> 

-- 
Alex Bligh

* Re: [Qemu-devel] [PATCH 05/18] nbd: Reject unknown request flags
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 05/18] nbd: Reject unknown request flags Eric Blake
@ 2016-04-09 10:32   ` Alex Bligh
  0 siblings, 0 replies; 48+ messages in thread
From: Alex Bligh @ 2016-04-09 10:32 UTC (permalink / raw)
  To: Eric Blake; +Cc: Alex Bligh, qemu-devel, Paolo Bonzini


On 8 Apr 2016, at 23:05, Eric Blake <eblake@redhat.com> wrote:

> The NBD protocol says that clients should not send a command flag
> that has not been negotiated (whether by the client requesting an
> option during a handshake, or because we advertise support for the
> flag in response to NBD_OPT_EXPORT_NAME), and that servers should
> reject invalid flags with EINVAL.  We were silently ignoring the
> flags instead.  The client can't rely on our behavior, since it is
> their fault for passing the bad flag in the first place, but it's
> better to be robust up front than to possibly behave differently
> than the client was expecting with the attempted flag.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Alex Bligh <alex@alex.org.uk>
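
To make the mask arithmetic concrete, here is a minimal sketch of the check as it applies while the type is still a single 32-bit field. The two constants are the ones from include/block/nbd.h at this point in the series; check_request_flags() is a hypothetical name used only for illustration.

```c
#include <errno.h>
#include <stdint.h>

#define NBD_CMD_MASK_COMMAND 0x0000ffffu
#define NBD_CMD_FLAG_FUA     (1u << 16)

/* Return 0 if only known flags are set in the upper half, else -EINVAL. */
static int check_request_flags(uint32_t type)
{
    uint32_t flags = type & ~NBD_CMD_MASK_COMMAND;  /* strip the command */

    if (flags & ~NBD_CMD_FLAG_FUA) {                /* anything besides FUA? */
        return -EINVAL;
    }
    return 0;
}
```

With this, a write with FUA set passes, while the same write with bit 17 set is rejected instead of being silently treated as a plain write.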

> ---
> nbd/server.c | 5 +++++
> 1 file changed, 5 insertions(+)
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index 81afae2..a10294e 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -984,6 +984,11 @@ static ssize_t nbd_co_receive_request(NBDRequest *req, struct nbd_request *reque
>         goto out;
>     }
> 
> +    if (request->type & ~NBD_CMD_MASK_COMMAND & ~NBD_CMD_FLAG_FUA) {
> +        LOG("unsupported flags (got 0x%x)",
> +            request->type & ~NBD_CMD_MASK_COMMAND);
> +        return -EINVAL;
> +    }
>     if ((request->from + request->len) < request->from) {
>         LOG("integer overflow detected! "
>             "you're probably being attacked");
> -- 
> 2.5.5
> 
> 

-- 
Alex Bligh

* Re: [Qemu-devel] [PATCH 06/18] nbd: Avoid magic number for NBD max name size
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 06/18] nbd: Avoid magic number for NBD max name size Eric Blake
@ 2016-04-09 10:35   ` Alex Bligh
  2016-04-09 22:07     ` Eric Blake
  0 siblings, 1 reply; 48+ messages in thread
From: Alex Bligh @ 2016-04-09 10:35 UTC (permalink / raw)
  To: Eric Blake
  Cc: Alex Bligh, qemu-devel, Kevin Wolf, Paolo Bonzini,
	open list:Block layer core


On 8 Apr 2016, at 23:05, Eric Blake <eblake@redhat.com> wrote:

> Declare a constant and use that when determining if an export
> name fits within the constraints we are willing to support.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> include/block/nbd.h | 2 ++
> nbd/client.c        | 2 +-
> nbd/server.c        | 4 ++--
> 3 files changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index b86a976..3f047bf 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -76,6 +76,8 @@ enum {
> 
> /* Maximum size of a single READ/WRITE data buffer */
> #define NBD_MAX_BUFFER_SIZE (32 * 1024 * 1024)
> +/* Maximum size of an export name */
> +#define NBD_MAX_NAME_SIZE 255

Given that the standard either already documents the maximum
supported export name length as 4096, or is likely to (I can't
remember whether that patch has been merged), why not change
this to 4096?

Otherwise:

Reviewed-by: Alex Bligh <alex@alex.org.uk>
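
One benefit of the constant is that the buffer size and the length check follow from a single definition, so raising the limit would be a one-line change. A rough sketch of that pattern, with copy_export_name() as a hypothetical helper that is not part of the patch:

```c
#include <stddef.h>
#include <string.h>

#define NBD_MAX_NAME_SIZE 255

/*
 * Hypothetical helper: copy 'length' bytes of an export name into 'dst'
 * and NUL-terminate it. Returns 0 on success, -1 if the name cannot fit
 * together with its terminator.
 */
static int copy_export_name(char *dst, size_t dstsize,
                            const char *src, size_t length)
{
    if (length >= dstsize) {
        return -1;
    }
    memcpy(dst, src, length);
    dst[length] = '\0';
    return 0;
}
```

A caller declaring char name[NBD_MAX_NAME_SIZE + 1] and passing sizeof(name) accepts names of up to NBD_MAX_NAME_SIZE bytes and rejects anything longer.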

> 
> ssize_t nbd_wr_syncv(QIOChannel *ioc,
>                      struct iovec *iov,
> diff --git a/nbd/client.c b/nbd/client.c
> index c834587..00f9244 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -210,7 +210,7 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
>             error_setg(errp, "incorrect option name length");
>             return -1;
>         }
> -        if (namelen > 255) {
> +        if (namelen > NBD_MAX_NAME_SIZE) {
>             error_setg(errp, "export name length too long %" PRIu32, namelen);
>             return -1;
>         }
> diff --git a/nbd/server.c b/nbd/server.c
> index a10294e..5414c49 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -285,13 +285,13 @@ static int nbd_negotiate_handle_list(NBDClient *client, uint32_t length)
> static int nbd_negotiate_handle_export_name(NBDClient *client, uint32_t length)
> {
>     int rc = -EINVAL;
> -    char name[256];
> +    char name[NBD_MAX_NAME_SIZE + 1];
> 
>     /* Client sends:
>         [20 ..  xx]   export name (length bytes)
>      */
>     TRACE("Checking length");
> -    if (length > 255) {
> +    if (length >= sizeof(name)) {
>         LOG("Bad length received");
>         goto fail;
>     }
> -- 
> 2.5.5
> 
> 

-- 
Alex Bligh

* Re: [Qemu-devel] [PATCH 07/18] nbd: Treat flags vs. command type as separate fields
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 07/18] nbd: Treat flags vs. command type as separate fields Eric Blake
@ 2016-04-09 10:37   ` Alex Bligh
  0 siblings, 0 replies; 48+ messages in thread
From: Alex Bligh @ 2016-04-09 10:37 UTC (permalink / raw)
  To: Eric Blake
  Cc: Alex Bligh, qemu-devel, Kevin Wolf, Paolo Bonzini,
	open list:Block layer core


On 8 Apr 2016, at 23:05, Eric Blake <eblake@redhat.com> wrote:

> Current upstream NBD documents that requests have a 16-bit flags field,
> followed by a 16-bit type integer; although older versions mentioned
> only a 32-bit field with masking to find flags.  Since the protocol
> is in network order (big-endian over the wire), the ABI is unchanged;
> but dealing with the flags as a separate field rather than masking
> will make it easier to add support for upcoming NBD extensions that
> increase the number of both flags and commands.
> 
> Improve some comments in nbd.h based on the current upstream
> NBD protocol (https://github.com/yoe/nbd/blob/master/doc/proto.md),
> and touch some nearby code to keep checkpatch.pl happy.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>


Reviewed-by: Alex Bligh <alex@alex.org.uk>
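
The claim that the ABI is unchanged can be checked with a few lines of standalone C. The sketch below packs the same request both ways, once as a single big-endian 32-bit field with the flags in the upper half, and once as two big-endian 16-bit fields, and the resulting bytes are identical. The helper names put_be16(), put_be32(), pack_old() and pack_new() are hypothetical and are not the qemu byte-order helpers.

```c
#include <stdint.h>
#include <string.h>

/* Store a 16-bit value in network (big-endian) order. */
static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

/* Store a 32-bit value in network (big-endian) order. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)(v & 0xff);
}

/* Old view: one 32-bit field, command in the low 16 bits, flags above. */
static void pack_old(uint8_t out[4], uint32_t type_and_flags)
{
    put_be32(out, type_and_flags);
}

/* New view: 16-bit flags at offset 0, 16-bit type at offset 2. */
static void pack_new(uint8_t out[4], uint16_t flags, uint16_t type)
{
    put_be16(out, flags);
    put_be16(out + 2, type);
}
```

For example, a write (type 1) with FUA is (1 << 16) | 1 in the old encoding and flags = 1, type = 1 in the new one; both produce the bytes 00 01 00 01 on the wire.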

> ---
> include/block/nbd.h | 18 ++++++++++++------
> nbd/nbd-internal.h  |  4 ++--
> block/nbd-client.c  |  9 +++------
> nbd/client.c        | 17 ++++++++++-------
> nbd/server.c        | 35 +++++++++++++++++++----------------
> 5 files changed, 46 insertions(+), 37 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 3f047bf..2c61901 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -1,4 +1,5 @@
> /*
> + *  Copyright (C) 2016 Red Hat, Inc.
>  *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
>  *
>  *  Network Block Device
> @@ -27,7 +28,8 @@
> 
> struct nbd_request {
>     uint32_t magic;
> -    uint32_t type;
> +    uint16_t flags;
> +    uint16_t type;
>     uint64_t handle;
>     uint64_t from;
>     uint32_t len;
> @@ -39,6 +41,8 @@ struct nbd_reply {
>     uint64_t handle;
> } QEMU_PACKED;
> 
> +/* Transmission (export) flags: sent from server to client during handshake,
> +   but describe what will happen during transmission */
> #define NBD_FLAG_HAS_FLAGS      (1 << 0)        /* Flags are there */
> #define NBD_FLAG_READ_ONLY      (1 << 1)        /* Device is read-only */
> #define NBD_FLAG_SEND_FLUSH     (1 << 2)        /* Send FLUSH */
> @@ -46,10 +50,12 @@ struct nbd_reply {
> #define NBD_FLAG_ROTATIONAL     (1 << 4)        /* Use elevator algorithm - rotational media */
> #define NBD_FLAG_SEND_TRIM      (1 << 5)        /* Send TRIM (discard) */
> 
> -/* New-style global flags. */
> +/* New-style handshake (global) flags, sent from server to client, and
> +   control what will happen during handshake phase. */
> #define NBD_FLAG_FIXED_NEWSTYLE     (1 << 0)    /* Fixed newstyle protocol. */
> 
> -/* New-style client flags. */
> +/* New-style client flags, sent from client to server to control what happens
> +   during handshake phase. */
> #define NBD_FLAG_C_FIXED_NEWSTYLE   (1 << 0)    /* Fixed newstyle protocol. */
> 
> /* Reply types. */
> @@ -60,10 +66,10 @@ struct nbd_reply {
> #define NBD_REP_ERR_INVALID     ((UINT32_C(1) << 31) | 3) /* Invalid length. */
> #define NBD_REP_ERR_TLS_REQD    ((UINT32_C(1) << 31) | 5) /* TLS required */
> 
> +/* Request flags, sent from client to server during transmission phase */
> +#define NBD_CMD_FLAG_FUA        (1 << 0)
> 
> -#define NBD_CMD_MASK_COMMAND	0x0000ffff
> -#define NBD_CMD_FLAG_FUA	(1 << 16)
> -
> +/* Supported request types */
> enum {
>     NBD_CMD_READ = 0,
>     NBD_CMD_WRITE = 1,
> diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
> index 3791535..b663bf3 100644
> --- a/nbd/nbd-internal.h
> +++ b/nbd/nbd-internal.h
> @@ -52,10 +52,10 @@
> /* This is all part of the "official" NBD API.
>  *
>  * The most up-to-date documentation is available at:
> - * https://github.com/yoe/nbd/blob/master/doc/proto.txt
> + * https://github.com/yoe/nbd/blob/master/doc/proto.md
>  */
> 
> -#define NBD_REQUEST_SIZE        (4 + 4 + 8 + 8 + 4)
> +#define NBD_REQUEST_SIZE        (4 + 2 + 2 + 8 + 8 + 4)
> #define NBD_REPLY_SIZE          (4 + 4 + 8)
> #define NBD_REQUEST_MAGIC       0x25609513
> #define NBD_REPLY_MAGIC         0x67446698
> diff --git a/block/nbd-client.c b/block/nbd-client.c
> index 878e879..285025d 100644
> --- a/block/nbd-client.c
> +++ b/block/nbd-client.c
> @@ -1,6 +1,7 @@
> /*
>  * QEMU Block driver for  NBD
>  *
> + * Copyright (C) 2016 Red Hat, Inc.
>  * Copyright (C) 2008 Bull S.A.S.
>  *     Author: Laurent Vivier <Laurent.Vivier@bull.net>
>  *
> @@ -252,7 +253,7 @@ static int nbd_co_writev_1(BlockDriverState *bs, int64_t sector_num,
> 
>     if ((*flags & BDRV_REQ_FUA) && (client->nbdflags & NBD_FLAG_SEND_FUA)) {
>         *flags &= ~BDRV_REQ_FUA;
> -        request.type |= NBD_CMD_FLAG_FUA;
> +        request.flags |= NBD_CMD_FLAG_FUA;
>     }
> 
>     request.from = sector_num * 512;
> @@ -376,11 +377,7 @@ void nbd_client_attach_aio_context(BlockDriverState *bs,
> void nbd_client_close(BlockDriverState *bs)
> {
>     NbdClientSession *client = nbd_get_client_session(bs);
> -    struct nbd_request request = {
> -        .type = NBD_CMD_DISC,
> -        .from = 0,
> -        .len = 0
> -    };
> +    struct nbd_request request = { .type = NBD_CMD_DISC };
> 
>     if (client->ioc == NULL) {
>         return;
> diff --git a/nbd/client.c b/nbd/client.c
> index 00f9244..7fd6059 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -1,4 +1,5 @@
> /*
> + *  Copyright (C) 2016 Red Hat, Inc.
>  *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
>  *
>  *  Network Block Device Client Side
> @@ -688,14 +689,16 @@ ssize_t nbd_send_request(QIOChannel *ioc, struct nbd_request *request)
> 
>     TRACE("Sending request to server: "
>           "{ .from = %" PRIu64", .len = %" PRIu32 ", .handle = %" PRIu64
> -          ", .type=%" PRIu16 " }",
> -          request->from, request->len, request->handle, request->type);
> +          ", .flags=%" PRIx16 ", .type=%" PRIu16 " }",
> +          request->from, request->len, request->handle,
> +          request->flags, request->type);
> 
> -    cpu_to_be32w((uint32_t*)buf, NBD_REQUEST_MAGIC);
> -    cpu_to_be32w((uint32_t*)(buf + 4), request->type);
> -    cpu_to_be64w((uint64_t*)(buf + 8), request->handle);
> -    cpu_to_be64w((uint64_t*)(buf + 16), request->from);
> -    cpu_to_be32w((uint32_t*)(buf + 24), request->len);
> +    cpu_to_be32w((uint32_t *)buf, NBD_REQUEST_MAGIC);
> +    cpu_to_be16w((uint16_t *)(buf + 4), request->flags);
> +    cpu_to_be16w((uint16_t *)(buf + 6), request->type);
> +    cpu_to_be64w((uint64_t *)(buf + 8), request->handle);
> +    cpu_to_be64w((uint64_t *)(buf + 16), request->from);
> +    cpu_to_be32w((uint32_t *)(buf + 24), request->len);
> 
>     ret = write_sync(ioc, buf, sizeof(buf));
>     if (ret < 0) {
> diff --git a/nbd/server.c b/nbd/server.c
> index 5414c49..93c077e 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -1,4 +1,5 @@
> /*
> + *  Copyright (C) 2016 Red Hat, Inc.
>  *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
>  *
>  *  Network Block Device Server Side
> @@ -636,21 +637,23 @@ static ssize_t nbd_receive_request(QIOChannel *ioc, struct nbd_request *request)
> 
>     /* Request
>        [ 0 ..  3]   magic   (NBD_REQUEST_MAGIC)
> -       [ 4 ..  7]   type    (0 == READ, 1 == WRITE)
> +       [ 4 ..  5]   flags   (NBD_CMD_FLAG_FUA, ...)
> +       [ 6 ..  7]   type    (NBD_CMD_READ, ...)
>        [ 8 .. 15]   handle
>        [16 .. 23]   from
>        [24 .. 27]   len
>      */
> 
> -    magic = be32_to_cpup((uint32_t*)buf);
> -    request->type  = be32_to_cpup((uint32_t*)(buf + 4));
> -    request->handle = be64_to_cpup((uint64_t*)(buf + 8));
> -    request->from  = be64_to_cpup((uint64_t*)(buf + 16));
> -    request->len   = be32_to_cpup((uint32_t*)(buf + 24));
> +    magic = be32_to_cpup((uint32_t *)buf);
> +    request->flags = be16_to_cpup((uint16_t *)(buf + 4));
> +    request->type  = be16_to_cpup((uint16_t *)(buf + 6));
> +    request->handle = be64_to_cpup((uint64_t *)(buf + 8));
> +    request->from  = be64_to_cpup((uint64_t *)(buf + 16));
> +    request->len   = be32_to_cpup((uint32_t *)(buf + 24));
> 
> -    TRACE("Got request: { magic = 0x%" PRIx32 ", .type = %" PRIx32
> -          ", from = %" PRIu64 " , len = %" PRIu32 " }",
> -          magic, request->type, request->from, request->len);
> +    TRACE("Got request: { magic = 0x%" PRIx32 ", .flags = %" PRIx16
> +          ", .type = %" PRIx16 ", from = %" PRIu64 " , len = %" PRIu32 " }",
> +          magic, request->flags, request->type, request->from, request->len);
> 
>     if (magic != NBD_REQUEST_MAGIC) {
>         LOG("invalid magic (got 0x%" PRIx32 ")", magic);
> @@ -984,9 +987,8 @@ static ssize_t nbd_co_receive_request(NBDRequest *req, struct nbd_request *reque
>         goto out;
>     }
> 
> -    if (request->type & ~NBD_CMD_MASK_COMMAND & ~NBD_CMD_FLAG_FUA) {
> -        LOG("unsupported flags (got 0x%x)",
> -            request->type & ~NBD_CMD_MASK_COMMAND);
> +    if (request->flags & ~NBD_CMD_FLAG_FUA) {
> +        LOG("unsupported flags (got 0x%x)", request->flags);
>         return -EINVAL;
>     }
>     if ((request->from + request->len) < request->from) {
> @@ -998,7 +1000,7 @@ static ssize_t nbd_co_receive_request(NBDRequest *req, struct nbd_request *reque
> 
>     TRACE("Decoding type");
> 
> -    command = request->type & NBD_CMD_MASK_COMMAND;
> +    command = request->type;
>     if (command == NBD_CMD_READ || command == NBD_CMD_WRITE) {
>         if (request->len > NBD_MAX_BUFFER_SIZE) {
>             LOG("len (%" PRIu32" ) is larger than max len (%u)",
> @@ -1062,7 +1064,7 @@ static void nbd_trip(void *opaque)
>         reply.error = -ret;
>         goto error_reply;
>     }
> -    command = request.type & NBD_CMD_MASK_COMMAND;
> +    command = request.type;
>     if (command != NBD_CMD_DISC && (request.from + request.len) > exp->size) {
>             LOG("From: %" PRIu64 ", Len: %" PRIu32", Size: %" PRIu64
>                 ", Offset: %" PRIu64 "\n",
> @@ -1084,7 +1086,8 @@ static void nbd_trip(void *opaque)
>     case NBD_CMD_READ:
>         TRACE("Request type is READ");
> 
> -        if (request.type & NBD_CMD_FLAG_FUA) {
> +        /* XXX: NBD Protocol only documents use of FUA with WRITE */
> +        if (request.flags & NBD_CMD_FLAG_FUA) {
>             ret = blk_co_flush(exp->blk);
>             if (ret < 0) {
>                 LOG("flush failed");
> @@ -1126,7 +1129,7 @@ static void nbd_trip(void *opaque)
>             goto error_reply;
>         }
> 
> -        if (request.type & NBD_CMD_FLAG_FUA) {
> +        if (request.flags & NBD_CMD_FLAG_FUA) {
>             ret = blk_co_flush(exp->blk);
>             if (ret < 0) {
>                 LOG("flush failed");
> -- 
> 2.5.5
> 
> 
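
For readers following the layout change, here is a minimal standalone
sketch of the 28-byte request header with the 16-bit flags / 16-bit type
split described in the comment above.  It is an illustration only, not
QEMU's code: the sketch_* names and the put_be()/get_be() helpers are
made up for this example, and only the magic value and the field offsets
are taken from the patch and the protocol.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_REQUEST_MAGIC 0x25609513u   /* value of NBD_REQUEST_MAGIC */
#define SKETCH_REQUEST_SIZE  28

/* Hypothetical stand-in for struct nbd_request (host byte order). */
struct sketch_request {
    uint16_t flags;   /* NBD_CMD_FLAG_* */
    uint16_t type;    /* NBD_CMD_* */
    uint64_t handle;
    uint64_t from;
    uint32_t len;
};

/* Store the low nbytes of v at p in big-endian order. */
static void put_be(unsigned char *p, uint64_t v, int nbytes)
{
    for (int i = nbytes - 1; i >= 0; i--) {
        p[i] = (unsigned char)(v & 0xff);
        v >>= 8;
    }
}

/* Load an nbytes-wide big-endian integer from p. */
static uint64_t get_be(const unsigned char *p, int nbytes)
{
    uint64_t v = 0;
    for (int i = 0; i < nbytes; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Serialize a request: flags at [4..5], type at [6..7]. */
static void sketch_encode(unsigned char *buf, const struct sketch_request *r)
{
    put_be(buf, SKETCH_REQUEST_MAGIC, 4);
    put_be(buf + 4, r->flags, 2);
    put_be(buf + 6, r->type, 2);
    put_be(buf + 8, r->handle, 8);
    put_be(buf + 16, r->from, 8);
    put_be(buf + 24, r->len, 4);
}

/* Parse a request; return 0 on success, -1 on bad magic. */
static int sketch_decode(const unsigned char *buf, struct sketch_request *r)
{
    if (get_be(buf, 4) != SKETCH_REQUEST_MAGIC) {
        return -1;
    }
    r->flags  = (uint16_t)get_be(buf + 4, 2);
    r->type   = (uint16_t)get_be(buf + 6, 2);
    r->handle = get_be(buf + 8, 8);
    r->from   = get_be(buf + 16, 8);
    r->len    = (uint32_t)get_be(buf + 24, 4);
    return 0;
}
```

With the two fields separate, the server no longer has to mask command
bits out of a combined 32-bit word before checking for unsupported
flags.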

-- 
Alex Bligh


* Re: [Qemu-devel] [PATCH 08/18] nbd: Limit nbdflags to 16 bits
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 08/18] nbd: Limit nbdflags to 16 bits Eric Blake
@ 2016-04-09 10:37   ` Alex Bligh
  0 siblings, 0 replies; 48+ messages in thread
From: Alex Bligh @ 2016-04-09 10:37 UTC (permalink / raw)
  To: Eric Blake
  Cc: Alex Bligh, qemu-devel, Kevin Wolf, Paolo Bonzini,
	open list:Block layer core


On 8 Apr 2016, at 23:05, Eric Blake <eblake@redhat.com> wrote:

> Rather than asserting that nbdflags is within range, just give
> it the correct type to begin with :)  nbdflags corresponds to
> the per-export portion of NBD Protocol "transmission flags", which
> is 16 bits in response to NBD_OPT_EXPORT_NAME and NBD_OPT_GO.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>


Reviewed-by: Alex Bligh <alex@alex.org.uk>

> ---
> include/block/nbd.h |  2 +-
> nbd/server.c        | 10 ++++------
> qemu-nbd.c          |  2 +-
> 3 files changed, 6 insertions(+), 8 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 2c61901..42fd670 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -105,7 +105,7 @@ typedef struct NBDExport NBDExport;
> typedef struct NBDClient NBDClient;
> 
> NBDExport *nbd_export_new(BlockBackend *blk, off_t dev_offset, off_t size,
> -                          uint32_t nbdflags, void (*close)(NBDExport *),
> +                          uint16_t nbdflags, void (*close)(NBDExport *),
>                           Error **errp);
> void nbd_export_close(NBDExport *exp);
> void nbd_export_get(NBDExport *exp);
> diff --git a/nbd/server.c b/nbd/server.c
> index 93c077e..c8666ab 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -63,7 +63,7 @@ struct NBDExport {
>     char *name;
>     off_t dev_offset;
>     off_t size;
> -    uint32_t nbdflags;
> +    uint16_t nbdflags;
>     QTAILQ_HEAD(, NBDClient) clients;
>     QTAILQ_ENTRY(NBDExport) next;
> 
> @@ -525,8 +525,8 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
>     NBDClient *client = data->client;
>     char buf[8 + 8 + 8 + 128];
>     int rc;
> -    const int myflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
> -                         NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA);
> +    const uint16_t myflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
> +                              NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA);
>     bool oldStyle;
> 
>     /* Old style negotiation header without options
> @@ -556,7 +556,6 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
> 
>     oldStyle = client->exp != NULL && !client->tlscreds;
>     if (oldStyle) {
> -        assert ((client->exp->nbdflags & ~65535) == 0);
>         stq_be_p(buf + 8, NBD_CLIENT_MAGIC);
>         stq_be_p(buf + 16, client->exp->size);
>         stw_be_p(buf + 26, client->exp->nbdflags | myflags);
> @@ -585,7 +584,6 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
>             goto fail;
>         }
> 
> -        assert ((client->exp->nbdflags & ~65535) == 0);
>         stq_be_p(buf + 18, client->exp->size);
>         stw_be_p(buf + 26, client->exp->nbdflags | myflags);
>         if (nbd_negotiate_write(client->ioc, buf + 18, sizeof(buf) - 18) !=
> @@ -807,7 +805,7 @@ static void nbd_eject_notifier(Notifier *n, void *data)
> }
> 
> NBDExport *nbd_export_new(BlockBackend *blk, off_t dev_offset, off_t size,
> -                          uint32_t nbdflags, void (*close)(NBDExport *),
> +                          uint16_t nbdflags, void (*close)(NBDExport *),
>                           Error **errp)
> {
>     NBDExport *exp = g_malloc0(sizeof(NBDExport));
> diff --git a/qemu-nbd.c b/qemu-nbd.c
> index c2e4d3f..8880ac3 100644
> --- a/qemu-nbd.c
> +++ b/qemu-nbd.c
> @@ -454,7 +454,7 @@ int main(int argc, char **argv)
>     BlockBackend *blk;
>     BlockDriverState *bs;
>     off_t dev_offset = 0;
> -    uint32_t nbdflags = 0;
> +    uint16_t nbdflags = 0;
>     bool disconnect = false;
>     const char *bindto = "0.0.0.0";
>     const char *port = NULL;
> -- 
> 2.5.5
> 
> 

-- 
Alex Bligh


* Re: [Qemu-devel] [PATCH 09/18] nbd: Share common reply-sending code in server
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 09/18] nbd: Share common reply-sending code in server Eric Blake
@ 2016-04-09 10:38   ` Alex Bligh
  0 siblings, 0 replies; 48+ messages in thread
From: Alex Bligh @ 2016-04-09 10:38 UTC (permalink / raw)
  To: Eric Blake; +Cc: Alex Bligh, qemu-devel, Paolo Bonzini


On 8 Apr 2016, at 23:05, Eric Blake <eblake@redhat.com> wrote:

> Rather than open-coding NBD_REP_SERVER, reuse the code we
> already have by adding a length parameter.  The code gets
> longer because of added comments, but the refactoring will
> make adding NBD_OPT_GO in a later patch easier.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>


Reviewed-by: Alex Bligh <alex@alex.org.uk>
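
As a rough illustration of the refactoring idea (one header routine that
takes a length, reused for the NBD_REP_SERVER case), here is a
standalone sketch that builds the reply into a memory buffer instead of
writing to a QIOChannel.  The sketch_* names and the buffer-based
interface are assumptions made for this example; only the header field
order, the magic, and the "4-byte name length plus name" payload come
from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_REP_MAGIC  0x0003e889045565a9ULL  /* NBD_REP_MAGIC */
#define SKETCH_OPT_LIST   3u                     /* NBD_OPT_LIST */
#define SKETCH_REP_SERVER 2u                     /* NBD_REP_SERVER */
#define SKETCH_REP_HDR    20                     /* magic + opt + type + len */

static void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

static void put_be64(unsigned char *p, uint64_t v)
{
    put_be32(p, (uint32_t)(v >> 32));
    put_be32(p + 4, (uint32_t)v);
}

/* Write a reply header announcing len bytes of payload; no payload.
 * Returns the number of bytes written (always SKETCH_REP_HDR). */
static size_t sketch_rep_header(unsigned char *out, uint32_t opt,
                                uint32_t type, uint32_t len)
{
    put_be64(out, SKETCH_REP_MAGIC);
    put_be32(out + 8, opt);
    put_be32(out + 12, type);
    put_be32(out + 16, len);
    return SKETCH_REP_HDR;
}

/* Build an NBD_REP_SERVER reply for one export name.
 * Returns the total size, or 0 if it does not fit in cap bytes. */
static size_t sketch_rep_server(unsigned char *out, size_t cap,
                                const char *name)
{
    size_t namelen = strlen(name);
    size_t total = SKETCH_REP_HDR + 4 + namelen;
    size_t off;

    if (total > cap) {
        return 0;
    }
    /* The header length covers the name-length field plus the name. */
    off = sketch_rep_header(out, SKETCH_OPT_LIST, SKETCH_REP_SERVER,
                            (uint32_t)(4 + namelen));
    put_be32(out + off, (uint32_t)namelen);
    memcpy(out + off + 4, name, namelen);
    return total;
}
```

A zero-length reply such as NBD_REP_ACK is then just
sketch_rep_header() with len 0, which is the shape of the wrapper the
patch keeps under the old name.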

> ---
> nbd/server.c | 59 +++++++++++++++++++++++++++++------------------------------
> 1 file changed, 29 insertions(+), 30 deletions(-)
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index c8666ab..69724c9 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -192,12 +192,15 @@ static ssize_t nbd_negotiate_drop_sync(QIOChannel *ioc, size_t size)
> 
> */
> 
> -static int nbd_negotiate_send_rep(QIOChannel *ioc, uint32_t type, uint32_t opt)
> +/* Send a reply header, including length, but no payload.
> + * Return -errno to kill connection, 0 to continue negotiation */
> +static int nbd_negotiate_send_rep_len(QIOChannel *ioc, uint32_t type,
> +                                      uint32_t opt, uint32_t len)
> {
>     uint64_t magic;
> -    uint32_t len;
> 
> -    TRACE("Reply opt=%" PRIx32 " type=%" PRIx32, type, opt);
> +    TRACE("Reply opt=%" PRIx32 " type=%" PRIx32 " len=%" PRIu32,
> +          opt, type, len);
> 
>     magic = cpu_to_be64(NBD_REP_MAGIC);
>     if (nbd_negotiate_write(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
> @@ -214,7 +217,7 @@ static int nbd_negotiate_send_rep(QIOChannel *ioc, uint32_t type, uint32_t opt)
>         LOG("write failed (rep type)");
>         return -EINVAL;
>     }
> -    len = cpu_to_be32(0);
> +    len = cpu_to_be32(len);
>     if (nbd_negotiate_write(ioc, &len, sizeof(len)) != sizeof(len)) {
>         LOG("write failed (rep data length)");
>         return -EINVAL;
> @@ -222,39 +225,35 @@ static int nbd_negotiate_send_rep(QIOChannel *ioc, uint32_t type, uint32_t opt)
>     return 0;
> }
> 
> +/* Send a reply header with default 0 length.
> + * Return -errno to kill connection, 0 to continue negotiation */
> +static int nbd_negotiate_send_rep(QIOChannel *ioc, uint32_t type, uint32_t opt)
> +{
> +    return nbd_negotiate_send_rep_len(ioc, type, opt, 0);
> +}
> +
> +/* Send an NBD_REP_SERVER reply to NBD_OPT_LIST, including payload.
> + * Return -errno to kill connection, 0 to continue negotiation */
> static int nbd_negotiate_send_rep_list(QIOChannel *ioc, NBDExport *exp)
> {
> -    uint64_t magic, name_len;
> -    uint32_t opt, type, len;
> +    uint32_t len;
> +    int rc;
> 
>     TRACE("Advertising export name '%s'", exp->name ? exp->name : "");
> -    name_len = strlen(exp->name);
> -    magic = cpu_to_be64(NBD_REP_MAGIC);
> -    if (nbd_negotiate_write(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
> -        LOG("write failed (magic)");
> -        return -EINVAL;
> -     }
> -    opt = cpu_to_be32(NBD_OPT_LIST);
> -    if (nbd_negotiate_write(ioc, &opt, sizeof(opt)) != sizeof(opt)) {
> -        LOG("write failed (opt)");
> -        return -EINVAL;
> -    }
> -    type = cpu_to_be32(NBD_REP_SERVER);
> -    if (nbd_negotiate_write(ioc, &type, sizeof(type)) != sizeof(type)) {
> -        LOG("write failed (reply type)");
> -        return -EINVAL;
> -    }
> -    len = cpu_to_be32(name_len + sizeof(len));
> -    if (nbd_negotiate_write(ioc, &len, sizeof(len)) != sizeof(len)) {
> -        LOG("write failed (length)");
> -        return -EINVAL;
> -    }
> -    len = cpu_to_be32(name_len);
> +    len = strlen(exp->name);
> +    rc = nbd_negotiate_send_rep_len(ioc, NBD_REP_SERVER, NBD_OPT_LIST,
> +                                    len + sizeof(len));
> +    if (rc < 0) {
> +        return rc;
> +    }
> +
> +    len = cpu_to_be32(len);
>     if (nbd_negotiate_write(ioc, &len, sizeof(len)) != sizeof(len)) {
> -        LOG("write failed (length)");
> +        LOG("write failed (name length)");
>         return -EINVAL;
>     }
> -    if (nbd_negotiate_write(ioc, exp->name, name_len) != name_len) {
> +    len = be32_to_cpu(len);
> +    if (nbd_negotiate_write(ioc, exp->name, len) != len) {
>         LOG("write failed (buffer)");
>         return -EINVAL;
>     }
> -- 
> 2.5.5
> 
> 

-- 
Alex Bligh


* Re: [Qemu-devel] [PATCH 10/18] nbd: Share common option-sending code in client
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 10/18] nbd: Share common option-sending code in client Eric Blake
@ 2016-04-09 10:38   ` Alex Bligh
  0 siblings, 0 replies; 48+ messages in thread
From: Alex Bligh @ 2016-04-09 10:38 UTC (permalink / raw)
  To: Eric Blake
  Cc: Alex Bligh, qemu-devel, Kevin Wolf, Paolo Bonzini,
	open list:Block layer core


On 8 Apr 2016, at 23:05, Eric Blake <eblake@redhat.com> wrote:

> Rather than open-coding each option request, it's easier to
> have common helper functions do the work.  That in turn requires
> having convenient packed types for handling option requests
> and replies.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Alex Bligh <alex@alex.org.uk>
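
To make the "packed types" point concrete, here is a small standalone
sketch of the client side: copy the 20 wire bytes of an option reply
header into a packed struct, convert each field from big-endian, and
validate magic and option.  This is not the QEMU implementation; the
sketch_* names are invented, and __attribute__((packed)) plus
_Static_assert assume a GCC- or Clang-compatible C11 compiler (QEMU
itself uses QEMU_PACKED and QEMU_BUILD_BUG_ON).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_REP_MAGIC 0x0003e889045565a9ULL  /* NBD_REP_MAGIC */

/* Without packing, the trailing uint32_t would normally be padded out
 * to 24 bytes, which would not match the wire format. */
struct sketch_opt_reply {
    uint64_t magic;
    uint32_t option;
    uint32_t type;
    uint32_t length;
} __attribute__((packed));

_Static_assert(sizeof(struct sketch_opt_reply) == 20,
               "option reply header must be exactly 20 bytes");

/* Interpret the in-memory bytes of v as big-endian, on any host. */
static uint32_t be32_to_host(uint32_t v)
{
    unsigned char b[4];
    memcpy(b, &v, sizeof(b));
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

static uint64_t be64_to_host(uint64_t v)
{
    unsigned char b[8];
    uint64_t out = 0;
    memcpy(b, &v, sizeof(b));
    for (int i = 0; i < 8; i++) {
        out = (out << 8) | b[i];
    }
    return out;
}

/* Parse a reply header; return 0 on success, -1 if the magic is wrong
 * or the reply is for a different option than expect_opt. */
static int sketch_parse_reply(const unsigned char *wire, uint32_t expect_opt,
                              struct sketch_opt_reply *reply)
{
    memcpy(reply, wire, sizeof(*reply));
    reply->magic  = be64_to_host(reply->magic);
    reply->option = be32_to_host(reply->option);
    reply->type   = be32_to_host(reply->type);
    reply->length = be32_to_host(reply->length);

    if (reply->magic != SKETCH_REP_MAGIC) {
        return -1;
    }
    if (reply->option != expect_opt) {
        return -1;
    }
    return 0;
}

/* Error replies have the top bit of the type set. */
static int sketch_reply_is_error(const struct sketch_opt_reply *reply)
{
    return (reply->type & (UINT32_C(1) << 31)) != 0;
}
```

Reading the whole header in one go is what lets callers such as the
list and STARTTLS paths drop their separate magic/option/type/length
reads.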

> ---
> include/block/nbd.h |  29 +++++-
> nbd/nbd-internal.h  |   2 +-
> nbd/client.c        | 250 ++++++++++++++++++++++------------------------------
> 3 files changed, 129 insertions(+), 152 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 42fd670..155196e 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -26,20 +26,41 @@
> #include "io/channel-socket.h"
> #include "crypto/tlscreds.h"
> 
> +/* Handshake phase structs */
> +
> +struct nbd_option {
> +    uint64_t magic; /* NBD_OPTS_MAGIC */
> +    uint32_t option; /* NBD_OPT_* */
> +    uint32_t length;
> +} QEMU_PACKED;
> +typedef struct nbd_option nbd_option;
> +
> +struct nbd_opt_reply {
> +    uint64_t magic; /* NBD_REP_MAGIC */
> +    uint32_t option; /* NBD_OPT_* */
> +    uint32_t type; /* NBD_REP_* */
> +    uint32_t length;
> +} QEMU_PACKED;
> +typedef struct nbd_opt_reply nbd_opt_reply;
> +
> +/* Transmission phase structs */
> +
> struct nbd_request {
> -    uint32_t magic;
> -    uint16_t flags;
> -    uint16_t type;
> +    uint32_t magic; /* NBD_REQUEST_MAGIC */
> +    uint16_t flags; /* NBD_CMD_FLAG_* */
> +    uint16_t type; /* NBD_CMD_* */
>     uint64_t handle;
>     uint64_t from;
>     uint32_t len;
> } QEMU_PACKED;
> +typedef struct nbd_request nbd_request;
> 
> struct nbd_reply {
> -    uint32_t magic;
> +    uint32_t magic; /* NBD_REPLY_MAGIC */
>     uint32_t error;
>     uint64_t handle;
> } QEMU_PACKED;
> +typedef struct nbd_reply nbd_reply;
> 
> /* Transmission (export) flags: sent from server to client during handshake,
>    but describe what will happen during transmission */
> diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
> index b663bf3..b78d249 100644
> --- a/nbd/nbd-internal.h
> +++ b/nbd/nbd-internal.h
> @@ -61,7 +61,7 @@
> #define NBD_REPLY_MAGIC         0x67446698
> #define NBD_OPTS_MAGIC          0x49484156454F5054LL
> #define NBD_CLIENT_MAGIC        0x0000420281861253LL
> -#define NBD_REP_MAGIC           0x3e889045565a9LL
> +#define NBD_REP_MAGIC           0x0003e889045565a9LL
> 
> #define NBD_SET_SOCK            _IO(0xab, 0)
> #define NBD_SET_BLKSIZE         _IO(0xab, 1)
> diff --git a/nbd/client.c b/nbd/client.c
> index 7fd6059..07b8d2e 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -75,64 +75,123 @@ static QTAILQ_HEAD(, NBDExport) exports = QTAILQ_HEAD_INITIALIZER(exports);
> 
> */
> 
> +/* Send an option request. Return 0 if successful, -1 with errp set if
> + * it is impossible to continue. */
> +static int nbd_send_option_request(QIOChannel *ioc, uint32_t opt,
> +                                   uint32_t len, const char *data,
> +                                   Error **errp)
> +{
> +    nbd_option req;
> +    QEMU_BUILD_BUG_ON(sizeof(req) != 16);
> 
> -/* If type represents success, return 1 without further action.
> - * If type represents an error reply, consume the rest of the packet on ioc.
> - * Then return 0 for unsupported (so the client can fall back to
> - * other approaches), or -1 with errp set for other errors.
> +    if (len == -1) {
> +        req.length = len = strlen(data);
> +    }
> +    TRACE("Sending option request %"PRIu32", len %"PRIu32, opt, len);
> +
> +    stq_be_p(&req.magic, NBD_OPTS_MAGIC);
> +    stl_be_p(&req.option, opt);
> +    stl_be_p(&req.length, len);
> +
> +    if (write_sync(ioc, &req, sizeof(req)) != sizeof(req)) {
> +        error_setg(errp, "Failed to send option request header");
> +        return -1;
> +    }
> +
> +    if (len && write_sync(ioc, (char *) data, len) != len) {
> +        error_setg(errp, "Failed to send option request data");
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +/* Receive the header of an option reply, which should match the given
> + * opt.  Read through the length field, but NOT the length bytes of
> + * payload. Return 0 if successful, -1 with errp set if it is
> + * impossible to continue. */
> +static int nbd_receive_option_reply(QIOChannel *ioc, uint32_t opt,
> +                                    nbd_opt_reply *reply, Error **errp)
> +{
> +    QEMU_BUILD_BUG_ON(sizeof(*reply) != 20);
> +    if (read_sync(ioc, reply, sizeof(*reply)) != sizeof(*reply)) {
> +        error_setg(errp, "failed to read option reply");
> +        return -1;
> +    }
> +    be64_to_cpus(&reply->magic);
> +    be32_to_cpus(&reply->option);
> +    be32_to_cpus(&reply->type);
> +    be32_to_cpus(&reply->length);
> +
> +    TRACE("Received option reply %"PRIu32", type %"PRIu32", len %"PRIu32,
> +          reply->option, reply->type, reply->length);
> +
> +    if (reply->magic != NBD_REP_MAGIC) {
> +        error_setg(errp, "Unexpected option reply magic");
> +        return -1;
> +    }
> +    if (reply->option != opt) {
> +        error_setg(errp, "Unexpected option type %x expected %x",
> +                   reply->option, opt);
> +        return -1;
> +    }
> +    return 0;
> +}
> +
> +/* If reply represents success, return 1 without further action.
> + * If reply represents an error, consume the optional payload of
> + * the packet on ioc.  Then return 0 for unsupported (so the client
> + * can fall back to other approaches), or -1 with errp set for other
> + * errors.
>  */
> -static int nbd_handle_reply_err(QIOChannel *ioc, uint32_t opt, uint32_t type,
> +static int nbd_handle_reply_err(QIOChannel *ioc, nbd_opt_reply *reply,
>                                 Error **errp)
> {
> -    uint32_t len;
>     char *msg = NULL;
>     int result = -1;
> 
> -    if (!(type & (1 << 31))) {
> +    if (!(reply->type & (1 << 31))) {
>         return 1;
>     }
> 
> -    if (read_sync(ioc, &len, sizeof(len)) != sizeof(len)) {
> -        error_setg(errp, "failed to read option length");
> -        return -1;
> -    }
> -    len = be32_to_cpu(len);
> -    if (len) {
> -        if (len > NBD_MAX_BUFFER_SIZE) {
> +    if (reply->length) {
> +        if (reply->length > NBD_MAX_BUFFER_SIZE) {
>             error_setg(errp, "server's error message is too long");
>             goto cleanup;
>         }
> -        msg = g_malloc(len + 1);
> -        if (read_sync(ioc, msg, len) != len) {
> +        msg = g_malloc(reply->length + 1);
> +        if (read_sync(ioc, msg, reply->length) != reply->length) {
>             error_setg(errp, "failed to read option error message");
>             goto cleanup;
>         }
> -        msg[len] = '\0';
> +        msg[reply->length] = '\0';
>     }
> 
> -    switch (type) {
> +    switch (reply->type) {
>     case NBD_REP_ERR_UNSUP:
>         TRACE("server doesn't understand request %" PRIx32
> -              ", attempting fallback", opt);
> +              ", attempting fallback", reply->option);
>         result = 0;
>         goto cleanup;
> 
>     case NBD_REP_ERR_POLICY:
> -        error_setg(errp, "Denied by server for option %" PRIx32, opt);
> +        error_setg(errp, "Denied by server for option %" PRIx32,
> +                   reply->option);
>         break;
> 
>     case NBD_REP_ERR_INVALID:
> -        error_setg(errp, "Invalid data length for option %" PRIx32, opt);
> +        error_setg(errp, "Invalid data length for option %" PRIx32,
> +                   reply->option);
>         break;
> 
>     case NBD_REP_ERR_TLS_REQD:
>         error_setg(errp, "TLS negotiation required before option %" PRIx32,
> -                   opt);
> +                   reply->option);
>         break;
> 
>     default:
>         error_setg(errp, "Unknown error code when asking for option %" PRIx32,
> -                   opt);
> +                   reply->option);
>         break;
>     }
> 
> @@ -147,58 +206,29 @@ static int nbd_handle_reply_err(QIOChannel *ioc, uint32_t opt, uint32_t type,
> 
> static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
> {
> -    uint64_t magic;
> -    uint32_t opt;
> -    uint32_t type;
> +    nbd_opt_reply reply;
>     uint32_t len;
>     uint32_t namelen;
>     int error;
> 
>     *name = NULL;
> -    if (read_sync(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
> -        error_setg(errp, "failed to read list option magic");
> +    if (nbd_receive_option_reply(ioc, NBD_OPT_LIST, &reply, errp) < 0) {
>         return -1;
>     }
> -    magic = be64_to_cpu(magic);
> -    if (magic != NBD_REP_MAGIC) {
> -        error_setg(errp, "Unexpected option list magic");
> -        return -1;
> -    }
> -    if (read_sync(ioc, &opt, sizeof(opt)) != sizeof(opt)) {
> -        error_setg(errp, "failed to read list option");
> -        return -1;
> -    }
> -    opt = be32_to_cpu(opt);
> -    if (opt != NBD_OPT_LIST) {
> -        error_setg(errp, "Unexpected option type %" PRIx32 " expected %x",
> -                   opt, NBD_OPT_LIST);
> -        return -1;
> -    }
> -
> -    if (read_sync(ioc, &type, sizeof(type)) != sizeof(type)) {
> -        error_setg(errp, "failed to read list option type");
> -        return -1;
> -    }
> -    type = be32_to_cpu(type);
> -    error = nbd_handle_reply_err(ioc, opt, type, errp);
> +    error = nbd_handle_reply_err(ioc, &reply, errp);
>     if (error <= 0) {
>         return error;
>     }
> +    len = reply.length;
> 
> -    if (read_sync(ioc, &len, sizeof(len)) != sizeof(len)) {
> -        error_setg(errp, "failed to read option length");
> -        return -1;
> -    }
> -    len = be32_to_cpu(len);
> -
> -    if (type == NBD_REP_ACK) {
> +    if (reply.type == NBD_REP_ACK) {
>         if (len != 0) {
>             error_setg(errp, "length too long for option end");
>             return -1;
>         }
> -    } else if (type == NBD_REP_SERVER) {
> +    } else if (reply.type == NBD_REP_SERVER) {
>         if (len < sizeof(namelen) || len > NBD_MAX_BUFFER_SIZE) {
> -            error_setg(errp, "incorrect option length");
> +            error_setg(errp, "incorrect option length %"PRIu32, len);
>             return -1;
>         }
>         if (read_sync(ioc, &namelen, sizeof(namelen)) != sizeof(namelen)) {
> @@ -240,7 +270,7 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
>         }
>     } else {
>         error_setg(errp, "Unexpected reply type %" PRIx32 " expected %x",
> -                   type, NBD_REP_SERVER);
> +                   reply.type, NBD_REP_SERVER);
>         return -1;
>     }
>     return 1;
> @@ -251,24 +281,10 @@ static int nbd_receive_query_exports(QIOChannel *ioc,
>                                      const char *wantname,
>                                      Error **errp)
> {
> -    uint64_t magic = cpu_to_be64(NBD_OPTS_MAGIC);
> -    uint32_t opt = cpu_to_be32(NBD_OPT_LIST);
> -    uint32_t length = 0;
>     bool foundExport = false;
> 
>     TRACE("Querying export list");
> -    if (write_sync(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
> -        error_setg(errp, "Failed to send list option magic");
> -        return -1;
> -    }
> -
> -    if (write_sync(ioc, &opt, sizeof(opt)) != sizeof(opt)) {
> -        error_setg(errp, "Failed to send list option number");
> -        return -1;
> -    }
> -
> -    if (write_sync(ioc, &length, sizeof(length)) != sizeof(length)) {
> -        error_setg(errp, "Failed to send list option length");
> +    if (nbd_send_option_request(ioc, NBD_OPT_LIST, 0, NULL, errp) < 0) {
>         return -1;
>     }
> 
> @@ -314,72 +330,29 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
>                                         QCryptoTLSCreds *tlscreds,
>                                         const char *hostname, Error **errp)
> {
> -    uint64_t magic = cpu_to_be64(NBD_OPTS_MAGIC);
> -    uint32_t opt = cpu_to_be32(NBD_OPT_STARTTLS);
> -    uint32_t length = 0;
> -    uint32_t type;
> +    nbd_opt_reply reply;
>     QIOChannelTLS *tioc;
>     struct NBDTLSHandshakeData data = { 0 };
> 
>     TRACE("Requesting TLS from server");
> -    if (write_sync(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
> -        error_setg(errp, "Failed to send option magic");
> -        return NULL;
> -    }
> -
> -    if (write_sync(ioc, &opt, sizeof(opt)) != sizeof(opt)) {
> -        error_setg(errp, "Failed to send option number");
> -        return NULL;
> -    }
> -
> -    if (write_sync(ioc, &length, sizeof(length)) != sizeof(length)) {
> -        error_setg(errp, "Failed to send option length");
> -        return NULL;
> -    }
> -
> -    TRACE("Getting TLS reply from server1");
> -    if (read_sync(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
> -        error_setg(errp, "failed to read option magic");
> -        return NULL;
> -    }
> -    magic = be64_to_cpu(magic);
> -    if (magic != NBD_REP_MAGIC) {
> -        error_setg(errp, "Unexpected option magic");
> -        return NULL;
> -    }
> -    TRACE("Getting TLS reply from server2");
> -    if (read_sync(ioc, &opt, sizeof(opt)) != sizeof(opt)) {
> -        error_setg(errp, "failed to read option");
> -        return NULL;
> -    }
> -    opt = be32_to_cpu(opt);
> -    if (opt != NBD_OPT_STARTTLS) {
> -        error_setg(errp, "Unexpected option type %" PRIx32 " expected %x",
> -                   opt, NBD_OPT_STARTTLS);
> +    if (nbd_send_option_request(ioc, NBD_OPT_STARTTLS, 0, NULL, errp) < 0) {
>         return NULL;
>     }
> 
>     TRACE("Getting TLS reply from server");
> -    if (read_sync(ioc, &type, sizeof(type)) != sizeof(type)) {
> -        error_setg(errp, "failed to read option type");
> +    if (nbd_receive_option_reply(ioc, NBD_OPT_STARTTLS, &reply, errp) < 0) {
>         return NULL;
>     }
> -    type = be32_to_cpu(type);
> -    if (type != NBD_REP_ACK) {
> +
> +    if (reply.type != NBD_REP_ACK) {
>         error_setg(errp, "Server rejected request to start TLS %" PRIx32,
> -                   type);
> +                   reply.type);
>         return NULL;
>     }
> 
> -    TRACE("Getting TLS reply from server");
> -    if (read_sync(ioc, &length, sizeof(length)) != sizeof(length)) {
> -        error_setg(errp, "failed to read option length");
> -        return NULL;
> -    }
> -    length = be32_to_cpu(length);
> -    if (length != 0) {
> +    if (reply.length != 0) {
>         error_setg(errp, "Start TLS response was not zero %" PRIu32,
> -                   length);
> +                   reply.length);
>         return NULL;
>     }
> 
> @@ -466,8 +439,6 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
> 
>     if (magic == NBD_OPTS_MAGIC) {
>         uint32_t clientflags = 0;
> -        uint32_t opt;
> -        uint32_t namesize;
>         uint16_t globalflags;
>         uint16_t exportflags;
>         bool fixedNewStyle = false;
> @@ -519,28 +490,13 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
>                 goto fail;
>             }
>         }
> -        /* write the export name */
> -        magic = cpu_to_be64(magic);
> -        if (write_sync(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
> -            error_setg(errp, "Failed to send export name magic");
> -            goto fail;
> -        }
> -        opt = cpu_to_be32(NBD_OPT_EXPORT_NAME);
> -        if (write_sync(ioc, &opt, sizeof(opt)) != sizeof(opt)) {
> -            error_setg(errp, "Failed to send export name option number");
> -            goto fail;
> -        }
> -        namesize = cpu_to_be32(strlen(name));
> -        if (write_sync(ioc, &namesize, sizeof(namesize)) !=
> -            sizeof(namesize)) {
> -            error_setg(errp, "Failed to send export name length");
> -            goto fail;
> -        }
> -        if (write_sync(ioc, (char *)name, strlen(name)) != strlen(name)) {
> -            error_setg(errp, "Failed to send export name");
> +        /* write the export name request */
> +        if (nbd_send_option_request(ioc, NBD_OPT_EXPORT_NAME, -1, name,
> +                                    errp) < 0) {
>             goto fail;
>         }
> 
> +        /* Read the response */
>         if (read_sync(ioc, &s, sizeof(s)) != sizeof(s)) {
>             error_setg(errp, "Failed to read export length");
>             goto fail;
> -- 
> 2.5.5
> 
> 

-- 
Alex Bligh


* Re: [Qemu-devel] [PATCH 11/18] nbd: Let client skip portions of server reply
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 11/18] nbd: Let client skip portions of server reply Eric Blake
@ 2016-04-09 10:39   ` Alex Bligh
  0 siblings, 0 replies; 48+ messages in thread
From: Alex Bligh @ 2016-04-09 10:39 UTC (permalink / raw)
  To: Eric Blake; +Cc: Alex Bligh, qemu-devel, Paolo Bonzini


On 8 Apr 2016, at 23:05, Eric Blake <eblake@redhat.com> wrote:

> The server has a nice helper function nbd_negotiate_drop_sync()
> which lets it easily ignore fluff from the client (such as the
> payload to an unknown option request).  We can't quite make it
> common, since it depends on nbd_negotiate_read() which handles
> coroutine magic, but we can copy the idea into the client where
> we have places where we want to ignore data (such as the
> description tacked on the end of NBD_REP_SERVER).
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>


Reviewed-by: Alex Bligh <alex@alex.org.uk>
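
For anyone unfamiliar with the "drop" idea, the following standalone
sketch shows the general technique: consume and discard a given number
of bytes through a small fixed scratch buffer, never reading more per
call than the buffer holds.  It is a simplified illustration, not the
patch's drop_sync(): struct mem_stream and mem_read() are hypothetical
stand-ins for the channel and read_sync(), and the sketch always uses a
stack buffer rather than switching to a heap buffer for large sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory byte source standing in for a channel. */
struct mem_stream {
    const unsigned char *data;
    size_t len;
    size_t pos;
};

/* Read up to n bytes; returns the count actually read, 0 at end. */
static size_t mem_read(struct mem_stream *s, void *buf, size_t n)
{
    size_t avail = s->len - s->pos;

    if (n > avail) {
        n = avail;
    }
    memcpy(buf, s->data + s->pos, n);
    s->pos += n;
    return n;
}

/* Discard size bytes from the stream.  Returns the number of bytes
 * dropped, or -1 if the stream ends before size bytes were consumed. */
static long sketch_drop(struct mem_stream *s, size_t size)
{
    unsigned char scratch[1024];
    size_t remaining = size;

    while (remaining > 0) {
        /* Never ask for more than the scratch buffer can hold. */
        size_t want = remaining < sizeof(scratch) ? remaining
                                                  : sizeof(scratch);
        size_t got = mem_read(s, scratch, want);

        if (got == 0) {
            return -1;
        }
        remaining -= got;
    }
    return (long)size;
}
```

The key invariant is that each read is capped by the size of whichever
buffer is actually in use.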

> ---
> nbd/client.c | 45 ++++++++++++++++++++++++++++++++-------------
> 1 file changed, 32 insertions(+), 13 deletions(-)
> 
> diff --git a/nbd/client.c b/nbd/client.c
> index 07b8d2e..b2dfc11 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -75,6 +75,32 @@ static QTAILQ_HEAD(, NBDExport) exports = QTAILQ_HEAD_INITIALIZER(exports);
> 
> */
> 
> +/* Discard length bytes from channel.  Return -errno on failure, or
> + * the amount of bytes consumed. */
> +static ssize_t drop_sync(QIOChannel *ioc, size_t size)
> +{
> +    ssize_t ret, dropped = size;
> +    char small[1024];
> +    char *buffer;
> +
> +    buffer = sizeof(small) >= size ? small : g_malloc(MIN(65536, size));
> +    while (size > 0) {
> +        ret = read_sync(ioc, buffer, MIN(65536, size));
> +        if (ret < 0) {
> +            goto cleanup;
> +        }
> +        assert(ret <= size);
> +        size -= ret;
> +    }
> +    ret = dropped;
> +
> + cleanup:
> +    if (buffer != small) {
> +        g_free(buffer);
> +    }
> +    return ret;
> +}
> +
> /* Send an option request. Return 0 if successful, -1 with errp set if
>  * it is impossible to continue. */
> static int nbd_send_option_request(QIOChannel *ioc, uint32_t opt,
> @@ -255,18 +281,11 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
>         }
>         (*name)[namelen] = '\0';
>         len -= namelen;
> -        if (len) {
> -            char *buf = g_malloc(len + 1);
> -            if (read_sync(ioc, buf, len) != len) {
> -                error_setg(errp, "failed to read export description");
> -                g_free(*name);
> -                g_free(buf);
> -                *name = NULL;
> -                return -1;
> -            }
> -            buf[len] = '\0';
> -            TRACE("Ignoring export description: %s", buf);
> -            g_free(buf);
> +        if (drop_sync(ioc, len) != len) {
> +            error_setg(errp, "failed to read export description");
> +            g_free(*name);
> +            *name = NULL;
> +            return -1;
>         }
>     } else {
>         error_setg(errp, "Unexpected reply type %" PRIx32 " expected %x",
> @@ -539,7 +558,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
>         goto fail;
>     }
> 
> -    if (read_sync(ioc, &buf, 124) != 124) {
> +    if (drop_sync(ioc, 124) != 124) {
>         error_setg(errp, "Failed to read reserved block");
>         goto fail;
>     }
> -- 
> 2.5.5
> 
> 
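For anyone reading the drop_sync() hunk in isolation, the buffer selection and chunked discard loop can be sketched outside of qemu. This is only a minimal sketch under stated assumptions: `mem_source`, `mem_read()` and `drop_bytes()` are hypothetical names, the in-memory source stands in for a QIOChannel, and plain malloc()/free() stand in for g_malloc()/g_free(). The sketch uses the stack buffer only when the whole request fits in it, and otherwise allocates a chunk capped at 64 KiB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical in-memory byte source standing in for a QIOChannel. */
struct mem_source {
    const unsigned char *data;
    size_t remaining;
};

/* Hypothetical stand-in for read_sync(): copies exactly len bytes and
 * returns len, or returns -1 if fewer than len bytes remain. */
static ssize_t mem_read(struct mem_source *src, void *buf, size_t len)
{
    if (len > src->remaining) {
        return -1;
    }
    memcpy(buf, src->data, len);
    src->data += len;
    src->remaining -= len;
    return (ssize_t)len;
}

#define DROP_CHUNK_MAX 65536

/* Discard size bytes from src.  Returns the number of bytes dropped,
 * or -1 if the source ran dry or allocation failed. */
static ssize_t drop_bytes(struct mem_source *src, size_t size)
{
    char small[1024];
    char *buffer = small;
    size_t chunk_cap = sizeof(small);
    size_t left = size;
    ssize_t ret = (ssize_t)size;

    /* Only fall back to the heap when the request exceeds the stack buffer. */
    if (size > sizeof(small)) {
        chunk_cap = size < DROP_CHUNK_MAX ? size : DROP_CHUNK_MAX;
        buffer = malloc(chunk_cap);
        if (!buffer) {
            return -1;
        }
    }
    while (left > 0) {
        /* Never ask for more than the scratch buffer can hold. */
        size_t want = left < chunk_cap ? left : chunk_cap;
        ssize_t got = mem_read(src, buffer, want);
        if (got < 0) {
            ret = -1;
            break;
        }
        assert((size_t)got <= left);
        left -= (size_t)got;
    }
    if (buffer != small) {
        free(buffer);
    }
    return ret;
}
```

The point of the design is that each read is bounded by the capacity of whichever scratch buffer is in use, so a 124-byte reserved block never touches the heap while a large export description is consumed in 64 KiB pieces.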

-- 
Alex Bligh


* Re: [Qemu-devel] [PATCH 12/18] nbd: Less allocation during NBD_OPT_LIST
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 12/18] nbd: Less allocation during NBD_OPT_LIST Eric Blake
@ 2016-04-09 10:41   ` Alex Bligh
  2016-04-09 22:24     ` Eric Blake
  0 siblings, 1 reply; 48+ messages in thread
From: Alex Bligh @ 2016-04-09 10:41 UTC (permalink / raw)
  To: Eric Blake; +Cc: Alex Bligh, qemu-devel, Paolo Bonzini


On 8 Apr 2016, at 23:05, Eric Blake <eblake@redhat.com> wrote:

> Since we know that the maximum name we are willing to accept
> is small enough to stack-allocate, rework the iteration over
> NBD_OPT_LIST responses to reuse a stack buffer rather than
> allocating every time.  Furthermore, we don't even have to
> allocate if we know the server's length doesn't match what
> we are searching for.
> 
> Not fixed here: Upstream NBD Protocol recently added this
> clarification:
> https://github.com/yoe/nbd/blob/18918eb/doc/proto.md#conventions
> 
> Where this document refers to a string, then unless otherwise
> stated, that string is a sequence of UTF-8 code points, which
> is not NUL terminated, MUST NOT contain NUL characters, SHOULD
> be no longer than 256 bytes and MUST be no longer than 4096
> bytes. This applies to export names and error messages (amongst
> others).
> 
> To be fully compliant to that, we need to bump our export name
> limit from 255 to at least 256, and need to decide whether we
> can bump it higher (bumping it all the way to 4096 is annoying
> in that we could no longer safely stack-allocate a worst-case
> string, so we may still want to take the leeway offered by SHOULD
> to force a reasonable smaller limit).

Is there a limit in qemu-world to safe stack allocation? I thought
that was (in general) only a kernel consideration? (probably my
ignorance here).

Otherwise:


Reviewed-by: Alex Bligh <alex@alex.org.uk>

> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> nbd/client.c | 130 +++++++++++++++++++++++++++++------------------------------
> 1 file changed, 65 insertions(+), 65 deletions(-)
> 
> diff --git a/nbd/client.c b/nbd/client.c
> index b2dfc11..d4e37d5 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -230,14 +230,17 @@ static int nbd_handle_reply_err(QIOChannel *ioc, nbd_opt_reply *reply,
>     return result;
> }
> 
> -static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
> +/* Return -1 if unrecoverable error occurs, 0 if NBD_OPT_LIST is
> + * unsupported, 1 if iteration is done, 2 to keep looking, and 3 if
> + * this entry matches want. */
> +static int nbd_receive_list(QIOChannel *ioc, const char *want, Error **errp)
> {
>     nbd_opt_reply reply;
>     uint32_t len;
>     uint32_t namelen;
> +    char name[NBD_MAX_NAME_SIZE + 1];
>     int error;
> 
> -    *name = NULL;
>     if (nbd_receive_option_reply(ioc, NBD_OPT_LIST, &reply, errp) < 0) {
>         return -1;
>     }
> @@ -252,97 +255,94 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
>             error_setg(errp, "length too long for option end");
>             return -1;
>         }
> -    } else if (reply.type == NBD_REP_SERVER) {
> -        if (len < sizeof(namelen) || len > NBD_MAX_BUFFER_SIZE) {
> -            error_setg(errp, "incorrect option length %"PRIu32, len);
> -            return -1;
> -        }
> -        if (read_sync(ioc, &namelen, sizeof(namelen)) != sizeof(namelen)) {
> -            error_setg(errp, "failed to read option name length");
> -            return -1;
> -        }
> -        namelen = be32_to_cpu(namelen);
> -        len -= sizeof(namelen);
> -        if (len < namelen) {
> -            error_setg(errp, "incorrect option name length");
> -            return -1;
> -        }
> -        if (namelen > NBD_MAX_NAME_SIZE) {
> -            error_setg(errp, "export name length too long %" PRIu32, namelen);
> -            return -1;
> -        }
> -
> -        *name = g_new0(char, namelen + 1);
> -        if (read_sync(ioc, *name, namelen) != namelen) {
> -            error_setg(errp, "failed to read export name");
> -            g_free(*name);
> -            *name = NULL;
> -            return -1;
> -        }
> -        (*name)[namelen] = '\0';
> -        len -= namelen;
> -        if (drop_sync(ioc, len) != len) {
> -            error_setg(errp, "failed to read export description");
> -            g_free(*name);
> -            *name = NULL;
> -            return -1;
> -        }
> -    } else {
> +        return 1;
> +    } else if (reply.type != NBD_REP_SERVER) {
>         error_setg(errp, "Unexpected reply type %" PRIx32 " expected %x",
>                    reply.type, NBD_REP_SERVER);
>         return -1;
>     }
> -    return 1;
> +
> +    if (len < sizeof(namelen) || len > NBD_MAX_BUFFER_SIZE) {
> +        error_setg(errp, "incorrect option length %"PRIu32, len);
> +        return -1;
> +    }
> +    if (read_sync(ioc, &namelen, sizeof(namelen)) != sizeof(namelen)) {
> +        error_setg(errp, "failed to read option name length");
> +        return -1;
> +    }
> +    namelen = be32_to_cpu(namelen);
> +    len -= sizeof(namelen);
> +    if (len < namelen) {
> +        error_setg(errp, "incorrect option name length");
> +        return -1;
> +    }
> +    if (namelen != strlen(want)) {
> +        if (drop_sync(ioc, len) != len) {
> +            error_setg(errp, "failed to skip export name with wrong length");
> +            return -1;
> +        }
> +        return 2;
> +    }
> +
> +    assert(namelen < sizeof(name));
> +    if (read_sync(ioc, name, namelen) != namelen) {
> +        error_setg(errp, "failed to read export name");
> +        return -1;
> +    }
> +    name[namelen] = '\0';
> +    len -= namelen;
> +    if (drop_sync(ioc, len) != len) {
> +        error_setg(errp, "failed to read export description");
> +        return -1;
> +    }
> +    return strcmp(name, want) == 0 ? 3 : 2;
> }
> 
> 
> +/* Return -1 on failure, 0 if wantname is an available export. */
> static int nbd_receive_query_exports(QIOChannel *ioc,
>                                      const char *wantname,
>                                      Error **errp)
> {
>     bool foundExport = false;
> 
> -    TRACE("Querying export list");
> +    TRACE("Querying export list for '%s'", wantname);
>     if (nbd_send_option_request(ioc, NBD_OPT_LIST, 0, NULL, errp) < 0) {
>         return -1;
>     }
> 
>     TRACE("Reading available export names");
>     while (1) {
> -        char *name = NULL;
> -        int ret = nbd_receive_list(ioc, &name, errp);
> +        int ret = nbd_receive_list(ioc, wantname, errp);
> 
> -        if (ret < 0) {
> -            g_free(name);
> -            name = NULL;
> +        switch (ret) {
> +        default:
> +            /* Server gave unexpected reply */
> +            assert(ret < 0);
>             return -1;
> -        }
> -        if (ret == 0) {
> +        case 0:
>             /* Server doesn't support export listing, so
>              * we will just assume an export with our
>              * wanted name exists */
> -            foundExport = true;
> -            break;
> -        }
> -        if (name == NULL) {
> -            TRACE("End of export name list");
> +            return 0;
> +        case 1:
> +            /* Done iterating. */
> +            if (!foundExport) {
> +                error_setg(errp, "No export with name '%s' available",
> +                           wantname);
> +                return -1;
> +            }
> +            return 0;
> +        case 2:
> +            /* Wasn't this one, keep going. */
>             break;
> -        }
> -        if (g_str_equal(name, wantname)) {
> +        case 3:
> +            /* Found a match, but must finish parsing reply. */
> +            TRACE("Found desired export name '%s'", wantname);
>             foundExport = true;
> -            TRACE("Found desired export name '%s'", name);
> -        } else {
> -            TRACE("Ignored export name '%s'", name);
> +            break;
>         }
> -        g_free(name);
> -    }
> -
> -    if (!foundExport) {
> -        error_setg(errp, "No export with name '%s' available", wantname);
> -        return -1;
>     }
> -
> -    return 0;
> }
> 
> static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
> -- 
> 2.5.5
> 
> 
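The return-code convention and the "compare lengths before contents" shortcut from the commit message can be sketched without any socket I/O. This is an illustrative sketch, not the qemu implementation: the enum names, `classify_entry()`, `find_export()` and the `MAX_NAME_SIZE` value of 255 are assumptions made for the example, and entries are passed as in-memory byte ranges rather than read from a channel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Named versions of the -1/0/1/2/3 return codes described in the patch. */
enum list_result {
    LIST_ERROR = -1,       /* unrecoverable error */
    LIST_UNSUPPORTED = 0,  /* server does not support NBD_OPT_LIST */
    LIST_DONE = 1,         /* iteration finished */
    LIST_NO_MATCH = 2,     /* keep looking */
    LIST_MATCH = 3,        /* this entry matches the wanted name */
};

#define MAX_NAME_SIZE 255  /* assumed limit, mirroring the 255 mentioned above */

/* Classify one advertised export name (not NUL-terminated on the wire). */
static enum list_result classify_entry(const char *entry, size_t entry_len,
                                       const char *want)
{
    size_t want_len = strlen(want);

    /* A different length can never match, so the name need not be
     * inspected (or, in the real client, even copied off the wire). */
    if (entry_len != want_len) {
        return LIST_NO_MATCH;
    }
    if (entry_len > MAX_NAME_SIZE) {
        return LIST_ERROR;
    }
    return memcmp(entry, want, entry_len) == 0 ? LIST_MATCH : LIST_NO_MATCH;
}

/* Walk a whole list; return 0 if want was advertised, -1 otherwise. */
static int find_export(const char *const *entries, size_t count,
                       const char *want)
{
    int found = 0;

    for (size_t i = 0; i < count; i++) {
        switch (classify_entry(entries[i], strlen(entries[i]), want)) {
        case LIST_MATCH:
            found = 1;  /* keep going: the rest of the reply must be parsed */
            break;
        case LIST_NO_MATCH:
            break;
        default:
            return -1;
        }
    }
    return found ? 0 : -1;
}
```

As in the patch, a match does not end the loop early, because the remaining replies still have to be consumed before the next option can be sent.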

-- 
Alex Bligh


* Re: [Qemu-devel] [PATCH 13/18] nbd: Support shorter handshake
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 13/18] nbd: Support shorter handshake Eric Blake
@ 2016-04-09 10:42   ` Alex Bligh
  2016-04-09 22:27     ` Eric Blake
  0 siblings, 1 reply; 48+ messages in thread
From: Alex Bligh @ 2016-04-09 10:42 UTC (permalink / raw)
  To: Eric Blake
  Cc: Alex Bligh, qemu-devel, Kevin Wolf, Paolo Bonzini,
	open list:Block layer core


On 8 Apr 2016, at 23:05, Eric Blake <eblake@redhat.com> wrote:

> The NBD Protocol allows the server and client to mutually agree
> on a shorter handshake (omit the 124 bytes of reserved 0), via
> the server advertising NBD_FLAG_NO_ZEROES and the client
> acknowledging with NBD_FLAG_C_NO_ZEROES (only possible in
> newstyle, whether or not it is fixed newstyle).  It doesn't
> shave much off the wire, but we might as well implement it.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>


Reviewed-by: Alex Bligh <alex@alex.org.uk>

thanks - that was annoying me.

> ---
> include/block/nbd.h |  6 ++++--
> nbd/client.c        |  8 +++++++-
> nbd/server.c        | 15 +++++++++++----
> 3 files changed, 22 insertions(+), 7 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 155196e..35c0ea3 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -73,11 +73,13 @@ typedef struct nbd_reply nbd_reply;
> 
> /* New-style handshake (global) flags, sent from server to client, and
>    control what will happen during handshake phase. */
> -#define NBD_FLAG_FIXED_NEWSTYLE     (1 << 0)    /* Fixed newstyle protocol. */
> +#define NBD_FLAG_FIXED_NEWSTYLE   (1 << 0) /* Fixed newstyle protocol. */
> +#define NBD_FLAG_NO_ZEROES        (1 << 1) /* End handshake without zeroes. */
> 
> /* New-style client flags, sent from client to server to control what happens
>    during handshake phase. */
> -#define NBD_FLAG_C_FIXED_NEWSTYLE   (1 << 0)    /* Fixed newstyle protocol. */
> +#define NBD_FLAG_C_FIXED_NEWSTYLE (1 << 0) /* Fixed newstyle protocol. */
> +#define NBD_FLAG_C_NO_ZEROES      (1 << 1) /* End handshake without zeroes. */
> 
> /* Reply types. */
> #define NBD_REP_ACK             (1)             /* Data sending finished. */
> diff --git a/nbd/client.c b/nbd/client.c
> index d4e37d5..507ddc1 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -409,6 +409,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
>     char buf[256];
>     uint64_t magic, s;
>     int rc;
> +    bool zeroes = true;
> 
>     TRACE("Receiving negotiation tlscreds=%p hostname=%s.",
>           tlscreds, hostname ? hostname : "<null>");
> @@ -475,6 +476,11 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
>             TRACE("Server supports fixed new style");
>             clientflags |= NBD_FLAG_C_FIXED_NEWSTYLE;
>         }
> +        if (globalflags & NBD_FLAG_NO_ZEROES) {
> +            zeroes = false;
> +            TRACE("Server supports no zeroes");
> +            clientflags |= NBD_FLAG_C_NO_ZEROES;
> +        }
>         /* client requested flags */
>         clientflags = cpu_to_be32(clientflags);
>         if (write_sync(ioc, &clientflags, sizeof(clientflags)) !=
> @@ -558,7 +564,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
>         goto fail;
>     }
> 
> -    if (drop_sync(ioc, 124) != 124) {
> +    if (zeroes && drop_sync(ioc, 124) != 124) {
>         error_setg(errp, "Failed to read reserved block");
>         goto fail;
>     }
> diff --git a/nbd/server.c b/nbd/server.c
> index 69724c9..379df8c 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -78,6 +78,7 @@ struct NBDClient {
>     int refcount;
>     void (*close)(NBDClient *client);
> 
> +    bool no_zeroes;
>     NBDExport *exp;
>     QCryptoTLSCreds *tlscreds;
>     char *tlsaclname;
> @@ -396,6 +397,11 @@ static int nbd_negotiate_options(NBDClient *client)
>         fixedNewstyle = true;
>         flags &= ~NBD_FLAG_C_FIXED_NEWSTYLE;
>     }
> +    if (flags & NBD_FLAG_C_NO_ZEROES) {
> +        TRACE("Client supports no zeroes at handshake end");
> +        client->no_zeroes = true;
> +        flags &= ~NBD_FLAG_C_NO_ZEROES;
> +    }
>     if (flags != 0) {
>         TRACE("Unknown client flags 0x%" PRIx32 " received", flags);
>         return -EIO;
> @@ -527,6 +533,7 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
>     const uint16_t myflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
>                               NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA);
>     bool oldStyle;
> +    size_t len;
> 
>     /* Old style negotiation header without options
>         [ 0 ..   7]   passwd       ("NBDMAGIC")
> @@ -543,7 +550,7 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
>         ....options sent....
>         [18 ..  25]   size
>         [26 ..  27]   export flags
> -        [28 .. 151]   reserved     (0)
> +        [28 .. 151]   reserved     (0, omit if no_zeroes)
>      */
> 
>     qio_channel_set_blocking(client->ioc, false, NULL);
> @@ -560,7 +567,7 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
>         stw_be_p(buf + 26, client->exp->nbdflags | myflags);
>     } else {
>         stq_be_p(buf + 8, NBD_OPTS_MAGIC);
> -        stw_be_p(buf + 16, NBD_FLAG_FIXED_NEWSTYLE);
> +        stw_be_p(buf + 16, NBD_FLAG_FIXED_NEWSTYLE | NBD_FLAG_NO_ZEROES);
>     }
> 
>     if (oldStyle) {
> @@ -585,8 +592,8 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
> 
>         stq_be_p(buf + 18, client->exp->size);
>         stw_be_p(buf + 26, client->exp->nbdflags | myflags);
> -        if (nbd_negotiate_write(client->ioc, buf + 18, sizeof(buf) - 18) !=
> -            sizeof(buf) - 18) {
> +        len = client->no_zeroes ? 10 : sizeof(buf) - 18;
> +        if (nbd_negotiate_write(client->ioc, buf + 18, len) != len) {
>             LOG("write failed");
>             goto fail;
>         }
> -- 
> 2.5.5
> 
> 
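The effect of the two flags on the end of the handshake can be sketched as a small pure function. The flag values are copied from the quoted include/block/nbd.h hunk. The helper names `client_ack_flags()` and `handshake_tail_len()` are hypothetical, and the sketch assumes the zeroes are omitted only when the server advertised the flag and the client acknowledged it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values as defined in the quoted header hunk. */
#define NBD_FLAG_NO_ZEROES   (1 << 1)  /* server -> client */
#define NBD_FLAG_C_NO_ZEROES (1 << 1)  /* client -> server */

#define EXPORT_INFO_LEN 10   /* 8-byte export size + 2-byte export flags */
#define RESERVED_LEN    124  /* trailing zero bytes of the classic reply */

/* Client side: acknowledge NO_ZEROES only if the server offered it. */
static uint32_t client_ack_flags(uint16_t server_flags)
{
    uint32_t client_flags = 0;

    if (server_flags & NBD_FLAG_NO_ZEROES) {
        client_flags |= NBD_FLAG_C_NO_ZEROES;
    }
    return client_flags;
}

/* Number of bytes that follow the option phase: 10 when both sides
 * agreed to skip the zeroes, 134 otherwise. */
static size_t handshake_tail_len(uint16_t server_flags, uint32_t client_flags)
{
    bool no_zeroes = (server_flags & NBD_FLAG_NO_ZEROES) &&
                     (client_flags & NBD_FLAG_C_NO_ZEROES);

    return no_zeroes ? EXPORT_INFO_LEN : EXPORT_INFO_LEN + RESERVED_LEN;
}
```

The 134-byte figure matches the [18 .. 151] range in the layout comment of nbd_negotiate(), and the 10-byte figure matches the literal used in the server hunk.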

-- 
Alex Bligh


* Re: [Qemu-devel] [PATCH 14/18] nbd: Implement NBD_OPT_GO on client
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 14/18] nbd: Implement NBD_OPT_GO on client Eric Blake
@ 2016-04-09 10:47   ` Alex Bligh
  2016-04-09 22:38     ` Eric Blake
  0 siblings, 1 reply; 48+ messages in thread
From: Alex Bligh @ 2016-04-09 10:47 UTC (permalink / raw)
  To: Eric Blake
  Cc: Alex Bligh, qemu-devel, Kevin Wolf, Paolo Bonzini,
	open list:Block layer core


On 8 Apr 2016, at 23:05, Eric Blake <eblake@redhat.com> wrote:

> NBD_OPT_EXPORT_NAME is lousy: it doesn't have any sane error
> reporting.  Upstream NBD recently added NBD_OPT_GO as the

... as an experimental option for now, but hopefully this
will help move it out of the experimental section.

Thanks for doing this one.

> improved version of the option that does what we want: it
> reports sane errors on failures (including when a server
> requires TLS but does not have NBD_OPT_GO!), and on success
> it concludes with the same data as NBD_OPT_EXPORT_NAME sends.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Perhaps worth adding that all servers that support FixedNewstyle
are meant to cope with unsupported options (i.e. reply with an
error but not disconnect), though some may not (in which case they
are buggy and should be fixed). Just in case someone asks
'why is qemu no longer connecting to shonkynbd', a note in
the commit log might be useful.

Otherwise:

Reviewed-by: Alex Bligh <alex@alex.org.uk>

> ---
> include/block/nbd.h |  1 +
> nbd/nbd-internal.h  |  7 +++++
> nbd/client.c        | 86 +++++++++++++++++++++++++++++++++++++++++++++++++++--
> 3 files changed, 92 insertions(+), 2 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 35c0ea3..d261dbc 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -88,6 +88,7 @@ typedef struct nbd_reply nbd_reply;
> #define NBD_REP_ERR_POLICY      ((UINT32_C(1) << 31) | 2) /* Server denied */
> #define NBD_REP_ERR_INVALID     ((UINT32_C(1) << 31) | 3) /* Invalid length. */
> #define NBD_REP_ERR_TLS_REQD    ((UINT32_C(1) << 31) | 5) /* TLS required */
> +#define NBD_REP_ERR_UNKNOWN     ((UINT32_C(1) << 31) | 6) /* Export unknown */
> 
> /* Request flags, sent from client to server during transmission phase */
> #define NBD_CMD_FLAG_FUA        (1 << 0)
> diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
> index b78d249..ddba1d0 100644
> --- a/nbd/nbd-internal.h
> +++ b/nbd/nbd-internal.h
> @@ -55,8 +55,13 @@
>  * https://github.com/yoe/nbd/blob/master/doc/proto.md
>  */
> 
> +/* Size of all NBD_OPT_*, without payload */
> #define NBD_REQUEST_SIZE        (4 + 2 + 2 + 8 + 8 + 4)
> +/* Size of all NBD_REP_* sent in answer to most NBD_OPT_*, without payload */
> #define NBD_REPLY_SIZE          (4 + 4 + 8)
> +/* Size of reply to NBD_OPT_EXPORT_NAME, without trailing zeroes */
> +#define NBD_FINAL_REPLY_SIZE    (8 + 2)
> +
> #define NBD_REQUEST_MAGIC       0x25609513
> #define NBD_REPLY_MAGIC         0x67446698
> #define NBD_OPTS_MAGIC          0x49484156454F5054LL
> @@ -80,6 +85,8 @@
> #define NBD_OPT_LIST            (3)
> #define NBD_OPT_PEEK_EXPORT     (4)
> #define NBD_OPT_STARTTLS        (5)
> +#define NBD_OPT_INFO            (6)
> +#define NBD_OPT_GO              (7)
> 
> /* NBD errors are based on errno numbers, so there is a 1:1 mapping,
>  * but only a limited set of errno values is specified in the protocol.
> diff --git a/nbd/client.c b/nbd/client.c
> index 507ddc1..af17d4c 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -215,6 +215,11 @@ static int nbd_handle_reply_err(QIOChannel *ioc, nbd_opt_reply *reply,
>                    reply->option);
>         break;
> 
> +    case NBD_REP_ERR_UNKNOWN:
> +        error_setg(errp, "Requested export not available for option %" PRIx32,
> +                   reply->option);
> +        break;
> +
>     default:
>         error_setg(errp, "Unknown error code when asking for option %" PRIx32,
>                    reply->option);
> @@ -299,6 +304,67 @@ static int nbd_receive_list(QIOChannel *ioc, const char *want, Error **errp)
> }
> 
> 
> +/* Returns -1 if NBD_OPT_GO proves the export cannot be used, 0 if
> + * NBD_OPT_GO is unsupported (fall back to NBD_OPT_LIST and
> + * NBD_OPT_EXPORT_NAME in that case), and > 0 if the export is good to
> + * go (with the server data at the same point as it would be right
> + * after sending NBD_OPT_EXPORT_NAME). */
> +static int nbd_opt_go(QIOChannel *ioc, const char *wantname, Error **errp)
> +{
> +    nbd_opt_reply reply;
> +    uint32_t len;
> +    uint32_t namelen;
> +    int error;
> +    char buf[NBD_MAX_NAME_SIZE];
> +
> +    TRACE("Attempting NBD_OPT_GO for export '%s'", wantname);
> +    if (nbd_send_option_request(ioc, NBD_OPT_GO, -1, wantname, errp) < 0) {
> +        return -1;
> +    }
> +
> +    TRACE("Reading export info");
> +    if (nbd_receive_option_reply(ioc, NBD_OPT_GO, &reply, errp) < 0) {
> +        return -1;
> +    }
> +    error = nbd_handle_reply_err(ioc, &reply, errp);
> +    if (error <= 0) {
> +        return error;
> +    }
> +    len = reply.length;
> +
> +    if (reply.type != NBD_REP_SERVER) {
> +        error_setg(errp, "unexpected reply type %" PRIx32 ", expected %x",
> +                   reply.type, NBD_REP_SERVER);
> +        return -1;
> +    }
> +
> +    if (len < sizeof(namelen) + NBD_FINAL_REPLY_SIZE ||
> +        len > sizeof(namelen) + sizeof(buf) + NBD_FINAL_REPLY_SIZE) {
> +        error_setg(errp, "reply with %" PRIu32 " bytes is unexpected size",
> +                   len);
> +        return -1;
> +    }
> +
> +    if (read_sync(ioc, &namelen, sizeof(namelen)) != sizeof(namelen)) {
> +        error_setg(errp, "failed to read namelen");
> +        return -1;
> +    }
> +    be32_to_cpus(&namelen);
> +    if (len != sizeof(namelen) + namelen + NBD_FINAL_REPLY_SIZE) {
> +        error_setg(errp, "namelen %" PRIu32 " is unexpected size",
> +                   len);
> +        return -1;
> +    }
> +
> +    if (read_sync(ioc, buf, namelen) != namelen) {
> +        error_setg(errp, "failed to read name");
> +        return -1;
> +    }
> +
> +    TRACE("export is good to go");
> +    return 1;
> +}
> +
> /* Return -1 on failure, 0 if wantname is an available export. */
> static int nbd_receive_query_exports(QIOChannel *ioc,
>                                      const char *wantname,
> @@ -505,11 +571,26 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
>             name = "";
>         }
>         if (fixedNewStyle) {
> +            int result;
> +
> +            /* Try NBD_OPT_GO first - if it works, we are done (it
> +             * also gives us a good message if the server requires
> +             * TLS).  If it is not available, fall back to
> +             * NBD_OPT_LIST for nicer error messages about a missing
> +             * export, then use NBD_OPT_EXPORT_NAME.  */
> +            result = nbd_opt_go(ioc, name, errp);
> +            if (result < 0) {
> +                goto fail;
> +            }
> +            if (result > 0) {
> +                zeroes = false;
> +                goto success;
> +            }
>             /* Check our desired export is present in the
>              * server export list. Since NBD_OPT_EXPORT_NAME
>              * cannot return an error message, running this
> -             * query gives us good error reporting if the
> -             * server required TLS
> +             * query gives us better error reporting if the
> +             * export name is not available.
>              */
>             if (nbd_receive_query_exports(ioc, name, errp) < 0) {
>                 goto fail;
> @@ -522,6 +603,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
>         }
> 
>         /* Read the response */
> +    success:
>         if (read_sync(ioc, &s, sizeof(s)) != sizeof(s)) {
>             error_setg(errp, "Failed to read export length");
>             goto fail;
> -- 
> 2.5.5
> 
> 
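The length checks in nbd_opt_go() imply a payload layout of a 4-byte big-endian name length, the name itself, then the 10 bytes that NBD_OPT_EXPORT_NAME would have produced (8-byte size, 2-byte flags). The following is a minimal sketch of validating and decoding such a payload from a memory buffer. It follows this experimental version of the proposal only, and `go_reply`, `load_be()`, `parse_go_payload()` and the 255-byte name limit are assumptions for illustration rather than qemu names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_NAME_SIZE    255      /* assumed limit, as discussed in this thread */
#define FINAL_REPLY_SIZE (8 + 2)  /* export size + export flags */

/* Decoded form of the payload (hypothetical structure). */
struct go_reply {
    uint32_t namelen;
    char name[MAX_NAME_SIZE + 1];
    uint64_t size;
    uint16_t flags;
};

/* Read an n-byte big-endian unsigned integer. */
static uint64_t load_be(const unsigned char *p, size_t n)
{
    uint64_t v = 0;

    for (size_t i = 0; i < n; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Return 0 and fill *out if payload is well formed, -1 otherwise. */
static int parse_go_payload(const unsigned char *payload, uint32_t len,
                            struct go_reply *out)
{
    uint32_t namelen;

    /* Must at least hold the length field and the final reply. */
    if (len < 4 + FINAL_REPLY_SIZE) {
        return -1;
    }
    namelen = (uint32_t)load_be(payload, 4);
    /* Bound namelen first so the sum below cannot wrap. */
    if (namelen > MAX_NAME_SIZE) {
        return -1;
    }
    /* The advertised length must account for every byte exactly. */
    if (len != 4 + namelen + FINAL_REPLY_SIZE) {
        return -1;
    }
    out->namelen = namelen;
    memcpy(out->name, payload + 4, namelen);
    out->name[namelen] = '\0';
    out->size = load_be(payload + 4 + namelen, 8);
    out->flags = (uint16_t)load_be(payload + 12 + namelen, 2);
    return 0;
}
```

Checking the name length against the limit before doing the equality test keeps the arithmetic safe even when a hostile server sends a huge length field.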

-- 
Alex Bligh


* Re: [Qemu-devel] [PATCH 15/18] nbd: Implement NBD_OPT_GO on server
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 15/18] nbd: Implement NBD_OPT_GO on server Eric Blake
@ 2016-04-09 10:48   ` Alex Bligh
  0 siblings, 0 replies; 48+ messages in thread
From: Alex Bligh @ 2016-04-09 10:48 UTC (permalink / raw)
  To: Eric Blake; +Cc: Alex Bligh, qemu-devel, Paolo Bonzini


On 8 Apr 2016, at 23:05, Eric Blake <eblake@redhat.com> wrote:

> NBD_OPT_EXPORT_NAME is lousy: it requires us to close the connection
> rather than report an error.  Upstream NBD recently added NBD_OPT_GO
> as the improved version of the option that does what we want, along
> with NBD_OPT_INFO that returns the same information but does not
> transition to transmission phase.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>


Reviewed-by: Alex Bligh <alex@alex.org.uk>

> ---
> nbd/server.c | 122 ++++++++++++++++++++++++++++++++++++++++++++++++++++-------
> 1 file changed, 109 insertions(+), 13 deletions(-)
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index 379df8c..e68e83c 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -233,17 +233,19 @@ static int nbd_negotiate_send_rep(QIOChannel *ioc, uint32_t type, uint32_t opt)
>     return nbd_negotiate_send_rep_len(ioc, type, opt, 0);
> }
> 
> -/* Send an NBD_REP_SERVER reply to NBD_OPT_LIST, including payload.
> +/* Send the common part of an NBD_REP_SERVER reply for the given option,
> + * and include extra_len in the advertised payload.
>  * Return -errno to kill connection, 0 to continue negotiation */
> -static int nbd_negotiate_send_rep_list(QIOChannel *ioc, NBDExport *exp)
> +static int nbd_negotiate_send_rep_server(QIOChannel *ioc, NBDExport *exp,
> +                                         uint32_t opt, uint32_t extra_len)
> {
>     uint32_t len;
>     int rc;
> 
>     TRACE("Advertising export name '%s'", exp->name ? exp->name : "");
>     len = strlen(exp->name);
> -    rc = nbd_negotiate_send_rep_len(ioc, NBD_REP_SERVER, NBD_OPT_LIST,
> -                                    len + sizeof(len));
> +    rc = nbd_negotiate_send_rep_len(ioc, NBD_REP_SERVER, opt,
> +                                    len + sizeof(len) + extra_len);
>     if (rc < 0) {
>         return rc;
>     }
> @@ -261,6 +263,15 @@ static int nbd_negotiate_send_rep_list(QIOChannel *ioc, NBDExport *exp)
>     return 0;
> }
> 
> +/* Send an NBD_REP_SERVER reply to NBD_OPT_LIST, including payload.
> + * Return -errno to kill connection, 0 to continue negotiation. */
> +static int nbd_negotiate_send_rep_list(QIOChannel *ioc, NBDExport *exp)
> +{
> +    return nbd_negotiate_send_rep_server(ioc, exp, NBD_OPT_LIST, 0);
> +}
> +
> +/* Send a sequence of replies to NBD_OPT_LIST.
> + * Return -errno to kill connection, 0 to continue negotiation. */
> static int nbd_negotiate_handle_list(NBDClient *client, uint32_t length)
> {
>     NBDExport *exp;
> @@ -283,6 +294,8 @@ static int nbd_negotiate_handle_list(NBDClient *client, uint32_t length)
>     return nbd_negotiate_send_rep(client->ioc, NBD_REP_ACK, NBD_OPT_LIST);
> }
> 
> +/* Send a reply to NBD_OPT_EXPORT_NAME.
> + * Return -errno to kill connection, 0 to end negotiation. */
> static int nbd_negotiate_handle_export_name(NBDClient *client, uint32_t length)
> {
>     int rc = -EINVAL;
> @@ -318,6 +331,73 @@ fail:
> }
> 
> 
> +/* Handle NBD_OPT_INFO and NBD_OPT_GO.
> + * Return -errno to kill connection, 0 if ready for next option, and 1
> + * to move into transmission phase.  */
> +static int nbd_negotiate_handle_info(NBDClient *client, uint32_t length,
> +                                     uint32_t opt, uint16_t myflags)
> +{
> +    int rc;
> +    char name[NBD_MAX_NAME_SIZE + 1];
> +    NBDExport *exp;
> +    uint64_t size;
> +    uint16_t flags;
> +
> +    /* Client sends:
> +        [20 ..  xx]   export name (length bytes)
> +     */
> +    TRACE("Checking length");
> +    if (length >= sizeof(name)) {
> +        if (nbd_negotiate_drop_sync(client->ioc, length) != length) {
> +            return -EIO;
> +        }
> +        return nbd_negotiate_send_rep(client->ioc, NBD_REP_ERR_INVALID, opt);
> +    }
> +    if (nbd_negotiate_read(client->ioc, name, length) != length) {
> +        LOG("read failed");
> +        return -EIO;
> +    }
> +    name[length] = '\0';
> +
> +    TRACE("Client requested info on export '%s'", name);
> +
> +    exp = nbd_export_find(name);
> +    if (!exp) {
> +        return nbd_negotiate_send_rep(client->ioc, NBD_REP_ERR_UNKNOWN, opt);
> +    }
> +
> +    QEMU_BUILD_BUG_ON(NBD_FINAL_REPLY_SIZE != sizeof(size) + sizeof(flags));
> +    rc = nbd_negotiate_send_rep_server(client->ioc, exp, opt,
> +                                       NBD_FINAL_REPLY_SIZE);
> +    if (rc < 0) {
> +        return rc;
> +    }
> +
> +    assert((exp->nbdflags & ~65535) == 0);
> +    size = cpu_to_be64(exp->size);
> +    flags = cpu_to_be16(exp->nbdflags | myflags);
> +
> +    if (nbd_negotiate_write(client->ioc, &size, sizeof(size)) !=
> +        sizeof(size)) {
> +        LOG("write failed");
> +        return -EIO;
> +    }
> +    if (nbd_negotiate_write(client->ioc, &flags, sizeof(flags)) !=
> +        sizeof(flags)) {
> +        LOG("write failed");
> +        return -EIO;
> +    }
> +
> +    if (opt == NBD_OPT_GO) {
> +        client->exp = exp;
> +        QTAILQ_INSERT_TAIL(&client->exp->clients, client, next);
> +        nbd_export_get(client->exp);
> +        rc = 1;
> +    }
> +    return rc;
> +}
> +
> +
> static QIOChannel *nbd_negotiate_handle_starttls(NBDClient *client,
>                                                  uint32_t length)
> {
> @@ -366,7 +446,10 @@ static QIOChannel *nbd_negotiate_handle_starttls(NBDClient *client,
> }
> 
> 
> -static int nbd_negotiate_options(NBDClient *client)
> +/* Loop over all client options, during fixed newstyle negotiation.
> + * Return -errno to kill connection, 0 on successful NBD_OPT_EXPORT_NAME,
> + * 1 on successful NBD_OPT_GO.  */
> +static int nbd_negotiate_options(NBDClient *client, uint16_t myflags)
> {
>     uint32_t flags;
>     bool fixedNewstyle = false;
> @@ -480,6 +563,16 @@ static int nbd_negotiate_options(NBDClient *client)
>             case NBD_OPT_EXPORT_NAME:
>                 return nbd_negotiate_handle_export_name(client, length);
> 
> +            case NBD_OPT_INFO:
> +            case NBD_OPT_GO:
> +                ret = nbd_negotiate_handle_info(client, length, clientflags,
> +                                                myflags);
> +                if (ret) {
> +                    assert(ret < 0 || clientflags == NBD_OPT_GO);
> +                    return ret;
> +                }
> +                break;
> +
>             case NBD_OPT_STARTTLS:
>                 if (nbd_negotiate_drop_sync(client->ioc, length) != length) {
>                     return -EIO;
> @@ -584,18 +677,21 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
>             LOG("write failed");
>             goto fail;
>         }
> -        rc = nbd_negotiate_options(client);
> -        if (rc != 0) {
> +        rc = nbd_negotiate_options(client, myflags);
> +        if (rc < 0) {
>             LOG("option negotiation failed");
>             goto fail;
>         }
> 
> -        stq_be_p(buf + 18, client->exp->size);
> -        stw_be_p(buf + 26, client->exp->nbdflags | myflags);
> -        len = client->no_zeroes ? 10 : sizeof(buf) - 18;
> -        if (nbd_negotiate_write(client->ioc, buf + 18, len) != len) {
> -            LOG("write failed");
> -            goto fail;
> +        if (!rc) {
> +            /* If options ended with NBD_OPT_GO, we already sent this. */
> +            stq_be_p(buf + 18, client->exp->size);
> +            stw_be_p(buf + 26, client->exp->nbdflags | myflags);
> +            len = client->no_zeroes ? 10 : sizeof(buf) - 18;
> +            if (nbd_negotiate_write(client->ioc, buf + 18, len) != len) {
> +                LOG("write failed");
> +                goto fail;
> +            }
>         }
>     }
> 
> -- 
> 2.5.5
> 
> 
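On the server side, nbd_negotiate_handle_info() sends the name length, the name, then the export size and flags. A rough sketch of serializing that payload into one buffer looks like this. It is not the qemu code path (which issues several writes on the channel), and `store_be()`, `build_go_payload()` and the 255-byte name limit are hypothetical choices made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_NAME_SIZE    255      /* assumed limit, as discussed in this thread */
#define FINAL_REPLY_SIZE (8 + 2)  /* export size + export flags */

/* Write v as an n-byte big-endian unsigned integer. */
static void store_be(unsigned char *p, uint64_t v, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        p[n - 1 - i] = (unsigned char)(v & 0xff);
        v >>= 8;
    }
}

/* Serialize namelen + name + size + flags into out.
 * Returns the number of bytes written, or 0 if the name is too long
 * or the buffer is too small. */
static size_t build_go_payload(unsigned char *out, size_t cap,
                               const char *name, uint64_t size,
                               uint16_t flags)
{
    size_t namelen = strlen(name);
    size_t total = 4 + namelen + FINAL_REPLY_SIZE;

    if (namelen > MAX_NAME_SIZE || total > cap) {
        return 0;
    }
    store_be(out, namelen, 4);                 /* name length */
    memcpy(out + 4, name, namelen);            /* name, no NUL terminator */
    store_be(out + 4 + namelen, size, 8);      /* export size in bytes */
    store_be(out + 12 + namelen, flags, 2);    /* export flags */
    return total;
}
```

The total returned here corresponds to the `len + sizeof(len) + extra_len` value that nbd_negotiate_send_rep_server() advertises in the reply header when extra_len is NBD_FINAL_REPLY_SIZE.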

-- 
Alex Bligh


* Re: [Qemu-devel] [PATCH 16/18] nbd: Support NBD_CMD_CLOSE
  2016-04-08 22:05 ` [Qemu-devel] [PATCH 16/18] nbd: Support NBD_CMD_CLOSE Eric Blake
@ 2016-04-09 10:50   ` Alex Bligh
  2016-04-09 23:12     ` Eric Blake
  0 siblings, 1 reply; 48+ messages in thread
From: Alex Bligh @ 2016-04-09 10:50 UTC (permalink / raw)
  To: Eric Blake
  Cc: Alex Bligh, qemu-devel, Kevin Wolf, Paolo Bonzini,
	open list:Block layer core


On 8 Apr 2016, at 23:05, Eric Blake <eblake@redhat.com> wrote:

> NBD_CMD_DISC is annoying: the server is not required to reply,
> so the client has no choice but to disconnect once it has sent
> the message; but depending on timing, the server can see the
> disconnect prior to reading the request, and treat things as
> an abrupt exit rather than a clean shutdown (which may affect
> whether the server properly fsync()s data to disk, and so on).
> The new NBD_CMD_CLOSE adds another round of handshake, where
> the client waits for the server's action before closing, to
> make sure both parties know that it was a clean close rather
> than an accidental early disconnect.
> 
> In nbd-client.c, nbd_client_close() is called after we have
> already exited the normal coroutine context used by all the
> other transmission phase handlers, so the code is a bit more
> complex to build up a coroutine just for the purpose of waiting
> for the server's response.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Wouter is not yet convinced of the merits of NBD_CMD_CLOSE
so we should probably resist applying this unless / until we
have convinced him of its benefits.

BTW there is nothing to stop you doing an fsync() on ANY
disconnect server side.
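
As a minimal sketch of that point (not qemu code): a server loop can flush the backing file on every exit path, whether the client sent a disconnect request or the socket simply hit EOF. The function `serve_until_disconnect` and the `enum conn_event` values are hypothetical names made up for illustration; only POSIX `fsync()` is a real call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical connection events; not part of qemu or the NBD spec. */
enum conn_event { EV_REQUEST, EV_CLEAN_DISC, EV_ABRUPT_EOF };

/*
 * Walk the event stream until the client goes away, then flush the
 * backing file no matter how the connection ended.
 * Returns 0 if the final fsync() succeeded, -1 otherwise.
 */
static int serve_until_disconnect(int backing_fd,
                                  const enum conn_event *events, size_t n)
{
    size_t i;

    assert(events != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (events[i] != EV_REQUEST) {
            break;  /* clean and abrupt endings share one exit path */
        }
        /* Normal request handling is elided in this sketch. */
    }
    if (fsync(backing_fd) < 0) {
        perror("fsync");
        return -1;
    }
    return 0;
}
```

The design choice here is simply to avoid branching on the reason for the disconnect, so durability does not depend on the timing race described in the commit message.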

Alex


> ---
> include/block/nbd.h |  4 +++-
> block/nbd-client.c  | 45 ++++++++++++++++++++++++++++++++++++++++++++-
> nbd/server.c        | 19 +++++++++++++++++--
> 3 files changed, 64 insertions(+), 4 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index d261dbc..4c57754 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -70,6 +70,7 @@ typedef struct nbd_reply nbd_reply;
> #define NBD_FLAG_SEND_FUA       (1 << 3)        /* Send FUA (Force Unit Access) */
> #define NBD_FLAG_ROTATIONAL     (1 << 4)        /* Use elevator algorithm - rotational media */
> #define NBD_FLAG_SEND_TRIM      (1 << 5)        /* Send TRIM (discard) */
> +#define NBD_FLAG_SEND_CLOSE     (1 << 8)        /* Send CLOSE */
> 
> /* New-style handshake (global) flags, sent from server to client, and
>    control what will happen during handshake phase. */
> @@ -99,7 +100,8 @@ enum {
>     NBD_CMD_WRITE = 1,
>     NBD_CMD_DISC = 2,
>     NBD_CMD_FLUSH = 3,
> -    NBD_CMD_TRIM = 4
> +    NBD_CMD_TRIM = 4,
> +    NBD_CMD_CLOSE = 7,
> };
> 
> #define NBD_DEFAULT_PORT	10809
> diff --git a/block/nbd-client.c b/block/nbd-client.c
> index 285025d..f013084 100644
> --- a/block/nbd-client.c
> +++ b/block/nbd-client.c
> @@ -374,6 +374,29 @@ void nbd_client_attach_aio_context(BlockDriverState *bs,
>                        false, nbd_reply_ready, NULL, bs);
> }
> 
> +typedef struct NbdCloseCo {
> +    BlockDriverState *bs;
> +    nbd_request request;
> +    nbd_reply reply;
> +    bool done;
> +} NbdCloseCo;
> +
> +static void coroutine_fn nbd_client_close_co(void *opaque)
> +{
> +    NbdCloseCo *closeco = opaque;
> +    NbdClientSession *client = nbd_get_client_session(closeco->bs);
> +    ssize_t ret;
> +
> +    nbd_coroutine_start(client, &closeco->request);
> +    ret = nbd_co_send_request(closeco->bs, &closeco->request, NULL, 0);
> +    if (ret >= 0) {
> +        nbd_co_receive_reply(client, &closeco->request, &closeco->reply,
> +                             NULL, 0);
> +    }
> +    nbd_coroutine_end(client, &closeco->request);
> +    closeco->done = true;
> +}
> +
> void nbd_client_close(BlockDriverState *bs)
> {
>     NbdClientSession *client = nbd_get_client_session(bs);
> @@ -383,8 +406,28 @@ void nbd_client_close(BlockDriverState *bs)
>         return;
>     }
> 
> -    nbd_send_request(client->ioc, &request);
> +    if (client->nbdflags & NBD_FLAG_SEND_CLOSE) {
> +        /* Newer server, wants us to wait for reply before we close */
> +        Coroutine *co;
> +        NbdCloseCo closeco = {
> +            .bs = bs,
> +            .request = { .type = NBD_CMD_CLOSE },
> +        };
> +        AioContext *aio_context;
> 
> +        g_assert(!qemu_in_coroutine());
> +        aio_context = bdrv_get_aio_context(bs);
> +        co = qemu_coroutine_create(nbd_client_close_co);
> +        qemu_coroutine_enter(co, &closeco);
> +        while (!closeco.done) {
> +            aio_poll(aio_context, true);
> +        }
> +    } else {
> +        /* Older server, send request, but no reply will come */
> +        nbd_send_request(client->ioc, &request);
> +    }
> +
> +    /* Regardless of any received errors, the connection is done. */
>     nbd_teardown_connection(bs);
> }
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index e68e83c..2a6eaf2 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -624,7 +624,8 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
>     char buf[8 + 8 + 8 + 128];
>     int rc;
>     const uint16_t myflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
> -                              NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA);
> +                              NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA |
> +                              NBD_FLAG_SEND_CLOSE);
>     bool oldStyle;
>     size_t len;
> 
> @@ -1244,7 +1245,21 @@ static void nbd_trip(void *opaque)
>         break;
>     case NBD_CMD_DISC:
>         TRACE("Request type is DISCONNECT");
> -        errno = 0;
> +        goto out;
> +    case NBD_CMD_CLOSE:
> +        TRACE("Request type is CLOSE");
> +        if (request.flags || request.from || request.len) {
> +            LOG("bad parameters, skipping flush");
> +            reply.error = EINVAL;
> +        } else {
> +            ret = blk_co_flush(exp->blk);
> +            if (ret < 0) {
> +                LOG("flush failed");
> +                reply.error = -ret;
> +            }
> +        }
> +        /* Attempt to send reply, but even if it fails, we are done */
> +        nbd_co_send_reply(req, &reply, 0);
>         goto out;
>     case NBD_CMD_FLUSH:
>         TRACE("Request type is FLUSH");
> -- 
> 2.5.5
> 
> 

-- 
Alex Bligh


* Re: [Qemu-devel] [RFC PATCH 17/18] nbd: Implement NBD_CMD_WRITE_ZEROES on server
  2016-04-08 22:05 ` [Qemu-devel] [RFC PATCH 17/18] nbd: Implement NBD_CMD_WRITE_ZEROES on server Eric Blake
  2016-04-09  9:39   ` Pavel Borzenkov
@ 2016-04-09 10:54   ` Alex Bligh
  1 sibling, 0 replies; 48+ messages in thread
From: Alex Bligh @ 2016-04-09 10:54 UTC (permalink / raw)
  To: Eric Blake
  Cc: Alex Bligh, qemu-devel, Kevin Wolf, Paolo Bonzini,
	open list:Block layer core


On 8 Apr 2016, at 23:05, Eric Blake <eblake@redhat.com> wrote:

> RFC because there is still discussion on the NBD list about
> adding an NBD_OPT_ to let the client suggest server defaults
> related to scanning for zeroes during NBD_CMD_WRITE, which may
> tweak this patch.
> 
> Upstream NBD protocol recently added the ability to efficiently
> write zeroes without having to send the zeroes over the wire,
> along with a flag to control whether the client wants a hole.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> include/block/nbd.h |  5 ++++-
> nbd/server.c        | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++---
> 2 files changed, 64 insertions(+), 4 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 4c57754..a1d955c 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -70,6 +70,7 @@ typedef struct nbd_reply nbd_reply;
> #define NBD_FLAG_SEND_FUA       (1 << 3)        /* Send FUA (Force Unit Access) */
> #define NBD_FLAG_ROTATIONAL     (1 << 4)        /* Use elevator algorithm - rotational media */
> #define NBD_FLAG_SEND_TRIM      (1 << 5)        /* Send TRIM (discard) */
> +#define NBD_FLAG_SEND_WRITE_ZEROES (1 << 6)     /* Send WRITE_ZEROES */
> #define NBD_FLAG_SEND_CLOSE     (1 << 8)        /* Send CLOSE */
> 
> /* New-style handshake (global) flags, sent from server to client, and
> @@ -92,7 +93,8 @@ typedef struct nbd_reply nbd_reply;
> #define NBD_REP_ERR_UNKNOWN     ((UINT32_C(1) << 31) | 6) /* Export unknown */
> 
> /* Request flags, sent from client to server during transmission phase */
> -#define NBD_CMD_FLAG_FUA        (1 << 0)
> +#define NBD_CMD_FLAG_FUA        (1 << 0) /* 'force unit access' during write */
> +#define NBD_CMD_FLAG_NO_HOLE    (1 << 1) /* don't punch hole on zero run */
> 
> /* Supported request types */
> enum {
> @@ -101,6 +103,7 @@ enum {
>     NBD_CMD_DISC = 2,
>     NBD_CMD_FLUSH = 3,
>     NBD_CMD_TRIM = 4,
> +    NBD_CMD_WRITE_ZEROES = 5,
>     NBD_CMD_CLOSE = 7,
> };
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index 2a6eaf2..09af915 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -625,7 +625,8 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
>     int rc;
>     const uint16_t myflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
>                               NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA |
> -                              NBD_FLAG_SEND_CLOSE);
> +                              NBD_FLAG_SEND_CLOSE |
> +                              NBD_FLAG_SEND_WRITE_ZEROES);
>     bool oldStyle;
>     size_t len;
> 
> @@ -1088,7 +1089,7 @@ static ssize_t nbd_co_receive_request(NBDRequest *req, struct nbd_request *reque
>         goto out;
>     }
> 
> -    if (request->flags & ~NBD_CMD_FLAG_FUA) {
> +    if (request->flags & ~(NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_NO_HOLE)) {
>         LOG("unsupported flags (got 0x%x)", request->flags);
>         return -EINVAL;
>     }
> @@ -1102,7 +1103,13 @@ static ssize_t nbd_co_receive_request(NBDRequest *req, struct nbd_request *reque
>     TRACE("Decoding type");
> 
>     command = request->type;
> -    if (command == NBD_CMD_READ || command == NBD_CMD_WRITE) {
> +    if (request->flags & NBD_CMD_FLAG_NO_HOLE &&
> +        !(command == NBD_CMD_WRITE || command == NBD_CMD_WRITE_ZEROES)) {
> +        LOG("NO_HOLE flag valid only with write operation");
> +        return -EINVAL;
> +    }
> +    if (command == NBD_CMD_READ || command == NBD_CMD_WRITE ||
> +        command == NBD_CMD_WRITE_ZEROES) {
>         if (request->len > NBD_MAX_BUFFER_SIZE) {
>             LOG("len (%" PRIu32" ) is larger than max len (%u)",
>                 request->len, NBD_MAX_BUFFER_SIZE);
> @@ -1143,6 +1150,7 @@ static void nbd_trip(void *opaque)
>     struct nbd_reply reply;
>     ssize_t ret;
>     uint32_t command;
> +    int flags;
> 
>     TRACE("Reading request.");
>     if (client->closing) {
> @@ -1221,6 +1229,9 @@ static void nbd_trip(void *opaque)
> 
>         TRACE("Writing to device");
> 
> +        /* FIXME: if the client passes NBD_CMD_FLAG_NO_HOLE, can we
> +         * make that override a server that is set to look for
> +         * holes? */

No, and I reckon that's an error in the spec. There is a 'MUST'
which should be a 'SHOULD'. Good luck with any FS which supports
dedupe for instance!

>         ret = blk_write(exp->blk,
>                         (request.from + exp->dev_offset) / BDRV_SECTOR_SIZE,
>                         req->data, request.len / BDRV_SECTOR_SIZE);
> @@ -1243,6 +1254,52 @@ static void nbd_trip(void *opaque)
>             goto out;
>         }
>         break;
> +    case NBD_CMD_WRITE_ZEROES:
> +        TRACE("Request type is WRITE_ZEROES");
> +
> +        if (exp->nbdflags & NBD_FLAG_READ_ONLY) {
> +            TRACE("Server is read-only, return error");
> +            reply.error = EROFS;
> +            goto error_reply;
> +        }
> +
> +        TRACE("Writing to device");
> +
> +        flags = 0;
> +        if (request.flags & NBD_CMD_FLAG_FUA) {
> +            flags |= BDRV_REQ_FUA;
> +        }
> +        if (!(request.flags & NBD_CMD_FLAG_NO_HOLE)) {
> +            /* FIXME: should this depend on whether the server is set to
> +               look for holes? */
> +            flags |= BDRV_REQ_MAY_UNMAP;
> +        }
> +        ret = blk_write_zeroes(exp->blk,
> +                               ((request.from + exp->dev_offset) /
> +                                BDRV_SECTOR_SIZE),
> +                               request.len / BDRV_SECTOR_SIZE,
> +                               flags);
> +        if (ret < 0) {
> +            LOG("writing to file failed");
> +            reply.error = -ret;
> +            goto error_reply;
> +        }
> +
> +        /* FIXME: do we need FUA flush here, if we also passed it to
> +         * blk_write_zeroes? */

I don't think so. blk_write_zeroes with BDRV_REQ_FUA should
make *that* request access the media, but FLUSH makes ALL PREVIOUS
requests access the media, so you are doing too much work here
I think.
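
To illustrate the difference, here is a toy model (not qemu code, all names hypothetical) in which a FUA write makes only that one write durable, while a flush makes every previously cached write durable. It assumes a device with a simple volatile write cache and just counts writes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy device model, purely illustrative; none of this is qemu API. */
struct toy_dev {
    unsigned cached;    /* writes still sitting in a volatile cache */
    unsigned durable;   /* writes known to be on stable media */
    unsigned flushes;   /* number of whole-device flushes issued */
};

/* A FUA write bypasses the cache; a plain write stays cached. */
static void toy_write(struct toy_dev *d, bool fua)
{
    assert(d != NULL);
    if (fua) {
        d->durable++;
    } else {
        d->cached++;
    }
}

/* A flush pushes ALL previously cached writes to stable media. */
static void toy_flush(struct toy_dev *d)
{
    assert(d != NULL);
    d->durable += d->cached;
    d->cached = 0;
    d->flushes++;
}

/* Write zeroes with FUA, optionally followed by the extra flush
 * that the FIXME above asks about. */
static void toy_write_zeroes_fua(struct toy_dev *d, bool extra_flush)
{
    toy_write(d, true);
    if (extra_flush) {
        toy_flush(d);
    }
}
```

With two plain writes already cached, calling `toy_write_zeroes_fua(d, false)` leaves those two writes cached, while `toy_write_zeroes_fua(d, true)` forces them out as well, which is the extra work being pointed out.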

> +        if (request.flags & NBD_CMD_FLAG_FUA) {
> +            ret = blk_co_flush(exp->blk);
> +            if (ret < 0) {
> +                LOG("flush failed");
> +                reply.error = -ret;
> +                goto error_reply;
> +            }
> +        }
> +
> +        if (nbd_co_send_reply(req, &reply, 0) < 0) {
> +            goto out;
> +        }
> +        break;
>     case NBD_CMD_DISC:
>         TRACE("Request type is DISCONNECT");
>         goto out;
> -- 
> 2.5.5
> 
> 

-- 
Alex Bligh


* Re: [Qemu-devel] [RFC PATCH 18/18] nbd: Implement NBD_CMD_WRITE_ZEROES on client
  2016-04-08 22:05 ` [Qemu-devel] [RFC PATCH 18/18] nbd: Implement NBD_CMD_WRITE_ZEROES on client Eric Blake
@ 2016-04-09 10:57   ` Alex Bligh
  2016-04-09 11:52     ` Pavel Borzenkov
  2016-04-09 23:17     ` Eric Blake
  0 siblings, 2 replies; 48+ messages in thread
From: Alex Bligh @ 2016-04-09 10:57 UTC (permalink / raw)
  To: Eric Blake
  Cc: Alex Bligh, qemu-devel, Kevin Wolf, Paolo Bonzini,
	open list:Block layer core


On 8 Apr 2016, at 23:05, Eric Blake <eblake@redhat.com> wrote:

> RFC because there is still discussion on the NBD list about
> adding an NBD_OPT_ to let the client suggest server defaults
> related to scanning for zeroes during NBD_CMD_WRITE, which may
> tweak this patch.
> 
> Upstream NBD protocol recently added the ability to efficiently
> write zeroes without having to send the zeroes over the wire,
> along with a flag to control whether the client wants a hole.
> 
> The generic block code takes care of falling back to the obvious
> write lots of zeroes if we return -ENOTSUP because the server
> does not have WRITE_ZEROES.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> block/nbd-client.h |  2 ++
> block/nbd-client.c | 34 ++++++++++++++++++++++++++++++++++
> block/nbd.c        | 23 +++++++++++++++++++++++
> 3 files changed, 59 insertions(+)
> 
> diff --git a/block/nbd-client.h b/block/nbd-client.h
> index bc7aec0..2fe6654 100644
> --- a/block/nbd-client.h
> +++ b/block/nbd-client.h
> @@ -47,6 +47,8 @@ void nbd_client_close(BlockDriverState *bs);
> int nbd_client_co_discard(BlockDriverState *bs, int64_t sector_num,
>                           int nb_sectors);
> int nbd_client_co_flush(BlockDriverState *bs);
> +int nbd_client_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
> +                               int nb_sectors, int *flags);
> int nbd_client_co_writev(BlockDriverState *bs, int64_t sector_num,
>                          int nb_sectors, QEMUIOVector *qiov, int *flags);
> int nbd_client_co_readv(BlockDriverState *bs, int64_t sector_num,
> diff --git a/block/nbd-client.c b/block/nbd-client.c
> index f013084..4be83a8 100644
> --- a/block/nbd-client.c
> +++ b/block/nbd-client.c
> @@ -291,6 +291,40 @@ int nbd_client_co_readv(BlockDriverState *bs, int64_t sector_num,
>     return nbd_co_readv_1(bs, sector_num, nb_sectors, qiov, offset);
> }
> 
> +int nbd_client_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
> +                               int nb_sectors, int *flags)
> +{
> +    ssize_t ret;
> +    NbdClientSession *client = nbd_get_client_session(bs);
> +    struct nbd_request request = { .type = NBD_CMD_WRITE_ZEROES };
> +    struct nbd_reply reply;
> +
> +    if (!(client->nbdflags & NBD_FLAG_SEND_WRITE_ZEROES)) {
> +        return -ENOTSUP;
> +    }
> +
> +    if ((*flags & BDRV_REQ_FUA) && (client->nbdflags & NBD_FLAG_SEND_FUA)) {
> +        *flags &= ~BDRV_REQ_FUA;
> +        request.flags |= NBD_CMD_FLAG_FUA;
> +    }
> +    if (!(*flags & BDRV_REQ_MAY_UNMAP)) {
> +        request.flags |= NBD_CMD_FLAG_NO_HOLE;
> +    }
> +
> +    request.from = sector_num * 512;
> +    request.len = nb_sectors * 512;
> +
> +    nbd_coroutine_start(client, &request);
> +    ret = nbd_co_send_request(bs, &request, NULL, 0);
> +    if (ret < 0) {
> +        reply.error = -ret;
> +    } else {
> +        nbd_co_receive_reply(client, &request, &reply, NULL, 0);
> +    }
> +    nbd_coroutine_end(client, &request);
> +    return -reply.error;
> +}
> +
> int nbd_client_co_writev(BlockDriverState *bs, int64_t sector_num,
>                          int nb_sectors, QEMUIOVector *qiov, int *flags)
> {
> diff --git a/block/nbd.c b/block/nbd.c
> index f7ea3b3..f5119c0 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -355,6 +355,26 @@ static int nbd_co_readv(BlockDriverState *bs, int64_t sector_num,
>     return nbd_client_co_readv(bs, sector_num, nb_sectors, qiov);
> }
> 
> +static int nbd_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
> +                               int nb_sectors, BdrvRequestFlags orig_flags)
> +{
> +    int flags = orig_flags;
> +    int ret;
> +
> +    ret = nbd_client_co_write_zeroes(bs, sector_num, nb_sectors, &flags);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    /* The flag wasn't sent to the server, so we need to emulate it with an
> +     * explicit flush */

Surely you only need to do this if the flag wasn't sent to the server,
i.e. if !(client->nbdflags & NBD_FLAG_SEND_FUA)

If you've sent a FUA request, no need to flush the whole thing.

nbd_co_writev_flags seems to have the same issue, which is where I guess
you got that from.

> +    if (flags & BDRV_REQ_FUA) {
> +        ret = nbd_client_co_flush(bs);
> +    }
> +
> +    return ret;
> +}
> +
> static int nbd_co_writev_flags(BlockDriverState *bs, int64_t sector_num,
>                                int nb_sectors, QEMUIOVector *qiov, int flags)
> {
> @@ -476,6 +496,7 @@ static BlockDriver bdrv_nbd = {
>     .bdrv_parse_filename        = nbd_parse_filename,
>     .bdrv_file_open             = nbd_open,
>     .bdrv_co_readv              = nbd_co_readv,
> +    .bdrv_co_write_zeroes       = nbd_co_write_zeroes,
>     .bdrv_co_writev             = nbd_co_writev,
>     .bdrv_co_writev_flags       = nbd_co_writev_flags,
>     .supported_write_flags      = BDRV_REQ_FUA,
> @@ -496,6 +517,7 @@ static BlockDriver bdrv_nbd_tcp = {
>     .bdrv_parse_filename        = nbd_parse_filename,
>     .bdrv_file_open             = nbd_open,
>     .bdrv_co_readv              = nbd_co_readv,
> +    .bdrv_co_write_zeroes       = nbd_co_write_zeroes,
>     .bdrv_co_writev             = nbd_co_writev,
>     .bdrv_co_writev_flags       = nbd_co_writev_flags,
>     .supported_write_flags      = BDRV_REQ_FUA,
> @@ -516,6 +538,7 @@ static BlockDriver bdrv_nbd_unix = {
>     .bdrv_parse_filename        = nbd_parse_filename,
>     .bdrv_file_open             = nbd_open,
>     .bdrv_co_readv              = nbd_co_readv,
> +    .bdrv_co_write_zeroes       = nbd_co_write_zeroes,
>     .bdrv_co_writev             = nbd_co_writev,
>     .bdrv_co_writev_flags       = nbd_co_writev_flags,
>     .supported_write_flags      = BDRV_REQ_FUA,
> -- 
> 2.5.5
> 
> 

-- 
Alex Bligh


* Re: [Qemu-devel] [RFC PATCH 18/18] nbd: Implement NBD_CMD_WRITE_ZEROES on client
  2016-04-09 10:57   ` Alex Bligh
@ 2016-04-09 11:52     ` Pavel Borzenkov
  2016-04-09 23:17     ` Eric Blake
  1 sibling, 0 replies; 48+ messages in thread
From: Pavel Borzenkov @ 2016-04-09 11:52 UTC (permalink / raw)
  To: Alex Bligh
  Cc: Eric Blake, Kevin Wolf, Paolo Bonzini,
	open list:Block layer core, qemu-devel

On Sat, Apr 09, 2016 at 11:57:57AM +0100, Alex Bligh wrote:
> 
> On 8 Apr 2016, at 23:05, Eric Blake <eblake@redhat.com> wrote:
> 
> > RFC because there is still discussion on the NBD list about
> > adding an NBD_OPT_ to let the client suggest server defaults
> > related to scanning for zeroes during NBD_CMD_WRITE, which may
> > tweak this patch.
> > 
> > Upstream NBD protocol recently added the ability to efficiently
> > write zeroes without having to send the zeroes over the wire,
> > along with a flag to control whether the client wants a hole.
> > 
> > The generic block code takes care of falling back to the obvious
> > write lots of zeroes if we return -ENOTSUP because the server
> > does not have WRITE_ZEROES.
> > 
> > Signed-off-by: Eric Blake <eblake@redhat.com>
> > ---
> > block/nbd-client.h |  2 ++
> > block/nbd-client.c | 34 ++++++++++++++++++++++++++++++++++
> > block/nbd.c        | 23 +++++++++++++++++++++++
> > 3 files changed, 59 insertions(+)
> > 
> > diff --git a/block/nbd-client.h b/block/nbd-client.h
> > index bc7aec0..2fe6654 100644
> > --- a/block/nbd-client.h
> > +++ b/block/nbd-client.h
> > @@ -47,6 +47,8 @@ void nbd_client_close(BlockDriverState *bs);
> > int nbd_client_co_discard(BlockDriverState *bs, int64_t sector_num,
> >                           int nb_sectors);
> > int nbd_client_co_flush(BlockDriverState *bs);
> > +int nbd_client_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
> > +                               int nb_sectors, int *flags);
> > int nbd_client_co_writev(BlockDriverState *bs, int64_t sector_num,
> >                          int nb_sectors, QEMUIOVector *qiov, int *flags);
> > int nbd_client_co_readv(BlockDriverState *bs, int64_t sector_num,
> > diff --git a/block/nbd-client.c b/block/nbd-client.c
> > index f013084..4be83a8 100644
> > --- a/block/nbd-client.c
> > +++ b/block/nbd-client.c
> > @@ -291,6 +291,40 @@ int nbd_client_co_readv(BlockDriverState *bs, int64_t sector_num,
> >     return nbd_co_readv_1(bs, sector_num, nb_sectors, qiov, offset);
> > }
> > 
> > +int nbd_client_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
> > +                               int nb_sectors, int *flags)
> > +{
> > +    ssize_t ret;
> > +    NbdClientSession *client = nbd_get_client_session(bs);
> > +    struct nbd_request request = { .type = NBD_CMD_WRITE_ZEROES };
> > +    struct nbd_reply reply;
> > +
> > +    if (!(client->nbdflags & NBD_FLAG_SEND_WRITE_ZEROES)) {
> > +        return -ENOTSUP;
> > +    }
> > +
> > +    if ((*flags & BDRV_REQ_FUA) && (client->nbdflags & NBD_FLAG_SEND_FUA)) {
> > +        *flags &= ~BDRV_REQ_FUA;
> > +        request.flags |= NBD_CMD_FLAG_FUA;
> > +    }
> > +    if (!(*flags & BDRV_REQ_MAY_UNMAP)) {
> > +        request.flags |= NBD_CMD_FLAG_NO_HOLE;
> > +    }
> > +
> > +    request.from = sector_num * 512;
> > +    request.len = nb_sectors * 512;
> > +
> > +    nbd_coroutine_start(client, &request);
> > +    ret = nbd_co_send_request(bs, &request, NULL, 0);
> > +    if (ret < 0) {
> > +        reply.error = -ret;
> > +    } else {
> > +        nbd_co_receive_reply(client, &request, &reply, NULL, 0);
> > +    }
> > +    nbd_coroutine_end(client, &request);
> > +    return -reply.error;
> > +}
> > +
> > int nbd_client_co_writev(BlockDriverState *bs, int64_t sector_num,
> >                          int nb_sectors, QEMUIOVector *qiov, int *flags)
> > {
> > diff --git a/block/nbd.c b/block/nbd.c
> > index f7ea3b3..f5119c0 100644
> > --- a/block/nbd.c
> > +++ b/block/nbd.c
> > @@ -355,6 +355,26 @@ static int nbd_co_readv(BlockDriverState *bs, int64_t sector_num,
> >     return nbd_client_co_readv(bs, sector_num, nb_sectors, qiov);
> > }
> > 
> > +static int nbd_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
> > +                               int nb_sectors, BdrvRequestFlags orig_flags)
> > +{
> > +    int flags = orig_flags;
> > +    int ret;
> > +
> > +    ret = nbd_client_co_write_zeroes(bs, sector_num, nb_sectors, &flags);
> > +    if (ret < 0) {
> > +        return ret;
> > +    }
> > +
> > +    /* The flag wasn't sent to the server, so we need to emulate it with an
> > +     * explicit flush */
> 
> Surely you only need to do this if the flag wasn't sent to the server,
> i.e. if !(client->nbdflags & NBD_FLAG_SEND_FUA)
> 
> If you've sent a FUA request, no need to flush the whole thing.

In this case BDRV_REQ_FUA is cleared from 'flags' by
nbd_client_co_write_zeroes() and this condition becomes false.
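
The in/out flag handling can be modelled in a few lines. This is a simplified sketch, not the patch itself: the `TOY_*` constants and both functions are hypothetical stand-ins for BDRV_REQ_FUA, NBD_FLAG_SEND_FUA, NBD_CMD_FLAG_FUA and the two functions in the quoted diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real flag values. */
#define TOY_REQ_FUA        (1 << 0)  /* caller wants FUA semantics */
#define TOY_SERVER_HAS_FUA (1 << 3)  /* server advertised FUA support */
#define TOY_WIRE_FUA       (1 << 0)  /* FUA bit in the request on the wire */

/*
 * Inner layer: if the server supports FUA, move the flag from the
 * caller's in/out 'flags' onto the wire request.  Returns the flags
 * that would be sent on the wire.
 */
static unsigned toy_client_write_zeroes(unsigned server_flags, int *flags)
{
    unsigned wire_flags = 0;

    assert(flags != NULL);
    if ((*flags & TOY_REQ_FUA) && (server_flags & TOY_SERVER_HAS_FUA)) {
        *flags &= ~TOY_REQ_FUA;     /* consumed: outer layer must not flush */
        wire_flags |= TOY_WIRE_FUA;
    }
    return wire_flags;
}

/*
 * Outer layer: an explicit flush is needed only if the FUA flag is
 * still set after the inner call, i.e. it was NOT sent to the server.
 */
static bool toy_needs_flush(unsigned server_flags, int orig_flags)
{
    int flags = orig_flags;

    (void)toy_client_write_zeroes(server_flags, &flags);
    return (flags & TOY_REQ_FUA) != 0;
}
```

So `toy_needs_flush(TOY_SERVER_HAS_FUA, TOY_REQ_FUA)` is false, while `toy_needs_flush(0, TOY_REQ_FUA)` is true: the fallback flush only happens against a server without FUA.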

> 
> nbd_co_writev_flags seems to have the same issue, which is where I guess
> you got that from.
> 
> > +    if (flags & BDRV_REQ_FUA) {
> > +        ret = nbd_client_co_flush(bs);
> > +    }
> > +
> > +    return ret;
> > +}
> > +
> > static int nbd_co_writev_flags(BlockDriverState *bs, int64_t sector_num,
> >                                int nb_sectors, QEMUIOVector *qiov, int flags)
> > {
> > @@ -476,6 +496,7 @@ static BlockDriver bdrv_nbd = {
> >     .bdrv_parse_filename        = nbd_parse_filename,
> >     .bdrv_file_open             = nbd_open,
> >     .bdrv_co_readv              = nbd_co_readv,
> > +    .bdrv_co_write_zeroes       = nbd_co_write_zeroes,
> >     .bdrv_co_writev             = nbd_co_writev,
> >     .bdrv_co_writev_flags       = nbd_co_writev_flags,
> >     .supported_write_flags      = BDRV_REQ_FUA,
> > @@ -496,6 +517,7 @@ static BlockDriver bdrv_nbd_tcp = {
> >     .bdrv_parse_filename        = nbd_parse_filename,
> >     .bdrv_file_open             = nbd_open,
> >     .bdrv_co_readv              = nbd_co_readv,
> > +    .bdrv_co_write_zeroes       = nbd_co_write_zeroes,
> >     .bdrv_co_writev             = nbd_co_writev,
> >     .bdrv_co_writev_flags       = nbd_co_writev_flags,
> >     .supported_write_flags      = BDRV_REQ_FUA,
> > @@ -516,6 +538,7 @@ static BlockDriver bdrv_nbd_unix = {
> >     .bdrv_parse_filename        = nbd_parse_filename,
> >     .bdrv_file_open             = nbd_open,
> >     .bdrv_co_readv              = nbd_co_readv,
> > +    .bdrv_co_write_zeroes       = nbd_co_write_zeroes,
> >     .bdrv_co_writev             = nbd_co_writev,
> >     .bdrv_co_writev_flags       = nbd_co_writev_flags,
> >     .supported_write_flags      = BDRV_REQ_FUA,
> > -- 
> > 2.5.5
> > 
> > 
> 
> -- 
> Alex Bligh
> 
> 
> 
> 
> 


* Re: [Qemu-devel] [PATCH 06/18] nbd: Avoid magic number for NBD max name size
  2016-04-09 10:35   ` Alex Bligh
@ 2016-04-09 22:07     ` Eric Blake
  0 siblings, 0 replies; 48+ messages in thread
From: Eric Blake @ 2016-04-09 22:07 UTC (permalink / raw)
  To: Alex Bligh
  Cc: qemu-devel, Kevin Wolf, Paolo Bonzini, open list:Block layer core

On 04/09/2016 04:35 AM, Alex Bligh wrote:
> 
> On 8 Apr 2016, at 23:05, Eric Blake <eblake@redhat.com> wrote:
> 
>> Declare a constant and use that when determining if an export
>> name fits within the constraints we are willing to support.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>> ---

>> +/* Maximum size of an export name */
>> +#define NBD_MAX_NAME_SIZE 255
> 
> Given the standard is either likely to or does (can't
> remember whether that patch is merged) document the
> maximum supported export length as 4096, why not change
> this to 4096?

I think I'd rather change the limit in a separate patch, auditing to
make sure that it works.  The current NBD Protocol states:

Where this document refers to a string, then unless otherwise stated,
that string is a sequence of UTF-8 code points, which is not NUL
terminated, MUST NOT contain NUL characters, SHOULD be no longer than
256 bytes and MUST be no longer than 4096 bytes.

So I agree that 255 is too small - a client sending 256 bytes and being
rejected by the server is a bug in the server, while a client sending
4096 bytes and being rejected by the server is not a protocol violation,
just poorer quality of implementation. Also, I think that the server
should gracefully reject the client for 4096 (as in a nice
NBD_REP_ERR_POLICY in response to NBD_OPT_INFO, stating that "your
request was valid by protocol, but not something I'm willing to
handle"), rather than abruptly disconnecting; while dealing with a
client that sends a 100M name is more likely to be a Denial-of-service
attack where an abrupt disconnect is nicer than wasting time reading to
the end of the client's message.
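
The three outcomes described above could be sketched as a small classifier. This is only an illustration under assumptions: the enum, the function name, and the choice of 4096 as the exact cut-over between a polite rejection and an abrupt disconnect are mine, not something the patch or the protocol text mandates.

```c
#include <assert.h>
#include <stdint.h>

/* Largest string the protocol text quoted above permits. */
#define TOY_PROTO_MAX_STRING 4096

/* Hypothetical verdicts for an incoming export name length. */
enum name_verdict {
    NAME_ACCEPT,         /* within what this server is willing to handle */
    NAME_REJECT_POLICY,  /* valid by protocol: drain it, send a policy error */
    NAME_DISCONNECT      /* protocol violation or likely DoS: drop the client */
};

/*
 * Classify a client-supplied name length.  'local_max' is the
 * server's own limit and is assumed to be <= TOY_PROTO_MAX_STRING.
 */
static enum name_verdict classify_name_len(uint32_t len, uint32_t local_max)
{
    assert(local_max <= TOY_PROTO_MAX_STRING);
    if (len <= local_max) {
        return NAME_ACCEPT;
    }
    if (len <= TOY_PROTO_MAX_STRING) {
        return NAME_REJECT_POLICY;
    }
    return NAME_DISCONNECT;
}
```

With a local limit of 256, a 256-byte name is accepted, a 4096-byte name gets the graceful rejection, and a 100M name is cut off without reading it.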

On the other hand, 4096 bytes is big enough that you can't safely stack
allocate (we are trying to get qemu to the point where it can be
compiled with gcc's options to warn on any function that requires more
than 4096 bytes for the entire function, as that is the largest safe
amount you can have before you can run into stack overflow turning what
is supposed to be SIGSEGV into a hard kill on Windows, if you get
unlucky enough to skip over the guard page).  Also, qemu has smaller
limits in other places (for example, no more than 1024 bytes for the
name of a qcow2 backing file), so it does no good to make qemu support
4096 bytes in NBD if it can't pass that on to the rest of qemu.

At any rate, I should probably stick this above explanation in the
commit message (or else do the audit, and merge it into this patch after
all, even if I pick a different limit than 4096).

> 
> Otherwise:
> 
> Reviewed-by: Alex Bligh <alex@alex.org.uk>
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH 12/18] nbd: Less allocation during NBD_OPT_LIST
  2016-04-09 10:41   ` Alex Bligh
@ 2016-04-09 22:24     ` Eric Blake
  0 siblings, 0 replies; 48+ messages in thread
From: Eric Blake @ 2016-04-09 22:24 UTC (permalink / raw)
  To: Alex Bligh; +Cc: qemu-devel, Paolo Bonzini

On 04/09/2016 04:41 AM, Alex Bligh wrote:
> 
> On 8 Apr 2016, at 23:05, Eric Blake <eblake@redhat.com> wrote:
> 
>> Since we know that the maximum name we are willing to accept
>> is small enough to stack-allocate, rework the iteration over
>> NBD_OPT_LIST responses to reuse a stack buffer rather than
>> allocating every time.  Furthermore, we don't even have to
>> allocate if we know the server's length doesn't match what
>> we are searching for.
>>
>> Not fixed here: Upstream NBD Protocol recently added this
>> clarification:
>> https://github.com/yoe/nbd/blob/18918eb/doc/proto.md#conventions
>>
>> Where this document refers to a string, then unless otherwise
>> stated, that string is a sequence of UTF-8 code points, which
>> is not NUL terminated, MUST NOT contain NUL characters, SHOULD
>> be no longer than 256 bytes and MUST be no longer than 4096
>> bytes. This applies to export names and error messages (amongst
>> others).
>>
>> To be fully compliant to that, we need to bump our export name
>> limit from 255 to at least 256, and need to decide whether we
>> can bump it higher (bumping it all the way to 4096 is annoying
>> in that we could no longer safely stack-allocate a worst-case
>> string, so we may still want to take the leeway offered by SHOULD
>> to force a reasonable smaller limit).
> 
> Is there a limit in qemu-world to safe stack allocation? I thought
> that was (in general) only a kernel consideration? (probably my
> ignorance here).

Even in user space apps, any stack allocation larger than 4096 bytes
risks skipping over the guard page on Windows, which makes the
difference in whether you get a SIGSEGV (good) or a hard process kill
(bad).  We're slowly getting qemu to the point where it will compile
with gcc's options to guarantee that no one function requires more than
4k stack.
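
One common way to stay under such a limit is to keep a small fixed buffer on the stack and fall back to the heap for anything larger. The sketch below is not from the patch; `name_equals` and `TOY_SMALL_NAME_MAX` are hypothetical, and it assumes the limit is enforced with a warning such as GCC's `-Wframe-larger-than=4096` (an assumption about which options are meant).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical threshold below which a stack buffer is used. */
#define TOY_SMALL_NAME_MAX 256

/*
 * Compare a non-NUL-terminated name of 'rawlen' bytes against 'want'.
 * Short names use a small stack buffer; long names are copied to the
 * heap so the stack frame stays well below 4k.
 * Returns 1 on match, 0 on mismatch, -1 on allocation failure.
 */
static int name_equals(const char *raw, size_t rawlen, const char *want)
{
    char small[TOY_SMALL_NAME_MAX + 1];
    char *buf = small;
    int result;

    assert(raw != NULL && want != NULL);
    if (rawlen > TOY_SMALL_NAME_MAX) {
        if (rawlen == SIZE_MAX) {
            return -1;              /* rawlen + 1 would overflow */
        }
        buf = malloc(rawlen + 1);
        if (buf == NULL) {
            return -1;
        }
    }
    memcpy(buf, raw, rawlen);
    buf[rawlen] = '\0';
    result = strcmp(buf, want) == 0;
    if (buf != small) {
        free(buf);
    }
    return result;
}
```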

> 
> Otherwise:
> 
> 
> Reviewed-by: Alex Bligh <alex@alex.org.uk>
> 

And continuing from the things I mentioned in the other mail regarding
export name limits...

>> -    return 1;
>> +
>> +    if (len < sizeof(namelen) || len > NBD_MAX_BUFFER_SIZE) {
>> +        error_setg(errp, "incorrect option length %"PRIu32, len);
>> +        return -1;
>> +    }

This is a case of a faulty server stream (whether evil server, or MitM,
or whatever else...); if the packet wasn't big enough to include
namelen, or if the message size is larger than 32M, the stream is
considered corrupt to the point that it is no longer worth talking to
the server (hence, the return -1).

>> +    if (read_sync(ioc, &namelen, sizeof(namelen)) != sizeof(namelen)) {
>> +        error_setg(errp, "failed to read option name length");
>> +        return -1;
>> +    }

Likewise, anywhere we fail to read the server stream (most often due to
stream disconnect causing EOF), returning -1 is fine because we can't
recover anyway.

>> +    namelen = be32_to_cpu(namelen);
>> +    len -= sizeof(namelen);
>> +    if (len < namelen) {
>> +        error_setg(errp, "incorrect option name length");
>> +        return -1;
>> +    }

Likewise, if the server gives a namelen that would read beyond the
bounds of the overall packet length, the server can't be trusted for
anything else.

>> +    if (namelen != strlen(want)) {
>> +        if (drop_sync(ioc, len) != len) {
>> +            error_setg(errp, "failed to skip export name with wrong length");
>> +            return -1;
>> +        }
>> +        return 2;
>> +    }

Meanwhile, this gracefully skips any remaining string size, even up to
qemu's NBD_MAX_BUFFER_SIZE (32M), well beyond the required 256 or
recommended 4096 of the protocol.
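
For illustration, skipping an arbitrarily large payload does not need an
arbitrarily large buffer.  The following is only a sketch of that idea,
not the drop_sync() from this series: struct mem_stream, stream_read()
and stream_drop() are hypothetical stand-ins, and the 512-byte scratch
size is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory stream standing in for the I/O channel. */
struct mem_stream {
    const uint8_t *data;
    size_t size;
    size_t pos;
};

/* Read up to 'len' bytes; returns the number of bytes actually read. */
static size_t stream_read(struct mem_stream *s, void *buf, size_t len)
{
    size_t avail = s->size - s->pos;
    size_t n = len < avail ? len : avail;

    memcpy(buf, s->data + s->pos, n);
    s->pos += n;
    return n;
}

/* Discard 'len' bytes through a small fixed scratch buffer, so stack
 * use does not depend on how much is skipped.  Returns the number of
 * bytes dropped, which is less than 'len' on early EOF. */
static size_t stream_drop(struct mem_stream *s, size_t len)
{
    uint8_t scratch[512];
    size_t dropped = 0;

    assert(s != NULL);
    while (dropped < len) {
        size_t want = len - dropped;
        size_t got;

        if (want > sizeof(scratch)) {
            want = sizeof(scratch);
        }
        got = stream_read(s, scratch, want);
        dropped += got;
        if (got < want) {
            break;              /* EOF before everything was skipped */
        }
    }
    return dropped;
}
```

A caller compares the return value against the requested length, in the
same way the quoted hunk compares drop_sync()'s result against len.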

>> +
>> +    assert(namelen < sizeof(name));
>> +    if (read_sync(ioc, name, namelen) != namelen) {
>> +        error_setg(errp, "failed to read export name");
>> +        return -1;
>> +    }
>> +    name[namelen] = '\0';
>> +    len -= namelen;
>> +    if (drop_sync(ioc, len) != len) {
>> +        error_setg(errp, "failed to read export description");
>> +        return -1;
>> +    }
>> +    return strcmp(name, want) == 0 ? 3 : 2;

So with this patch, the client accepts strings of any size from the
server, while only examining the ones whose length matches the name
requested by the client.  The audit now proceeds to find out whether
letting the client request a name longer than 255 makes sense; but at
least this part of the client/server interaction is safe, and ready for
a later patch to bump the size of the #define for max name length,
unless bumping it too large makes the local buf[] exceed the
preferred stack size.
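
Pulling the checks above together, here is a self-contained sketch that
applies the same validation to a payload already in memory rather than
to a live stream.  check_payload() and load_be32() are hypothetical
helpers, MAX_BUFFER_SIZE is assumed to match the 32M limit mentioned
above, and the -1/2/3 return values simply mirror the quoted hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed to mirror the 32M cap mentioned in this thread. */
#define MAX_BUFFER_SIZE (32u * 1024 * 1024)

/* Decode a big-endian 32-bit value without alignment assumptions. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Check one reply payload of 'len' bytes: a 32-bit name length, the
 * name itself, then an optional description.
 * Returns -1 if the payload is malformed, 2 if the name is not 'want',
 * and 3 if it is.
 */
static int check_payload(const uint8_t *buf, uint32_t len, const char *want)
{
    uint32_t namelen;

    assert(buf != NULL && want != NULL);
    if (len < sizeof(namelen) || len > MAX_BUFFER_SIZE) {
        return -1;      /* too short for namelen, or absurdly large */
    }
    namelen = load_be32(buf);
    len -= sizeof(namelen);
    if (len < namelen) {
        return -1;      /* name would run past the end of the packet */
    }
    if (namelen != strlen(want)) {
        return 2;       /* wrong length, so it cannot match */
    }
    return memcmp(buf + sizeof(namelen), want, namelen) == 0 ? 3 : 2;
}
```

Because names of the wrong length are rejected before any copy, nothing
longer than the client's own request ever needs to be stored.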

[It's also interesting to note that once the NBD_OPT_GO code is live and
supported by more servers, we won't even be hitting this section of
NBD_OPT_INFO code in the qemu client]

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH 13/18] nbd: Support shorter handshake
  2016-04-09 10:42   ` Alex Bligh
@ 2016-04-09 22:27     ` Eric Blake
  0 siblings, 0 replies; 48+ messages in thread
From: Eric Blake @ 2016-04-09 22:27 UTC (permalink / raw)
  To: Alex Bligh
  Cc: qemu-devel, Kevin Wolf, Paolo Bonzini, open list:Block layer core

On 04/09/2016 04:42 AM, Alex Bligh wrote:
> 
> On 8 Apr 2016, at 23:05, Eric Blake <eblake@redhat.com> wrote:
> 
>> The NBD Protocol allows the server and client to mutually agree
>> on a shorter handshake (omit the 124 bytes of reserved 0), via
>> the server advertising NBD_FLAG_NO_ZEROES and the client
>> acknowledging with NBD_FLAG_C_NO_ZEROES (only possible in
>> newstyle, whether or not it is fixed newstyle).  It doesn't
>> shave much off the wire, but we might as well implement it.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> 
> Reviewed-by: Alex Bligh <alex@alex.org.uk>
> 
> thanks - that was annoying me.
> 

It turns out that doing this also made the NBD_OPT_GO patches easier :)
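
To make the negotiation concrete, here is a small sketch of how a client
might compute how many bytes to expect at the end of the handshake.  The
flag bit values are my reading of the NBD protocol document and should
be treated as assumptions; export_info_len() is a hypothetical helper,
not a function from this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag bits as I read them in the NBD protocol document; treat them as
 * assumptions and check proto.md before relying on them. */
#define NBD_FLAG_NO_ZEROES   (1u << 1)  /* handshake flag, set by server */
#define NBD_FLAG_C_NO_ZEROES (1u << 1)  /* client flag, set by client */

enum {
    EXPORT_SIZE_LEN = 8,    /* 64-bit export size */
    EXPORT_FLAGS_LEN = 2,   /* 16-bit transmission flags */
    RESERVED_LEN = 124      /* reserved zero padding */
};

/* Number of bytes the client should expect at the end of the handshake.
 * The padding is omitted only if the server advertised the feature and
 * the client acknowledged it. */
static size_t export_info_len(uint16_t server_flags, uint32_t client_flags)
{
    size_t len = EXPORT_SIZE_LEN + EXPORT_FLAGS_LEN;
    int no_zeroes = (server_flags & NBD_FLAG_NO_ZEROES) &&
                    (client_flags & NBD_FLAG_C_NO_ZEROES);

    if (!no_zeroes) {
        len += RESERVED_LEN;
    }
    assert(len == 10 || len == 134);
    return len;
}
```

Both sides have to evaluate the same condition, otherwise one of them is
left waiting for (or choking on) 124 bytes the other never agreed to.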

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH 14/18] nbd: Implement NBD_OPT_GO on client
  2016-04-09 10:47   ` Alex Bligh
@ 2016-04-09 22:38     ` Eric Blake
  0 siblings, 0 replies; 48+ messages in thread
From: Eric Blake @ 2016-04-09 22:38 UTC (permalink / raw)
  To: Alex Bligh
  Cc: qemu-devel, Kevin Wolf, Paolo Bonzini, open list:Block layer core

On 04/09/2016 04:47 AM, Alex Bligh wrote:
> 
> On 8 Apr 2016, at 23:05, Eric Blake <eblake@redhat.com> wrote:
> 
>> NBD_OPT_EXPORT_NAME is lousy: it doesn't have any sane error
>> reporting.  Upstream NBD recently added NBD_OPT_GO as the
> 
> ... as an experimental option for now, but hopefully this
> should move it out of the experimental section.
> 
> Thanks for doing this one.
> 
>> improved version of the option that does what we want: it
>> reports sane errors on failures (including when a server
>> requires TLS but does not have NBD_OPT_GO!), and on success
>> it concludes with the same data as NBD_OPT_EXPORT_NAME sends.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> Perhaps worth adding that although all servers that support
> FixedNewstyle are meant to support (i.e. error but not disconnect on)
> unsupported options, perhaps some don't (in which case they
> are buggy and should be fixed). But just in case someone asks
> 'why is qemu no longer connecting to shonkynbd', a message in
> the commit log might be useful.

qemu 2.5 is one of those 'shonkynbd' servers.  We just barely fixed it
in commit 156f6a10 in time for 2.6, but now you've made me worry whether
that will be a big enough problem to have to hack around in newer qemu
clients.  On the other hand, we won't merge the NBD_OPT_GO code until
qemu 2.7, a few more months down the road; and we can get the fix
backported to the 2.5.x stable series in the meantime.
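
For what it's worth, the client-side decision being discussed can be
sketched as a classification of the server's reply to NBD_OPT_GO.  The
reply type values are my reading of the NBD protocol document and are
assumptions here; enum go_action and classify_go_reply() are
hypothetical names, not code from this series.

```c
#include <assert.h>
#include <stdint.h>

/* Reply types as I read them in the NBD protocol document; treat these
 * values as assumptions. */
#define NBD_REP_ACK        UINT32_C(1)
#define NBD_REP_FLAG_ERROR (UINT32_C(1) << 31)
#define NBD_REP_ERR_UNSUP  (NBD_REP_FLAG_ERROR | 1)

enum go_action {
    GO_READY,       /* server accepted NBD_OPT_GO */
    GO_FALLBACK,    /* option unknown: retry with NBD_OPT_EXPORT_NAME */
    GO_FAILED,      /* server reported a real error for this request */
    GO_CONTINUE     /* some other non-error reply: keep reading */
};

/* Decide what the client does next, given one option reply type. */
static enum go_action classify_go_reply(uint32_t type)
{
    if (type == NBD_REP_ACK) {
        return GO_READY;
    }
    if (type == NBD_REP_ERR_UNSUP) {
        return GO_FALLBACK;
    }
    if (type & NBD_REP_FLAG_ERROR) {
        return GO_FAILED;
    }
    return GO_CONTINUE;
}
```

Note that this only helps with servers that answer at all.  A server
that drops the connection on an unknown option never gets as far as
sending a reply to classify, which is exactly the 'shonkynbd' case.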

> 
> Otherwise:
> 
> Reviewed-by: Alex Bligh <alex@alex.org.uk>
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH 16/18] nbd: Support NBD_CMD_CLOSE
  2016-04-09 10:50   ` Alex Bligh
@ 2016-04-09 23:12     ` Eric Blake
  2016-04-10  5:28       ` Alex Bligh
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Blake @ 2016-04-09 23:12 UTC (permalink / raw)
  To: Alex Bligh
  Cc: qemu-devel, Kevin Wolf, Paolo Bonzini, open list:Block layer core

On 04/09/2016 04:50 AM, Alex Bligh wrote:
> 
> On 8 Apr 2016, at 23:05, Eric Blake <eblake@redhat.com> wrote:
> 
>> NBD_CMD_DISC is annoying: the server is not required to reply,
>> so the client has no choice but to disconnect once it has sent
>> the message; but depending on timing, the server can see the
>> disconnect prior to reading the request, and treat things as
>> an abrupt exit rather than a clean shutdown (which may affect
>> whether the server properly fsync()s data to disk, and so on).
>> The new NBD_CMD_CLOSE adds another round of handshake, where
>> the client waits for the server's action before closing, to
>> make sure both parties know that it was a clean close rather
>> than an accidental early disconnect.
>>
>> In nbd-client.c, nbd_client_close() is called after we have
>> already exited the normal coroutine context used by all the
>> other transmission phase handlers, so the code is a bit more
>> complex to build up a coroutine just for the purpose of waiting
>> for the server's response.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> Wouter is not yet convinced of the merits of NBD_CMD_CLOSE
> so we should probably resist applying this unless / until we
> have convinced him of its benefits.
> 
> BTW there is nothing to stop you doing an fsync() on ANY
> disconnect server side.

Qemu clients _already_ do the safe actions of waiting for all inflight
requests to complete, then sending one final NBD_CMD_FLUSH, before
attempting to send NBD_CMD_DISC.  If I knew how to make qemu guarantee
that the NBD_CMD_DISC hits the wire (even in TLS mode) rather than being
dropped early, that seems nicer than having to implement this (although
I did learn a bit about qemu coroutines in implementing this).
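
The ordering described above can be sketched as follows.  This is a toy
model, not qemu's code: struct client, send_cmd() and wait_one_reply()
are hypothetical stand-ins that only record what would go on the wire,
and the command numbers are my reading of the NBD protocol document.

```c
#include <assert.h>
#include <stddef.h>

/* Command numbers as I read them in the NBD protocol document. */
enum { NBD_CMD_DISC = 2, NBD_CMD_FLUSH = 3 };

/* Hypothetical client state; not qemu's actual data structures. */
struct client {
    int inflight;       /* requests sent but not yet answered */
    int sent[4];        /* log of commands put on the wire */
    size_t nsent;
};

/* Stand-in for putting one request on the wire. */
static void send_cmd(struct client *c, int cmd)
{
    assert(c->nsent < sizeof(c->sent) / sizeof(c->sent[0]));
    c->sent[c->nsent++] = cmd;
}

/* Stand-in for waiting on one outstanding reply. */
static void wait_one_reply(struct client *c)
{
    assert(c->inflight > 0);
    c->inflight--;
}

static void client_shutdown(struct client *c)
{
    while (c->inflight > 0) {
        wait_one_reply(c);          /* 1. drain in-flight requests */
    }
    send_cmd(c, NBD_CMD_FLUSH);     /* 2. one final flush */
    send_cmd(c, NBD_CMD_DISC);      /* 3. disconnect; no reply required */
}
```

The weak spot is step 3: nothing in this sequence tells the client that
the server actually read the NBD_CMD_DISC before the socket went away.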

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [RFC PATCH 18/18] nbd: Implement NBD_CMD_WRITE_ZEROES on client
  2016-04-09 10:57   ` Alex Bligh
  2016-04-09 11:52     ` Pavel Borzenkov
@ 2016-04-09 23:17     ` Eric Blake
  2016-04-10  5:27       ` Alex Bligh
  1 sibling, 1 reply; 48+ messages in thread
From: Eric Blake @ 2016-04-09 23:17 UTC (permalink / raw)
  To: Alex Bligh
  Cc: qemu-devel, Kevin Wolf, Paolo Bonzini, open list:Block layer core

On 04/09/2016 04:57 AM, Alex Bligh wrote:
> 
> On 8 Apr 2016, at 23:05, Eric Blake <eblake@redhat.com> wrote:
> 
>> RFC because there is still discussion on the NBD list about
>> adding an NBD_OPT_ to let the client suggest server defaults
>> related to scanning for zeroes during NBD_CMD_WRITE, which may
>> tweak this patch.
>>
>> Upstream NBD protocol recently added the ability to efficiently
>> write zeroes without having to send the zeroes over the wire,
>> along with a flag to control whether the client wants a hole.
>>
>> The generic block code takes care of falling back to the obvious
>> write lots of zeroes if we return -ENOTSUP because the server
>> does not have WRITE_ZEROES.
>>

>> +    ret = nbd_client_co_write_zeroes(bs, sector_num, nb_sectors, &flags);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +
>> +    /* The flag wasn't sent to the server, so we need to emulate it with an
>> +     * explicit flush */
> 
> Surely you only need to do this if the flag wasn't sent to the server,
> i.e. if !(client->nbdflags & NBD_FLAG_SEND_FUA)
> 
> If you've sent a FUA request, no need to flush the whole thing.
> 
> nbd_co_writev_flags seems to have the same issue, which is where I guess
> you got that from.

No, the code is correct.  In both functions, the logic is that if the
lower-level knows that the server respects FUA, then it clears the flag
before returning (flags is passed by reference, not value).  Then at
this higher level, if FUA is still set, the server is too old, so we do
a fallback flush to get the same semantics for the write in question,
but at higher cost of a full flush.

> 
>> +    if (flags & BDRV_REQ_FUA) {
>> +        ret = nbd_client_co_flush(bs);
>> +    }
>> +

It's also this higher level that knows how to fall back to NBD_CMD_WRITE
if the lower-level returned -ENOTSUP because the server doesn't support
WRITE_ZEROES.
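
Since the by-reference flag handling is easy to misread, here is a
stripped-down sketch of the FUA logic explained above.  Everything in it
is a stand-in: REQ_FUA plays the role of BDRV_REQ_FUA, struct server
models what the remote end supports, and the low_*() functions only
count what would have been sent.

```c
#include <assert.h>
#include <stdbool.h>

#define REQ_FUA 0x1     /* hypothetical stand-in for BDRV_REQ_FUA */

/* Toy model of the remote server and of what was sent to it. */
struct server {
    bool supports_fua;
    int fua_writes;     /* writes sent with FUA on the wire */
    int flushes;        /* explicit flushes sent */
};

/* Lower level: if the server honors FUA, send it on the wire and clear
 * it in *flags to tell the caller it has been taken care of. */
static int low_write_zeroes(struct server *s, int *flags)
{
    if ((*flags & REQ_FUA) && s->supports_fua) {
        s->fua_writes++;
        *flags &= ~REQ_FUA;
    }
    return 0;
}

static int low_flush(struct server *s)
{
    s->flushes++;
    return 0;
}

/* Higher level: if FUA is still set after the write, the server was too
 * old to handle it, so emulate it with a full flush. */
static int write_zeroes(struct server *s, int flags)
{
    int ret = low_write_zeroes(s, &flags);

    if (ret < 0) {
        return ret;
    }
    if (flags & REQ_FUA) {
        ret = low_flush(s);
    }
    return ret;
}
```

With a server that supports FUA, no flush is issued; with an old server,
exactly one flush follows the write; without FUA, neither happens.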

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [RFC PATCH 18/18] nbd: Implement NBD_CMD_WRITE_ZEROES on client
  2016-04-09 23:17     ` Eric Blake
@ 2016-04-10  5:27       ` Alex Bligh
  0 siblings, 0 replies; 48+ messages in thread
From: Alex Bligh @ 2016-04-10  5:27 UTC (permalink / raw)
  To: Eric Blake
  Cc: Alex Bligh, Kevin Wolf, Paolo Bonzini, qemu-devel,
	open list:Block layer core


On 10 Apr 2016, at 00:17, Eric Blake <eblake@redhat.com> wrote:

> No, the code is correct.  In both functions, the logic is that if the
> lower-level knows that the server respects FUA, then it clears the flag
> before returning (flags is passed by reference, not value).  Then at
> this higher level, if FUA is still set, the server is too old, so we do
> a fallback flush to get the same semantics for the write in question,
> but at higher cost of a full flush.

OK, thanks.

--
Alex Bligh






* Re: [Qemu-devel] [PATCH 16/18] nbd: Support NBD_CMD_CLOSE
  2016-04-09 23:12     ` Eric Blake
@ 2016-04-10  5:28       ` Alex Bligh
  0 siblings, 0 replies; 48+ messages in thread
From: Alex Bligh @ 2016-04-10  5:28 UTC (permalink / raw)
  To: Eric Blake
  Cc: Alex Bligh, Kevin Wolf, Paolo Bonzini, qemu-devel,
	open list:Block layer core


On 10 Apr 2016, at 00:12, Eric Blake <eblake@redhat.com> wrote:

> Qemu clients _already_ do the safe actions of waiting for all inflight
> requests to complete, then sending one final NBD_CMD_FLUSH, before
> attempting to send NBD_CMD_DISC.  If I knew how to make qemu guarantee
> that the NBD_CMD_DISC hits the wire (even in TLS mode) rather than being
> dropped early, that seems nicer than having to implement this (although
> I did learn a bit about qemu coroutines in implementing this).

Thanks. As discussed elsewhere, I think it's gnutls_bye() but I'm
more familiar with openssl and would be concerned that gnutls_bye()
might block.

--
Alex Bligh






Thread overview: 48+ messages
2016-04-08 22:05 [Qemu-devel] [RFC PATCH 00/18] NBD protocol additions Eric Blake
2016-04-08 22:05 ` [Qemu-devel] [PATCH 01/18] nbd: Don't kill server on client that doesn't request TLS Eric Blake
2016-04-09 10:28   ` Alex Bligh
2016-04-08 22:05 ` [Qemu-devel] [PATCH 02/18] nbd: Don't fail handshake on NBD_OPT_LIST descriptions Eric Blake
2016-04-09 10:30   ` Alex Bligh
2016-04-08 22:05 ` [Qemu-devel] [PATCH 03/18] nbd: More debug typo fixes, use correct formats Eric Blake
2016-04-09 10:30   ` Alex Bligh
2016-04-08 22:05 ` [Qemu-devel] [PATCH 04/18] nbd: Detect servers that send unexpected error values Eric Blake
2016-04-09 10:31   ` Alex Bligh
2016-04-08 22:05 ` [Qemu-devel] [PATCH 05/18] nbd: Reject unknown request flags Eric Blake
2016-04-09 10:32   ` Alex Bligh
2016-04-08 22:05 ` [Qemu-devel] [PATCH 06/18] nbd: Avoid magic number for NBD max name size Eric Blake
2016-04-09 10:35   ` Alex Bligh
2016-04-09 22:07     ` Eric Blake
2016-04-08 22:05 ` [Qemu-devel] [PATCH 07/18] nbd: Treat flags vs. command type as separate fields Eric Blake
2016-04-09 10:37   ` Alex Bligh
2016-04-08 22:05 ` [Qemu-devel] [PATCH 08/18] nbd: Limit nbdflags to 16 bits Eric Blake
2016-04-09 10:37   ` Alex Bligh
2016-04-08 22:05 ` [Qemu-devel] [PATCH 09/18] nbd: Share common reply-sending code in server Eric Blake
2016-04-09 10:38   ` Alex Bligh
2016-04-08 22:05 ` [Qemu-devel] [PATCH 10/18] nbd: Share common option-sending code in client Eric Blake
2016-04-09 10:38   ` Alex Bligh
2016-04-08 22:05 ` [Qemu-devel] [PATCH 11/18] nbd: Let client skip portions of server reply Eric Blake
2016-04-09 10:39   ` Alex Bligh
2016-04-08 22:05 ` [Qemu-devel] [PATCH 12/18] nbd: Less allocation during NBD_OPT_LIST Eric Blake
2016-04-09 10:41   ` Alex Bligh
2016-04-09 22:24     ` Eric Blake
2016-04-08 22:05 ` [Qemu-devel] [PATCH 13/18] nbd: Support shorter handshake Eric Blake
2016-04-09 10:42   ` Alex Bligh
2016-04-09 22:27     ` Eric Blake
2016-04-08 22:05 ` [Qemu-devel] [PATCH 14/18] nbd: Implement NBD_OPT_GO on client Eric Blake
2016-04-09 10:47   ` Alex Bligh
2016-04-09 22:38     ` Eric Blake
2016-04-08 22:05 ` [Qemu-devel] [PATCH 15/18] nbd: Implement NBD_OPT_GO on server Eric Blake
2016-04-09 10:48   ` Alex Bligh
2016-04-08 22:05 ` [Qemu-devel] [PATCH 16/18] nbd: Support NBD_CMD_CLOSE Eric Blake
2016-04-09 10:50   ` Alex Bligh
2016-04-09 23:12     ` Eric Blake
2016-04-10  5:28       ` Alex Bligh
2016-04-08 22:05 ` [Qemu-devel] [RFC PATCH 17/18] nbd: Implement NBD_CMD_WRITE_ZEROES on server Eric Blake
2016-04-09  9:39   ` Pavel Borzenkov
2016-04-09 10:54   ` Alex Bligh
2016-04-08 22:05 ` [Qemu-devel] [RFC PATCH 18/18] nbd: Implement NBD_CMD_WRITE_ZEROES on client Eric Blake
2016-04-09 10:57   ` Alex Bligh
2016-04-09 11:52     ` Pavel Borzenkov
2016-04-09 23:17     ` Eric Blake
2016-04-10  5:27       ` Alex Bligh
2016-04-09 10:21 ` [Qemu-devel] [Nbd] [RFC PATCH 00/18] NBD protocol additions Wouter Verhelst
