All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v4 00/14] nbd: efficient write zeroes
@ 2016-06-25 22:15 Eric Blake
  2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 01/14] nbd: Fix bad flag detection on server Eric Blake
                   ` (13 more replies)
  0 siblings, 14 replies; 20+ messages in thread
From: Eric Blake @ 2016-06-25 22:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, pbonzini

The upstream NBD protocol is proposing an extension for efficient
write zeroes; having a qemu implementation will be one of the reasons
to promote the proposal from experimental to standard:
https://github.com/yoe/nbd/blob/extension-write-zeroes/doc/proto.md

Some of these patches were previously posted as part of a larger v3
series, so this is labeled v4:
https://lists.gnu.org/archive/html/qemu-devel/2016-04/msg03526.html

Prerequisites:
v1 Auto-fragment large transactions at the block layer:
https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg05819.html
v1 byte-based block discard:
https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg06491.html
v1 Switch raw NBD to byte-based
https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg07030.html

Also available as a tag at:
git fetch git://repo.or.cz/qemu/ericb.git nbd-zero-v4

changes since v3:
rebasing on lots of intermediate patches that have been applied,
particularly related to byte-based block driver performance; and
even more protocol compliance

001/15:[down] 'nbd: Fix bad flag detection on server'
002/15:[----] [-C] 'nbd: Add qemu-nbd -D for human-readable description'
003/15:[0002] [FC] 'nbd: Limit nbdflags to 16 bits'
004/15:[0044] [FC] 'nbd: Treat flags vs. command type as separate fields'
005/15:[----] [--] 'nbd: Share common reply-sending code in server'
006/15:[down] 'nbd: Send message along with server NBD_REP_ERR errors'
007/15:[0014] [FC] 'nbd: Share common option-sending code in client'
008/15:[down] 'nbd: Let server know when client gives up negotiation'
009/15:[0002] [FC] 'nbd: Let client skip portions of server reply'
010/15:[0016] [FC] 'nbd: Less allocation during NBD_OPT_LIST'
011/15:[----] [-C] 'nbd: Support shorter handshake'
012/15:[down] 'nbd: Improve server handling of shutdown requests'
013/15:[down] 'tmp enable tracing for qemu-nbd/qemu-io'
014/15:[0003] [FC] 'nbd: Implement NBD_CMD_WRITE_ZEROES on server'
015/15:[0057] [FC] 'nbd: Implement NBD_CMD_WRITE_ZEROES on client'

Eric Blake (14):
  nbd: Fix bad flag detection on server
  nbd: Add qemu-nbd -D for human-readable description
  nbd: Limit nbdflags to 16 bits
  nbd: Treat flags vs. command type as separate fields
  nbd: Share common reply-sending code in server
  nbd: Send message along with server NBD_REP_ERR errors
  nbd: Share common option-sending code in client
  nbd: Let server know when client gives up negotiation
  nbd: Let client skip portions of server reply
  nbd: Less allocation during NBD_OPT_LIST
  nbd: Support shorter handshake
  nbd: Improve server handling of shutdown requests
  nbd: Implement NBD_CMD_WRITE_ZEROES on server
  nbd: Implement NBD_CMD_WRITE_ZEROES on client

 block/nbd-client.h  |   4 +-
 include/block/nbd.h |  69 ++++++--
 nbd/nbd-internal.h  |  12 +-
 block/nbd-client.c  |  44 ++++-
 block/nbd.c         |   4 +
 nbd/client.c        | 500 ++++++++++++++++++++++++++++------------------------
 nbd/server.c        | 267 +++++++++++++++++++---------
 qemu-nbd.c          |  16 +-
 qemu-nbd.texi       |   5 +-
 9 files changed, 574 insertions(+), 347 deletions(-)

-- 
2.5.5

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v4 01/14] nbd: Fix bad flag detection on server
  2016-06-25 22:15 [Qemu-devel] [PATCH v4 00/14] nbd: efficient write zeroes Eric Blake
@ 2016-06-25 22:15 ` Eric Blake
  2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 02/14] nbd: Add qemu-nbd -D for human-readable description Eric Blake
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Eric Blake @ 2016-06-25 22:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, pbonzini

Commit ab7c548e added a check for invalid flags, but used an
early return on error instead of properly going through the
cleanup label.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v4: new patch
---
 nbd/server.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/nbd/server.c b/nbd/server.c
index 2f2d9dd..724ac12 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1057,7 +1057,8 @@ static ssize_t nbd_co_receive_request(NBDRequest *req,
     if (request->type & ~NBD_CMD_MASK_COMMAND & ~NBD_CMD_FLAG_FUA) {
         LOG("unsupported flags (got 0x%x)",
             request->type & ~NBD_CMD_MASK_COMMAND);
-        return -EINVAL;
+        rc = -EINVAL;
+        goto out;
     }

     rc = 0;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v4 02/14] nbd: Add qemu-nbd -D for human-readable description
  2016-06-25 22:15 [Qemu-devel] [PATCH v4 00/14] nbd: efficient write zeroes Eric Blake
  2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 01/14] nbd: Fix bad flag detection on server Eric Blake
@ 2016-06-25 22:15 ` Eric Blake
  2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 03/14] nbd: Limit nbdflags to 16 bits Eric Blake
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Eric Blake @ 2016-06-25 22:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, pbonzini, Kevin Wolf, Max Reitz

The NBD protocol allows servers to advertise a human-readable
description alongside an export name during NBD_OPT_LIST.  Add
an option to pass through the user's string to the NBD client.

Doing this also makes it easier to test commit 200650d4, which
is the client counterpart of receiving the description.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v4: rebase to latest
---
 include/block/nbd.h |  1 +
 nbd/nbd-internal.h  |  5 +++--
 nbd/server.c        | 34 ++++++++++++++++++++++++++--------
 qemu-nbd.c          | 12 +++++++++++-
 qemu-nbd.texi       |  5 ++++-
 5 files changed, 45 insertions(+), 12 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index cb91820..309db2b 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -114,6 +114,7 @@ BlockBackend *nbd_export_get_blockdev(NBDExport *exp);

 NBDExport *nbd_export_find(const char *name);
 void nbd_export_set_name(NBDExport *exp, const char *name);
+void nbd_export_set_description(NBDExport *exp, const char *description);
 void nbd_export_close_all(void);

 void nbd_client_new(NBDExport *exp,
diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index 93a6ca8..7e78064 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -104,9 +104,10 @@ static inline ssize_t read_sync(QIOChannel *ioc, void *buffer, size_t size)
     return nbd_wr_syncv(ioc, &iov, 1, size, true);
 }

-static inline ssize_t write_sync(QIOChannel *ioc, void *buffer, size_t size)
+static inline ssize_t write_sync(QIOChannel *ioc, const void *buffer,
+                                 size_t size)
 {
-    struct iovec iov = { .iov_base = buffer, .iov_len = size };
+    struct iovec iov = { .iov_base = (void *) buffer, .iov_len = size };

     return nbd_wr_syncv(ioc, &iov, 1, size, false);
 }
diff --git a/nbd/server.c b/nbd/server.c
index 724ac12..b875d08 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -61,6 +61,7 @@ struct NBDExport {

     BlockBackend *blk;
     char *name;
+    char *description;
     off_t dev_offset;
     off_t size;
     uint32_t nbdflags;
@@ -128,7 +129,8 @@ static ssize_t nbd_negotiate_read(QIOChannel *ioc, void *buffer, size_t size)

 }

-static ssize_t nbd_negotiate_write(QIOChannel *ioc, void *buffer, size_t size)
+static ssize_t nbd_negotiate_write(QIOChannel *ioc, const void *buffer,
+                                   size_t size)
 {
     ssize_t ret;
     guint watch;
@@ -224,11 +226,15 @@ static int nbd_negotiate_send_rep(QIOChannel *ioc, uint32_t type, uint32_t opt)

 static int nbd_negotiate_send_rep_list(QIOChannel *ioc, NBDExport *exp)
 {
-    uint64_t magic, name_len;
+    uint64_t magic;
+    size_t name_len, desc_len;
     uint32_t opt, type, len;
+    const char *name = exp->name ? exp->name : "";
+    const char *desc = exp->description ? exp->description : "";

-    TRACE("Advertising export name '%s'", exp->name ? exp->name : "");
-    name_len = strlen(exp->name);
+    TRACE("Advertising export name '%s' description '%s'", name, desc);
+    name_len = strlen(name);
+    desc_len = strlen(desc);
     magic = cpu_to_be64(NBD_REP_MAGIC);
     if (nbd_negotiate_write(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
         LOG("write failed (magic)");
@@ -244,18 +250,22 @@ static int nbd_negotiate_send_rep_list(QIOChannel *ioc, NBDExport *exp)
         LOG("write failed (reply type)");
         return -EINVAL;
     }
-    len = cpu_to_be32(name_len + sizeof(len));
+    len = cpu_to_be32(name_len + desc_len + sizeof(len));
     if (nbd_negotiate_write(ioc, &len, sizeof(len)) != sizeof(len)) {
         LOG("write failed (length)");
         return -EINVAL;
     }
     len = cpu_to_be32(name_len);
     if (nbd_negotiate_write(ioc, &len, sizeof(len)) != sizeof(len)) {
-        LOG("write failed (length)");
+        LOG("write failed (name length)");
         return -EINVAL;
     }
-    if (nbd_negotiate_write(ioc, exp->name, name_len) != name_len) {
-        LOG("write failed (buffer)");
+    if (nbd_negotiate_write(ioc, name, name_len) != name_len) {
+        LOG("write failed (name buffer)");
+        return -EINVAL;
+    }
+    if (nbd_negotiate_write(ioc, desc, desc_len) != desc_len) {
+        LOG("write failed (description buffer)");
         return -EINVAL;
     }
     return 0;
@@ -883,6 +893,12 @@ void nbd_export_set_name(NBDExport *exp, const char *name)
     nbd_export_put(exp);
 }

+void nbd_export_set_description(NBDExport *exp, const char *description)
+{
+    g_free(exp->description);
+    exp->description = g_strdup(description);
+}
+
 void nbd_export_close(NBDExport *exp)
 {
     NBDClient *client, *next;
@@ -892,6 +908,7 @@ void nbd_export_close(NBDExport *exp)
         client_close(client);
     }
     nbd_export_set_name(exp, NULL);
+    nbd_export_set_description(exp, NULL);
     nbd_export_put(exp);
 }

@@ -910,6 +927,7 @@ void nbd_export_put(NBDExport *exp)

     if (--exp->refcount == 0) {
         assert(exp->name == NULL);
+        assert(exp->description == NULL);

         if (exp->close) {
             exp->close(exp);
diff --git a/qemu-nbd.c b/qemu-nbd.c
index 9519db3..c229ead 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -80,6 +80,7 @@ static void usage(const char *name)
 "  -t, --persistent          don't exit on the last connection\n"
 "  -v, --verbose             display extra debugging information\n"
 "  -x, --export-name=NAME    expose export by name\n"
+"  -D, --description=TEXT    with -x, also export a human-readable description\n"
 "\n"
 "Exposing part of the image:\n"
 "  -o, --offset=OFFSET       offset into the image\n"
@@ -470,7 +471,7 @@ int main(int argc, char **argv)
     off_t fd_size;
     QemuOpts *sn_opts = NULL;
     const char *sn_id_or_name = NULL;
-    const char *sopt = "hVb:o:p:rsnP:c:dvk:e:f:tl:x:";
+    const char *sopt = "hVb:o:p:rsnP:c:dvk:e:f:tl:x:D:";
     struct option lopt[] = {
         { "help", no_argument, NULL, 'h' },
         { "version", no_argument, NULL, 'V' },
@@ -496,6 +497,7 @@ int main(int argc, char **argv)
         { "verbose", no_argument, NULL, 'v' },
         { "object", required_argument, NULL, QEMU_NBD_OPT_OBJECT },
         { "export-name", required_argument, NULL, 'x' },
+        { "description", required_argument, NULL, 'D' },
         { "tls-creds", required_argument, NULL, QEMU_NBD_OPT_TLSCREDS },
         { "image-opts", no_argument, NULL, QEMU_NBD_OPT_IMAGE_OPTS },
         { NULL, 0, NULL, 0 }
@@ -515,6 +517,7 @@ int main(int argc, char **argv)
     BlockdevDetectZeroesOptions detect_zeroes = BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF;
     QDict *options = NULL;
     const char *export_name = NULL;
+    const char *export_description = NULL;
     const char *tlscredsid = NULL;
     bool imageOpts = false;
     bool writethrough = true;
@@ -675,6 +678,9 @@ int main(int argc, char **argv)
         case 'x':
             export_name = optarg;
             break;
+        case 'D':
+            export_description = optarg;
+            break;
         case 'v':
             verbose = 1;
             break;
@@ -901,7 +907,11 @@ int main(int argc, char **argv)
     }
     if (export_name) {
         nbd_export_set_name(exp, export_name);
+        nbd_export_set_description(exp, export_description);
         newproto = true;
+    } else if (export_description) {
+        error_report("Export description requires an export name");
+        exit(EXIT_FAILURE);
     }

     server_ioc = qio_channel_socket_new();
diff --git a/qemu-nbd.texi b/qemu-nbd.texi
index 9f23343..923de74 100644
--- a/qemu-nbd.texi
+++ b/qemu-nbd.texi
@@ -79,9 +79,12 @@ Disconnect the device @var{dev}
 Allow up to @var{num} clients to share the device (default @samp{1})
 @item -t, --persistent
 Don't exit on the last connection
-@item -x NAME, --export-name=NAME
+@item -x, --export-name=@var{name}
 Set the NBD volume export name. This switches the server to use
 the new style NBD protocol negotiation
+@item -D, --description=@var{description}
+Set the NBD volume export description, as a human-readable
+string. Requires the use of @option{-x}
 @item --tls-creds=ID
 Enable mandatory TLS encryption for the server by setting the ID
 of the TLS credentials object previously created with the --object
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v4 03/14] nbd: Limit nbdflags to 16 bits
  2016-06-25 22:15 [Qemu-devel] [PATCH v4 00/14] nbd: efficient write zeroes Eric Blake
  2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 01/14] nbd: Fix bad flag detection on server Eric Blake
  2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 02/14] nbd: Add qemu-nbd -D for human-readable description Eric Blake
@ 2016-06-25 22:15 ` Eric Blake
  2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 04/14] nbd: Treat flags vs. command type as separate fields Eric Blake
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Eric Blake @ 2016-06-25 22:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, pbonzini, qemu-stable, Kevin Wolf, Max Reitz

Rather than asserting that nbdflags is within range, just give
it the correct type to begin with :)  nbdflags corresponds to
the per-export portion of NBD Protocol "transmission flags", which
is 16 bits in response to NBD_OPT_EXPORT_NAME and NBD_OPT_GO.

Furthermore, upstream NBD has never passed the global flags to
the kernel via ioctl(NBD_SET_FLAGS) (the ioctl was first
introduced in NBD 2.9.22; then a latent bug in NBD 3.1 actually
tried to OR the global flags with the transmission flags, with
the disaster that the addition of NBD_FLAG_NO_ZEROES in 3.9
caused all earlier NBD 3.x clients to treat every export as
read-only; NBD 3.10 and later intentionally clip things to 16
bits to pass only transmission flags).  Qemu should follow suit,
since the current two global flags (NBD_FLAG_FIXED_NEWSTYLE
and NBD_FLAG_NO_ZEROES) have no impact on the kernel's behavior
during transmission.

CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>

---
v4: rebase, cc qemu-stable
v3: expand scope of patch
---
 block/nbd-client.h  |  2 +-
 include/block/nbd.h |  6 +++---
 nbd/client.c        | 28 +++++++++++++++-------------
 nbd/server.c        | 10 ++++------
 qemu-nbd.c          |  4 ++--
 5 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/block/nbd-client.h b/block/nbd-client.h
index fa9817b..044aca4 100644
--- a/block/nbd-client.h
+++ b/block/nbd-client.h
@@ -20,7 +20,7 @@
 typedef struct NbdClientSession {
     QIOChannelSocket *sioc; /* The master data channel */
     QIOChannel *ioc; /* The current I/O channel which may differ (eg TLS) */
-    uint32_t nbdflags;
+    uint16_t nbdflags;
     off_t size;

     CoMutex send_mutex;
diff --git a/include/block/nbd.h b/include/block/nbd.h
index 309db2b..fd7e30b 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -90,11 +90,11 @@ ssize_t nbd_wr_syncv(QIOChannel *ioc,
                      size_t niov,
                      size_t length,
                      bool do_read);
-int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
+int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
                           QCryptoTLSCreds *tlscreds, const char *hostname,
                           QIOChannel **outioc,
                           off_t *size, Error **errp);
-int nbd_init(int fd, QIOChannelSocket *sioc, uint32_t flags, off_t size);
+int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t flags, off_t size);
 ssize_t nbd_send_request(QIOChannel *ioc, struct nbd_request *request);
 ssize_t nbd_receive_reply(QIOChannel *ioc, struct nbd_reply *reply);
 int nbd_client(int fd);
@@ -104,7 +104,7 @@ typedef struct NBDExport NBDExport;
 typedef struct NBDClient NBDClient;

 NBDExport *nbd_export_new(BlockBackend *blk, off_t dev_offset, off_t size,
-                          uint32_t nbdflags, void (*close)(NBDExport *),
+                          uint16_t nbdflags, void (*close)(NBDExport *),
                           Error **errp);
 void nbd_export_close(NBDExport *exp);
 void nbd_export_get(NBDExport *exp);
diff --git a/nbd/client.c b/nbd/client.c
index 78a7195..a92f1e2 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -408,7 +408,7 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
 }


-int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
+int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
                           QCryptoTLSCreds *tlscreds, const char *hostname,
                           QIOChannel **outioc,
                           off_t *size, Error **errp)
@@ -468,7 +468,6 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
         uint32_t opt;
         uint32_t namesize;
         uint16_t globalflags;
-        uint16_t exportflags;
         bool fixedNewStyle = false;

         if (read_sync(ioc, &globalflags, sizeof(globalflags)) !=
@@ -477,7 +476,6 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
             goto fail;
         }
         globalflags = be16_to_cpu(globalflags);
-        *flags = globalflags << 16;
         TRACE("Global flags are %" PRIx32, globalflags);
         if (globalflags & NBD_FLAG_FIXED_NEWSTYLE) {
             fixedNewStyle = true;
@@ -545,17 +543,15 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
             goto fail;
         }
         *size = be64_to_cpu(s);
-        TRACE("Size is %" PRIu64, *size);

-        if (read_sync(ioc, &exportflags, sizeof(exportflags)) !=
-            sizeof(exportflags)) {
+        if (read_sync(ioc, flags, sizeof(*flags)) != sizeof(*flags)) {
             error_setg(errp, "Failed to read export flags");
             goto fail;
         }
-        exportflags = be16_to_cpu(exportflags);
-        *flags |= exportflags;
-        TRACE("Export flags are %" PRIx16, exportflags);
+        be16_to_cpus(flags);
     } else if (magic == NBD_CLIENT_MAGIC) {
+        uint32_t oldflags;
+
         if (name) {
             error_setg(errp, "Server does not support export names");
             goto fail;
@@ -572,16 +568,22 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
         *size = be64_to_cpu(s);
         TRACE("Size is %" PRIu64, *size);

-        if (read_sync(ioc, flags, sizeof(*flags)) != sizeof(*flags)) {
+        if (read_sync(ioc, &oldflags, sizeof(oldflags)) != sizeof(oldflags)) {
             error_setg(errp, "Failed to read export flags");
             goto fail;
         }
-        *flags = be32_to_cpu(*flags);
+        be32_to_cpus(&oldflags);
+        if (oldflags & ~0xffff) {
+            error_setg(errp, "Unexpected export flags %0x" PRIx32, oldflags);
+            goto fail;
+        }
+        *flags = oldflags;
     } else {
         error_setg(errp, "Bad magic received");
         goto fail;
     }

+    TRACE("Size is %" PRIu64 ", export flags %" PRIx16, *size, *flags);
     if (read_sync(ioc, &buf, 124) != 124) {
         error_setg(errp, "Failed to read reserved block");
         goto fail;
@@ -593,7 +595,7 @@ fail:
 }

 #ifdef __linux__
-int nbd_init(int fd, QIOChannelSocket *sioc, uint32_t flags, off_t size)
+int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t flags, off_t size)
 {
     unsigned long sectors = size / BDRV_SECTOR_SIZE;
     if (size / BDRV_SECTOR_SIZE != sectors) {
@@ -689,7 +691,7 @@ int nbd_disconnect(int fd)
 }

 #else
-int nbd_init(int fd, QIOChannelSocket *ioc, uint32_t flags, off_t size)
+int nbd_init(int fd, QIOChannelSocket *ioc, uint16_t flags, off_t size)
 {
     return -ENOTSUP;
 }
diff --git a/nbd/server.c b/nbd/server.c
index b875d08..20c1b77 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -64,7 +64,7 @@ struct NBDExport {
     char *description;
     off_t dev_offset;
     off_t size;
-    uint32_t nbdflags;
+    uint16_t nbdflags;
     QTAILQ_HEAD(, NBDClient) clients;
     QTAILQ_ENTRY(NBDExport) next;

@@ -554,8 +554,8 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
     NBDClient *client = data->client;
     char buf[8 + 8 + 8 + 128];
     int rc;
-    const int myflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
-                         NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA);
+    const uint16_t myflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
+                              NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA);
     bool oldStyle;

     /* Old style negotiation header without options
@@ -585,7 +585,6 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)

     oldStyle = client->exp != NULL && !client->tlscreds;
     if (oldStyle) {
-        assert ((client->exp->nbdflags & ~65535) == 0);
         TRACE("advertising size %" PRIu64 " and flags %x",
               client->exp->size, client->exp->nbdflags | myflags);
         stq_be_p(buf + 8, NBD_CLIENT_MAGIC);
@@ -616,7 +615,6 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
             goto fail;
         }

-        assert ((client->exp->nbdflags & ~65535) == 0);
         TRACE("advertising size %" PRIu64 " and flags %x",
               client->exp->size, client->exp->nbdflags | myflags);
         stq_be_p(buf + 18, client->exp->size);
@@ -820,7 +818,7 @@ static void nbd_eject_notifier(Notifier *n, void *data)
 }

 NBDExport *nbd_export_new(BlockBackend *blk, off_t dev_offset, off_t size,
-                          uint32_t nbdflags, void (*close)(NBDExport *),
+                          uint16_t nbdflags, void (*close)(NBDExport *),
                           Error **errp)
 {
     NBDExport *exp = g_malloc0(sizeof(NBDExport));
diff --git a/qemu-nbd.c b/qemu-nbd.c
index c229ead..dac3edc 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -248,7 +248,7 @@ static void *nbd_client_thread(void *arg)
 {
     char *device = arg;
     off_t size;
-    uint32_t nbdflags;
+    uint16_t nbdflags;
     QIOChannelSocket *sioc;
     int fd;
     int ret;
@@ -462,7 +462,7 @@ int main(int argc, char **argv)
     BlockBackend *blk;
     BlockDriverState *bs;
     off_t dev_offset = 0;
-    uint32_t nbdflags = 0;
+    uint16_t nbdflags = 0;
     bool disconnect = false;
     const char *bindto = "0.0.0.0";
     const char *port = NULL;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v4 04/14] nbd: Treat flags vs. command type as separate fields
  2016-06-25 22:15 [Qemu-devel] [PATCH v4 00/14] nbd: efficient write zeroes Eric Blake
                   ` (2 preceding siblings ...)
  2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 03/14] nbd: Limit nbdflags to 16 bits Eric Blake
@ 2016-06-25 22:15 ` Eric Blake
  2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 05/14] nbd: Share common reply-sending code in server Eric Blake
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Eric Blake @ 2016-06-25 22:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, pbonzini, Kevin Wolf, Max Reitz

Current upstream NBD documents that requests have a 16-bit flags,
followed by a 16-bit type integer; although older versions mentioned
only a 32-bit field with masking to find flags.  Since the protocol
is in network order (big-endian over the wire), the ABI is unchanged;
but dealing with the flags as a separate field rather than masking
will make it easier to add support for upcoming NBD extensions that
increase the number of both flags and commands.

Improve some comments in nbd.h based on the current upstream
NBD protocol (https://github.com/yoe/nbd/blob/master/doc/proto.md),
and touch some nearby code to keep checkpatch.pl happy.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v4: rebase to earlier changes
v3: rebase to other changes earlier in series
---
 include/block/nbd.h | 18 ++++++++++++------
 nbd/nbd-internal.h  |  4 ++--
 block/nbd-client.c  |  9 +++------
 nbd/client.c        |  9 ++++++---
 nbd/server.c        | 39 +++++++++++++++++++--------------------
 5 files changed, 42 insertions(+), 37 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index fd7e30b..6904eb4 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -1,4 +1,5 @@
 /*
+ *  Copyright (C) 2016 Red Hat, Inc.
  *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
  *
  *  Network Block Device
@@ -32,7 +33,8 @@ struct nbd_request {
     uint64_t handle;
     uint64_t from;
     uint32_t len;
-    uint32_t type;
+    uint16_t flags;
+    uint16_t type;
 };

 struct nbd_reply {
@@ -40,6 +42,8 @@ struct nbd_reply {
     uint32_t error;
 };

+/* Transmission (export) flags: sent from server to client during handshake,
+   but describe what will happen during transmission */
 #define NBD_FLAG_HAS_FLAGS      (1 << 0)        /* Flags are there */
 #define NBD_FLAG_READ_ONLY      (1 << 1)        /* Device is read-only */
 #define NBD_FLAG_SEND_FLUSH     (1 << 2)        /* Send FLUSH */
@@ -47,10 +51,12 @@ struct nbd_reply {
 #define NBD_FLAG_ROTATIONAL     (1 << 4)        /* Use elevator algorithm - rotational media */
 #define NBD_FLAG_SEND_TRIM      (1 << 5)        /* Send TRIM (discard) */

-/* New-style global flags. */
+/* New-style handshake (global) flags, sent from server to client, and
+   control what will happen during handshake phase. */
 #define NBD_FLAG_FIXED_NEWSTYLE     (1 << 0)    /* Fixed newstyle protocol. */

-/* New-style client flags. */
+/* New-style client flags, sent from client to server to control what happens
+   during handshake phase. */
 #define NBD_FLAG_C_FIXED_NEWSTYLE   (1 << 0)    /* Fixed newstyle protocol. */

 /* Reply types. */
@@ -61,10 +67,10 @@ struct nbd_reply {
 #define NBD_REP_ERR_INVALID     ((UINT32_C(1) << 31) | 3) /* Invalid length. */
 #define NBD_REP_ERR_TLS_REQD    ((UINT32_C(1) << 31) | 5) /* TLS required */

+/* Request flags, sent from client to server during transmission phase */
+#define NBD_CMD_FLAG_FUA        (1 << 0)

-#define NBD_CMD_MASK_COMMAND	0x0000ffff
-#define NBD_CMD_FLAG_FUA	(1 << 16)
-
+/* Supported request types */
 enum {
     NBD_CMD_READ = 0,
     NBD_CMD_WRITE = 1,
diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index 7e78064..99e5157 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -53,10 +53,10 @@
 /* This is all part of the "official" NBD API.
  *
  * The most up-to-date documentation is available at:
- * https://github.com/yoe/nbd/blob/master/doc/proto.txt
+ * https://github.com/yoe/nbd/blob/master/doc/proto.md
  */

-#define NBD_REQUEST_SIZE        (4 + 4 + 8 + 8 + 4)
+#define NBD_REQUEST_SIZE        (4 + 2 + 2 + 8 + 8 + 4)
 #define NBD_REPLY_SIZE          (4 + 4 + 8)
 #define NBD_REQUEST_MAGIC       0x25609513
 #define NBD_REPLY_MAGIC         0x67446698
diff --git a/block/nbd-client.c b/block/nbd-client.c
index 647dedd..88e52c1 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -1,6 +1,7 @@
 /*
  * QEMU Block driver for  NBD
  *
+ * Copyright (C) 2016 Red Hat, Inc.
  * Copyright (C) 2008 Bull S.A.S.
  *     Author: Laurent Vivier <Laurent.Vivier@bull.net>
  *
@@ -258,7 +259,7 @@ int nbd_client_co_pwritev(BlockDriverState *bs, uint64_t offset,

     if (flags & BDRV_REQ_FUA) {
         assert(client->nbdflags & NBD_FLAG_SEND_FUA);
-        request.type |= NBD_CMD_FLAG_FUA;
+        request.flags |= NBD_CMD_FLAG_FUA;
     }

     assert(bytes <= NBD_MAX_BUFFER_SIZE);
@@ -343,11 +344,7 @@ void nbd_client_attach_aio_context(BlockDriverState *bs,
 void nbd_client_close(BlockDriverState *bs)
 {
     NbdClientSession *client = nbd_get_client_session(bs);
-    struct nbd_request request = {
-        .type = NBD_CMD_DISC,
-        .from = 0,
-        .len = 0
-    };
+    struct nbd_request request = { .type = NBD_CMD_DISC };

     if (client->ioc == NULL) {
         return;
diff --git a/nbd/client.c b/nbd/client.c
index a92f1e2..7c172ed 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -1,4 +1,5 @@
 /*
+ *  Copyright (C) 2016 Red Hat, Inc.
  *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
  *
  *  Network Block Device Client Side
@@ -713,11 +714,13 @@ ssize_t nbd_send_request(QIOChannel *ioc, struct nbd_request *request)

     TRACE("Sending request to server: "
           "{ .from = %" PRIu64", .len = %" PRIu32 ", .handle = %" PRIu64
-          ", .type=%" PRIu32 " }",
-          request->from, request->len, request->handle, request->type);
+          ", .flags = %" PRIx16 ", .type = %" PRIu16 " }",
+          request->from, request->len, request->handle,
+          request->flags, request->type);

     stl_be_p(buf, NBD_REQUEST_MAGIC);
-    stl_be_p(buf + 4, request->type);
+    stw_be_p(buf + 4, request->flags);
+    stw_be_p(buf + 6, request->type);
     stq_be_p(buf + 8, request->handle);
     stq_be_p(buf + 16, request->from);
     stl_be_p(buf + 24, request->len);
diff --git a/nbd/server.c b/nbd/server.c
index 20c1b77..ea0c412 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1,4 +1,5 @@
 /*
+ *  Copyright (C) 2016 Red Hat, Inc.
  *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
  *
  *  Network Block Device Server Side
@@ -650,21 +651,23 @@ static ssize_t nbd_receive_request(QIOChannel *ioc, struct nbd_request *request)

     /* Request
        [ 0 ..  3]   magic   (NBD_REQUEST_MAGIC)
-       [ 4 ..  7]   type    (0 == READ, 1 == WRITE)
+       [ 4 ..  5]   flags   (NBD_CMD_FLAG_FUA, ...)
+       [ 6 ..  7]   type    (NBD_CMD_READ, ...)
        [ 8 .. 15]   handle
        [16 .. 23]   from
        [24 .. 27]   len
      */

     magic = ldl_be_p(buf);
-    request->type   = ldl_be_p(buf + 4);
+    request->flags  = lduw_be_p(buf + 4);
+    request->type   = lduw_be_p(buf + 6);
     request->handle = ldq_be_p(buf + 8);
     request->from   = ldq_be_p(buf + 16);
     request->len    = ldl_be_p(buf + 24);

-    TRACE("Got request: { magic = 0x%" PRIx32 ", .type = %" PRIx32
-          ", from = %" PRIu64 " , len = %" PRIu32 " }",
-          magic, request->type, request->from, request->len);
+    TRACE("Got request: { magic = 0x%" PRIx32 ", .flags = %" PRIx16
+          ", .type = %" PRIx16 ", from = %" PRIu64 ", len = %" PRIu32 " }",
+          magic, request->flags, request->type, request->from, request->len);

     if (magic != NBD_REQUEST_MAGIC) {
         LOG("invalid magic (got 0x%" PRIx32 ")", magic);
@@ -997,7 +1000,6 @@ static ssize_t nbd_co_receive_request(NBDRequest *req,
                                       struct nbd_request *request)
 {
     NBDClient *client = req->client;
-    uint32_t command;
     ssize_t rc;

     g_assert(qemu_in_coroutine());
@@ -1014,13 +1016,12 @@ static ssize_t nbd_co_receive_request(NBDRequest *req,

     TRACE("Decoding type");

-    command = request->type & NBD_CMD_MASK_COMMAND;
-    if (command != NBD_CMD_WRITE) {
+    if (request->type != NBD_CMD_WRITE) {
         /* No payload, we are ready to read the next request.  */
         req->complete = true;
     }

-    if (command == NBD_CMD_DISC) {
+    if (request->type == NBD_CMD_DISC) {
         /* Special case: we're going to disconnect without a reply,
          * whether or not flags, from, or len are bogus */
         TRACE("Request type is DISCONNECT");
@@ -1037,7 +1038,7 @@ static ssize_t nbd_co_receive_request(NBDRequest *req,
         goto out;
     }

-    if (command == NBD_CMD_READ || command == NBD_CMD_WRITE) {
+    if (request->type == NBD_CMD_READ || request->type == NBD_CMD_WRITE) {
         if (request->len > NBD_MAX_BUFFER_SIZE) {
             LOG("len (%" PRIu32" ) is larger than max len (%u)",
                 request->len, NBD_MAX_BUFFER_SIZE);
@@ -1051,7 +1052,7 @@ static ssize_t nbd_co_receive_request(NBDRequest *req,
             goto out;
         }
     }
-    if (command == NBD_CMD_WRITE) {
+    if (request->type == NBD_CMD_WRITE) {
         TRACE("Reading %" PRIu32 " byte(s)", request->len);

         if (read_sync(client->ioc, req->data, request->len) != request->len) {
@@ -1067,12 +1068,11 @@ static ssize_t nbd_co_receive_request(NBDRequest *req,
         LOG("operation past EOF; From: %" PRIu64 ", Len: %" PRIu32
             ", Size: %" PRIu64, request->from, request->len,
             (uint64_t)client->exp->size);
-        rc = command == NBD_CMD_WRITE ? -ENOSPC : -EINVAL;
+        rc = request->type == NBD_CMD_WRITE ? -ENOSPC : -EINVAL;
         goto out;
     }
-    if (request->type & ~NBD_CMD_MASK_COMMAND & ~NBD_CMD_FLAG_FUA) {
-        LOG("unsupported flags (got 0x%x)",
-            request->type & ~NBD_CMD_MASK_COMMAND);
+    if (request->flags & ~NBD_CMD_FLAG_FUA) {
+        LOG("unsupported flags (got 0x%x)", request->flags);
         rc = -EINVAL;
         goto out;
     }
@@ -1094,7 +1094,6 @@ static void nbd_trip(void *opaque)
     struct nbd_request request;
     struct nbd_reply reply;
     ssize_t ret;
-    uint32_t command;
     int flags;

     TRACE("Reading request.");
@@ -1118,7 +1117,6 @@ static void nbd_trip(void *opaque)
         reply.error = -ret;
         goto error_reply;
     }
-    command = request.type & NBD_CMD_MASK_COMMAND;

     if (client->closing) {
         /*
@@ -1128,11 +1126,12 @@ static void nbd_trip(void *opaque)
         goto done;
     }

-    switch (command) {
+    switch (request.type) {
     case NBD_CMD_READ:
         TRACE("Request type is READ");

-        if (request.type & NBD_CMD_FLAG_FUA) {
+        /* XXX: NBD Protocol only documents use of FUA with WRITE */
+        if (request.flags & NBD_CMD_FLAG_FUA) {
             ret = blk_co_flush(exp->blk);
             if (ret < 0) {
                 LOG("flush failed");
@@ -1165,7 +1164,7 @@ static void nbd_trip(void *opaque)
         TRACE("Writing to device");

         flags = 0;
-        if (request.type & NBD_CMD_FLAG_FUA) {
+        if (request.flags & NBD_CMD_FLAG_FUA) {
             flags |= BDRV_REQ_FUA;
         }
         ret = blk_pwrite(exp->blk, request.from + exp->dev_offset,
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v4 05/14] nbd: Share common reply-sending code in server
  2016-06-25 22:15 [Qemu-devel] [PATCH v4 00/14] nbd: efficient write zeroes Eric Blake
                   ` (3 preceding siblings ...)
  2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 04/14] nbd: Treat flags vs. command type as separate fields Eric Blake
@ 2016-06-25 22:15 ` Eric Blake
  2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 06/14] nbd: Send message along with server NBD_REP_ERR errors Eric Blake
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Eric Blake @ 2016-06-25 22:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, pbonzini

Rather than open-coding NBD_REP_SERVER, reuse the code we
already have by adding a length parameter.  Additionally,
the refactoring will make adding NBD_OPT_GO in a later patch
easier.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v4: no change
v3: rebase to changes earlier in series
---
 nbd/server.c | 48 +++++++++++++++++++++++-------------------------
 1 file changed, 23 insertions(+), 25 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index ea0c412..a9bcecd 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -195,12 +195,15 @@ static ssize_t nbd_negotiate_drop_sync(QIOChannel *ioc, size_t size)

 */

-static int nbd_negotiate_send_rep(QIOChannel *ioc, uint32_t type, uint32_t opt)
+/* Send a reply header, including length, but no payload.
+ * Return -errno to kill connection, 0 to continue negotiation. */
+static int nbd_negotiate_send_rep_len(QIOChannel *ioc, uint32_t type,
+                                      uint32_t opt, uint32_t len)
 {
     uint64_t magic;
-    uint32_t len;

-    TRACE("Reply opt=%" PRIx32 " type=%" PRIx32, type, opt);
+    TRACE("Reply opt=%" PRIx32 " type=%" PRIx32 " len=%" PRIu32,
+          type, opt, len);

     magic = cpu_to_be64(NBD_REP_MAGIC);
     if (nbd_negotiate_write(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
@@ -217,7 +220,7 @@ static int nbd_negotiate_send_rep(QIOChannel *ioc, uint32_t type, uint32_t opt)
         LOG("write failed (rep type)");
         return -EINVAL;
     }
-    len = cpu_to_be32(0);
+    len = cpu_to_be32(len);
     if (nbd_negotiate_write(ioc, &len, sizeof(len)) != sizeof(len)) {
         LOG("write failed (rep data length)");
         return -EINVAL;
@@ -225,37 +228,32 @@ static int nbd_negotiate_send_rep(QIOChannel *ioc, uint32_t type, uint32_t opt)
     return 0;
 }

+/* Send a reply header with default 0 length.
+ * Return -errno to kill connection, 0 to continue negotiation. */
+static int nbd_negotiate_send_rep(QIOChannel *ioc, uint32_t type, uint32_t opt)
+{
+    return nbd_negotiate_send_rep_len(ioc, type, opt, 0);
+}
+
+/* Send an NBD_REP_SERVER reply to NBD_OPT_LIST, including payload.
+ * Return -errno to kill connection, 0 to continue negotiation. */
 static int nbd_negotiate_send_rep_list(QIOChannel *ioc, NBDExport *exp)
 {
-    uint64_t magic;
     size_t name_len, desc_len;
-    uint32_t opt, type, len;
+    uint32_t len;
     const char *name = exp->name ? exp->name : "";
     const char *desc = exp->description ? exp->description : "";
+    int rc;

     TRACE("Advertising export name '%s' description '%s'", name, desc);
     name_len = strlen(name);
     desc_len = strlen(desc);
-    magic = cpu_to_be64(NBD_REP_MAGIC);
-    if (nbd_negotiate_write(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
-        LOG("write failed (magic)");
-        return -EINVAL;
-     }
-    opt = cpu_to_be32(NBD_OPT_LIST);
-    if (nbd_negotiate_write(ioc, &opt, sizeof(opt)) != sizeof(opt)) {
-        LOG("write failed (opt)");
-        return -EINVAL;
-    }
-    type = cpu_to_be32(NBD_REP_SERVER);
-    if (nbd_negotiate_write(ioc, &type, sizeof(type)) != sizeof(type)) {
-        LOG("write failed (reply type)");
-        return -EINVAL;
-    }
-    len = cpu_to_be32(name_len + desc_len + sizeof(len));
-    if (nbd_negotiate_write(ioc, &len, sizeof(len)) != sizeof(len)) {
-        LOG("write failed (length)");
-        return -EINVAL;
+    len = name_len + desc_len + sizeof(len);
+    rc = nbd_negotiate_send_rep_len(ioc, NBD_REP_SERVER, NBD_OPT_LIST, len);
+    if (rc < 0) {
+        return rc;
     }
+
     len = cpu_to_be32(name_len);
     if (nbd_negotiate_write(ioc, &len, sizeof(len)) != sizeof(len)) {
         LOG("write failed (name length)");
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v4 06/14] nbd: Send message along with server NBD_REP_ERR errors
  2016-06-25 22:15 [Qemu-devel] [PATCH v4 00/14] nbd: efficient write zeroes Eric Blake
                   ` (4 preceding siblings ...)
  2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 05/14] nbd: Share common reply-sending code in server Eric Blake
@ 2016-06-25 22:15 ` Eric Blake
  2016-06-27 12:02   ` Paolo Bonzini
  2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 07/14] nbd: Share common option-sending code in client Eric Blake
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 20+ messages in thread
From: Eric Blake @ 2016-06-25 22:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, pbonzini

The NBD Protocol allows us to send human-readable messages
along with any NBD_REP_ERR error during option negotiation;
make use of this fact for clients that know what to do with
our message.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v4: new patch
---
 nbd/server.c | 74 ++++++++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 55 insertions(+), 19 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index a9bcecd..99faa36 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -235,6 +235,34 @@ static int nbd_negotiate_send_rep(QIOChannel *ioc, uint32_t type, uint32_t opt)
     return nbd_negotiate_send_rep_len(ioc, type, opt, 0);
 }

+/* Send an error reply.
+ * Return -errno to kill connection, 0 to continue negotiation. */
+static int GCC_FMT_ATTR(4, 5)
+    nbd_negotiate_send_rep_err(QIOChannel *ioc, uint32_t type,
+                               uint32_t opt, const char *fmt, ...)
+{
+    va_list va;
+    char *msg;
+    int ret;
+    size_t len;
+
+    va_start(va, fmt);
+    msg = g_strdup_vprintf(fmt, va);
+    va_end(va);
+    len = strlen(msg);
+    assert(len < 4096);
+    TRACE("sending error message \"%s\"", msg);
+    ret = nbd_negotiate_send_rep_len(ioc, type, opt, len);
+    if (ret < 0) {
+        return ret;
+    }
+    if (nbd_negotiate_write(ioc, msg, len) != len) {
+        LOG("write failed (error message)");
+        return -EINVAL;
+    }
+    return 0;
+}
+
 /* Send an NBD_REP_SERVER reply to NBD_OPT_LIST, including payload.
  * Return -errno to kill connection, 0 to continue negotiation. */
 static int nbd_negotiate_send_rep_list(QIOChannel *ioc, NBDExport *exp)
@@ -278,8 +306,9 @@ static int nbd_negotiate_handle_list(NBDClient *client, uint32_t length)
         if (nbd_negotiate_drop_sync(client->ioc, length) != length) {
             return -EIO;
         }
-        return nbd_negotiate_send_rep(client->ioc,
-                                      NBD_REP_ERR_INVALID, NBD_OPT_LIST);
+        return nbd_negotiate_send_rep_err(client->ioc,
+                                          NBD_REP_ERR_INVALID, NBD_OPT_LIST,
+                                          "OPT_LIST should not have length");
     }

     /* For each export, send a NBD_REP_SERVER reply. */
@@ -326,7 +355,8 @@ fail:
     return rc;
 }

-
+/* Handle NBD_OPT_STARTTLS. Return NULL to drop connection, or else the
+ * new channel for all further (now-encrypted) communication. */
 static QIOChannel *nbd_negotiate_handle_starttls(NBDClient *client,
                                                  uint32_t length)
 {
@@ -340,7 +370,8 @@ static QIOChannel *nbd_negotiate_handle_starttls(NBDClient *client,
         if (nbd_negotiate_drop_sync(ioc, length) != length) {
             return NULL;
         }
-        nbd_negotiate_send_rep(ioc, NBD_REP_ERR_INVALID, NBD_OPT_STARTTLS);
+        nbd_negotiate_send_rep_err(ioc, NBD_REP_ERR_INVALID, NBD_OPT_STARTTLS,
+                                   "OPT_STARTTLS should not have length");
         return NULL;
     }

@@ -468,13 +499,15 @@ static int nbd_negotiate_options(NBDClient *client)
                 return -EINVAL;

             default:
-                TRACE("Option 0x%" PRIx32 " not permitted before TLS",
-                      clientflags);
                 if (nbd_negotiate_drop_sync(client->ioc, length) != length) {
                     return -EIO;
                 }
-                ret = nbd_negotiate_send_rep(client->ioc, NBD_REP_ERR_TLS_REQD,
-                                             clientflags);
+                ret = nbd_negotiate_send_rep_err(client->ioc,
+                                                 NBD_REP_ERR_TLS_REQD,
+                                                 clientflags,
+                                                 "Option 0x%" PRIx32
+                                                 "not permitted before TLS",
+                                                 clientflags);
                 if (ret < 0) {
                     return ret;
                 }
@@ -500,27 +533,30 @@ static int nbd_negotiate_options(NBDClient *client)
                     return -EIO;
                 }
                 if (client->tlscreds) {
-                    TRACE("TLS already enabled");
-                    ret = nbd_negotiate_send_rep(client->ioc,
-                                                 NBD_REP_ERR_INVALID,
-                                                 clientflags);
+                    ret = nbd_negotiate_send_rep_err(client->ioc,
+                                                     NBD_REP_ERR_INVALID,
+                                                     clientflags,
+                                                     "TLS already enabled");
                 } else {
-                    TRACE("TLS not configured");
-                    ret = nbd_negotiate_send_rep(client->ioc,
-                                                 NBD_REP_ERR_POLICY,
-                                                 clientflags);
+                    ret = nbd_negotiate_send_rep_err(client->ioc,
+                                                     NBD_REP_ERR_POLICY,
+                                                     clientflags,
+                                                     "TLS not configured");
                 }
                 if (ret < 0) {
                     return ret;
                 }
                 break;
             default:
-                TRACE("Unsupported option 0x%" PRIx32, clientflags);
                 if (nbd_negotiate_drop_sync(client->ioc, length) != length) {
                     return -EIO;
                 }
-                ret = nbd_negotiate_send_rep(client->ioc, NBD_REP_ERR_UNSUP,
-                                             clientflags);
+                ret = nbd_negotiate_send_rep_err(client->ioc,
+                                                 NBD_REP_ERR_UNSUP,
+                                                 clientflags,
+                                                 "Unsupported option 0x%"
+                                                 PRIx32,
+                                                 clientflags);
                 if (ret < 0) {
                     return ret;
                 }
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v4 07/14] nbd: Share common option-sending code in client
  2016-06-25 22:15 [Qemu-devel] [PATCH v4 00/14] nbd: efficient write zeroes Eric Blake
                   ` (5 preceding siblings ...)
  2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 06/14] nbd: Send message along with server NBD_REP_ERR errors Eric Blake
@ 2016-06-25 22:15 ` Eric Blake
  2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 08/14] nbd: Let server know when client gives up negotiation Eric Blake
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Eric Blake @ 2016-06-25 22:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, pbonzini, Kevin Wolf, Max Reitz

Rather than open-coding each option request, it's easier to
have common helper functions do the work.  That in turn requires
having convenient packed types for handling option requests
and replies.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v4: rebase
v3: rebase, tweak a debug message
---
 include/block/nbd.h |  25 +++++-
 nbd/nbd-internal.h  |   2 +-
 nbd/client.c        | 250 ++++++++++++++++++++++------------------------------
 3 files changed, 126 insertions(+), 151 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 6904eb4..7c5103f 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -26,15 +26,34 @@
 #include "io/channel-socket.h"
 #include "crypto/tlscreds.h"

-/* Note: these are _NOT_ the same as the network representation of an NBD
+/* Handshake phase structs - this struct is passed on the wire */
+
+struct nbd_option {
+    uint64_t magic; /* NBD_OPTS_MAGIC */
+    uint32_t option; /* NBD_OPT_* */
+    uint32_t length;
+} QEMU_PACKED;
+typedef struct nbd_option nbd_option;
+
+struct nbd_opt_reply {
+    uint64_t magic; /* NBD_REP_MAGIC */
+    uint32_t option; /* NBD_OPT_* */
+    uint32_t type; /* NBD_REP_* */
+    uint32_t length;
+} QEMU_PACKED;
+typedef struct nbd_opt_reply nbd_opt_reply;
+
+/* Transmission phase structs
+ *
+ * Note: these are _NOT_ the same as the network representation of an NBD
  * request and reply!
  */
 struct nbd_request {
     uint64_t handle;
     uint64_t from;
     uint32_t len;
-    uint16_t flags;
-    uint16_t type;
+    uint16_t flags; /* NBD_CMD_FLAG_* */
+    uint16_t type; /* NBD_CMD_* */
 };

 struct nbd_reply {
diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index 99e5157..dd57e18 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -62,7 +62,7 @@
 #define NBD_REPLY_MAGIC         0x67446698
 #define NBD_OPTS_MAGIC          0x49484156454F5054LL
 #define NBD_CLIENT_MAGIC        0x0000420281861253LL
-#define NBD_REP_MAGIC           0x3e889045565a9LL
+#define NBD_REP_MAGIC           0x0003e889045565a9LL

 #define NBD_SET_SOCK            _IO(0xab, 0)
 #define NBD_SET_BLKSIZE         _IO(0xab, 1)
diff --git a/nbd/client.c b/nbd/client.c
index 7c172ed..6e5288a 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -75,64 +75,123 @@ static QTAILQ_HEAD(, NBDExport) exports = QTAILQ_HEAD_INITIALIZER(exports);

 */

+/* Send an option request. Return 0 if successful, -1 with errp set if
+ * it is impossible to continue. */
+static int nbd_send_option_request(QIOChannel *ioc, uint32_t opt,
+                                   uint32_t len, const char *data,
+                                   Error **errp)
+{
+    nbd_option req;
+    QEMU_BUILD_BUG_ON(sizeof(req) != 16);

-/* If type represents success, return 1 without further action.
- * If type represents an error reply, consume the rest of the packet on ioc.
- * Then return 0 for unsupported (so the client can fall back to
- * other approaches), or -1 with errp set for other errors.
+    if (len == -1) {
+        req.length = len = strlen(data);
+    }
+    TRACE("Sending option request %"PRIu32", len %"PRIu32, opt, len);
+
+    stq_be_p(&req.magic, NBD_OPTS_MAGIC);
+    stl_be_p(&req.option, opt);
+    stl_be_p(&req.length, len);
+
+    if (write_sync(ioc, &req, sizeof(req)) != sizeof(req)) {
+        error_setg(errp, "Failed to send option request header");
+        return -1;
+    }
+
+    if (len && write_sync(ioc, (char *) data, len) != len) {
+        error_setg(errp, "Failed to send option request data");
+        return -1;
+    }
+
+    return 0;
+}
+
+/* Receive the header of an option reply, which should match the given
+ * opt.  Read through the length field, but NOT the length bytes of
+ * payload. Return 0 if successful, -1 with errp set if it is
+ * impossible to continue. */
+static int nbd_receive_option_reply(QIOChannel *ioc, uint32_t opt,
+                                    nbd_opt_reply *reply, Error **errp)
+{
+    QEMU_BUILD_BUG_ON(sizeof(*reply) != 20);
+    if (read_sync(ioc, reply, sizeof(*reply)) != sizeof(*reply)) {
+        error_setg(errp, "failed to read option reply");
+        return -1;
+    }
+    be64_to_cpus(&reply->magic);
+    be32_to_cpus(&reply->option);
+    be32_to_cpus(&reply->type);
+    be32_to_cpus(&reply->length);
+
+    TRACE("Received option reply %"PRIx32", type %"PRIx32", len %"PRIu32,
+          reply->option, reply->type, reply->length);
+
+    if (reply->magic != NBD_REP_MAGIC) {
+        error_setg(errp, "Unexpected option reply magic");
+        return -1;
+    }
+    if (reply->option != opt) {
+        error_setg(errp, "Unexpected option type %x expected %x",
+                   reply->option, opt);
+        return -1;
+    }
+    return 0;
+}
+
+/* If reply represents success, return 1 without further action.
+ * If reply represents an error, consume the optional payload of
+ * the packet on ioc.  Then return 0 for unsupported (so the client
+ * can fall back to other approaches), or -1 with errp set for other
+ * errors.
  */
-static int nbd_handle_reply_err(QIOChannel *ioc, uint32_t opt, uint32_t type,
+static int nbd_handle_reply_err(QIOChannel *ioc, nbd_opt_reply *reply,
                                 Error **errp)
 {
-    uint32_t len;
     char *msg = NULL;
     int result = -1;

-    if (!(type & (1 << 31))) {
+    if (!(reply->type & (1 << 31))) {
         return 1;
     }

-    if (read_sync(ioc, &len, sizeof(len)) != sizeof(len)) {
-        error_setg(errp, "failed to read option length");
-        return -1;
-    }
-    len = be32_to_cpu(len);
-    if (len) {
-        if (len > NBD_MAX_BUFFER_SIZE) {
+    if (reply->length) {
+        if (reply->length > NBD_MAX_BUFFER_SIZE) {
             error_setg(errp, "server's error message is too long");
             goto cleanup;
         }
-        msg = g_malloc(len + 1);
-        if (read_sync(ioc, msg, len) != len) {
+        msg = g_malloc(reply->length + 1);
+        if (read_sync(ioc, msg, reply->length) != reply->length) {
             error_setg(errp, "failed to read option error message");
             goto cleanup;
         }
-        msg[len] = '\0';
+        msg[reply->length] = '\0';
     }

-    switch (type) {
+    switch (reply->type) {
     case NBD_REP_ERR_UNSUP:
         TRACE("server doesn't understand request %" PRIx32
-              ", attempting fallback", opt);
+              ", attempting fallback", reply->option);
         result = 0;
         goto cleanup;

     case NBD_REP_ERR_POLICY:
-        error_setg(errp, "Denied by server for option %" PRIx32, opt);
+        error_setg(errp, "Denied by server for option %" PRIx32,
+                   reply->option);
         break;

     case NBD_REP_ERR_INVALID:
-        error_setg(errp, "Invalid data length for option %" PRIx32, opt);
+        error_setg(errp, "Invalid data length for option %" PRIx32,
+                   reply->option);
         break;

     case NBD_REP_ERR_TLS_REQD:
         error_setg(errp, "TLS negotiation required before option %" PRIx32,
-                   opt);
+                   reply->option);
         break;

     default:
         error_setg(errp, "Unknown error code when asking for option %" PRIx32,
-                   opt);
+                   reply->option);
         break;
     }

@@ -147,58 +206,29 @@ static int nbd_handle_reply_err(QIOChannel *ioc, uint32_t opt, uint32_t type,

 static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
 {
-    uint64_t magic;
-    uint32_t opt;
-    uint32_t type;
+    nbd_opt_reply reply;
     uint32_t len;
     uint32_t namelen;
     int error;

     *name = NULL;
-    if (read_sync(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
-        error_setg(errp, "failed to read list option magic");
+    if (nbd_receive_option_reply(ioc, NBD_OPT_LIST, &reply, errp) < 0) {
         return -1;
     }
-    magic = be64_to_cpu(magic);
-    if (magic != NBD_REP_MAGIC) {
-        error_setg(errp, "Unexpected option list magic");
-        return -1;
-    }
-    if (read_sync(ioc, &opt, sizeof(opt)) != sizeof(opt)) {
-        error_setg(errp, "failed to read list option");
-        return -1;
-    }
-    opt = be32_to_cpu(opt);
-    if (opt != NBD_OPT_LIST) {
-        error_setg(errp, "Unexpected option type %" PRIx32 " expected %x",
-                   opt, NBD_OPT_LIST);
-        return -1;
-    }
-
-    if (read_sync(ioc, &type, sizeof(type)) != sizeof(type)) {
-        error_setg(errp, "failed to read list option type");
-        return -1;
-    }
-    type = be32_to_cpu(type);
-    error = nbd_handle_reply_err(ioc, opt, type, errp);
+    error = nbd_handle_reply_err(ioc, &reply, errp);
     if (error <= 0) {
         return error;
     }
+    len = reply.length;

-    if (read_sync(ioc, &len, sizeof(len)) != sizeof(len)) {
-        error_setg(errp, "failed to read option length");
-        return -1;
-    }
-    len = be32_to_cpu(len);
-
-    if (type == NBD_REP_ACK) {
+    if (reply.type == NBD_REP_ACK) {
         if (len != 0) {
             error_setg(errp, "length too long for option end");
             return -1;
         }
-    } else if (type == NBD_REP_SERVER) {
+    } else if (reply.type == NBD_REP_SERVER) {
         if (len < sizeof(namelen) || len > NBD_MAX_BUFFER_SIZE) {
-            error_setg(errp, "incorrect option length");
+            error_setg(errp, "incorrect option length %"PRIu32, len);
             return -1;
         }
         if (read_sync(ioc, &namelen, sizeof(namelen)) != sizeof(namelen)) {
@@ -240,7 +270,7 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
         }
     } else {
         error_setg(errp, "Unexpected reply type %" PRIx32 " expected %x",
-                   type, NBD_REP_SERVER);
+                   reply.type, NBD_REP_SERVER);
         return -1;
     }
     return 1;
@@ -251,24 +281,10 @@ static int nbd_receive_query_exports(QIOChannel *ioc,
                                      const char *wantname,
                                      Error **errp)
 {
-    uint64_t magic = cpu_to_be64(NBD_OPTS_MAGIC);
-    uint32_t opt = cpu_to_be32(NBD_OPT_LIST);
-    uint32_t length = 0;
     bool foundExport = false;

     TRACE("Querying export list");
-    if (write_sync(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
-        error_setg(errp, "Failed to send list option magic");
-        return -1;
-    }
-
-    if (write_sync(ioc, &opt, sizeof(opt)) != sizeof(opt)) {
-        error_setg(errp, "Failed to send list option number");
-        return -1;
-    }
-
-    if (write_sync(ioc, &length, sizeof(length)) != sizeof(length)) {
-        error_setg(errp, "Failed to send list option length");
+    if (nbd_send_option_request(ioc, NBD_OPT_LIST, 0, NULL, errp) < 0) {
         return -1;
     }

@@ -314,72 +330,29 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
                                         QCryptoTLSCreds *tlscreds,
                                         const char *hostname, Error **errp)
 {
-    uint64_t magic = cpu_to_be64(NBD_OPTS_MAGIC);
-    uint32_t opt = cpu_to_be32(NBD_OPT_STARTTLS);
-    uint32_t length = 0;
-    uint32_t type;
+    nbd_opt_reply reply;
     QIOChannelTLS *tioc;
     struct NBDTLSHandshakeData data = { 0 };

     TRACE("Requesting TLS from server");
-    if (write_sync(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
-        error_setg(errp, "Failed to send option magic");
-        return NULL;
-    }
-
-    if (write_sync(ioc, &opt, sizeof(opt)) != sizeof(opt)) {
-        error_setg(errp, "Failed to send option number");
-        return NULL;
-    }
-
-    if (write_sync(ioc, &length, sizeof(length)) != sizeof(length)) {
-        error_setg(errp, "Failed to send option length");
-        return NULL;
-    }
-
-    TRACE("Getting TLS reply from server1");
-    if (read_sync(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
-        error_setg(errp, "failed to read option magic");
-        return NULL;
-    }
-    magic = be64_to_cpu(magic);
-    if (magic != NBD_REP_MAGIC) {
-        error_setg(errp, "Unexpected option magic");
-        return NULL;
-    }
-    TRACE("Getting TLS reply from server2");
-    if (read_sync(ioc, &opt, sizeof(opt)) != sizeof(opt)) {
-        error_setg(errp, "failed to read option");
-        return NULL;
-    }
-    opt = be32_to_cpu(opt);
-    if (opt != NBD_OPT_STARTTLS) {
-        error_setg(errp, "Unexpected option type %" PRIx32 " expected %x",
-                   opt, NBD_OPT_STARTTLS);
+    if (nbd_send_option_request(ioc, NBD_OPT_STARTTLS, 0, NULL, errp) < 0) {
         return NULL;
     }

     TRACE("Getting TLS reply from server");
-    if (read_sync(ioc, &type, sizeof(type)) != sizeof(type)) {
-        error_setg(errp, "failed to read option type");
+    if (nbd_receive_option_reply(ioc, NBD_OPT_STARTTLS, &reply, errp) < 0) {
         return NULL;
     }
-    type = be32_to_cpu(type);
-    if (type != NBD_REP_ACK) {
+
+    if (reply.type != NBD_REP_ACK) {
         error_setg(errp, "Server rejected request to start TLS %" PRIx32,
-                   type);
+                   reply.type);
         return NULL;
     }

-    TRACE("Getting TLS reply from server");
-    if (read_sync(ioc, &length, sizeof(length)) != sizeof(length)) {
-        error_setg(errp, "failed to read option length");
-        return NULL;
-    }
-    length = be32_to_cpu(length);
-    if (length != 0) {
+    if (reply.length != 0) {
         error_setg(errp, "Start TLS response was not zero %" PRIu32,
-                   length);
+                   reply.length);
         return NULL;
     }

@@ -466,8 +439,6 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,

     if (magic == NBD_OPTS_MAGIC) {
         uint32_t clientflags = 0;
-        uint32_t opt;
-        uint32_t namesize;
         uint16_t globalflags;
         bool fixedNewStyle = false;

@@ -517,28 +488,13 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
                 goto fail;
             }
         }
-        /* write the export name */
-        magic = cpu_to_be64(magic);
-        if (write_sync(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
-            error_setg(errp, "Failed to send export name magic");
-            goto fail;
-        }
-        opt = cpu_to_be32(NBD_OPT_EXPORT_NAME);
-        if (write_sync(ioc, &opt, sizeof(opt)) != sizeof(opt)) {
-            error_setg(errp, "Failed to send export name option number");
-            goto fail;
-        }
-        namesize = cpu_to_be32(strlen(name));
-        if (write_sync(ioc, &namesize, sizeof(namesize)) !=
-            sizeof(namesize)) {
-            error_setg(errp, "Failed to send export name length");
-            goto fail;
-        }
-        if (write_sync(ioc, (char *)name, strlen(name)) != strlen(name)) {
-            error_setg(errp, "Failed to send export name");
+        /* write the export name request */
+        if (nbd_send_option_request(ioc, NBD_OPT_EXPORT_NAME, -1, name,
+                                    errp) < 0) {
             goto fail;
         }

+        /* Read the response */
         if (read_sync(ioc, &s, sizeof(s)) != sizeof(s)) {
             error_setg(errp, "Failed to read export length");
             goto fail;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v4 08/14] nbd: Let server know when client gives up negotiation
  2016-06-25 22:15 [Qemu-devel] [PATCH v4 00/14] nbd: efficient write zeroes Eric Blake
                   ` (6 preceding siblings ...)
  2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 07/14] nbd: Share common option-sending code in client Eric Blake
@ 2016-06-25 22:15 ` Eric Blake
  2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 09/14] nbd: Let client skip portions of server reply Eric Blake
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Eric Blake @ 2016-06-25 22:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, pbonzini

The NBD spec says that a client should send NBD_OPT_ABORT
rather than just dropping the connection, if the client doesn't
like something the server sent during option negotiation.  This
is a best-effort attempt only, and can only be done in places
where we know the server is still in sync with what we've sent,
whether or not we've read everything the server has sent.
Technically, the server then has to reply with NBD_REP_ACK, but
it's not worth complicating the client to wait around for that
reply.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v4: new patch
---
 nbd/client.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/nbd/client.c b/nbd/client.c
index 6e5288a..a78e81c 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -106,6 +106,19 @@ static int nbd_send_option_request(QIOChannel *ioc, uint32_t opt,
     return 0;
 }

+/* Send NBD_OPT_ABORT as a courtesy to let the server know that we are
+ * not going to attempt further negotiation. */
+static void nbd_send_opt_abort(QIOChannel *ioc)
+{
+    /* Technically, a compliant server is supposed to reply to us; but
+     * older servers disconnected instead. At any rate, we're allowed
+     * to disconnect without waiting for the server reply, so we don't
+     * even care if the request makes it to the server, let alone
+     * waiting around for whether the server replies. */
+    nbd_send_option_request(ioc, NBD_OPT_ABORT, 0, NULL, NULL);
+}
+
+
 /* Receive the header of an option reply, which should match the given
  * opt.  Read through the length field, but NOT the length bytes of
  * payload. Return 0 if successful, -1 with errp set if it is
@@ -116,6 +129,7 @@ static int nbd_receive_option_reply(QIOChannel *ioc, uint32_t opt,
     QEMU_BUILD_BUG_ON(sizeof(*reply) != 20);
     if (read_sync(ioc, reply, sizeof(*reply)) != sizeof(*reply)) {
         error_setg(errp, "failed to read option reply");
+        nbd_send_opt_abort(ioc);
         return -1;
     }
     be64_to_cpus(&reply->magic);
@@ -128,11 +142,13 @@ static int nbd_receive_option_reply(QIOChannel *ioc, uint32_t opt,

     if (reply->magic != NBD_REP_MAGIC) {
         error_setg(errp, "Unexpected option reply magic");
+        nbd_send_opt_abort(ioc);
         return -1;
     }
     if (reply->option != opt) {
         error_setg(errp, "Unexpected option type %x expected %x",
                    reply->option, opt);
+        nbd_send_opt_abort(ioc);
         return -1;
     }
     return 0;
@@ -201,6 +217,9 @@ static int nbd_handle_reply_err(QIOChannel *ioc, nbd_opt_reply *reply,

  cleanup:
     g_free(msg);
+    if (result < 0) {
+        nbd_send_opt_abort(ioc);
+    }
     return result;
 }

@@ -224,25 +243,30 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
     if (reply.type == NBD_REP_ACK) {
         if (len != 0) {
             error_setg(errp, "length too long for option end");
+            nbd_send_opt_abort(ioc);
             return -1;
         }
     } else if (reply.type == NBD_REP_SERVER) {
         if (len < sizeof(namelen) || len > NBD_MAX_BUFFER_SIZE) {
             error_setg(errp, "incorrect option length %"PRIu32, len);
+            nbd_send_opt_abort(ioc);
             return -1;
         }
         if (read_sync(ioc, &namelen, sizeof(namelen)) != sizeof(namelen)) {
             error_setg(errp, "failed to read option name length");
+            nbd_send_opt_abort(ioc);
             return -1;
         }
         namelen = be32_to_cpu(namelen);
         len -= sizeof(namelen);
         if (len < namelen) {
             error_setg(errp, "incorrect option name length");
+            nbd_send_opt_abort(ioc);
             return -1;
         }
         if (namelen > NBD_MAX_NAME_SIZE) {
             error_setg(errp, "export name length too long %" PRIu32, namelen);
+            nbd_send_opt_abort(ioc);
             return -1;
         }

@@ -251,6 +275,7 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
             error_setg(errp, "failed to read export name");
             g_free(*name);
             *name = NULL;
+            nbd_send_opt_abort(ioc);
             return -1;
         }
         (*name)[namelen] = '\0';
@@ -262,6 +287,7 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
                 g_free(*name);
                 g_free(buf);
                 *name = NULL;
+                nbd_send_opt_abort(ioc);
                 return -1;
             }
             buf[len] = '\0';
@@ -271,6 +297,7 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
     } else {
         error_setg(errp, "Unexpected reply type %" PRIx32 " expected %x",
                    reply.type, NBD_REP_SERVER);
+        nbd_send_opt_abort(ioc);
         return -1;
     }
     return 1;
@@ -320,6 +347,7 @@ static int nbd_receive_query_exports(QIOChannel *ioc,

     if (!foundExport) {
         error_setg(errp, "No export with name '%s' available", wantname);
+        nbd_send_opt_abort(ioc);
         return -1;
     }

@@ -347,12 +375,14 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
     if (reply.type != NBD_REP_ACK) {
         error_setg(errp, "Server rejected request to start TLS %" PRIx32,
                    reply.type);
+        nbd_send_opt_abort(ioc);
         return NULL;
     }

     if (reply.length != 0) {
         error_setg(errp, "Start TLS response was not zero %" PRIu32,
                    reply.length);
+        nbd_send_opt_abort(ioc);
         return NULL;
     }

-- 
2.5.5

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v4 09/14] nbd: Let client skip portions of server reply
  2016-06-25 22:15 [Qemu-devel] [PATCH v4 00/14] nbd: efficient write zeroes Eric Blake
                   ` (7 preceding siblings ...)
  2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 08/14] nbd: Let server know when client gives up negotiation Eric Blake
@ 2016-06-25 22:15 ` Eric Blake
  2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 10/14] nbd: Less allocation during NBD_OPT_LIST Eric Blake
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Eric Blake @ 2016-06-25 22:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, pbonzini

The server has a nice helper function nbd_negotiate_drop_sync()
which lets it easily ignore fluff from the client (such as the
payload to an unknown option request).  We can't quite make it
common, since it depends on nbd_negotiate_read() which handles
coroutine magic, but we can copy the idea into the client where
we have places where we want to ignore data (such as the
description tacked on the end of NBD_REP_SERVER).

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v4: rebase
v3: rebase
---
 nbd/client.c | 47 +++++++++++++++++++++++++++++++++--------------
 1 file changed, 33 insertions(+), 14 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index a78e81c..004dec6 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -75,6 +75,32 @@ static QTAILQ_HEAD(, NBDExport) exports = QTAILQ_HEAD_INITIALIZER(exports);

 */

+/* Discard length bytes from channel.  Return -errno on failure, or
+ * the amount of bytes consumed. */
+static ssize_t drop_sync(QIOChannel *ioc, size_t size)
+{
+    ssize_t ret, dropped = size;
+    char small[1024];
+    char *buffer;
+
+    buffer = sizeof(small) < size ? small : g_malloc(MIN(65536, size));
+    while (size > 0) {
+        ret = read_sync(ioc, buffer, MIN(65536, size));
+        if (ret < 0) {
+            goto cleanup;
+        }
+        assert(ret <= size);
+        size -= ret;
+    }
+    ret = dropped;
+
+ cleanup:
+    if (buffer != small) {
+        g_free(buffer);
+    }
+    return ret;
+}
+
 /* Send an option request. Return 0 if successful, -1 with errp set if
  * it is impossible to continue. */
 static int nbd_send_option_request(QIOChannel *ioc, uint32_t opt,
@@ -280,19 +306,12 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
         }
         (*name)[namelen] = '\0';
         len -= namelen;
-        if (len) {
-            char *buf = g_malloc(len + 1);
-            if (read_sync(ioc, buf, len) != len) {
-                error_setg(errp, "failed to read export description");
-                g_free(*name);
-                g_free(buf);
-                *name = NULL;
-                nbd_send_opt_abort(ioc);
-                return -1;
-            }
-            buf[len] = '\0';
-            TRACE("Ignoring export description: %s", buf);
-            g_free(buf);
+        if (drop_sync(ioc, len) != len) {
+            error_setg(errp, "failed to read export description");
+            g_free(*name);
+            *name = NULL;
+            nbd_send_opt_abort(ioc);
+            return -1;
         }
     } else {
         error_setg(errp, "Unexpected reply type %" PRIx32 " expected %x",
@@ -571,7 +590,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
     }

     TRACE("Size is %" PRIu64 ", export flags %" PRIx16, *size, *flags);
-    if (read_sync(ioc, &buf, 124) != 124) {
+    if (drop_sync(ioc, 124) != 124) {
         error_setg(errp, "Failed to read reserved block");
         goto fail;
     }
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v4 10/14] nbd: Less allocation during NBD_OPT_LIST
  2016-06-25 22:15 [Qemu-devel] [PATCH v4 00/14] nbd: efficient write zeroes Eric Blake
                   ` (8 preceding siblings ...)
  2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 09/14] nbd: Let client skip portions of server reply Eric Blake
@ 2016-06-25 22:15 ` Eric Blake
  2016-06-27 12:16   ` Paolo Bonzini
  2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 11/14] nbd: Support shorter handshake Eric Blake
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 20+ messages in thread
From: Eric Blake @ 2016-06-25 22:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, pbonzini

Since we know that the maximum name we are willing to accept
is small enough to stack-allocate, rework the iteration over
NBD_OPT_LIST responses to reuse a stack buffer rather than
allocating every time.  Furthermore, we don't even have to
allocate if we know the server's length doesn't match what
we are searching for.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v4: rebase
v3: tweak commit message
---
 nbd/client.c | 144 +++++++++++++++++++++++++++++------------------------------
 1 file changed, 72 insertions(+), 72 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index 004dec6..0d0ce05 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -249,14 +249,17 @@ static int nbd_handle_reply_err(QIOChannel *ioc, nbd_opt_reply *reply,
     return result;
 }

-static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
+/* Return -1 if unrecoverable error occurs, 0 if NBD_OPT_LIST is
+ * unsupported, 1 if iteration is done, 2 to keep looking, and 3 if
+ * this entry matches @want. */
+static int nbd_receive_list(QIOChannel *ioc, const char *want, Error **errp)
 {
     nbd_opt_reply reply;
     uint32_t len;
     uint32_t namelen;
+    char name[NBD_MAX_NAME_SIZE + 1];
     int error;

-    *name = NULL;
     if (nbd_receive_option_reply(ioc, NBD_OPT_LIST, &reply, errp) < 0) {
         return -1;
     }
@@ -272,105 +275,102 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
             nbd_send_opt_abort(ioc);
             return -1;
         }
-    } else if (reply.type == NBD_REP_SERVER) {
-        if (len < sizeof(namelen) || len > NBD_MAX_BUFFER_SIZE) {
-            error_setg(errp, "incorrect option length %"PRIu32, len);
-            nbd_send_opt_abort(ioc);
-            return -1;
-        }
-        if (read_sync(ioc, &namelen, sizeof(namelen)) != sizeof(namelen)) {
-            error_setg(errp, "failed to read option name length");
-            nbd_send_opt_abort(ioc);
-            return -1;
-        }
-        namelen = be32_to_cpu(namelen);
-        len -= sizeof(namelen);
-        if (len < namelen) {
-            error_setg(errp, "incorrect option name length");
-            nbd_send_opt_abort(ioc);
-            return -1;
-        }
-        if (namelen > NBD_MAX_NAME_SIZE) {
-            error_setg(errp, "export name length too long %" PRIu32, namelen);
-            nbd_send_opt_abort(ioc);
-            return -1;
-        }
-
-        *name = g_new0(char, namelen + 1);
-        if (read_sync(ioc, *name, namelen) != namelen) {
-            error_setg(errp, "failed to read export name");
-            g_free(*name);
-            *name = NULL;
-            nbd_send_opt_abort(ioc);
-            return -1;
-        }
-        (*name)[namelen] = '\0';
-        len -= namelen;
-        if (drop_sync(ioc, len) != len) {
-            error_setg(errp, "failed to read export description");
-            g_free(*name);
-            *name = NULL;
-            nbd_send_opt_abort(ioc);
-            return -1;
-        }
-    } else {
+        return 1;
+    } else if (reply.type != NBD_REP_SERVER) {
         error_setg(errp, "Unexpected reply type %" PRIx32 " expected %x",
                    reply.type, NBD_REP_SERVER);
         nbd_send_opt_abort(ioc);
         return -1;
     }
-    return 1;
+
+    if (len < sizeof(namelen) || len > NBD_MAX_BUFFER_SIZE) {
+        error_setg(errp, "incorrect option length %"PRIu32, len);
+        nbd_send_opt_abort(ioc);
+        return -1;
+    }
+    if (read_sync(ioc, &namelen, sizeof(namelen)) != sizeof(namelen)) {
+        error_setg(errp, "failed to read option name length");
+        nbd_send_opt_abort(ioc);
+        return -1;
+    }
+    namelen = be32_to_cpu(namelen);
+    len -= sizeof(namelen);
+    if (len < namelen) {
+        error_setg(errp, "incorrect option name length");
+        nbd_send_opt_abort(ioc);
+        return -1;
+    }
+    if (namelen != strlen(want)) {
+        if (drop_sync(ioc, len) != len) {
+            error_setg(errp, "failed to skip export name with wrong length");
+            nbd_send_opt_abort(ioc);
+            return -1;
+        }
+        return 2;
+    }
+
+    assert(namelen < sizeof(name));
+    if (read_sync(ioc, name, namelen) != namelen) {
+        error_setg(errp, "failed to read export name");
+        nbd_send_opt_abort(ioc);
+        return -1;
+    }
+    name[namelen] = '\0';
+    len -= namelen;
+    if (drop_sync(ioc, len) != len) {
+        error_setg(errp, "failed to read export description");
+        nbd_send_opt_abort(ioc);
+        return -1;
+    }
+    return strcmp(name, want) == 0 ? 3 : 2;
 }


+/* Return -1 on failure, 0 if wantname is an available export. */
 static int nbd_receive_query_exports(QIOChannel *ioc,
                                      const char *wantname,
                                      Error **errp)
 {
     bool foundExport = false;

-    TRACE("Querying export list");
+    TRACE("Querying export list for '%s'", wantname);
     if (nbd_send_option_request(ioc, NBD_OPT_LIST, 0, NULL, errp) < 0) {
         return -1;
     }

     TRACE("Reading available export names");
     while (1) {
-        char *name = NULL;
-        int ret = nbd_receive_list(ioc, &name, errp);
+        int ret = nbd_receive_list(ioc, wantname, errp);

-        if (ret < 0) {
-            g_free(name);
-            name = NULL;
+        switch (ret) {
+        default:
+            /* Server gave unexpected reply */
+            assert(ret < 0);
             return -1;
-        }
-        if (ret == 0) {
+        case 0:
             /* Server doesn't support export listing, so
              * we will just assume an export with our
              * wanted name exists */
-            foundExport = true;
-            break;
-        }
-        if (name == NULL) {
-            TRACE("End of export name list");
+            return 0;
+        case 1:
+            /* Done iterating. */
+            if (!foundExport) {
+                error_setg(errp, "No export with name '%s' available",
+                           wantname);
+                nbd_send_opt_abort(ioc);
+                return -1;
+            }
+            return 0;
+        case 2:
+            /* Wasn't this one, keep going. */
             break;
-        }
-        if (g_str_equal(name, wantname)) {
+        case 3:
+            /* Found a match, but must finish parsing reply. */
+            TRACE("Found desired export name '%s'", wantname);
             foundExport = true;
-            TRACE("Found desired export name '%s'", name);
-        } else {
-            TRACE("Ignored export name '%s'", name);
+            break;
         }
-        g_free(name);
-    }
-
-    if (!foundExport) {
-        error_setg(errp, "No export with name '%s' available", wantname);
-        nbd_send_opt_abort(ioc);
-        return -1;
     }
-
-    return 0;
 }

 static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v4 11/14] nbd: Support shorter handshake
  2016-06-25 22:15 [Qemu-devel] [PATCH v4 00/14] nbd: efficient write zeroes Eric Blake
                   ` (9 preceding siblings ...)
  2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 10/14] nbd: Less allocation during NBD_OPT_LIST Eric Blake
@ 2016-06-25 22:15 ` Eric Blake
  2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 12/14] nbd: Improve server handling of shutdown requests Eric Blake
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Eric Blake @ 2016-06-25 22:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, pbonzini, Kevin Wolf, Max Reitz

The NBD Protocol allows the server and client to mutually agree
on a shorter handshake (omit the 124 bytes of reserved 0), via
the server advertising NBD_FLAG_NO_ZEROES and the client
acknowledging with NBD_FLAG_C_NO_ZEROES (only possible in
newstyle, whether or not it is fixed newstyle).  It doesn't
shave much off the wire, but we might as well implement it.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alex Bligh <alex@alex.org.uk>

---
v4: rebase
v3: rebase
---
 include/block/nbd.h |  6 ++++--
 nbd/client.c        |  8 +++++++-
 nbd/server.c        | 15 +++++++++++----
 3 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 7c5103f..6c1d9f4 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -72,11 +72,13 @@ struct nbd_reply {

 /* New-style handshake (global) flags, sent from server to client, and
    control what will happen during handshake phase. */
-#define NBD_FLAG_FIXED_NEWSTYLE     (1 << 0)    /* Fixed newstyle protocol. */
+#define NBD_FLAG_FIXED_NEWSTYLE   (1 << 0) /* Fixed newstyle protocol. */
+#define NBD_FLAG_NO_ZEROES        (1 << 1) /* End handshake without zeroes. */

 /* New-style client flags, sent from client to server to control what happens
    during handshake phase. */
-#define NBD_FLAG_C_FIXED_NEWSTYLE   (1 << 0)    /* Fixed newstyle protocol. */
+#define NBD_FLAG_C_FIXED_NEWSTYLE (1 << 0) /* Fixed newstyle protocol. */
+#define NBD_FLAG_C_NO_ZEROES      (1 << 1) /* End handshake without zeroes. */

 /* Reply types. */
 #define NBD_REP_ACK             (1)             /* Data sending finished. */
diff --git a/nbd/client.c b/nbd/client.c
index 0d0ce05..6554d5a 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -439,6 +439,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
     char buf[256];
     uint64_t magic, s;
     int rc;
+    bool zeroes = true;

     TRACE("Receiving negotiation tlscreds=%p hostname=%s.",
           tlscreds, hostname ? hostname : "<null>");
@@ -503,6 +504,11 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
             TRACE("Server supports fixed new style");
             clientflags |= NBD_FLAG_C_FIXED_NEWSTYLE;
         }
+        if (globalflags & NBD_FLAG_NO_ZEROES) {
+            zeroes = false;
+            TRACE("Server supports no zeroes");
+            clientflags |= NBD_FLAG_C_NO_ZEROES;
+        }
         /* client requested flags */
         clientflags = cpu_to_be32(clientflags);
         if (write_sync(ioc, &clientflags, sizeof(clientflags)) !=
@@ -590,7 +596,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
     }

     TRACE("Size is %" PRIu64 ", export flags %" PRIx16, *size, *flags);
-    if (drop_sync(ioc, 124) != 124) {
+    if (zeroes && drop_sync(ioc, 124) != 124) {
         error_setg(errp, "Failed to read reserved block");
         goto fail;
     }
diff --git a/nbd/server.c b/nbd/server.c
index 99faa36..72ab037 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -80,6 +80,7 @@ struct NBDClient {
     int refcount;
     void (*close)(NBDClient *client);

+    bool no_zeroes;
     NBDExport *exp;
     QCryptoTLSCreds *tlscreds;
     char *tlsaclname;
@@ -440,6 +441,11 @@ static int nbd_negotiate_options(NBDClient *client)
         fixedNewstyle = true;
         flags &= ~NBD_FLAG_C_FIXED_NEWSTYLE;
     }
+    if (flags & NBD_FLAG_C_NO_ZEROES) {
+        TRACE("Client supports no zeroes at handshake end");
+        client->no_zeroes = true;
+        flags &= ~NBD_FLAG_C_NO_ZEROES;
+    }
     if (flags != 0) {
         TRACE("Unknown client flags 0x%" PRIx32 " received", flags);
         return -EIO;
@@ -592,6 +598,7 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
     const uint16_t myflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
                               NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA);
     bool oldStyle;
+    size_t len;

     /* Old style negotiation header without options
         [ 0 ..   7]   passwd       ("NBDMAGIC")
@@ -608,7 +615,7 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
         ....options sent....
         [18 ..  25]   size
         [26 ..  27]   export flags
-        [28 .. 151]   reserved     (0)
+        [28 .. 151]   reserved     (0, omit if no_zeroes)
      */

     qio_channel_set_blocking(client->ioc, false, NULL);
@@ -627,7 +634,7 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
         stw_be_p(buf + 26, client->exp->nbdflags | myflags);
     } else {
         stq_be_p(buf + 8, NBD_OPTS_MAGIC);
-        stw_be_p(buf + 16, NBD_FLAG_FIXED_NEWSTYLE);
+        stw_be_p(buf + 16, NBD_FLAG_FIXED_NEWSTYLE | NBD_FLAG_NO_ZEROES);
     }

     if (oldStyle) {
@@ -654,8 +661,8 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
               client->exp->size, client->exp->nbdflags | myflags);
         stq_be_p(buf + 18, client->exp->size);
         stw_be_p(buf + 26, client->exp->nbdflags | myflags);
-        if (nbd_negotiate_write(client->ioc, buf + 18, sizeof(buf) - 18) !=
-            sizeof(buf) - 18) {
+        len = client->no_zeroes ? 10 : sizeof(buf) - 18;
+        if (nbd_negotiate_write(client->ioc, buf + 18, len) != len) {
             LOG("write failed");
             goto fail;
         }
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v4 12/14] nbd: Improve server handling of shutdown requests
  2016-06-25 22:15 [Qemu-devel] [PATCH v4 00/14] nbd: efficient write zeroes Eric Blake
                   ` (10 preceding siblings ...)
  2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 11/14] nbd: Support shorter handshake Eric Blake
@ 2016-06-25 22:15 ` Eric Blake
  2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server Eric Blake
  2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 14/14] nbd: Implement NBD_CMD_WRITE_ZEROES on client Eric Blake
  13 siblings, 0 replies; 20+ messages in thread
From: Eric Blake @ 2016-06-25 22:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, pbonzini, Kevin Wolf, Max Reitz

NBD commit 6d34500b clarified how clients and servers are supposed
to behave before closing a connection. It added NBD_REP_ERR_SHUTDOWN
(for the server to announce it is about to go away during option
haggling, so the client should quit sending NBD_OPT_* other than
NBD_OPT_ABORT) and ESHUTDOWN (for the server to announce it is about
to go away during transmission, so the client should quit sending
NBD_CMD_* other than NBD_CMD_DISC).  It also clarified that
NBD_OPT_ABORT gets a reply, while NBD_CMD_DISC does not.

This patch merely adds the missing reply to NBD_OPT_ABORT and teaches
the client to recognize server errors.  Actually teaching the server
to send NBD_REP_ERR_SHUTDOWN or ESHUTDOWN would require knowing that
the server has been requested to shut down soon (maybe we could do
that by installing a SIGINT handler in qemu-nbd, which transitions
from RUNNING to a new state that waits for the client to react,
rather than just out-right quitting - but that's a bigger task for
another day).

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v4: new patch
---
 include/block/nbd.h | 13 +++++++++----
 nbd/nbd-internal.h  |  1 +
 nbd/client.c        | 16 ++++++++++++++++
 nbd/server.c        | 10 ++++++++++
 4 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 6c1d9f4..fc4426c 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -81,12 +81,17 @@ struct nbd_reply {
 #define NBD_FLAG_C_NO_ZEROES      (1 << 1) /* End handshake without zeroes. */

 /* Reply types. */
+#define NBD_REP_ERR(value) ((UINT32_C(1) << 31) | (value))
+
 #define NBD_REP_ACK             (1)             /* Data sending finished. */
 #define NBD_REP_SERVER          (2)             /* Export description. */
-#define NBD_REP_ERR_UNSUP       ((UINT32_C(1) << 31) | 1) /* Unknown option. */
-#define NBD_REP_ERR_POLICY      ((UINT32_C(1) << 31) | 2) /* Server denied */
-#define NBD_REP_ERR_INVALID     ((UINT32_C(1) << 31) | 3) /* Invalid length. */
-#define NBD_REP_ERR_TLS_REQD    ((UINT32_C(1) << 31) | 5) /* TLS required */
+
+#define NBD_REP_ERR_UNSUP       NBD_REP_ERR(1)  /* Unknown option */
+#define NBD_REP_ERR_POLICY      NBD_REP_ERR(2)  /* Server denied */
+#define NBD_REP_ERR_INVALID     NBD_REP_ERR(3)  /* Invalid length */
+#define NBD_REP_ERR_PLATFORM    NBD_REP_ERR(4)  /* Not compiled in */
+#define NBD_REP_ERR_TLS_REQD    NBD_REP_ERR(5)  /* TLS required */
+#define NBD_REP_ERR_SHUTDOWN    NBD_REP_ERR(7)  /* Server shutting down */

 /* Request flags, sent from client to server during transmission phase */
 #define NBD_CMD_FLAG_FUA        (1 << 0)
diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index dd57e18..eee20ab 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -92,6 +92,7 @@
 #define NBD_ENOMEM     12
 #define NBD_EINVAL     22
 #define NBD_ENOSPC     28
+#define NBD_ESHUTDOWN  108

 static inline ssize_t read_sync(QIOChannel *ioc, void *buffer, size_t size)
 {
diff --git a/nbd/client.c b/nbd/client.c
index 6554d5a..b678cfc 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -34,6 +34,8 @@ static int nbd_errno_to_system_errno(int err)
         return ENOMEM;
     case NBD_ENOSPC:
         return ENOSPC;
+    case NBD_ESHUTDOWN:
+        return ESHUTDOWN;
     default:
         TRACE("Squashing unexpected error %d to EINVAL", err);
         /* fallthrough */
@@ -226,11 +228,21 @@ static int nbd_handle_reply_err(QIOChannel *ioc, nbd_opt_reply *reply,
                    reply->option);
         break;

+    case NBD_REP_ERR_PLATFORM:
+        error_setg(errp, "Server lacks support for option %" PRIx32,
+                   reply->option);
+        break;
+
     case NBD_REP_ERR_TLS_REQD:
         error_setg(errp, "TLS negotiation required before option %" PRIx32,
                    reply->option);
         break;

+    case NBD_REP_ERR_SHUTDOWN:
+        error_setg(errp, "Server shutting down before option %" PRIx32,
+                   reply->option);
+        break;
+
     default:
         error_setg(errp, "Unknown error code when asking for option %" PRIx32,
                    reply->option);
@@ -784,6 +796,10 @@ ssize_t nbd_receive_reply(QIOChannel *ioc, struct nbd_reply *reply)
         LOG("invalid magic (got 0x%" PRIx32 ")", magic);
         return -EINVAL;
     }
+    if (reply->error == ESHUTDOWN) {
+        LOG("server shutting down");
+        return -EINVAL;
+    }
     return 0;
 }

diff --git a/nbd/server.c b/nbd/server.c
index 72ab037..ee99e19 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -39,6 +39,8 @@ static int system_errno_to_nbd_errno(int err)
     case EFBIG:
     case ENOSPC:
         return NBD_ENOSPC;
+    case ESHUTDOWN:
+        return NBD_ESHUTDOWN;
     case EINVAL:
     default:
         return NBD_EINVAL;
@@ -517,6 +519,10 @@ static int nbd_negotiate_options(NBDClient *client)
                 if (ret < 0) {
                     return ret;
                 }
+                /* Let the client keep trying, unless they asked to quit */
+                if (clientflags == NBD_OPT_ABORT) {
+                    return -EINVAL;
+                }
                 break;
             }
         } else if (fixedNewstyle) {
@@ -529,6 +535,10 @@ static int nbd_negotiate_options(NBDClient *client)
                 break;

             case NBD_OPT_ABORT:
+                /* NBD spec says we must try to reply before
+                 * disconnecting, but that we must also tolerate
+                 * guests that don't wait for our reply. */
+                nbd_negotiate_send_rep(client->ioc, NBD_REP_ACK, clientflags);
                 return -EINVAL;

             case NBD_OPT_EXPORT_NAME:
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v4 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server
  2016-06-25 22:15 [Qemu-devel] [PATCH v4 00/14] nbd: efficient write zeroes Eric Blake
                   ` (11 preceding siblings ...)
  2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 12/14] nbd: Improve server handling of shutdown requests Eric Blake
@ 2016-06-25 22:15 ` Eric Blake
  2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 14/14] nbd: Implement NBD_CMD_WRITE_ZEROES on client Eric Blake
  13 siblings, 0 replies; 20+ messages in thread
From: Eric Blake @ 2016-06-25 22:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, pbonzini, Kevin Wolf, Max Reitz

Upstream NBD protocol recently added the ability to efficiently
write zeroes without having to send the zeroes over the wire,
along with a flag to control whether the client wants a hole.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v4: rebase, fix value for constant
v3: abandon NBD_CMD_CLOSE extension, rebase to use blk_pwrite_zeroes
---
 include/block/nbd.h |  8 ++++++--
 nbd/server.c        | 42 ++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 46 insertions(+), 4 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index fc4426c..e23ef73 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -69,6 +69,7 @@ struct nbd_reply {
 #define NBD_FLAG_SEND_FUA       (1 << 3)        /* Send FUA (Force Unit Access) */
 #define NBD_FLAG_ROTATIONAL     (1 << 4)        /* Use elevator algorithm - rotational media */
 #define NBD_FLAG_SEND_TRIM      (1 << 5)        /* Send TRIM (discard) */
+#define NBD_FLAG_SEND_WRITE_ZEROES (1 << 6)     /* Send WRITE_ZEROES */

 /* New-style handshake (global) flags, sent from server to client, and
    control what will happen during handshake phase. */
@@ -94,7 +95,8 @@ struct nbd_reply {
 #define NBD_REP_ERR_SHUTDOWN    NBD_REP_ERR(7)  /* Server shutting down */

 /* Request flags, sent from client to server during transmission phase */
-#define NBD_CMD_FLAG_FUA        (1 << 0)
+#define NBD_CMD_FLAG_FUA        (1 << 0) /* 'force unit access' during write */
+#define NBD_CMD_FLAG_NO_HOLE    (1 << 1) /* don't punch hole on zero run */

 /* Supported request types */
 enum {
@@ -102,7 +104,9 @@ enum {
     NBD_CMD_WRITE = 1,
     NBD_CMD_DISC = 2,
     NBD_CMD_FLUSH = 3,
-    NBD_CMD_TRIM = 4
+    NBD_CMD_TRIM = 4,
+    /* 5 reserved for failed experiment NBD_CMD_CACHE */
+    NBD_CMD_WRITE_ZEROES = 6,
 };

 #define NBD_DEFAULT_PORT	10809
diff --git a/nbd/server.c b/nbd/server.c
index ee99e19..58100b3 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -606,7 +606,8 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
     char buf[8 + 8 + 8 + 128];
     int rc;
     const uint16_t myflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
-                              NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA);
+                              NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA |
+                              NBD_FLAG_SEND_WRITE_ZEROES);
     bool oldStyle;
     size_t len;

@@ -1122,11 +1123,17 @@ static ssize_t nbd_co_receive_request(NBDRequest *req,
         rc = request->type == NBD_CMD_WRITE ? -ENOSPC : -EINVAL;
         goto out;
     }
-    if (request->flags & ~NBD_CMD_FLAG_FUA) {
+    if (request->flags & ~(NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_NO_HOLE)) {
         LOG("unsupported flags (got 0x%x)", request->flags);
         rc = -EINVAL;
         goto out;
     }
+    if (request->type != NBD_CMD_WRITE_ZEROES &&
+        (request->flags & NBD_CMD_FLAG_NO_HOLE)) {
+        LOG("unexpected flags (got 0x%x)", request->flags);
+        rc = -EINVAL;
+        goto out;
+    }

     rc = 0;

@@ -1231,6 +1238,37 @@ static void nbd_trip(void *opaque)
         }
         break;

+    case NBD_CMD_WRITE_ZEROES:
+        TRACE("Request type is WRITE_ZEROES");
+
+        if (exp->nbdflags & NBD_FLAG_READ_ONLY) {
+            TRACE("Server is read-only, return error");
+            reply.error = EROFS;
+            goto error_reply;
+        }
+
+        TRACE("Writing to device");
+
+        flags = 0;
+        if (request.flags & NBD_CMD_FLAG_FUA) {
+            flags |= BDRV_REQ_FUA;
+        }
+        if (!(request.flags & NBD_CMD_FLAG_NO_HOLE)) {
+            flags |= BDRV_REQ_MAY_UNMAP;
+        }
+        ret = blk_pwrite_zeroes(exp->blk, request.from + exp->dev_offset,
+                                request.len, flags);
+        if (ret < 0) {
+            LOG("writing to file failed");
+            reply.error = -ret;
+            goto error_reply;
+        }
+
+        if (nbd_co_send_reply(req, &reply, 0) < 0) {
+            goto out;
+        }
+        break;
+
     case NBD_CMD_DISC:
         /* unreachable, thanks to special case in nbd_co_receive_request() */
         abort();
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v4 14/14] nbd: Implement NBD_CMD_WRITE_ZEROES on client
  2016-06-25 22:15 [Qemu-devel] [PATCH v4 00/14] nbd: efficient write zeroes Eric Blake
                   ` (12 preceding siblings ...)
  2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server Eric Blake
@ 2016-06-25 22:15 ` Eric Blake
  2016-06-27 12:13   ` Paolo Bonzini
  13 siblings, 1 reply; 20+ messages in thread
From: Eric Blake @ 2016-06-25 22:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, pbonzini, Kevin Wolf, Max Reitz

Upstream NBD protocol recently added the ability to efficiently
write zeroes without having to send the zeroes over the wire,
along with a flag to control whether the client wants a hole.

The generic block code takes care of falling back to the obvious
write of lots of zeroes if we return -ENOTSUP because the server
does not have WRITE_ZEROES.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v4: rebase to byte-based limits
v3: rebase, tell block layer about our support
---
 block/nbd-client.h |  2 ++
 block/nbd-client.c | 35 +++++++++++++++++++++++++++++++++++
 block/nbd.c        |  4 ++++
 3 files changed, 41 insertions(+)

diff --git a/block/nbd-client.h b/block/nbd-client.h
index 044aca4..2cfe377 100644
--- a/block/nbd-client.h
+++ b/block/nbd-client.h
@@ -48,6 +48,8 @@ int nbd_client_co_pdiscard(BlockDriverState *bs, int64_t offset, int count);
 int nbd_client_co_flush(BlockDriverState *bs);
 int nbd_client_co_pwritev(BlockDriverState *bs, uint64_t offset,
                           uint64_t bytes, QEMUIOVector *qiov, int flags);
+int nbd_client_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
+                                int count, BdrvRequestFlags flags);
 int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,
                          uint64_t bytes, QEMUIOVector *qiov, int flags);

diff --git a/block/nbd-client.c b/block/nbd-client.c
index 88e52c1..5fdac74 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -275,6 +275,41 @@ int nbd_client_co_pwritev(BlockDriverState *bs, uint64_t offset,
     return -reply.error;
 }

+int nbd_client_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
+                                int count, BdrvRequestFlags flags)
+{
+    ssize_t ret;
+    NbdClientSession *client = nbd_get_client_session(bs);
+    struct nbd_request request = {
+        .type = NBD_CMD_WRITE_ZEROES,
+        .from = offset,
+        .len = count,
+    };
+    struct nbd_reply reply;
+
+    if (!(client->nbdflags & NBD_FLAG_SEND_WRITE_ZEROES)) {
+        return -ENOTSUP;
+    }
+
+    if (flags & BDRV_REQ_FUA) {
+        assert(client->nbdflags & NBD_FLAG_SEND_FUA);
+        request.flags |= NBD_CMD_FLAG_FUA;
+    }
+    if (!(flags & BDRV_REQ_MAY_UNMAP)) {
+        request.flags |= NBD_CMD_FLAG_NO_HOLE;
+    }
+
+    nbd_coroutine_start(client, &request);
+    ret = nbd_co_send_request(bs, &request, NULL);
+    if (ret < 0) {
+        reply.error = -ret;
+    } else {
+        nbd_co_receive_reply(client, &request, &reply, NULL);
+    }
+    nbd_coroutine_end(client, &request);
+    return -reply.error;
+}
+
 int nbd_client_co_flush(BlockDriverState *bs)
 {
     NbdClientSession *client = nbd_get_client_session(bs);
diff --git a/block/nbd.c b/block/nbd.c
index 8d57220..049d1bd 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -357,6 +357,7 @@ static int nbd_co_flush(BlockDriverState *bs)
 static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
 {
     bs->bl.max_pdiscard = NBD_MAX_BUFFER_SIZE;
+    bs->bl.max_pwrite_zeroes = NBD_MAX_BUFFER_SIZE;
     bs->bl.max_transfer = NBD_MAX_BUFFER_SIZE;
 }

@@ -440,6 +441,7 @@ static BlockDriver bdrv_nbd = {
     .bdrv_file_open             = nbd_open,
     .bdrv_co_preadv             = nbd_client_co_preadv,
     .bdrv_co_pwritev            = nbd_client_co_pwritev,
+    .bdrv_co_pwrite_zeroes      = nbd_client_co_pwrite_zeroes,
     .bdrv_close                 = nbd_close,
     .bdrv_co_flush_to_os        = nbd_co_flush,
     .bdrv_co_pdiscard           = nbd_client_co_pdiscard,
@@ -458,6 +460,7 @@ static BlockDriver bdrv_nbd_tcp = {
     .bdrv_file_open             = nbd_open,
     .bdrv_co_preadv             = nbd_client_co_preadv,
     .bdrv_co_pwritev            = nbd_client_co_pwritev,
+    .bdrv_co_pwrite_zeroes      = nbd_client_co_pwrite_zeroes,
     .bdrv_close                 = nbd_close,
     .bdrv_co_flush_to_os        = nbd_co_flush,
     .bdrv_co_pdiscard           = nbd_client_co_pdiscard,
@@ -476,6 +479,7 @@ static BlockDriver bdrv_nbd_unix = {
     .bdrv_file_open             = nbd_open,
     .bdrv_co_preadv             = nbd_client_co_preadv,
     .bdrv_co_pwritev            = nbd_client_co_pwritev,
+    .bdrv_co_pwrite_zeroes      = nbd_client_co_pwrite_zeroes,
     .bdrv_close                 = nbd_close,
     .bdrv_co_flush_to_os        = nbd_co_flush,
     .bdrv_co_pdiscard           = nbd_client_co_pdiscard,
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH v4 06/14] nbd: Send message along with server NBD_REP_ERR errors
  2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 06/14] nbd: Send message along with server NBD_REP_ERR errors Eric Blake
@ 2016-06-27 12:02   ` Paolo Bonzini
  0 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2016-06-27 12:02 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: qemu-block



On 26/06/2016 00:15, Eric Blake wrote:
> 
> +/* Send an error reply.
> + * Return -errno to kill connection, 0 to continue negotiation. */
> +static int GCC_FMT_ATTR(4, 5)
> +    nbd_negotiate_send_rep_err(QIOChannel *ioc, uint32_t type,
> +                               uint32_t opt, const char *fmt, ...)
> +{
> +    va_list va;
> +    char *msg;
> +    int ret;
> +    size_t len;
> +
> +    va_start(va, fmt);
> +    msg = g_strdup_vprintf(fmt, va);

This leaks below.

Paolo

> +    va_end(va);
> +    len = strlen(msg);
> +    assert(len < 4096);
> +    TRACE("sending error message \"%s\"", msg);
> +    ret = nbd_negotiate_send_rep_len(ioc, type, opt, len);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    if (nbd_negotiate_write(ioc, msg, len) != len) {
> +        LOG("write failed (error message)");
> +        return -EINVAL;
> +    }
> +    return 0;

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH v4 14/14] nbd: Implement NBD_CMD_WRITE_ZEROES on client
  2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 14/14] nbd: Implement NBD_CMD_WRITE_ZEROES on client Eric Blake
@ 2016-06-27 12:13   ` Paolo Bonzini
  2016-06-27 13:00     ` Eric Blake
  0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2016-06-27 12:13 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: qemu-block, Kevin Wolf, Max Reitz



On 26/06/2016 00:15, Eric Blake wrote:
> diff --git a/block/nbd.c b/block/nbd.c
> index 8d57220..049d1bd 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -357,6 +357,7 @@ static int nbd_co_flush(BlockDriverState *bs)
>  static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
>  {
>      bs->bl.max_pdiscard = NBD_MAX_BUFFER_SIZE;
> +    bs->bl.max_pwrite_zeroes = NBD_MAX_BUFFER_SIZE;

I have probably asked before---is there any reason for these to be
limited, since the commands have no payload?

Thanks,

Paolo

>      bs->bl.max_transfer = NBD_MAX_BUFFER_SIZE;
>  }

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH v4 10/14] nbd: Less allocation during NBD_OPT_LIST
  2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 10/14] nbd: Less allocation during NBD_OPT_LIST Eric Blake
@ 2016-06-27 12:16   ` Paolo Bonzini
  2016-07-18 23:31     ` Eric Blake
  0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2016-06-27 12:16 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: qemu-block



On 26/06/2016 00:15, Eric Blake wrote:
> +/* Return -1 if unrecoverable error occurs
> , 0 if NBD_OPT_LIST is unsupported,

These two should return errp != NULL and negative errno.

> 1 if iteration is done
> 2 to keep looking,
> and 3 if * this entry matches @want. */

These three should return errp == NULL.  Please change the function so
that the return value signifies "need another call", and a bool*
argument is set to true if the name matches.

Paolo

> +static int nbd_receive_list(QIOChannel *ioc, const char *want, Error **errp)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH v4 14/14] nbd: Implement NBD_CMD_WRITE_ZEROES on client
  2016-06-27 12:13   ` Paolo Bonzini
@ 2016-06-27 13:00     ` Eric Blake
  0 siblings, 0 replies; 20+ messages in thread
From: Eric Blake @ 2016-06-27 13:00 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel
  Cc: qemu-block, Kevin Wolf, Max Reitz, nbd-general list

[-- Attachment #1: Type: text/plain, Size: 1834 bytes --]

On 06/27/2016 06:13 AM, Paolo Bonzini wrote:
> 
> 
> On 26/06/2016 00:15, Eric Blake wrote:
>> diff --git a/block/nbd.c b/block/nbd.c
>> index 8d57220..049d1bd 100644
>> --- a/block/nbd.c
>> +++ b/block/nbd.c
>> @@ -357,6 +357,7 @@ static int nbd_co_flush(BlockDriverState *bs)
>>  static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
>>  {
>>      bs->bl.max_pdiscard = NBD_MAX_BUFFER_SIZE;
>> +    bs->bl.max_pwrite_zeroes = NBD_MAX_BUFFER_SIZE;
> 
> I have probably asked before---is there any reason for these to be
> limited, since the commands have no payload?

Here's the last time it was brought up on the nbd-general list [1].  We
have the potential BLOCK_SIZE handshake negotiation extension, where I
was proposing that the server can advertise its actual limits (rather
than the client having to guess them or rely on out-of-band information)
- and I was proposing that NBD_CMD_TRIM and NBD_CMD_WRITE_ZEROES should
be permitted to advertise additional limits that are larger than the
NBD_CMD_WRITE limit, precisely because they don't carry a payload and
can therefore be more efficient if done in bulk.

[1] https://sourceforge.net/p/nbd/mailman/message/35081223/

But at the time of that thread, there was concern expressed whether
adding and additional NBD_INFO for each NBD_CMD limit would scale well,
or whether we need a different approach, and I haven't revisited the
thread since that comment.  At any rate, I have BLOCK_SIZE patches ready
for qemu, once the WRITE_ZERO patches land, and where it should be easy
to make this limit runtime-settable to a larger value from actual server
limits, if we can decide how BLOCK_SIZE should advertise such a limit.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH v4 10/14] nbd: Less allocation during NBD_OPT_LIST
  2016-06-27 12:16   ` Paolo Bonzini
@ 2016-07-18 23:31     ` Eric Blake
  0 siblings, 0 replies; 20+ messages in thread
From: Eric Blake @ 2016-07-18 23:31 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: qemu-block

[-- Attachment #1: Type: text/plain, Size: 1696 bytes --]

On 06/27/2016 06:16 AM, Paolo Bonzini wrote:
> 
> 
> On 26/06/2016 00:15, Eric Blake wrote:
>> +/* Return -1 if unrecoverable error occurs
>> , 0 if NBD_OPT_LIST is unsupported,
> 
> These two should return errp != NULL and negative errno.

Not quite.  It returns 0 if nbd_handle_reply_err() detected an
unsupported command, in which case errp is unset.  And the caller
doesn't really care what value is sent (the return value is not -errno,
merely negative).

> 
>> 1 if iteration is done
>> 2 to keep looking,
>> and 3 if * this entry matches @want. */
> 
> These three should return errp == NULL.  Please change the function so
> that the return value signifies "need another call", and a bool*
> argument is set to true if the name matches.

By that argument, you want:

old code					 new code
-1, errp set - done iterating			 -1, *match unspecified
0, unsupported - done iterating, assume match	 0, *match set
1, done iterating, match determined earlier	 0, *match unchanged
2, still iterating, this round didn't match	 1, *match unchanged
3, still iterating, this round matched		 1, *match set

where the caller starts with *match clear, iterates as long as the
result is positive, then when the result is -1 returns an error, or when
the result is 0 it returns the current setting in *match.

I can try that alternative flow, and will post an update shortly.  I
still think this is worth having in 2.7 if it is not too late (as it was
originally posted before soft freeze), but if it misses hard freeze
tomorrow, then it is 2.8 material.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2016-07-18 23:31 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-25 22:15 [Qemu-devel] [PATCH v4 00/14] nbd: efficient write zeroes Eric Blake
2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 01/14] nbd: Fix bad flag detection on server Eric Blake
2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 02/14] nbd: Add qemu-nbd -D for human-readable description Eric Blake
2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 03/14] nbd: Limit nbdflags to 16 bits Eric Blake
2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 04/14] nbd: Treat flags vs. command type as separate fields Eric Blake
2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 05/14] nbd: Share common reply-sending code in server Eric Blake
2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 06/14] nbd: Send message along with server NBD_REP_ERR errors Eric Blake
2016-06-27 12:02   ` Paolo Bonzini
2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 07/14] nbd: Share common option-sending code in client Eric Blake
2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 08/14] nbd: Let server know when client gives up negotiation Eric Blake
2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 09/14] nbd: Let client skip portions of server reply Eric Blake
2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 10/14] nbd: Less allocation during NBD_OPT_LIST Eric Blake
2016-06-27 12:16   ` Paolo Bonzini
2016-07-18 23:31     ` Eric Blake
2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 11/14] nbd: Support shorter handshake Eric Blake
2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 12/14] nbd: Improve server handling of shutdown requests Eric Blake
2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server Eric Blake
2016-06-25 22:15 ` [Qemu-devel] [PATCH v4 14/14] nbd: Implement NBD_CMD_WRITE_ZEROES on client Eric Blake
2016-06-27 12:13   ` Paolo Bonzini
2016-06-27 13:00     ` Eric Blake

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.