All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v4 0/8] Implement NBD_OPT_GO, block size advertisement
@ 2017-02-21  2:42 Eric Blake
  2017-02-21  2:42 ` [Qemu-devel] [PATCH v4 1/8] nbd/client: fix drop_sync [CVE-2017-2630] Eric Blake
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: Eric Blake @ 2017-02-21  2:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, pbonzini, vsementsov, den

A bit later than I planned, but still in time for soft freeze if
we like it. The NBD protocol has a proposed extension that fixes
several shortcomings with NBD_OPT_EXPORT_NAME (namely, no error
reporting, no way for the server to advertise block sizes to the
client):
https://github.com/NetworkBlockDevice/nbd/blob/extension-info/doc/proto.md

This sort of overlaps with the work Vladimir is proposing for
implementing NBD_CMD_BLOCK_STATUS (in fact, I included his CVE
fix because it was my bug that introduced the hole), but an initial
version of my work has been on the list a lot longer; where it was
posted as v3 of a larger series (see patches 37-44):
https://lists.gnu.org/archive/html/qemu-devel/2016-04/msg03526.html

Difference since v3 (if it is even worth comparing):

001/8:[down] 'nbd/client: fix drop_sync [CVE-2017-2630]'
002/8:[0034] [FC] 'nbd: Create struct for tracking export info'
003/8:[down] 'block: Add blk_get_opt_transfer()'
004/8:[down] 'nbd: Expose and debug more NBD constants'
005/8:[0187] [FC] 'nbd: Implement NBD_OPT_GO on server'
006/8:[0030] [FC] 'nbd: Implement NBD_OPT_GO on client'
007/8:[down] 'nbd: Implement NBD_INFO_BLOCK_SIZE on server'
008/8:[down] 'nbd: Implement NBD_INFO_BLOCK_SIZE on client'

I built this on top of my blkdebug enhancements for testing
purposes; I did not test this one without that but the two
series should be relatively orthogonal:
https://lists.gnu.org/archive/html/qemu-devel/2017-02/msg03042.html

Also available as a tag at this location:
git fetch git://repo.or.cz/qemu/ericb.git nbd-go-v4

Eric Blake (7):
  nbd: Create struct for tracking export info
  block: Add blk_get_opt_transfer()
  nbd: Expose and debug more NBD constants
  nbd: Implement NBD_OPT_GO on server
  nbd: Implement NBD_OPT_GO on client
  nbd: Implement NBD_INFO_BLOCK_SIZE on server
  nbd: Implement NBD_INFO_BLOCK_SIZE on client

Vladimir Sementsov-Ogievskiy (1):
  nbd/client: fix drop_sync [CVE-2017-2630]

 block/nbd-client.h             |   3 +-
 include/block/nbd.h            |  55 ++++++--
 include/sysemu/block-backend.h |   1 +
 nbd/nbd-internal.h             |  12 +-
 block/block-backend.c          |  12 ++
 block/nbd-client.c             |  22 +--
 block/nbd.c                    |  16 ++-
 nbd/client.c                   | 299 ++++++++++++++++++++++++++++++++++-------
 nbd/common.c                   |  69 ++++++++++
 nbd/server.c                   | 264 +++++++++++++++++++++++++++++++++---
 qemu-nbd.c                     |  10 +-
 11 files changed, 656 insertions(+), 107 deletions(-)

-- 
2.9.3

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v4 1/8] nbd/client: fix drop_sync [CVE-2017-2630]
  2017-02-21  2:42 [Qemu-devel] [PATCH v4 0/8] Implement NBD_OPT_GO, block size advertisement Eric Blake
@ 2017-02-21  2:42 ` Eric Blake
  2017-02-21  8:11   ` Marc-André Lureau
  2017-02-21  2:42 ` [Qemu-devel] [PATCH v4 2/8] nbd: Create struct for tracking export info Eric Blake
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Eric Blake @ 2017-02-21  2:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, pbonzini, vsementsov, den

From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

Comparison symbol is misused. It may lead to memory corruption.
Introduced in commit 7d3123e.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20170203154757.36140-6-vsementsov@virtuozzo.com>
[eblake: add CVE details]
Signed-off-by: Eric Blake <eblake@redhat.com>
---
 nbd/client.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/nbd/client.c b/nbd/client.c
index ffb0743..0d16cd1 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -94,7 +94,7 @@ static ssize_t drop_sync(QIOChannel *ioc, size_t size)
     char small[1024];
     char *buffer;

-    buffer = sizeof(small) < size ? small : g_malloc(MIN(65536, size));
+    buffer = sizeof(small) > size ? small : g_malloc(MIN(65536, size));
     while (size > 0) {
         ssize_t count = read_sync(ioc, buffer, MIN(65536, size));

-- 
2.9.3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v4 2/8] nbd: Create struct for tracking export info
  2017-02-21  2:42 [Qemu-devel] [PATCH v4 0/8] Implement NBD_OPT_GO, block size advertisement Eric Blake
  2017-02-21  2:42 ` [Qemu-devel] [PATCH v4 1/8] nbd/client: fix drop_sync [CVE-2017-2630] Eric Blake
@ 2017-02-21  2:42 ` Eric Blake
  2017-02-21  8:11   ` Marc-André Lureau
  2017-02-21  2:42 ` [Qemu-devel] [PATCH v4 3/8] block: Add blk_get_opt_transfer() Eric Blake
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Eric Blake @ 2017-02-21  2:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, pbonzini, vsementsov, den, Kevin Wolf, Max Reitz

The NBD Protocol is introducing some additional information
about exports, such as minimum request size and alignment, as
well as an advertised maximum request size.  It will be easier
to feed this information back to the block layer if we gather
all the information into a struct, rather than adding yet more
pointer parameters during negotiation.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v4: rebase to master
v3: new patch
---
 block/nbd-client.h  |  3 +--
 include/block/nbd.h | 15 +++++++++++----
 block/nbd-client.c  | 18 ++++++++----------
 block/nbd.c         |  2 +-
 nbd/client.c        | 47 +++++++++++++++++++++++++----------------------
 qemu-nbd.c          | 10 ++++------
 6 files changed, 50 insertions(+), 45 deletions(-)

diff --git a/block/nbd-client.h b/block/nbd-client.h
index f8d6006..098b65c 100644
--- a/block/nbd-client.h
+++ b/block/nbd-client.h
@@ -20,8 +20,7 @@
 typedef struct NBDClientSession {
     QIOChannelSocket *sioc; /* The master data channel */
     QIOChannel *ioc; /* The current I/O channel which may differ (eg TLS) */
-    uint16_t nbdflags;
-    off_t size;
+    NBDExportInfo info;

     CoMutex send_mutex;
     CoQueue free_sema;
diff --git a/include/block/nbd.h b/include/block/nbd.h
index 3e373f0..8cc9cbe 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -123,16 +123,23 @@ enum {
  * aren't overflowing some other buffer. */
 #define NBD_MAX_NAME_SIZE 256

+/* Details collected by NBD_OPT_EXPORT_NAME and NBD_OPT_GO */
+struct NBDExportInfo {
+    uint64_t size;
+    uint16_t flags;
+};
+typedef struct NBDExportInfo NBDExportInfo;
+
 ssize_t nbd_wr_syncv(QIOChannel *ioc,
                      struct iovec *iov,
                      size_t niov,
                      size_t length,
                      bool do_read);
-int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
+int nbd_receive_negotiate(QIOChannel *ioc, const char *name,
                           QCryptoTLSCreds *tlscreds, const char *hostname,
-                          QIOChannel **outioc,
-                          off_t *size, Error **errp);
-int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t flags, off_t size);
+                          QIOChannel **outioc, NBDExportInfo *info,
+                          Error **errp);
+int nbd_init(int fd, QIOChannelSocket *sioc, NBDExportInfo *info);
 ssize_t nbd_send_request(QIOChannel *ioc, NBDRequest *request);
 ssize_t nbd_receive_reply(QIOChannel *ioc, NBDReply *reply);
 int nbd_client(int fd);
diff --git a/block/nbd-client.c b/block/nbd-client.c
index 06f1532..32d7c90 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -258,7 +258,7 @@ int nbd_client_co_pwritev(BlockDriverState *bs, uint64_t offset,
     ssize_t ret;

     if (flags & BDRV_REQ_FUA) {
-        assert(client->nbdflags & NBD_FLAG_SEND_FUA);
+        assert(client->info.flags & NBD_FLAG_SEND_FUA);
         request.flags |= NBD_CMD_FLAG_FUA;
     }

@@ -287,12 +287,12 @@ int nbd_client_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
     };
     NBDReply reply;

-    if (!(client->nbdflags & NBD_FLAG_SEND_WRITE_ZEROES)) {
+    if (!(client->info.flags & NBD_FLAG_SEND_WRITE_ZEROES)) {
         return -ENOTSUP;
     }

     if (flags & BDRV_REQ_FUA) {
-        assert(client->nbdflags & NBD_FLAG_SEND_FUA);
+        assert(client->info.flags & NBD_FLAG_SEND_FUA);
         request.flags |= NBD_CMD_FLAG_FUA;
     }
     if (!(flags & BDRV_REQ_MAY_UNMAP)) {
@@ -317,7 +317,7 @@ int nbd_client_co_flush(BlockDriverState *bs)
     NBDReply reply;
     ssize_t ret;

-    if (!(client->nbdflags & NBD_FLAG_SEND_FLUSH)) {
+    if (!(client->info.flags & NBD_FLAG_SEND_FLUSH)) {
         return 0;
     }

@@ -346,7 +346,7 @@ int nbd_client_co_pdiscard(BlockDriverState *bs, int64_t offset, int count)
     NBDReply reply;
     ssize_t ret;

-    if (!(client->nbdflags & NBD_FLAG_SEND_TRIM)) {
+    if (!(client->info.flags & NBD_FLAG_SEND_TRIM)) {
         return 0;
     }

@@ -405,19 +405,17 @@ int nbd_client_init(BlockDriverState *bs,
     qio_channel_set_blocking(QIO_CHANNEL(sioc), true, NULL);

     ret = nbd_receive_negotiate(QIO_CHANNEL(sioc), export,
-                                &client->nbdflags,
                                 tlscreds, hostname,
-                                &client->ioc,
-                                &client->size, errp);
+                                &client->ioc, &client->info, errp);
     if (ret < 0) {
         logout("Failed to negotiate with the NBD server\n");
         return ret;
     }
-    if (client->nbdflags & NBD_FLAG_SEND_FUA) {
+    if (client->info.flags & NBD_FLAG_SEND_FUA) {
         bs->supported_write_flags = BDRV_REQ_FUA;
         bs->supported_zero_flags |= BDRV_REQ_FUA;
     }
-    if (client->nbdflags & NBD_FLAG_SEND_WRITE_ZEROES) {
+    if (client->info.flags & NBD_FLAG_SEND_WRITE_ZEROES) {
         bs->supported_zero_flags |= BDRV_REQ_MAY_UNMAP;
     }

diff --git a/block/nbd.c b/block/nbd.c
index 35f24be..c43fa35 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -485,7 +485,7 @@ static int64_t nbd_getlength(BlockDriverState *bs)
 {
     BDRVNBDState *s = bs->opaque;

-    return s->client.size;
+    return s->client.info.size;
 }

 static void nbd_detach_aio_context(BlockDriverState *bs)
diff --git a/nbd/client.c b/nbd/client.c
index 0d16cd1..69f0e09 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -454,13 +454,13 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
 }


-int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
+int nbd_receive_negotiate(QIOChannel *ioc, const char *name,
                           QCryptoTLSCreds *tlscreds, const char *hostname,
-                          QIOChannel **outioc,
-                          off_t *size, Error **errp)
+                          QIOChannel **outioc, NBDExportInfo *info,
+                          Error **errp)
 {
     char buf[256];
-    uint64_t magic, s;
+    uint64_t magic;
     int rc;
     bool zeroes = true;

@@ -573,17 +573,19 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
         }

         /* Read the response */
-        if (read_sync(ioc, &s, sizeof(s)) != sizeof(s)) {
+        if (read_sync(ioc, &info->size, sizeof(info->size)) !=
+            sizeof(info->size)) {
             error_setg(errp, "Failed to read export length");
             goto fail;
         }
-        *size = be64_to_cpu(s);
+        be64_to_cpus(&info->size);

-        if (read_sync(ioc, flags, sizeof(*flags)) != sizeof(*flags)) {
+        if (read_sync(ioc, &info->flags, sizeof(info->flags)) !=
+            sizeof(info->flags)) {
             error_setg(errp, "Failed to read export flags");
             goto fail;
         }
-        be16_to_cpus(flags);
+        be16_to_cpus(&info->flags);
     } else if (magic == NBD_CLIENT_MAGIC) {
         uint32_t oldflags;

@@ -596,12 +598,12 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
             goto fail;
         }

-        if (read_sync(ioc, &s, sizeof(s)) != sizeof(s)) {
+        if (read_sync(ioc, &info->size, sizeof(info->size)) !=
+            sizeof(info->size)) {
             error_setg(errp, "Failed to read export length");
             goto fail;
         }
-        *size = be64_to_cpu(s);
-        TRACE("Size is %" PRIu64, *size);
+        be64_to_cpus(&info->size);

         if (read_sync(ioc, &oldflags, sizeof(oldflags)) != sizeof(oldflags)) {
             error_setg(errp, "Failed to read export flags");
@@ -612,13 +614,14 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
             error_setg(errp, "Unexpected export flags %0x" PRIx32, oldflags);
             goto fail;
         }
-        *flags = oldflags;
+        info->flags = oldflags;
     } else {
         error_setg(errp, "Bad magic received");
         goto fail;
     }

-    TRACE("Size is %" PRIu64 ", export flags %" PRIx16, *size, *flags);
+    TRACE("Size is %" PRIu64 ", export flags %" PRIx16,
+          info->size, info->flags);
     if (zeroes && drop_sync(ioc, 124) != 124) {
         error_setg(errp, "Failed to read reserved block");
         goto fail;
@@ -630,11 +633,11 @@ fail:
 }

 #ifdef __linux__
-int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t flags, off_t size)
+int nbd_init(int fd, QIOChannelSocket *sioc, NBDExportInfo *info)
 {
-    unsigned long sectors = size / BDRV_SECTOR_SIZE;
-    if (size / BDRV_SECTOR_SIZE != sectors) {
-        LOG("Export size %lld too large for 32-bit kernel", (long long) size);
+    unsigned long sectors = info->size / BDRV_SECTOR_SIZE;
+    if (info->size / BDRV_SECTOR_SIZE != sectors) {
+        LOG("Export size %" PRId64 " too large for 32-bit kernel", info->size);
         return -E2BIG;
     }

@@ -655,9 +658,9 @@ int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t flags, off_t size)
     }

     TRACE("Setting size to %lu block(s)", sectors);
-    if (size % BDRV_SECTOR_SIZE) {
+    if (info->size % BDRV_SECTOR_SIZE) {
         TRACE("Ignoring trailing %d bytes of export",
-              (int) (size % BDRV_SECTOR_SIZE));
+              (int) (info->size % BDRV_SECTOR_SIZE));
     }

     if (ioctl(fd, NBD_SET_SIZE_BLOCKS, sectors) < 0) {
@@ -666,9 +669,9 @@ int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t flags, off_t size)
         return -serrno;
     }

-    if (ioctl(fd, NBD_SET_FLAGS, (unsigned long) flags) < 0) {
+    if (ioctl(fd, NBD_SET_FLAGS, (unsigned long) info->flags) < 0) {
         if (errno == ENOTTY) {
-            int read_only = (flags & NBD_FLAG_READ_ONLY) != 0;
+            int read_only = (info->flags & NBD_FLAG_READ_ONLY) != 0;
             TRACE("Setting readonly attribute");

             if (ioctl(fd, BLKROSET, (unsigned long) &read_only) < 0) {
@@ -726,7 +729,7 @@ int nbd_disconnect(int fd)
 }

 #else
-int nbd_init(int fd, QIOChannelSocket *ioc, uint16_t flags, off_t size)
+int nbd_init(int fd, QIOChannelSocket *ioc, NBDExportInfo *info)
 {
     return -ENOTSUP;
 }
diff --git a/qemu-nbd.c b/qemu-nbd.c
index e4fede6..c598cbe 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -254,8 +254,7 @@ static void *show_parts(void *arg)
 static void *nbd_client_thread(void *arg)
 {
     char *device = arg;
-    off_t size;
-    uint16_t nbdflags;
+    NBDExportInfo info;
     QIOChannelSocket *sioc;
     int fd;
     int ret;
@@ -270,9 +269,8 @@ static void *nbd_client_thread(void *arg)
         goto out;
     }

-    ret = nbd_receive_negotiate(QIO_CHANNEL(sioc), NULL, &nbdflags,
-                                NULL, NULL, NULL,
-                                &size, &local_error);
+    ret = nbd_receive_negotiate(QIO_CHANNEL(sioc), NULL,
+                                NULL, NULL, NULL, &info, &local_error);
     if (ret < 0) {
         if (local_error) {
             error_report_err(local_error);
@@ -287,7 +285,7 @@ static void *nbd_client_thread(void *arg)
         goto out_socket;
     }

-    ret = nbd_init(fd, sioc, nbdflags, size);
+    ret = nbd_init(fd, sioc, &info);
     if (ret < 0) {
         goto out_fd;
     }
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v4 3/8] block: Add blk_get_opt_transfer()
  2017-02-21  2:42 [Qemu-devel] [PATCH v4 0/8] Implement NBD_OPT_GO, block size advertisement Eric Blake
  2017-02-21  2:42 ` [Qemu-devel] [PATCH v4 1/8] nbd/client: fix drop_sync [CVE-2017-2630] Eric Blake
  2017-02-21  2:42 ` [Qemu-devel] [PATCH v4 2/8] nbd: Create struct for tracking export info Eric Blake
@ 2017-02-21  2:42 ` Eric Blake
  2017-02-21  8:11   ` Marc-André Lureau
  2017-02-21  2:42 ` [Qemu-devel] [PATCH v4 4/8] nbd: Expose and debug more NBD constants Eric Blake
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Eric Blake @ 2017-02-21  2:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, pbonzini, vsementsov, den, Kevin Wolf, Max Reitz

The NBD protocol would like to advertise the optimal I/O
size to the client; but it would be a layering violation to
peek into blk_bs(blk)->bl, when we only have a BB.

This copies the existing blk_get_max_transfer() in reading
a value from the top BDS; where that value was picked via
bdrv_refresh_limits() to reflect the overall constraints of
the entire BDS chain.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v4: retitle, as part of rebasing to byte limits
v3: new patch
---
 include/sysemu/block-backend.h |  1 +
 block/block-backend.c          | 12 ++++++++++++
 2 files changed, 13 insertions(+)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 6444e41..882480a 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -174,6 +174,7 @@ void blk_lock_medium(BlockBackend *blk, bool locked);
 void blk_eject(BlockBackend *blk, bool eject_flag);
 int blk_get_flags(BlockBackend *blk);
 uint32_t blk_get_max_transfer(BlockBackend *blk);
+uint32_t blk_get_opt_transfer(BlockBackend *blk);
 int blk_get_max_iov(BlockBackend *blk);
 void blk_set_guest_block_size(BlockBackend *blk, int align);
 void *blk_try_blockalign(BlockBackend *blk, size_t size);
diff --git a/block/block-backend.c b/block/block-backend.c
index efbf398..4d91ff8 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1426,6 +1426,18 @@ uint32_t blk_get_max_transfer(BlockBackend *blk)
     return MIN_NON_ZERO(max, INT_MAX);
 }

+/* Returns the optimum transfer length, in bytes; may be 0 if no optimum */
+uint32_t blk_get_opt_transfer(BlockBackend *blk)
+{
+    BlockDriverState *bs = blk_bs(blk);
+
+    if (bs) {
+        return bs->bl.opt_transfer;
+    } else {
+        return 0;
+    }
+}
+
 int blk_get_max_iov(BlockBackend *blk)
 {
     return blk->root->bs->bl.max_iov;
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v4 4/8] nbd: Expose and debug more NBD constants
  2017-02-21  2:42 [Qemu-devel] [PATCH v4 0/8] Implement NBD_OPT_GO, block size advertisement Eric Blake
                   ` (2 preceding siblings ...)
  2017-02-21  2:42 ` [Qemu-devel] [PATCH v4 3/8] block: Add blk_get_opt_transfer() Eric Blake
@ 2017-02-21  2:42 ` Eric Blake
  2017-02-21  8:12   ` Marc-André Lureau
  2017-02-21  2:42 ` [Qemu-devel] [PATCH v4 5/8] nbd: Implement NBD_OPT_GO on server Eric Blake
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Eric Blake @ 2017-02-21  2:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, pbonzini, vsementsov, den, Kevin Wolf, Max Reitz

The NBD protocol has several constants defined in various extensions
that we are about to implement.  Expose them to the code, along with
an easy way to map various constants to strings during diagnostic
messages.

Doing this points out a debug message in server.c that got
parameters mixed up.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v4: new patch
---
 include/block/nbd.h | 34 +++++++++++++++++++-------
 nbd/nbd-internal.h  |  9 +++----
 nbd/client.c        | 56 ++++++++++++++++++++++++++++---------------
 nbd/common.c        | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 nbd/server.c        | 17 +++++++------
 5 files changed, 145 insertions(+), 40 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 8cc9cbe..c84e022 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -1,5 +1,5 @@
 /*
- *  Copyright (C) 2016 Red Hat, Inc.
+ *  Copyright (C) 2016-2017 Red Hat, Inc.
  *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
  *
  *  Network Block Device
@@ -83,18 +83,36 @@ typedef struct NBDReply NBDReply;
 #define NBD_FLAG_C_FIXED_NEWSTYLE (1 << 0) /* Fixed newstyle protocol. */
 #define NBD_FLAG_C_NO_ZEROES      (1 << 1) /* End handshake without zeroes. */

-/* Reply types. */
+/* Option requests. */
+#define NBD_OPT_EXPORT_NAME     (1)
+#define NBD_OPT_ABORT           (2)
+#define NBD_OPT_LIST            (3)
+/* #define NBD_OPT_PEEK_EXPORT  (4) not in use */
+#define NBD_OPT_STARTTLS        (5)
+#define NBD_OPT_INFO            (6)
+#define NBD_OPT_GO              (7)
+
+/* Option reply types. */
 #define NBD_REP_ERR(value) ((UINT32_C(1) << 31) | (value))

 #define NBD_REP_ACK             (1)             /* Data sending finished. */
 #define NBD_REP_SERVER          (2)             /* Export description. */
+#define NBD_REP_INFO            (3)             /* NBD_OPT_INFO/GO. */

-#define NBD_REP_ERR_UNSUP       NBD_REP_ERR(1)  /* Unknown option */
-#define NBD_REP_ERR_POLICY      NBD_REP_ERR(2)  /* Server denied */
-#define NBD_REP_ERR_INVALID     NBD_REP_ERR(3)  /* Invalid length */
-#define NBD_REP_ERR_PLATFORM    NBD_REP_ERR(4)  /* Not compiled in */
-#define NBD_REP_ERR_TLS_REQD    NBD_REP_ERR(5)  /* TLS required */
-#define NBD_REP_ERR_SHUTDOWN    NBD_REP_ERR(7)  /* Server shutting down */
+#define NBD_REP_ERR_UNSUP           NBD_REP_ERR(1)  /* Unknown option */
+#define NBD_REP_ERR_POLICY          NBD_REP_ERR(2)  /* Server denied */
+#define NBD_REP_ERR_INVALID         NBD_REP_ERR(3)  /* Invalid length */
+#define NBD_REP_ERR_PLATFORM        NBD_REP_ERR(4)  /* Not compiled in */
+#define NBD_REP_ERR_TLS_REQD        NBD_REP_ERR(5)  /* TLS required */
+#define NBD_REP_ERR_UNKNOWN         NBD_REP_ERR(6)  /* Export unknown */
+#define NBD_REP_ERR_SHUTDOWN        NBD_REP_ERR(7)  /* Server shutting down */
+#define NBD_REP_ERR_BLOCK_SIZE_REQD NBD_REP_ERR(8)  /* Need INFO_BLOCK_SIZE */
+
+/* Info types, used during NBD_REP_INFO */
+#define NBD_INFO_EXPORT         0
+#define NBD_INFO_NAME           1
+#define NBD_INFO_DESCRIPTION    2
+#define NBD_INFO_BLOCK_SIZE     3

 /* Request flags, sent from client to server during transmission phase */
 #define NBD_CMD_FLAG_FUA        (1 << 0) /* 'force unit access' during write */
diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index f43d990..aa5b2fd 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -76,12 +76,6 @@
 #define NBD_SET_TIMEOUT         _IO(0xab, 9)
 #define NBD_SET_FLAGS           _IO(0xab, 10)

-#define NBD_OPT_EXPORT_NAME     (1)
-#define NBD_OPT_ABORT           (2)
-#define NBD_OPT_LIST            (3)
-#define NBD_OPT_PEEK_EXPORT     (4)
-#define NBD_OPT_STARTTLS        (5)
-
 /* NBD errors are based on errno numbers, so there is a 1:1 mapping,
  * but only a limited set of errno values is specified in the protocol.
  * Everything else is squashed to EINVAL.
@@ -122,5 +116,8 @@ struct NBDTLSHandshakeData {

 void nbd_tls_handshake(QIOTask *task,
                        void *opaque);
+const char *nbd_opt_lookup(uint32_t opt);
+const char *nbd_rep_lookup(uint32_t rep);
+const char *nbd_info_lookup(uint16_t info);

 #endif
diff --git a/nbd/client.c b/nbd/client.c
index 69f0e09..f96539b 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -1,5 +1,5 @@
 /*
- *  Copyright (C) 2016 Red Hat, Inc.
+ *  Copyright (C) 2016-2017 Red Hat, Inc.
  *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
  *
  *  Network Block Device Client Side
@@ -130,7 +130,8 @@ static int nbd_send_option_request(QIOChannel *ioc, uint32_t opt,
     if (len == -1) {
         req.length = len = strlen(data);
     }
-    TRACE("Sending option request %" PRIu32", len %" PRIu32, opt, len);
+    TRACE("Sending option request %" PRIu32" (%s), len %" PRIu32, opt,
+          nbd_opt_lookup(opt), len);

     stq_be_p(&req.magic, NBD_OPTS_MAGIC);
     stl_be_p(&req.option, opt);
@@ -180,8 +181,10 @@ static int nbd_receive_option_reply(QIOChannel *ioc, uint32_t opt,
     be32_to_cpus(&reply->type);
     be32_to_cpus(&reply->length);

-    TRACE("Received option reply %" PRIx32", type %" PRIx32", len %" PRIu32,
-          reply->option, reply->type, reply->length);
+    TRACE("Received option reply %" PRIx32" (%s), type %" PRIx32
+          " (%s), len %" PRIu32,
+          reply->option, nbd_opt_lookup(reply->option),
+          reply->type, nbd_rep_lookup(reply->type), reply->length);

     if (reply->magic != NBD_REP_MAGIC) {
         error_setg(errp, "Unexpected option reply magic");
@@ -215,12 +218,16 @@ static int nbd_handle_reply_err(QIOChannel *ioc, nbd_opt_reply *reply,

     if (reply->length) {
         if (reply->length > NBD_MAX_BUFFER_SIZE) {
-            error_setg(errp, "server's error message is too long");
+            error_setg(errp, "server error 0x%" PRIx32
+                       " (%s) message is too long",
+                       reply->type, nbd_rep_lookup(reply->type));
             goto cleanup;
         }
         msg = g_malloc(reply->length + 1);
         if (read_sync(ioc, msg, reply->length) != reply->length) {
-            error_setg(errp, "failed to read option error message");
+            error_setg(errp, "failed to read option error 0x%" PRIx32
+                       " (%s) message",
+                       reply->type, nbd_rep_lookup(reply->type));
             goto cleanup;
         }
         msg[reply->length] = '\0';
@@ -229,38 +236,49 @@ static int nbd_handle_reply_err(QIOChannel *ioc, nbd_opt_reply *reply,
     switch (reply->type) {
     case NBD_REP_ERR_UNSUP:
         TRACE("server doesn't understand request %" PRIx32
-              ", attempting fallback", reply->option);
+              " (%s), attempting fallback",
+              reply->option, nbd_opt_lookup(reply->option));
         result = 0;
         goto cleanup;

     case NBD_REP_ERR_POLICY:
-        error_setg(errp, "Denied by server for option %" PRIx32,
-                   reply->option);
+        error_setg(errp, "Denied by server for option %" PRIx32 " (%s)",
+                   reply->option, nbd_opt_lookup(reply->option));
         break;

     case NBD_REP_ERR_INVALID:
-        error_setg(errp, "Invalid data length for option %" PRIx32,
-                   reply->option);
+        error_setg(errp, "Invalid data length for option %" PRIx32 " (%s)",
+                   reply->option, nbd_opt_lookup(reply->option));
         break;

     case NBD_REP_ERR_PLATFORM:
-        error_setg(errp, "Server lacks support for option %" PRIx32,
-                   reply->option);
+        error_setg(errp, "Server lacks support for option %" PRIx32 " (%s)",
+                   reply->option, nbd_opt_lookup(reply->option));
         break;

     case NBD_REP_ERR_TLS_REQD:
-        error_setg(errp, "TLS negotiation required before option %" PRIx32,
-                   reply->option);
+        error_setg(errp, "TLS negotiation required before option %" PRIx32
+                   " (%s)", reply->option, nbd_opt_lookup(reply->option));
+        break;
+
+    case NBD_REP_ERR_UNKNOWN:
+        error_setg(errp, "Requested export not available for option %" PRIx32
+                   " (%s)", reply->option, nbd_opt_lookup(reply->option));
         break;

     case NBD_REP_ERR_SHUTDOWN:
-        error_setg(errp, "Server shutting down before option %" PRIx32,
-                   reply->option);
+        error_setg(errp, "Server shutting down before option %" PRIx32 " (%s)",
+                   reply->option, nbd_opt_lookup(reply->option));
+        break;
+
+    case NBD_REP_ERR_BLOCK_SIZE_REQD:
+        error_setg(errp, "Server requires INFO_BLOCK_SIZE for option %" PRIx32
+                   " (%s)", reply->option, nbd_opt_lookup(reply->option));
         break;

     default:
-        error_setg(errp, "Unknown error code when asking for option %" PRIx32,
-                   reply->option);
+        error_setg(errp, "Unknown error code when asking for option %" PRIx32
+                   " (%s)", reply->option, nbd_opt_lookup(reply->option));
         break;
     }

diff --git a/nbd/common.c b/nbd/common.c
index a5f39ea..85622f9 100644
--- a/nbd/common.c
+++ b/nbd/common.c
@@ -89,3 +89,72 @@ void nbd_tls_handshake(QIOTask *task,
     data->complete = true;
     g_main_loop_quit(data->loop);
 }
+
+
+const char *nbd_opt_lookup(uint32_t opt)
+{
+    switch (opt) {
+    case NBD_OPT_EXPORT_NAME:
+        return "export name";
+    case NBD_OPT_ABORT:
+        return "abort";
+    case NBD_OPT_LIST:
+        return "list";
+    case NBD_OPT_STARTTLS:
+        return "starttls";
+    case NBD_OPT_INFO:
+        return "info";
+    case NBD_OPT_GO:
+        return "go";
+    default:
+        return "<unknown>";
+    }
+}
+
+
+const char *nbd_rep_lookup(uint32_t rep)
+{
+    switch (rep) {
+    case NBD_REP_ACK:
+        return "ack";
+    case NBD_REP_SERVER:
+        return "server";
+    case NBD_REP_INFO:
+        return "info";
+    case NBD_REP_ERR_UNSUP:
+        return "unsupported";
+    case NBD_REP_ERR_POLICY:
+        return "denied by policy";
+    case NBD_REP_ERR_INVALID:
+        return "invalid";
+    case NBD_REP_ERR_PLATFORM:
+        return "platform lacks support";
+    case NBD_REP_ERR_TLS_REQD:
+        return "TLS required";
+    case NBD_REP_ERR_UNKNOWN:
+        return "export unknown";
+    case NBD_REP_ERR_SHUTDOWN:
+        return "server shutting down";
+    case NBD_REP_ERR_BLOCK_SIZE_REQD:
+        return "block size required";
+    default:
+        return "<unknown>";
+    }
+}
+
+
+const char *nbd_info_lookup(uint16_t info)
+{
+    switch (info) {
+    case NBD_INFO_EXPORT:
+        return "export";
+    case NBD_INFO_NAME:
+        return "name";
+    case NBD_INFO_DESCRIPTION:
+        return "description";
+    case NBD_INFO_BLOCK_SIZE:
+        return "block size";
+    default:
+        return "<unknown>";
+    }
+}
diff --git a/nbd/server.c b/nbd/server.c
index efe5cb8..767ca0f 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1,5 +1,5 @@
 /*
- *  Copyright (C) 2016 Red Hat, Inc.
+ *  Copyright (C) 2016-2017 Red Hat, Inc.
  *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
  *
  *  Network Block Device Server Side
@@ -206,8 +206,8 @@ static int nbd_negotiate_send_rep_len(QIOChannel *ioc, uint32_t type,
 {
     uint64_t magic;

-    TRACE("Reply opt=%" PRIx32 " type=%" PRIx32 " len=%" PRIu32,
-          type, opt, len);
+    TRACE("Reply opt=%" PRIx32 " (%s), type=%" PRIx32 " (%s), len=%" PRIu32,
+          opt, nbd_opt_lookup(opt), type, nbd_rep_lookup(type), len);

     magic = cpu_to_be64(NBD_REP_MAGIC);
     if (nbd_negotiate_write(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
@@ -493,7 +493,8 @@ static int nbd_negotiate_options(NBDClient *client)
         }
         length = be32_to_cpu(length);

-        TRACE("Checking option 0x%" PRIx32, clientflags);
+        TRACE("Checking option 0x%" PRIx32 " (%s)", clientflags,
+              nbd_opt_lookup(clientflags));
         if (client->tlscreds &&
             client->ioc == (QIOChannel *)client->sioc) {
             QIOChannel *tioc;
@@ -581,8 +582,9 @@ static int nbd_negotiate_options(NBDClient *client)
                                                  NBD_REP_ERR_UNSUP,
                                                  clientflags,
                                                  "Unsupported option 0x%"
-                                                 PRIx32,
-                                                 clientflags);
+                                                 PRIx32 " (%s)",
+                                                 clientflags,
+                                                 nbd_opt_lookup(clientflags));
                 if (ret < 0) {
                     return ret;
                 }
@@ -598,7 +600,8 @@ static int nbd_negotiate_options(NBDClient *client)
                 return nbd_negotiate_handle_export_name(client, length);

             default:
-                TRACE("Unsupported option 0x%" PRIx32, clientflags);
+                TRACE("Unsupported option 0x%" PRIx32 " (%s)", clientflags,
+                      nbd_opt_lookup(clientflags));
                 return -EINVAL;
             }
         }
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v4 5/8] nbd: Implement NBD_OPT_GO on server
  2017-02-21  2:42 [Qemu-devel] [PATCH v4 0/8] Implement NBD_OPT_GO, block size advertisement Eric Blake
                   ` (3 preceding siblings ...)
  2017-02-21  2:42 ` [Qemu-devel] [PATCH v4 4/8] nbd: Expose and debug more NBD constants Eric Blake
@ 2017-02-21  2:42 ` Eric Blake
  2017-02-21  2:42 ` [Qemu-devel] [PATCH v4 6/8] nbd: Implement NBD_OPT_GO on client Eric Blake
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Eric Blake @ 2017-02-21  2:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, pbonzini, vsementsov, den

NBD_OPT_EXPORT_NAME is lousy: per the NBD protocol, any failure
requires us to close the connection rather than report an error.
Therefore, upstream NBD recently added NBD_OPT_GO as the improved
version of the option that does what we want [1], along with
NBD_OPT_INFO that returns the same information but does not
transition to transmission phase.

[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-info/doc/proto.md

This is a first cut at the information types, and only passes the
same information already available through NBD_OPT_LIST and
NBD_OPT_EXPORT_NAME; items like NBD_INFO_BLOCK_SIZE (and thus any
use of NBD_REP_ERR_BLOCK_SIZE_REQD) are intentionally left for
later patches.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v4: revamp to another round of NBD protocol changes
v3: revamp to match latest version of NBD protocol
---
 nbd/server.c | 208 +++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 195 insertions(+), 13 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 767ca0f..3b1a4a5 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -209,6 +209,7 @@ static int nbd_negotiate_send_rep_len(QIOChannel *ioc, uint32_t type,
     TRACE("Reply opt=%" PRIx32 " (%s), type=%" PRIx32 " (%s), len=%" PRIu32,
           opt, nbd_opt_lookup(opt), type, nbd_rep_lookup(type), len);

+    assert(len < NBD_MAX_BUFFER_SIZE);
     magic = cpu_to_be64(NBD_REP_MAGIC);
     if (nbd_negotiate_write(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
         LOG("write failed (rep magic)");
@@ -331,6 +332,8 @@ static int nbd_negotiate_handle_list(NBDClient *client, uint32_t length)
     return nbd_negotiate_send_rep(client->ioc, NBD_REP_ACK, NBD_OPT_LIST);
 }

+/* Send a reply to NBD_OPT_EXPORT_NAME.
+ * Return -errno on error, 0 on success. */
 static int nbd_negotiate_handle_export_name(NBDClient *client, uint32_t length)
 {
     int rc = -EINVAL;
@@ -365,6 +368,171 @@ fail:
     return rc;
 }

+/* Send a single NBD_REP_INFO, with a buffer @buf of @length bytes.
+ * The buffer does NOT include the info type prefix.
+ * Return -errno on error, 0 if ready to send more. */
+static int nbd_negotiate_send_info(NBDClient *client, uint32_t opt,
+                                   uint16_t info, uint32_t length, void *buf)
+{
+    int rc;
+
+    TRACE("Sending NBD_REP_INFO type %" PRIu16 " (%s) with remaining length %"
+          PRIu32, info, nbd_info_lookup(info), length);
+    rc = nbd_negotiate_send_rep_len(client->ioc, NBD_REP_INFO, opt,
+                                    sizeof(info) + length);
+    if (rc < 0) {
+        return rc;
+    }
+    cpu_to_be16s(&info);
+    if (nbd_negotiate_write(client->ioc, &info, sizeof(info)) !=
+        sizeof(info)) {
+        LOG("write failed");
+        return -EIO;
+    }
+    if (nbd_negotiate_write(client->ioc, buf, length) != length) {
+        LOG("write failed");
+        return -EIO;
+    }
+    return 0;
+}
+
+/* Handle NBD_OPT_INFO and NBD_OPT_GO.
+ * Return -errno on error, 0 if ready for next option, and 1 to move
+ * into transmission phase.  */
+static int nbd_negotiate_handle_info(NBDClient *client, uint32_t length,
+                                     uint32_t opt, uint16_t myflags)
+{
+    int rc;
+    char name[NBD_MAX_NAME_SIZE + 1];
+    NBDExport *exp;
+    uint16_t requests;
+    uint16_t request;
+    uint32_t namelen;
+    bool sendname = false;
+    char buf[sizeof(uint64_t) + sizeof(uint16_t)];
+    const char *msg;
+
+    /* Client sends:
+        2 bytes: N, number of requests (can be 0)
+        N * 2 bytes: N requests
+        4 bytes: L, name length (can be 0)
+        L bytes: export name
+    */
+    if (length < sizeof(requests) + sizeof(namelen)) {
+        msg = "overall request too short";
+        goto invalid;
+    }
+    if (nbd_negotiate_read(client->ioc, &requests, sizeof(requests)) !=
+        sizeof(requests)) {
+        LOG("read failed");
+        return -EIO;
+    }
+    be16_to_cpus(&requests);
+    length -= sizeof(requests);
+    TRACE("Client requested %d items of info", requests);
+    if (requests > (length - sizeof(namelen)) / sizeof(request)) {
+        msg = "too many requests for overall length";
+        goto invalid;
+    }
+    while (requests--) {
+        if (nbd_negotiate_read(client->ioc, &request, sizeof(request)) !=
+            sizeof(request)) {
+            LOG("read failed");
+            return -EIO;
+        }
+        be16_to_cpus(&request);
+        length -= sizeof(request);
+        TRACE("Client requested info %d (%s)", request,
+              nbd_info_lookup(request));
+        /* For now, we only care about NBD_INFO_NAME; everything else
+         * is either a request we don't know or something we send
+         * regardless of request. */
+        if (request == NBD_INFO_NAME) {
+            sendname = true;
+        }
+    }
+
+    if (nbd_negotiate_read(client->ioc, &namelen, sizeof(namelen)) !=
+        sizeof(namelen)) {
+        LOG("read failed");
+        return -EIO;
+    }
+    be32_to_cpus(&namelen);
+    length -= sizeof(namelen);
+    TRACE("Client requested namelen %u", namelen);
+    if (length != namelen || namelen > sizeof(name)) {
+        msg = "name too long";
+        goto invalid;
+    }
+    if (nbd_negotiate_read(client->ioc, name, length) != length) {
+        LOG("read failed");
+        return -EIO;
+    }
+    name[length] = '\0';
+
+    TRACE("Client requested info on export '%s'", name);
+
+    exp = nbd_export_find(name);
+    if (!exp) {
+        return nbd_negotiate_send_rep_err(client->ioc, NBD_REP_ERR_UNKNOWN,
+                                          opt, "export '%s' not present",
+                                          name);
+    }
+
+    /* Don't bother sending NBD_INFO_NAME unless client requested it */
+    if (sendname) {
+        rc = nbd_negotiate_send_info(client, opt, NBD_INFO_NAME, length, name);
+        if (rc < 0) {
+            return rc;
+        }
+    }
+
+    /* Send NBD_INFO_DESCRIPTION only if available, regardless of
+     * client request */
+    if (exp->description) {
+        size_t len = strlen(exp->description);
+
+        rc = nbd_negotiate_send_info(client, opt, NBD_INFO_DESCRIPTION,
+                                     len, exp->description);
+        if (rc < 0) {
+            return rc;
+        }
+    }
+
+    /* Send NBD_INFO_EXPORT always */
+    TRACE("advertising size %" PRIu64 " and flags %" PRIx16,
+          exp->size, exp->nbdflags | myflags);
+    stq_be_p(buf, exp->size);
+    stw_be_p(buf + 8, exp->nbdflags | myflags);
+    rc = nbd_negotiate_send_info(client, opt, NBD_INFO_EXPORT,
+                                 sizeof(buf), buf);
+    if (rc < 0) {
+        return rc;
+    }
+
+    /* Final reply */
+    rc = nbd_negotiate_send_rep(client->ioc, NBD_REP_ACK, opt);
+    if (rc < 0) {
+        return rc;
+    }
+
+    if (opt == NBD_OPT_GO) {
+        client->exp = exp;
+        QTAILQ_INSERT_TAIL(&client->exp->clients, client, next);
+        nbd_export_get(client->exp);
+        rc = 1;
+    }
+    return rc;
+
+ invalid:
+    if (nbd_negotiate_drop_sync(client->ioc, length) != length) {
+        return -EIO;
+    }
+    return nbd_negotiate_send_rep_err(client->ioc, NBD_REP_ERR_INVALID, opt,
+                                      "%s", msg);
+}
+
+
 /* Handle NBD_OPT_STARTTLS. Return NULL to drop connection, or else the
  * new channel for all further (now-encrypted) communication. */
 static QIOChannel *nbd_negotiate_handle_starttls(NBDClient *client,
@@ -420,9 +588,10 @@ static QIOChannel *nbd_negotiate_handle_starttls(NBDClient *client,
 }


-/* Process all NBD_OPT_* client option commands.
- * Return -errno on error, 0 on success. */
-static int nbd_negotiate_options(NBDClient *client)
+/* Process all NBD_OPT_* client option commands, during fixed newstyle
+ * negotiation. Return -errno on error, 0 on successful NBD_OPT_EXPORT_NAME,
+ * and 1 on successful NBD_OPT_GO. */
+static int nbd_negotiate_options(NBDClient *client, uint16_t myflags)
 {
     uint32_t flags;
     bool fixedNewstyle = false;
@@ -555,6 +724,16 @@ static int nbd_negotiate_options(NBDClient *client)
             case NBD_OPT_EXPORT_NAME:
                 return nbd_negotiate_handle_export_name(client, length);

+            case NBD_OPT_INFO:
+            case NBD_OPT_GO:
+                ret = nbd_negotiate_handle_info(client, length, clientflags,
+                                                myflags);
+                if (ret) {
+                    assert(ret < 0 || clientflags == NBD_OPT_GO);
+                    return ret;
+                }
+                break;
+
             case NBD_OPT_STARTTLS:
                 if (nbd_negotiate_drop_sync(client->ioc, length) != length) {
                     return -EIO;
@@ -675,20 +854,23 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
             LOG("write failed");
             goto fail;
         }
-        rc = nbd_negotiate_options(client);
-        if (rc != 0) {
+        rc = nbd_negotiate_options(client, myflags);
+        if (rc < 0) {
             LOG("option negotiation failed");
             goto fail;
         }

-        TRACE("advertising size %" PRIu64 " and flags %x",
-              client->exp->size, client->exp->nbdflags | myflags);
-        stq_be_p(buf + 18, client->exp->size);
-        stw_be_p(buf + 26, client->exp->nbdflags | myflags);
-        len = client->no_zeroes ? 10 : sizeof(buf) - 18;
-        if (nbd_negotiate_write(client->ioc, buf + 18, len) != len) {
-            LOG("write failed");
-            goto fail;
+        if (!rc) {
+            /* If options ended with NBD_OPT_GO, we already sent this. */
+            TRACE("advertising size %" PRIu64 " and flags %x",
+                  client->exp->size, client->exp->nbdflags | myflags);
+            stq_be_p(buf + 18, client->exp->size);
+            stw_be_p(buf + 26, client->exp->nbdflags | myflags);
+            len = client->no_zeroes ? 10 : sizeof(buf) - 18;
+            if (nbd_negotiate_write(client->ioc, buf + 18, len) != len) {
+                LOG("write failed");
+                goto fail;
+            }
         }
     }

-- 
2.9.3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v4 6/8] nbd: Implement NBD_OPT_GO on client
  2017-02-21  2:42 [Qemu-devel] [PATCH v4 0/8] Implement NBD_OPT_GO, block size advertisement Eric Blake
                   ` (4 preceding siblings ...)
  2017-02-21  2:42 ` [Qemu-devel] [PATCH v4 5/8] nbd: Implement NBD_OPT_GO on server Eric Blake
@ 2017-02-21  2:42 ` Eric Blake
  2017-02-21  2:42 ` [Qemu-devel] [PATCH v4 7/8] nbd: Implement NBD_INFO_BLOCK_SIZE on server Eric Blake
  2017-02-21  2:42 ` [Qemu-devel] [PATCH v4 8/8] nbd: Implement NBD_INFO_BLOCK_SIZE on client Eric Blake
  7 siblings, 0 replies; 19+ messages in thread
From: Eric Blake @ 2017-02-21  2:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, pbonzini, vsementsov, den

NBD_OPT_EXPORT_NAME is lousy: per the NBD protocol, any failure
requires the server to close the connection rather than report an
error to us.  Therefore, upstream NBD recently added NBD_OPT_GO as
the improved version of the option that does what we want [1]: it
reports sane errors on failures, and on success provides at least
as much info as NBD_OPT_EXPORT_NAME.

[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-info/doc/proto.md

This is a first cut at use of the information types.  Note that we
do not need to use NBD_OPT_INFO, and that use of NBD_OPT_GO means
we no longer have to use NBD_OPT_LIST to learn whether a server
requires TLS (this requires servers that gracefully handle unknown
NBD_OPT, many servers prior to qemu 2.5 were buggy, but I have patched
qemu, upstream nbd, and nbdkit in the meantime, in part because of
interoperability testing with this patch).  We still fall back to
NBD_OPT_LIST when NBD_OPT_GO is not supported on the server, as it
is still one last chance for a nicer error message.  Later patches
will use further info, like NBD_INFO_BLOCK_SIZE.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v4: NBD protocol changes, again
v3: revamp to match latest version of NBD protocol
---
 nbd/nbd-internal.h |   3 ++
 nbd/client.c       | 130 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 131 insertions(+), 2 deletions(-)

diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index aa5b2fd..96c204b 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -56,8 +56,11 @@
  * https://github.com/yoe/nbd/blob/master/doc/proto.md
  */

+/* Size of all NBD_OPT_*, without payload */
 #define NBD_REQUEST_SIZE        (4 + 2 + 2 + 8 + 8 + 4)
+/* Size of all NBD_REP_* sent in answer to most NBD_OPT_*, without payload */
 #define NBD_REPLY_SIZE          (4 + 4 + 8)
+
 #define NBD_REQUEST_MAGIC       0x25609513
 #define NBD_REPLY_MAGIC         0x67446698
 #define NBD_OPTS_MAGIC          0x49484156454F5054LL
diff --git a/nbd/client.c b/nbd/client.c
index f96539b..b408945 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -380,6 +380,118 @@ static int nbd_receive_list(QIOChannel *ioc, const char *want, bool *match,
 }


+/* Returns -1 if NBD_OPT_GO proves the export @wantname cannot be
+ * used, 0 if NBD_OPT_GO is unsupported (fall back to NBD_OPT_LIST and
+ * NBD_OPT_EXPORT_NAME in that case), and > 0 if the export is good to
+ * go (with @info populated). */
+static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
+                      NBDExportInfo *info, Error **errp)
+{
+    nbd_opt_reply reply;
+    uint32_t len = strlen(wantname);
+    uint16_t type;
+    int error;
+    char *buf;
+
+    /* The protocol requires that the server send NBD_INFO_EXPORT with
+     * a non-zero flags (at least NBD_FLAG_HAS_FLAGS must be set); so
+     * flags still 0 is a witness of a broken server. */
+    info->flags = 0;
+
+    TRACE("Attempting NBD_OPT_GO for export '%s'", wantname);
+    buf = g_malloc(2 + 4 + len + 1);
+    stw_be_p(buf, 0); /* No requests, live with whatever server sends */
+    stl_be_p(buf + 2, len);
+    memcpy(buf + 6, wantname, len);
+    if (nbd_send_option_request(ioc, NBD_OPT_GO, len + 6, buf, errp) < 0) {
+        return -1;
+    }
+
+    TRACE("Reading export info");
+    while (1) {
+        if (nbd_receive_option_reply(ioc, NBD_OPT_GO, &reply, errp) < 0) {
+            return -1;
+        }
+        error = nbd_handle_reply_err(ioc, &reply, errp);
+        if (error <= 0) {
+            return error;
+        }
+        len = reply.length;
+
+        if (reply.type == NBD_REP_ACK) {
+            /* Server is done sending info and moved into transmission
+               phase, but make sure it sent flags */
+            if (len) {
+                error_setg(errp, "server sent invalid NBD_REP_ACK");
+                nbd_send_opt_abort(ioc);
+                return -1;
+            }
+            if (!info->flags) {
+                error_setg(errp, "broken server omitted NBD_INFO_EXPORT");
+                nbd_send_opt_abort(ioc);
+                return -1;
+            }
+            TRACE("export is good to go");
+            return 1;
+        }
+        if (reply.type != NBD_REP_INFO) {
+            error_setg(errp, "unexpected reply type %" PRIx32 ", expected %x",
+                       reply.type, NBD_REP_INFO);
+            nbd_send_opt_abort(ioc);
+            return -1;
+        }
+        if (len < sizeof(type)) {
+            error_setg(errp, "NBD_REP_INFO length %" PRIu32 " is too short",
+                       len);
+            nbd_send_opt_abort(ioc);
+            return -1;
+        }
+        if (read_sync(ioc, &type, sizeof(type)) != sizeof(type)) {
+            error_setg(errp, "failed to read info type");
+            nbd_send_opt_abort(ioc);
+            return -1;
+        }
+        len -= sizeof(type);
+        be16_to_cpus(&type);
+        switch (type) {
+        case NBD_INFO_EXPORT:
+            if (len != sizeof(info->size) + sizeof(info->flags)) {
+                error_setg(errp, "remaining export info len %" PRIu32
+                           " is unexpected size", len);
+                nbd_send_opt_abort(ioc);
+                return -1;
+            }
+            if (read_sync(ioc, &info->size, sizeof(info->size)) !=
+                sizeof(info->size)) {
+                error_setg(errp, "failed to read info size");
+                nbd_send_opt_abort(ioc);
+                return -1;
+            }
+            be64_to_cpus(&info->size);
+            if (read_sync(ioc, &info->flags, sizeof(info->flags)) !=
+                sizeof(info->flags)) {
+                error_setg(errp, "failed to read info flags");
+                nbd_send_opt_abort(ioc);
+                return -1;
+            }
+            be16_to_cpus(&info->flags);
+            TRACE("Size is %" PRIu64 ", export flags %" PRIx16,
+                  info->size, info->flags);
+            break;
+
+        default:
+            TRACE("ignoring unknown export info %" PRIu16 " (%s)", type,
+                  nbd_info_lookup(type));
+            if (drop_sync(ioc, len) != len) {
+                error_setg(errp, "Failed to read info payload");
+                nbd_send_opt_abort(ioc);
+                return -1;
+            }
+            break;
+        }
+    }
+}
+
 /* Return -1 on failure, 0 if wantname is an available export. */
 static int nbd_receive_query_exports(QIOChannel *ioc,
                                      const char *wantname,
@@ -574,11 +686,25 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name,
             name = "";
         }
         if (fixedNewStyle) {
+            int result;
+
+            /* Try NBD_OPT_GO first - if it works, we are done (it
+             * also gives us a good message if the server requires
+             * TLS).  If it is not available, fall back to
+             * NBD_OPT_LIST for nicer error messages about a missing
+             * export, then use NBD_OPT_EXPORT_NAME.  */
+            result = nbd_opt_go(ioc, name, info, errp);
+            if (result < 0) {
+                goto fail;
+            }
+            if (result > 0) {
+                return 0;
+            }
             /* Check our desired export is present in the
              * server export list. Since NBD_OPT_EXPORT_NAME
              * cannot return an error message, running this
-             * query gives us good error reporting if the
-             * server required TLS
+             * query gives us better error reporting if the
+             * export name is not available.
              */
             if (nbd_receive_query_exports(ioc, name, errp) < 0) {
                 goto fail;
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v4 7/8] nbd: Implement NBD_INFO_BLOCK_SIZE on server
  2017-02-21  2:42 [Qemu-devel] [PATCH v4 0/8] Implement NBD_OPT_GO, block size advertisement Eric Blake
                   ` (5 preceding siblings ...)
  2017-02-21  2:42 ` [Qemu-devel] [PATCH v4 6/8] nbd: Implement NBD_OPT_GO on client Eric Blake
@ 2017-02-21  2:42 ` Eric Blake
  2017-02-22 16:51   ` Paolo Bonzini
  2017-02-21  2:42 ` [Qemu-devel] [PATCH v4 8/8] nbd: Implement NBD_INFO_BLOCK_SIZE on client Eric Blake
  7 siblings, 1 reply; 19+ messages in thread
From: Eric Blake @ 2017-02-21  2:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, pbonzini, vsementsov, den

The upstream NBD Protocol has defined a new extension to allow
the server to advertise block sizes to the client, as well as
a way for the client to inform the server that it intends to
obey block sizes.

Thanks to a recent fix (commit df7b97ff), our real minimum
transfer size is always 1 (the block layer takes care of
read-modify-write on our behalf), but we're still more efficient
if we advertise 512 when the client supports it, as follows:
- OPT_INFO, but no NBD_INFO_BLOCK_SIZE: advertise 512, then
fail with NBD_REP_ERR_BLOCK_SIZE_REQD; client is free to try
something else since we don't disconnect
- OPT_INFO with NBD_INFO_BLOCK_SIZE: advertise 512
- OPT_GO, but no NBD_INFO_BLOCK_SIZE: advertise 1
- OPT_GO with NBD_INFO_BLOCK_SIZE: advertise 512

We can also advertise the optimum block size (presumably the
cluster size, when exporting a qcow2 file), and our absolute
maximum transfer size of 32M, to help newer clients avoid
EINVAL failures or abrupt disconnects on oversize requests.

We do not reject clients for using the older NBD_OPT_EXPORT_NAME;
we are no worse off for those clients than we used to be.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v4: new patch
---
 nbd/server.c | 47 +++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 43 insertions(+), 4 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 3b1a4a5..d63e5d3 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -409,6 +409,8 @@ static int nbd_negotiate_handle_info(NBDClient *client, uint32_t length,
     uint16_t request;
     uint32_t namelen;
     bool sendname = false;
+    bool blocksize = false;
+    uint32_t sizes[3];
     char buf[sizeof(uint64_t) + sizeof(uint16_t)];
     const char *msg;

@@ -444,11 +446,16 @@ static int nbd_negotiate_handle_info(NBDClient *client, uint32_t length,
         length -= sizeof(request);
         TRACE("Client requested info %d (%s)", request,
               nbd_info_lookup(request));
-        /* For now, we only care about NBD_INFO_NAME; everything else
-         * is either a request we don't know or something we send
-         * regardless of request. */
-        if (request == NBD_INFO_NAME) {
+        /* We care about NBD_INFO_NAME and NBD_INFO_BLOCK_SIZE;
+         * everything else is either a request we don't know or
+         * something we send regardless of request */
+        switch (request) {
+        case NBD_INFO_NAME:
             sendname = true;
+            break;
+        case NBD_INFO_BLOCK_SIZE:
+            blocksize = true;
+            break;
         }
     }

@@ -499,6 +506,27 @@ static int nbd_negotiate_handle_info(NBDClient *client, uint32_t length,
         }
     }

+    /* Send NBD_INFO_BLOCK_SIZE always, but tweak the minimum size
+     * according to whether the client requested it, and according to
+     * whether this is OPT_INFO or OPT_GO. */
+    /* minimum - 1 for back-compat, or 512 if client is new enough.
+     * TODO: consult blk_bs(blk)->request_align? */
+    sizes[0] = (opt == NBD_OPT_INFO || blocksize) ? BDRV_SECTOR_SIZE : 1;
+    /* preferred - At least 4096, but larger as appropriate. */
+    sizes[1] = MAX(blk_get_opt_transfer(exp->blk), 4096);
+    /* maximum - At most 32M, but smaller as appropriate. */
+    sizes[2] = MIN(blk_get_max_transfer(exp->blk), NBD_MAX_BUFFER_SIZE);
+    TRACE("advertising minimum 0x%" PRIx32 ", preferred 0x%" PRIx32
+          ", maximum 0x%" PRIx32, sizes[0], sizes[1], sizes[2]);
+    cpu_to_be32s(&sizes[0]);
+    cpu_to_be32s(&sizes[1]);
+    cpu_to_be32s(&sizes[2]);
+    rc = nbd_negotiate_send_info(client, opt, NBD_INFO_BLOCK_SIZE,
+                                 sizeof(sizes), sizes);
+    if (rc < 0) {
+        return rc;
+    }
+
     /* Send NBD_INFO_EXPORT always */
     TRACE("advertising size %" PRIu64 " and flags %" PRIx16,
           exp->size, exp->nbdflags | myflags);
@@ -510,6 +538,17 @@ static int nbd_negotiate_handle_info(NBDClient *client, uint32_t length,
         return rc;
     }

+    /* If the client is just asking for NBD_OPT_INFO, but forgot to
+     * request block sizes, return an error.
+     * TODO: consult blk_bs(blk)->request_align, and only error if it
+     * is not 1? */
+    if (opt == NBD_OPT_INFO && !blocksize) {
+        return nbd_negotiate_send_rep_err(client->ioc,
+                                          NBD_REP_ERR_BLOCK_SIZE_REQD, opt,
+                                          "request NBD_INFO_BLOCK_SIZE to "
+                                          "use this export");
+    }
+
     /* Final reply */
     rc = nbd_negotiate_send_rep(client->ioc, NBD_REP_ACK, opt);
     if (rc < 0) {
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v4 8/8] nbd: Implement NBD_INFO_BLOCK_SIZE on client
  2017-02-21  2:42 [Qemu-devel] [PATCH v4 0/8] Implement NBD_OPT_GO, block size advertisement Eric Blake
                   ` (6 preceding siblings ...)
  2017-02-21  2:42 ` [Qemu-devel] [PATCH v4 7/8] nbd: Implement NBD_INFO_BLOCK_SIZE on server Eric Blake
@ 2017-02-21  2:42 ` Eric Blake
  7 siblings, 0 replies; 19+ messages in thread
From: Eric Blake @ 2017-02-21  2:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, pbonzini, vsementsov, den, Kevin Wolf, Max Reitz

The upstream NBD Protocol has defined a new extension to allow
the server to advertise block sizes to the client, as well as
a way for the client to inform the server whether it intends to
obey block sizes.

When using the block layer as the client, we will obey block
sizes; but when used as 'qemu-nbd -c' to hand off to the
kernel nbd module as the client, we are still waiting for the
kernel to implement a way for us to learn if it will honor
block sizes (perhaps by an addition to sysfs, rather than an
ioctl), as well as any way to tell the kernel what additional
block sizes to obey (NBD_SET_BLKSIZE appears to be accurate
for the minimum size, but preferred and maximum sizes would
probably be new ioctl()s), so until then, we need to make our
request for block sizes conditional.

When using ioctl(NBD_SET_BLKSIZE) to hand off to the kernel,
use the minimum block size as the sector size if it is larger
than 512, which also has the nice effect of cooperating with
(non-qemu) servers that don't do read-modify-write when
exposing a block device with 4k sectors; it might also allow
us to visit a file larger than 2T on a 32-bit kernel.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v4: new patch
---
 include/block/nbd.h |  6 ++++
 block/nbd-client.c  |  4 +++
 block/nbd.c         | 14 +++++++--
 nbd/client.c        | 86 ++++++++++++++++++++++++++++++++++++++++++++---------
 qemu-nbd.c          |  2 +-
 5 files changed, 94 insertions(+), 18 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index c84e022..ae8cc12 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -143,8 +143,14 @@ enum {

 /* Details collected by NBD_OPT_EXPORT_NAME and NBD_OPT_GO */
 struct NBDExportInfo {
+    /* Set by client before nbd_receive_negotiate() */
+    bool request_sizes;
+    /* Set by server results during nbd_receive_negotiate() */
     uint64_t size;
     uint16_t flags;
+    uint32_t min_block;
+    uint32_t opt_block;
+    uint32_t max_block;
 };
 typedef struct NBDExportInfo NBDExportInfo;

diff --git a/block/nbd-client.c b/block/nbd-client.c
index 32d7c90..7dfe4ec 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -404,6 +404,7 @@ int nbd_client_init(BlockDriverState *bs,
     logout("session init %s\n", export);
     qio_channel_set_blocking(QIO_CHANNEL(sioc), true, NULL);

+    client->info.request_sizes = true;
     ret = nbd_receive_negotiate(QIO_CHANNEL(sioc), export,
                                 tlscreds, hostname,
                                 &client->ioc, &client->info, errp);
@@ -418,6 +419,9 @@ int nbd_client_init(BlockDriverState *bs,
     if (client->info.flags & NBD_FLAG_SEND_WRITE_ZEROES) {
         bs->supported_zero_flags |= BDRV_REQ_MAY_UNMAP;
     }
+    if (client->info.min_block > bs->bl.request_alignment) {
+        bs->bl.request_alignment = client->info.min_block;
+    }

     qemu_co_mutex_init(&client->send_mutex);
     qemu_co_queue_init(&client->free_sema);
diff --git a/block/nbd.c b/block/nbd.c
index c43fa35..3afd475 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -465,9 +465,17 @@ static int nbd_co_flush(BlockDriverState *bs)

 static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
 {
-    bs->bl.max_pdiscard = NBD_MAX_BUFFER_SIZE;
-    bs->bl.max_pwrite_zeroes = NBD_MAX_BUFFER_SIZE;
-    bs->bl.max_transfer = NBD_MAX_BUFFER_SIZE;
+    NBDClientSession *s = nbd_get_client_session(bs);
+    uint32_t max = MIN_NON_ZERO(NBD_MAX_BUFFER_SIZE, s->info.max_block);
+
+    bs->bl.max_pdiscard = max;
+    bs->bl.max_pwrite_zeroes = max;
+    bs->bl.max_transfer = max;
+
+    if (s->info.opt_block &&
+        s->info.opt_block > bs->bl.opt_transfer) {
+        bs->bl.opt_transfer = s->info.opt_block;
+    }
 }

 static void nbd_close(BlockDriverState *bs)
diff --git a/nbd/client.c b/nbd/client.c
index b408945..9f62a02 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -399,11 +399,17 @@ static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
     info->flags = 0;

     TRACE("Attempting NBD_OPT_GO for export '%s'", wantname);
-    buf = g_malloc(2 + 4 + len + 1);
-    stw_be_p(buf, 0); /* No requests, live with whatever server sends */
-    stl_be_p(buf + 2, len);
-    memcpy(buf + 6, wantname, len);
-    if (nbd_send_option_request(ioc, NBD_OPT_GO, len + 6, buf, errp) < 0) {
+    /* At most one request, everything else up to server */
+    buf = g_malloc(2 + 2 * info->request_sizes + 4 + len);
+    stw_be_p(buf, info->request_sizes);
+    if (info->request_sizes) {
+        stw_be_p(buf + 2, NBD_INFO_BLOCK_SIZE);
+    }
+    stl_be_p(buf + 2 + 2 * info->request_sizes, len);
+    memcpy(buf + 2 + 2 * info->request_sizes + 4, wantname, len);
+    if (nbd_send_option_request(ioc, NBD_OPT_GO,
+                                2 + 2 * info->request_sizes + 4 + len, buf,
+                                errp) < 0) {
         return -1;
     }

@@ -435,8 +441,9 @@ static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
             return 1;
         }
         if (reply.type != NBD_REP_INFO) {
-            error_setg(errp, "unexpected reply type %" PRIx32 ", expected %x",
-                       reply.type, NBD_REP_INFO);
+            error_setg(errp, "unexpected reply type %" PRIx32
+                       " (%s), expected %x",
+                       reply.type, nbd_rep_lookup(reply.type), NBD_REP_INFO);
             nbd_send_opt_abort(ioc);
             return -1;
         }
@@ -479,6 +486,51 @@ static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
                   info->size, info->flags);
             break;

+        case NBD_INFO_BLOCK_SIZE:
+            if (len != sizeof(info->min_block) * 3) {
+                error_setg(errp, "remaining export info len %" PRIu32
+                           " is unexpected size", len);
+                nbd_send_opt_abort(ioc);
+                return -1;
+            }
+            if (read_sync(ioc, &info->min_block, sizeof(info->min_block)) !=
+                sizeof(info->min_block)) {
+                error_setg(errp, "failed to read info minimum block size");
+                nbd_send_opt_abort(ioc);
+                return -1;
+            }
+            be32_to_cpus(&info->min_block);
+            if (!is_power_of_2(info->min_block)) {
+                error_setg(errp, "server minimum block size %" PRId32
+                           "is not a power of two", info->min_block);
+                nbd_send_opt_abort(ioc);
+                return -1;
+            }
+            if (read_sync(ioc, &info->opt_block, sizeof(info->opt_block)) !=
+                sizeof(info->opt_block)) {
+                error_setg(errp, "failed to read info preferred block size");
+                nbd_send_opt_abort(ioc);
+                return -1;
+            }
+            be32_to_cpus(&info->opt_block);
+            if (!is_power_of_2(info->opt_block) ||
+                info->opt_block < info->min_block) {
+                error_setg(errp, "server preferred block size %" PRId32
+                           "is not valid", info->opt_block);
+                nbd_send_opt_abort(ioc);
+                return -1;
+            }
+            if (read_sync(ioc, &info->max_block, sizeof(info->max_block)) !=
+                sizeof(info->max_block)) {
+                error_setg(errp, "failed to read info maximum block size");
+                nbd_send_opt_abort(ioc);
+                return -1;
+            }
+            be32_to_cpus(&info->max_block);
+            TRACE("Block sizes are 0x%" PRIx32 ", 0x%" PRIx32 ", 0x%" PRIx32,
+                  info->min_block, info->opt_block, info->max_block);
+            break;
+
         default:
             TRACE("ignoring unknown export info %" PRIu16 " (%s)", type,
                   nbd_info_lookup(type));
@@ -779,8 +831,14 @@ fail:
 #ifdef __linux__
 int nbd_init(int fd, QIOChannelSocket *sioc, NBDExportInfo *info)
 {
-    unsigned long sectors = info->size / BDRV_SECTOR_SIZE;
-    if (info->size / BDRV_SECTOR_SIZE != sectors) {
+    unsigned long sector_size = MAX(BDRV_SECTOR_SIZE, info->min_block);
+    unsigned long sectors = info->size / sector_size;
+
+    /* FIXME: Once the kernel module is patched to honor block sizes,
+     * and to advertise that fact to user space, we should update the
+     * hand-off to the kernel to use any block sizes we learned. */
+    assert(!info->request_sizes);
+    if (info->size / sector_size != sectors) {
         LOG("Export size %" PRId64 " too large for 32-bit kernel", info->size);
         return -E2BIG;
     }
@@ -793,18 +851,18 @@ int nbd_init(int fd, QIOChannelSocket *sioc, NBDExportInfo *info)
         return -serrno;
     }

-    TRACE("Setting block size to %lu", (unsigned long)BDRV_SECTOR_SIZE);
+    TRACE("Setting block size to %lu", sector_size);

-    if (ioctl(fd, NBD_SET_BLKSIZE, (unsigned long)BDRV_SECTOR_SIZE) < 0) {
+    if (ioctl(fd, NBD_SET_BLKSIZE, sector_size) < 0) {
         int serrno = errno;
         LOG("Failed setting NBD block size");
         return -serrno;
     }

     TRACE("Setting size to %lu block(s)", sectors);
-    if (info->size % BDRV_SECTOR_SIZE) {
-        TRACE("Ignoring trailing %d bytes of export",
-              (int) (info->size % BDRV_SECTOR_SIZE));
+    if (info->size % sector_size) {
+        TRACE("Ignoring trailing %" PRId64 " bytes of export",
+              info->size % sector_size);
     }

     if (ioctl(fd, NBD_SET_SIZE_BLOCKS, sectors) < 0) {
diff --git a/qemu-nbd.c b/qemu-nbd.c
index c598cbe..45405ce 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -254,7 +254,7 @@ static void *show_parts(void *arg)
 static void *nbd_client_thread(void *arg)
 {
     char *device = arg;
-    NBDExportInfo info;
+    NBDExportInfo info = { .request_sizes = false, };
     QIOChannelSocket *sioc;
     int fd;
     int ret;
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v4 1/8] nbd/client: fix drop_sync [CVE-2017-2630]
  2017-02-21  2:42 ` [Qemu-devel] [PATCH v4 1/8] nbd/client: fix drop_sync [CVE-2017-2630] Eric Blake
@ 2017-02-21  8:11   ` Marc-André Lureau
  0 siblings, 0 replies; 19+ messages in thread
From: Marc-André Lureau @ 2017-02-21  8:11 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: pbonzini, vsementsov, den, qemu-block

On Tue, Feb 21, 2017 at 6:49 AM Eric Blake <eblake@redhat.com> wrote:

> From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>
> Comparison symbol is misused. It may lead to memory corruption.
> Introduced in commit 7d3123e.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Message-Id: <20170203154757.36140-6-vsementsov@virtuozzo.com>
> [eblake: add CVE details]
> Signed-off-by: Eric Blake <eblake@redhat.com>
>


Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>



> ---
>  nbd/client.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/nbd/client.c b/nbd/client.c
> index ffb0743..0d16cd1 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -94,7 +94,7 @@ static ssize_t drop_sync(QIOChannel *ioc, size_t size)
>      char small[1024];
>      char *buffer;
>
> -    buffer = sizeof(small) < size ? small : g_malloc(MIN(65536, size));
> +    buffer = sizeof(small) > size ? small : g_malloc(MIN(65536, size));
>      while (size > 0) {
>          ssize_t count = read_sync(ioc, buffer, MIN(65536, size));
>
> --
> 2.9.3
>
>
> --
Marc-André Lureau

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v4 2/8] nbd: Create struct for tracking export info
  2017-02-21  2:42 ` [Qemu-devel] [PATCH v4 2/8] nbd: Create struct for tracking export info Eric Blake
@ 2017-02-21  8:11   ` Marc-André Lureau
  0 siblings, 0 replies; 19+ messages in thread
From: Marc-André Lureau @ 2017-02-21  8:11 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: Kevin Wolf, vsementsov, den, qemu-block, Max Reitz, pbonzini

On Tue, Feb 21, 2017 at 6:49 AM Eric Blake <eblake@redhat.com> wrote:

> The NBD Protocol is introducing some additional information
> about exports, such as minimum request size and alignment, as
> well as an advertised maximum request size.  It will be easier
> to feed this information back to the block layer if we gather
> all the information into a struct, rather than adding yet more
> pointer parameters during negotiation.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
>


Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>



>
> ---
> v4: rebase to master
> v3: new patch
> ---
>  block/nbd-client.h  |  3 +--
>  include/block/nbd.h | 15 +++++++++++----
>  block/nbd-client.c  | 18 ++++++++----------
>  block/nbd.c         |  2 +-
>  nbd/client.c        | 47 +++++++++++++++++++++++++----------------------
>  qemu-nbd.c          | 10 ++++------
>  6 files changed, 50 insertions(+), 45 deletions(-)
>
> diff --git a/block/nbd-client.h b/block/nbd-client.h
> index f8d6006..098b65c 100644
> --- a/block/nbd-client.h
> +++ b/block/nbd-client.h
> @@ -20,8 +20,7 @@
>  typedef struct NBDClientSession {
>      QIOChannelSocket *sioc; /* The master data channel */
>      QIOChannel *ioc; /* The current I/O channel which may differ (eg TLS)
> */
> -    uint16_t nbdflags;
> -    off_t size;
> +    NBDExportInfo info;
>
>      CoMutex send_mutex;
>      CoQueue free_sema;
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 3e373f0..8cc9cbe 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -123,16 +123,23 @@ enum {
>   * aren't overflowing some other buffer. */
>  #define NBD_MAX_NAME_SIZE 256
>
> +/* Details collected by NBD_OPT_EXPORT_NAME and NBD_OPT_GO */
> +struct NBDExportInfo {
> +    uint64_t size;
> +    uint16_t flags;
> +};
> +typedef struct NBDExportInfo NBDExportInfo;
> +
>  ssize_t nbd_wr_syncv(QIOChannel *ioc,
>                       struct iovec *iov,
>                       size_t niov,
>                       size_t length,
>                       bool do_read);
> -int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t
> *flags,
> +int nbd_receive_negotiate(QIOChannel *ioc, const char *name,
>                            QCryptoTLSCreds *tlscreds, const char *hostname,
> -                          QIOChannel **outioc,
> -                          off_t *size, Error **errp);
> -int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t flags, off_t size);
> +                          QIOChannel **outioc, NBDExportInfo *info,
> +                          Error **errp);
> +int nbd_init(int fd, QIOChannelSocket *sioc, NBDExportInfo *info);
>  ssize_t nbd_send_request(QIOChannel *ioc, NBDRequest *request);
>  ssize_t nbd_receive_reply(QIOChannel *ioc, NBDReply *reply);
>  int nbd_client(int fd);
> diff --git a/block/nbd-client.c b/block/nbd-client.c
> index 06f1532..32d7c90 100644
> --- a/block/nbd-client.c
> +++ b/block/nbd-client.c
> @@ -258,7 +258,7 @@ int nbd_client_co_pwritev(BlockDriverState *bs,
> uint64_t offset,
>      ssize_t ret;
>
>      if (flags & BDRV_REQ_FUA) {
> -        assert(client->nbdflags & NBD_FLAG_SEND_FUA);
> +        assert(client->info.flags & NBD_FLAG_SEND_FUA);
>          request.flags |= NBD_CMD_FLAG_FUA;
>      }
>
> @@ -287,12 +287,12 @@ int nbd_client_co_pwrite_zeroes(BlockDriverState
> *bs, int64_t offset,
>      };
>      NBDReply reply;
>
> -    if (!(client->nbdflags & NBD_FLAG_SEND_WRITE_ZEROES)) {
> +    if (!(client->info.flags & NBD_FLAG_SEND_WRITE_ZEROES)) {
>          return -ENOTSUP;
>      }
>
>      if (flags & BDRV_REQ_FUA) {
> -        assert(client->nbdflags & NBD_FLAG_SEND_FUA);
> +        assert(client->info.flags & NBD_FLAG_SEND_FUA);
>          request.flags |= NBD_CMD_FLAG_FUA;
>      }
>      if (!(flags & BDRV_REQ_MAY_UNMAP)) {
> @@ -317,7 +317,7 @@ int nbd_client_co_flush(BlockDriverState *bs)
>      NBDReply reply;
>      ssize_t ret;
>
> -    if (!(client->nbdflags & NBD_FLAG_SEND_FLUSH)) {
> +    if (!(client->info.flags & NBD_FLAG_SEND_FLUSH)) {
>          return 0;
>      }
>
> @@ -346,7 +346,7 @@ int nbd_client_co_pdiscard(BlockDriverState *bs,
> int64_t offset, int count)
>      NBDReply reply;
>      ssize_t ret;
>
> -    if (!(client->nbdflags & NBD_FLAG_SEND_TRIM)) {
> +    if (!(client->info.flags & NBD_FLAG_SEND_TRIM)) {
>          return 0;
>      }
>
> @@ -405,19 +405,17 @@ int nbd_client_init(BlockDriverState *bs,
>      qio_channel_set_blocking(QIO_CHANNEL(sioc), true, NULL);
>
>      ret = nbd_receive_negotiate(QIO_CHANNEL(sioc), export,
> -                                &client->nbdflags,
>                                  tlscreds, hostname,
> -                                &client->ioc,
> -                                &client->size, errp);
> +                                &client->ioc, &client->info, errp);
>      if (ret < 0) {
>          logout("Failed to negotiate with the NBD server\n");
>          return ret;
>      }
> -    if (client->nbdflags & NBD_FLAG_SEND_FUA) {
> +    if (client->info.flags & NBD_FLAG_SEND_FUA) {
>          bs->supported_write_flags = BDRV_REQ_FUA;
>          bs->supported_zero_flags |= BDRV_REQ_FUA;
>      }
> -    if (client->nbdflags & NBD_FLAG_SEND_WRITE_ZEROES) {
> +    if (client->info.flags & NBD_FLAG_SEND_WRITE_ZEROES) {
>          bs->supported_zero_flags |= BDRV_REQ_MAY_UNMAP;
>      }
>
> diff --git a/block/nbd.c b/block/nbd.c
> index 35f24be..c43fa35 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -485,7 +485,7 @@ static int64_t nbd_getlength(BlockDriverState *bs)
>  {
>      BDRVNBDState *s = bs->opaque;
>
> -    return s->client.size;
> +    return s->client.info.size;
>  }
>
>  static void nbd_detach_aio_context(BlockDriverState *bs)
> diff --git a/nbd/client.c b/nbd/client.c
> index 0d16cd1..69f0e09 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -454,13 +454,13 @@ static QIOChannel *nbd_receive_starttls(QIOChannel
> *ioc,
>  }
>
>
> -int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t
> *flags,
> +int nbd_receive_negotiate(QIOChannel *ioc, const char *name,
>                            QCryptoTLSCreds *tlscreds, const char *hostname,
> -                          QIOChannel **outioc,
> -                          off_t *size, Error **errp)
> +                          QIOChannel **outioc, NBDExportInfo *info,
> +                          Error **errp)
>  {
>      char buf[256];
> -    uint64_t magic, s;
> +    uint64_t magic;
>      int rc;
>      bool zeroes = true;
>
> @@ -573,17 +573,19 @@ int nbd_receive_negotiate(QIOChannel *ioc, const
> char *name, uint16_t *flags,
>          }
>
>          /* Read the response */
> -        if (read_sync(ioc, &s, sizeof(s)) != sizeof(s)) {
> +        if (read_sync(ioc, &info->size, sizeof(info->size)) !=
> +            sizeof(info->size)) {
>              error_setg(errp, "Failed to read export length");
>              goto fail;
>          }
> -        *size = be64_to_cpu(s);
> +        be64_to_cpus(&info->size);
>
> -        if (read_sync(ioc, flags, sizeof(*flags)) != sizeof(*flags)) {
> +        if (read_sync(ioc, &info->flags, sizeof(info->flags)) !=
> +            sizeof(info->flags)) {
>              error_setg(errp, "Failed to read export flags");
>              goto fail;
>          }
> -        be16_to_cpus(flags);
> +        be16_to_cpus(&info->flags);
>      } else if (magic == NBD_CLIENT_MAGIC) {
>          uint32_t oldflags;
>
> @@ -596,12 +598,12 @@ int nbd_receive_negotiate(QIOChannel *ioc, const
> char *name, uint16_t *flags,
>              goto fail;
>          }
>
> -        if (read_sync(ioc, &s, sizeof(s)) != sizeof(s)) {
> +        if (read_sync(ioc, &info->size, sizeof(info->size)) !=
> +            sizeof(info->size)) {
>              error_setg(errp, "Failed to read export length");
>              goto fail;
>          }
> -        *size = be64_to_cpu(s);
> -        TRACE("Size is %" PRIu64, *size);
> +        be64_to_cpus(&info->size);
>
>          if (read_sync(ioc, &oldflags, sizeof(oldflags)) !=
> sizeof(oldflags)) {
>              error_setg(errp, "Failed to read export flags");
> @@ -612,13 +614,14 @@ int nbd_receive_negotiate(QIOChannel *ioc, const
> char *name, uint16_t *flags,
>              error_setg(errp, "Unexpected export flags %0x" PRIx32,
> oldflags);
>              goto fail;
>          }
> -        *flags = oldflags;
> +        info->flags = oldflags;
>      } else {
>          error_setg(errp, "Bad magic received");
>          goto fail;
>      }
>
> -    TRACE("Size is %" PRIu64 ", export flags %" PRIx16, *size, *flags);
> +    TRACE("Size is %" PRIu64 ", export flags %" PRIx16,
> +          info->size, info->flags);
>      if (zeroes && drop_sync(ioc, 124) != 124) {
>          error_setg(errp, "Failed to read reserved block");
>          goto fail;
> @@ -630,11 +633,11 @@ fail:
>  }
>
>  #ifdef __linux__
> -int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t flags, off_t size)
> +int nbd_init(int fd, QIOChannelSocket *sioc, NBDExportInfo *info)
>  {
> -    unsigned long sectors = size / BDRV_SECTOR_SIZE;
> -    if (size / BDRV_SECTOR_SIZE != sectors) {
> -        LOG("Export size %lld too large for 32-bit kernel", (long long)
> size);
> +    unsigned long sectors = info->size / BDRV_SECTOR_SIZE;
> +    if (info->size / BDRV_SECTOR_SIZE != sectors) {
> +        LOG("Export size %" PRId64 " too large for 32-bit kernel",
> info->size);
>          return -E2BIG;
>      }
>
> @@ -655,9 +658,9 @@ int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t
> flags, off_t size)
>      }
>
>      TRACE("Setting size to %lu block(s)", sectors);
> -    if (size % BDRV_SECTOR_SIZE) {
> +    if (info->size % BDRV_SECTOR_SIZE) {
>          TRACE("Ignoring trailing %d bytes of export",
> -              (int) (size % BDRV_SECTOR_SIZE));
> +              (int) (info->size % BDRV_SECTOR_SIZE));
>      }
>
>      if (ioctl(fd, NBD_SET_SIZE_BLOCKS, sectors) < 0) {
> @@ -666,9 +669,9 @@ int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t
> flags, off_t size)
>          return -serrno;
>      }
>
> -    if (ioctl(fd, NBD_SET_FLAGS, (unsigned long) flags) < 0) {
> +    if (ioctl(fd, NBD_SET_FLAGS, (unsigned long) info->flags) < 0) {
>          if (errno == ENOTTY) {
> -            int read_only = (flags & NBD_FLAG_READ_ONLY) != 0;
> +            int read_only = (info->flags & NBD_FLAG_READ_ONLY) != 0;
>              TRACE("Setting readonly attribute");
>
>              if (ioctl(fd, BLKROSET, (unsigned long) &read_only) < 0) {
> @@ -726,7 +729,7 @@ int nbd_disconnect(int fd)
>  }
>
>  #else
> -int nbd_init(int fd, QIOChannelSocket *ioc, uint16_t flags, off_t size)
> +int nbd_init(int fd, QIOChannelSocket *ioc, NBDExportInfo *info)
>  {
>      return -ENOTSUP;
>  }
> diff --git a/qemu-nbd.c b/qemu-nbd.c
> index e4fede6..c598cbe 100644
> --- a/qemu-nbd.c
> +++ b/qemu-nbd.c
> @@ -254,8 +254,7 @@ static void *show_parts(void *arg)
>  static void *nbd_client_thread(void *arg)
>  {
>      char *device = arg;
> -    off_t size;
> -    uint16_t nbdflags;
> +    NBDExportInfo info;
>      QIOChannelSocket *sioc;
>      int fd;
>      int ret;
> @@ -270,9 +269,8 @@ static void *nbd_client_thread(void *arg)
>          goto out;
>      }
>
> -    ret = nbd_receive_negotiate(QIO_CHANNEL(sioc), NULL, &nbdflags,
> -                                NULL, NULL, NULL,
> -                                &size, &local_error);
> +    ret = nbd_receive_negotiate(QIO_CHANNEL(sioc), NULL,
> +                                NULL, NULL, NULL, &info, &local_error);
>      if (ret < 0) {
>          if (local_error) {
>              error_report_err(local_error);
> @@ -287,7 +285,7 @@ static void *nbd_client_thread(void *arg)
>          goto out_socket;
>      }
>
> -    ret = nbd_init(fd, sioc, nbdflags, size);
> +    ret = nbd_init(fd, sioc, &info);
>      if (ret < 0) {
>          goto out_fd;
>      }
> --
> 2.9.3
>
>
> --
Marc-André Lureau

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v4 3/8] block: Add blk_get_opt_transfer()
  2017-02-21  2:42 ` [Qemu-devel] [PATCH v4 3/8] block: Add blk_get_opt_transfer() Eric Blake
@ 2017-02-21  8:11   ` Marc-André Lureau
  0 siblings, 0 replies; 19+ messages in thread
From: Marc-André Lureau @ 2017-02-21  8:11 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: Kevin Wolf, vsementsov, den, qemu-block, Max Reitz, pbonzini

On Tue, Feb 21, 2017 at 6:45 AM Eric Blake <eblake@redhat.com> wrote:

> The NBD protocol would like to advertise the optimal I/O
> size to the client; but it would be a layering violation to
> peek into blk_bs(blk)->bl, when we only have a BB.
>
> This copies the existing blk_get_max_transfer() in reading
> a value from the top BDS; where that value was picked via
> bdrv_refresh_limits() to reflect the overall constraints of
> the entire BDS chain.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
>



Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>



>
> ---
> v4: retitle, as part of rebasing to byte limits
> v3: new patch
> ---
>  include/sysemu/block-backend.h |  1 +
>  block/block-backend.c          | 12 ++++++++++++
>  2 files changed, 13 insertions(+)
>
> diff --git a/include/sysemu/block-backend.h
> b/include/sysemu/block-backend.h
> index 6444e41..882480a 100644
> --- a/include/sysemu/block-backend.h
> +++ b/include/sysemu/block-backend.h
> @@ -174,6 +174,7 @@ void blk_lock_medium(BlockBackend *blk, bool locked);
>  void blk_eject(BlockBackend *blk, bool eject_flag);
>  int blk_get_flags(BlockBackend *blk);
>  uint32_t blk_get_max_transfer(BlockBackend *blk);
> +uint32_t blk_get_opt_transfer(BlockBackend *blk);
>  int blk_get_max_iov(BlockBackend *blk);
>  void blk_set_guest_block_size(BlockBackend *blk, int align);
>  void *blk_try_blockalign(BlockBackend *blk, size_t size);
> diff --git a/block/block-backend.c b/block/block-backend.c
> index efbf398..4d91ff8 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -1426,6 +1426,18 @@ uint32_t blk_get_max_transfer(BlockBackend *blk)
>      return MIN_NON_ZERO(max, INT_MAX);
>  }
>
> +/* Returns the optimum transfer length, in bytes; may be 0 if no optimum
> */
> +uint32_t blk_get_opt_transfer(BlockBackend *blk)
> +{
> +    BlockDriverState *bs = blk_bs(blk);
> +
> +    if (bs) {
> +        return bs->bl.opt_transfer;
> +    } else {
> +        return 0;
> +    }
> +}
> +
>  int blk_get_max_iov(BlockBackend *blk)
>  {
>      return blk->root->bs->bl.max_iov;
> --
> 2.9.3
>
>
> --
Marc-André Lureau

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v4 4/8] nbd: Expose and debug more NBD constants
  2017-02-21  2:42 ` [Qemu-devel] [PATCH v4 4/8] nbd: Expose and debug more NBD constants Eric Blake
@ 2017-02-21  8:12   ` Marc-André Lureau
  0 siblings, 0 replies; 19+ messages in thread
From: Marc-André Lureau @ 2017-02-21  8:12 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: Kevin Wolf, vsementsov, den, qemu-block, Max Reitz, pbonzini

On Tue, Feb 21, 2017 at 6:54 AM Eric Blake <eblake@redhat.com> wrote:

> The NBD protocol has several constants defined in various extensions
> that we are about to implement.  Expose them to the code, along with
> an easy way to map various constants to strings during diagnostic
> messages.
>
> Doing this points out a debug message in server.c that got
> parameters mixed up.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
>


Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>



>
> ---
> v4: new patch
> ---
>  include/block/nbd.h | 34 +++++++++++++++++++-------
>  nbd/nbd-internal.h  |  9 +++----
>  nbd/client.c        | 56 ++++++++++++++++++++++++++++---------------
>  nbd/common.c        | 69
> +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  nbd/server.c        | 17 +++++++------
>  5 files changed, 145 insertions(+), 40 deletions(-)
>
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 8cc9cbe..c84e022 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -1,5 +1,5 @@
>  /*
> - *  Copyright (C) 2016 Red Hat, Inc.
> + *  Copyright (C) 2016-2017 Red Hat, Inc.
>   *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
>   *
>   *  Network Block Device
> @@ -83,18 +83,36 @@ typedef struct NBDReply NBDReply;
>  #define NBD_FLAG_C_FIXED_NEWSTYLE (1 << 0) /* Fixed newstyle protocol. */
>  #define NBD_FLAG_C_NO_ZEROES      (1 << 1) /* End handshake without
> zeroes. */
>
> -/* Reply types. */
> +/* Option requests. */
> +#define NBD_OPT_EXPORT_NAME     (1)
> +#define NBD_OPT_ABORT           (2)
> +#define NBD_OPT_LIST            (3)
> +/* #define NBD_OPT_PEEK_EXPORT  (4) not in use */
> +#define NBD_OPT_STARTTLS        (5)
> +#define NBD_OPT_INFO            (6)
> +#define NBD_OPT_GO              (7)
> +
> +/* Option reply types. */
>  #define NBD_REP_ERR(value) ((UINT32_C(1) << 31) | (value))
>
>  #define NBD_REP_ACK             (1)             /* Data sending finished.
> */
>  #define NBD_REP_SERVER          (2)             /* Export description. */
> +#define NBD_REP_INFO            (3)             /* NBD_OPT_INFO/GO. */
>
> -#define NBD_REP_ERR_UNSUP       NBD_REP_ERR(1)  /* Unknown option */
> -#define NBD_REP_ERR_POLICY      NBD_REP_ERR(2)  /* Server denied */
> -#define NBD_REP_ERR_INVALID     NBD_REP_ERR(3)  /* Invalid length */
> -#define NBD_REP_ERR_PLATFORM    NBD_REP_ERR(4)  /* Not compiled in */
> -#define NBD_REP_ERR_TLS_REQD    NBD_REP_ERR(5)  /* TLS required */
> -#define NBD_REP_ERR_SHUTDOWN    NBD_REP_ERR(7)  /* Server shutting down */
> +#define NBD_REP_ERR_UNSUP           NBD_REP_ERR(1)  /* Unknown option */
> +#define NBD_REP_ERR_POLICY          NBD_REP_ERR(2)  /* Server denied */
> +#define NBD_REP_ERR_INVALID         NBD_REP_ERR(3)  /* Invalid length */
> +#define NBD_REP_ERR_PLATFORM        NBD_REP_ERR(4)  /* Not compiled in */
> +#define NBD_REP_ERR_TLS_REQD        NBD_REP_ERR(5)  /* TLS required */
> +#define NBD_REP_ERR_UNKNOWN         NBD_REP_ERR(6)  /* Export unknown */
> +#define NBD_REP_ERR_SHUTDOWN        NBD_REP_ERR(7)  /* Server shutting
> down */
> +#define NBD_REP_ERR_BLOCK_SIZE_REQD NBD_REP_ERR(8)  /* Need
> INFO_BLOCK_SIZE */
> +
> +/* Info types, used during NBD_REP_INFO */
> +#define NBD_INFO_EXPORT         0
> +#define NBD_INFO_NAME           1
> +#define NBD_INFO_DESCRIPTION    2
> +#define NBD_INFO_BLOCK_SIZE     3
>
>  /* Request flags, sent from client to server during transmission phase */
>  #define NBD_CMD_FLAG_FUA        (1 << 0) /* 'force unit access' during
> write */
> diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
> index f43d990..aa5b2fd 100644
> --- a/nbd/nbd-internal.h
> +++ b/nbd/nbd-internal.h
> @@ -76,12 +76,6 @@
>  #define NBD_SET_TIMEOUT         _IO(0xab, 9)
>  #define NBD_SET_FLAGS           _IO(0xab, 10)
>
> -#define NBD_OPT_EXPORT_NAME     (1)
> -#define NBD_OPT_ABORT           (2)
> -#define NBD_OPT_LIST            (3)
> -#define NBD_OPT_PEEK_EXPORT     (4)
> -#define NBD_OPT_STARTTLS        (5)
> -
>  /* NBD errors are based on errno numbers, so there is a 1:1 mapping,
>   * but only a limited set of errno values is specified in the protocol.
>   * Everything else is squashed to EINVAL.
> @@ -122,5 +116,8 @@ struct NBDTLSHandshakeData {
>
>  void nbd_tls_handshake(QIOTask *task,
>                         void *opaque);
> +const char *nbd_opt_lookup(uint32_t opt);
> +const char *nbd_rep_lookup(uint32_t rep);
> +const char *nbd_info_lookup(uint16_t info);
>
>  #endif
> diff --git a/nbd/client.c b/nbd/client.c
> index 69f0e09..f96539b 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -1,5 +1,5 @@
>  /*
> - *  Copyright (C) 2016 Red Hat, Inc.
> + *  Copyright (C) 2016-2017 Red Hat, Inc.
>   *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
>   *
>   *  Network Block Device Client Side
> @@ -130,7 +130,8 @@ static int nbd_send_option_request(QIOChannel *ioc,
> uint32_t opt,
>      if (len == -1) {
>          req.length = len = strlen(data);
>      }
> -    TRACE("Sending option request %" PRIu32", len %" PRIu32, opt, len);
> +    TRACE("Sending option request %" PRIu32" (%s), len %" PRIu32, opt,
> +          nbd_opt_lookup(opt), len);
>
>      stq_be_p(&req.magic, NBD_OPTS_MAGIC);
>      stl_be_p(&req.option, opt);
> @@ -180,8 +181,10 @@ static int nbd_receive_option_reply(QIOChannel *ioc,
> uint32_t opt,
>      be32_to_cpus(&reply->type);
>      be32_to_cpus(&reply->length);
>
> -    TRACE("Received option reply %" PRIx32", type %" PRIx32", len %"
> PRIu32,
> -          reply->option, reply->type, reply->length);
> +    TRACE("Received option reply %" PRIx32" (%s), type %" PRIx32
> +          " (%s), len %" PRIu32,
> +          reply->option, nbd_opt_lookup(reply->option),
> +          reply->type, nbd_rep_lookup(reply->type), reply->length);
>
>      if (reply->magic != NBD_REP_MAGIC) {
>          error_setg(errp, "Unexpected option reply magic");
> @@ -215,12 +218,16 @@ static int nbd_handle_reply_err(QIOChannel *ioc,
> nbd_opt_reply *reply,
>
>      if (reply->length) {
>          if (reply->length > NBD_MAX_BUFFER_SIZE) {
> -            error_setg(errp, "server's error message is too long");
> +            error_setg(errp, "server error 0x%" PRIx32
> +                       " (%s) message is too long",
> +                       reply->type, nbd_rep_lookup(reply->type));
>              goto cleanup;
>          }
>          msg = g_malloc(reply->length + 1);
>          if (read_sync(ioc, msg, reply->length) != reply->length) {
> -            error_setg(errp, "failed to read option error message");
> +            error_setg(errp, "failed to read option error 0x%" PRIx32
> +                       " (%s) message",
> +                       reply->type, nbd_rep_lookup(reply->type));
>              goto cleanup;
>          }
>          msg[reply->length] = '\0';
> @@ -229,38 +236,49 @@ static int nbd_handle_reply_err(QIOChannel *ioc,
> nbd_opt_reply *reply,
>      switch (reply->type) {
>      case NBD_REP_ERR_UNSUP:
>          TRACE("server doesn't understand request %" PRIx32
> -              ", attempting fallback", reply->option);
> +              " (%s), attempting fallback",
> +              reply->option, nbd_opt_lookup(reply->option));
>          result = 0;
>          goto cleanup;
>
>      case NBD_REP_ERR_POLICY:
> -        error_setg(errp, "Denied by server for option %" PRIx32,
> -                   reply->option);
> +        error_setg(errp, "Denied by server for option %" PRIx32 " (%s)",
> +                   reply->option, nbd_opt_lookup(reply->option));
>          break;
>
>      case NBD_REP_ERR_INVALID:
> -        error_setg(errp, "Invalid data length for option %" PRIx32,
> -                   reply->option);
> +        error_setg(errp, "Invalid data length for option %" PRIx32 "
> (%s)",
> +                   reply->option, nbd_opt_lookup(reply->option));
>          break;
>
>      case NBD_REP_ERR_PLATFORM:
> -        error_setg(errp, "Server lacks support for option %" PRIx32,
> -                   reply->option);
> +        error_setg(errp, "Server lacks support for option %" PRIx32 "
> (%s)",
> +                   reply->option, nbd_opt_lookup(reply->option));
>          break;
>
>      case NBD_REP_ERR_TLS_REQD:
> -        error_setg(errp, "TLS negotiation required before option %"
> PRIx32,
> -                   reply->option);
> +        error_setg(errp, "TLS negotiation required before option %" PRIx32
> +                   " (%s)", reply->option, nbd_opt_lookup(reply->option));
> +        break;
> +
> +    case NBD_REP_ERR_UNKNOWN:
> +        error_setg(errp, "Requested export not available for option %"
> PRIx32
> +                   " (%s)", reply->option, nbd_opt_lookup(reply->option));
>          break;
>
>      case NBD_REP_ERR_SHUTDOWN:
> -        error_setg(errp, "Server shutting down before option %" PRIx32,
> -                   reply->option);
> +        error_setg(errp, "Server shutting down before option %" PRIx32 "
> (%s)",
> +                   reply->option, nbd_opt_lookup(reply->option));
> +        break;
> +
> +    case NBD_REP_ERR_BLOCK_SIZE_REQD:
> +        error_setg(errp, "Server requires INFO_BLOCK_SIZE for option %"
> PRIx32
> +                   " (%s)", reply->option, nbd_opt_lookup(reply->option));
>          break;
>
>      default:
> -        error_setg(errp, "Unknown error code when asking for option %"
> PRIx32,
> -                   reply->option);
> +        error_setg(errp, "Unknown error code when asking for option %"
> PRIx32
> +                   " (%s)", reply->option, nbd_opt_lookup(reply->option));
>          break;
>      }
>
> diff --git a/nbd/common.c b/nbd/common.c
> index a5f39ea..85622f9 100644
> --- a/nbd/common.c
> +++ b/nbd/common.c
> @@ -89,3 +89,72 @@ void nbd_tls_handshake(QIOTask *task,
>      data->complete = true;
>      g_main_loop_quit(data->loop);
>  }
> +
> +
> +const char *nbd_opt_lookup(uint32_t opt)
> +{
> +    switch (opt) {
> +    case NBD_OPT_EXPORT_NAME:
> +        return "export name";
> +    case NBD_OPT_ABORT:
> +        return "abort";
> +    case NBD_OPT_LIST:
> +        return "list";
> +    case NBD_OPT_STARTTLS:
> +        return "starttls";
> +    case NBD_OPT_INFO:
> +        return "info";
> +    case NBD_OPT_GO:
> +        return "go";
> +    default:
> +        return "<unknown>";
> +    }
> +}
> +
> +
> +const char *nbd_rep_lookup(uint32_t rep)
> +{
> +    switch (rep) {
> +    case NBD_REP_ACK:
> +        return "ack";
> +    case NBD_REP_SERVER:
> +        return "server";
> +    case NBD_REP_INFO:
> +        return "info";
> +    case NBD_REP_ERR_UNSUP:
> +        return "unsupported";
> +    case NBD_REP_ERR_POLICY:
> +        return "denied by policy";
> +    case NBD_REP_ERR_INVALID:
> +        return "invalid";
> +    case NBD_REP_ERR_PLATFORM:
> +        return "platform lacks support";
> +    case NBD_REP_ERR_TLS_REQD:
> +        return "TLS required";
> +    case NBD_REP_ERR_UNKNOWN:
> +        return "export unknown";
> +    case NBD_REP_ERR_SHUTDOWN:
> +        return "server shutting down";
> +    case NBD_REP_ERR_BLOCK_SIZE_REQD:
> +        return "block size required";
> +    default:
> +        return "<unknown>";
> +    }
> +}
> +
> +
> +const char *nbd_info_lookup(uint16_t info)
> +{
> +    switch (info) {
> +    case NBD_INFO_EXPORT:
> +        return "export";
> +    case NBD_INFO_NAME:
> +        return "name";
> +    case NBD_INFO_DESCRIPTION:
> +        return "description";
> +    case NBD_INFO_BLOCK_SIZE:
> +        return "block size";
> +    default:
> +        return "<unknown>";
> +    }
> +}
> diff --git a/nbd/server.c b/nbd/server.c
> index efe5cb8..767ca0f 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -1,5 +1,5 @@
>  /*
> - *  Copyright (C) 2016 Red Hat, Inc.
> + *  Copyright (C) 2016-2017 Red Hat, Inc.
>   *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
>   *
>   *  Network Block Device Server Side
> @@ -206,8 +206,8 @@ static int nbd_negotiate_send_rep_len(QIOChannel *ioc,
> uint32_t type,
>  {
>      uint64_t magic;
>
> -    TRACE("Reply opt=%" PRIx32 " type=%" PRIx32 " len=%" PRIu32,
> -          type, opt, len);
> +    TRACE("Reply opt=%" PRIx32 " (%s), type=%" PRIx32 " (%s), len=%"
> PRIu32,
> +          opt, nbd_opt_lookup(opt), type, nbd_rep_lookup(type), len);
>
>      magic = cpu_to_be64(NBD_REP_MAGIC);
>      if (nbd_negotiate_write(ioc, &magic, sizeof(magic)) != sizeof(magic))
> {
> @@ -493,7 +493,8 @@ static int nbd_negotiate_options(NBDClient *client)
>          }
>          length = be32_to_cpu(length);
>
> -        TRACE("Checking option 0x%" PRIx32, clientflags);
> +        TRACE("Checking option 0x%" PRIx32 " (%s)", clientflags,
> +              nbd_opt_lookup(clientflags));
>          if (client->tlscreds &&
>              client->ioc == (QIOChannel *)client->sioc) {
>              QIOChannel *tioc;
> @@ -581,8 +582,9 @@ static int nbd_negotiate_options(NBDClient *client)
>                                                   NBD_REP_ERR_UNSUP,
>                                                   clientflags,
>                                                   "Unsupported option 0x%"
> -                                                 PRIx32,
> -                                                 clientflags);
> +                                                 PRIx32 " (%s)",
> +                                                 clientflags,
> +
>  nbd_opt_lookup(clientflags));
>                  if (ret < 0) {
>                      return ret;
>                  }
> @@ -598,7 +600,8 @@ static int nbd_negotiate_options(NBDClient *client)
>                  return nbd_negotiate_handle_export_name(client, length);
>
>              default:
> -                TRACE("Unsupported option 0x%" PRIx32, clientflags);
> +                TRACE("Unsupported option 0x%" PRIx32 " (%s)",
> clientflags,
> +                      nbd_opt_lookup(clientflags));
>                  return -EINVAL;
>              }
>          }
> --
> 2.9.3
>
>
> --
Marc-André Lureau

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v4 7/8] nbd: Implement NBD_INFO_BLOCK_SIZE on server
  2017-02-21  2:42 ` [Qemu-devel] [PATCH v4 7/8] nbd: Implement NBD_INFO_BLOCK_SIZE on server Eric Blake
@ 2017-02-22 16:51   ` Paolo Bonzini
  2017-02-22 17:11     ` Eric Blake
  0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2017-02-22 16:51 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: qemu-block, vsementsov, den



On 21/02/2017 03:42, Eric Blake wrote:
> +    /* Send NBD_INFO_BLOCK_SIZE always, but tweak the minimum size
> +     * according to whether the client requested it, and according to
> +     * whether this is OPT_INFO or OPT_GO. */
> +    /* minimum - 1 for back-compat, or 512 if client is new enough.
> +     * TODO: consult blk_bs(blk)->request_align? */
> +    sizes[0] = (opt == NBD_OPT_INFO || blocksize) ? BDRV_SECTOR_SIZE : 1;
> +    /* preferred - At least 4096, but larger as appropriate. */
> +    sizes[1] = MAX(blk_get_opt_transfer(exp->blk), 4096);

Can we just say zero if the preferred transfer size is unknown?

Apart from this, it looks good.

Paolo

> +    /* maximum - At most 32M, but smaller as appropriate. */
> +    sizes[2] = MIN(blk_get_max_transfer(exp->blk), NBD_MAX_BUFFER_SIZE);
> +    TRACE("advertising minimum 0x%" PRIx32 ", preferred 0x%" PRIx32
> +          ", maximum 0x%" PRIx32, sizes[0], sizes[1], sizes[2]);

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v4 7/8] nbd: Implement NBD_INFO_BLOCK_SIZE on server
  2017-02-22 16:51   ` Paolo Bonzini
@ 2017-02-22 17:11     ` Eric Blake
  2017-02-22 17:26       ` Paolo Bonzini
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Blake @ 2017-02-22 17:11 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: qemu-block, vsementsov, den

[-- Attachment #1: Type: text/plain, Size: 1486 bytes --]

On 02/22/2017 10:51 AM, Paolo Bonzini wrote:
> 
> 
> On 21/02/2017 03:42, Eric Blake wrote:
>> +    /* Send NBD_INFO_BLOCK_SIZE always, but tweak the minimum size
>> +     * according to whether the client requested it, and according to
>> +     * whether this is OPT_INFO or OPT_GO. */
>> +    /* minimum - 1 for back-compat, or 512 if client is new enough.
>> +     * TODO: consult blk_bs(blk)->request_align? */
>> +    sizes[0] = (opt == NBD_OPT_INFO || blocksize) ? BDRV_SECTOR_SIZE : 1;
>> +    /* preferred - At least 4096, but larger as appropriate. */
>> +    sizes[1] = MAX(blk_get_opt_transfer(exp->blk), 4096);
> 
> Can we just say zero if the preferred transfer size is unknown?

The NBD specification requires a non-zero power-of-2 number if the
server transmits the block size at all; 1 is the ideal number, followed
by whatever actual size we learn from the request_align of the device.


I added blk_get_opt_transfer() in 3/8; maybe I should just respin the
series to also add blk_get_request_align() as well and get rid of the
TODO here.  In other words, my TODO is probably the result of me not
remembering to complete my rebase (since v3 of the series was posted so
many months ago).

> 
> Apart from this, it looks good.

Thanks; I'll get a respin out soon, with R-b added where they make
sense, and with the TODO removed.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v4 7/8] nbd: Implement NBD_INFO_BLOCK_SIZE on server
  2017-02-22 17:11     ` Eric Blake
@ 2017-02-22 17:26       ` Paolo Bonzini
  2017-02-22 20:34         ` Eric Blake
  0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2017-02-22 17:26 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: qemu-block, vsementsov, den

[-- Attachment #1: Type: text/plain, Size: 1269 bytes --]



On 22/02/2017 18:11, Eric Blake wrote:
>>> +    /* preferred - At least 4096, but larger as appropriate. */
>>> +    sizes[1] = MAX(blk_get_opt_transfer(exp->blk), 4096);
>
> The NBD specification requires a non-zero power-of-2 number if the
> server transmits the block size at all; 1 is the ideal number, followed
> by whatever actual size we learn from the request_align of the device.

Oh, so it's the smallest "good" transfer size, or the preferred
alignment.  That's not the same as the SCSI definition, which is:

   If a device server receives one of these commands with a transfer
   size greater than this value, then the device server may incur
   delays in processing the command. An OPTIMAL TRANSFER LENGTH field
   set to 0000_0000h indicates that the device server does not report
   an optimal transfer size.

It's more similar to the physical block size:

   When using logical block access commands (see 4.2.2), application
   clients should:
   a) specify an LBA that is aligned to a physical block boundary; and
   b) access an integral number of physical blocks, provided that the
   access does not go beyond the last LBA on the medium.

So I'd rather ignore it in the client, and send 4096 in the server.

Paolo


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v4 7/8] nbd: Implement NBD_INFO_BLOCK_SIZE on server
  2017-02-22 17:26       ` Paolo Bonzini
@ 2017-02-22 20:34         ` Eric Blake
  2017-02-23 13:13           ` Paolo Bonzini
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Blake @ 2017-02-22 20:34 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: qemu-block, vsementsov, den

[-- Attachment #1: Type: text/plain, Size: 3362 bytes --]

On 02/22/2017 11:26 AM, Paolo Bonzini wrote:
> 
> 
> On 22/02/2017 18:11, Eric Blake wrote:
>>>> +    /* preferred - At least 4096, but larger as appropriate. */
>>>> +    sizes[1] = MAX(blk_get_opt_transfer(exp->blk), 4096);
>>
>> The NBD specification requires a non-zero power-of-2 number if the
>> server transmits the block size at all; 1 is the ideal number, followed
>> by whatever actual size we learn from the request_align of the device.

Oh shoot - now I notice I misread your complaint - I thought you were
complaining about sizes[0] (min_size) having a TODO comment; but you
were talking about sizes[1] (preferred_size).

The NBD spec wording can be changed if needed (after all, it is still
experimental), but it currently says:

"If block size constraints have not been advertised or agreed on
externally, then a client SHOULD assume a default minimum block size of
1, a preferred block size of 2^12 (4,096), and a maximum block size of
the smaller of the export size or 0xffffffff (effectively unlimited).

...

"The preferred block size represents the minimum size at which aligned
requests will have efficient I/O, avoiding behaviour such as
read-modify-write. If advertised, this MUST be a power of 2 at least as
large as the smaller of the minimum block size and 2^12 (4,096),
although larger values (such as the minimum granularity of a hole) are
also appropriate. The preferred block size MAY be larger than the export
size, in which case the client is unable to utilize the preferred block
size for that export. The server MAY advertise an export size that is
not an integer multiple of the preferred block size."

> 
> Oh, so it's the smallest "good" transfer size, or the preferred
> alignment.  That's not the same as the SCSI definition, which is:
> 
>    If a device server receives one of these commands with a transfer
>    size greater than this value, then the device server may incur
>    delays in processing the command. An OPTIMAL TRANSFER LENGTH field
>    set to 0000_0000h indicates that the device server does not report
>    an optimal transfer size.

Hmm - that's yet another limit. I don't know if our block layer exposes
it, or if it should expose it.

> 
> It's more similar to the physical block size:

Indeed; at least that was my intent (picking a size that will avoid
read-modify-write pessimations, as well as reflecting granularity of
trim/zero operations).

> 
>    When using logical block access commands (see 4.2.2), application
>    clients should:
>    a) specify an LBA that is aligned to a physical block boundary; and
>    b) access an integral number of physical blocks, provided that the
>    access does not go beyond the last LBA on the medium.
> 
> So I'd rather ignore it in the client, and send 4096 in the server.

Does that mean our BlockLimits structure documentation needs a tweak,
too?  It currently reads:

    /* Optimal transfer length in bytes.  A power of 2 is best but not
     * mandatory.  Must be a multiple of bl.request_alignment, or 0 if
     * no preferred size */
    uint32_t opt_transfer;

Are we trying to track both optimum size in the SCSI sense _and_ block
size in the O_DIRECT sense?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v4 7/8] nbd: Implement NBD_INFO_BLOCK_SIZE on server
  2017-02-22 20:34         ` Eric Blake
@ 2017-02-23 13:13           ` Paolo Bonzini
  2017-02-23 15:38             ` Eric Blake
  0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2017-02-23 13:13 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: qemu-block, vsementsov, den

[-- Attachment #1: Type: text/plain, Size: 2055 bytes --]



On 22/02/2017 21:34, Eric Blake wrote:
> 
>> Oh, so it's the smallest "good" transfer size, or the preferred
>> alignment.  That's not the same as the SCSI definition, which is:
>>
>>    If a device server receives one of these commands with a transfer
>>    size greater than this value, then the device server may incur
>>    delays in processing the command. An OPTIMAL TRANSFER LENGTH field
>>    set to 0000_0000h indicates that the device server does not report
>>    an optimal transfer size.
> Hmm - that's yet another limit. I don't know if our block layer exposes
> it, or if it should expose it.

It exposes it in opt_transfer. :)  The only driver that sets
opt_transfer is the iSCSI driver and uses the SCSI meaning.

>> It's more similar to the physical block size:
>
> Indeed; at least that was my intent (picking a size that will avoid
> read-modify-write pessimations, as well as reflecting granularity of
> trim/zero operations).

The two are not necessarily related.  Granularity of trim/zero is
usually a multiple of the sector size (for example qcow2 will have
physical block size = host physical block size, and unmap granularity =
cluster size).

>>    When using logical block access commands (see 4.2.2), application
>>    clients should:
>>    a) specify an LBA that is aligned to a physical block boundary; and
>>    b) access an integral number of physical blocks, provided that the
>>    access does not go beyond the last LBA on the medium.
>>
>> So I'd rather ignore it in the client, and send 4096 in the server.
> Does that mean our BlockLimits structure documentation needs a tweak,
> too?  It currently reads:
> 
>     /* Optimal transfer length in bytes.  A power of 2 is best but not
>      * mandatory.  Must be a multiple of bl.request_alignment, or 0 if
>      * no preferred size */
>     uint32_t opt_transfer;
> 
> Are we trying to track both optimum size in the SCSI sense _and_ block
> size in the O_DIRECT sense?

As of now, just in the SCSI sense.

Paolo


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v4 7/8] nbd: Implement NBD_INFO_BLOCK_SIZE on server
  2017-02-23 13:13           ` Paolo Bonzini
@ 2017-02-23 15:38             ` Eric Blake
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Blake @ 2017-02-23 15:38 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: qemu-block, vsementsov, den

[-- Attachment #1: Type: text/plain, Size: 1906 bytes --]

On 02/23/2017 07:13 AM, Paolo Bonzini wrote:
> 
> 
> On 22/02/2017 21:34, Eric Blake wrote:
>>
>>> Oh, so it's the smallest "good" transfer size, or the preferred
>>> alignment.  That's not the same as the SCSI definition, which is:
>>>
>>>    If a device server receives one of these commands with a transfer
>>>    size greater than this value, then the device server may incur
>>>    delays in processing the command. An OPTIMAL TRANSFER LENGTH field
>>>    set to 0000_0000h indicates that the device server does not report
>>>    an optimal transfer size.
>> Hmm - that's yet another limit. I don't know if our block layer exposes
>> it, or if it should expose it.
> 
> It exposes it in opt_transfer. :)  The only driver that sets
> opt_transfer is the iSCSI driver and uses the SCSI meaning.

>> Does that mean our BlockLimits structure documentation needs a tweak,
>> too?  It currently reads:
>>
>>     /* Optimal transfer length in bytes.  A power of 2 is best but not
>>      * mandatory.  Must be a multiple of bl.request_alignment, or 0 if
>>      * no preferred size */
>>     uint32_t opt_transfer;
>>
>> Are we trying to track both optimum size in the SCSI sense _and_ block
>> size in the O_DIRECT sense?
> 
> As of now, just in the SCSI sense.

Okay, sounds like I need a pre-requisite patch to beef up the
BlockLimits documentation, and maybe adding in a preferred size for yet
another parameter that we learn from stat() on files (there really are
performance benefits for performing I/O on a file in aligned requests,
even if it can do byte-wise manipulations).  And maybe the NBD
documentation needs a tweak too - which values are the most beneficial
to expose over the wire?  Should NBD be exposing more than just
min/preferred/max?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2017-02-23 15:39 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-21  2:42 [Qemu-devel] [PATCH v4 0/8] Implement NBD_OPT_GO, block size advertisement Eric Blake
2017-02-21  2:42 ` [Qemu-devel] [PATCH v4 1/8] nbd/client: fix drop_sync [CVE-2017-2630] Eric Blake
2017-02-21  8:11   ` Marc-André Lureau
2017-02-21  2:42 ` [Qemu-devel] [PATCH v4 2/8] nbd: Create struct for tracking export info Eric Blake
2017-02-21  8:11   ` Marc-André Lureau
2017-02-21  2:42 ` [Qemu-devel] [PATCH v4 3/8] block: Add blk_get_opt_transfer() Eric Blake
2017-02-21  8:11   ` Marc-André Lureau
2017-02-21  2:42 ` [Qemu-devel] [PATCH v4 4/8] nbd: Expose and debug more NBD constants Eric Blake
2017-02-21  8:12   ` Marc-André Lureau
2017-02-21  2:42 ` [Qemu-devel] [PATCH v4 5/8] nbd: Implement NBD_OPT_GO on server Eric Blake
2017-02-21  2:42 ` [Qemu-devel] [PATCH v4 6/8] nbd: Implement NBD_OPT_GO on client Eric Blake
2017-02-21  2:42 ` [Qemu-devel] [PATCH v4 7/8] nbd: Implement NBD_INFO_BLOCK_SIZE on server Eric Blake
2017-02-22 16:51   ` Paolo Bonzini
2017-02-22 17:11     ` Eric Blake
2017-02-22 17:26       ` Paolo Bonzini
2017-02-22 20:34         ` Eric Blake
2017-02-23 13:13           ` Paolo Bonzini
2017-02-23 15:38             ` Eric Blake
2017-02-21  2:42 ` [Qemu-devel] [PATCH v4 8/8] nbd: Implement NBD_INFO_BLOCK_SIZE on client Eric Blake

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.