* [Qemu-devel] [PATCH v3 00/44] NBD protocol additions
@ 2016-04-22 23:40 Eric Blake
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 01/44] nbd: More debug typo fixes, use correct formats Eric Blake
                   ` (43 more replies)
  0 siblings, 44 replies; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, nbd-general

This series is for qemu 2.7, and is a bit more stable this
time (upstream NBD extensions have been reaching some consensus
based on feedback I've given while implementing this series).

Included are some interoperability bug fixes, code cleanups, then
added support both client-side and server-side for:
NBD_FLAG_C_NO_ZEROES
NBD_CMD_WRITE_ZEROES
NBD_OPT_INFO
NBD_OPT_GO
NBD_OPT_BLOCK_SIZE

Still to come:
- support for NBD_OPT_STRUCTURED_REPLY
- strawman implementations to help with discussions towards
 NBD_CMD_BLOCK_STATUS

This posting is tied to this particular version of the NBD protocol:
https://github.com/yoe/nbd/blob/772b264/doc/proto.md
plus these two extensions:
https://github.com/yoe/nbd/blob/extension-info/doc/proto.md
https://github.com/yoe/nbd/blob/extension-write-zeroes/doc/proto.md

I performed testing by temporarily turning on DEBUG_NBD while
compiling, then connecting variations on:
  ./qemu-nbd -f raw -x foo [-D string] file
  ./qemu-io -t none -f raw nbd://localhost:10809/foo
and watching the traces on both screens (both for startup negotiation,
and for various 'write -z [-u]', 'write -f [-u]', 'discard', and 'q'
commands in qemu-io).  I intentionally tested all three combinations of:
  old client, new server
  new client, old server
  new client, new server
at various phases of the series, to make sure that either side
gracefully handles unknown advertisements when the other side is
newer, and correctly falls back to older usage when the other side
is too old.
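
As a rough illustration of the fallback behaviour described above (not
code from this series), the decision a newer peer makes after asking
for a newer feature can be sketched as below.  The enum names and
choose_path() are hypothetical and exist only for this sketch; the
real protocol defines its own option and reply codes.

```c
#include <assert.h>

/* Hypothetical reply classes for this sketch only. */
enum sketch_reply { SK_REP_ACK, SK_REP_UNSUPPORTED, SK_REP_FATAL };

/* Hypothetical outcomes: use the new feature, fall back, or give up. */
enum sketch_path { SK_PATH_NEW, SK_PATH_OLD, SK_PATH_FAIL };

/* Choose how to proceed after asking the peer for a newer feature. */
static enum sketch_path choose_path(enum sketch_reply reply)
{
    switch (reply) {
    case SK_REP_ACK:
        return SK_PATH_NEW;     /* peer is new enough */
    case SK_REP_UNSUPPORTED:
        return SK_PATH_OLD;     /* peer is older: use the older usage */
    default:
        return SK_PATH_FAIL;    /* anything else is a real error */
    }
}

/* Small usage example covering the two interoperable cases. */
static void self_check(void)
{
    assert(choose_path(SK_REP_ACK) == SK_PATH_NEW);
    assert(choose_path(SK_REP_UNSUPPORTED) == SK_PATH_OLD);
}
```

The point of the three test combinations is that only the
"unsupported" answer triggers the fallback; other failures are still
reported as errors.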

I'm posting now so that others may compile my work and help with
cross-project testing (such as qemu client to Alex's NBDGO
server), which in turn will help us move experimental extensions
into final form in the NBD protocol.

Also available as a tag at this location:
git fetch git://repo.or.cz/qemu/ericb.git nbd-flags-v3

Comparison to v2:
001/44:[----] [-C] 'nbd: More debug typo fixes, use correct formats'
002/44:[down] 'nbd: Quit server after any write error'
003/44:[down] 'nbd: Improve server handling of bogus commands'
004/44:[----] [-C] 'nbd: Reject unknown request flags'
005/44:[down] 'nbd: Group all Linux-specific ioctl code in one place'
006/44:[down] 'nbd: Clean up ioctl handling of qemu-nbd -c'
007/44:[0036] [FC] 'nbd: Limit nbdflags to 16 bits'
008/44:[down] 'nbd: Add qemu-nbd -D for human-readable description'
009/44:[down] 'block: Allow BDRV_REQ_FUA through blk_pwrite()'
010/44:[down] 'fdc: Switch to byte-based block access'
011/44:[down] 'nand: Switch to byte-based block access'
012/44:[down] 'onenand: Switch to byte-based block access'
013/44:[down] 'pflash: Switch to byte-based block access'
014/44:[down] 'sd: Switch to byte-based block access'
015/44:[down] 'm25p80: Switch to byte-based block access'
016/44:[down] 'atapi: Switch to byte-based block access'
017/44:[down] 'nbd: Switch to byte-based block access'
018/44:[down] 'qemu-img: Switch to byte-based block access'
019/44:[down] 'qemu-io: Switch to byte-based block access'
020/44:[down] 'block: Switch blk_read_unthrottled() to byte interface'
021/44:[down] 'block: Switch blk_write_zeroes() to byte interface'
022/44:[down] 'block: Kill blk_write(), blk_read()'
023/44:[down] 'qemu-io: Add missing option documentation'
024/44:[down] 'qemu-io: Add 'write -f' to test FUA flag'
025/44:[down] 'qemu-io: Add 'open -u' to set BDRV_O_UNMAP after the fact'
026/44:[down] 'qemu-io: Add 'write -z -u' to test MAY_UNMAP flag'
027/44:[down] 'nbd: Use BDRV_REQ_FUA for better FUA where supported'
028/44:[----] [--] 'nbd: Detect servers that send unexpected error values'
029/44:[0008] [FC] 'nbd: Avoid magic number for NBD max name size'
030/44:[0026] [FC] 'nbd: Treat flags vs. command type as separate fields'
031/44:[0025] [FC] 'nbd: Share common reply-sending code in server'
032/44:[0002] [FC] 'nbd: Share common option-sending code in client'
033/44:[----] [-C] 'nbd: Let client skip portions of server reply'
034/44:[----] [--] 'nbd: Less allocation during NBD_OPT_LIST'
035/44:[----] [-C] 'nbd: Support shorter handshake'
036/44:[down] 'nbd: Improve handling of shutdown requests'
037/44:[down] 'nbd: Create struct for tracking export info'
038/44:[down] 'block: Add blk_get_opt_transfer_length()'
039/44:[0065] [FC] 'nbd: Implement NBD_OPT_GO on server'
040/44:[0139] [FC] 'nbd: Implement NBD_OPT_GO on client'
041/44:[0043] [FC] 'nbd: Implement NBD_CMD_WRITE_ZEROES on server'
042/44:[0005] [FC] 'nbd: Implement NBD_CMD_WRITE_ZEROES on client'
043/44:[down] 'nbd: Implement NBD_OPT_BLOCK_SIZE on server'
044/44:[down] 'nbd: Implement NBD_OPT_BLOCK_SIZE on client'

Eric Blake (44):
  nbd: More debug typo fixes, use correct formats
  nbd: Quit server after any write error
  nbd: Improve server handling of bogus commands
  nbd: Reject unknown request flags
  nbd: Group all Linux-specific ioctl code in one place
  nbd: Clean up ioctl handling of qemu-nbd -c
  nbd: Limit nbdflags to 16 bits
  nbd: Add qemu-nbd -D for human-readable description
  block: Allow BDRV_REQ_FUA through blk_pwrite()
  fdc: Switch to byte-based block access
  nand: Switch to byte-based block access
  onenand: Switch to byte-based block access
  pflash: Switch to byte-based block access
  sd: Switch to byte-based block access
  m25p80: Switch to byte-based block access
  atapi: Switch to byte-based block access
  nbd: Switch to byte-based block access
  qemu-img: Switch to byte-based block access
  qemu-io: Switch to byte-based block access
  block: Switch blk_read_unthrottled() to byte interface
  block: Switch blk_write_zeroes() to byte interface
  block: Kill blk_write(), blk_read()
  qemu-io: Add missing option documentation
  qemu-io: Add 'write -f' to test FUA flag
  qemu-io: Add 'open -u' to set BDRV_O_UNMAP after the fact
  qemu-io: Add 'write -z -u' to test MAY_UNMAP flag
  nbd: Use BDRV_REQ_FUA for better FUA where supported
  nbd: Detect servers that send unexpected error values
  nbd: Avoid magic number for NBD max name size
  nbd: Treat flags vs. command type as separate fields
  nbd: Share common reply-sending code in server
  nbd: Share common option-sending code in client
  nbd: Let client skip portions of server reply
  nbd: Less allocation during NBD_OPT_LIST
  nbd: Support shorter handshake
  nbd: Improve handling of shutdown requests
  nbd: Create struct for tracking export info
  block: Add blk_get_opt_transfer_length()
  nbd: Implement NBD_OPT_GO on server
  nbd: Implement NBD_OPT_GO on client
  nbd: Implement NBD_CMD_WRITE_ZEROES on server
  nbd: Implement NBD_CMD_WRITE_ZEROES on client
  nbd: Implement NBD_OPT_BLOCK_SIZE on server
  nbd: Implement NBD_OPT_BLOCK_SIZE on client

 block/nbd-client.h             |   5 +-
 include/block/nbd.h            | 101 ++++--
 include/sysemu/block-backend.h |  16 +-
 nbd/nbd-internal.h             |  18 +-
 block/block-backend.c          |  58 ++--
 block/crypto.c                 |   2 +-
 block/nbd-client.c             |  56 +++-
 block/nbd.c                    |  41 ++-
 block/parallels.c              |   5 +-
 block/qcow.c                   |   8 +-
 block/qcow2.c                  |   4 +-
 block/qed.c                    |   6 +-
 block/sheepdog.c               |   2 +-
 block/vdi.c                    |   4 +-
 block/vhdx.c                   |   5 +-
 block/vmdk.c                   |  10 +-
 block/vpc.c                    |  10 +-
 hw/block/fdc.c                 |  25 +-
 hw/block/hd-geometry.c         |   2 +-
 hw/block/m25p80.c              |   3 +-
 hw/block/nand.c                |  36 ++-
 hw/block/onenand.c             |  36 ++-
 hw/block/pflash_cfi01.c        |  12 +-
 hw/block/pflash_cfi02.c        |  12 +-
 hw/ide/atapi.c                 |   8 +-
 hw/nvram/spapr_nvram.c         |   4 +-
 hw/sd/sd.c                     |  46 +--
 nbd/client.c                   | 717 ++++++++++++++++++++++++++---------------
 nbd/server.c                   | 552 +++++++++++++++++++++++--------
 qemu-img.c                     |  31 +-
 qemu-io-cmds.c                 | 114 +++----
 qemu-io.c                      |  12 +-
 qemu-nbd.c                     |  35 +-
 qemu-nbd.texi                  |   5 +-
 34 files changed, 1307 insertions(+), 694 deletions(-)

-- 
2.5.5


* [Qemu-devel] [PATCH v3 01/44] nbd: More debug typo fixes, use correct formats
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 02/44] nbd: Quit server after any write error Eric Blake
                   ` (42 subsequent siblings)
  43 siblings, 0 replies; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Paolo Bonzini

Clean up some debug message oddities missed earlier; this includes
both typos, and recognizing that %d is not necessarily compatible
with uint32_t.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alex Bligh <alex@alex.org.uk>

---
v3: rebase
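
For readers unfamiliar with the <inttypes.h> macros used below, here
is a minimal standalone sketch (not part of the patch).  The helper
format_option() is hypothetical; only PRIx32 and PRIu32 come from the
C standard library.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format an option code and a name length using
 * the same style of format string the updated messages use. */
static int format_option(char *buf, size_t size,
                         uint32_t opt, uint32_t namelen)
{
    assert(buf != NULL);
    /* PRIx32 and PRIu32 expand to the conversion that matches
     * uint32_t on this platform, so the format string and the
     * argument type always agree. */
    return snprintf(buf, size,
                    "option %" PRIx32 ", name length %" PRIu32,
                    opt, namelen);
}
```

Plain %d or %x only happens to work where uint32_t is the same width
as int; the macros remove that assumption.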
---
 nbd/client.c | 41 ++++++++++++++++++++++-------------------
 nbd/server.c | 44 +++++++++++++++++++++++---------------------
 2 files changed, 45 insertions(+), 40 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index 48f2a21..42e4e52 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -109,25 +109,27 @@ static int nbd_handle_reply_err(QIOChannel *ioc, uint32_t opt, uint32_t type,

     switch (type) {
     case NBD_REP_ERR_UNSUP:
-        TRACE("server doesn't understand request %d, attempting fallback",
-              opt);
+        TRACE("server doesn't understand request %" PRIx32
+              ", attempting fallback", opt);
         result = 0;
         goto cleanup;

     case NBD_REP_ERR_POLICY:
-        error_setg(errp, "Denied by server for option %x", opt);
+        error_setg(errp, "Denied by server for option %" PRIx32, opt);
         break;

     case NBD_REP_ERR_INVALID:
-        error_setg(errp, "Invalid data length for option %x", opt);
+        error_setg(errp, "Invalid data length for option %" PRIx32, opt);
         break;

     case NBD_REP_ERR_TLS_REQD:
-        error_setg(errp, "TLS negotiation required before option %x", opt);
+        error_setg(errp, "TLS negotiation required before option %" PRIx32,
+                   opt);
         break;

     default:
-        error_setg(errp, "Unknown error code when asking for option %x", opt);
+        error_setg(errp, "Unknown error code when asking for option %" PRIx32,
+                   opt);
         break;
     }

@@ -165,7 +167,7 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
     }
     opt = be32_to_cpu(opt);
     if (opt != NBD_OPT_LIST) {
-        error_setg(errp, "Unexpected option type %x expected %x",
+        error_setg(errp, "Unexpected option type %" PRIx32 " expected %x",
                    opt, NBD_OPT_LIST);
         return -1;
     }
@@ -207,7 +209,7 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
             return -1;
         }
         if (namelen > 255) {
-            error_setg(errp, "export name length too long %d", namelen);
+            error_setg(errp, "export name length too long %" PRIu32, namelen);
             return -1;
         }

@@ -234,7 +236,7 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
             g_free(buf);
         }
     } else {
-        error_setg(errp, "Unexpected reply type %x expected %x",
+        error_setg(errp, "Unexpected reply type %" PRIx32 " expected %x",
                    type, NBD_REP_SERVER);
         return -1;
     }
@@ -349,7 +351,7 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
     }
     opt = be32_to_cpu(opt);
     if (opt != NBD_OPT_STARTTLS) {
-        error_setg(errp, "Unexpected option type %x expected %x",
+        error_setg(errp, "Unexpected option type %" PRIx32 " expected %x",
                    opt, NBD_OPT_STARTTLS);
         return NULL;
     }
@@ -361,7 +363,7 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
     }
     type = be32_to_cpu(type);
     if (type != NBD_REP_ACK) {
-        error_setg(errp, "Server rejected request to start TLS %x",
+        error_setg(errp, "Server rejected request to start TLS %" PRIx32,
                    type);
         return NULL;
     }
@@ -373,7 +375,7 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
     }
     length = be32_to_cpu(length);
     if (length != 0) {
-        error_setg(errp, "Start TLS reponse was not zero %x",
+        error_setg(errp, "Start TLS response was not zero %" PRIu32,
                    length);
         return NULL;
     }
@@ -384,7 +386,7 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
         return NULL;
     }
     data.loop = g_main_loop_new(g_main_context_default(), FALSE);
-    TRACE("Starting TLS hanshake");
+    TRACE("Starting TLS handshake");
     qio_channel_tls_handshake(tioc,
                               nbd_tls_handshake,
                               &data,
@@ -474,7 +476,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
         }
         globalflags = be16_to_cpu(globalflags);
         *flags = globalflags << 16;
-        TRACE("Global flags are %x", globalflags);
+        TRACE("Global flags are %" PRIx32, globalflags);
         if (globalflags & NBD_FLAG_FIXED_NEWSTYLE) {
             fixedNewStyle = true;
             TRACE("Server supports fixed new style");
@@ -550,7 +552,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
         }
         exportflags = be16_to_cpu(exportflags);
         *flags |= exportflags;
-        TRACE("Export flags are %x", exportflags);
+        TRACE("Export flags are %" PRIx16, exportflags);
     } else if (magic == NBD_CLIENT_MAGIC) {
         if (name) {
             error_setg(errp, "Server does not support export names");
@@ -683,7 +685,8 @@ ssize_t nbd_send_request(QIOChannel *ioc, struct nbd_request *request)
     ssize_t ret;

     TRACE("Sending request to server: "
-          "{ .from = %" PRIu64", .len = %u, .handle = %" PRIu64", .type=%i}",
+          "{ .from = %" PRIu64", .len = %" PRIu32 ", .handle = %" PRIu64
+          ", .type=%" PRIu16 " }",
           request->from, request->len, request->handle, request->type);

     cpu_to_be32w((uint32_t*)buf, NBD_REQUEST_MAGIC);
@@ -732,12 +735,12 @@ ssize_t nbd_receive_reply(QIOChannel *ioc, struct nbd_reply *reply)

     reply->error = nbd_errno_to_system_errno(reply->error);

-    TRACE("Got reply: "
-          "{ magic = 0x%x, .error = %d, handle = %" PRIu64" }",
+    TRACE("Got reply: { magic = 0x%" PRIx32 ", .error = % " PRId32
+          ", handle = %" PRIu64" }",
           magic, reply->error, reply->handle);

     if (magic != NBD_REPLY_MAGIC) {
-        LOG("invalid magic (got 0x%x)", magic);
+        LOG("invalid magic (got 0x%" PRIx32 ")", magic);
         return -EINVAL;
     }
     return 0;
diff --git a/nbd/server.c b/nbd/server.c
index 2184c64..fc36f4d 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -196,7 +196,7 @@ static int nbd_negotiate_send_rep(QIOChannel *ioc, uint32_t type, uint32_t opt)
     uint64_t magic;
     uint32_t len;

-    TRACE("Reply opt=%x type=%x", type, opt);
+    TRACE("Reply opt=%" PRIx32 " type=%" PRIx32, type, opt);

     magic = cpu_to_be64(NBD_REP_MAGIC);
     if (nbd_negotiate_write(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
@@ -226,7 +226,7 @@ static int nbd_negotiate_send_rep_list(QIOChannel *ioc, NBDExport *exp)
     uint64_t magic, name_len;
     uint32_t opt, type, len;

-    TRACE("Advertizing export name '%s'", exp->name ? exp->name : "");
+    TRACE("Advertising export name '%s'", exp->name ? exp->name : "");
     name_len = strlen(exp->name);
     magic = cpu_to_be64(NBD_REP_MAGIC);
     if (nbd_negotiate_write(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
@@ -392,12 +392,12 @@ static int nbd_negotiate_options(NBDClient *client)
     TRACE("Checking client flags");
     be32_to_cpus(&flags);
     if (flags & NBD_FLAG_C_FIXED_NEWSTYLE) {
-        TRACE("Support supports fixed newstyle handshake");
+        TRACE("Client supports fixed newstyle handshake");
         fixedNewstyle = true;
         flags &= ~NBD_FLAG_C_FIXED_NEWSTYLE;
     }
     if (flags != 0) {
-        TRACE("Unknown client flags 0x%x received", flags);
+        TRACE("Unknown client flags 0x%" PRIx32 " received", flags);
         return -EIO;
     }

@@ -431,12 +431,12 @@ static int nbd_negotiate_options(NBDClient *client)
         }
         length = be32_to_cpu(length);

-        TRACE("Checking option 0x%x", clientflags);
+        TRACE("Checking option 0x%" PRIx32, clientflags);
         if (client->tlscreds &&
             client->ioc == (QIOChannel *)client->sioc) {
             QIOChannel *tioc;
             if (!fixedNewstyle) {
-                TRACE("Unsupported option 0x%x", clientflags);
+                TRACE("Unsupported option 0x%" PRIx32, clientflags);
                 return -EINVAL;
             }
             switch (clientflags) {
@@ -455,7 +455,8 @@ static int nbd_negotiate_options(NBDClient *client)
                 return -EINVAL;

             default:
-                TRACE("Option 0x%x not permitted before TLS", clientflags);
+                TRACE("Option 0x%" PRIx32 " not permitted before TLS",
+                      clientflags);
                 if (nbd_negotiate_drop_sync(client->ioc, length) != length) {
                     return -EIO;
                 }
@@ -493,7 +494,7 @@ static int nbd_negotiate_options(NBDClient *client)
                 }
                 break;
             default:
-                TRACE("Unsupported option 0x%x", clientflags);
+                TRACE("Unsupported option 0x%" PRIx32, clientflags);
                 if (nbd_negotiate_drop_sync(client->ioc, length) != length) {
                     return -EIO;
                 }
@@ -511,7 +512,7 @@ static int nbd_negotiate_options(NBDClient *client)
                 return nbd_negotiate_handle_export_name(client, length);

             default:
-                TRACE("Unsupported option 0x%x", clientflags);
+                TRACE("Unsupported option 0x%" PRIx32, clientflags);
                 return -EINVAL;
             }
         }
@@ -652,12 +653,12 @@ static ssize_t nbd_receive_request(QIOChannel *ioc, struct nbd_request *request)
     request->from  = be64_to_cpup((uint64_t*)(buf + 16));
     request->len   = be32_to_cpup((uint32_t*)(buf + 24));

-    TRACE("Got request: "
-          "{ magic = 0x%x, .type = %d, from = %" PRIu64" , len = %u }",
+    TRACE("Got request: { magic = 0x%" PRIx32 ", .type = %" PRIx32
+          ", from = %" PRIu64 " , len = %" PRIu32 " }",
           magic, request->type, request->from, request->len);

     if (magic != NBD_REQUEST_MAGIC) {
-        LOG("invalid magic (got 0x%x)", magic);
+        LOG("invalid magic (got 0x%" PRIx32 ")", magic);
         return -EINVAL;
     }
     return 0;
@@ -670,7 +671,8 @@ static ssize_t nbd_send_reply(QIOChannel *ioc, struct nbd_reply *reply)

     reply->error = system_errno_to_nbd_errno(reply->error);

-    TRACE("Sending response to client: { .error = %d, handle = %" PRIu64 " }",
+    TRACE("Sending response to client: { .error = %" PRId32
+          ", handle = %" PRIu64 " }",
           reply->error, reply->handle);

     /* Reply
@@ -999,7 +1001,7 @@ static ssize_t nbd_co_receive_request(NBDRequest *req, struct nbd_request *reque
     command = request->type & NBD_CMD_MASK_COMMAND;
     if (command == NBD_CMD_READ || command == NBD_CMD_WRITE) {
         if (request->len > NBD_MAX_BUFFER_SIZE) {
-            LOG("len (%u) is larger than max len (%u)",
+            LOG("len (%" PRIu32" ) is larger than max len (%u)",
                 request->len, NBD_MAX_BUFFER_SIZE);
             rc = -EINVAL;
             goto out;
@@ -1012,7 +1014,7 @@ static ssize_t nbd_co_receive_request(NBDRequest *req, struct nbd_request *reque
         }
     }
     if (command == NBD_CMD_WRITE) {
-        TRACE("Reading %u byte(s)", request->len);
+        TRACE("Reading %" PRIu32 " byte(s)", request->len);

         if (read_sync(client->ioc, req->data, request->len) != request->len) {
             LOG("reading from socket failed");
@@ -1062,10 +1064,10 @@ static void nbd_trip(void *opaque)
     }
     command = request.type & NBD_CMD_MASK_COMMAND;
     if (command != NBD_CMD_DISC && (request.from + request.len) > exp->size) {
-            LOG("From: %" PRIu64 ", Len: %u, Size: %" PRIu64
-            ", Offset: %" PRIu64 "\n",
-                    request.from, request.len,
-                    (uint64_t)exp->size, (uint64_t)exp->dev_offset);
+            LOG("From: %" PRIu64 ", Len: %" PRIu32", Size: %" PRIu64
+                ", Offset: %" PRIu64 "\n",
+                request.from, request.len,
+                (uint64_t)exp->size, (uint64_t)exp->dev_offset);
         LOG("requested operation past EOF--bad client?");
         goto invalid_request;
     }
@@ -1099,7 +1101,7 @@ static void nbd_trip(void *opaque)
             goto error_reply;
         }

-        TRACE("Read %u byte(s)", request.len);
+        TRACE("Read %" PRIu32" byte(s)", request.len);
         if (nbd_co_send_reply(req, &reply, request.len) < 0)
             goto out;
         break;
@@ -1165,7 +1167,7 @@ static void nbd_trip(void *opaque)
         }
         break;
     default:
-        LOG("invalid request type (%u) received", request.type);
+        LOG("invalid request type (%" PRIu32 ") received", request.type);
     invalid_request:
         reply.error = EINVAL;
     error_reply:
-- 
2.5.5


* [Qemu-devel] [PATCH v3 02/44] nbd: Quit server after any write error
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 01/44] nbd: More debug typo fixes, use correct formats Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-25  9:21   ` Alex Bligh
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 03/44] nbd: Improve server handling of bogus commands Eric Blake
                   ` (41 subsequent siblings)
  43 siblings, 1 reply; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Paolo Bonzini

We should never ignore failure from nbd_negotiate_send_rep(); if
we are unable to write to the client, then it is not worth trying
to continue the negotiation.  Fortunately, the problem is not
too severe - chances are that the errors being ignored here (mainly
inability to write the reply to the client) are indications of
a closed connection or something similar, which will also affect
the next attempt to interact with the client and eventually reach
a point where the errors are detected to end the loop.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
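
The pattern applied throughout the diff below, reduced to a
standalone sketch.  This is not the server code: send_reply_fn,
reject_option() and the two toy writers are hypothetical names used
only to show the return value being checked and propagated.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a reply writer: returns 0 on success or
 * a negative errno value when the peer cannot be written to. */
typedef int (*send_reply_fn)(void *channel, unsigned type, unsigned opt);

/* Report a rejected option to the client, and stop negotiating as
 * soon as the report itself cannot be delivered. */
static int reject_option(send_reply_fn send_reply, void *channel,
                         unsigned err_type, unsigned opt)
{
    int ret;

    assert(send_reply != NULL);
    ret = send_reply(channel, err_type, opt);
    if (ret < 0) {
        return ret;     /* write failed: give up on this client */
    }
    return 0;           /* reply delivered: keep negotiating */
}

/* Two toy writers that exercise both paths. */
static int writer_ok(void *channel, unsigned type, unsigned opt)
{
    (void)channel; (void)type; (void)opt;
    return 0;
}

static int writer_broken(void *channel, unsigned type, unsigned opt)
{
    (void)channel; (void)type; (void)opt;
    return -EIO;
}
```

With writer_broken, reject_option() returns -EIO at once instead of
continuing the loop and discovering the dead connection later.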
 nbd/server.c | 32 +++++++++++++++++++++++---------
 1 file changed, 23 insertions(+), 9 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index fc36f4d..0a003e4 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -334,7 +334,10 @@ static QIOChannel *nbd_negotiate_handle_starttls(NBDClient *client,
         return NULL;
     }

-    nbd_negotiate_send_rep(client->ioc, NBD_REP_ACK, NBD_OPT_STARTTLS);
+    if (nbd_negotiate_send_rep(client->ioc, NBD_REP_ACK,
+                               NBD_OPT_STARTTLS) < 0) {
+        return NULL;
+    }

     tioc = qio_channel_tls_new_server(ioc,
                                       client->tlscreds,
@@ -460,8 +463,11 @@ static int nbd_negotiate_options(NBDClient *client)
                 if (nbd_negotiate_drop_sync(client->ioc, length) != length) {
                     return -EIO;
                 }
-                nbd_negotiate_send_rep(client->ioc, NBD_REP_ERR_TLS_REQD,
-                                       clientflags);
+                ret = nbd_negotiate_send_rep(client->ioc, NBD_REP_ERR_TLS_REQD,
+                                             clientflags);
+                if (ret < 0) {
+                    return ret;
+                }
                 break;
             }
         } else if (fixedNewstyle) {
@@ -485,12 +491,17 @@ static int nbd_negotiate_options(NBDClient *client)
                 }
                 if (client->tlscreds) {
                     TRACE("TLS already enabled");
-                    nbd_negotiate_send_rep(client->ioc, NBD_REP_ERR_INVALID,
-                                           clientflags);
+                    ret = nbd_negotiate_send_rep(client->ioc,
+                                                 NBD_REP_ERR_INVALID,
+                                                 clientflags);
                 } else {
                     TRACE("TLS not configured");
-                    nbd_negotiate_send_rep(client->ioc, NBD_REP_ERR_POLICY,
-                                           clientflags);
+                    ret = nbd_negotiate_send_rep(client->ioc,
+                                                 NBD_REP_ERR_POLICY,
+                                                 clientflags);
+                }
+                if (ret < 0) {
+                    return ret;
                 }
                 break;
             default:
@@ -498,8 +509,11 @@ static int nbd_negotiate_options(NBDClient *client)
                 if (nbd_negotiate_drop_sync(client->ioc, length) != length) {
                     return -EIO;
                 }
-                nbd_negotiate_send_rep(client->ioc, NBD_REP_ERR_UNSUP,
-                                       clientflags);
+                ret = nbd_negotiate_send_rep(client->ioc, NBD_REP_ERR_UNSUP,
+                                             clientflags);
+                if (ret < 0) {
+                    return ret;
+                }
                 break;
             }
         } else {
-- 
2.5.5


* [Qemu-devel] [PATCH v3 03/44] nbd: Improve server handling of bogus commands
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 01/44] nbd: More debug typo fixes, use correct formats Eric Blake
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 02/44] nbd: Quit server after any write error Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-25  9:29   ` Alex Bligh
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 04/44] nbd: Reject unknown request flags Eric Blake
                   ` (40 subsequent siblings)
  43 siblings, 1 reply; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Paolo Bonzini

We have a few bugs in how we handle invalid client commands:

- A client can send an NBD_CMD_DISC where from + len overflows,
convincing us to reply with an error and stay connected, even
though the protocol requires us to silently disconnect. Fix by
hoisting the special case sooner.

- A client can send an NBD_CMD_WRITE with bogus from and len,
where we reply to the client with EINVAL without consuming the
payload; this will normally cause us to fail if the next thing
read is not the right magic, but in rare cases, could cause us
to interpret the data payload as valid commands and do things
not requested by the client. Fix by adding a complete flag to
track whether we are in sync or must disconnect.

- If we report an error to NBD_CMD_READ, we are not writing out
any data payload; but the protocol says that a client can expect
to read the payload no matter what (and must instead ignore it),
which means the client will start reading our next replies as
its data payload. Fix by disconnecting.

Furthermore, we have split the checks for bogus from/len across
two functions, when it is easier to do it all at once.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
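
The two kinds of checks discussed above, reduced to a standalone
sketch under simplifying assumptions.  This is not the server code:
the enum, check_range() and must_disconnect() are hypothetical names
for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical command codes for this sketch only. */
enum sketch_cmd { SK_READ, SK_WRITE, SK_FLUSH, SK_DISC };

/* Validate offset/length against the export size.  Returns 0 if the
 * range is usable, -EINVAL if it wraps around or runs past the end. */
static int check_range(uint64_t from, uint32_t len, uint64_t export_size)
{
    /* Unsigned addition wraps on overflow, so a sum smaller than
     * 'from' means from + len does not fit in 64 bits. */
    if (from + len < from) {
        return -EINVAL;
    }
    if (from + len > export_size) {
        return -EINVAL;
    }
    return 0;
}

/* Decide whether the connection must be dropped after sending an
 * error reply, because the byte stream would otherwise be out of
 * sync with the peer. */
static bool must_disconnect(enum sketch_cmd cmd, bool payload_consumed)
{
    assert(cmd != SK_DISC);  /* disconnect never reaches the reply path */
    return cmd == SK_READ ||
           (cmd == SK_WRITE && !payload_consumed);
}
```

A failed write whose payload was fully read leaves both sides in
sync, so only that case may keep the connection open after the error.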
 nbd/server.c | 67 +++++++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 46 insertions(+), 21 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 0a003e4..6a6b5a2 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -52,6 +52,7 @@ struct NBDRequest {
     QSIMPLEQ_ENTRY(NBDRequest) entry;
     NBDClient *client;
     uint8_t *data;
+    bool complete;
 };

 struct NBDExport {
@@ -985,7 +986,13 @@ static ssize_t nbd_co_send_reply(NBDRequest *req, struct nbd_reply *reply,
     return rc;
 }

-static ssize_t nbd_co_receive_request(NBDRequest *req, struct nbd_request *request)
+/* Collect a client request.  Return 0 if request looks valid, -EAGAIN
+ * to keep trying the collection, -EIO to drop connection right away,
+ * and any other negative value to report an error to the client
+ * (although the caller may still need to disconnect after reporting
+ * the error).  */
+static ssize_t nbd_co_receive_request(NBDRequest *req,
+                                      struct nbd_request *request)
 {
     NBDClient *client = req->client;
     uint32_t command;
@@ -1003,16 +1010,26 @@ static ssize_t nbd_co_receive_request(NBDRequest *req, struct nbd_request *reque
         goto out;
     }

-    if ((request->from + request->len) < request->from) {
-        LOG("integer overflow detected! "
-            "you're probably being attacked");
-        rc = -EINVAL;
-        goto out;
-    }
-
     TRACE("Decoding type");

     command = request->type & NBD_CMD_MASK_COMMAND;
+    if (command == NBD_CMD_DISC) {
+        /* Special case: we're going to disconnect without a reply,
+         * whether or not flags, from, or len are bogus */
+        TRACE("Request type is DISCONNECT");
+        rc = -EIO;
+        goto out;
+    }
+
+    /* Check for sanity in the parameters, part 1.  Defer as many
+     * checks as possible until after reading any NBD_CMD_WRITE
+     * payload, so we can try and keep the connection alive.  */
+    if ((request->from + request->len) < request->from) {
+        LOG("integer overflow detected, you're probably being attacked");
+        rc = -EINVAL;
+        goto out;
+    }
+
     if (command == NBD_CMD_READ || command == NBD_CMD_WRITE) {
         if (request->len > NBD_MAX_BUFFER_SIZE) {
             LOG("len (%" PRIu32" ) is larger than max len (%u)",
@@ -1035,7 +1052,18 @@ static ssize_t nbd_co_receive_request(NBDRequest *req, struct nbd_request *reque
             rc = -EIO;
             goto out;
         }
+        req->complete = true;
     }
+
+    /* Sanity checks, part 2. */
+    if (request->from + request->len > client->exp->size) {
+        LOG("operation past EOF; From: %" PRIu64 ", Len: %" PRIu32
+            ", Size: %" PRIu64, request->from, request->len,
+            (uint64_t)client->exp->size);
+        rc = -EINVAL;
+        goto out;
+    }
+
     rc = 0;

 out:
@@ -1077,14 +1105,6 @@ static void nbd_trip(void *opaque)
         goto error_reply;
     }
     command = request.type & NBD_CMD_MASK_COMMAND;
-    if (command != NBD_CMD_DISC && (request.from + request.len) > exp->size) {
-            LOG("From: %" PRIu64 ", Len: %" PRIu32", Size: %" PRIu64
-                ", Offset: %" PRIu64 "\n",
-                request.from, request.len,
-                (uint64_t)exp->size, (uint64_t)exp->dev_offset);
-        LOG("requested operation past EOF--bad client?");
-        goto invalid_request;
-    }

     if (client->closing) {
         /*
@@ -1151,10 +1171,11 @@ static void nbd_trip(void *opaque)
             goto out;
         }
         break;
+
     case NBD_CMD_DISC:
-        TRACE("Request type is DISCONNECT");
-        errno = 0;
-        goto out;
+        /* unreachable, thanks to special case in nbd_co_receive_request() */
+        abort();
+
     case NBD_CMD_FLUSH:
         TRACE("Request type is FLUSH");

@@ -1182,10 +1203,14 @@ static void nbd_trip(void *opaque)
         break;
     default:
         LOG("invalid request type (%" PRIu32 ") received", request.type);
-    invalid_request:
         reply.error = EINVAL;
     error_reply:
-        if (nbd_co_send_reply(req, &reply, 0) < 0) {
+        /* We must disconnect after replying with an error to
+         * NBD_CMD_READ, since we choose not to send bogus filler
+         * data; likewise after NBD_CMD_WRITE if we did not read the
+         * payload. */
+        if (nbd_co_send_reply(req, &reply, 0) < 0 || command == NBD_CMD_READ ||
+            (command == NBD_CMD_WRITE && !req->complete)) {
             goto out;
         }
         break;
-- 
2.5.5


* [Qemu-devel] [PATCH v3 04/44] nbd: Reject unknown request flags
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (2 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 03/44] nbd: Improve server handling of bogus commands Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 05/44] nbd: Group all Linux-specific ioctl code in one place Eric Blake
                   ` (39 subsequent siblings)
  43 siblings, 0 replies; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Paolo Bonzini

The NBD protocol says that clients should not send a command flag
that has not been negotiated (whether by the client requesting an
option during a handshake, or because we advertise support for the
flag in response to NBD_OPT_EXPORT_NAME), and that servers should
reject invalid flags with EINVAL.  We were silently ignoring the
flags instead.  A client can't rely on our behavior, since passing
the bad flag was the client's mistake in the first place, but it's
better to be robust up front than to possibly behave differently
from what the client was expecting of the attempted flag.
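
As an illustration of the check described above, here is a minimal
standalone sketch.  The SKETCH_* names and their values are
assumptions made for this example (they mirror a layout where the low
16 bits of the request type select the command and the upper bits
carry flags); they are not taken from nbd-internal.h.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed layout: low 16 bits are the command, upper bits are flags. */
#define SKETCH_CMD_MASK_COMMAND 0x0000ffffu
#define SKETCH_CMD_FLAG_FUA     (1u << 16)

/* Flags this hypothetical server has negotiated and will honor. */
#define SKETCH_SUPPORTED_FLAGS  SKETCH_CMD_FLAG_FUA

/* Return 0 if every flag bit in 'type' is supported, else -EINVAL. */
static int sketch_check_request_flags(uint32_t type)
{
    uint32_t flags = type & ~SKETCH_CMD_MASK_COMMAND;

    if (flags & ~SKETCH_SUPPORTED_FLAGS) {
        return -EINVAL;
    }
    return 0;
}
```

Keeping the set of supported flags in one mask means that a newly
negotiated flag only needs to be added in one place.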

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alex Bligh <alex@alex.org.uk>

---
v3: reorder in series, defer check until after NBD_CMD_WRITE payload
is consumed
---
 nbd/server.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/nbd/server.c b/nbd/server.c
index 6a6b5a2..731e5f4 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1063,6 +1063,11 @@ static ssize_t nbd_co_receive_request(NBDRequest *req,
         rc = -EINVAL;
         goto out;
     }
+    if (request->type & ~NBD_CMD_MASK_COMMAND & ~NBD_CMD_FLAG_FUA) {
+        LOG("unsupported flags (got 0x%x)",
+            request->type & ~NBD_CMD_MASK_COMMAND);
+        return -EINVAL;
+    }

     rc = 0;

-- 
2.5.5


* [Qemu-devel] [PATCH v3 05/44] nbd: Group all Linux-specific ioctl code in one place
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (3 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 04/44] nbd: Reject unknown request flags Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 06/44] nbd: Clean up ioctl handling of qemu-nbd -c Eric Blake
                   ` (38 subsequent siblings)
  43 siblings, 0 replies; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Paolo Bonzini

NBD ioctl()s are used to manage an NBD client session where the
initial handshake is done in userspace, but then the transmission
phase is handed off to the kernel through a /dev/nbdX device.
As such, all ioctls sent to the kernel on the /dev/nbdX fd belong
in client.c; nbd_disconnect() was out-of-place in server.c.
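
For readers unfamiliar with the kernel side, the following is a rough
sketch of the teardown sequence on a /dev/nbdX descriptor.  The
function name sketch_nbd_disconnect() is hypothetical; the ioctl
request names come from <linux/nbd.h>.  Unlike the function moved by
this patch, the sketch reports the first ioctl failure, which is a
design choice of the example and not something this patch does.

```c
#include <errno.h>

#ifdef __linux__
#include <sys/ioctl.h>
#include <linux/nbd.h>

/* Tear down a kernel NBD session on an open /dev/nbdX descriptor.
 * All three steps are attempted; the first failure is reported. */
static int sketch_nbd_disconnect(int fd)
{
    int ret = 0;

    if (ioctl(fd, NBD_CLEAR_QUE) < 0 && ret == 0) {
        ret = -errno;
    }
    if (ioctl(fd, NBD_DISCONNECT) < 0 && ret == 0) {
        ret = -errno;
    }
    if (ioctl(fd, NBD_CLEAR_SOCK) < 0 && ret == 0) {
        ret = -errno;
    }
    return ret;
}
#else
/* No kernel NBD client outside of Linux. */
static int sketch_nbd_disconnect(int fd)
{
    (void)fd;
    return -ENOTSUP;
}
#endif
```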

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 nbd/client.c | 13 +++++++++++++
 nbd/server.c | 18 ------------------
 2 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index 42e4e52..ae9fdd4 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -667,6 +667,15 @@ int nbd_client(int fd)
     errno = serrno;
     return ret;
 }
+
+int nbd_disconnect(int fd)
+{
+    ioctl(fd, NBD_CLEAR_QUE);
+    ioctl(fd, NBD_DISCONNECT);
+    ioctl(fd, NBD_CLEAR_SOCK);
+    return 0;
+}
+
 #else
 int nbd_init(int fd, QIOChannelSocket *ioc, uint32_t flags, off_t size)
 {
@@ -677,6 +686,10 @@ int nbd_client(int fd)
 {
     return -ENOTSUP;
 }
+int nbd_disconnect(int fd)
+{
+    return -ENOTSUP;
+}
 #endif

 ssize_t nbd_send_request(QIOChannel *ioc, struct nbd_request *request)
diff --git a/nbd/server.c b/nbd/server.c
index 731e5f4..789189d 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -620,24 +620,6 @@ fail:
     return rc;
 }

-#ifdef __linux__
-
-int nbd_disconnect(int fd)
-{
-    ioctl(fd, NBD_CLEAR_QUE);
-    ioctl(fd, NBD_DISCONNECT);
-    ioctl(fd, NBD_CLEAR_SOCK);
-    return 0;
-}
-
-#else
-
-int nbd_disconnect(int fd)
-{
-    return -ENOTSUP;
-}
-#endif
-
 static ssize_t nbd_receive_request(QIOChannel *ioc, struct nbd_request *request)
 {
     uint8_t buf[NBD_REQUEST_SIZE];
-- 
2.5.5


* [Qemu-devel] [PATCH v3 06/44] nbd: Clean up ioctl handling of qemu-nbd -c
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (4 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 05/44] nbd: Group all Linux-specific ioctl code in one place Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 07/44] nbd: Limit nbdflags to 16 bits Eric Blake
                   ` (37 subsequent siblings)
  43 siblings, 0 replies; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Paolo Bonzini

The kernel ioctl() interface into NBD is limited to 'unsigned long';
every argument we pass MUST have that type (and not int or size_t, as
there may be platform ABIs where the wrong types promote incorrectly
through var-args).  Furthermore, on 32-bit platforms, the kernel
is limited to a maximum export size of 2 TiB (our block size of 512
bytes times a block count constrained to a 32-bit unsigned long).
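
The arithmetic behind that limit can be sketched as follows.  This is
an illustration only: sketch_ulong32 is a hypothetical stand-in for
'unsigned long' on a 32-bit platform, and sketch_size_to_sectors() is
not a function from this series.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_SECTOR_SIZE 512

/* Hypothetical stand-in for a 32-bit 'unsigned long'. */
typedef uint32_t sketch_ulong32;

/* Convert a byte size to a sector count, rejecting counts that do
 * not survive the narrowing conversion.  Trailing bytes that do not
 * fill a whole sector are dropped. */
static int sketch_size_to_sectors(int64_t size, sketch_ulong32 *sectors)
{
    int64_t wide;
    sketch_ulong32 narrow;

    if (size < 0) {
        return -EINVAL;
    }
    wide = size / SKETCH_SECTOR_SIZE;
    narrow = (sketch_ulong32)wide;
    if ((int64_t)narrow != wide) {
        return -E2BIG;
    }
    *sectors = narrow;
    return 0;
}
```

With these definitions the largest accepted size is 2^41 - 512 bytes
(2^32 - 1 sectors); exactly 2 TiB is one sector too many.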

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 nbd/client.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index ae9fdd4..f1afa49 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -593,9 +593,15 @@ fail:
 #ifdef __linux__
 int nbd_init(int fd, QIOChannelSocket *sioc, uint32_t flags, off_t size)
 {
+    unsigned long sectors = size / BDRV_SECTOR_SIZE;
+    if (size / BDRV_SECTOR_SIZE != sectors) {
+        LOG("Export size %lld too large for 32-bit kernel", (long long) size);
+        return -E2BIG;
+    }
+
     TRACE("Setting NBD socket");

-    if (ioctl(fd, NBD_SET_SOCK, sioc->fd) < 0) {
+    if (ioctl(fd, NBD_SET_SOCK, (unsigned long) sioc->fd) < 0) {
         int serrno = errno;
         LOG("Failed to set NBD socket");
         return -serrno;
@@ -603,21 +609,25 @@ int nbd_init(int fd, QIOChannelSocket *sioc, uint32_t flags, off_t size)

     TRACE("Setting block size to %lu", (unsigned long)BDRV_SECTOR_SIZE);

-    if (ioctl(fd, NBD_SET_BLKSIZE, (size_t)BDRV_SECTOR_SIZE) < 0) {
+    if (ioctl(fd, NBD_SET_BLKSIZE, (unsigned long)BDRV_SECTOR_SIZE) < 0) {
         int serrno = errno;
         LOG("Failed setting NBD block size");
         return -serrno;
     }

-    TRACE("Setting size to %zd block(s)", (size_t)(size / BDRV_SECTOR_SIZE));
+    TRACE("Setting size to %lu block(s)", sectors);
+    if (size % BDRV_SECTOR_SIZE) {
+        TRACE("Ignoring trailing %d bytes of export",
+              (int) (size % BDRV_SECTOR_SIZE));
+    }

-    if (ioctl(fd, NBD_SET_SIZE_BLOCKS, (size_t)(size / BDRV_SECTOR_SIZE)) < 0) {
+    if (ioctl(fd, NBD_SET_SIZE_BLOCKS, sectors) < 0) {
         int serrno = errno;
         LOG("Failed setting size (in blocks)");
         return -serrno;
     }

-    if (ioctl(fd, NBD_SET_FLAGS, flags) < 0) {
+    if (ioctl(fd, NBD_SET_FLAGS, (unsigned long) flags) < 0) {
         if (errno == ENOTTY) {
             int read_only = (flags & NBD_FLAG_READ_ONLY) != 0;
             TRACE("Setting readonly attribute");
-- 
2.5.5


* [Qemu-devel] [PATCH v3 07/44] nbd: Limit nbdflags to 16 bits
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (5 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 06/44] nbd: Clean up ioctl handling of qemu-nbd -c Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-25  9:24   ` Alex Bligh
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 08/44] nbd: Add qemu-nbd -D for human-readable description Eric Blake
                   ` (36 subsequent siblings)
  43 siblings, 1 reply; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Paolo Bonzini, Kevin Wolf, Max Reitz

Rather than asserting that nbdflags is within range, just give
it the correct type to begin with :)  nbdflags corresponds to
the per-export portion of NBD Protocol "transmission flags", which
is 16 bits in response to NBD_OPT_EXPORT_NAME and NBD_OPT_GO.

Furthermore, upstream NBD has never intentionally passed the global
flags to the kernel via ioctl(NBD_SET_FLAGS).  The ioctl was first
introduced in NBD 2.9.22.  A latent bug in NBD 3.1 then actually
tried to OR the global flags with the transmission flags, with the
disastrous result that the addition of NBD_FLAG_NO_ZEROES in 3.9
caused all earlier NBD 3.x clients to treat every export as
read-only.  NBD 3.10 and later intentionally clip things to 16 bits
to pass only transmission flags.  QEMU should follow suit, since the
current two global flags (NBD_FLAG_FIXED_NEWSTYLE and
NBD_FLAG_NO_ZEROES) have no impact on the kernel's behavior during
transmission.
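
To illustrate the narrowing, here is a small sketch of how a client
might take the 32-bit flags field of an old-style handshake and keep
only the 16 transmission bits, refusing anything set above them.  The
helper names are hypothetical and the code is independent of QEMU's
byte-swapping helpers.

```c
#include <errno.h>
#include <stdint.h>

/* Decode a 32-bit big-endian field as it appears on the wire. */
static uint32_t sketch_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Narrow the old-style 32-bit flags field to 16 transmission bits.
 * Returns 0 on success, -EINVAL if any of the upper 16 bits is set. */
static int sketch_oldstyle_flags(const uint8_t wire[4], uint16_t *flags)
{
    uint32_t wide = sketch_be32(wire);

    if (wide & ~0xffffu) {
        return -EINVAL;
    }
    *flags = (uint16_t)wide;
    return 0;
}
```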

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v3: expand scope of patch
---
 block/nbd-client.h  |  2 +-
 include/block/nbd.h |  6 +++---
 nbd/client.c        | 28 +++++++++++++++-------------
 nbd/server.c        | 10 ++++------
 qemu-nbd.c          |  4 ++--
 5 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/block/nbd-client.h b/block/nbd-client.h
index bc7aec0..1243612 100644
--- a/block/nbd-client.h
+++ b/block/nbd-client.h
@@ -20,7 +20,7 @@
 typedef struct NbdClientSession {
     QIOChannelSocket *sioc; /* The master data channel */
     QIOChannel *ioc; /* The current I/O channel which may differ (eg TLS) */
-    uint32_t nbdflags;
+    uint16_t nbdflags;
     off_t size;

     CoMutex send_mutex;
diff --git a/include/block/nbd.h b/include/block/nbd.h
index b86a976..134f117 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -83,11 +83,11 @@ ssize_t nbd_wr_syncv(QIOChannel *ioc,
                      size_t offset,
                      size_t length,
                      bool do_read);
-int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
+int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
                           QCryptoTLSCreds *tlscreds, const char *hostname,
                           QIOChannel **outioc,
                           off_t *size, Error **errp);
-int nbd_init(int fd, QIOChannelSocket *sioc, uint32_t flags, off_t size);
+int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t flags, off_t size);
 ssize_t nbd_send_request(QIOChannel *ioc, struct nbd_request *request);
 ssize_t nbd_receive_reply(QIOChannel *ioc, struct nbd_reply *reply);
 int nbd_client(int fd);
@@ -97,7 +97,7 @@ typedef struct NBDExport NBDExport;
 typedef struct NBDClient NBDClient;

 NBDExport *nbd_export_new(BlockBackend *blk, off_t dev_offset, off_t size,
-                          uint32_t nbdflags, void (*close)(NBDExport *),
+                          uint16_t nbdflags, void (*close)(NBDExport *),
                           Error **errp);
 void nbd_export_close(NBDExport *exp);
 void nbd_export_get(NBDExport *exp);
diff --git a/nbd/client.c b/nbd/client.c
index f1afa49..937344c 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -406,7 +406,7 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
 }


-int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
+int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
                           QCryptoTLSCreds *tlscreds, const char *hostname,
                           QIOChannel **outioc,
                           off_t *size, Error **errp)
@@ -466,7 +466,6 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
         uint32_t opt;
         uint32_t namesize;
         uint16_t globalflags;
-        uint16_t exportflags;
         bool fixedNewStyle = false;

         if (read_sync(ioc, &globalflags, sizeof(globalflags)) !=
@@ -475,7 +474,6 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
             goto fail;
         }
         globalflags = be16_to_cpu(globalflags);
-        *flags = globalflags << 16;
         TRACE("Global flags are %" PRIx32, globalflags);
         if (globalflags & NBD_FLAG_FIXED_NEWSTYLE) {
             fixedNewStyle = true;
@@ -543,17 +541,15 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
             goto fail;
         }
         *size = be64_to_cpu(s);
-        TRACE("Size is %" PRIu64, *size);

-        if (read_sync(ioc, &exportflags, sizeof(exportflags)) !=
-            sizeof(exportflags)) {
+        if (read_sync(ioc, flags, sizeof(*flags)) != sizeof(*flags)) {
             error_setg(errp, "Failed to read export flags");
             goto fail;
         }
-        exportflags = be16_to_cpu(exportflags);
-        *flags |= exportflags;
-        TRACE("Export flags are %" PRIx16, exportflags);
+        be16_to_cpus(flags);
     } else if (magic == NBD_CLIENT_MAGIC) {
+        uint32_t oldflags;
+
         if (name) {
             error_setg(errp, "Server does not support export names");
             goto fail;
@@ -570,16 +566,22 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
         *size = be64_to_cpu(s);
         TRACE("Size is %" PRIu64, *size);

-        if (read_sync(ioc, flags, sizeof(*flags)) != sizeof(*flags)) {
+        if (read_sync(ioc, &oldflags, sizeof(oldflags)) != sizeof(oldflags)) {
             error_setg(errp, "Failed to read export flags");
             goto fail;
         }
-        *flags = be32_to_cpup(flags);
+        be32_to_cpus(&oldflags);
+        if (oldflags & ~0xffff) {
+            error_setg(errp, "Unexpected export flags 0x%" PRIx32, oldflags);
+            goto fail;
+        }
+        *flags = oldflags;
     } else {
         error_setg(errp, "Bad magic received");
         goto fail;
     }

+    TRACE("Size is %" PRIu64 ", export flags %" PRIx16, *size, *flags);
     if (read_sync(ioc, &buf, 124) != 124) {
         error_setg(errp, "Failed to read reserved block");
         goto fail;
@@ -591,7 +593,7 @@ fail:
 }

 #ifdef __linux__
-int nbd_init(int fd, QIOChannelSocket *sioc, uint32_t flags, off_t size)
+int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t flags, off_t size)
 {
     unsigned long sectors = size / BDRV_SECTOR_SIZE;
     if (size / BDRV_SECTOR_SIZE != sectors) {
@@ -687,7 +689,7 @@ int nbd_disconnect(int fd)
 }

 #else
-int nbd_init(int fd, QIOChannelSocket *ioc, uint32_t flags, off_t size)
+int nbd_init(int fd, QIOChannelSocket *ioc, uint16_t flags, off_t size)
 {
     return -ENOTSUP;
 }
diff --git a/nbd/server.c b/nbd/server.c
index 789189d..31fc9cf 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -63,7 +63,7 @@ struct NBDExport {
     char *name;
     off_t dev_offset;
     off_t size;
-    uint32_t nbdflags;
+    uint16_t nbdflags;
     QTAILQ_HEAD(, NBDClient) clients;
     QTAILQ_ENTRY(NBDExport) next;

@@ -544,8 +544,8 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
     NBDClient *client = data->client;
     char buf[8 + 8 + 8 + 128];
     int rc;
-    const int myflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
-                         NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA);
+    const uint16_t myflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
+                              NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA);
     bool oldStyle;

     /* Old style negotiation header without options
@@ -575,7 +575,6 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)

     oldStyle = client->exp != NULL && !client->tlscreds;
     if (oldStyle) {
-        assert ((client->exp->nbdflags & ~65535) == 0);
         stq_be_p(buf + 8, NBD_CLIENT_MAGIC);
         stq_be_p(buf + 16, client->exp->size);
         stw_be_p(buf + 26, client->exp->nbdflags | myflags);
@@ -604,7 +603,6 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
             goto fail;
         }

-        assert ((client->exp->nbdflags & ~65535) == 0);
         stq_be_p(buf + 18, client->exp->size);
         stw_be_p(buf + 26, client->exp->nbdflags | myflags);
         if (nbd_negotiate_write(client->ioc, buf + 18, sizeof(buf) - 18) !=
@@ -806,7 +804,7 @@ static void nbd_eject_notifier(Notifier *n, void *data)
 }

 NBDExport *nbd_export_new(BlockBackend *blk, off_t dev_offset, off_t size,
-                          uint32_t nbdflags, void (*close)(NBDExport *),
+                          uint16_t nbdflags, void (*close)(NBDExport *),
                           Error **errp)
 {
     NBDExport *exp = g_malloc0(sizeof(NBDExport));
diff --git a/qemu-nbd.c b/qemu-nbd.c
index 2c9754e..71bfdeb 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -241,7 +241,7 @@ static void *nbd_client_thread(void *arg)
 {
     char *device = arg;
     off_t size;
-    uint32_t nbdflags;
+    uint16_t nbdflags;
     QIOChannelSocket *sioc;
     int fd;
     int ret;
@@ -455,7 +455,7 @@ int main(int argc, char **argv)
     BlockBackend *blk;
     BlockDriverState *bs;
     off_t dev_offset = 0;
-    uint32_t nbdflags = 0;
+    uint16_t nbdflags = 0;
     bool disconnect = false;
     const char *bindto = "0.0.0.0";
     const char *port = NULL;
-- 
2.5.5


* [Qemu-devel] [PATCH v3 08/44] nbd: Add qemu-nbd -D for human-readable description
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (6 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 07/44] nbd: Limit nbdflags to 16 bits Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-25  9:25   ` Alex Bligh
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 09/44] block: Allow BDRV_REQ_FUA through blk_pwrite() Eric Blake
                   ` (35 subsequent siblings)
  43 siblings, 1 reply; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Paolo Bonzini, Kevin Wolf, Max Reitz

The NBD protocol allows servers to advertise a human-readable
description alongside an export name during NBD_OPT_LIST.  Add
an option to pass through the user's string to the NBD client.

Doing this also makes it easier to test commit 200650d4, which
is the client counterpart of receiving the description.
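
As a rough picture of what goes on the wire, the reply payload for one
export is a 4-byte big-endian name length, the name, and then the
description filling the rest of the payload.  The sketch below builds
that payload into a caller-supplied buffer; the function name and the
buffer-based interface are assumptions for illustration, whereas the
patch writes each piece straight to the channel.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Build the payload of one list reply: name length, name, description.
 * NULL strings are treated as empty.  Returns the payload length, or
 * 0 if 'buf' is too small (a valid payload is never shorter than 4). */
static size_t sketch_build_list_reply(const char *name, const char *desc,
                                      uint8_t *buf, size_t cap)
{
    size_t name_len, desc_len, total;

    name = name ? name : "";
    desc = desc ? desc : "";
    name_len = strlen(name);
    desc_len = strlen(desc);
    total = 4 + name_len + desc_len;
    if (total > cap) {
        return 0;
    }

    /* Only the name carries an explicit length; the description is
     * whatever remains of the payload. */
    buf[0] = (uint8_t)(name_len >> 24);
    buf[1] = (uint8_t)(name_len >> 16);
    buf[2] = (uint8_t)(name_len >> 8);
    buf[3] = (uint8_t)name_len;
    memcpy(buf + 4, name, name_len);
    memcpy(buf + 4 + name_len, desc, desc_len);
    return total;
}
```

A client recovers the description length by subtracting 4 and the
name length from the overall reply length.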

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 include/block/nbd.h |  1 +
 nbd/nbd-internal.h  |  5 +++--
 nbd/server.c        | 34 ++++++++++++++++++++++++++--------
 qemu-nbd.c          | 12 +++++++++++-
 qemu-nbd.texi       |  5 ++++-
 5 files changed, 45 insertions(+), 12 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 134f117..3e2d76b 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -107,6 +107,7 @@ BlockBackend *nbd_export_get_blockdev(NBDExport *exp);

 NBDExport *nbd_export_find(const char *name);
 void nbd_export_set_name(NBDExport *exp, const char *name);
+void nbd_export_set_description(NBDExport *exp, const char *description);
 void nbd_export_close_all(void);

 void nbd_client_new(NBDExport *exp,
diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index 3791535..035ead4 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -103,9 +103,10 @@ static inline ssize_t read_sync(QIOChannel *ioc, void *buffer, size_t size)
     return nbd_wr_syncv(ioc, &iov, 1, 0, size, true);
 }

-static inline ssize_t write_sync(QIOChannel *ioc, void *buffer, size_t size)
+static inline ssize_t write_sync(QIOChannel *ioc, const void *buffer,
+                                 size_t size)
 {
-    struct iovec iov = { .iov_base = buffer, .iov_len = size };
+    struct iovec iov = { .iov_base = (void *) buffer, .iov_len = size };

     return nbd_wr_syncv(ioc, &iov, 1, 0, size, false);
 }
diff --git a/nbd/server.c b/nbd/server.c
index 31fc9cf..aa252a4 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -61,6 +61,7 @@ struct NBDExport {

     BlockBackend *blk;
     char *name;
+    char *description;
     off_t dev_offset;
     off_t size;
     uint16_t nbdflags;
@@ -128,7 +129,8 @@ static ssize_t nbd_negotiate_read(QIOChannel *ioc, void *buffer, size_t size)

 }

-static ssize_t nbd_negotiate_write(QIOChannel *ioc, void *buffer, size_t size)
+static ssize_t nbd_negotiate_write(QIOChannel *ioc, const void *buffer,
+                                   size_t size)
 {
     ssize_t ret;
     guint watch;
@@ -224,11 +226,15 @@ static int nbd_negotiate_send_rep(QIOChannel *ioc, uint32_t type, uint32_t opt)

 static int nbd_negotiate_send_rep_list(QIOChannel *ioc, NBDExport *exp)
 {
-    uint64_t magic, name_len;
+    uint64_t magic;
+    size_t name_len, desc_len;
     uint32_t opt, type, len;
+    const char *name = exp->name ? exp->name : "";
+    const char *desc = exp->description ? exp->description : "";

-    TRACE("Advertising export name '%s'", exp->name ? exp->name : "");
-    name_len = strlen(exp->name);
+    TRACE("Advertising export name '%s' description '%s'", name, desc);
+    name_len = strlen(name);
+    desc_len = strlen(desc);
     magic = cpu_to_be64(NBD_REP_MAGIC);
     if (nbd_negotiate_write(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
         LOG("write failed (magic)");
@@ -244,18 +250,22 @@ static int nbd_negotiate_send_rep_list(QIOChannel *ioc, NBDExport *exp)
         LOG("write failed (reply type)");
         return -EINVAL;
     }
-    len = cpu_to_be32(name_len + sizeof(len));
+    len = cpu_to_be32(name_len + desc_len + sizeof(len));
     if (nbd_negotiate_write(ioc, &len, sizeof(len)) != sizeof(len)) {
         LOG("write failed (length)");
         return -EINVAL;
     }
     len = cpu_to_be32(name_len);
     if (nbd_negotiate_write(ioc, &len, sizeof(len)) != sizeof(len)) {
-        LOG("write failed (length)");
+        LOG("write failed (name length)");
         return -EINVAL;
     }
-    if (nbd_negotiate_write(ioc, exp->name, name_len) != name_len) {
-        LOG("write failed (buffer)");
+    if (nbd_negotiate_write(ioc, name, name_len) != name_len) {
+        LOG("write failed (name buffer)");
+        return -EINVAL;
+    }
+    if (nbd_negotiate_write(ioc, desc, desc_len) != desc_len) {
+        LOG("write failed (description buffer)");
         return -EINVAL;
     }
     return 0;
@@ -877,6 +887,12 @@ void nbd_export_set_name(NBDExport *exp, const char *name)
     nbd_export_put(exp);
 }

+void nbd_export_set_description(NBDExport *exp, const char *description)
+{
+    g_free(exp->description);
+    exp->description = g_strdup(description);
+}
+
 void nbd_export_close(NBDExport *exp)
 {
     NBDClient *client, *next;
@@ -886,6 +902,7 @@ void nbd_export_close(NBDExport *exp)
         client_close(client);
     }
     nbd_export_set_name(exp, NULL);
+    nbd_export_set_description(exp, NULL);
     nbd_export_put(exp);
 }

@@ -904,6 +921,7 @@ void nbd_export_put(NBDExport *exp)

     if (--exp->refcount == 0) {
         assert(exp->name == NULL);
+        assert(exp->description == NULL);

         if (exp->close) {
             exp->close(exp);
diff --git a/qemu-nbd.c b/qemu-nbd.c
index 71bfdeb..a85e98f 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -77,6 +77,7 @@ static void usage(const char *name)
 "  -t, --persistent          don't exit on the last connection\n"
 "  -v, --verbose             display extra debugging information\n"
 "  -x, --export-name=NAME    expose export by name\n"
+"  -D, --description=TEXT    with -x, also export a human-readable description\n"
 "\n"
 "Exposing part of the image:\n"
 "  -o, --offset=OFFSET       offset into the image\n"
@@ -464,7 +465,7 @@ int main(int argc, char **argv)
     off_t fd_size;
     QemuOpts *sn_opts = NULL;
     const char *sn_id_or_name = NULL;
-    const char *sopt = "hVb:o:p:rsnP:c:dvk:e:f:tl:x:";
+    const char *sopt = "hVb:o:p:rsnP:c:dvk:e:f:tl:x:D:";
     struct option lopt[] = {
         { "help", no_argument, NULL, 'h' },
         { "version", no_argument, NULL, 'V' },
@@ -490,6 +491,7 @@ int main(int argc, char **argv)
         { "verbose", no_argument, NULL, 'v' },
         { "object", required_argument, NULL, QEMU_NBD_OPT_OBJECT },
         { "export-name", required_argument, NULL, 'x' },
+        { "description", required_argument, NULL, 'D' },
         { "tls-creds", required_argument, NULL, QEMU_NBD_OPT_TLSCREDS },
         { "image-opts", no_argument, NULL, QEMU_NBD_OPT_IMAGE_OPTS },
         { NULL, 0, NULL, 0 }
@@ -509,6 +511,7 @@ int main(int argc, char **argv)
     BlockdevDetectZeroesOptions detect_zeroes = BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF;
     QDict *options = NULL;
     const char *export_name = NULL;
+    const char *export_description = NULL;
     const char *tlscredsid = NULL;
     bool imageOpts = false;
     bool writethrough = true;
@@ -672,6 +675,9 @@ int main(int argc, char **argv)
         case 'x':
             export_name = optarg;
             break;
+        case 'D':
+            export_description = optarg;
+            break;
         case 'v':
             verbose = 1;
             break;
@@ -899,7 +905,11 @@ int main(int argc, char **argv)
     }
     if (export_name) {
         nbd_export_set_name(exp, export_name);
+        nbd_export_set_description(exp, export_description);
         newproto = true;
+    } else if (export_description) {
+        error_report("Export description requires an export name");
+        exit(EXIT_FAILURE);
     }

     server_ioc = qio_channel_socket_new();
diff --git a/qemu-nbd.texi b/qemu-nbd.texi
index 9f23343..923de74 100644
--- a/qemu-nbd.texi
+++ b/qemu-nbd.texi
@@ -79,9 +79,12 @@ Disconnect the device @var{dev}
 Allow up to @var{num} clients to share the device (default @samp{1})
 @item -t, --persistent
 Don't exit on the last connection
-@item -x NAME, --export-name=NAME
+@item -x, --export-name=@var{name}
 Set the NBD volume export name. This switches the server to use
 the new style NBD protocol negotiation
+@item -D, --description=@var{description}
+Set the NBD volume export description, as a human-readable
+string. Requires the use of @option{-x}
 @item --tls-creds=ID
 Enable mandatory TLS encryption for the server by setting the ID
 of the TLS credentials object previously created with the --object
-- 
2.5.5


* [Qemu-devel] [PATCH v3 09/44] block: Allow BDRV_REQ_FUA through blk_pwrite()
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (7 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 08/44] nbd: Add qemu-nbd -D for human-readable description Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-23  8:12   ` Denis V. Lunev
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 10/44] fdc: Switch to byte-based block access Eric Blake
                   ` (34 subsequent siblings)
  43 siblings, 1 reply; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, alex, Kevin Wolf, Max Reitz, Stefan Hajnoczi,
	Denis V. Lunev, Hitoshi Mitake, Liu Yuan, Jeff Cody, Stefan Weil,
	Fam Zheng, David Gibson, Alexander Graf, Paolo Bonzini,
	open list:Sheepdog, open list:sPAPR

We have several block drivers that understand BDRV_REQ_FUA; for
the rest, the block layer emulates it with a full flush.
But without a way to actually request BDRV_REQ_FUA during a
pass-through blk_pwrite(), FUA-aware block drivers like NBD are
forced to repeat the emulation logic of a full flush regardless
of whether the backend they are writing to could do it more
efficiently.

This patch just wires up a flags argument; a followup patch
will actually make use of it in the NBD driver and in qemu-io.
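
The difference between native FUA and the flush-based emulation can
be shown with a toy model.  Everything below (SketchBackend,
sketch_pwrite(), SKETCH_REQ_FUA) is hypothetical and only counts
operations; it does not reflect the real BlockBackend internals.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_REQ_FUA 0x1u

/* Toy backend that only counts what it was asked to do. */
typedef struct SketchBackend {
    bool native_fua;      /* backend can persist a single write */
    unsigned writes;      /* total writes issued */
    unsigned fua_writes;  /* writes sent with native FUA */
    unsigned flushes;     /* full flushes issued as emulation */
} SketchBackend;

/* Write with optional FUA: pass the flag through when the backend
 * supports it, otherwise fall back to a write plus a full flush. */
static int sketch_pwrite(SketchBackend *be, const void *buf, size_t count,
                         unsigned flags)
{
    (void)buf;
    (void)count;

    be->writes++;
    if (!(flags & SKETCH_REQ_FUA)) {
        return 0;
    }
    if (be->native_fua) {
        be->fua_writes++;
    } else {
        be->flushes++;
    }
    return 0;
}
```

With a flags argument, the caller states its intent once and the
layer closest to the backend picks the cheaper of the two paths.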

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 include/sysemu/block-backend.h |  3 ++-
 block/block-backend.c          |  6 ++++--
 block/crypto.c                 |  2 +-
 block/parallels.c              |  2 +-
 block/qcow.c                   |  8 ++++----
 block/qcow2.c                  |  4 ++--
 block/qed.c                    |  6 +++---
 block/sheepdog.c               |  2 +-
 block/vdi.c                    |  4 ++--
 block/vhdx.c                   |  5 +++--
 block/vmdk.c                   | 10 +++++-----
 block/vpc.c                    | 10 +++++-----
 hw/nvram/spapr_nvram.c         |  4 ++--
 nbd/server.c                   |  2 +-
 qemu-io-cmds.c                 |  2 +-
 15 files changed, 37 insertions(+), 33 deletions(-)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index c62b6fe..6991b26 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -102,7 +102,8 @@ BlockAIOCB *blk_aio_write_zeroes(BlockBackend *blk, int64_t sector_num,
                                  int nb_sectors, BdrvRequestFlags flags,
                                  BlockCompletionFunc *cb, void *opaque);
 int blk_pread(BlockBackend *blk, int64_t offset, void *buf, int count);
-int blk_pwrite(BlockBackend *blk, int64_t offset, const void *buf, int count);
+int blk_pwrite(BlockBackend *blk, int64_t offset, const void *buf, int count,
+               BdrvRequestFlags flags);
 int64_t blk_getlength(BlockBackend *blk);
 void blk_get_geometry(BlockBackend *blk, uint64_t *nb_sectors_ptr);
 int64_t blk_nb_sectors(BlockBackend *blk);
diff --git a/block/block-backend.c b/block/block-backend.c
index 16c9d5e..4551865 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -955,9 +955,11 @@ int blk_pread(BlockBackend *blk, int64_t offset, void *buf, int count)
     return count;
 }

-int blk_pwrite(BlockBackend *blk, int64_t offset, const void *buf, int count)
+int blk_pwrite(BlockBackend *blk, int64_t offset, const void *buf, int count,
+               BdrvRequestFlags flags)
 {
-    int ret = blk_prw(blk, offset, (void*) buf, count, blk_write_entry, 0);
+    int ret = blk_prw(blk, offset, (void *) buf, count, blk_write_entry,
+                      flags);
     if (ret < 0) {
         return ret;
     }
diff --git a/block/crypto.c b/block/crypto.c
index 1903e84..32ba17c 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -91,7 +91,7 @@ static ssize_t block_crypto_write_func(QCryptoBlock *block,
     struct BlockCryptoCreateData *data = opaque;
     ssize_t ret;

-    ret = blk_pwrite(data->blk, offset, buf, buflen);
+    ret = blk_pwrite(data->blk, offset, buf, buflen, 0);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "Could not write encryption header");
         return ret;
diff --git a/block/parallels.c b/block/parallels.c
index 324ed43..2d8bc87 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -512,7 +512,7 @@ static int parallels_create(const char *filename, QemuOpts *opts, Error **errp)
     memset(tmp, 0, sizeof(tmp));
     memcpy(tmp, &header, sizeof(header));

-    ret = blk_pwrite(file, 0, tmp, BDRV_SECTOR_SIZE);
+    ret = blk_pwrite(file, 0, tmp, BDRV_SECTOR_SIZE, 0);
     if (ret < 0) {
         goto exit;
     }
diff --git a/block/qcow.c b/block/qcow.c
index 60ddb12..d6dc1b0 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -853,14 +853,14 @@ static int qcow_create(const char *filename, QemuOpts *opts, Error **errp)
     }

     /* write all the data */
-    ret = blk_pwrite(qcow_blk, 0, &header, sizeof(header));
+    ret = blk_pwrite(qcow_blk, 0, &header, sizeof(header), 0);
     if (ret != sizeof(header)) {
         goto exit;
     }

     if (backing_file) {
         ret = blk_pwrite(qcow_blk, sizeof(header),
-            backing_file, backing_filename_len);
+                         backing_file, backing_filename_len, 0);
         if (ret != backing_filename_len) {
             goto exit;
         }
@@ -869,8 +869,8 @@ static int qcow_create(const char *filename, QemuOpts *opts, Error **errp)
     tmp = g_malloc0(BDRV_SECTOR_SIZE);
     for (i = 0; i < ((sizeof(uint64_t)*l1_size + BDRV_SECTOR_SIZE - 1)/
         BDRV_SECTOR_SIZE); i++) {
-        ret = blk_pwrite(qcow_blk, header_size +
-            BDRV_SECTOR_SIZE*i, tmp, BDRV_SECTOR_SIZE);
+        ret = blk_pwrite(qcow_blk, header_size + BDRV_SECTOR_SIZE * i,
+                         tmp, BDRV_SECTOR_SIZE, 0);
         if (ret != BDRV_SECTOR_SIZE) {
             g_free(tmp);
             goto exit;
diff --git a/block/qcow2.c b/block/qcow2.c
index 470734b..3090538 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2207,7 +2207,7 @@ static int qcow2_create2(const char *filename, int64_t total_size,
             cpu_to_be64(QCOW2_COMPAT_LAZY_REFCOUNTS);
     }

-    ret = blk_pwrite(blk, 0, header, cluster_size);
+    ret = blk_pwrite(blk, 0, header, cluster_size, 0);
     g_free(header);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "Could not write qcow2 header");
@@ -2217,7 +2217,7 @@ static int qcow2_create2(const char *filename, int64_t total_size,
     /* Write a refcount table with one refcount block */
     refcount_table = g_malloc0(2 * cluster_size);
     refcount_table[0] = cpu_to_be64(2 * cluster_size);
-    ret = blk_pwrite(blk, cluster_size, refcount_table, 2 * cluster_size);
+    ret = blk_pwrite(blk, cluster_size, refcount_table, 2 * cluster_size, 0);
     g_free(refcount_table);

     if (ret < 0) {
diff --git a/block/qed.c b/block/qed.c
index 0af5274..6cfd4c1 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -601,18 +601,18 @@ static int qed_create(const char *filename, uint32_t cluster_size,
     }

     qed_header_cpu_to_le(&header, &le_header);
-    ret = blk_pwrite(blk, 0, &le_header, sizeof(le_header));
+    ret = blk_pwrite(blk, 0, &le_header, sizeof(le_header), 0);
     if (ret < 0) {
         goto out;
     }
     ret = blk_pwrite(blk, sizeof(le_header), backing_file,
-                     header.backing_filename_size);
+                     header.backing_filename_size, 0);
     if (ret < 0) {
         goto out;
     }

     l1_table = g_malloc0(l1_size);
-    ret = blk_pwrite(blk, header.l1_table_offset, l1_table, l1_size);
+    ret = blk_pwrite(blk, header.l1_table_offset, l1_table, l1_size, 0);
     if (ret < 0) {
         goto out;
     }
diff --git a/block/sheepdog.c b/block/sheepdog.c
index 33e0a33..625f876 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -1678,7 +1678,7 @@ static int sd_prealloc(const char *filename, Error **errp)
         if (ret < 0) {
             goto out;
         }
-        ret = blk_pwrite(blk, idx * buf_size, buf, buf_size);
+        ret = blk_pwrite(blk, idx * buf_size, buf, buf_size, 0);
         if (ret < 0) {
             goto out;
         }
diff --git a/block/vdi.c b/block/vdi.c
index 75d4819..12ab3a6 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -808,7 +808,7 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp)
     vdi_header_print(&header);
 #endif
     vdi_header_to_le(&header);
-    ret = blk_pwrite(blk, offset, &header, sizeof(header));
+    ret = blk_pwrite(blk, offset, &header, sizeof(header), 0);
     if (ret < 0) {
         error_setg(errp, "Error writing header to %s", filename);
         goto exit;
@@ -829,7 +829,7 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp)
                 bmap[i] = VDI_UNALLOCATED;
             }
         }
-        ret = blk_pwrite(blk, offset, bmap, bmap_size);
+        ret = blk_pwrite(blk, offset, bmap, bmap_size, 0);
         if (ret < 0) {
             error_setg(errp, "Error writing bmap to %s", filename);
             goto exit;
diff --git a/block/vhdx.c b/block/vhdx.c
index 2b7b332..ec778fe 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -1856,13 +1856,14 @@ static int vhdx_create(const char *filename, QemuOpts *opts, Error **errp)
     creator = g_utf8_to_utf16("QEMU v" QEMU_VERSION, -1, NULL,
                               &creator_items, NULL);
     signature = cpu_to_le64(VHDX_FILE_SIGNATURE);
-    ret = blk_pwrite(blk, VHDX_FILE_ID_OFFSET, &signature, sizeof(signature));
+    ret = blk_pwrite(blk, VHDX_FILE_ID_OFFSET, &signature, sizeof(signature),
+                     0);
     if (ret < 0) {
         goto delete_and_exit;
     }
     if (creator) {
         ret = blk_pwrite(blk, VHDX_FILE_ID_OFFSET + sizeof(signature),
-                         creator, creator_items * sizeof(gunichar2));
+                         creator, creator_items * sizeof(gunichar2), 0);
         if (ret < 0) {
             goto delete_and_exit;
         }
diff --git a/block/vmdk.c b/block/vmdk.c
index 45f9d3c..0cc2011 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1728,12 +1728,12 @@ static int vmdk_create_extent(const char *filename, int64_t filesize,
     header.check_bytes[3] = 0xa;

     /* write all the data */
-    ret = blk_pwrite(blk, 0, &magic, sizeof(magic));
+    ret = blk_pwrite(blk, 0, &magic, sizeof(magic), 0);
     if (ret < 0) {
         error_setg(errp, QERR_IO_ERROR);
         goto exit;
     }
-    ret = blk_pwrite(blk, sizeof(magic), &header, sizeof(header));
+    ret = blk_pwrite(blk, sizeof(magic), &header, sizeof(header), 0);
     if (ret < 0) {
         error_setg(errp, QERR_IO_ERROR);
         goto exit;
@@ -1753,7 +1753,7 @@ static int vmdk_create_extent(const char *filename, int64_t filesize,
         gd_buf[i] = cpu_to_le32(tmp);
     }
     ret = blk_pwrite(blk, le64_to_cpu(header.rgd_offset) * BDRV_SECTOR_SIZE,
-                     gd_buf, gd_buf_size);
+                     gd_buf, gd_buf_size, 0);
     if (ret < 0) {
         error_setg(errp, QERR_IO_ERROR);
         goto exit;
@@ -1765,7 +1765,7 @@ static int vmdk_create_extent(const char *filename, int64_t filesize,
         gd_buf[i] = cpu_to_le32(tmp);
     }
     ret = blk_pwrite(blk, le64_to_cpu(header.gd_offset) * BDRV_SECTOR_SIZE,
-                     gd_buf, gd_buf_size);
+                     gd_buf, gd_buf_size, 0);
     if (ret < 0) {
         error_setg(errp, QERR_IO_ERROR);
         goto exit;
@@ -2028,7 +2028,7 @@ static int vmdk_create(const char *filename, QemuOpts *opts, Error **errp)

     blk_set_allow_write_beyond_eof(new_blk, true);

-    ret = blk_pwrite(new_blk, desc_offset, desc, desc_len);
+    ret = blk_pwrite(new_blk, desc_offset, desc, desc_len, 0);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "Could not write description");
         goto exit;
diff --git a/block/vpc.c b/block/vpc.c
index 3e2ea69..a55a3e4 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -783,13 +783,13 @@ static int create_dynamic_disk(BlockBackend *blk, uint8_t *buf,
     block_size = 0x200000;
     num_bat_entries = (total_sectors + block_size / 512) / (block_size / 512);

-    ret = blk_pwrite(blk, offset, buf, HEADER_SIZE);
+    ret = blk_pwrite(blk, offset, buf, HEADER_SIZE, 0);
     if (ret < 0) {
         goto fail;
     }

     offset = 1536 + ((num_bat_entries * 4 + 511) & ~511);
-    ret = blk_pwrite(blk, offset, buf, HEADER_SIZE);
+    ret = blk_pwrite(blk, offset, buf, HEADER_SIZE, 0);
     if (ret < 0) {
         goto fail;
     }
@@ -799,7 +799,7 @@ static int create_dynamic_disk(BlockBackend *blk, uint8_t *buf,

     memset(buf, 0xFF, 512);
     for (i = 0; i < (num_bat_entries * 4 + 511) / 512; i++) {
-        ret = blk_pwrite(blk, offset, buf, 512);
+        ret = blk_pwrite(blk, offset, buf, 512, 0);
         if (ret < 0) {
             goto fail;
         }
@@ -826,7 +826,7 @@ static int create_dynamic_disk(BlockBackend *blk, uint8_t *buf,
     /* Write the header */
     offset = 512;

-    ret = blk_pwrite(blk, offset, buf, 1024);
+    ret = blk_pwrite(blk, offset, buf, 1024, 0);
     if (ret < 0) {
         goto fail;
     }
@@ -848,7 +848,7 @@ static int create_fixed_disk(BlockBackend *blk, uint8_t *buf,
         return ret;
     }

-    ret = blk_pwrite(blk, total_size - HEADER_SIZE, buf, HEADER_SIZE);
+    ret = blk_pwrite(blk, total_size - HEADER_SIZE, buf, HEADER_SIZE, 0);
     if (ret < 0) {
         return ret;
     }
diff --git a/hw/nvram/spapr_nvram.c b/hw/nvram/spapr_nvram.c
index 802636e..019f25d 100644
--- a/hw/nvram/spapr_nvram.c
+++ b/hw/nvram/spapr_nvram.c
@@ -124,7 +124,7 @@ static void rtas_nvram_store(PowerPCCPU *cpu, sPAPRMachineState *spapr,

     alen = len;
     if (nvram->blk) {
-        alen = blk_pwrite(nvram->blk, offset, membuf, len);
+        alen = blk_pwrite(nvram->blk, offset, membuf, len, 0);
     }

     assert(nvram->buf);
@@ -190,7 +190,7 @@ static int spapr_nvram_post_load(void *opaque, int version_id)
     sPAPRNVRAM *nvram = VIO_SPAPR_NVRAM(opaque);

     if (nvram->blk) {
-        int alen = blk_pwrite(nvram->blk, 0, nvram->buf, nvram->size);
+        int alen = blk_pwrite(nvram->blk, 0, nvram->buf, nvram->size, 0);

         if (alen < 0) {
             return alen;
diff --git a/nbd/server.c b/nbd/server.c
index aa252a4..9be0a99 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1154,7 +1154,7 @@ static void nbd_trip(void *opaque)
         TRACE("Writing to device");

         ret = blk_pwrite(exp->blk, request.from + exp->dev_offset,
-                        req->data, request.len);
+                         req->data, request.len, 0);
         if (ret < 0) {
             LOG("writing to file failed");
             reply.error = -ret;
diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index e34f777..e26e543 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -474,7 +474,7 @@ static int do_pwrite(BlockBackend *blk, char *buf, int64_t offset,
         return -ERANGE;
     }

-    *total = blk_pwrite(blk, offset, (uint8_t *)buf, count);
+    *total = blk_pwrite(blk, offset, (uint8_t *)buf, count, 0);
     if (*total < 0) {
         return *total;
     }
-- 
2.5.5

* [Qemu-devel] [PATCH v3 10/44] fdc: Switch to byte-based block access
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (8 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 09/44] block: Allow BDRV_REQ_FUA through blk_pwrite() Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 11/44] nand: " Eric Blake
                   ` (33 subsequent siblings)
  43 siblings, 0 replies; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, John Snow, Kevin Wolf, Max Reitz

Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead.  Likewise for blk_read().
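
As an illustration only (not part of the patch), the conversion from a
sector index to a byte offset can be sketched as below. The names
SECTOR_BITS and sector_to_offset() are hypothetical stand-ins for
BDRV_SECTOR_BITS and the fd_offset() helper added in the diff, and the
512-byte sector size is assumed.

```c
#include <assert.h>
#include <limits.h>

/* Assumed stand-in for the block layer's 512-byte sector (2^9 bytes). */
#define SECTOR_BITS 9

/* Hypothetical helper: turn a sector index into a byte offset, refusing
 * values that would overflow a signed int once shifted. */
static int sector_to_offset(int sector)
{
    assert(sector >= 0 && sector < (INT_MAX >> SECTOR_BITS));
    return sector << SECTOR_BITS;
}
```

With this shape, a one-sector access at sector N becomes a 512-byte
access at offset N * 512, which is what the byte-based calls expect.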

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 hw/block/fdc.c | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index 3722275..f73af7d 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -223,6 +223,13 @@ static int fd_sector(FDrive *drv)
                           NUM_SIDES(drv));
 }

+/* Returns current position, in bytes, for given drive */
+static int fd_offset(FDrive *drv)
+{
+    g_assert(fd_sector(drv) < INT_MAX >> BDRV_SECTOR_BITS);
+    return fd_sector(drv) << BDRV_SECTOR_BITS;
+}
+
 /* Seek to a new position:
  * returns 0 if already on right track
  * returns 1 if track changed
@@ -1629,8 +1636,8 @@ static int fdctrl_transfer_handler (void *opaque, int nchan,
         if (fdctrl->data_dir != FD_DIR_WRITE ||
             len < FD_SECTOR_LEN || rel_pos != 0) {
             /* READ & SCAN commands and realign to a sector for WRITE */
-            if (blk_read(cur_drv->blk, fd_sector(cur_drv),
-                         fdctrl->fifo, 1) < 0) {
+            if (blk_pread(cur_drv->blk, fd_offset(cur_drv),
+                          fdctrl->fifo, BDRV_SECTOR_SIZE) < 0) {
                 FLOPPY_DPRINTF("Floppy: error getting sector %d\n",
                                fd_sector(cur_drv));
                 /* Sure, image size is too small... */
@@ -1657,8 +1664,8 @@ static int fdctrl_transfer_handler (void *opaque, int nchan,

             k->read_memory(fdctrl->dma, nchan, fdctrl->fifo + rel_pos,
                            fdctrl->data_pos, len);
-            if (blk_write(cur_drv->blk, fd_sector(cur_drv),
-                          fdctrl->fifo, 1) < 0) {
+            if (blk_pwrite(cur_drv->blk, fd_offset(cur_drv),
+                           fdctrl->fifo, BDRV_SECTOR_SIZE, 0) < 0) {
                 FLOPPY_DPRINTF("error writing sector %d\n",
                                fd_sector(cur_drv));
                 fdctrl_stop_transfer(fdctrl, FD_SR0_ABNTERM | FD_SR0_SEEK, 0x00, 0x00);
@@ -1741,7 +1748,8 @@ static uint32_t fdctrl_read_data(FDCtrl *fdctrl)
                                    fd_sector(cur_drv));
                     return 0;
                 }
-            if (blk_read(cur_drv->blk, fd_sector(cur_drv), fdctrl->fifo, 1)
+            if (blk_pread(cur_drv->blk, fd_offset(cur_drv), fdctrl->fifo,
+                          BDRV_SECTOR_SIZE)
                 < 0) {
                 FLOPPY_DPRINTF("error getting sector %d\n",
                                fd_sector(cur_drv));
@@ -1820,7 +1828,8 @@ static void fdctrl_format_sector(FDCtrl *fdctrl)
     }
     memset(fdctrl->fifo, 0, FD_SECTOR_LEN);
     if (cur_drv->blk == NULL ||
-        blk_write(cur_drv->blk, fd_sector(cur_drv), fdctrl->fifo, 1) < 0) {
+        blk_pwrite(cur_drv->blk, fd_offset(cur_drv), fdctrl->fifo,
+                   BDRV_SECTOR_SIZE, 0) < 0) {
         FLOPPY_DPRINTF("error formatting sector %d\n", fd_sector(cur_drv));
         fdctrl_stop_transfer(fdctrl, FD_SR0_ABNTERM | FD_SR0_SEEK, 0x00, 0x00);
     } else {
@@ -2243,8 +2252,8 @@ static void fdctrl_write_data(FDCtrl *fdctrl, uint32_t value)
         if (pos == FD_SECTOR_LEN - 1 ||
             fdctrl->data_pos == fdctrl->data_len) {
             cur_drv = get_cur_drv(fdctrl);
-            if (blk_write(cur_drv->blk, fd_sector(cur_drv), fdctrl->fifo, 1)
-                < 0) {
+            if (blk_pwrite(cur_drv->blk, fd_offset(cur_drv), fdctrl->fifo,
+                           BDRV_SECTOR_SIZE, 0) < 0) {
                 FLOPPY_DPRINTF("error writing sector %d\n",
                                fd_sector(cur_drv));
                 break;
-- 
2.5.5

* [Qemu-devel] [PATCH v3 11/44] nand: Switch to byte-based block access
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (9 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 10/44] fdc: Switch to byte-based block access Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 12/44] onenand: " Eric Blake
                   ` (32 subsequent siblings)
  43 siblings, 0 replies; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Kevin Wolf, Max Reitz

Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead.  Likewise for blk_read().

This file is doing some complex computations to map various
flash page sizes (256, 512, and 2048) atop generic uses of
512-byte sector operations.  Perhaps someone will want to tidy
up the file for fewer gymnastics in managing addresses and
offsets, and less wasteful visits of 256-byte pages, but it
was out of scope for this series, where I just went with the
mechanical conversion.
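
To make the cost of whole-sector access concrete, here is a rough
sketch (not taken from nand.c) of how many 512-byte sectors one flash
page occupies and how many transferred bytes are not needed.  The
helper names are hypothetical, and the sketch assumes a page that
starts on a sector boundary.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 512-byte sector, as used by the generic block operations. */
#define SECTOR_BITS 9
#define SECTOR_SIZE (1u << SECTOR_BITS)

/* Hypothetical helper: sectors needed to hold one sector-aligned page. */
static uint32_t sectors_per_page(uint32_t page_size)
{
    assert(page_size > 0);
    return (page_size + SECTOR_SIZE - 1) >> SECTOR_BITS;
}

/* Hypothetical helper: bytes moved beyond the page itself when the page
 * is accessed through whole sectors. */
static uint32_t wasted_bytes(uint32_t page_size)
{
    return (sectors_per_page(page_size) << SECTOR_BITS) - page_size;
}
```

Under these assumptions a 256-byte page still costs a full sector, so
half of every transfer is overhead, while 512- and 2048-byte pages map
onto one and four sectors with nothing wasted.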

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 hw/block/nand.c | 36 +++++++++++++++++++++++-------------
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/hw/block/nand.c b/hw/block/nand.c
index 29c6596..2703ff4 100644
--- a/hw/block/nand.c
+++ b/hw/block/nand.c
@@ -663,7 +663,8 @@ static void glue(nand_blk_write_, PAGE_SIZE)(NANDFlashState *s)
         sector = SECTOR(s->addr);
         off = (s->addr & PAGE_MASK) + s->offset;
         soff = SECTOR_OFFSET(s->addr);
-        if (blk_read(s->blk, sector, iobuf, PAGE_SECTORS) < 0) {
+        if (blk_pread(s->blk, sector << BDRV_SECTOR_BITS, iobuf,
+                      PAGE_SECTORS << BDRV_SECTOR_BITS) < 0) {
             printf("%s: read error in sector %" PRIu64 "\n", __func__, sector);
             return;
         }
@@ -675,21 +676,24 @@ static void glue(nand_blk_write_, PAGE_SIZE)(NANDFlashState *s)
                             MIN(OOB_SIZE, off + s->iolen - PAGE_SIZE));
         }

-        if (blk_write(s->blk, sector, iobuf, PAGE_SECTORS) < 0) {
+        if (blk_pwrite(s->blk, sector << BDRV_SECTOR_BITS, iobuf,
+                       PAGE_SECTORS << BDRV_SECTOR_BITS, 0) < 0) {
             printf("%s: write error in sector %" PRIu64 "\n", __func__, sector);
         }
     } else {
         off = PAGE_START(s->addr) + (s->addr & PAGE_MASK) + s->offset;
         sector = off >> 9;
         soff = off & 0x1ff;
-        if (blk_read(s->blk, sector, iobuf, PAGE_SECTORS + 2) < 0) {
+        if (blk_pread(s->blk, sector << BDRV_SECTOR_BITS, iobuf,
+                      (PAGE_SECTORS + 2) << BDRV_SECTOR_BITS) < 0) {
             printf("%s: read error in sector %" PRIu64 "\n", __func__, sector);
             return;
         }

         mem_and(iobuf + soff, s->io, s->iolen);

-        if (blk_write(s->blk, sector, iobuf, PAGE_SECTORS + 2) < 0) {
+        if (blk_pwrite(s->blk, sector << BDRV_SECTOR_BITS, iobuf,
+                       (PAGE_SECTORS + 2) << BDRV_SECTOR_BITS, 0) < 0) {
             printf("%s: write error in sector %" PRIu64 "\n", __func__, sector);
         }
     }
@@ -716,17 +720,20 @@ static void glue(nand_blk_erase_, PAGE_SIZE)(NANDFlashState *s)
         i = SECTOR(addr);
         page = SECTOR(addr + (1 << (ADDR_SHIFT + s->erase_shift)));
         for (; i < page; i ++)
-            if (blk_write(s->blk, i, iobuf, 1) < 0) {
+            if (blk_pwrite(s->blk, i << BDRV_SECTOR_BITS, iobuf,
+                           BDRV_SECTOR_SIZE, 0) < 0) {
                 printf("%s: write error in sector %" PRIu64 "\n", __func__, i);
             }
     } else {
         addr = PAGE_START(addr);
         page = addr >> 9;
-        if (blk_read(s->blk, page, iobuf, 1) < 0) {
+        if (blk_pread(s->blk, page << BDRV_SECTOR_BITS, iobuf,
+                      BDRV_SECTOR_SIZE) < 0) {
             printf("%s: read error in sector %" PRIu64 "\n", __func__, page);
         }
         memset(iobuf + (addr & 0x1ff), 0xff, (~addr & 0x1ff) + 1);
-        if (blk_write(s->blk, page, iobuf, 1) < 0) {
+        if (blk_pwrite(s->blk, page << BDRV_SECTOR_BITS, iobuf,
+                       BDRV_SECTOR_SIZE, 0) < 0) {
             printf("%s: write error in sector %" PRIu64 "\n", __func__, page);
         }

@@ -734,18 +741,20 @@ static void glue(nand_blk_erase_, PAGE_SIZE)(NANDFlashState *s)
         i = (addr & ~0x1ff) + 0x200;
         for (addr += ((PAGE_SIZE + OOB_SIZE) << s->erase_shift) - 0x200;
                         i < addr; i += 0x200) {
-            if (blk_write(s->blk, i >> 9, iobuf, 1) < 0) {
+            if (blk_pwrite(s->blk, i, iobuf, BDRV_SECTOR_SIZE, 0) < 0) {
                 printf("%s: write error in sector %" PRIu64 "\n",
                        __func__, i >> 9);
             }
         }

         page = i >> 9;
-        if (blk_read(s->blk, page, iobuf, 1) < 0) {
+        if (blk_pread(s->blk, page << BDRV_SECTOR_BITS, iobuf,
+                      BDRV_SECTOR_SIZE) < 0) {
             printf("%s: read error in sector %" PRIu64 "\n", __func__, page);
         }
         memset(iobuf, 0xff, ((addr - 1) & 0x1ff) + 1);
-        if (blk_write(s->blk, page, iobuf, 1) < 0) {
+        if (blk_pwrite(s->blk, page << BDRV_SECTOR_BITS, iobuf,
+                       BDRV_SECTOR_SIZE, 0) < 0) {
             printf("%s: write error in sector %" PRIu64 "\n", __func__, page);
         }
     }
@@ -760,7 +769,8 @@ static void glue(nand_blk_load_, PAGE_SIZE)(NANDFlashState *s,

     if (s->blk) {
         if (s->mem_oob) {
-            if (blk_read(s->blk, SECTOR(addr), s->io, PAGE_SECTORS) < 0) {
+            if (blk_pread(s->blk, SECTOR(addr) << BDRV_SECTOR_BITS, s->io,
+                          PAGE_SECTORS << BDRV_SECTOR_BITS) < 0) {
                 printf("%s: read error in sector %" PRIu64 "\n",
                                 __func__, SECTOR(addr));
             }
@@ -769,8 +779,8 @@ static void glue(nand_blk_load_, PAGE_SIZE)(NANDFlashState *s,
                             OOB_SIZE);
             s->ioaddr = s->io + SECTOR_OFFSET(s->addr) + offset;
         } else {
-            if (blk_read(s->blk, PAGE_START(addr) >> 9,
-                         s->io, (PAGE_SECTORS + 2)) < 0) {
+            if (blk_pread(s->blk, PAGE_START(addr), s->io,
+                          (PAGE_SECTORS + 2) << BDRV_SECTOR_BITS) < 0) {
                 printf("%s: read error in sector %" PRIu64 "\n",
                                 __func__, PAGE_START(addr) >> 9);
             }
-- 
2.5.5

* [Qemu-devel] [PATCH v3 12/44] onenand: Switch to byte-based block access
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (10 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 11/44] nand: " Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 13/44] pflash: " Eric Blake
                   ` (31 subsequent siblings)
  43 siblings, 0 replies; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Kevin Wolf, Max Reitz

Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead.  Likewise for blk_read().
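
For readers following the spare-area arithmetic in the diff below, the
sketch here restates it in isolation.  It is my reading of the
expressions in the hunks (16 spare bytes per main sector, 32 such
entries per 512-byte sector, spare data stored after secs_cur main
sectors), not an authoritative description of the device; the function
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 512-byte sector. */
#define SECTOR_BITS 9

/* Hypothetical helper: byte offset of the backing sector that holds the
 * spare entry for main sector 'sec'.  The sum is widened to 64 bits
 * before shifting so that large images cannot overflow an int. */
static int64_t spare_sector_offset(int secs_cur, int sec)
{
    assert(secs_cur >= 0 && sec >= 0);
    return ((int64_t)secs_cur + (sec >> 5)) << SECTOR_BITS;
}

/* Hypothetical helper: position of that 16-byte entry in the sector. */
static int spare_offset_in_sector(int sec)
{
    assert(sec >= 0);
    return (sec & 31) << 4;
}
```

For example, with 1024 main sectors, the spare entry for sector 33
lives in the backing sector at byte offset 1025 * 512, 16 bytes in.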

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 hw/block/onenand.c | 36 ++++++++++++++++++++++--------------
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/hw/block/onenand.c b/hw/block/onenand.c
index 883f4b1..3d19b0c 100644
--- a/hw/block/onenand.c
+++ b/hw/block/onenand.c
@@ -224,7 +224,8 @@ static void onenand_reset(OneNANDState *s, int cold)
         /* Lock the whole flash */
         memset(s->blockwp, ONEN_LOCK_LOCKED, s->blocks);

-        if (s->blk_cur && blk_read(s->blk_cur, 0, s->boot[0], 8) < 0) {
+        if (s->blk_cur && blk_pread(s->blk_cur, 0, s->boot[0],
+                                    8 << BDRV_SECTOR_BITS) < 0) {
             hw_error("%s: Loading the BootRAM failed.\n", __func__);
         }
     }
@@ -241,7 +242,8 @@ static inline int onenand_load_main(OneNANDState *s, int sec, int secn,
                 void *dest)
 {
     if (s->blk_cur) {
-        return blk_read(s->blk_cur, sec, dest, secn) < 0;
+        return blk_pread(s->blk_cur, sec << BDRV_SECTOR_BITS, dest,
+                         secn << BDRV_SECTOR_BITS) < 0;
     } else if (sec + secn > s->secs_cur) {
         return 1;
     }
@@ -257,19 +259,20 @@ static inline int onenand_prog_main(OneNANDState *s, int sec, int secn,
     int result = 0;

     if (secn > 0) {
-        uint32_t size = (uint32_t)secn * 512;
+        uint32_t size = (uint32_t)secn << BDRV_SECTOR_BITS;
+        int64_t offset = sec << BDRV_SECTOR_BITS;
         const uint8_t *sp = (const uint8_t *)src;
         uint8_t *dp = 0;
         if (s->blk_cur) {
             dp = g_malloc(size);
-            if (!dp || blk_read(s->blk_cur, sec, dp, secn) < 0) {
+            if (!dp || blk_pread(s->blk_cur, offset, dp, size) < 0) {
                 result = 1;
             }
         } else {
             if (sec + secn > s->secs_cur) {
                 result = 1;
             } else {
-                dp = (uint8_t *)s->current + (sec << 9);
+                dp = (uint8_t *)s->current + offset;
             }
         }
         if (!result) {
@@ -278,7 +281,7 @@ static inline int onenand_prog_main(OneNANDState *s, int sec, int secn,
                 dp[i] &= sp[i];
             }
             if (s->blk_cur) {
-                result = blk_write(s->blk_cur, sec, dp, secn) < 0;
+                result = blk_pwrite(s->blk_cur, offset, dp, size, 0) < 0;
             }
         }
         if (dp && s->blk_cur) {
@@ -295,7 +298,8 @@ static inline int onenand_load_spare(OneNANDState *s, int sec, int secn,
     uint8_t buf[512];

     if (s->blk_cur) {
-        if (blk_read(s->blk_cur, s->secs_cur + (sec >> 5), buf, 1) < 0) {
+        int64_t offset = (s->secs_cur + (sec >> 5)) << BDRV_SECTOR_BITS;
+        if (blk_pread(s->blk_cur, offset, buf, BDRV_SECTOR_SIZE) < 0) {
             return 1;
         }
         memcpy(dest, buf + ((sec & 31) << 4), secn << 4);
@@ -304,7 +308,7 @@ static inline int onenand_load_spare(OneNANDState *s, int sec, int secn,
     } else {
         memcpy(dest, s->current + (s->secs_cur << 9) + (sec << 4), secn << 4);
     }
- 
+
     return 0;
 }

@@ -315,10 +319,11 @@ static inline int onenand_prog_spare(OneNANDState *s, int sec, int secn,
     if (secn > 0) {
         const uint8_t *sp = (const uint8_t *)src;
         uint8_t *dp = 0, *dpp = 0;
+        uint64_t offset = (s->secs_cur + (sec >> 5)) << BDRV_SECTOR_BITS;
         if (s->blk_cur) {
             dp = g_malloc(512);
             if (!dp
-                || blk_read(s->blk_cur, s->secs_cur + (sec >> 5), dp, 1) < 0) {
+                || blk_pread(s->blk_cur, offset, dp, BDRV_SECTOR_SIZE) < 0) {
                 result = 1;
             } else {
                 dpp = dp + ((sec & 31) << 4);
@@ -336,8 +341,8 @@ static inline int onenand_prog_spare(OneNANDState *s, int sec, int secn,
                 dpp[i] &= sp[i];
             }
             if (s->blk_cur) {
-                result = blk_write(s->blk_cur, s->secs_cur + (sec >> 5),
-                                   dp, 1) < 0;
+                result = blk_pwrite(s->blk_cur, offset, dp,
+                                    BDRV_SECTOR_SIZE, 0) < 0;
             }
         }
         g_free(dp);
@@ -355,14 +360,17 @@ static inline int onenand_erase(OneNANDState *s, int sec, int num)
     for (; num > 0; num--, sec++) {
         if (s->blk_cur) {
             int erasesec = s->secs_cur + (sec >> 5);
-            if (blk_write(s->blk_cur, sec, blankbuf, 1) < 0) {
+            if (blk_pwrite(s->blk_cur, sec << BDRV_SECTOR_BITS, blankbuf,
+                           BDRV_SECTOR_SIZE, 0) < 0) {
                 goto fail;
             }
-            if (blk_read(s->blk_cur, erasesec, tmpbuf, 1) < 0) {
+            if (blk_pread(s->blk_cur, erasesec << BDRV_SECTOR_BITS, tmpbuf,
+                          BDRV_SECTOR_SIZE) < 0) {
                 goto fail;
             }
             memcpy(tmpbuf + ((sec & 31) << 4), blankbuf, 1 << 4);
-            if (blk_write(s->blk_cur, erasesec, tmpbuf, 1) < 0) {
+            if (blk_pwrite(s->blk_cur, erasesec << BDRV_SECTOR_BITS, tmpbuf,
+                           BDRV_SECTOR_SIZE, 0) < 0) {
                 goto fail;
             }
         } else {
-- 
2.5.5

* [Qemu-devel] [PATCH v3 13/44] pflash: Switch to byte-based block access
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (11 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 12/44] onenand: " Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 14/44] sd: " Eric Blake
                   ` (30 subsequent siblings)
  43 siblings, 0 replies; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Kevin Wolf, Max Reitz

Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead.  Likewise for blk_read().
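
The "widen to sector boundaries" step in the diff below can be sketched
on its own as follows.  ALIGN_DOWN and ALIGN_UP are local stand-ins
written for this example (assumed to behave like QEMU_ALIGN_DOWN and
QEMU_ALIGN_UP), and struct byte_range and widen_to_sectors() are
hypothetical names.

```c
#include <assert.h>

/* Assumed 512-byte sector. */
#define SECTOR_SIZE 512

/* Local stand-ins for the alignment macros; integer division rounds
 * toward zero, so these are only meaningful for non-negative values. */
#define ALIGN_DOWN(n, m) ((n) / (m) * (m))
#define ALIGN_UP(n, m)   ALIGN_DOWN((n) + (m) - 1, (m))

struct byte_range {
    int start;  /* first byte to write, sector aligned */
    int len;    /* number of bytes to write, a multiple of SECTOR_SIZE */
};

/* Grow [offset, offset + size) outward to whole sectors. */
static struct byte_range widen_to_sectors(int offset, int size)
{
    struct byte_range r;
    int end = offset + size;

    assert(offset >= 0 && size >= 0);
    r.start = ALIGN_DOWN(offset, SECTOR_SIZE);
    r.len = ALIGN_UP(end, SECTOR_SIZE) - r.start;
    return r;
}
```

A 100-byte update at offset 1000 crosses a sector boundary, so it is
widened to the 1024 bytes starting at offset 512.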

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 hw/block/pflash_cfi01.c | 12 ++++++------
 hw/block/pflash_cfi02.c | 12 ++++++------
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
index 106a775..3a1f85d 100644
--- a/hw/block/pflash_cfi01.c
+++ b/hw/block/pflash_cfi01.c
@@ -413,11 +413,11 @@ static void pflash_update(pflash_t *pfl, int offset,
     int offset_end;
     if (pfl->blk) {
         offset_end = offset + size;
-        /* round to sectors */
-        offset = offset >> 9;
-        offset_end = (offset_end + 511) >> 9;
-        blk_write(pfl->blk, offset, pfl->storage + (offset << 9),
-                  offset_end - offset);
+        /* widen to sector boundaries */
+        offset = QEMU_ALIGN_DOWN(offset, BDRV_SECTOR_SIZE);
+        offset_end = QEMU_ALIGN_UP(offset_end, BDRV_SECTOR_SIZE);
+        blk_pwrite(pfl->blk, offset, pfl->storage + offset,
+                   offset_end - offset, 0);
     }
 }

@@ -739,7 +739,7 @@ static void pflash_cfi01_realize(DeviceState *dev, Error **errp)

     if (pfl->blk) {
         /* read the initial flash content */
-        ret = blk_read(pfl->blk, 0, pfl->storage, total_len >> 9);
+        ret = blk_pread(pfl->blk, 0, pfl->storage, total_len);

         if (ret < 0) {
             vmstate_unregister_ram(&pfl->mem, DEVICE(pfl));
diff --git a/hw/block/pflash_cfi02.c b/hw/block/pflash_cfi02.c
index b13172c..5f10610 100644
--- a/hw/block/pflash_cfi02.c
+++ b/hw/block/pflash_cfi02.c
@@ -253,11 +253,11 @@ static void pflash_update(pflash_t *pfl, int offset,
     int offset_end;
     if (pfl->blk) {
         offset_end = offset + size;
-        /* round to sectors */
-        offset = offset >> 9;
-        offset_end = (offset_end + 511) >> 9;
-        blk_write(pfl->blk, offset, pfl->storage + (offset << 9),
-                  offset_end - offset);
+        /* widen to sector boundaries */
+        offset = QEMU_ALIGN_DOWN(offset, BDRV_SECTOR_SIZE);
+        offset_end = QEMU_ALIGN_UP(offset_end, BDRV_SECTOR_SIZE);
+        blk_pwrite(pfl->blk, offset, pfl->storage + offset,
+                   offset_end - offset, 0);
     }
 }

@@ -622,7 +622,7 @@ static void pflash_cfi02_realize(DeviceState *dev, Error **errp)
     pfl->chip_len = chip_len;
     if (pfl->blk) {
         /* read the initial flash content */
-        ret = blk_read(pfl->blk, 0, pfl->storage, chip_len >> 9);
+        ret = blk_pread(pfl->blk, 0, pfl->storage, chip_len);
         if (ret < 0) {
             vmstate_unregister_ram(&pfl->orig_mem, DEVICE(pfl));
             error_setg(errp, "failed to read the initial flash content");
-- 
2.5.5

* [Qemu-devel] [PATCH v3 14/44] sd: Switch to byte-based block access
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (12 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 13/44] pflash: " Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 15/44] m25p80: " Eric Blake
                   ` (29 subsequent siblings)
  43 siblings, 0 replies; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex

Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead.  Likewise for blk_read().

Greatly simplifies the code, now that we let the block layer
take care of alignment and read-modify-write on our behalf :)
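
For anyone unfamiliar with what read-modify-write means here, the toy
model below shows the idea.  It is a self-contained sketch against an
in-memory array, not the block layer's actual implementation; disk,
sector_read(), sector_write() and toy_pwrite() are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512
#define NUM_SECTORS 8

/* Toy backing store that is only ever touched in whole sectors. */
static uint8_t disk[NUM_SECTORS * SECTOR_SIZE];

static void sector_read(uint64_t sector, uint8_t *buf)
{
    assert(sector < NUM_SECTORS);
    memcpy(buf, disk + sector * SECTOR_SIZE, SECTOR_SIZE);
}

static void sector_write(uint64_t sector, const uint8_t *buf)
{
    assert(sector < NUM_SECTORS);
    memcpy(disk + sector * SECTOR_SIZE, buf, SECTOR_SIZE);
}

/* Byte-granular write: read each affected sector into a bounce buffer,
 * patch the relevant bytes, and write the sector back.  Returns the
 * byte count, or -1 if the range runs past the end of the toy disk. */
static int toy_pwrite(uint64_t offset, const uint8_t *data, uint32_t len)
{
    uint8_t bounce[SECTOR_SIZE];
    uint32_t done = 0;

    if (offset > sizeof(disk) || len > sizeof(disk) - offset) {
        return -1;
    }
    while (done < len) {
        uint64_t pos = offset + done;
        uint32_t in_sector = (uint32_t)(pos % SECTOR_SIZE);
        uint32_t chunk = SECTOR_SIZE - in_sector;

        if (chunk > len - done) {
            chunk = len - done;
        }
        sector_read(pos / SECTOR_SIZE, bounce);
        memcpy(bounce + in_sector, data + done, chunk);
        sector_write(pos / SECTOR_SIZE, bounce);
        done += chunk;
    }
    return (int)len;
}
```

The caller passes an arbitrary offset and length and never sees the
sector boundaries, which is the bookkeeping that the old sd_blk_read()
and sd_blk_write() had to do by hand.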

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 hw/sd/sd.c | 46 +++-------------------------------------------
 1 file changed, 3 insertions(+), 43 deletions(-)

diff --git a/hw/sd/sd.c b/hw/sd/sd.c
index b66e5d2..3c2f2f1 100644
--- a/hw/sd/sd.c
+++ b/hw/sd/sd.c
@@ -1577,57 +1577,17 @@ send_response:

 static void sd_blk_read(SDState *sd, uint64_t addr, uint32_t len)
 {
-    uint64_t end = addr + len;
-
     DPRINTF("sd_blk_read: addr = 0x%08llx, len = %d\n",
             (unsigned long long) addr, len);
-    if (!sd->blk || blk_read(sd->blk, addr >> 9, sd->buf, 1) < 0) {
+    if (!sd->blk || blk_pread(sd->blk, addr, sd->data, len) < 0) {
         fprintf(stderr, "sd_blk_read: read error on host side\n");
-        return;
     }
-
-    if (end > (addr & ~511) + 512) {
-        memcpy(sd->data, sd->buf + (addr & 511), 512 - (addr & 511));
-
-        if (blk_read(sd->blk, end >> 9, sd->buf, 1) < 0) {
-            fprintf(stderr, "sd_blk_read: read error on host side\n");
-            return;
-        }
-        memcpy(sd->data + 512 - (addr & 511), sd->buf, end & 511);
-    } else
-        memcpy(sd->data, sd->buf + (addr & 511), len);
 }

 static void sd_blk_write(SDState *sd, uint64_t addr, uint32_t len)
 {
-    uint64_t end = addr + len;
-
-    if ((addr & 511) || len < 512)
-        if (!sd->blk || blk_read(sd->blk, addr >> 9, sd->buf, 1) < 0) {
-            fprintf(stderr, "sd_blk_write: read error on host side\n");
-            return;
-        }
-
-    if (end > (addr & ~511) + 512) {
-        memcpy(sd->buf + (addr & 511), sd->data, 512 - (addr & 511));
-        if (blk_write(sd->blk, addr >> 9, sd->buf, 1) < 0) {
-            fprintf(stderr, "sd_blk_write: write error on host side\n");
-            return;
-        }
-
-        if (blk_read(sd->blk, end >> 9, sd->buf, 1) < 0) {
-            fprintf(stderr, "sd_blk_write: read error on host side\n");
-            return;
-        }
-        memcpy(sd->buf, sd->data + 512 - (addr & 511), end & 511);
-        if (blk_write(sd->blk, end >> 9, sd->buf, 1) < 0) {
-            fprintf(stderr, "sd_blk_write: write error on host side\n");
-        }
-    } else {
-        memcpy(sd->buf + (addr & 511), sd->data, len);
-        if (!sd->blk || blk_write(sd->blk, addr >> 9, sd->buf, 1) < 0) {
-            fprintf(stderr, "sd_blk_write: write error on host side\n");
-        }
+    if (!sd->blk || blk_pwrite(sd->blk, addr, sd->data, len, 0) < 0) {
+        fprintf(stderr, "sd_blk_write: write error on host side\n");
     }
 }

-- 
2.5.5


* [Qemu-devel] [PATCH v3 15/44] m25p80: Switch to byte-based block access
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (13 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 14/44] sd: " Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 16/44] atapi: " Eric Blake
                   ` (28 subsequent siblings)
  43 siblings, 0 replies; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Peter Crosthwaite, Kevin Wolf, Max Reitz

Sector-based blk_read() should die; switch to byte-based
blk_pread() instead.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 hw/block/m25p80.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
index 906b712..01c51a2 100644
--- a/hw/block/m25p80.c
+++ b/hw/block/m25p80.c
@@ -907,8 +907,7 @@ static int m25p80_init(SSISlave *ss)
         s->storage = blk_blockalign(s->blk, s->size);

         /* FIXME: Move to late init */
-        if (blk_read(s->blk, 0, s->storage,
-                     DIV_ROUND_UP(s->size, BDRV_SECTOR_SIZE))) {
+        if (blk_pread(s->blk, 0, s->storage, s->size)) {
             fprintf(stderr, "Failed to initialize SPI flash!\n");
             return 1;
         }
-- 
2.5.5


* [Qemu-devel] [PATCH v3 16/44] atapi: Switch to byte-based block access
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (14 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 15/44] m25p80: " Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-25 21:36   ` John Snow
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 17/44] nbd: " Eric Blake
                   ` (27 subsequent siblings)
  43 siblings, 1 reply; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, John Snow

Sector-based blk_read() should die; switch to byte-based
blk_pread() instead.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
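For illustration only (not part of the patch): the shift arithmetic below may help when checking the new offsets. A 2048-byte CD sector spans four 512-byte block-layer sectors, so shifting the LBA by 2 + BDRV_SECTOR_BITS multiplies it by 2048. SECTOR_BITS, CD_SECTOR_SIZE and both helpers are hypothetical names; SECTOR_BITS is assumed to equal BDRV_SECTOR_BITS (9), and the LBA is assumed to be non-negative.

```c
#include <stdint.h>

#define SECTOR_BITS 9          /* assumed to match BDRV_SECTOR_BITS */
#define CD_SECTOR_SIZE 2048

/* Byte offset of a 2048-byte CD sector, written the way the patch does. */
static int64_t cd_lba_to_offset(int lba)
{
    /* Widen before shifting so large LBAs do not overflow 32 bits. */
    return (int64_t)lba << (2 + SECTOR_BITS);
}

/* Byte count of one CD sector: four 512-byte block-layer sectors. */
static int cd_sector_bytes(void)
{
    return 4 << SECTOR_BITS;
}
```

The cast matters for large media: LBA 2000000 maps to byte 4096000000, which no longer fits in a 32-bit int.
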
 hw/ide/atapi.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
index 2bb606c..81000d8 100644
--- a/hw/ide/atapi.c
+++ b/hw/ide/atapi.c
@@ -119,12 +119,12 @@ cd_read_sector_sync(IDEState *s)

     switch (s->cd_sector_size) {
     case 2048:
-        ret = blk_read(s->blk, (int64_t)s->lba << 2,
-                       s->io_buffer, 4);
+        ret = blk_pread(s->blk, (int64_t)s->lba << (2 + BDRV_SECTOR_BITS),
+                        s->io_buffer, 4 << BDRV_SECTOR_BITS);
         break;
     case 2352:
-        ret = blk_read(s->blk, (int64_t)s->lba << 2,
-                       s->io_buffer + 16, 4);
+        ret = blk_pread(s->blk, (int64_t)s->lba << (2 + BDRV_SECTOR_BITS),
+                        s->io_buffer + 16, 4 << BDRV_SECTOR_BITS);
         if (ret >= 0) {
             cd_data_to_raw(s->io_buffer, s->lba);
         }
-- 
2.5.5


* [Qemu-devel] [PATCH v3 17/44] nbd: Switch to byte-based block access
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (15 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 16/44] atapi: " Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 18/44] qemu-img: " Eric Blake
                   ` (26 subsequent siblings)
  43 siblings, 0 replies; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Paolo Bonzini

Sector-based blk_read() should die; switch to byte-based
blk_pread() instead.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
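For illustration only (not part of the patch): converting a partition's start sector to a byte offset is sensitive to the width of the sector field. The sketch assumes start_sector_abs is a 32-bit unsigned field (its declaration is outside this diff) and that unsigned int is 32 bits wide; both helper names are hypothetical. If those assumptions hold, widening before the shift may be worth considering for the second blk_pread() call.

```c
#include <stdint.h>

#define SECTOR_BITS 9   /* assumed to match BDRV_SECTOR_BITS */

/* Shift performed in 32-bit unsigned arithmetic: the high bits are lost. */
static uint64_t start_offset_narrow(uint32_t start_sector)
{
    return start_sector << SECTOR_BITS;
}

/* Widen first, then shift: every 32-bit sector number keeps its offset. */
static uint64_t start_offset_wide(uint32_t start_sector)
{
    return (uint64_t)start_sector << SECTOR_BITS;
}
```

Both helpers agree for small sector numbers; they diverge once the start sector reaches 2^23, that is, once the partition begins at or beyond 4 GiB.
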
 qemu-nbd.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/qemu-nbd.c b/qemu-nbd.c
index a85e98f..01eb7e4 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -160,12 +160,13 @@ static int find_partition(BlockBackend *blk, int partition,
                           off_t *offset, off_t *size)
 {
     struct partition_record mbr[4];
-    uint8_t data[512];
+    uint8_t data[BDRV_SECTOR_SIZE];
     int i;
     int ext_partnum = 4;
     int ret;

-    if ((ret = blk_read(blk, 0, data, 1)) < 0) {
+    ret = blk_pread(blk, 0, data, sizeof(data));
+    if (ret < 0) {
         error_report("error while reading: %s", strerror(-ret));
         exit(EXIT_FAILURE);
     }
@@ -183,10 +184,12 @@ static int find_partition(BlockBackend *blk, int partition,

         if (mbr[i].system == 0xF || mbr[i].system == 0x5) {
             struct partition_record ext[4];
-            uint8_t data1[512];
+            uint8_t data1[BDRV_SECTOR_SIZE];
             int j;

-            if ((ret = blk_read(blk, mbr[i].start_sector_abs, data1, 1)) < 0) {
+            ret = blk_pread(blk, mbr[i].start_sector_abs << BDRV_SECTOR_BITS,
+                            data1, sizeof(data1));
+            if (ret < 0) {
                 error_report("error while reading: %s", strerror(-ret));
                 exit(EXIT_FAILURE);
             }
-- 
2.5.5


* [Qemu-devel] [PATCH v3 18/44] qemu-img: Switch to byte-based block access
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (16 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 17/44] nbd: " Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 19/44] qemu-io: " Eric Blake
                   ` (25 subsequent siblings)
  43 siblings, 0 replies; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Kevin Wolf, Max Reitz

Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead.  Likewise for blk_read().

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 qemu-img.c | 28 +++++++++++++++++++---------
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 1697762..2e4646e 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1092,7 +1092,8 @@ static int check_empty_sectors(BlockBackend *blk, int64_t sect_num,
                                uint8_t *buffer, bool quiet)
 {
     int pnum, ret = 0;
-    ret = blk_read(blk, sect_num, buffer, sect_count);
+    ret = blk_pread(blk, sect_num << BDRV_SECTOR_BITS, buffer,
+                    sect_count << BDRV_SECTOR_BITS);
     if (ret < 0) {
         error_report("Error while reading offset %" PRId64 " of %s: %s",
                      sectors_to_bytes(sect_num), filename, strerror(-ret));
@@ -1307,7 +1308,8 @@ static int img_compare(int argc, char **argv)
             nb_sectors = MIN(pnum1, pnum2);
         } else if (allocated1 == allocated2) {
             if (allocated1) {
-                ret = blk_read(blk1, sector_num, buf1, nb_sectors);
+                ret = blk_pread(blk1, sector_num << BDRV_SECTOR_BITS, buf1,
+                                nb_sectors << BDRV_SECTOR_BITS);
                 if (ret < 0) {
                     error_report("Error while reading offset %" PRId64 " of %s:"
                                  " %s", sectors_to_bytes(sector_num), filename1,
@@ -1315,7 +1317,8 @@ static int img_compare(int argc, char **argv)
                     ret = 4;
                     goto out;
                 }
-                ret = blk_read(blk2, sector_num, buf2, nb_sectors);
+                ret = blk_pread(blk2, sector_num << BDRV_SECTOR_BITS, buf2,
+                                nb_sectors << BDRV_SECTOR_BITS);
                 if (ret < 0) {
                     error_report("Error while reading offset %" PRId64
                                  " of %s: %s", sectors_to_bytes(sector_num),
@@ -1528,7 +1531,9 @@ static int convert_read(ImgConvertState *s, int64_t sector_num, int nb_sectors,
         bs_sectors = s->src_sectors[s->src_cur];

         n = MIN(nb_sectors, bs_sectors - (sector_num - s->src_cur_offset));
-        ret = blk_read(blk, sector_num - s->src_cur_offset, buf, n);
+        ret = blk_pread(blk,
+                        (sector_num - s->src_cur_offset) << BDRV_SECTOR_BITS,
+                        buf, n << BDRV_SECTOR_BITS);
         if (ret < 0) {
             return ret;
         }
@@ -1583,7 +1588,8 @@ static int convert_write(ImgConvertState *s, int64_t sector_num, int nb_sectors,
             if (!s->min_sparse ||
                 is_allocated_sectors_min(buf, n, &n, s->min_sparse))
             {
-                ret = blk_write(s->target, sector_num, buf, n);
+                ret = blk_pwrite(s->target, sector_num << BDRV_SECTOR_BITS,
+                                 buf, n << BDRV_SECTOR_BITS, 0);
                 if (ret < 0) {
                     return ret;
                 }
@@ -3036,7 +3042,8 @@ static int img_rebase(int argc, char **argv)
                     n = old_backing_num_sectors - sector;
                 }

-                ret = blk_read(blk_old_backing, sector, buf_old, n);
+                ret = blk_pread(blk_old_backing, sector << BDRV_SECTOR_BITS,
+                                buf_old, n << BDRV_SECTOR_BITS);
                 if (ret < 0) {
                     error_report("error while reading from old backing file");
                     goto out;
@@ -3050,7 +3057,8 @@ static int img_rebase(int argc, char **argv)
                     n = new_backing_num_sectors - sector;
                 }

-                ret = blk_read(blk_new_backing, sector, buf_new, n);
+                ret = blk_pread(blk_new_backing, sector << BDRV_SECTOR_BITS,
+                                buf_new, n << BDRV_SECTOR_BITS);
                 if (ret < 0) {
                     error_report("error while reading from new backing file");
                     goto out;
@@ -3066,8 +3074,10 @@ static int img_rebase(int argc, char **argv)
                 if (compare_sectors(buf_old + written * 512,
                     buf_new + written * 512, n - written, &pnum))
                 {
-                    ret = blk_write(blk, sector + written,
-                                    buf_old + written * 512, pnum);
+                    ret = blk_pwrite(blk,
+                                     (sector + written) << BDRV_SECTOR_BITS,
+                                     buf_old + written * 512,
+                                     pnum << BDRV_SECTOR_BITS, 0);
                     if (ret < 0) {
                         error_report("Error while writing to COW image: %s",
                             strerror(-ret));
-- 
2.5.5


* [Qemu-devel] [PATCH v3 19/44] qemu-io: Switch to byte-based block access
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (17 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 18/44] qemu-img: " Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 20/44] block: Switch blk_read_unthrottled() to byte interface Eric Blake
                   ` (24 subsequent siblings)
  43 siblings, 0 replies; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Kevin Wolf, Max Reitz

blk_write() and blk_read() are now very simple wrappers around
blk_pwrite() and blk_pread().  There's no reason to require
the user to pass in aligned numbers.  Keep 'read -p' and
'write -p' so that I don't have to hunt down and update all
users of qemu-io, but make the default 'read' and 'write' now
behave the way that used to require -p.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
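For illustration only (not part of the patch): a condensed sketch of the alignment rule that write_f() ends up with. Plain writes accept any offset and length, while the vmstate (-b), compressed (-c) and zero (-z) paths still require 512-byte alignment. write_args_ok() is a hypothetical helper; the count check is assumed to follow the offset check, since it lies outside the hunk shown below.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if the offset/count pair is acceptable for the chosen
 * write mode.  Only the -b, -c and -z modes still insist on sector
 * alignment; the default byte-based path takes anything.
 */
static bool write_args_ok(bool bflag, bool cflag, bool zflag,
                          int64_t offset, int64_t count)
{
    if (bflag || cflag || zflag) {
        if ((offset & 0x1ff) || (count & 0x1ff)) {
            return false;   /* not a multiple of 512 */
        }
    }
    return true;
}
```

For example, an unaligned "write 1 3" is accepted in the default mode, whereas the same numbers with -z would be rejected.
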
 qemu-io-cmds.c | 75 +++++++++++++---------------------------------------------
 1 file changed, 16 insertions(+), 59 deletions(-)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index e26e543..4184fb8 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -419,40 +419,6 @@ fail:
     return buf;
 }

-static int do_read(BlockBackend *blk, char *buf, int64_t offset, int64_t count,
-                   int64_t *total)
-{
-    int ret;
-
-    if (count >> 9 > INT_MAX) {
-        return -ERANGE;
-    }
-
-    ret = blk_read(blk, offset >> 9, (uint8_t *)buf, count >> 9);
-    if (ret < 0) {
-        return ret;
-    }
-    *total = count;
-    return 1;
-}
-
-static int do_write(BlockBackend *blk, char *buf, int64_t offset, int64_t count,
-                    int64_t *total)
-{
-    int ret;
-
-    if (count >> 9 > INT_MAX) {
-        return -ERANGE;
-    }
-
-    ret = blk_write(blk, offset >> 9, (uint8_t *)buf, count >> 9);
-    if (ret < 0) {
-        return ret;
-    }
-    *total = count;
-    return 1;
-}
-
 static int do_pread(BlockBackend *blk, char *buf, int64_t offset,
                     int64_t count, int64_t *total)
 {
@@ -671,7 +637,7 @@ static void read_help(void)
 " -b, -- read from the VM state rather than the virtual disk\n"
 " -C, -- report statistics in a machine parsable format\n"
 " -l, -- length for pattern verification (only with -P)\n"
-" -p, -- use blk_pread to read the file\n"
+" -p, -- ignored for back-compat\n"
 " -P, -- use a pattern to verify read data\n"
 " -q, -- quiet mode, do not show I/O statistics\n"
 " -s, -- start offset for pattern verification (only with -P)\n"
@@ -687,7 +653,7 @@ static const cmdinfo_t read_cmd = {
     .cfunc      = read_f,
     .argmin     = 2,
     .argmax     = -1,
-    .args       = "[-abCpqv] [-P pattern [-s off] [-l len]] off len",
+    .args       = "[-abCqv] [-P pattern [-s off] [-l len]] off len",
     .oneline    = "reads a number of bytes at a specified offset",
     .help       = read_help,
 };
@@ -695,7 +661,7 @@ static const cmdinfo_t read_cmd = {
 static int read_f(BlockBackend *blk, int argc, char **argv)
 {
     struct timeval t1, t2;
-    int Cflag = 0, pflag = 0, qflag = 0, vflag = 0;
+    int Cflag = 0, qflag = 0, vflag = 0;
     int Pflag = 0, sflag = 0, lflag = 0, bflag = 0;
     int c, cnt;
     char *buf;
@@ -723,7 +689,7 @@ static int read_f(BlockBackend *blk, int argc, char **argv)
             }
             break;
         case 'p':
-            pflag = 1;
+            /* Ignored for back-compat */
             break;
         case 'P':
             Pflag = 1;
@@ -755,11 +721,6 @@ static int read_f(BlockBackend *blk, int argc, char **argv)
         return qemuio_command_usage(&read_cmd);
     }

-    if (bflag && pflag) {
-        printf("-b and -p cannot be specified at the same time\n");
-        return 0;
-    }
-
     offset = cvtnum(argv[optind]);
     if (offset < 0) {
         print_cvtnum_err(offset, argv[optind]);
@@ -790,7 +751,7 @@ static int read_f(BlockBackend *blk, int argc, char **argv)
         return 0;
     }

-    if (!pflag) {
+    if (bflag) {
         if (offset & 0x1ff) {
             printf("offset %" PRId64 " is not sector aligned\n",
                    offset);
@@ -806,12 +767,10 @@ static int read_f(BlockBackend *blk, int argc, char **argv)
     buf = qemu_io_alloc(blk, count, 0xab);

     gettimeofday(&t1, NULL);
-    if (pflag) {
-        cnt = do_pread(blk, buf, offset, count, &total);
-    } else if (bflag) {
+    if (bflag) {
         cnt = do_load_vmstate(blk, buf, offset, count, &total);
     } else {
-        cnt = do_read(blk, buf, offset, count, &total);
+        cnt = do_pread(blk, buf, offset, count, &total);
     }
     gettimeofday(&t2, NULL);

@@ -991,7 +950,7 @@ static void write_help(void)
 " filled with a set pattern (0xcdcdcdcd).\n"
 " -b, -- write to the VM state rather than the virtual disk\n"
 " -c, -- write compressed data with blk_write_compressed\n"
-" -p, -- use blk_pwrite to write the file\n"
+" -p, -- ignored for back-compat\n"
 " -P, -- use different pattern to fill file\n"
 " -C, -- report statistics in a machine parsable format\n"
 " -q, -- quiet mode, do not show I/O statistics\n"
@@ -1007,7 +966,7 @@ static const cmdinfo_t write_cmd = {
     .cfunc      = write_f,
     .argmin     = 2,
     .argmax     = -1,
-    .args       = "[-bcCpqz] [-P pattern ] off len",
+    .args       = "[-bcCqz] [-P pattern ] off len",
     .oneline    = "writes a number of bytes at a specified offset",
     .help       = write_help,
 };
@@ -1015,7 +974,7 @@ static const cmdinfo_t write_cmd = {
 static int write_f(BlockBackend *blk, int argc, char **argv)
 {
     struct timeval t1, t2;
-    int Cflag = 0, pflag = 0, qflag = 0, bflag = 0, Pflag = 0, zflag = 0;
+    int Cflag = 0, qflag = 0, bflag = 0, Pflag = 0, zflag = 0;
     int cflag = 0;
     int c, cnt;
     char *buf = NULL;
@@ -1037,7 +996,7 @@ static int write_f(BlockBackend *blk, int argc, char **argv)
             Cflag = 1;
             break;
         case 'p':
-            pflag = 1;
+            /* Ignored for back-compat */
             break;
         case 'P':
             Pflag = 1;
@@ -1061,8 +1020,8 @@ static int write_f(BlockBackend *blk, int argc, char **argv)
         return qemuio_command_usage(&write_cmd);
     }

-    if (bflag + pflag + zflag > 1) {
-        printf("-b, -p, or -z cannot be specified at the same time\n");
+    if (bflag + zflag > 1) {
+        printf("-b and -z cannot be specified at the same time\n");
         return 0;
     }

@@ -1088,7 +1047,7 @@ static int write_f(BlockBackend *blk, int argc, char **argv)
         return 0;
     }

-    if (!pflag) {
+    if (bflag || cflag || zflag) {
         if (offset & 0x1ff) {
             printf("offset %" PRId64 " is not sector aligned\n",
                    offset);
@@ -1107,16 +1066,14 @@ static int write_f(BlockBackend *blk, int argc, char **argv)
     }

     gettimeofday(&t1, NULL);
-    if (pflag) {
-        cnt = do_pwrite(blk, buf, offset, count, &total);
-    } else if (bflag) {
+    if (bflag) {
         cnt = do_save_vmstate(blk, buf, offset, count, &total);
     } else if (zflag) {
         cnt = do_co_write_zeroes(blk, offset, count, &total);
     } else if (cflag) {
         cnt = do_write_compressed(blk, buf, offset, count, &total);
     } else {
-        cnt = do_write(blk, buf, offset, count, &total);
+        cnt = do_pwrite(blk, buf, offset, count, &total);
     }
     gettimeofday(&t2, NULL);

-- 
2.5.5


* [Qemu-devel] [PATCH v3 20/44] block: Switch blk_read_unthrottled() to byte interface
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (18 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 19/44] qemu-io: " Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 21/44] block: Switch blk_write_zeroes() " Eric Blake
                   ` (23 subsequent siblings)
  43 siblings, 0 replies; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Kevin Wolf, Max Reitz, John Snow

Sector-based blk_read() should die; convert the one-off
variant blk_read_unthrottled().

Signed-off-by: Eric Blake <eblake@redhat.com>
---
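For illustration only (not part of the patch): the function being converted follows a save/disable/restore pattern around the read, which the sketch below isolates. struct fake_bs, fake_read() and read_unthrottled() are hypothetical stand-ins; only the io_limits_enabled field name is taken from the diff.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the one BlockDriverState field used here. */
struct fake_bs {
    bool io_limits_enabled;
    bool saw_limits_during_io;   /* what the I/O path observed */
};

/* Pretend read that records whether throttling was active at the time. */
static int fake_read(struct fake_bs *bs)
{
    bs->saw_limits_during_io = bs->io_limits_enabled;
    return 0;
}

/* Temporarily switch throttling off, do the read, then restore the flag. */
static int read_unthrottled(struct fake_bs *bs)
{
    bool enabled = bs->io_limits_enabled;
    int ret;

    bs->io_limits_enabled = false;
    ret = fake_read(bs);
    bs->io_limits_enabled = enabled;
    return ret;
}
```

The flag is restored whether or not the read succeeds, so the throttling state seen by later requests is unchanged.
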
 include/sysemu/block-backend.h | 4 ++--
 block/block-backend.c          | 8 ++++----
 hw/block/hd-geometry.c         | 2 +-
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 6991b26..662a106 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -92,8 +92,8 @@ void *blk_get_attached_dev(BlockBackend *blk);
 void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops, void *opaque);
 int blk_read(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
              int nb_sectors);
-int blk_read_unthrottled(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
-                         int nb_sectors);
+int blk_pread_unthrottled(BlockBackend *blk, int64_t offset, uint8_t *buf,
+                          int count);
 int blk_write(BlockBackend *blk, int64_t sector_num, const uint8_t *buf,
               int nb_sectors);
 int blk_write_zeroes(BlockBackend *blk, int64_t sector_num,
diff --git a/block/block-backend.c b/block/block-backend.c
index 4551865..5513b6f 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -790,21 +790,21 @@ int blk_read(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
     return blk_rw(blk, sector_num, buf, nb_sectors, blk_read_entry, 0);
 }

-int blk_read_unthrottled(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
-                         int nb_sectors)
+int blk_pread_unthrottled(BlockBackend *blk, int64_t offset, uint8_t *buf,
+                          int count)
 {
     BlockDriverState *bs = blk_bs(blk);
     bool enabled;
     int ret;

-    ret = blk_check_request(blk, sector_num, nb_sectors);
+    ret = blk_check_byte_request(blk, offset, count);
     if (ret < 0) {
         return ret;
     }

     enabled = bs->io_limits_enabled;
     bs->io_limits_enabled = false;
-    ret = blk_read(blk, sector_num, buf, nb_sectors);
+    ret = blk_pread(blk, offset, buf, count);
     bs->io_limits_enabled = enabled;
     return ret;
 }
diff --git a/hw/block/hd-geometry.c b/hw/block/hd-geometry.c
index 6d02192..d388f13 100644
--- a/hw/block/hd-geometry.c
+++ b/hw/block/hd-geometry.c
@@ -66,7 +66,7 @@ static int guess_disk_lchs(BlockBackend *blk,
      * but also in async I/O mode. So the I/O throttling function has to
      * be disabled temporarily here, not permanently.
      */
-    if (blk_read_unthrottled(blk, 0, buf, 1) < 0) {
+    if (blk_pread_unthrottled(blk, 0, buf, BDRV_SECTOR_SIZE) < 0) {
         return -1;
     }
     /* test msdos magic */
-- 
2.5.5


* [Qemu-devel] [PATCH v3 21/44] block: Switch blk_write_zeroes() to byte interface
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (19 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 20/44] block: Switch blk_read_unthrottled() to byte interface Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-23  8:12   ` Denis V. Lunev
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 22/44] block: Kill blk_write(), blk_read() Eric Blake
                   ` (22 subsequent siblings)
  43 siblings, 1 reply; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, alex, Kevin Wolf, Max Reitz, Stefan Hajnoczi, Denis V. Lunev

Sector-based blk_write() should die; convert the one-off
variant blk_write_zeroes().

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 include/sysemu/block-backend.h | 4 ++--
 block/block-backend.c          | 8 ++++----
 block/parallels.c              | 3 ++-
 qemu-img.c                     | 3 ++-
 4 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 662a106..1246699 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -96,8 +96,8 @@ int blk_pread_unthrottled(BlockBackend *blk, int64_t offset, uint8_t *buf,
                           int count);
 int blk_write(BlockBackend *blk, int64_t sector_num, const uint8_t *buf,
               int nb_sectors);
-int blk_write_zeroes(BlockBackend *blk, int64_t sector_num,
-                     int nb_sectors, BdrvRequestFlags flags);
+int blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
+                     int count, BdrvRequestFlags flags);
 BlockAIOCB *blk_aio_write_zeroes(BlockBackend *blk, int64_t sector_num,
                                  int nb_sectors, BdrvRequestFlags flags,
                                  BlockCompletionFunc *cb, void *opaque);
diff --git a/block/block-backend.c b/block/block-backend.c
index 5513b6f..ae08bd2 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -816,11 +816,11 @@ int blk_write(BlockBackend *blk, int64_t sector_num, const uint8_t *buf,
                   blk_write_entry, 0);
 }

-int blk_write_zeroes(BlockBackend *blk, int64_t sector_num,
-                     int nb_sectors, BdrvRequestFlags flags)
+int blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
+                      int count, BdrvRequestFlags flags)
 {
-    return blk_rw(blk, sector_num, NULL, nb_sectors, blk_write_entry,
-                  flags | BDRV_REQ_ZERO_WRITE);
+    return blk_prw(blk, offset, NULL, count, blk_write_entry,
+                   flags | BDRV_REQ_ZERO_WRITE);
 }

 static void error_callback_bh(void *opaque)
diff --git a/block/parallels.c b/block/parallels.c
index 2d8bc87..95bfc32 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -516,7 +516,8 @@ static int parallels_create(const char *filename, QemuOpts *opts, Error **errp)
     if (ret < 0) {
         goto exit;
     }
-    ret = blk_write_zeroes(file, 1, bat_sectors - 1, 0);
+    ret = blk_pwrite_zeroes(file, BDRV_SECTOR_SIZE,
+                            (bat_sectors - 1) << BDRV_SECTOR_BITS, 0);
     if (ret < 0) {
         goto exit;
     }
diff --git a/qemu-img.c b/qemu-img.c
index 2e4646e..376107c 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1601,7 +1601,8 @@ static int convert_write(ImgConvertState *s, int64_t sector_num, int nb_sectors,
             if (s->has_zero_init) {
                 break;
             }
-            ret = blk_write_zeroes(s->target, sector_num, n, 0);
+            ret = blk_pwrite_zeroes(s->target, sector_num << BDRV_SECTOR_BITS,
+                                    n << BDRV_SECTOR_BITS, 0);
             if (ret < 0) {
                 return ret;
             }
-- 
2.5.5


* [Qemu-devel] [PATCH v3 22/44] block: Kill blk_write(), blk_read()
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (20 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 21/44] block: Switch blk_write_zeroes() " Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 23/44] qemu-io: Add missing option documentation Eric Blake
                   ` (21 subsequent siblings)
  43 siblings, 0 replies; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Kevin Wolf, Max Reitz

Now that there are no remaining clients, we can drop these
functions, to ensure that all future users get the byte-based
interfaces.  Sadly, there are still remaining sector-based
interfaces, such as blk_aio_writev; those will have to wait
for another day.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
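For illustration only (not part of the patch): a sketch of what the removed sector-based wrappers amounted to, and hence what callers now do for themselves. byte_request() is a hypothetical stand-in for blk_prw(), and MAX_SECTORS is an assumed analogue of BDRV_REQUEST_MAX_SECTORS, chosen so that the shifted count still fits in an int.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>

#define SECTOR_BITS 9                         /* assumed to match BDRV_SECTOR_BITS */
#define MAX_SECTORS (INT_MAX >> SECTOR_BITS)  /* assumed analogue of BDRV_REQUEST_MAX_SECTORS */

/* Records the last byte-based request so the conversion can be inspected. */
static int64_t last_offset;
static int last_count;

/* Hypothetical byte-based primitive, standing in for blk_prw(). */
static int byte_request(int64_t offset, int count)
{
    last_offset = offset;
    last_count = count;
    return 0;
}

/* Sector-based wrapper: range-check, then scale both arguments by 512. */
static int sector_request(int64_t sector_num, int nb_sectors)
{
    if (nb_sectors < 0 || nb_sectors > MAX_SECTORS) {
        return -EINVAL;
    }
    return byte_request(sector_num << SECTOR_BITS, nb_sectors << SECTOR_BITS);
}
```

The range check is what keeps the shifted count from overflowing; a caller that does its own shifting needs to know its sector count is similarly bounded.
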
 include/sysemu/block-backend.h |  4 ----
 block/block-backend.c          | 25 -------------------------
 2 files changed, 29 deletions(-)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 1246699..bf04086 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -90,12 +90,8 @@ void blk_attach_dev_nofail(BlockBackend *blk, void *dev);
 void blk_detach_dev(BlockBackend *blk, void *dev);
 void *blk_get_attached_dev(BlockBackend *blk);
 void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops, void *opaque);
-int blk_read(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
-             int nb_sectors);
 int blk_pread_unthrottled(BlockBackend *blk, int64_t offset, uint8_t *buf,
                           int count);
-int blk_write(BlockBackend *blk, int64_t sector_num, const uint8_t *buf,
-              int nb_sectors);
 int blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
                      int count, BdrvRequestFlags flags);
 BlockAIOCB *blk_aio_write_zeroes(BlockBackend *blk, int64_t sector_num,
diff --git a/block/block-backend.c b/block/block-backend.c
index ae08bd2..1c3b495 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -772,24 +772,6 @@ static int blk_prw(BlockBackend *blk, int64_t offset, uint8_t *buf,
     return rwco.ret;
 }

-static int blk_rw(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
-                  int nb_sectors, CoroutineEntry co_entry,
-                  BdrvRequestFlags flags)
-{
-    if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
-        return -EINVAL;
-    }
-
-    return blk_prw(blk, sector_num << BDRV_SECTOR_BITS, buf,
-                   nb_sectors << BDRV_SECTOR_BITS, co_entry, flags);
-}
-
-int blk_read(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
-             int nb_sectors)
-{
-    return blk_rw(blk, sector_num, buf, nb_sectors, blk_read_entry, 0);
-}
-
 int blk_pread_unthrottled(BlockBackend *blk, int64_t offset, uint8_t *buf,
                           int count)
 {
@@ -809,13 +791,6 @@ int blk_pread_unthrottled(BlockBackend *blk, int64_t offset, uint8_t *buf,
     return ret;
 }

-int blk_write(BlockBackend *blk, int64_t sector_num, const uint8_t *buf,
-              int nb_sectors)
-{
-    return blk_rw(blk, sector_num, (uint8_t*) buf, nb_sectors,
-                  blk_write_entry, 0);
-}
-
 int blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
                       int count, BdrvRequestFlags flags)
 {
-- 
2.5.5


* [Qemu-devel] [PATCH v3 23/44] qemu-io: Add missing option documentation
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (21 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 22/44] block: Kill blk_write(), blk_read() Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 24/44] qemu-io: Add 'write -f' to test FUA flag Eric Blake
                   ` (20 subsequent siblings)
  43 siblings, 0 replies; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Kevin Wolf, Max Reitz

Commit 499afa2 added --image-opts, but forgot to document it in
--help.  Likewise for commit 9e8f183 and -d/--discard.

Finally, commit 10d9d75 removed -g/--growable, but forgot to
cull it from the valid short options.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
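
Illustration only, not part of the patch: a minimal standalone sketch
of why a stale letter in the short-option string matters.  getopt()
keeps accepting a letter for as long as it is listed, and reports '?'
once the letter is culled.  sk_first_opt() is a hypothetical helper,
not qemu-io code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: parse one argument against optstring and return
 * whatever getopt() reports for it. */
static int sk_first_opt(const char *optstring, char *arg)
{
    char prog[] = "prog";
    char *argv[] = { prog, arg, NULL };

    assert(optstring != NULL && arg != NULL);
    optind = 1;   /* restart scanning on every call */
    opterr = 0;   /* suppress getopt's own diagnostic */
    return getopt(2, argv, optstring);
}
```

With the old string "hVc:d:f:rsnmgkt:T:" an argument of "-g" is still
returned as 'g'; with the new string it comes back as '?'.
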
 qemu-io.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/qemu-io.c b/qemu-io.c
index 288bba8..cdc9497 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -221,6 +221,7 @@ static void usage(const char *name)
 "\n"
 "  --object OBJECTDEF   define an object such as 'secret' for\n"
 "                       passwords and/or encryption keys\n"
+"  --image-opts         treat file as option string\n"
 "  -c, --cmd STRING     execute command with its arguments\n"
 "                       from the given string\n"
 "  -f, --format FMT     specifies the block driver to use\n"
@@ -230,6 +231,7 @@ static void usage(const char *name)
 "  -m, --misalign       misalign allocations for O_DIRECT\n"
 "  -k, --native-aio     use kernel AIO implementation (on Linux only)\n"
 "  -t, --cache=MODE     use the given cache mode for the image\n"
+"  -d, --discard=MODE   use the given discard mode for the image\n"
 "  -T, --trace FILE     enable trace events listed in the given file\n"
 "  -h, --help           display this help and exit\n"
 "  -V, --version        output version information and exit\n"
@@ -410,7 +412,7 @@ static QemuOptsList file_opts = {
 int main(int argc, char **argv)
 {
     int readonly = 0;
-    const char *sopt = "hVc:d:f:rsnmgkt:T:";
+    const char *sopt = "hVc:d:f:rsnmkt:T:";
     const struct option lopt[] = {
         { "help", no_argument, NULL, 'h' },
         { "version", no_argument, NULL, 'V' },
-- 
2.5.5


* [Qemu-devel] [PATCH v3 24/44] qemu-io: Add 'write -f' to test FUA flag
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (22 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 23/44] qemu-io: Add missing option documentation Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 25/44] qemu-io: Add 'open -u' to set BDRV_O_UNMAP after the fact Eric Blake
                   ` (19 subsequent siblings)
  43 siblings, 0 replies; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Kevin Wolf, Max Reitz

Make it easier to test block drivers with BDRV_REQ_FUA in
.supported_write_flags, by adding a flag to qemu-io to
conditionally pass the flag through to specific writes.  You'll
want to use 'qemu-io -t none' to actually make -f useful (as
otherwise, the default writethrough mode automatically sets
the FUA bit on every write).

Signed-off-by: Eric Blake <eblake@redhat.com>
---
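
Illustration only, not part of the patch: a sketch of why '-t none'
matters for 'write -f'.  It assumes, as the commit message describes,
that the block layer adds the FUA bit to every write while the write
cache is disabled (writethrough).  SK_REQ_FUA and
sk_effective_write_flags() are hypothetical stand-ins, not QEMU names.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_REQ_FUA (1 << 0)   /* stand-in for BDRV_REQ_FUA; value is illustrative */

/* Flags that actually reach the driver for one write request. */
static int sk_effective_write_flags(bool write_cache_enabled, bool fflag)
{
    int flags = fflag ? SK_REQ_FUA : 0;

    if (!write_cache_enabled) {
        /* Writethrough: every write is forced to FUA regardless of -f. */
        flags |= SK_REQ_FUA;
    }
    /* In writethrough mode the result always carries FUA. */
    assert(write_cache_enabled || (flags & SK_REQ_FUA));
    return flags;
}
```

Only with the write cache enabled (for example '-t none') does the
result differ between a plain 'write' and 'write -f'.
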
 qemu-io-cmds.c | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index 4184fb8..1e444cc 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -434,13 +434,14 @@ static int do_pread(BlockBackend *blk, char *buf, int64_t offset,
 }

 static int do_pwrite(BlockBackend *blk, char *buf, int64_t offset,
-                     int64_t count, int64_t *total)
+                     int64_t count, bool fua, int64_t *total)
 {
     if (count > INT_MAX) {
         return -ERANGE;
     }

-    *total = blk_pwrite(blk, offset, (uint8_t *)buf, count, 0);
+    *total = blk_pwrite(blk, offset, (uint8_t *)buf, count,
+                        fua ? BDRV_REQ_FUA : 0);
     if (*total < 0) {
         return *total;
     }
@@ -452,6 +453,7 @@ typedef struct {
     int64_t offset;
     int64_t count;
     int64_t *total;
+    bool fua;
     int ret;
     bool done;
 } CoWriteZeroes;
@@ -461,7 +463,8 @@ static void coroutine_fn co_write_zeroes_entry(void *opaque)
     CoWriteZeroes *data = opaque;

     data->ret = blk_co_write_zeroes(data->blk, data->offset / BDRV_SECTOR_SIZE,
-                                    data->count / BDRV_SECTOR_SIZE, 0);
+                                    data->count / BDRV_SECTOR_SIZE,
+                                    data->fua ? BDRV_REQ_FUA : 0);
     data->done = true;
     if (data->ret < 0) {
         *data->total = data->ret;
@@ -472,7 +475,7 @@ static void coroutine_fn co_write_zeroes_entry(void *opaque)
 }

 static int do_co_write_zeroes(BlockBackend *blk, int64_t offset, int64_t count,
-                              int64_t *total)
+                              bool fua, int64_t *total)
 {
     Coroutine *co;
     CoWriteZeroes data = {
@@ -480,6 +483,7 @@ static int do_co_write_zeroes(BlockBackend *blk, int64_t offset, int64_t count,
         .offset = offset,
         .count  = count,
         .total  = total,
+        .fua    = fua,
         .done   = false,
     };

@@ -950,6 +954,7 @@ static void write_help(void)
 " filled with a set pattern (0xcdcdcdcd).\n"
 " -b, -- write to the VM state rather than the virtual disk\n"
 " -c, -- write compressed data with blk_write_compressed\n"
+" -f, -- use Force Unit Access semantics\n"
 " -p, -- ignored for back-compat\n"
 " -P, -- use different pattern to fill file\n"
 " -C, -- report statistics in a machine parsable format\n"
@@ -966,7 +971,7 @@ static const cmdinfo_t write_cmd = {
     .cfunc      = write_f,
     .argmin     = 2,
     .argmax     = -1,
-    .args       = "[-bcCqz] [-P pattern ] off len",
+    .args       = "[-bcCfqz] [-P pattern ] off len",
     .oneline    = "writes a number of bytes at a specified offset",
     .help       = write_help,
 };
@@ -975,7 +980,7 @@ static int write_f(BlockBackend *blk, int argc, char **argv)
 {
     struct timeval t1, t2;
     int Cflag = 0, qflag = 0, bflag = 0, Pflag = 0, zflag = 0;
-    int cflag = 0;
+    int cflag = 0, fflag = 0;
     int c, cnt;
     char *buf = NULL;
     int64_t offset;
@@ -984,7 +989,7 @@ static int write_f(BlockBackend *blk, int argc, char **argv)
     int64_t total = 0;
     int pattern = 0xcd;

-    while ((c = getopt(argc, argv, "bcCpP:qz")) != -1) {
+    while ((c = getopt(argc, argv, "bcCfpP:qz")) != -1) {
         switch (c) {
         case 'b':
             bflag = 1;
@@ -995,6 +1000,9 @@ static int write_f(BlockBackend *blk, int argc, char **argv)
         case 'C':
             Cflag = 1;
             break;
+        case 'f':
+            fflag = 1;
+            break;
         case 'p':
             /* Ignored for back-compat */
             break;
@@ -1025,6 +1033,11 @@ static int write_f(BlockBackend *blk, int argc, char **argv)
         return 0;
     }

+    if (fflag && (bflag + cflag)) {
+        printf("-f and -b or -c cannot be specified at the same time\n");
+        return 0;
+    }
+
     if (zflag && Pflag) {
         printf("-z and -P cannot be specified at the same time\n");
         return 0;
@@ -1069,11 +1082,11 @@ static int write_f(BlockBackend *blk, int argc, char **argv)
     if (bflag) {
         cnt = do_save_vmstate(blk, buf, offset, count, &total);
     } else if (zflag) {
-        cnt = do_co_write_zeroes(blk, offset, count, &total);
+        cnt = do_co_write_zeroes(blk, offset, count, fflag, &total);
     } else if (cflag) {
         cnt = do_write_compressed(blk, buf, offset, count, &total);
     } else {
-        cnt = do_pwrite(blk, buf, offset, count, &total);
+        cnt = do_pwrite(blk, buf, offset, count, fflag, &total);
     }
     gettimeofday(&t2, NULL);

-- 
2.5.5


* [Qemu-devel] [PATCH v3 25/44] qemu-io: Add 'open -u' to set BDRV_O_UNMAP after the fact
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (23 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 24/44] qemu-io: Add 'write -f' to test FUA flag Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 26/44] qemu-io: Add 'write -z -u' to test MAY_UNMAP flag Eric Blake
                   ` (18 subsequent siblings)
  43 siblings, 0 replies; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Kevin Wolf, Max Reitz

When opening a file from the command line, qemu-io defaults
to BDRV_O_UNMAP, and -d gives full control over disabling
unmaps. But when opening via the 'open' command, qemu-io did
not set BDRV_O_UNMAP, and had no way to request it.

Make it at least possible to symmetrically test things:
'qemu-io -d ignore' at the CLI now matches 'qemu-io> open'
in batch mode, and 'qemu-io' or 'qemu-io -d unmap' at
the CLI matches 'qemu-io> open -u'.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 qemu-io.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/qemu-io.c b/qemu-io.c
index cdc9497..dc8d124 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -107,6 +107,7 @@ static void open_help(void)
 " -r, -- open file read-only\n"
 " -s, -- use snapshot file\n"
 " -n, -- disable host cache\n"
+" -u, -- allow discard and zero operations to unmap\n"
 " -o, -- options to be given to the block driver"
 "\n");
 }
@@ -120,7 +121,7 @@ static const cmdinfo_t open_cmd = {
     .argmin     = 1,
     .argmax     = -1,
     .flags      = CMD_NOFILE_OK,
-    .args       = "[-Crsn] [-o options] [path]",
+    .args       = "[-Crsnu] [-o options] [path]",
     .oneline    = "open the file specified by path",
     .help       = open_help,
 };
@@ -144,7 +145,7 @@ static int open_f(BlockBackend *blk, int argc, char **argv)
     QemuOpts *qopts;
     QDict *opts;

-    while ((c = getopt(argc, argv, "snrgo:")) != -1) {
+    while ((c = getopt(argc, argv, "snrguo:")) != -1) {
         switch (c) {
         case 's':
             flags |= BDRV_O_SNAPSHOT;
@@ -156,6 +157,9 @@ static int open_f(BlockBackend *blk, int argc, char **argv)
         case 'r':
             readonly = 1;
             break;
+        case 'u':
+            flags |= BDRV_O_UNMAP;
+            break;
         case 'o':
             if (imageOpts) {
                 printf("--image-opts and 'open -o' are mutually exclusive\n");
-- 
2.5.5


* [Qemu-devel] [PATCH v3 26/44] qemu-io: Add 'write -z -u' to test MAY_UNMAP flag
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (24 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 25/44] qemu-io: Add 'open -u' to set BDRV_O_UNMAP after the fact Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 27/44] nbd: Use BDRV_REQ_FUA for better FUA where supported Eric Blake
                   ` (17 subsequent siblings)
  43 siblings, 0 replies; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Kevin Wolf, Max Reitz

Make it easier to control whether the BDRV_REQ_MAY_UNMAP flag
can be passed through a write_zeroes command, by adding a flag
to qemu-io.  To be useful, the device has to be opened with
'qemu-io -d unmap' (or the just-added 'open -u' subcommand).

Signed-off-by: Eric Blake <eblake@redhat.com>
---
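
Illustration only, not part of the patch: a sketch of how the two
per-request bits combine, and why the device must also be opened with
unmap enabled.  It assumes the block layer drops the MAY_UNMAP bit for
devices opened without the unmap flag.  All SK_* names and values are
hypothetical stand-ins for the BDRV_* ones.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_O_UNMAP        (1 << 0)  /* stand-in for BDRV_O_UNMAP */
#define SK_REQ_FUA        (1 << 0)  /* stand-in for BDRV_REQ_FUA */
#define SK_REQ_MAY_UNMAP  (1 << 1)  /* stand-in for BDRV_REQ_MAY_UNMAP */

/* Build the request flags for 'write -z [-f] [-u]'. */
static int sk_zero_request_flags(bool zflag, bool fflag, bool uflag)
{
    /* -u without -z is expected to have been rejected by the caller. */
    assert(!uflag || zflag);
    return (fflag ? SK_REQ_FUA : 0) | (uflag ? SK_REQ_MAY_UNMAP : 0);
}

/* Assumed block-layer behavior: ignore MAY_UNMAP unless opened for unmap. */
static int sk_apply_open_flags(int open_flags, int req_flags)
{
    if (!(open_flags & SK_O_UNMAP)) {
        req_flags &= ~SK_REQ_MAY_UNMAP;
    }
    return req_flags;
}
```

Under these assumptions 'write -z -u' on a device opened without unmap
behaves like plain 'write -z'.
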
 qemu-io-cmds.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index 1e444cc..ca23459 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -454,6 +454,7 @@ typedef struct {
     int64_t count;
     int64_t *total;
     bool fua;
+    bool unmap;
     int ret;
     bool done;
 } CoWriteZeroes;
@@ -464,7 +465,8 @@ static void coroutine_fn co_write_zeroes_entry(void *opaque)

     data->ret = blk_co_write_zeroes(data->blk, data->offset / BDRV_SECTOR_SIZE,
                                     data->count / BDRV_SECTOR_SIZE,
-                                    data->fua ? BDRV_REQ_FUA : 0);
+                                    (data->fua ? BDRV_REQ_FUA : 0) |
+                                    (data->unmap ? BDRV_REQ_MAY_UNMAP : 0));
     data->done = true;
     if (data->ret < 0) {
         *data->total = data->ret;
@@ -475,7 +477,7 @@ static void coroutine_fn co_write_zeroes_entry(void *opaque)
 }

 static int do_co_write_zeroes(BlockBackend *blk, int64_t offset, int64_t count,
-                              bool fua, int64_t *total)
+                              bool fua, bool unmap, int64_t *total)
 {
     Coroutine *co;
     CoWriteZeroes data = {
@@ -484,6 +486,7 @@ static int do_co_write_zeroes(BlockBackend *blk, int64_t offset, int64_t count,
         .count  = count,
         .total  = total,
         .fua    = fua,
+        .unmap  = unmap,
         .done   = false,
     };

@@ -959,6 +962,7 @@ static void write_help(void)
 " -P, -- use different pattern to fill file\n"
 " -C, -- report statistics in a machine parsable format\n"
 " -q, -- quiet mode, do not show I/O statistics\n"
+" -u, -- with -z, allow unmapping\n"
 " -z, -- write zeroes using blk_co_write_zeroes\n"
 "\n");
 }
@@ -971,7 +975,7 @@ static const cmdinfo_t write_cmd = {
     .cfunc      = write_f,
     .argmin     = 2,
     .argmax     = -1,
-    .args       = "[-bcCfqz] [-P pattern ] off len",
+    .args       = "[-bcCfquz] [-P pattern ] off len",
     .oneline    = "writes a number of bytes at a specified offset",
     .help       = write_help,
 };
@@ -980,7 +984,7 @@ static int write_f(BlockBackend *blk, int argc, char **argv)
 {
     struct timeval t1, t2;
     int Cflag = 0, qflag = 0, bflag = 0, Pflag = 0, zflag = 0;
-    int cflag = 0, fflag = 0;
+    int cflag = 0, fflag = 0, uflag = 0;
     int c, cnt;
     char *buf = NULL;
     int64_t offset;
@@ -989,7 +993,7 @@ static int write_f(BlockBackend *blk, int argc, char **argv)
     int64_t total = 0;
     int pattern = 0xcd;

-    while ((c = getopt(argc, argv, "bcCfpP:qz")) != -1) {
+    while ((c = getopt(argc, argv, "bcCfpP:quz")) != -1) {
         switch (c) {
         case 'b':
             bflag = 1;
@@ -1016,6 +1020,9 @@ static int write_f(BlockBackend *blk, int argc, char **argv)
         case 'q':
             qflag = 1;
             break;
+        case 'u':
+            uflag = 1;
+            break;
         case 'z':
             zflag = 1;
             break;
@@ -1038,6 +1045,11 @@ static int write_f(BlockBackend *blk, int argc, char **argv)
         return 0;
     }

+    if (uflag && !zflag) {
+        printf("-u requires -z to be specified\n");
+        return 0;
+    }
+
     if (zflag && Pflag) {
         printf("-z and -P cannot be specified at the same time\n");
         return 0;
@@ -1082,7 +1094,7 @@ static int write_f(BlockBackend *blk, int argc, char **argv)
     if (bflag) {
         cnt = do_save_vmstate(blk, buf, offset, count, &total);
     } else if (zflag) {
-        cnt = do_co_write_zeroes(blk, offset, count, fflag, &total);
+        cnt = do_co_write_zeroes(blk, offset, count, fflag, uflag, &total);
     } else if (cflag) {
         cnt = do_write_compressed(blk, buf, offset, count, &total);
     } else {
-- 
2.5.5


* [Qemu-devel] [PATCH v3 27/44] nbd: Use BDRV_REQ_FUA for better FUA where supported
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (25 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 26/44] qemu-io: Add 'write -z -u' to test MAY_UNMAP flag Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 28/44] nbd: Detect servers that send unexpected error values Eric Blake
                   ` (16 subsequent siblings)
  43 siblings, 0 replies; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Paolo Bonzini

Rather than always flushing ourselves, let the block layer
forward the FUA flag to the underlying device.  Where all
layers understand FUA, we are now more efficient; where the
underlying layer doesn't understand it, the block layer takes
care of the full flush fallback on our behalf.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
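
Illustration only, not part of the patch: a sketch of the fallback the
commit message relies on.  A generic layer passes FUA through when the
driver advertises support, and otherwise strips the bit and follows
the write with a flush.  SketchDev and the sk_* functions are
hypothetical and only count calls; this is not QEMU's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_REQ_FUA (1 << 0)   /* stand-in for BDRV_REQ_FUA */

typedef struct SketchDev {
    bool supports_fua;   /* driver understands FUA natively */
    int writes;          /* number of writes issued to the driver */
    int fua_writes;      /* how many of those carried the FUA bit */
    int flushes;         /* number of full flushes issued */
} SketchDev;

static int sk_driver_write(SketchDev *d, int flags)
{
    d->writes++;
    if (flags & SK_REQ_FUA) {
        d->fua_writes++;
    }
    return 0;
}

static int sk_driver_flush(SketchDev *d)
{
    d->flushes++;
    return 0;
}

/* Generic layer: forward FUA if supported, else emulate with a flush. */
static int sk_block_write(SketchDev *d, int flags)
{
    bool need_flush = false;
    int ret;

    assert(d != NULL);
    if ((flags & SK_REQ_FUA) && !d->supports_fua) {
        flags &= ~SK_REQ_FUA;
        need_flush = true;
    }
    ret = sk_driver_write(d, flags);
    if (ret == 0 && need_flush) {
        ret = sk_driver_flush(d);
    }
    return ret;
}
```

The caller only ever sets the flag; whether that costs a native FUA
write or a write plus flush is decided below it.
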
 nbd/server.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 9be0a99..fa05a73 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1085,6 +1085,7 @@ static void nbd_trip(void *opaque)
     struct nbd_reply reply;
     ssize_t ret;
     uint32_t command;
+    int flags;

     TRACE("Reading request.");
     if (client->closing) {
@@ -1153,23 +1154,18 @@ static void nbd_trip(void *opaque)

         TRACE("Writing to device");

+        flags = 0;
+        if (request.type & NBD_CMD_FLAG_FUA) {
+            flags |= BDRV_REQ_FUA;
+        }
         ret = blk_pwrite(exp->blk, request.from + exp->dev_offset,
-                         req->data, request.len, 0);
+                         req->data, request.len, flags);
         if (ret < 0) {
             LOG("writing to file failed");
             reply.error = -ret;
             goto error_reply;
         }

-        if (request.type & NBD_CMD_FLAG_FUA) {
-            ret = blk_co_flush(exp->blk);
-            if (ret < 0) {
-                LOG("flush failed");
-                reply.error = -ret;
-                goto error_reply;
-            }
-        }
-
         if (nbd_co_send_reply(req, &reply, 0) < 0) {
             goto out;
         }
-- 
2.5.5


* [Qemu-devel] [PATCH v3 28/44] nbd: Detect servers that send unexpected error values
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (26 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 27/44] nbd: Use BDRV_REQ_FUA for better FUA where supported Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 29/44] nbd: Avoid magic number for NBD max name size Eric Blake
                   ` (15 subsequent siblings)
  43 siblings, 0 replies; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Paolo Bonzini

Add some debugging to flag servers that do not comply with
the NBD protocol.  This would have flagged the server bug
fixed in commit c0301fcc.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alex Bligh <alex@alex.org.uk>

---
v3: later in series, but no change
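
Illustration only, not part of the patch: a standalone sketch of the
"default falls through to EINVAL" pattern.  The SK_NBD_* values are
the wire error numbers as I understand the NBD spec and should be
treated as illustrative; the 'unexpected' out-parameter is a
hypothetical stand-in for the TRACE() call.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum {
    SK_NBD_SUCCESS = 0,
    SK_NBD_EPERM   = 1,
    SK_NBD_EIO     = 5,
    SK_NBD_ENOMEM  = 12,
    SK_NBD_EINVAL  = 22,
    SK_NBD_ENOSPC  = 28,
};

/* Map a wire error to a host errno; report values outside the known set. */
static int sk_nbd_errno_to_system_errno(int err, bool *unexpected)
{
    assert(unexpected != NULL);
    *unexpected = false;

    switch (err) {
    case SK_NBD_SUCCESS:
        return 0;
    case SK_NBD_EPERM:
        return EPERM;
    case SK_NBD_EIO:
        return EIO;
    case SK_NBD_ENOMEM:
        return ENOMEM;
    case SK_NBD_ENOSPC:
        return ENOSPC;
    default:
        /* Unknown value: note it, then squash to EINVAL. */
        *unexpected = true;
        /* fallthrough */
    case SK_NBD_EINVAL:
        return EINVAL;
    }
}
```

A genuine EINVAL and an unknown value both yield EINVAL, but only the
unknown value is reported.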
---
 nbd/client.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/nbd/client.c b/nbd/client.c
index 937344c..4659df3 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -33,8 +33,10 @@ static int nbd_errno_to_system_errno(int err)
         return ENOMEM;
     case NBD_ENOSPC:
         return ENOSPC;
+    default:
+        TRACE("Squashing unexpected error %d to EINVAL", err);
+        /* fallthrough */
     case NBD_EINVAL:
-    default:
         return EINVAL;
     }
 }
-- 
2.5.5


* [Qemu-devel] [PATCH v3 29/44] nbd: Avoid magic number for NBD max name size
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (27 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 28/44] nbd: Detect servers that send unexpected error values Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-25  9:32   ` Alex Bligh
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 30/44] nbd: Treat flags vs. command type as separate fields Eric Blake
                   ` (14 subsequent siblings)
  43 siblings, 1 reply; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Paolo Bonzini, Kevin Wolf, Max Reitz

Declare a constant and use that when determining if an export
name fits within the constraints we are willing to support.

Note that upstream NBD recently documented that clients MUST
support export names of 256 bytes (not including trailing NUL),
and SHOULD support names up to 4096 bytes.  4096 is a bit big
(we would lose benefits of stack-allocation of a name array),
and we already have other limits in place (for example, qcow2
snapshot names are clamped around 1024).  So for now, just
stick to the required minimum.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v3: enlarge the limit, and document choice of new value
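
Illustration only, not part of the patch: a sketch of the sizing rule.
The name on the wire is not NUL-terminated, so a buffer that holds a
maximum-length name needs one extra byte.  SK_MAX_NAME_SIZE and
sk_copy_name() are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SK_MAX_NAME_SIZE 256   /* mirrors NBD_MAX_NAME_SIZE in this patch */

/* Copy a length-prefixed name into a stack-sized buffer and terminate it.
 * Returns 0 on success, -1 if the name exceeds the supported limit. */
static int sk_copy_name(char dst[SK_MAX_NAME_SIZE + 1],
                        const char *wire, size_t length)
{
    assert(dst != NULL && wire != NULL);
    if (length > SK_MAX_NAME_SIZE) {
        return -1;
    }
    memcpy(dst, wire, length);
    dst[length] = '\0';   /* the +1 byte holds this terminator */
    return 0;
}
```

A 256-byte name is accepted and a 257-byte name is rejected, which is
the same boundary as 'length >= sizeof(name)' with
name[NBD_MAX_NAME_SIZE + 1].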
---
 include/block/nbd.h | 6 ++++++
 nbd/client.c        | 2 +-
 nbd/server.c        | 4 ++--
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 3e2d76b..2c753cc 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -76,6 +76,12 @@ enum {

 /* Maximum size of a single READ/WRITE data buffer */
 #define NBD_MAX_BUFFER_SIZE (32 * 1024 * 1024)
+/* Maximum size of an export name. The NBD spec requires 256 and
+ * suggests that servers support up to 4096, but we stick to only the
+ * required size so that we can stack-allocate the names, and because
+ * going larger would require an audit of more code to make sure we
+ * aren't overflowing some other buffer. */
+#define NBD_MAX_NAME_SIZE 256

 ssize_t nbd_wr_syncv(QIOChannel *ioc,
                      struct iovec *iov,
diff --git a/nbd/client.c b/nbd/client.c
index 4659df3..b700100 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -210,7 +210,7 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
             error_setg(errp, "incorrect option name length");
             return -1;
         }
-        if (namelen > 255) {
+        if (namelen > NBD_MAX_NAME_SIZE) {
             error_setg(errp, "export name length too long %" PRIu32, namelen);
             return -1;
         }
diff --git a/nbd/server.c b/nbd/server.c
index fa05a73..a20bf8a 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -296,13 +296,13 @@ static int nbd_negotiate_handle_list(NBDClient *client, uint32_t length)
 static int nbd_negotiate_handle_export_name(NBDClient *client, uint32_t length)
 {
     int rc = -EINVAL;
-    char name[256];
+    char name[NBD_MAX_NAME_SIZE + 1];

     /* Client sends:
         [20 ..  xx]   export name (length bytes)
      */
     TRACE("Checking length");
-    if (length > 255) {
+    if (length >= sizeof(name)) {
         LOG("Bad length received");
         goto fail;
     }
-- 
2.5.5


* [Qemu-devel] [PATCH v3 30/44] nbd: Treat flags vs. command type as separate fields
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (28 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 29/44] nbd: Avoid magic number for NBD max name size Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-25  9:34   ` Alex Bligh
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 31/44] nbd: Share common reply-sending code in server Eric Blake
                   ` (13 subsequent siblings)
  43 siblings, 1 reply; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Paolo Bonzini, Kevin Wolf, Max Reitz

Current upstream NBD documents that requests have a 16-bit flags
field followed by a 16-bit type field, although older versions
mentioned only a single 32-bit field, with masking to find the
flags.  Since the protocol is in network order (big-endian over the
wire), the ABI is unchanged; but dealing with the flags as a
separate field rather than masking will make it easier to add
support for upcoming NBD extensions that increase the number of
both flags and commands.

Improve some comments in nbd.h based on the current upstream
NBD protocol (https://github.com/yoe/nbd/blob/master/doc/proto.md),
and touch some nearby code to keep checkpatch.pl happy.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v3: rebase to other changes earlier in series
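
Illustration only, not part of the patch: a standalone check that the
two views produce the same four bytes on the wire.  The old view packs
FUA as bit 16 of one big-endian 32-bit field; the new view stores FUA
as bit 0 of a 16-bit flags field followed by a 16-bit type.  The SK_*
names and the helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_CMD_WRITE      1u
#define SK_FLAG_FUA_OLD   (1u << 16)  /* bit in the combined 32-bit field */
#define SK_FLAG_FUA_NEW   (1u << 0)   /* bit in the separate 16-bit field */

/* Store a 16-bit value in network (big-endian) order. */
static void sk_put_be16(uint8_t *p, uint16_t v)
{
    assert(p != NULL);
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

/* Store a 32-bit value in network (big-endian) order. */
static void sk_put_be32(uint8_t *p, uint32_t v)
{
    assert(p != NULL);
    sk_put_be16(p, (uint16_t)(v >> 16));
    sk_put_be16(p + 2, (uint16_t)(v & 0xffff));
}

/* Returns 1 if both encodings of "WRITE with FUA" are byte-identical. */
static int sk_same_wire_bytes(void)
{
    uint8_t old_view[4], new_view[4];

    sk_put_be32(old_view, SK_CMD_WRITE | SK_FLAG_FUA_OLD);
    sk_put_be16(new_view, SK_FLAG_FUA_NEW);    /* bytes 0..1: flags */
    sk_put_be16(new_view + 2, SK_CMD_WRITE);   /* bytes 2..3: type */
    return memcmp(old_view, new_view, sizeof(old_view)) == 0;
}
```

This equivalence depends on the fields being big-endian: the high half
of the old 32-bit value is sent first, which is exactly where the new
flags field sits.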
---
 include/block/nbd.h | 18 ++++++++++++------
 nbd/nbd-internal.h  |  4 ++--
 block/nbd-client.c  |  9 +++------
 nbd/client.c        | 17 ++++++++++-------
 nbd/server.c        | 51 ++++++++++++++++++++++++++-------------------------
 5 files changed, 53 insertions(+), 46 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 2c753cc..f4ae86c 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -1,4 +1,5 @@
 /*
+ *  Copyright (C) 2016 Red Hat, Inc.
  *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
  *
  *  Network Block Device
@@ -27,7 +28,8 @@

 struct nbd_request {
     uint32_t magic;
-    uint32_t type;
+    uint16_t flags;
+    uint16_t type;
     uint64_t handle;
     uint64_t from;
     uint32_t len;
@@ -39,6 +41,8 @@ struct nbd_reply {
     uint64_t handle;
 } QEMU_PACKED;

+/* Transmission (export) flags: sent from server to client during handshake,
+   but describe what will happen during transmission */
 #define NBD_FLAG_HAS_FLAGS      (1 << 0)        /* Flags are there */
 #define NBD_FLAG_READ_ONLY      (1 << 1)        /* Device is read-only */
 #define NBD_FLAG_SEND_FLUSH     (1 << 2)        /* Send FLUSH */
@@ -46,10 +50,12 @@ struct nbd_reply {
 #define NBD_FLAG_ROTATIONAL     (1 << 4)        /* Use elevator algorithm - rotational media */
 #define NBD_FLAG_SEND_TRIM      (1 << 5)        /* Send TRIM (discard) */

-/* New-style global flags. */
+/* New-style handshake (global) flags, sent from server to client, and
+   control what will happen during handshake phase. */
 #define NBD_FLAG_FIXED_NEWSTYLE     (1 << 0)    /* Fixed newstyle protocol. */

-/* New-style client flags. */
+/* New-style client flags, sent from client to server to control what happens
+   during handshake phase. */
 #define NBD_FLAG_C_FIXED_NEWSTYLE   (1 << 0)    /* Fixed newstyle protocol. */

 /* Reply types. */
@@ -60,10 +66,10 @@ struct nbd_reply {
 #define NBD_REP_ERR_INVALID     ((UINT32_C(1) << 31) | 3) /* Invalid length. */
 #define NBD_REP_ERR_TLS_REQD    ((UINT32_C(1) << 31) | 5) /* TLS required */

+/* Request flags, sent from client to server during transmission phase */
+#define NBD_CMD_FLAG_FUA        (1 << 0)

-#define NBD_CMD_MASK_COMMAND	0x0000ffff
-#define NBD_CMD_FLAG_FUA	(1 << 16)
-
+/* Supported request types */
 enum {
     NBD_CMD_READ = 0,
     NBD_CMD_WRITE = 1,
diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index 035ead4..d0108a1 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -52,10 +52,10 @@
 /* This is all part of the "official" NBD API.
  *
  * The most up-to-date documentation is available at:
- * https://github.com/yoe/nbd/blob/master/doc/proto.txt
+ * https://github.com/yoe/nbd/blob/master/doc/proto.md
  */

-#define NBD_REQUEST_SIZE        (4 + 4 + 8 + 8 + 4)
+#define NBD_REQUEST_SIZE        (4 + 2 + 2 + 8 + 8 + 4)
 #define NBD_REPLY_SIZE          (4 + 4 + 8)
 #define NBD_REQUEST_MAGIC       0x25609513
 #define NBD_REPLY_MAGIC         0x67446698
diff --git a/block/nbd-client.c b/block/nbd-client.c
index 878e879..285025d 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -1,6 +1,7 @@
 /*
  * QEMU Block driver for  NBD
  *
+ * Copyright (C) 2016 Red Hat, Inc.
  * Copyright (C) 2008 Bull S.A.S.
  *     Author: Laurent Vivier <Laurent.Vivier@bull.net>
  *
@@ -252,7 +253,7 @@ static int nbd_co_writev_1(BlockDriverState *bs, int64_t sector_num,

     if ((*flags & BDRV_REQ_FUA) && (client->nbdflags & NBD_FLAG_SEND_FUA)) {
         *flags &= ~BDRV_REQ_FUA;
-        request.type |= NBD_CMD_FLAG_FUA;
+        request.flags |= NBD_CMD_FLAG_FUA;
     }

     request.from = sector_num * 512;
@@ -376,11 +377,7 @@ void nbd_client_attach_aio_context(BlockDriverState *bs,
 void nbd_client_close(BlockDriverState *bs)
 {
     NbdClientSession *client = nbd_get_client_session(bs);
-    struct nbd_request request = {
-        .type = NBD_CMD_DISC,
-        .from = 0,
-        .len = 0
-    };
+    struct nbd_request request = { .type = NBD_CMD_DISC };

     if (client->ioc == NULL) {
         return;
diff --git a/nbd/client.c b/nbd/client.c
index b700100..a9e173a 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -1,4 +1,5 @@
 /*
+ *  Copyright (C) 2016 Red Hat, Inc.
  *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
  *
  *  Network Block Device Client Side
@@ -713,14 +714,16 @@ ssize_t nbd_send_request(QIOChannel *ioc, struct nbd_request *request)

     TRACE("Sending request to server: "
           "{ .from = %" PRIu64", .len = %" PRIu32 ", .handle = %" PRIu64
-          ", .type=%" PRIu16 " }",
-          request->from, request->len, request->handle, request->type);
+          ", .flags = %" PRIx16 ", .type = %" PRIu16 " }",
+          request->from, request->len, request->handle,
+          request->flags, request->type);

-    cpu_to_be32w((uint32_t*)buf, NBD_REQUEST_MAGIC);
-    cpu_to_be32w((uint32_t*)(buf + 4), request->type);
-    cpu_to_be64w((uint64_t*)(buf + 8), request->handle);
-    cpu_to_be64w((uint64_t*)(buf + 16), request->from);
-    cpu_to_be32w((uint32_t*)(buf + 24), request->len);
+    cpu_to_be32w((uint32_t *)buf, NBD_REQUEST_MAGIC);
+    cpu_to_be16w((uint16_t *)(buf + 4), request->flags);
+    cpu_to_be16w((uint16_t *)(buf + 6), request->type);
+    cpu_to_be64w((uint64_t *)(buf + 8), request->handle);
+    cpu_to_be64w((uint64_t *)(buf + 16), request->from);
+    cpu_to_be32w((uint32_t *)(buf + 24), request->len);

     ret = write_sync(ioc, buf, sizeof(buf));
     if (ret < 0) {
diff --git a/nbd/server.c b/nbd/server.c
index a20bf8a..1d30b6d 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1,4 +1,5 @@
 /*
+ *  Copyright (C) 2016 Red Hat, Inc.
  *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
  *
  *  Network Block Device Server Side
@@ -646,21 +647,23 @@ static ssize_t nbd_receive_request(QIOChannel *ioc, struct nbd_request *request)

     /* Request
        [ 0 ..  3]   magic   (NBD_REQUEST_MAGIC)
-       [ 4 ..  7]   type    (0 == READ, 1 == WRITE)
+       [ 4 ..  5]   flags   (NBD_CMD_FLAG_FUA, ...)
+       [ 6 ..  7]   type    (NBD_CMD_READ, ...)
        [ 8 .. 15]   handle
        [16 .. 23]   from
        [24 .. 27]   len
      */

-    magic = be32_to_cpup((uint32_t*)buf);
-    request->type  = be32_to_cpup((uint32_t*)(buf + 4));
-    request->handle = be64_to_cpup((uint64_t*)(buf + 8));
-    request->from  = be64_to_cpup((uint64_t*)(buf + 16));
-    request->len   = be32_to_cpup((uint32_t*)(buf + 24));
+    magic = be32_to_cpup((uint32_t *)buf);
+    request->flags = be16_to_cpup((uint16_t *)(buf + 4));
+    request->type  = be16_to_cpup((uint16_t *)(buf + 6));
+    request->handle = be64_to_cpup((uint64_t *)(buf + 8));
+    request->from  = be64_to_cpup((uint64_t *)(buf + 16));
+    request->len   = be32_to_cpup((uint32_t *)(buf + 24));

-    TRACE("Got request: { magic = 0x%" PRIx32 ", .type = %" PRIx32
-          ", from = %" PRIu64 " , len = %" PRIu32 " }",
-          magic, request->type, request->from, request->len);
+    TRACE("Got request: { magic = 0x%" PRIx32 ", .flags = %" PRIx16
+          ", .type = %" PRIx16 ", from = %" PRIu64 " , len = %" PRIu32 " }",
+          magic, request->flags, request->type, request->from, request->len);

     if (magic != NBD_REQUEST_MAGIC) {
         LOG("invalid magic (got 0x%" PRIx32 ")", magic);
@@ -993,7 +996,6 @@ static ssize_t nbd_co_receive_request(NBDRequest *req,
                                       struct nbd_request *request)
 {
     NBDClient *client = req->client;
-    uint32_t command;
     ssize_t rc;

     g_assert(qemu_in_coroutine());
@@ -1010,8 +1012,7 @@ static ssize_t nbd_co_receive_request(NBDRequest *req,

     TRACE("Decoding type");

-    command = request->type & NBD_CMD_MASK_COMMAND;
-    if (command == NBD_CMD_DISC) {
+    if (request->type == NBD_CMD_DISC) {
         /* Special case: we're going to disconnect without a reply,
          * whether or not flags, from, or len are bogus */
         TRACE("Request type is DISCONNECT");
@@ -1028,7 +1029,7 @@ static ssize_t nbd_co_receive_request(NBDRequest *req,
         goto out;
     }

-    if (command == NBD_CMD_READ || command == NBD_CMD_WRITE) {
+    if (request->type == NBD_CMD_READ || request->type == NBD_CMD_WRITE) {
         if (request->len > NBD_MAX_BUFFER_SIZE) {
             LOG("len (%" PRIu32" ) is larger than max len (%u)",
                 request->len, NBD_MAX_BUFFER_SIZE);
@@ -1042,7 +1043,7 @@ static ssize_t nbd_co_receive_request(NBDRequest *req,
             goto out;
         }
     }
-    if (command == NBD_CMD_WRITE) {
+    if (request->type == NBD_CMD_WRITE) {
         TRACE("Reading %" PRIu32 " byte(s)", request->len);

         if (read_sync(client->ioc, req->data, request->len) != request->len) {
@@ -1061,10 +1062,10 @@ static ssize_t nbd_co_receive_request(NBDRequest *req,
         rc = -EINVAL;
         goto out;
     }
-    if (request->type & ~NBD_CMD_MASK_COMMAND & ~NBD_CMD_FLAG_FUA) {
-        LOG("unsupported flags (got 0x%x)",
-            request->type & ~NBD_CMD_MASK_COMMAND);
-        return -EINVAL;
+    if (request->flags & ~NBD_CMD_FLAG_FUA) {
+        LOG("unsupported flags (got 0x%x)", request->flags);
+        rc = -EINVAL;
+        goto out;
     }

     rc = 0;
@@ -1084,7 +1085,6 @@ static void nbd_trip(void *opaque)
     struct nbd_request request;
     struct nbd_reply reply;
     ssize_t ret;
-    uint32_t command;
     int flags;

     TRACE("Reading request.");
@@ -1108,7 +1108,6 @@ static void nbd_trip(void *opaque)
         reply.error = -ret;
         goto error_reply;
     }
-    command = request.type & NBD_CMD_MASK_COMMAND;

     if (client->closing) {
         /*
@@ -1118,11 +1117,12 @@ static void nbd_trip(void *opaque)
         goto done;
     }

-    switch (command) {
+    switch (request.type) {
     case NBD_CMD_READ:
         TRACE("Request type is READ");

-        if (request.type & NBD_CMD_FLAG_FUA) {
+        /* XXX: NBD Protocol only documents use of FUA with WRITE */
+        if (request.flags & NBD_CMD_FLAG_FUA) {
             ret = blk_co_flush(exp->blk);
             if (ret < 0) {
                 LOG("flush failed");
@@ -1155,7 +1155,7 @@ static void nbd_trip(void *opaque)
         TRACE("Writing to device");

         flags = 0;
-        if (request.type & NBD_CMD_FLAG_FUA) {
+        if (request.flags & NBD_CMD_FLAG_FUA) {
             flags |= BDRV_REQ_FUA;
         }
         ret = blk_pwrite(exp->blk, request.from + exp->dev_offset,
@@ -1208,8 +1208,9 @@ static void nbd_trip(void *opaque)
          * NBD_CMD_READ, since we choose not to send bogus filler
          * data; likewise after NBD_CMD_WRITE if we did not read the
          * payload. */
-        if (nbd_co_send_reply(req, &reply, 0) < 0 || command == NBD_CMD_READ ||
-            (command == NBD_CMD_WRITE && !req->complete)) {
+        if (nbd_co_send_reply(req, &reply, 0) < 0 ||
+            request.type == NBD_CMD_READ ||
+            (request.type == NBD_CMD_WRITE && !req->complete)) {
             goto out;
         }
         break;
-- 
2.5.5
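
For readers following the layout comment in nbd_receive_request(), here
is a minimal standalone sketch of decoding the 28-byte request header
with flags and type as separate 16-bit fields.  Because the header is
big-endian, the two 16-bit fields occupy the same bytes as the upper and
lower halves of the old 32-bit type field.  The names demo_request,
demo_load_be and demo_parse_request are hypothetical and are not part of
QEMU; the sketch avoids pointer casts so it makes no alignment
assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-order view of a request; not QEMU's struct nbd_request. */
struct demo_request {
    uint32_t magic;
    uint16_t flags;   /* NBD_CMD_FLAG_* */
    uint16_t type;    /* NBD_CMD_* */
    uint64_t handle;
    uint64_t from;
    uint32_t len;
};

/* Read an n-byte big-endian unsigned integer starting at p. */
static uint64_t demo_load_be(const uint8_t *p, size_t n)
{
    uint64_t v = 0;

    assert(n <= 8);
    for (size_t i = 0; i < n; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Decode the 28-byte wire header in buf into host-order fields of *req. */
static void demo_parse_request(const uint8_t buf[28], struct demo_request *req)
{
    req->magic  = (uint32_t)demo_load_be(buf, 4);       /* [ 0 ..  3] */
    req->flags  = (uint16_t)demo_load_be(buf + 4, 2);   /* [ 4 ..  5] */
    req->type   = (uint16_t)demo_load_be(buf + 6, 2);   /* [ 6 ..  7] */
    req->handle = demo_load_be(buf + 8, 8);             /* [ 8 .. 15] */
    req->from   = demo_load_be(buf + 16, 8);            /* [16 .. 23] */
    req->len    = (uint32_t)demo_load_be(buf + 24, 4);  /* [24 .. 27] */
}
```

A caller would validate magic and reject unknown flag bits after
decoding, as the patch does with request->flags & ~NBD_CMD_FLAG_FUA.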

* [Qemu-devel] [PATCH v3 31/44] nbd: Share common reply-sending code in server
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (29 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 30/44] nbd: Treat flags vs. command type as separate fields Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-25  9:34   ` Alex Bligh
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 32/44] nbd: Share common option-sending code in client Eric Blake
                   ` (12 subsequent siblings)
  43 siblings, 1 reply; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Paolo Bonzini

Rather than open-coding NBD_REP_SERVER, reuse the code we
already have by adding a length parameter.  Additionally,
the refactoring will make adding NBD_OPT_GO in a later patch
easier.
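
As a rough illustration of the shape of this refactoring (not the code
in the patch), the sketch below builds the 20-byte option reply header
in a buffer, with a length-taking helper and a zero-length wrapper.  The
names demo_store_be, demo_build_rep_len and demo_build_rep are
hypothetical; the real functions write to a QIOChannel rather than a
buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_REP_MAGIC 0x0003e889045565a9ULL /* same value as NBD_REP_MAGIC */

/* Store the low n bytes of v at p in big-endian order. */
static void demo_store_be(uint8_t *p, uint64_t v, size_t n)
{
    assert(n <= 8);
    for (size_t i = 0; i < n; i++) {
        p[i] = (uint8_t)(v >> (8 * (n - 1 - i)));
    }
}

/* Fill out[] with magic, option, reply type and payload length.
 * Returns the number of header bytes written. */
static size_t demo_build_rep_len(uint8_t out[20], uint32_t type,
                                 uint32_t opt, uint32_t len)
{
    demo_store_be(out, DEMO_REP_MAGIC, 8);
    demo_store_be(out + 8, opt, 4);
    demo_store_be(out + 12, type, 4);
    demo_store_be(out + 16, len, 4);
    return 20;
}

/* Header-only reply: same as above with a payload length of 0. */
static size_t demo_build_rep(uint8_t out[20], uint32_t type, uint32_t opt)
{
    return demo_build_rep_len(out, type, opt, 0);
}
```

A reply that carries a payload calls the length-taking helper and then
writes the payload itself, which is what nbd_negotiate_send_rep_list()
does after this patch.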

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v3: rebase to changes earlier in series
---
 nbd/server.c | 48 +++++++++++++++++++++++-------------------------
 1 file changed, 23 insertions(+), 25 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 1d30b6d..4435d37 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -195,12 +195,15 @@ static ssize_t nbd_negotiate_drop_sync(QIOChannel *ioc, size_t size)

 */

-static int nbd_negotiate_send_rep(QIOChannel *ioc, uint32_t type, uint32_t opt)
+/* Send a reply header, including length, but no payload.
+ * Return -errno to kill connection, 0 to continue negotiation. */
+static int nbd_negotiate_send_rep_len(QIOChannel *ioc, uint32_t type,
+                                      uint32_t opt, uint32_t len)
 {
     uint64_t magic;
-    uint32_t len;

-    TRACE("Reply opt=%" PRIx32 " type=%" PRIx32, type, opt);
+    TRACE("Reply opt=%" PRIx32 " type=%" PRIx32 " len=%" PRIu32,
+          opt, type, len);

     magic = cpu_to_be64(NBD_REP_MAGIC);
     if (nbd_negotiate_write(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
@@ -217,7 +220,7 @@ static int nbd_negotiate_send_rep(QIOChannel *ioc, uint32_t type, uint32_t opt)
         LOG("write failed (rep type)");
         return -EINVAL;
     }
-    len = cpu_to_be32(0);
+    len = cpu_to_be32(len);
     if (nbd_negotiate_write(ioc, &len, sizeof(len)) != sizeof(len)) {
         LOG("write failed (rep data length)");
         return -EINVAL;
@@ -225,37 +228,32 @@ static int nbd_negotiate_send_rep(QIOChannel *ioc, uint32_t type, uint32_t opt)
     return 0;
 }

+/* Send a reply header with default 0 length.
+ * Return -errno to kill connection, 0 to continue negotiation. */
+static int nbd_negotiate_send_rep(QIOChannel *ioc, uint32_t type, uint32_t opt)
+{
+    return nbd_negotiate_send_rep_len(ioc, type, opt, 0);
+}
+
+/* Send an NBD_REP_SERVER reply to NBD_OPT_LIST, including payload.
+ * Return -errno to kill connection, 0 to continue negotiation. */
 static int nbd_negotiate_send_rep_list(QIOChannel *ioc, NBDExport *exp)
 {
-    uint64_t magic;
     size_t name_len, desc_len;
-    uint32_t opt, type, len;
+    uint32_t len;
     const char *name = exp->name ? exp->name : "";
     const char *desc = exp->description ? exp->description : "";
+    int rc;

     TRACE("Advertising export name '%s' description '%s'", name, desc);
     name_len = strlen(name);
     desc_len = strlen(desc);
-    magic = cpu_to_be64(NBD_REP_MAGIC);
-    if (nbd_negotiate_write(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
-        LOG("write failed (magic)");
-        return -EINVAL;
-     }
-    opt = cpu_to_be32(NBD_OPT_LIST);
-    if (nbd_negotiate_write(ioc, &opt, sizeof(opt)) != sizeof(opt)) {
-        LOG("write failed (opt)");
-        return -EINVAL;
-    }
-    type = cpu_to_be32(NBD_REP_SERVER);
-    if (nbd_negotiate_write(ioc, &type, sizeof(type)) != sizeof(type)) {
-        LOG("write failed (reply type)");
-        return -EINVAL;
-    }
-    len = cpu_to_be32(name_len + desc_len + sizeof(len));
-    if (nbd_negotiate_write(ioc, &len, sizeof(len)) != sizeof(len)) {
-        LOG("write failed (length)");
-        return -EINVAL;
+    len = name_len + desc_len + sizeof(len);
+    rc = nbd_negotiate_send_rep_len(ioc, NBD_REP_SERVER, NBD_OPT_LIST, len);
+    if (rc < 0) {
+        return rc;
     }
+
     len = cpu_to_be32(name_len);
     if (nbd_negotiate_write(ioc, &len, sizeof(len)) != sizeof(len)) {
         LOG("write failed (name length)");
-- 
2.5.5

* [Qemu-devel] [PATCH v3 32/44] nbd: Share common option-sending code in client
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (30 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 31/44] nbd: Share common reply-sending code in server Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-25  9:37   ` Alex Bligh
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 33/44] nbd: Let client skip portions of server reply Eric Blake
                   ` (11 subsequent siblings)
  43 siblings, 1 reply; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Paolo Bonzini, Kevin Wolf, Max Reitz

Rather than open-coding each option request, it's easier to
have common helper functions do the work.  That in turn requires
having convenient packed types for handling option requests
and replies.
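
To make the idea concrete, here is a small standalone sketch of a packed
16-byte option request header filled in big-endian order, including the
"length of -1 means strlen(data)" convention.  It is only an
approximation of the helper in this patch: demo_option, demo_store_be
and demo_fill_option are hypothetical names, and __attribute__((packed))
assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_OPTS_MAGIC 0x49484156454F5054ULL /* same value as NBD_OPTS_MAGIC */

/* Hypothetical stand-in for the packed option request struct. */
struct demo_option {
    uint64_t magic;
    uint32_t option;
    uint32_t length;
} __attribute__((packed));

/* Catch accidental padding at compile time. */
static_assert(sizeof(struct demo_option) == 16, "header must be 16 bytes");

/* Store the low n bytes of v at dst in big-endian order. */
static void demo_store_be(void *dst, uint64_t v, size_t n)
{
    uint8_t *p = dst;

    assert(n <= 8);
    for (size_t i = 0; i < n; i++) {
        p[i] = (uint8_t)(v >> (8 * (n - 1 - i)));
    }
}

/* Fill *req for option opt.  A len of UINT32_MAX (that is, -1) means
 * "use strlen(data)".  Returns the payload length that must follow. */
static uint32_t demo_fill_option(struct demo_option *req, uint32_t opt,
                                 uint32_t len, const char *data)
{
    uint8_t *raw = (uint8_t *)req;

    if (len == UINT32_MAX) {
        len = (uint32_t)strlen(data);
    }
    demo_store_be(raw + offsetof(struct demo_option, magic),
                  DEMO_OPTS_MAGIC, 8);
    demo_store_be(raw + offsetof(struct demo_option, option), opt, 4);
    demo_store_be(raw + offsetof(struct demo_option, length), len, 4);
    return len;
}
```

The header can then go out in a single write of sizeof(*req) bytes,
followed by the payload when the returned length is non-zero.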

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alex Bligh <alex@alex.org.uk>

---
v3: rebase, tweak a debug message
---
 include/block/nbd.h |  29 +++++-
 nbd/nbd-internal.h  |   2 +-
 nbd/client.c        | 250 ++++++++++++++++++++++------------------------------
 3 files changed, 129 insertions(+), 152 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index f4ae86c..5227ec6 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -26,20 +26,41 @@
 #include "io/channel-socket.h"
 #include "crypto/tlscreds.h"

+/* Handshake phase structs */
+
+struct nbd_option {
+    uint64_t magic; /* NBD_OPTS_MAGIC */
+    uint32_t option; /* NBD_OPT_* */
+    uint32_t length;
+} QEMU_PACKED;
+typedef struct nbd_option nbd_option;
+
+struct nbd_opt_reply {
+    uint64_t magic; /* NBD_REP_MAGIC */
+    uint32_t option; /* NBD_OPT_* */
+    uint32_t type; /* NBD_REP_* */
+    uint32_t length;
+} QEMU_PACKED;
+typedef struct nbd_opt_reply nbd_opt_reply;
+
+/* Transmission phase structs */
+
 struct nbd_request {
-    uint32_t magic;
-    uint16_t flags;
-    uint16_t type;
+    uint32_t magic; /* NBD_REQUEST_MAGIC */
+    uint16_t flags; /* NBD_CMD_FLAG_* */
+    uint16_t type; /* NBD_CMD_* */
     uint64_t handle;
     uint64_t from;
     uint32_t len;
 } QEMU_PACKED;
+typedef struct nbd_request nbd_request;

 struct nbd_reply {
-    uint32_t magic;
+    uint32_t magic; /* NBD_REPLY_MAGIC */
     uint32_t error;
     uint64_t handle;
 } QEMU_PACKED;
+typedef struct nbd_reply nbd_reply;

 /* Transmission (export) flags: sent from server to client during handshake,
    but describe what will happen during transmission */
diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index d0108a1..95069db 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -61,7 +61,7 @@
 #define NBD_REPLY_MAGIC         0x67446698
 #define NBD_OPTS_MAGIC          0x49484156454F5054LL
 #define NBD_CLIENT_MAGIC        0x0000420281861253LL
-#define NBD_REP_MAGIC           0x3e889045565a9LL
+#define NBD_REP_MAGIC           0x0003e889045565a9LL

 #define NBD_SET_SOCK            _IO(0xab, 0)
 #define NBD_SET_BLKSIZE         _IO(0xab, 1)
diff --git a/nbd/client.c b/nbd/client.c
index a9e173a..6626fa8 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -75,64 +75,123 @@ static QTAILQ_HEAD(, NBDExport) exports = QTAILQ_HEAD_INITIALIZER(exports);

 */

+/* Send an option request. Return 0 if successful, -1 with errp set if
+ * it is impossible to continue. */
+static int nbd_send_option_request(QIOChannel *ioc, uint32_t opt,
+                                   uint32_t len, const char *data,
+                                   Error **errp)
+{
+    nbd_option req;
+    QEMU_BUILD_BUG_ON(sizeof(req) != 16);

-/* If type represents success, return 1 without further action.
- * If type represents an error reply, consume the rest of the packet on ioc.
- * Then return 0 for unsupported (so the client can fall back to
- * other approaches), or -1 with errp set for other errors.
+    if (len == -1) {
+        req.length = len = strlen(data);
+    }
+    TRACE("Sending option request %"PRIu32", len %"PRIu32, opt, len);
+
+    stq_be_p(&req.magic, NBD_OPTS_MAGIC);
+    stl_be_p(&req.option, opt);
+    stl_be_p(&req.length, len);
+
+    if (write_sync(ioc, &req, sizeof(req)) != sizeof(req)) {
+        error_setg(errp, "Failed to send option request header");
+        return -1;
+    }
+
+    if (len && write_sync(ioc, (char *) data, len) != len) {
+        error_setg(errp, "Failed to send option request data");
+        return -1;
+    }
+
+    return 0;
+}
+
+/* Receive the header of an option reply, which should match the given
+ * opt.  Read through the length field, but NOT the length bytes of
+ * payload. Return 0 if successful, -1 with errp set if it is
+ * impossible to continue. */
+static int nbd_receive_option_reply(QIOChannel *ioc, uint32_t opt,
+                                    nbd_opt_reply *reply, Error **errp)
+{
+    QEMU_BUILD_BUG_ON(sizeof(*reply) != 20);
+    if (read_sync(ioc, reply, sizeof(*reply)) != sizeof(*reply)) {
+        error_setg(errp, "failed to read option reply");
+        return -1;
+    }
+    be64_to_cpus(&reply->magic);
+    be32_to_cpus(&reply->option);
+    be32_to_cpus(&reply->type);
+    be32_to_cpus(&reply->length);
+
+    TRACE("Received option reply %"PRIx32", type %"PRIx32", len %"PRIu32,
+          reply->option, reply->type, reply->length);
+
+    if (reply->magic != NBD_REP_MAGIC) {
+        error_setg(errp, "Unexpected option reply magic");
+        return -1;
+    }
+    if (reply->option != opt) {
+        error_setg(errp, "Unexpected option type %" PRIx32 " expected %" PRIx32,
+                   reply->option, opt);
+        return -1;
+    }
+    return 0;
+}
+
+/* If reply represents success, return 1 without further action.
+ * If reply represents an error, consume the optional payload of
+ * the packet on ioc.  Then return 0 for unsupported (so the client
+ * can fall back to other approaches), or -1 with errp set for other
+ * errors.
  */
-static int nbd_handle_reply_err(QIOChannel *ioc, uint32_t opt, uint32_t type,
+static int nbd_handle_reply_err(QIOChannel *ioc, nbd_opt_reply *reply,
                                 Error **errp)
 {
-    uint32_t len;
     char *msg = NULL;
     int result = -1;

-    if (!(type & (1 << 31))) {
+    if (!(reply->type & (1 << 31))) {
         return 1;
     }

-    if (read_sync(ioc, &len, sizeof(len)) != sizeof(len)) {
-        error_setg(errp, "failed to read option length");
-        return -1;
-    }
-    len = be32_to_cpu(len);
-    if (len) {
-        if (len > NBD_MAX_BUFFER_SIZE) {
+    if (reply->length) {
+        if (reply->length > NBD_MAX_BUFFER_SIZE) {
             error_setg(errp, "server's error message is too long");
             goto cleanup;
         }
-        msg = g_malloc(len + 1);
-        if (read_sync(ioc, msg, len) != len) {
+        msg = g_malloc(reply->length + 1);
+        if (read_sync(ioc, msg, reply->length) != reply->length) {
             error_setg(errp, "failed to read option error message");
             goto cleanup;
         }
-        msg[len] = '\0';
+        msg[reply->length] = '\0';
     }

-    switch (type) {
+    switch (reply->type) {
     case NBD_REP_ERR_UNSUP:
         TRACE("server doesn't understand request %" PRIx32
-              ", attempting fallback", opt);
+              ", attempting fallback", reply->option);
         result = 0;
         goto cleanup;

     case NBD_REP_ERR_POLICY:
-        error_setg(errp, "Denied by server for option %" PRIx32, opt);
+        error_setg(errp, "Denied by server for option %" PRIx32,
+                   reply->option);
         break;

     case NBD_REP_ERR_INVALID:
-        error_setg(errp, "Invalid data length for option %" PRIx32, opt);
+        error_setg(errp, "Invalid data length for option %" PRIx32,
+                   reply->option);
         break;

     case NBD_REP_ERR_TLS_REQD:
         error_setg(errp, "TLS negotiation required before option %" PRIx32,
-                   opt);
+                   reply->option);
         break;

     default:
         error_setg(errp, "Unknown error code when asking for option %" PRIx32,
-                   opt);
+                   reply->option);
         break;
     }

@@ -147,58 +206,29 @@ static int nbd_handle_reply_err(QIOChannel *ioc, uint32_t opt, uint32_t type,

 static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
 {
-    uint64_t magic;
-    uint32_t opt;
-    uint32_t type;
+    nbd_opt_reply reply;
     uint32_t len;
     uint32_t namelen;
     int error;

     *name = NULL;
-    if (read_sync(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
-        error_setg(errp, "failed to read list option magic");
+    if (nbd_receive_option_reply(ioc, NBD_OPT_LIST, &reply, errp) < 0) {
         return -1;
     }
-    magic = be64_to_cpu(magic);
-    if (magic != NBD_REP_MAGIC) {
-        error_setg(errp, "Unexpected option list magic");
-        return -1;
-    }
-    if (read_sync(ioc, &opt, sizeof(opt)) != sizeof(opt)) {
-        error_setg(errp, "failed to read list option");
-        return -1;
-    }
-    opt = be32_to_cpu(opt);
-    if (opt != NBD_OPT_LIST) {
-        error_setg(errp, "Unexpected option type %" PRIx32 " expected %x",
-                   opt, NBD_OPT_LIST);
-        return -1;
-    }
-
-    if (read_sync(ioc, &type, sizeof(type)) != sizeof(type)) {
-        error_setg(errp, "failed to read list option type");
-        return -1;
-    }
-    type = be32_to_cpu(type);
-    error = nbd_handle_reply_err(ioc, opt, type, errp);
+    error = nbd_handle_reply_err(ioc, &reply, errp);
     if (error <= 0) {
         return error;
     }
+    len = reply.length;

-    if (read_sync(ioc, &len, sizeof(len)) != sizeof(len)) {
-        error_setg(errp, "failed to read option length");
-        return -1;
-    }
-    len = be32_to_cpu(len);
-
-    if (type == NBD_REP_ACK) {
+    if (reply.type == NBD_REP_ACK) {
         if (len != 0) {
             error_setg(errp, "length too long for option end");
             return -1;
         }
-    } else if (type == NBD_REP_SERVER) {
+    } else if (reply.type == NBD_REP_SERVER) {
         if (len < sizeof(namelen) || len > NBD_MAX_BUFFER_SIZE) {
-            error_setg(errp, "incorrect option length");
+            error_setg(errp, "incorrect option length %"PRIu32, len);
             return -1;
         }
         if (read_sync(ioc, &namelen, sizeof(namelen)) != sizeof(namelen)) {
@@ -240,7 +270,7 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
         }
     } else {
         error_setg(errp, "Unexpected reply type %" PRIx32 " expected %x",
-                   type, NBD_REP_SERVER);
+                   reply.type, NBD_REP_SERVER);
         return -1;
     }
     return 1;
@@ -251,24 +281,10 @@ static int nbd_receive_query_exports(QIOChannel *ioc,
                                      const char *wantname,
                                      Error **errp)
 {
-    uint64_t magic = cpu_to_be64(NBD_OPTS_MAGIC);
-    uint32_t opt = cpu_to_be32(NBD_OPT_LIST);
-    uint32_t length = 0;
     bool foundExport = false;

     TRACE("Querying export list");
-    if (write_sync(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
-        error_setg(errp, "Failed to send list option magic");
-        return -1;
-    }
-
-    if (write_sync(ioc, &opt, sizeof(opt)) != sizeof(opt)) {
-        error_setg(errp, "Failed to send list option number");
-        return -1;
-    }
-
-    if (write_sync(ioc, &length, sizeof(length)) != sizeof(length)) {
-        error_setg(errp, "Failed to send list option length");
+    if (nbd_send_option_request(ioc, NBD_OPT_LIST, 0, NULL, errp) < 0) {
         return -1;
     }

@@ -314,72 +330,29 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
                                         QCryptoTLSCreds *tlscreds,
                                         const char *hostname, Error **errp)
 {
-    uint64_t magic = cpu_to_be64(NBD_OPTS_MAGIC);
-    uint32_t opt = cpu_to_be32(NBD_OPT_STARTTLS);
-    uint32_t length = 0;
-    uint32_t type;
+    nbd_opt_reply reply;
     QIOChannelTLS *tioc;
     struct NBDTLSHandshakeData data = { 0 };

     TRACE("Requesting TLS from server");
-    if (write_sync(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
-        error_setg(errp, "Failed to send option magic");
-        return NULL;
-    }
-
-    if (write_sync(ioc, &opt, sizeof(opt)) != sizeof(opt)) {
-        error_setg(errp, "Failed to send option number");
-        return NULL;
-    }
-
-    if (write_sync(ioc, &length, sizeof(length)) != sizeof(length)) {
-        error_setg(errp, "Failed to send option length");
-        return NULL;
-    }
-
-    TRACE("Getting TLS reply from server1");
-    if (read_sync(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
-        error_setg(errp, "failed to read option magic");
-        return NULL;
-    }
-    magic = be64_to_cpu(magic);
-    if (magic != NBD_REP_MAGIC) {
-        error_setg(errp, "Unexpected option magic");
-        return NULL;
-    }
-    TRACE("Getting TLS reply from server2");
-    if (read_sync(ioc, &opt, sizeof(opt)) != sizeof(opt)) {
-        error_setg(errp, "failed to read option");
-        return NULL;
-    }
-    opt = be32_to_cpu(opt);
-    if (opt != NBD_OPT_STARTTLS) {
-        error_setg(errp, "Unexpected option type %" PRIx32 " expected %x",
-                   opt, NBD_OPT_STARTTLS);
+    if (nbd_send_option_request(ioc, NBD_OPT_STARTTLS, 0, NULL, errp) < 0) {
         return NULL;
     }

     TRACE("Getting TLS reply from server");
-    if (read_sync(ioc, &type, sizeof(type)) != sizeof(type)) {
-        error_setg(errp, "failed to read option type");
+    if (nbd_receive_option_reply(ioc, NBD_OPT_STARTTLS, &reply, errp) < 0) {
         return NULL;
     }
-    type = be32_to_cpu(type);
-    if (type != NBD_REP_ACK) {
+
+    if (reply.type != NBD_REP_ACK) {
         error_setg(errp, "Server rejected request to start TLS %" PRIx32,
-                   type);
+                   reply.type);
         return NULL;
     }

-    TRACE("Getting TLS reply from server");
-    if (read_sync(ioc, &length, sizeof(length)) != sizeof(length)) {
-        error_setg(errp, "failed to read option length");
-        return NULL;
-    }
-    length = be32_to_cpu(length);
-    if (length != 0) {
+    if (reply.length != 0) {
         error_setg(errp, "Start TLS response was not zero %" PRIu32,
-                   length);
+                   reply.length);
         return NULL;
     }

@@ -466,8 +439,6 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,

     if (magic == NBD_OPTS_MAGIC) {
         uint32_t clientflags = 0;
-        uint32_t opt;
-        uint32_t namesize;
         uint16_t globalflags;
         bool fixedNewStyle = false;

@@ -517,28 +488,13 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
                 goto fail;
             }
         }
-        /* write the export name */
-        magic = cpu_to_be64(magic);
-        if (write_sync(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
-            error_setg(errp, "Failed to send export name magic");
-            goto fail;
-        }
-        opt = cpu_to_be32(NBD_OPT_EXPORT_NAME);
-        if (write_sync(ioc, &opt, sizeof(opt)) != sizeof(opt)) {
-            error_setg(errp, "Failed to send export name option number");
-            goto fail;
-        }
-        namesize = cpu_to_be32(strlen(name));
-        if (write_sync(ioc, &namesize, sizeof(namesize)) !=
-            sizeof(namesize)) {
-            error_setg(errp, "Failed to send export name length");
-            goto fail;
-        }
-        if (write_sync(ioc, (char *)name, strlen(name)) != strlen(name)) {
-            error_setg(errp, "Failed to send export name");
+        /* write the export name request */
+        if (nbd_send_option_request(ioc, NBD_OPT_EXPORT_NAME, -1, name,
+                                    errp) < 0) {
             goto fail;
         }

+        /* Read the response */
         if (read_sync(ioc, &s, sizeof(s)) != sizeof(s)) {
             error_setg(errp, "Failed to read export length");
             goto fail;
-- 
2.5.5

* [Qemu-devel] [PATCH v3 33/44] nbd: Let client skip portions of server reply
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (31 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 32/44] nbd: Share common option-sending code in client Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 34/44] nbd: Less allocation during NBD_OPT_LIST Eric Blake
                   ` (10 subsequent siblings)
  43 siblings, 0 replies; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Paolo Bonzini

The server has a nice helper function nbd_negotiate_drop_sync()
which lets it easily ignore fluff from the client (such as the
payload to an unknown option request).  We can't quite make it
common, since it depends on nbd_negotiate_read() which handles
coroutine magic, but we can copy the idea into the client, which
has places that need to ignore data (such as the description
tacked on the end of NBD_REP_SERVER).
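
The underlying technique, reading unwanted bytes into a scratch buffer
in bounded chunks, can be sketched outside of QEMU as follows.  This is
a simplified illustration using stdio rather than QIOChannel; demo_drop
is a hypothetical name, it always uses a fixed stack buffer (the patch
also falls back to a heap buffer for large sizes), and it returns 0 or
-1 rather than a byte count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Discard size bytes from stream f.
 * Returns 0 on success, -1 if EOF or an error occurs first. */
static int demo_drop(FILE *f, size_t size)
{
    char scratch[1024];
    size_t remaining = size;

    while (remaining > 0) {
        /* Never ask for more than the scratch buffer can hold. */
        size_t want = remaining < sizeof(scratch) ? remaining
                                                  : sizeof(scratch);
        size_t got = fread(scratch, 1, want, f);

        if (got == 0) {
            return -1;
        }
        assert(got <= remaining);
        remaining -= got;
    }
    return 0;
}
```

The important invariant is that each read is capped by the size of the
buffer actually in use, whatever the total amount to skip.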

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alex Bligh <alex@alex.org.uk>

---
v3: rebase
---
 nbd/client.c | 45 ++++++++++++++++++++++++++++++++-------------
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index 6626fa8..d346c86 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -75,6 +75,32 @@ static QTAILQ_HEAD(, NBDExport) exports = QTAILQ_HEAD_INITIALIZER(exports);

 */

+/* Discard size bytes from channel.  Return -errno on failure, or
+ * the number of bytes consumed. */
+static ssize_t drop_sync(QIOChannel *ioc, size_t size)
+{
+    ssize_t ret, dropped = size;
+    char small[1024];
+    char *buffer;
+
+    buffer = sizeof(small) >= size ? small : g_malloc(MIN(65536, size));
+    while (size > 0) {
+        ret = read_sync(ioc, buffer, MIN(65536, size));
+        if (ret < 0) {
+            goto cleanup;
+        }
+        assert(ret <= size);
+        size -= ret;
+    }
+    ret = dropped;
+
+ cleanup:
+    if (buffer != small) {
+        g_free(buffer);
+    }
+    return ret;
+}
+
 /* Send an option request. Return 0 if successful, -1 with errp set if
  * it is impossible to continue. */
 static int nbd_send_option_request(QIOChannel *ioc, uint32_t opt,
@@ -255,18 +281,11 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
         }
         (*name)[namelen] = '\0';
         len -= namelen;
-        if (len) {
-            char *buf = g_malloc(len + 1);
-            if (read_sync(ioc, buf, len) != len) {
-                error_setg(errp, "failed to read export description");
-                g_free(*name);
-                g_free(buf);
-                *name = NULL;
-                return -1;
-            }
-            buf[len] = '\0';
-            TRACE("Ignoring export description: %s", buf);
-            g_free(buf);
+        if (drop_sync(ioc, len) != len) {
+            error_setg(errp, "failed to read export description");
+            g_free(*name);
+            *name = NULL;
+            return -1;
         }
     } else {
         error_setg(errp, "Unexpected reply type %" PRIx32 " expected %x",
@@ -541,7 +560,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
     }

     TRACE("Size is %" PRIu64 ", export flags %" PRIx16, *size, *flags);
-    if (read_sync(ioc, &buf, 124) != 124) {
+    if (drop_sync(ioc, 124) != 124) {
         error_setg(errp, "Failed to read reserved block");
         goto fail;
     }
-- 
2.5.5

* [Qemu-devel] [PATCH v3 34/44] nbd: Less allocation during NBD_OPT_LIST
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (32 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 33/44] nbd: Let client skip portions of server reply Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 35/44] nbd: Support shorter handshake Eric Blake
                   ` (9 subsequent siblings)
  43 siblings, 0 replies; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Paolo Bonzini

Since we know that the maximum name we are willing to accept
is small enough to stack-allocate, rework the iteration over
NBD_OPT_LIST responses to reuse a stack buffer rather than
allocating every time.  Furthermore, we don't even have to
allocate if we know the server's length doesn't match what
we are searching for.
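
The length-first comparison can be illustrated with a standalone sketch
that walks length-prefixed names held in memory.  It is deliberately
simplified: entries here are just a 4-byte big-endian name length
followed by the name (no description), DEMO_MAX_NAME is an assumed cap
standing in for NBD_MAX_NAME_SIZE, and demo_check_entry is a
hypothetical name.  The copy into a stack buffer mirrors the streaming
case, where the name has to be read from the socket before it can be
compared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_NAME 255 /* assumed cap on name length for this sketch */

enum demo_result {
    DEMO_ERROR = -1,
    DEMO_KEEP_LOOKING = 2,
    DEMO_MATCH = 3,
};

/* Examine the entry at *pos within buf[0..size) and advance *pos past it.
 * Names whose length differs from strlen(want) are skipped, not copied. */
static int demo_check_entry(const uint8_t *buf, size_t size, size_t *pos,
                            const char *want)
{
    char name[DEMO_MAX_NAME + 1];
    size_t wantlen = strlen(want);
    uint32_t namelen;

    assert(*pos <= size);
    if (size - *pos < 4) {
        return DEMO_ERROR;
    }
    namelen = ((uint32_t)buf[*pos] << 24) | ((uint32_t)buf[*pos + 1] << 16) |
              ((uint32_t)buf[*pos + 2] << 8) | (uint32_t)buf[*pos + 3];
    *pos += 4;
    if (size - *pos < namelen) {
        return DEMO_ERROR;
    }

    if (namelen != wantlen || namelen > DEMO_MAX_NAME) {
        *pos += namelen; /* cannot match, so skip without copying */
        return DEMO_KEEP_LOOKING;
    }

    memcpy(name, buf + *pos, namelen);
    name[namelen] = '\0';
    *pos += namelen;
    return strcmp(name, want) == 0 ? DEMO_MATCH : DEMO_KEEP_LOOKING;
}
```

A caller loops until it sees DEMO_MATCH, runs out of entries, or gets
DEMO_ERROR, much as nbd_receive_query_exports() loops over the return
value of nbd_receive_list().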

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alex Bligh <alex@alex.org.uk>

---
v3: tweak commit message
---
 nbd/client.c | 130 +++++++++++++++++++++++++++++------------------------------
 1 file changed, 65 insertions(+), 65 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index d346c86..4debb5d 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -230,14 +230,17 @@ static int nbd_handle_reply_err(QIOChannel *ioc, nbd_opt_reply *reply,
     return result;
 }

-static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
+/* Return -1 if unrecoverable error occurs, 0 if NBD_OPT_LIST is
+ * unsupported, 1 if iteration is done, 2 to keep looking, and 3 if
+ * this entry matches want. */
+static int nbd_receive_list(QIOChannel *ioc, const char *want, Error **errp)
 {
     nbd_opt_reply reply;
     uint32_t len;
     uint32_t namelen;
+    char name[NBD_MAX_NAME_SIZE + 1];
     int error;

-    *name = NULL;
     if (nbd_receive_option_reply(ioc, NBD_OPT_LIST, &reply, errp) < 0) {
         return -1;
     }
@@ -252,97 +255,94 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
             error_setg(errp, "length too long for option end");
             return -1;
         }
-    } else if (reply.type == NBD_REP_SERVER) {
-        if (len < sizeof(namelen) || len > NBD_MAX_BUFFER_SIZE) {
-            error_setg(errp, "incorrect option length %"PRIu32, len);
-            return -1;
-        }
-        if (read_sync(ioc, &namelen, sizeof(namelen)) != sizeof(namelen)) {
-            error_setg(errp, "failed to read option name length");
-            return -1;
-        }
-        namelen = be32_to_cpu(namelen);
-        len -= sizeof(namelen);
-        if (len < namelen) {
-            error_setg(errp, "incorrect option name length");
-            return -1;
-        }
-        if (namelen > NBD_MAX_NAME_SIZE) {
-            error_setg(errp, "export name length too long %" PRIu32, namelen);
-            return -1;
-        }
-
-        *name = g_new0(char, namelen + 1);
-        if (read_sync(ioc, *name, namelen) != namelen) {
-            error_setg(errp, "failed to read export name");
-            g_free(*name);
-            *name = NULL;
-            return -1;
-        }
-        (*name)[namelen] = '\0';
-        len -= namelen;
-        if (drop_sync(ioc, len) != len) {
-            error_setg(errp, "failed to read export description");
-            g_free(*name);
-            *name = NULL;
-            return -1;
-        }
-    } else {
+        return 1;
+    } else if (reply.type != NBD_REP_SERVER) {
         error_setg(errp, "Unexpected reply type %" PRIx32 " expected %x",
                    reply.type, NBD_REP_SERVER);
         return -1;
     }
-    return 1;
+
+    if (len < sizeof(namelen) || len > NBD_MAX_BUFFER_SIZE) {
+        error_setg(errp, "incorrect option length %"PRIu32, len);
+        return -1;
+    }
+    if (read_sync(ioc, &namelen, sizeof(namelen)) != sizeof(namelen)) {
+        error_setg(errp, "failed to read option name length");
+        return -1;
+    }
+    namelen = be32_to_cpu(namelen);
+    len -= sizeof(namelen);
+    if (len < namelen) {
+        error_setg(errp, "incorrect option name length");
+        return -1;
+    }
+    if (namelen != strlen(want)) {
+        if (drop_sync(ioc, len) != len) {
+            error_setg(errp, "failed to skip export name with wrong length");
+            return -1;
+        }
+        return 2;
+    }
+
+    assert(namelen < sizeof(name));
+    if (read_sync(ioc, name, namelen) != namelen) {
+        error_setg(errp, "failed to read export name");
+        return -1;
+    }
+    name[namelen] = '\0';
+    len -= namelen;
+    if (drop_sync(ioc, len) != len) {
+        error_setg(errp, "failed to read export description");
+        return -1;
+    }
+    return strcmp(name, want) == 0 ? 3 : 2;
 }


+/* Return -1 on failure, 0 if wantname is an available export. */
 static int nbd_receive_query_exports(QIOChannel *ioc,
                                      const char *wantname,
                                      Error **errp)
 {
     bool foundExport = false;

-    TRACE("Querying export list");
+    TRACE("Querying export list for '%s'", wantname);
     if (nbd_send_option_request(ioc, NBD_OPT_LIST, 0, NULL, errp) < 0) {
         return -1;
     }

     TRACE("Reading available export names");
     while (1) {
-        char *name = NULL;
-        int ret = nbd_receive_list(ioc, &name, errp);
+        int ret = nbd_receive_list(ioc, wantname, errp);

-        if (ret < 0) {
-            g_free(name);
-            name = NULL;
+        switch (ret) {
+        default:
+            /* Server gave unexpected reply */
+            assert(ret < 0);
             return -1;
-        }
-        if (ret == 0) {
+        case 0:
             /* Server doesn't support export listing, so
              * we will just assume an export with our
              * wanted name exists */
-            foundExport = true;
-            break;
-        }
-        if (name == NULL) {
-            TRACE("End of export name list");
+            return 0;
+        case 1:
+            /* Done iterating. */
+            if (!foundExport) {
+                error_setg(errp, "No export with name '%s' available",
+                           wantname);
+                return -1;
+            }
+            return 0;
+        case 2:
+            /* Wasn't this one, keep going. */
             break;
-        }
-        if (g_str_equal(name, wantname)) {
+        case 3:
+            /* Found a match, but must finish parsing reply. */
+            TRACE("Found desired export name '%s'", wantname);
             foundExport = true;
-            TRACE("Found desired export name '%s'", name);
-        } else {
-            TRACE("Ignored export name '%s'", name);
+            break;
         }
-        g_free(name);
-    }
-
-    if (!foundExport) {
-        error_setg(errp, "No export with name '%s' available", wantname);
-        return -1;
     }
-
-    return 0;
 }

 static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
-- 
2.5.5

* [Qemu-devel] [PATCH v3 35/44] nbd: Support shorter handshake
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (33 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 34/44] nbd: Less allocation during NBD_OPT_LIST Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 36/44] nbd: Improve handling of shutdown requests Eric Blake
                   ` (8 subsequent siblings)
  43 siblings, 0 replies; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Paolo Bonzini, Kevin Wolf, Max Reitz

The NBD Protocol allows the server and client to mutually agree
on a shorter handshake (omitting the 124 bytes of reserved zeroes), via
the server advertising NBD_FLAG_NO_ZEROES and the client
acknowledging with NBD_FLAG_C_NO_ZEROES (only possible in
newstyle, whether or not it is fixed newstyle).  It doesn't
shave much off the wire, but we might as well implement it.
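
As a rough illustration of the size difference (a standalone sketch,
not code from this patch; the helper names are made up), the reply
that ends the handshake shrinks from 134 bytes to 10 when both flags
are set:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values as defined by this patch. */
#define NBD_FLAG_NO_ZEROES   (1 << 1) /* server -> client (global flags) */
#define NBD_FLAG_C_NO_ZEROES (1 << 1) /* client -> server (client flags) */

/* Hypothetical helper: the zeroes are skipped only if both sides agree. */
static bool nbd_skip_zeroes(uint16_t globalflags, uint32_t clientflags)
{
    return (globalflags & NBD_FLAG_NO_ZEROES) &&
           (clientflags & NBD_FLAG_C_NO_ZEROES);
}

/* Hypothetical helper: length of the reply that ends the handshake,
 * i.e. 8 bytes of size, 2 bytes of export flags, then 124 reserved
 * zero bytes unless both sides agreed to omit them. */
static size_t nbd_final_reply_len(bool skip_zeroes)
{
    return 8 + 2 + (skip_zeroes ? 0 : 124);
}

/* Example: a new client talking to a new server saves 124 bytes. */
static void nbd_final_reply_len_example(void)
{
    bool skip = nbd_skip_zeroes(NBD_FLAG_NO_ZEROES, NBD_FLAG_C_NO_ZEROES);

    assert(nbd_final_reply_len(skip) == 10);
    assert(nbd_final_reply_len(false) == 134);
}
```

If either side leaves its flag clear, the helper reports the full
134-byte reply, which matches the fallback to the older handshake.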

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alex Bligh <alex@alex.org.uk>

---
v3: rebase
---
 include/block/nbd.h |  6 ++++--
 nbd/client.c        |  8 +++++++-
 nbd/server.c        | 15 +++++++++++----
 3 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 5227ec6..d707761 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -73,11 +73,13 @@ typedef struct nbd_reply nbd_reply;

 /* New-style handshake (global) flags, sent from server to client, and
    control what will happen during handshake phase. */
-#define NBD_FLAG_FIXED_NEWSTYLE     (1 << 0)    /* Fixed newstyle protocol. */
+#define NBD_FLAG_FIXED_NEWSTYLE   (1 << 0) /* Fixed newstyle protocol. */
+#define NBD_FLAG_NO_ZEROES        (1 << 1) /* End handshake without zeroes. */

 /* New-style client flags, sent from client to server to control what happens
    during handshake phase. */
-#define NBD_FLAG_C_FIXED_NEWSTYLE   (1 << 0)    /* Fixed newstyle protocol. */
+#define NBD_FLAG_C_FIXED_NEWSTYLE (1 << 0) /* Fixed newstyle protocol. */
+#define NBD_FLAG_C_NO_ZEROES      (1 << 1) /* End handshake without zeroes. */

 /* Reply types. */
 #define NBD_REP_ACK             (1)             /* Data sending finished. */
diff --git a/nbd/client.c b/nbd/client.c
index 4debb5d..68e9473 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -409,6 +409,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
     char buf[256];
     uint64_t magic, s;
     int rc;
+    bool zeroes = true;

     TRACE("Receiving negotiation tlscreds=%p hostname=%s.",
           tlscreds, hostname ? hostname : "<null>");
@@ -473,6 +474,11 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
             TRACE("Server supports fixed new style");
             clientflags |= NBD_FLAG_C_FIXED_NEWSTYLE;
         }
+        if (globalflags & NBD_FLAG_NO_ZEROES) {
+            zeroes = false;
+            TRACE("Server supports no zeroes");
+            clientflags |= NBD_FLAG_C_NO_ZEROES;
+        }
         /* client requested flags */
         clientflags = cpu_to_be32(clientflags);
         if (write_sync(ioc, &clientflags, sizeof(clientflags)) !=
@@ -560,7 +566,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
     }

     TRACE("Size is %" PRIu64 ", export flags %" PRIx16, *size, *flags);
-    if (drop_sync(ioc, 124) != 124) {
+    if (zeroes && drop_sync(ioc, 124) != 124) {
         error_setg(errp, "Failed to read reserved block");
         goto fail;
     }
diff --git a/nbd/server.c b/nbd/server.c
index 4435d37..dadc928 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -80,6 +80,7 @@ struct NBDClient {
     int refcount;
     void (*close)(NBDClient *client);

+    bool no_zeroes;
     NBDExport *exp;
     QCryptoTLSCreds *tlscreds;
     char *tlsaclname;
@@ -409,6 +410,11 @@ static int nbd_negotiate_options(NBDClient *client)
         fixedNewstyle = true;
         flags &= ~NBD_FLAG_C_FIXED_NEWSTYLE;
     }
+    if (flags & NBD_FLAG_C_NO_ZEROES) {
+        TRACE("Client supports no zeroes at handshake end");
+        client->no_zeroes = true;
+        flags &= ~NBD_FLAG_C_NO_ZEROES;
+    }
     if (flags != 0) {
         TRACE("Unknown client flags 0x%" PRIx32 " received", flags);
         return -EIO;
@@ -556,6 +562,7 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
     const uint16_t myflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
                               NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA);
     bool oldStyle;
+    size_t len;

     /* Old style negotiation header without options
         [ 0 ..   7]   passwd       ("NBDMAGIC")
@@ -572,7 +579,7 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
         ....options sent....
         [18 ..  25]   size
         [26 ..  27]   export flags
-        [28 .. 151]   reserved     (0)
+        [28 .. 151]   reserved     (0, omit if no_zeroes)
      */

     qio_channel_set_blocking(client->ioc, false, NULL);
@@ -589,7 +596,7 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
         stw_be_p(buf + 26, client->exp->nbdflags | myflags);
     } else {
         stq_be_p(buf + 8, NBD_OPTS_MAGIC);
-        stw_be_p(buf + 16, NBD_FLAG_FIXED_NEWSTYLE);
+        stw_be_p(buf + 16, NBD_FLAG_FIXED_NEWSTYLE | NBD_FLAG_NO_ZEROES);
     }

     if (oldStyle) {
@@ -614,8 +621,8 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)

         stq_be_p(buf + 18, client->exp->size);
         stw_be_p(buf + 26, client->exp->nbdflags | myflags);
-        if (nbd_negotiate_write(client->ioc, buf + 18, sizeof(buf) - 18) !=
-            sizeof(buf) - 18) {
+        len = client->no_zeroes ? 10 : sizeof(buf) - 18;
+        if (nbd_negotiate_write(client->ioc, buf + 18, len) != len) {
             LOG("write failed");
             goto fail;
         }
-- 
2.5.5

* [Qemu-devel] [PATCH v3 36/44] nbd: Improve handling of shutdown requests
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (34 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 35/44] nbd: Support shorter handshake Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-25  9:47   ` Alex Bligh
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 37/44] nbd: Create struct for tracking export info Eric Blake
                   ` (7 subsequent siblings)
  43 siblings, 1 reply; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Paolo Bonzini, Kevin Wolf, Max Reitz

NBD commit 6d34500b clarified how clients and servers are supposed
to behave before closing a connection. It added NBD_REP_ERR_SHUTDOWN
(for the server to announce it is about to go away during option
haggling, so the client should quit sending NBD_OPT_* other than
NBD_OPT_ABORT) and ESHUTDOWN (for the server to announce it is about
to go away during transmission, so the client should quit sending
NBD_CMD_* other than NBD_CMD_DISC).  It also clarified that
NBD_OPT_ABORT gets a reply, while NBD_CMD_DISC does not.

This patch merely adds the missing reply to NBD_OPT_ABORT and teaches
the client to recognize server errors.  Actually teaching the server
to send NBD_REP_ERR_SHUTDOWN or ESHUTDOWN would require knowing that
the server has been requested to shut down soon (maybe we could do
that by installing a SIGINT handler in qemu-nbd, which transitions
from RUNNING to a new state that waits for the client to react,
rather than just quitting outright).
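
For reference, the reply encoding and the intended client reaction can
be sketched as below.  The NBD_REP_* values come from this patch and
NBD_OPT_ABORT/NBD_OPT_LIST from the protocol; the two helpers are
hypothetical, and the "only NBD_OPT_ABORT after shutdown" policy is
what the protocol asks of clients, not something this patch enforces:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Error replies have bit 31 set; values as in include/block/nbd.h. */
#define NBD_REP_ERR(value)      ((UINT32_C(1) << 31) | (value))
#define NBD_REP_ACK             (1)
#define NBD_REP_ERR_SHUTDOWN    NBD_REP_ERR(7)

#define NBD_OPT_ABORT           (2)
#define NBD_OPT_LIST            (3)

/* Hypothetical helper: does this reply type signal an error? */
static bool nbd_rep_is_err(uint32_t type)
{
    return (type & (UINT32_C(1) << 31)) != 0;
}

/* Hypothetical client policy: once the server has answered with
 * NBD_REP_ERR_SHUTDOWN, only NBD_OPT_ABORT should still be sent. */
static bool nbd_may_send_option(uint32_t last_reply_type, uint32_t opt)
{
    if (last_reply_type == NBD_REP_ERR_SHUTDOWN) {
        return opt == NBD_OPT_ABORT;
    }
    return true;
}

/* Example: after a shutdown notice, NBD_OPT_LIST is no longer allowed. */
static void nbd_shutdown_example(void)
{
    assert(nbd_rep_is_err(NBD_REP_ERR_SHUTDOWN));
    assert(!nbd_may_send_option(NBD_REP_ERR_SHUTDOWN, NBD_OPT_LIST));
    assert(nbd_may_send_option(NBD_REP_ERR_SHUTDOWN, NBD_OPT_ABORT));
}
```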

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 include/block/nbd.h | 13 +++++++++----
 nbd/nbd-internal.h  |  1 +
 nbd/client.c        | 16 ++++++++++++++++
 nbd/server.c        | 16 +++++++++++++++-
 4 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index d707761..2fd1a67 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -82,12 +82,17 @@ typedef struct nbd_reply nbd_reply;
 #define NBD_FLAG_C_NO_ZEROES      (1 << 1) /* End handshake without zeroes. */

 /* Reply types. */
+#define NBD_REP_ERR(value) ((UINT32_C(1) << 31) | (value))
+
 #define NBD_REP_ACK             (1)             /* Data sending finished. */
 #define NBD_REP_SERVER          (2)             /* Export description. */
-#define NBD_REP_ERR_UNSUP       ((UINT32_C(1) << 31) | 1) /* Unknown option. */
-#define NBD_REP_ERR_POLICY      ((UINT32_C(1) << 31) | 2) /* Server denied */
-#define NBD_REP_ERR_INVALID     ((UINT32_C(1) << 31) | 3) /* Invalid length. */
-#define NBD_REP_ERR_TLS_REQD    ((UINT32_C(1) << 31) | 5) /* TLS required */
+
+#define NBD_REP_ERR_UNSUP       NBD_REP_ERR(1)  /* Unknown option. */
+#define NBD_REP_ERR_POLICY      NBD_REP_ERR(2)  /* Server denied */
+#define NBD_REP_ERR_INVALID     NBD_REP_ERR(3)  /* Invalid length */
+#define NBD_REP_ERR_PLATFORM    NBD_REP_ERR(4)  /* Not compiled in */
+#define NBD_REP_ERR_TLS_REQD    NBD_REP_ERR(5)  /* TLS required */
+#define NBD_REP_ERR_SHUTDOWN    NBD_REP_ERR(7)  /* Server shutting down */

 /* Request flags, sent from client to server during transmission phase */
 #define NBD_CMD_FLAG_FUA        (1 << 0)
diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index 95069db..0d40b1f 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -91,6 +91,7 @@
 #define NBD_ENOMEM     12
 #define NBD_EINVAL     22
 #define NBD_ENOSPC     28
+#define NBD_ESHUTDOWN  108

 static inline ssize_t read_sync(QIOChannel *ioc, void *buffer, size_t size)
 {
diff --git a/nbd/client.c b/nbd/client.c
index 68e9473..4140d13 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -34,6 +34,8 @@ static int nbd_errno_to_system_errno(int err)
         return ENOMEM;
     case NBD_ENOSPC:
         return ENOSPC;
+    case NBD_ESHUTDOWN:
+        return ESHUTDOWN;
     default:
         TRACE("Squashing unexpected error %d to EINVAL", err);
         /* fallthrough */
@@ -210,11 +212,21 @@ static int nbd_handle_reply_err(QIOChannel *ioc, nbd_opt_reply *reply,
                    reply->option);
         break;

+    case NBD_REP_ERR_PLATFORM:
+        error_setg(errp, "Server lacks support for option %" PRIx32,
+                   reply->option);
+        break;
+
     case NBD_REP_ERR_TLS_REQD:
         error_setg(errp, "TLS negotiation required before option %" PRIx32,
                    reply->option);
         break;

+    case NBD_REP_ERR_SHUTDOWN:
+        error_setg(errp, "Server shutting down before option %" PRIx32,
+                   reply->option);
+        break;
+
     default:
         error_setg(errp, "Unknown error code when asking for option %" PRIx32,
                    reply->option);
@@ -754,6 +766,10 @@ ssize_t nbd_receive_reply(QIOChannel *ioc, struct nbd_reply *reply)
         LOG("invalid magic (got 0x%" PRIx32 ")", magic);
         return -EINVAL;
     }
+    if (reply->error == ESHUTDOWN) {
+        LOG("server shutting down");
+        return -EINVAL;
+    }
     return 0;
 }

diff --git a/nbd/server.c b/nbd/server.c
index dadc928..fa6a994 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -39,6 +39,8 @@ static int system_errno_to_nbd_errno(int err)
     case EFBIG:
     case ENOSPC:
         return NBD_ENOSPC;
+    case ESHUTDOWN:
+        return NBD_ESHUTDOWN;
     case EINVAL:
     default:
         return NBD_EINVAL;
@@ -484,6 +486,10 @@ static int nbd_negotiate_options(NBDClient *client)
                 if (ret < 0) {
                     return ret;
                 }
+                /* Let the client keep trying, unless they asked to quit */
+                if (clientflags == NBD_OPT_ABORT) {
+                    return -EINVAL;
+                }
                 break;
             }
         } else if (fixedNewstyle) {
@@ -496,7 +502,15 @@ static int nbd_negotiate_options(NBDClient *client)
                 break;

             case NBD_OPT_ABORT:
-                return -EINVAL;
+                /* NBD spec says we must reply before disconnecting,
+                 * but that we must also tolerate guests that don't
+                 * wait for our reply. */
+                ret = nbd_negotiate_send_rep(client->ioc, NBD_REP_ACK,
+                                             clientflags);
+                if (!ret) {
+                    ret = -EINVAL;
+                }
+                return ret;

             case NBD_OPT_EXPORT_NAME:
                 return nbd_negotiate_handle_export_name(client, length);
-- 
2.5.5

* [Qemu-devel] [PATCH v3 37/44] nbd: Create struct for tracking export info
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (35 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 36/44] nbd: Improve handling of shutdown requests Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 38/44] block: Add blk_get_opt_transfer_length() Eric Blake
                   ` (6 subsequent siblings)
  43 siblings, 0 replies; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Paolo Bonzini, Kevin Wolf, Max Reitz

The NBD Protocol is introducing some additional information
about exports, such as minimum request size and alignment, as
well as an advertised maximum request size.  It will be easier
to feed this information back to the block layer if we gather
all the information into a struct, rather than adding yet more
pointer parameters during negotiation.
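
A minimal standalone sketch of the idea follows.  The struct fields
match the ones added here; the decode helper is hypothetical and only
illustrates filling the struct from the 10 big-endian bytes (size,
then flags) that the server sends:

```c
#include <assert.h>
#include <stdint.h>

/* Same fields as the struct added by this patch. */
typedef struct NbdExportInfo {
    uint64_t size;
    uint16_t flags;
} NbdExportInfo;

/* Hypothetical helper: decode 8 bytes of big-endian size followed by
 * 2 bytes of big-endian export flags into host byte order. */
static void nbd_export_info_decode(const uint8_t buf[10], NbdExportInfo *info)
{
    int i;

    assert(buf && info);
    info->size = 0;
    for (i = 0; i < 8; i++) {
        info->size = (info->size << 8) | buf[i];
    }
    info->flags = (uint16_t)((buf[8] << 8) | buf[9]);
}
```

New per-export details (such as block size limits) can then become
extra struct members without touching every caller's signature again.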

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/nbd-client.h  |  3 +--
 include/block/nbd.h | 15 +++++++++++----
 block/nbd-client.c  | 10 ++++------
 block/nbd.c         |  2 +-
 nbd/client.c        | 43 +++++++++++++++++++++++--------------------
 qemu-nbd.c          | 10 ++++------
 6 files changed, 44 insertions(+), 39 deletions(-)

diff --git a/block/nbd-client.h b/block/nbd-client.h
index 1243612..0867147 100644
--- a/block/nbd-client.h
+++ b/block/nbd-client.h
@@ -20,8 +20,7 @@
 typedef struct NbdClientSession {
     QIOChannelSocket *sioc; /* The master data channel */
     QIOChannel *ioc; /* The current I/O channel which may differ (eg TLS) */
-    uint16_t nbdflags;
-    off_t size;
+    NbdExportInfo info;

     CoMutex send_mutex;
     CoMutex free_sema;
diff --git a/include/block/nbd.h b/include/block/nbd.h
index 2fd1a67..3fa7996 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -117,17 +117,24 @@ enum {
  * aren't overflowing some other buffer. */
 #define NBD_MAX_NAME_SIZE 256

+/* Details collected by NBD_OPT_EXPORT_NAME and NBD_OPT_GO */
+struct NbdExportInfo {
+    uint64_t size;
+    uint16_t flags;
+};
+typedef struct NbdExportInfo NbdExportInfo;
+
 ssize_t nbd_wr_syncv(QIOChannel *ioc,
                      struct iovec *iov,
                      size_t niov,
                      size_t offset,
                      size_t length,
                      bool do_read);
-int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
+int nbd_receive_negotiate(QIOChannel *ioc, const char *name,
                           QCryptoTLSCreds *tlscreds, const char *hostname,
-                          QIOChannel **outioc,
-                          off_t *size, Error **errp);
-int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t flags, off_t size);
+                          QIOChannel **outioc, NbdExportInfo *info,
+                          Error **errp);
+int nbd_init(int fd, QIOChannelSocket *sioc, NbdExportInfo *info);
 ssize_t nbd_send_request(QIOChannel *ioc, struct nbd_request *request);
 ssize_t nbd_receive_reply(QIOChannel *ioc, struct nbd_reply *reply);
 int nbd_client(int fd);
diff --git a/block/nbd-client.c b/block/nbd-client.c
index 285025d..f20219b 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -251,7 +251,7 @@ static int nbd_co_writev_1(BlockDriverState *bs, int64_t sector_num,
     struct nbd_reply reply;
     ssize_t ret;

-    if ((*flags & BDRV_REQ_FUA) && (client->nbdflags & NBD_FLAG_SEND_FUA)) {
+    if ((*flags & BDRV_REQ_FUA) && (client->info.flags & NBD_FLAG_SEND_FUA)) {
         *flags &= ~BDRV_REQ_FUA;
         request.flags |= NBD_CMD_FLAG_FUA;
     }
@@ -316,7 +316,7 @@ int nbd_client_co_flush(BlockDriverState *bs)
     struct nbd_reply reply;
     ssize_t ret;

-    if (!(client->nbdflags & NBD_FLAG_SEND_FLUSH)) {
+    if (!(client->info.flags & NBD_FLAG_SEND_FLUSH)) {
         return 0;
     }

@@ -342,7 +342,7 @@ int nbd_client_co_discard(BlockDriverState *bs, int64_t sector_num,
     struct nbd_reply reply;
     ssize_t ret;

-    if (!(client->nbdflags & NBD_FLAG_SEND_TRIM)) {
+    if (!(client->info.flags & NBD_FLAG_SEND_TRIM)) {
         return 0;
     }
     request.from = sector_num * 512;
@@ -403,10 +403,8 @@ int nbd_client_init(BlockDriverState *bs,
     qio_channel_set_blocking(QIO_CHANNEL(sioc), true, NULL);

     ret = nbd_receive_negotiate(QIO_CHANNEL(sioc), export,
-                                &client->nbdflags,
                                 tlscreds, hostname,
-                                &client->ioc,
-                                &client->size, errp);
+                                &client->ioc, &client->info, errp);
     if (ret < 0) {
         logout("Failed to negotiate with the NBD server\n");
         return ret;
diff --git a/block/nbd.c b/block/nbd.c
index f7ea3b3..34db83e 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -406,7 +406,7 @@ static int64_t nbd_getlength(BlockDriverState *bs)
 {
     BDRVNBDState *s = bs->opaque;

-    return s->client.size;
+    return s->client.info.size;
 }

 static void nbd_detach_aio_context(BlockDriverState *bs)
diff --git a/nbd/client.c b/nbd/client.c
index 4140d13..89fa2c3 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -413,13 +413,13 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
 }


-int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
+int nbd_receive_negotiate(QIOChannel *ioc, const char *name,
                           QCryptoTLSCreds *tlscreds, const char *hostname,
-                          QIOChannel **outioc,
-                          off_t *size, Error **errp)
+                          QIOChannel **outioc, NbdExportInfo *info,
+                          Error **errp)
 {
     char buf[256];
-    uint64_t magic, s;
+    uint64_t magic;
     int rc;
     bool zeroes = true;

@@ -532,17 +532,19 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
         }

         /* Read the response */
-        if (read_sync(ioc, &s, sizeof(s)) != sizeof(s)) {
+        if (read_sync(ioc, &info->size, sizeof(info->size)) !=
+            sizeof(info->size)) {
             error_setg(errp, "Failed to read export length");
             goto fail;
         }
-        *size = be64_to_cpu(s);
+        be64_to_cpus(&info->size);

-        if (read_sync(ioc, flags, sizeof(*flags)) != sizeof(*flags)) {
+        if (read_sync(ioc, &info->flags, sizeof(info->flags)) !=
+            sizeof(info->flags)) {
             error_setg(errp, "Failed to read export flags");
             goto fail;
         }
-        be16_to_cpus(flags);
+        be16_to_cpus(&info->flags);
     } else if (magic == NBD_CLIENT_MAGIC) {
         uint32_t oldflags;

@@ -555,12 +557,12 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
             goto fail;
         }

-        if (read_sync(ioc, &s, sizeof(s)) != sizeof(s)) {
+        if (read_sync(ioc, &info->size, sizeof(info->size)) !=
+            sizeof(info->size)) {
             error_setg(errp, "Failed to read export length");
             goto fail;
         }
-        *size = be64_to_cpu(s);
-        TRACE("Size is %" PRIu64, *size);
+        be64_to_cpus(&info->size);

         if (read_sync(ioc, &oldflags, sizeof(oldflags)) != sizeof(oldflags)) {
             error_setg(errp, "Failed to read export flags");
@@ -571,13 +573,14 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
             error_setg(errp, "Unexpected export flags %0x" PRIx32, oldflags);
             goto fail;
         }
-        *flags = oldflags;
+        info->flags = oldflags;
     } else {
         error_setg(errp, "Bad magic received");
         goto fail;
     }

-    TRACE("Size is %" PRIu64 ", export flags %" PRIx16, *size, *flags);
+    TRACE("Size is %" PRIu64 ", export flags %" PRIx16,
+          info->size, info->flags);
     if (zeroes && drop_sync(ioc, 124) != 124) {
         error_setg(errp, "Failed to read reserved block");
         goto fail;
@@ -589,11 +592,11 @@ fail:
 }

 #ifdef __linux__
-int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t flags, off_t size)
+int nbd_init(int fd, QIOChannelSocket *sioc, NbdExportInfo *info)
 {
-    unsigned long sectors = size / BDRV_SECTOR_SIZE;
-    if (size / BDRV_SECTOR_SIZE != sectors) {
-        LOG("Export size %lld too large for 32-bit kernel", (long long) size);
+    unsigned long sectors = info->size / BDRV_SECTOR_SIZE;
+    if (info->size / BDRV_SECTOR_SIZE != sectors) {
+        LOG("Export size %" PRId64 " too large for 32-bit kernel", info->size);
         return -E2BIG;
     }

@@ -625,9 +628,9 @@ int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t flags, off_t size)
         return -serrno;
     }

-    if (ioctl(fd, NBD_SET_FLAGS, (unsigned long) flags) < 0) {
+    if (ioctl(fd, NBD_SET_FLAGS, (unsigned long) info->flags) < 0) {
         if (errno == ENOTTY) {
-            int read_only = (flags & NBD_FLAG_READ_ONLY) != 0;
+            int read_only = (info->flags & NBD_FLAG_READ_ONLY) != 0;
             TRACE("Setting readonly attribute");

             if (ioctl(fd, BLKROSET, (unsigned long) &read_only) < 0) {
@@ -685,7 +688,7 @@ int nbd_disconnect(int fd)
 }

 #else
-int nbd_init(int fd, QIOChannelSocket *ioc, uint16_t flags, off_t size)
+int nbd_init(int fd, QIOChannelSocket *ioc, NbdExportInfo *info)
 {
     return -ENOTSUP;
 }
diff --git a/qemu-nbd.c b/qemu-nbd.c
index 01eb7e4..a7cadcb 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -244,8 +244,7 @@ static void *show_parts(void *arg)
 static void *nbd_client_thread(void *arg)
 {
     char *device = arg;
-    off_t size;
-    uint16_t nbdflags;
+    NbdExportInfo info;
     QIOChannelSocket *sioc;
     int fd;
     int ret;
@@ -260,9 +259,8 @@ static void *nbd_client_thread(void *arg)
         goto out;
     }

-    ret = nbd_receive_negotiate(QIO_CHANNEL(sioc), NULL, &nbdflags,
-                                NULL, NULL, NULL,
-                                &size, &local_error);
+    ret = nbd_receive_negotiate(QIO_CHANNEL(sioc), NULL,
+                                NULL, NULL, NULL, &info, &local_error);
     if (ret < 0) {
         if (local_error) {
             error_report_err(local_error);
@@ -277,7 +275,7 @@ static void *nbd_client_thread(void *arg)
         goto out_socket;
     }

-    ret = nbd_init(fd, sioc, nbdflags, size);
+    ret = nbd_init(fd, sioc, &info);
     if (ret < 0) {
         goto out_fd;
     }
-- 
2.5.5

* [Qemu-devel] [PATCH v3 38/44] block: Add blk_get_opt_transfer_length()
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (36 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 37/44] nbd: Create struct for tracking export info Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 39/44] nbd: Implement NBD_OPT_GO on server Eric Blake
                   ` (5 subsequent siblings)
  43 siblings, 0 replies; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Kevin Wolf, Max Reitz

The NBD protocol would like to advertise the optimal I/O
size to the client; but it would be a layering violation to
peek into blk_bs(blk)->bl, when we only have a BB.

I just copied the existing blk_get_max_transfer_length() in
reading a value from the top BDS; I have no idea if
bdrv_refresh_limits() properly picks a size valid over the
entire BDS chain as part of its recursion, but if not, fixing that
would require an audit of existing code in addition to the new
accessor function added here.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 include/sysemu/block-backend.h |  1 +
 block/block-backend.c          | 11 +++++++++++
 2 files changed, 12 insertions(+)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index bf04086..76b3647 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -145,6 +145,7 @@ void blk_lock_medium(BlockBackend *blk, bool locked);
 void blk_eject(BlockBackend *blk, bool eject_flag);
 int blk_get_flags(BlockBackend *blk);
 int blk_get_max_transfer_length(BlockBackend *blk);
+int blk_get_opt_transfer_length(BlockBackend *blk);
 int blk_get_max_iov(BlockBackend *blk);
 void blk_set_guest_block_size(BlockBackend *blk, int align);
 void *blk_try_blockalign(BlockBackend *blk, size_t size);
diff --git a/block/block-backend.c b/block/block-backend.c
index 1c3b495..e76e61d 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1272,6 +1272,17 @@ int blk_get_max_transfer_length(BlockBackend *blk)
     }
 }

+int blk_get_opt_transfer_length(BlockBackend *blk)
+{
+    BlockDriverState *bs = blk_bs(blk);
+
+    if (bs) {
+        return bs->bl.opt_transfer_length;
+    } else {
+        return 0;
+    }
+}
+
 int blk_get_max_iov(BlockBackend *blk)
 {
     return blk->root->bs->bl.max_iov;
-- 
2.5.5

* [Qemu-devel] [PATCH v3 39/44] nbd: Implement NBD_OPT_GO on server
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (37 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 38/44] block: Add blk_get_opt_transfer_length() Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 40/44] nbd: Implement NBD_OPT_GO on client Eric Blake
                   ` (4 subsequent siblings)
  43 siblings, 0 replies; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Paolo Bonzini, Kevin Wolf, Max Reitz

NBD_OPT_EXPORT_NAME is lousy: it requires us to close the connection
rather than report an error.  Upstream NBD recently added NBD_OPT_GO
as the improved version of the option that does what we want, along
with NBD_OPT_INFO that returns the same information but does not
transition to transmission phase.
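
The difference between the two options can be modelled as below.  This
is a simplified, hypothetical stand-in for the handler's return value
(0 to keep negotiating, 1 to enter transmission phase), assuming the
reply itself was sent successfully; it is not the code in this patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Option numbers as added to nbd/nbd-internal.h by this patch. */
#define NBD_OPT_INFO (6)
#define NBD_OPT_GO   (7)

/* Hypothetical model: returns 0 if the server should wait for the
 * next option, 1 if it should move into transmission phase. */
static int nbd_info_next_step(uint32_t opt, bool export_found)
{
    assert(opt == NBD_OPT_INFO || opt == NBD_OPT_GO);
    if (!export_found) {
        /* An error reply was sent; the client may try another name. */
        return 0;
    }
    return opt == NBD_OPT_GO ? 1 : 0;
}
```

Unlike NBD_OPT_EXPORT_NAME, an unknown export here leaves the
connection open, so the client can report the error or retry.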

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v3: revamp to match latest version of NBD protocol
---
 include/block/nbd.h |   7 +++
 nbd/nbd-internal.h  |   2 +
 nbd/server.c        | 136 ++++++++++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 136 insertions(+), 9 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 3fa7996..05c0e48 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -86,14 +86,21 @@ typedef struct nbd_reply nbd_reply;

 #define NBD_REP_ACK             (1)             /* Data sending finished. */
 #define NBD_REP_SERVER          (2)             /* Export description. */
+#define NBD_REP_INFO            (3)             /* NBD_OPT_INFO/GO. */

 #define NBD_REP_ERR_UNSUP       NBD_REP_ERR(1)  /* Unknown option. */
 #define NBD_REP_ERR_POLICY      NBD_REP_ERR(2)  /* Server denied */
 #define NBD_REP_ERR_INVALID     NBD_REP_ERR(3)  /* Invalid length */
 #define NBD_REP_ERR_PLATFORM    NBD_REP_ERR(4)  /* Not compiled in */
 #define NBD_REP_ERR_TLS_REQD    NBD_REP_ERR(5)  /* TLS required */
+#define NBD_REP_ERR_UNKNOWN     NBD_REP_ERR(6)  /* Export unknown */
 #define NBD_REP_ERR_SHUTDOWN    NBD_REP_ERR(7)  /* Server shutting down */

+/* Info types, used during NBD_REP_INFO */
+#define NBD_INFO_EXPORT         0
+#define NBD_INFO_NAME           1
+#define NBD_INFO_DESCRIPTION    2
+
 /* Request flags, sent from client to server during transmission phase */
 #define NBD_CMD_FLAG_FUA        (1 << 0)

diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index 0d40b1f..c597bb8 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -80,6 +80,8 @@
 #define NBD_OPT_LIST            (3)
 #define NBD_OPT_PEEK_EXPORT     (4)
 #define NBD_OPT_STARTTLS        (5)
+#define NBD_OPT_INFO            (6)
+#define NBD_OPT_GO              (7)

 /* NBD errors are based on errno numbers, so there is a 1:1 mapping,
  * but only a limited set of errno values is specified in the protocol.
diff --git a/nbd/server.c b/nbd/server.c
index fa6a994..1edb5f3 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -273,6 +273,8 @@ static int nbd_negotiate_send_rep_list(QIOChannel *ioc, NBDExport *exp)
     return 0;
 }

+/* Send a sequence of replies to NBD_OPT_LIST.
+ * Return -errno to kill connection, 0 to continue negotiation. */
 static int nbd_negotiate_handle_list(NBDClient *client, uint32_t length)
 {
     NBDExport *exp;
@@ -295,6 +297,8 @@ static int nbd_negotiate_handle_list(NBDClient *client, uint32_t length)
     return nbd_negotiate_send_rep(client->ioc, NBD_REP_ACK, NBD_OPT_LIST);
 }

+/* Send a reply to NBD_OPT_EXPORT_NAME.
+ * Return -errno to kill connection, 0 to end negotiation. */
 static int nbd_negotiate_handle_export_name(NBDClient *client, uint32_t length)
 {
     int rc = -EINVAL;
@@ -330,6 +334,104 @@ fail:
 }


+/* Handle NBD_OPT_INFO and NBD_OPT_GO.
+ * Return -errno to kill connection, 0 if ready for next option, and 1
+ * to move into transmission phase.  */
+static int nbd_negotiate_handle_info(NBDClient *client, uint32_t length,
+                                     uint32_t opt, uint16_t myflags)
+{
+    int rc;
+    char name[NBD_MAX_NAME_SIZE + 1];
+    NBDExport *exp;
+    uint16_t type;
+    uint64_t size;
+    uint16_t flags;
+
+    /* Client sends:
+        [20 ..  xx]   export name (length bytes)
+     */
+    TRACE("Checking length");
+    if (length >= sizeof(name)) {
+        if (nbd_negotiate_drop_sync(client->ioc, length) != length) {
+            return -EIO;
+        }
+        return nbd_negotiate_send_rep(client->ioc, NBD_REP_ERR_INVALID, opt);
+    }
+    if (nbd_negotiate_read(client->ioc, name, length) != length) {
+        LOG("read failed");
+        return -EIO;
+    }
+    name[length] = '\0';
+
+    TRACE("Client requested info on export '%s'", name);
+
+    exp = nbd_export_find(name);
+    if (!exp) {
+        return nbd_negotiate_send_rep(client->ioc, NBD_REP_ERR_UNKNOWN, opt);
+    }
+
+    if (exp->description) {
+        size_t len = strlen(exp->description);
+
+        rc = nbd_negotiate_send_rep_len(client->ioc, NBD_REP_INFO, opt,
+                                        sizeof(type) + len);
+        if (rc < 0) {
+            return rc;
+        }
+        type = cpu_to_be16(NBD_INFO_DESCRIPTION);
+        if (nbd_negotiate_write(client->ioc, &type, sizeof(type)) !=
+            sizeof(type)) {
+            LOG("write failed");
+            return -EIO;
+        }
+        if (nbd_negotiate_write(client->ioc, exp->description, len) != len) {
+            LOG("write failed");
+            return -EIO;
+        }
+    }
+
+    rc = nbd_negotiate_send_rep_len(client->ioc, NBD_REP_INFO, opt,
+                                    sizeof(type) + sizeof(size) +
+                                    sizeof(flags));
+    if (rc < 0) {
+        return rc;
+    }
+
+    type = cpu_to_be16(NBD_INFO_EXPORT);
+    size = cpu_to_be64(exp->size);
+    flags = cpu_to_be16(exp->nbdflags | myflags);
+
+    if (nbd_negotiate_write(client->ioc, &type, sizeof(type)) !=
+        sizeof(type)) {
+        LOG("write failed");
+        return -EIO;
+    }
+    if (nbd_negotiate_write(client->ioc, &size, sizeof(size)) !=
+        sizeof(size)) {
+        LOG("write failed");
+        return -EIO;
+    }
+    if (nbd_negotiate_write(client->ioc, &flags, sizeof(flags)) !=
+        sizeof(flags)) {
+        LOG("write failed");
+        return -EIO;
+    }
+
+    rc = nbd_negotiate_send_rep(client->ioc, NBD_REP_ACK, opt);
+    if (rc < 0) {
+        return rc;
+    }
+
+    if (opt == NBD_OPT_GO) {
+        client->exp = exp;
+        QTAILQ_INSERT_TAIL(&client->exp->clients, client, next);
+        nbd_export_get(client->exp);
+        rc = 1;
+    }
+    return rc;
+}
+
+
 static QIOChannel *nbd_negotiate_handle_starttls(NBDClient *client,
                                                  uint32_t length)
 {
@@ -381,7 +483,10 @@ static QIOChannel *nbd_negotiate_handle_starttls(NBDClient *client,
 }


-static int nbd_negotiate_options(NBDClient *client)
+/* Loop over all client options, during fixed newstyle negotiation.
+ * Return -errno to kill connection, 0 on successful NBD_OPT_EXPORT_NAME,
+ * 1 on successful NBD_OPT_GO.  */
+static int nbd_negotiate_options(NBDClient *client, uint16_t myflags)
 {
     uint32_t flags;
     bool fixedNewstyle = false;
@@ -515,6 +620,16 @@ static int nbd_negotiate_options(NBDClient *client)
             case NBD_OPT_EXPORT_NAME:
                 return nbd_negotiate_handle_export_name(client, length);

+            case NBD_OPT_INFO:
+            case NBD_OPT_GO:
+                ret = nbd_negotiate_handle_info(client, length, clientflags,
+                                                myflags);
+                if (ret) {
+                    assert(ret < 0 || clientflags == NBD_OPT_GO);
+                    return ret;
+                }
+                break;
+
             case NBD_OPT_STARTTLS:
                 if (nbd_negotiate_drop_sync(client->ioc, length) != length) {
                     return -EIO;
@@ -627,18 +742,21 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
             LOG("write failed");
             goto fail;
         }
-        rc = nbd_negotiate_options(client);
-        if (rc != 0) {
+        rc = nbd_negotiate_options(client, myflags);
+        if (rc < 0) {
             LOG("option negotiation failed");
             goto fail;
         }

-        stq_be_p(buf + 18, client->exp->size);
-        stw_be_p(buf + 26, client->exp->nbdflags | myflags);
-        len = client->no_zeroes ? 10 : sizeof(buf) - 18;
-        if (nbd_negotiate_write(client->ioc, buf + 18, len) != len) {
-            LOG("write failed");
-            goto fail;
+        if (!rc) {
+            /* If options ended with NBD_OPT_GO, we already sent this. */
+            stq_be_p(buf + 18, client->exp->size);
+            stw_be_p(buf + 26, client->exp->nbdflags | myflags);
+            len = client->no_zeroes ? 10 : sizeof(buf) - 18;
+            if (nbd_negotiate_write(client->ioc, buf + 18, len) != len) {
+                LOG("write failed");
+                goto fail;
+            }
         }
     }

-- 
2.5.5


* [Qemu-devel] [PATCH v3 40/44] nbd: Implement NBD_OPT_GO on client
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (38 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 39/44] nbd: Implement NBD_OPT_GO on server Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-25 10:31   ` Alex Bligh
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 41/44] nbd: Implement NBD_CMD_WRITE_ZEROES on server Eric Blake
                   ` (3 subsequent siblings)
  43 siblings, 1 reply; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Paolo Bonzini

NBD_OPT_EXPORT_NAME is lousy: it doesn't have any sane error
reporting.  Upstream NBD recently added NBD_OPT_GO as the
improved version of the option that does what we want: it
reports sane errors on failures (including when a server
requires TLS but does not have NBD_OPT_GO!), and on success
it provides at least as much info as NBD_OPT_EXPORT_NAME sends.
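
For readers following along, here is a minimal sketch of decoding one
NBD_REP_INFO payload that carries NBD_INFO_EXPORT.  It is not code from
this patch: parse_info_export() and load_be() are hypothetical helpers,
and the layout (16-bit type, 64-bit size, 16-bit flags, all big-endian)
is taken from what the server side of this series writes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INFO_EXPORT 0   /* same value as NBD_INFO_EXPORT in the series */

/* Hypothetical helper: read an n-byte big-endian integer. */
static uint64_t load_be(const uint8_t *p, size_t n)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Hypothetical helper: return 0 on success, -1 if the payload is not a
 * well-formed NBD_INFO_EXPORT (wrong length, wrong type, or flags of 0,
 * which the client in this series treats as a broken server). */
static int parse_info_export(const uint8_t *buf, size_t len,
                             uint64_t *size, uint16_t *flags)
{
    assert(buf && size && flags);
    if (len != 2 + 8 + 2 || load_be(buf, 2) != INFO_EXPORT) {
        return -1;
    }
    *size = load_be(buf + 2, 8);
    *flags = (uint16_t)load_be(buf + 10, 2);
    return *flags ? 0 : -1;
}
```

The real client reads the fields straight off the channel rather than
from a buffer; the sketch only illustrates the length and flags checks.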

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v3: revamp to match latest version of NBD protocol
---
 nbd/nbd-internal.h |   3 ++
 nbd/client.c       | 120 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 121 insertions(+), 2 deletions(-)

diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index c597bb8..1935102 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -55,8 +55,11 @@
  * https://github.com/yoe/nbd/blob/master/doc/proto.md
  */

+/* Size of all NBD_CMD_* requests, without payload */
 #define NBD_REQUEST_SIZE        (4 + 2 + 2 + 8 + 8 + 4)
+/* Size of the reply to an NBD_CMD_* request, without payload */
 #define NBD_REPLY_SIZE          (4 + 4 + 8)
+
 #define NBD_REQUEST_MAGIC       0x25609513
 #define NBD_REPLY_MAGIC         0x67446698
 #define NBD_OPTS_MAGIC          0x49484156454F5054LL
diff --git a/nbd/client.c b/nbd/client.c
index 89fa2c3..dac4f29 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -222,6 +222,11 @@ static int nbd_handle_reply_err(QIOChannel *ioc, nbd_opt_reply *reply,
                    reply->option);
         break;

+    case NBD_REP_ERR_UNKNOWN:
+        error_setg(errp, "Requested export not available for option %" PRIx32,
+                   reply->option);
+        break;
+
     case NBD_REP_ERR_SHUTDOWN:
         error_setg(errp, "Server shutting down before option %" PRIx32,
                    reply->option);
@@ -311,6 +316,103 @@ static int nbd_receive_list(QIOChannel *ioc, const char *want, Error **errp)
 }


+/* Returns -1 if NBD_OPT_GO proves the export @wantname cannot be
+ * used, 0 if NBD_OPT_GO is unsupported (fall back to NBD_OPT_LIST and
+ * NBD_OPT_EXPORT_NAME in that case), and > 0 if the export is good to
+ * go (with @info populated). */
+static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
+                      NbdExportInfo *info, Error **errp)
+{
+    nbd_opt_reply reply;
+    uint32_t len;
+    uint16_t type;
+    int error;
+
+    /* The protocol requires that the server send NBD_INFO_EXPORT with
+     * non-zero flags (at least NBD_FLAG_HAS_FLAGS must be set), so
+     * flags still being 0 is a sign of a broken server. */
+    info->flags = 0;
+
+    TRACE("Attempting NBD_OPT_GO for export '%s'", wantname);
+    if (nbd_send_option_request(ioc, NBD_OPT_GO, -1, wantname, errp) < 0) {
+        return -1;
+    }
+
+    TRACE("Reading export info");
+    while (1) {
+        if (nbd_receive_option_reply(ioc, NBD_OPT_GO, &reply, errp) < 0) {
+            return -1;
+        }
+        error = nbd_handle_reply_err(ioc, &reply, errp);
+        if (error <= 0) {
+            return error;
+        }
+        len = reply.length;
+
+        if (reply.type == NBD_REP_ACK) {
+            /* Server is done sending info and moved into transmission
+               phase, but make sure it sent flags */
+            if (len) {
+                error_setg(errp, "server sent invalid NBD_REP_ACK");
+                return -1;
+            }
+            if (!info->flags) {
+                error_setg(errp, "broken server omitted NBD_INFO_EXPORT");
+                return -1;
+            }
+            TRACE("export is good to go");
+            return 1;
+        }
+        if (reply.type != NBD_REP_INFO) {
+            error_setg(errp, "unexpected reply type %" PRIx32 ", expected %x",
+                       reply.type, NBD_REP_INFO);
+            return -1;
+        }
+        if (len < sizeof(type)) {
+            error_setg(errp, "NBD_REP_INFO length %" PRIu32 " is too short",
+                       len);
+            return -1;
+        }
+        if (read_sync(ioc, &type, sizeof(type)) != sizeof(type)) {
+            error_setg(errp, "failed to read info type");
+            return -1;
+        }
+        len -= sizeof(type);
+        be16_to_cpus(&type);
+        switch (type) {
+        case NBD_INFO_EXPORT:
+            if (len != sizeof(info->size) + sizeof(info->flags)) {
+                error_setg(errp, "remaining export info len %" PRIu32
+                           " is unexpected size", len);
+                return -1;
+            }
+            if (read_sync(ioc, &info->size, sizeof(info->size)) !=
+                sizeof(info->size)) {
+                error_setg(errp, "failed to read info size");
+                return -1;
+            }
+            be64_to_cpus(&info->size);
+            if (read_sync(ioc, &info->flags, sizeof(info->flags)) !=
+                sizeof(info->flags)) {
+                error_setg(errp, "failed to read info flags");
+                return -1;
+            }
+            be16_to_cpus(&info->flags);
+            TRACE("Size is %" PRIu64 ", export flags %" PRIx16,
+                  info->size, info->flags);
+            break;
+
+        default:
+            TRACE("ignoring unknown export info %" PRIu16, type);
+            if (drop_sync(ioc, len) != len) {
+                error_setg(errp, "Failed to read info payload");
+                return -1;
+            }
+            break;
+        }
+    }
+}
+
 /* Return -1 on failure, 0 if wantname is an available export. */
 static int nbd_receive_query_exports(QIOChannel *ioc,
                                      const char *wantname,
@@ -515,11 +617,25 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name,
             name = "";
         }
         if (fixedNewStyle) {
+            int result;
+
+            /* Try NBD_OPT_GO first - if it works, we are done (it
+             * also gives us a good message if the server requires
+             * TLS).  If it is not available, fall back to
+             * NBD_OPT_LIST for nicer error messages about a missing
+             * export, then use NBD_OPT_EXPORT_NAME.  */
+            result = nbd_opt_go(ioc, name, info, errp);
+            if (result < 0) {
+                goto fail;
+            }
+            if (result > 0) {
+                return 0;
+            }
             /* Check our desired export is present in the
              * server export list. Since NBD_OPT_EXPORT_NAME
              * cannot return an error message, running this
-             * query gives us good error reporting if the
-             * server required TLS
+             * query gives us better error reporting if the
+             * export name is not available.
              */
             if (nbd_receive_query_exports(ioc, name, errp) < 0) {
                 goto fail;
-- 
2.5.5


* [Qemu-devel] [PATCH v3 41/44] nbd: Implement NBD_CMD_WRITE_ZEROES on server
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (39 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 40/44] nbd: Implement NBD_OPT_GO on client Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-23  9:00   ` Pavel Borzenkov
  2016-04-25 12:11   ` Alex Bligh
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 42/44] nbd: Implement NBD_CMD_WRITE_ZEROES on client Eric Blake
                   ` (2 subsequent siblings)
  43 siblings, 2 replies; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Paolo Bonzini, Kevin Wolf, Max Reitz

Upstream NBD protocol recently added the ability to efficiently
write zeroes without having to send the zeroes over the wire,
along with a flag to control whether the client wants a hole.
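
As a rough illustration of the flag handling described above, the
sketch below mirrors the two server-side checks and the mapping onto
block-layer flags.  The NBD_CMD_* values are copied from this patch;
REQ_FUA and REQ_MAY_UNMAP are hypothetical stand-ins for the
BDRV_REQ_* flags, and both function names are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in this patch. */
#define NBD_CMD_FLAG_FUA      (1 << 0)
#define NBD_CMD_FLAG_NO_HOLE  (1 << 1)
#define NBD_CMD_WRITE_ZEROES  5

/* Hypothetical stand-ins for BDRV_REQ_FUA / BDRV_REQ_MAY_UNMAP. */
#define REQ_FUA        (1 << 0)
#define REQ_MAY_UNMAP  (1 << 1)

/* Reject unknown flag bits, and NO_HOLE on anything but WRITE_ZEROES. */
static bool request_flags_valid(uint16_t type, uint16_t flags)
{
    if (flags & ~(NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_NO_HOLE)) {
        return false;
    }
    if (type != NBD_CMD_WRITE_ZEROES && (flags & NBD_CMD_FLAG_NO_HOLE)) {
        return false;
    }
    return true;
}

/* A hole may be punched unless the client explicitly asked for none. */
static int write_zeroes_backend_flags(uint16_t flags)
{
    int out = 0;
    if (flags & NBD_CMD_FLAG_FUA) {
        out |= REQ_FUA;
    }
    if (!(flags & NBD_CMD_FLAG_NO_HOLE)) {
        out |= REQ_MAY_UNMAP;
    }
    return out;
}
```

Note the inverted sense: the wire flag says "no hole", while the block
layer flag says "may unmap", so the default request allows unmapping.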

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v3: abandon NBD_CMD_CLOSE extension, rebase to use blk_pwrite_zeroes
---
 include/block/nbd.h |  7 +++++--
 nbd/server.c        | 42 ++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 05c0e48..1072d9e 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -70,6 +70,7 @@ typedef struct nbd_reply nbd_reply;
 #define NBD_FLAG_SEND_FUA       (1 << 3)        /* Send FUA (Force Unit Access) */
 #define NBD_FLAG_ROTATIONAL     (1 << 4)        /* Use elevator algorithm - rotational media */
 #define NBD_FLAG_SEND_TRIM      (1 << 5)        /* Send TRIM (discard) */
+#define NBD_FLAG_SEND_WRITE_ZEROES (1 << 6)     /* Send WRITE_ZEROES */

 /* New-style handshake (global) flags, sent from server to client, and
    control what will happen during handshake phase. */
@@ -102,7 +103,8 @@ typedef struct nbd_reply nbd_reply;
 #define NBD_INFO_DESCRIPTION    2

 /* Request flags, sent from client to server during transmission phase */
-#define NBD_CMD_FLAG_FUA        (1 << 0)
+#define NBD_CMD_FLAG_FUA        (1 << 0) /* 'force unit access' during write */
+#define NBD_CMD_FLAG_NO_HOLE    (1 << 1) /* don't punch hole on zero run */

 /* Supported request types */
 enum {
@@ -110,7 +112,8 @@ enum {
     NBD_CMD_WRITE = 1,
     NBD_CMD_DISC = 2,
     NBD_CMD_FLUSH = 3,
-    NBD_CMD_TRIM = 4
+    NBD_CMD_TRIM = 4,
+    NBD_CMD_WRITE_ZEROES = 5,
 };

 #define NBD_DEFAULT_PORT	10809
diff --git a/nbd/server.c b/nbd/server.c
index 1edb5f3..563afb2 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -689,7 +689,8 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
     char buf[8 + 8 + 8 + 128];
     int rc;
     const uint16_t myflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
-                              NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA);
+                              NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA |
+                              NBD_FLAG_SEND_WRITE_ZEROES);
     bool oldStyle;
     size_t len;

@@ -1199,11 +1200,17 @@ static ssize_t nbd_co_receive_request(NBDRequest *req,
         rc = -EINVAL;
         goto out;
     }
-    if (request->flags & ~NBD_CMD_FLAG_FUA) {
+    if (request->flags & ~(NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_NO_HOLE)) {
         LOG("unsupported flags (got 0x%x)", request->flags);
         rc = -EINVAL;
         goto out;
     }
+    if (request->type != NBD_CMD_WRITE_ZEROES &&
+        (request->flags & NBD_CMD_FLAG_NO_HOLE)) {
+        LOG("unexpected flags (got 0x%x)", request->flags);
+        rc = -EINVAL;
+        goto out;
+    }

     rc = 0;

@@ -1308,6 +1315,37 @@ static void nbd_trip(void *opaque)
         }
         break;

+    case NBD_CMD_WRITE_ZEROES:
+        TRACE("Request type is WRITE_ZEROES");
+
+        if (exp->nbdflags & NBD_FLAG_READ_ONLY) {
+            TRACE("Server is read-only, return error");
+            reply.error = EROFS;
+            goto error_reply;
+        }
+
+        TRACE("Writing to device");
+
+        flags = 0;
+        if (request.flags & NBD_CMD_FLAG_FUA) {
+            flags |= BDRV_REQ_FUA;
+        }
+        if (!(request.flags & NBD_CMD_FLAG_NO_HOLE)) {
+            flags |= BDRV_REQ_MAY_UNMAP;
+        }
+        ret = blk_pwrite_zeroes(exp->blk, request.from + exp->dev_offset,
+                                request.len, flags);
+        if (ret < 0) {
+            LOG("writing to file failed");
+            reply.error = -ret;
+            goto error_reply;
+        }
+
+        if (nbd_co_send_reply(req, &reply, 0) < 0) {
+            goto out;
+        }
+        break;
+
     case NBD_CMD_DISC:
         /* unreachable, thanks to special case in nbd_co_receive_request() */
         abort();
-- 
2.5.5


* [Qemu-devel] [PATCH v3 42/44] nbd: Implement NBD_CMD_WRITE_ZEROES on client
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (40 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 41/44] nbd: Implement NBD_CMD_WRITE_ZEROES on server Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-25 12:12   ` Alex Bligh
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 43/44] nbd: Implement NBD_OPT_BLOCK_SIZE on server Eric Blake
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 44/44] nbd: Implement NBD_OPT_BLOCK_SIZE on client Eric Blake
  43 siblings, 1 reply; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Paolo Bonzini, Kevin Wolf, Max Reitz

Upstream NBD protocol recently added the ability to efficiently
write zeroes without having to send the zeroes over the wire,
along with a flag to control whether the client wants a hole.

The generic block code takes care of falling back to the obvious
write of lots of zeroes if we return -ENOTSUP because the server
does not have WRITE_ZEROES.
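
The client-side decision can be sketched as follows.  This is only an
illustration under stated assumptions: prepare_write_zeroes() is a
hypothetical helper, REQ_FUA and REQ_MAY_UNMAP stand in for the
BDRV_REQ_* flags, and the NBD_* values are the ones used in this
series.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Values as used in this series. */
#define NBD_FLAG_SEND_FUA           (1 << 3)
#define NBD_FLAG_SEND_WRITE_ZEROES  (1 << 6)
#define NBD_CMD_FLAG_FUA            (1 << 0)
#define NBD_CMD_FLAG_NO_HOLE        (1 << 1)

/* Hypothetical stand-ins for BDRV_REQ_FUA / BDRV_REQ_MAY_UNMAP. */
#define REQ_FUA        (1 << 0)
#define REQ_MAY_UNMAP  (1 << 1)

/* Return -ENOTSUP if the server did not advertise WRITE_ZEROES, so the
 * caller can fall back to writing explicit zeroes.  Otherwise fill in
 * *cmd_flags and clear REQ_FUA from *req_flags when the server will
 * honor FUA itself; a REQ_FUA bit left set means "flush afterwards". */
static int prepare_write_zeroes(uint16_t export_flags, int *req_flags,
                                uint16_t *cmd_flags)
{
    assert(req_flags && cmd_flags);
    *cmd_flags = 0;
    if (!(export_flags & NBD_FLAG_SEND_WRITE_ZEROES)) {
        return -ENOTSUP;
    }
    if ((*req_flags & REQ_FUA) && (export_flags & NBD_FLAG_SEND_FUA)) {
        *req_flags &= ~REQ_FUA;
        *cmd_flags |= NBD_CMD_FLAG_FUA;
    }
    if (!(*req_flags & REQ_MAY_UNMAP)) {
        *cmd_flags |= NBD_CMD_FLAG_NO_HOLE;
    }
    return 0;
}
```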

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v3: rebase, tell block layer about our support
---
 block/nbd-client.h |  2 ++
 block/nbd-client.c | 34 ++++++++++++++++++++++++++++++++++
 block/nbd.c        | 24 ++++++++++++++++++++++++
 3 files changed, 60 insertions(+)

diff --git a/block/nbd-client.h b/block/nbd-client.h
index 0867147..07630ab 100644
--- a/block/nbd-client.h
+++ b/block/nbd-client.h
@@ -46,6 +46,8 @@ void nbd_client_close(BlockDriverState *bs);
 int nbd_client_co_discard(BlockDriverState *bs, int64_t sector_num,
                           int nb_sectors);
 int nbd_client_co_flush(BlockDriverState *bs);
+int nbd_client_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
+                               int nb_sectors, int *flags);
 int nbd_client_co_writev(BlockDriverState *bs, int64_t sector_num,
                          int nb_sectors, QEMUIOVector *qiov, int *flags);
 int nbd_client_co_readv(BlockDriverState *bs, int64_t sector_num,
diff --git a/block/nbd-client.c b/block/nbd-client.c
index f20219b..2b6ac27 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -291,6 +291,40 @@ int nbd_client_co_readv(BlockDriverState *bs, int64_t sector_num,
     return nbd_co_readv_1(bs, sector_num, nb_sectors, qiov, offset);
 }

+int nbd_client_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
+                               int nb_sectors, int *flags)
+{
+    ssize_t ret;
+    NbdClientSession *client = nbd_get_client_session(bs);
+    struct nbd_request request = { .type = NBD_CMD_WRITE_ZEROES };
+    struct nbd_reply reply;
+
+    if (!(client->info.flags & NBD_FLAG_SEND_WRITE_ZEROES)) {
+        return -ENOTSUP;
+    }
+
+    if ((*flags & BDRV_REQ_FUA) && (client->info.flags & NBD_FLAG_SEND_FUA)) {
+        *flags &= ~BDRV_REQ_FUA;
+        request.flags |= NBD_CMD_FLAG_FUA;
+    }
+    if (!(*flags & BDRV_REQ_MAY_UNMAP)) {
+        request.flags |= NBD_CMD_FLAG_NO_HOLE;
+    }
+
+    request.from = sector_num * 512;
+    request.len = nb_sectors * 512;
+
+    nbd_coroutine_start(client, &request);
+    ret = nbd_co_send_request(bs, &request, NULL, 0);
+    if (ret < 0) {
+        reply.error = -ret;
+    } else {
+        nbd_co_receive_reply(client, &request, &reply, NULL, 0);
+    }
+    nbd_coroutine_end(client, &request);
+    return -reply.error;
+}
+
 int nbd_client_co_writev(BlockDriverState *bs, int64_t sector_num,
                          int nb_sectors, QEMUIOVector *qiov, int *flags)
 {
diff --git a/block/nbd.c b/block/nbd.c
index 34db83e..5172039 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -355,6 +355,26 @@ static int nbd_co_readv(BlockDriverState *bs, int64_t sector_num,
     return nbd_client_co_readv(bs, sector_num, nb_sectors, qiov);
 }

+static int nbd_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
+                               int nb_sectors, BdrvRequestFlags orig_flags)
+{
+    int flags = orig_flags;
+    int ret;
+
+    ret = nbd_client_co_write_zeroes(bs, sector_num, nb_sectors, &flags);
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* The flag wasn't sent to the server, so we need to emulate it with an
+     * explicit flush */
+    if (flags & BDRV_REQ_FUA) {
+        ret = nbd_client_co_flush(bs);
+    }
+
+    return ret;
+}
+
 static int nbd_co_writev_flags(BlockDriverState *bs, int64_t sector_num,
                                int nb_sectors, QEMUIOVector *qiov, int flags)
 {
@@ -388,6 +408,7 @@ static int nbd_co_flush(BlockDriverState *bs)
 static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
 {
     bs->bl.max_discard = UINT32_MAX >> BDRV_SECTOR_BITS;
+    bs->bl.max_write_zeroes = UINT32_MAX >> BDRV_SECTOR_BITS;
     bs->bl.max_transfer_length = UINT32_MAX >> BDRV_SECTOR_BITS;
 }

@@ -476,6 +497,7 @@ static BlockDriver bdrv_nbd = {
     .bdrv_parse_filename        = nbd_parse_filename,
     .bdrv_file_open             = nbd_open,
     .bdrv_co_readv              = nbd_co_readv,
+    .bdrv_co_write_zeroes       = nbd_co_write_zeroes,
     .bdrv_co_writev             = nbd_co_writev,
     .bdrv_co_writev_flags       = nbd_co_writev_flags,
     .supported_write_flags      = BDRV_REQ_FUA,
@@ -496,6 +518,7 @@ static BlockDriver bdrv_nbd_tcp = {
     .bdrv_parse_filename        = nbd_parse_filename,
     .bdrv_file_open             = nbd_open,
     .bdrv_co_readv              = nbd_co_readv,
+    .bdrv_co_write_zeroes       = nbd_co_write_zeroes,
     .bdrv_co_writev             = nbd_co_writev,
     .bdrv_co_writev_flags       = nbd_co_writev_flags,
     .supported_write_flags      = BDRV_REQ_FUA,
@@ -516,6 +539,7 @@ static BlockDriver bdrv_nbd_unix = {
     .bdrv_parse_filename        = nbd_parse_filename,
     .bdrv_file_open             = nbd_open,
     .bdrv_co_readv              = nbd_co_readv,
+    .bdrv_co_write_zeroes       = nbd_co_write_zeroes,
     .bdrv_co_writev             = nbd_co_writev,
     .bdrv_co_writev_flags       = nbd_co_writev_flags,
     .supported_write_flags      = BDRV_REQ_FUA,
-- 
2.5.5


* [Qemu-devel] [PATCH v3 43/44] nbd: Implement NBD_OPT_BLOCK_SIZE on server
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (41 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 42/44] nbd: Implement NBD_CMD_WRITE_ZEROES on client Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-25 12:16   ` Alex Bligh
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 44/44] nbd: Implement NBD_OPT_BLOCK_SIZE on client Eric Blake
  43 siblings, 1 reply; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Paolo Bonzini, Kevin Wolf, Max Reitz

The upstream NBD Protocol has defined a new extension to allow
the server to advertise block sizes to the client, as well as
a way for the client to inform the server that it intends to
obey block sizes.

Thanks to a recent fix, our minimum transfer size is always
1 (the block layer takes care of read-modify-write on our
behalf), although down the road we could advertise a minimum
of 512, based on our usage patterns, to a client that is
willing to honor block sizes.  Meanwhile,
advertising our absolute maximum transfer size of 32M will
help newer clients avoid EINVAL failures.
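
The arithmetic behind the advertised preferred and maximum sizes can
be sketched as below.  The function names are hypothetical; the 512
and 32M constants stand in for BDRV_SECTOR_SIZE and
NBD_MAX_BUFFER_SIZE, and the inputs are assumed to be sector counts
as reported by the block layer (0 meaning "no limit known").

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE      512u                   /* stands in for BDRV_SECTOR_SIZE */
#define MAX_BUFFER_SIZE  (32u * 1024 * 1024)    /* the 32M cap mentioned above */

/* Preferred size: at least 4096 bytes, larger if the backend says so. */
static uint32_t preferred_block(uint32_t opt_sectors)
{
    uint32_t bytes = opt_sectors * SECTOR_SIZE;
    return bytes > 4096 ? bytes : 4096;
}

/* Maximum size: at most 32M, smaller if the backend has a tighter limit. */
static uint32_t maximum_block(uint32_t max_sectors)
{
    if (max_sectors && max_sectors < MAX_BUFFER_SIZE / SECTOR_SIZE) {
        return max_sectors * SECTOR_SIZE;
    }
    return MAX_BUFFER_SIZE;
}
```

Comparing against MAX_BUFFER_SIZE / SECTOR_SIZE before multiplying
keeps the byte count from overflowing 32 bits.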

We do not reject clients for using the older NBD_OPT_EXPORT_NAME;
we are no worse off for those clients than we used to be. But
for clients new enough to use NBD_OPT_GO, we require the client
to first send us NBD_OPT_BLOCK_SIZE to prove it knows about
sizing constraints; otherwise we fail with NBD_REP_ERR_BLOCK_SIZE_REQD.
All existing released qemu clients (whether old-style or new, at
least by the end of this series) already honor our limits, and
will still connect; so at most, this would reject a new non-qemu
client that doesn't intend to obey limits (and that client could
still use NBD_OPT_EXPORT_NAME to bypass our rejection).

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 include/block/nbd.h |  2 ++
 nbd/nbd-internal.h  |  1 +
 nbd/server.c        | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 65 insertions(+)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 1072d9e..a5c68df 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -96,11 +96,13 @@ typedef struct nbd_reply nbd_reply;
 #define NBD_REP_ERR_TLS_REQD    NBD_REP_ERR(5)  /* TLS required */
 #define NBD_REP_ERR_UNKNOWN     NBD_REP_ERR(6)  /* Export unknown */
 #define NBD_REP_ERR_SHUTDOWN    NBD_REP_ERR(7)  /* Server shutting down */
+#define NBD_REP_ERR_BLOCK_SIZE_REQD NBD_REP_ERR(8) /* Missing OPT_BLOCK_SIZE */

 /* Info types, used during NBD_REP_INFO */
 #define NBD_INFO_EXPORT         0
 #define NBD_INFO_NAME           1
 #define NBD_INFO_DESCRIPTION    2
+#define NBD_INFO_BLOCK_SIZE     3

 /* Request flags, sent from client to server during transmission phase */
 #define NBD_CMD_FLAG_FUA        (1 << 0) /* 'force unit access' during write */
diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index 1935102..1354182 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -85,6 +85,7 @@
 #define NBD_OPT_STARTTLS        (5)
 #define NBD_OPT_INFO            (6)
 #define NBD_OPT_GO              (7)
+#define NBD_OPT_BLOCK_SIZE      (9)

 /* NBD errors are based on errno numbers, so there is a 1:1 mapping,
  * but only a limited set of errno values is specified in the protocol.
diff --git a/nbd/server.c b/nbd/server.c
index 563afb2..86d1e2d 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -83,6 +83,7 @@ struct NBDClient {
     void (*close)(NBDClient *client);

     bool no_zeroes;
+    bool block_size;
     NBDExport *exp;
     QCryptoTLSCreds *tlscreds;
     char *tlsaclname;
@@ -346,6 +347,7 @@ static int nbd_negotiate_handle_info(NBDClient *client, uint32_t length,
     uint16_t type;
     uint64_t size;
     uint16_t flags;
+    uint32_t block;

     /* Client sends:
         [20 ..  xx]   export name (length bytes)
@@ -391,6 +393,57 @@ static int nbd_negotiate_handle_info(NBDClient *client, uint32_t length,
     }

     rc = nbd_negotiate_send_rep_len(client->ioc, NBD_REP_INFO, opt,
+                                    sizeof(type) + 3 * sizeof(block));
+    if (rc < 0) {
+        return rc;
+    }
+
+    type = cpu_to_be16(NBD_INFO_BLOCK_SIZE);
+    if (nbd_negotiate_write(client->ioc, &type, sizeof(type)) !=
+        sizeof(type)) {
+        LOG("write failed");
+        return -EIO;
+    }
+    /* minimum - Always 1, because we use blk_pread().
+     * TODO: Advertise 512 if client used NBD_OPT_BLOCK_SIZE? */
+    block = cpu_to_be32(1);
+    if (nbd_negotiate_write(client->ioc, &block, sizeof(block)) !=
+        sizeof(block)) {
+        LOG("write failed");
+        return -EIO;
+    }
+    /* preferred - At least 4096, but larger as appropriate. */
+    block = blk_get_opt_transfer_length(exp->blk) * BDRV_SECTOR_SIZE;
+    block = cpu_to_be32(MAX(4096, block));
+    if (nbd_negotiate_write(client->ioc, &block, sizeof(block)) !=
+        sizeof(block)) {
+        LOG("write failed");
+        return -EIO;
+    }
+    /* maximum - At most 32M, but smaller as appropriate. */
+    block = blk_get_max_transfer_length(exp->blk);
+    if (block && block < NBD_MAX_BUFFER_SIZE / BDRV_SECTOR_SIZE) {
+        block *= BDRV_SECTOR_SIZE;
+    } else {
+        block = NBD_MAX_BUFFER_SIZE;
+    }
+    block = cpu_to_be32(block);
+    if (nbd_negotiate_write(client->ioc, &block, sizeof(block)) !=
+        sizeof(block)) {
+        LOG("write failed");
+        return -EIO;
+    }
+
+    if (!client->block_size) {
+        /* The client is new enough to use NBD_OPT_GO, but forgot to
+         * tell us that it plans to obey block sizes. Since we fail
+         * hard on oversize requests, it's better to reject such a
+         * client up front.  */
+        return nbd_negotiate_send_rep(client->ioc, NBD_REP_ERR_BLOCK_SIZE_REQD,
+                                      opt);
+    }
+
+    rc = nbd_negotiate_send_rep_len(client->ioc, NBD_REP_INFO, opt,
                                     sizeof(type) + sizeof(size) +
                                     sizeof(flags));
     if (rc < 0) {
@@ -630,6 +683,15 @@ static int nbd_negotiate_options(NBDClient *client, uint16_t myflags)
                 }
                 break;

+            case NBD_OPT_BLOCK_SIZE:
+                client->block_size = true;
+                ret = nbd_negotiate_send_rep(client->ioc, NBD_REP_ACK,
+                                             clientflags);
+                if (ret < 0) {
+                    return ret;
+                }
+                break;
+
             case NBD_OPT_STARTTLS:
                 if (nbd_negotiate_drop_sync(client->ioc, length) != length) {
                     return -EIO;
-- 
2.5.5


* [Qemu-devel] [PATCH v3 44/44] nbd: Implement NBD_OPT_BLOCK_SIZE on client
  2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
                   ` (42 preceding siblings ...)
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 43/44] nbd: Implement NBD_OPT_BLOCK_SIZE on server Eric Blake
@ 2016-04-22 23:40 ` Eric Blake
  2016-04-25 12:19   ` Alex Bligh
  43 siblings, 1 reply; 67+ messages in thread
From: Eric Blake @ 2016-04-22 23:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, alex, Paolo Bonzini, Kevin Wolf, Max Reitz

The upstream NBD Protocol has defined a new extension to allow
the server to advertise block sizes to the client, as well as
a way for the client to inform the server that it intends to
obey block sizes.

Pass any received sizes on to the block layer.

Use the minimum block size as the sector size we pass to the
kernel. This has the nice effect of cooperating with (non-qemu)
servers that don't do read-modify-write when exposing a block
device with 4k sectors, and it also lets us access a file larger
than 2T on a 32-bit kernel.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
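A rough illustration of the 2T remark above, not part of the patch: the
kernel takes the export size as a block count in an unsigned long, which
is 32 bits wide on a 32-bit kernel. The helper name count_blocks_32() is
hypothetical and exists only for this sketch; it models that limit with a
uint32_t so the arithmetic can be checked on any host.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper for illustration only (not in the patch).
 * Models a 32-bit 'unsigned long' block count: returns 0 and stores the
 * number of whole blocks, or -1 if the count does not fit in 32 bits. */
static int count_blocks_32(uint64_t export_size, uint32_t sector_size,
                           uint32_t *blocks)
{
    uint64_t n;

    assert(sector_size != 0);
    n = export_size / sector_size;   /* any trailing partial block is ignored */
    if (n > UINT32_MAX) {
        return -1;                   /* would be truncated on a 32-bit kernel */
    }
    *blocks = (uint32_t)n;
    return 0;
}
```

With 512-byte sectors a 4 TiB export needs 2^33 blocks and is rejected,
while with a 4096-byte minimum block size the same export needs only 2^30
blocks and fits.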
 include/block/nbd.h |  3 +++
 block/nbd-client.c  |  3 +++
 block/nbd.c         | 17 +++++++++---
 nbd/client.c        | 74 ++++++++++++++++++++++++++++++++++++++++++++++++-----
 4 files changed, 87 insertions(+), 10 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index a5c68df..27a6854 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -133,6 +133,9 @@ enum {
 struct NbdExportInfo {
     uint64_t size;
     uint16_t flags;
+    uint32_t min_block;
+    uint32_t opt_block;
+    uint32_t max_block;
 };
 typedef struct NbdExportInfo NbdExportInfo;

diff --git a/block/nbd-client.c b/block/nbd-client.c
index 2b6ac27..602a8ab 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -443,6 +443,9 @@ int nbd_client_init(BlockDriverState *bs,
         logout("Failed to negotiate with the NBD server\n");
         return ret;
     }
+    if (client->info.min_block > bs->request_alignment) {
+        bs->request_alignment = client->info.min_block;
+    }

     qemu_co_mutex_init(&client->send_mutex);
     qemu_co_mutex_init(&client->free_sema);
diff --git a/block/nbd.c b/block/nbd.c
index 5172039..bb7df55 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -407,9 +407,20 @@ static int nbd_co_flush(BlockDriverState *bs)

 static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
 {
-    bs->bl.max_discard = UINT32_MAX >> BDRV_SECTOR_BITS;
-    bs->bl.max_write_zeroes = UINT32_MAX >> BDRV_SECTOR_BITS;
-    bs->bl.max_transfer_length = UINT32_MAX >> BDRV_SECTOR_BITS;
+    NbdClientSession *s = nbd_get_client_session(bs);
+    int max = UINT32_MAX >> BDRV_SECTOR_BITS;
+
+    if (s->info.max_block) {
+        max = s->info.max_block >> BDRV_SECTOR_BITS;
+    }
+    bs->bl.max_discard = max;
+    bs->bl.max_write_zeroes = max;
+    bs->bl.max_transfer_length = max;
+
+    if (s->info.opt_block &&
+        s->info.opt_block >> BDRV_SECTOR_BITS > bs->bl.opt_transfer_length) {
+        bs->bl.opt_transfer_length = s->info.opt_block >> BDRV_SECTOR_BITS;
+    }
 }

 static int nbd_co_discard(BlockDriverState *bs, int64_t sector_num,
diff --git a/nbd/client.c b/nbd/client.c
index dac4f29..24f6b0b 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -232,6 +232,11 @@ static int nbd_handle_reply_err(QIOChannel *ioc, nbd_opt_reply *reply,
                    reply->option);
         break;

+    case NBD_REP_ERR_BLOCK_SIZE_REQD:
+        error_setg(errp, "Server wants OPT_BLOCK_SIZE before option %" PRIx32,
+                   reply->option);
+        break;
+
     default:
         error_setg(errp, "Unknown error code when asking for option %" PRIx32,
                    reply->option);
@@ -333,6 +338,21 @@ static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
      * flags still 0 is a witness of a broken server. */
     info->flags = 0;

+    /* Some servers use NBD_OPT_GO to advertise non-default block
+     * sizes, and require that we first use NBD_OPT_BLOCK_SIZE to
+     * agree to that. */
+    TRACE("Attempting NBD_OPT_BLOCK_SIZE");
+    if (nbd_send_option_request(ioc, NBD_OPT_BLOCK_SIZE, 0, NULL, errp) < 0) {
+        return -1;
+    }
+    if (nbd_receive_option_reply(ioc, NBD_OPT_BLOCK_SIZE, &reply, errp) < 0) {
+        return -1;
+    }
+    error = nbd_handle_reply_err(ioc, &reply, errp);
+    if (error < 0) {
+        return error;
+    }
+
     TRACE("Attempting NBD_OPT_GO for export '%s'", wantname);
     if (nbd_send_option_request(ioc, NBD_OPT_GO, -1, wantname, errp) < 0) {
         return -1;
@@ -402,6 +422,45 @@ static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
                   info->size, info->flags);
             break;

+        case NBD_INFO_BLOCK_SIZE:
+            if (len != sizeof(info->min_block) * 3) {
+                error_setg(errp, "remaining export info len %" PRIu32
+                           " is unexpected size", len);
+                return -1;
+            }
+            if (read_sync(ioc, &info->min_block, sizeof(info->min_block)) !=
+                sizeof(info->min_block)) {
+                error_setg(errp, "failed to read info minimum block size");
+                return -1;
+            }
+            be32_to_cpus(&info->min_block);
+            if (!is_power_of_2(info->min_block)) {
+                error_setg(errp, "server minimum block size %" PRIu32
+                           " is not a power of two", info->min_block);
+                return -1;
+            }
+            if (read_sync(ioc, &info->opt_block, sizeof(info->opt_block)) !=
+                sizeof(info->opt_block)) {
+                error_setg(errp, "failed to read info preferred block size");
+                return -1;
+            }
+            be32_to_cpus(&info->opt_block);
+            if (!is_power_of_2(info->opt_block) ||
+                info->opt_block < info->min_block) {
+                error_setg(errp, "server preferred block size %" PRIu32
+                           " is not valid", info->opt_block);
+                return -1;
+            }
+            if (read_sync(ioc, &info->max_block, sizeof(info->max_block)) !=
+                sizeof(info->max_block)) {
+                error_setg(errp, "failed to read info maximum block size");
+                return -1;
+            }
+            be32_to_cpus(&info->max_block);
+            TRACE("Block sizes are 0x%" PRIx32 ", 0x%" PRIx32 ", 0x%" PRIx32,
+                  info->min_block, info->opt_block, info->max_block);
+            break;
+
         default:
             TRACE("ignoring unknown export info %" PRIu16, type);
             if (drop_sync(ioc, len) != len) {
@@ -710,8 +769,9 @@ fail:
 #ifdef __linux__
 int nbd_init(int fd, QIOChannelSocket *sioc, NbdExportInfo *info)
 {
-    unsigned long sectors = info->size / BDRV_SECTOR_SIZE;
-    if (info->size / BDRV_SECTOR_SIZE != sectors) {
+    unsigned long sector_size = MAX(BDRV_SECTOR_SIZE, info->min_block);
+    unsigned long sectors = info->size / sector_size;
+    if (info->size / sector_size != sectors) {
         LOG("Export size %" PRId64 " too large for 32-bit kernel", info->size);
         return -E2BIG;
     }
@@ -724,18 +784,18 @@ int nbd_init(int fd, QIOChannelSocket *sioc, NbdExportInfo *info)
         return -serrno;
     }

-    TRACE("Setting block size to %lu", (unsigned long)BDRV_SECTOR_SIZE);
+    TRACE("Setting block size to %lu", sector_size);

-    if (ioctl(fd, NBD_SET_BLKSIZE, (unsigned long)BDRV_SECTOR_SIZE) < 0) {
+    if (ioctl(fd, NBD_SET_BLKSIZE, sector_size) < 0) {
         int serrno = errno;
         LOG("Failed setting NBD block size");
         return -serrno;
     }

     TRACE("Setting size to %lu block(s)", sectors);
-    if (size % BDRV_SECTOR_SIZE) {
-        TRACE("Ignoring trailing %d bytes of export",
-              (int) (size % BDRV_SECTOR_SIZE));
+    if (info->size % sector_size) {
+        TRACE("Ignoring trailing %" PRIu64 " bytes of export",
+              info->size % sector_size);
     }

     if (ioctl(fd, NBD_SET_SIZE_BLOCKS, sectors) < 0) {
-- 
2.5.5


* Re: [Qemu-devel] [PATCH v3 21/44] block: Switch blk_write_zeroes() to byte interface
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 21/44] block: Switch blk_write_zeroes() " Eric Blake
@ 2016-04-23  8:12   ` Denis V. Lunev
  0 siblings, 0 replies; 67+ messages in thread
From: Denis V. Lunev @ 2016-04-23  8:12 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: qemu-block, alex, Kevin Wolf, Max Reitz, Stefan Hajnoczi

On 04/23/2016 02:40 AM, Eric Blake wrote:
> Sector-based blk_write() should die; convert the one-off
> variant blk_write_zeroes().
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>   include/sysemu/block-backend.h | 4 ++--
>   block/block-backend.c          | 8 ++++----
>   block/parallels.c              | 3 ++-
>   qemu-img.c                     | 3 ++-
>   4 files changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
> index 662a106..1246699 100644
> --- a/include/sysemu/block-backend.h
> +++ b/include/sysemu/block-backend.h
> @@ -96,8 +96,8 @@ int blk_pread_unthrottled(BlockBackend *blk, int64_t offset, uint8_t *buf,
>                             int count);
>   int blk_write(BlockBackend *blk, int64_t sector_num, const uint8_t *buf,
>                 int nb_sectors);
> -int blk_write_zeroes(BlockBackend *blk, int64_t sector_num,
> -                     int nb_sectors, BdrvRequestFlags flags);
> +int blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
> +                     int count, BdrvRequestFlags flags);
>   BlockAIOCB *blk_aio_write_zeroes(BlockBackend *blk, int64_t sector_num,
>                                    int nb_sectors, BdrvRequestFlags flags,
>                                    BlockCompletionFunc *cb, void *opaque);
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 5513b6f..ae08bd2 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -816,11 +816,11 @@ int blk_write(BlockBackend *blk, int64_t sector_num, const uint8_t *buf,
>                     blk_write_entry, 0);
>   }
>
> -int blk_write_zeroes(BlockBackend *blk, int64_t sector_num,
> -                     int nb_sectors, BdrvRequestFlags flags)
> +int blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
> +                      int count, BdrvRequestFlags flags)
>   {
> -    return blk_rw(blk, sector_num, NULL, nb_sectors, blk_write_entry,
> -                  flags | BDRV_REQ_ZERO_WRITE);
> +    return blk_prw(blk, offset, NULL, count, blk_write_entry,
> +                   flags | BDRV_REQ_ZERO_WRITE);
>   }
>
>   static void error_callback_bh(void *opaque)
> diff --git a/block/parallels.c b/block/parallels.c
> index 2d8bc87..95bfc32 100644
> --- a/block/parallels.c
> +++ b/block/parallels.c
> @@ -516,7 +516,8 @@ static int parallels_create(const char *filename, QemuOpts *opts, Error **errp)
>       if (ret < 0) {
>           goto exit;
>       }
> -    ret = blk_write_zeroes(file, 1, bat_sectors - 1, 0);
> +    ret = blk_pwrite_zeroes(file, BDRV_SECTOR_SIZE,
> +                            (bat_sectors - 1) << BDRV_SECTOR_BITS, 0);
>       if (ret < 0) {
>           goto exit;
>       }
> diff --git a/qemu-img.c b/qemu-img.c
> index 2e4646e..376107c 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -1601,7 +1601,8 @@ static int convert_write(ImgConvertState *s, int64_t sector_num, int nb_sectors,
>               if (s->has_zero_init) {
>                   break;
>               }
> -            ret = blk_write_zeroes(s->target, sector_num, n, 0);
> +            ret = blk_pwrite_zeroes(s->target, sector_num << BDRV_SECTOR_BITS,
> +                                    n << BDRV_SECTOR_BITS, 0);
>               if (ret < 0) {
>                   return ret;
>               }
Acked-by: Denis V. Lunev <den@openvz.org>


* Re: [Qemu-devel] [PATCH v3 09/44] block: Allow BDRV_REQ_FUA through blk_pwrite()
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 09/44] block: Allow BDRV_REQ_FUA through blk_pwrite() Eric Blake
@ 2016-04-23  8:12   ` Denis V. Lunev
  0 siblings, 0 replies; 67+ messages in thread
From: Denis V. Lunev @ 2016-04-23  8:12 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: qemu-block, alex, Kevin Wolf, Max Reitz, Stefan Hajnoczi,
	Hitoshi Mitake, Liu Yuan, Jeff Cody, Stefan Weil, Fam Zheng,
	David Gibson, Alexander Graf, Paolo Bonzini, open list:Sheepdog,
	open list:sPAPR

On 04/23/2016 02:40 AM, Eric Blake wrote:
> We have several block drivers that understand BDRV_REQ_FUA,
> and emulate it in the block layer for the rest by a full flush.
> But without a way to actually request BDRV_REQ_FUA during a
> pass-through blk_pwrite(), FUA-aware block drivers like NBD are
> forced to repeat the emulation logic of a full flush regardless
> of whether the backend they are writing to could do it more
> efficiently.
>
> This patch just wires up a flags argument; a followup patch
> will actually make use of it in the NBD driver and in qemu-io.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>   include/sysemu/block-backend.h |  3 ++-
>   block/block-backend.c          |  6 ++++--
>   block/crypto.c                 |  2 +-
>   block/parallels.c              |  2 +-
>   block/qcow.c                   |  8 ++++----
>   block/qcow2.c                  |  4 ++--
>   block/qed.c                    |  6 +++---
>   block/sheepdog.c               |  2 +-
>   block/vdi.c                    |  4 ++--
>   block/vhdx.c                   |  5 +++--
>   block/vmdk.c                   | 10 +++++-----
>   block/vpc.c                    | 10 +++++-----
>   hw/nvram/spapr_nvram.c         |  4 ++--
>   nbd/server.c                   |  2 +-
>   qemu-io-cmds.c                 |  2 +-
>   15 files changed, 37 insertions(+), 33 deletions(-)
>
> diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
> index c62b6fe..6991b26 100644
> --- a/include/sysemu/block-backend.h
> +++ b/include/sysemu/block-backend.h
> @@ -102,7 +102,8 @@ BlockAIOCB *blk_aio_write_zeroes(BlockBackend *blk, int64_t sector_num,
>                                    int nb_sectors, BdrvRequestFlags flags,
>                                    BlockCompletionFunc *cb, void *opaque);
>   int blk_pread(BlockBackend *blk, int64_t offset, void *buf, int count);
> -int blk_pwrite(BlockBackend *blk, int64_t offset, const void *buf, int count);
> +int blk_pwrite(BlockBackend *blk, int64_t offset, const void *buf, int count,
> +               BdrvRequestFlags flags);
>   int64_t blk_getlength(BlockBackend *blk);
>   void blk_get_geometry(BlockBackend *blk, uint64_t *nb_sectors_ptr);
>   int64_t blk_nb_sectors(BlockBackend *blk);
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 16c9d5e..4551865 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -955,9 +955,11 @@ int blk_pread(BlockBackend *blk, int64_t offset, void *buf, int count)
>       return count;
>   }
>
> -int blk_pwrite(BlockBackend *blk, int64_t offset, const void *buf, int count)
> +int blk_pwrite(BlockBackend *blk, int64_t offset, const void *buf, int count,
> +               BdrvRequestFlags flags)
>   {
> -    int ret = blk_prw(blk, offset, (void*) buf, count, blk_write_entry, 0);
> +    int ret = blk_prw(blk, offset, (void *) buf, count, blk_write_entry,
> +                      flags);
>       if (ret < 0) {
>           return ret;
>       }
> diff --git a/block/crypto.c b/block/crypto.c
> index 1903e84..32ba17c 100644
> --- a/block/crypto.c
> +++ b/block/crypto.c
> @@ -91,7 +91,7 @@ static ssize_t block_crypto_write_func(QCryptoBlock *block,
>       struct BlockCryptoCreateData *data = opaque;
>       ssize_t ret;
>
> -    ret = blk_pwrite(data->blk, offset, buf, buflen);
> +    ret = blk_pwrite(data->blk, offset, buf, buflen, 0);
>       if (ret < 0) {
>           error_setg_errno(errp, -ret, "Could not write encryption header");
>           return ret;
> diff --git a/block/parallels.c b/block/parallels.c
> index 324ed43..2d8bc87 100644
> --- a/block/parallels.c
> +++ b/block/parallels.c
> @@ -512,7 +512,7 @@ static int parallels_create(const char *filename, QemuOpts *opts, Error **errp)
>       memset(tmp, 0, sizeof(tmp));
>       memcpy(tmp, &header, sizeof(header));
>
> -    ret = blk_pwrite(file, 0, tmp, BDRV_SECTOR_SIZE);
> +    ret = blk_pwrite(file, 0, tmp, BDRV_SECTOR_SIZE, 0);
>       if (ret < 0) {
>           goto exit;
>       }
> diff --git a/block/qcow.c b/block/qcow.c
> index 60ddb12..d6dc1b0 100644
> --- a/block/qcow.c
> +++ b/block/qcow.c
> @@ -853,14 +853,14 @@ static int qcow_create(const char *filename, QemuOpts *opts, Error **errp)
>       }
>
>       /* write all the data */
> -    ret = blk_pwrite(qcow_blk, 0, &header, sizeof(header));
> +    ret = blk_pwrite(qcow_blk, 0, &header, sizeof(header), 0);
>       if (ret != sizeof(header)) {
>           goto exit;
>       }
>
>       if (backing_file) {
>           ret = blk_pwrite(qcow_blk, sizeof(header),
> -            backing_file, backing_filename_len);
> +                         backing_file, backing_filename_len, 0);
>           if (ret != backing_filename_len) {
>               goto exit;
>           }
> @@ -869,8 +869,8 @@ static int qcow_create(const char *filename, QemuOpts *opts, Error **errp)
>       tmp = g_malloc0(BDRV_SECTOR_SIZE);
>       for (i = 0; i < ((sizeof(uint64_t)*l1_size + BDRV_SECTOR_SIZE - 1)/
>           BDRV_SECTOR_SIZE); i++) {
> -        ret = blk_pwrite(qcow_blk, header_size +
> -            BDRV_SECTOR_SIZE*i, tmp, BDRV_SECTOR_SIZE);
> +        ret = blk_pwrite(qcow_blk, header_size + BDRV_SECTOR_SIZE * i,
> +                         tmp, BDRV_SECTOR_SIZE, 0);
>           if (ret != BDRV_SECTOR_SIZE) {
>               g_free(tmp);
>               goto exit;
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 470734b..3090538 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -2207,7 +2207,7 @@ static int qcow2_create2(const char *filename, int64_t total_size,
>               cpu_to_be64(QCOW2_COMPAT_LAZY_REFCOUNTS);
>       }
>
> -    ret = blk_pwrite(blk, 0, header, cluster_size);
> +    ret = blk_pwrite(blk, 0, header, cluster_size, 0);
>       g_free(header);
>       if (ret < 0) {
>           error_setg_errno(errp, -ret, "Could not write qcow2 header");
> @@ -2217,7 +2217,7 @@ static int qcow2_create2(const char *filename, int64_t total_size,
>       /* Write a refcount table with one refcount block */
>       refcount_table = g_malloc0(2 * cluster_size);
>       refcount_table[0] = cpu_to_be64(2 * cluster_size);
> -    ret = blk_pwrite(blk, cluster_size, refcount_table, 2 * cluster_size);
> +    ret = blk_pwrite(blk, cluster_size, refcount_table, 2 * cluster_size, 0);
>       g_free(refcount_table);
>
>       if (ret < 0) {
> diff --git a/block/qed.c b/block/qed.c
> index 0af5274..6cfd4c1 100644
> --- a/block/qed.c
> +++ b/block/qed.c
> @@ -601,18 +601,18 @@ static int qed_create(const char *filename, uint32_t cluster_size,
>       }
>
>       qed_header_cpu_to_le(&header, &le_header);
> -    ret = blk_pwrite(blk, 0, &le_header, sizeof(le_header));
> +    ret = blk_pwrite(blk, 0, &le_header, sizeof(le_header), 0);
>       if (ret < 0) {
>           goto out;
>       }
>       ret = blk_pwrite(blk, sizeof(le_header), backing_file,
> -                     header.backing_filename_size);
> +                     header.backing_filename_size, 0);
>       if (ret < 0) {
>           goto out;
>       }
>
>       l1_table = g_malloc0(l1_size);
> -    ret = blk_pwrite(blk, header.l1_table_offset, l1_table, l1_size);
> +    ret = blk_pwrite(blk, header.l1_table_offset, l1_table, l1_size, 0);
>       if (ret < 0) {
>           goto out;
>       }
> diff --git a/block/sheepdog.c b/block/sheepdog.c
> index 33e0a33..625f876 100644
> --- a/block/sheepdog.c
> +++ b/block/sheepdog.c
> @@ -1678,7 +1678,7 @@ static int sd_prealloc(const char *filename, Error **errp)
>           if (ret < 0) {
>               goto out;
>           }
> -        ret = blk_pwrite(blk, idx * buf_size, buf, buf_size);
> +        ret = blk_pwrite(blk, idx * buf_size, buf, buf_size, 0);
>           if (ret < 0) {
>               goto out;
>           }
> diff --git a/block/vdi.c b/block/vdi.c
> index 75d4819..12ab3a6 100644
> --- a/block/vdi.c
> +++ b/block/vdi.c
> @@ -808,7 +808,7 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp)
>       vdi_header_print(&header);
>   #endif
>       vdi_header_to_le(&header);
> -    ret = blk_pwrite(blk, offset, &header, sizeof(header));
> +    ret = blk_pwrite(blk, offset, &header, sizeof(header), 0);
>       if (ret < 0) {
>           error_setg(errp, "Error writing header to %s", filename);
>           goto exit;
> @@ -829,7 +829,7 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp)
>                   bmap[i] = VDI_UNALLOCATED;
>               }
>           }
> -        ret = blk_pwrite(blk, offset, bmap, bmap_size);
> +        ret = blk_pwrite(blk, offset, bmap, bmap_size, 0);
>           if (ret < 0) {
>               error_setg(errp, "Error writing bmap to %s", filename);
>               goto exit;
> diff --git a/block/vhdx.c b/block/vhdx.c
> index 2b7b332..ec778fe 100644
> --- a/block/vhdx.c
> +++ b/block/vhdx.c
> @@ -1856,13 +1856,14 @@ static int vhdx_create(const char *filename, QemuOpts *opts, Error **errp)
>       creator = g_utf8_to_utf16("QEMU v" QEMU_VERSION, -1, NULL,
>                                 &creator_items, NULL);
>       signature = cpu_to_le64(VHDX_FILE_SIGNATURE);
> -    ret = blk_pwrite(blk, VHDX_FILE_ID_OFFSET, &signature, sizeof(signature));
> +    ret = blk_pwrite(blk, VHDX_FILE_ID_OFFSET, &signature, sizeof(signature),
> +                     0);
>       if (ret < 0) {
>           goto delete_and_exit;
>       }
>       if (creator) {
>           ret = blk_pwrite(blk, VHDX_FILE_ID_OFFSET + sizeof(signature),
> -                         creator, creator_items * sizeof(gunichar2));
> +                         creator, creator_items * sizeof(gunichar2), 0);
>           if (ret < 0) {
>               goto delete_and_exit;
>           }
> diff --git a/block/vmdk.c b/block/vmdk.c
> index 45f9d3c..0cc2011 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -1728,12 +1728,12 @@ static int vmdk_create_extent(const char *filename, int64_t filesize,
>       header.check_bytes[3] = 0xa;
>
>       /* write all the data */
> -    ret = blk_pwrite(blk, 0, &magic, sizeof(magic));
> +    ret = blk_pwrite(blk, 0, &magic, sizeof(magic), 0);
>       if (ret < 0) {
>           error_setg(errp, QERR_IO_ERROR);
>           goto exit;
>       }
> -    ret = blk_pwrite(blk, sizeof(magic), &header, sizeof(header));
> +    ret = blk_pwrite(blk, sizeof(magic), &header, sizeof(header), 0);
>       if (ret < 0) {
>           error_setg(errp, QERR_IO_ERROR);
>           goto exit;
> @@ -1753,7 +1753,7 @@ static int vmdk_create_extent(const char *filename, int64_t filesize,
>           gd_buf[i] = cpu_to_le32(tmp);
>       }
>       ret = blk_pwrite(blk, le64_to_cpu(header.rgd_offset) * BDRV_SECTOR_SIZE,
> -                     gd_buf, gd_buf_size);
> +                     gd_buf, gd_buf_size, 0);
>       if (ret < 0) {
>           error_setg(errp, QERR_IO_ERROR);
>           goto exit;
> @@ -1765,7 +1765,7 @@ static int vmdk_create_extent(const char *filename, int64_t filesize,
>           gd_buf[i] = cpu_to_le32(tmp);
>       }
>       ret = blk_pwrite(blk, le64_to_cpu(header.gd_offset) * BDRV_SECTOR_SIZE,
> -                     gd_buf, gd_buf_size);
> +                     gd_buf, gd_buf_size, 0);
>       if (ret < 0) {
>           error_setg(errp, QERR_IO_ERROR);
>           goto exit;
> @@ -2028,7 +2028,7 @@ static int vmdk_create(const char *filename, QemuOpts *opts, Error **errp)
>
>       blk_set_allow_write_beyond_eof(new_blk, true);
>
> -    ret = blk_pwrite(new_blk, desc_offset, desc, desc_len);
> +    ret = blk_pwrite(new_blk, desc_offset, desc, desc_len, 0);
>       if (ret < 0) {
>           error_setg_errno(errp, -ret, "Could not write description");
>           goto exit;
> diff --git a/block/vpc.c b/block/vpc.c
> index 3e2ea69..a55a3e4 100644
> --- a/block/vpc.c
> +++ b/block/vpc.c
> @@ -783,13 +783,13 @@ static int create_dynamic_disk(BlockBackend *blk, uint8_t *buf,
>       block_size = 0x200000;
>       num_bat_entries = (total_sectors + block_size / 512) / (block_size / 512);
>
> -    ret = blk_pwrite(blk, offset, buf, HEADER_SIZE);
> +    ret = blk_pwrite(blk, offset, buf, HEADER_SIZE, 0);
>       if (ret < 0) {
>           goto fail;
>       }
>
>       offset = 1536 + ((num_bat_entries * 4 + 511) & ~511);
> -    ret = blk_pwrite(blk, offset, buf, HEADER_SIZE);
> +    ret = blk_pwrite(blk, offset, buf, HEADER_SIZE, 0);
>       if (ret < 0) {
>           goto fail;
>       }
> @@ -799,7 +799,7 @@ static int create_dynamic_disk(BlockBackend *blk, uint8_t *buf,
>
>       memset(buf, 0xFF, 512);
>       for (i = 0; i < (num_bat_entries * 4 + 511) / 512; i++) {
> -        ret = blk_pwrite(blk, offset, buf, 512);
> +        ret = blk_pwrite(blk, offset, buf, 512, 0);
>           if (ret < 0) {
>               goto fail;
>           }
> @@ -826,7 +826,7 @@ static int create_dynamic_disk(BlockBackend *blk, uint8_t *buf,
>       /* Write the header */
>       offset = 512;
>
> -    ret = blk_pwrite(blk, offset, buf, 1024);
> +    ret = blk_pwrite(blk, offset, buf, 1024, 0);
>       if (ret < 0) {
>           goto fail;
>       }
> @@ -848,7 +848,7 @@ static int create_fixed_disk(BlockBackend *blk, uint8_t *buf,
>           return ret;
>       }
>
> -    ret = blk_pwrite(blk, total_size - HEADER_SIZE, buf, HEADER_SIZE);
> +    ret = blk_pwrite(blk, total_size - HEADER_SIZE, buf, HEADER_SIZE, 0);
>       if (ret < 0) {
>           return ret;
>       }
> diff --git a/hw/nvram/spapr_nvram.c b/hw/nvram/spapr_nvram.c
> index 802636e..019f25d 100644
> --- a/hw/nvram/spapr_nvram.c
> +++ b/hw/nvram/spapr_nvram.c
> @@ -124,7 +124,7 @@ static void rtas_nvram_store(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>
>       alen = len;
>       if (nvram->blk) {
> -        alen = blk_pwrite(nvram->blk, offset, membuf, len);
> +        alen = blk_pwrite(nvram->blk, offset, membuf, len, 0);
>       }
>
>       assert(nvram->buf);
> @@ -190,7 +190,7 @@ static int spapr_nvram_post_load(void *opaque, int version_id)
>       sPAPRNVRAM *nvram = VIO_SPAPR_NVRAM(opaque);
>
>       if (nvram->blk) {
> -        int alen = blk_pwrite(nvram->blk, 0, nvram->buf, nvram->size);
> +        int alen = blk_pwrite(nvram->blk, 0, nvram->buf, nvram->size, 0);
>
>           if (alen < 0) {
>               return alen;
> diff --git a/nbd/server.c b/nbd/server.c
> index aa252a4..9be0a99 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -1154,7 +1154,7 @@ static void nbd_trip(void *opaque)
>           TRACE("Writing to device");
>
>           ret = blk_pwrite(exp->blk, request.from + exp->dev_offset,
> -                        req->data, request.len);
> +                         req->data, request.len, 0);
>           if (ret < 0) {
>               LOG("writing to file failed");
>               reply.error = -ret;
> diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
> index e34f777..e26e543 100644
> --- a/qemu-io-cmds.c
> +++ b/qemu-io-cmds.c
> @@ -474,7 +474,7 @@ static int do_pwrite(BlockBackend *blk, char *buf, int64_t offset,
>           return -ERANGE;
>       }
>
> -    *total = blk_pwrite(blk, offset, (uint8_t *)buf, count);
> +    *total = blk_pwrite(blk, offset, (uint8_t *)buf, count, 0);
>       if (*total < 0) {
>           return *total;
>       }
Acked-by: Denis V. Lunev <den@openvz.org>
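
For readers following along, the FUA emulation described in the quoted
commit message can be sketched roughly as below. This is a toy model
under stated assumptions, not QEMU's block layer: SketchDriver,
sketch_pwrite(), toy_write(), toy_flush() and SKETCH_REQ_FUA are all
hypothetical names, and the flag value is a placeholder rather than the
real BDRV_REQ_FUA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_REQ_FUA 0x1u  /* placeholder bit, not QEMU's BDRV_REQ_FUA */

/* Hypothetical stand-in for a block driver. */
typedef struct SketchDriver {
    bool supports_fua;   /* driver can honor FUA natively */
    int (*write)(struct SketchDriver *d, int64_t offset, const void *buf,
                 size_t count, unsigned flags);
    int (*flush)(struct SketchDriver *d);
    int fua_writes;      /* counters so the behavior can be observed */
    int flushes;
} SketchDriver;

/* Toy callbacks: record what was asked for and always succeed. */
static int toy_write(SketchDriver *d, int64_t offset, const void *buf,
                     size_t count, unsigned flags)
{
    (void)offset; (void)buf; (void)count;
    if (flags & SKETCH_REQ_FUA) {
        d->fua_writes++;
    }
    return 0;
}

static int toy_flush(SketchDriver *d)
{
    d->flushes++;
    return 0;
}

/* Pass FUA through when the driver understands it; otherwise strip the
 * flag and emulate it with a full flush after the write succeeds. */
static int sketch_pwrite(SketchDriver *d, int64_t offset, const void *buf,
                         size_t count, unsigned flags)
{
    bool emulate = (flags & SKETCH_REQ_FUA) && !d->supports_fua;
    int ret;

    ret = d->write(d, offset, buf, count,
                   emulate ? (flags & ~SKETCH_REQ_FUA) : flags);
    if (ret < 0) {
        return ret;
    }
    return emulate ? d->flush(d) : 0;
}
```

The point of adding a flags argument is the first branch: a caller such
as the NBD server can hand the FUA request down and let a capable
backend avoid the full flush.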


* Re: [Qemu-devel] [PATCH v3 41/44] nbd: Implement NBD_CMD_WRITE_ZEROES on server
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 41/44] nbd: Implement NBD_CMD_WRITE_ZEROES on server Eric Blake
@ 2016-04-23  9:00   ` Pavel Borzenkov
  2016-04-25 12:11   ` Alex Bligh
  1 sibling, 0 replies; 67+ messages in thread
From: Pavel Borzenkov @ 2016-04-23  9:00 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, Kevin Wolf, Paolo Bonzini, alex, qemu-block, Max Reitz

On Fri, Apr 22, 2016 at 05:40:49PM -0600, Eric Blake wrote:
> Upstream NBD protocol recently added the ability to efficiently
> write zeroes without having to send the zeroes over the wire,
> along with a flag to control whether the client wants a hole.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v3: abandon NBD_CMD_CLOSE extension, rebase to use blk_pwrite_zeroes
> ---
>  include/block/nbd.h |  7 +++++--
>  nbd/server.c        | 42 ++++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 45 insertions(+), 4 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 05c0e48..1072d9e 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -70,6 +70,7 @@ typedef struct nbd_reply nbd_reply;
>  #define NBD_FLAG_SEND_FUA       (1 << 3)        /* Send FUA (Force Unit Access) */
>  #define NBD_FLAG_ROTATIONAL     (1 << 4)        /* Use elevator algorithm - rotational media */
>  #define NBD_FLAG_SEND_TRIM      (1 << 5)        /* Send TRIM (discard) */
> +#define NBD_FLAG_SEND_WRITE_ZEROES (1 << 6)     /* Send WRITE_ZEROES */
> 
>  /* New-style handshake (global) flags, sent from server to client, and
>     control what will happen during handshake phase. */
> @@ -102,7 +103,8 @@ typedef struct nbd_reply nbd_reply;
>  #define NBD_INFO_DESCRIPTION    2
> 
>  /* Request flags, sent from client to server during transmission phase */
> -#define NBD_CMD_FLAG_FUA        (1 << 0)
> +#define NBD_CMD_FLAG_FUA        (1 << 0) /* 'force unit access' during write */
> +#define NBD_CMD_FLAG_NO_HOLE    (1 << 1) /* don't punch hole on zero run */
> 
>  /* Supported request types */
>  enum {
> @@ -110,7 +112,8 @@ enum {
>      NBD_CMD_WRITE = 1,
>      NBD_CMD_DISC = 2,
>      NBD_CMD_FLUSH = 3,
> -    NBD_CMD_TRIM = 4
> +    NBD_CMD_TRIM = 4,
> +    NBD_CMD_WRITE_ZEROES = 5,

It's defined as 6 by the spec.

>  };
> 
>  #define NBD_DEFAULT_PORT	10809
> diff --git a/nbd/server.c b/nbd/server.c
> index 1edb5f3..563afb2 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -689,7 +689,8 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
>      char buf[8 + 8 + 8 + 128];
>      int rc;
>      const uint16_t myflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
> -                              NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA);
> +                              NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA |
> +                              NBD_FLAG_SEND_WRITE_ZEROES);
>      bool oldStyle;
>      size_t len;
> 
> @@ -1199,11 +1200,17 @@ static ssize_t nbd_co_receive_request(NBDRequest *req,
>          rc = -EINVAL;
>          goto out;
>      }
> -    if (request->flags & ~NBD_CMD_FLAG_FUA) {
> +    if (request->flags & ~(NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_NO_HOLE)) {
>          LOG("unsupported flags (got 0x%x)", request->flags);
>          rc = -EINVAL;
>          goto out;
>      }
> +    if (request->type != NBD_CMD_WRITE_ZEROES &&
> +        (request->flags & NBD_CMD_FLAG_NO_HOLE)) {
> +        LOG("unexpected flags (got 0x%x)", request->flags);
> +        rc = -EINVAL;
> +        goto out;
> +    }
> 
>      rc = 0;
> 
> @@ -1308,6 +1315,37 @@ static void nbd_trip(void *opaque)
>          }
>          break;
> 
> +    case NBD_CMD_WRITE_ZEROES:
> +        TRACE("Request type is WRITE_ZEROES");
> +
> +        if (exp->nbdflags & NBD_FLAG_READ_ONLY) {
> +            TRACE("Server is read-only, return error");
> +            reply.error = EROFS;
> +            goto error_reply;
> +        }
> +
> +        TRACE("Writing to device");
> +
> +        flags = 0;
> +        if (request.flags & NBD_CMD_FLAG_FUA) {
> +            flags |= BDRV_REQ_FUA;
> +        }
> +        if (!(request.flags & NBD_CMD_FLAG_NO_HOLE)) {
> +            flags |= BDRV_REQ_MAY_UNMAP;
> +        }
> +        ret = blk_pwrite_zeroes(exp->blk, request.from + exp->dev_offset,
> +                                request.len, flags);
> +        if (ret < 0) {
> +            LOG("writing to file failed");
> +            reply.error = -ret;
> +            goto error_reply;
> +        }
> +
> +        if (nbd_co_send_reply(req, &reply, 0) < 0) {
> +            goto out;
> +        }
> +        break;
> +
>      case NBD_CMD_DISC:
>          /* unreachable, thanks to special case in nbd_co_receive_request() */
>          abort();
> -- 
> 2.5.5
> 
> 
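
To make the flag handling in the quoted hunks easier to check, here is a
small standalone sketch of the validation and translation logic. The
NBD_CMD_FLAG_* values are copied from the quoted patch, and
NBD_CMD_WRITE_ZEROES uses 6 as noted in the comment above. The
SKETCH_REQ_* bits and sketch_request_flags() are hypothetical names with
placeholder values; they only stand in for the BDRV_REQ_* flags.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Values from the quoted patch, with the command number per the spec. */
#define NBD_CMD_FLAG_FUA      (1 << 0)
#define NBD_CMD_FLAG_NO_HOLE  (1 << 1)
#define NBD_CMD_WRITE         1
#define NBD_CMD_WRITE_ZEROES  6

/* Placeholder bits for this sketch only; not the real BDRV_REQ_* values. */
#define SKETCH_REQ_FUA        (1 << 0)
#define SKETCH_REQ_MAY_UNMAP  (1 << 1)

/* Hypothetical helper: validate the request flags, then map them to
 * block-layer style flags. Returns 0 on success or -EINVAL. */
static int sketch_request_flags(uint32_t type, uint16_t nbd_flags, int *out)
{
    int flags = 0;

    /* Reject any flag bits the server does not know about. */
    if (nbd_flags & ~(NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_NO_HOLE)) {
        return -EINVAL;
    }
    /* NO_HOLE is only meaningful for WRITE_ZEROES. */
    if (type != NBD_CMD_WRITE_ZEROES && (nbd_flags & NBD_CMD_FLAG_NO_HOLE)) {
        return -EINVAL;
    }
    if (nbd_flags & NBD_CMD_FLAG_FUA) {
        flags |= SKETCH_REQ_FUA;
    }
    /* Punching a hole is allowed unless the client asked for NO_HOLE. */
    if (type == NBD_CMD_WRITE_ZEROES && !(nbd_flags & NBD_CMD_FLAG_NO_HOLE)) {
        flags |= SKETCH_REQ_MAY_UNMAP;
    }
    *out = flags;
    return 0;
}
```

Note the inverted sense: the wire flag says "no hole", while the block
layer flag says "may unmap", so the absence of the former sets the
latter.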


* Re: [Qemu-devel] [PATCH v3 02/44] nbd: Quit server after any write error
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 02/44] nbd: Quit server after any write error Eric Blake
@ 2016-04-25  9:21   ` Alex Bligh
  0 siblings, 0 replies; 67+ messages in thread
From: Alex Bligh @ 2016-04-25  9:21 UTC (permalink / raw)
  To: Eric Blake; +Cc: Alex Bligh, qemu-devel, Paolo Bonzini, qemu-block


On 23 Apr 2016, at 00:40, Eric Blake <eblake@redhat.com> wrote:

> We should never ignore failure from nbd_negotiate_send_rep(); if
> we are unable to write to the client, then it is not worth trying
> to continue the negotiation.  Fortunately, the problem is not
> too severe - chances are that the errors being ignored here (mainly
> inability to write the reply to the client) are indications of
> a closed connection or something similar, which will also affect
> the next attempt to interact with the client and eventually reach
> a point where the errors are detected to end the loop.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Alex Bligh <alex@alex.org.uk>

> ---
> nbd/server.c | 32 +++++++++++++++++++++++---------
> 1 file changed, 23 insertions(+), 9 deletions(-)
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index fc36f4d..0a003e4 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -334,7 +334,10 @@ static QIOChannel *nbd_negotiate_handle_starttls(NBDClient *client,
>         return NULL;
>     }
> 
> -    nbd_negotiate_send_rep(client->ioc, NBD_REP_ACK, NBD_OPT_STARTTLS);
> +    if (nbd_negotiate_send_rep(client->ioc, NBD_REP_ACK,
> +                               NBD_OPT_STARTTLS) < 0) {
> +        return NULL;
> +    }
> 
>     tioc = qio_channel_tls_new_server(ioc,
>                                       client->tlscreds,
> @@ -460,8 +463,11 @@ static int nbd_negotiate_options(NBDClient *client)
>                 if (nbd_negotiate_drop_sync(client->ioc, length) != length) {
>                     return -EIO;
>                 }
> -                nbd_negotiate_send_rep(client->ioc, NBD_REP_ERR_TLS_REQD,
> -                                       clientflags);
> +                ret = nbd_negotiate_send_rep(client->ioc, NBD_REP_ERR_TLS_REQD,
> +                                             clientflags);
> +                if (ret < 0) {
> +                    return ret;
> +                }
>                 break;
>             }
>         } else if (fixedNewstyle) {
> @@ -485,12 +491,17 @@ static int nbd_negotiate_options(NBDClient *client)
>                 }
>                 if (client->tlscreds) {
>                     TRACE("TLS already enabled");
> -                    nbd_negotiate_send_rep(client->ioc, NBD_REP_ERR_INVALID,
> -                                           clientflags);
> +                    ret = nbd_negotiate_send_rep(client->ioc,
> +                                                 NBD_REP_ERR_INVALID,
> +                                                 clientflags);
>                 } else {
>                     TRACE("TLS not configured");
> -                    nbd_negotiate_send_rep(client->ioc, NBD_REP_ERR_POLICY,
> -                                           clientflags);
> +                    ret = nbd_negotiate_send_rep(client->ioc,
> +                                                 NBD_REP_ERR_POLICY,
> +                                                 clientflags);
> +                }
> +                if (ret < 0) {
> +                    return ret;
>                 }
>                 break;
>             default:
> @@ -498,8 +509,11 @@ static int nbd_negotiate_options(NBDClient *client)
>                 if (nbd_negotiate_drop_sync(client->ioc, length) != length) {
>                     return -EIO;
>                 }
> -                nbd_negotiate_send_rep(client->ioc, NBD_REP_ERR_UNSUP,
> -                                       clientflags);
> +                ret = nbd_negotiate_send_rep(client->ioc, NBD_REP_ERR_UNSUP,
> +                                             clientflags);
> +                if (ret < 0) {
> +                    return ret;
> +                }
>                 break;
>             }
>         } else {
> -- 
> 2.5.5
> 
> 

-- 
Alex Bligh

* Re: [Qemu-devel] [PATCH v3 07/44] nbd: Limit nbdflags to 16 bits
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 07/44] nbd: Limit nbdflags to 16 bits Eric Blake
@ 2016-04-25  9:24   ` Alex Bligh
  0 siblings, 0 replies; 67+ messages in thread
From: Alex Bligh @ 2016-04-25  9:24 UTC (permalink / raw)
  To: Eric Blake
  Cc: Alex Bligh, qemu-devel, Kevin Wolf, Paolo Bonzini, qemu block, Max Reitz


On 23 Apr 2016, at 00:40, Eric Blake <eblake@redhat.com> wrote:

> Rather than asserting that nbdflags is within range, just give
> it the correct type to begin with :)  nbdflags corresponds to
> the per-export portion of NBD Protocol "transmission flags", which
> is 16 bits in response to NBD_OPT_EXPORT_NAME and NBD_OPT_GO.
> 
> Furthermore, upstream NBD has never passed the global flags to
> the kernel via ioctl(NBD_SET_FLAGS) (the ioctl was first
> introduced in NBD 2.9.22; then a latent bug in NBD 3.1 actually
> tried to OR the global flags with the transmission flags, with
> the disaster that the addition of NBD_FLAG_NO_ZEROES in 3.9
> caused all earlier NBD 3.x clients to treat every export as
> read-only; NBD 3.10 and later intentionally clip things to 16
> bits to pass only transmission flags).  Qemu should follow suit,
> since the current two global flags (NBD_FLAG_FIXED_NEWSTYLE
> and NBD_FLAG_NO_ZEROES) have no impact on the kernel's behavior
> during transmission.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Looks sensible, but NBD has at least three types of flags. Rather
than calling them nbdflags, perhaps you could call them
nbdtransmissionflags, nbdclientflags, or similar, which might
help avoid this confusion in future.

Alex


> 
> ---
> v3: expand scope of patch
> ---
> block/nbd-client.h  |  2 +-
> include/block/nbd.h |  6 +++---
> nbd/client.c        | 28 +++++++++++++++-------------
> nbd/server.c        | 10 ++++------
> qemu-nbd.c          |  4 ++--
> 5 files changed, 25 insertions(+), 25 deletions(-)
> 
> diff --git a/block/nbd-client.h b/block/nbd-client.h
> index bc7aec0..1243612 100644
> --- a/block/nbd-client.h
> +++ b/block/nbd-client.h
> @@ -20,7 +20,7 @@
> typedef struct NbdClientSession {
>     QIOChannelSocket *sioc; /* The master data channel */
>     QIOChannel *ioc; /* The current I/O channel which may differ (eg TLS) */
> -    uint32_t nbdflags;
> +    uint16_t nbdflags;
>     off_t size;
> 
>     CoMutex send_mutex;
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index b86a976..134f117 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -83,11 +83,11 @@ ssize_t nbd_wr_syncv(QIOChannel *ioc,
>                      size_t offset,
>                      size_t length,
>                      bool do_read);
> -int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
> +int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
>                           QCryptoTLSCreds *tlscreds, const char *hostname,
>                           QIOChannel **outioc,
>                           off_t *size, Error **errp);
> -int nbd_init(int fd, QIOChannelSocket *sioc, uint32_t flags, off_t size);
> +int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t flags, off_t size);
> ssize_t nbd_send_request(QIOChannel *ioc, struct nbd_request *request);
> ssize_t nbd_receive_reply(QIOChannel *ioc, struct nbd_reply *reply);
> int nbd_client(int fd);
> @@ -97,7 +97,7 @@ typedef struct NBDExport NBDExport;
> typedef struct NBDClient NBDClient;
> 
> NBDExport *nbd_export_new(BlockBackend *blk, off_t dev_offset, off_t size,
> -                          uint32_t nbdflags, void (*close)(NBDExport *),
> +                          uint16_t nbdflags, void (*close)(NBDExport *),
>                           Error **errp);
> void nbd_export_close(NBDExport *exp);
> void nbd_export_get(NBDExport *exp);
> diff --git a/nbd/client.c b/nbd/client.c
> index f1afa49..937344c 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -406,7 +406,7 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
> }
> 
> 
> -int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
> +int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
>                           QCryptoTLSCreds *tlscreds, const char *hostname,
>                           QIOChannel **outioc,
>                           off_t *size, Error **errp)
> @@ -466,7 +466,6 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
>         uint32_t opt;
>         uint32_t namesize;
>         uint16_t globalflags;
> -        uint16_t exportflags;
>         bool fixedNewStyle = false;
> 
>         if (read_sync(ioc, &globalflags, sizeof(globalflags)) !=
> @@ -475,7 +474,6 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
>             goto fail;
>         }
>         globalflags = be16_to_cpu(globalflags);
> -        *flags = globalflags << 16;
>         TRACE("Global flags are %" PRIx32, globalflags);
>         if (globalflags & NBD_FLAG_FIXED_NEWSTYLE) {
>             fixedNewStyle = true;
> @@ -543,17 +541,15 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
>             goto fail;
>         }
>         *size = be64_to_cpu(s);
> -        TRACE("Size is %" PRIu64, *size);
> 
> -        if (read_sync(ioc, &exportflags, sizeof(exportflags)) !=
> -            sizeof(exportflags)) {
> +        if (read_sync(ioc, flags, sizeof(*flags)) != sizeof(*flags)) {
>             error_setg(errp, "Failed to read export flags");
>             goto fail;
>         }
> -        exportflags = be16_to_cpu(exportflags);
> -        *flags |= exportflags;
> -        TRACE("Export flags are %" PRIx16, exportflags);
> +        be16_to_cpus(flags);
>     } else if (magic == NBD_CLIENT_MAGIC) {
> +        uint32_t oldflags;
> +
>         if (name) {
>             error_setg(errp, "Server does not support export names");
>             goto fail;
> @@ -570,16 +566,22 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
>         *size = be64_to_cpu(s);
>         TRACE("Size is %" PRIu64, *size);
> 
> -        if (read_sync(ioc, flags, sizeof(*flags)) != sizeof(*flags)) {
> +        if (read_sync(ioc, &oldflags, sizeof(oldflags)) != sizeof(oldflags)) {
>             error_setg(errp, "Failed to read export flags");
>             goto fail;
>         }
> -        *flags = be32_to_cpup(flags);
> +        be32_to_cpus(&oldflags);
> +        if (oldflags & ~0xffff) {
> +            error_setg(errp, "Unexpected export flags 0x%" PRIx32, oldflags);
> +            goto fail;
> +        }
> +        *flags = oldflags;
>     } else {
>         error_setg(errp, "Bad magic received");
>         goto fail;
>     }
> 
> +    TRACE("Size is %" PRIu64 ", export flags %" PRIx16, *size, *flags);
>     if (read_sync(ioc, &buf, 124) != 124) {
>         error_setg(errp, "Failed to read reserved block");
>         goto fail;
> @@ -591,7 +593,7 @@ fail:
> }
> 
> #ifdef __linux__
> -int nbd_init(int fd, QIOChannelSocket *sioc, uint32_t flags, off_t size)
> +int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t flags, off_t size)
> {
>     unsigned long sectors = size / BDRV_SECTOR_SIZE;
>     if (size / BDRV_SECTOR_SIZE != sectors) {
> @@ -687,7 +689,7 @@ int nbd_disconnect(int fd)
> }
> 
> #else
> -int nbd_init(int fd, QIOChannelSocket *ioc, uint32_t flags, off_t size)
> +int nbd_init(int fd, QIOChannelSocket *ioc, uint16_t flags, off_t size)
> {
>     return -ENOTSUP;
> }
> diff --git a/nbd/server.c b/nbd/server.c
> index 789189d..31fc9cf 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -63,7 +63,7 @@ struct NBDExport {
>     char *name;
>     off_t dev_offset;
>     off_t size;
> -    uint32_t nbdflags;
> +    uint16_t nbdflags;
>     QTAILQ_HEAD(, NBDClient) clients;
>     QTAILQ_ENTRY(NBDExport) next;
> 
> @@ -544,8 +544,8 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
>     NBDClient *client = data->client;
>     char buf[8 + 8 + 8 + 128];
>     int rc;
> -    const int myflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
> -                         NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA);
> +    const uint16_t myflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
> +                              NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA);
>     bool oldStyle;
> 
>     /* Old style negotiation header without options
> @@ -575,7 +575,6 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
> 
>     oldStyle = client->exp != NULL && !client->tlscreds;
>     if (oldStyle) {
> -        assert ((client->exp->nbdflags & ~65535) == 0);
>         stq_be_p(buf + 8, NBD_CLIENT_MAGIC);
>         stq_be_p(buf + 16, client->exp->size);
>         stw_be_p(buf + 26, client->exp->nbdflags | myflags);
> @@ -604,7 +603,6 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
>             goto fail;
>         }
> 
> -        assert ((client->exp->nbdflags & ~65535) == 0);
>         stq_be_p(buf + 18, client->exp->size);
>         stw_be_p(buf + 26, client->exp->nbdflags | myflags);
>         if (nbd_negotiate_write(client->ioc, buf + 18, sizeof(buf) - 18) !=
> @@ -806,7 +804,7 @@ static void nbd_eject_notifier(Notifier *n, void *data)
> }
> 
> NBDExport *nbd_export_new(BlockBackend *blk, off_t dev_offset, off_t size,
> -                          uint32_t nbdflags, void (*close)(NBDExport *),
> +                          uint16_t nbdflags, void (*close)(NBDExport *),
>                           Error **errp)
> {
>     NBDExport *exp = g_malloc0(sizeof(NBDExport));
> diff --git a/qemu-nbd.c b/qemu-nbd.c
> index 2c9754e..71bfdeb 100644
> --- a/qemu-nbd.c
> +++ b/qemu-nbd.c
> @@ -241,7 +241,7 @@ static void *nbd_client_thread(void *arg)
> {
>     char *device = arg;
>     off_t size;
> -    uint32_t nbdflags;
> +    uint16_t nbdflags;
>     QIOChannelSocket *sioc;
>     int fd;
>     int ret;
> @@ -455,7 +455,7 @@ int main(int argc, char **argv)
>     BlockBackend *blk;
>     BlockDriverState *bs;
>     off_t dev_offset = 0;
> -    uint32_t nbdflags = 0;
> +    uint16_t nbdflags = 0;
>     bool disconnect = false;
>     const char *bindto = "0.0.0.0";
>     const char *port = NULL;
> -- 
> 2.5.5
> 
> 

-- 
Alex Bligh

* Re: [Qemu-devel] [PATCH v3 08/44] nbd: Add qemu-nbd -D for human-readable description
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 08/44] nbd: Add qemu-nbd -D for human-readable description Eric Blake
@ 2016-04-25  9:25   ` Alex Bligh
  0 siblings, 0 replies; 67+ messages in thread
From: Alex Bligh @ 2016-04-25  9:25 UTC (permalink / raw)
  To: Eric Blake
  Cc: Alex Bligh, qemu-devel, Kevin Wolf, Paolo Bonzini, qemu block, Max Reitz


On 23 Apr 2016, at 00:40, Eric Blake <eblake@redhat.com> wrote:

> The NBD protocol allows servers to advertise a human-readable
> description alongside an export name during NBD_OPT_LIST.  Add
> an option to pass through the user's string to the NBD client.
> 
> Doing this also makes it easier to test commit 200650d4, which
> is the client counterpart of receiving the description.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Alex Bligh <alex@alex.org.uk>

> ---
> include/block/nbd.h |  1 +
> nbd/nbd-internal.h  |  5 +++--
> nbd/server.c        | 34 ++++++++++++++++++++++++++--------
> qemu-nbd.c          | 12 +++++++++++-
> qemu-nbd.texi       |  5 ++++-
> 5 files changed, 45 insertions(+), 12 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 134f117..3e2d76b 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -107,6 +107,7 @@ BlockBackend *nbd_export_get_blockdev(NBDExport *exp);
> 
> NBDExport *nbd_export_find(const char *name);
> void nbd_export_set_name(NBDExport *exp, const char *name);
> +void nbd_export_set_description(NBDExport *exp, const char *description);
> void nbd_export_close_all(void);
> 
> void nbd_client_new(NBDExport *exp,
> diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
> index 3791535..035ead4 100644
> --- a/nbd/nbd-internal.h
> +++ b/nbd/nbd-internal.h
> @@ -103,9 +103,10 @@ static inline ssize_t read_sync(QIOChannel *ioc, void *buffer, size_t size)
>     return nbd_wr_syncv(ioc, &iov, 1, 0, size, true);
> }
> 
> -static inline ssize_t write_sync(QIOChannel *ioc, void *buffer, size_t size)
> +static inline ssize_t write_sync(QIOChannel *ioc, const void *buffer,
> +                                 size_t size)
> {
> -    struct iovec iov = { .iov_base = buffer, .iov_len = size };
> +    struct iovec iov = { .iov_base = (void *) buffer, .iov_len = size };
> 
>     return nbd_wr_syncv(ioc, &iov, 1, 0, size, false);
> }
> diff --git a/nbd/server.c b/nbd/server.c
> index 31fc9cf..aa252a4 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -61,6 +61,7 @@ struct NBDExport {
> 
>     BlockBackend *blk;
>     char *name;
> +    char *description;
>     off_t dev_offset;
>     off_t size;
>     uint16_t nbdflags;
> @@ -128,7 +129,8 @@ static ssize_t nbd_negotiate_read(QIOChannel *ioc, void *buffer, size_t size)
> 
> }
> 
> -static ssize_t nbd_negotiate_write(QIOChannel *ioc, void *buffer, size_t size)
> +static ssize_t nbd_negotiate_write(QIOChannel *ioc, const void *buffer,
> +                                   size_t size)
> {
>     ssize_t ret;
>     guint watch;
> @@ -224,11 +226,15 @@ static int nbd_negotiate_send_rep(QIOChannel *ioc, uint32_t type, uint32_t opt)
> 
> static int nbd_negotiate_send_rep_list(QIOChannel *ioc, NBDExport *exp)
> {
> -    uint64_t magic, name_len;
> +    uint64_t magic;
> +    size_t name_len, desc_len;
>     uint32_t opt, type, len;
> +    const char *name = exp->name ? exp->name : "";
> +    const char *desc = exp->description ? exp->description : "";
> 
> -    TRACE("Advertising export name '%s'", exp->name ? exp->name : "");
> -    name_len = strlen(exp->name);
> +    TRACE("Advertising export name '%s' description '%s'", name, desc);
> +    name_len = strlen(name);
> +    desc_len = strlen(desc);
>     magic = cpu_to_be64(NBD_REP_MAGIC);
>     if (nbd_negotiate_write(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
>         LOG("write failed (magic)");
> @@ -244,18 +250,22 @@ static int nbd_negotiate_send_rep_list(QIOChannel *ioc, NBDExport *exp)
>         LOG("write failed (reply type)");
>         return -EINVAL;
>     }
> -    len = cpu_to_be32(name_len + sizeof(len));
> +    len = cpu_to_be32(name_len + desc_len + sizeof(len));
>     if (nbd_negotiate_write(ioc, &len, sizeof(len)) != sizeof(len)) {
>         LOG("write failed (length)");
>         return -EINVAL;
>     }
>     len = cpu_to_be32(name_len);
>     if (nbd_negotiate_write(ioc, &len, sizeof(len)) != sizeof(len)) {
> -        LOG("write failed (length)");
> +        LOG("write failed (name length)");
>         return -EINVAL;
>     }
> -    if (nbd_negotiate_write(ioc, exp->name, name_len) != name_len) {
> -        LOG("write failed (buffer)");
> +    if (nbd_negotiate_write(ioc, name, name_len) != name_len) {
> +        LOG("write failed (name buffer)");
> +        return -EINVAL;
> +    }
> +    if (nbd_negotiate_write(ioc, desc, desc_len) != desc_len) {
> +        LOG("write failed (description buffer)");
>         return -EINVAL;
>     }
>     return 0;
> @@ -877,6 +887,12 @@ void nbd_export_set_name(NBDExport *exp, const char *name)
>     nbd_export_put(exp);
> }
> 
> +void nbd_export_set_description(NBDExport *exp, const char *description)
> +{
> +    g_free(exp->description);
> +    exp->description = g_strdup(description);
> +}
> +
> void nbd_export_close(NBDExport *exp)
> {
>     NBDClient *client, *next;
> @@ -886,6 +902,7 @@ void nbd_export_close(NBDExport *exp)
>         client_close(client);
>     }
>     nbd_export_set_name(exp, NULL);
> +    nbd_export_set_description(exp, NULL);
>     nbd_export_put(exp);
> }
> 
> @@ -904,6 +921,7 @@ void nbd_export_put(NBDExport *exp)
> 
>     if (--exp->refcount == 0) {
>         assert(exp->name == NULL);
> +        assert(exp->description == NULL);
> 
>         if (exp->close) {
>             exp->close(exp);
> diff --git a/qemu-nbd.c b/qemu-nbd.c
> index 71bfdeb..a85e98f 100644
> --- a/qemu-nbd.c
> +++ b/qemu-nbd.c
> @@ -77,6 +77,7 @@ static void usage(const char *name)
> "  -t, --persistent          don't exit on the last connection\n"
> "  -v, --verbose             display extra debugging information\n"
> "  -x, --export-name=NAME    expose export by name\n"
> +"  -D, --description=TEXT    with -x, also export a human-readable description\n"
> "\n"
> "Exposing part of the image:\n"
> "  -o, --offset=OFFSET       offset into the image\n"
> @@ -464,7 +465,7 @@ int main(int argc, char **argv)
>     off_t fd_size;
>     QemuOpts *sn_opts = NULL;
>     const char *sn_id_or_name = NULL;
> -    const char *sopt = "hVb:o:p:rsnP:c:dvk:e:f:tl:x:";
> +    const char *sopt = "hVb:o:p:rsnP:c:dvk:e:f:tl:x:D:";
>     struct option lopt[] = {
>         { "help", no_argument, NULL, 'h' },
>         { "version", no_argument, NULL, 'V' },
> @@ -490,6 +491,7 @@ int main(int argc, char **argv)
>         { "verbose", no_argument, NULL, 'v' },
>         { "object", required_argument, NULL, QEMU_NBD_OPT_OBJECT },
>         { "export-name", required_argument, NULL, 'x' },
> +        { "description", required_argument, NULL, 'D' },
>         { "tls-creds", required_argument, NULL, QEMU_NBD_OPT_TLSCREDS },
>         { "image-opts", no_argument, NULL, QEMU_NBD_OPT_IMAGE_OPTS },
>         { NULL, 0, NULL, 0 }
> @@ -509,6 +511,7 @@ int main(int argc, char **argv)
>     BlockdevDetectZeroesOptions detect_zeroes = BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF;
>     QDict *options = NULL;
>     const char *export_name = NULL;
> +    const char *export_description = NULL;
>     const char *tlscredsid = NULL;
>     bool imageOpts = false;
>     bool writethrough = true;
> @@ -672,6 +675,9 @@ int main(int argc, char **argv)
>         case 'x':
>             export_name = optarg;
>             break;
> +        case 'D':
> +            export_description = optarg;
> +            break;
>         case 'v':
>             verbose = 1;
>             break;
> @@ -899,7 +905,11 @@ int main(int argc, char **argv)
>     }
>     if (export_name) {
>         nbd_export_set_name(exp, export_name);
> +        nbd_export_set_description(exp, export_description);
>         newproto = true;
> +    } else if (export_description) {
> +        error_report("Export description requires an export name");
> +        exit(EXIT_FAILURE);
>     }
> 
>     server_ioc = qio_channel_socket_new();
> diff --git a/qemu-nbd.texi b/qemu-nbd.texi
> index 9f23343..923de74 100644
> --- a/qemu-nbd.texi
> +++ b/qemu-nbd.texi
> @@ -79,9 +79,12 @@ Disconnect the device @var{dev}
> Allow up to @var{num} clients to share the device (default @samp{1})
> @item -t, --persistent
> Don't exit on the last connection
> -@item -x NAME, --export-name=NAME
> +@item -x, --export-name=@var{name}
> Set the NBD volume export name. This switches the server to use
> the new style NBD protocol negotiation
> +@item -D, --description=@var{description}
> +Set the NBD volume export description, as a human-readable
> +string. Requires the use of @option{-x}
> @item --tls-creds=ID
> Enable mandatory TLS encryption for the server by setting the ID
> of the TLS credentials object previously created with the --object
> -- 
> 2.5.5
> 
> 
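The payload that nbd_negotiate_send_rep_list() writes in the patch above is a 4-byte big-endian name length, the name, and then the description, which runs to the end of the payload. The sketch below builds that payload into a caller-supplied buffer. The helper name build_list_payload and its buffer-based interface are assumptions made for illustration; the patch itself writes each field to the channel separately.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build the payload of one export-list reply:
 *   4 bytes  big-endian length of the name
 *   N bytes  export name (no trailing NUL)
 *   rest     human-readable description (no trailing NUL)
 * NULL name or description is treated as the empty string.
 * Returns the payload length, or 0 if it does not fit in cap bytes. */
size_t build_list_payload(uint8_t *buf, size_t cap,
                          const char *name, const char *desc)
{
    size_t name_len, desc_len, total;

    assert(buf != NULL);
    name = name ? name : "";
    desc = desc ? desc : "";
    name_len = strlen(name);
    desc_len = strlen(desc);
    total = 4 + name_len + desc_len;
    if (total > cap) {
        return 0;
    }

    /* Name length, most significant byte first. */
    buf[0] = (uint8_t)(name_len >> 24);
    buf[1] = (uint8_t)(name_len >> 16);
    buf[2] = (uint8_t)(name_len >> 8);
    buf[3] = (uint8_t)name_len;
    memcpy(buf + 4, name, name_len);
    memcpy(buf + 4 + name_len, desc, desc_len);
    return total;
}
```

Because the description has no length field of its own, a client recovers it by subtracting 4 and the name length from the overall reply length.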

-- 
Alex Bligh

* Re: [Qemu-devel] [PATCH v3 03/44] nbd: Improve server handling of bogus commands
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 03/44] nbd: Improve server handling of bogus commands Eric Blake
@ 2016-04-25  9:29   ` Alex Bligh
  0 siblings, 0 replies; 67+ messages in thread
From: Alex Bligh @ 2016-04-25  9:29 UTC (permalink / raw)
  To: Eric Blake; +Cc: Alex Bligh, qemu-devel, Paolo Bonzini, qemu-block


On 23 Apr 2016, at 00:40, Eric Blake <eblake@redhat.com> wrote:

> We have a few bugs in how we handle invalid client commands:
> 
> - A client can send an NBD_CMD_DISC where from + len overflows,
> convincing us to reply with an error and stay connected, even
> though the protocol requires us to silently disconnect. Fix by
> hoisting the special case sooner.
> 
> - A client can send an NBD_CMD_WRITE with bogus from and len,
> where we reply to the client with EINVAL without consuming the
> payload; this will normally cause us to fail if the next thing
> read is not the right magic, but in rare cases, could cause us
> to interpret the data payload as valid commands and do things
> not requested by the client. Fix by adding a complete flag to
> track whether we are in sync or must disconnect.
> 
> - If we report an error to NBD_CMD_READ, we are not writing out
> any data payload; but the protocol says that a client can expect
> to read the payload no matter what (and must instead ignore it),
> which means the client will start reading our next replies as
> its data payload. Fix by disconnecting.
> 
> Furthermore, we have split the checks for bogus from/len across
> two functions, when it is easier to do it all at once.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> nbd/server.c | 67 +++++++++++++++++++++++++++++++++++++++++-------------------
> 1 file changed, 46 insertions(+), 21 deletions(-)
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index 0a003e4..6a6b5a2 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -52,6 +52,7 @@ struct NBDRequest {
>     QSIMPLEQ_ENTRY(NBDRequest) entry;
>     NBDClient *client;
>     uint8_t *data;
> +    bool complete;
> };
> 
> struct NBDExport {
> @@ -985,7 +986,13 @@ static ssize_t nbd_co_send_reply(NBDRequest *req, struct nbd_reply *reply,
>     return rc;
> }
> 
> -static ssize_t nbd_co_receive_request(NBDRequest *req, struct nbd_request *request)
> +/* Collect a client request.  Return 0 if request looks valid, -EAGAIN
> + * to keep trying the collection, -EIO to drop connection right away,
> + * and any other negative value to report an error to the client
> + * (although the caller may still need to disconnect after reporting
> + * the error).  */
> +static ssize_t nbd_co_receive_request(NBDRequest *req,
> +                                      struct nbd_request *request)
> {
>     NBDClient *client = req->client;
>     uint32_t command;
> @@ -1003,16 +1010,26 @@ static ssize_t nbd_co_receive_request(NBDRequest *req, struct nbd_request *reque
>         goto out;
>     }
> 
> -    if ((request->from + request->len) < request->from) {
> -        LOG("integer overflow detected! "
> -            "you're probably being attacked");
> -        rc = -EINVAL;
> -        goto out;
> -    }
> -
>     TRACE("Decoding type");
> 
>     command = request->type & NBD_CMD_MASK_COMMAND;
> +    if (command == NBD_CMD_DISC) {
> +        /* Special case: we're going to disconnect without a reply,
> +         * whether or not flags, from, or len are bogus */
> +        TRACE("Request type is DISCONNECT");
> +        rc = -EIO;
> +        goto out;
> +    }
> +
> +    /* Check for sanity in the parameters, part 1.  Defer as many
> +     * checks as possible until after reading any NBD_CMD_WRITE
> +     * payload, so we can try and keep the connection alive.  */
> +    if ((request->from + request->len) < request->from) {
> +        LOG("integer overflow detected, you're probably being attacked");

I realise this is unchanged since the previous code, but why
'you're probably being attacked'? More likely, you're using a
buggy client.

> +        rc = -EINVAL;
> +        goto out;
> +    }
> +
>     if (command == NBD_CMD_READ || command == NBD_CMD_WRITE) {
>         if (request->len > NBD_MAX_BUFFER_SIZE) {
>             LOG("len (%" PRIu32" ) is larger than max len (%u)",
> @@ -1035,7 +1052,18 @@ static ssize_t nbd_co_receive_request(NBDRequest *req, struct nbd_request *reque
>             rc = -EIO;
>             goto out;
>         }
> +        req->complete = true;
>     }
> +
> +    /* Sanity checks, part 2. */
> +    if (request->from + request->len > client->exp->size) {
> +        LOG("operation past EOF; From: %" PRIu64 ", Len: %" PRIu32
> +            ", Size: %" PRIu64, request->from, request->len,
> +            (uint64_t)client->exp->size);
> +        rc = -EINVAL;
> +        goto out;
> +    }
> +
>     rc = 0;
> 
> out:
> @@ -1077,14 +1105,6 @@ static void nbd_trip(void *opaque)
>         goto error_reply;
>     }
>     command = request.type & NBD_CMD_MASK_COMMAND;
> -    if (command != NBD_CMD_DISC && (request.from + request.len) > exp->size) {
> -            LOG("From: %" PRIu64 ", Len: %" PRIu32", Size: %" PRIu64
> -                ", Offset: %" PRIu64 "\n",
> -                request.from, request.len,
> -                (uint64_t)exp->size, (uint64_t)exp->dev_offset);
> -        LOG("requested operation past EOF--bad client?");
> -        goto invalid_request;
> -    }
> 
>     if (client->closing) {
>         /*
> @@ -1151,10 +1171,11 @@ static void nbd_trip(void *opaque)
>             goto out;
>         }
>         break;
> +
>     case NBD_CMD_DISC:
> -        TRACE("Request type is DISCONNECT");
> -        errno = 0;
> -        goto out;
> +        /* unreachable, thanks to special case in nbd_co_receive_request() */
> +        abort();
> +
>     case NBD_CMD_FLUSH:
>         TRACE("Request type is FLUSH");
> 
> @@ -1182,10 +1203,14 @@ static void nbd_trip(void *opaque)
>         break;
>     default:
>         LOG("invalid request type (%" PRIu32 ") received", request.type);
> -    invalid_request:
>         reply.error = EINVAL;
>     error_reply:
> -        if (nbd_co_send_reply(req, &reply, 0) < 0) {
> +        /* We must disconnect after replying with an error to
> +         * NBD_CMD_READ, since we choose not to send bogus filler
> +         * data; likewise after NBD_CMD_WRITE if we did not read the
> +         * payload. */
> +        if (nbd_co_send_reply(req, &reply, 0) < 0 || command == NBD_CMD_READ ||
> +            (command == NBD_CMD_WRITE && !req->complete)) {
>             goto out;
>         }
>         break;
> -- 
> 2.5.5
> 
> 
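Two pieces of the logic in the patch above can be shown on their own: the from/len sanity checks, and the decision to disconnect after an error reply. The sketch below is an assumption-laden illustration, not the server code. The function names are hypothetical, CMD_FLUSH is a sketch-local value, and the real checks live in nbd_co_receive_request() and nbd_trip().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch-local command numbers. */
enum { CMD_READ = 0, CMD_WRITE = 1, CMD_FLUSH = 3 };

/* Return 0 if [from, from + len) lies within the export, else -EINVAL. */
int check_request_range(uint64_t from, uint32_t len, uint64_t export_size)
{
    /* Part 1: unsigned wraparound means from + len overflowed. */
    if (from + len < from) {
        return -EINVAL;
    }
    /* Part 2: the request must not extend past the end of the export. */
    if (from + len > export_size) {
        return -EINVAL;
    }
    /* Invariant after both checks, stated without any addition. */
    assert(export_size - from >= len);
    return 0;
}

/* After sending an error reply, must the server drop the connection?
 * A failed READ sends no filler payload, and a WRITE whose payload was
 * never consumed leaves the stream out of sync; both force a disconnect. */
bool must_disconnect_after_error(uint32_t command, bool payload_complete)
{
    return command == CMD_READ ||
           (command == CMD_WRITE && !payload_complete);
}
```

Deferring the EOF check until after the WRITE payload has been read is what lets the server keep the connection alive for an out-of-range write.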

-- 
Alex Bligh

* Re: [Qemu-devel] [PATCH v3 29/44] nbd: Avoid magic number for NBD max name size
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 29/44] nbd: Avoid magic number for NBD max name size Eric Blake
@ 2016-04-25  9:32   ` Alex Bligh
  0 siblings, 0 replies; 67+ messages in thread
From: Alex Bligh @ 2016-04-25  9:32 UTC (permalink / raw)
  To: Eric Blake
  Cc: Alex Bligh, qemu-devel, Kevin Wolf, Paolo Bonzini, qemu block, Max Reitz


On 23 Apr 2016, at 00:40, Eric Blake <eblake@redhat.com> wrote:

> Declare a constant and use that when determining if an export
> name fits within the constraints we are willing to support.
> 
> Note that upstream NBD recently documented that clients MUST
> support export names of 256 bytes (not including trailing NUL),
> and SHOULD support names up to 4096 bytes.  4096 is a bit big
> (we would lose benefits of stack-allocation of a name array),
> and we already have other limits in place (for example, qcow2
> snapshot names are clamped around 1024).  So for now, just
> stick to the required minimum.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alex Bligh <alex@alex.org.uk>
> 
> ---
> v3: enlarge the limit, and document choice of new value
> ---
> include/block/nbd.h | 6 ++++++
> nbd/client.c        | 2 +-
> nbd/server.c        | 4 ++--
> 3 files changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 3e2d76b..2c753cc 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -76,6 +76,12 @@ enum {
> 
> /* Maximum size of a single READ/WRITE data buffer */
> #define NBD_MAX_BUFFER_SIZE (32 * 1024 * 1024)
> +/* Maximum size of an export name. The NBD spec requires 256 and
> + * suggests that servers support up to 4096, but we stick to only the
> + * required size so that we can stack-allocate the names, and because
> + * going larger would require an audit of more code to make sure we
> + * aren't overflowing some other buffer. */
> +#define NBD_MAX_NAME_SIZE 256
> 
> ssize_t nbd_wr_syncv(QIOChannel *ioc,
>                      struct iovec *iov,
> diff --git a/nbd/client.c b/nbd/client.c
> index 4659df3..b700100 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -210,7 +210,7 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
>             error_setg(errp, "incorrect option name length");
>             return -1;
>         }
> -        if (namelen > 255) {
> +        if (namelen > NBD_MAX_NAME_SIZE) {
>             error_setg(errp, "export name length too long %" PRIu32, namelen);
>             return -1;
>         }
> diff --git a/nbd/server.c b/nbd/server.c
> index fa05a73..a20bf8a 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -296,13 +296,13 @@ static int nbd_negotiate_handle_list(NBDClient *client, uint32_t length)
> static int nbd_negotiate_handle_export_name(NBDClient *client, uint32_t length)
> {
>     int rc = -EINVAL;
> -    char name[256];
> +    char name[NBD_MAX_NAME_SIZE + 1];
> 
>     /* Client sends:
>         [20 ..  xx]   export name (length bytes)
>      */
>     TRACE("Checking length");
> -    if (length > 255) {
> +    if (length >= sizeof(name)) {
>         LOG("Bad length received");
>         goto fail;
>     }
> -- 
> 2.5.5
> 
> 
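
For anyone following the sizing argument, here is a minimal sketch of the bound check the patch relies on: a stack buffer of NBD_MAX_NAME_SIZE + 1 bytes holds a name of up to 256 bytes plus the trailing NUL. The helper name copy_export_name() is hypothetical and not part of the patch; only the 256 limit comes from it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NBD_MAX_NAME_SIZE 256   /* value taken from the patch */

/* Hypothetical helper (not in the patch): copy `length` bytes of a
 * wire-format export name, which carries no trailing NUL, into `dst`.
 * `dst` must provide NBD_MAX_NAME_SIZE + 1 bytes.
 * Returns 0 on success, -1 if the name exceeds the limit. */
static int copy_export_name(char dst[NBD_MAX_NAME_SIZE + 1],
                            const char *wire, uint32_t length)
{
    assert(dst != NULL && wire != NULL);
    if (length > NBD_MAX_NAME_SIZE) {
        return -1;              /* no room left for the terminator */
    }
    memcpy(dst, wire, length);
    dst[length] = '\0';
    return 0;
}
```

A caller would declare `char name[NBD_MAX_NAME_SIZE + 1];` on the stack, which is the same shape as the `length >= sizeof(name)` test in the server hunk.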

-- 
Alex Bligh


* Re: [Qemu-devel] [PATCH v3 30/44] nbd: Treat flags vs. command type as separate fields
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 30/44] nbd: Treat flags vs. command type as separate fields Eric Blake
@ 2016-04-25  9:34   ` Alex Bligh
  0 siblings, 0 replies; 67+ messages in thread
From: Alex Bligh @ 2016-04-25  9:34 UTC (permalink / raw)
  To: Eric Blake
  Cc: Alex Bligh, qemu-devel, Kevin Wolf, Paolo Bonzini, qemu block, Max Reitz


On 23 Apr 2016, at 00:40, Eric Blake <eblake@redhat.com> wrote:

> Current upstream NBD documents that requests have a 16-bit flags field,
> followed by a 16-bit type integer; although older versions mentioned
> only a 32-bit field with masking to find flags.  Since the protocol
> is in network order (big-endian over the wire), the ABI is unchanged;
> but dealing with the flags as a separate field rather than masking
> will make it easier to add support for upcoming NBD extensions that
> increase the number of both flags and commands.
> 
> Improve some comments in nbd.h based on the current upstream
> NBD protocol (https://github.com/yoe/nbd/blob/master/doc/proto.md),
> and touch some nearby code to keep checkpatch.pl happy.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alex Bligh <alex@alex.org.uk>
> 
> ---
> v3: rebase to other changes earlier in series
> ---
> include/block/nbd.h | 18 ++++++++++++------
> nbd/nbd-internal.h  |  4 ++--
> block/nbd-client.c  |  9 +++------
> nbd/client.c        | 17 ++++++++++-------
> nbd/server.c        | 51 ++++++++++++++++++++++++++-------------------------
> 5 files changed, 53 insertions(+), 46 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 2c753cc..f4ae86c 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -1,4 +1,5 @@
> /*
> + *  Copyright (C) 2016 Red Hat, Inc.
>  *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
>  *
>  *  Network Block Device
> @@ -27,7 +28,8 @@
> 
> struct nbd_request {
>     uint32_t magic;
> -    uint32_t type;
> +    uint16_t flags;
> +    uint16_t type;
>     uint64_t handle;
>     uint64_t from;
>     uint32_t len;
> @@ -39,6 +41,8 @@ struct nbd_reply {
>     uint64_t handle;
> } QEMU_PACKED;
> 
> +/* Transmission (export) flags: sent from server to client during handshake,
> +   but describe what will happen during transmission */
> #define NBD_FLAG_HAS_FLAGS      (1 << 0)        /* Flags are there */
> #define NBD_FLAG_READ_ONLY      (1 << 1)        /* Device is read-only */
> #define NBD_FLAG_SEND_FLUSH     (1 << 2)        /* Send FLUSH */
> @@ -46,10 +50,12 @@ struct nbd_reply {
> #define NBD_FLAG_ROTATIONAL     (1 << 4)        /* Use elevator algorithm - rotational media */
> #define NBD_FLAG_SEND_TRIM      (1 << 5)        /* Send TRIM (discard) */
> 
> -/* New-style global flags. */
> +/* New-style handshake (global) flags, sent from server to client, and
> +   control what will happen during handshake phase. */
> #define NBD_FLAG_FIXED_NEWSTYLE     (1 << 0)    /* Fixed newstyle protocol. */
> 
> -/* New-style client flags. */
> +/* New-style client flags, sent from client to server to control what happens
> +   during handshake phase. */
> #define NBD_FLAG_C_FIXED_NEWSTYLE   (1 << 0)    /* Fixed newstyle protocol. */
> 
> /* Reply types. */
> @@ -60,10 +66,10 @@ struct nbd_reply {
> #define NBD_REP_ERR_INVALID     ((UINT32_C(1) << 31) | 3) /* Invalid length. */
> #define NBD_REP_ERR_TLS_REQD    ((UINT32_C(1) << 31) | 5) /* TLS required */
> 
> +/* Request flags, sent from client to server during transmission phase */
> +#define NBD_CMD_FLAG_FUA        (1 << 0)
> 
> -#define NBD_CMD_MASK_COMMAND	0x0000ffff
> -#define NBD_CMD_FLAG_FUA	(1 << 16)
> -
> +/* Supported request types */
> enum {
>     NBD_CMD_READ = 0,
>     NBD_CMD_WRITE = 1,
> diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
> index 035ead4..d0108a1 100644
> --- a/nbd/nbd-internal.h
> +++ b/nbd/nbd-internal.h
> @@ -52,10 +52,10 @@
> /* This is all part of the "official" NBD API.
>  *
>  * The most up-to-date documentation is available at:
> - * https://github.com/yoe/nbd/blob/master/doc/proto.txt
> + * https://github.com/yoe/nbd/blob/master/doc/proto.md
>  */
> 
> -#define NBD_REQUEST_SIZE        (4 + 4 + 8 + 8 + 4)
> +#define NBD_REQUEST_SIZE        (4 + 2 + 2 + 8 + 8 + 4)
> #define NBD_REPLY_SIZE          (4 + 4 + 8)
> #define NBD_REQUEST_MAGIC       0x25609513
> #define NBD_REPLY_MAGIC         0x67446698
> diff --git a/block/nbd-client.c b/block/nbd-client.c
> index 878e879..285025d 100644
> --- a/block/nbd-client.c
> +++ b/block/nbd-client.c
> @@ -1,6 +1,7 @@
> /*
>  * QEMU Block driver for  NBD
>  *
> + * Copyright (C) 2016 Red Hat, Inc.
>  * Copyright (C) 2008 Bull S.A.S.
>  *     Author: Laurent Vivier <Laurent.Vivier@bull.net>
>  *
> @@ -252,7 +253,7 @@ static int nbd_co_writev_1(BlockDriverState *bs, int64_t sector_num,
> 
>     if ((*flags & BDRV_REQ_FUA) && (client->nbdflags & NBD_FLAG_SEND_FUA)) {
>         *flags &= ~BDRV_REQ_FUA;
> -        request.type |= NBD_CMD_FLAG_FUA;
> +        request.flags |= NBD_CMD_FLAG_FUA;
>     }
> 
>     request.from = sector_num * 512;
> @@ -376,11 +377,7 @@ void nbd_client_attach_aio_context(BlockDriverState *bs,
> void nbd_client_close(BlockDriverState *bs)
> {
>     NbdClientSession *client = nbd_get_client_session(bs);
> -    struct nbd_request request = {
> -        .type = NBD_CMD_DISC,
> -        .from = 0,
> -        .len = 0
> -    };
> +    struct nbd_request request = { .type = NBD_CMD_DISC };
> 
>     if (client->ioc == NULL) {
>         return;
> diff --git a/nbd/client.c b/nbd/client.c
> index b700100..a9e173a 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -1,4 +1,5 @@
> /*
> + *  Copyright (C) 2016 Red Hat, Inc.
>  *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
>  *
>  *  Network Block Device Client Side
> @@ -713,14 +714,16 @@ ssize_t nbd_send_request(QIOChannel *ioc, struct nbd_request *request)
> 
>     TRACE("Sending request to server: "
>           "{ .from = %" PRIu64", .len = %" PRIu32 ", .handle = %" PRIu64
> -          ", .type=%" PRIu16 " }",
> -          request->from, request->len, request->handle, request->type);
> +          ", .flags = %" PRIx16 ", .type = %" PRIu16 " }",
> +          request->from, request->len, request->handle,
> +          request->flags, request->type);
> 
> -    cpu_to_be32w((uint32_t*)buf, NBD_REQUEST_MAGIC);
> -    cpu_to_be32w((uint32_t*)(buf + 4), request->type);
> -    cpu_to_be64w((uint64_t*)(buf + 8), request->handle);
> -    cpu_to_be64w((uint64_t*)(buf + 16), request->from);
> -    cpu_to_be32w((uint32_t*)(buf + 24), request->len);
> +    cpu_to_be32w((uint32_t *)buf, NBD_REQUEST_MAGIC);
> +    cpu_to_be16w((uint16_t *)(buf + 4), request->flags);
> +    cpu_to_be16w((uint16_t *)(buf + 6), request->type);
> +    cpu_to_be64w((uint64_t *)(buf + 8), request->handle);
> +    cpu_to_be64w((uint64_t *)(buf + 16), request->from);
> +    cpu_to_be32w((uint32_t *)(buf + 24), request->len);
> 
>     ret = write_sync(ioc, buf, sizeof(buf));
>     if (ret < 0) {
> diff --git a/nbd/server.c b/nbd/server.c
> index a20bf8a..1d30b6d 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -1,4 +1,5 @@
> /*
> + *  Copyright (C) 2016 Red Hat, Inc.
>  *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
>  *
>  *  Network Block Device Server Side
> @@ -646,21 +647,23 @@ static ssize_t nbd_receive_request(QIOChannel *ioc, struct nbd_request *request)
> 
>     /* Request
>        [ 0 ..  3]   magic   (NBD_REQUEST_MAGIC)
> -       [ 4 ..  7]   type    (0 == READ, 1 == WRITE)
> +       [ 4 ..  5]   flags   (NBD_CMD_FLAG_FUA, ...)
> +       [ 6 ..  7]   type    (NBD_CMD_READ, ...)
>        [ 8 .. 15]   handle
>        [16 .. 23]   from
>        [24 .. 27]   len
>      */
> 
> -    magic = be32_to_cpup((uint32_t*)buf);
> -    request->type  = be32_to_cpup((uint32_t*)(buf + 4));
> -    request->handle = be64_to_cpup((uint64_t*)(buf + 8));
> -    request->from  = be64_to_cpup((uint64_t*)(buf + 16));
> -    request->len   = be32_to_cpup((uint32_t*)(buf + 24));
> +    magic = be32_to_cpup((uint32_t *)buf);
> +    request->flags = be16_to_cpup((uint16_t *)(buf + 4));
> +    request->type  = be16_to_cpup((uint16_t *)(buf + 6));
> +    request->handle = be64_to_cpup((uint64_t *)(buf + 8));
> +    request->from  = be64_to_cpup((uint64_t *)(buf + 16));
> +    request->len   = be32_to_cpup((uint32_t *)(buf + 24));
> 
> -    TRACE("Got request: { magic = 0x%" PRIx32 ", .type = %" PRIx32
> -          ", from = %" PRIu64 " , len = %" PRIu32 " }",
> -          magic, request->type, request->from, request->len);
> +    TRACE("Got request: { magic = 0x%" PRIx32 ", .flags = %" PRIx16
> +          ", .type = %" PRIx16 ", from = %" PRIu64 " , len = %" PRIu32 " }",
> +          magic, request->flags, request->type, request->from, request->len);
> 
>     if (magic != NBD_REQUEST_MAGIC) {
>         LOG("invalid magic (got 0x%" PRIx32 ")", magic);
> @@ -993,7 +996,6 @@ static ssize_t nbd_co_receive_request(NBDRequest *req,
>                                       struct nbd_request *request)
> {
>     NBDClient *client = req->client;
> -    uint32_t command;
>     ssize_t rc;
> 
>     g_assert(qemu_in_coroutine());
> @@ -1010,8 +1012,7 @@ static ssize_t nbd_co_receive_request(NBDRequest *req,
> 
>     TRACE("Decoding type");
> 
> -    command = request->type & NBD_CMD_MASK_COMMAND;
> -    if (command == NBD_CMD_DISC) {
> +    if (request->type == NBD_CMD_DISC) {
>         /* Special case: we're going to disconnect without a reply,
>          * whether or not flags, from, or len are bogus */
>         TRACE("Request type is DISCONNECT");
> @@ -1028,7 +1029,7 @@ static ssize_t nbd_co_receive_request(NBDRequest *req,
>         goto out;
>     }
> 
> -    if (command == NBD_CMD_READ || command == NBD_CMD_WRITE) {
> +    if (request->type == NBD_CMD_READ || request->type == NBD_CMD_WRITE) {
>         if (request->len > NBD_MAX_BUFFER_SIZE) {
>             LOG("len (%" PRIu32" ) is larger than max len (%u)",
>                 request->len, NBD_MAX_BUFFER_SIZE);
> @@ -1042,7 +1043,7 @@ static ssize_t nbd_co_receive_request(NBDRequest *req,
>             goto out;
>         }
>     }
> -    if (command == NBD_CMD_WRITE) {
> +    if (request->type == NBD_CMD_WRITE) {
>         TRACE("Reading %" PRIu32 " byte(s)", request->len);
> 
>         if (read_sync(client->ioc, req->data, request->len) != request->len) {
> @@ -1061,10 +1062,10 @@ static ssize_t nbd_co_receive_request(NBDRequest *req,
>         rc = -EINVAL;
>         goto out;
>     }
> -    if (request->type & ~NBD_CMD_MASK_COMMAND & ~NBD_CMD_FLAG_FUA) {
> -        LOG("unsupported flags (got 0x%x)",
> -            request->type & ~NBD_CMD_MASK_COMMAND);
> -        return -EINVAL;
> +    if (request->flags & ~NBD_CMD_FLAG_FUA) {
> +        LOG("unsupported flags (got 0x%x)", request->flags);
> +        rc = -EINVAL;
> +        goto out;
>     }
> 
>     rc = 0;
> @@ -1084,7 +1085,6 @@ static void nbd_trip(void *opaque)
>     struct nbd_request request;
>     struct nbd_reply reply;
>     ssize_t ret;
> -    uint32_t command;
>     int flags;
> 
>     TRACE("Reading request.");
> @@ -1108,7 +1108,6 @@ static void nbd_trip(void *opaque)
>         reply.error = -ret;
>         goto error_reply;
>     }
> -    command = request.type & NBD_CMD_MASK_COMMAND;
> 
>     if (client->closing) {
>         /*
> @@ -1118,11 +1117,12 @@ static void nbd_trip(void *opaque)
>         goto done;
>     }
> 
> -    switch (command) {
> +    switch (request.type) {
>     case NBD_CMD_READ:
>         TRACE("Request type is READ");
> 
> -        if (request.type & NBD_CMD_FLAG_FUA) {
> +        /* XXX: NBD Protocol only documents use of FUA with WRITE */
> +        if (request.flags & NBD_CMD_FLAG_FUA) {
>             ret = blk_co_flush(exp->blk);
>             if (ret < 0) {
>                 LOG("flush failed");
> @@ -1155,7 +1155,7 @@ static void nbd_trip(void *opaque)
>         TRACE("Writing to device");
> 
>         flags = 0;
> -        if (request.type & NBD_CMD_FLAG_FUA) {
> +        if (request.flags & NBD_CMD_FLAG_FUA) {
>             flags |= BDRV_REQ_FUA;
>         }
>         ret = blk_pwrite(exp->blk, request.from + exp->dev_offset,
> @@ -1208,8 +1208,9 @@ static void nbd_trip(void *opaque)
>          * NBD_CMD_READ, since we choose not to send bogus filler
>          * data; likewise after NBD_CMD_WRITE if we did not read the
>          * payload. */
> -        if (nbd_co_send_reply(req, &reply, 0) < 0 || command == NBD_CMD_READ ||
> -            (command == NBD_CMD_WRITE && !req->complete)) {
> +        if (nbd_co_send_reply(req, &reply, 0) < 0 ||
> +            request.type == NBD_CMD_READ ||
> +            (request.type == NBD_CMD_WRITE && !req->complete)) {
>             goto out;
>         }
>         break;
> -- 
> 2.5.5
> 
> 
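
The claim that the wire format is unchanged can be checked with a small standalone sketch. It assumes C11 and uses portable byte stores instead of the QEMU cpu_to_be*w() helpers. The function names encode_request() and encode_request_old() are hypothetical; the magic, sizes, offsets and flag values are the ones in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NBD_REQUEST_MAGIC 0x25609513u
#define NBD_REQUEST_SIZE  (4 + 2 + 2 + 8 + 8 + 4)
#define NBD_CMD_FLAG_FUA  (1 << 0)
#define NBD_CMD_WRITE     1

static_assert(NBD_REQUEST_SIZE == 28, "request header is 28 bytes");

/* Store big-endian integers one byte at a time (no alignment needs). */
static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

static void put_be32(uint8_t *p, uint32_t v)
{
    put_be16(p, (uint16_t)(v >> 16));
    put_be16(p + 2, (uint16_t)(v & 0xffff));
}

static void put_be64(uint8_t *p, uint64_t v)
{
    put_be32(p, (uint32_t)(v >> 32));
    put_be32(p + 4, (uint32_t)(v & 0xffffffffu));
}

/* New layout: 16-bit flags at offset 4, 16-bit type at offset 6. */
static void encode_request(uint8_t buf[NBD_REQUEST_SIZE], uint16_t flags,
                           uint16_t type, uint64_t handle, uint64_t from,
                           uint32_t len)
{
    put_be32(buf, NBD_REQUEST_MAGIC);
    put_be16(buf + 4, flags);
    put_be16(buf + 6, type);
    put_be64(buf + 8, handle);
    put_be64(buf + 16, from);
    put_be32(buf + 24, len);
}

/* Old layout: one 32-bit field, flags held in the upper 16 bits. */
static void encode_request_old(uint8_t buf[NBD_REQUEST_SIZE],
                               uint32_t type_and_flags, uint64_t handle,
                               uint64_t from, uint32_t len)
{
    put_be32(buf, NBD_REQUEST_MAGIC);
    put_be32(buf + 4, type_and_flags);
    put_be64(buf + 8, handle);
    put_be64(buf + 16, from);
    put_be32(buf + 24, len);
}
```

Encoding a FUA write both ways (flags 1 and type 1 in the new layout, `(1 << 16) | 1` in the old one) yields the same 28 bytes, since big-endian order puts the upper half of the old 32-bit field first.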

-- 
Alex Bligh


* Re: [Qemu-devel] [PATCH v3 31/44] nbd: Share common reply-sending code in server
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 31/44] nbd: Share common reply-sending code in server Eric Blake
@ 2016-04-25  9:34   ` Alex Bligh
  0 siblings, 0 replies; 67+ messages in thread
From: Alex Bligh @ 2016-04-25  9:34 UTC (permalink / raw)
  To: Eric Blake; +Cc: Alex Bligh, qemu-devel, Paolo Bonzini, qemu-block


On 23 Apr 2016, at 00:40, Eric Blake <eblake@redhat.com> wrote:

> Rather than open-coding NBD_REP_SERVER, reuse the code we
> already have by adding a length parameter.  Additionally,
> the refactoring will make adding NBD_OPT_GO in a later patch
> easier.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alex Bligh <alex@alex.org.uk>
> 
> ---
> v3: rebase to changes earlier in series
> ---
> nbd/server.c | 48 +++++++++++++++++++++++-------------------------
> 1 file changed, 23 insertions(+), 25 deletions(-)
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index 1d30b6d..4435d37 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -195,12 +195,15 @@ static ssize_t nbd_negotiate_drop_sync(QIOChannel *ioc, size_t size)
> 
> */
> 
> -static int nbd_negotiate_send_rep(QIOChannel *ioc, uint32_t type, uint32_t opt)
> +/* Send a reply header, including length, but no payload.
> + * Return -errno to kill connection, 0 to continue negotiation. */
> +static int nbd_negotiate_send_rep_len(QIOChannel *ioc, uint32_t type,
> +                                      uint32_t opt, uint32_t len)
> {
>     uint64_t magic;
> -    uint32_t len;
> 
> -    TRACE("Reply opt=%" PRIx32 " type=%" PRIx32, type, opt);
> +    TRACE("Reply opt=%" PRIx32 " type=%" PRIx32 " len=%" PRIu32,
> +          type, opt, len);
> 
>     magic = cpu_to_be64(NBD_REP_MAGIC);
>     if (nbd_negotiate_write(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
> @@ -217,7 +220,7 @@ static int nbd_negotiate_send_rep(QIOChannel *ioc, uint32_t type, uint32_t opt)
>         LOG("write failed (rep type)");
>         return -EINVAL;
>     }
> -    len = cpu_to_be32(0);
> +    len = cpu_to_be32(len);
>     if (nbd_negotiate_write(ioc, &len, sizeof(len)) != sizeof(len)) {
>         LOG("write failed (rep data length)");
>         return -EINVAL;
> @@ -225,37 +228,32 @@ static int nbd_negotiate_send_rep(QIOChannel *ioc, uint32_t type, uint32_t opt)
>     return 0;
> }
> 
> +/* Send a reply header with default 0 length.
> + * Return -errno to kill connection, 0 to continue negotiation. */
> +static int nbd_negotiate_send_rep(QIOChannel *ioc, uint32_t type, uint32_t opt)
> +{
> +    return nbd_negotiate_send_rep_len(ioc, type, opt, 0);
> +}
> +
> +/* Send an NBD_REP_SERVER reply to NBD_OPT_LIST, including payload.
> + * Return -errno to kill connection, 0 to continue negotiation. */
> static int nbd_negotiate_send_rep_list(QIOChannel *ioc, NBDExport *exp)
> {
> -    uint64_t magic;
>     size_t name_len, desc_len;
> -    uint32_t opt, type, len;
> +    uint32_t len;
>     const char *name = exp->name ? exp->name : "";
>     const char *desc = exp->description ? exp->description : "";
> +    int rc;
> 
>     TRACE("Advertising export name '%s' description '%s'", name, desc);
>     name_len = strlen(name);
>     desc_len = strlen(desc);
> -    magic = cpu_to_be64(NBD_REP_MAGIC);
> -    if (nbd_negotiate_write(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
> -        LOG("write failed (magic)");
> -        return -EINVAL;
> -     }
> -    opt = cpu_to_be32(NBD_OPT_LIST);
> -    if (nbd_negotiate_write(ioc, &opt, sizeof(opt)) != sizeof(opt)) {
> -        LOG("write failed (opt)");
> -        return -EINVAL;
> -    }
> -    type = cpu_to_be32(NBD_REP_SERVER);
> -    if (nbd_negotiate_write(ioc, &type, sizeof(type)) != sizeof(type)) {
> -        LOG("write failed (reply type)");
> -        return -EINVAL;
> -    }
> -    len = cpu_to_be32(name_len + desc_len + sizeof(len));
> -    if (nbd_negotiate_write(ioc, &len, sizeof(len)) != sizeof(len)) {
> -        LOG("write failed (length)");
> -        return -EINVAL;
> +    len = name_len + desc_len + sizeof(len);
> +    rc = nbd_negotiate_send_rep_len(ioc, NBD_REP_SERVER, NBD_OPT_LIST, len);
> +    if (rc < 0) {
> +        return rc;
>     }
> +
>     len = cpu_to_be32(name_len);
>     if (nbd_negotiate_write(ioc, &len, sizeof(len)) != sizeof(len)) {
>         LOG("write failed (name length)");
> -- 
> 2.5.5
> 
> 
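
To make the refactoring concrete, here is a rough sketch of the byte layout that the shared header helper plus the NBD_REP_SERVER payload produce. It builds the reply in a memory buffer rather than writing to a QIOChannel, and assumes C11. The names NBD_REP_HDR_LEN, put_rep_header() and build_rep_server() are hypothetical. The numeric values of NBD_OPT_LIST and NBD_REP_SERVER are taken from the NBD protocol document, not from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NBD_REP_MAGIC   0x0003e889045565a9ULL
#define NBD_OPT_LIST    3                   /* per the NBD protocol doc */
#define NBD_REP_SERVER  2                   /* per the NBD protocol doc */
#define NBD_REP_HDR_LEN (8 + 4 + 4 + 4)     /* hypothetical name */

static_assert(NBD_REP_HDR_LEN == 20, "option reply header is 20 bytes");

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Header only: magic, option, reply type, payload length. */
static void put_rep_header(uint8_t *p, uint32_t type, uint32_t opt,
                           uint32_t len)
{
    put_be32(p, (uint32_t)(NBD_REP_MAGIC >> 32));
    put_be32(p + 4, (uint32_t)(NBD_REP_MAGIC & 0xffffffffu));
    put_be32(p + 8, opt);
    put_be32(p + 12, type);
    put_be32(p + 16, len);
}

/* NBD_REP_SERVER: header, 32-bit name length, name, then description.
 * Returns the number of bytes written, or 0 if `cap` is too small. */
static size_t build_rep_server(uint8_t *out, size_t cap,
                               const char *name, const char *desc)
{
    size_t name_len = strlen(name);
    size_t desc_len = strlen(desc);
    size_t payload = sizeof(uint32_t) + name_len + desc_len;

    if (cap < NBD_REP_HDR_LEN || payload > cap - NBD_REP_HDR_LEN) {
        return 0;
    }
    put_rep_header(out, NBD_REP_SERVER, NBD_OPT_LIST, (uint32_t)payload);
    put_be32(out + NBD_REP_HDR_LEN, (uint32_t)name_len);
    memcpy(out + NBD_REP_HDR_LEN + 4, name, name_len);
    memcpy(out + NBD_REP_HDR_LEN + 4 + name_len, desc, desc_len);
    return NBD_REP_HDR_LEN + payload;
}
```

The header length field covers the 4-byte name length plus both strings, matching `name_len + desc_len + sizeof(len)` in the hunk above.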

-- 
Alex Bligh


* Re: [Qemu-devel] [PATCH v3 32/44] nbd: Share common option-sending code in client
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 32/44] nbd: Share common option-sending code in client Eric Blake
@ 2016-04-25  9:37   ` Alex Bligh
  0 siblings, 0 replies; 67+ messages in thread
From: Alex Bligh @ 2016-04-25  9:37 UTC (permalink / raw)
  To: Eric Blake
  Cc: Alex Bligh, qemu-devel, Kevin Wolf, Paolo Bonzini, qemu block, Max Reitz


On 23 Apr 2016, at 00:40, Eric Blake <eblake@redhat.com> wrote:

> Rather than open-coding each option request, it's easier to
> have common helper functions do the work.  That in turn requires
> having convenient packed types for handling option requests
> and replies.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: Alex Bligh <alex@alex.org.uk>
> 
> ---
> v3: rebase, tweak a debug message
> ---
> include/block/nbd.h |  29 +++++-
> nbd/nbd-internal.h  |   2 +-
> nbd/client.c        | 250 ++++++++++++++++++++++------------------------------
> 3 files changed, 129 insertions(+), 152 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index f4ae86c..5227ec6 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -26,20 +26,41 @@
> #include "io/channel-socket.h"
> #include "crypto/tlscreds.h"
> 
> +/* Handshake phase structs */
> +
> +struct nbd_option {
> +    uint64_t magic; /* NBD_OPTS_MAGIC */
> +    uint32_t option; /* NBD_OPT_* */
> +    uint32_t length;
> +} QEMU_PACKED;
> +typedef struct nbd_option nbd_option;
> +
> +struct nbd_opt_reply {
> +    uint64_t magic; /* NBD_REP_MAGIC */
> +    uint32_t option; /* NBD_OPT_* */
> +    uint32_t type; /* NBD_REP_* */
> +    uint32_t length;
> +} QEMU_PACKED;
> +typedef struct nbd_opt_reply nbd_opt_reply;
> +
> +/* Transmission phase structs */
> +
> struct nbd_request {
> -    uint32_t magic;
> -    uint16_t flags;
> -    uint16_t type;
> +    uint32_t magic; /* NBD_REQUEST_MAGIC */
> +    uint16_t flags; /* NBD_CMD_FLAG_* */
> +    uint16_t type; /* NBD_CMD_* */
>     uint64_t handle;
>     uint64_t from;
>     uint32_t len;
> } QEMU_PACKED;
> +typedef struct nbd_request nbd_request;
> 
> struct nbd_reply {
> -    uint32_t magic;
> +    uint32_t magic; /* NBD_REPLY_MAGIC */
>     uint32_t error;
>     uint64_t handle;
> } QEMU_PACKED;
> +typedef struct nbd_reply nbd_reply;
> 
> /* Transmission (export) flags: sent from server to client during handshake,
>    but describe what will happen during transmission */
> diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
> index d0108a1..95069db 100644
> --- a/nbd/nbd-internal.h
> +++ b/nbd/nbd-internal.h
> @@ -61,7 +61,7 @@
> #define NBD_REPLY_MAGIC         0x67446698
> #define NBD_OPTS_MAGIC          0x49484156454F5054LL
> #define NBD_CLIENT_MAGIC        0x0000420281861253LL
> -#define NBD_REP_MAGIC           0x3e889045565a9LL
> +#define NBD_REP_MAGIC           0x0003e889045565a9LL
> 
> #define NBD_SET_SOCK            _IO(0xab, 0)
> #define NBD_SET_BLKSIZE         _IO(0xab, 1)
> diff --git a/nbd/client.c b/nbd/client.c
> index a9e173a..6626fa8 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -75,64 +75,123 @@ static QTAILQ_HEAD(, NBDExport) exports = QTAILQ_HEAD_INITIALIZER(exports);
> 
> */
> 
> +/* Send an option request. Return 0 if successful, -1 with errp set if
> + * it is impossible to continue. */
> +static int nbd_send_option_request(QIOChannel *ioc, uint32_t opt,
> +                                   uint32_t len, const char *data,
> +                                   Error **errp)
> +{
> +    nbd_option req;
> +    QEMU_BUILD_BUG_ON(sizeof(req) != 16);
> 
> -/* If type represents success, return 1 without further action.
> - * If type represents an error reply, consume the rest of the packet on ioc.
> - * Then return 0 for unsupported (so the client can fall back to
> - * other approaches), or -1 with errp set for other errors.
> +    if (len == -1) {
> +        req.length = len = strlen(data);
> +    }
> +    TRACE("Sending option request %"PRIu32", len %"PRIu32, opt, len);
> +
> +    stq_be_p(&req.magic, NBD_OPTS_MAGIC);
> +    stl_be_p(&req.option, opt);
> +    stl_be_p(&req.length, len);
> +
> +    if (write_sync(ioc, &req, sizeof(req)) != sizeof(req)) {
> +        error_setg(errp, "Failed to send option request header");
> +        return -1;
> +    }
> +
> +    if (len && write_sync(ioc, (char *) data, len) != len) {
> +        error_setg(errp, "Failed to send option request data");
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +/* Receive the header of an option reply, which should match the given
> + * opt.  Read through the length field, but NOT the length bytes of
> + * payload. Return 0 if successful, -1 with errp set if it is
> + * impossible to continue. */
> +static int nbd_receive_option_reply(QIOChannel *ioc, uint32_t opt,
> +                                    nbd_opt_reply *reply, Error **errp)
> +{
> +    QEMU_BUILD_BUG_ON(sizeof(*reply) != 20);
> +    if (read_sync(ioc, reply, sizeof(*reply)) != sizeof(*reply)) {
> +        error_setg(errp, "failed to read option reply");
> +        return -1;
> +    }
> +    be64_to_cpus(&reply->magic);
> +    be32_to_cpus(&reply->option);
> +    be32_to_cpus(&reply->type);
> +    be32_to_cpus(&reply->length);
> +
> +    TRACE("Received option reply %"PRIx32", type %"PRIx32", len %"PRIu32,
> +          reply->option, reply->type, reply->length);
> +
> +    if (reply->magic != NBD_REP_MAGIC) {
> +        error_setg(errp, "Unexpected option reply magic");
> +        return -1;
> +    }
> +    if (reply->option != opt) {
> +        error_setg(errp, "Unexpected option type %x expected %x",
> +                   reply->option, opt);
> +        return -1;
> +    }
> +    return 0;
> +}
> +
> +/* If reply represents success, return 1 without further action.
> + * If reply represents an error, consume the optional payload of
> + * the packet on ioc.  Then return 0 for unsupported (so the client
> + * can fall back to other approaches), or -1 with errp set for other
> + * errors.
>  */
> -static int nbd_handle_reply_err(QIOChannel *ioc, uint32_t opt, uint32_t type,
> +static int nbd_handle_reply_err(QIOChannel *ioc, nbd_opt_reply *reply,
>                                 Error **errp)
> {
> -    uint32_t len;
>     char *msg = NULL;
>     int result = -1;
> 
> -    if (!(type & (1 << 31))) {
> +    if (!(reply->type & (1 << 31))) {

Not something you changed, but (1 << 31) could reasonably be a named constant.
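
A minimal sketch of what that could look like, assuming C11. The names NBD_REP_FLAG_ERROR, NBD_REP_ERR() and nbd_rep_is_error() are hypothetical and appear nowhere in the patch. The UINT32_C(1) spelling mirrors the existing NBD_REP_ERR_* defines in nbd.h, and it also avoids shifting a signed int into its sign bit, which a bare (1 << 31) does when int is 32 bits wide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the patch tests a bare (1 << 31) instead. */
#define NBD_REP_FLAG_ERROR   (UINT32_C(1) << 31)
#define NBD_REP_ERR(val)     (NBD_REP_FLAG_ERROR | (val))

#define NBD_REP_ACK          1
#define NBD_REP_ERR_UNSUP    NBD_REP_ERR(1)
#define NBD_REP_ERR_POLICY   NBD_REP_ERR(2)
#define NBD_REP_ERR_INVALID  NBD_REP_ERR(3)
#define NBD_REP_ERR_TLS_REQD NBD_REP_ERR(5)

static_assert(NBD_REP_ERR(1) == 0x80000001u, "error replies set bit 31");

/* True if an option reply type denotes an error. */
static bool nbd_rep_is_error(uint32_t type)
{
    return (type & NBD_REP_FLAG_ERROR) != 0;
}
```

With that in place the early return in nbd_handle_reply_err() could read `if (!nbd_rep_is_error(reply->type))`.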

>         return 1;
>     }
> 
> -    if (read_sync(ioc, &len, sizeof(len)) != sizeof(len)) {
> -        error_setg(errp, "failed to read option length");
> -        return -1;
> -    }
> -    len = be32_to_cpu(len);
> -    if (len) {
> -        if (len > NBD_MAX_BUFFER_SIZE) {
> +    if (reply->length) {
> +        if (reply->length > NBD_MAX_BUFFER_SIZE) {
>             error_setg(errp, "server's error message is too long");
>             goto cleanup;
>         }
> -        msg = g_malloc(len + 1);
> -        if (read_sync(ioc, msg, len) != len) {
> +        msg = g_malloc(reply->length + 1);
> +        if (read_sync(ioc, msg, reply->length) != reply->length) {
>             error_setg(errp, "failed to read option error message");
>             goto cleanup;
>         }
> -        msg[len] = '\0';
> +        msg[reply->length] = '\0';
>     }
> 
> -    switch (type) {
> +    switch (reply->type) {
>     case NBD_REP_ERR_UNSUP:
>         TRACE("server doesn't understand request %" PRIx32
> -              ", attempting fallback", opt);
> +              ", attempting fallback", reply->option);
>         result = 0;
>         goto cleanup;
> 
>     case NBD_REP_ERR_POLICY:
> -        error_setg(errp, "Denied by server for option %" PRIx32, opt);
> +        error_setg(errp, "Denied by server for option %" PRIx32,
> +                   reply->option);
>         break;
> 
>     case NBD_REP_ERR_INVALID:
> -        error_setg(errp, "Invalid data length for option %" PRIx32, opt);
> +        error_setg(errp, "Invalid data length for option %" PRIx32,
> +                   reply->option);
>         break;
> 
>     case NBD_REP_ERR_TLS_REQD:
>         error_setg(errp, "TLS negotiation required before option %" PRIx32,
> -                   opt);
> +                   reply->option);
>         break;
> 
>     default:
>         error_setg(errp, "Unknown error code when asking for option %" PRIx32,
> -                   opt);
> +                   reply->option);
>         break;
>     }
> 
> @@ -147,58 +206,29 @@ static int nbd_handle_reply_err(QIOChannel *ioc, uint32_t opt, uint32_t type,
> 
> static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
> {
> -    uint64_t magic;
> -    uint32_t opt;
> -    uint32_t type;
> +    nbd_opt_reply reply;
>     uint32_t len;
>     uint32_t namelen;
>     int error;
> 
>     *name = NULL;
> -    if (read_sync(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
> -        error_setg(errp, "failed to read list option magic");
> +    if (nbd_receive_option_reply(ioc, NBD_OPT_LIST, &reply, errp) < 0) {
>         return -1;
>     }
> -    magic = be64_to_cpu(magic);
> -    if (magic != NBD_REP_MAGIC) {
> -        error_setg(errp, "Unexpected option list magic");
> -        return -1;
> -    }
> -    if (read_sync(ioc, &opt, sizeof(opt)) != sizeof(opt)) {
> -        error_setg(errp, "failed to read list option");
> -        return -1;
> -    }
> -    opt = be32_to_cpu(opt);
> -    if (opt != NBD_OPT_LIST) {
> -        error_setg(errp, "Unexpected option type %" PRIx32 " expected %x",
> -                   opt, NBD_OPT_LIST);
> -        return -1;
> -    }
> -
> -    if (read_sync(ioc, &type, sizeof(type)) != sizeof(type)) {
> -        error_setg(errp, "failed to read list option type");
> -        return -1;
> -    }
> -    type = be32_to_cpu(type);
> -    error = nbd_handle_reply_err(ioc, opt, type, errp);
> +    error = nbd_handle_reply_err(ioc, &reply, errp);
>     if (error <= 0) {
>         return error;
>     }
> +    len = reply.length;
> 
> -    if (read_sync(ioc, &len, sizeof(len)) != sizeof(len)) {
> -        error_setg(errp, "failed to read option length");
> -        return -1;
> -    }
> -    len = be32_to_cpu(len);
> -
> -    if (type == NBD_REP_ACK) {
> +    if (reply.type == NBD_REP_ACK) {
>         if (len != 0) {
>             error_setg(errp, "length too long for option end");
>             return -1;
>         }
> -    } else if (type == NBD_REP_SERVER) {
> +    } else if (reply.type == NBD_REP_SERVER) {
>         if (len < sizeof(namelen) || len > NBD_MAX_BUFFER_SIZE) {
> -            error_setg(errp, "incorrect option length");
> +            error_setg(errp, "incorrect option length %"PRIu32, len);
>             return -1;
>         }
>         if (read_sync(ioc, &namelen, sizeof(namelen)) != sizeof(namelen)) {
> @@ -240,7 +270,7 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, Error **errp)
>         }
>     } else {
>         error_setg(errp, "Unexpected reply type %" PRIx32 " expected %x",
> -                   type, NBD_REP_SERVER);
> +                   reply.type, NBD_REP_SERVER);
>         return -1;
>     }
>     return 1;
> @@ -251,24 +281,10 @@ static int nbd_receive_query_exports(QIOChannel *ioc,
>                                      const char *wantname,
>                                      Error **errp)
> {
> -    uint64_t magic = cpu_to_be64(NBD_OPTS_MAGIC);
> -    uint32_t opt = cpu_to_be32(NBD_OPT_LIST);
> -    uint32_t length = 0;
>     bool foundExport = false;
> 
>     TRACE("Querying export list");
> -    if (write_sync(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
> -        error_setg(errp, "Failed to send list option magic");
> -        return -1;
> -    }
> -
> -    if (write_sync(ioc, &opt, sizeof(opt)) != sizeof(opt)) {
> -        error_setg(errp, "Failed to send list option number");
> -        return -1;
> -    }
> -
> -    if (write_sync(ioc, &length, sizeof(length)) != sizeof(length)) {
> -        error_setg(errp, "Failed to send list option length");
> +    if (nbd_send_option_request(ioc, NBD_OPT_LIST, 0, NULL, errp) < 0) {
>         return -1;
>     }
> 
> @@ -314,72 +330,29 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
>                                         QCryptoTLSCreds *tlscreds,
>                                         const char *hostname, Error **errp)
> {
> -    uint64_t magic = cpu_to_be64(NBD_OPTS_MAGIC);
> -    uint32_t opt = cpu_to_be32(NBD_OPT_STARTTLS);
> -    uint32_t length = 0;
> -    uint32_t type;
> +    nbd_opt_reply reply;
>     QIOChannelTLS *tioc;
>     struct NBDTLSHandshakeData data = { 0 };
> 
>     TRACE("Requesting TLS from server");
> -    if (write_sync(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
> -        error_setg(errp, "Failed to send option magic");
> -        return NULL;
> -    }
> -
> -    if (write_sync(ioc, &opt, sizeof(opt)) != sizeof(opt)) {
> -        error_setg(errp, "Failed to send option number");
> -        return NULL;
> -    }
> -
> -    if (write_sync(ioc, &length, sizeof(length)) != sizeof(length)) {
> -        error_setg(errp, "Failed to send option length");
> -        return NULL;
> -    }
> -
> -    TRACE("Getting TLS reply from server1");
> -    if (read_sync(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
> -        error_setg(errp, "failed to read option magic");
> -        return NULL;
> -    }
> -    magic = be64_to_cpu(magic);
> -    if (magic != NBD_REP_MAGIC) {
> -        error_setg(errp, "Unexpected option magic");
> -        return NULL;
> -    }
> -    TRACE("Getting TLS reply from server2");
> -    if (read_sync(ioc, &opt, sizeof(opt)) != sizeof(opt)) {
> -        error_setg(errp, "failed to read option");
> -        return NULL;
> -    }
> -    opt = be32_to_cpu(opt);
> -    if (opt != NBD_OPT_STARTTLS) {
> -        error_setg(errp, "Unexpected option type %" PRIx32 " expected %x",
> -                   opt, NBD_OPT_STARTTLS);
> +    if (nbd_send_option_request(ioc, NBD_OPT_STARTTLS, 0, NULL, errp) < 0) {
>         return NULL;
>     }
> 
>     TRACE("Getting TLS reply from server");
> -    if (read_sync(ioc, &type, sizeof(type)) != sizeof(type)) {
> -        error_setg(errp, "failed to read option type");
> +    if (nbd_receive_option_reply(ioc, NBD_OPT_STARTTLS, &reply, errp) < 0) {
>         return NULL;
>     }
> -    type = be32_to_cpu(type);
> -    if (type != NBD_REP_ACK) {
> +
> +    if (reply.type != NBD_REP_ACK) {
>         error_setg(errp, "Server rejected request to start TLS %" PRIx32,
> -                   type);
> +                   reply.type);
>         return NULL;
>     }
> 
> -    TRACE("Getting TLS reply from server");
> -    if (read_sync(ioc, &length, sizeof(length)) != sizeof(length)) {
> -        error_setg(errp, "failed to read option length");
> -        return NULL;
> -    }
> -    length = be32_to_cpu(length);
> -    if (length != 0) {
> +    if (reply.length != 0) {
>         error_setg(errp, "Start TLS response was not zero %" PRIu32,
> -                   length);
> +                   reply.length);
>         return NULL;
>     }
> 
> @@ -466,8 +439,6 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
> 
>     if (magic == NBD_OPTS_MAGIC) {
>         uint32_t clientflags = 0;
> -        uint32_t opt;
> -        uint32_t namesize;
>         uint16_t globalflags;
>         bool fixedNewStyle = false;
> 
> @@ -517,28 +488,13 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
>                 goto fail;
>             }
>         }
> -        /* write the export name */
> -        magic = cpu_to_be64(magic);
> -        if (write_sync(ioc, &magic, sizeof(magic)) != sizeof(magic)) {
> -            error_setg(errp, "Failed to send export name magic");
> -            goto fail;
> -        }
> -        opt = cpu_to_be32(NBD_OPT_EXPORT_NAME);
> -        if (write_sync(ioc, &opt, sizeof(opt)) != sizeof(opt)) {
> -            error_setg(errp, "Failed to send export name option number");
> -            goto fail;
> -        }
> -        namesize = cpu_to_be32(strlen(name));
> -        if (write_sync(ioc, &namesize, sizeof(namesize)) !=
> -            sizeof(namesize)) {
> -            error_setg(errp, "Failed to send export name length");
> -            goto fail;
> -        }
> -        if (write_sync(ioc, (char *)name, strlen(name)) != strlen(name)) {
> -            error_setg(errp, "Failed to send export name");
> +        /* write the export name request */
> +        if (nbd_send_option_request(ioc, NBD_OPT_EXPORT_NAME, -1, name,
> +                                    errp) < 0) {
>             goto fail;
>         }
> 
> +        /* Read the response */
>         if (read_sync(ioc, &s, sizeof(s)) != sizeof(s)) {
>             error_setg(errp, "Failed to read export length");
>             goto fail;
> -- 
> 2.5.5
> 
> 
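For readers following the refactoring quoted above: each open-coded sequence of three write_sync() calls sends the same fixed header, a 64-bit magic, a 32-bit option number and a 32-bit payload length, all big-endian, followed by the payload. The sketch below packs such a request into a buffer. It is only an illustration: pack_option_request() and put_be32() are hypothetical names, not the helpers in nbd/client.c, and the convention that a length of -1 means "use strlen() of the payload" is inferred from how the patch calls nbd_send_option_request() for NBD_OPT_EXPORT_NAME.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NBD_OPTS_MAGIC      0x49484156454F5054ULL /* ASCII "IHAVEOPT" */
#define NBD_OPT_EXPORT_NAME 1
#define NBD_OPT_LIST        3
#define OPT_HEADER_SIZE     16 /* 8 (magic) + 4 (option) + 4 (length) */

/* Store a 32-bit value in network (big-endian) byte order. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Hypothetical helper: pack one option request into buf.
 * len == -1 means "payload is the NUL-terminated string in data".
 * Returns the number of bytes written, or -1 if the request is
 * invalid or does not fit in cap bytes. */
static long pack_option_request(uint8_t *buf, size_t cap, uint32_t opt,
                                long len, const char *data)
{
    size_t paylen;

    if (len == -1) {
        assert(data != NULL);
        paylen = strlen(data);
    } else if (len < 0) {
        return -1;
    } else {
        paylen = (size_t)len;
    }
    if (paylen > UINT32_MAX || cap < OPT_HEADER_SIZE ||
        cap - OPT_HEADER_SIZE < paylen) {
        return -1;
    }

    put_be32(buf, (uint32_t)(NBD_OPTS_MAGIC >> 32));
    put_be32(buf + 4, (uint32_t)NBD_OPTS_MAGIC);
    put_be32(buf + 8, opt);
    put_be32(buf + 12, (uint32_t)paylen);
    if (paylen) {
        memcpy(buf + OPT_HEADER_SIZE, data, paylen);
    }
    return (long)(OPT_HEADER_SIZE + paylen);
}
```

With this layout an NBD_OPT_LIST request is exactly 16 bytes, and an NBD_OPT_EXPORT_NAME request for "foo" is 19 bytes, with no trailing NUL on the name.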

-- 
Alex Bligh


* Re: [Qemu-devel] [PATCH v3 36/44] nbd: Improve handling of shutdown requests
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 36/44] nbd: Improve handling of shutdown requests Eric Blake
@ 2016-04-25  9:47   ` Alex Bligh
  2016-04-25 19:20     ` Eric Blake
  0 siblings, 1 reply; 67+ messages in thread
From: Alex Bligh @ 2016-04-25  9:47 UTC
  To: Eric Blake
  Cc: Alex Bligh, qemu-devel, Kevin Wolf, Paolo Bonzini, qemu block, Max Reitz


On 23 Apr 2016, at 00:40, Eric Blake <eblake@redhat.com> wrote:

> NBD commit 6d34500b clarified how clients and servers are supposed
> to behave before closing a connection. It added NBD_REP_ERR_SHUTDOWN
> (for the server to announce it is about to go away during option
> haggling, so the client should quit sending NBD_OPT_* other than
> NBD_OPT_ABORT) and ESHUTDOWN (for the server to announce it is about
> to go away during transmission, so the client should quit sending
> NBD_CMD_* other than NBD_CMD_DISC).  It also clarified that
> NBD_OPT_ABORT gets a reply, while NBD_CMD_DISC does not.
> 
> This patch merely adds the missing reply to NBD_OPT_ABORT and teaches
> the client to recognize server errors.  Actually teaching the server
> to send NBD_REP_ERR_SHUTDOWN or ESHUTDOWN would require knowing that
> the server has been requested to shut down soon (maybe we could do
> that by installing a SIGINT handler in qemu-nbd, which transitions
> from RUNNING to a new state that waits for the client to react,
> rather than just out-right quitting).
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> include/block/nbd.h | 13 +++++++++----
> nbd/nbd-internal.h  |  1 +
> nbd/client.c        | 16 ++++++++++++++++
> nbd/server.c        | 16 +++++++++++++++-
> 4 files changed, 41 insertions(+), 5 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index d707761..2fd1a67 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -82,12 +82,17 @@ typedef struct nbd_reply nbd_reply;
> #define NBD_FLAG_C_NO_ZEROES      (1 << 1) /* End handshake without zeroes. */
> 
> /* Reply types. */
> +#define NBD_REP_ERR(value) ((UINT32_C(1) << 31) | (value))
> +
> #define NBD_REP_ACK             (1)             /* Data sending finished. */
> #define NBD_REP_SERVER          (2)             /* Export description. */
> -#define NBD_REP_ERR_UNSUP       ((UINT32_C(1) << 31) | 1) /* Unknown option. */
> -#define NBD_REP_ERR_POLICY      ((UINT32_C(1) << 31) | 2) /* Server denied */
> -#define NBD_REP_ERR_INVALID     ((UINT32_C(1) << 31) | 3) /* Invalid length. */
> -#define NBD_REP_ERR_TLS_REQD    ((UINT32_C(1) << 31) | 5) /* TLS required */
> +
> +#define NBD_REP_ERR_UNSUP       NBD_REP_ERR(1)  /* Unknown option. */
> +#define NBD_REP_ERR_POLICY      NBD_REP_ERR(2)  /* Server denied */
> +#define NBD_REP_ERR_INVALID     NBD_REP_ERR(3)  /* Invalid length */
> +#define NBD_REP_ERR_PLATFORM    NBD_REP_ERR(4)  /* Not compiled in */
> +#define NBD_REP_ERR_TLS_REQD    NBD_REP_ERR(5)  /* TLS required */
> +#define NBD_REP_ERR_SHUTDOWN    NBD_REP_ERR(7)  /* Server shutting down */
> 
> /* Request flags, sent from client to server during transmission phase */
> #define NBD_CMD_FLAG_FUA        (1 << 0)
> diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
> index 95069db..0d40b1f 100644
> --- a/nbd/nbd-internal.h
> +++ b/nbd/nbd-internal.h
> @@ -91,6 +91,7 @@
> #define NBD_ENOMEM     12
> #define NBD_EINVAL     22
> #define NBD_ENOSPC     28
> +#define NBD_ESHUTDOWN  108
> 
> static inline ssize_t read_sync(QIOChannel *ioc, void *buffer, size_t size)
> {
> diff --git a/nbd/client.c b/nbd/client.c
> index 68e9473..4140d13 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -34,6 +34,8 @@ static int nbd_errno_to_system_errno(int err)
>         return ENOMEM;
>     case NBD_ENOSPC:
>         return ENOSPC;
> +    case NBD_ESHUTDOWN:
> +        return ESHUTDOWN;
>     default:
>         TRACE("Squashing unexpected error %d to EINVAL", err);
>         /* fallthrough */
> @@ -210,11 +212,21 @@ static int nbd_handle_reply_err(QIOChannel *ioc, nbd_opt_reply *reply,
>                    reply->option);
>         break;
> 
> +    case NBD_REP_ERR_PLATFORM:
> +        error_setg(errp, "Server lacks support for option %" PRIx32,
> +                   reply->option);
> +        break;
> +
>     case NBD_REP_ERR_TLS_REQD:
>         error_setg(errp, "TLS negotiation required before option %" PRIx32,
>                    reply->option);
>         break;
> 
> +    case NBD_REP_ERR_SHUTDOWN:
> +        error_setg(errp, "Server shutting down before option %" PRIx32,
> +                   reply->option);
> +        break;
> +
>     default:
>         error_setg(errp, "Unknown error code when asking for option %" PRIx32,
>                    reply->option);
> @@ -754,6 +766,10 @@ ssize_t nbd_receive_reply(QIOChannel *ioc, struct nbd_reply *reply)
>         LOG("invalid magic (got 0x%" PRIx32 ")", magic);
>         return -EINVAL;
>     }
> +    if (reply->error == ESHUTDOWN) {
> +        LOG("server shutting down");
> +        return -EINVAL;
> +    }
>     return 0;
> }
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index dadc928..fa6a994 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -39,6 +39,8 @@ static int system_errno_to_nbd_errno(int err)
>     case EFBIG:
>     case ENOSPC:
>         return NBD_ENOSPC;
> +    case ESHUTDOWN:
> +        return NBD_ESHUTDOWN;
>     case EINVAL:
>     default:
>         return NBD_EINVAL;
> @@ -484,6 +486,10 @@ static int nbd_negotiate_options(NBDClient *client)
>                 if (ret < 0) {
>                     return ret;
>                 }
> +                /* Let the client keep trying, unless they asked to quit */
> +                if (clientflags == NBD_OPT_ABORT) {

OK that's totally confusing. clientflags is not the client flags. clientflags
is the NBD option ID, which happens to be the two bytes after the NBD OPT magic,
which is the client flags if we were doing oldstyle negotiation, not newstyle
negotiation.

Except:

> +                    return -EINVAL;
> +                }
>                 break;
>             }
>         } else if (fixedNewstyle) {

So the above is for NewStyle (not fixedNewStyle)?

In which case more than one option isn't even supported, so what's the
stuff purporting to handle TLS doing there?

Confused ...

> @@ -496,7 +502,15 @@ static int nbd_negotiate_options(NBDClient *client)
>                 break;
> 
>             case NBD_OPT_ABORT:
> -                return -EINVAL;
> +                /* NBD spec says we must reply before disconnecting,
> +                 * but that we must also tolerate guests that don't
> +                 * wait for our reply. */
> +                ret = nbd_negotiate_send_rep(client->ioc, NBD_REP_ACK,
> +                                             clientflags);
> +                if (!ret) {
> +                    ret = -EINVAL;
> +                }
> +                return ret;
> 
>             case NBD_OPT_EXPORT_NAME:
>                 return nbd_negotiate_handle_export_name(client, length);
> -- 
> 2.5.5
> 
> 
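As a small illustration of the NBD_REP_ERR() macro introduced in the quoted hunk for include/block/nbd.h: every error reply type has bit 31 set, and the low bits carry the error number. The sketch below shows how a client might classify a reply. The helper names rep_is_error(), rep_err_code() and client_may_continue() are hypothetical, and the "may continue" policy is only my reading of the commit message (after NBD_REP_ERR_SHUTDOWN the client should stop sending options other than NBD_OPT_ABORT).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NBD_REP_ERR(value)   ((UINT32_C(1) << 31) | (value))

#define NBD_REP_ACK          (1)
#define NBD_REP_ERR_UNSUP    NBD_REP_ERR(1)
#define NBD_REP_ERR_SHUTDOWN NBD_REP_ERR(7)

/* True if the reply type has the error bit (bit 31) set. */
static bool rep_is_error(uint32_t type)
{
    return (type & (UINT32_C(1) << 31)) != 0;
}

/* Extract the error number from an error reply type. */
static uint32_t rep_err_code(uint32_t type)
{
    assert(rep_is_error(type));
    return type & ~(UINT32_C(1) << 31);
}

/* Hypothetical policy check: may the client keep sending options
 * other than NBD_OPT_ABORT after seeing this reply type? */
static bool client_may_continue(uint32_t type)
{
    return type != NBD_REP_ERR_SHUTDOWN;
}
```

For example, NBD_REP_ERR_SHUTDOWN expands to 0x80000007, while NBD_REP_ACK stays a plain 1 with the error bit clear.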

-- 
Alex Bligh


* Re: [Qemu-devel] [PATCH v3 40/44] nbd: Implement NBD_OPT_GO on client
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 40/44] nbd: Implement NBD_OPT_GO on client Eric Blake
@ 2016-04-25 10:31   ` Alex Bligh
  0 siblings, 0 replies; 67+ messages in thread
From: Alex Bligh @ 2016-04-25 10:31 UTC
  To: Eric Blake; +Cc: Alex Bligh, qemu-devel, Paolo Bonzini, qemu-block


On 23 Apr 2016, at 00:40, Eric Blake <eblake@redhat.com> wrote:

> NBD_OPT_EXPORT_NAME is lousy: it doesn't have any sane error
> reporting.  Upstream NBD recently added NBD_OPT_GO as the
> improved version of the option that does what we want: it
> reports sane errors on failures (including when a server
> requires TLS but does not have NBD_OPT_GO!), and on success
> it provides at least as much info as NBD_OPT_EXPORT_NAME sends.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 

Reviewed-by: Alex Bligh <alex@alex.org.uk>

> ---
> v3: revamp to match latest version of NBD protocol
> ---
> nbd/nbd-internal.h |   3 ++
> nbd/client.c       | 120 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 121 insertions(+), 2 deletions(-)
> 
> diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
> index c597bb8..1935102 100644
> --- a/nbd/nbd-internal.h
> +++ b/nbd/nbd-internal.h
> @@ -55,8 +55,11 @@
>  * https://github.com/yoe/nbd/blob/master/doc/proto.md
>  */
> 
> +/* Size of all NBD_OPT_*, without payload */
> #define NBD_REQUEST_SIZE        (4 + 2 + 2 + 8 + 8 + 4)
> +/* Size of all NBD_REP_* sent in answer to most NBD_OPT_*, without payload */
> #define NBD_REPLY_SIZE          (4 + 4 + 8)
> +
> #define NBD_REQUEST_MAGIC       0x25609513
> #define NBD_REPLY_MAGIC         0x67446698
> #define NBD_OPTS_MAGIC          0x49484156454F5054LL
> diff --git a/nbd/client.c b/nbd/client.c
> index 89fa2c3..dac4f29 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -222,6 +222,11 @@ static int nbd_handle_reply_err(QIOChannel *ioc, nbd_opt_reply *reply,
>                    reply->option);
>         break;
> 
> +    case NBD_REP_ERR_UNKNOWN:
> +        error_setg(errp, "Requested export not available for option %" PRIx32,
> +                   reply->option);
> +        break;
> +
>     case NBD_REP_ERR_SHUTDOWN:
>         error_setg(errp, "Server shutting down before option %" PRIx32,
>                    reply->option);
> @@ -311,6 +316,103 @@ static int nbd_receive_list(QIOChannel *ioc, const char *want, Error **errp)
> }
> 
> 
> +/* Returns -1 if NBD_OPT_GO proves the export @wantname cannot be
> + * used, 0 if NBD_OPT_GO is unsupported (fall back to NBD_OPT_LIST and
> + * NBD_OPT_EXPORT_NAME in that case), and > 0 if the export is good to
> + * go (with @size and @flags populated). */
> +static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
> +                      NbdExportInfo *info, Error **errp)
> +{
> +    nbd_opt_reply reply;
> +    uint32_t len;
> +    uint16_t type;
> +    int error;
> +
> +    /* The protocol requires that the server send NBD_INFO_EXPORT with
> +     * a non-zero flags (at least NBD_FLAG_HAS_FLAGS must be set); so
> +     * flags still 0 is a witness of a broken server. */
> +    info->flags = 0;
> +
> +    TRACE("Attempting NBD_OPT_GO for export '%s'", wantname);
> +    if (nbd_send_option_request(ioc, NBD_OPT_GO, -1, wantname, errp) < 0) {
> +        return -1;
> +    }
> +
> +    TRACE("Reading export info");
> +    while (1) {
> +        if (nbd_receive_option_reply(ioc, NBD_OPT_GO, &reply, errp) < 0) {
> +            return -1;
> +        }
> +        error = nbd_handle_reply_err(ioc, &reply, errp);
> +        if (error <= 0) {
> +            return error;
> +        }
> +        len = reply.length;
> +
> +        if (reply.type == NBD_REP_ACK) {
> +            /* Server is done sending info and moved into transmission
> +               phase, but make sure it sent flags */
> +            if (len) {
> +                error_setg(errp, "server sent invalid NBD_REP_ACK");
> +                return -1;
> +            }
> +            if (!info->flags) {
> +                error_setg(errp, "broken server omitted NBD_INFO_EXPORT");
> +                return -1;
> +            }
> +            TRACE("export is good to go");
> +            return 1;
> +        }
> +        if (reply.type != NBD_REP_INFO) {
> +            error_setg(errp, "unexpected reply type %" PRIx32 ", expected %x",
> +                       reply.type, NBD_REP_INFO);
> +            return -1;
> +        }
> +        if (len < sizeof(type)) {
> +            error_setg(errp, "NBD_REP_INFO length %" PRIu32 " is too short",
> +                       len);
> +            return -1;
> +        }
> +        if (read_sync(ioc, &type, sizeof(type)) != sizeof(type)) {
> +            error_setg(errp, "failed to read info type");
> +            return -1;
> +        }
> +        len -= sizeof(type);
> +        be16_to_cpus(&type);
> +        switch (type) {
> +        case NBD_INFO_EXPORT:
> +            if (len != sizeof(info->size) + sizeof(info->flags)) {
> +                error_setg(errp, "remaining export info len %" PRIu32
> +                           " is unexpected size", len);
> +                return -1;
> +            }
> +            if (read_sync(ioc, &info->size, sizeof(info->size)) !=
> +                sizeof(info->size)) {
> +                error_setg(errp, "failed to read info size");
> +                return -1;
> +            }
> +            be64_to_cpus(&info->size);
> +            if (read_sync(ioc, &info->flags, sizeof(info->flags)) !=
> +                sizeof(info->flags)) {
> +                error_setg(errp, "failed to read info flags");
> +                return -1;
> +            }
> +            be16_to_cpus(&info->flags);
> +            TRACE("Size is %" PRIu64 ", export flags %" PRIx16,
> +                  info->size, info->flags);
> +            break;
> +
> +        default:
> +            TRACE("ignoring unknown export info %" PRIu16, type);
> +            if (drop_sync(ioc, len) != len) {
> +                error_setg(errp, "Failed to read info payload");
> +                return -1;
> +            }
> +            break;
> +        }
> +    }
> +}
> +
> /* Return -1 on failure, 0 if wantname is an available export. */
> static int nbd_receive_query_exports(QIOChannel *ioc,
>                                      const char *wantname,
> @@ -515,11 +617,25 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name,
>             name = "";
>         }
>         if (fixedNewStyle) {
> +            int result;
> +
> +            /* Try NBD_OPT_GO first - if it works, we are done (it
> +             * also gives us a good message if the server requires
> +             * TLS).  If it is not available, fall back to
> +             * NBD_OPT_LIST for nicer error messages about a missing
> +             * export, then use NBD_OPT_EXPORT_NAME.  */
> +            result = nbd_opt_go(ioc, name, info, errp);
> +            if (result < 0) {
> +                goto fail;
> +            }
> +            if (result > 0) {
> +                return 0;
> +            }
>             /* Check our desired export is present in the
>              * server export list. Since NBD_OPT_EXPORT_NAME
>              * cannot return an error message, running this
> -             * query gives us good error reporting if the
> -             * server required TLS
> +             * query gives us better error reporting if the
> +             * export name is not available.
>              */
>             if (nbd_receive_query_exports(ioc, name, errp) < 0) {
>                 goto fail;
> -- 
> 2.5.5
> 
> 
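To make the NBD_REP_INFO handling in nbd_opt_go() above easier to follow, here is a buffer-based sketch of the same parsing step: a 16-bit info type, then for NBD_INFO_EXPORT a 64-bit size and 16-bit flags, all big-endian. It assumes the whole reply payload has already been read into memory, which the patch does not do (it reads from the channel piecewise). parse_info() and get_be() are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBD_INFO_EXPORT 0

/* Read an n-byte big-endian unsigned integer (n <= 8). */
static uint64_t get_be(const uint8_t *p, size_t n)
{
    uint64_t v = 0;

    while (n--) {
        v = (v << 8) | *p++;
    }
    return v;
}

/* Hypothetical helper: parse one NBD_REP_INFO payload.
 * Returns 1 if it was NBD_INFO_EXPORT (size and flags populated),
 * 0 if it was some other info type that can be ignored, and
 * -1 if the payload is malformed. */
static int parse_info(const uint8_t *payload, size_t len,
                      uint64_t *size, uint16_t *flags)
{
    uint16_t type;

    assert(payload && size && flags);
    if (len < sizeof(type)) {
        return -1; /* too short to even hold the info type */
    }
    type = (uint16_t)get_be(payload, 2);
    payload += 2;
    len -= 2;

    if (type != NBD_INFO_EXPORT) {
        return 0; /* unknown info: skip the remaining payload */
    }
    if (len != 8 + 2) {
        return -1; /* export info must be exactly size + flags */
    }
    *size = get_be(payload, 8);
    *flags = (uint16_t)get_be(payload + 8, 2);
    return 1;
}
```

The three-way return mirrors the convention in the patch: unknown info types are skipped rather than treated as errors, so a newer server can send extra information without breaking an older client.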

-- 
Alex Bligh


* Re: [Qemu-devel] [PATCH v3 41/44] nbd: Implement NBD_CMD_WRITE_ZEROES on server
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 41/44] nbd: Implement NBD_CMD_WRITE_ZEROES on server Eric Blake
  2016-04-23  9:00   ` Pavel Borzenkov
@ 2016-04-25 12:11   ` Alex Bligh
  1 sibling, 0 replies; 67+ messages in thread
From: Alex Bligh @ 2016-04-25 12:11 UTC
  To: Eric Blake
  Cc: Alex Bligh, qemu-devel, Kevin Wolf, Paolo Bonzini, qemu-block, Max Reitz


On 23 Apr 2016, at 00:40, Eric Blake <eblake@redhat.com> wrote:

> Upstream NBD protocol recently added the ability to efficiently
> write zeroes without having to send the zeroes over the wire,
> along with a flag to control whether the client wants a hole.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Alex Bligh <alex@alex.org.uk>

> 
> ---
> v3: abandon NBD_CMD_CLOSE extension, rebase to use blk_pwrite_zeroes
> ---
> include/block/nbd.h |  7 +++++--
> nbd/server.c        | 42 ++++++++++++++++++++++++++++++++++++++++--
> 2 files changed, 45 insertions(+), 4 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 05c0e48..1072d9e 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -70,6 +70,7 @@ typedef struct nbd_reply nbd_reply;
> #define NBD_FLAG_SEND_FUA       (1 << 3)        /* Send FUA (Force Unit Access) */
> #define NBD_FLAG_ROTATIONAL     (1 << 4)        /* Use elevator algorithm - rotational media */
> #define NBD_FLAG_SEND_TRIM      (1 << 5)        /* Send TRIM (discard) */
> +#define NBD_FLAG_SEND_WRITE_ZEROES (1 << 6)     /* Send WRITE_ZEROES */
> 
> /* New-style handshake (global) flags, sent from server to client, and
>    control what will happen during handshake phase. */
> @@ -102,7 +103,8 @@ typedef struct nbd_reply nbd_reply;
> #define NBD_INFO_DESCRIPTION    2
> 
> /* Request flags, sent from client to server during transmission phase */
> -#define NBD_CMD_FLAG_FUA        (1 << 0)
> +#define NBD_CMD_FLAG_FUA        (1 << 0) /* 'force unit access' during write */
> +#define NBD_CMD_FLAG_NO_HOLE    (1 << 1) /* don't punch hole on zero run */
> 
> /* Supported request types */
> enum {
> @@ -110,7 +112,8 @@ enum {
>     NBD_CMD_WRITE = 1,
>     NBD_CMD_DISC = 2,
>     NBD_CMD_FLUSH = 3,
> -    NBD_CMD_TRIM = 4
> +    NBD_CMD_TRIM = 4,
> +    NBD_CMD_WRITE_ZEROES = 5,
> };
> 
> #define NBD_DEFAULT_PORT	10809
> diff --git a/nbd/server.c b/nbd/server.c
> index 1edb5f3..563afb2 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -689,7 +689,8 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
>     char buf[8 + 8 + 8 + 128];
>     int rc;
>     const uint16_t myflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
> -                              NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA);
> +                              NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA |
> +                              NBD_FLAG_SEND_WRITE_ZEROES);
>     bool oldStyle;
>     size_t len;
> 
> @@ -1199,11 +1200,17 @@ static ssize_t nbd_co_receive_request(NBDRequest *req,
>         rc = -EINVAL;
>         goto out;
>     }
> -    if (request->flags & ~NBD_CMD_FLAG_FUA) {
> +    if (request->flags & ~(NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_NO_HOLE)) {
>         LOG("unsupported flags (got 0x%x)", request->flags);
>         rc = -EINVAL;
>         goto out;
>     }
> +    if (request->type != NBD_CMD_WRITE_ZEROES &&
> +        (request->flags & NBD_CMD_FLAG_NO_HOLE)) {
> +        LOG("unexpected flags (got 0x%x)", request->flags);
> +        rc = -EINVAL;
> +        goto out;
> +    }
> 
>     rc = 0;
> 
> @@ -1308,6 +1315,37 @@ static void nbd_trip(void *opaque)
>         }
>         break;
> 
> +    case NBD_CMD_WRITE_ZEROES:
> +        TRACE("Request type is WRITE_ZEROES");
> +
> +        if (exp->nbdflags & NBD_FLAG_READ_ONLY) {
> +            TRACE("Server is read-only, return error");
> +            reply.error = EROFS;
> +            goto error_reply;
> +        }
> +
> +        TRACE("Writing to device");
> +
> +        flags = 0;
> +        if (request.flags & NBD_CMD_FLAG_FUA) {
> +            flags |= BDRV_REQ_FUA;
> +        }
> +        if (!(request.flags & NBD_CMD_FLAG_NO_HOLE)) {
> +            flags |= BDRV_REQ_MAY_UNMAP;
> +        }
> +        ret = blk_pwrite_zeroes(exp->blk, request.from + exp->dev_offset,
> +                                request.len, flags);
> +        if (ret < 0) {
> +            LOG("writing to file failed");
> +            reply.error = -ret;
> +            goto error_reply;
> +        }
> +
> +        if (nbd_co_send_reply(req, &reply, 0) < 0) {
> +            goto out;
> +        }
> +        break;
> +
>     case NBD_CMD_DISC:
>         /* unreachable, thanks to special case in nbd_co_receive_request() */
>         abort();
> -- 
> 2.5.5
> 
> 
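The server-side flag handling in the quoted patch comes down to two rules: only FUA and NO_HOLE are accepted, NO_HOLE is valid only on NBD_CMD_WRITE_ZEROES, and the absence of NO_HOLE is what permits unmapping. A minimal sketch of that logic follows. check_request_flags() and zero_flags_to_block() are hypothetical names, and REQ_FUA and REQ_MAY_UNMAP are stand-ins with illustrative values, not QEMU's real BDRV_REQ_* constants.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define NBD_CMD_FLAG_FUA     (1 << 0)
#define NBD_CMD_FLAG_NO_HOLE (1 << 1)

enum {
    NBD_CMD_WRITE = 1,
    NBD_CMD_WRITE_ZEROES = 5,
};

/* Stand-ins for the block layer's BDRV_REQ_FUA and
 * BDRV_REQ_MAY_UNMAP; the values here are illustrative only. */
#define REQ_FUA       0x1
#define REQ_MAY_UNMAP 0x2

/* Validate request flags the way the patch does: reject unknown
 * bits, and reject NO_HOLE on anything but WRITE_ZEROES. */
static int check_request_flags(uint16_t type, uint16_t flags)
{
    if (flags & ~(NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_NO_HOLE)) {
        return -EINVAL;
    }
    if (type != NBD_CMD_WRITE_ZEROES && (flags & NBD_CMD_FLAG_NO_HOLE)) {
        return -EINVAL;
    }
    return 0;
}

/* Translate validated WRITE_ZEROES wire flags to block-layer flags.
 * Note the inversion: unmapping is allowed unless NO_HOLE is set. */
static int zero_flags_to_block(uint16_t flags)
{
    int out = 0;

    assert((flags & ~(NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_NO_HOLE)) == 0);
    if (flags & NBD_CMD_FLAG_FUA) {
        out |= REQ_FUA;
    }
    if (!(flags & NBD_CMD_FLAG_NO_HOLE)) {
        out |= REQ_MAY_UNMAP;
    }
    return out;
}
```

The inverted sense of NO_HOLE means a client that sends no flags at all gets the most space-efficient behaviour by default.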

-- 
Alex Bligh


* Re: [Qemu-devel] [PATCH v3 42/44] nbd: Implement NBD_CMD_WRITE_ZEROES on client
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 42/44] nbd: Implement NBD_CMD_WRITE_ZEROES on client Eric Blake
@ 2016-04-25 12:12   ` Alex Bligh
  0 siblings, 0 replies; 67+ messages in thread
From: Alex Bligh @ 2016-04-25 12:12 UTC
  To: Eric Blake
  Cc: Alex Bligh, qemu-devel, Kevin Wolf, Paolo Bonzini, qemu-block, Max Reitz


On 23 Apr 2016, at 00:40, Eric Blake <eblake@redhat.com> wrote:

> Upstream NBD protocol recently added the ability to efficiently
> write zeroes without having to send the zeroes over the wire,
> along with a flag to control whether the client wants a hole.
> 
> The generic block code takes care of falling back to the obvious
> write of lots of zeroes if we return -ENOTSUP because the server
> does not have WRITE_ZEROES.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Alex Bligh <alex@alex.org.uk>

> 
> ---
> v3: rebase, tell block layer about our support
> ---
> block/nbd-client.h |  2 ++
> block/nbd-client.c | 34 ++++++++++++++++++++++++++++++++++
> block/nbd.c        | 24 ++++++++++++++++++++++++
> 3 files changed, 60 insertions(+)
> 
> diff --git a/block/nbd-client.h b/block/nbd-client.h
> index 0867147..07630ab 100644
> --- a/block/nbd-client.h
> +++ b/block/nbd-client.h
> @@ -46,6 +46,8 @@ void nbd_client_close(BlockDriverState *bs);
> int nbd_client_co_discard(BlockDriverState *bs, int64_t sector_num,
>                           int nb_sectors);
> int nbd_client_co_flush(BlockDriverState *bs);
> +int nbd_client_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
> +                               int nb_sectors, int *flags);
> int nbd_client_co_writev(BlockDriverState *bs, int64_t sector_num,
>                          int nb_sectors, QEMUIOVector *qiov, int *flags);
> int nbd_client_co_readv(BlockDriverState *bs, int64_t sector_num,
> diff --git a/block/nbd-client.c b/block/nbd-client.c
> index f20219b..2b6ac27 100644
> --- a/block/nbd-client.c
> +++ b/block/nbd-client.c
> @@ -291,6 +291,40 @@ int nbd_client_co_readv(BlockDriverState *bs, int64_t sector_num,
>     return nbd_co_readv_1(bs, sector_num, nb_sectors, qiov, offset);
> }
> 
> +int nbd_client_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
> +                               int nb_sectors, int *flags)
> +{
> +    ssize_t ret;
> +    NbdClientSession *client = nbd_get_client_session(bs);
> +    struct nbd_request request = { .type = NBD_CMD_WRITE_ZEROES };
> +    struct nbd_reply reply;
> +
> +    if (!(client->info.flags & NBD_FLAG_SEND_WRITE_ZEROES)) {
> +        return -ENOTSUP;
> +    }
> +
> +    if ((*flags & BDRV_REQ_FUA) && (client->info.flags & NBD_FLAG_SEND_FUA)) {
> +        *flags &= ~BDRV_REQ_FUA;
> +        request.flags |= NBD_CMD_FLAG_FUA;
> +    }
> +    if (!(*flags & BDRV_REQ_MAY_UNMAP)) {
> +        request.flags |= NBD_CMD_FLAG_NO_HOLE;
> +    }
> +
> +    request.from = sector_num * 512;
> +    request.len = nb_sectors * 512;
> +
> +    nbd_coroutine_start(client, &request);
> +    ret = nbd_co_send_request(bs, &request, NULL, 0);
> +    if (ret < 0) {
> +        reply.error = -ret;
> +    } else {
> +        nbd_co_receive_reply(client, &request, &reply, NULL, 0);
> +    }
> +    nbd_coroutine_end(client, &request);
> +    return -reply.error;
> +}
> +
> int nbd_client_co_writev(BlockDriverState *bs, int64_t sector_num,
>                          int nb_sectors, QEMUIOVector *qiov, int *flags)
> {
> diff --git a/block/nbd.c b/block/nbd.c
> index 34db83e..5172039 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -355,6 +355,26 @@ static int nbd_co_readv(BlockDriverState *bs, int64_t sector_num,
>     return nbd_client_co_readv(bs, sector_num, nb_sectors, qiov);
> }
> 
> +static int nbd_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
> +                               int nb_sectors, BdrvRequestFlags orig_flags)
> +{
> +    int flags = orig_flags;
> +    int ret;
> +
> +    ret = nbd_client_co_write_zeroes(bs, sector_num, nb_sectors, &flags);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    /* The flag wasn't sent to the server, so we need to emulate it with an
> +     * explicit flush */
> +    if (flags & BDRV_REQ_FUA) {
> +        ret = nbd_client_co_flush(bs);
> +    }
> +
> +    return ret;
> +}
> +
> static int nbd_co_writev_flags(BlockDriverState *bs, int64_t sector_num,
>                                int nb_sectors, QEMUIOVector *qiov, int flags)
> {
> @@ -388,6 +408,7 @@ static int nbd_co_flush(BlockDriverState *bs)
> static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
> {
>     bs->bl.max_discard = UINT32_MAX >> BDRV_SECTOR_BITS;
> +    bs->bl.max_write_zeroes = UINT32_MAX >> BDRV_SECTOR_BITS;
>     bs->bl.max_transfer_length = UINT32_MAX >> BDRV_SECTOR_BITS;
> }
> 
> @@ -476,6 +497,7 @@ static BlockDriver bdrv_nbd = {
>     .bdrv_parse_filename        = nbd_parse_filename,
>     .bdrv_file_open             = nbd_open,
>     .bdrv_co_readv              = nbd_co_readv,
> +    .bdrv_co_write_zeroes       = nbd_co_write_zeroes,
>     .bdrv_co_writev             = nbd_co_writev,
>     .bdrv_co_writev_flags       = nbd_co_writev_flags,
>     .supported_write_flags      = BDRV_REQ_FUA,
> @@ -496,6 +518,7 @@ static BlockDriver bdrv_nbd_tcp = {
>     .bdrv_parse_filename        = nbd_parse_filename,
>     .bdrv_file_open             = nbd_open,
>     .bdrv_co_readv              = nbd_co_readv,
> +    .bdrv_co_write_zeroes       = nbd_co_write_zeroes,
>     .bdrv_co_writev             = nbd_co_writev,
>     .bdrv_co_writev_flags       = nbd_co_writev_flags,
>     .supported_write_flags      = BDRV_REQ_FUA,
> @@ -516,6 +539,7 @@ static BlockDriver bdrv_nbd_unix = {
>     .bdrv_parse_filename        = nbd_parse_filename,
>     .bdrv_file_open             = nbd_open,
>     .bdrv_co_readv              = nbd_co_readv,
> +    .bdrv_co_write_zeroes       = nbd_co_write_zeroes,
>     .bdrv_co_writev             = nbd_co_writev,
>     .bdrv_co_writev_flags       = nbd_co_writev_flags,
>     .supported_write_flags      = BDRV_REQ_FUA,
> -- 
> 2.5.5
> 
> 

-- 
Alex Bligh


* Re: [Qemu-devel] [PATCH v3 43/44] nbd: Implement NBD_OPT_BLOCK_SIZE on server
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 43/44] nbd: Implement NBD_OPT_BLOCK_SIZE on server Eric Blake
@ 2016-04-25 12:16   ` Alex Bligh
  0 siblings, 0 replies; 67+ messages in thread
From: Alex Bligh @ 2016-04-25 12:16 UTC
  To: Eric Blake
  Cc: Alex Bligh, qemu-devel, Kevin Wolf, Paolo Bonzini, qemu-block, Max Reitz

Eric,

See my message on nbd-general today regarding whether it is
necessary to receive NBD_OPT_BLOCK_SIZE first; it may be that
you can simply assume 512 is OK.

Otherwise:

Reviewed-by: Alex Bligh <alex@alex.org.uk>

Alex
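
For context on what "obeying block sizes" would mean in practice, here is a minimal sketch of the check a client (or a strict server) might apply to each request. It assumes the extension's limits are a minimum block size that offsets and lengths must be a multiple of, and a maximum that a single request must not exceed; that is my reading of the commit message quoted below, not a statement of the final protocol text. request_obeys_limits() and EXAMPLE_MAX_BLOCK are hypothetical names; 32M matches the maximum transfer size the commit message mentions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative maximum, matching the 32M the commit message cites. */
#define EXAMPLE_MAX_BLOCK (32u * 1024 * 1024)

/* Hypothetical check: does a request stay within advertised limits?
 * The offset and length must be multiples of min_block, and the
 * length must not exceed max_block. */
static bool request_obeys_limits(uint64_t offset, uint32_t len,
                                 uint32_t min_block, uint32_t max_block)
{
    assert(min_block > 0 && min_block <= max_block);
    if (len > max_block) {
        return false;
    }
    return offset % min_block == 0 && len % min_block == 0;
}
```

With a minimum of 1, as the patch advertises, only the maximum matters; with a minimum of 512, an unaligned offset such as 100 would be rejected.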

On 23 Apr 2016, at 00:40, Eric Blake <eblake@redhat.com> wrote:

> The upstream NBD Protocol has defined a new extension to allow
> the server to advertise block sizes to the client, as well as
> a way for the client to inform the server that it intends to
> obey block sizes.
> 
> Thanks to a recent fix, our minimum transfer size is always
> 1 (the block layer takes care of read-modify-write on our
> behalf), although if we wanted down the road, we could
> advertise a minimum of 512 based on our usage patterns to a
> client that is willing to honor block sizes.  Meanwhile,
> advertising our absolute maximum transfer size of 32M will
> help newer clients avoid EINVAL failures.
> 
> We do not reject clients for using the older NBD_OPT_EXPORT_NAME;
> we are no worse off for those clients than we used to be. But
> for clients new enough to use NBD_OPT_GO, we require the client
> to first send us NBD_OPT_BLOCK_SIZE to prove they know about
> sizing constraints, failing with NBD_REP_ERR_BLOCK_SIZE_REQD otherwise.
> All existing released qemu clients (whether old-style or new, at
> least by the end of this series) already honor our limits, and
> will still connect; so at most, this would reject a new non-qemu
> client that doesn't intend to obey limits (and that client could
> still use NBD_OPT_EXPORT_NAME to bypass our rejection).
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> include/block/nbd.h |  2 ++
> nbd/nbd-internal.h  |  1 +
> nbd/server.c        | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 65 insertions(+)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 1072d9e..a5c68df 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -96,11 +96,13 @@ typedef struct nbd_reply nbd_reply;
> #define NBD_REP_ERR_TLS_REQD    NBD_REP_ERR(5)  /* TLS required */
> #define NBD_REP_ERR_UNKNOWN     NBD_REP_ERR(6)  /* Export unknown */
> #define NBD_REP_ERR_SHUTDOWN    NBD_REP_ERR(7)  /* Server shutting down */
> +#define NBD_REP_ERR_BLOCK_SIZE_REQD NBD_REP_ERR(8) /* Missing OPT_BLOCK_SIZE */
> 
> /* Info types, used during NBD_REP_INFO */
> #define NBD_INFO_EXPORT         0
> #define NBD_INFO_NAME           1
> #define NBD_INFO_DESCRIPTION    2
> +#define NBD_INFO_BLOCK_SIZE     3
> 
> /* Request flags, sent from client to server during transmission phase */
> #define NBD_CMD_FLAG_FUA        (1 << 0) /* 'force unit access' during write */
> diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
> index 1935102..1354182 100644
> --- a/nbd/nbd-internal.h
> +++ b/nbd/nbd-internal.h
> @@ -85,6 +85,7 @@
> #define NBD_OPT_STARTTLS        (5)
> #define NBD_OPT_INFO            (6)
> #define NBD_OPT_GO              (7)
> +#define NBD_OPT_BLOCK_SIZE      (9)
> 
> /* NBD errors are based on errno numbers, so there is a 1:1 mapping,
>  * but only a limited set of errno values is specified in the protocol.
> diff --git a/nbd/server.c b/nbd/server.c
> index 563afb2..86d1e2d 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -83,6 +83,7 @@ struct NBDClient {
>     void (*close)(NBDClient *client);
> 
>     bool no_zeroes;
> +    bool block_size;
>     NBDExport *exp;
>     QCryptoTLSCreds *tlscreds;
>     char *tlsaclname;
> @@ -346,6 +347,7 @@ static int nbd_negotiate_handle_info(NBDClient *client, uint32_t length,
>     uint16_t type;
>     uint64_t size;
>     uint16_t flags;
> +    uint32_t block;
> 
>     /* Client sends:
>         [20 ..  xx]   export name (length bytes)
> @@ -391,6 +393,57 @@ static int nbd_negotiate_handle_info(NBDClient *client, uint32_t length,
>     }
> 
>     rc = nbd_negotiate_send_rep_len(client->ioc, NBD_REP_INFO, opt,
> +                                    sizeof(type) + 3 * sizeof(block));
> +    if (rc < 0) {
> +        return rc;
> +    }
> +
> +    type = cpu_to_be16(NBD_INFO_BLOCK_SIZE);
> +    if (nbd_negotiate_write(client->ioc, &type, sizeof(type)) !=
> +        sizeof(type)) {
> +        LOG("write failed");
> +        return -EIO;
> +    }
> +    /* minimum - Always 1, because we use blk_pread().
> +     * TODO: Advertise 512 if guest used NBD_OPT_BLOCK_SIZE? */
> +    block = cpu_to_be32(1);
> +    if (nbd_negotiate_write(client->ioc, &block, sizeof(block)) !=
> +        sizeof(block)) {
> +        LOG("write failed");
> +        return -EIO;
> +    }
> +    /* preferred - At least 4096, but larger as appropriate. */
> +    block = blk_get_opt_transfer_length(exp->blk) * BDRV_SECTOR_SIZE;
> +    block = cpu_to_be32(MAX(4096, block));
> +    if (nbd_negotiate_write(client->ioc, &block, sizeof(block)) !=
> +        sizeof(block)) {
> +        LOG("write failed");
> +        return -EIO;
> +    }
> +    /* maximum - At most 32M, but smaller as appropriate. */
> +    block = blk_get_max_transfer_length(exp->blk);
> +    if (block && block < NBD_MAX_BUFFER_SIZE / BDRV_SECTOR_SIZE) {
> +        block *= BDRV_SECTOR_SIZE;
> +    } else {
> +        block = NBD_MAX_BUFFER_SIZE;
> +    }
> +    block = cpu_to_be32(block);
> +    if (nbd_negotiate_write(client->ioc, &block, sizeof(block)) !=
> +        sizeof(block)) {
> +        LOG("write failed");
> +        return -EIO;
> +    }
> +
> +    if (!client->block_size) {
> +        /* The client is new enough to use NBD_OPT_GO, but forgot to
> +         * tell us that it plans to obey block sizes. Since we fail
> +         * hard on oversize requests, it's better to reject such a
> +         * client up front.  */
> +        return nbd_negotiate_send_rep(client->ioc, NBD_REP_ERR_BLOCK_SIZE_REQD,
> +                                      opt);
> +    }
> +
> +    rc = nbd_negotiate_send_rep_len(client->ioc, NBD_REP_INFO, opt,
>                                     sizeof(type) + sizeof(size) +
>                                     sizeof(flags));
>     if (rc < 0) {
> @@ -630,6 +683,15 @@ static int nbd_negotiate_options(NBDClient *client, uint16_t myflags)
>                 }
>                 break;
> 
> +            case NBD_OPT_BLOCK_SIZE:
> +                client->block_size = true;
> +                ret = nbd_negotiate_send_rep(client->ioc, NBD_REP_ACK,
> +                                             clientflags);
> +                if (ret < 0) {
> +                    return ret;
> +                }
> +                break;
> +
>             case NBD_OPT_STARTTLS:
>                 if (nbd_negotiate_drop_sync(client->ioc, length) != length) {
>                     return -EIO;
> -- 
> 2.5.5
> 
> 

-- 
Alex Bligh

* Re: [Qemu-devel] [PATCH v3 44/44] nbd: Implement NBD_OPT_BLOCK_SIZE on client
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 44/44] nbd: Implement NBD_OPT_BLOCK_SIZE on client Eric Blake
@ 2016-04-25 12:19   ` Alex Bligh
  2016-04-25 19:16     ` Eric Blake
  0 siblings, 1 reply; 67+ messages in thread
From: Alex Bligh @ 2016-04-25 12:19 UTC (permalink / raw)
  To: Eric Blake
  Cc: Alex Bligh, qemu-devel, Kevin Wolf, Paolo Bonzini, qemu-block, Max Reitz


On 23 Apr 2016, at 00:40, Eric Blake <eblake@redhat.com> wrote:

> The upstream NBD Protocol has defined a new extension to allow
> the server to advertise block sizes to the client, as well as
> a way for the client to inform the server that it intends to
> obey block sizes.
> 
> Pass any received sizes on to the block layer.
> 
> Use the minimum block size as the sector size we pass to the
> kernel - which also has the nice effect of cooperating with
> (non-qemu) servers that don't do read-modify-write when exposing
> a block device with 4k sectors; it can also allow us to visit a
> file larger than 2T on a 32-bit kernel.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> include/block/nbd.h |  3 +++
> block/nbd-client.c  |  3 +++
> block/nbd.c         | 17 +++++++++---
> nbd/client.c        | 74 ++++++++++++++++++++++++++++++++++++++++++++++++-----
> 4 files changed, 87 insertions(+), 10 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index a5c68df..27a6854 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -133,6 +133,9 @@ enum {
> struct NbdExportInfo {
>     uint64_t size;
>     uint16_t flags;
> +    uint32_t min_block;
> +    uint32_t opt_block;
> +    uint32_t max_block;
> };
> typedef struct NbdExportInfo NbdExportInfo;
> 
> diff --git a/block/nbd-client.c b/block/nbd-client.c
> index 2b6ac27..602a8ab 100644
> --- a/block/nbd-client.c
> +++ b/block/nbd-client.c
> @@ -443,6 +443,9 @@ int nbd_client_init(BlockDriverState *bs,
>         logout("Failed to negotiate with the NBD server\n");
>         return ret;
>     }
> +    if (client->info.min_block > bs->request_alignment) {
> +        bs->request_alignment = client->info.min_block;
> +    }
> 
>     qemu_co_mutex_init(&client->send_mutex);
>     qemu_co_mutex_init(&client->free_sema);
> diff --git a/block/nbd.c b/block/nbd.c
> index 5172039..bb7df55 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -407,9 +407,20 @@ static int nbd_co_flush(BlockDriverState *bs)
> 
> static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
> {
> -    bs->bl.max_discard = UINT32_MAX >> BDRV_SECTOR_BITS;
> -    bs->bl.max_write_zeroes = UINT32_MAX >> BDRV_SECTOR_BITS;
> -    bs->bl.max_transfer_length = UINT32_MAX >> BDRV_SECTOR_BITS;
> +    NbdClientSession *s = nbd_get_client_session(bs);
> +    int max = UINT32_MAX >> BDRV_SECTOR_BITS;
> +
> +    if (s->info.max_block) {
> +        max = s->info.max_block >> BDRV_SECTOR_BITS;
> +    }
> +    bs->bl.max_discard = max;
> +    bs->bl.max_write_zeroes = max;
> +    bs->bl.max_transfer_length = max;
> +
> +    if (s->info.opt_block &&
> +        s->info.opt_block >> BDRV_SECTOR_BITS > bs->bl.opt_transfer_length) {
> +        bs->bl.opt_transfer_length = s->info.opt_block >> BDRV_SECTOR_BITS;
> +    }
> }
> 
> static int nbd_co_discard(BlockDriverState *bs, int64_t sector_num,
> diff --git a/nbd/client.c b/nbd/client.c
> index dac4f29..24f6b0b 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -232,6 +232,11 @@ static int nbd_handle_reply_err(QIOChannel *ioc, nbd_opt_reply *reply,
>                    reply->option);
>         break;
> 
> +    case NBD_REP_ERR_BLOCK_SIZE_REQD:
> +        error_setg(errp, "Server wants OPT_BLOCK_SIZE before option %" PRIx32,
> +                   reply->option);
> +        break;
> +
>     default:
>         error_setg(errp, "Unknown error code when asking for option %" PRIx32,
>                    reply->option);
> @@ -333,6 +338,21 @@ static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
>      * flags still 0 is a witness of a broken server. */
>     info->flags = 0;
> 
> +    /* Some servers use NBD_OPT_GO to advertise non-default block
> +     * sizes, and require that we first use NBD_OPT_BLOCK_SIZE to
> +     * agree to that. */
> +    TRACE("Attempting NBD_OPT_BLOCK_SIZE");
> +    if (nbd_send_option_request(ioc, NBD_OPT_BLOCK_SIZE, 0, NULL, errp) < 0) {
> +        return -1;
> +    }
> +    if (nbd_receive_option_reply(ioc, NBD_OPT_BLOCK_SIZE, &reply, errp) < 0) {
> +        return -1;
> +    }
> +    error = nbd_handle_reply_err(ioc, &reply, errp);
> +    if (error < 0) {
> +        return error;
> +    }
> +
>     TRACE("Attempting NBD_OPT_GO for export '%s'", wantname);
>     if (nbd_send_option_request(ioc, NBD_OPT_GO, -1, wantname, errp) < 0) {
>         return -1;
> @@ -402,6 +422,45 @@ static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
>                   info->size, info->flags);
>             break;
> 
> +        case NBD_INFO_BLOCK_SIZE:
> +            if (len != sizeof(info->min_block) * 3) {
> +                error_setg(errp, "remaining export info len %" PRIu32
> +                           " is unexpected size", len);
> +                return -1;
> +            }
> +            if (read_sync(ioc, &info->min_block, sizeof(info->min_block)) !=
> +                sizeof(info->min_block)) {
> +                error_setg(errp, "failed to read info minimum block size");
> +                return -1;
> +            }
> +            be32_to_cpus(&info->min_block);
> +            if (!is_power_of_2(info->min_block)) {
> +                error_setg(errp, "server minimum block size %" PRIu32
> +                           " is not a power of two", info->min_block);
> +                return -1;
> +            }
> +            if (read_sync(ioc, &info->opt_block, sizeof(info->opt_block)) !=
> +                sizeof(info->opt_block)) {
> +                error_setg(errp, "failed to read info preferred block size");
> +                return -1;
> +            }
> +            be32_to_cpus(&info->opt_block);
> +            if (!is_power_of_2(info->opt_block) ||
> +                info->opt_block < info->min_block) {
> +                error_setg(errp, "server preferred block size %" PRIu32
> +                           " is not valid", info->opt_block);
> +                return -1;
> +            }
> +            if (read_sync(ioc, &info->max_block, sizeof(info->max_block)) !=
> +                sizeof(info->max_block)) {
> +                error_setg(errp, "failed to read info maximum block size");
> +                return -1;
> +            }
> +            be32_to_cpus(&info->max_block);
> +            TRACE("Block sizes are 0x%" PRIx32 ", 0x%" PRIx32 ", 0x%" PRIx32,
> +                  info->min_block, info->opt_block, info->max_block);
> +            break;
> +

You should probably check min_block <= opt_block <= max_block here

Also should there be a check that BDRV_SECTOR_SIZE >= min_block?


>         default:
>             TRACE("ignoring unknown export info %" PRIu16, type);
>             if (drop_sync(ioc, len) != len) {
> @@ -710,8 +769,9 @@ fail:
> #ifdef __linux__
> int nbd_init(int fd, QIOChannelSocket *sioc, NbdExportInfo *info)
> {
> -    unsigned long sectors = info->size / BDRV_SECTOR_SIZE;
> -    if (info->size / BDRV_SECTOR_SIZE != sectors) {
> +    unsigned long sector_size = MAX(BDRV_SECTOR_SIZE, info->min_block);
> +    unsigned long sectors = info->size / sector_size;
> +    if (info->size / sector_size != sectors) {
>         LOG("Export size %" PRId64 " too large for 32-bit kernel", info->size);
>         return -E2BIG;
>     }
> @@ -724,18 +784,18 @@ int nbd_init(int fd, QIOChannelSocket *sioc, NbdExportInfo *info)
>         return -serrno;
>     }
> 
> -    TRACE("Setting block size to %lu", (unsigned long)BDRV_SECTOR_SIZE);
> +    TRACE("Setting block size to %lu", sector_size);
> 
> -    if (ioctl(fd, NBD_SET_BLKSIZE, (unsigned long)BDRV_SECTOR_SIZE) < 0) {
> +    if (ioctl(fd, NBD_SET_BLKSIZE, sector_size) < 0) {
>         int serrno = errno;
>         LOG("Failed setting NBD block size");
>         return -serrno;
>     }
> 
>     TRACE("Setting size to %lu block(s)", sectors);
> -    if (size % BDRV_SECTOR_SIZE) {
> -        TRACE("Ignoring trailing %d bytes of export",
> -              (int) (size % BDRV_SECTOR_SIZE));
> +    if (info->size % sector_size) {
> +        TRACE("Ignoring trailing %" PRId64 " bytes of export",
> +              info->size % sector_size);
>     }
> 
>     if (ioctl(fd, NBD_SET_SIZE_BLOCKS, sectors) < 0) {
> -- 
> 2.5.5
> 
> 

-- 
Alex Bligh

* Re: [Qemu-devel] [PATCH v3 44/44] nbd: Implement NBD_OPT_BLOCK_SIZE on client
  2016-04-25 12:19   ` Alex Bligh
@ 2016-04-25 19:16     ` Eric Blake
  0 siblings, 0 replies; 67+ messages in thread
From: Eric Blake @ 2016-04-25 19:16 UTC (permalink / raw)
  To: Alex Bligh; +Cc: qemu-devel, Kevin Wolf, Paolo Bonzini, qemu-block, Max Reitz

On 04/25/2016 06:19 AM, Alex Bligh wrote:
> 
> On 23 Apr 2016, at 00:40, Eric Blake <eblake@redhat.com> wrote:
> 
>> The upstream NBD Protocol has defined a new extension to allow
>> the server to advertise block sizes to the client, as well as
>> a way for the client to inform the server that it intends to
>> obey block sizes.
>>
>> Pass any received sizes on to the block layer.
>>
>> Use the minimum block size as the sector size we pass to the
>> kernel - which also has the nice effect of cooperating with
>> (non-qemu) servers that don't do read-modify-write when exposing
>> a block device with 4k sectors; it can also allow us to visit a
>> file larger than 2T on a 32-bit kernel.
>>

>> +            be32_to_cpus(&info->max_block);
>> +            TRACE("Block sizes are 0x%" PRIx32 ", 0x%" PRIx32 ", 0x%" PRIx32,
>> +                  info->min_block, info->opt_block, info->max_block);
>> +            break;
>> +
> 
> You should probably check min_block <= opt_block <= max_block here

opt_block > max_block is possible if max_block is clamped to export size
(in the degenerate case where you have a small export that is too small
for the granularity of a hole or efficient I/O).  But yes, some sanity
checks that the server isn't horribly broken might be worthwhile.
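The kind of sanity check discussed above might look roughly like the sketch below. It is only an illustration, not code from the patch: the helper names `is_pow2` and `nbd_block_sizes_sane` are made up here, and it assumes that a maximum smaller than the minimum is never legitimate. It deliberately does not require opt_block <= max_block, for the reason given in the reply.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not from the patch): true if v is a nonzero
 * power of two. */
static bool is_pow2(uint32_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Hypothetical sanity check on server-advertised block sizes.
 * Assumption: max_block < min_block always indicates a broken server.
 * opt_block > max_block is allowed, since a small export may clamp
 * the maximum below the preferred size. */
static bool nbd_block_sizes_sane(uint32_t min_block, uint32_t opt_block,
                                 uint32_t max_block)
{
    if (!is_pow2(min_block)) {
        return false;               /* minimum must be a power of two */
    }
    if (!is_pow2(opt_block) || opt_block < min_block) {
        return false;               /* preferred must be >= minimum */
    }
    if (max_block < min_block) {
        return false;               /* maximum must be >= minimum */
    }
    return true;
}
```

With this shape, a server advertising (512, 4096, 1024) for a tiny export would still be accepted, while (4096, 512, 65536) would be rejected.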

> 
> Also should there be a check that BDRV_SECTOR_SIZE >= min_block?

No, because we take the server's min_block and feed it into
bs->request_alignment (which forces the block layer to comply with a minimum
alignment larger than 512, using code already tested on physical block
drives with 4k sectors), see the hunk in nbd-client.c.

>> int nbd_init(int fd, QIOChannelSocket *sioc, NbdExportInfo *info)
>> {
>> -    unsigned long sectors = info->size / BDRV_SECTOR_SIZE;
>> -    if (info->size / BDRV_SECTOR_SIZE != sectors) {
>> +    unsigned long sector_size = MAX(BDRV_SECTOR_SIZE, info->min_block);
>> +    unsigned long sectors = info->size / sector_size;
>> +    if (info->size / sector_size != sectors) {
>>         LOG("Export size %" PRId64 " too large for 32-bit kernel", info->size);
>>         return -E2BIG;
>>     }
>> @@ -724,18 +784,18 @@ int nbd_init(int fd, QIOChannelSocket *sioc, NbdExportInfo *info)
>>         return -serrno;
>>     }
>>
>> -    TRACE("Setting block size to %lu", (unsigned long)BDRV_SECTOR_SIZE);
>> +    TRACE("Setting block size to %lu", sector_size);
>>
>> -    if (ioctl(fd, NBD_SET_BLKSIZE, (unsigned long)BDRV_SECTOR_SIZE) < 0) {
>> +    if (ioctl(fd, NBD_SET_BLKSIZE, sector_size) < 0) {

We also feed the larger of 512 and the advertised minimum block size
into the kernel via ioctl() when the kernel takes over the
transmission phase. I'm not certain whether the kernel treats
NBD_SET_BLKSIZE as a hard rule or merely as a hint; but if that needs
patching, it needs patching in the kernel implementation, not in qemu.
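To make the arithmetic concrete, here is a rough standalone sketch of the block-size and block-count computation. It is not the patch itself: `nbd_kernel_geometry` and `DEFAULT_SECTOR_SIZE` are names invented for this example, and the 32-bit `unsigned long` of a 32-bit kernel is modelled explicitly with `uint32_t`.

```c
#include <stdint.h>

/* Stand-in for BDRV_SECTOR_SIZE (512 bytes). */
#define DEFAULT_SECTOR_SIZE 512u

/* Hypothetical helper: choose the block size to hand to the kernel and
 * compute the number of whole blocks in the export.  Returns 0 on
 * success, or -1 if the block count does not fit in 32 bits (the
 * "too large for 32-bit kernel" case).  Trailing bytes that do not
 * fill a whole block are ignored, as in the patch. */
static int nbd_kernel_geometry(uint64_t export_size, uint32_t min_block,
                               uint32_t *block_size, uint32_t *blocks)
{
    uint32_t bs = min_block > DEFAULT_SECTOR_SIZE ? min_block
                                                  : DEFAULT_SECTOR_SIZE;
    uint64_t count = export_size / bs;

    if (count > UINT32_MAX) {
        return -1;
    }
    *block_size = bs;
    *blocks = (uint32_t)count;
    return 0;
}
```

Under these assumptions a 3 TiB export overflows the count with 512-byte blocks but fits once the server advertises a 4096-byte minimum, which is the "larger than 2T on a 32-bit kernel" effect mentioned in the commit message.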

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


* Re: [Qemu-devel] [PATCH v3 36/44] nbd: Improve handling of shutdown requests
  2016-04-25  9:47   ` Alex Bligh
@ 2016-04-25 19:20     ` Eric Blake
  2016-04-25 19:40       ` Alex Bligh
  0 siblings, 1 reply; 67+ messages in thread
From: Eric Blake @ 2016-04-25 19:20 UTC (permalink / raw)
  To: Alex Bligh; +Cc: qemu-devel, Kevin Wolf, Paolo Bonzini, qemu-block, Max Reitz

On 04/25/2016 03:47 AM, Alex Bligh wrote:
> 
> On 23 Apr 2016, at 00:40, Eric Blake <eblake@redhat.com> wrote:
> 
>> NBD commit 6d34500b clarified how clients and servers are supposed
>> to behave before closing a connection. It added NBD_REP_ERR_SHUTDOWN
>> (for the server to announce it is about to go away during option
>> haggling, so the client should quit sending NBD_OPT_* other than
>> NBD_OPT_ABORT) and ESHUTDOWN (for the server to announce it is about
>> to go away during transmission, so the client should quit sending
>> NBD_CMD_* other than NBD_CMD_DISC).  It also clarified that
>> NBD_OPT_ABORT gets a reply, while NBD_CMD_DISC does not.
>>
>> This patch merely adds the missing reply to NBD_OPT_ABORT and teaches
>> the client to recognize server errors.  Actually teaching the server
>> to send NBD_REP_ERR_SHUTDOWN or ESHUTDOWN would require knowing that
>> the server has been requested to shut down soon (maybe we could do
>> that by installing a SIGINT handler in qemu-nbd, which transitions
>> from RUNNING to a new state that waits for the client to react,
>> rather than just out-right quitting).
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>> ---

>> @@ -484,6 +486,10 @@ static int nbd_negotiate_options(NBDClient *client)
>>                 if (ret < 0) {
>>                     return ret;
>>                 }
>> +                /* Let the client keep trying, unless they asked to quit */
>> +                if (clientflags == NBD_OPT_ABORT) {
> 
> OK that's totally confusing. clientflags is not the client flags. clientflags
> is the NBD option ID, which happens to be the two bytes after the NBD OPT magic,
> which is the client flags if we were doing oldstyle negotiation, not newstyle
> negotiation.

Yes, 'clientflags' is a poor name; I should rename it in a separate
patch. The variable actually holds the option-negotiation command type.

> 
> Except:
> 
>> +                    return -EINVAL;
>> +                }
>>                 break;
>>             }
>>         } else if (fixedNewstyle) {
> 
> So the above is for NewStyle (not fixedNewStyle)?

The above is for fixedNewStyle when TLS is not negotiated yet; the 'else
if' is for fixedNewStyle after TLS has been negotiated.  Prior to TLS,
we have to special-case NBD_OPT_ABORT and actually quit.

> 
> In which case more than one option isn't even supported, so what's the
> stuff purporting to handle TLS doing there?
> 
> Confused ...

Sounds like a cleanup patch as a prerequisite on my next respin would
help, then.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


* Re: [Qemu-devel] [PATCH v3 36/44] nbd: Improve handling of shutdown requests
  2016-04-25 19:20     ` Eric Blake
@ 2016-04-25 19:40       ` Alex Bligh
  2016-04-25 19:48         ` Eric Blake
  0 siblings, 1 reply; 67+ messages in thread
From: Alex Bligh @ 2016-04-25 19:40 UTC (permalink / raw)
  To: Eric Blake
  Cc: Alex Bligh, Kevin Wolf, Paolo Bonzini, qemu-devel, qemu-block, Max Reitz


On 25 Apr 2016, at 20:20, Eric Blake <eblake@redhat.com> wrote:

>>>        } else if (fixedNewstyle) {
>> 
>> So the above is for NewStyle (not fixedNewStyle)?
> 
> The above is for fixedNewStyle when TLS is not negotiated yet; the 'else
> if' is for fixedNewStyle after TLS has been negotiated.  Prior to TLS,
> we have to special-case NBD_OPT_ABORT and actually quit.

OK. fixedNewStyle is defined as a prerequisite for TLS. I'm hoping
nothing in Qemuland ever did non-fixed NewStyle, and assuming that's
the case I would not even support it (certainly it won't play
nicely with all the other stuff you've been doing).

--
Alex Bligh





* Re: [Qemu-devel] [PATCH v3 36/44] nbd: Improve handling of shutdown requests
  2016-04-25 19:40       ` Alex Bligh
@ 2016-04-25 19:48         ` Eric Blake
  0 siblings, 0 replies; 67+ messages in thread
From: Eric Blake @ 2016-04-25 19:48 UTC (permalink / raw)
  To: Alex Bligh; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, qemu-block, Max Reitz

On 04/25/2016 01:40 PM, Alex Bligh wrote:
> 
> On 25 Apr 2016, at 20:20, Eric Blake <eblake@redhat.com> wrote:
> 
>>>>        } else if (fixedNewstyle) {
>>>
>>> So the above is for NewStyle (not fixedNewStyle)?
>>
>> The above is for fixedNewStyle when TLS is not negotiated yet; the 'else
>> if' is for fixedNewStyle after TLS has been negotiated.  Prior to TLS,
>> we have to special-case NBD_OPT_ABORT and actually quit.
> 
> OK. fixedNewStyle is defined as a prerequisite for TLS. I'm hoping
> nothing in Qemuland ever did non-fixed NewStyle, and assuming that's
> the case I would not even support it (certainly it won't play
> nicely with all the other stuff you've been doing).

Well, there were some last-minute patches that went into the 2.6
candidates that fixed qemu to actually be a fixedNewStyle server
(without commit 156f6a10, for example, qemu had the very bug of
disconnecting on unknown client options that fixedNewStyle was supposed
to prevent). Fortunately, qemu 2.5 is oldstyle only, and qemu 2.6 is the
first newstyle server, and I think I got the worst of the
interoperability bugs nailed in 2.6, whereas this series is focusing on
the feature enhancements for inclusion in 2.7.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


* Re: [Qemu-devel] [PATCH v3 16/44] atapi: Switch to byte-based block access
  2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 16/44] atapi: " Eric Blake
@ 2016-04-25 21:36   ` John Snow
  0 siblings, 0 replies; 67+ messages in thread
From: John Snow @ 2016-04-25 21:36 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: alex, qemu-block



On 04/22/2016 07:40 PM, Eric Blake wrote:
> Sector-based blk_read() should die; switch to byte-based
> blk_pread() instead.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>  hw/ide/atapi.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
> index 2bb606c..81000d8 100644
> --- a/hw/ide/atapi.c
> +++ b/hw/ide/atapi.c
> @@ -119,12 +119,12 @@ cd_read_sector_sync(IDEState *s)
> 
>      switch (s->cd_sector_size) {
>      case 2048:
> -        ret = blk_read(s->blk, (int64_t)s->lba << 2,
> -                       s->io_buffer, 4);
> +        ret = blk_pread(s->blk, (int64_t)s->lba << (2 + BDRV_SECTOR_BITS),
> +                        s->io_buffer, 4 << BDRV_SECTOR_BITS);
>          break;
>      case 2352:
> -        ret = blk_read(s->blk, (int64_t)s->lba << 2,
> -                       s->io_buffer + 16, 4);
> +        ret = blk_pread(s->blk, (int64_t)s->lba << (2 + BDRV_SECTOR_BITS),

Uh, hm. So what we have is a cdrom-sector-based LBA, that we need to
transform into IDE-based sectors, then to bytes.

We could just define an ATAPI_SECTOR_BITS to be (2 + BDRV_SECTOR_BITS).

Then, the lba conversion would be just:

s->lba << ATAPI_SECTOR_BITS

and the size would be just:

1 << ATAPI_SECTOR_BITS

And that's probably better on the eyes.
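A minimal sketch of that suggestion, for illustration only: `ATAPI_SECTOR_SIZE` and the helper `atapi_lba_to_offset` are hypothetical names that do not appear in the patch, and the LBA is taken as an unsigned value here to keep the shift well defined.

```c
#include <stdint.h>

#define BDRV_SECTOR_BITS   9                        /* 512-byte sectors */
#define ATAPI_SECTOR_BITS  (2 + BDRV_SECTOR_BITS)   /* 2048-byte CD sectors */
#define ATAPI_SECTOR_SIZE  (1 << ATAPI_SECTOR_BITS)

/* Hypothetical helper: convert a CD-ROM LBA into a byte offset suitable
 * for a byte-based read such as blk_pread(). */
static int64_t atapi_lba_to_offset(uint32_t lba)
{
    return (int64_t)lba << ATAPI_SECTOR_BITS;
}
```

The two reads in the hunk would then use `atapi_lba_to_offset(s->lba)` for the offset and `ATAPI_SECTOR_SIZE` for the length, instead of repeating `2 + BDRV_SECTOR_BITS` and `4 << BDRV_SECTOR_BITS`.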

> +                        s->io_buffer + 16, 4 << BDRV_SECTOR_BITS);
>          if (ret >= 0) {
>              cd_data_to_raw(s->io_buffer, s->lba);
>          }
> 

The code already uses lots of different stuff like
4 * BDRV_SECTOR_SIZE
or
"2048"

so we probably need some staple definition here.


...Otherwise, none of this is a problem you've created, just one this
patch highlights. Fix at your own peril.

Acked-by: John Snow <jsnow@redhat.com>

end of thread, other threads:[~2016-04-25 21:36 UTC | newest]

Thread overview: 67+ messages
2016-04-22 23:40 [Qemu-devel] [PATCH v3 00/44] NBD protocol additions Eric Blake
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 01/44] nbd: More debug typo fixes, use correct formats Eric Blake
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 02/44] nbd: Quit server after any write error Eric Blake
2016-04-25  9:21   ` Alex Bligh
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 03/44] nbd: Improve server handling of bogus commands Eric Blake
2016-04-25  9:29   ` Alex Bligh
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 04/44] nbd: Reject unknown request flags Eric Blake
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 05/44] nbd: Group all Linux-specific ioctl code in one place Eric Blake
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 06/44] nbd: Clean up ioctl handling of qemu-nbd -c Eric Blake
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 07/44] nbd: Limit nbdflags to 16 bits Eric Blake
2016-04-25  9:24   ` Alex Bligh
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 08/44] nbd: Add qemu-nbd -D for human-readable description Eric Blake
2016-04-25  9:25   ` Alex Bligh
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 09/44] block: Allow BDRV_REQ_FUA through blk_pwrite() Eric Blake
2016-04-23  8:12   ` Denis V. Lunev
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 10/44] fdc: Switch to byte-based block access Eric Blake
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 11/44] nand: " Eric Blake
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 12/44] onenand: " Eric Blake
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 13/44] pflash: " Eric Blake
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 14/44] sd: " Eric Blake
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 15/44] m25p80: " Eric Blake
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 16/44] atapi: " Eric Blake
2016-04-25 21:36   ` John Snow
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 17/44] nbd: " Eric Blake
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 18/44] qemu-img: " Eric Blake
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 19/44] qemu-io: " Eric Blake
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 20/44] block: Switch blk_read_unthrottled() to byte interface Eric Blake
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 21/44] block: Switch blk_write_zeroes() " Eric Blake
2016-04-23  8:12   ` Denis V. Lunev
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 22/44] block: Kill blk_write(), blk_read() Eric Blake
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 23/44] qemu-io: Add missing option documentation Eric Blake
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 24/44] qemu-io: Add 'write -f' to test FUA flag Eric Blake
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 25/44] qemu-io: Add 'open -u' to set BDRV_O_UNMAP after the fact Eric Blake
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 26/44] qemu-io: Add 'write -z -u' to test MAY_UNMAP flag Eric Blake
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 27/44] nbd: Use BDRV_REQ_FUA for better FUA where supported Eric Blake
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 28/44] nbd: Detect servers that send unexpected error values Eric Blake
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 29/44] nbd: Avoid magic number for NBD max name size Eric Blake
2016-04-25  9:32   ` Alex Bligh
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 30/44] nbd: Treat flags vs. command type as separate fields Eric Blake
2016-04-25  9:34   ` Alex Bligh
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 31/44] nbd: Share common reply-sending code in server Eric Blake
2016-04-25  9:34   ` Alex Bligh
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 32/44] nbd: Share common option-sending code in client Eric Blake
2016-04-25  9:37   ` Alex Bligh
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 33/44] nbd: Let client skip portions of server reply Eric Blake
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 34/44] nbd: Less allocation during NBD_OPT_LIST Eric Blake
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 35/44] nbd: Support shorter handshake Eric Blake
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 36/44] nbd: Improve handling of shutdown requests Eric Blake
2016-04-25  9:47   ` Alex Bligh
2016-04-25 19:20     ` Eric Blake
2016-04-25 19:40       ` Alex Bligh
2016-04-25 19:48         ` Eric Blake
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 37/44] nbd: Create struct for tracking export info Eric Blake
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 38/44] block: Add blk_get_opt_transfer_length() Eric Blake
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 39/44] nbd: Implement NBD_OPT_GO on server Eric Blake
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 40/44] nbd: Implement NBD_OPT_GO on client Eric Blake
2016-04-25 10:31   ` Alex Bligh
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 41/44] nbd: Implement NBD_CMD_WRITE_ZEROES on server Eric Blake
2016-04-23  9:00   ` Pavel Borzenkov
2016-04-25 12:11   ` Alex Bligh
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 42/44] nbd: Implement NBD_CMD_WRITE_ZEROES on client Eric Blake
2016-04-25 12:12   ` Alex Bligh
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 43/44] nbd: Implement NBD_OPT_BLOCK_SIZE on server Eric Blake
2016-04-25 12:16   ` Alex Bligh
2016-04-22 23:40 ` [Qemu-devel] [PATCH v3 44/44] nbd: Implement NBD_OPT_BLOCK_SIZE on client Eric Blake
2016-04-25 12:19   ` Alex Bligh
2016-04-25 19:16     ` Eric Blake
