All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/4] Support larger NBD_CMD_WRITE_ZEROES
@ 2018-05-01 21:13 Eric Blake
  2018-05-01 21:13 ` [Qemu-devel] [PATCH 1/4] nbd: Prepare for additional block sizing info Eric Blake
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Eric Blake @ 2018-05-01 21:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, edgar.kaziakhmedov, vsementsov

This is my counterproposal for:
https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg02107.html
where Edgar pointed out that NBD write zero requests are rather
inefficient because the NBD spec was ambiguous on whether we could
exceed an advertised maximum request size even when the command
carries no payload, and so for maximum compatibility with unknown
NBD partners, we erred on the conservative side.

The NBD spec has since been improved to mention that a client can
try a larger WRITE_ZEROES if it is prepared to handle an EINVAL, and
that a server should not hang up on overlarge requests that did not
include a payload.  But in the process it was pointed out that even
better would be improving the spec to advertise both limits (the
READ/WRITE payload limit, and the larger TRIM/ZERO limit), as done
in this series.

With this in place, qemu as both NBD client and server can send
zero requests up to BDRV_REQUEST_MAX_BYTES (nearly 2G) in length,
rather than having to subdivide the request into 32M chunks.

This series is experimental until the NBD spec is actually modified
to document the feature added here; but that modification will be
easier now that I have a proof-of-concept implementation.

Eric Blake (4):
  nbd: Prepare for additional block sizing info
  nbd/client: Refactor in preparation for more limits
  nbd/client: Support requests of additional block sizing info
  nbd/server: Support requests of additional block sizing info

 include/block/nbd.h |   8 +++-
 block/nbd.c         |  15 +++++-
 nbd/client.c        | 135 +++++++++++++++++++++++++++++++++++++++++++++++-----
 nbd/common.c        |   4 ++
 nbd/server.c        |  30 ++++++++++++
 nbd/trace-events    |   4 ++
 6 files changed, 181 insertions(+), 15 deletions(-)

-- 
2.14.3

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH 1/4] nbd: Prepare for additional block sizing info
  2018-05-01 21:13 [Qemu-devel] [PATCH 0/4] Support larger NBD_CMD_WRITE_ZEROES Eric Blake
@ 2018-05-01 21:13 ` Eric Blake
  2018-05-03  8:20   ` Vladimir Sementsov-Ogievskiy
  2018-05-01 21:13 ` [Qemu-devel] [PATCH 2/4] nbd/client: Refactor in preparation for more limits Eric Blake
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Eric Blake @ 2018-05-01 21:13 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, edgar.kaziakhmedov, vsementsov, Paolo Bonzini,
	Kevin Wolf, Max Reitz

The NBD spec is clarifying [1] that a server may want to advertise
different limits for READ/WRITE (in our case, 32M) than for
TRIM/ZERO (in our case, nearly 4G).  Add the constants and name
lookups for new NBD_INFO_ fields used during handshake to convey
this additional information.  Note that the NBD spec already
requires servers to ignore unknown requests from the client, and
for clients to ignore unknown gratuitous responses sent from
the server.

[1] https://lists.debian.org/nbd/2018/03/msg00048.html

Signed-off-by: Eric Blake <eblake@redhat.com>

---

The given URL for the NBD spec was v3; it will change to be a v4
version of that patch in part to point back to this qemu commit
as a proof of implementation.
---
 include/block/nbd.h | 4 +++-
 nbd/common.c        | 4 ++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index fcdcd545023..cbf51628f78 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -1,5 +1,5 @@
 /*
- *  Copyright (C) 2016-2017 Red Hat, Inc.
+ *  Copyright (C) 2016-2018 Red Hat, Inc.
  *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
  *
  *  Network Block Device
@@ -180,6 +180,8 @@ typedef struct NBDExtent {
 #define NBD_INFO_NAME           1
 #define NBD_INFO_DESCRIPTION    2
 #define NBD_INFO_BLOCK_SIZE     3
+#define NBD_INFO_TRIM_SIZE      4
+#define NBD_INFO_ZERO_SIZE      5

 /* Request flags, sent from client to server during transmission phase */
 #define NBD_CMD_FLAG_FUA        (1 << 0) /* 'force unit access' during write */
diff --git a/nbd/common.c b/nbd/common.c
index 8c95c1d606e..840c91d5ca4 100644
--- a/nbd/common.c
+++ b/nbd/common.c
@@ -129,6 +129,10 @@ const char *nbd_info_lookup(uint16_t info)
         return "description";
     case NBD_INFO_BLOCK_SIZE:
         return "block size";
+    case NBD_INFO_TRIM_SIZE:
+        return "trim size";
+    case NBD_INFO_ZERO_SIZE:
+        return "zero size";
     default:
         return "<unknown>";
     }
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH 2/4] nbd/client: Refactor in preparation for more limits
  2018-05-01 21:13 [Qemu-devel] [PATCH 0/4] Support larger NBD_CMD_WRITE_ZEROES Eric Blake
  2018-05-01 21:13 ` [Qemu-devel] [PATCH 1/4] nbd: Prepare for additional block sizing info Eric Blake
@ 2018-05-01 21:13 ` Eric Blake
  2018-05-03  8:29   ` Vladimir Sementsov-Ogievskiy
  2018-05-01 21:13 ` [Qemu-devel] [PATCH 3/4] nbd/client: Support requests of additional block sizing info Eric Blake
  2018-05-01 21:13 ` [Qemu-devel] [PATCH 4/4] nbd/server: " Eric Blake
  3 siblings, 1 reply; 12+ messages in thread
From: Eric Blake @ 2018-05-01 21:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, edgar.kaziakhmedov, vsementsov, Paolo Bonzini

The next patch will ask the server for more items of NBD_INFO.
However, the server is free to respond with INFO items in a
different order than what we request, so performing any sanity
checks about constraints that occur between multiple INFO items
must be done after all items have been received.  Make it easier
to see that such checks are last by sinking the final processing
after the while loop, rather than embedded in the NBD_REP_ACK
processing that occurs textually within the loop before other
INFO processing.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 nbd/client.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index 232ff4f46da..0abb195b856 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -1,5 +1,5 @@
 /*
- *  Copyright (C) 2016-2017 Red Hat, Inc.
+ *  Copyright (C) 2016-2018 Red Hat, Inc.
  *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
  *
  *  Network Block Device Client Side
@@ -365,17 +365,12 @@ static int nbd_opt_go(QIOChannel *ioc, const char *wantname,

         if (reply.type == NBD_REP_ACK) {
             /* Server is done sending info and moved into transmission
-               phase, but make sure it sent flags */
+               phase */
             if (len) {
                 error_setg(errp, "server sent invalid NBD_REP_ACK");
                 return -1;
             }
-            if (!info->flags) {
-                error_setg(errp, "broken server omitted NBD_INFO_EXPORT");
-                return -1;
-            }
-            trace_nbd_opt_go_success();
-            return 1;
+            break;
         }
         if (reply.type != NBD_REP_INFO) {
             error_setg(errp, "unexpected reply type %" PRIu32
@@ -482,6 +477,14 @@ static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
             break;
         }
     }
+
+    /* Sanity check that server's responses make sense */
+    if (!info->flags) {
+        error_setg(errp, "broken server omitted NBD_INFO_EXPORT");
+        return -1;
+    }
+    trace_nbd_opt_go_success();
+    return 1;
 }

 /* Return -1 on failure, 0 if wantname is an available export. */
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH 3/4] nbd/client: Support requests of additional block sizing info
  2018-05-01 21:13 [Qemu-devel] [PATCH 0/4] Support larger NBD_CMD_WRITE_ZEROES Eric Blake
  2018-05-01 21:13 ` [Qemu-devel] [PATCH 1/4] nbd: Prepare for additional block sizing info Eric Blake
  2018-05-01 21:13 ` [Qemu-devel] [PATCH 2/4] nbd/client: Refactor in preparation for more limits Eric Blake
@ 2018-05-01 21:13 ` Eric Blake
  2018-05-03  9:17   ` Vladimir Sementsov-Ogievskiy
  2018-05-22 15:33   ` Vladimir Sementsov-Ogievskiy
  2018-05-01 21:13 ` [Qemu-devel] [PATCH 4/4] nbd/server: " Eric Blake
  3 siblings, 2 replies; 12+ messages in thread
From: Eric Blake @ 2018-05-01 21:13 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, edgar.kaziakhmedov, vsementsov, Paolo Bonzini,
	Kevin Wolf, Max Reitz

The NBD spec is clarifying [1] that a server may want to advertise
different limits for READ/WRITE (in our case, 32M) than for
TRIM/ZERO (in our case, nearly 4G).  Implement the client
side support for these alternate limits, by always requesting
the new information (a compliant server must ignore the
request if it is unknown), and by validating anything reported
by the server before populating the block layer limits.

[1] https://lists.debian.org/nbd/2018/03/msg00048.html

Signed-off-by: Eric Blake <eblake@redhat.com>

---

The given URL for the NBD spec was v3; it will change to be a v4
version of that patch in part to point back to this qemu commit
as a proof of implementation.
---
 include/block/nbd.h |   4 ++
 block/nbd.c         |  15 ++++++-
 nbd/client.c        | 116 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 nbd/trace-events    |   2 +
 4 files changed, 131 insertions(+), 6 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index cbf51628f78..90ddef32bb3 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -270,6 +270,10 @@ struct NBDExportInfo {
     uint32_t min_block;
     uint32_t opt_block;
     uint32_t max_block;
+    uint32_t min_trim;
+    uint32_t max_trim;
+    uint32_t min_zero;
+    uint32_t max_zero;

     uint32_t meta_base_allocation_id;
 };
diff --git a/block/nbd.c b/block/nbd.c
index 1e2b3ba2d3b..823d10b251d 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -478,8 +478,19 @@ static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
     uint32_t max = MIN_NON_ZERO(NBD_MAX_BUFFER_SIZE, s->info.max_block);

     bs->bl.request_alignment = min ? min : BDRV_SECTOR_SIZE;
-    bs->bl.max_pdiscard = max;
-    bs->bl.max_pwrite_zeroes = max;
+    if (s->info.max_trim) {
+        bs->bl.max_pdiscard = MIN(s->info.max_trim, BDRV_REQUEST_MAX_BYTES);
+    } else {
+        bs->bl.max_pdiscard = max;
+    }
+    bs->bl.pdiscard_alignment = s->info.min_trim;
+    if (s->info.max_zero) {
+        bs->bl.max_pwrite_zeroes = MIN(s->info.max_zero,
+                                       BDRV_REQUEST_MAX_BYTES);
+    } else {
+        bs->bl.max_pwrite_zeroes = max;
+    }
+    bs->bl.pwrite_zeroes_alignment = s->info.min_zero;
     bs->bl.max_transfer = max;

     if (s->info.opt_block &&
diff --git a/nbd/client.c b/nbd/client.c
index 0abb195b856..f1364747ba1 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -337,16 +337,18 @@ static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
     info->flags = 0;

     trace_nbd_opt_go_start(wantname);
-    buf = g_malloc(4 + len + 2 + 2 * info->request_sizes + 1);
+    buf = g_malloc(4 + len + 2 + 3 * 2 * info->request_sizes + 1);
     stl_be_p(buf, len);
     memcpy(buf + 4, wantname, len);
-    /* At most one request, everything else up to server */
-    stw_be_p(buf + 4 + len, info->request_sizes);
+    /* Either 0 or 3 requests, everything else up to server */
+    stw_be_p(buf + 4 + len, 3 * info->request_sizes);
     if (info->request_sizes) {
         stw_be_p(buf + 4 + len + 2, NBD_INFO_BLOCK_SIZE);
+        stw_be_p(buf + 4 + len + 2 + 2, NBD_INFO_TRIM_SIZE);
+        stw_be_p(buf + 4 + len + 2 + 2 + 2, NBD_INFO_ZERO_SIZE);
     }
     error = nbd_send_option_request(ioc, NBD_OPT_GO,
-                                    4 + len + 2 + 2 * info->request_sizes,
+                                    4 + len + 2 + 3 * 2 * info->request_sizes,
                                     buf, errp);
     g_free(buf);
     if (error < 0) {
@@ -467,6 +469,72 @@ static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
                                              info->max_block);
             break;

+        case NBD_INFO_TRIM_SIZE:
+            if (len != sizeof(info->min_trim) * 2) {
+                error_setg(errp, "remaining trim info len %" PRIu32
+                           " is unexpected size", len);
+                nbd_send_opt_abort(ioc);
+                return -1;
+            }
+            if (nbd_read(ioc, &info->min_trim, sizeof(info->min_trim),
+                         errp) < 0) {
+                error_prepend(errp, "failed to read info minimum trim size: ");
+                nbd_send_opt_abort(ioc);
+                return -1;
+            }
+            be32_to_cpus(&info->min_trim);
+            if (nbd_read(ioc, &info->max_trim, sizeof(info->max_trim),
+                         errp) < 0) {
+                error_prepend(errp,
+                              "failed to read info maximum trim size: ");
+                nbd_send_opt_abort(ioc);
+                return -1;
+            }
+            be32_to_cpus(&info->max_trim);
+            if (!info->min_trim || !info->max_trim ||
+                (info->max_trim != UINT32_MAX &&
+                 !QEMU_IS_ALIGNED(info->max_trim, info->min_trim))) {
+                error_setg(errp, "server trim sizes %" PRIu32 "/%" PRIu32
+                           " are not valid", info->min_trim, info->max_trim);
+                nbd_send_opt_abort(ioc);
+                return -1;
+            }
+            trace_nbd_opt_go_info_trim_size(info->min_trim, info->max_trim);
+            break;
+
+        case NBD_INFO_ZERO_SIZE:
+            if (len != sizeof(info->min_zero) * 2) {
+                error_setg(errp, "remaining zero info len %" PRIu32
+                           " is unexpected size", len);
+                nbd_send_opt_abort(ioc);
+                return -1;
+            }
+            if (nbd_read(ioc, &info->min_zero, sizeof(info->min_zero),
+                         errp) < 0) {
+                error_prepend(errp, "failed to read info minimum zero size: ");
+                nbd_send_opt_abort(ioc);
+                return -1;
+            }
+            be32_to_cpus(&info->min_zero);
+            if (nbd_read(ioc, &info->max_zero, sizeof(info->max_zero),
+                         errp) < 0) {
+                error_prepend(errp,
+                              "failed to read info maximum zero size: ");
+                nbd_send_opt_abort(ioc);
+                return -1;
+            }
+            be32_to_cpus(&info->max_zero);
+            if (!info->min_zero || !info->max_zero ||
+                (info->max_zero != UINT32_MAX &&
+                 !QEMU_IS_ALIGNED(info->max_zero, info->min_zero))) {
+                error_setg(errp, "server zero sizes %" PRIu32 "/%" PRIu32
+                           " are not valid", info->min_zero, info->max_zero);
+                nbd_send_opt_abort(ioc);
+                return -1;
+            }
+            trace_nbd_opt_go_info_zero_size(info->min_zero, info->max_zero);
+            break;
+
         default:
             trace_nbd_opt_go_info_unknown(type, nbd_info_lookup(type));
             if (nbd_drop(ioc, len, errp) < 0) {
@@ -483,6 +551,46 @@ static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
         error_setg(errp, "broken server omitted NBD_INFO_EXPORT");
         return -1;
     }
+    if (info->min_trim) {
+        if (!info->min_block) {
+            error_setg(errp, "broken server sent INFO_TRIM_SIZE without"
+                       " INFO_BLOCK_SIZE");
+            return -1;
+        }
+        if (info->min_trim < info->opt_block) {
+            error_setg(errp, "broken server sent INFO_TRIM_SIZE with"
+                       " minimum trim %" PRIu32 " less than preferred block %"
+                       PRIu32, info->min_trim, info->opt_block);
+            return -1;
+        }
+        if (info->max_trim < info->max_block) {
+            error_setg(errp, "broken server sent INFO_TRIM_SIZE with"
+                       " maximum trim %" PRIu32 " less than maximum block %"
+                       PRIu32, info->max_trim, info->max_block);
+            return -1;
+        }
+        info->max_trim = QEMU_ALIGN_DOWN(info->max_trim, info->min_block);
+    }
+    if (info->min_zero) {
+        if (!info->min_block) {
+            error_setg(errp, "broken server sent INFO_ZERO_SIZE without"
+                       " INFO_BLOCK_SIZE");
+            return -1;
+        }
+        if (info->min_zero < info->opt_block) {
+            error_setg(errp, "broken server sent INFO_ZERO_SIZE with"
+                       " minimum zero %" PRIu32 " less than preferred block %"
+                       PRIu32, info->min_zero, info->opt_block);
+            return -1;
+        }
+        if (info->max_zero < info->max_block) {
+            error_setg(errp, "broken server sent INFO_ZERO_SIZE with"
+                       " maximum zero %" PRIu32 " less than maximum block %"
+                       PRIu32, info->max_zero, info->max_block);
+            return -1;
+        }
+        info->max_zero = QEMU_ALIGN_DOWN(info->max_zero, info->min_block);
+    }
     trace_nbd_opt_go_success();
     return 1;
 }
diff --git a/nbd/trace-events b/nbd/trace-events
index dee081e7758..ddb9d0a2b3e 100644
--- a/nbd/trace-events
+++ b/nbd/trace-events
@@ -6,6 +6,8 @@ nbd_opt_go_start(const char *name) "Attempting NBD_OPT_GO for export '%s'"
 nbd_opt_go_success(void) "Export is good to go"
 nbd_opt_go_info_unknown(int info, const char *name) "Ignoring unknown info %d (%s)"
 nbd_opt_go_info_block_size(uint32_t minimum, uint32_t preferred, uint32_t maximum) "Block sizes are 0x%" PRIx32 ", 0x%" PRIx32 ", 0x%" PRIx32
+nbd_opt_go_info_trim_size(uint32_t minimum, uint32_t maximum) "Trim sizes are 0x%" PRIx32 ", 0x%" PRIx32
+nbd_opt_go_info_zero_size(uint32_t minimum, uint32_t maximum) "Zero sizes are 0x%" PRIx32 ", 0x%" PRIx32
 nbd_receive_query_exports_start(const char *wantname) "Querying export list for '%s'"
 nbd_receive_query_exports_success(const char *wantname) "Found desired export name '%s'"
 nbd_receive_starttls_new_client(void) "Setting up TLS"
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH 4/4] nbd/server: Support requests of additional block sizing info
  2018-05-01 21:13 [Qemu-devel] [PATCH 0/4] Support larger NBD_CMD_WRITE_ZEROES Eric Blake
                   ` (2 preceding siblings ...)
  2018-05-01 21:13 ` [Qemu-devel] [PATCH 3/4] nbd/client: Support requests of additional block sizing info Eric Blake
@ 2018-05-01 21:13 ` Eric Blake
  3 siblings, 0 replies; 12+ messages in thread
From: Eric Blake @ 2018-05-01 21:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, edgar.kaziakhmedov, vsementsov, Paolo Bonzini

The NBD spec is clarifying [1] that a server may want to advertise
different limits for READ/WRITE (in our case, 32M) than for
TRIM/ZERO (in our case, nearly 4G).  Implement the server
side support for these alternate limits, by always advertising
the new information (a compliant client must ignore a gratuitous
advertisement if it is unknown).

[1] https://lists.debian.org/nbd/2018/03/msg00048.html

Signed-off-by: Eric Blake <eblake@redhat.com>

---

The given URL for the NBD spec was v3; it will change to be a v4
version of that patch in part to point back to this qemu commit
as a proof of implementation.
---
 nbd/server.c     | 30 ++++++++++++++++++++++++++++++
 nbd/trace-events |  2 ++
 2 files changed, 32 insertions(+)

diff --git a/nbd/server.c b/nbd/server.c
index 9e1f2271784..39502b58446 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -617,6 +617,36 @@ static int nbd_negotiate_handle_info(NBDClient *client, uint16_t myflags,
         return rc;
     }

+    /* Send NBD_INFO_TRIM_SIZE always. */
+    /* minimum - Hard-code to 4096 for now, matching preferred block.
+     * TODO: consult blk_bs(blk)->bl.pdiscard_alignment? */
+    sizes[0] = 4096;
+    /* maximum - < 2G (then block layer fragments to bl.max_pdiscard). */
+    sizes[1] = QEMU_ALIGN_DOWN(BDRV_REQUEST_MAX_BYTES, sizes[0]);
+    trace_nbd_negotiate_handle_info_trim_size(sizes[0], sizes[1]);
+    cpu_to_be32s(&sizes[0]);
+    cpu_to_be32s(&sizes[1]);
+    rc = nbd_negotiate_send_info(client, NBD_INFO_TRIM_SIZE,
+                                 2 * sizeof(sizes[0]), sizes, errp);
+    if (rc < 0) {
+        return rc;
+    }
+
+    /* Send NBD_INFO_ZERO_SIZE always. */
+    /* minimum - Hard-code to 4096 for now, matching preferred block.
+     * TODO: consult blk_bs(blk)->bl.pwrite_zeroes_alignment? */
+    sizes[0] = 4096;
+    /* maximum - < 2G (then block layer fragments to bl.max_pwrite_zeroes). */
+    sizes[1] = QEMU_ALIGN_DOWN(BDRV_REQUEST_MAX_BYTES, sizes[0]);
+    trace_nbd_negotiate_handle_info_zero_size(sizes[0], sizes[1]);
+    cpu_to_be32s(&sizes[0]);
+    cpu_to_be32s(&sizes[1]);
+    rc = nbd_negotiate_send_info(client, NBD_INFO_ZERO_SIZE,
+                                 2 * sizeof(sizes[0]), sizes, errp);
+    if (rc < 0) {
+        return rc;
+    }
+
     /* Send NBD_INFO_EXPORT always */
     trace_nbd_negotiate_new_style_size_flags(exp->size,
                                              exp->nbdflags | myflags);
diff --git a/nbd/trace-events b/nbd/trace-events
index ddb9d0a2b3e..d8849c43b21 100644
--- a/nbd/trace-events
+++ b/nbd/trace-events
@@ -46,6 +46,8 @@ nbd_negotiate_send_info(int info, const char *name, uint32_t length) "Sending NB
 nbd_negotiate_handle_info_requests(int requests) "Client requested %d items of info"
 nbd_negotiate_handle_info_request(int request, const char *name) "Client requested info %d (%s)"
 nbd_negotiate_handle_info_block_size(uint32_t minimum, uint32_t preferred, uint32_t maximum) "advertising minimum 0x%" PRIx32 ", preferred 0x%" PRIx32 ", maximum 0x%" PRIx32
+nbd_negotiate_handle_info_trim_size(uint32_t minimum, uint32_t maximum) "advertising minimum 0x%" PRIx32 ", maximum 0x%" PRIx32
+nbd_negotiate_handle_info_zero_size(uint32_t minimum, uint32_t maximum) "advertising minimum 0x%" PRIx32 ", maximum 0x%" PRIx32
 nbd_negotiate_handle_starttls(void) "Setting up TLS"
 nbd_negotiate_handle_starttls_handshake(void) "Starting TLS handshake"
 nbd_negotiate_meta_context(const char *optname, const char *export, uint32_t queries) "Client requested %s for export %s, with %" PRIu32 " queries"
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH 1/4] nbd: Prepare for additional block sizing info
  2018-05-01 21:13 ` [Qemu-devel] [PATCH 1/4] nbd: Prepare for additional block sizing info Eric Blake
@ 2018-05-03  8:20   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 12+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-05-03  8:20 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: qemu-block, edgar.kaziakhmedov, Paolo Bonzini, Kevin Wolf, Max Reitz

02.05.2018 00:13, Eric Blake wrote:
> The NBD spec is clarifying [1] that a server may want to advertise
> different limits for READ/WRITE (in our case, 32M) than for
> TRIM/ZERO (in our case, nearly 4G).  Add the constants and name
> lookups for new NBD_INFO_ fields used during handshake to convey
> this additional information.  Note that the NBD spec already
> requires servers to ignore unknown requests from the client, and
> for clients to ignore unknown gratuitous responses sent from
> the server.
>
> [1] https://lists.debian.org/nbd/2018/03/msg00048.html
>
> Signed-off-by: Eric Blake <eblake@redhat.com>


Reviewed-by:  Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

>
> ---
>
> The given URL for the NBD spec was v3; it will change to be a v4
> version of that patch in part to point back to this qemu commit
> as a proof of implementation.
> ---
>   include/block/nbd.h | 4 +++-
>   nbd/common.c        | 4 ++++
>   2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index fcdcd545023..cbf51628f78 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -1,5 +1,5 @@
>   /*
> - *  Copyright (C) 2016-2017 Red Hat, Inc.
> + *  Copyright (C) 2016-2018 Red Hat, Inc.
>    *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
>    *
>    *  Network Block Device
> @@ -180,6 +180,8 @@ typedef struct NBDExtent {
>   #define NBD_INFO_NAME           1
>   #define NBD_INFO_DESCRIPTION    2
>   #define NBD_INFO_BLOCK_SIZE     3
> +#define NBD_INFO_TRIM_SIZE      4
> +#define NBD_INFO_ZERO_SIZE      5
>
>   /* Request flags, sent from client to server during transmission phase */
>   #define NBD_CMD_FLAG_FUA        (1 << 0) /* 'force unit access' during write */
> diff --git a/nbd/common.c b/nbd/common.c
> index 8c95c1d606e..840c91d5ca4 100644
> --- a/nbd/common.c
> +++ b/nbd/common.c
> @@ -129,6 +129,10 @@ const char *nbd_info_lookup(uint16_t info)
>           return "description";
>       case NBD_INFO_BLOCK_SIZE:
>           return "block size";
> +    case NBD_INFO_TRIM_SIZE:
> +        return "trim size";
> +    case NBD_INFO_ZERO_SIZE:
> +        return "zero size";
>       default:
>           return "<unknown>";
>       }


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH 2/4] nbd/client: Refactor in preparation for more limits
  2018-05-01 21:13 ` [Qemu-devel] [PATCH 2/4] nbd/client: Refactor in preparation for more limits Eric Blake
@ 2018-05-03  8:29   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 12+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-05-03  8:29 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: qemu-block, edgar.kaziakhmedov, Paolo Bonzini

02.05.2018 00:13, Eric Blake wrote:
> The next patch will ask the server for more items of NBD_INFO.
> However, the server is free to respond with INFO items in a
> different order than what we request, so performing any sanity
> checks about constraints that occur between multiple INFO items
> must be done after all items have been received.  Make it easier
> to see that such checks are last by sinking the final processing
> after the while loop, rather than embedded in the NBD_REP_ACK
> processing that occurs textually within the loop before other
> INFO processing.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by:  Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

> ---
>   nbd/client.c | 19 +++++++++++--------
>   1 file changed, 11 insertions(+), 8 deletions(-)
>
> diff --git a/nbd/client.c b/nbd/client.c
> index 232ff4f46da..0abb195b856 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -1,5 +1,5 @@
>   /*
> - *  Copyright (C) 2016-2017 Red Hat, Inc.
> + *  Copyright (C) 2016-2018 Red Hat, Inc.
>    *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
>    *
>    *  Network Block Device Client Side
> @@ -365,17 +365,12 @@ static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
>
>           if (reply.type == NBD_REP_ACK) {
>               /* Server is done sending info and moved into transmission
> -               phase, but make sure it sent flags */
> +               phase */
>               if (len) {
>                   error_setg(errp, "server sent invalid NBD_REP_ACK");
>                   return -1;
>               }
> -            if (!info->flags) {
> -                error_setg(errp, "broken server omitted NBD_INFO_EXPORT");
> -                return -1;
> -            }
> -            trace_nbd_opt_go_success();
> -            return 1;
> +            break;
>           }
>           if (reply.type != NBD_REP_INFO) {
>               error_setg(errp, "unexpected reply type %" PRIu32
> @@ -482,6 +477,14 @@ static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
>               break;
>           }
>       }
> +
> +    /* Sanity check that server's responses make sense */
> +    if (!info->flags) {
> +        error_setg(errp, "broken server omitted NBD_INFO_EXPORT");
> +        return -1;
> +    }
> +    trace_nbd_opt_go_success();
> +    return 1;
>   }
>
>   /* Return -1 on failure, 0 if wantname is an available export. */


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH 3/4] nbd/client: Support requests of additional block sizing info
  2018-05-01 21:13 ` [Qemu-devel] [PATCH 3/4] nbd/client: Support requests of additional block sizing info Eric Blake
@ 2018-05-03  9:17   ` Vladimir Sementsov-Ogievskiy
  2018-05-03 14:49     ` Eric Blake
  2018-05-22 15:33   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 12+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-05-03  9:17 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: qemu-block, edgar.kaziakhmedov, Paolo Bonzini, Kevin Wolf, Max Reitz

02.05.2018 00:13, Eric Blake wrote:
> The NBD spec is clarifying [1] that a server may want to advertise
> different limits for READ/WRITE (in our case, 32M) than for
> TRIM/ZERO (in our case, nearly 4G).  Implement the client
> side support for these alternate limits, by always requesting
> the new information (a compliant server must ignore the
> request if it is unknown), and by validating anything reported
> by the server before populating the block layer limits.
>
> [1] https://lists.debian.org/nbd/2018/03/msg00048.html
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
>
> ---
>
> The given URL for the NBD spec was v3; it will change to be a v4
> version of that patch in part to point back to this qemu commit
> as a proof of implementation.
> ---
>   include/block/nbd.h |   4 ++
>   block/nbd.c         |  15 ++++++-
>   nbd/client.c        | 116 ++++++++++++++++++++++++++++++++++++++++++++++++++--
>   nbd/trace-events    |   2 +
>   4 files changed, 131 insertions(+), 6 deletions(-)
>
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index cbf51628f78..90ddef32bb3 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -270,6 +270,10 @@ struct NBDExportInfo {
>       uint32_t min_block;
>       uint32_t opt_block;
>       uint32_t max_block;
> +    uint32_t min_trim;
> +    uint32_t max_trim;
> +    uint32_t min_zero;
> +    uint32_t max_zero;
>
>       uint32_t meta_base_allocation_id;
>   };
> diff --git a/block/nbd.c b/block/nbd.c
> index 1e2b3ba2d3b..823d10b251d 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -478,8 +478,19 @@ static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
>       uint32_t max = MIN_NON_ZERO(NBD_MAX_BUFFER_SIZE, s->info.max_block);
>
>       bs->bl.request_alignment = min ? min : BDRV_SECTOR_SIZE;
> -    bs->bl.max_pdiscard = max;
> -    bs->bl.max_pwrite_zeroes = max;
> +    if (s->info.max_trim) {
> +        bs->bl.max_pdiscard = MIN(s->info.max_trim, BDRV_REQUEST_MAX_BYTES);
> +    } else {
> +        bs->bl.max_pdiscard = max;
> +    }
> +    bs->bl.pdiscard_alignment = s->info.min_trim;
> +    if (s->info.max_zero) {
> +        bs->bl.max_pwrite_zeroes = MIN(s->info.max_zero,
> +                                       BDRV_REQUEST_MAX_BYTES);
> +    } else {
> +        bs->bl.max_pwrite_zeroes = max;
> +    }
> +    bs->bl.pwrite_zeroes_alignment = s->info.min_zero;
>       bs->bl.max_transfer = max;
>
>       if (s->info.opt_block &&
> diff --git a/nbd/client.c b/nbd/client.c
> index 0abb195b856..f1364747ba1 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -337,16 +337,18 @@ static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
>       info->flags = 0;
>
>       trace_nbd_opt_go_start(wantname);
> -    buf = g_malloc(4 + len + 2 + 2 * info->request_sizes + 1);
> +    buf = g_malloc(4 + len + 2 + 3 * 2 * info->request_sizes + 1);

Hmm, what "+1" means?

>       stl_be_p(buf, len);
>       memcpy(buf + 4, wantname, len);
> -    /* At most one request, everything else up to server */
> -    stw_be_p(buf + 4 + len, info->request_sizes);
> +    /* Either 0 or 3 requests, everything else up to server */
> +    stw_be_p(buf + 4 + len, 3 * info->request_sizes);
>       if (info->request_sizes) {
>           stw_be_p(buf + 4 + len + 2, NBD_INFO_BLOCK_SIZE);
> +        stw_be_p(buf + 4 + len + 2 + 2, NBD_INFO_TRIM_SIZE);
> +        stw_be_p(buf + 4 + len + 2 + 2 + 2, NBD_INFO_ZERO_SIZE);
>       }
>       error = nbd_send_option_request(ioc, NBD_OPT_GO,
> -                                    4 + len + 2 + 2 * info->request_sizes,
> +                                    4 + len + 2 + 3 * 2 * info->request_sizes,
>                                       buf, errp);

magic chunk) Change looks correct. Is it worth introducing a buf_len 
variable, to
not retype it?

>       g_free(buf);
>       if (error < 0) {
> @@ -467,6 +469,72 @@ static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
>                                                info->max_block);
>               break;
>
> +        case NBD_INFO_TRIM_SIZE:
> +            if (len != sizeof(info->min_trim) * 2) {
> +                error_setg(errp, "remaining trim info len %" PRIu32
> +                           " is unexpected size", len);
> +                nbd_send_opt_abort(ioc);
> +                return -1;
> +            }
> +            if (nbd_read(ioc, &info->min_trim, sizeof(info->min_trim),
> +                         errp) < 0) {
> +                error_prepend(errp, "failed to read info minimum trim size: ");
> +                nbd_send_opt_abort(ioc);
> +                return -1;
> +            }
> +            be32_to_cpus(&info->min_trim);
> +            if (nbd_read(ioc, &info->max_trim, sizeof(info->max_trim),
> +                         errp) < 0) {
> +                error_prepend(errp,
> +                              "failed to read info maximum trim size: ");
> +                nbd_send_opt_abort(ioc);
> +                return -1;
> +            }
> +            be32_to_cpus(&info->max_trim);
> +            if (!info->min_trim || !info->max_trim ||
> +                (info->max_trim != UINT32_MAX &&
> +                 !QEMU_IS_ALIGNED(info->max_trim, info->min_trim))) {
> +                error_setg(errp, "server trim sizes %" PRIu32 "/%" PRIu32
> +                           " are not valid", info->min_trim, info->max_trim);
> +                nbd_send_opt_abort(ioc);
> +                return -1;
> +            }

Didn't you forget to check that min_trim is a power of two?

> +            trace_nbd_opt_go_info_trim_size(info->min_trim, info->max_trim);
> +            break;
> +
> +        case NBD_INFO_ZERO_SIZE:
> +            if (len != sizeof(info->min_zero) * 2) {
> +                error_setg(errp, "remaining zero info len %" PRIu32
> +                           " is unexpected size", len);
> +                nbd_send_opt_abort(ioc);
> +                return -1;
> +            }
> +            if (nbd_read(ioc, &info->min_zero, sizeof(info->min_zero),
> +                         errp) < 0) {
> +                error_prepend(errp, "failed to read info minimum zero size: ");
> +                nbd_send_opt_abort(ioc);
> +                return -1;
> +            }
> +            be32_to_cpus(&info->min_zero);
> +            if (nbd_read(ioc, &info->max_zero, sizeof(info->max_zero),
> +                         errp) < 0) {
> +                error_prepend(errp,
> +                              "failed to read info maximum zero size: ");
> +                nbd_send_opt_abort(ioc);
> +                return -1;
> +            }
> +            be32_to_cpus(&info->max_zero);
> +            if (!info->min_zero || !info->max_zero ||
> +                (info->max_zero != UINT32_MAX &&
> +                 !QEMU_IS_ALIGNED(info->max_zero, info->min_zero))) {
> +                error_setg(errp, "server zero sizes %" PRIu32 "/%" PRIu32
> +                           " are not valid", info->min_zero, info->max_zero);
> +                nbd_send_opt_abort(ioc);
> +                return -1;
> +            }
> +            trace_nbd_opt_go_info_zero_size(info->min_zero, info->max_zero);
> +            break;
> +

hmm, so, now nbd_opt_go() is full of very-very similar code parts, which 
may worth some refactoring now or in future.

Are we going to add similar limits for BLOCK_STATUS? and for CACHE? I 
have an idea: why not to make an universal option for it in
protocol? Something like

option INFO_CMD_SIZE:
   uint16_t command
   uint32_t min_block
   uint32_t max_block

and server can send several such options. Most of semantics for these 
additional limits are common, so it will simplify both documentation
and realization..

I'll copy this to nbd thread and it is better to discuss it in it.

>           default:
>               trace_nbd_opt_go_info_unknown(type, nbd_info_lookup(type));
>               if (nbd_drop(ioc, len, errp) < 0) {
> @@ -483,6 +551,46 @@ static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
>           error_setg(errp, "broken server omitted NBD_INFO_EXPORT");
>           return -1;
>       }
> +    if (info->min_trim) {
> +        if (!info->min_block) {
> +            error_setg(errp, "broken server sent INFO_TRIM_SIZE without"
> +                       " INFO_BLOCK_SIZE");
> +            return -1;
> +        }
> +        if (info->min_trim < info->opt_block) {
> +            error_setg(errp, "broken server sent INFO_TRIM_SIZE with"
> +                       " minimum trim %" PRIu32 " less than preferred block %"
> +                       PRIu32, info->min_trim, info->opt_block);
> +            return -1;
> +        }
> +        if (info->max_trim < info->max_block) {
> +            error_setg(errp, "broken server sent INFO_TRIM_SIZE with"
> +                       " maximum trim %" PRIu32 " less than maximum block %"
> +                       PRIu32, info->max_trim, info->max_block);
> +            return -1;
> +        }
> +        info->max_trim = QEMU_ALIGN_DOWN(info->max_trim, info->min_block);
> +    }
> +    if (info->min_zero) {
> +        if (!info->min_block) {
> +            error_setg(errp, "broken server sent INFO_ZERO_SIZE without"
> +                       " INFO_BLOCK_SIZE");
> +            return -1;
> +        }
> +        if (info->min_zero < info->opt_block) {
> +            error_setg(errp, "broken server sent INFO_ZERO_SIZE with"
> +                       " minimum zero %" PRIu32 " less than preferred block %"
> +                       PRIu32, info->min_zero, info->opt_block);
> +            return -1;
> +        }
> +        if (info->max_zero < info->max_block) {
> +            error_setg(errp, "broken server sent INFO_ZERO_SIZE with"
> +                       " maximum zero %" PRIu32 " less than maximum block %"
> +                       PRIu32, info->max_zero, info->max_block);
> +            return -1;
> +        }
> +        info->max_zero = QEMU_ALIGN_DOWN(info->max_zero, info->min_block);
> +    }
>       trace_nbd_opt_go_success();
>       return 1;
>   }
> diff --git a/nbd/trace-events b/nbd/trace-events
> index dee081e7758..ddb9d0a2b3e 100644
> --- a/nbd/trace-events
> +++ b/nbd/trace-events
> @@ -6,6 +6,8 @@ nbd_opt_go_start(const char *name) "Attempting NBD_OPT_GO for export '%s'"
>   nbd_opt_go_success(void) "Export is good to go"
>   nbd_opt_go_info_unknown(int info, const char *name) "Ignoring unknown info %d (%s)"
>   nbd_opt_go_info_block_size(uint32_t minimum, uint32_t preferred, uint32_t maximum) "Block sizes are 0x%" PRIx32 ", 0x%" PRIx32 ", 0x%" PRIx32
> +nbd_opt_go_info_trim_size(uint32_t minimum, uint32_t maximum) "Trim sizes are 0x%" PRIx32 ", 0x%" PRIx32
> +nbd_opt_go_info_zero_size(uint32_t minimum, uint32_t maximum) "Zero sizes are 0x%" PRIx32 ", 0x%" PRIx32
>   nbd_receive_query_exports_start(const char *wantname) "Querying export list for '%s'"
>   nbd_receive_query_exports_success(const char *wantname) "Found desired export name '%s'"
>   nbd_receive_starttls_new_client(void) "Setting up TLS"


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH 3/4] nbd/client: Support requests of additional block sizing info
  2018-05-03  9:17   ` Vladimir Sementsov-Ogievskiy
@ 2018-05-03 14:49     ` Eric Blake
  2018-05-03 15:51       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Blake @ 2018-05-03 14:49 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: qemu-block, edgar.kaziakhmedov, Paolo Bonzini, Kevin Wolf, Max Reitz

On 05/03/2018 04:17 AM, Vladimir Sementsov-Ogievskiy wrote:
> 02.05.2018 00:13, Eric Blake wrote:
>> The NBD spec is clarifying [1] that a server may want to advertise
>> different limits for READ/WRITE (in our case, 32M) than for
>> TRIM/ZERO (in our case, nearly 4G).  Implement the client
>> side support for these alternate limits, by always requesting
>> the new information (a compliant server must ignore the
>> request if it is unknown), and by validating anything reported
>> by the server before populating the block layer limits.
>>
>> [1] https://lists.debian.org/nbd/2018/03/msg00048.html
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>

>> +++ b/nbd/client.c
>> @@ -337,16 +337,18 @@ static int nbd_opt_go(QIOChannel *ioc, const 
>> char *wantname,
>>       info->flags = 0;
>>
>>       trace_nbd_opt_go_start(wantname);
>> -    buf = g_malloc(4 + len + 2 + 2 * info->request_sizes + 1);
>> +    buf = g_malloc(4 + len + 2 + 3 * 2 * info->request_sizes + 1);
> 
> Hmm, what "+1" means?

Doesn't appear to be necessary, but a little slop never hurts.  Better 
might be struct-ifying things rather than open-coding offsets, or even 
using a vectored approach rather than requiring a single buffer.  But if 
useful, that's refactoring that is independent of this patch, while this 
is just demonstrating the bare minimum implementation of the new feature.

>>       stl_be_p(buf, len);
>>       memcpy(buf + 4, wantname, len);
>> -    /* At most one request, everything else up to server */
>> -    stw_be_p(buf + 4 + len, info->request_sizes);
>> +    /* Either 0 or 3 requests, everything else up to server */
>> +    stw_be_p(buf + 4 + len, 3 * info->request_sizes);
>>       if (info->request_sizes) {
>>           stw_be_p(buf + 4 + len + 2, NBD_INFO_BLOCK_SIZE);
>> +        stw_be_p(buf + 4 + len + 2 + 2, NBD_INFO_TRIM_SIZE);
>> +        stw_be_p(buf + 4 + len + 2 + 2 + 2, NBD_INFO_ZERO_SIZE);
>>       }
>>       error = nbd_send_option_request(ioc, NBD_OPT_GO,
>> -                                    4 + len + 2 + 2 * 
>> info->request_sizes,
>> +                                    4 + len + 2 + 3 * 2 * 
>> info->request_sizes,
>>                                       buf, errp);
> 
> magic chunk) Change looks correct. Is it worth introducing a buf_len 
> variable, to
> not retype it?

And increment as we go?  Yeah, it might make things more legible.


>> +            be32_to_cpus(&info->max_trim);
>> +            if (!info->min_trim || !info->max_trim ||
>> +                (info->max_trim != UINT32_MAX &&
>> +                 !QEMU_IS_ALIGNED(info->max_trim, info->min_trim))) {
>> +                error_setg(errp, "server trim sizes %" PRIu32 "/%" 
>> PRIu32
>> +                           " are not valid", info->min_trim, 
>> info->max_trim);
>> +                nbd_send_opt_abort(ioc);
>> +                return -1;
>> +            }
> 
> Didn't you forget to check that min_trim is a power of two?

Nope. The NBD spec wording specifically uses "The minimum trim size 
SHOULD be a power of 2, and MUST be at least as large as the preferred 
block size advertised in `NBD_INFO_BLOCK_SIZE`", in part because there 
ARE existing hardware iscsi devices that have an advertised 
preferred/maximum trim size of exactly 15 megabytes (you can request 
smaller trims 1M at a time, and the device then caches things to 
eventually report unmapped slices at a 15M boundary).  For more 
information on this odd hardware, look in the qemu logs for the Dell 
Equallogic device, such as commit 3482b9bc.

Since the NBD 'min trim' is advisory, and merely telling the client the 
ideal alignments to use for a trim most likely to have an effect (on 
Equallogic, a 1M trim by itself might not trim anything unless 
neighboring areas are also trimmed, while an aligned 15M trim is 
immediately observable), an NBD server wrapping such an iscsi device may 
prefer to mirror the hardware's preferred alignment of 15M, rather than 
inventing a minimum alignment of 1M, in what it advertises in 
NBD_INFO_TRIM_SIZE.

And maybe I should tweak the NBD spec addition to call things "preferred 
trim/zero" rather than "min trim/zero", if that makes things any easier 
to grasp on first read, and better matches the iscsi concept.

> 
> hmm, so, now nbd_opt_go() is full of very-very similar code parts, which 
> may worth some refactoring now or in future.
> 
> Are we going to add similar limits for BLOCK_STATUS? and for CACHE? I 
> have an idea: why not to make an universal option for it in
> protocol? Something like
> 
> option INFO_CMD_SIZE:
>    uint16_t command
>    uint32_t min_block
>    uint32_t max_block
> 
> and server can send several such options. Most of semantics for these 
> additional limits are common, so it will simplify both documentation
> and realization..
> 
> I'll copy this to nbd thread and it is better to discuss it in it.

Yep, I've followed up on that thread, but will post a v2 of this 
proposal along those lines.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH 3/4] nbd/client: Support requests of additional block sizing info
  2018-05-03 14:49     ` Eric Blake
@ 2018-05-03 15:51       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 12+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-05-03 15:51 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: qemu-block, edgar.kaziakhmedov, Paolo Bonzini, Kevin Wolf, Max Reitz

03.05.2018 17:49, Eric Blake wrote:
> On 05/03/2018 04:17 AM, Vladimir Sementsov-Ogievskiy wrote:
>> 02.05.2018 00:13, Eric Blake wrote:
>>> The NBD spec is clarifying [1] that a server may want to advertise
>>> different limits for READ/WRITE (in our case, 32M) than for
>>> TRIM/ZERO (in our case, nearly 4G).  Implement the client
>>> side support for these alternate limits, by always requesting
>>> the new information (a compliant server must ignore the
>>> request if it is unknown), and by validating anything reported
>>> by the server before populating the block layer limits.
>>>
>>> [1] https://lists.debian.org/nbd/2018/03/msg00048.html
>>>
>>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>>
>
>>> +++ b/nbd/client.c
>>> @@ -337,16 +337,18 @@ static int nbd_opt_go(QIOChannel *ioc, const 
>>> char *wantname,
>>>       info->flags = 0;
>>>
>>>       trace_nbd_opt_go_start(wantname);
>>> -    buf = g_malloc(4 + len + 2 + 2 * info->request_sizes + 1);
>>> +    buf = g_malloc(4 + len + 2 + 3 * 2 * info->request_sizes + 1);
>>
>> Hmm, what "+1" means?
>
> Doesn't appear to be necessary, but a little slop never hurts. Better 
> might be struct-ifying things rather than open-coding offsets, or even 
> using a vectored approach rather than requiring a single buffer.  But 
> if useful, that's refactoring that is independent of this patch, while 
> this is just demonstrating the bare minimum implementation of the new 
> feature.
>
>>>       stl_be_p(buf, len);
>>>       memcpy(buf + 4, wantname, len);
>>> -    /* At most one request, everything else up to server */
>>> -    stw_be_p(buf + 4 + len, info->request_sizes);
>>> +    /* Either 0 or 3 requests, everything else up to server */
>>> +    stw_be_p(buf + 4 + len, 3 * info->request_sizes);
>>>       if (info->request_sizes) {
>>>           stw_be_p(buf + 4 + len + 2, NBD_INFO_BLOCK_SIZE);
>>> +        stw_be_p(buf + 4 + len + 2 + 2, NBD_INFO_TRIM_SIZE);
>>> +        stw_be_p(buf + 4 + len + 2 + 2 + 2, NBD_INFO_ZERO_SIZE);
>>>       }
>>>       error = nbd_send_option_request(ioc, NBD_OPT_GO,
>>> -                                    4 + len + 2 + 2 * 
>>> info->request_sizes,
>>> +                                    4 + len + 2 + 3 * 2 * 
>>> info->request_sizes,
>>>                                       buf, errp);
>>
>> magic chunk) Change looks correct. Is it worth introducing a buf_len 
>> variable, to
>> not retype it?
>
> And increment as we go?  Yeah, it might make things more legible.
>
>
>>> + be32_to_cpus(&info->max_trim);
>>> +            if (!info->min_trim || !info->max_trim ||
>>> +                (info->max_trim != UINT32_MAX &&
>>> +                 !QEMU_IS_ALIGNED(info->max_trim, info->min_trim))) {
>>> +                error_setg(errp, "server trim sizes %" PRIu32 "/%" 
>>> PRIu32
>>> +                           " are not valid", info->min_trim, 
>>> info->max_trim);
>>> +                nbd_send_opt_abort(ioc);
>>> +                return -1;
>>> +            }
>>
>> Didn't you forget to check that min_trim is a power of two?
>
> Nope. The NBD spec wording specifically uses "The minimum trim size 
> SHOULD be a power of 2, and MUST be at least as large as the preferred 
> block size advertised in `NBD_INFO_BLOCK_SIZE`", in part because there 
> ARE existing hardware iscsi devices that have an advertised 
> preferred/maximum trim size of exactly 15 megabytes (you can request 
> smaller trims 1M at a time, and the device then caches things to 
> eventually report unmapped slices at a 15M boundary).  For more 
> information on this odd hardware, look in the qemu logs for the Dell 
> Equallogic device, such as commit 3482b9bc.

Ok. Interesting, thank you.

>
> Since the NBD 'min trim' is advisory, and merely telling the client 
> the ideal alignments to use for a trim most likely to have an effect 
> (on Equallogic, a 1M trim by itself might not trim anything unless 
> neighboring areas are also trimmed, while an aligned 15M trim is 
> immediately observable), an NBD server wrapping such an iscsi device 
> may prefer to mirror the hardware's preferred alignment of 15M, rather 
> than inventing a minimum alignment of 1M, in what it advertises in 
> NBD_INFO_TRIM_SIZE.
>
> And maybe I should tweak the NBD spec addition to call things 
> "preferred trim/zero" rather than "min trim/zero", if that makes 
> things any easier to grasp on first read, and better matches the iscsi 
> concept.

good idea. and it will better correspond to pref_block for 
INFO_BLOCK_SIZE, as they are compared with it.

>
>>
>> hmm, so, now nbd_opt_go() is full of very-very similar code parts, 
>> which may worth some refactoring now or in future.
>>
>> Are we going to add similar limits for BLOCK_STATUS? and for CACHE? I 
>> have an idea: why not to make an universal option for it in
>> protocol? Something like
>>
>> option INFO_CMD_SIZE:
>>    uint16_t command
>>    uint32_t min_block
>>    uint32_t max_block
>>
>> and server can send several such options. Most of semantics for these 
>> additional limits are common, so it will simplify both documentation
>> and realization..
>>
>> I'll copy this to nbd thread and it is better to discuss it in it.
>
> Yep, I've followed up on that thread, but will post a v2 of this 
> proposal along those lines.
>


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH 3/4] nbd/client: Support requests of additional block sizing info
  2018-05-01 21:13 ` [Qemu-devel] [PATCH 3/4] nbd/client: Support requests of additional block sizing info Eric Blake
  2018-05-03  9:17   ` Vladimir Sementsov-Ogievskiy
@ 2018-05-22 15:33   ` Vladimir Sementsov-Ogievskiy
  2018-05-23 15:40     ` Eric Blake
  1 sibling, 1 reply; 12+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-05-22 15:33 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: qemu-block, edgar.kaziakhmedov, Paolo Bonzini, Kevin Wolf, Max Reitz

02.05.2018 00:13, Eric Blake wrote:
> The NBD spec is clarifying [1] that a server may want to advertise
> different limits for READ/WRITE (in our case, 32M) than for
> TRIM/ZERO (in our case, nearly 4G).  Implement the client
> side support for these alternate limits, by always requesting
> the new information (a compliant server must ignore the
> request if it is unknown), and by validating anything reported
> by the server before populating the block layer limits.
>
> [1] https://lists.debian.org/nbd/2018/03/msg00048.html
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
>
> ---
>
> The given URL for the NBD spec was v3; it will change to be a v4
> version of that patch in part to point back to this qemu commit
> as a proof of implementation.
> ---
>   include/block/nbd.h |   4 ++
>   block/nbd.c         |  15 ++++++-
>   nbd/client.c        | 116 ++++++++++++++++++++++++++++++++++++++++++++++++++--
>   nbd/trace-events    |   2 +
>   4 files changed, 131 insertions(+), 6 deletions(-)
>
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index cbf51628f78..90ddef32bb3 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -270,6 +270,10 @@ struct NBDExportInfo {
>       uint32_t min_block;
>       uint32_t opt_block;
>       uint32_t max_block;
> +    uint32_t min_trim;
> +    uint32_t max_trim;
> +    uint32_t min_zero;
> +    uint32_t max_zero;
>
>       uint32_t meta_base_allocation_id;
>   };
> diff --git a/block/nbd.c b/block/nbd.c
> index 1e2b3ba2d3b..823d10b251d 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -478,8 +478,19 @@ static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
>       uint32_t max = MIN_NON_ZERO(NBD_MAX_BUFFER_SIZE, s->info.max_block);
>
>       bs->bl.request_alignment = min ? min : BDRV_SECTOR_SIZE;
> -    bs->bl.max_pdiscard = max;
> -    bs->bl.max_pwrite_zeroes = max;
> +    if (s->info.max_trim) {
> +        bs->bl.max_pdiscard = MIN(s->info.max_trim, BDRV_REQUEST_MAX_BYTES);
> +    } else {
> +        bs->bl.max_pdiscard = max;
> +    }
> +    bs->bl.pdiscard_alignment = s->info.min_trim;
> +    if (s->info.max_zero) {
> +        bs->bl.max_pwrite_zeroes = MIN(s->info.max_zero,
> +                                       BDRV_REQUEST_MAX_BYTES);
> +    } else {
> +        bs->bl.max_pwrite_zeroes = max;
> +    }
> +    bs->bl.pwrite_zeroes_alignment = s->info.min_zero;
>       bs->bl.max_transfer = max;

Should not max_transfer be updated too? Looks like it limits all 
requests, is it right?

Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH 3/4] nbd/client: Support requests of additional block sizing info
  2018-05-22 15:33   ` Vladimir Sementsov-Ogievskiy
@ 2018-05-23 15:40     ` Eric Blake
  0 siblings, 0 replies; 12+ messages in thread
From: Eric Blake @ 2018-05-23 15:40 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: qemu-block, edgar.kaziakhmedov, Paolo Bonzini, Kevin Wolf, Max Reitz

On 05/22/2018 10:33 AM, Vladimir Sementsov-Ogievskiy wrote:
> 02.05.2018 00:13, Eric Blake wrote:
>> The NBD spec is clarifying [1] that a server may want to advertise
>> different limits for READ/WRITE (in our case, 32M) than for
>> TRIM/ZERO (in our case, nearly 4G).  Implement the client
>> side support for these alternate limits, by always requesting
>> the new information (a compliant server must ignore the
>> request if it is unknown), and by validating anything reported
>> by the server before populating the block layer limits.

I still need to do a v2 of this series based on feedback from the NBD 
list, but answering your question in the meantime:


>>       bs->bl.request_alignment = min ? min : BDRV_SECTOR_SIZE;
>> -    bs->bl.max_pdiscard = max;
>> -    bs->bl.max_pwrite_zeroes = max;
>> +    if (s->info.max_trim) {
>> +        bs->bl.max_pdiscard = MIN(s->info.max_trim, 
>> BDRV_REQUEST_MAX_BYTES);
>> +    } else {
>> +        bs->bl.max_pdiscard = max;
>> +    }
>> +    bs->bl.pdiscard_alignment = s->info.min_trim;
>> +    if (s->info.max_zero) {
>> +        bs->bl.max_pwrite_zeroes = MIN(s->info.max_zero,
>> +                                       BDRV_REQUEST_MAX_BYTES);
>> +    } else {
>> +        bs->bl.max_pwrite_zeroes = max;
>> +    }
>> +    bs->bl.pwrite_zeroes_alignment = s->info.min_zero;
>>       bs->bl.max_transfer = max;
> 
> Should not max_transfer be updated too? Looks like it limits all 
> requests, is it right?

max_transfer affects read and write requests, but not write_zero 
requests (where max_pwrite_zeroes is used instead) or trim requests 
(where max_pdiscard is used instead).  Which is precisely the semantics 
we want - the NBD extension is mirroring the fact that qemu already has 
a way to track independent limits for trim/zero that can be larger than 
the normal limits for read/write, because trim/zero do not involve a 
transfer of a large data buffer.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-05-23 15:41 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-01 21:13 [Qemu-devel] [PATCH 0/4] Support larger NBD_CMD_WRITE_ZEROES Eric Blake
2018-05-01 21:13 ` [Qemu-devel] [PATCH 1/4] nbd: Prepare for additional block sizing info Eric Blake
2018-05-03  8:20   ` Vladimir Sementsov-Ogievskiy
2018-05-01 21:13 ` [Qemu-devel] [PATCH 2/4] nbd/client: Refactor in preparation for more limits Eric Blake
2018-05-03  8:29   ` Vladimir Sementsov-Ogievskiy
2018-05-01 21:13 ` [Qemu-devel] [PATCH 3/4] nbd/client: Support requests of additional block sizing info Eric Blake
2018-05-03  9:17   ` Vladimir Sementsov-Ogievskiy
2018-05-03 14:49     ` Eric Blake
2018-05-03 15:51       ` Vladimir Sementsov-Ogievskiy
2018-05-22 15:33   ` Vladimir Sementsov-Ogievskiy
2018-05-23 15:40     ` Eric Blake
2018-05-01 21:13 ` [Qemu-devel] [PATCH 4/4] nbd/server: " Eric Blake

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.