* [Qemu-devel] [PATCH 0/4] Support larger NBD_CMD_WRITE_ZEROES
@ 2018-05-01 21:13 Eric Blake
2018-05-01 21:13 ` [Qemu-devel] [PATCH 1/4] nbd: Prepare for additional block sizing info Eric Blake
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Eric Blake @ 2018-05-01 21:13 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, edgar.kaziakhmedov, vsementsov
This is my counterproposal for:
https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg02107.html
where Edgar pointed out that NBD write zero requests are rather
inefficient because the NBD spec was ambiguous on whether we could
exceed an advertised maximum request size even when the command
carries no payload, and so for maximum compatibility with unknown
NBD partners, we erred on the conservative side.
The NBD spec has since been improved to mention that a client can
try a larger WRITE_ZEROES if it is prepared to handle an EINVAL, and
that a server should not hang up on overlarge requests that did not
include a payload. But in the process it was pointed out that even
better would be improving the spec to advertise both limits (the
READ/WRITE payload limit, and the larger TRIM/ZERO limit), as done
in this series.
With this in place, qemu as both NBD client and server can send
zero requests up to BDRV_REQUEST_MAX_BYTES (nearly 2G) in length,
rather than having to subdivide the request into 32M chunks.
This series is experimental until the NBD spec is actually modified
to document the feature added here; but that modification will be
easier now that I have a proof-of-concept implementation.
Eric Blake (4):
nbd: Prepare for additional block sizing info
nbd/client: Refactor in preparation for more limits
nbd/client: Support requests of additional block sizing info
nbd/server: Support requests of additional block sizing info
include/block/nbd.h | 8 +++-
block/nbd.c | 15 +++++-
nbd/client.c | 135 +++++++++++++++++++++++++++++++++++++++++++++++-----
nbd/common.c | 4 ++
nbd/server.c | 30 ++++++++++++
nbd/trace-events | 4 ++
6 files changed, 181 insertions(+), 15 deletions(-)
--
2.14.3
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Qemu-devel] [PATCH 1/4] nbd: Prepare for additional block sizing info
2018-05-01 21:13 [Qemu-devel] [PATCH 0/4] Support larger NBD_CMD_WRITE_ZEROES Eric Blake
@ 2018-05-01 21:13 ` Eric Blake
2018-05-03 8:20 ` Vladimir Sementsov-Ogievskiy
2018-05-01 21:13 ` [Qemu-devel] [PATCH 2/4] nbd/client: Refactor in preparation for more limits Eric Blake
` (2 subsequent siblings)
3 siblings, 1 reply; 12+ messages in thread
From: Eric Blake @ 2018-05-01 21:13 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-block, edgar.kaziakhmedov, vsementsov, Paolo Bonzini,
Kevin Wolf, Max Reitz
The NBD spec is clarifying [1] that a server may want to advertise
different limits for READ/WRITE (in our case, 32M) than for
TRIM/ZERO (in our case, nearly 4G). Add the constants and name
lookups for new NBD_INFO_ fields used during handshake to convey
this additional information. Note that the NBD spec already
requires servers to ignore unknown requests from the client, and
for clients to ignore unknown gratuitous responses sent from
the server.
[1] https://lists.debian.org/nbd/2018/03/msg00048.html
Signed-off-by: Eric Blake <eblake@redhat.com>
---
The given URL for the NBD spec was v3; it will change to be a v4
version of that patch in part to point back to this qemu commit
as a proof of implementation.
---
include/block/nbd.h | 4 +++-
nbd/common.c | 4 ++++
2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/include/block/nbd.h b/include/block/nbd.h
index fcdcd545023..cbf51628f78 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2016-2017 Red Hat, Inc.
+ * Copyright (C) 2016-2018 Red Hat, Inc.
* Copyright (C) 2005 Anthony Liguori <anthony@codemonkey.ws>
*
* Network Block Device
@@ -180,6 +180,8 @@ typedef struct NBDExtent {
#define NBD_INFO_NAME 1
#define NBD_INFO_DESCRIPTION 2
#define NBD_INFO_BLOCK_SIZE 3
+#define NBD_INFO_TRIM_SIZE 4
+#define NBD_INFO_ZERO_SIZE 5
/* Request flags, sent from client to server during transmission phase */
#define NBD_CMD_FLAG_FUA (1 << 0) /* 'force unit access' during write */
diff --git a/nbd/common.c b/nbd/common.c
index 8c95c1d606e..840c91d5ca4 100644
--- a/nbd/common.c
+++ b/nbd/common.c
@@ -129,6 +129,10 @@ const char *nbd_info_lookup(uint16_t info)
return "description";
case NBD_INFO_BLOCK_SIZE:
return "block size";
+ case NBD_INFO_TRIM_SIZE:
+ return "trim size";
+ case NBD_INFO_ZERO_SIZE:
+ return "zero size";
default:
return "<unknown>";
}
--
2.14.3
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [Qemu-devel] [PATCH 2/4] nbd/client: Refactor in preparation for more limits
2018-05-01 21:13 [Qemu-devel] [PATCH 0/4] Support larger NBD_CMD_WRITE_ZEROES Eric Blake
2018-05-01 21:13 ` [Qemu-devel] [PATCH 1/4] nbd: Prepare for additional block sizing info Eric Blake
@ 2018-05-01 21:13 ` Eric Blake
2018-05-03 8:29 ` Vladimir Sementsov-Ogievskiy
2018-05-01 21:13 ` [Qemu-devel] [PATCH 3/4] nbd/client: Support requests of additional block sizing info Eric Blake
2018-05-01 21:13 ` [Qemu-devel] [PATCH 4/4] nbd/server: " Eric Blake
3 siblings, 1 reply; 12+ messages in thread
From: Eric Blake @ 2018-05-01 21:13 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, edgar.kaziakhmedov, vsementsov, Paolo Bonzini
The next patch will ask the server for more items of NBD_INFO.
However, the server is free to respond with INFO items in a
different order than what we request, so performing any sanity
checks about constraints that occur between multiple INFO items
must be done after all items have been received. Make it easier
to see that such checks are last by sinking the final processing
after the while loop, rather than embedded in the NBD_REP_ACK
processing that occurs textually within the loop before other
INFO processing.
Signed-off-by: Eric Blake <eblake@redhat.com>
---
nbd/client.c | 19 +++++++++++--------
1 file changed, 11 insertions(+), 8 deletions(-)
diff --git a/nbd/client.c b/nbd/client.c
index 232ff4f46da..0abb195b856 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2016-2017 Red Hat, Inc.
+ * Copyright (C) 2016-2018 Red Hat, Inc.
* Copyright (C) 2005 Anthony Liguori <anthony@codemonkey.ws>
*
* Network Block Device Client Side
@@ -365,17 +365,12 @@ static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
if (reply.type == NBD_REP_ACK) {
/* Server is done sending info and moved into transmission
- phase, but make sure it sent flags */
+ phase */
if (len) {
error_setg(errp, "server sent invalid NBD_REP_ACK");
return -1;
}
- if (!info->flags) {
- error_setg(errp, "broken server omitted NBD_INFO_EXPORT");
- return -1;
- }
- trace_nbd_opt_go_success();
- return 1;
+ break;
}
if (reply.type != NBD_REP_INFO) {
error_setg(errp, "unexpected reply type %" PRIu32
@@ -482,6 +477,14 @@ static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
break;
}
}
+
+ /* Sanity check that server's responses make sense */
+ if (!info->flags) {
+ error_setg(errp, "broken server omitted NBD_INFO_EXPORT");
+ return -1;
+ }
+ trace_nbd_opt_go_success();
+ return 1;
}
/* Return -1 on failure, 0 if wantname is an available export. */
--
2.14.3
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [Qemu-devel] [PATCH 3/4] nbd/client: Support requests of additional block sizing info
2018-05-01 21:13 [Qemu-devel] [PATCH 0/4] Support larger NBD_CMD_WRITE_ZEROES Eric Blake
2018-05-01 21:13 ` [Qemu-devel] [PATCH 1/4] nbd: Prepare for additional block sizing info Eric Blake
2018-05-01 21:13 ` [Qemu-devel] [PATCH 2/4] nbd/client: Refactor in preparation for more limits Eric Blake
@ 2018-05-01 21:13 ` Eric Blake
2018-05-03 9:17 ` Vladimir Sementsov-Ogievskiy
2018-05-22 15:33 ` Vladimir Sementsov-Ogievskiy
2018-05-01 21:13 ` [Qemu-devel] [PATCH 4/4] nbd/server: " Eric Blake
3 siblings, 2 replies; 12+ messages in thread
From: Eric Blake @ 2018-05-01 21:13 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-block, edgar.kaziakhmedov, vsementsov, Paolo Bonzini,
Kevin Wolf, Max Reitz
The NBD spec is clarifying [1] that a server may want to advertise
different limits for READ/WRITE (in our case, 32M) than for
TRIM/ZERO (in our case, nearly 4G). Implement the client
side support for these alternate limits, by always requesting
the new information (a compliant server must ignore the
request if it is unknown), and by validating anything reported
by the server before populating the block layer limits.
[1] https://lists.debian.org/nbd/2018/03/msg00048.html
Signed-off-by: Eric Blake <eblake@redhat.com>
---
The given URL for the NBD spec was v3; it will change to be a v4
version of that patch in part to point back to this qemu commit
as a proof of implementation.
---
include/block/nbd.h | 4 ++
block/nbd.c | 15 ++++++-
nbd/client.c | 116 ++++++++++++++++++++++++++++++++++++++++++++++++++--
nbd/trace-events | 2 +
4 files changed, 131 insertions(+), 6 deletions(-)
diff --git a/include/block/nbd.h b/include/block/nbd.h
index cbf51628f78..90ddef32bb3 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -270,6 +270,10 @@ struct NBDExportInfo {
uint32_t min_block;
uint32_t opt_block;
uint32_t max_block;
+ uint32_t min_trim;
+ uint32_t max_trim;
+ uint32_t min_zero;
+ uint32_t max_zero;
uint32_t meta_base_allocation_id;
};
diff --git a/block/nbd.c b/block/nbd.c
index 1e2b3ba2d3b..823d10b251d 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -478,8 +478,19 @@ static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
uint32_t max = MIN_NON_ZERO(NBD_MAX_BUFFER_SIZE, s->info.max_block);
bs->bl.request_alignment = min ? min : BDRV_SECTOR_SIZE;
- bs->bl.max_pdiscard = max;
- bs->bl.max_pwrite_zeroes = max;
+ if (s->info.max_trim) {
+ bs->bl.max_pdiscard = MIN(s->info.max_trim, BDRV_REQUEST_MAX_BYTES);
+ } else {
+ bs->bl.max_pdiscard = max;
+ }
+ bs->bl.pdiscard_alignment = s->info.min_trim;
+ if (s->info.max_zero) {
+ bs->bl.max_pwrite_zeroes = MIN(s->info.max_zero,
+ BDRV_REQUEST_MAX_BYTES);
+ } else {
+ bs->bl.max_pwrite_zeroes = max;
+ }
+ bs->bl.pwrite_zeroes_alignment = s->info.min_zero;
bs->bl.max_transfer = max;
if (s->info.opt_block &&
diff --git a/nbd/client.c b/nbd/client.c
index 0abb195b856..f1364747ba1 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -337,16 +337,18 @@ static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
info->flags = 0;
trace_nbd_opt_go_start(wantname);
- buf = g_malloc(4 + len + 2 + 2 * info->request_sizes + 1);
+ buf = g_malloc(4 + len + 2 + 3 * 2 * info->request_sizes + 1);
stl_be_p(buf, len);
memcpy(buf + 4, wantname, len);
- /* At most one request, everything else up to server */
- stw_be_p(buf + 4 + len, info->request_sizes);
+ /* Either 0 or 3 requests, everything else up to server */
+ stw_be_p(buf + 4 + len, 3 * info->request_sizes);
if (info->request_sizes) {
stw_be_p(buf + 4 + len + 2, NBD_INFO_BLOCK_SIZE);
+ stw_be_p(buf + 4 + len + 2 + 2, NBD_INFO_TRIM_SIZE);
+ stw_be_p(buf + 4 + len + 2 + 2 + 2, NBD_INFO_ZERO_SIZE);
}
error = nbd_send_option_request(ioc, NBD_OPT_GO,
- 4 + len + 2 + 2 * info->request_sizes,
+ 4 + len + 2 + 3 * 2 * info->request_sizes,
buf, errp);
g_free(buf);
if (error < 0) {
@@ -467,6 +469,72 @@ static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
info->max_block);
break;
+ case NBD_INFO_TRIM_SIZE:
+ if (len != sizeof(info->min_trim) * 2) {
+ error_setg(errp, "remaining trim info len %" PRIu32
+ " is unexpected size", len);
+ nbd_send_opt_abort(ioc);
+ return -1;
+ }
+ if (nbd_read(ioc, &info->min_trim, sizeof(info->min_trim),
+ errp) < 0) {
+ error_prepend(errp, "failed to read info minimum trim size: ");
+ nbd_send_opt_abort(ioc);
+ return -1;
+ }
+ be32_to_cpus(&info->min_trim);
+ if (nbd_read(ioc, &info->max_trim, sizeof(info->max_trim),
+ errp) < 0) {
+ error_prepend(errp,
+ "failed to read info maximum trim size: ");
+ nbd_send_opt_abort(ioc);
+ return -1;
+ }
+ be32_to_cpus(&info->max_trim);
+ if (!info->min_trim || !info->max_trim ||
+ (info->max_trim != UINT32_MAX &&
+ !QEMU_IS_ALIGNED(info->max_trim, info->min_trim))) {
+ error_setg(errp, "server trim sizes %" PRIu32 "/%" PRIu32
+ " are not valid", info->min_trim, info->max_trim);
+ nbd_send_opt_abort(ioc);
+ return -1;
+ }
+ trace_nbd_opt_go_info_trim_size(info->min_trim, info->max_trim);
+ break;
+
+ case NBD_INFO_ZERO_SIZE:
+ if (len != sizeof(info->min_zero) * 2) {
+ error_setg(errp, "remaining zero info len %" PRIu32
+ " is unexpected size", len);
+ nbd_send_opt_abort(ioc);
+ return -1;
+ }
+ if (nbd_read(ioc, &info->min_zero, sizeof(info->min_zero),
+ errp) < 0) {
+ error_prepend(errp, "failed to read info minimum zero size: ");
+ nbd_send_opt_abort(ioc);
+ return -1;
+ }
+ be32_to_cpus(&info->min_zero);
+ if (nbd_read(ioc, &info->max_zero, sizeof(info->max_zero),
+ errp) < 0) {
+ error_prepend(errp,
+ "failed to read info maximum zero size: ");
+ nbd_send_opt_abort(ioc);
+ return -1;
+ }
+ be32_to_cpus(&info->max_zero);
+ if (!info->min_zero || !info->max_zero ||
+ (info->max_zero != UINT32_MAX &&
+ !QEMU_IS_ALIGNED(info->max_zero, info->min_zero))) {
+ error_setg(errp, "server zero sizes %" PRIu32 "/%" PRIu32
+ " are not valid", info->min_zero, info->max_zero);
+ nbd_send_opt_abort(ioc);
+ return -1;
+ }
+ trace_nbd_opt_go_info_zero_size(info->min_zero, info->max_zero);
+ break;
+
default:
trace_nbd_opt_go_info_unknown(type, nbd_info_lookup(type));
if (nbd_drop(ioc, len, errp) < 0) {
@@ -483,6 +551,46 @@ static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
error_setg(errp, "broken server omitted NBD_INFO_EXPORT");
return -1;
}
+ if (info->min_trim) {
+ if (!info->min_block) {
+ error_setg(errp, "broken server sent INFO_TRIM_SIZE without"
+ " INFO_BLOCK_SIZE");
+ return -1;
+ }
+ if (info->min_trim < info->opt_block) {
+ error_setg(errp, "broken server sent INFO_TRIM_SIZE with"
+ " minimum trim %" PRIu32 " less than preferred block %"
+ PRIu32, info->min_trim, info->opt_block);
+ return -1;
+ }
+ if (info->max_trim < info->max_block) {
+ error_setg(errp, "broken server sent INFO_TRIM_SIZE with"
+ " maximum trim %" PRIu32 " less than maximum block %"
+ PRIu32, info->max_trim, info->max_block);
+ return -1;
+ }
+ info->max_trim = QEMU_ALIGN_DOWN(info->max_trim, info->min_block);
+ }
+ if (info->min_zero) {
+ if (!info->min_block) {
+ error_setg(errp, "broken server sent INFO_ZERO_SIZE without"
+ " INFO_BLOCK_SIZE");
+ return -1;
+ }
+ if (info->min_zero < info->opt_block) {
+ error_setg(errp, "broken server sent INFO_ZERO_SIZE with"
+ " minimum zero %" PRIu32 " less than preferred block %"
+ PRIu32, info->min_zero, info->opt_block);
+ return -1;
+ }
+ if (info->max_zero < info->max_block) {
+ error_setg(errp, "broken server sent INFO_ZERO_SIZE with"
+ " maximum zero %" PRIu32 " less than maximum block %"
+ PRIu32, info->max_zero, info->max_block);
+ return -1;
+ }
+ info->max_zero = QEMU_ALIGN_DOWN(info->max_zero, info->min_block);
+ }
trace_nbd_opt_go_success();
return 1;
}
diff --git a/nbd/trace-events b/nbd/trace-events
index dee081e7758..ddb9d0a2b3e 100644
--- a/nbd/trace-events
+++ b/nbd/trace-events
@@ -6,6 +6,8 @@ nbd_opt_go_start(const char *name) "Attempting NBD_OPT_GO for export '%s'"
nbd_opt_go_success(void) "Export is good to go"
nbd_opt_go_info_unknown(int info, const char *name) "Ignoring unknown info %d (%s)"
nbd_opt_go_info_block_size(uint32_t minimum, uint32_t preferred, uint32_t maximum) "Block sizes are 0x%" PRIx32 ", 0x%" PRIx32 ", 0x%" PRIx32
+nbd_opt_go_info_trim_size(uint32_t minimum, uint32_t maximum) "Trim sizes are 0x%" PRIx32 ", 0x%" PRIx32
+nbd_opt_go_info_zero_size(uint32_t minimum, uint32_t maximum) "Zero sizes are 0x%" PRIx32 ", 0x%" PRIx32
nbd_receive_query_exports_start(const char *wantname) "Querying export list for '%s'"
nbd_receive_query_exports_success(const char *wantname) "Found desired export name '%s'"
nbd_receive_starttls_new_client(void) "Setting up TLS"
--
2.14.3
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [Qemu-devel] [PATCH 4/4] nbd/server: Support requests of additional block sizing info
2018-05-01 21:13 [Qemu-devel] [PATCH 0/4] Support larger NBD_CMD_WRITE_ZEROES Eric Blake
` (2 preceding siblings ...)
2018-05-01 21:13 ` [Qemu-devel] [PATCH 3/4] nbd/client: Support requests of additional block sizing info Eric Blake
@ 2018-05-01 21:13 ` Eric Blake
3 siblings, 0 replies; 12+ messages in thread
From: Eric Blake @ 2018-05-01 21:13 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, edgar.kaziakhmedov, vsementsov, Paolo Bonzini
The NBD spec is clarifying [1] that a server may want to advertise
different limits for READ/WRITE (in our case, 32M) than for
TRIM/ZERO (in our case, nearly 4G). Implement the server
side support for these alternate limits, by always advertising
the new information (a compliant client must ignore a gratuitous
advertisement if it is unknown).
[1] https://lists.debian.org/nbd/2018/03/msg00048.html
Signed-off-by: Eric Blake <eblake@redhat.com>
---
The given URL for the NBD spec was v3; it will change to be a v4
version of that patch in part to point back to this qemu commit
as a proof of implementation.
---
nbd/server.c | 30 ++++++++++++++++++++++++++++++
nbd/trace-events | 2 ++
2 files changed, 32 insertions(+)
diff --git a/nbd/server.c b/nbd/server.c
index 9e1f2271784..39502b58446 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -617,6 +617,36 @@ static int nbd_negotiate_handle_info(NBDClient *client, uint16_t myflags,
return rc;
}
+ /* Send NBD_INFO_TRIM_SIZE always. */
+ /* minimum - Hard-code to 4096 for now, matching preferred block.
+ * TODO: consult blk_bs(blk)->bl.pdiscard_alignment? */
+ sizes[0] = 4096;
+ /* maximum - < 2G (then block layer fragments to bl.max_pdiscard). */
+ sizes[1] = QEMU_ALIGN_DOWN(BDRV_REQUEST_MAX_BYTES, sizes[0]);
+ trace_nbd_negotiate_handle_info_trim_size(sizes[0], sizes[1]);
+ cpu_to_be32s(&sizes[0]);
+ cpu_to_be32s(&sizes[1]);
+ rc = nbd_negotiate_send_info(client, NBD_INFO_TRIM_SIZE,
+ 2 * sizeof(sizes[0]), sizes, errp);
+ if (rc < 0) {
+ return rc;
+ }
+
+ /* Send NBD_INFO_ZERO_SIZE always. */
+ /* minimum - Hard-code to 4096 for now, matching preferred block.
+ * TODO: consult blk_bs(blk)->bl.pwrite_zeroes_alignment? */
+ sizes[0] = 4096;
+ /* maximum - < 2G (then block layer fragments to bl.max_pwrite_zeroes). */
+ sizes[1] = QEMU_ALIGN_DOWN(BDRV_REQUEST_MAX_BYTES, sizes[0]);
+ trace_nbd_negotiate_handle_info_zero_size(sizes[0], sizes[1]);
+ cpu_to_be32s(&sizes[0]);
+ cpu_to_be32s(&sizes[1]);
+ rc = nbd_negotiate_send_info(client, NBD_INFO_ZERO_SIZE,
+ 2 * sizeof(sizes[0]), sizes, errp);
+ if (rc < 0) {
+ return rc;
+ }
+
/* Send NBD_INFO_EXPORT always */
trace_nbd_negotiate_new_style_size_flags(exp->size,
exp->nbdflags | myflags);
diff --git a/nbd/trace-events b/nbd/trace-events
index ddb9d0a2b3e..d8849c43b21 100644
--- a/nbd/trace-events
+++ b/nbd/trace-events
@@ -46,6 +46,8 @@ nbd_negotiate_send_info(int info, const char *name, uint32_t length) "Sending NB
nbd_negotiate_handle_info_requests(int requests) "Client requested %d items of info"
nbd_negotiate_handle_info_request(int request, const char *name) "Client requested info %d (%s)"
nbd_negotiate_handle_info_block_size(uint32_t minimum, uint32_t preferred, uint32_t maximum) "advertising minimum 0x%" PRIx32 ", preferred 0x%" PRIx32 ", maximum 0x%" PRIx32
+nbd_negotiate_handle_info_trim_size(uint32_t minimum, uint32_t maximum) "advertising minimum 0x%" PRIx32 ", maximum 0x%" PRIx32
+nbd_negotiate_handle_info_zero_size(uint32_t minimum, uint32_t maximum) "advertising minimum 0x%" PRIx32 ", maximum 0x%" PRIx32
nbd_negotiate_handle_starttls(void) "Setting up TLS"
nbd_negotiate_handle_starttls_handshake(void) "Starting TLS handshake"
nbd_negotiate_meta_context(const char *optname, const char *export, uint32_t queries) "Client requested %s for export %s, with %" PRIu32 " queries"
--
2.14.3
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] [PATCH 1/4] nbd: Prepare for additional block sizing info
2018-05-01 21:13 ` [Qemu-devel] [PATCH 1/4] nbd: Prepare for additional block sizing info Eric Blake
@ 2018-05-03 8:20 ` Vladimir Sementsov-Ogievskiy
0 siblings, 0 replies; 12+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-05-03 8:20 UTC (permalink / raw)
To: Eric Blake, qemu-devel
Cc: qemu-block, edgar.kaziakhmedov, Paolo Bonzini, Kevin Wolf, Max Reitz
02.05.2018 00:13, Eric Blake wrote:
> The NBD spec is clarifying [1] that a server may want to advertise
> different limits for READ/WRITE (in our case, 32M) than for
> TRIM/ZERO (in our case, nearly 4G). Add the constants and name
> lookups for new NBD_INFO_ fields used during handshake to convey
> this additional information. Note that the NBD spec already
> requires servers to ignore unknown requests from the client, and
> for clients to ignore unknown gratuitous responses sent from
> the server.
>
> [1] https://lists.debian.org/nbd/2018/03/msg00048.html
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>
> ---
>
> The given URL for the NBD spec was v3; it will change to be a v4
> version of that patch in part to point back to this qemu commit
> as a proof of implementation.
> ---
> include/block/nbd.h | 4 +++-
> nbd/common.c | 4 ++++
> 2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index fcdcd545023..cbf51628f78 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -1,5 +1,5 @@
> /*
> - * Copyright (C) 2016-2017 Red Hat, Inc.
> + * Copyright (C) 2016-2018 Red Hat, Inc.
> * Copyright (C) 2005 Anthony Liguori <anthony@codemonkey.ws>
> *
> * Network Block Device
> @@ -180,6 +180,8 @@ typedef struct NBDExtent {
> #define NBD_INFO_NAME 1
> #define NBD_INFO_DESCRIPTION 2
> #define NBD_INFO_BLOCK_SIZE 3
> +#define NBD_INFO_TRIM_SIZE 4
> +#define NBD_INFO_ZERO_SIZE 5
>
> /* Request flags, sent from client to server during transmission phase */
> #define NBD_CMD_FLAG_FUA (1 << 0) /* 'force unit access' during write */
> diff --git a/nbd/common.c b/nbd/common.c
> index 8c95c1d606e..840c91d5ca4 100644
> --- a/nbd/common.c
> +++ b/nbd/common.c
> @@ -129,6 +129,10 @@ const char *nbd_info_lookup(uint16_t info)
> return "description";
> case NBD_INFO_BLOCK_SIZE:
> return "block size";
> + case NBD_INFO_TRIM_SIZE:
> + return "trim size";
> + case NBD_INFO_ZERO_SIZE:
> + return "zero size";
> default:
> return "<unknown>";
> }
--
Best regards,
Vladimir
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] [PATCH 2/4] nbd/client: Refactor in preparation for more limits
2018-05-01 21:13 ` [Qemu-devel] [PATCH 2/4] nbd/client: Refactor in preparation for more limits Eric Blake
@ 2018-05-03 8:29 ` Vladimir Sementsov-Ogievskiy
0 siblings, 0 replies; 12+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-05-03 8:29 UTC (permalink / raw)
To: Eric Blake, qemu-devel; +Cc: qemu-block, edgar.kaziakhmedov, Paolo Bonzini
02.05.2018 00:13, Eric Blake wrote:
> The next patch will ask the server for more items of NBD_INFO.
> However, the server is free to respond with INFO items in a
> different order than what we request, so performing any sanity
> checks about constraints that occur between multiple INFO items
> must be done after all items have been received. Make it easier
> to see that such checks are last by sinking the final processing
> after the while loop, rather than embedded in the NBD_REP_ACK
> processing that occurs textually within the loop before other
> INFO processing.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
> nbd/client.c | 19 +++++++++++--------
> 1 file changed, 11 insertions(+), 8 deletions(-)
>
> diff --git a/nbd/client.c b/nbd/client.c
> index 232ff4f46da..0abb195b856 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -1,5 +1,5 @@
> /*
> - * Copyright (C) 2016-2017 Red Hat, Inc.
> + * Copyright (C) 2016-2018 Red Hat, Inc.
> * Copyright (C) 2005 Anthony Liguori <anthony@codemonkey.ws>
> *
> * Network Block Device Client Side
> @@ -365,17 +365,12 @@ static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
>
> if (reply.type == NBD_REP_ACK) {
> /* Server is done sending info and moved into transmission
> - phase, but make sure it sent flags */
> + phase */
> if (len) {
> error_setg(errp, "server sent invalid NBD_REP_ACK");
> return -1;
> }
> - if (!info->flags) {
> - error_setg(errp, "broken server omitted NBD_INFO_EXPORT");
> - return -1;
> - }
> - trace_nbd_opt_go_success();
> - return 1;
> + break;
> }
> if (reply.type != NBD_REP_INFO) {
> error_setg(errp, "unexpected reply type %" PRIu32
> @@ -482,6 +477,14 @@ static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
> break;
> }
> }
> +
> + /* Sanity check that server's responses make sense */
> + if (!info->flags) {
> + error_setg(errp, "broken server omitted NBD_INFO_EXPORT");
> + return -1;
> + }
> + trace_nbd_opt_go_success();
> + return 1;
> }
>
> /* Return -1 on failure, 0 if wantname is an available export. */
--
Best regards,
Vladimir
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] [PATCH 3/4] nbd/client: Support requests of additional block sizing info
2018-05-01 21:13 ` [Qemu-devel] [PATCH 3/4] nbd/client: Support requests of additional block sizing info Eric Blake
@ 2018-05-03 9:17 ` Vladimir Sementsov-Ogievskiy
2018-05-03 14:49 ` Eric Blake
2018-05-22 15:33 ` Vladimir Sementsov-Ogievskiy
1 sibling, 1 reply; 12+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-05-03 9:17 UTC (permalink / raw)
To: Eric Blake, qemu-devel
Cc: qemu-block, edgar.kaziakhmedov, Paolo Bonzini, Kevin Wolf, Max Reitz
02.05.2018 00:13, Eric Blake wrote:
> The NBD spec is clarifying [1] that a server may want to advertise
> different limits for READ/WRITE (in our case, 32M) than for
> TRIM/ZERO (in our case, nearly 4G). Implement the client
> side support for these alternate limits, by always requesting
> the new information (a compliant server must ignore the
> request if it is unknown), and by validating anything reported
> by the server before populating the block layer limits.
>
> [1] https://lists.debian.org/nbd/2018/03/msg00048.html
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
>
> ---
>
> The given URL for the NBD spec was v3; it will change to be a v4
> version of that patch in part to point back to this qemu commit
> as a proof of implementation.
> ---
> include/block/nbd.h | 4 ++
> block/nbd.c | 15 ++++++-
> nbd/client.c | 116 ++++++++++++++++++++++++++++++++++++++++++++++++++--
> nbd/trace-events | 2 +
> 4 files changed, 131 insertions(+), 6 deletions(-)
>
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index cbf51628f78..90ddef32bb3 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -270,6 +270,10 @@ struct NBDExportInfo {
> uint32_t min_block;
> uint32_t opt_block;
> uint32_t max_block;
> + uint32_t min_trim;
> + uint32_t max_trim;
> + uint32_t min_zero;
> + uint32_t max_zero;
>
> uint32_t meta_base_allocation_id;
> };
> diff --git a/block/nbd.c b/block/nbd.c
> index 1e2b3ba2d3b..823d10b251d 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -478,8 +478,19 @@ static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
> uint32_t max = MIN_NON_ZERO(NBD_MAX_BUFFER_SIZE, s->info.max_block);
>
> bs->bl.request_alignment = min ? min : BDRV_SECTOR_SIZE;
> - bs->bl.max_pdiscard = max;
> - bs->bl.max_pwrite_zeroes = max;
> + if (s->info.max_trim) {
> + bs->bl.max_pdiscard = MIN(s->info.max_trim, BDRV_REQUEST_MAX_BYTES);
> + } else {
> + bs->bl.max_pdiscard = max;
> + }
> + bs->bl.pdiscard_alignment = s->info.min_trim;
> + if (s->info.max_zero) {
> + bs->bl.max_pwrite_zeroes = MIN(s->info.max_zero,
> + BDRV_REQUEST_MAX_BYTES);
> + } else {
> + bs->bl.max_pwrite_zeroes = max;
> + }
> + bs->bl.pwrite_zeroes_alignment = s->info.min_zero;
> bs->bl.max_transfer = max;
>
> if (s->info.opt_block &&
> diff --git a/nbd/client.c b/nbd/client.c
> index 0abb195b856..f1364747ba1 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -337,16 +337,18 @@ static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
> info->flags = 0;
>
> trace_nbd_opt_go_start(wantname);
> - buf = g_malloc(4 + len + 2 + 2 * info->request_sizes + 1);
> + buf = g_malloc(4 + len + 2 + 3 * 2 * info->request_sizes + 1);
Hmm, what "+1" means?
> stl_be_p(buf, len);
> memcpy(buf + 4, wantname, len);
> - /* At most one request, everything else up to server */
> - stw_be_p(buf + 4 + len, info->request_sizes);
> + /* Either 0 or 3 requests, everything else up to server */
> + stw_be_p(buf + 4 + len, 3 * info->request_sizes);
> if (info->request_sizes) {
> stw_be_p(buf + 4 + len + 2, NBD_INFO_BLOCK_SIZE);
> + stw_be_p(buf + 4 + len + 2 + 2, NBD_INFO_TRIM_SIZE);
> + stw_be_p(buf + 4 + len + 2 + 2 + 2, NBD_INFO_ZERO_SIZE);
> }
> error = nbd_send_option_request(ioc, NBD_OPT_GO,
> - 4 + len + 2 + 2 * info->request_sizes,
> + 4 + len + 2 + 3 * 2 * info->request_sizes,
> buf, errp);
magic chunk) Change looks correct. Is it worth introducing a buf_len
variable, to
not retype it?
> g_free(buf);
> if (error < 0) {
> @@ -467,6 +469,72 @@ static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
> info->max_block);
> break;
>
> + case NBD_INFO_TRIM_SIZE:
> + if (len != sizeof(info->min_trim) * 2) {
> + error_setg(errp, "remaining trim info len %" PRIu32
> + " is unexpected size", len);
> + nbd_send_opt_abort(ioc);
> + return -1;
> + }
> + if (nbd_read(ioc, &info->min_trim, sizeof(info->min_trim),
> + errp) < 0) {
> + error_prepend(errp, "failed to read info minimum trim size: ");
> + nbd_send_opt_abort(ioc);
> + return -1;
> + }
> + be32_to_cpus(&info->min_trim);
> + if (nbd_read(ioc, &info->max_trim, sizeof(info->max_trim),
> + errp) < 0) {
> + error_prepend(errp,
> + "failed to read info maximum trim size: ");
> + nbd_send_opt_abort(ioc);
> + return -1;
> + }
> + be32_to_cpus(&info->max_trim);
> + if (!info->min_trim || !info->max_trim ||
> + (info->max_trim != UINT32_MAX &&
> + !QEMU_IS_ALIGNED(info->max_trim, info->min_trim))) {
> + error_setg(errp, "server trim sizes %" PRIu32 "/%" PRIu32
> + " are not valid", info->min_trim, info->max_trim);
> + nbd_send_opt_abort(ioc);
> + return -1;
> + }
Didn't you forget to check that min_trim is a power of two?
> + trace_nbd_opt_go_info_trim_size(info->min_trim, info->max_trim);
> + break;
> +
> + case NBD_INFO_ZERO_SIZE:
> + if (len != sizeof(info->min_zero) * 2) {
> + error_setg(errp, "remaining zero info len %" PRIu32
> + " is unexpected size", len);
> + nbd_send_opt_abort(ioc);
> + return -1;
> + }
> + if (nbd_read(ioc, &info->min_zero, sizeof(info->min_zero),
> + errp) < 0) {
> + error_prepend(errp, "failed to read info minimum zero size: ");
> + nbd_send_opt_abort(ioc);
> + return -1;
> + }
> + be32_to_cpus(&info->min_zero);
> + if (nbd_read(ioc, &info->max_zero, sizeof(info->max_zero),
> + errp) < 0) {
> + error_prepend(errp,
> + "failed to read info maximum zero size: ");
> + nbd_send_opt_abort(ioc);
> + return -1;
> + }
> + be32_to_cpus(&info->max_zero);
> + if (!info->min_zero || !info->max_zero ||
> + (info->max_zero != UINT32_MAX &&
> + !QEMU_IS_ALIGNED(info->max_zero, info->min_zero))) {
> + error_setg(errp, "server zero sizes %" PRIu32 "/%" PRIu32
> + " are not valid", info->min_zero, info->max_zero);
> + nbd_send_opt_abort(ioc);
> + return -1;
> + }
> + trace_nbd_opt_go_info_zero_size(info->min_zero, info->max_zero);
> + break;
> +
hmm, so, now nbd_opt_go() is full of very-very similar code parts, which
may worth some refactoring now or in future.
Are we going to add similar limits for BLOCK_STATUS? and for CACHE? I
have an idea: why not to make an universal option for it in
protocol? Something like
option INFO_CMD_SIZE:
uint16_t command
uint32_t min_block
uint32_t max_block
and server can send several such options. Most of semantics for these
additional limits are common, so it will simplify both documentation
and realization..
I'll copy this to nbd thread and it is better to discuss it in it.
> default:
> trace_nbd_opt_go_info_unknown(type, nbd_info_lookup(type));
> if (nbd_drop(ioc, len, errp) < 0) {
> @@ -483,6 +551,46 @@ static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
> error_setg(errp, "broken server omitted NBD_INFO_EXPORT");
> return -1;
> }
> + if (info->min_trim) {
> + if (!info->min_block) {
> + error_setg(errp, "broken server sent INFO_TRIM_SIZE without"
> + " INFO_BLOCK_SIZE");
> + return -1;
> + }
> + if (info->min_trim < info->opt_block) {
> + error_setg(errp, "broken server sent INFO_TRIM_SIZE with"
> + " minimum trim %" PRIu32 " less than preferred block %"
> + PRIu32, info->min_trim, info->opt_block);
> + return -1;
> + }
> + if (info->max_trim < info->max_block) {
> + error_setg(errp, "broken server sent INFO_TRIM_SIZE with"
> + " maximum trim %" PRIu32 " less than maximum block %"
> + PRIu32, info->max_trim, info->max_block);
> + return -1;
> + }
> + info->max_trim = QEMU_ALIGN_DOWN(info->max_trim, info->min_block);
> + }
> + if (info->min_zero) {
> + if (!info->min_block) {
> + error_setg(errp, "broken server sent INFO_ZERO_SIZE without"
> + " INFO_BLOCK_SIZE");
> + return -1;
> + }
> + if (info->min_zero < info->opt_block) {
> + error_setg(errp, "broken server sent INFO_ZERO_SIZE with"
> + " minimum zero %" PRIu32 " less than preferred block %"
> + PRIu32, info->min_zero, info->opt_block);
> + return -1;
> + }
> + if (info->max_zero < info->max_block) {
> + error_setg(errp, "broken server sent INFO_ZERO_SIZE with"
> + " maximum zero %" PRIu32 " less than maximum block %"
> + PRIu32, info->max_zero, info->max_block);
> + return -1;
> + }
> + info->max_zero = QEMU_ALIGN_DOWN(info->max_zero, info->min_block);
> + }
> trace_nbd_opt_go_success();
> return 1;
> }
> diff --git a/nbd/trace-events b/nbd/trace-events
> index dee081e7758..ddb9d0a2b3e 100644
> --- a/nbd/trace-events
> +++ b/nbd/trace-events
> @@ -6,6 +6,8 @@ nbd_opt_go_start(const char *name) "Attempting NBD_OPT_GO for export '%s'"
> nbd_opt_go_success(void) "Export is good to go"
> nbd_opt_go_info_unknown(int info, const char *name) "Ignoring unknown info %d (%s)"
> nbd_opt_go_info_block_size(uint32_t minimum, uint32_t preferred, uint32_t maximum) "Block sizes are 0x%" PRIx32 ", 0x%" PRIx32 ", 0x%" PRIx32
> +nbd_opt_go_info_trim_size(uint32_t minimum, uint32_t maximum) "Trim sizes are 0x%" PRIx32 ", 0x%" PRIx32
> +nbd_opt_go_info_zero_size(uint32_t minimum, uint32_t maximum) "Zero sizes are 0x%" PRIx32 ", 0x%" PRIx32
> nbd_receive_query_exports_start(const char *wantname) "Querying export list for '%s'"
> nbd_receive_query_exports_success(const char *wantname) "Found desired export name '%s'"
> nbd_receive_starttls_new_client(void) "Setting up TLS"
--
Best regards,
Vladimir
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] [PATCH 3/4] nbd/client: Support requests of additional block sizing info
2018-05-03 9:17 ` Vladimir Sementsov-Ogievskiy
@ 2018-05-03 14:49 ` Eric Blake
2018-05-03 15:51 ` Vladimir Sementsov-Ogievskiy
0 siblings, 1 reply; 12+ messages in thread
From: Eric Blake @ 2018-05-03 14:49 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy, qemu-devel
Cc: qemu-block, edgar.kaziakhmedov, Paolo Bonzini, Kevin Wolf, Max Reitz
On 05/03/2018 04:17 AM, Vladimir Sementsov-Ogievskiy wrote:
> 02.05.2018 00:13, Eric Blake wrote:
>> The NBD spec is clarifying [1] that a server may want to advertise
>> different limits for READ/WRITE (in our case, 32M) than for
>> TRIM/ZERO (in our case, nearly 4G). Implement the client
>> side support for these alternate limits, by always requesting
>> the new information (a compliant server must ignore the
>> request if it is unknown), and by validating anything reported
>> by the server before populating the block layer limits.
>>
>> [1] https://lists.debian.org/nbd/2018/03/msg00048.html
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>
>> +++ b/nbd/client.c
>> @@ -337,16 +337,18 @@ static int nbd_opt_go(QIOChannel *ioc, const
>> char *wantname,
>> info->flags = 0;
>>
>> trace_nbd_opt_go_start(wantname);
>> - buf = g_malloc(4 + len + 2 + 2 * info->request_sizes + 1);
>> + buf = g_malloc(4 + len + 2 + 3 * 2 * info->request_sizes + 1);
>
> Hmm, what "+1" means?
Doesn't appear to be necessary, but a little slop never hurts. Better
might be struct-ifying things rather than open-coding offsets, or even
using a vectored approach rather than requiring a single buffer. But if
useful, that's refactoring that is independent of this patch, while this
is just demonstrating the bare minimum implementation of the new feature.
>> stl_be_p(buf, len);
>> memcpy(buf + 4, wantname, len);
>> - /* At most one request, everything else up to server */
>> - stw_be_p(buf + 4 + len, info->request_sizes);
>> + /* Either 0 or 3 requests, everything else up to server */
>> + stw_be_p(buf + 4 + len, 3 * info->request_sizes);
>> if (info->request_sizes) {
>> stw_be_p(buf + 4 + len + 2, NBD_INFO_BLOCK_SIZE);
>> + stw_be_p(buf + 4 + len + 2 + 2, NBD_INFO_TRIM_SIZE);
>> + stw_be_p(buf + 4 + len + 2 + 2 + 2, NBD_INFO_ZERO_SIZE);
>> }
>> error = nbd_send_option_request(ioc, NBD_OPT_GO,
>> - 4 + len + 2 + 2 *
>> info->request_sizes,
>> + 4 + len + 2 + 3 * 2 *
>> info->request_sizes,
>> buf, errp);
>
> magic chunk) Change looks correct. Is it worth introducing a buf_len
> variable, to
> not retype it?
And increment as we go? Yeah, it might make things more legible.
>> + be32_to_cpus(&info->max_trim);
>> + if (!info->min_trim || !info->max_trim ||
>> + (info->max_trim != UINT32_MAX &&
>> + !QEMU_IS_ALIGNED(info->max_trim, info->min_trim))) {
>> + error_setg(errp, "server trim sizes %" PRIu32 "/%"
>> PRIu32
>> + " are not valid", info->min_trim,
>> info->max_trim);
>> + nbd_send_opt_abort(ioc);
>> + return -1;
>> + }
>
> Didn't you forget to check that min_trim is a power of two?
Nope. The NBD spec wording specifically uses "The minimum trim size
SHOULD be a power of 2, and MUST be at least as large as the preferred
block size advertised in `NBD_INFO_BLOCK_SIZE`", in part because there
ARE existing hardware iscsi devices that have an advertised
preferred/maximum trim size of exactly 15 megabytes (you can request
smaller trims 1M at a time, and the device then caches things to
eventually report unmapped slices at a 15M boundary). For more
information on this odd hardware, look in the qemu logs for the Dell
Equallogic device, such as commit 3482b9bc.
Since the NBD 'min trim' is advisory, and merely telling the client the
ideal alignments to use for a trim most likely to have an effect (on
Equallogic, a 1M trim by itself might not trim anything unless
neighboring areas are also trimmed, while an aligned 15M trim is
immediately observable), an NBD server wrapping such an iscsi device may
prefer to mirror the hardware's preferred alignment of 15M, rather than
inventing a minimum alignment of 1M, in what it advertises in
NBD_INFO_TRIM_SIZE.
And maybe I should tweak the NBD spec addition to call things "preferred
trim/zero" rather than "min trim/zero", if that makes things any easier
to grasp on first read, and better matches the iscsi concept.
>
> hmm, so, now nbd_opt_go() is full of very-very similar code parts, which
> may worth some refactoring now or in future.
>
> Are we going to add similar limits for BLOCK_STATUS? and for CACHE? I
> have an idea: why not to make an universal option for it in
> protocol? Something like
>
> option INFO_CMD_SIZE:
> uint16_t command
> uint32_t min_block
> uint32_t max_block
>
> and server can send several such options. Most of semantics for these
> additional limits are common, so it will simplify both documentation
> and realization..
>
> I'll copy this to nbd thread and it is better to discuss it in it.
Yep, I've followed up on that thread, but will post a v2 of this
proposal along those lines.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] [PATCH 3/4] nbd/client: Support requests of additional block sizing info
2018-05-03 14:49 ` Eric Blake
@ 2018-05-03 15:51 ` Vladimir Sementsov-Ogievskiy
0 siblings, 0 replies; 12+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-05-03 15:51 UTC (permalink / raw)
To: Eric Blake, qemu-devel
Cc: qemu-block, edgar.kaziakhmedov, Paolo Bonzini, Kevin Wolf, Max Reitz
03.05.2018 17:49, Eric Blake wrote:
> On 05/03/2018 04:17 AM, Vladimir Sementsov-Ogievskiy wrote:
>> 02.05.2018 00:13, Eric Blake wrote:
>>> The NBD spec is clarifying [1] that a server may want to advertise
>>> different limits for READ/WRITE (in our case, 32M) than for
>>> TRIM/ZERO (in our case, nearly 4G). Implement the client
>>> side support for these alternate limits, by always requesting
>>> the new information (a compliant server must ignore the
>>> request if it is unknown), and by validating anything reported
>>> by the server before populating the block layer limits.
>>>
>>> [1] https://lists.debian.org/nbd/2018/03/msg00048.html
>>>
>>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>>
>
>>> +++ b/nbd/client.c
>>> @@ -337,16 +337,18 @@ static int nbd_opt_go(QIOChannel *ioc, const
>>> char *wantname,
>>> info->flags = 0;
>>>
>>> trace_nbd_opt_go_start(wantname);
>>> - buf = g_malloc(4 + len + 2 + 2 * info->request_sizes + 1);
>>> + buf = g_malloc(4 + len + 2 + 3 * 2 * info->request_sizes + 1);
>>
>> Hmm, what "+1" means?
>
> Doesn't appear to be necessary, but a little slop never hurts. Better
> might be struct-ifying things rather than open-coding offsets, or even
> using a vectored approach rather than requiring a single buffer. But
> if useful, that's refactoring that is independent of this patch, while
> this is just demonstrating the bare minimum implementation of the new
> feature.
>
>>> stl_be_p(buf, len);
>>> memcpy(buf + 4, wantname, len);
>>> - /* At most one request, everything else up to server */
>>> - stw_be_p(buf + 4 + len, info->request_sizes);
>>> + /* Either 0 or 3 requests, everything else up to server */
>>> + stw_be_p(buf + 4 + len, 3 * info->request_sizes);
>>> if (info->request_sizes) {
>>> stw_be_p(buf + 4 + len + 2, NBD_INFO_BLOCK_SIZE);
>>> + stw_be_p(buf + 4 + len + 2 + 2, NBD_INFO_TRIM_SIZE);
>>> + stw_be_p(buf + 4 + len + 2 + 2 + 2, NBD_INFO_ZERO_SIZE);
>>> }
>>> error = nbd_send_option_request(ioc, NBD_OPT_GO,
>>> - 4 + len + 2 + 2 *
>>> info->request_sizes,
>>> + 4 + len + 2 + 3 * 2 *
>>> info->request_sizes,
>>> buf, errp);
>>
>> magic chunk) Change looks correct. Is it worth introducing a buf_len
>> variable, to
>> not retype it?
>
> And increment as we go? Yeah, it might make things more legible.
>
>
>>> + be32_to_cpus(&info->max_trim);
>>> + if (!info->min_trim || !info->max_trim ||
>>> + (info->max_trim != UINT32_MAX &&
>>> + !QEMU_IS_ALIGNED(info->max_trim, info->min_trim))) {
>>> + error_setg(errp, "server trim sizes %" PRIu32 "/%"
>>> PRIu32
>>> + " are not valid", info->min_trim,
>>> info->max_trim);
>>> + nbd_send_opt_abort(ioc);
>>> + return -1;
>>> + }
>>
>> Didn't you forget to check that min_trim is a power of two?
>
> Nope. The NBD spec wording specifically uses "The minimum trim size
> SHOULD be a power of 2, and MUST be at least as large as the preferred
> block size advertised in `NBD_INFO_BLOCK_SIZE`", in part because there
> ARE existing hardware iscsi devices that have an advertised
> preferred/maximum trim size of exactly 15 megabytes (you can request
> smaller trims 1M at a time, and the device then caches things to
> eventually report unmapped slices at a 15M boundary). For more
> information on this odd hardware, look in the qemu logs for the Dell
> Equallogic device, such as commit 3482b9bc.
Ok. Interesting, thank you.
>
> Since the NBD 'min trim' is advisory, and merely telling the client
> the ideal alignments to use for a trim most likely to have an effect
> (on Equallogic, a 1M trim by itself might not trim anything unless
> neighboring areas are also trimmed, while an aligned 15M trim is
> immediately observable), an NBD server wrapping such an iscsi device
> may prefer to mirror the hardware's preferred alignment of 15M, rather
> than inventing a minimum alignment of 1M, in what it advertises in
> NBD_INFO_TRIM_SIZE.
>
> And maybe I should tweak the NBD spec addition to call things
> "preferred trim/zero" rather than "min trim/zero", if that makes
> things any easier to grasp on first read, and better matches the iscsi
> concept.
good idea. and it will better correspond to pref_block for
INFO_BLOCK_SIZE, as they are compared with it.
>
>>
>> hmm, so, now nbd_opt_go() is full of very-very similar code parts,
>> which may worth some refactoring now or in future.
>>
>> Are we going to add similar limits for BLOCK_STATUS? and for CACHE? I
>> have an idea: why not to make an universal option for it in
>> protocol? Something like
>>
>> option INFO_CMD_SIZE:
>> uint16_t command
>> uint32_t min_block
>> uint32_t max_block
>>
>> and server can send several such options. Most of semantics for these
>> additional limits are common, so it will simplify both documentation
>> and realization..
>>
>> I'll copy this to nbd thread and it is better to discuss it in it.
>
> Yep, I've followed up on that thread, but will post a v2 of this
> proposal along those lines.
>
--
Best regards,
Vladimir
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] [PATCH 3/4] nbd/client: Support requests of additional block sizing info
2018-05-01 21:13 ` [Qemu-devel] [PATCH 3/4] nbd/client: Support requests of additional block sizing info Eric Blake
2018-05-03 9:17 ` Vladimir Sementsov-Ogievskiy
@ 2018-05-22 15:33 ` Vladimir Sementsov-Ogievskiy
2018-05-23 15:40 ` Eric Blake
1 sibling, 1 reply; 12+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-05-22 15:33 UTC (permalink / raw)
To: Eric Blake, qemu-devel
Cc: qemu-block, edgar.kaziakhmedov, Paolo Bonzini, Kevin Wolf, Max Reitz
02.05.2018 00:13, Eric Blake wrote:
> The NBD spec is clarifying [1] that a server may want to advertise
> different limits for READ/WRITE (in our case, 32M) than for
> TRIM/ZERO (in our case, nearly 4G). Implement the client
> side support for these alternate limits, by always requesting
> the new information (a compliant server must ignore the
> request if it is unknown), and by validating anything reported
> by the server before populating the block layer limits.
>
> [1] https://lists.debian.org/nbd/2018/03/msg00048.html
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
>
> ---
>
> The given URL for the NBD spec was v3; it will change to be a v4
> version of that patch in part to point back to this qemu commit
> as a proof of implementation.
> ---
> include/block/nbd.h | 4 ++
> block/nbd.c | 15 ++++++-
> nbd/client.c | 116 ++++++++++++++++++++++++++++++++++++++++++++++++++--
> nbd/trace-events | 2 +
> 4 files changed, 131 insertions(+), 6 deletions(-)
>
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index cbf51628f78..90ddef32bb3 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -270,6 +270,10 @@ struct NBDExportInfo {
> uint32_t min_block;
> uint32_t opt_block;
> uint32_t max_block;
> + uint32_t min_trim;
> + uint32_t max_trim;
> + uint32_t min_zero;
> + uint32_t max_zero;
>
> uint32_t meta_base_allocation_id;
> };
> diff --git a/block/nbd.c b/block/nbd.c
> index 1e2b3ba2d3b..823d10b251d 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -478,8 +478,19 @@ static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
> uint32_t max = MIN_NON_ZERO(NBD_MAX_BUFFER_SIZE, s->info.max_block);
>
> bs->bl.request_alignment = min ? min : BDRV_SECTOR_SIZE;
> - bs->bl.max_pdiscard = max;
> - bs->bl.max_pwrite_zeroes = max;
> + if (s->info.max_trim) {
> + bs->bl.max_pdiscard = MIN(s->info.max_trim, BDRV_REQUEST_MAX_BYTES);
> + } else {
> + bs->bl.max_pdiscard = max;
> + }
> + bs->bl.pdiscard_alignment = s->info.min_trim;
> + if (s->info.max_zero) {
> + bs->bl.max_pwrite_zeroes = MIN(s->info.max_zero,
> + BDRV_REQUEST_MAX_BYTES);
> + } else {
> + bs->bl.max_pwrite_zeroes = max;
> + }
> + bs->bl.pwrite_zeroes_alignment = s->info.min_zero;
> bs->bl.max_transfer = max;
Should not max_transfer be updated too? Looks like it limits all
requests, is it right?
Best regards,
Vladimir
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] [PATCH 3/4] nbd/client: Support requests of additional block sizing info
2018-05-22 15:33 ` Vladimir Sementsov-Ogievskiy
@ 2018-05-23 15:40 ` Eric Blake
0 siblings, 0 replies; 12+ messages in thread
From: Eric Blake @ 2018-05-23 15:40 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy, qemu-devel
Cc: qemu-block, edgar.kaziakhmedov, Paolo Bonzini, Kevin Wolf, Max Reitz
On 05/22/2018 10:33 AM, Vladimir Sementsov-Ogievskiy wrote:
> 02.05.2018 00:13, Eric Blake wrote:
>> The NBD spec is clarifying [1] that a server may want to advertise
>> different limits for READ/WRITE (in our case, 32M) than for
>> TRIM/ZERO (in our case, nearly 4G). Implement the client
>> side support for these alternate limits, by always requesting
>> the new information (a compliant server must ignore the
>> request if it is unknown), and by validating anything reported
>> by the server before populating the block layer limits.
I still need to do a v2 of this series based on feedback from the NBD
list, but answering your question in the meantime:
>> bs->bl.request_alignment = min ? min : BDRV_SECTOR_SIZE;
>> - bs->bl.max_pdiscard = max;
>> - bs->bl.max_pwrite_zeroes = max;
>> + if (s->info.max_trim) {
>> + bs->bl.max_pdiscard = MIN(s->info.max_trim,
>> BDRV_REQUEST_MAX_BYTES);
>> + } else {
>> + bs->bl.max_pdiscard = max;
>> + }
>> + bs->bl.pdiscard_alignment = s->info.min_trim;
>> + if (s->info.max_zero) {
>> + bs->bl.max_pwrite_zeroes = MIN(s->info.max_zero,
>> + BDRV_REQUEST_MAX_BYTES);
>> + } else {
>> + bs->bl.max_pwrite_zeroes = max;
>> + }
>> + bs->bl.pwrite_zeroes_alignment = s->info.min_zero;
>> bs->bl.max_transfer = max;
>
> Should not max_transfer be updated too? Looks like it limits all
> requests, is it right?
max_transfer affects read and write requests, but not write_zero
requests (where max_pwrite_zeroes is used instead) or trim requests
(where max_pdiscard is used instead). Which is precisely the semantics
we want - the NBD extension is mirroring the fact that qemu already has
a way to track independent limits for trim/zero that can be larger than
the normal limits for read/write, because trim/zero do not involve a
transfer of a large data buffer.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2018-05-23 15:41 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-01 21:13 [Qemu-devel] [PATCH 0/4] Support larger NBD_CMD_WRITE_ZEROES Eric Blake
2018-05-01 21:13 ` [Qemu-devel] [PATCH 1/4] nbd: Prepare for additional block sizing info Eric Blake
2018-05-03 8:20 ` Vladimir Sementsov-Ogievskiy
2018-05-01 21:13 ` [Qemu-devel] [PATCH 2/4] nbd/client: Refactor in preparation for more limits Eric Blake
2018-05-03 8:29 ` Vladimir Sementsov-Ogievskiy
2018-05-01 21:13 ` [Qemu-devel] [PATCH 3/4] nbd/client: Support requests of additional block sizing info Eric Blake
2018-05-03 9:17 ` Vladimir Sementsov-Ogievskiy
2018-05-03 14:49 ` Eric Blake
2018-05-03 15:51 ` Vladimir Sementsov-Ogievskiy
2018-05-22 15:33 ` Vladimir Sementsov-Ogievskiy
2018-05-23 15:40 ` Eric Blake
2018-05-01 21:13 ` [Qemu-devel] [PATCH 4/4] nbd/server: " Eric Blake
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.