* Re: [Qemu-devel] [PATCH v5 4/6] nbd/server: implement dirty bitmap export
2018-06-09 15:17 ` [Qemu-devel] [PATCH v5 4/6] nbd/server: implement dirty bitmap export Vladimir Sementsov-Ogievskiy
@ 2018-06-20 11:24 ` Eric Blake
2018-06-20 14:04 ` Vladimir Sementsov-Ogievskiy
2018-06-20 15:43 ` Eric Blake
2018-06-20 16:27 ` Eric Blake
` (2 subsequent siblings)
3 siblings, 2 replies; 34+ messages in thread
From: Eric Blake @ 2018-06-20 11:24 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
Cc: armbru, pbonzini, mreitz, kwolf, den
On 06/09/2018 10:17 AM, Vladimir Sementsov-Ogievskiy wrote:
> Handle new NBD meta namespace: "qemu", and corresponding queries:
s/new/a new/
> "qemu:dirty-bitmap:<export bitmap name>".
>
> With new metadata context negotiated, BLOCK_STATUS query will reply
s/new/the new/
> with dirty-bitmap data, converted to extents. New public function
s/New/The new/
> nbd_export_bitmap selects bitmap to export. For now, only one bitmap
s/bitmap/which bitmap/
> may be exported.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
> include/block/nbd.h | 6 ++
> nbd/server.c | 271 ++++++++++++++++++++++++++++++++++++++++++++++++----
> 2 files changed, 259 insertions(+), 18 deletions(-)
>
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index fcdcd54502..a653d0fba7 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -234,6 +234,10 @@ enum {
> #define NBD_STATE_HOLE (1 << 0)
> #define NBD_STATE_ZERO (1 << 1)
>
> +/* Flags for extents (NBDExtent.flags) of NBD_REPLY_TYPE_BLOCK_STATUS,
> + * for qemu:dirty-bitmap:* meta contexts */
Comment tail.
> +#define NBD_STATE_DIRTY (1 << 0)
> +
> static inline bool nbd_reply_type_is_error(int type)
> {
> return type & (1 << 15);
> @@ -315,6 +319,8 @@ void nbd_client_put(NBDClient *client);
> void nbd_server_start(SocketAddress *addr, const char *tls_creds,
> Error **errp);
>
> +void nbd_export_bitmap(NBDExport *exp, const char *bitmap,
> + const char *bitmap_export_name, Error **errp);
>
> /* nbd_read
> * Reads @size bytes from @ioc. Returns 0 on success.
> diff --git a/nbd/server.c b/nbd/server.c
> index 2d762d7289..569e068fc2 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -23,6 +23,12 @@
> #include "nbd-internal.h"
>
> #define NBD_META_ID_BASE_ALLOCATION 0
> +#define NBD_META_ID_DIRTY_BITMAP 1
> +
> +/* NBD_MAX_BITMAP_EXTENTS: 1 mb of extents data. An empirical constant. If need
> + * to increase, note that NBD protocol defines 32 mb as a limit, after which the
s/need to increase/an increase is needed/
> + * reply may be considered as a denial of service attack. */
> +#define NBD_MAX_BITMAP_EXTENTS (0x100000 / 8)
>
> static int system_errno_to_nbd_errno(int err)
> {
> @@ -80,6 +86,9 @@ struct NBDExport {
>
> BlockBackend *eject_notifier_blk;
> Notifier eject_notifier;
> +
> + BdrvDirtyBitmap *export_bitmap;
> + char *export_bitmap_context;
> };
>
> static QTAILQ_HEAD(, NBDExport) exports = QTAILQ_HEAD_INITIALIZER(exports);
> @@ -92,6 +101,7 @@ typedef struct NBDExportMetaContexts {
> bool valid; /* means that negotiation of the option finished without
> errors */
> bool base_allocation; /* export base:allocation context (block status) */
> + bool dirty_bitmap; /* export qemu:dirty-bitmap:<export bitmap name> */
> } NBDExportMetaContexts;
>
> struct NBDClient {
> @@ -810,6 +820,55 @@ static int nbd_meta_base_query(NBDClient *client, NBDExportMetaContexts *meta,
> &meta->base_allocation, errp);
> }
>
> +/* nbd_meta_bitmap_query
> + *
> + * Handle query to 'qemu:' namespace.
> + * @len is the amount of text remaining to be read from the current name, after
> + * the 'qemu:' portion has been stripped.
> + *
> + * Return -errno on I/O error, 0 if option was completely handled by
> + * sending a reply about inconsistent lengths, or 1 on success. */
> +static int nbd_meta_qemu_query(NBDClient *client, NBDExportMetaContexts *meta,
> + uint32_t len, Error **errp)
> +{
> + bool dirty_bitmap = false;
> + int dirty_bitmap_len = strlen("dirty-bitmap:");
size_t is better for strlen() values.
> + int ret;
> +
> + if (!meta->exp->export_bitmap) {
> + return nbd_opt_skip(client, len, errp);
> + }
Worth a trace?...
> +
> + if (len == 0) {
> + if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
> + meta->dirty_bitmap = true;
> + }
> + trace_nbd_negotiate_meta_query_parse("empty");
> + return 1;
> + }
> +
> + if (len < dirty_bitmap_len) {
> + trace_nbd_negotiate_meta_query_skip("not dirty-bitmap:");
> + return nbd_opt_skip(client, len, errp);
> + }
> +
...especially since other returns have traced the result.
I'm strongly thinking of adding a patch to QMP to add an x-context
parameter when creating an NBD client, in order to at least make testing
client/server interactions easier than just code inspection. Does not
have to be part of this series.
> + len -= dirty_bitmap_len;
> + ret = nbd_meta_pattern(client, "dirty-bitmap:", &dirty_bitmap, errp);
> + if (ret <= 0) {
> + return ret;
> + }
> + if (!dirty_bitmap) {
> + trace_nbd_negotiate_meta_query_skip("not dirty-bitmap:");
> + return nbd_opt_skip(client, len, errp);
> + }
> +
> + trace_nbd_negotiate_meta_query_parse("dirty-bitmap:");
> +
> + return nbd_meta_empty_or_pattern(
> + client, meta->exp->export_bitmap_context +
> + strlen("qemu:dirty_bitmap:"), len, &meta->dirty_bitmap, errp);
> +}
> +
> /* nbd_negotiate_meta_query
> *
> * Parse namespace name and call corresponding function to parse body of the
> @@ -826,8 +885,10 @@ static int nbd_negotiate_meta_query(NBDClient *client,
> NBDExportMetaContexts *meta, Error **errp)
> {
> int ret;
> - char query[sizeof("base:") - 1];
> - size_t baselen = strlen("base:");
> + size_t ns_len = 5;
> + char ns[5]; /* Both 'qemu' and 'base' namespaces have length = 5 including a
> + colon. If it's needed to introduce a namespace of the other
> + length, this should be certainly refactored. */
s/be certainly/certainly be/
...
> @@ -1793,6 +1871,9 @@ static int blockstatus_to_extent_be(BlockDriverState *bs, uint64_t offset,
> }
>
> /* nbd_co_send_extents
> + *
> + * NBD_REPLY_FLAG_DONE is not set, don't forget to send it.
> + *
> * @extents should be in big-endian */
> static int nbd_co_send_extents(NBDClient *client, uint64_t handle,
> NBDExtent *extents, unsigned nb_extents,
> @@ -1805,7 +1886,7 @@ static int nbd_co_send_extents(NBDClient *client, uint64_t handle,
> {.iov_base = extents, .iov_len = nb_extents * sizeof(extents[0])}
> };
>
> - set_be_chunk(&chunk.h, NBD_REPLY_FLAG_DONE, NBD_REPLY_TYPE_BLOCK_STATUS,
> + set_be_chunk(&chunk.h, 0, NBD_REPLY_TYPE_BLOCK_STATUS,
Worth a bool parameter to send the flag automatically on the last context?
> handle, sizeof(chunk) - sizeof(chunk.h) + iov[1].iov_len);
> stl_be_p(&chunk.context_id, context_id);
>
> @@ -1830,6 +1911,97 @@ static int nbd_co_send_block_status(NBDClient *client, uint64_t handle,
> return nbd_co_send_extents(client, handle, &extent, 1, context_id, errp);
> }
>
> +/* Set several extents, describing region of given @length with given @flags.
> + * Do not set more than @nb_extents, return number of set extents.
> + */
> +static unsigned add_extents(NBDExtent *extents, unsigned nb_extents,
> + uint64_t length, uint32_t flags)
> +{
> + unsigned i = 0;
> + uint32_t max_extent = QEMU_ALIGN_DOWN(INT32_MAX, BDRV_SECTOR_SIZE);
This is too small of a granularity wrong when the server advertised 4k
alignment during NBD_OPT_GO; it should probably refer to
bs->bl.request_alignment.
> + uint32_t max_extent_be = cpu_to_be32(max_extent);
> + uint32_t flags_be = cpu_to_be32(flags);
> +
> + for (i = 0; i < nb_extents && length > max_extent;
> + i++, length -= max_extent)
> + {
> + extents[i].length = max_extent_be;
> + extents[i].flags = flags_be;
> + }
> +
> + if (length > 0 && i < nb_extents) {
> + extents[i].length = cpu_to_be32(length);
> + extents[i].flags = flags_be;
> + i++;
> + }
> +
> + return i;
> +}
> +
> +static unsigned bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset,
> + uint64_t length, NBDExtent *extents,
> + unsigned nb_extents, bool dont_fragment)
> +{
> + uint64_t begin = offset, end;
> + uint64_t overall_end = offset + length;
> + unsigned i = 0;
> + BdrvDirtyBitmapIter *it;
> + bool dirty;
> +
> + bdrv_dirty_bitmap_lock(bitmap);
> +
> + it = bdrv_dirty_iter_new(bitmap);
> + dirty = bdrv_get_dirty_locked(NULL, bitmap, offset);
> +
> + while (begin < overall_end && i < nb_extents) {
> + if (dirty) {
> + end = bdrv_dirty_bitmap_next_zero(bitmap, begin);
> + } else {
> + bdrv_set_dirty_iter(it, begin);
> + end = bdrv_dirty_iter_next(it);
> + }
> + if (end == -1) {
> + end = bdrv_dirty_bitmap_size(bitmap);
> + }
> + if (dont_fragment && end > overall_end) {
> + /* We MUST not exceed requested region if NBD_CMD_FLAG_REQ_ONE flag
> + * is set */
> + end = overall_end;
> + }
> +
> + i += add_extents(extents + i, nb_extents - i, end - begin,
> + dirty ? NBD_STATE_DIRTY : 0);
> + begin = end;
> + dirty = !dirty;
> + }
> +
> + bdrv_dirty_iter_free(it);
> +
> + bdrv_dirty_bitmap_unlock(bitmap);
> +
> + return i;
> +}
> +
> +static int nbd_co_send_bitmap(NBDClient *client, uint64_t handle,
> + BdrvDirtyBitmap *bitmap, uint64_t offset,
> + uint64_t length, bool dont_fragment,
> + uint32_t context_id, Error **errp)
> +{
> + int ret;
> + unsigned nb_extents = dont_fragment ? 1 : NBD_MAX_BITMAP_EXTENTS;
> + NBDExtent *extents = g_new(NBDExtent, nb_extents);
> +
> + nb_extents = bitmap_to_extents(bitmap, offset, length, extents, nb_extents,
> + dont_fragment);
> +
> + ret = nbd_co_send_extents(client, handle, extents, nb_extents, context_id,
> + errp);
Worth a trace when extents are sent?
> +
> + g_free(extents);
> +
> + return ret;
> +}
> +
> /* nbd_co_receive_request
> * Collect a client request. Return 0 if request looks valid, -EIO to drop
> * connection right away, and any other negative value to report an error to
> @@ -2043,11 +2215,33 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
> "discard failed", errp);
>
> case NBD_CMD_BLOCK_STATUS:
> - if (client->export_meta.valid && client->export_meta.base_allocation) {
> - return nbd_co_send_block_status(client, request->handle,
> - blk_bs(exp->blk), request->from,
> - request->len,
> - NBD_META_ID_BASE_ALLOCATION, errp);
> + if (client->export_meta.valid &&
> + (client->export_meta.base_allocation ||
> + client->export_meta.dirty_bitmap))
> + {
> + if (client->export_meta.base_allocation) {
> + ret = nbd_co_send_block_status(client, request->handle,
> + blk_bs(exp->blk), request->from,
> + request->len,
> + NBD_META_ID_BASE_ALLOCATION,
> + errp);
> + if (ret < 0) {
> + return ret;
> + }
> + }
> +
> + if (client->export_meta.dirty_bitmap) {
> + ret = nbd_co_send_bitmap(client, request->handle,
> + client->exp->export_bitmap,
> + request->from, request->len,
> + request->flags & NBD_CMD_FLAG_REQ_ONE,
> + NBD_META_ID_DIRTY_BITMAP, errp);
> + if (ret < 0) {
> + return ret;
> + }
> + }
> +
> + return nbd_co_send_structured_done(client, request->handle, errp);
> } else {
> return nbd_send_generic_reply(client, request->handle, -EINVAL,
> "CMD_BLOCK_STATUS not negotiated",
> @@ -2199,3 +2393,44 @@ void nbd_client_new(NBDExport *exp,
> co = qemu_coroutine_create(nbd_co_client_start, client);
> qemu_coroutine_enter(co);
> }
> +
> +void nbd_export_bitmap(NBDExport *exp, const char *bitmap,
> + const char *bitmap_export_name, Error **errp)
> +{
> + BdrvDirtyBitmap *bm = NULL;
> + BlockDriverState *bs = blk_bs(exp->blk);
> +
> + if (exp->export_bitmap) {
> + error_setg(errp, "Export bitmap is already set");
> + return;
> + }
> +
> + while (true) {
> + bm = bdrv_find_dirty_bitmap(bs, bitmap);
> + if (bm != NULL || bs->backing == NULL) {
> + break;
> + }
> +
> + bs = bs->backing->bs;
> + }
Is searching for the dirty bitmap in the backing chain always the wisest
thing to do? I guess it works, but it seems a bit odd on first glance.
> +
> + if (bm == NULL) {
> + error_setg(errp, "Bitmap '%s' is not found", bitmap);
> + return;
> + }
> +
> + if (bdrv_dirty_bitmap_enabled(bm)) {
> + error_setg(errp, "Bitmap '%s' is enabled", bitmap);
> + return;
> + }
> +
> + if (bdrv_dirty_bitmap_qmp_locked(bm)) {
> + error_setg(errp, "Bitmap '%s' is locked", bitmap);
> + return;
> + }
> +
> + bdrv_dirty_bitmap_set_qmp_locked(bm, true);
> + exp->export_bitmap = bm;
> + exp->export_bitmap_context =
> + g_strdup_printf("qemu:dirty-bitmap:%s", bitmap_export_name);
> +}
>
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH v5 4/6] nbd/server: implement dirty bitmap export
2018-06-20 11:24 ` Eric Blake
@ 2018-06-20 14:04 ` Vladimir Sementsov-Ogievskiy
2018-06-20 15:43 ` Eric Blake
1 sibling, 0 replies; 34+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-06-20 14:04 UTC (permalink / raw)
To: Eric Blake, qemu-block, qemu-devel; +Cc: armbru, pbonzini, mreitz, kwolf, den
20.06.2018 14:24, Eric Blake wrote:
> On 06/09/2018 10:17 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Handle new NBD meta namespace: "qemu", and corresponding queries:
>
> s/new/a new/
>
>> "qemu:dirty-bitmap:<export bitmap name>".
>>
>> With new metadata context negotiated, BLOCK_STATUS query will reply
>
> s/new/the new/
>
>> with dirty-bitmap data, converted to extents. New public function
>
> s/New/The new/
>
>> nbd_export_bitmap selects bitmap to export. For now, only one bitmap
>
> s/bitmap/which bitmap/
>
>> may be exported.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>> include/block/nbd.h | 6 ++
>> nbd/server.c | 271
>> ++++++++++++++++++++++++++++++++++++++++++++++++----
>> 2 files changed, 259 insertions(+), 18 deletions(-)
>>
>> diff --git a/include/block/nbd.h b/include/block/nbd.h
>> index fcdcd54502..a653d0fba7 100644
>> --- a/include/block/nbd.h
>> +++ b/include/block/nbd.h
>> @@ -234,6 +234,10 @@ enum {
>> #define NBD_STATE_HOLE (1 << 0)
>> #define NBD_STATE_ZERO (1 << 1)
>> +/* Flags for extents (NBDExtent.flags) of
>> NBD_REPLY_TYPE_BLOCK_STATUS,
>> + * for qemu:dirty-bitmap:* meta contexts */
>
> Comment tail.
>
>> +#define NBD_STATE_DIRTY (1 << 0)
>> +
>> static inline bool nbd_reply_type_is_error(int type)
>> {
>> return type & (1 << 15);
>> @@ -315,6 +319,8 @@ void nbd_client_put(NBDClient *client);
>> void nbd_server_start(SocketAddress *addr, const char *tls_creds,
>> Error **errp);
>> +void nbd_export_bitmap(NBDExport *exp, const char *bitmap,
>> + const char *bitmap_export_name, Error **errp);
>> /* nbd_read
>> * Reads @size bytes from @ioc. Returns 0 on success.
>> diff --git a/nbd/server.c b/nbd/server.c
>> index 2d762d7289..569e068fc2 100644
>> --- a/nbd/server.c
>> +++ b/nbd/server.c
>> @@ -23,6 +23,12 @@
>> #include "nbd-internal.h"
>> #define NBD_META_ID_BASE_ALLOCATION 0
>> +#define NBD_META_ID_DIRTY_BITMAP 1
>> +
>> +/* NBD_MAX_BITMAP_EXTENTS: 1 mb of extents data. An empirical
>> constant. If need
>> + * to increase, note that NBD protocol defines 32 mb as a limit,
>> after which the
>
> s/need to increase/an increase is needed/
>
>> + * reply may be considered as a denial of service attack. */
>> +#define NBD_MAX_BITMAP_EXTENTS (0x100000 / 8)
>> static int system_errno_to_nbd_errno(int err)
>> {
>> @@ -80,6 +86,9 @@ struct NBDExport {
>> BlockBackend *eject_notifier_blk;
>> Notifier eject_notifier;
>> +
>> + BdrvDirtyBitmap *export_bitmap;
>> + char *export_bitmap_context;
>> };
>> static QTAILQ_HEAD(, NBDExport) exports =
>> QTAILQ_HEAD_INITIALIZER(exports);
>> @@ -92,6 +101,7 @@ typedef struct NBDExportMetaContexts {
>> bool valid; /* means that negotiation of the option finished
>> without
>> errors */
>> bool base_allocation; /* export base:allocation context (block
>> status) */
>> + bool dirty_bitmap; /* export qemu:dirty-bitmap:<export bitmap
>> name> */
>> } NBDExportMetaContexts;
>> struct NBDClient {
>> @@ -810,6 +820,55 @@ static int nbd_meta_base_query(NBDClient
>> *client, NBDExportMetaContexts *meta,
>> &meta->base_allocation, errp);
>> }
>> +/* nbd_meta_bitmap_query
>> + *
>> + * Handle query to 'qemu:' namespace.
>> + * @len is the amount of text remaining to be read from the current
>> name, after
>> + * the 'qemu:' portion has been stripped.
>> + *
>> + * Return -errno on I/O error, 0 if option was completely handled by
>> + * sending a reply about inconsistent lengths, or 1 on success. */
>> +static int nbd_meta_qemu_query(NBDClient *client,
>> NBDExportMetaContexts *meta,
>> + uint32_t len, Error **errp)
>> +{
>> + bool dirty_bitmap = false;
>> + int dirty_bitmap_len = strlen("dirty-bitmap:");
>
> size_t is better for strlen() values.
>
>> + int ret;
>> +
>> + if (!meta->exp->export_bitmap) {
>> + return nbd_opt_skip(client, len, errp);
>> + }
>
> Worth a trace?...
>
>> +
>> + if (len == 0) {
>> + if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
>> + meta->dirty_bitmap = true;
>> + }
>> + trace_nbd_negotiate_meta_query_parse("empty");
>> + return 1;
>> + }
>> +
>> + if (len < dirty_bitmap_len) {
>> + trace_nbd_negotiate_meta_query_skip("not dirty-bitmap:");
>> + return nbd_opt_skip(client, len, errp);
>> + }
>> +
>
> ...especially since other returns have traced the result.
>
> I'm strongly thinking of adding a patch to QMP to add an x-context
> parameter when creating an NBD client, in order to at least make
> testing client/server interactions easier than just code inspection.
> Does not have to be part of this series.
>
>> + len -= dirty_bitmap_len;
>> + ret = nbd_meta_pattern(client, "dirty-bitmap:", &dirty_bitmap,
>> errp);
>> + if (ret <= 0) {
>> + return ret;
>> + }
>> + if (!dirty_bitmap) {
>> + trace_nbd_negotiate_meta_query_skip("not dirty-bitmap:");
>> + return nbd_opt_skip(client, len, errp);
>> + }
>> +
>> + trace_nbd_negotiate_meta_query_parse("dirty-bitmap:");
>> +
>> + return nbd_meta_empty_or_pattern(
>> + client, meta->exp->export_bitmap_context +
>> + strlen("qemu:dirty_bitmap:"), len, &meta->dirty_bitmap,
>> errp);
>> +}
>> +
>> /* nbd_negotiate_meta_query
>> *
>> * Parse namespace name and call corresponding function to parse
>> body of the
>> @@ -826,8 +885,10 @@ static int nbd_negotiate_meta_query(NBDClient
>> *client,
>> NBDExportMetaContexts *meta,
>> Error **errp)
>> {
>> int ret;
>> - char query[sizeof("base:") - 1];
>> - size_t baselen = strlen("base:");
>> + size_t ns_len = 5;
>> + char ns[5]; /* Both 'qemu' and 'base' namespaces have length = 5
>> including a
>> + colon. If it's needed to introduce a namespace of
>> the other
>> + length, this should be certainly refactored. */
>
> s/be certainly/certainly be/
>
> ...
>
>> @@ -1793,6 +1871,9 @@ static int
>> blockstatus_to_extent_be(BlockDriverState *bs, uint64_t offset,
>> }
>> /* nbd_co_send_extents
>> + *
>> + * NBD_REPLY_FLAG_DONE is not set, don't forget to send it.
>> + *
>> * @extents should be in big-endian */
>> static int nbd_co_send_extents(NBDClient *client, uint64_t handle,
>> NBDExtent *extents, unsigned
>> nb_extents,
>> @@ -1805,7 +1886,7 @@ static int nbd_co_send_extents(NBDClient
>> *client, uint64_t handle,
>> {.iov_base = extents, .iov_len = nb_extents *
>> sizeof(extents[0])}
>> };
>> - set_be_chunk(&chunk.h, NBD_REPLY_FLAG_DONE,
>> NBD_REPLY_TYPE_BLOCK_STATUS,
>> + set_be_chunk(&chunk.h, 0, NBD_REPLY_TYPE_BLOCK_STATUS,
>
> Worth a bool parameter to send the flag automatically on the last
> context?
>
>> handle, sizeof(chunk) - sizeof(chunk.h) +
>> iov[1].iov_len);
>> stl_be_p(&chunk.context_id, context_id);
>> @@ -1830,6 +1911,97 @@ static int
>> nbd_co_send_block_status(NBDClient *client, uint64_t handle,
>> return nbd_co_send_extents(client, handle, &extent, 1,
>> context_id, errp);
>> }
>> +/* Set several extents, describing region of given @length with
>> given @flags.
>> + * Do not set more than @nb_extents, return number of set extents.
>> + */
>> +static unsigned add_extents(NBDExtent *extents, unsigned nb_extents,
>> + uint64_t length, uint32_t flags)
>> +{
>> + unsigned i = 0;
>> + uint32_t max_extent = QEMU_ALIGN_DOWN(INT32_MAX, BDRV_SECTOR_SIZE);
>
> This is too small of a granularity wrong when the server advertised 4k
> alignment during NBD_OPT_GO; it should probably refer to
> bs->bl.request_alignment.
>
>> + uint32_t max_extent_be = cpu_to_be32(max_extent);
>> + uint32_t flags_be = cpu_to_be32(flags);
>> +
>> + for (i = 0; i < nb_extents && length > max_extent;
>> + i++, length -= max_extent)
>> + {
>> + extents[i].length = max_extent_be;
>> + extents[i].flags = flags_be;
>> + }
>> +
>> + if (length > 0 && i < nb_extents) {
>> + extents[i].length = cpu_to_be32(length);
>> + extents[i].flags = flags_be;
>> + i++;
>> + }
>> +
>> + return i;
>> +}
>> +
>> +static unsigned bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t
>> offset,
>> + uint64_t length, NBDExtent *extents,
>> + unsigned nb_extents, bool
>> dont_fragment)
>> +{
>> + uint64_t begin = offset, end;
>> + uint64_t overall_end = offset + length;
>> + unsigned i = 0;
>> + BdrvDirtyBitmapIter *it;
>> + bool dirty;
>> +
>> + bdrv_dirty_bitmap_lock(bitmap);
>> +
>> + it = bdrv_dirty_iter_new(bitmap);
>> + dirty = bdrv_get_dirty_locked(NULL, bitmap, offset);
>> +
>> + while (begin < overall_end && i < nb_extents) {
>> + if (dirty) {
>> + end = bdrv_dirty_bitmap_next_zero(bitmap, begin);
>> + } else {
>> + bdrv_set_dirty_iter(it, begin);
>> + end = bdrv_dirty_iter_next(it);
>> + }
>> + if (end == -1) {
>> + end = bdrv_dirty_bitmap_size(bitmap);
>> + }
>> + if (dont_fragment && end > overall_end) {
>> + /* We MUST not exceed requested region if
>> NBD_CMD_FLAG_REQ_ONE flag
>> + * is set */
>> + end = overall_end;
>> + }
>> +
>> + i += add_extents(extents + i, nb_extents - i, end - begin,
>> + dirty ? NBD_STATE_DIRTY : 0);
>> + begin = end;
>> + dirty = !dirty;
>> + }
>> +
>> + bdrv_dirty_iter_free(it);
>> +
>> + bdrv_dirty_bitmap_unlock(bitmap);
>> +
>> + return i;
>> +}
>> +
>> +static int nbd_co_send_bitmap(NBDClient *client, uint64_t handle,
>> + BdrvDirtyBitmap *bitmap, uint64_t offset,
>> + uint64_t length, bool dont_fragment,
>> + uint32_t context_id, Error **errp)
>> +{
>> + int ret;
>> + unsigned nb_extents = dont_fragment ? 1 : NBD_MAX_BITMAP_EXTENTS;
>> + NBDExtent *extents = g_new(NBDExtent, nb_extents);
>> +
>> + nb_extents = bitmap_to_extents(bitmap, offset, length, extents,
>> nb_extents,
>> + dont_fragment);
>> +
>> + ret = nbd_co_send_extents(client, handle, extents, nb_extents,
>> context_id,
>> + errp);
>
> Worth a trace when extents are sent?
>
>> +
>> + g_free(extents);
>> +
>> + return ret;
>> +}
>> +
>> /* nbd_co_receive_request
>> * Collect a client request. Return 0 if request looks valid, -EIO
>> to drop
>> * connection right away, and any other negative value to report an
>> error to
>> @@ -2043,11 +2215,33 @@ static coroutine_fn int
>> nbd_handle_request(NBDClient *client,
>> "discard failed", errp);
>> case NBD_CMD_BLOCK_STATUS:
>> - if (client->export_meta.valid &&
>> client->export_meta.base_allocation) {
>> - return nbd_co_send_block_status(client, request->handle,
>> - blk_bs(exp->blk), request->from,
>> - request->len,
>> - NBD_META_ID_BASE_ALLOCATION, errp);
>> + if (client->export_meta.valid &&
>> + (client->export_meta.base_allocation ||
>> + client->export_meta.dirty_bitmap))
>> + {
>> + if (client->export_meta.base_allocation) {
>> + ret = nbd_co_send_block_status(client, request->handle,
>> + blk_bs(exp->blk), request->from,
>> + request->len,
>> + NBD_META_ID_BASE_ALLOCATION,
>> + errp);
>> + if (ret < 0) {
>> + return ret;
>> + }
>> + }
>> +
>> + if (client->export_meta.dirty_bitmap) {
>> + ret = nbd_co_send_bitmap(client, request->handle,
>> + client->exp->export_bitmap,
>> + request->from, request->len,
>> + request->flags &
>> NBD_CMD_FLAG_REQ_ONE,
>> + NBD_META_ID_DIRTY_BITMAP, errp);
>> + if (ret < 0) {
>> + return ret;
>> + }
>> + }
>> +
>> + return nbd_co_send_structured_done(client,
>> request->handle, errp);
>> } else {
>> return nbd_send_generic_reply(client, request->handle,
>> -EINVAL,
>> "CMD_BLOCK_STATUS not
>> negotiated",
>> @@ -2199,3 +2393,44 @@ void nbd_client_new(NBDExport *exp,
>> co = qemu_coroutine_create(nbd_co_client_start, client);
>> qemu_coroutine_enter(co);
>> }
>> +
>> +void nbd_export_bitmap(NBDExport *exp, const char *bitmap,
>> + const char *bitmap_export_name, Error **errp)
>> +{
>> + BdrvDirtyBitmap *bm = NULL;
>> + BlockDriverState *bs = blk_bs(exp->blk);
>> +
>> + if (exp->export_bitmap) {
>> + error_setg(errp, "Export bitmap is already set");
>> + return;
>> + }
>> +
>> + while (true) {
>> + bm = bdrv_find_dirty_bitmap(bs, bitmap);
>> + if (bm != NULL || bs->backing == NULL) {
>> + break;
>> + }
>> +
>> + bs = bs->backing->bs;
>> + }
>
> Is searching for the dirty bitmap in the backing chain always the
> wisest thing to do? I guess it works, but it seems a bit odd on first
> glance.
I'm not sure about "always", but external backup with fleecing related
filter node (nodes) don't work without this search.. May be, we should
specify node name in the command, and check here that specified node is
in backing chain.
>
>> +
>> + if (bm == NULL) {
>> + error_setg(errp, "Bitmap '%s' is not found", bitmap);
>> + return;
>> + }
>> +
>> + if (bdrv_dirty_bitmap_enabled(bm)) {
>> + error_setg(errp, "Bitmap '%s' is enabled", bitmap);
>> + return;
>> + }
>> +
>> + if (bdrv_dirty_bitmap_qmp_locked(bm)) {
>> + error_setg(errp, "Bitmap '%s' is locked", bitmap);
>> + return;
>> + }
>> +
>> + bdrv_dirty_bitmap_set_qmp_locked(bm, true);
>> + exp->export_bitmap = bm;
>> + exp->export_bitmap_context =
>> + g_strdup_printf("qemu:dirty-bitmap:%s",
>> bitmap_export_name);
>> +}
>>
>
--
Best regards,
Vladimir
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH v5 4/6] nbd/server: implement dirty bitmap export
2018-06-20 11:24 ` Eric Blake
2018-06-20 14:04 ` Vladimir Sementsov-Ogievskiy
@ 2018-06-20 15:43 ` Eric Blake
2018-06-20 15:58 ` Eric Blake
1 sibling, 1 reply; 34+ messages in thread
From: Eric Blake @ 2018-06-20 15:43 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
Cc: kwolf, pbonzini, den, armbru, mreitz
On 06/20/2018 06:24 AM, Eric Blake wrote:
>> +/* Set several extents, describing region of given @length with given
>> @flags.
>> + * Do not set more than @nb_extents, return number of set extents.
>> + */
>> +static unsigned add_extents(NBDExtent *extents, unsigned nb_extents,
>> + uint64_t length, uint32_t flags)
>> +{
>> + unsigned i = 0;
>> + uint32_t max_extent = QEMU_ALIGN_DOWN(INT32_MAX, BDRV_SECTOR_SIZE);
>
> This is too small of a granularity wrong when the server advertised 4k
> alignment during NBD_OPT_GO; it should probably refer to
> bs->bl.request_alignment.
In fact, we can just use INT32_MAX. The dirty bitmap has a granularity
at least as large as the sector size, but no smaller than the
request_alignment. We don't have to worry about alignment here, as the
extents will already be naturally aligned when converting from the
bitmap into extents in the caller.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH v5 4/6] nbd/server: implement dirty bitmap export
2018-06-20 15:43 ` Eric Blake
@ 2018-06-20 15:58 ` Eric Blake
0 siblings, 0 replies; 34+ messages in thread
From: Eric Blake @ 2018-06-20 15:58 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
Cc: kwolf, pbonzini, mreitz, armbru, den
On 06/20/2018 10:43 AM, Eric Blake wrote:
> On 06/20/2018 06:24 AM, Eric Blake wrote:
>
>>> +/* Set several extents, describing region of given @length with
>>> given @flags.
>>> + * Do not set more than @nb_extents, return number of set extents.
>>> + */
>>> +static unsigned add_extents(NBDExtent *extents, unsigned nb_extents,
>>> + uint64_t length, uint32_t flags)
>>> +{
>>> + unsigned i = 0;
>>> + uint32_t max_extent = QEMU_ALIGN_DOWN(INT32_MAX, BDRV_SECTOR_SIZE);
>>
>> This is too small of a granularity wrong when the server advertised 4k
>> alignment during NBD_OPT_GO; it should probably refer to
>> bs->bl.request_alignment.
>
> In fact, we can just use INT32_MAX. The dirty bitmap has a granularity
> at least as large as the sector size, but no smaller than the
> request_alignment. We don't have to worry about alignment here, as the
> extents will already be naturally aligned when converting from the
> bitmap into extents in the caller.
Oh, I see. The NBD protocol can only ask for a length of up to 32 bits,
but if you learn via bitmap query that the entire rest of the image has
the same state, you can indeed call add_extents() with a value larger
than 32 bits, that needs to be fragmented back down into sub-32-bit
chunks. INT32_MAX isn't quite right, but rounding may still pick the
wrong granularity on an export with 4k alignment; however, INT32_MAX +
1U is just fine, and unlikely to run into alignment issues.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH v5 4/6] nbd/server: implement dirty bitmap export
2018-06-09 15:17 ` [Qemu-devel] [PATCH v5 4/6] nbd/server: implement dirty bitmap export Vladimir Sementsov-Ogievskiy
2018-06-20 11:24 ` Eric Blake
@ 2018-06-20 16:27 ` Eric Blake
2018-06-20 17:04 ` Vladimir Sementsov-Ogievskiy
2018-11-29 4:34 ` Eric Blake
2019-01-09 19:21 ` Eric Blake
3 siblings, 1 reply; 34+ messages in thread
From: Eric Blake @ 2018-06-20 16:27 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
Cc: armbru, pbonzini, mreitz, kwolf, den
On 06/09/2018 10:17 AM, Vladimir Sementsov-Ogievskiy wrote:
> Handle new NBD meta namespace: "qemu", and corresponding queries:
> "qemu:dirty-bitmap:<export bitmap name>".
>
> With new metadata context negotiated, BLOCK_STATUS query will reply
> with dirty-bitmap data, converted to extents. New public function
> nbd_export_bitmap selects bitmap to export. For now, only one bitmap
> may be exported.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
> include/block/nbd.h | 6 ++
> nbd/server.c | 271 ++++++++++++++++++++++++++++++++++++++++++++++++----
> 2 files changed, 259 insertions(+), 18 deletions(-)
I'm squashing this in, and adding:
Reviewed-by: Eric Blake <eblake@redhat.com>
(Grammar fixes, prefer 'unsigned int' over 'unsigned', add 'last' flag
for avoiding separate chunk in reply, shorter variable name to fit 80
columns, tweak 'length' tracking in order to add a trace before sending
a status reply)
diff --git i/include/block/nbd.h w/include/block/nbd.h
index a653d0fba79..8bb9606c39b 100644
--- i/include/block/nbd.h
+++ w/include/block/nbd.h
@@ -229,13 +229,11 @@ enum {
#define NBD_REPLY_TYPE_ERROR NBD_REPLY_ERR(1)
#define NBD_REPLY_TYPE_ERROR_OFFSET NBD_REPLY_ERR(2)
-/* Flags for extents (NBDExtent.flags) of NBD_REPLY_TYPE_BLOCK_STATUS,
- * for base:allocation meta context */
+/* Extent flags for base:allocation in NBD_REPLY_TYPE_BLOCK_STATUS */
#define NBD_STATE_HOLE (1 << 0)
#define NBD_STATE_ZERO (1 << 1)
-/* Flags for extents (NBDExtent.flags) of NBD_REPLY_TYPE_BLOCK_STATUS,
- * for qemu:dirty-bitmap:* meta contexts */
+/* Extent flags for qemu:dirty-bitmap in NBD_REPLY_TYPE_BLOCK_STATUS */
#define NBD_STATE_DIRTY (1 << 0)
static inline bool nbd_reply_type_is_error(int type)
diff --git i/nbd/server.c w/nbd/server.c
index 6f662bd4c7f..e84aea6cf03 100644
--- i/nbd/server.c
+++ w/nbd/server.c
@@ -25,9 +25,10 @@
#define NBD_META_ID_BASE_ALLOCATION 0
#define NBD_META_ID_DIRTY_BITMAP 1
-/* NBD_MAX_BITMAP_EXTENTS: 1 mb of extents data. An empirical constant.
If need
- * to increase, note that NBD protocol defines 32 mb as a limit, after
which the
- * reply may be considered as a denial of service attack. */
+/* NBD_MAX_BITMAP_EXTENTS: 1 mb of extents data. An empirical
+ * constant. If an increase is needed, note that the NBD protocol
+ * recommends no larger than 32 mb, so that the client won't consider
+ * the reply as a denial of service attack. */
#define NBD_MAX_BITMAP_EXTENTS (0x100000 / 8)
static int system_errno_to_nbd_errno(int err)
@@ -101,7 +102,7 @@ typedef struct NBDExportMetaContexts {
bool valid; /* means that negotiation of the option finished without
errors */
bool base_allocation; /* export base:allocation context (block
status) */
- bool dirty_bitmap; /* export qemu:dirty-bitmap:<export bitmap name> */
+ bool bitmap; /* export qemu:dirty-bitmap:<export bitmap name> */
} NBDExportMetaContexts;
struct NBDClient {
@@ -836,16 +837,17 @@ static int nbd_meta_qemu_query(NBDClient *client,
NBDExportMetaContexts *meta,
uint32_t len, Error **errp)
{
bool dirty_bitmap = false;
- int dirty_bitmap_len = strlen("dirty-bitmap:");
+ size_t dirty_bitmap_len = strlen("dirty-bitmap:");
int ret;
if (!meta->exp->export_bitmap) {
+ trace_nbd_negotiate_meta_query_skip("no dirty-bitmap exported");
return nbd_opt_skip(client, len, errp);
}
if (len == 0) {
if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
- meta->dirty_bitmap = true;
+ meta->bitmap = true;
}
trace_nbd_negotiate_meta_query_parse("empty");
return 1;
@@ -870,7 +872,7 @@ static int nbd_meta_qemu_query(NBDClient *client,
NBDExportMetaContexts *meta,
return nbd_meta_empty_or_pattern(
client, meta->exp->export_bitmap_context +
- strlen("qemu:dirty_bitmap:"), len, &meta->dirty_bitmap, errp);
+ strlen("qemu:dirty_bitmap:"), len, &meta->bitmap, errp);
}
/* nbd_negotiate_meta_query
@@ -888,11 +890,14 @@ static int nbd_meta_qemu_query(NBDClient *client,
NBDExportMetaContexts *meta,
static int nbd_negotiate_meta_query(NBDClient *client,
NBDExportMetaContexts *meta, Error
**errp)
{
+ /*
+ * Both 'qemu' and 'base' namespaces have length = 5 including a
+ * colon. If another length namespace is later introduced, this
+ * should certainly be refactored.
+ */
int ret;
size_t ns_len = 5;
- char ns[5]; /* Both 'qemu' and 'base' namespaces have length = 5
including a
- colon. If it's needed to introduce a namespace of
the other
- length, this should be certainly refactored. */
+ char ns[5];
uint32_t len;
ret = nbd_opt_read(client, &len, sizeof(len), errp);
@@ -991,7 +996,7 @@ static int nbd_negotiate_meta_queries(NBDClient *client,
}
}
- if (meta->dirty_bitmap) {
+ if (meta->bitmap) {
ret = nbd_negotiate_send_meta_context(client,
meta->exp->export_bitmap_context,
NBD_META_ID_DIRTY_BITMAP,
@@ -1876,11 +1881,13 @@ static int
blockstatus_to_extent_be(BlockDriverState *bs, uint64_t offset,
/* nbd_co_send_extents
*
- * NBD_REPLY_FLAG_DONE is not set, don't forget to send it.
+ * @last controls whether NBD_REPLY_FLAG_DONE is sent.
*
- * @extents should be in big-endian */
+ * @extents should already be in big-endian format.
+ */
static int nbd_co_send_extents(NBDClient *client, uint64_t handle,
- NBDExtent *extents, unsigned nb_extents,
+ NBDExtent *extents, unsigned int nb_extents,
+ uint64_t length, bool last,
uint32_t context_id, Error **errp)
{
NBDStructuredMeta chunk;
@@ -1890,7 +1897,9 @@ static int nbd_co_send_extents(NBDClient *client,
uint64_t handle,
{.iov_base = extents, .iov_len = nb_extents * sizeof(extents[0])}
};
- set_be_chunk(&chunk.h, 0, NBD_REPLY_TYPE_BLOCK_STATUS,
+ trace_nbd_co_send_extents(handle, nb_extents, context_id, length,
last);
+ set_be_chunk(&chunk.h, last ? NBD_REPLY_FLAG_DONE : 0,
+ NBD_REPLY_TYPE_BLOCK_STATUS,
handle, sizeof(chunk) - sizeof(chunk.h) +
iov[1].iov_len);
stl_be_p(&chunk.context_id, context_id);
@@ -1900,8 +1909,8 @@ static int nbd_co_send_extents(NBDClient *client,
uint64_t handle,
/* Get block status from the exported device and send it to the client */
static int nbd_co_send_block_status(NBDClient *client, uint64_t handle,
BlockDriverState *bs, uint64_t offset,
- uint64_t length, uint32_t context_id,
- Error **errp)
+ uint64_t length, bool last,
+ uint32_t context_id, Error **errp)
{
int ret;
NBDExtent extent;
@@ -1912,17 +1921,18 @@ static int nbd_co_send_block_status(NBDClient
*client, uint64_t handle,
client, handle, -ret, "can't get block status", errp);
}
- return nbd_co_send_extents(client, handle, &extent, 1, context_id,
errp);
+ return nbd_co_send_extents(client, handle, &extent, 1, length, last,
+ context_id, errp);
}
/* Set several extents, describing region of given @length with given
@flags.
* Do not set more than @nb_extents, return number of set extents.
*/
-static unsigned add_extents(NBDExtent *extents, unsigned nb_extents,
- uint64_t length, uint32_t flags)
+static unsigned int add_extents(NBDExtent *extents, unsigned nb_extents,
+ uint64_t length, uint32_t flags)
{
- unsigned i = 0;
- uint32_t max_extent = QEMU_ALIGN_DOWN(INT32_MAX, BDRV_SECTOR_SIZE);
+ unsigned int i = 0;
+ uint32_t max_extent = 0x80000000U;
uint32_t max_extent_be = cpu_to_be32(max_extent);
uint32_t flags_be = cpu_to_be32(flags);
@@ -1942,13 +1952,14 @@ static unsigned add_extents(NBDExtent *extents,
unsigned nb_extents,
return i;
}
-static unsigned bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset,
- uint64_t length, NBDExtent *extents,
- unsigned nb_extents, bool dont_fragment)
+static unsigned int bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t
offset,
+ uint64_t *length, NBDExtent *extents,
+ unsigned int nb_extents,
+ bool dont_fragment)
{
uint64_t begin = offset, end;
- uint64_t overall_end = offset + length;
- unsigned i = 0;
+ uint64_t overall_end = offset + *length;
+ unsigned int i = 0;
BdrvDirtyBitmapIter *it;
bool dirty;
@@ -1983,23 +1994,25 @@ static unsigned
bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset,
bdrv_dirty_bitmap_unlock(bitmap);
+ *length = end - begin;
return i;
}
static int nbd_co_send_bitmap(NBDClient *client, uint64_t handle,
BdrvDirtyBitmap *bitmap, uint64_t offset,
- uint64_t length, bool dont_fragment,
+ uint32_t length, bool dont_fragment,
uint32_t context_id, Error **errp)
{
int ret;
- unsigned nb_extents = dont_fragment ? 1 : NBD_MAX_BITMAP_EXTENTS;
+ unsigned int nb_extents = dont_fragment ? 1 : NBD_MAX_BITMAP_EXTENTS;
NBDExtent *extents = g_new(NBDExtent, nb_extents);
+ uint64_t final_length = length;
- nb_extents = bitmap_to_extents(bitmap, offset, length, extents,
nb_extents,
- dont_fragment);
+ nb_extents = bitmap_to_extents(bitmap, offset, &final_length, extents,
+ nb_extents, dont_fragment);
- ret = nbd_co_send_extents(client, handle, extents, nb_extents,
context_id,
- errp);
+ ret = nbd_co_send_extents(client, handle, extents, nb_extents,
+ final_length, true, context_id, errp);
g_free(extents);
@@ -2221,12 +2234,13 @@ static coroutine_fn int
nbd_handle_request(NBDClient *client,
case NBD_CMD_BLOCK_STATUS:
if (client->export_meta.valid &&
(client->export_meta.base_allocation ||
- client->export_meta.dirty_bitmap))
+ client->export_meta.bitmap))
{
if (client->export_meta.base_allocation) {
ret = nbd_co_send_block_status(client, request->handle,
blk_bs(exp->blk),
request->from,
request->len,
+ !client->export_meta.bitmap,
NBD_META_ID_BASE_ALLOCATION,
errp);
if (ret < 0) {
@@ -2234,7 +2248,7 @@ static coroutine_fn int
nbd_handle_request(NBDClient *client,
}
}
- if (client->export_meta.dirty_bitmap) {
+ if (client->export_meta.bitmap) {
ret = nbd_co_send_bitmap(client, request->handle,
client->exp->export_bitmap,
request->from, request->len,
diff --git i/nbd/trace-events w/nbd/trace-events
index dee081e7758..5e1d4afe8e6 100644
--- i/nbd/trace-events
+++ w/nbd/trace-events
@@ -64,6 +64,7 @@ nbd_co_send_simple_reply(uint64_t handle, uint32_t
error, const char *errname, i
nbd_co_send_structured_done(uint64_t handle) "Send structured reply
done: handle = %" PRIu64
nbd_co_send_structured_read(uint64_t handle, uint64_t offset, void
*data, size_t size) "Send structured read data reply: handle = %" PRIu64
", offset = %" PRIu64 ", data = %p, len = %zu"
nbd_co_send_structured_read_hole(uint64_t handle, uint64_t offset,
size_t size) "Send structured read hole reply: handle = %" PRIu64 ",
offset = %" PRIu64 ", len = %zu"
+nbd_co_send_extents(uint64_t handle, unsigned int extents, uint32_t id,
uint64_t length, int last) "Send block status reply: handle = %" PRIu64
", extents = %u, context = %d (extents cover %" PRIu64 " bytes, last
chunk = %d)"
nbd_co_send_structured_error(uint64_t handle, int err, const char
*errname, const char *msg) "Send structured error reply: handle = %"
PRIu64 ", error = %d (%s), msg = '%s'"
nbd_co_receive_request_decode_type(uint64_t handle, uint16_t type,
const char *name) "Decoding type: handle = %" PRIu64 ", type = %" PRIu16
" (%s)"
nbd_co_receive_request_payload_received(uint64_t handle, uint32_t len)
"Payload received: handle = %" PRIu64 ", len = %" PRIu32
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH v5 4/6] nbd/server: implement dirty bitmap export
2018-06-20 16:27 ` Eric Blake
@ 2018-06-20 17:04 ` Vladimir Sementsov-Ogievskiy
2018-06-20 18:09 ` Eric Blake
0 siblings, 1 reply; 34+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-06-20 17:04 UTC (permalink / raw)
To: Eric Blake, qemu-block, qemu-devel; +Cc: armbru, pbonzini, mreitz, kwolf, den
20.06.2018 19:27, Eric Blake wrote:
> On 06/09/2018 10:17 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Handle new NBD meta namespace: "qemu", and corresponding queries:
>> "qemu:dirty-bitmap:<export bitmap name>".
>>
>> With new metadata context negotiated, BLOCK_STATUS query will reply
>> with dirty-bitmap data, converted to extents. New public function
>> nbd_export_bitmap selects bitmap to export. For now, only one bitmap
>> may be exported.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>> include/block/nbd.h | 6 ++
>> nbd/server.c | 271
>> ++++++++++++++++++++++++++++++++++++++++++++++++----
>> 2 files changed, 259 insertions(+), 18 deletions(-)
>
> I'm squashing this in, and adding:
> Reviewed-by: Eric Blake <eblake@redhat.com>
>
> (Grammar fixes, prefer 'unsigned int' over 'unsigned', add 'last' flag
> for avoiding separate chunk in reply, shorter variable name to fit 80
> columns, tweak 'length' tracking in order to add a trace before
> sending a status reply)
>
> diff --git i/include/block/nbd.h w/include/block/nbd.h
> index a653d0fba79..8bb9606c39b 100644
> --- i/include/block/nbd.h
> +++ w/include/block/nbd.h
> @@ -229,13 +229,11 @@ enum {
> #define NBD_REPLY_TYPE_ERROR NBD_REPLY_ERR(1)
> #define NBD_REPLY_TYPE_ERROR_OFFSET NBD_REPLY_ERR(2)
>
> -/* Flags for extents (NBDExtent.flags) of NBD_REPLY_TYPE_BLOCK_STATUS,
> - * for base:allocation meta context */
> +/* Extent flags for base:allocation in NBD_REPLY_TYPE_BLOCK_STATUS */
> #define NBD_STATE_HOLE (1 << 0)
> #define NBD_STATE_ZERO (1 << 1)
>
> -/* Flags for extents (NBDExtent.flags) of NBD_REPLY_TYPE_BLOCK_STATUS,
> - * for qemu:dirty-bitmap:* meta contexts */
> +/* Extent flags for qemu:dirty-bitmap in NBD_REPLY_TYPE_BLOCK_STATUS */
> #define NBD_STATE_DIRTY (1 << 0)
>
> static inline bool nbd_reply_type_is_error(int type)
> diff --git i/nbd/server.c w/nbd/server.c
> index 6f662bd4c7f..e84aea6cf03 100644
> --- i/nbd/server.c
> +++ w/nbd/server.c
> @@ -25,9 +25,10 @@
> #define NBD_META_ID_BASE_ALLOCATION 0
> #define NBD_META_ID_DIRTY_BITMAP 1
>
> -/* NBD_MAX_BITMAP_EXTENTS: 1 mb of extents data. An empirical
> constant. If need
> - * to increase, note that NBD protocol defines 32 mb as a limit,
> after which the
> - * reply may be considered as a denial of service attack. */
> +/* NBD_MAX_BITMAP_EXTENTS: 1 mb of extents data. An empirical
> + * constant. If an increase is needed, note that the NBD protocol
> + * recommends no larger than 32 mb, so that the client won't consider
> + * the reply as a denial of service attack. */
> #define NBD_MAX_BITMAP_EXTENTS (0x100000 / 8)
>
> static int system_errno_to_nbd_errno(int err)
> @@ -101,7 +102,7 @@ typedef struct NBDExportMetaContexts {
> bool valid; /* means that negotiation of the option finished without
> errors */
> bool base_allocation; /* export base:allocation context (block
> status) */
> - bool dirty_bitmap; /* export qemu:dirty-bitmap:<export bitmap
> name> */
> + bool bitmap; /* export qemu:dirty-bitmap:<export bitmap name> */
> } NBDExportMetaContexts;
>
> struct NBDClient {
> @@ -836,16 +837,17 @@ static int nbd_meta_qemu_query(NBDClient
> *client, NBDExportMetaContexts *meta,
> uint32_t len, Error **errp)
> {
> bool dirty_bitmap = false;
> - int dirty_bitmap_len = strlen("dirty-bitmap:");
> + size_t dirty_bitmap_len = strlen("dirty-bitmap:");
> int ret;
>
> if (!meta->exp->export_bitmap) {
> + trace_nbd_negotiate_meta_query_skip("no dirty-bitmap exported");
> return nbd_opt_skip(client, len, errp);
> }
>
> if (len == 0) {
> if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
> - meta->dirty_bitmap = true;
> + meta->bitmap = true;
> }
> trace_nbd_negotiate_meta_query_parse("empty");
> return 1;
> @@ -870,7 +872,7 @@ static int nbd_meta_qemu_query(NBDClient *client,
> NBDExportMetaContexts *meta,
>
> return nbd_meta_empty_or_pattern(
> client, meta->exp->export_bitmap_context +
> - strlen("qemu:dirty_bitmap:"), len, &meta->dirty_bitmap,
> errp);
> + strlen("qemu:dirty_bitmap:"), len, &meta->bitmap, errp);
> }
>
> /* nbd_negotiate_meta_query
> @@ -888,11 +890,14 @@ static int nbd_meta_qemu_query(NBDClient
> *client, NBDExportMetaContexts *meta,
> static int nbd_negotiate_meta_query(NBDClient *client,
> NBDExportMetaContexts *meta, Error **errp)
> {
> + /*
> + * Both 'qemu' and 'base' namespaces have length = 5 including a
> + * colon. If another length namespace is later introduced, this
> + * should certainly be refactored.
> + */
> int ret;
> size_t ns_len = 5;
> - char ns[5]; /* Both 'qemu' and 'base' namespaces have length = 5
> including a
> - colon. If it's needed to introduce a namespace of
> the other
> - length, this should be certainly refactored. */
> + char ns[5];
> uint32_t len;
>
> ret = nbd_opt_read(client, &len, sizeof(len), errp);
> @@ -991,7 +996,7 @@ static int nbd_negotiate_meta_queries(NBDClient
> *client,
> }
> }
>
> - if (meta->dirty_bitmap) {
> + if (meta->bitmap) {
> ret = nbd_negotiate_send_meta_context(client,
>
> meta->exp->export_bitmap_context,
> NBD_META_ID_DIRTY_BITMAP,
> @@ -1876,11 +1881,13 @@ static int
> blockstatus_to_extent_be(BlockDriverState *bs, uint64_t offset,
>
> /* nbd_co_send_extents
> *
> - * NBD_REPLY_FLAG_DONE is not set, don't forget to send it.
> + * @last controls whether NBD_REPLY_FLAG_DONE is sent.
> *
> - * @extents should be in big-endian */
> + * @extents should already be in big-endian format.
> + */
> static int nbd_co_send_extents(NBDClient *client, uint64_t handle,
> - NBDExtent *extents, unsigned nb_extents,
> + NBDExtent *extents, unsigned int
> nb_extents,
> + uint64_t length, bool last,
> uint32_t context_id, Error **errp)
> {
> NBDStructuredMeta chunk;
> @@ -1890,7 +1897,9 @@ static int nbd_co_send_extents(NBDClient
> *client, uint64_t handle,
> {.iov_base = extents, .iov_len = nb_extents *
> sizeof(extents[0])}
> };
>
> - set_be_chunk(&chunk.h, 0, NBD_REPLY_TYPE_BLOCK_STATUS,
> + trace_nbd_co_send_extents(handle, nb_extents, context_id, length,
> last);
hm, length variable added only to be traced.. Ok, but a bit strange.
> + set_be_chunk(&chunk.h, last ? NBD_REPLY_FLAG_DONE : 0,
> + NBD_REPLY_TYPE_BLOCK_STATUS,
> handle, sizeof(chunk) - sizeof(chunk.h) +
> iov[1].iov_len);
> stl_be_p(&chunk.context_id, context_id);
>
> @@ -1900,8 +1909,8 @@ static int nbd_co_send_extents(NBDClient
> *client, uint64_t handle,
> /* Get block status from the exported device and send it to the
> client */
> static int nbd_co_send_block_status(NBDClient *client, uint64_t handle,
> BlockDriverState *bs, uint64_t
> offset,
> - uint64_t length, uint32_t
> context_id,
> - Error **errp)
> + uint64_t length, bool last,
> + uint32_t context_id, Error **errp)
> {
> int ret;
> NBDExtent extent;
> @@ -1912,17 +1921,18 @@ static int nbd_co_send_block_status(NBDClient
> *client, uint64_t handle,
> client, handle, -ret, "can't get block status", errp);
> }
>
> - return nbd_co_send_extents(client, handle, &extent, 1,
> context_id, errp);
> + return nbd_co_send_extents(client, handle, &extent, 1, length, last,
> + context_id, errp);
> }
>
> /* Set several extents, describing region of given @length with given
> @flags.
> * Do not set more than @nb_extents, return number of set extents.
> */
> -static unsigned add_extents(NBDExtent *extents, unsigned nb_extents,
> - uint64_t length, uint32_t flags)
> +static unsigned int add_extents(NBDExtent *extents, unsigned nb_extents,
> + uint64_t length, uint32_t flags)
> {
> - unsigned i = 0;
> - uint32_t max_extent = QEMU_ALIGN_DOWN(INT32_MAX, BDRV_SECTOR_SIZE);
> + unsigned int i = 0;
> + uint32_t max_extent = 0x80000000U;
So, you just take the biggest possible granularity, perfect.
> uint32_t max_extent_be = cpu_to_be32(max_extent);
> uint32_t flags_be = cpu_to_be32(flags);
>
> @@ -1942,13 +1952,14 @@ static unsigned add_extents(NBDExtent
> *extents, unsigned nb_extents,
> return i;
> }
>
> -static unsigned bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t
> offset,
> - uint64_t length, NBDExtent *extents,
> - unsigned nb_extents, bool
> dont_fragment)
> +static unsigned int bitmap_to_extents(BdrvDirtyBitmap *bitmap,
> uint64_t offset,
> + uint64_t *length, NBDExtent
> *extents,
length - in-out? worth comment?
> + unsigned int nb_extents,
> + bool dont_fragment)
> {
> uint64_t begin = offset, end;
> - uint64_t overall_end = offset + length;
> - unsigned i = 0;
> + uint64_t overall_end = offset + *length;
> + unsigned int i = 0;
> BdrvDirtyBitmapIter *it;
> bool dirty;
>
> @@ -1983,23 +1994,25 @@ static unsigned
> bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset,
>
> bdrv_dirty_bitmap_unlock(bitmap);
>
> + *length = end - begin;
hm, it will always be zero, as at the end of the previous loop we have
"begin = end;"
> return i;
> }
>
> static int nbd_co_send_bitmap(NBDClient *client, uint64_t handle,
> BdrvDirtyBitmap *bitmap, uint64_t offset,
> - uint64_t length, bool dont_fragment,
> + uint32_t length, bool dont_fragment,
> uint32_t context_id, Error **errp)
> {
> int ret;
> - unsigned nb_extents = dont_fragment ? 1 : NBD_MAX_BITMAP_EXTENTS;
> + unsigned int nb_extents = dont_fragment ? 1 :
> NBD_MAX_BITMAP_EXTENTS;
> NBDExtent *extents = g_new(NBDExtent, nb_extents);
> + uint64_t final_length = length;
>
> - nb_extents = bitmap_to_extents(bitmap, offset, length, extents,
> nb_extents,
> - dont_fragment);
> + nb_extents = bitmap_to_extents(bitmap, offset, &final_length,
> extents,
> + nb_extents, dont_fragment);
>
> - ret = nbd_co_send_extents(client, handle, extents, nb_extents,
> context_id,
> - errp);
> + ret = nbd_co_send_extents(client, handle, extents, nb_extents,
> + final_length, true, context_id, errp);
hmm in-out pointer field only to trace final_length? may be just trace
it in bitmap_to_extents?
also, it's not trivial, that the function now sends FLAG_DONE. I think,
worth add a comment, or
a parameter like for nbd_co_send_block_status. It will simplify
introducing new contexts in future.
>
> g_free(extents);
>
> @@ -2221,12 +2234,13 @@ static coroutine_fn int
> nbd_handle_request(NBDClient *client,
> case NBD_CMD_BLOCK_STATUS:
> if (client->export_meta.valid &&
> (client->export_meta.base_allocation ||
> - client->export_meta.dirty_bitmap))
> + client->export_meta.bitmap))
> {
> if (client->export_meta.base_allocation) {
> ret = nbd_co_send_block_status(client, request->handle,
> blk_bs(exp->blk), request->from,
> request->len,
> + !client->export_meta.bitmap,
>
> NBD_META_ID_BASE_ALLOCATION,
> errp);
> if (ret < 0) {
> @@ -2234,7 +2248,7 @@ static coroutine_fn int
> nbd_handle_request(NBDClient *client,
> }
> }
>
> - if (client->export_meta.dirty_bitmap) {
> + if (client->export_meta.bitmap) {
> ret = nbd_co_send_bitmap(client, request->handle,
> client->exp->export_bitmap,
> request->from, request->len,
> diff --git i/nbd/trace-events w/nbd/trace-events
> index dee081e7758..5e1d4afe8e6 100644
> --- i/nbd/trace-events
> +++ w/nbd/trace-events
> @@ -64,6 +64,7 @@ nbd_co_send_simple_reply(uint64_t handle, uint32_t
> error, const char *errname, i
> nbd_co_send_structured_done(uint64_t handle) "Send structured reply
> done: handle = %" PRIu64
> nbd_co_send_structured_read(uint64_t handle, uint64_t offset, void
> *data, size_t size) "Send structured read data reply: handle = %"
> PRIu64 ", offset = %" PRIu64 ", data = %p, len = %zu"
> nbd_co_send_structured_read_hole(uint64_t handle, uint64_t offset,
> size_t size) "Send structured read hole reply: handle = %" PRIu64 ",
> offset = %" PRIu64 ", len = %zu"
> +nbd_co_send_extents(uint64_t handle, unsigned int extents, uint32_t
> id, uint64_t length, int last) "Send block status reply: handle = %"
> PRIu64 ", extents = %u, context = %d (extents cover %" PRIu64 " bytes,
> last chunk = %d)"
> nbd_co_send_structured_error(uint64_t handle, int err, const char
> *errname, const char *msg) "Send structured error reply: handle = %"
> PRIu64 ", error = %d (%s), msg = '%s'"
> nbd_co_receive_request_decode_type(uint64_t handle, uint16_t type,
> const char *name) "Decoding type: handle = %" PRIu64 ", type = %"
> PRIu16 " (%s)"
> nbd_co_receive_request_payload_received(uint64_t handle, uint32_t
> len) "Payload received: handle = %" PRIu64 ", len = %" PRIu32
>
>
--
Best regards,
Vladimir
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH v5 4/6] nbd/server: implement dirty bitmap export
2018-06-20 17:04 ` Vladimir Sementsov-Ogievskiy
@ 2018-06-20 18:09 ` Eric Blake
2018-06-21 10:09 ` Vladimir Sementsov-Ogievskiy
2018-09-14 16:22 ` Vladimir Sementsov-Ogievskiy
0 siblings, 2 replies; 34+ messages in thread
From: Eric Blake @ 2018-06-20 18:09 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
Cc: armbru, pbonzini, mreitz, kwolf, den
On 06/20/2018 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:
> 20.06.2018 19:27, Eric Blake wrote:
>> On 06/09/2018 10:17 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Handle new NBD meta namespace: "qemu", and corresponding queries:
>>> "qemu:dirty-bitmap:<export bitmap name>".
>>>
>>> With new metadata context negotiated, BLOCK_STATUS query will reply
>>> with dirty-bitmap data, converted to extents. New public function
>>> nbd_export_bitmap selects bitmap to export. For now, only one bitmap
>>> may be exported.
>>>
>> static int nbd_co_send_extents(NBDClient *client, uint64_t handle,
>> - NBDExtent *extents, unsigned nb_extents,
>> + NBDExtent *extents, unsigned int
>> nb_extents,
>> + uint64_t length, bool last,
>> uint32_t context_id, Error **errp)
>> {
>> NBDStructuredMeta chunk;
>> @@ -1890,7 +1897,9 @@ static int nbd_co_send_extents(NBDClient
>> *client, uint64_t handle,
>> {.iov_base = extents, .iov_len = nb_extents *
>> sizeof(extents[0])}
>> };
>>
>> - set_be_chunk(&chunk.h, 0, NBD_REPLY_TYPE_BLOCK_STATUS,
>> + trace_nbd_co_send_extents(handle, nb_extents, context_id, length,
>> last);
>
> hm, length variable added only to be traced.. Ok, but a bit strange.
Yes. It doesn't affect what goes over the wire, but when it comes to
tracing, knowing the sum of the extents can be quite helpful (especially
knowing if the server's reply is shorter or longer than the client's
request, and the fact that when two or more contexts are negotiated by
the client, the different contexts can have different length replies)
>> +static unsigned int bitmap_to_extents(BdrvDirtyBitmap *bitmap,
>> uint64_t offset,
>> + uint64_t *length, NBDExtent
>> *extents,
>
> length - in-out? worth comment?
Sure.
>
>> + unsigned int nb_extents,
>> + bool dont_fragment)
>> {
>> uint64_t begin = offset, end;
>> - uint64_t overall_end = offset + length;
>> - unsigned i = 0;
>> + uint64_t overall_end = offset + *length;
>> + unsigned int i = 0;
>> BdrvDirtyBitmapIter *it;
>> bool dirty;
>>
>> @@ -1983,23 +1994,25 @@ static unsigned
>> bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset,
>>
>> bdrv_dirty_bitmap_unlock(bitmap);
>>
>> + *length = end - begin;
>
> hm, it will always be zero, as at the end of the previous loop we have
> "begin = end;"
Ah, then it should be end - offset. Thanks for the careful review.
In fact, since ONLY the final extent can be larger than 2G (all earlier
extents were short of the overall_end, and the incoming length is
32-bit), but the NBD protocol says that at most one extent can go beyond
the original request, we do NOT want to split a >2G extent into multiple
extents, but rather cap it to just shy of 4G at the granularity offered
by the bitmap itself. At which point add_extents() never returns more
than 1, and can just be inlined.
>
>> return i;
>> }
>>
>> static int nbd_co_send_bitmap(NBDClient *client, uint64_t handle,
>> BdrvDirtyBitmap *bitmap, uint64_t offset,
>> - uint64_t length, bool dont_fragment,
>> + uint32_t length, bool dont_fragment,
>> uint32_t context_id, Error **errp)
>> {
>> int ret;
>> - unsigned nb_extents = dont_fragment ? 1 : NBD_MAX_BITMAP_EXTENTS;
>> + unsigned int nb_extents = dont_fragment ? 1 :
>> NBD_MAX_BITMAP_EXTENTS;
>> NBDExtent *extents = g_new(NBDExtent, nb_extents);
>> + uint64_t final_length = length;
>>
>> - nb_extents = bitmap_to_extents(bitmap, offset, length, extents,
>> nb_extents,
>> - dont_fragment);
>> + nb_extents = bitmap_to_extents(bitmap, offset, &final_length,
>> extents,
>> + nb_extents, dont_fragment);
>>
>> - ret = nbd_co_send_extents(client, handle, extents, nb_extents,
>> context_id,
>> - errp);
>> + ret = nbd_co_send_extents(client, handle, extents, nb_extents,
>> + final_length, true, context_id, errp);
>
> hmm in-out pointer field only to trace final_length? may be just trace
> it in bitmap_to_extents?
No, because base:allocation also benefits from tracing final_length.
>
> also, it's not trivial, that the function now sends FLAG_DONE. I think,
> worth add a comment, or
> a parameter like for nbd_co_send_block_status. It will simplify
> introducing new contexts in future.
Do we anticipate adding any in the near future? But adding a parameter
that is always true so that the callsite becomes more obvious on when to
pass the done flag may indeed make future additions easier to spot in
one place.
So here's what else I'll squash in:
diff --git i/nbd/server.c w/nbd/server.c
index e84aea6cf03..7a96b296839 100644
--- i/nbd/server.c
+++ w/nbd/server.c
@@ -1881,9 +1881,10 @@ static int
blockstatus_to_extent_be(BlockDriverState *bs, uint64_t offset,
/* nbd_co_send_extents
*
- * @last controls whether NBD_REPLY_FLAG_DONE is sent.
- *
- * @extents should already be in big-endian format.
+ * @length is only for tracing purposes (and may be smaller or larger
+ * than the client's original request). @last controls whether
+ * NBD_REPLY_FLAG_DONE is sent. @extents should already be in
+ * big-endian format.
*/
static int nbd_co_send_extents(NBDClient *client, uint64_t handle,
NBDExtent *extents, unsigned int
nb_extents,
@@ -1925,33 +1926,12 @@ static int nbd_co_send_block_status(NBDClient
*client, uint64_t handle,
context_id, errp);
}
-/* Set several extents, describing region of given @length with given
@flags.
- * Do not set more than @nb_extents, return number of set extents.
+/*
+ * Populate @extents from a dirty bitmap. Unless @dont_fragment, the
+ * final extent may exceed the original @length. Store in @length the
+ * byte length encoded (which may be smaller or larger than the
+ * original), and return the number of extents used.
*/
-static unsigned int add_extents(NBDExtent *extents, unsigned nb_extents,
- uint64_t length, uint32_t flags)
-{
- unsigned int i = 0;
- uint32_t max_extent = 0x80000000U;
- uint32_t max_extent_be = cpu_to_be32(max_extent);
- uint32_t flags_be = cpu_to_be32(flags);
-
- for (i = 0; i < nb_extents && length > max_extent;
- i++, length -= max_extent)
- {
- extents[i].length = max_extent_be;
- extents[i].flags = flags_be;
- }
-
- if (length > 0 && i < nb_extents) {
- extents[i].length = cpu_to_be32(length);
- extents[i].flags = flags_be;
- i++;
- }
-
- return i;
-}
-
static unsigned int bitmap_to_extents(BdrvDirtyBitmap *bitmap,
uint64_t offset,
uint64_t *length, NBDExtent
*extents,
unsigned int nb_extents,
@@ -1976,16 +1956,18 @@ static unsigned int
bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset,
end = bdrv_dirty_iter_next(it);
}
if (end == -1) {
- end = bdrv_dirty_bitmap_size(bitmap);
+ /* Cap to an aligned value < 4G beyond begin. */
+ end = MIN(bdrv_dirty_bitmap_size(bitmap),
+ begin + 0x100000000ULL -
+ bdrv_dirty_bitmap_granularity(bitmap));
}
if (dont_fragment && end > overall_end) {
- /* We MUST not exceed requested region if
NBD_CMD_FLAG_REQ_ONE flag
- * is set */
end = overall_end;
}
- i += add_extents(extents + i, nb_extents - i, end - begin,
- dirty ? NBD_STATE_DIRTY : 0);
+ extents[i].length = cpu_to_be32(end - begin);
+ extents[i].flags = cpu_to_be32(dirty ? NBD_STATE_DIRTY : 0);
+ i++;
begin = end;
dirty = !dirty;
}
@@ -1994,13 +1976,13 @@ static unsigned int
bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset,
bdrv_dirty_bitmap_unlock(bitmap);
- *length = end - begin;
+ *length = end - offset;
return i;
}
static int nbd_co_send_bitmap(NBDClient *client, uint64_t handle,
BdrvDirtyBitmap *bitmap, uint64_t offset,
- uint32_t length, bool dont_fragment,
+ uint32_t length, bool dont_fragment, bool
last,
uint32_t context_id, Error **errp)
{
int ret;
@@ -2012,7 +1994,7 @@ static int nbd_co_send_bitmap(NBDClient *client,
uint64_t handle,
nb_extents, dont_fragment);
ret = nbd_co_send_extents(client, handle, extents, nb_extents,
- final_length, true, context_id, errp);
+ final_length, last, context_id, errp);
g_free(extents);
@@ -2253,7 +2235,7 @@ static coroutine_fn int
nbd_handle_request(NBDClient *client,
client->exp->export_bitmap,
request->from, request->len,
request->flags &
NBD_CMD_FLAG_REQ_ONE,
- NBD_META_ID_DIRTY_BITMAP, errp);
+ true,
NBD_META_ID_DIRTY_BITMAP, errp);
if (ret < 0) {
return ret;
}
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH v5 4/6] nbd/server: implement dirty bitmap export
2018-06-20 18:09 ` Eric Blake
@ 2018-06-21 10:09 ` Vladimir Sementsov-Ogievskiy
2018-09-14 16:22 ` Vladimir Sementsov-Ogievskiy
1 sibling, 0 replies; 34+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-06-21 10:09 UTC (permalink / raw)
To: Eric Blake, qemu-block, qemu-devel; +Cc: armbru, pbonzini, mreitz, kwolf, den
20.06.2018 21:09, Eric Blake wrote:
> On 06/20/2018 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:
>> 20.06.2018 19:27, Eric Blake wrote:
>>> On 06/09/2018 10:17 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>> Handle new NBD meta namespace: "qemu", and corresponding queries:
>>>> "qemu:dirty-bitmap:<export bitmap name>".
>>>>
>>>> With new metadata context negotiated, BLOCK_STATUS query will reply
>>>> with dirty-bitmap data, converted to extents. New public function
>>>> nbd_export_bitmap selects bitmap to export. For now, only one bitmap
>>>> may be exported.
>>>>
>
>>> static int nbd_co_send_extents(NBDClient *client, uint64_t handle,
>>> - NBDExtent *extents, unsigned
>>> nb_extents,
>>> + NBDExtent *extents, unsigned int
>>> nb_extents,
>>> + uint64_t length, bool last,
>>> uint32_t context_id, Error **errp)
>>> {
>>> NBDStructuredMeta chunk;
>>> @@ -1890,7 +1897,9 @@ static int nbd_co_send_extents(NBDClient
>>> *client, uint64_t handle,
>>> {.iov_base = extents, .iov_len = nb_extents *
>>> sizeof(extents[0])}
>>> };
>>>
>>> - set_be_chunk(&chunk.h, 0, NBD_REPLY_TYPE_BLOCK_STATUS,
>>> + trace_nbd_co_send_extents(handle, nb_extents, context_id,
>>> length, last);
>>
>> hm, length variable added only to be traced.. Ok, but a bit strange.
>
> Yes. It doesn't affect what goes over the wire, but when it comes to
> tracing, knowing the sum of the extents can be quite helpful
> (especially knowing if the server's reply is shorter or longer than
> the client's request, and the fact that when two or more contexts are
> negotiated by the client, the different contexts can have different
> length replies)
>
>
>>> +static unsigned int bitmap_to_extents(BdrvDirtyBitmap *bitmap,
>>> uint64_t offset,
>>> + uint64_t *length, NBDExtent
>>> *extents,
>>
>> length - in-out? worth comment?
>
> Sure.
>
>>
>>> + unsigned int nb_extents,
>>> + bool dont_fragment)
>>> {
>>> uint64_t begin = offset, end;
>>> - uint64_t overall_end = offset + length;
>>> - unsigned i = 0;
>>> + uint64_t overall_end = offset + *length;
>>> + unsigned int i = 0;
>>> BdrvDirtyBitmapIter *it;
>>> bool dirty;
>>>
>>> @@ -1983,23 +1994,25 @@ static unsigned
>>> bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset,
>>>
>>> bdrv_dirty_bitmap_unlock(bitmap);
>>>
>>> + *length = end - begin;
>>
>> hm, it will always be zero, as at the end of the previous loop we
>> have "begin = end;"
>
> Ah, then it should be end - offset. Thanks for the careful review.
>
> In fact, since ONLY the final extent can be larger than 2G (all
> earlier extents were short of the overall_end, and the incoming length
> is 32-bit), but the NBD protocol says that at most one extent can go
> beyond the original request, we do NOT want to split a >2G extent into
> multiple extents, but rather cap it to just shy of 4G at the
> granularity offered by the bitmap itself. At which point
> add_extents() never returns more than 1, and can just be inlined.
>
>>
>>> return i;
>>> }
>>>
>>> static int nbd_co_send_bitmap(NBDClient *client, uint64_t handle,
>>> BdrvDirtyBitmap *bitmap, uint64_t
>>> offset,
>>> - uint64_t length, bool dont_fragment,
>>> + uint32_t length, bool dont_fragment,
>>> uint32_t context_id, Error **errp)
>>> {
>>> int ret;
>>> - unsigned nb_extents = dont_fragment ? 1 : NBD_MAX_BITMAP_EXTENTS;
>>> + unsigned int nb_extents = dont_fragment ? 1 :
>>> NBD_MAX_BITMAP_EXTENTS;
>>> NBDExtent *extents = g_new(NBDExtent, nb_extents);
>>> + uint64_t final_length = length;
>>>
>>> - nb_extents = bitmap_to_extents(bitmap, offset, length, extents,
>>> nb_extents,
>>> - dont_fragment);
>>> + nb_extents = bitmap_to_extents(bitmap, offset, &final_length,
>>> extents,
>>> + nb_extents, dont_fragment);
>>>
>>> - ret = nbd_co_send_extents(client, handle, extents, nb_extents,
>>> context_id,
>>> - errp);
>>> + ret = nbd_co_send_extents(client, handle, extents, nb_extents,
>>> + final_length, true, context_id, errp);
>>
>> hmm in-out pointer field only to trace final_length? may be just
>> trace it in bitmap_to_extents?
>
> No, because base:allocation also benefits from tracing final_length.
>
>>
>> also, it's not trivial, that the function now sends FLAG_DONE. I
>> think, worth add a comment, or
>> a parameter like for nbd_co_send_block_status. It will simplify
>> introducing new contexts in future.
>
> Do we anticipate adding any in the near future? But adding a parameter
> that is always true so that the callsite becomes more obvious on when
> to pass the done flag may indeed make future additions easier to spot
> in one place.
>
> So here's what else I'll squash in:
>
> diff --git i/nbd/server.c w/nbd/server.c
> index e84aea6cf03..7a96b296839 100644
> --- i/nbd/server.c
> +++ w/nbd/server.c
> @@ -1881,9 +1881,10 @@ static int
> blockstatus_to_extent_be(BlockDriverState *bs, uint64_t offset,
>
> /* nbd_co_send_extents
> *
> - * @last controls whether NBD_REPLY_FLAG_DONE is sent.
> - *
> - * @extents should already be in big-endian format.
> + * @length is only for tracing purposes (and may be smaller or larger
> + * than the client's original request). @last controls whether
> + * NBD_REPLY_FLAG_DONE is sent. @extents should already be in
> + * big-endian format.
> */
> static int nbd_co_send_extents(NBDClient *client, uint64_t handle,
> NBDExtent *extents, unsigned int
> nb_extents,
> @@ -1925,33 +1926,12 @@ static int nbd_co_send_block_status(NBDClient
> *client, uint64_t handle,
> context_id, errp);
> }
>
> -/* Set several extents, describing region of given @length with given
> @flags.
> - * Do not set more than @nb_extents, return number of set extents.
> +/*
> + * Populate @extents from a dirty bitmap. Unless @dont_fragment, the
> + * final extent may exceed the original @length. Store in @length the
> + * byte length encoded (which may be smaller or larger than the
> + * original), and return the number of extents used.
> */
> -static unsigned int add_extents(NBDExtent *extents, unsigned nb_extents,
> - uint64_t length, uint32_t flags)
> -{
> - unsigned int i = 0;
> - uint32_t max_extent = 0x80000000U;
> - uint32_t max_extent_be = cpu_to_be32(max_extent);
> - uint32_t flags_be = cpu_to_be32(flags);
> -
> - for (i = 0; i < nb_extents && length > max_extent;
> - i++, length -= max_extent)
> - {
> - extents[i].length = max_extent_be;
> - extents[i].flags = flags_be;
> - }
> -
> - if (length > 0 && i < nb_extents) {
> - extents[i].length = cpu_to_be32(length);
> - extents[i].flags = flags_be;
> - i++;
> - }
> -
> - return i;
> -}
> -
> static unsigned int bitmap_to_extents(BdrvDirtyBitmap *bitmap,
> uint64_t offset,
> uint64_t *length, NBDExtent
> *extents,
> unsigned int nb_extents,
> @@ -1976,16 +1956,18 @@ static unsigned int
> bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset,
> end = bdrv_dirty_iter_next(it);
> }
> if (end == -1) {
> - end = bdrv_dirty_bitmap_size(bitmap);
> + /* Cap to an aligned value < 4G beyond begin. */
> + end = MIN(bdrv_dirty_bitmap_size(bitmap),
> + begin + 0x100000000ULL -
> + bdrv_dirty_bitmap_granularity(bitmap));
so, in case of end == -1, all ok
> }
but else, it may be near to the whole bitmap, and end - begin may be
more and more larger than 4G..
so, you can leave "end = bdrv_dirty_bitmap_size(bitmap)" as is, and move
end = MIN after if(){} block.
also and in this case, we may add two extents, and we lose information
about the second, but it's not significant.
> if (dont_fragment && end > overall_end) {
> - /* We MUST not exceed requested region if
> NBD_CMD_FLAG_REQ_ONE flag
> - * is set */
why drop it?
> end = overall_end;
> }
>
> - i += add_extents(extents + i, nb_extents - i, end - begin,
> - dirty ? NBD_STATE_DIRTY : 0);
> + extents[i].length = cpu_to_be32(end - begin);
> + extents[i].flags = cpu_to_be32(dirty ? NBD_STATE_DIRTY : 0);
> + i++;
> begin = end;
> dirty = !dirty;
> }
> @@ -1994,13 +1976,13 @@ static unsigned int
> bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset,
>
> bdrv_dirty_bitmap_unlock(bitmap);
>
> - *length = end - begin;
> + *length = end - offset;
> return i;
> }
>
> static int nbd_co_send_bitmap(NBDClient *client, uint64_t handle,
> BdrvDirtyBitmap *bitmap, uint64_t offset,
> - uint32_t length, bool dont_fragment,
> + uint32_t length, bool dont_fragment,
> bool last,
> uint32_t context_id, Error **errp)
> {
> int ret;
> @@ -2012,7 +1994,7 @@ static int nbd_co_send_bitmap(NBDClient *client,
> uint64_t handle,
> nb_extents, dont_fragment);
>
> ret = nbd_co_send_extents(client, handle, extents, nb_extents,
> - final_length, true, context_id, errp);
> + final_length, last, context_id, errp);
>
> g_free(extents);
>
> @@ -2253,7 +2235,7 @@ static coroutine_fn int
> nbd_handle_request(NBDClient *client,
> client->exp->export_bitmap,
> request->from, request->len,
> request->flags & NBD_CMD_FLAG_REQ_ONE,
> - NBD_META_ID_DIRTY_BITMAP, errp);
> + true,
> NBD_META_ID_DIRTY_BITMAP, errp);
> if (ret < 0) {
> return ret;
> }
>
>
--
Best regards,
Vladimir
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH v5 4/6] nbd/server: implement dirty bitmap export
2018-06-20 18:09 ` Eric Blake
2018-06-21 10:09 ` Vladimir Sementsov-Ogievskiy
@ 2018-09-14 16:22 ` Vladimir Sementsov-Ogievskiy
1 sibling, 0 replies; 34+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-09-14 16:22 UTC (permalink / raw)
To: Eric Blake, qemu-block, qemu-devel; +Cc: armbru, pbonzini, mreitz, kwolf, den
20.06.2018 21:09, Eric Blake wrote:
> On 06/20/2018 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:
>> 20.06.2018 19:27, Eric Blake wrote:
>>> On 06/09/2018 10:17 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>> Handle new NBD meta namespace: "qemu", and corresponding queries:
>>>> "qemu:dirty-bitmap:<export bitmap name>".
>>>>
>>>> With new metadata context negotiated, BLOCK_STATUS query will reply
>>>> with dirty-bitmap data, converted to extents. New public function
>>>> nbd_export_bitmap selects bitmap to export. For now, only one bitmap
>>>> may be exported.
>>>>
>
>>> static int nbd_co_send_extents(NBDClient *client, uint64_t handle,
>>> - NBDExtent *extents, unsigned
>>> nb_extents,
>>> + NBDExtent *extents, unsigned int
>>> nb_extents,
>>> + uint64_t length, bool last,
>>> uint32_t context_id, Error **errp)
>>> {
>>> NBDStructuredMeta chunk;
>>> @@ -1890,7 +1897,9 @@ static int nbd_co_send_extents(NBDClient
>>> *client, uint64_t handle,
>>> {.iov_base = extents, .iov_len = nb_extents *
>>> sizeof(extents[0])}
>>> };
>>>
>>> - set_be_chunk(&chunk.h, 0, NBD_REPLY_TYPE_BLOCK_STATUS,
>>> + trace_nbd_co_send_extents(handle, nb_extents, context_id,
>>> length, last);
>>
>> hm, length variable added only to be traced.. Ok, but a bit strange.
>
> Yes. It doesn't affect what goes over the wire, but when it comes to
> tracing, knowing the sum of the extents can be quite helpful
> (especially knowing if the server's reply is shorter or longer than
> the client's request, and the fact that when two or more contexts are
> negotiated by the client, the different contexts can have different
> length replies)
>
>
>>> +static unsigned int bitmap_to_extents(BdrvDirtyBitmap *bitmap,
>>> uint64_t offset,
>>> + uint64_t *length, NBDExtent
>>> *extents,
>>
>> length - in-out? worth comment?
>
> Sure.
>
>>
>>> + unsigned int nb_extents,
>>> + bool dont_fragment)
>>> {
>>> uint64_t begin = offset, end;
>>> - uint64_t overall_end = offset + length;
>>> - unsigned i = 0;
>>> + uint64_t overall_end = offset + *length;
>>> + unsigned int i = 0;
>>> BdrvDirtyBitmapIter *it;
>>> bool dirty;
>>>
>>> @@ -1983,23 +1994,25 @@ static unsigned
>>> bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset,
>>>
>>> bdrv_dirty_bitmap_unlock(bitmap);
>>>
>>> + *length = end - begin;
>>
>> hm, it will always be zero, as at the end of the previous loop we
>> have "begin = end;"
>
> Ah, then it should be end - offset. Thanks for the careful review.
>
> In fact, since ONLY the final extent can be larger than 2G (all
> earlier extents were short of the overall_end, and the incoming length
> is 32-bit), but the NBD protocol says that at most one extent can go
> beyond the original request, we do NOT want to split a >2G extent into
> multiple extents, but rather cap it to just shy of 4G at the
> granularity offered by the bitmap itself. At which point
> add_extents() never returns more than 1, and can just be inlined.
>
>>
>>> return i;
>>> }
>>>
>>> static int nbd_co_send_bitmap(NBDClient *client, uint64_t handle,
>>> BdrvDirtyBitmap *bitmap, uint64_t
>>> offset,
>>> - uint64_t length, bool dont_fragment,
>>> + uint32_t length, bool dont_fragment,
>>> uint32_t context_id, Error **errp)
>>> {
>>> int ret;
>>> - unsigned nb_extents = dont_fragment ? 1 : NBD_MAX_BITMAP_EXTENTS;
>>> + unsigned int nb_extents = dont_fragment ? 1 :
>>> NBD_MAX_BITMAP_EXTENTS;
>>> NBDExtent *extents = g_new(NBDExtent, nb_extents);
>>> + uint64_t final_length = length;
>>>
>>> - nb_extents = bitmap_to_extents(bitmap, offset, length, extents,
>>> nb_extents,
>>> - dont_fragment);
>>> + nb_extents = bitmap_to_extents(bitmap, offset, &final_length,
>>> extents,
>>> + nb_extents, dont_fragment);
>>>
>>> - ret = nbd_co_send_extents(client, handle, extents, nb_extents,
>>> context_id,
>>> - errp);
>>> + ret = nbd_co_send_extents(client, handle, extents, nb_extents,
>>> + final_length, true, context_id, errp);
>>
>> hmm in-out pointer field only to trace final_length? may be just
>> trace it in bitmap_to_extents?
>
> No, because base:allocation also benefits from tracing final_length.
>
>>
>> also, it's not trivial, that the function now sends FLAG_DONE. I
>> think, worth add a comment, or
>> a parameter like for nbd_co_send_block_status. It will simplify
>> introducing new contexts in future.
>
> Do we anticipate adding any in the near future? But adding a parameter
> that is always true so that the callsite becomes more obvious on when
> to pass the done flag may indeed make future additions easier to spot
> in one place.
>
> So here's what else I'll squash in:
>
> diff --git i/nbd/server.c w/nbd/server.c
> index e84aea6cf03..7a96b296839 100644
> --- i/nbd/server.c
> +++ w/nbd/server.c
> @@ -1881,9 +1881,10 @@ static int
> blockstatus_to_extent_be(BlockDriverState *bs, uint64_t offset,
>
> /* nbd_co_send_extents
> *
> - * @last controls whether NBD_REPLY_FLAG_DONE is sent.
> - *
> - * @extents should already be in big-endian format.
> + * @length is only for tracing purposes (and may be smaller or larger
> + * than the client's original request). @last controls whether
> + * NBD_REPLY_FLAG_DONE is sent. @extents should already be in
> + * big-endian format.
> */
> static int nbd_co_send_extents(NBDClient *client, uint64_t handle,
> NBDExtent *extents, unsigned int
> nb_extents,
> @@ -1925,33 +1926,12 @@ static int nbd_co_send_block_status(NBDClient
> *client, uint64_t handle,
> context_id, errp);
> }
>
> -/* Set several extents, describing region of given @length with given
> @flags.
> - * Do not set more than @nb_extents, return number of set extents.
> +/*
> + * Populate @extents from a dirty bitmap. Unless @dont_fragment, the
> + * final extent may exceed the original @length. Store in @length the
> + * byte length encoded (which may be smaller or larger than the
> + * original), and return the number of extents used.
> */
> -static unsigned int add_extents(NBDExtent *extents, unsigned nb_extents,
> - uint64_t length, uint32_t flags)
> -{
> - unsigned int i = 0;
> - uint32_t max_extent = 0x80000000U;
> - uint32_t max_extent_be = cpu_to_be32(max_extent);
> - uint32_t flags_be = cpu_to_be32(flags);
> -
> - for (i = 0; i < nb_extents && length > max_extent;
> - i++, length -= max_extent)
> - {
> - extents[i].length = max_extent_be;
> - extents[i].flags = flags_be;
> - }
> -
> - if (length > 0 && i < nb_extents) {
> - extents[i].length = cpu_to_be32(length);
> - extents[i].flags = flags_be;
> - i++;
> - }
> -
> - return i;
> -}
> -
> static unsigned int bitmap_to_extents(BdrvDirtyBitmap *bitmap,
> uint64_t offset,
> uint64_t *length, NBDExtent
> *extents,
> unsigned int nb_extents,
> @@ -1976,16 +1956,18 @@ static unsigned int
> bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset,
> end = bdrv_dirty_iter_next(it);
> }
> if (end == -1) {
> - end = bdrv_dirty_bitmap_size(bitmap);
> + /* Cap to an aligned value < 4G beyond begin. */
> + end = MIN(bdrv_dirty_bitmap_size(bitmap),
> + begin + 0x100000000ULL -
> + bdrv_dirty_bitmap_granularity(bitmap));
> }
> if (dont_fragment && end > overall_end) {
> - /* We MUST not exceed requested region if
> NBD_CMD_FLAG_REQ_ONE flag
> - * is set */
> end = overall_end;
> }
>
> - i += add_extents(extents + i, nb_extents - i, end - begin,
> - dirty ? NBD_STATE_DIRTY : 0);
> + extents[i].length = cpu_to_be32(end - begin);
> + extents[i].flags = cpu_to_be32(dirty ? NBD_STATE_DIRTY : 0);
> + i++;
> begin = end;
> dirty = !dirty;
hmm, suddenly I understand that it's wrong. My version switched
dirty=!dirty every iteration, processing all sequential zeroes or ones
in one iteration. After the change, we can process dirty bits in several
sequential iterations. I'll make a patch soon.
If I understand correctly, current code will produce zero-length extents
in the chain, between correct extents.
> }
> @@ -1994,13 +1976,13 @@ static unsigned int
> bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset,
>
> bdrv_dirty_bitmap_unlock(bitmap);
>
> - *length = end - begin;
> + *length = end - offset;
> return i;
> }
>
> static int nbd_co_send_bitmap(NBDClient *client, uint64_t handle,
> BdrvDirtyBitmap *bitmap, uint64_t offset,
> - uint32_t length, bool dont_fragment,
> + uint32_t length, bool dont_fragment,
> bool last,
> uint32_t context_id, Error **errp)
> {
> int ret;
> @@ -2012,7 +1994,7 @@ static int nbd_co_send_bitmap(NBDClient *client,
> uint64_t handle,
> nb_extents, dont_fragment);
>
> ret = nbd_co_send_extents(client, handle, extents, nb_extents,
> - final_length, true, context_id, errp);
> + final_length, last, context_id, errp);
>
> g_free(extents);
>
> @@ -2253,7 +2235,7 @@ static coroutine_fn int
> nbd_handle_request(NBDClient *client,
> client->exp->export_bitmap,
> request->from, request->len,
> request->flags &
> NBD_CMD_FLAG_REQ_ONE,
> - NBD_META_ID_DIRTY_BITMAP, errp);
> + true,
> NBD_META_ID_DIRTY_BITMAP, errp);
> if (ret < 0) {
> return ret;
> }
>
>
--
Best regards,
Vladimir
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH v5 4/6] nbd/server: implement dirty bitmap export
2018-06-09 15:17 ` [Qemu-devel] [PATCH v5 4/6] nbd/server: implement dirty bitmap export Vladimir Sementsov-Ogievskiy
2018-06-20 11:24 ` Eric Blake
2018-06-20 16:27 ` Eric Blake
@ 2018-11-29 4:34 ` Eric Blake
2019-01-09 19:21 ` Eric Blake
3 siblings, 0 replies; 34+ messages in thread
From: Eric Blake @ 2018-11-29 4:34 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
Cc: armbru, pbonzini, mreitz, kwolf, den
On 6/9/18 10:17 AM, Vladimir Sementsov-Ogievskiy wrote:
> Handle new NBD meta namespace: "qemu", and corresponding queries:
> "qemu:dirty-bitmap:<export bitmap name>".
>
> With new metadata context negotiated, BLOCK_STATUS query will reply
> with dirty-bitmap data, converted to extents. New public function
> nbd_export_bitmap selects bitmap to export. For now, only one bitmap
> may be exported.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
> include/block/nbd.h | 6 ++
> nbd/server.c | 271 ++++++++++++++++++++++++++++++++++++++++++++++++----
> 2 files changed, 259 insertions(+), 18 deletions(-)
>
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index fcdcd54502..a653d0fba7 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -234,6 +234,10 @@ enum {
> #define NBD_STATE_HOLE (1 << 0)
> #define NBD_STATE_ZERO (1 << 1)
>
> +/* Flags for extents (NBDExtent.flags) of NBD_REPLY_TYPE_BLOCK_STATUS,
> + * for qemu:dirty-bitmap:* meta contexts */
> +#define NBD_STATE_DIRTY (1 << 0)
Hindsight is 20:20, so there's nothing we can do about this definition
now; commit 3d068aff is already released in 3.0. However, if I had been
more careful when this was first introduced, I would have negated the
sense of this bit. The NBD protocol recommends that a status of all 0
represent the default state, and when it comes to dirty bitmaps, the
safest fallback path is to declare the entire image as dirty (the way
that we have a bit named NBD_STATE_HOLE that is 1 for a hole, but 0 for
data, because the safest fallback path is to treat the entire image as
data).
Why does this matter? Because with the 3.0 hack of qemu-img map with
x-dirty-bitmap=qemu:dirty-bitmap:FOO (commit 216ee365), if the client
requests a particular bitmap name but the server does NOT have that name
present, the client does NOT refuse to connect, but rather acts as
though bdrv_block_status() treats the entire image as data. Since the
hack relies on treating "data":false sections of the disk as dirty, but
the fallback causes qemu-img map to report "data":true for the entire
image, relying on the hack will end up seeing NO clusters as dirty, and
with NO indication that the x-dirty-bitmap= failed to work.
Had we used NBD_STATE_CLEAN (1 for clean, 0 for dirty), we would have at
least had the safe fallback of treating the entire image as dirty, which
would result in a full backup (a bit wasteful, but better than not
backing up any data on the incorrect assumption that the image is clean).
I'm in the process of implementing 'qemu-nbd --list' which will show
exactly which meta contexts an export supports, so that we can use that
as a fallback mechanism for a client to check whether the NBD server is
actually advertising the desired qemu:dirty-bitmap: context prior to
blindly requesting that context via x-dirty-bitmap. But even that hits
a slight snag:
> @@ -924,6 +987,16 @@ static int nbd_negotiate_meta_queries(NBDClient *client,
> }
> }
>
> + if (meta->dirty_bitmap) {
> + ret = nbd_negotiate_send_meta_context(client,
> + meta->exp->export_bitmap_context,
> + NBD_META_ID_DIRTY_BITMAP,
> + errp);
> + if (ret < 0) {
> + return ret;
> + }
> + }
> +
The NBD spec says a client is allowed to issue NBD_OPT_LIST_META_CONTEXT
with 0 queries in order to learn about all supported contexts, but this
hunk forgot to advertise qemu:dirty-bitmap:FOO in that case. My full
patch series for 'qemu-nbd --list' is coming up soon, and includes a
patch that works around the 3.0/3.1 flaw by following up with a second
LIST_META_CONTEXT with 1 query of "qemu:' if the 0-query version didn't
get any "qemu:..." responses.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH v5 4/6] nbd/server: implement dirty bitmap export
2018-06-09 15:17 ` [Qemu-devel] [PATCH v5 4/6] nbd/server: implement dirty bitmap export Vladimir Sementsov-Ogievskiy
` (2 preceding siblings ...)
2018-11-29 4:34 ` Eric Blake
@ 2019-01-09 19:21 ` Eric Blake
2019-01-10 7:15 ` Eric Blake
2019-01-17 21:09 ` John Snow
3 siblings, 2 replies; 34+ messages in thread
From: Eric Blake @ 2019-01-09 19:21 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
Cc: armbru, pbonzini, mreitz, kwolf, den
[-- Attachment #1: Type: text/plain, Size: 3512 bytes --]
Revisiting an older thread:
On 6/9/18 10:17 AM, Vladimir Sementsov-Ogievskiy wrote:
> Handle new NBD meta namespace: "qemu", and corresponding queries:
> "qemu:dirty-bitmap:<export bitmap name>".
>
> With new metadata context negotiated, BLOCK_STATUS query will reply
> with dirty-bitmap data, converted to extents. New public function
> nbd_export_bitmap selects bitmap to export. For now, only one bitmap
> may be exported.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
> @@ -2199,3 +2393,44 @@ void nbd_client_new(NBDExport *exp,
> co = qemu_coroutine_create(nbd_co_client_start, client);
> qemu_coroutine_enter(co);
> }
> +
> +void nbd_export_bitmap(NBDExport *exp, const char *bitmap,
> + const char *bitmap_export_name, Error **errp)
> +{
> + BdrvDirtyBitmap *bm = NULL;
> + BlockDriverState *bs = blk_bs(exp->blk);
> +
> + if (exp->export_bitmap) {
> + error_setg(errp, "Export bitmap is already set");
> + return;
> + }
> +
> + while (true) {
> + bm = bdrv_find_dirty_bitmap(bs, bitmap);
> + if (bm != NULL || bs->backing == NULL) {
> + break;
> + }
> +
> + bs = bs->backing->bs;
> + }
> +
> + if (bm == NULL) {
> + error_setg(errp, "Bitmap '%s' is not found", bitmap);
> + return;
> + }
> +
> + if (bdrv_dirty_bitmap_enabled(bm)) {
> + error_setg(errp, "Bitmap '%s' is enabled", bitmap);
> + return;
> + }
Why are we restricting things to only export disabled bitmaps?
I can understand the argument that if the image being exported is
read-only, then an enabled bitmap _that can be changed_ is probably a
bad idea (it goes against the notion of the export being read only).
But if we were to allow a writable access to an image, wouldn't we
expect that writes be reflected into the bitmap, which means permitting
an enabled bitmap?
Furthermore, consider the scenario:
backing [with bitmap b0] <- active
If I request a bitmap add of b0 for an export of active, then it really
shouldn't matter whether b0 is left enabled or disabled - the backing
file is read-only, and so no changes will be made to bitmap b0.
I stumbled into the latter scenario while testing my new 'qemu-nbd -B
bitmap', but the problem appears anywhere that we have read-only access
to an image, where we can't change the status of the bitmap from enabled
to disabled (because the image is read-only) but where the bitmap
shouldn't be changing anyways (because the image is read-only).
> +
> + if (bdrv_dirty_bitmap_qmp_locked(bm)) {
> + error_setg(errp, "Bitmap '%s' is locked", bitmap);
> + return;
> + }
> +
> + bdrv_dirty_bitmap_set_qmp_locked(bm, true);
Can we have an enabled-but-locked bitmap (one that we can't delete
because some operation such as a writable NBD export is still using it,
but one that is still being updated by active writes)? If so, then it
may make sense to drop the restriction that an exported bitmap must be
disabled. And even if it is not possible, I'd still like to loosen the
restriction to require a disabled bitmap only if the image owning the
bitmap is writable (making it easier to do read-only exports without
having to first tweak the bitmap to be disabled).
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH v5 4/6] nbd/server: implement dirty bitmap export
2019-01-09 19:21 ` Eric Blake
@ 2019-01-10 7:15 ` Eric Blake
2019-01-17 21:09 ` John Snow
1 sibling, 0 replies; 34+ messages in thread
From: Eric Blake @ 2019-01-10 7:15 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
Cc: kwolf, pbonzini, den, armbru, mreitz
[-- Attachment #1: Type: text/plain, Size: 1311 bytes --]
On 1/9/19 1:21 PM, Eric Blake wrote:
> Revisiting an older thread:
>
> On 6/9/18 10:17 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Handle new NBD meta namespace: "qemu", and corresponding queries:
>> "qemu:dirty-bitmap:<export bitmap name>".
>>
>> With new metadata context negotiated, BLOCK_STATUS query will reply
>> with dirty-bitmap data, converted to extents. New public function
>> nbd_export_bitmap selects bitmap to export. For now, only one bitmap
>> may be exported.
>>
>> + if (bdrv_dirty_bitmap_enabled(bm)) {
>> + error_setg(errp, "Bitmap '%s' is enabled", bitmap);
>> + return;
>> + }
>
> Why are we restricting things to only export disabled bitmaps?
>
> I can understand the argument that if the image being exported is
> read-only, then an enabled bitmap _that can be changed_ is probably a
> bad idea (it goes against the notion of the export being read only).
> But if we were to allow a writable access to an image, wouldn't we
> expect that writes be reflected into the bitmap, which means permitting
> an enabled bitmap?
I've now addressed this in my v2 Promote x-nbd-server-add-bitmap to
stable series.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH v5 4/6] nbd/server: implement dirty bitmap export
2019-01-09 19:21 ` Eric Blake
2019-01-10 7:15 ` Eric Blake
@ 2019-01-17 21:09 ` John Snow
1 sibling, 0 replies; 34+ messages in thread
From: John Snow @ 2019-01-17 21:09 UTC (permalink / raw)
To: Eric Blake, Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
Cc: kwolf, pbonzini, den, armbru, mreitz
On 1/9/19 2:21 PM, Eric Blake wrote:
> Why are we restricting things to only export disabled bitmaps?
Late reply, but the original thought almost surely was that we would
only be exporting bitmaps for fleecing use, which should have a
non-changing bitmap attached to it.
Just some error checking to make sure the user didn't accidentally use
the live copy attached to the active disk image, versus the one prepared
for this purpose.
If that's not true anymore, that's fine if the NBD server can handle the
bits changing while it responds to requests... especially if a client
wants to just watch the bitmap change live and query sectors every now
and again, but it's a different use case from the one fairly targeted
one we were pursuing at the time.
--js
^ permalink raw reply [flat|nested] 34+ messages in thread