* Cross-project NBD extension proposal: NBD_INFO_INIT_STATE
@ 2020-02-10 21:37 Eric Blake
  2020-02-10 21:41 ` [qemu PATCH 0/3] NBD_INFO_INIT_STATE extension Eric Blake
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Eric Blake @ 2020-02-10 21:37 UTC (permalink / raw)
  To: nbd, QEMU, qemu-block, libguestfs
  Cc: Vladimir Sementsov-Ogievskiy, Alberto Garcia, Richard W.M. Jones,
	Max Reitz

I will be following up to this email with four separate threads each 
addressed to the appropriate single list, with proposed changes to:
- the NBD protocol
- qemu: both server and client
- libnbd: client
- nbdkit: server

The feature in question adds a new optional NBD_INFO_ packet to the 
NBD_OPT_GO portion of handshake, adding up to 16 bits of information 
that the server can advertise to the client at connection time about any 
known initial state of the export [review of this series may propose 
slight changes, such as using 32 bits; but hopefully, by having all four 
series posted in tandem, it becomes easier to see whether any such 
tweaks are warranted and to keep them interoperable before any of the 
projects land the series upstream].  For now, only 2 of those 16 bits 
are defined: NBD_INIT_SPARSE (the image has at least one hole) and 
NBD_INIT_ZERO (the image reads completely as zero); the two bits are 
orthogonal and can be set independently, although it is easy enough to 
see completely sparse files with both bits set.  Also, advertising the 
bits is orthogonal to whether the base:allocation metacontext is used, 
although a server with all possible extensions is likely to have the two 
concepts match one another.
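To illustrate the proposed bit layout (a Python sketch only, not code
from any of the four series; the constant values mirror this proposal
and may change during review), a client could decode the 16-bit field
along these lines:

```python
# Proposed NBD_INFO_INIT_STATE bits (values from this proposal; they
# may change if review settles on a different layout).
NBD_INIT_SPARSE = 1 << 0  # export has at least one hole
NBD_INIT_ZERO = 1 << 1    # export reads completely as zeroes

def decode_init_state(init_state):
    """Return the names of the init-state bits that are set."""
    names = []
    if init_state & NBD_INIT_SPARSE:
        names.append("sparse")
    if init_state & NBD_INIT_ZERO:
        names.append("zero")
    return names
```

A completely sparse file would typically report both bits, so
decode_init_state(3) yields ['sparse', 'zero'].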

The new bits are added as an information chunk rather than as runtime 
flags; this is because the intended consumer of this information is an 
operation like copying a sparse image into an NBD server destination. 
Such a client only cares at initialization if it needs to perform a 
pre-zeroing pass or if it can rely on the destination already reading as 
zero.  Once the client starts making modifications, burdening the server 
with the ability to do a live runtime probe of current reads-as-zero 
state does not help the client, and burning per-export flags for 
something that quickly goes stale on the first edit was not thought to 
be wise.  Similarly, adding a new NBD_CMD did not seem worthwhile.

The existing 'qemu-img convert source... nbd://...' is the first command 
line example that can benefit from the new information; the goal of 
adding a protocol extension was to make this benefit automatic, where 
possible, without the user having to specify the proposed --target-is-zero. 
I have a similar thread pending for qemu which adds similar 
known-reads-zero information to qcow2 files:
https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg08075.html

That qemu series is at v1; based on the review it has received so far, 
it will need some interface changes for v2, which means the qemu series 
here will need a slight rebase.  But I'm posting this series to all 
lists now to at least demonstrate what is possible when we have better 
startup information.

Note that with this new bit, it is possible to learn if a destination is 
sparse as part of NBD_OPT_GO rather than having to use block-status 
commands.  With existing block-status commands, you can use an O(n) scan 
of block-status to learn if an image reads as all zeroes (or 
short-circuit in O(1) time if the first offset is reported as probable 
data rather than reading as zero); but with this new bit, the answer is 
O(1).  So even with Vladimir's recent change to make the spec permit 4G 
block-status even when max block size is 32M, or the proposed work to 
add 64-bit block-status, you still end up with more on-the-wire traffic 
for block-status to learn if an image is all zeroes than if the server 
just advertises this bit.  But by keeping both extensions orthogonal, a 
server can implement whichever one or both reporting methods it finds 
easiest, and a client can work with whatever a server supplies with sane 
fallbacks when the server lacks either extension.  Conversely, 
block-status tracks live changes to the image, while this bit is only 
valid at connection time.
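In pseudo-client terms (a hypothetical helper, not code from any of the 
series), the fallback logic for a copy destination looks like:

```python
NBD_INIT_ZERO = 1 << 1  # value from this proposal

def destination_reads_as_zero(init_state, block_status_scan):
    """Decide whether a pre-zeroing pass can be skipped.

    init_state: the 16-bit NBD_INFO_INIT_STATE value, or None when the
    server did not send the info chunk.
    block_status_scan: callable implementing the O(n) block-status
    fallback; returns True if every extent reads as zero.
    """
    if init_state is not None and init_state & NBD_INIT_ZERO:
        return True  # server advertised known-zero: O(1), no extra traffic
    # Absence of the bit only means the server made no promise, so
    # fall back to scanning block-status over the wire.
    return block_status_scan()
```

The scan callable stands in for whatever block-status loop the client 
already has; the point is only that the new bit lets it be skipped.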

My repo for each of the four projects contains a tag 'nbd-init-v1':
  https://repo.or.cz/nbd/ericb.git/shortlog/refs/tags/nbd-init-v1
  https://repo.or.cz/qemu/ericb.git/shortlog/refs/tags/nbd-init-v1
  https://repo.or.cz/libnbd/ericb.git/shortlog/refs/tags/nbd-init-v1
  https://repo.or.cz/nbdkit/ericb.git/shortlog/refs/tags/nbd-init-v1

For doing interoperability testing, I find it handy to use:

PATH=/path/to/built/qemu:/path/to/built/nbdkit:$PATH
/path/to/libnbd/run your command here

to pick up just-built qemu-nbd, nbdsh, and nbdkit that all support the 
feature.

For quickly setting flags:
nbdkit eval init_sparse='exit 0' init_zero='exit 0' ...

For quickly checking flags:
qemu-nbd --list ... | grep init
nbdsh -u uri... -c 'print(h.get_init_flags())'

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [qemu PATCH 0/3] NBD_INFO_INIT_STATE extension
  2020-02-10 21:37 Cross-project NBD extension proposal: NBD_INFO_INIT_STATE Eric Blake
@ 2020-02-10 21:41 ` Eric Blake
  2020-02-10 21:41   ` [PATCH 1/3] nbd: Preparation for NBD_INFO_INIT_STATE Eric Blake
                     ` (4 more replies)
  2020-02-10 22:12 ` Cross-project NBD extension proposal: NBD_INFO_INIT_STATE Richard W.M. Jones
  2020-02-17 15:13 ` Max Reitz
  2 siblings, 5 replies; 19+ messages in thread
From: Eric Blake @ 2020-02-10 21:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: vsementsov, rjones, qemu-block, mreitz

See the cross-posted cover letter for more details.

Based-on: https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg08075.html

Eric Blake (3):
  nbd: Preparation for NBD_INFO_INIT_STATE
  nbd: Add .bdrv_known_zeroes() client support
  nbd: Add .bdrv_known_zeroes() server support

 block/block-backend.c          |  9 +++++++++
 block/nbd.c                    | 15 +++++++++++++++
 docs/interop/nbd.txt           |  1 +
 include/block/nbd.h            | 13 +++++++++++++
 include/sysemu/block-backend.h |  1 +
 nbd/client.c                   | 24 ++++++++++++++++++++----
 nbd/common.c                   |  2 ++
 nbd/server.c                   | 11 +++++++++++
 nbd/trace-events               |  1 +
 qemu-nbd.c                     | 13 +++++++++++++
 tests/qemu-iotests/223.out     |  4 ++++
 tests/qemu-iotests/233.out     |  1 +
 12 files changed, 91 insertions(+), 4 deletions(-)

-- 
2.24.1




* [PATCH 1/3] nbd: Preparation for NBD_INFO_INIT_STATE
  2020-02-10 21:41 ` [qemu PATCH 0/3] NBD_INFO_INIT_STATE extension Eric Blake
@ 2020-02-10 21:41   ` Eric Blake
  2020-02-10 21:41   ` [PATCH 2/3] nbd: Add .bdrv_known_zeroes() client support Eric Blake
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 19+ messages in thread
From: Eric Blake @ 2020-02-10 21:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, vsementsov, rjones, qemu-block, mreitz

Declare the constants being proposed for an NBD extension, which will
let qemu advertise/learn if an image began life with all zeroes.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 docs/interop/nbd.txt | 1 +
 include/block/nbd.h  | 9 +++++++++
 nbd/common.c         | 2 ++
 3 files changed, 12 insertions(+)

diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
index 45118809618e..35ba85367153 100644
--- a/docs/interop/nbd.txt
+++ b/docs/interop/nbd.txt
@@ -55,3 +55,4 @@ the operation of that feature.
 NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
 * 4.2: NBD_FLAG_CAN_MULTI_CONN for sharable read-only exports,
 NBD_CMD_FLAG_FAST_ZERO
+* 5.0: NBD_INFO_INIT_STATE
diff --git a/include/block/nbd.h b/include/block/nbd.h
index 7f46932d80f1..0de020904a37 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -200,6 +200,15 @@ enum {
 #define NBD_INFO_NAME           1
 #define NBD_INFO_DESCRIPTION    2
 #define NBD_INFO_BLOCK_SIZE     3
+#define NBD_INFO_INIT_STATE     4
+
+/* Initial state bits, when replying to NBD_INFO_INIT_STATE */
+enum {
+    NBD_INIT_SPARSE_BIT       = 0,
+    NBD_INIT_ZERO_BIT         = 1,
+};
+#define NBD_INIT_SPARSE         (1 << NBD_INIT_SPARSE_BIT)
+#define NBD_INIT_ZERO           (1 << NBD_INIT_ZERO_BIT)

 /* Request flags, sent from client to server during transmission phase */
 #define NBD_CMD_FLAG_FUA        (1 << 0) /* 'force unit access' during write */
diff --git a/nbd/common.c b/nbd/common.c
index ddfe7d118371..3e24feb0d502 100644
--- a/nbd/common.c
+++ b/nbd/common.c
@@ -129,6 +129,8 @@ const char *nbd_info_lookup(uint16_t info)
         return "description";
     case NBD_INFO_BLOCK_SIZE:
         return "block size";
+    case NBD_INFO_INIT_STATE:
+        return "init state";
     default:
         return "<unknown>";
     }
-- 
2.24.1




* [PATCH 2/3] nbd: Add .bdrv_known_zeroes() client support
  2020-02-10 21:41 ` [qemu PATCH 0/3] NBD_INFO_INIT_STATE extension Eric Blake
  2020-02-10 21:41   ` [PATCH 1/3] nbd: Preparation for NBD_INFO_INIT_STATE Eric Blake
@ 2020-02-10 21:41   ` Eric Blake
  2020-02-10 21:41   ` [PATCH 3/3] nbd: Add .bdrv_known_zeroes() server support Eric Blake
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 19+ messages in thread
From: Eric Blake @ 2020-02-10 21:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, vsementsov, rjones, qemu-block, mreitz

Using the new NBD extension of NBD_INFO_INIT_STATE, we can pass on the
information when a server reports that an image initially reads as all
zeroes.  The server information is treated as stale the moment we
request a write operation, even across reconnections to the server,
which is fine since our intended usage of BDRV_ZERO_OPEN is to
optimize qemu-img at startup, and not something relied on during later
image use.

Update iotests to reflect improved output of 'qemu-nbd --list'.

As NBD still cannot create or resize images, we don't need to worry
about BDRV_ZERO_CREATE or BDRV_ZERO_TRUNCATE.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/nbd.c                | 15 +++++++++++++++
 include/block/nbd.h        |  4 ++++
 nbd/client.c               | 24 ++++++++++++++++++++----
 nbd/trace-events           |  1 +
 qemu-nbd.c                 | 13 +++++++++++++
 tests/qemu-iotests/223.out |  4 ++++
 tests/qemu-iotests/233.out |  1 +
 7 files changed, 58 insertions(+), 4 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index d085554f21ea..2e1fbd6152f6 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -1204,6 +1204,7 @@ static int nbd_client_co_pwritev(BlockDriverState *bs, uint64_t offset,
     };

     assert(!(s->info.flags & NBD_FLAG_READ_ONLY));
+    s->info.modified = true;
     if (flags & BDRV_REQ_FUA) {
         assert(s->info.flags & NBD_FLAG_SEND_FUA);
         request.flags |= NBD_CMD_FLAG_FUA;
@@ -1276,6 +1277,7 @@ static int nbd_client_co_pdiscard(BlockDriverState *bs, int64_t offset,
     };

     assert(!(s->info.flags & NBD_FLAG_READ_ONLY));
+    s->info.modified = true;
     if (!(s->info.flags & NBD_FLAG_SEND_TRIM) || !bytes) {
         return 0;
     }
@@ -1909,6 +1911,16 @@ static int nbd_co_flush(BlockDriverState *bs)
     return nbd_client_co_flush(bs);
 }

+static int nbd_known_zeroes(BlockDriverState *bs)
+{
+    BDRVNBDState *s = bs->opaque;
+
+    if (!s->info.modified && s->info.init_state & NBD_INIT_ZERO) {
+        return BDRV_ZERO_OPEN;
+    }
+    return 0;
+}
+
 static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
 {
     BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
@@ -2027,6 +2039,7 @@ static BlockDriver bdrv_nbd = {
     .bdrv_close                 = nbd_close,
     .bdrv_co_flush_to_os        = nbd_co_flush,
     .bdrv_co_pdiscard           = nbd_client_co_pdiscard,
+    .bdrv_known_zeroes          = nbd_known_zeroes,
     .bdrv_refresh_limits        = nbd_refresh_limits,
     .bdrv_getlength             = nbd_getlength,
     .bdrv_detach_aio_context    = nbd_client_detach_aio_context,
@@ -2052,6 +2065,7 @@ static BlockDriver bdrv_nbd_tcp = {
     .bdrv_close                 = nbd_close,
     .bdrv_co_flush_to_os        = nbd_co_flush,
     .bdrv_co_pdiscard           = nbd_client_co_pdiscard,
+    .bdrv_known_zeroes          = nbd_known_zeroes,
     .bdrv_refresh_limits        = nbd_refresh_limits,
     .bdrv_getlength             = nbd_getlength,
     .bdrv_detach_aio_context    = nbd_client_detach_aio_context,
@@ -2077,6 +2091,7 @@ static BlockDriver bdrv_nbd_unix = {
     .bdrv_close                 = nbd_close,
     .bdrv_co_flush_to_os        = nbd_co_flush,
     .bdrv_co_pdiscard           = nbd_client_co_pdiscard,
+    .bdrv_known_zeroes          = nbd_known_zeroes,
     .bdrv_refresh_limits        = nbd_refresh_limits,
     .bdrv_getlength             = nbd_getlength,
     .bdrv_detach_aio_context    = nbd_client_detach_aio_context,
diff --git a/include/block/nbd.h b/include/block/nbd.h
index 0de020904a37..5103053bed49 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -307,6 +307,7 @@ struct NBDExportInfo {
     uint32_t min_block;
     uint32_t opt_block;
     uint32_t max_block;
+    uint16_t init_state;

     uint32_t context_id;

@@ -314,6 +315,9 @@ struct NBDExportInfo {
     char *description;
     int n_contexts;
     char **contexts;
+
+    /* Set during runtime to track if init_state is still trustworthy. */
+    bool modified;
 };
 typedef struct NBDExportInfo NBDExportInfo;

diff --git a/nbd/client.c b/nbd/client.c
index ba173108baab..199a8a2bc49e 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -350,16 +350,17 @@ static int nbd_opt_info_or_go(QIOChannel *ioc, uint32_t opt,

     assert(opt == NBD_OPT_GO || opt == NBD_OPT_INFO);
     trace_nbd_opt_info_go_start(nbd_opt_lookup(opt), info->name);
-    buf = g_malloc(4 + len + 2 + 2 * info->request_sizes + 1);
+    buf = g_malloc(4 + len + 2 + 2 * (info->request_sizes + 1) + 1);
     stl_be_p(buf, len);
     memcpy(buf + 4, info->name, len);
-    /* At most one request, everything else up to server */
-    stw_be_p(buf + 4 + len, info->request_sizes);
+    /* One or two requests, everything else up to server */
+    stw_be_p(buf + 4 + len, info->request_sizes + 1);
     if (info->request_sizes) {
         stw_be_p(buf + 4 + len + 2, NBD_INFO_BLOCK_SIZE);
     }
+    stw_be_p(buf + 4 + len + 2 + 2 * info->request_sizes, NBD_INFO_INIT_STATE);
     error = nbd_send_option_request(ioc, opt,
-                                    4 + len + 2 + 2 * info->request_sizes,
+                                    4 + len + 2 + 2 * (info->request_sizes + 1),
                                     buf, errp);
     g_free(buf);
     if (error < 0) {
@@ -484,6 +485,21 @@ static int nbd_opt_info_or_go(QIOChannel *ioc, uint32_t opt,
                                           info->max_block);
             break;

+        case NBD_INFO_INIT_STATE:
+            if (len != sizeof(info->init_state)) {
+                error_setg(errp, "remaining export info len %" PRIu32
+                           " is unexpected size", len);
+                nbd_send_opt_abort(ioc);
+                return -1;
+            }
+            if (nbd_read16(ioc, &info->init_state, "info init state",
+                           errp) < 0) {
+                nbd_send_opt_abort(ioc);
+                return -1;
+            }
+            trace_nbd_opt_info_init_state(info->init_state);
+            break;
+
         default:
             /*
              * Not worth the bother to check if NBD_INFO_NAME or
diff --git a/nbd/trace-events b/nbd/trace-events
index a955918e9707..12589b2afb84 100644
--- a/nbd/trace-events
+++ b/nbd/trace-events
@@ -10,6 +10,7 @@ nbd_opt_info_go_start(const char *opt, const char *name) "Attempting %s for expo
 nbd_opt_info_go_success(const char *opt) "Export is ready after %s request"
 nbd_opt_info_unknown(int info, const char *name) "Ignoring unknown info %d (%s)"
 nbd_opt_info_block_size(uint32_t minimum, uint32_t preferred, uint32_t maximum) "Block sizes are 0x%" PRIx32 ", 0x%" PRIx32 ", 0x%" PRIx32
+nbd_opt_info_init_state(unsigned int flags) "Initial state flags 0x%x"
 nbd_receive_query_exports_start(const char *wantname) "Querying export list for '%s'"
 nbd_receive_query_exports_success(const char *wantname) "Found desired export name '%s'"
 nbd_receive_starttls_new_client(void) "Setting up TLS"
diff --git a/qemu-nbd.c b/qemu-nbd.c
index 4aa005004ebd..856df85823bc 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -220,6 +220,19 @@ static int qemu_nbd_client_list(SocketAddress *saddr, QCryptoTLSCreds *tls,
             printf("  opt block: %u\n", list[i].opt_block);
             printf("  max block: %u\n", list[i].max_block);
         }
+        {
+            static const char *const init_names[] = {
+                [NBD_INIT_SPARSE_BIT]            = "sparse",
+                [NBD_INIT_ZERO_BIT]              = "zero",
+            };
+            printf("  init state: 0x%x (", list[i].init_state);
+            for (size_t bit = 0; bit < ARRAY_SIZE(init_names); bit++) {
+                if (init_names[bit] && (list[i].init_state & (1 << bit))) {
+                    printf(" %s", init_names[bit]);
+                }
+            }
+            printf(" )\n");
+        }
         if (list[i].n_contexts) {
             printf("  available meta contexts: %d\n", list[i].n_contexts);
             for (j = 0; j < list[i].n_contexts; j++) {
diff --git a/tests/qemu-iotests/223.out b/tests/qemu-iotests/223.out
index 80c0cf65095b..ce7945aa7cf6 100644
--- a/tests/qemu-iotests/223.out
+++ b/tests/qemu-iotests/223.out
@@ -59,6 +59,7 @@ exports available: 2
   min block: 1
   opt block: 4096
   max block: 33554432
+  init state: 0x0 ( )
   available meta contexts: 2
    base:allocation
    qemu:dirty-bitmap:b
@@ -69,6 +70,7 @@ exports available: 2
   min block: 1
   opt block: 4096
   max block: 33554432
+  init state: 0x0 ( )
   available meta contexts: 2
    base:allocation
    qemu:dirty-bitmap:b2
@@ -140,6 +142,7 @@ exports available: 2
   min block: 1
   opt block: 4096
   max block: 33554432
+  init state: 0x0 ( )
   available meta contexts: 2
    base:allocation
    qemu:dirty-bitmap:b
@@ -150,6 +153,7 @@ exports available: 2
   min block: 1
   opt block: 4096
   max block: 33554432
+  init state: 0x0 ( )
   available meta contexts: 2
    base:allocation
    qemu:dirty-bitmap:b2
diff --git a/tests/qemu-iotests/233.out b/tests/qemu-iotests/233.out
index c3c344811b2b..5be30d6b7c9c 100644
--- a/tests/qemu-iotests/233.out
+++ b/tests/qemu-iotests/233.out
@@ -43,6 +43,7 @@ exports available: 1
   min block: 1
   opt block: 4096
   max block: 33554432
+  init state: 0x0 ( )
   available meta contexts: 1
    base:allocation

-- 
2.24.1




* [PATCH 3/3] nbd: Add .bdrv_known_zeroes() server support
  2020-02-10 21:41 ` [qemu PATCH 0/3] NBD_INFO_INIT_STATE extension Eric Blake
  2020-02-10 21:41   ` [PATCH 1/3] nbd: Preparation for NBD_INFO_INIT_STATE Eric Blake
  2020-02-10 21:41   ` [PATCH 2/3] nbd: Add .bdrv_known_zeroes() client support Eric Blake
@ 2020-02-10 21:41   ` Eric Blake
  2020-02-10 21:51   ` [qemu PATCH 0/3] NBD_INFO_INIT_STATE extension no-reply
  2020-02-10 21:53   ` no-reply
  4 siblings, 0 replies; 19+ messages in thread
From: Eric Blake @ 2020-02-10 21:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, vsementsov, rjones, qemu-block, mreitz

Using the new NBD extension of NBD_INFO_INIT_STATE, we can advertise
at least the NBD_INIT_ZERO bit based on what the block layer already
knows.  Advertising NBD_INIT_SPARSE might also be possible by
inspecting blk_probe_blocksizes, but as the block layer does not
consume that information at present, the effort of advertising it for
a third party is less important.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/block-backend.c          |  9 +++++++++
 include/sysemu/block-backend.h |  1 +
 nbd/server.c                   | 11 +++++++++++
 3 files changed, 21 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 8b8f2a80a0d5..d7e01f2e67de 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2127,6 +2127,15 @@ int blk_load_vmstate(BlockBackend *blk, uint8_t *buf, int64_t pos, int size)
     return bdrv_load_vmstate(blk_bs(blk), buf, pos, size);
 }

+int blk_known_zeroes(BlockBackend *blk)
+{
+    if (!blk_is_available(blk)) {
+        return 0;
+    }
+
+    return bdrv_known_zeroes(blk_bs(blk));
+}
+
 int blk_probe_blocksizes(BlockBackend *blk, BlockSizes *bsz)
 {
     if (!blk_is_available(blk)) {
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index b198deca0b24..2a9b750bb775 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -245,6 +245,7 @@ int blk_save_vmstate(BlockBackend *blk, const uint8_t *buf,
 int blk_load_vmstate(BlockBackend *blk, uint8_t *buf, int64_t pos, int size);
 int blk_probe_blocksizes(BlockBackend *blk, BlockSizes *bsz);
 int blk_probe_geometry(BlockBackend *blk, HDGeometry *geo);
+int blk_known_zeroes(BlockBackend *blk);
 BlockAIOCB *blk_abort_aio_request(BlockBackend *blk,
                                   BlockCompletionFunc *cb,
                                   void *opaque, int ret);
diff --git a/nbd/server.c b/nbd/server.c
index 11a31094ff83..f6bb7d944daa 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -661,6 +661,17 @@ static int nbd_negotiate_handle_info(NBDClient *client, Error **errp)
         return rc;
     }

+    /* Send NBD_INFO_INIT_STATE always */
+    trace_nbd_negotiate_new_style_size_flags(exp->size, myflags);
+    /* Is it worth using blk_probe_blocksizes for setting NBD_INIT_SPARSE? */
+    stw_be_p(buf, ((blk_known_zeroes(exp->blk) & BDRV_ZERO_OPEN)
+                   ? NBD_INIT_ZERO : 0));
+    rc = nbd_negotiate_send_info(client, NBD_INFO_INIT_STATE,
+                                 sizeof(uint16_t), buf, errp);
+    if (rc < 0) {
+        return rc;
+    }
+
     /*
      * If the client is just asking for NBD_OPT_INFO, but forgot to
      * request block sizes in a situation that would impact
-- 
2.24.1




* Re: [qemu PATCH 0/3] NBD_INFO_INIT_STATE extension
  2020-02-10 21:41 ` [qemu PATCH 0/3] NBD_INFO_INIT_STATE extension Eric Blake
                     ` (2 preceding siblings ...)
  2020-02-10 21:41   ` [PATCH 3/3] nbd: Add .bdrv_known_zeroes() server support Eric Blake
@ 2020-02-10 21:51   ` no-reply
  2020-02-10 21:54     ` Eric Blake
  2020-02-10 21:53   ` no-reply
  4 siblings, 1 reply; 19+ messages in thread
From: no-reply @ 2020-02-10 21:51 UTC (permalink / raw)
  To: eblake; +Cc: mreitz, vsementsov, qemu-devel, qemu-block, rjones

Patchew URL: https://patchew.org/QEMU/20200210214109.751734-1-eblake@redhat.com/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#! /bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-mingw@fedora J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC      block/block-copy.o
  CC      block/crypto.o
/tmp/qemu-test/src/block/block-backend.c: In function 'blk_known_zeroes':
/tmp/qemu-test/src/block/block-backend.c:2136:12: error: implicit declaration of function 'bdrv_known_zeroes'; did you mean 'blk_known_zeroes'? [-Werror=implicit-function-declaration]
     return bdrv_known_zeroes(blk_bs(blk));
            ^~~~~~~~~~~~~~~~~
            blk_known_zeroes
/tmp/qemu-test/src/block/block-backend.c:2136:12: error: nested extern declaration of 'bdrv_known_zeroes' [-Werror=nested-externs]
cc1: all warnings being treated as errors
make: *** [/tmp/qemu-test/src/rules.mak:69: block/block-backend.o] Error 1
make: *** Waiting for unfinished jobs....
  CC      block/aio_task.o
/tmp/qemu-test/src/block/nbd.c: In function 'nbd_known_zeroes':
/tmp/qemu-test/src/block/nbd.c:1919:16: error: 'BDRV_ZERO_OPEN' undeclared (first use in this function); did you mean 'BDRV_REQ_MASK'?
         return BDRV_ZERO_OPEN;
                ^~~~~~~~~~~~~~
                BDRV_REQ_MASK
/tmp/qemu-test/src/block/nbd.c:1919:16: note: each undeclared identifier is reported only once for each function it appears in
/tmp/qemu-test/src/block/nbd.c: At top level:
/tmp/qemu-test/src/block/nbd.c:2042:6: error: 'BlockDriver' {aka 'struct BlockDriver'} has no member named 'bdrv_known_zeroes'; did you mean 'bdrv_join_options'?
     .bdrv_known_zeroes          = nbd_known_zeroes,
      ^~~~~~~~~~~~~~~~~
      bdrv_join_options
/tmp/qemu-test/src/block/nbd.c:2042:35: error: initialization of 'int (*)(BlockDriverState *, BdrvChild *, uint64_t,  BdrvChild *, uint64_t,  uint64_t,  BdrvRequestFlags,  BdrvRequestFlags)' {aka 'int (*)(struct BlockDriverState *, struct BdrvChild *, long long unsigned int,  struct BdrvChild *, long long unsigned int,  long long unsigned int,  enum <anonymous>,  enum <anonymous>)'} from incompatible pointer type 'int (*)(BlockDriverState *)' {aka 'int (*)(struct BlockDriverState *)'} [-Werror=incompatible-pointer-types]
     .bdrv_known_zeroes          = nbd_known_zeroes,
                                   ^~~~~~~~~~~~~~~~
/tmp/qemu-test/src/block/nbd.c:2042:35: note: (near initialization for 'bdrv_nbd.bdrv_co_copy_range_from')
/tmp/qemu-test/src/block/nbd.c:2068:6: error: 'BlockDriver' {aka 'struct BlockDriver'} has no member named 'bdrv_known_zeroes'; did you mean 'bdrv_join_options'?
     .bdrv_known_zeroes          = nbd_known_zeroes,
      ^~~~~~~~~~~~~~~~~
      bdrv_join_options
/tmp/qemu-test/src/block/nbd.c:2068:35: error: initialization of 'int (*)(BlockDriverState *, BdrvChild *, uint64_t,  BdrvChild *, uint64_t,  uint64_t,  BdrvRequestFlags,  BdrvRequestFlags)' {aka 'int (*)(struct BlockDriverState *, struct BdrvChild *, long long unsigned int,  struct BdrvChild *, long long unsigned int,  long long unsigned int,  enum <anonymous>,  enum <anonymous>)'} from incompatible pointer type 'int (*)(BlockDriverState *)' {aka 'int (*)(struct BlockDriverState *)'} [-Werror=incompatible-pointer-types]
     .bdrv_known_zeroes          = nbd_known_zeroes,
                                   ^~~~~~~~~~~~~~~~
/tmp/qemu-test/src/block/nbd.c:2068:35: note: (near initialization for 'bdrv_nbd_tcp.bdrv_co_copy_range_from')
/tmp/qemu-test/src/block/nbd.c:2094:6: error: 'BlockDriver' {aka 'struct BlockDriver'} has no member named 'bdrv_known_zeroes'; did you mean 'bdrv_join_options'?
     .bdrv_known_zeroes          = nbd_known_zeroes,
      ^~~~~~~~~~~~~~~~~
      bdrv_join_options
/tmp/qemu-test/src/block/nbd.c:2094:35: error: initialization of 'int (*)(BlockDriverState *, BdrvChild *, uint64_t,  BdrvChild *, uint64_t,  uint64_t,  BdrvRequestFlags,  BdrvRequestFlags)' {aka 'int (*)(struct BlockDriverState *, struct BdrvChild *, long long unsigned int,  struct BdrvChild *, long long unsigned int,  long long unsigned int,  enum <anonymous>,  enum <anonymous>)'} from incompatible pointer type 'int (*)(BlockDriverState *)' {aka 'int (*)(struct BlockDriverState *)'} [-Werror=incompatible-pointer-types]
     .bdrv_known_zeroes          = nbd_known_zeroes,
                                   ^~~~~~~~~~~~~~~~
/tmp/qemu-test/src/block/nbd.c:2094:35: note: (near initialization for 'bdrv_nbd_unix.bdrv_co_copy_range_from')
cc1: all warnings being treated as errors
make: *** [/tmp/qemu-test/src/rules.mak:69: block/nbd.o] Error 1
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 664, in <module>
    sys.exit(main())
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=811a6c8d2b1f4332b71aff13616f8b14', '-u', '1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-n_zjkark/src/docker-src.2020-02-10-16.49.18.9920:/var/tmp/qemu:z,ro', 'qemu:fedora', '/var/tmp/qemu/run', 'test-mingw']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=811a6c8d2b1f4332b71aff13616f8b14
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-n_zjkark/src'
make: *** [docker-run-test-mingw@fedora] Error 2

real    2m18.509s
user    0m7.961s


The full log is available at
http://patchew.org/logs/20200210214109.751734-1-eblake@redhat.com/testing.docker-mingw@fedora/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com


* Re: [qemu PATCH 0/3] NBD_INFO_INIT_STATE extension
  2020-02-10 21:41 ` [qemu PATCH 0/3] NBD_INFO_INIT_STATE extension Eric Blake
                     ` (3 preceding siblings ...)
  2020-02-10 21:51   ` [qemu PATCH 0/3] NBD_INFO_INIT_STATE extension no-reply
@ 2020-02-10 21:53   ` no-reply
  4 siblings, 0 replies; 19+ messages in thread
From: no-reply @ 2020-02-10 21:53 UTC (permalink / raw)
  To: eblake; +Cc: mreitz, vsementsov, qemu-devel, qemu-block, rjones

Patchew URL: https://patchew.org/QEMU/20200210214109.751734-1-eblake@redhat.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC      block/linux-aio.o
  CC      block/null.o
/tmp/qemu-test/src/block/block-backend.c: In function 'blk_known_zeroes':
/tmp/qemu-test/src/block/block-backend.c:2136:5: error: implicit declaration of function 'bdrv_known_zeroes' [-Werror=implicit-function-declaration]
     return bdrv_known_zeroes(blk_bs(blk));
     ^
/tmp/qemu-test/src/block/block-backend.c:2136:5: error: nested extern declaration of 'bdrv_known_zeroes' [-Werror=nested-externs]
cc1: all warnings being treated as errors
make: *** [block/block-backend.o] Error 1
make: *** Waiting for unfinished jobs....
rm tests/qemu-iotests/socket_scm_helper.o
Traceback (most recent call last):
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=61fa79588f1947f192503fe5de848280', '-u', '1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-ags5e8nv/src/docker-src.2020-02-10-16.52.09.15629:/var/tmp/qemu:z,ro', 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=61fa79588f1947f192503fe5de848280
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-ags5e8nv/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    1m38.251s
user    0m8.315s


The full log is available at
http://patchew.org/logs/20200210214109.751734-1-eblake@redhat.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [qemu PATCH 0/3] NBD_INFO_INIT_STATE extension
  2020-02-10 21:51   ` [qemu PATCH 0/3] NBD_INFO_INIT_STATE extension no-reply
@ 2020-02-10 21:54     ` Eric Blake
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Blake @ 2020-02-10 21:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: vsementsov, rjones, qemu-block, mreitz

On 2/10/20 3:51 PM, no-reply@patchew.org wrote:
> Patchew URL: https://patchew.org/QEMU/20200210214109.751734-1-eblake@redhat.com/
> 
> 
> 
> Hi,
> 
> This series failed the docker-mingw@fedora build test. Please find the testing commands and
> their output below. If you have Docker installed, you can probably reproduce it
> locally.
> 
> === TEST SCRIPT BEGIN ===
> #! /bin/bash
> export ARCH=x86_64
> make docker-image-fedora V=1 NETWORK=1
> time make docker-test-mingw@fedora J=14 NETWORK=1
> === TEST SCRIPT END ===
> 
>    CC      block/block-copy.o
>    CC      block/crypto.o
> /tmp/qemu-test/src/block/block-backend.c: In function 'blk_known_zeroes':
> /tmp/qemu-test/src/block/block-backend.c:2136:12: error: implicit declaration of function 'bdrv_known_zeroes'; did you mean 'blk_known_zeroes'? [-Werror=implicit-function-declaration]
>       return bdrv_known_zeroes(blk_bs(blk));
>              ^~~~~~~~~~~~~~~~~

Patchew didn't find my Based-on tag in 0/3 (maybe because it wasn't the 
actual cover letter?)

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: Cross-project NBD extension proposal: NBD_INFO_INIT_STATE
  2020-02-10 21:37 Cross-project NBD extension proposal: NBD_INFO_INIT_STATE Eric Blake
  2020-02-10 21:41 ` [qemu PATCH 0/3] NBD_INFO_INIT_STATE extension Eric Blake
@ 2020-02-10 22:12 ` Richard W.M. Jones
  2020-02-10 22:29   ` Eric Blake
  2020-02-17 15:13 ` Max Reitz
  2 siblings, 1 reply; 19+ messages in thread
From: Richard W.M. Jones @ 2020-02-10 22:12 UTC (permalink / raw)
  To: Eric Blake
  Cc: Vladimir Sementsov-Ogievskiy, Alberto Garcia, qemu-block, QEMU,
	Max Reitz, nbd, libguestfs

On Mon, Feb 10, 2020 at 03:37:20PM -0600, Eric Blake wrote:
> For now, only 2 of those 16 bits are defined: NBD_INIT_SPARSE (the
> image has at least one hole) and NBD_INIT_ZERO (the image reads
> completely as zero); the two bits are orthogonal and can be set
> independently, although it is easy enough to see completely sparse
> files with both bits set.

I think I'm confused about the exact meaning of NBD_INIT_SPARSE.  Do
you really mean the whole image is sparse; or (as you seem to have
said above) that there exists a hole somewhere in the image but we're
not saying where it is and there can be non-sparse parts of the image?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v




* Re: Cross-project NBD extension proposal: NBD_INFO_INIT_STATE
  2020-02-10 22:12 ` Cross-project NBD extension proposal: NBD_INFO_INIT_STATE Richard W.M. Jones
@ 2020-02-10 22:29   ` Eric Blake
  2020-02-10 22:52     ` Richard W.M. Jones
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Blake @ 2020-02-10 22:29 UTC (permalink / raw)
  To: Richard W.M. Jones
  Cc: Vladimir Sementsov-Ogievskiy, Alberto Garcia, qemu-block, QEMU,
	Max Reitz, nbd, libguestfs

On 2/10/20 4:12 PM, Richard W.M. Jones wrote:
> On Mon, Feb 10, 2020 at 03:37:20PM -0600, Eric Blake wrote:
>> For now, only 2 of those 16 bits are defined: NBD_INIT_SPARSE (the
>> image has at least one hole) and NBD_INIT_ZERO (the image reads
>> completely as zero); the two bits are orthogonal and can be set
>> independently, although it is easy enough to see completely sparse
>> files with both bits set.
> 
> I think I'm confused about the exact meaning of NBD_INIT_SPARSE.  Do
> you really mean the whole image is sparse; or (as you seem to have
> said above) that there exists a hole somewhere in the image but we're
> not saying where it is and there can be non-sparse parts of the image?

As implemented:

NBD_INIT_SPARSE - there is at least one hole somewhere (allocation would 
be required to write to that part of the file), but there may be
allocated data elsewhere in the image.  Most disk images will fit this 
definition (for example, it is very common to have a hole between the 
MBR or GPT and the first partition containing a file system, or for file 
systems themselves to be sparse within the larger block device).

NBD_INIT_ZERO - all bytes read as zero.

The combination NBD_INIT_SPARSE|NBD_INIT_ZERO is common (generally, if 
you use lseek(SEEK_DATA) to prove the entire image reads as zeroes, you 
also know the entire image is sparse), but NBD_INIT_ZERO in isolation is 
also possible (especially with the qcow2 proposal of a persistent 
autoclear bit, where even with a fully preallocated qcow2 image you 
still know it reads as zeroes but there are no holes).  But you are also 
right that for servers that can advertise both bits efficiently, 
NBD_INIT_SPARSE in isolation may be more common than 
NBD_INIT_SPARSE|NBD_INIT_ZERO (the former for most disk images, the 
latter only for a freshly-created image that happens to be created 
with zero initialization).
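
As an aside, a minimal sketch of how a server backed by a raw file 
might derive both bits with lseek(), assuming the filesystem reports 
holes via SEEK_DATA/SEEK_HOLE (the helper name is illustrative, not 
taken from any of the posted patches):

```c
#define _GNU_SOURCE             /* SEEK_DATA / SEEK_HOLE on glibc */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Derive the two bits for a raw file: INIT_ZERO if no data extent
 * exists anywhere, INIT_SPARSE if the first hole starts before EOF. */
static void probe_init_state(int fd, bool *sparse, bool *zero)
{
    off_t end = lseek(fd, 0, SEEK_END);
    off_t data = lseek(fd, 0, SEEK_DATA);

    if (data < 0 && errno == ENXIO) {
        /* No data at all: one big hole, so both bits apply. */
        *sparse = true;
        *zero = true;
        return;
    }
    *zero = false;                        /* some data exists somewhere */
    /* SEEK_HOLE reports EOF as a virtual hole, so compare against end. */
    *sparse = lseek(fd, 0, SEEK_HOLE) < end;
}
```

A fully written file yields neither bit; a ftruncate()d but 
never-written file yields both, matching the SPARSE|ZERO combination 
described above.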

What's more, in my patches, I did NOT patch qemu to set or consume 
INIT_SPARSE; so far, it only sets/consumes INIT_ZERO.  Of course, if we 
can find a reason WHY qemu should track whether a qcow2 image is 
fully-allocated, by demonstrating a qemu-img algorithm that becomes 
easier when we know whether an image is sparse (even if our justification is: 
"when copying an image, I want to know if the _source_ is sparse, to 
know whether I have to bend over backwards to preallocate the 
destination"), then using that in qemu makes sense for my v2 patches. 
But for v1, my only justification was "when copying an image, I can skip 
holes in the source if I know the _destination_ already reads as 
zeroes", which only needed INIT_ZERO.

Some of the nbdkit patches demonstrate the some-vs.-all nature of the 
two bits; for example, in the split plugin, I initialize h->init_sparse 
= false; h->init_zero = true; then in a loop over each file change 
h->init_sparse to true if at least one file was sparse, and change 
h->init_zero to false if at least one file had non-zero contents.
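
Condensed into pseudo-C (names illustrative, paraphrasing the split 
plugin description rather than quoting its source), that loop is just 
an any-vs.-all fold:

```c
#include <assert.h>
#include <stdbool.h>

/* SPARSE is existential (any constituent file has a hole); ZERO is
 * universal (every constituent file must read as zero). */
struct init_state {
    bool init_sparse;
    bool init_zero;
};

static struct init_state
aggregate(const bool *file_sparse, const bool *file_zero, int nfiles)
{
    struct init_state h = { .init_sparse = false, .init_zero = true };

    for (int i = 0; i < nfiles; i++) {
        if (file_sparse[i])
            h.init_sparse = true;    /* one hole anywhere is enough */
        if (!file_zero[i])
            h.init_zero = false;     /* one non-zero file spoils ZERO */
    }
    return h;
}
```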

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: Cross-project NBD extension proposal: NBD_INFO_INIT_STATE
  2020-02-10 22:29   ` Eric Blake
@ 2020-02-10 22:52     ` Richard W.M. Jones
  2020-02-11 14:33       ` Eric Blake
  2020-02-12  7:27       ` Wouter Verhelst
  0 siblings, 2 replies; 19+ messages in thread
From: Richard W.M. Jones @ 2020-02-10 22:52 UTC (permalink / raw)
  To: Eric Blake
  Cc: Vladimir Sementsov-Ogievskiy, Alberto Garcia, qemu-block, QEMU,
	Max Reitz, nbd, libguestfs

On Mon, Feb 10, 2020 at 04:29:53PM -0600, Eric Blake wrote:
> On 2/10/20 4:12 PM, Richard W.M. Jones wrote:
> >On Mon, Feb 10, 2020 at 03:37:20PM -0600, Eric Blake wrote:
> >>For now, only 2 of those 16 bits are defined: NBD_INIT_SPARSE (the
> >>image has at least one hole) and NBD_INIT_ZERO (the image reads
> >>completely as zero); the two bits are orthogonal and can be set
> >>independently, although it is easy enough to see completely sparse
> >>files with both bits set.
> >
> >I think I'm confused about the exact meaning of NBD_INIT_SPARSE.  Do
> >you really mean the whole image is sparse; or (as you seem to have
> >said above) that there exists a hole somewhere in the image but we're
> >not saying where it is and there can be non-sparse parts of the image?
> 
> As implemented:
> 
> NBD_INIT_SPARSE - there is at least one hole somewhere (allocation
> would be required to write to that part of the file), but there may
> be allocated data elsewhere in the image.  Most disk images will fit
> this definition (for example, it is very common to have a hole
> between the MBR or GPT and the first partition containing a file
> system, or for file systems themselves to be sparse within the
> larger block device).

I think I'm still confused about why this particular flag would be
useful for clients (I can completely understand why clients need
NBD_INIT_ZERO).

But anyway ... could a flag indicating that the whole image is sparse
be useful, either as well as NBD_INIT_SPARSE or instead of it?  You
could use it to avoid an initial disk trim, which is something that
mke2fs does:

  https://github.com/tytso/e2fsprogs/blob/0670fc20df4a4bbbeb0edb30d82628ea30a80598/misc/mke2fs.c#L2768

and which is painfully slow over NBD for very large devices because of
the 32 bit limit on request sizes - try doing mke2fs on a 1E nbdkit
memory disk some time.
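
For scale, a back-of-the-envelope sketch (rounding the 32-bit cap up 
to an even 4 GiB per request):

```c
#include <assert.h>
#include <stdint.h>

/* Number of trim requests needed to cover a device when each request
 * is capped by a 32-bit length field (rounded to 4 GiB here). */
static uint64_t trims_needed(uint64_t device_bytes, uint64_t max_req)
{
    return (device_bytes + max_req - 1) / max_req;   /* ceiling division */
}
```

A 1 EiB device at 4 GiB per trim comes out to 2^28, roughly 268 
million, round trips.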

> NBD_INIT_ZERO - all bytes read as zero.
> 
> The combination NBD_INIT_SPARSE|NBD_INIT_ZERO is common (generally,
> if you use lseek(SEEK_DATA) to prove the entire image reads as
> zeroes, you also know the entire image is sparse), but NBD_INIT_ZERO
> in isolation is also possible (especially with the qcow2 proposal of
> a persistent autoclear bit, where even with a fully preallocated
> qcow2 image you still know it reads as zeroes but there are no
> holes).  But you are also right that for servers that can advertise
> both bits efficiently, NBD_INIT_SPARSE in isolation may be more
> common than NBD_INIT_SPARSE|NBD_INIT_ZERO (the former for most disk
> images, the latter only for a freshly-created image that happens to
> be created with zero initialization).
> 
> What's more, in my patches, I did NOT patch qemu to set or consume
> INIT_SPARSE; so far, it only sets/consumes INIT_ZERO.  Of course, if
> we can find a reason WHY qemu should track whether a qcow2 image is
> fully-allocated, by demonstrating a qemu-img algorithm that becomes
> easier for knowing if an image is sparse (even if our justification
> is: "when copying an image, I want to know if the _source_ is
> sparse, to know whether I have to bend over backwards to preallocate
> the destination"), then using that in qemu makes sense for my v2
> patches. But for v1, my only justification was "when copying an
> image, I can skip holes in the source if I know the _destination_
> already reads as zeroes", which only needed INIT_ZERO.
> 
> Some of the nbdkit patches demonstrate the some-vs.-all nature of
> the two bits; for example, in the split plugin, I initialize
> h->init_sparse = false; h->init_zero = true; then in a loop over
> each file change h->init_sparse to true if at least one file was
> sparse, and change h->init_zero to false if at least one file had
> non-zero contents.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html




* Re: Cross-project NBD extension proposal: NBD_INFO_INIT_STATE
  2020-02-10 22:52     ` Richard W.M. Jones
@ 2020-02-11 14:33       ` Eric Blake
  2020-02-12  7:27       ` Wouter Verhelst
  1 sibling, 0 replies; 19+ messages in thread
From: Eric Blake @ 2020-02-11 14:33 UTC (permalink / raw)
  To: Richard W.M. Jones
  Cc: Vladimir Sementsov-Ogievskiy, Alberto Garcia, qemu-block, QEMU,
	Max Reitz, nbd, libguestfs

On 2/10/20 4:52 PM, Richard W.M. Jones wrote:
> On Mon, Feb 10, 2020 at 04:29:53PM -0600, Eric Blake wrote:
>> On 2/10/20 4:12 PM, Richard W.M. Jones wrote:
>>> On Mon, Feb 10, 2020 at 03:37:20PM -0600, Eric Blake wrote:
>>>> For now, only 2 of those 16 bits are defined: NBD_INIT_SPARSE (the
>>>> image has at least one hole) and NBD_INIT_ZERO (the image reads
>>>> completely as zero); the two bits are orthogonal and can be set
>>>> independently, although it is easy enough to see completely sparse
>>>> files with both bits set.
>>>
>>> I think I'm confused about the exact meaning of NBD_INIT_SPARSE.  Do
>>> you really mean the whole image is sparse; or (as you seem to have
>>> said above) that there exists a hole somewhere in the image but we're
>>> not saying where it is and there can be non-sparse parts of the image?
>>
>> As implemented:
>>
>> NBD_INIT_SPARSE - there is at least one hole somewhere (allocation
>> would be required to write to that part of the file), but there may
>> be allocated data elsewhere in the image.  Most disk images will fit
>> this definition (for example, it is very common to have a hole
>> between the MBR or GPT and the first partition containing a file
>> system, or for file systems themselves to be sparse within the
>> larger block device).
> 
> I think I'm still confused about why this particular flag would be
> useful for clients (I can completely understand why clients need
> NBD_INIT_ZERO).
> 
> But anyway ... could a flag indicating that the whole image is sparse
> be useful, either as well as NBD_INIT_SPARSE or instead of it?  You
> could use it to avoid an initial disk trim, which is something that
> mke2fs does:
> 
>    https://github.com/tytso/e2fsprogs/blob/0670fc20df4a4bbbeb0edb30d82628ea30a80598/misc/mke2fs.c#L2768

I'm open to suggestions on how many initial bits should be provided.  In 
fact, if we wanted, we could have a pair of mutually-exclusive bits, 
advertising:
00 - no information known
01 - image is completely sparse
10 - image is completely allocated
11 - error
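
A sketch of how a client might sanity-check such a pair (the names and 
bit positions here are hypothetical, not part of any posted series):

```c
#include <assert.h>

/* Hypothetical mutually-exclusive pair: both bits set is an error. */
enum {
    INIT_ALL_SPARSE    = 1 << 0,   /* image is completely sparse */
    INIT_ALL_ALLOCATED = 1 << 1,   /* image is completely allocated */
};

static int init_alloc_bits_valid(unsigned bits)
{
    unsigned pair = bits & (INIT_ALL_SPARSE | INIT_ALL_ALLOCATED);

    return pair != (INIT_ALL_SPARSE | INIT_ALL_ALLOCATED);
}
```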

The goal of providing a 16-bit answer (or we could mandate 32 or 64 
bits, if we think we will ever want to extend that far) was to make it 
easier to add whatever other initial-state extensions someone might 
find useful.  Until we're happy with the design, the size and any 
given bit assignment are not locked down; once we do start committing any 
of this series, we've locked in what interoperability will demand, but 
we still have spare bits for future extensions.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: Cross-project NBD extension proposal: NBD_INFO_INIT_STATE
  2020-02-10 22:52     ` Richard W.M. Jones
  2020-02-11 14:33       ` Eric Blake
@ 2020-02-12  7:27       ` Wouter Verhelst
  2020-02-12 12:09         ` Eric Blake
  1 sibling, 1 reply; 19+ messages in thread
From: Wouter Verhelst @ 2020-02-12  7:27 UTC (permalink / raw)
  To: Richard W.M. Jones
  Cc: Vladimir Sementsov-Ogievskiy, Alberto Garcia, qemu-block, QEMU,
	Max Reitz, nbd, libguestfs

Hi,

On Mon, Feb 10, 2020 at 10:52:55PM +0000, Richard W.M. Jones wrote:
> But anyway ... could a flag indicating that the whole image is sparse
> be useful, either as well as NBD_INIT_SPARSE or instead of it?  You
> could use it to avoid an initial disk trim, which is something that
> mke2fs does:

Yeah, I think that could definitely be useful. I honestly can't see a
use for NBD_INIT_SPARSE as defined in this proposal; and I don't think
it's generally useful to have a feature if we can't think of a use case
for it (that creates added complexity for no benefit).

If we can find a reasonable use case for NBD_INIT_SPARSE as defined in
this proposal, then just add a third bit (NBD_INIT_ALL_SPARSE or
something) that says "the whole image is sparse". Otherwise, I think we
should redefine NBD_INIT_SPARSE to say that.

-- 
To the thief who stole my anti-depressants: I hope you're happy

  -- seen somewhere on the Internet on a photo of a billboard



* Re: Cross-project NBD extension proposal: NBD_INFO_INIT_STATE
  2020-02-12  7:27       ` Wouter Verhelst
@ 2020-02-12 12:09         ` Eric Blake
  2020-02-12 12:36           ` Richard W.M. Jones
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Blake @ 2020-02-12 12:09 UTC (permalink / raw)
  To: Wouter Verhelst, Richard W.M. Jones
  Cc: Vladimir Sementsov-Ogievskiy, Alberto Garcia, qemu-block, QEMU,
	Max Reitz, nbd, libguestfs

On 2/12/20 1:27 AM, Wouter Verhelst wrote:
> Hi,
> 
> On Mon, Feb 10, 2020 at 10:52:55PM +0000, Richard W.M. Jones wrote:
>> But anyway ... could a flag indicating that the whole image is sparse
>> be useful, either as well as NBD_INIT_SPARSE or instead of it?  You
>> could use it to avoid an initial disk trim, which is something that
>> mke2fs does:
> 
> Yeah, I think that could definitely be useful. I honestly can't see a
> use for NBD_INIT_SPARSE as defined in this proposal; and I don't think
> it's generally useful to have a feature if we can't think of a use case
> for it (that creates added complexity for no benefit).
> 
> If we can find a reasonable use case for NBD_INIT_SPARSE as defined in
> this proposal, then just add a third bit (NBD_INIT_ALL_SPARSE or
> something) that says "the whole image is sparse". Otherwise, I think we
> should redefine NBD_INIT_SPARSE to say that.

Okay, in v2, I will start with just two bits, NBD_INIT_SPARSE (entire 
image is sparse, nothing is allocated) and NBD_INIT_ZERO (entire image 
reads as zero), and save any future bits for later additions.  Do we 
think that 16 bits is sufficient for the amount of initial information 
likely to be exposed?  Are we in agreement that my addition of an 
NBD_INFO_ response to NBD_OPT_GO is the best way to expose initial state 
bits?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: Cross-project NBD extension proposal: NBD_INFO_INIT_STATE
  2020-02-12 12:09         ` Eric Blake
@ 2020-02-12 12:36           ` Richard W.M. Jones
  2020-02-12 12:47             ` Eric Blake
  0 siblings, 1 reply; 19+ messages in thread
From: Richard W.M. Jones @ 2020-02-12 12:36 UTC (permalink / raw)
  To: Eric Blake
  Cc: Vladimir Sementsov-Ogievskiy, Alberto Garcia, qemu-block, QEMU,
	Max Reitz, Wouter Verhelst, nbd, libguestfs


On Wed, Feb 12, 2020 at 06:09:11AM -0600, Eric Blake wrote:
> On 2/12/20 1:27 AM, Wouter Verhelst wrote:
> >Hi,
> >
> >On Mon, Feb 10, 2020 at 10:52:55PM +0000, Richard W.M. Jones wrote:
> >>But anyway ... could a flag indicating that the whole image is sparse
> >>be useful, either as well as NBD_INIT_SPARSE or instead of it?  You
> >>could use it to avoid an initial disk trim, which is something that
> >>mke2fs does:
> >
> >Yeah, I think that could definitely be useful. I honestly can't see a
> >use for NBD_INIT_SPARSE as defined in this proposal; and I don't think
> >it's generally useful to have a feature if we can't think of a use case
> >for it (that creates added complexity for no benefit).
> >
> >If we can find a reasonable use case for NBD_INIT_SPARSE as defined in
> >this proposal, then just add a third bit (NBD_INIT_ALL_SPARSE or
> >something) that says "the whole image is sparse". Otherwise, I think we
> >should redefine NBD_INIT_SPARSE to say that.
> 
> Okay, in v2, I will start with just two bits, NBD_INIT_SPARSE
> (entire image is sparse, nothing is allocated) and NBD_INIT_ZERO
> (entire image reads as zero), and save any future bits for later
> additions.  Do we think that 16 bits is sufficient for the amount of
> initial information likely to be exposed?

So as I understand the proposal, the 16 bit limit comes about because
we want a round 4 byte reply: 16 bits are used by the NBD_INFO_INIT_STATE
type field, which leaves 16 feature bits.  Therefore the only way to
grow from there is to have 32 feature bits in an awkward unaligned 6
byte structure, or 48 feature bits (an 8 byte structure).
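
In struct form, the 4-byte payload being discussed would look 
something like this (field names are illustrative; on the wire both 
fields would be big-endian):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the 4-byte NBD_REP_INFO payload under discussion: a 16-bit
 * info type followed by 16 flag bits.  Widening flags to uint32_t is
 * what would produce the awkward 6-byte payload. */
struct nbd_info_init_state {
    uint16_t type;    /* would be NBD_INFO_INIT_STATE */
    uint16_t flags;   /* NBD_INIT_* bits */
};
```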

I guess given those constraints we can stick with 16 feature bits, and
if we ever needed more then we'd have to introduce NBD_INFO_INIT_STATE2.

The only thing I can think of which might be useful is a "fully
preallocated" bit which might be used as an indication that writes are
fast and are unlikely to fail with ENOSPC.

> Are we in agreement that
> my addition of an NBD_INFO_ response to NBD_OPT_GO is the best way
> to expose initial state bits?

Seems reasonable to me.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW




* Re: Cross-project NBD extension proposal: NBD_INFO_INIT_STATE
  2020-02-12 12:36           ` Richard W.M. Jones
@ 2020-02-12 12:47             ` Eric Blake
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Blake @ 2020-02-12 12:47 UTC (permalink / raw)
  To: Richard W.M. Jones
  Cc: Vladimir Sementsov-Ogievskiy, Alberto Garcia, qemu-block, QEMU,
	Max Reitz, Wouter Verhelst, nbd, libguestfs

On 2/12/20 6:36 AM, Richard W.M. Jones wrote:

>> Okay, in v2, I will start with just two bits, NBD_INIT_SPARSE
>> (entire image is sparse, nothing is allocated) and NBD_INIT_ZERO
>> (entire image reads as zero), and save any future bits for later
>> additions.  Do we think that 16 bits is sufficient for the amount of
>> initial information likely to be exposed?
> 
> So as I understand the proposal, the 16 bit limit comes about because
> we want a round 4 byte reply, 16 bits are used by NBD_INFO_INIT_STATE
> and that leaves 16 bits feature bits.  Therefore the only way to go
> from there is to have 32 feature bits but an awkward unaligned 6 byte
> structure, or 48 feature bits (8 byte structure).

In general, the NBD protocol has NOT focused on alignment issues (for 
good or for bad).  For example, NBD_INFO_BLOCK_SIZE is 18 bytes; all 
NBD_CMD_* 32-bit requests are 28 bytes except for NBD_CMD_WRITE which 
can send unaligned payload with no further padding, and so forth.

> 
> I guess given those constraints we can stick with 16 feature bits, and
> if we ever needed more then we'd have to introduce NBD_INFO_INIT_STATE2.
> 
> The only thing I can think of which might be useful is a "fully
> preallocated" bit which might be used as an indication that writes are
> fast and are unlikely to fail with ENOSPC.

and which would be mutually-exclusive with NBD_INIT_SPARSE (except for 
an image of size 0).  That bit would ALSO be an indication that the user 
may not want to punch holes into the image, but rather preserve the 
fully-allocated state (and thus avoid NBD_CMD_TRIM, and pass 
NBD_CMD_FLAG_NO_HOLE to any WRITE_ZEROES request).

> 
>> Are we in agreement that
>> my addition of an NBD_INFO_ response to NBD_OPT_GO is the best way
>> to expose initial state bits?
> 
> Seems reasonable to me.
> 
> Rich.
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: Cross-project NBD extension proposal: NBD_INFO_INIT_STATE
  2020-02-10 21:37 Cross-project NBD extension proposal: NBD_INFO_INIT_STATE Eric Blake
  2020-02-10 21:41 ` [qemu PATCH 0/3] NBD_INFO_INIT_STATE extension Eric Blake
  2020-02-10 22:12 ` Cross-project NBD extension proposal: NBD_INFO_INIT_STATE Richard W.M. Jones
@ 2020-02-17 15:13 ` Max Reitz
  2020-02-18 20:55   ` Eric Blake
  2 siblings, 1 reply; 19+ messages in thread
From: Max Reitz @ 2020-02-17 15:13 UTC (permalink / raw)
  To: Eric Blake, nbd, QEMU, qemu-block, libguestfs
  Cc: Vladimir Sementsov-Ogievskiy, Alberto Garcia, Richard W.M. Jones


[-- Attachment #1.1: Type: text/plain, Size: 1994 bytes --]

Hi,

It’s my understanding that without some is_zero infrastructure for QEMU,
it’s impossible to implement this flag in qemu’s NBD server.

At the same time, I still haven’t understood what we need the flag for.

As far as I understood in our discussion on your qemu series, there is
no case where anyone would need to know whether an image is zero.  All
practical cases involve someone having to ensure that some image is
zero.  Knowing whether an image is zero can help with that, but that can
be an implementation detail.

For qcow2, the idea would be that there is some flag that remains true
as long as the image is guaranteed to be zero.  Then we’d have some
bdrv_make_zero function, and qcow2’s implementation would use this
information to gauge whether there’s something to do at all.

For NBD, we cannot use this idea directly because to implement such a
flag (as you’re describing in this mail), we’d need separate is_zero
infrastructure, and that kind of makes the point of “drivers’
bdrv_make_zero() implementations do the right thing by themselves” moot.

OTOH, we wouldn’t need such a flag for the implementation, because we
could just send a 64-bit discard/make_zero over the whole block device
length to the NBD server, and then the server internally does the right
thing(TM).  AFAIU discard and write_zeroes currently have only 32 bit
length fields, but there were plans for adding support for 64 bit
versions anyway.  From my naïve outsider perspective, doing that doesn’t
seem a more complicated protocol addition than adding some way to tell
whether an NBD export is zero.

So I’m still wondering whether there are actually cases where we need to
tell whether some image or NBD export is zero that do not involve making
it zero if it isn’t.

(I keep asking because it seems to me that if all we ever really want to
do is to ensure that some images/exports are zero, we should implement
that.)

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


* Re: Cross-project NBD extension proposal: NBD_INFO_INIT_STATE
  2020-02-17 15:13 ` Max Reitz
@ 2020-02-18 20:55   ` Eric Blake
  2020-02-19 11:10     ` Max Reitz
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Blake @ 2020-02-18 20:55 UTC (permalink / raw)
  To: Max Reitz, nbd, QEMU, qemu-block, libguestfs
  Cc: Vladimir Sementsov-Ogievskiy, Alberto Garcia, Richard W.M. Jones

On 2/17/20 9:13 AM, Max Reitz wrote:
> Hi,
> 
> It’s my understanding that without some is_zero infrastructure for QEMU,
> it’s impossible to implement this flag in qemu’s NBD server.

You're right that we may need some more infrastructure before being able 
to decide when to report this bit in all cases.  But for raw files, that 
infrastructure already exists: a block_status query at offset 0 with the 
entire image as its length can report that the entire file is a hole. 
And for qcow2 files, it would not be that hard to teach a similar 
block_status request to report the entire image as a hole based on my 
proposed qcow2 autoclear bit tracking that the image still reads as zero.
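
Reduced to code, that check might look like the following sketch; the 
callback type and the ZERO flag are stand-ins, not qemu's actual 
block_status API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Answer "does the whole image read as zero?" with a single
 * block-status style query over [0, length). */
#define STATUS_ZERO 1u

typedef unsigned (*status_fn)(int64_t offset, int64_t bytes, int64_t *pnum);

static bool image_reads_as_zero(status_fn status, int64_t length)
{
    int64_t pnum = 0;
    unsigned s = status(0, length, &pnum);

    /* Only trust the answer if one extent covers the entire image. */
    return pnum == length && (s & STATUS_ZERO);
}

/* Example backends: one fully-unallocated image, one with data up front. */
static unsigned all_hole(int64_t off, int64_t bytes, int64_t *pnum)
{
    (void)off;
    *pnum = bytes;
    return STATUS_ZERO;
}

static unsigned data_first(int64_t off, int64_t bytes, int64_t *pnum)
{
    (void)off;
    (void)bytes;
    *pnum = 65536;    /* 64k of data before anything else */
    return 0;
}
```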

> 
> At the same time, I still haven’t understood what we need the flag for.
> 
> As far as I understood in our discussion on your qemu series, there is
> no case where anyone would need to know whether an image is zero.  All
> practical cases involve someone having to ensure that some image is
> zero.  Knowing whether an image is zero can help with that, but that can
> be an implementation detail.
> 
> For qcow2, the idea would be that there is some flag that remains true
> as long as the image is guaranteed to be zero.  Then we’d have some
> bdrv_make_zero function, and qcow2’s implementation would use this
> information to gauge whether there’s something to do at all.
> 
> For NBD, we cannot use this idea directly because to implement such a
> flag (as you’re describing in this mail), we’d need separate is_zero
> infrastructure, and that kind of makes the point of “drivers’
> bdrv_make_zero() implementations do the right thing by themselves” moot.

We don't necessarily need a separate is_zero infrastructure if we can 
instead teach the existing block_status infrastructure to report that 
the entire image reads as zero.  You're right that clients that need to 
force an entire image to be zero won't need to directly call 
block_status (they can just call bdrv_make_zero, and let that worry 
about whether a block status call makes sense among its list of steps to 
try).  But since block_status can report all-zero status for some cases, 
it's not hard to use that for feeding the NBD bit.

However, there's a difference between qemu's block status (which is 
already typed correctly to return a 64-bit answer, even if it may need a 
few tweaks for clients that currently don't expect it to request more 
than 32 bits) and NBD's block status (which can only report 32 bits 
barring a new extension to the protocol), and where a single all-zero 
bit at NBD_OPT_GO is just as easy an extension as a way to report a 
64-bit all-zero response to NBD_CMD_BLOCK_STATUS.

> 
> OTOH, we wouldn’t need such a flag for the implementation, because we
> could just send a 64-bit discard/make_zero over the whole block device
> length to the NBD server, and then the server internally does the right
> thing(TM).  AFAIU discard and write_zeroes currently have only 32 bit
> length fields, but there were plans for adding support for 64 bit
> versions anyway.  From my naïve outsider perspective, doing that doesn’t
> seem a more complicated protocol addition than adding some way to tell
> whether an NBD export is zero.

Adding 64-bit commands to NBD is more invasive than adding a single 
startup status bit.  Both ideas can be done - doing one does not 
preclude the other.  But at the same time, not all servers will 
implement both ideas - if one is easy to implement while the other is 
hard, it is not unlikely that qemu will still encounter NBD servers that 
advertise startup state but not support 64-bit make_zero (even if qemu 
as NBD server starts supporting 64-bit make zero) or even 64-bit block 
status results.

Another thing to think about here is timing.  With the proposed NBD 
addition, it is the server telling the client that "the image you are 
connecting to started zero", prior to the point that the client even has 
a chance to request "can you make the image all zero in a quick manner 
(and if not, I'll fall back to writing zeroes as I go)".  And even if 
NBD gains a 64-bit block status and/or make zero command, it is still 
less network traffic for the server to advertise up-front that the image 
is all zero than it is for the client to have to issue command requests 
of the server (network traffic is not always the bottleneck, but it can 
be a consideration).

> 
> So I’m still wondering whether there are actually cases where we need to
> tell whether some image or NBD export is zero that do not involve making
> it zero if it isn’t.

Just because we don't think that qemu-img has such a case does not mean 
that other NBD clients will not be able to come up with some use for 
knowing if an image starts all zero.

> 
> (I keep asking because it seems to me that if all we ever really want to
> do is to ensure that some images/exports are zero, we should implement
> that.)

The problem is WHERE do you implement it.  Is it more efficient to 
implement make_zero in the NBD server (the client merely requests to 
make zero, but lets the server do all the work) or in the NBD client 
(the client learns whether the server is already zero, and not hearing 
yes, the client proceeds to do all the work to write zeroes).  From the 
qemu perspective, qemu-img convert needs the image to be zero, and 
bdrv_make_zero will report back either "yes I quickly made it zero, 
possibly by doing nothing" or "no, making it zero now is no more 
efficient than you just writing zeroes as you go".  But although the 
code in qemu-img is the same whether bdrv_make_zero is able to request 
the work be done in the server or whether the work has to be done in the 
client, the code in the block layer that implements bdrv_make_zero may 
itself care about knowing whether the NBD server started all zero.
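
The division of labor described above can be sketched as follows (hypothetical names, not qemu's actual block layer): the convert loop is identical either way; only the bdrv_make_zero stand-in knows whether the destination started zero or supports server-side zeroing.

```python
# Toy sketch of "qemu-img code is the same; the block layer decides".
# bdrv_make_zero and the dict-based images are invented for this model.

def bdrv_make_zero(dest):
    """Return True iff the image is now known zero at little cost."""
    if dest.get("starts_zero"):       # e.g. learned via NBD_INIT_ZERO
        return True                   # nothing to do at all
    if dest.get("fast_server_zero"):  # e.g. a server-side make-zero op
        dest["data"] = {}             # server does the work
        return True
    return False                      # caller must write zeroes itself

def convert(src_extents, dest, size, chunk=1 << 20):
    dest_is_zero = bdrv_make_zero(dest)
    writes = 0
    for off in range(0, size, chunk):
        data = src_extents.get(off)   # None models a zero/hole extent
        if data is None:
            if not dest_is_zero:
                dest.setdefault("data", {})[off] = b"\0" * chunk
                writes += 1           # explicit zero write needed
            continue
        dest.setdefault("data", {})[off] = data
        writes += 1
    return writes

sparse_src = {0: b"x" * (1 << 20)}    # data only in the first 1 MiB chunk
zero_dest = {"starts_zero": True}
plain_dest = {}
writes_with_flag = convert(sparse_src, zero_dest, 4 << 20)
writes_without = convert(sparse_src, plain_dest, 4 << 20)
```

With the zero-advertising destination only the one data chunk is written; without it, the client also writes every zero chunk itself.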

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Cross-project NBD extension proposal: NBD_INFO_INIT_STATE
  2020-02-18 20:55   ` Eric Blake
@ 2020-02-19 11:10     ` Max Reitz
  0 siblings, 0 replies; 19+ messages in thread
From: Max Reitz @ 2020-02-19 11:10 UTC (permalink / raw)
  To: Eric Blake, nbd, QEMU, qemu-block, libguestfs
  Cc: Vladimir Sementsov-Ogievskiy, Alberto Garcia, Richard W.M. Jones



On 18.02.20 21:55, Eric Blake wrote:
> On 2/17/20 9:13 AM, Max Reitz wrote:
>> Hi,
>>
>> It’s my understanding that without some is_zero infrastructure for QEMU,
>> it’s impossible to implement this flag in qemu’s NBD server.
> 
> You're right that we may need some more infrastructure before being able
> to decide when to report this bit in all cases.  But for raw files, that
> infrastructure already exists: check whether block_status at offset 0,
> with the entire image as the length, reports that the entire file is a
> hole.

Hm, except that only works if the file is just completely unallocated.
Calling that existing infrastructure is a bit of a stretch, I think.

Or are you saying that bdrv_co_block_status(..., 0, bdrv_getlength(),
...) is our existing infrastructure?  Actually, why not.  Can we make
block drivers catch that special case?  (Or might the generic block code
somehow truncate such requests?)
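
For raw files on Linux, the "entire file is one hole" check being discussed maps onto lseek(SEEK_DATA): if no data extent exists anywhere, the call fails with ENXIO. A minimal standalone sketch (not qemu code; requires a filesystem with SEEK_HOLE support — others report the whole file as data):

```python
# Detect a fully-unallocated (all-hole) raw file via SEEK_DATA (Linux).
import errno
import os
import tempfile

def file_is_all_hole(path):
    """True if the file contains no allocated data at all."""
    fd = os.open(path, os.O_RDONLY)
    try:
        try:
            os.lseek(fd, 0, os.SEEK_DATA)  # find the first data extent
        except OSError as e:
            if e.errno == errno.ENXIO:     # no data anywhere: all hole
                return True
            raise
        return False                       # some extent holds data
    finally:
        os.close(fd)

# A freshly truncated file is one big hole; writing one byte changes that.
with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
os.truncate(path, 1 << 30)                 # 1 GiB, fully sparse
sparse_result = file_is_all_hole(path)
with open(path, "r+b") as f:
    f.write(b"\x01")
dense_result = file_is_all_hole(path)
os.unlink(path)
```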

> And
> for qcow2 files, it would not be that hard to teach a similar
> block_status request to report the entire image as a hole based on my
> proposed qcow2 autoclear bit tracking that the image still reads as zero.
> 
>>
>> At the same time, I still haven’t understood what we need the flag for.
>>
>> As far as I understood in our discussion on your qemu series, there is
>> no case where anyone would need to know whether an image is zero.  All
>> practical cases involve someone having to ensure that some image is
>> zero.  Knowing whether an image is zero can help with that, but that can
>> be an implementation detail.
>>
>> For qcow2, the idea would be that there is some flag that remains true
>> as long as the image is guaranteed to be zero.  Then we’d have some
>> bdrv_make_zero function, and qcow2’s implementation would use this
>> information to gauge whether there’s something to do at all.
>>
>> For NBD, we cannot use this idea directly because to implement such a
>> flag (as you’re describing in this mail), we’d need separate is_zero
>> infrastructure, and that kind of makes the point of “drivers’
>> bdrv_make_zero() implementations do the right thing by themselves” moot.
> 
> We don't necessarily need a separate is_zero infrastructure if we can
> instead teach the existing block_status infrastructure to report that
> the entire image reads as zero.  You're right that clients that need to
> force an entire image to be zero won't need to directly call
> block_status (they can just call bdrv_make_zero, and let that worry
> about whether a block status call makes sense among its list of steps to
> try).  But since block_status can report all-zero status for some cases,
> it's not hard to use that for feeding the NBD bit.

OK.  I’m not 100% sure there’s nothing that would bite us in the butt
here, but I seem to remember you made all block_status things 64-bit, so
I suppose you know. :)

> However, there's a difference between qemu's block status (which is
> already typed correctly to return a 64-bit answer, even if it may need a
> few tweaks for clients that currently don't expect it to request more
> than 32 bits) and NBD's block status (which can only report 32 bits
> barring a new extension to the protocol), and where a single all-zero
> bit at NBD_OPT_GO is just as easy of an extension as a way to report a
> 64-bit all-zero response to NBD_CMD_BLOCK_STATUS.

Agreed.

>> OTOH, we wouldn’t need such a flag for the implementation, because we
>> could just send a 64-bit discard/make_zero over the whole block device
>> length to the NBD server, and then the server internally does the right
>> thing(TM).  AFAIU discard and write_zeroes currently have only 32 bit
>> length fields, but there were plans for adding support for 64 bit
>> versions anyway.  From my naïve outsider perspective, doing that doesn’t
>> seem a more complicated protocol addition than adding some way to tell
>> whether an NBD export is zero.
> 
> Adding 64-bit commands to NBD is more invasive than adding a single
> startup status bit.

True.  But if we/you want 64-bit commands anyway, then it doesn’t really
matter what’s more invasive.

> Both ideas can be done - doing one does not
> preclude the other.

Absolutely.  It’s just that if you do one anyway and it supersedes the
other, then we don’t have to do both.  Hence my wondering whether one
does supersede the other.

> But at the same time, not all servers will
> implement both ideas - if one is easy to implement while the other is
> hard, it is not unlikely that qemu will still encounter NBD servers that
> advertise startup state but not support 64-bit make_zero (even if qemu
> as NBD server starts supporting 64-bit make zero) or even 64-bit block
> status results.

Hm.  You know better than me whether that’s a good argument, because it
mostly depends on how many NBD server implementations there are;
specifically whether there are any that are decidedly not feature-complete.

> Another thing to think about here is timing.  With the proposed NBD
> addition, it is the server telling the client that "the image you are
> connecting to started zero", prior to the point that the client even has
> a chance to request "can you make the image all zero in a quick manner
> (and if not, I'll fall back to writing zeroes as I go)".  And even if
> NBD gains a 64-bit block status and/or make zero command, it is still
> less network traffic for the server to advertise up-front that the image
> is all zero than it is for the client to have to issue command requests
> of the server (network traffic is not always the bottleneck, but it can
> be a consideration).

I suppose one 64-bit discard/write_zeroes to the whole image wouldn’t be
too bad, regardless of the network speed.

>> So I’m still wondering whether there are actually cases where we need to
>> tell whether some image or NBD export is zero that do not involve making
>> it zero if it isn’t.
> 
> Just because we don't think that qemu-img has such a case does not mean
> that other NBD clients will not be able to come up with some use for
> knowing if an image starts all zero.

Sure, but implementing a feature on the basis of “Somebody may come up
with a use for it” sounds fishy to me.

OTOH, you completely convinced me with the fact that we can start by
letting qemu’s NBD server just invoke a block-status call over the whole
image, and then (potentially later) letting various block drivers
recognize that case.  But I suppose that means we no longer need a
dedicated has_zero_open() function, right?

>> (I keep asking because it seems to me that if all we ever really want to
>> do is to ensure that some images/exports are zero, we should implement
>> that.)
> 
> The problem is WHERE do you implement it.  Is it more efficient to
> implement make_zero in the NBD server (the client merely requests to
> make zero, but lets the server do all the work) or in the NBD client
> (the client learns whether the server is already zero and, if the
> answer is not yes, does all the work of writing zeroes itself)?  From the
> qemu perspective, qemu-img convert needs the image to be zero, and
> bdrv_make_zero will report back either "yes I quickly made it zero,
> possibly by doing nothing" or "no, making it zero now is no more
> efficient than you just writing zeroes as you go".  But although the
> code in qemu-img is the same whether bdrv_make_zero is able to request
> the work be done in the server or whether the work has to be done in the
> client, the code in the block layer that implements bdrv_make_zero may
> itself care about knowing whether the NBD server started all zero.

If we have both 64-bit write_zeroes and a zero flag in the NBD protocol,
then I don’t think there’s much difference in terms of efficiency.

However, if we had to decide between which of the features is more
important for efficiency, then the difference that appears to me is that:
- With just a flag but no 64-bit write_zeroes, zeroing a non-zero image
will be inefficient.
- With just a 64-bit write_zeroes but no flag, the client has to issue a
NOP whole-image write_zeroes call for images that are zero already, but
I don’t really see that as an issue.

So *if* we had to decide, it appears to me that 64-bit write_zeroes
would be more important.

Max




end of thread, other threads:[~2020-02-19 11:11 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
2020-02-10 21:37 Cross-project NBD extension proposal: NBD_INFO_INIT_STATE Eric Blake
2020-02-10 21:41 ` [qemu PATCH 0/3] NBD_INFO_INIT_STATE extension Eric Blake
2020-02-10 21:41   ` [PATCH 1/3] nbd: Preparation for NBD_INFO_INIT_STATE Eric Blake
2020-02-10 21:41   ` [PATCH 2/3] nbd: Add .bdrv_known_zeroes() client support Eric Blake
2020-02-10 21:41   ` [PATCH 3/3] nbd: Add .bdrv_known_zeroes() server support Eric Blake
2020-02-10 21:51   ` [qemu PATCH 0/3] NBD_INFO_INIT_STATE extension no-reply
2020-02-10 21:54     ` Eric Blake
2020-02-10 21:53   ` no-reply
2020-02-10 22:12 ` Cross-project NBD extension proposal: NBD_INFO_INIT_STATE Richard W.M. Jones
2020-02-10 22:29   ` Eric Blake
2020-02-10 22:52     ` Richard W.M. Jones
2020-02-11 14:33       ` Eric Blake
2020-02-12  7:27       ` Wouter Verhelst
2020-02-12 12:09         ` Eric Blake
2020-02-12 12:36           ` Richard W.M. Jones
2020-02-12 12:47             ` Eric Blake
2020-02-17 15:13 ` Max Reitz
2020-02-18 20:55   ` Eric Blake
2020-02-19 11:10     ` Max Reitz
