* [Qemu-devel] [RFC 0/5] block: Generic file truncation/creation fallbacks
@ 2019-07-11 19:57 Max Reitz
  2019-07-11 19:58 ` [Qemu-devel] [RFC 1/5] block/nbd: Fix hang in .bdrv_close() Max Reitz
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Max Reitz @ 2019-07-11 19:57 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

Hi,

Some protocol drivers do not really support file truncation but still
implement .bdrv_co_truncate(): They just don’t do anything when asked to
shrink a file.  qemu-img reflects this by warning whenever a resize
request leaves the file at the exact same length as before.

We can just do that generically.  There is no reason for some protocol
drivers to act ashamed and pretend nobody notices.  The only thing we
have to take care of is to zero everything in the first sector past the
desired EOF, so that format probing won’t go wrong.

(RFC: Is it really OK to just do this for all block drivers?)

Similarly, we can add a fallback file creation path: If a block driver
does not support creating a file but the file already exists, we can
just use it.  All we have to do is truncate it to the desired size.


This is an RFC because it feels weird and I don’t want people to
associate me with weird stuff too closely.

Well, patch 1 isn’t really an RFC.  It’s just a fix.


I was inspired to write this series by Maxim’s patch “block/nvme: add support
for image creation” (from his “Few fixes for userspace NVME driver”
series).


Max Reitz (5):
  block/nbd: Fix hang in .bdrv_close()
  block: Generic truncation fallback
  block: Fall back to fallback truncate function
  block: Generic file creation fallback
  iotests: Add test for fallback truncate/create

 block.c                    | 77 ++++++++++++++++++++++++++++++++------
 block/file-posix.c         | 21 +----------
 block/io.c                 | 69 ++++++++++++++++++++++++++++++++--
 block/nbd.c                | 12 +++++-
 block/sheepdog.c           |  2 +-
 tests/qemu-iotests/259     | 71 +++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/259.out | 20 ++++++++++
 tests/qemu-iotests/group   |  1 +
 8 files changed, 235 insertions(+), 38 deletions(-)
 create mode 100755 tests/qemu-iotests/259
 create mode 100644 tests/qemu-iotests/259.out

-- 
2.21.0




* [Qemu-devel] [RFC 1/5] block/nbd: Fix hang in .bdrv_close()
  2019-07-11 19:57 [Qemu-devel] [RFC 0/5] block: Generic file truncation/creation fallbacks Max Reitz
@ 2019-07-11 19:58 ` Max Reitz
  2019-07-12  9:24   ` Kevin Wolf
  2019-07-11 19:58 ` [Qemu-devel] [RFC 2/5] block: Generic truncation fallback Max Reitz
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Max Reitz @ 2019-07-11 19:58 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

When nbd_close() is called from a coroutine, the connection_co never
gets to run, and thus nbd_teardown_connection() hangs.

This is because aio_co_enter() only puts the connection_co into the main
coroutine's wake-up queue, so this main coroutine needs to yield and
reschedule itself to let the connection_co run.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/nbd.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/block/nbd.c b/block/nbd.c
index 81edabbf35..b83b6cd43e 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -135,7 +135,17 @@ static void nbd_teardown_connection(BlockDriverState *bs)
     qio_channel_shutdown(s->ioc,
                          QIO_CHANNEL_SHUTDOWN_BOTH,
                          NULL);
-    BDRV_POLL_WHILE(bs, s->connection_co);
+
+    if (qemu_in_coroutine()) {
+        /* Let our caller poll and just yield until connection_co is done */
+        while (s->connection_co) {
+            aio_co_schedule(qemu_get_current_aio_context(),
+                            qemu_coroutine_self());
+            qemu_coroutine_yield();
+        }
+    } else {
+        BDRV_POLL_WHILE(bs, s->connection_co);
+    }
 
     nbd_client_detach_aio_context(bs);
     object_unref(OBJECT(s->sioc));
-- 
2.21.0




* [Qemu-devel] [RFC 2/5] block: Generic truncation fallback
  2019-07-11 19:57 [Qemu-devel] [RFC 0/5] block: Generic file truncation/creation fallbacks Max Reitz
  2019-07-11 19:58 ` [Qemu-devel] [RFC 1/5] block/nbd: Fix hang in .bdrv_close() Max Reitz
@ 2019-07-11 19:58 ` Max Reitz
  2019-07-12  9:49   ` Kevin Wolf
  2019-07-11 19:58 ` [Qemu-devel] [RFC 3/5] block: Fall back to fallback truncate function Max Reitz
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Max Reitz @ 2019-07-11 19:58 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

If a protocol driver does not support truncation, we can fall back to
effectively not doing anything if the new size is less than the actual
file size.  This is what we have been doing for some host device drivers
already.

The only caveat is that we have to zero out everything in the first
sector that lies beyond the new "EOF" so we do not get any surprises
with format probing.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/io.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 65 insertions(+), 4 deletions(-)

diff --git a/block/io.c b/block/io.c
index 24a18759fd..382728fa9a 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3064,6 +3064,57 @@ static void bdrv_parent_cb_resize(BlockDriverState *bs)
     }
 }
 
+static int coroutine_fn bdrv_co_truncate_fallback(BdrvChild *child,
+                                                  int64_t offset,
+                                                  PreallocMode prealloc,
+                                                  Error **errp)
+{
+    BlockDriverState *bs = child->bs;
+    int64_t cur_size = bdrv_getlength(bs);
+
+    if (cur_size < 0) {
+        error_setg_errno(errp, -cur_size,
+                         "Failed to inquire current file size");
+        return cur_size;
+    }
+
+    if (prealloc != PREALLOC_MODE_OFF) {
+        error_setg(errp, "Unsupported preallocation mode: %s",
+                   PreallocMode_str(prealloc));
+        return -ENOTSUP;
+    }
+
+    if (offset > cur_size) {
+        error_setg(errp, "Cannot grow this %s node", bs->drv->format_name);
+        return -ENOTSUP;
+    }
+
+    /*
+     * Overwrite first "post-EOF" parts of the first sector with
+     * zeroes so raw images will not be misprobed
+     */
+    if (offset < BDRV_SECTOR_SIZE && offset < cur_size) {
+        int64_t fill_len = MIN(BDRV_SECTOR_SIZE - offset, cur_size - offset);
+        int ret;
+
+        if (!(child->perm & BLK_PERM_WRITE)) {
+            error_setg(errp, "Cannot write to this node to clear the file past "
+                       "the truncated EOF");
+            return -EPERM;
+        }
+
+        ret = bdrv_co_pwrite_zeroes(child, offset, fill_len, 0);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret,
+                             "Failed to clear file past the truncated EOF");
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
+
 /**
  * Truncate file to 'offset' bytes (needed only for file protocols)
  */
@@ -3074,6 +3125,7 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset,
     BlockDriver *drv = bs->drv;
     BdrvTrackedRequest req;
     int64_t old_size, new_bytes;
+    Error *local_err = NULL;
     int ret;
 
 
@@ -3127,15 +3179,24 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset,
             ret = bdrv_co_truncate(bs->file, offset, prealloc, errp);
             goto out;
         }
-        error_setg(errp, "Image format driver does not support resize");
+        error_setg(&local_err, "Image format driver does not support resize");
         ret = -ENOTSUP;
-        goto out;
+    } else {
+        ret = drv->bdrv_co_truncate(bs, offset, prealloc, &local_err);
     }
 
-    ret = drv->bdrv_co_truncate(bs, offset, prealloc, errp);
-    if (ret < 0) {
+    if (ret == -ENOTSUP && drv->bdrv_file_open) {
+        error_free(local_err);
+
+        ret = bdrv_co_truncate_fallback(child, offset, prealloc, errp);
+        if (ret < 0) {
+            goto out;
+        }
+    } else if (ret < 0) {
+        error_propagate(errp, local_err);
         goto out;
     }
+
     ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "Could not refresh total sector count");
-- 
2.21.0




* [Qemu-devel] [RFC 3/5] block: Fall back to fallback truncate function
  2019-07-11 19:57 [Qemu-devel] [RFC 0/5] block: Generic file truncation/creation fallbacks Max Reitz
  2019-07-11 19:58 ` [Qemu-devel] [RFC 1/5] block/nbd: Fix hang in .bdrv_close() Max Reitz
  2019-07-11 19:58 ` [Qemu-devel] [RFC 2/5] block: Generic truncation fallback Max Reitz
@ 2019-07-11 19:58 ` Max Reitz
  2019-07-12 10:04   ` Kevin Wolf
  2019-07-11 19:58 ` [Qemu-devel] [RFC 4/5] block: Generic file creation fallback Max Reitz
  2019-07-11 19:58 ` [Qemu-devel] [RFC 5/5] iotests: Add test for fallback truncate/create Max Reitz
  4 siblings, 1 reply; 20+ messages in thread
From: Max Reitz @ 2019-07-11 19:58 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

file-posix does not need to basically duplicate our fallback truncate
implementation; and sheepdog can fall back to it for "shrinking" files.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/file-posix.c | 21 +--------------------
 block/sheepdog.c   |  2 +-
 2 files changed, 2 insertions(+), 21 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index ab05b51a66..bcddfc7fbe 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2031,23 +2031,7 @@ static int coroutine_fn raw_co_truncate(BlockDriverState *bs, int64_t offset,
         return raw_regular_truncate(bs, s->fd, offset, prealloc, errp);
     }
 
-    if (prealloc != PREALLOC_MODE_OFF) {
-        error_setg(errp, "Preallocation mode '%s' unsupported for this "
-                   "non-regular file", PreallocMode_str(prealloc));
-        return -ENOTSUP;
-    }
-
-    if (S_ISCHR(st.st_mode) || S_ISBLK(st.st_mode)) {
-        if (offset > raw_getlength(bs)) {
-            error_setg(errp, "Cannot grow device files");
-            return -EINVAL;
-        }
-    } else {
-        error_setg(errp, "Resizing this file is not supported");
-        return -ENOTSUP;
-    }
-
-    return 0;
+    return -ENOTSUP;
 }
 
 #ifdef __OpenBSD__
@@ -3413,7 +3397,6 @@ static BlockDriver bdrv_host_device = {
     .bdrv_io_unplug = raw_aio_unplug,
     .bdrv_attach_aio_context = raw_aio_attach_aio_context,
 
-    .bdrv_co_truncate       = raw_co_truncate,
     .bdrv_getlength	= raw_getlength,
     .bdrv_get_info = raw_get_info,
     .bdrv_get_allocated_file_size
@@ -3537,7 +3520,6 @@ static BlockDriver bdrv_host_cdrom = {
     .bdrv_io_unplug = raw_aio_unplug,
     .bdrv_attach_aio_context = raw_aio_attach_aio_context,
 
-    .bdrv_co_truncate    = raw_co_truncate,
     .bdrv_getlength      = raw_getlength,
     .has_variable_length = true,
     .bdrv_get_allocated_file_size
@@ -3669,7 +3651,6 @@ static BlockDriver bdrv_host_cdrom = {
     .bdrv_io_unplug = raw_aio_unplug,
     .bdrv_attach_aio_context = raw_aio_attach_aio_context,
 
-    .bdrv_co_truncate    = raw_co_truncate,
     .bdrv_getlength      = raw_getlength,
     .has_variable_length = true,
     .bdrv_get_allocated_file_size
diff --git a/block/sheepdog.c b/block/sheepdog.c
index 6f402e5d4d..4af4961cb7 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -2301,7 +2301,7 @@ static int coroutine_fn sd_co_truncate(BlockDriverState *bs, int64_t offset,
     max_vdi_size = (UINT64_C(1) << s->inode.block_size_shift) * MAX_DATA_OBJS;
     if (offset < old_size) {
         error_setg(errp, "shrinking is not supported");
-        return -EINVAL;
+        return -ENOTSUP;
     } else if (offset > max_vdi_size) {
         error_setg(errp, "too big image size");
         return -EINVAL;
-- 
2.21.0




* [Qemu-devel] [RFC 4/5] block: Generic file creation fallback
  2019-07-11 19:57 [Qemu-devel] [RFC 0/5] block: Generic file truncation/creation fallbacks Max Reitz
                   ` (2 preceding siblings ...)
  2019-07-11 19:58 ` [Qemu-devel] [RFC 3/5] block: Fall back to fallback truncate function Max Reitz
@ 2019-07-11 19:58 ` Max Reitz
  2019-07-11 19:58 ` [Qemu-devel] [RFC 5/5] iotests: Add test for fallback truncate/create Max Reitz
  4 siblings, 0 replies; 20+ messages in thread
From: Max Reitz @ 2019-07-11 19:58 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

If a protocol driver does not support image creation, we can check
whether the file already exists.  If so, just truncating it will be
sufficient.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block.c | 77 ++++++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 65 insertions(+), 12 deletions(-)

diff --git a/block.c b/block.c
index c139540f2b..8fb8e4dfda 100644
--- a/block.c
+++ b/block.c
@@ -531,20 +531,57 @@ out:
     return ret;
 }
 
-int bdrv_create_file(const char *filename, QemuOpts *opts, Error **errp)
+static int bdrv_create_file_fallback(const char *filename, BlockDriver *drv,
+                                     QemuOpts *opts, Error **errp)
 {
-    BlockDriver *drv;
+    BlockBackend *blk;
+    QDict *options = qdict_new();
+    int64_t size = 0;
+    char *buf = NULL;
+    PreallocMode prealloc;
     Error *local_err = NULL;
     int ret;
 
+    size = qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0);
+    buf = qemu_opt_get_del(opts, BLOCK_OPT_PREALLOC);
+    prealloc = qapi_enum_parse(&PreallocMode_lookup, buf,
+                               PREALLOC_MODE_OFF, &local_err);
+    g_free(buf);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return -EINVAL;
+    }
+
+    qdict_put_str(options, "driver", drv->format_name);
+
+    blk = blk_new_open(filename, NULL, options,
+                       BDRV_O_RDWR | BDRV_O_RESIZE, errp);
+    if (!blk) {
+        error_prepend(errp, "Protocol driver '%s' does not support "
+                      "image creation, and opening the image failed: ",
+                      drv->format_name);
+        return -EINVAL;
+    }
+
+    ret = blk_truncate(blk, size, prealloc, errp);
+    blk_unref(blk);
+    return ret;
+}
+
+int bdrv_create_file(const char *filename, QemuOpts *opts, Error **errp)
+{
+    BlockDriver *drv;
+
     drv = bdrv_find_protocol(filename, true, errp);
     if (drv == NULL) {
         return -ENOENT;
     }
 
-    ret = bdrv_create(drv, filename, opts, &local_err);
-    error_propagate(errp, local_err);
-    return ret;
+    if (drv->bdrv_co_create_opts) {
+        return bdrv_create(drv, filename, opts, errp);
+    } else {
+        return bdrv_create_file_fallback(filename, drv, opts, errp);
+    }
 }
 
 /**
@@ -1420,6 +1457,24 @@ QemuOptsList bdrv_runtime_opts = {
     },
 };
 
+static QemuOptsList fallback_create_opts = {
+    .name = "fallback-create-opts",
+    .head = QTAILQ_HEAD_INITIALIZER(fallback_create_opts.head),
+    .desc = {
+        {
+            .name = BLOCK_OPT_SIZE,
+            .type = QEMU_OPT_SIZE,
+            .help = "Virtual disk size"
+        },
+        {
+            .name = BLOCK_OPT_PREALLOC,
+            .type = QEMU_OPT_STRING,
+            .help = "Preallocation mode (allowed values: off)"
+        },
+        { /* end of list */ }
+    }
+};
+
 /*
  * Common part for opening disk images and files
  *
@@ -5681,14 +5736,12 @@ void bdrv_img_create(const char *filename, const char *fmt,
         return;
     }
 
-    if (!proto_drv->create_opts) {
-        error_setg(errp, "Protocol driver '%s' does not support image creation",
-                   proto_drv->format_name);
-        return;
-    }
-
     create_opts = qemu_opts_append(create_opts, drv->create_opts);
-    create_opts = qemu_opts_append(create_opts, proto_drv->create_opts);
+    if (proto_drv->create_opts) {
+        create_opts = qemu_opts_append(create_opts, proto_drv->create_opts);
+    } else {
+        create_opts = qemu_opts_append(create_opts, &fallback_create_opts);
+    }
 
     /* Create parameter list with default values */
     opts = qemu_opts_create(create_opts, NULL, 0, &error_abort);
-- 
2.21.0




* [Qemu-devel] [RFC 5/5] iotests: Add test for fallback truncate/create
  2019-07-11 19:57 [Qemu-devel] [RFC 0/5] block: Generic file truncation/creation fallbacks Max Reitz
                   ` (3 preceding siblings ...)
  2019-07-11 19:58 ` [Qemu-devel] [RFC 4/5] block: Generic file creation fallback Max Reitz
@ 2019-07-11 19:58 ` Max Reitz
  4 siblings, 0 replies; 20+ messages in thread
From: Max Reitz @ 2019-07-11 19:58 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/259     | 71 ++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/259.out | 20 +++++++++++
 tests/qemu-iotests/group   |  1 +
 3 files changed, 92 insertions(+)
 create mode 100755 tests/qemu-iotests/259
 create mode 100644 tests/qemu-iotests/259.out

diff --git a/tests/qemu-iotests/259 b/tests/qemu-iotests/259
new file mode 100755
index 0000000000..6e3941378f
--- /dev/null
+++ b/tests/qemu-iotests/259
@@ -0,0 +1,71 @@
+#!/usr/bin/env bash
+#
+# Test generic image creation and truncation fallback (by using NBD)
+#
+# Copyright (C) 2019 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=mreitz@redhat.com
+
+seq=$(basename $0)
+echo "QA output created by $seq"
+
+status=1	# failure is the default!
+
+_cleanup()
+{
+    _cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt raw
+_supported_proto nbd
+_supported_os Linux
+
+
+_make_test_img 64M
+
+echo
+echo '--- Testing no-op shrinking ---'
+
+$QEMU_IMG resize -f raw --shrink "$TEST_IMG" 32M
+
+echo
+echo '--- Testing non-working growing ---'
+
+$QEMU_IMG resize -f raw "$TEST_IMG" 128M
+
+echo
+echo '--- Testing creation ---'
+
+$QEMU_IMG create -f qcow2 "$TEST_IMG" 64M | _filter_img_create
+$QEMU_IMG info "$TEST_IMG" | _filter_img_info
+
+echo
+echo '--- Testing creation for which the node would need to grow ---'
+
+$QEMU_IMG create -f qcow2 -o preallocation=metadata "$TEST_IMG" 64M 2>&1 \
+    | _filter_img_create
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/259.out b/tests/qemu-iotests/259.out
new file mode 100644
index 0000000000..1e4b11055f
--- /dev/null
+++ b/tests/qemu-iotests/259.out
@@ -0,0 +1,20 @@
+QA output created by 259
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+
+--- Testing no-op shrinking ---
+qemu-img: Image was not resized; resizing may not be supported for this image
+
+--- Testing non-working growing ---
+qemu-img: Cannot grow this nbd node
+
+--- Testing creation ---
+Formatting 'TEST_DIR/t.IMGFMT', fmt=qcow2 size=67108864
+image: TEST_DIR/t.IMGFMT
+file format: qcow2
+virtual size: 64 MiB (67108864 bytes)
+disk size: unavailable
+
+--- Testing creation for which the node would need to grow ---
+qemu-img: TEST_DIR/t.IMGFMT: Could not resize image: Cannot grow this nbd node
+Formatting 'TEST_DIR/t.IMGFMT', fmt=qcow2 size=67108864 preallocation=metadata
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index b34c8e3c0c..80e7603174 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -269,3 +269,4 @@
 254 rw auto backing quick
 255 rw auto quick
 256 rw auto quick
+259 rw auto quick
-- 
2.21.0




* Re: [Qemu-devel] [RFC 1/5] block/nbd: Fix hang in .bdrv_close()
  2019-07-11 19:58 ` [Qemu-devel] [RFC 1/5] block/nbd: Fix hang in .bdrv_close() Max Reitz
@ 2019-07-12  9:24   ` Kevin Wolf
  2019-07-12 10:47     ` Max Reitz
  0 siblings, 1 reply; 20+ messages in thread
From: Kevin Wolf @ 2019-07-12  9:24 UTC (permalink / raw)
  To: Max Reitz; +Cc: Stefan Hajnoczi, qemu-devel, qemu-block

On 11.07.2019 at 21:58, Max Reitz wrote:
> When nbd_close() is called from a coroutine, the connection_co never
> gets to run, and thus nbd_teardown_connection() hangs.
> 
> This is because aio_co_enter() only puts the connection_co into the main
> coroutine's wake-up queue, so this main coroutine needs to yield and
> reschedule itself to let the connection_co run.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/nbd.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index 81edabbf35..b83b6cd43e 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -135,7 +135,17 @@ static void nbd_teardown_connection(BlockDriverState *bs)
>      qio_channel_shutdown(s->ioc,
>                           QIO_CHANNEL_SHUTDOWN_BOTH,
>                           NULL);
> -    BDRV_POLL_WHILE(bs, s->connection_co);
> +
> +    if (qemu_in_coroutine()) {
> +        /* Let our caller poll and just yield until connection_co is done */
> +        while (s->connection_co) {
> +            aio_co_schedule(qemu_get_current_aio_context(),
> +                            qemu_coroutine_self());
> +            qemu_coroutine_yield();
> +        }

Isn't this busy waiting? Why not let s->connection_co wake us up when
it's about to terminate instead of immediately rescheduling ourselves?

> +    } else {
> +        BDRV_POLL_WHILE(bs, s->connection_co);
> +    }
>  
>      nbd_client_detach_aio_context(bs);
>      object_unref(OBJECT(s->sioc));

Kevin



* Re: [Qemu-devel] [RFC 2/5] block: Generic truncation fallback
  2019-07-11 19:58 ` [Qemu-devel] [RFC 2/5] block: Generic truncation fallback Max Reitz
@ 2019-07-12  9:49   ` Kevin Wolf
  2019-07-12 10:58     ` Max Reitz
  0 siblings, 1 reply; 20+ messages in thread
From: Kevin Wolf @ 2019-07-12  9:49 UTC (permalink / raw)
  To: Max Reitz; +Cc: Stefan Hajnoczi, qemu-devel, qemu-block

On 11.07.2019 at 21:58, Max Reitz wrote:
> If a protocol driver does not support truncation, we can fall back to
> effectively not doing anything if the new size is less than the actual
> file size.  This is what we have been doing for some host device drivers
> already.

Specifically, we're doing it for drivers that access a fixed-size image,
i.e. block devices rather than regular files. We don't want to do this
for drivers where the file size could be changed but which just didn't
implement it.

So I would suggest calling the function more specifically something like
bdrv_co_truncate_blockdev(), and not using it as an automatic fallback
in bdrv_co_truncate(), but just make it the BlockDriver.bdrv_co_truncate
implementation for those drivers where it makes sense.

And of course, we only need these fake implementations because qemu-img
(or .bdrv_co_create_opts) always wants to create the protocol level. If
we could avoid this, then we wouldn't need any of this.

Kevin



* Re: [Qemu-devel] [RFC 3/5] block: Fall back to fallback truncate function
  2019-07-11 19:58 ` [Qemu-devel] [RFC 3/5] block: Fall back to fallback truncate function Max Reitz
@ 2019-07-12 10:04   ` Kevin Wolf
  2019-07-12 11:05     ` Max Reitz
  0 siblings, 1 reply; 20+ messages in thread
From: Kevin Wolf @ 2019-07-12 10:04 UTC (permalink / raw)
  To: Max Reitz; +Cc: Stefan Hajnoczi, qemu-devel, qemu-block

On 11.07.2019 at 21:58, Max Reitz wrote:
> file-posix does not need to basically duplicate our fallback truncate
> implementation; and sheepdog can fall back to it for "shrinking" files.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/file-posix.c | 21 +--------------------
>  block/sheepdog.c   |  2 +-
>  2 files changed, 2 insertions(+), 21 deletions(-)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index ab05b51a66..bcddfc7fbe 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -2031,23 +2031,7 @@ static int coroutine_fn raw_co_truncate(BlockDriverState *bs, int64_t offset,
>          return raw_regular_truncate(bs, s->fd, offset, prealloc, errp);
>      }

The only thing that is left here is a fstat() check to see that we
really have a regular file here. But since this function is for the
'file' driver, we can just assume this and the function can go away
altogether.

In fact, 'file' with block/character devices has been deprecated since
3.0, so we can just remove support for it now and make it more than just
an assumption.

> diff --git a/block/sheepdog.c b/block/sheepdog.c
> index 6f402e5d4d..4af4961cb7 100644
> --- a/block/sheepdog.c
> +++ b/block/sheepdog.c
> @@ -2301,7 +2301,7 @@ static int coroutine_fn sd_co_truncate(BlockDriverState *bs, int64_t offset,
>      max_vdi_size = (UINT64_C(1) << s->inode.block_size_shift) * MAX_DATA_OBJS;
>      if (offset < old_size) {
>          error_setg(errp, "shrinking is not supported");
> -        return -EINVAL;
> +        return -ENOTSUP;
>      } else if (offset > max_vdi_size) {
>          error_setg(errp, "too big image size");
>          return -EINVAL;

The image will be unchanged and the guest will keep seeing the old image
size, so is there any use case where having the fallback here makes
sense? The only expected case where an image is shrunk is that the user
explicitly sent block_resize - and in that case returning success, but
doing nothing achieves nothing except confusing the user.

file-posix has the same confusing case, but at least it also has cases
where the fake truncate actually achieves something (such as allowing
images to be created on block devices).

Kevin



* Re: [Qemu-devel] [RFC 1/5] block/nbd: Fix hang in .bdrv_close()
  2019-07-12  9:24   ` Kevin Wolf
@ 2019-07-12 10:47     ` Max Reitz
  2019-07-12 11:01       ` Kevin Wolf
  0 siblings, 1 reply; 20+ messages in thread
From: Max Reitz @ 2019-07-12 10:47 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Stefan Hajnoczi, qemu-devel, qemu-block



On 12.07.19 11:24, Kevin Wolf wrote:
> On 11.07.2019 at 21:58, Max Reitz wrote:
>> When nbd_close() is called from a coroutine, the connection_co never
>> gets to run, and thus nbd_teardown_connection() hangs.
>>
>> This is because aio_co_enter() only puts the connection_co into the main
>> coroutine's wake-up queue, so this main coroutine needs to yield and
>> reschedule itself to let the connection_co run.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>  block/nbd.c | 12 +++++++++++-
>>  1 file changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/nbd.c b/block/nbd.c
>> index 81edabbf35..b83b6cd43e 100644
>> --- a/block/nbd.c
>> +++ b/block/nbd.c
>> @@ -135,7 +135,17 @@ static void nbd_teardown_connection(BlockDriverState *bs)
>>      qio_channel_shutdown(s->ioc,
>>                           QIO_CHANNEL_SHUTDOWN_BOTH,
>>                           NULL);
>> -    BDRV_POLL_WHILE(bs, s->connection_co);
>> +
>> +    if (qemu_in_coroutine()) {
>> +        /* Let our caller poll and just yield until connection_co is done */
>> +        while (s->connection_co) {
>> +            aio_co_schedule(qemu_get_current_aio_context(),
>> +                            qemu_coroutine_self());
>> +            qemu_coroutine_yield();
>> +        }
> 
> Isn't this busy waiting? Why not let s->connection_co wake us up when
> it's about to terminate instead of immediately rescheduling ourselves?

Yes, it is busy waiting, but I didn’t find that bad.  The connection_co
will be invoked in basically every iteration, and once there is no
pending data, it will quit.

The answer to “why not...” of course is because it’d be more complicated.

But anyway.

Adding a new function qemu_coroutine_run_after(target) that adds
qemu_coroutine_self() to the given @target coroutine’s wake-up queue and
then using that instead of scheduling works, too, yes.

I don’t really like being responsible for coroutine code, though...

(And maybe it’d be better to make it qemu_coroutine_yield_for(target),
which does the above and then yields?)
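
A minimal sketch of such a helper, as pure hypothesis; nothing like
this exists in the tree, and it assumes the co_queue_wakeup/
co_queue_next internals from qemu/coroutine_int.h:

    /* Put ourselves on @target's wake-up queue so that we are run once
     * it yields or terminates, then yield */
    void coroutine_fn qemu_coroutine_yield_for(Coroutine *target)
    {
        Coroutine *self = qemu_coroutine_self();

        QSIMPLEQ_INSERT_TAIL(&target->co_queue_wakeup, self,
                             co_queue_next);
        qemu_coroutine_yield();
    }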

Max

>> +    } else {
>> +        BDRV_POLL_WHILE(bs, s->connection_co);
>> +    }
>>  
>>      nbd_client_detach_aio_context(bs);
>>      object_unref(OBJECT(s->sioc));
> 
> Kevin
> 





* Re: [Qemu-devel] [RFC 2/5] block: Generic truncation fallback
  2019-07-12  9:49   ` Kevin Wolf
@ 2019-07-12 10:58     ` Max Reitz
  2019-07-12 11:17       ` Kevin Wolf
  0 siblings, 1 reply; 20+ messages in thread
From: Max Reitz @ 2019-07-12 10:58 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Stefan Hajnoczi, qemu-devel, qemu-block



On 12.07.19 11:49, Kevin Wolf wrote:
> On 11.07.2019 at 21:58, Max Reitz wrote:
>> If a protocol driver does not support truncation, we can fall back to
>> effectively not doing anything if the new size is less than the actual
>> file size.  This is what we have been doing for some host device drivers
>> already.
> 
> Specifically, we're doing it for drivers that access a fixed-size image,
> i.e. block devices rather than regular files. We don't want to do this
> for drivers where the file size could be changed but which just didn't
> implement it.
> 
> So I would suggest calling the function more specifically something like
> bdrv_co_truncate_blockdev(), and not using it as an automatic fallback
> in bdrv_co_truncate(), but just make it the BlockDriver.bdrv_co_truncate
> implementation for those drivers where it makes sense.

I was thinking about this, but the problem is that .bdrv_co_truncate()
does not get a BdrvChild, so an implementation for it cannot generically
zero the first sector (without bypassing the permission system, which
would be wrong).

So the function pointer would actually need to be set to something like
(int (*)(BlockDriverState *, int64_t, PreallocMode, Error **))42ul, or a
dummy function that just aborts, and then bdrv_co_truncate() would
recognize this magic constant.  But that seemed so weird to me that I
decided just not to do it, mostly because I was wondering what would be
so bad about treating images whose size we cannot change because we
haven’t implemented it exactly like fixed-size images.

(Also, “fixed-size” is up to interpretation.  You can change an LVM
volume’s size.  qemu doesn’t do it, obviously.  But that is the reason
for the warning qemu-img resize emits when it sees that the file size
did not change.)

> And of course, we only need these fake implementations because qemu-img
> (or .bdrv_co_create_opts) always wants to create the protocol level. If
> we could avoid this, then we wouldn't need any of this.

It’s trivial to avoid this.  I mean, patch 4 does exactly that.

So it isn’t about avoiding creating the protocol level, it’s about
avoiding the truncation there.  But why would you do that?

Max




* Re: [Qemu-devel] [RFC 1/5] block/nbd: Fix hang in .bdrv_close()
  2019-07-12 10:47     ` Max Reitz
@ 2019-07-12 11:01       ` Kevin Wolf
  2019-07-12 11:09         ` Max Reitz
  0 siblings, 1 reply; 20+ messages in thread
From: Kevin Wolf @ 2019-07-12 11:01 UTC (permalink / raw)
  To: Max Reitz; +Cc: Stefan Hajnoczi, qemu-devel, qemu-block


On 12.07.2019 at 12:47, Max Reitz wrote:
> On 12.07.19 11:24, Kevin Wolf wrote:
> > On 11.07.2019 at 21:58, Max Reitz wrote:
> >> When nbd_close() is called from a coroutine, the connection_co never
> >> gets to run, and thus nbd_teardown_connection() hangs.
> >>
> >> This is because aio_co_enter() only puts the connection_co into the main
> >> coroutine's wake-up queue, so this main coroutine needs to yield and
> >> reschedule itself to let the connection_co run.
> >>
> >> Signed-off-by: Max Reitz <mreitz@redhat.com>
> >> ---
> >>  block/nbd.c | 12 +++++++++++-
> >>  1 file changed, 11 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/block/nbd.c b/block/nbd.c
> >> index 81edabbf35..b83b6cd43e 100644
> >> --- a/block/nbd.c
> >> +++ b/block/nbd.c
> >> @@ -135,7 +135,17 @@ static void nbd_teardown_connection(BlockDriverState *bs)
> >>      qio_channel_shutdown(s->ioc,
> >>                           QIO_CHANNEL_SHUTDOWN_BOTH,
> >>                           NULL);
> >> -    BDRV_POLL_WHILE(bs, s->connection_co);
> >> +
> >> +    if (qemu_in_coroutine()) {
> >> +        /* Let our caller poll and just yield until connection_co is done */
> >> +        while (s->connection_co) {
> >> +            aio_co_schedule(qemu_get_current_aio_context(),
> >> +                            qemu_coroutine_self());
> >> +            qemu_coroutine_yield();
> >> +        }
> > 
> > Isn't this busy waiting? Why not let s->connection_co wake us up when
> > it's about to terminate instead of immediately rescheduling ourselves?
> 
> Yes, it is busy waiting, but I didn’t find that bad.  The connection_co
> will be invoked in basically every iteration, and once there is no
> pending data, it will quit.
> 
> The answer to “why not...” of course is because it’d be more complicated.
> 
> But anyway.
> 
> Adding a new function qemu_coroutine_run_after(target) that adds
> qemu_coroutine_self() to the given @target coroutine’s wake-up queue and
> then using that instead of scheduling works, too, yes.
> 
> I don’t really like being responsible for coroutine code, though...
> 
> (And maybe it’d be better to make it qemu_coroutine_yield_for(target),
> which does the above and then yields?)

Or just do something like this, which is arguably not only a fix for the
busy wait, but also a code simplification:

diff --git a/block/nbd.c b/block/nbd.c
index b83b6cd43e..c061bd1bfc 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -61,6 +61,7 @@ typedef struct BDRVNBDState {
     CoMutex send_mutex;
     CoQueue free_sema;
     Coroutine *connection_co;
+    Coroutine *teardown_co;
     int in_flight;

     NBDClientRequest requests[MAX_NBD_REQUESTS];
@@ -137,12 +138,9 @@ static void nbd_teardown_connection(BlockDriverState *bs)
                          NULL);

     if (qemu_in_coroutine()) {
-        /* Let our caller poll and just yield until connection_co is done */
-        while (s->connection_co) {
-            aio_co_schedule(qemu_get_current_aio_context(),
-                            qemu_coroutine_self());
-            qemu_coroutine_yield();
-        }
+        /* just yield until connection_co is done */
+        s->teardown_co = qemu_coroutine_self();
+        qemu_coroutine_yield();
     } else {
         BDRV_POLL_WHILE(bs, s->connection_co);
     }
@@ -217,6 +215,9 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
     bdrv_dec_in_flight(s->bs);

     s->connection_co = NULL;
+    if (s->teardown_co) {
+        aio_co_wake(s->teardown_co);
+    }
     aio_wait_kick();
 }



* Re: [Qemu-devel] [RFC 3/5] block: Fall back to fallback truncate function
  2019-07-12 10:04   ` Kevin Wolf
@ 2019-07-12 11:05     ` Max Reitz
  0 siblings, 0 replies; 20+ messages in thread
From: Max Reitz @ 2019-07-12 11:05 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Stefan Hajnoczi, qemu-devel, qemu-block



On 12.07.19 12:04, Kevin Wolf wrote:
> On 11.07.2019 at 21:58, Max Reitz wrote:
>> file-posix does not need to basically duplicate our fallback truncate
>> implementation; and sheepdog can fall back to it for "shrinking" files.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>  block/file-posix.c | 21 +--------------------
>>  block/sheepdog.c   |  2 +-
>>  2 files changed, 2 insertions(+), 21 deletions(-)
>>
>> diff --git a/block/file-posix.c b/block/file-posix.c
>> index ab05b51a66..bcddfc7fbe 100644
>> --- a/block/file-posix.c
>> +++ b/block/file-posix.c
>> @@ -2031,23 +2031,7 @@ static int coroutine_fn raw_co_truncate(BlockDriverState *bs, int64_t offset,
>>          return raw_regular_truncate(bs, s->fd, offset, prealloc, errp);
>>      }
> 
> The only thing that is left here is a fstat() check to see that we
> really have a regular file here. But since this function is for the
> 'file' driver, we can just assume this and the function can go away
> altogether.
> 
> In fact, 'file' with block/character devices has been deprecated since
> 3.0, so we can just remove support for it now and make it more than just
> an assumption.
> 
>> diff --git a/block/sheepdog.c b/block/sheepdog.c
>> index 6f402e5d4d..4af4961cb7 100644
>> --- a/block/sheepdog.c
>> +++ b/block/sheepdog.c
>> @@ -2301,7 +2301,7 @@ static int coroutine_fn sd_co_truncate(BlockDriverState *bs, int64_t offset,
>>      max_vdi_size = (UINT64_C(1) << s->inode.block_size_shift) * MAX_DATA_OBJS;
>>      if (offset < old_size) {
>>          error_setg(errp, "shrinking is not supported");
>> -        return -EINVAL;
>> +        return -ENOTSUP;
>>      } else if (offset > max_vdi_size) {
>>          error_setg(errp, "too big image size");
>>          return -EINVAL;
> 
> The image will be unchanged and the guest will keep seeing the old image
> size, so is there any use case where having the fallback here makes
> sense? The only expected case where an image is shrunk is that the user
> explicitly sent block_resize - and in that case returning success, but
> doing nothing achieves nothing except confusing the user.
> 
> file-posix has the same confusing case, but at least it also has cases
> where the fake truncate actually achieves something (such as allowing
> images to be created on block devices).

Hm, yes, that’s right.  It is as confusing for block devices, but that
at least gives something in return.  For sheepdog (and others, like
ssh), there is nothing in return.

Explaining for every block driver why it needs to be a bit confusing and
fall back to the fixed-size device implementation (because it doesn’t
implement creation) seems a bit off, too.

Hm.  Maybe the protocol creation fallback should just ignore failures
when truncating an image and in such a case zero the first sector of the
image?  Maybe it should just ignore ENOTSUP errors?
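
As a rough sketch on top of patch 4’s bdrv_create_file_fallback()
(just to illustrate the idea; error handling abbreviated, not tested):

    ret = blk_truncate(blk, size, prealloc, &local_err);
    if (ret == -ENOTSUP) {
        /* Cannot resize; at least keep format probing safe by zeroing
         * the first sector of the pre-existing file */
        error_free(local_err);
        ret = blk_pwrite_zeroes(blk, 0, BDRV_SECTOR_SIZE, 0);
    } else if (ret < 0) {
        error_propagate(errp, local_err);
    }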

Max




* Re: [Qemu-devel] [RFC 1/5] block/nbd: Fix hang in .bdrv_close()
  2019-07-12 11:01       ` Kevin Wolf
@ 2019-07-12 11:09         ` Max Reitz
  2019-07-12 11:23           ` Kevin Wolf
  0 siblings, 1 reply; 20+ messages in thread
From: Max Reitz @ 2019-07-12 11:09 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Stefan Hajnoczi, qemu-devel, qemu-block



On 12.07.19 13:01, Kevin Wolf wrote:
> On 12.07.2019 at 12:47, Max Reitz wrote:
>> On 12.07.19 11:24, Kevin Wolf wrote:
>>> On 11.07.2019 at 21:58, Max Reitz wrote:
>>>> When nbd_close() is called from a coroutine, the connection_co never
>>>> gets to run, and thus nbd_teardown_connection() hangs.
>>>>
>>>> This is because aio_co_enter() only puts the connection_co into the main
>>>> coroutine's wake-up queue, so this main coroutine needs to yield and
>>>> reschedule itself to let the connection_co run.
>>>>
>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>> ---
>>>>  block/nbd.c | 12 +++++++++++-
>>>>  1 file changed, 11 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/block/nbd.c b/block/nbd.c
>>>> index 81edabbf35..b83b6cd43e 100644
>>>> --- a/block/nbd.c
>>>> +++ b/block/nbd.c
>>>> @@ -135,7 +135,17 @@ static void nbd_teardown_connection(BlockDriverState *bs)
>>>>      qio_channel_shutdown(s->ioc,
>>>>                           QIO_CHANNEL_SHUTDOWN_BOTH,
>>>>                           NULL);
>>>> -    BDRV_POLL_WHILE(bs, s->connection_co);
>>>> +
>>>> +    if (qemu_in_coroutine()) {
>>>> +        /* Let our caller poll and just yield until connection_co is done */
>>>> +        while (s->connection_co) {
>>>> +            aio_co_schedule(qemu_get_current_aio_context(),
>>>> +                            qemu_coroutine_self());
>>>> +            qemu_coroutine_yield();
>>>> +        }
>>>
>>> Isn't this busy waiting? Why not let s->connection_co wake us up when
>>> it's about to terminate instead of immediately rescheduling ourselves?
>>
>> Yes, it is busy waiting, but I didn’t find that bad.  The connection_co
>> will be invoked in basically every iteration, and once there is no
>> pending data, it will quit.
>>
>> The answer to “why not...” of course is because it’d be more complicated.
>>
>> But anyway.
>>
>> Adding a new function qemu_coroutine_run_after(target) that adds
>> qemu_coroutine_self() to the given @target coroutine’s wake-up queue and
>> then using that instead of scheduling works, too, yes.
>>
>> I don’t really like being responsible for coroutine code, though...
>>
>> (And maybe it’d be better to make it qemu_coroutine_yield_for(target),
>> which does the above and then yields?)
> 
> Or just do something like this, which is arguably not only a fix for the
> busy wait, but also a code simplification:

1. Is that guaranteed to work?  What if data sneaks in, the
connection_co handles that, and doesn’t wake up the teardown_co?  Will
it be re-scheduled?

2. I precisely didn’t want to do this because we have this functionality
already in the form of Coroutine.co_queue_wakeup.  Why duplicate it here?

Max

> diff --git a/block/nbd.c b/block/nbd.c
> index b83b6cd43e..c061bd1bfc 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -61,6 +61,7 @@ typedef struct BDRVNBDState {
>      CoMutex send_mutex;
>      CoQueue free_sema;
>      Coroutine *connection_co;
> +    Coroutine *teardown_co;
>      int in_flight;
> 
>      NBDClientRequest requests[MAX_NBD_REQUESTS];
> @@ -137,12 +138,9 @@ static void nbd_teardown_connection(BlockDriverState *bs)
>                           NULL);
> 
>      if (qemu_in_coroutine()) {
> -        /* Let our caller poll and just yield until connection_co is done */
> -        while (s->connection_co) {
> -            aio_co_schedule(qemu_get_current_aio_context(),
> -                            qemu_coroutine_self());
> -            qemu_coroutine_yield();
> -        }
> +        /* just yield until connection_co is done */
> +        s->teardown_co = qemu_coroutine_self();
> +        qemu_coroutine_yield();
>      } else {
>          BDRV_POLL_WHILE(bs, s->connection_co);
>      }
> @@ -217,6 +215,9 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
>      bdrv_dec_in_flight(s->bs);
> 
>      s->connection_co = NULL;
> +    if (s->teardown_co) {
> +        aio_co_wake(s->teardown_co);
> +    }
>      aio_wait_kick();
>  }
> 





* Re: [Qemu-devel] [RFC 2/5] block: Generic truncation fallback
  2019-07-12 10:58     ` Max Reitz
@ 2019-07-12 11:17       ` Kevin Wolf
  2019-07-12 11:48         ` Max Reitz
  0 siblings, 1 reply; 20+ messages in thread
From: Kevin Wolf @ 2019-07-12 11:17 UTC (permalink / raw)
  To: Max Reitz; +Cc: Stefan Hajnoczi, qemu-devel, qemu-block


On 12.07.2019 at 12:58, Max Reitz wrote:
> On 12.07.19 11:49, Kevin Wolf wrote:
> > On 11.07.2019 at 21:58, Max Reitz wrote:
> >> If a protocol driver does not support truncation, we can fall back to
> >> effectively not doing anything if the new size is less than the actual
> >> file size.  This is what we have been doing for some host device drivers
> >> already.
> > 
> > Specifically, we're doing it for drivers that access a fixed-size image,
> > i.e. block devices rather than regular files. We don't want to do this
> > for drivers where the file size could be changed but which just didn't
> > implement it.
> > 
> > So I would suggest calling the function more specifically something like
> > bdrv_co_truncate_blockdev(), and not using it as an automatic fallback
> > in bdrv_co_truncate(), but just make it the BlockDriver.bdrv_co_truncate
> > implementation for those drivers where it makes sense.
> 
> I was thinking about this, but the problem is that .bdrv_co_truncate()
> does not get a BdrvChild, so an implementation for it cannot generically
> zero the first sector (without bypassing the permission system, which
> would be wrong).

Hm, I see. The next best thing would then probably be to have a bool in
BlockDriver that enables the fallback.
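
Roughly like this, as a sketch on top of your patch 2 (the field name
is made up):

    /* in bdrv_co_truncate(), instead of falling back on every -ENOTSUP */
    if (ret == -ENOTSUP && drv->bdrv_file_open &&
        drv->supports_truncate_fallback) {
        error_free(local_err);
        ret = bdrv_co_truncate_fallback(child, offset, prealloc, errp);
    }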

> So the function pointer would actually need to be set to something like
> (int (*)(BlockDriverState *, int64_t, PreallocMode, Error **))42ul, or a
> dummy function that just aborts, and then bdrv_co_truncate() would
> recognize this magic constant.  But that seemed so weird to me that I
> decided just not to do it, mostly because I was wondering what would be
> so bad about treating images whose size we cannot change because we
> haven’t implemented it exactly like fixed-size images.
> 
> (Also, “fixed-size” is up to interpretation.  You can change an LVM
> volume’s size.  qemu doesn’t do it, obviously.  But that is the reason
> for the warning qemu-img resize emits when it sees that the file size
> did not change.)
> 
> > And of course, we only need these fake implementations because qemu-img
> > (or .bdrv_co_create_opts) always wants to create the protocol level. If
> > we could avoid this, then we wouldn't need any of this.
> 
> It’s trivial to avoid this.  I mean, patch 4 does exactly that.
> 
> So it isn’t about avoiding creating the protocol level, it’s about
> avoiding the truncation there.  But why would you do that?

Because we can't actually truncate there. We can only do the fake thing
and claim we have truncated even though the size didn't change.

But actually, while avoiding the fake truncate outside of image creation
would avoid confusing situations where the user requests image
shrinking, gets success, but nothing happens, it would also break
actual resizes where the volume is resized externally and block_resize
is only called to notify qemu to pick up the size change.

So after all, we probably do need to keep the hack in bdrv_co_truncate()
instead of moving it to image creation only.

Kevin



* Re: [Qemu-devel] [RFC 1/5] block/nbd: Fix hang in .bdrv_close()
  2019-07-12 11:09         ` Max Reitz
@ 2019-07-12 11:23           ` Kevin Wolf
  2019-07-12 11:44             ` Max Reitz
  0 siblings, 1 reply; 20+ messages in thread
From: Kevin Wolf @ 2019-07-12 11:23 UTC (permalink / raw)
  To: Max Reitz; +Cc: Stefan Hajnoczi, qemu-devel, qemu-block


On 12.07.2019 at 13:09, Max Reitz wrote:
> On 12.07.19 13:01, Kevin Wolf wrote:
> > On 12.07.2019 at 12:47, Max Reitz wrote:
> >> On 12.07.19 11:24, Kevin Wolf wrote:
> >>> On 11.07.2019 at 21:58, Max Reitz wrote:
> >>>> When nbd_close() is called from a coroutine, the connection_co never
> >>>> gets to run, and thus nbd_teardown_connection() hangs.
> >>>>
> >>>> This is because aio_co_enter() only puts the connection_co into the main
> >>>> coroutine's wake-up queue, so this main coroutine needs to yield and
> >>>> reschedule itself to let the connection_co run.
> >>>>
> >>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
> >>>> ---
> >>>>  block/nbd.c | 12 +++++++++++-
> >>>>  1 file changed, 11 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/block/nbd.c b/block/nbd.c
> >>>> index 81edabbf35..b83b6cd43e 100644
> >>>> --- a/block/nbd.c
> >>>> +++ b/block/nbd.c
> >>>> @@ -135,7 +135,17 @@ static void nbd_teardown_connection(BlockDriverState *bs)
> >>>>      qio_channel_shutdown(s->ioc,
> >>>>                           QIO_CHANNEL_SHUTDOWN_BOTH,
> >>>>                           NULL);
> >>>> -    BDRV_POLL_WHILE(bs, s->connection_co);
> >>>> +
> >>>> +    if (qemu_in_coroutine()) {
> >>>> +        /* Let our caller poll and just yield until connection_co is done */
> >>>> +        while (s->connection_co) {
> >>>> +            aio_co_schedule(qemu_get_current_aio_context(),
> >>>> +                            qemu_coroutine_self());
> >>>> +            qemu_coroutine_yield();
> >>>> +        }
> >>>
> >>> Isn't this busy waiting? Why not let s->connection_co wake us up when
> >>> it's about to terminate instead of immediately rescheduling ourselves?
> >>
> >> Yes, it is busy waiting, but I didn’t find that bad.  The connection_co
> >> will be invoked in basically every iteration, and once there is no
> >> pending data, it will quit.
> >>
> >> The answer to “why not...” of course is because it’d be more complicated.
> >>
> >> But anyway.
> >>
> >> Adding a new function qemu_coroutine_run_after(target) that adds
> >> qemu_coroutine_self() to the given @target coroutine’s wake-up queue and
> >> then using that instead of scheduling works, too, yes.
> >>
> >> I don’t really like being responsible for coroutine code, though...
> >>
> >> (And maybe it’d be better to make it qemu_coroutine_yield_for(target),
> >> which does the above and then yields?)
> > 
> > Or just do something like this, which is arguably not only a fix for the
> > busy wait, but also a code simplification:
> 
> 1. Is that guaranteed to work?  What if data sneaks in, the
> connection_co handles that, and doesn’t wake up the teardown_co?  Will
> it be re-scheduled?

Then connection_co is buggy because we clearly requested that it
terminate. It is possible that it does so only after handling another
request, but this wouldn't be a problem. teardown_co would then just
sleep for a few cycles more until connection_co is done and reaches the
aio_co_wake() call.

> 2. I precisely didn’t want to do this because we have this functionality
> already in the form of Coroutine.co_queue_wakeup.  Why duplicate it here?

co_queue_wakeup contains coroutines to be run at the next yield point
(or termination), which may be when connection_co is actually done, but
it might also be earlier. My explicit aio_co_wake() at the end of
connection_co is guaranteed to run only when connection_co is done.

Kevin

> > diff --git a/block/nbd.c b/block/nbd.c
> > index b83b6cd43e..c061bd1bfc 100644
> > --- a/block/nbd.c
> > +++ b/block/nbd.c
> > @@ -61,6 +61,7 @@ typedef struct BDRVNBDState {
> >      CoMutex send_mutex;
> >      CoQueue free_sema;
> >      Coroutine *connection_co;
> > +    Coroutine *teardown_co;
> >      int in_flight;
> > 
> >      NBDClientRequest requests[MAX_NBD_REQUESTS];
> > @@ -137,12 +138,9 @@ static void nbd_teardown_connection(BlockDriverState *bs)
> >                           NULL);
> > 
> >      if (qemu_in_coroutine()) {
> > -        /* Let our caller poll and just yield until connection_co is done */
> > -        while (s->connection_co) {
> > -            aio_co_schedule(qemu_get_current_aio_context(),
> > -                            qemu_coroutine_self());
> > -            qemu_coroutine_yield();
> > -        }
> > +        /* just yield until connection_co is done */
> > +        s->teardown_co = qemu_coroutine_self();
> > +        qemu_coroutine_yield();
> >      } else {
> >          BDRV_POLL_WHILE(bs, s->connection_co);
> >      }
> > @@ -217,6 +215,9 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
> >      bdrv_dec_in_flight(s->bs);
> > 
> >      s->connection_co = NULL;
> > +    if (s->teardown_co) {
> > +        aio_co_wake(s->teardown_co);
> > +    }
> >      aio_wait_kick();
> >  }
> > 
> 
> 






* Re: [Qemu-devel] [RFC 1/5] block/nbd: Fix hang in .bdrv_close()
  2019-07-12 11:23           ` Kevin Wolf
@ 2019-07-12 11:44             ` Max Reitz
  2019-07-12 12:30               ` Kevin Wolf
  0 siblings, 1 reply; 20+ messages in thread
From: Max Reitz @ 2019-07-12 11:44 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Stefan Hajnoczi, qemu-devel, qemu-block



On 12.07.19 13:23, Kevin Wolf wrote:
> On 12.07.2019 at 13:09, Max Reitz wrote:
>> On 12.07.19 13:01, Kevin Wolf wrote:
>>> On 12.07.2019 at 12:47, Max Reitz wrote:
>>>> On 12.07.19 11:24, Kevin Wolf wrote:
>>>>> On 11.07.2019 at 21:58, Max Reitz wrote:
>>>>>> When nbd_close() is called from a coroutine, the connection_co never
>>>>>> gets to run, and thus nbd_teardown_connection() hangs.
>>>>>>
>>>>>> This is because aio_co_enter() only puts the connection_co into the main
>>>>>> coroutine's wake-up queue, so this main coroutine needs to yield and
>>>>>> reschedule itself to let the connection_co run.
>>>>>>
>>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>>>> ---
>>>>>>  block/nbd.c | 12 +++++++++++-
>>>>>>  1 file changed, 11 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/block/nbd.c b/block/nbd.c
>>>>>> index 81edabbf35..b83b6cd43e 100644
>>>>>> --- a/block/nbd.c
>>>>>> +++ b/block/nbd.c
>>>>>> @@ -135,7 +135,17 @@ static void nbd_teardown_connection(BlockDriverState *bs)
>>>>>>      qio_channel_shutdown(s->ioc,
>>>>>>                           QIO_CHANNEL_SHUTDOWN_BOTH,
>>>>>>                           NULL);
>>>>>> -    BDRV_POLL_WHILE(bs, s->connection_co);
>>>>>> +
>>>>>> +    if (qemu_in_coroutine()) {
>>>>>> +        /* Let our caller poll and just yield until connection_co is done */
>>>>>> +        while (s->connection_co) {
>>>>>> +            aio_co_schedule(qemu_get_current_aio_context(),
>>>>>> +                            qemu_coroutine_self());
>>>>>> +            qemu_coroutine_yield();
>>>>>> +        }
>>>>>
>>>>> Isn't this busy waiting? Why not let s->connection_co wake us up when
>>>>> it's about to terminate instead of immediately rescheduling ourselves?
>>>>
>>>> Yes, it is busy waiting, but I didn’t find that bad.  The connection_co
>>>> will be invoked in basically every iteration, and once there is no
>>>> pending data, it will quit.
>>>>
>>>> The answer to “why not...” of course is because it’d be more complicated.
>>>>
>>>> But anyway.
>>>>
>>>> Adding a new function qemu_coroutine_run_after(target) that adds
>>>> qemu_coroutine_self() to the given @target coroutine’s wake-up queue and
>>>> then using that instead of scheduling works, too, yes.
>>>>
>>>> I don’t really like being responsible for coroutine code, though...
>>>>
>>>> (And maybe it’d be better to make it qemu_coroutine_yield_for(target),
>>>> which does the above and then yields?)
>>>
>>> Or just do something like this, which is arguably not only a fix for the
>>> busy wait, but also a code simplification:
>>
>> 1. Is that guaranteed to work?  What if data sneaks in, the
>> connection_co handles that, and doesn’t wake up the teardown_co?  Will
>> it be re-scheduled?
> 
> Then connection_co is buggy because we clearly requested that it
> terminate.

Did we?  This would be done by setting s->quit to true, which isn’t
explicitly done here.

I thought it worked by just waking up the coroutine until it doesn’t
receive anything anymore, because the connection is closed.  Now I don’t
know whether QIO_CHANNEL_SHUTDOWN_BOTH discards data that has been
received before the channel is closed.  I don’t expect it to.

>            It is possible that it does so only after handling another
> request, but this wouldn't be a problem. teardown_co would then just
> sleep for a few cycles more until connection_co is done and reaches the
> aio_co_wake() call.

I don’t quite understand, because how connection_co would proceed after
handling another request was exactly my question.  If it were to yield
and never wake up, it would never be done.

But I’ve followed nbd_receive_reply() now, and I suppose nbd_read_eof()
will simply never yield after we have invoked qio_channel_shutdown().

>> 2. I precisely didn’t want to do this because we have this functionality
>> already in the form of Coroutine.co_queue_wakeup.  Why duplicate it here?
> 
> co_queue_wakeup contains coroutines to be run at the next yield point
> (or termination), which may be when connection_co is actually done, but
> it might also be earlier.

Yes, if it handles another request, that would be the yield point at the
end of its main loop.

But that would be solved by simply putting the yield_for() in a loop,
like the one that already exists for the non-coroutine case with
BDRV_POLL_WHILE().
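
(For concreteness, a minimal sketch of what that helper and loop might
look like.  The name qemu_coroutine_yield_for() and the direct use of
the internal wake-up queue come from the proposal above, not from
existing code:

/* Hypothetical helper; it would have to live with the coroutine
 * internals, where co_queue_wakeup is accessible. */
static void coroutine_fn qemu_coroutine_yield_for(Coroutine *target)
{
    Coroutine *self = qemu_coroutine_self();

    /* Queue ourselves on @target so we are re-entered at its next
     * yield point or on its termination ... */
    QSIMPLEQ_INSERT_TAIL(&target->co_queue_wakeup, self, co_queue_next);
    /* ... and give up control until then. */
    qemu_coroutine_yield();
}

/* nbd_teardown_connection() would then wait without rescheduling: */
while (s->connection_co) {
    qemu_coroutine_yield_for(s->connection_co);
}

Because the target may yield several times before terminating, the
caller still needs the loop, exactly as with BDRV_POLL_WHILE().)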

>                           My explicit aio_co_wake() at the end of
> connection_co is guaranteed to run only when connection_co is done.

I can’t explain it; it feels off to me to add this field to BDRVNBDState
and let connection_co handle it just for this special case.  It seems to
me that if we had a yield_for() function, we could simply use it instead
-- and I’d prefer a generic function over a special-case implementation.

I do understand that “it feels off” is not a reasonable justification
for doing anything.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC 2/5] block: Generic truncation fallback
  2019-07-12 11:17       ` Kevin Wolf
@ 2019-07-12 11:48         ` Max Reitz
  2019-07-12 13:48           ` Max Reitz
  0 siblings, 1 reply; 20+ messages in thread
From: Max Reitz @ 2019-07-12 11:48 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Stefan Hajnoczi, qemu-devel, qemu-block


[-- Attachment #1.1: Type: text/plain, Size: 3620 bytes --]

On 12.07.19 13:17, Kevin Wolf wrote:
> Am 12.07.2019 um 12:58 hat Max Reitz geschrieben:
>> On 12.07.19 11:49, Kevin Wolf wrote:
>>> Am 11.07.2019 um 21:58 hat Max Reitz geschrieben:
>>>> If a protocol driver does not support truncation, we can fall back to
>>>> effectively not doing anything if the new size is less than the actual
>>>> file size.  This is what we have been doing for some host device drivers
>>>> already.
>>>
>>> Specifically, we're doing it for drivers that access a fixed-size image,
>>> i.e. block devices rather than regular files. We don't want to do this
>>> for drivers where the file size could be changed, but just didn't
>>> implement it.
>>>
>>> So I would suggest calling the function more specifically something like
>>> bdrv_co_truncate_blockdev(), and not using it as an automatic fallback
>>> in bdrv_co_truncate(), but just make it the BlockDriver.bdrv_co_truncate
>>> implementation for those drivers where it makes sense.
>>
>> I was thinking about this, but the problem is that .bdrv_co_truncate()
>> does not get a BdrvChild, so an implementation for it cannot generically
>> zero the first sector (without bypassing the permission system, which
>> would be wrong).
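
(To illustrate the point just quoted: a rough sketch of the fallback's
shape once it does get a BdrvChild.  The function name, error messages
and the 512-byte probing window are assumptions, not the series'
literal code:

/* Shrink-only fallback: leave the file length alone, but zero the
 * rest of the first sector past the new EOF so that stale data cannot
 * confuse format probing.  Writing through @child is what keeps this
 * within the permission system. */
static int coroutine_fn
bdrv_co_truncate_fallback(BdrvChild *child, int64_t offset,
                          PreallocMode prealloc, Error **errp)
{
    int64_t cur_size = bdrv_getlength(child->bs);

    if (cur_size < 0) {
        error_setg_errno(errp, -cur_size, "Could not get image size");
        return cur_size;
    }
    if (offset > cur_size || prealloc != PREALLOC_MODE_OFF) {
        error_setg(errp, "This driver cannot grow or preallocate images");
        return -ENOTSUP;
    }

    if (offset % BDRV_SECTOR_SIZE) {
        int bytes = BDRV_SECTOR_SIZE - offset % BDRV_SECTOR_SIZE;
        return bdrv_co_pwrite_zeroes(child, offset, bytes, 0);
    }
    return 0;
})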
> 
> Hm, I see. The next best thing would then probably be having a bool in
> BlockDriver that enables the fallback.
> 
>> So the function pointer would actually need to be set to something like
>> (int (*)(BlockDriverState *, int64_t, PreallocMode, Error **))42ul, or a
>> dummy function that just aborts, and then bdrv_co_truncate() would
>> recognize this magic constant.  But that seemed so weird to me that I
>> decided just not to do it, mostly because I was wondering what would be
>> so bad about treating images whose size we cannot change (because we
>> haven’t implemented it) exactly like fixed-size images.
>>
>> (Also, “fixed-size” is up to interpretation.  You can change an LVM
>> volume’s size.  qemu doesn’t do it, obviously.  But that is the reason
>> for the warning qemu-img resize emits when it sees that the file size
>> did not change.)
>>
>>> And of course, we only need these fake implementations because qemu-img
>>> (or .bdrv_co_create_opts) always wants to create the protocol level. If
>>> we could avoid this, then we wouldn't need any of this.
>>
>> It’s trivial to avoid this.  I mean, patch 4 does exactly that.
>>
>> So it isn’t about avoiding creating the protocol level, it’s about
>> avoiding the truncation there.  But why would you do that?
> 
> Because we can't actually truncate there. We can only do the fake thing
> and claim we have truncated even though the size didn't change.

You’re right.  I actually didn’t realize that we have no drivers that
support truncation but not image creation.

Yes, then it’s stupid.

I thought it was a reasonable thing to do for such drivers.

So I suppose the best thing is to do what I hinted at in my reply to
your reply to patch 3, which is to drop patches 2 and 3 and instead make
the creation fallback work around blk_truncate() failures.
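
(For reference, the creation fallback in question has roughly this
shape -- a sketch with assumed names, not the series' code: if the
protocol driver cannot create files but the target already exists, we
open it and merely resize it.

/* Generic file-creation fallback: reuse a pre-existing file and
 * truncate it to the requested size.  The blk_truncate() call at the
 * end is the failure point this fallback would have to tolerate. */
static int create_file_fallback(const char *filename, QemuOpts *opts,
                                Error **errp)
{
    int64_t size = qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0);
    BlockBackend *blk;
    int ret;

    blk = blk_new_open(filename, NULL, NULL,
                       BDRV_O_RDWR | BDRV_O_PROTOCOL, errp);
    if (!blk) {
        return -EINVAL;
    }

    ret = blk_truncate(blk, size, PREALLOC_MODE_OFF, errp);
    blk_unref(blk);
    return ret;
})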

Max

> But actually, while avoiding the fake truncate outside of image creation
> would avoid confusing situations where the user requests image
> shrinking, gets success, but nothing happens, it would also break
> actual resizes where the volume is resized externally and block_resize
> is only called to notify qemu to pick up the size change.
> 
> So after all, we probably do need to keep the hack in bdrv_co_truncate()
> instead of moving it to image creation only.
> 
> Kevin
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC 1/5] block/nbd: Fix hang in .bdrv_close()
  2019-07-12 11:44             ` Max Reitz
@ 2019-07-12 12:30               ` Kevin Wolf
  0 siblings, 0 replies; 20+ messages in thread
From: Kevin Wolf @ 2019-07-12 12:30 UTC (permalink / raw)
  To: Max Reitz; +Cc: Stefan Hajnoczi, qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 4871 bytes --]

Am 12.07.2019 um 13:44 hat Max Reitz geschrieben:
> On 12.07.19 13:23, Kevin Wolf wrote:
> > Am 12.07.2019 um 13:09 hat Max Reitz geschrieben:
> >> On 12.07.19 13:01, Kevin Wolf wrote:
> >>> Am 12.07.2019 um 12:47 hat Max Reitz geschrieben:
> >>>> On 12.07.19 11:24, Kevin Wolf wrote:
> >>>>> Am 11.07.2019 um 21:58 hat Max Reitz geschrieben:
> >>>>>> When nbd_close() is called from a coroutine, the connection_co never
> >>>>>> gets to run, and thus nbd_teardown_connection() hangs.
> >>>>>>
> >>>>>> This is because aio_co_enter() only puts the connection_co into the main
> >>>>>> coroutine's wake-up queue, so this main coroutine needs to yield and
> >>>>>> reschedule itself to let the connection_co run.
> >>>>>>
> >>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
> >>>>>> ---
> >>>>>>  block/nbd.c | 12 +++++++++++-
> >>>>>>  1 file changed, 11 insertions(+), 1 deletion(-)
> >>>>>>
> >>>>>> diff --git a/block/nbd.c b/block/nbd.c
> >>>>>> index 81edabbf35..b83b6cd43e 100644
> >>>>>> --- a/block/nbd.c
> >>>>>> +++ b/block/nbd.c
> >>>>>> @@ -135,7 +135,17 @@ static void nbd_teardown_connection(BlockDriverState *bs)
> >>>>>>      qio_channel_shutdown(s->ioc,
> >>>>>>                           QIO_CHANNEL_SHUTDOWN_BOTH,
> >>>>>>                           NULL);
> >>>>>> -    BDRV_POLL_WHILE(bs, s->connection_co);
> >>>>>> +
> >>>>>> +    if (qemu_in_coroutine()) {
> >>>>>> +        /* Let our caller poll and just yield until connection_co is done */
> >>>>>> +        while (s->connection_co) {
> >>>>>> +            aio_co_schedule(qemu_get_current_aio_context(),
> >>>>>> +                            qemu_coroutine_self());
> >>>>>> +            qemu_coroutine_yield();
> >>>>>> +        }
> >>>>>
> >>>>> Isn't this busy waiting? Why not let s->connection_co wake us up when
> >>>>> it's about to terminate instead of immediately rescheduling ourselves?
> >>>>
> >>>> Yes, it is busy waiting, but I didn’t find that bad.  The connection_co
> >>>> will be invoked in basically every iteration, and once there is no
> >>>> pending data, it will quit.
> >>>>
> >>>> The answer to “why not...” of course is because it’d be more complicated.
> >>>>
> >>>> But anyway.
> >>>>
> >>>> Adding a new function qemu_coroutine_run_after(target) that adds
> >>>> qemu_coroutine_self() to the given @target coroutine’s wake-up queue and
> >>>> then using that instead of scheduling works, too, yes.
> >>>>
> >>>> I don’t really like being responsible for coroutine code, though...
> >>>>
> >>>> (And maybe it’d be better to make it qemu_coroutine_yield_for(target),
> >>>> which does the above and then yields?)
> >>>
> >>> Or just do something like this, which is arguably not only a fix for the
> >>> busy wait, but also a code simplification:
> >>
> >> 1. Is that guaranteed to work?  What if data sneaks in, the
> >> connection_co handles that, and doesn’t wake up the teardown_co?  Will
> >> it be re-scheduled?
> > 
> > Then connection_co is buggy because we clearly requested that it
> > terminate.
> 
> Did we?  This would be done by setting s->quit to true, which isn’t
> explicitly done here.

*we clearly requested implicitly ;-)

> I thought it worked by just waking up the coroutine until it doesn’t
> receive anything anymore, because the connection is closed.  Now I don’t
> know whether QIO_CHANNEL_SHUTDOWN_BOTH discards data that has been
> received before the channel is closed.  I don’t expect it to.

It doesn't really matter, but I expect that we'll still read everything
that was buffered and receive EOF after everything is read.

> >            It is possible that it does so only after handling another
> > request, but this wouldn't be a problem. teardown_co would then just
> > sleep for a few cycles more until connection_co is done and reaches the
> > aio_co_wake() call.
> 
> I don’t quite understand, because how connection_co would proceed after
> handling another request was exactly my question.  If it were to yield
> and never wake up, it would never be done.

But why would it not wake up? This would be a bug; every yield needs a
corresponding place from which the coroutine is reentered later.

If this were missing, it would already today mean that we hang during
shutdown because s->connection_co would never become NULL.

> But I’ve followed nbd_receive_reply() now, and I suppose nbd_read_eof()
> will simply never yield after we have invoked qio_channel_shutdown().

If my expectation above is right, this would probably be the case at
least for the "main" yield. I'm not sure there aren't other yield
points, though. But as I said, it doesn't matter anyway how many times the
coroutine yields and is reentered before finally reaching the
aio_co_wake() and terminating.
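
Spelled out, the pattern under discussion looks roughly like this (a
sketch built around the snippet quoted at the top of the thread; the
teardown_co field and the surrounding code are assumptions):

/* Teardown side, coroutine context: park until connection_co is done.
 * Unlike the rescheduling loop, this coroutine is woken exactly once,
 * by the explicit aio_co_wake() below. */
    s->teardown_co = qemu_coroutine_self();
    while (s->connection_co) {
        qemu_coroutine_yield();
    }
    s->teardown_co = NULL;

/* Tail of the connection coroutine: this wake-up runs only when the
 * coroutine really terminates, never at one of its earlier yields. */
    s->connection_co = NULL;
    if (s->teardown_co) {
        aio_co_wake(s->teardown_co);
    }
    aio_wait_kick();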

Kevin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC 2/5] block: Generic truncation fallback
  2019-07-12 11:48         ` Max Reitz
@ 2019-07-12 13:48           ` Max Reitz
  0 siblings, 0 replies; 20+ messages in thread
From: Max Reitz @ 2019-07-12 13:48 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Stefan Hajnoczi, qemu-devel, qemu-block


[-- Attachment #1.1: Type: text/plain, Size: 3404 bytes --]

On 12.07.19 13:48, Max Reitz wrote:
> On 12.07.19 13:17, Kevin Wolf wrote:
>> Am 12.07.2019 um 12:58 hat Max Reitz geschrieben:
>>> On 12.07.19 11:49, Kevin Wolf wrote:
>>>> Am 11.07.2019 um 21:58 hat Max Reitz geschrieben:
>>>>> If a protocol driver does not support truncation, we can fall back to
>>>>> effectively not doing anything if the new size is less than the actual
>>>>> file size.  This is what we have been doing for some host device drivers
>>>>> already.
>>>>
>>>> Specifically, we're doing it for drivers that access a fixed-size image,
>>>> i.e. block devices rather than regular files. We don't want to do this
>>>> for drivers where the file size could be changed, but just didn't
>>>> implement it.
>>>>
>>>> So I would suggest calling the function more specifically something like
>>>> bdrv_co_truncate_blockdev(), and not using it as an automatic fallback
>>>> in bdrv_co_truncate(), but just make it the BlockDriver.bdrv_co_truncate
>>>> implementation for those drivers where it makes sense.
>>>
>>> I was thinking about this, but the problem is that .bdrv_co_truncate()
>>> does not get a BdrvChild, so an implementation for it cannot generically
>>> zero the first sector (without bypassing the permission system, which
>>> would be wrong).
>>
>> Hm, I see. The next best thing would then probably be having a bool in
>> BlockDriver that enables the fallback.
>>
>>> So the function pointer would actually need to be set to something like
>>> (int (*)(BlockDriverState *, int64_t, PreallocMode, Error **))42ul, or a
>>> dummy function that just aborts, and then bdrv_co_truncate() would
>>> recognize this magic constant.  But that seemed so weird to me that I
>>> decided just not to do it, mostly because I was wondering what would be
>>> so bad about treating images whose size we cannot change (because we
>>> haven’t implemented it) exactly like fixed-size images.
>>>
>>> (Also, “fixed-size” is up to interpretation.  You can change an LVM
>>> volume’s size.  qemu doesn’t do it, obviously.  But that is the reason
>>> for the warning qemu-img resize emits when it sees that the file size
>>> did not change.)
>>>
>>>> And of course, we only need these fake implementations because qemu-img
>>>> (or .bdrv_co_create_opts) always wants to create the protocol level. If
>>>> we could avoid this, then we wouldn't need any of this.
>>>
>>> It’s trivial to avoid this.  I mean, patch 4 does exactly that.
>>>
>>> So it isn’t about avoiding creating the protocol level, it’s about
>>> avoiding the truncation there.  But why would you do that?
>>
>> Because we can't actually truncate there. We can only do the fake thing
>> and claim we have truncated even though the size didn't change.
> 
> You’re right.  I actually didn’t realize that we have no drivers that
> support truncation but not image creation.
> 
> Yes, then it’s stupid.
> 
> I thought it was a reasonable thing to do for such drivers.
> 
> So I suppose the best thing is to do what I hinted at in my reply to
> your reply to patch 3, which is to drop patches 2 and 3 and instead make
> the creation fallback work around blk_truncate() failures.

Oh no.  Now I remember.  The problem with that is that nowadays all
format drivers truncate the protocol file to 0 in their
.bdrv_co_create() implementation.  Aw, man.
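
Schematically (not any one driver's literal code), the problematic
pattern is:

/* Typical tail of a format driver's .bdrv_co_create(): the protocol
 * file -- whether freshly created or reused -- is first truncated to
 * 0, and that is exactly the blk_truncate() call that fails on
 * protocol drivers without real truncation support.  @blk is assumed
 * to have been opened by the caller. */
static int coroutine_fn
format_co_create(BlockBackend *blk, BlockdevCreateOptions *opts,
                 Error **errp)
{
    int ret;

    /* Start from an empty file ... */
    ret = blk_truncate(blk, 0, PREALLOC_MODE_OFF, errp);
    if (ret < 0) {
        return ret;
    }

    /* ... then write the format header and metadata, growing the
     * file again as needed. */
    return 0;
}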

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2019-07-12 13:49 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-11 19:57 [Qemu-devel] [RFC 0/5] block: Generic file truncation/creation fallbacks Max Reitz
2019-07-11 19:58 ` [Qemu-devel] [RFC 1/5] block/nbd: Fix hang in .bdrv_close() Max Reitz
2019-07-12  9:24   ` Kevin Wolf
2019-07-12 10:47     ` Max Reitz
2019-07-12 11:01       ` Kevin Wolf
2019-07-12 11:09         ` Max Reitz
2019-07-12 11:23           ` Kevin Wolf
2019-07-12 11:44             ` Max Reitz
2019-07-12 12:30               ` Kevin Wolf
2019-07-11 19:58 ` [Qemu-devel] [RFC 2/5] block: Generic truncation fallback Max Reitz
2019-07-12  9:49   ` Kevin Wolf
2019-07-12 10:58     ` Max Reitz
2019-07-12 11:17       ` Kevin Wolf
2019-07-12 11:48         ` Max Reitz
2019-07-12 13:48           ` Max Reitz
2019-07-11 19:58 ` [Qemu-devel] [RFC 3/5] block: Fall back to fallback truncate function Max Reitz
2019-07-12 10:04   ` Kevin Wolf
2019-07-12 11:05     ` Max Reitz
2019-07-11 19:58 ` [Qemu-devel] [RFC 4/5] block: Generic file creation fallback Max Reitz
2019-07-11 19:58 ` [Qemu-devel] [RFC 5/5] iotests: Add test for fallback truncate/create Max Reitz
