* [Qemu-devel] [PATCH v2 0/3] image fleecing
@ 2018-06-29 15:15 Vladimir Sementsov-Ogievskiy
  2018-06-29 15:15 ` [Qemu-devel] [PATCH v2 1/3] blockdev-backup: enable non-root nodes for backup source Vladimir Sementsov-Ogievskiy
                   ` (3 more replies)
  0 siblings, 4 replies; 28+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-06-29 15:15 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: eblake, armbru, kwolf, mreitz, jsnow, famz, vsementsov, den

Image fleecing, also known as the external (or pull) backup scheme, is nearly
complete. Here is the forgotten test, written by Fam back in 2014, together with
two necessary patches of mine that were previously sent separately, which is why
the series is called "v2".

v2:
 01: add transaction support
 02: rebase on master, drop "'fleecing-filter': 'BlockdevCreateNotSupported'"
     add empty .bdrv_close
     add default .bdrv_child_perm
 03: compared with [PATCH v20 15/15] qemu-iotests: Image fleecing test case 08:
     add -f iotests.imgfmt to qemu_io
     fix target_img to be always qcow2
     add -r to qemu_io for nbd
     add fleecing-filter layer
     wrap long lines

Fam Zheng (1):
  qemu-iotests: Image fleecing test case 222

Vladimir Sementsov-Ogievskiy (2):
  blockdev-backup: enable non-root nodes for backup source
  block/fleecing-filter: new filter driver for fleecing

 qapi/block-core.json       |   6 ++-
 block/fleecing-filter.c    |  80 ++++++++++++++++++++++++++++++++
 blockdev.c                 |   4 +-
 block/Makefile.objs        |   1 +
 tests/qemu-iotests/222     | 112 +++++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/222.out |   5 ++
 tests/qemu-iotests/group   |   1 +
 7 files changed, 205 insertions(+), 4 deletions(-)
 create mode 100644 block/fleecing-filter.c
 create mode 100755 tests/qemu-iotests/222
 create mode 100644 tests/qemu-iotests/222.out

-- 
2.11.1

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Qemu-devel] [PATCH v2 1/3] blockdev-backup: enable non-root nodes for backup source
  2018-06-29 15:15 [Qemu-devel] [PATCH v2 0/3] image fleecing Vladimir Sementsov-Ogievskiy
@ 2018-06-29 15:15 ` Vladimir Sementsov-Ogievskiy
  2018-06-29 17:13   ` Eric Blake
  2018-06-29 17:31   ` John Snow
  2018-06-29 15:15 ` [Qemu-devel] [PATCH v2 2/3] block/fleecing-filter: new filter driver for fleecing Vladimir Sementsov-Ogievskiy
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 28+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-06-29 15:15 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: eblake, armbru, kwolf, mreitz, jsnow, famz, vsementsov, den

This is needed to implement the image-fleecing scheme, where we create
a temporary node, mark our active node as the backing file of the
temporary node, and start backup(sync=none) from the active node to the
temporary node. The temporary node then represents a kind of snapshot
and may be used for external backup through NBD.
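The resulting setup can be sketched as roughly the following QMP command
sequence (an illustration only: the node names, the filename, and the exact
wiring are assumptions, not part of this patch):

```python
# Hypothetical QMP command sequence for the fleecing scheme described
# above; "drive0" is assumed to be the name of the active node.
fleecing_setup = [
    # 1. Create the temporary (fleecing) node, with the active node
    #    as its backing file.
    ("blockdev-add", {
        "driver": "qcow2",
        "node-name": "tmp",
        "file": {"driver": "file", "filename": "fleecing.qcow2"},
        "backing": "drive0",
    }),
    # 2. Start copy-before-write from the active node to the temporary
    #    node; this step is what needs a non-root source node.
    ("blockdev-backup", {"device": "drive0", "target": "tmp",
                         "sync": "none"}),
    # 3. Export the point-in-time view for the external backup client.
    ("nbd-server-add", {"device": "tmp"}),
]
```

In an iotest each pair would be issued as vm.qmp(cmd, **args).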

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 blockdev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 58d7570932..72f5347df5 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1859,7 +1859,7 @@ static void blockdev_backup_prepare(BlkActionState *common, Error **errp)
     assert(common->action->type == TRANSACTION_ACTION_KIND_BLOCKDEV_BACKUP);
     backup = common->action->u.blockdev_backup.data;
 
-    bs = qmp_get_root_bs(backup->device, errp);
+    bs = bdrv_lookup_bs(backup->device, backup->device, errp);
     if (!bs) {
         return;
     }
@@ -3517,7 +3517,7 @@ BlockJob *do_blockdev_backup(BlockdevBackup *backup, JobTxn *txn,
         backup->compress = false;
     }
 
-    bs = qmp_get_root_bs(backup->device, errp);
+    bs = bdrv_lookup_bs(backup->device, backup->device, errp);
     if (!bs) {
         return NULL;
     }
-- 
2.11.1


* [Qemu-devel] [PATCH v2 2/3] block/fleecing-filter: new filter driver for fleecing
  2018-06-29 15:15 [Qemu-devel] [PATCH v2 0/3] image fleecing Vladimir Sementsov-Ogievskiy
  2018-06-29 15:15 ` [Qemu-devel] [PATCH v2 1/3] blockdev-backup: enable non-root nodes for backup source Vladimir Sementsov-Ogievskiy
@ 2018-06-29 15:15 ` Vladimir Sementsov-Ogievskiy
  2018-06-29 17:24   ` Eric Blake
  2018-06-29 17:30   ` John Snow
  2018-06-29 15:15 ` [Qemu-devel] [PATCH v2 3/3] qemu-iotests: Image fleecing test case 222 Vladimir Sementsov-Ogievskiy
  2018-06-29 16:38 ` [Qemu-devel] [PATCH v2 0/3] image fleecing John Snow
  3 siblings, 2 replies; 28+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-06-29 15:15 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: eblake, armbru, kwolf, mreitz, jsnow, famz, vsementsov, den

We need to synchronize backup job with reading from fleecing image
like it was done in block/replication.c.

Otherwise, the following situation is theoretically possible:

1. client start reading
2. client understand, that there is no corresponding cluster in
   fleecing image
3. client is going to read from backing file (i.e. active image)
4. guest writes to active image
5. this write is stopped by backup(sync=none) and cluster is copied to
   fleecing image
6. guest write continues...
7. and client reads _new_ (or partly new) data from active image

So, this fleecing-filter should be above fleecing image, the whole
picture of fleecing looks like this:

    +-------+           +------------+
    |       |           |            |
    | guest |           | NBD client +<------+
    |       |           |            |       |
    ++-----++           +------------+       |only read
     |     ^                                 |
     | IO  |                                 |
     v     |                           +-----+------+
    ++-----+---------+                 |            |
    |                |                 |  internal  |
    |  active image  +----+            | NBD server |
    |                |    |            |            |
    +-+--------------+    |backup      +-+----------+
      ^                   |sync=none     ^
      |backing            |              |only read
      |                   |              |
    +-+--------------+    |       +------+----------+
    |                |    |       |                 |
    | fleecing image +<---+       | fleecing filter |
    |                |            |                 |
    +--------+-------+            +-----+-----------+
             ^                          |
             |                          |
             +--------------------------+
                       file
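The serialization the filter relies on can be illustrated with a toy model
(plain Python, not QEMU code; the class and method names below are invented
for illustration): while the client read holds a CoW request on a region, a
guest write's copy-before-write for the same region must wait, so step 7
above cannot happen.

```python
import threading
import time

class CowRequests:
    """Toy model of backup's CoW request tracking (invented names)."""
    def __init__(self):
        self.cond = threading.Condition()
        self.busy = set()  # regions with an in-flight request

    def begin(self, region):
        # Like backup_cow_request_begin(): wait for overlapping
        # requests, then mark the region busy.
        with self.cond:
            while region in self.busy:
                self.cond.wait()
            self.busy.add(region)

    def end(self, region):
        # Like backup_cow_request_end(): release the region and wake
        # any waiters.
        with self.cond:
            self.busy.discard(region)
            self.cond.notify_all()

reqs = CowRequests()
order = []

reqs.begin(0)              # NBD client read of region 0 starts

def guest_write():
    reqs.begin(0)          # copy-before-write must wait for the read
    order.append("cow")
    reqs.end(0)

t = threading.Thread(target=guest_write)
t.start()
time.sleep(0.1)            # the read is still in flight...
order.append("read-done")
reqs.end(0)                # ...and only now finishes
t.join()
assert order == ["read-done", "cow"]  # the write could not sneak in
```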

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 qapi/block-core.json    |  6 ++--
 block/fleecing-filter.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++++
 block/Makefile.objs     |  1 +
 3 files changed, 85 insertions(+), 2 deletions(-)
 create mode 100644 block/fleecing-filter.c

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 577ce5e999..43872c3d79 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2542,7 +2542,8 @@
             'host_device', 'http', 'https', 'iscsi', 'luks', 'nbd', 'nfs',
             'null-aio', 'null-co', 'nvme', 'parallels', 'qcow', 'qcow2', 'qed',
             'quorum', 'raw', 'rbd', 'replication', 'sheepdog', 'ssh',
-            'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat', 'vxhs' ] }
+            'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat', 'vxhs',
+            'fleecing-filter' ] }
 
 ##
 # @BlockdevOptionsFile:
@@ -3594,7 +3595,8 @@
       'vmdk':       'BlockdevOptionsGenericCOWFormat',
       'vpc':        'BlockdevOptionsGenericFormat',
       'vvfat':      'BlockdevOptionsVVFAT',
-      'vxhs':       'BlockdevOptionsVxHS'
+      'vxhs':       'BlockdevOptionsVxHS',
+      'fleecing-filter': 'BlockdevOptionsGenericFormat'
   } }
 
 ##
diff --git a/block/fleecing-filter.c b/block/fleecing-filter.c
new file mode 100644
index 0000000000..b501887c10
--- /dev/null
+++ b/block/fleecing-filter.c
@@ -0,0 +1,80 @@
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "block/blockjob.h"
+#include "block/block_int.h"
+#include "block/block_backup.h"
+
+static int64_t fleecing_getlength(BlockDriverState *bs)
+{
+    return bdrv_getlength(bs->file->bs);
+}
+
+static coroutine_fn int fleecing_co_preadv(BlockDriverState *bs,
+                                           uint64_t offset, uint64_t bytes,
+                                           QEMUIOVector *qiov, int flags)
+{
+    int ret;
+    BlockJob *job = bs->file->bs->backing->bs->job;
+    CowRequest req;
+
+    backup_wait_for_overlapping_requests(job, offset, bytes);
+    backup_cow_request_begin(&req, job, offset, bytes);
+
+    ret = bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);
+
+    backup_cow_request_end(&req);
+
+    return ret;
+}
+
+static coroutine_fn int fleecing_co_pwritev(BlockDriverState *bs,
+                                            uint64_t offset, uint64_t bytes,
+                                            QEMUIOVector *qiov, int flags)
+{
+    return -EINVAL;
+}
+
+static bool fleecing_recurse_is_first_non_filter(BlockDriverState *bs,
+                                                 BlockDriverState *candidate)
+{
+    return bdrv_recurse_is_first_non_filter(bs->file->bs, candidate);
+}
+
+static int fleecing_open(BlockDriverState *bs, QDict *options,
+                         int flags, Error **errp)
+{
+    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file, false,
+                               errp);
+
+    return bs->file ? 0 : -EINVAL;
+}
+
+static void fleecing_close(BlockDriverState *bs)
+{
+    /* Do nothing. We must provide .bdrv_close because bdrv_close() does
+     * not check whether it is NULL before calling it. */
+}
+
+BlockDriver bdrv_fleecing_filter = {
+    .format_name = "fleecing-filter",
+    .protocol_name = "fleecing-filter",
+    .instance_size = 0,
+
+    .bdrv_open = fleecing_open,
+    .bdrv_close = fleecing_close,
+
+    .bdrv_getlength = fleecing_getlength,
+    .bdrv_co_preadv = fleecing_co_preadv,
+    .bdrv_co_pwritev = fleecing_co_pwritev,
+
+    .is_filter = true,
+    .bdrv_recurse_is_first_non_filter = fleecing_recurse_is_first_non_filter,
+    .bdrv_child_perm        = bdrv_filter_default_perms,
+};
+
+static void bdrv_fleecing_init(void)
+{
+    bdrv_register(&bdrv_fleecing_filter);
+}
+
+block_init(bdrv_fleecing_init);
diff --git a/block/Makefile.objs b/block/Makefile.objs
index 899bfb5e2c..aa0a6dd971 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -27,6 +27,7 @@ block-obj-y += write-threshold.o
 block-obj-y += backup.o
 block-obj-$(CONFIG_REPLICATION) += replication.o
 block-obj-y += throttle.o copy-on-read.o
+block-obj-y += fleecing-filter.o
 
 block-obj-y += crypto.o
 
-- 
2.11.1


* [Qemu-devel] [PATCH v2 3/3] qemu-iotests: Image fleecing test case 222
  2018-06-29 15:15 [Qemu-devel] [PATCH v2 0/3] image fleecing Vladimir Sementsov-Ogievskiy
  2018-06-29 15:15 ` [Qemu-devel] [PATCH v2 1/3] blockdev-backup: enable non-root nodes for backup source Vladimir Sementsov-Ogievskiy
  2018-06-29 15:15 ` [Qemu-devel] [PATCH v2 2/3] block/fleecing-filter: new filter driver for fleecing Vladimir Sementsov-Ogievskiy
@ 2018-06-29 15:15 ` Vladimir Sementsov-Ogievskiy
  2018-06-29 15:31   ` Vladimir Sementsov-Ogievskiy
  2018-06-29 17:58   ` Eric Blake
  2018-06-29 16:38 ` [Qemu-devel] [PATCH v2 0/3] image fleecing John Snow
  3 siblings, 2 replies; 28+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-06-29 15:15 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: eblake, armbru, kwolf, mreitz, jsnow, famz, vsementsov, den

From: Fam Zheng <famz@redhat.com>

This tests the workflow of creating a lightweight point-in-time snapshot
with blockdev-backup command, and exporting it with built-in NBD server.

It verifies that any post-snapshot writes to the original device
do not change the data seen in the NBD target.

Signed-off-by: Fam Zheng <famz@redhat.com>
[vsementsov: add -f iotests.imgfmt to qemu_io, fix target_img to be
always qcow2, add -r to qemu_io for nbd, add fleecing-filter layer,
wrap long lines]
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/222     | 112 +++++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/222.out |   5 ++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 118 insertions(+)
 create mode 100755 tests/qemu-iotests/222
 create mode 100644 tests/qemu-iotests/222.out

diff --git a/tests/qemu-iotests/222 b/tests/qemu-iotests/222
new file mode 100755
index 0000000000..3bcf9505fd
--- /dev/null
+++ b/tests/qemu-iotests/222
@@ -0,0 +1,112 @@
+#!/usr/bin/env python
+#
+# Tests for image fleecing (point in time snapshot export to NBD)
+#
+# Copyright (C) 2014 Red Hat, Inc.
+#
+# Based on 055.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import time
+import os
+import iotests
+from iotests import qemu_img, qemu_io
+
+test_img = os.path.join(iotests.test_dir, 'test.img')
+target_img = os.path.join(iotests.test_dir, 'target.img')
+nbd_sock = os.path.join(iotests.test_dir, 'nbd.sock')
+
+class TestImageFleecing(iotests.QMPTestCase):
+    image_len = 64 * 1024 * 1024 # 64 MB
+
+    def setUp(self):
+        # Write data to the image so we can compare later
+        qemu_img('create', '-f', iotests.imgfmt, test_img,
+                 str(TestImageFleecing.image_len))
+        self.patterns = [
+                ("0x5d", "0", "64k"),
+                ("0xd5", "1M", "64k"),
+                ("0xdc", "32M", "64k"),
+                ("0xdc", "67043328", "64k")]
+
+        for p in self.patterns:
+            qemu_io('-f', iotests.imgfmt, '-c', 'write -P%s %s %s' % p,
+                    test_img)
+
+        qemu_img('create', '-f', 'qcow2', target_img,
+                 str(TestImageFleecing.image_len))
+
+        self.vm = iotests.VM().add_drive(test_img)
+        self.vm.launch()
+
+        self.overwrite_patterns = [
+                ("0xa0", "0", "64k"),
+                ("0x0a", "1M", "64k"),
+                ("0x55", "32M", "64k"),
+                ("0x56", "67043328", "64k")]
+
+        self.nbd_uri = "nbd+unix:///drive1?socket=%s" % nbd_sock
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove(test_img)
+        os.remove(target_img)
+
+    def verify_patterns(self):
+        for p in self.patterns:
+            self.assertEqual(
+                    -1,
+                    qemu_io(self.nbd_uri, '-r', '-c', 'read -P%s %s %s' % p)
+                        .find("verification failed"),
+                    "Failed to verify pattern: %s %s %s" % p)
+
+    def test_image_fleecing(self):
+        result = self.vm.qmp("blockdev-add", **{
+            "driver": "fleecing-filter",
+            "node-name": "drive1",
+            "file": {
+                "driver": "qcow2",
+                "file": {
+                    "driver": "file",
+                    "filename": target_img,
+                    },
+                "backing": "drive0",
+            }
+            })
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp(
+                "nbd-server-start",
+                **{"addr": { "type": "unix", "data": { "path": nbd_sock } } })
+        self.assert_qmp(result, 'return', {})
+        result = self.vm.qmp("blockdev-backup", device="drive0",
+                             target="drive1", sync="none")
+        self.assert_qmp(result, 'return', {})
+        result = self.vm.qmp("nbd-server-add", device="drive1")
+        self.assert_qmp(result, 'return', {})
+
+        self.verify_patterns()
+
+        for p in self.overwrite_patterns:
+            self.vm.hmp_qemu_io("drive0", "write -P%s %s %s" % p)
+
+        self.verify_patterns()
+
+        self.cancel_and_wait(resume=True)
+        self.assert_no_active_block_jobs()
+
+if __name__ == '__main__':
+    iotests.main(supported_fmts=['raw', 'qcow2'])
diff --git a/tests/qemu-iotests/222.out b/tests/qemu-iotests/222.out
new file mode 100644
index 0000000000..ae1213e6f8
--- /dev/null
+++ b/tests/qemu-iotests/222.out
@@ -0,0 +1,5 @@
+.
+----------------------------------------------------------------------
+Ran 1 tests
+
+OK
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index eea75819d2..8019a9f721 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -220,3 +220,4 @@
 218 rw auto quick
 219 rw auto
 221 rw auto quick
+222 rw auto quick
-- 
2.11.1


* Re: [Qemu-devel] [PATCH v2 3/3] qemu-iotests: Image fleecing test case 222
  2018-06-29 15:15 ` [Qemu-devel] [PATCH v2 3/3] qemu-iotests: Image fleecing test case 222 Vladimir Sementsov-Ogievskiy
@ 2018-06-29 15:31   ` Vladimir Sementsov-Ogievskiy
  2018-06-29 17:58   ` Eric Blake
  1 sibling, 0 replies; 28+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-06-29 15:31 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: eblake, armbru, kwolf, mreitz, jsnow, famz, den

29.06.2018 18:15, Vladimir Sementsov-Ogievskiy wrote:
> From: Fam Zheng <famz@redhat.com>
>
> This tests the workflow of creating a lightweight point-in-time snapshot
> with blockdev-backup command, and exporting it with built-in NBD server.
>
> It verifies that any post-snapshot writes to the original device
> do not change the data seen in the NBD target.
>
> Signed-off-by: Fam Zheng <famz@redhat.com>
> [vsementsov: add -f iotests.imgfmt to qemu_io, fix target_img to be
> always qcow2, add -r to qemu_io for nbd, add fleecing-filter layer,
> wrap long lines]
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> [...]

Hmm, I don't understand why this works; it should not work without the 
following change squashed in (we can't write through the fleecing-filter):

diff --git a/tests/qemu-iotests/222 b/tests/qemu-iotests/222
index 3bcf9505fd..77637a8ce2 100755
--- a/tests/qemu-iotests/222
+++ b/tests/qemu-iotests/222
@@ -76,9 +76,10 @@ class TestImageFleecing(iotests.QMPTestCase):
      def test_image_fleecing(self):
          result = self.vm.qmp("blockdev-add", **{
              "driver": "fleecing-filter",
-            "node-name": "drive1",
+            "node-name": "drive-fleecing-filter",
              "file": {
                  "driver": "qcow2",
+                "node-name": "drive-fleecing-image",
                  "file": {
                      "driver": "file",
                      "filename": target_img,
@@ -93,9 +94,9 @@ class TestImageFleecing(iotests.QMPTestCase):
                  **{"addr": { "type": "unix", "data": { "path": 
nbd_sock } } })
          self.assert_qmp(result, 'return', {})
          result = self.vm.qmp("blockdev-backup", device="drive0",
-                             target="drive1", sync="none")
+                             target="drive-fleecing-image", sync="none")
          self.assert_qmp(result, 'return', {})
-        result = self.vm.qmp("nbd-server-add", device="drive1")
+        result = self.vm.qmp("nbd-server-add", 
device="drive-fleecing-filter")


-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [PATCH v2 0/3] image fleecing
  2018-06-29 15:15 [Qemu-devel] [PATCH v2 0/3] image fleecing Vladimir Sementsov-Ogievskiy
                   ` (2 preceding siblings ...)
  2018-06-29 15:15 ` [Qemu-devel] [PATCH v2 3/3] qemu-iotests: Image fleecing test case 222 Vladimir Sementsov-Ogievskiy
@ 2018-06-29 16:38 ` John Snow
  2018-06-29 17:36   ` Vladimir Sementsov-Ogievskiy
  3 siblings, 1 reply; 28+ messages in thread
From: John Snow @ 2018-06-29 16:38 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: eblake, armbru, kwolf, mreitz, famz, den



On 06/29/2018 11:15 AM, Vladimir Sementsov-Ogievskiy wrote:
> Image fleecing, also known as the external (or pull) backup scheme, is nearly
> complete. Here is the forgotten test, written by Fam back in 2014, together with
> two necessary patches of mine that were previously sent separately, which is why
> the series is called "v2".
> 
> v2:
>  01: add transaction support
>  02: rebase on master, drop "'fleecing-filter': 'BlockdevCreateNotSupported'"
>      add empty .bdrv_close
>      add default .bdrv_child_perm
>  03: compared with [PATCH v20 15/15] qemu-iotests: Image fleecing test case 08:
>      add -f iotests.imgfmt to qemu_io
>      fix target_img to be always qcow2
>      add -r to qemu_io for nbd
>      add fleecing-filter layer
>      wrap long lines
> 
> Fam Zheng (1):
>   qemu-iotests: Image fleecing test case 222
> 
> Vladimir Sementsov-Ogievskiy (2):
>   blockdev-backup: enable non-root nodes for backup source
>   block/fleecing-filter: new filter driver for fleecing
> 
>  qapi/block-core.json       |   6 ++-
>  block/fleecing-filter.c    |  80 ++++++++++++++++++++++++++++++++
>  blockdev.c                 |   4 +-
>  block/Makefile.objs        |   1 +
>  tests/qemu-iotests/222     | 112 +++++++++++++++++++++++++++++++++++++++++++++
>  tests/qemu-iotests/222.out |   5 ++
>  tests/qemu-iotests/group   |   1 +
>  7 files changed, 205 insertions(+), 4 deletions(-)
>  create mode 100644 block/fleecing-filter.c
>  create mode 100755 tests/qemu-iotests/222
>  create mode 100644 tests/qemu-iotests/222.out
> 

Have you been able to reproduce a race that proves the filter is
necessary, or can you explain how it might happen?

I think I mentioned it once as a possibility but I hadn't convinced myself.


* Re: [Qemu-devel] [PATCH v2 1/3] blockdev-backup: enable non-root nodes for backup source
  2018-06-29 15:15 ` [Qemu-devel] [PATCH v2 1/3] blockdev-backup: enable non-root nodes for backup source Vladimir Sementsov-Ogievskiy
@ 2018-06-29 17:13   ` Eric Blake
  2018-06-29 17:31   ` John Snow
  1 sibling, 0 replies; 28+ messages in thread
From: Eric Blake @ 2018-06-29 17:13 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: armbru, kwolf, mreitz, jsnow, famz, den

On 06/29/2018 10:15 AM, Vladimir Sementsov-Ogievskiy wrote:
> This is needed to implement the image-fleecing scheme, where we create
> a temporary node, mark our active node as the backing file of the
> temporary node, and start backup(sync=none) from the active node to the
> temporary node. The temporary node then represents a kind of snapshot
> and may be used for external backup through NBD.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   blockdev.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 58d7570932..72f5347df5 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -1859,7 +1859,7 @@ static void blockdev_backup_prepare(BlkActionState *common, Error **errp)
>       assert(common->action->type == TRANSACTION_ACTION_KIND_BLOCKDEV_BACKUP);
>       backup = common->action->u.blockdev_backup.data;
>   
> -    bs = qmp_get_root_bs(backup->device, errp);
> +    bs = bdrv_lookup_bs(backup->device, backup->device, errp);
>       if (!bs) {
>           return;
>       }

This hunk is new,

> @@ -3517,7 +3517,7 @@ BlockJob *do_blockdev_backup(BlockdevBackup *backup, JobTxn *txn,
>           backup->compress = false;
>       }
>   
> -    bs = qmp_get_root_bs(backup->device, errp);
> +    bs = bdrv_lookup_bs(backup->device, backup->device, errp);
>       if (!bs) {
>           return NULL;
>       }
> 

but this hunk is identical to John's patch:
https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg08593.html

I guess yours adds additional code needed to work under a transaction, 
as opposed to separate QMP commands.

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


* Re: [Qemu-devel] [PATCH v2 2/3] block/fleecing-filter: new filter driver for fleecing
  2018-06-29 15:15 ` [Qemu-devel] [PATCH v2 2/3] block/fleecing-filter: new filter driver for fleecing Vladimir Sementsov-Ogievskiy
@ 2018-06-29 17:24   ` Eric Blake
  2018-07-02  6:35     ` Fam Zheng
  2018-07-02 11:47     ` Vladimir Sementsov-Ogievskiy
  2018-06-29 17:30   ` John Snow
  1 sibling, 2 replies; 28+ messages in thread
From: Eric Blake @ 2018-06-29 17:24 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: armbru, kwolf, mreitz, jsnow, famz, den

On 06/29/2018 10:15 AM, Vladimir Sementsov-Ogievskiy wrote:
> We need to synchronize backup job with reading from fleecing image
> like it was done in block/replication.c.
> 
> Otherwise, the following situation is theoretically possible:
> 

Grammar suggestions:

> 1. client start reading

client starts reading

> 2. client understand, that there is no corresponding cluster in
>     fleecing image
> 3. client is going to read from backing file (i.e. active image)

client sees that no corresponding cluster has been allocated in the 
fleecing image, so the request is forwarded to the backing file

> 4. guest writes to active image
> 5. this write is stopped by backup(sync=none) and cluster is copied to
>     fleecing image
> 6. guest write continues...
> 7. and client reads _new_ (or partly new) data from active image

Interesting race. Can it actually happen, or does our read code already 
serialize writes to the same area while a read is underway?

In short, I see what problem you are claiming exists: the moment the 
client starts reading from the backing file, that portion of the backing 
file must remain unchanged until after the client is done reading.  But 
I don't know enough details of the block layer to know if this is 
actually a problem, or if adding the new filter is just overhead.

> 
> So, this fleecing-filter should be above fleecing image, the whole
> picture of fleecing looks like this:
> 
>      +-------+           +------------+
>      |       |           |            |
>      | guest |           | NBD client +<------+
>      |       |           |            |       |
>      ++-----++           +------------+       |only read
>       |     ^                                 |
>       | IO  |                                 |
>       v     |                           +-----+------+
>      ++-----+---------+                 |            |
>      |                |                 |  internal  |
>      |  active image  +----+            | NBD server |
>      |                |    |            |            |
>      +-+--------------+    |backup      +-+----------+
>        ^                   |sync=none     ^
>        |backing            |              |only read
>        |                   |              |
>      +-+--------------+    |       +------+----------+
>      |                |    |       |                 |
>      | fleecing image +<---+       | fleecing filter |
>      |                |            |                 |
>      +--------+-------+            +-----+-----------+
>               ^                          |
>               |                          |
>               +--------------------------+
>                         file

Can you also show the sequence of QMP commands to set up this structure?
(Or maybe you do in 3/3, which I haven't looked at yet.)
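For what it's worth, the test in patch 3/3 drives roughly this QMP sequence
(node names and paths taken from that test):

```json
{ "execute": "blockdev-add",
  "arguments": { "driver": "fleecing-filter", "node-name": "drive1",
                 "file": { "driver": "qcow2",
                           "file": { "driver": "file",
                                     "filename": "target.img" },
                           "backing": "drive0" } } }
{ "execute": "nbd-server-start",
  "arguments": { "addr": { "type": "unix",
                           "data": { "path": "nbd.sock" } } } }
{ "execute": "blockdev-backup",
  "arguments": { "device": "drive0", "target": "drive1", "sync": "none" } }
{ "execute": "nbd-server-add",
  "arguments": { "device": "drive1" } }
```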

> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   qapi/block-core.json    |  6 ++--
>   block/fleecing-filter.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++++
>   block/Makefile.objs     |  1 +
>   3 files changed, 85 insertions(+), 2 deletions(-)
>   create mode 100644 block/fleecing-filter.c
> 
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 577ce5e999..43872c3d79 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2542,7 +2542,8 @@
>               'host_device', 'http', 'https', 'iscsi', 'luks', 'nbd', 'nfs',
>               'null-aio', 'null-co', 'nvme', 'parallels', 'qcow', 'qcow2', 'qed',
>               'quorum', 'raw', 'rbd', 'replication', 'sheepdog', 'ssh',
> -            'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat', 'vxhs' ] }
> +            'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat', 'vxhs',
> +            'fleecing-filter' ] }

Missing a 'since 3.0' documentation blurb; also, this enum has been kept 
sorted, so your new filter needs to come earlier.

>   
>   ##
>   # @BlockdevOptionsFile:
> @@ -3594,7 +3595,8 @@
>         'vmdk':       'BlockdevOptionsGenericCOWFormat',
>         'vpc':        'BlockdevOptionsGenericFormat',
>         'vvfat':      'BlockdevOptionsVVFAT',
> -      'vxhs':       'BlockdevOptionsVxHS'
> +      'vxhs':       'BlockdevOptionsVxHS',
> +      'fleecing-filter': 'BlockdevOptionsGenericFormat'

Again, this has been kept sorted.

> +static coroutine_fn int fleecing_co_preadv(BlockDriverState *bs,
> +                                           uint64_t offset, uint64_t bytes,
> +                                           QEMUIOVector *qiov, int flags)
> +{
> +    int ret;
> +    BlockJob *job = bs->file->bs->backing->bs->job;
> +    CowRequest req;
> +
> +    backup_wait_for_overlapping_requests(job, offset, bytes);
> +    backup_cow_request_begin(&req, job, offset, bytes);
> +
> +    ret = bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);
> +
> +    backup_cow_request_end(&req);
> +
> +    return ret;
> +}

So the idea here is that you force a serializing request to ensure that 
there are no other writes to the area in the meantime.

> +
> +static coroutine_fn int fleecing_co_pwritev(BlockDriverState *bs,
> +                                            uint64_t offset, uint64_t bytes,
> +                                            QEMUIOVector *qiov, int flags)
> +{
> +    return -EINVAL;

and you force this to be a read-only interface. (Does the block layer 
actually require us to provide a pwritev callback, or can we leave it 
NULL instead?)

> +BlockDriver bdrv_fleecing_filter = {
> +    .format_name = "fleecing-filter",
> +    .protocol_name = "fleecing-filter",
> +    .instance_size = 0,
> +
> +    .bdrv_open = fleecing_open,
> +    .bdrv_close = fleecing_close,
> +
> +    .bdrv_getlength = fleecing_getlength,
> +    .bdrv_co_preadv = fleecing_co_preadv,
> +    .bdrv_co_pwritev = fleecing_co_pwritev,
> +
> +    .is_filter = true,
> +    .bdrv_recurse_is_first_non_filter = fleecing_recurse_is_first_non_filter,
> +    .bdrv_child_perm        = bdrv_filter_default_perms,

No .bdrv_co_block_status callback?  That probably hurts querying for 
sparse regions.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/3] block/fleecing-filter: new filter driver for fleecing
  2018-06-29 15:15 ` [Qemu-devel] [PATCH v2 2/3] block/fleecing-filter: new filter driver for fleecing Vladimir Sementsov-Ogievskiy
  2018-06-29 17:24   ` Eric Blake
@ 2018-06-29 17:30   ` John Snow
  2018-06-29 17:40     ` Eric Blake
  2018-07-02 11:57     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 2 replies; 28+ messages in thread
From: John Snow @ 2018-06-29 17:30 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: eblake, armbru, kwolf, mreitz, famz, den



On 06/29/2018 11:15 AM, Vladimir Sementsov-Ogievskiy wrote:
> We need to synchronize backup job with reading from fleecing image
> like it was done in block/replication.c.
> 
> Otherwise, the following situation is theoretically possible:
> 
> 1. client start reading
> 2. client understand, that there is no corresponding cluster in
>    fleecing image

I don't think the client refocuses the read, but QEMU does. (the client
has no idea if it is reading from the fleecing node or the backing file.)

... but understood:

> 3. client is going to read from backing file (i.e. active image)
> 4. guest writes to active image

My question here is if QEMU will allow reads and writes to interleave in
general. If the client is reading from the backing file, the active
image, will QEMU allow a write to modify that data before we're done
getting that data?

> 5. this write is stopped by backup(sync=none) and cluster is copied to
>    fleecing image
> 6. guest write continues...
> 7. and client reads _new_ (or partly new) date from active image
> 

I can't answer this for myself one way or the other right now, but I
imagine you observed a failure in practice in your test setups, which
motivated this patch?

A test or some proof would help justify this patch. It would
also help prove that it solves what it claims to!

> So, this fleecing-filter should be above fleecing image, the whole
> picture of fleecing looks like this:
> 
>     +-------+           +------------+
>     |       |           |            |
>     | guest |           | NBD client +<------+
>     |       |           |            |       |
>     ++-----++           +------------+       |only read
>      |     ^                                 |
>      | IO  |                                 |
>      v     |                           +-----+------+
>     ++-----+---------+                 |            |
>     |                |                 |  internal  |
>     |  active image  +----+            | NBD server |
>     |                |    |            |            |
>     +-+--------------+    |backup      +-+----------+
>       ^                   |sync=none     ^
>       |backing            |              |only read
>       |                   |              |
>     +-+--------------+    |       +------+----------+
>     |                |    |       |                 |
>     | fleecing image +<---+       | fleecing filter |
>     |                |            |                 |
>     +--------+-------+            +-----+-----------+
>              ^                          |
>              |                          |
>              +--------------------------+
>                        file
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  qapi/block-core.json    |  6 ++--
>  block/fleecing-filter.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++++
>  block/Makefile.objs     |  1 +
>  3 files changed, 85 insertions(+), 2 deletions(-)
>  create mode 100644 block/fleecing-filter.c
> 
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 577ce5e999..43872c3d79 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2542,7 +2542,8 @@
>              'host_device', 'http', 'https', 'iscsi', 'luks', 'nbd', 'nfs',
>              'null-aio', 'null-co', 'nvme', 'parallels', 'qcow', 'qcow2', 'qed',
>              'quorum', 'raw', 'rbd', 'replication', 'sheepdog', 'ssh',
> -            'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat', 'vxhs' ] }
> +            'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat', 'vxhs',
> +            'fleecing-filter' ] }
>  
>  ##
>  # @BlockdevOptionsFile:
> @@ -3594,7 +3595,8 @@
>        'vmdk':       'BlockdevOptionsGenericCOWFormat',
>        'vpc':        'BlockdevOptionsGenericFormat',
>        'vvfat':      'BlockdevOptionsVVFAT',
> -      'vxhs':       'BlockdevOptionsVxHS'
> +      'vxhs':       'BlockdevOptionsVxHS',
> +      'fleecing-filter': 'BlockdevOptionsGenericFormat'
>    } }
>  
>  ##
> diff --git a/block/fleecing-filter.c b/block/fleecing-filter.c
> new file mode 100644
> index 0000000000..b501887c10
> --- /dev/null
> +++ b/block/fleecing-filter.c
> @@ -0,0 +1,80 @@
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +#include "block/blockjob.h"
> +#include "block/block_int.h"
> +#include "block/block_backup.h"
> +
> +static int64_t fleecing_getlength(BlockDriverState *bs)
> +{
> +    return bdrv_getlength(bs->file->bs);
> +}
> +
> +static coroutine_fn int fleecing_co_preadv(BlockDriverState *bs,
> +                                           uint64_t offset, uint64_t bytes,
> +                                           QEMUIOVector *qiov, int flags)
> +{
> +    int ret;
> +    BlockJob *job = bs->file->bs->backing->bs->job;
> +    CowRequest req;
> +
> +    backup_wait_for_overlapping_requests(job, offset, bytes);
> +    backup_cow_request_begin(&req, job, offset, bytes);
> +
> +    ret = bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);
> +
> +    backup_cow_request_end(&req);
> +
> +    return ret;
> +}
> +
> +static coroutine_fn int fleecing_co_pwritev(BlockDriverState *bs,
> +                                            uint64_t offset, uint64_t bytes,
> +                                            QEMUIOVector *qiov, int flags)
> +{
> +    return -EINVAL;
> +}
> +
> +static bool fleecing_recurse_is_first_non_filter(BlockDriverState *bs,
> +                                                 BlockDriverState *candidate)
> +{
> +    return bdrv_recurse_is_first_non_filter(bs->file->bs, candidate);
> +}
> +
> +static int fleecing_open(BlockDriverState *bs, QDict *options,
> +                         int flags, Error **errp)
> +{
> +    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file, false,
> +                               errp);
> +
> +    return bs->file ? 0 : -EINVAL;
> +}
> +
> +static void fleecing_close(BlockDriverState *bs)
> +{
> +    /* Do nothing, we have to add .bdrv_close, because bdrv_close() don't check
> +     * it, just call. */
> +}
> +
> +BlockDriver bdrv_fleecing_filter = {
> +    .format_name = "fleecing-filter",
> +    .protocol_name = "fleecing-filter",
> +    .instance_size = 0,
> +
> +    .bdrv_open = fleecing_open,
> +    .bdrv_close = fleecing_close,
> +
> +    .bdrv_getlength = fleecing_getlength,
> +    .bdrv_co_preadv = fleecing_co_preadv,
> +    .bdrv_co_pwritev = fleecing_co_pwritev,
> +
> +    .is_filter = true,
> +    .bdrv_recurse_is_first_non_filter = fleecing_recurse_is_first_non_filter,
> +    .bdrv_child_perm        = bdrv_filter_default_perms,
> +};
> +
> +static void bdrv_fleecing_init(void)
> +{
> +    bdrv_register(&bdrv_fleecing_filter);
> +}
> +
> +block_init(bdrv_fleecing_init);
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index 899bfb5e2c..aa0a6dd971 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -27,6 +27,7 @@ block-obj-y += write-threshold.o
>  block-obj-y += backup.o
>  block-obj-$(CONFIG_REPLICATION) += replication.o
>  block-obj-y += throttle.o copy-on-read.o
> +block-obj-y += fleecing-filter.o
>  
>  block-obj-y += crypto.o
>  
> 


* Re: [Qemu-devel] [PATCH v2 1/3] blockdev-backup: enable non-root nodes for backup source
  2018-06-29 15:15 ` [Qemu-devel] [PATCH v2 1/3] blockdev-backup: enable non-root nodes for backup source Vladimir Sementsov-Ogievskiy
  2018-06-29 17:13   ` Eric Blake
@ 2018-06-29 17:31   ` John Snow
  1 sibling, 0 replies; 28+ messages in thread
From: John Snow @ 2018-06-29 17:31 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: kwolf, famz, armbru, mreitz, den



On 06/29/2018 11:15 AM, Vladimir Sementsov-Ogievskiy wrote:
> This is needed to implement image-fleecing scheme, when we create
> a temporary node, mark our active node to be backing for the temp,
> and start backup(sync=none) from active node to the temp node.
> Temp node then represents a kind of snapshot and may be used
> for external backup through NBD.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

A hair more comprehensive than my recent patch which did the same thing, so:

Reviewed-by: John Snow <jsnow@redhat.com>


* Re: [Qemu-devel] [PATCH v2 0/3] image fleecing
  2018-06-29 16:38 ` [Qemu-devel] [PATCH v2 0/3] image fleecing John Snow
@ 2018-06-29 17:36   ` Vladimir Sementsov-Ogievskiy
  2018-06-29 17:52     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 28+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-06-29 17:36 UTC (permalink / raw)
  To: John Snow, qemu-devel, qemu-block
  Cc: eblake, armbru, kwolf, mreitz, famz, den

29.06.2018 19:38, John Snow wrote:
>
> On 06/29/2018 11:15 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Image fleecing, or external (or pull) backup scheme is near to complete.
>> Here is forgotten test, written by Fam in far 2014, with two my necessary
>> patches, previously sent separately, so the series are called "v2"
>>
>> v2:
>>   01: add transaction support
>>   02: rebase on master, drop "'fleecing-filter': 'BlockdevCreateNotSupported'"
>>       add empty .bdrv_close
>>       add default .bdrv_child_perm
>>   03: comparing to [PATCH v20 15/15] qemu-iotests: Image fleecing test case 08:
>>       add -f iotests.imgfmt to qemu_io
>>       fix target_img to be always qcow2
>>       add -r to qemu_io for nbd
>>       add fleecing-filter layer
>>       wrap long lines
>>
>> Fam Zheng (1):
>>    qemu-iotests: Image fleecing test case 222
>>
>> Vladimir Sementsov-Ogievskiy (2):
>>    blockdev-backup: enable non-root nodes for backup source
>>    block/fleecing-filter: new filter driver for fleecing
>>
>>   qapi/block-core.json       |   6 ++-
>>   block/fleecing-filter.c    |  80 ++++++++++++++++++++++++++++++++
>>   blockdev.c                 |   4 +-
>>   block/Makefile.objs        |   1 +
>>   tests/qemu-iotests/222     | 112 +++++++++++++++++++++++++++++++++++++++++++++
>>   tests/qemu-iotests/222.out |   5 ++
>>   tests/qemu-iotests/group   |   1 +
>>   7 files changed, 205 insertions(+), 4 deletions(-)
>>   create mode 100644 block/fleecing-filter.c
>>   create mode 100755 tests/qemu-iotests/222
>>   create mode 100644 tests/qemu-iotests/222.out
>>
> Have you been able to reproduce a race that proves the filter is
> necessary, or can you explain how it might happen?

No. But the same logic is in block/replication, and I remember
discussing it on the list; the case I've described in 02 is not my idea.

>
> I think I mentioned it once as a possibility but I hadn't convinced myself.


-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [PATCH v2 2/3] block/fleecing-filter: new filter driver for fleecing
  2018-06-29 17:30   ` John Snow
@ 2018-06-29 17:40     ` Eric Blake
  2018-07-02 12:09       ` Vladimir Sementsov-Ogievskiy
  2018-07-02 11:57     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 28+ messages in thread
From: Eric Blake @ 2018-06-29 17:40 UTC (permalink / raw)
  To: John Snow, Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: armbru, kwolf, mreitz, famz, den

On 06/29/2018 12:30 PM, John Snow wrote:
> 
> 
> On 06/29/2018 11:15 AM, Vladimir Sementsov-Ogievskiy wrote:
>> We need to synchronize backup job with reading from fleecing image
>> like it was done in block/replication.c.
>>
>> Otherwise, the following situation is theoretically possible:
>>
>> 1. client start reading
>> 2. client understand, that there is no corresponding cluster in
>>     fleecing image
> 
> I don't think the client refocuses the read, but QEMU does. (the client
> has no idea if it is reading from the fleecing node or the backing file.)
> 
> ... but understood:
> 
>> 3. client is going to read from backing file (i.e. active image)
>> 4. guest writes to active image
> 
> My question here is if QEMU will allow reads and writes to interleave in
> general. If the client is reading from the backing file, the active
> image, will QEMU allow a write to modify that data before we're done
> getting that data?
> 
>> 5. this write is stopped by backup(sync=none) and cluster is copied to
>>     fleecing image
>> 6. guest write continues...
>> 7. and client reads _new_ (or partly new) date from active image
>>
> 
> I can't answer this for myself one way or the other right now, but I
> imagine you observed a failure in practice in your test setups, which
> motivated this patch?
> 
> A test or some proof would help justification for this patch. It would
> also help prove that it solves what it claims to!

In fact, do we really need a new filter, or do we just need to make the 
"sync":"none" blockdev-backup job take the appropriate synchronization 
locks?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


* Re: [Qemu-devel] [PATCH v2 0/3] image fleecing
  2018-06-29 17:36   ` Vladimir Sementsov-Ogievskiy
@ 2018-06-29 17:52     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 28+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-06-29 17:52 UTC (permalink / raw)
  To: John Snow, qemu-devel, qemu-block
  Cc: eblake, armbru, kwolf, mreitz, famz, den

29.06.2018 20:36, Vladimir Sementsov-Ogievskiy wrote:
> 29.06.2018 19:38, John Snow wrote:
>>
>> On 06/29/2018 11:15 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Image fleecing, or external (or pull) backup scheme is near to 
>>> complete.
>>> Here is forgotten test, written by Fam in far 2014, with two my 
>>> necessary
>>> patches, previously sent separately, so the series are called "v2"
>>>
>>> v2:
>>>   01: add transaction support
>>>   02: rebase on master, drop "'fleecing-filter': 
>>> 'BlockdevCreateNotSupported'"
>>>       add empty .bdrv_close
>>>       add default .bdrv_child_perm
>>>   03: comparing to [PATCH v20 15/15] qemu-iotests: Image fleecing 
>>> test case 08:
>>>       add -f iotests.imgfmt to qemu_io
>>>       fix target_img to be always qcow2
>>>       add -r to qemu_io for nbd
>>>       add fleecing-filter layer
>>>       wrap long lines
>>>
>>> Fam Zheng (1):
>>>    qemu-iotests: Image fleecing test case 222
>>>
>>> Vladimir Sementsov-Ogievskiy (2):
>>>    blockdev-backup: enable non-root nodes for backup source
>>>    block/fleecing-filter: new filter driver for fleecing
>>>
>>>   qapi/block-core.json       |   6 ++-
>>>   block/fleecing-filter.c    |  80 ++++++++++++++++++++++++++++++++
>>>   blockdev.c                 |   4 +-
>>>   block/Makefile.objs        |   1 +
>>>   tests/qemu-iotests/222     | 112 
>>> +++++++++++++++++++++++++++++++++++++++++++++
>>>   tests/qemu-iotests/222.out |   5 ++
>>>   tests/qemu-iotests/group   |   1 +
>>>   7 files changed, 205 insertions(+), 4 deletions(-)
>>>   create mode 100644 block/fleecing-filter.c
>>>   create mode 100755 tests/qemu-iotests/222
>>>   create mode 100644 tests/qemu-iotests/222.out
>>>
>> Have you been able to reproduce a race that proves the filter is
>> necessary, or can you explain how it might happen?
>
> No. But same logic is in block/replication, and I remember, that I 
> discussed it on list, and the case I've described in 02 is not my idea.

found: http://lists.nongnu.org/archive/html/qemu-devel/2017-08/msg01807.html

>
>>
>> I think I mentioned it once as a possibility but I hadn't convinced 
>> myself.
>
>


-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [PATCH v2 3/3] qemu-iotests: Image fleecing test case 222
  2018-06-29 15:15 ` [Qemu-devel] [PATCH v2 3/3] qemu-iotests: Image fleecing test case 222 Vladimir Sementsov-Ogievskiy
  2018-06-29 15:31   ` Vladimir Sementsov-Ogievskiy
@ 2018-06-29 17:58   ` Eric Blake
  2018-06-29 21:04     ` John Snow
  1 sibling, 1 reply; 28+ messages in thread
From: Eric Blake @ 2018-06-29 17:58 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: armbru, kwolf, mreitz, jsnow, famz, den

On 06/29/2018 10:15 AM, Vladimir Sementsov-Ogievskiy wrote:
> From: Fam Zheng <famz@redhat.com>
> 
> This tests the workflow of creating a lightweight point-in-time snapshot
> with blockdev-backup command, and exporting it with built-in NBD server.
> 
> It's tested that any post-snapshot writing to the original device
> doesn't change data seen in NBD target.
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>
> [vsementsov: add -f iotests.imgfmt to qemu_io, fix target_img to be
> always qcow2, add -r to qemu_io for nbd, add fleecing-filter layer,
> wrap long lines]
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

For my own records: Fam's most recent post was:
https://lists.gnu.org/archive/html/qemu-devel/2014-05/msg03895.html
[Qemu-devel] [PATCH v20 15/15] qemu-iotests: Image fleecing test case 089

Wow - has it really been FOUR YEARS since we first proposed this?

I'm trying to figure out why that original post was abandoned; it looks 
like:
https://lists.gnu.org/archive/html/qemu-devel/2014-05/msg04333.html

split that v20 series into stuff that was ready (which made it in back 
then), and stuff that was still undergoing comments (which then stalled 
and got lost in the shuffle until now).

My next question: how does this compare to John's recent patch?  And how 
much of Fam's original work did John borrow (that self.patterns array 
looks pretty common, now that I'm looking at multiple mails - although 
John's version lacks any commit message body):
https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg08594.html

And given the iterative improvements that have gone into John's version 
(such as checking that the fleece image reads overwritten zeroes just as 
correctly as overwritten data), we definitely need to merge the two 
approaches into one patch.

> +++ b/tests/qemu-iotests/222
> @@ -0,0 +1,112 @@
> +#!/usr/bin/env python
> +#
> +# Tests for image fleecing (point in time snapshot export to NBD)
> +#
> +# Copyright (C) 2014 Red Hat, Inc.
> +#

Interesting copyright line; it matches Fam's original post (and the fact 
that you kept him as author); whereas John's version had the humorous:

+# Copyright (C) 2018 Red Hat, Inc.
+# John helped, too.

> +# Based on 055.

Is this information still useful? John dropped it.

> +
> +test_img = os.path.join(iotests.test_dir, 'test.img')
> +target_img = os.path.join(iotests.test_dir, 'target.img')
> +nbd_sock = os.path.join(iotests.test_dir, 'nbd.sock')
> +
> +class TestImageFleecing(iotests.QMPTestCase):
> +    image_len = 64 * 1024 * 1024 # MB
> +
> +    def setUp(self):
> +        # Write data to the image so we can compare later
> +        qemu_img('create', '-f', iotests.imgfmt, test_img,
> +                 str(TestImageFleecing.image_len))

Difference in style between nested methods of a class, vs. directly 
using 'with iotests.FilePath(..., iotests.VM() as vm:'

> +        self.patterns = [
> +                ("0x5d", "0", "64k"),
> +                ("0xd5", "1M", "64k"),
> +                ("0xdc", "32M", "64k"),
> +                ("0xdc", "67043328", "64k")]

John spelled it:
  ("0xcd", "0x3ff0000", "64k")  # 64M - 64K

The 0xdc vs. 0xcd doesn't matter all that much, but having distinct 
patterns in distinct ranges is kind of nice if we have to later inspect 
a file.

> +
> +        for p in self.patterns:
> +            qemu_io('-f', iotests.imgfmt, '-c', 'write -P%s %s %s' % p,
> +                    test_img)
> +
> +        qemu_img('create', '-f', 'qcow2', target_img,
> +                 str(TestImageFleecing.image_len))
> +
> +        self.vm = iotests.VM().add_drive(test_img)
> +        self.vm.launch()
> +
> +        self.overwrite_patterns = [
> +                ("0xa0", "0", "64k"),
> +                ("0x0a", "1M", "64k"),
> +                ("0x55", "32M", "64k"),
> +                ("0x56", "67043328", "64k")]
> +
> +        self.nbd_uri = "nbd+unix:///drive1?socket=%s" % nbd_sock
> +
> +    def tearDown(self):
> +        self.vm.shutdown()
> +        os.remove(test_img)
> +        os.remove(target_img)

I'm not seeing anything that stops the NBD server; John's version added 
that at my insistence.

> +
> +    def verify_patterns(self):
> +        for p in self.patterns:
> +            self.assertEqual(
> +                    -1,
> +                    qemu_io(self.nbd_uri, '-r', '-c', 'read -P%s %s %s' % p)
> +                        .find("verification failed"),
> +                    "Failed to verify pattern: %s %s %s" % p)
> +
> +    def test_image_fleecing(self):
> +        result = self.vm.qmp("blockdev-add", **{
> +            "driver": "fleecing-filter",
> +            "node-name": "drive1",
> +            "file": {
> +                "driver": "qcow2",
> +                "file": {
> +                    "driver": "file",
> +                    "filename": target_img,
> +                    },
> +                "backing": "drive0",
> +            }
> +            })
> +        self.assert_qmp(result, 'return', {})
> +
> +        result = self.vm.qmp(
> +                "nbd-server-start",
> +                **{"addr": { "type": "unix", "data": { "path": nbd_sock } } })
> +        self.assert_qmp(result, 'return', {})
> +        result = self.vm.qmp("blockdev-backup", device="drive0",
> +                             target="drive1", sync="none")
> +        self.assert_qmp(result, 'return', {})
> +        result = self.vm.qmp("nbd-server-add", device="drive1")
> +        self.assert_qmp(result, 'return', {})
> +
> +        self.verify_patterns()
> +
> +        for p in self.overwrite_patterns:
> +            self.vm.hmp_qemu_io("drive0", "write -P%s %s %s" % p)
> +
> +        self.verify_patterns()
> +
> +        self.cancel_and_wait(resume=True)
> +        self.assert_no_active_block_jobs()
> +
> +if __name__ == '__main__':
> +    iotests.main(supported_fmts=['raw', 'qcow2'])
> diff --git a/tests/qemu-iotests/222.out b/tests/qemu-iotests/222.out
> new file mode 100644
> index 0000000000..ae1213e6f8
> --- /dev/null
> +++ b/tests/qemu-iotests/222.out
> @@ -0,0 +1,5 @@
> +.
> +----------------------------------------------------------------------
> +Ran 1 tests
> +

I also much prefer the output style that resulted in John's version.

> +OK
> diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
> index eea75819d2..8019a9f721 100644
> --- a/tests/qemu-iotests/group
> +++ b/tests/qemu-iotests/group
> @@ -220,3 +220,4 @@
>   218 rw auto quick
>   219 rw auto
>   221 rw auto quick
> +222 rw auto quick
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


* Re: [Qemu-devel] [PATCH v2 3/3] qemu-iotests: Image fleecing test case 222
  2018-06-29 17:58   ` Eric Blake
@ 2018-06-29 21:04     ` John Snow
  2018-07-02  6:45       ` Fam Zheng
  2018-07-02 12:58       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 2 replies; 28+ messages in thread
From: John Snow @ 2018-06-29 21:04 UTC (permalink / raw)
  To: Eric Blake, Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: kwolf, famz, armbru, mreitz, den



On 06/29/2018 01:58 PM, Eric Blake wrote:
> On 06/29/2018 10:15 AM, Vladimir Sementsov-Ogievskiy wrote:
>> From: Fam Zheng <famz@redhat.com>
>>
>> This tests the workflow of creating a lightweight point-in-time snapshot
>> with blockdev-backup command, and exporting it with built-in NBD server.
>>
>> It's tested that any post-snapshot writing to the original device
>> doesn't change data seen in NBD target.
>>
>> Signed-off-by: Fam Zheng <famz@redhat.com>
>> [vsementsov: add -f iotests.imgfmt to qemu_io, fix target_img to be
>> always qcow2, add -r to qemu_io for nbd, add fleecing-filter layer,
>> wrap long lines]
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> 
> For my own records: Fam's most recent post was:
> https://lists.gnu.org/archive/html/qemu-devel/2014-05/msg03895.html
> [Qemu-devel] [PATCH v20 15/15] qemu-iotests: Image fleecing test case 089
> 
> Wow - has it really been FOUR YEARS since we first proposed this?
> 

Yup...

> I'm trying to figure out why that original post was abandoned; it looks
> like:
> https://lists.gnu.org/archive/html/qemu-devel/2014-05/msg04333.html
> 
> split that v20 series into stuff that was ready (which made it in back
> then), and stuff that was still undergoing comments (which then stalled
> and got lost in the shuffle until now).
> 
> My next question: how does this compare to John's recent patch?  And how
> much of Fam's original work did John borrow (that self.patterns array
> looks pretty common, now that I'm looking at multiple mails - although
> John's version lacks any commit message body):
> https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg08594.html
> 

Mine was indeed inspired by Fam's version, but I opted to rewrite it
instead of continuing with the python unit tests approach. This is why I
lost the commit message, existing copyright text, etc.

I can re-add him as author credit to my version, or at least mention in
the commit message that it was based on a proposed test that he wrote.

I'm sorry if that was a faux pas.

> And given the iterative improvements that have gone into John's version
> (such as checking that the fleece image reads overwritten zeroes just as
> correctly as overwritten data), we definitely need to merge the two
> approaches into one patch.
> 

And before we do that, getting to the bottom of patch 2/3 in this series
is in order.

>> +++ b/tests/qemu-iotests/222
>> @@ -0,0 +1,112 @@
>> +#!/usr/bin/env python
>> +#
>> +# Tests for image fleecing (point in time snapshot export to NBD)
>> +#
>> +# Copyright (C) 2014 Red Hat, Inc.
>> +#
> 
> Interesting copyright line; it matches Fam's original post (and the fact
> that you kept him as author); whereas John's version had the humorous:
> 
> +# Copyright (C) 2018 Red Hat, Inc.
> +# John helped, too.
> 
>> +# Based on 055.
> 
> Is this information still useful? John dropped it.
> 

My version started with a blank test which was based on an unmerged
test, so it didn't have anything useful to reference as a "source." If
we keep the python unit test style here, keeping this line is fine.

>> +
>> +test_img = os.path.join(iotests.test_dir, 'test.img')
>> +target_img = os.path.join(iotests.test_dir, 'target.img')
>> +nbd_sock = os.path.join(iotests.test_dir, 'nbd.sock')
>> +
>> +class TestImageFleecing(iotests.QMPTestCase):
>> +    image_len = 64 * 1024 * 1024 # MB
>> +
>> +    def setUp(self):
>> +        # Write data to the image so we can compare later
>> +        qemu_img('create', '-f', iotests.imgfmt, test_img,
>> +                 str(TestImageFleecing.image_len))
> 
> Difference in style between nested methods of a class, vs. directly
> using 'with iotests.FilePath(..., iotests.VM() as vm:'
> 

Because this version will use the cleanup infrastructure as part of the
unit tests, unlike the python scripting approach which needs to manage
its own cleanup.

>> +        self.patterns = [
>> +                ("0x5d", "0", "64k"),
>> +                ("0xd5", "1M", "64k"),
>> +                ("0xdc", "32M", "64k"),
>> +                ("0xdc", "67043328", "64k")]
> 
> John spelled it:
>  ("0xcd", "0x3ff0000", "64k"]  # 64M - 64K
> 
> The 0xdc vs. 0xcd doesn't matter all that much, but having distinct
> patterns in distinct ranges is kind of nice if we have to later inspect
> a file.
> 

I felt like the hex aided clarity to show which cluster we were
targeting. I'm not very good at reading raw byte values.

>> +
>> +        for p in self.patterns:
>> +            qemu_io('-f', iotests.imgfmt, '-c', 'write -P%s %s %s' % p,
>> +                    test_img)
>> +
>> +        qemu_img('create', '-f', 'qcow2', target_img,
>> +                 str(TestImageFleecing.image_len))
>> +
>> +        self.vm = iotests.VM().add_drive(test_img)
>> +        self.vm.launch()
>> +
>> +        self.overwrite_patterns = [
>> +                ("0xa0", "0", "64k"),
>> +                ("0x0a", "1M", "64k"),
>> +                ("0x55", "32M", "64k"),
>> +                ("0x56", "67043328", "64k")]
>> +
>> +        self.nbd_uri = "nbd+unix:///drive1?socket=%s" % nbd_sock
>> +
>> +    def tearDown(self):
>> +        self.vm.shutdown()
>> +        os.remove(test_img)
>> +        os.remove(target_img)
> 
> I'm not seeing anything that stops the NBD server; John's version added
> that at my insistence.
> 
>> +
>> +    def verify_patterns(self):
>> +        for p in self.patterns:
>> +            self.assertEqual(
>> +                    -1,
>> +                    qemu_io(self.nbd_uri, '-r', '-c', 'read -P%s %s
>> %s' % p)
>> +                        .find("verification failed"),
>> +                    "Failed to verify pattern: %s %s %s" % p)
>> +
>> +    def test_image_fleecing(self):
>> +        result = self.vm.qmp("blockdev-add", **{
>> +            "driver": "fleecing-filter",
>> +            "node-name": "drive1",
>> +            "file": {
>> +                "driver": "qcow2",
>> +                "file": {
>> +                    "driver": "file",
>> +                    "filename": target_img,
>> +                    },
>> +                "backing": "drive0",
>> +            }
>> +            })
>> +        self.assert_qmp(result, 'return', {})
>> +
>> +        result = self.vm.qmp(
>> +                "nbd-server-start",
>> +                **{"addr": { "type": "unix", "data": { "path":
>> nbd_sock } } })
>> +        self.assert_qmp(result, 'return', {})
>> +        result = self.vm.qmp("blockdev-backup", device="drive0",
>> +                             target="drive1", sync="none")
>> +        self.assert_qmp(result, 'return', {})
>> +        result = self.vm.qmp("nbd-server-add", device="drive1")
>> +        self.assert_qmp(result, 'return', {})
>> +
>> +        self.verify_patterns()
>> +
>> +        for p in self.overwrite_patterns:
>> +            self.vm.hmp_qemu_io("drive0", "write -P%s %s %s" % p)
>> +
>> +        self.verify_patterns()
>> +
>> +        self.cancel_and_wait(resume=True)
>> +        self.assert_no_active_block_jobs()
>> +
>> +if __name__ == '__main__':
>> +    iotests.main(supported_fmts=['raw', 'qcow2'])
>> diff --git a/tests/qemu-iotests/222.out b/tests/qemu-iotests/222.out
>> new file mode 100644
>> index 0000000000..ae1213e6f8
>> --- /dev/null
>> +++ b/tests/qemu-iotests/222.out
>> @@ -0,0 +1,5 @@
>> +.
>> +----------------------------------------------------------------------
>> +Ran 1 tests
>> +
> 
> I also much prefer the output style that resulted in John's version.
> 

Yes, I think the "python script" style is superior because you can get
debug printfs from a test build of QEMU into the diff output instead of
having to edit the iotests infrastructure to stop deleting the logs so
you can read them.

>> +OK
>> diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
>> index eea75819d2..8019a9f721 100644
>> --- a/tests/qemu-iotests/group
>> +++ b/tests/qemu-iotests/group
>> @@ -220,3 +220,4 @@
>>   218 rw auto quick
>>   219 rw auto
>>   221 rw auto quick
>> +222 rw auto quick
>>
> 
I would prefer to use "my" version as a base, but we can modify it to
use the fleecing filter if that winds up being requisite. Your patch 1/3
is better, though.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/3] block/fleecing-filter: new filter driver for fleecing
  2018-06-29 17:24   ` Eric Blake
@ 2018-07-02  6:35     ` Fam Zheng
  2018-07-02 11:27       ` Vladimir Sementsov-Ogievskiy
  2018-07-02 11:47     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 28+ messages in thread
From: Fam Zheng @ 2018-07-02  6:35 UTC (permalink / raw)
  To: Eric Blake
  Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block, armbru,
	kwolf, mreitz, jsnow, den

On Fri, 06/29 12:24, Eric Blake wrote:
> On 06/29/2018 10:15 AM, Vladimir Sementsov-Ogievskiy wrote:
> > We need to synchronize backup job with reading from fleecing image
> > like it was done in block/replication.c.
> > 
> > Otherwise, the following situation is theoretically possible:
> > 
> 
> Grammar suggestions:
> 
> > 1. client start reading
> 
> client starts reading
> 
> > 2. client understand, that there is no corresponding cluster in
> >     fleecing image
> > 3. client is going to read from backing file (i.e. active image)
> 
> client sees that no corresponding cluster has been allocated in the fleecing
> image, so the request is forwarded to the backing file
> 
> > 4. guest writes to active image
> > 5. this write is stopped by backup(sync=none) and cluster is copied to
> >     fleecing image
> > 6. guest write continues...
> > 7. and client reads _new_ (or partly new) date from active image
> 
> Interesting race. Can it actually happen, or does our read code already
> serialize writes to the same area while a read is underway?

Yes, I wonder why wait_serialising_requests() is not enough. If it's possible,
can we have a test case (with the help of blkdebug, for example)?

> 
> In short, I see what problem you are claiming exists: the moment the client
> starts reading from the backing file, that portion of the backing file must
> remain unchanged until after the client is done reading.  But I don't know
> enough details of the block layer to know if this is actually a problem, or
> if adding the new filter is just overhead.

Fam

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/3] qemu-iotests: Image fleecing test case 222
  2018-06-29 21:04     ` John Snow
@ 2018-07-02  6:45       ` Fam Zheng
  2018-07-02 12:58       ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 28+ messages in thread
From: Fam Zheng @ 2018-07-02  6:45 UTC (permalink / raw)
  To: Eric Blake, John Snow
  Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block, kwolf,
	armbru, mreitz, den

On Fri, 06/29 17:04, John Snow wrote:
> Mine was indeed inspired by Fam's version, but I opted to rewrite it
> instead of continuing with the python unit tests approach. This is why I
> lost the commit message, existing copyright text, etc.
> 
> I can re-add him as author credit to my version, or at least mention in
> the commit message that it was based on a proposed test that he wrote.
> 
> I'm sorry if that was a faux pas.

No, especially since you basically rewrote it. Thank you both for mentioning
this. :)

Fam

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/3] block/fleecing-filter: new filter driver for fleecing
  2018-07-02  6:35     ` Fam Zheng
@ 2018-07-02 11:27       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 28+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-02 11:27 UTC (permalink / raw)
  To: Fam Zheng, Eric Blake
  Cc: qemu-devel, qemu-block, armbru, kwolf, mreitz, jsnow, den

02.07.2018 09:35, Fam Zheng wrote:
> On Fri, 06/29 12:24, Eric Blake wrote:
>> On 06/29/2018 10:15 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> We need to synchronize backup job with reading from fleecing image
>>> like it was done in block/replication.c.
>>>
>>> Otherwise, the following situation is theoretically possible:
>>>
>> Grammar suggestions:
>>
>>> 1. client start reading
>> client starts reading
>>
>>> 2. client understand, that there is no corresponding cluster in
>>>      fleecing image
>>> 3. client is going to read from backing file (i.e. active image)
>> client sees that no corresponding cluster has been allocated in the fleecing
>> image, so the request is forwarded to the backing file
>>
>>> 4. guest writes to active image
>>> 5. this write is stopped by backup(sync=none) and cluster is copied to
>>>      fleecing image
>>> 6. guest write continues...
>>> 7. and client reads _new_ (or partly new) date from active image
>> Interesting race. Can it actually happen, or does our read code already
>> serialize writes to the same area while a read is underway?
> Yes, I wonder why wait_serialising_requests() is not enough. If it's possible,
> can we have a test case (with help of blkdebug, for example)?

Hmm, only unaligned and COPY_ON_READ requests are marked serializing...
So if I understand correctly, nothing special prevents intersecting
requests; normally that is the guest's problem, but in the backup case
it is our problem.

>
>> In short, I see what problem you are claiming exists: the moment the client
>> starts reading from the backing file, that portion of the backing file must
>> remain unchanged until after the client is done reading.  But I don't know
>> enough details of the block layer to know if this is actually a problem, or
>> if adding the new filter is just overhead.
> Fam


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/3] block/fleecing-filter: new filter driver for fleecing
  2018-06-29 17:24   ` Eric Blake
  2018-07-02  6:35     ` Fam Zheng
@ 2018-07-02 11:47     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 28+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-02 11:47 UTC (permalink / raw)
  To: Eric Blake, qemu-devel, qemu-block
  Cc: armbru, kwolf, mreitz, jsnow, famz, den

29.06.2018 20:24, Eric Blake wrote:
> On 06/29/2018 10:15 AM, Vladimir Sementsov-Ogievskiy wrote:
>> We need to synchronize backup job with reading from fleecing image
>> like it was done in block/replication.c.
>>
>> Otherwise, the following situation is theoretically possible:
>>
>
> Grammar suggestions:
>
>> 1. client start reading
>
> client starts reading
>
>> 2. client understand, that there is no corresponding cluster in
>>     fleecing image
>> 3. client is going to read from backing file (i.e. active image)
>
> client sees that no corresponding cluster has been allocated in the 
> fleecing image, so the request is forwarded to the backing file
>
>> 4. guest writes to active image
>> 5. this write is stopped by backup(sync=none) and cluster is copied to
>>     fleecing image
>> 6. guest write continues...
>> 7. and client reads _new_ (or partly new) date from active image
>
> Interesting race. Can it actually happen, or does our read code 
> already serialize writes to the same area while a read is underway?
>
> In short, I see what problem you are claiming exists: the moment the 
> client starts reading from the backing file, that portion of the 
> backing file must remain unchanged until after the client is done 
> reading.  But I don't know enough details of the block layer to know 
> if this is actually a problem, or if adding the new filter is just 
> overhead.


Looking at the code, here is a more realistic example (though I still have
no reproducer):

1. The client starts reading, takes the qcow2 mutex in qcow2_co_preadv, and
   goes up to the L2 table load (assume a cache miss).
2. Guest write => backup COW => qcow2 write => tries to take the qcow2
   mutex => waits.
3. The L2 table is loaded; we see that the cluster is UNALLOCATED, go to
   "case QCOW2_CLUSTER_UNALLOCATED", and unlock the mutex before
   bdrv_co_preadv(bs->backing, ...).
4. Aha, the mutex is unlocked, so the backup COW continues, and we finally
   finish the guest write and change the cluster in the active disk.
5. Only then do we actually call bdrv_co_preadv(bs->backing, ...) and read
   the _new, updated_ data.



>
>>
>> So, this fleecing-filter should be above fleecing image, the whole
>> picture of fleecing looks like this:
>>
>>      +-------+           +------------+
>>      |       |           |            |
>>      | guest |           | NBD client +<------+
>>      |       |           |            |       |
>>      ++-----++           +------------+       |only read
>>       |     ^                                 |
>>       | IO  |                                 |
>>       v     |                           +-----+------+
>>      ++-----+---------+                 |            |
>>      |                |                 |  internal  |
>>      |  active image  +----+            | NBD server |
>>      |                |    |            |            |
>>      +-+--------------+    |backup      +-+----------+
>>        ^                   |sync=none     ^
>>        |backing            |              |only read
>>        |                   |              |
>>      +-+--------------+    |       +------+----------+
>>      |                |    |       |                 |
>>      | fleecing image +<---+       | fleecing filter |
>>      |                |            |                 |
>>      +--------+-------+            +-----+-----------+
>>               ^                          |
>>               |                          |
>>               +--------------------------+
>>                         file
>
> Can you also show the sequence of QMP commands to set up this 
> structure (or maybe you do in 3/3; which I haven't looked at yet).
>
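For reference, the test in patch 3/3 issues roughly the following QMP
sequence. The node names ("drive0", "drive1") and the sync=none backup are
taken from that test; the file names are illustrative, and the commands are
shown here as plain (command, arguments) pairs rather than via the iotests
VM API:

```python
# QMP sequence for the fleecing setup, as used by test 222 in this series.
target_img = "target.img"  # qcow2 scratch image, created beforehand
nbd_sock = "nbd.sock"      # Unix socket path for the NBD server

commands = [
    # 1. Add the fleecing filter on top of the qcow2 fleecing image,
    #    whose backing node is the active disk "drive0".
    ("blockdev-add", {
        "driver": "fleecing-filter",
        "node-name": "drive1",
        "file": {
            "driver": "qcow2",
            "file": {"driver": "file", "filename": target_img},
            "backing": "drive0",
        },
    }),
    # 2. Start the internal NBD server on a Unix socket.
    ("nbd-server-start",
     {"addr": {"type": "unix", "data": {"path": nbd_sock}}}),
    # 3. Start copy-before-write from the active disk to the fleecing image.
    ("blockdev-backup",
     {"device": "drive0", "target": "drive1", "sync": "none"}),
    # 4. Export the filter node (read-only) over NBD.
    ("nbd-server-add", {"device": "drive1"}),
]

for name, _args in commands:
    print(name)
```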
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   qapi/block-core.json    |  6 ++--
>>   block/fleecing-filter.c | 80 
>> +++++++++++++++++++++++++++++++++++++++++++++++++
>>   block/Makefile.objs     |  1 +
>>   3 files changed, 85 insertions(+), 2 deletions(-)
>>   create mode 100644 block/fleecing-filter.c
>>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index 577ce5e999..43872c3d79 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -2542,7 +2542,8 @@
>>               'host_device', 'http', 'https', 'iscsi', 'luks', 'nbd', 
>> 'nfs',
>>               'null-aio', 'null-co', 'nvme', 'parallels', 'qcow', 
>> 'qcow2', 'qed',
>>               'quorum', 'raw', 'rbd', 'replication', 'sheepdog', 'ssh',
>> -            'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat', 
>> 'vxhs' ] }
>> +            'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat', 'vxhs',
>> +            'fleecing-filter' ] }
>
> Missing a 'since 3.0' documentation blurb; also, this enum has been 
> kept sorted, so your new filter needs to come earlier.
>
>>     ##
>>   # @BlockdevOptionsFile:
>> @@ -3594,7 +3595,8 @@
>>         'vmdk': 'BlockdevOptionsGenericCOWFormat',
>>         'vpc':        'BlockdevOptionsGenericFormat',
>>         'vvfat':      'BlockdevOptionsVVFAT',
>> -      'vxhs':       'BlockdevOptionsVxHS'
>> +      'vxhs':       'BlockdevOptionsVxHS',
>> +      'fleecing-filter': 'BlockdevOptionsGenericFormat'
>
> Again, this has been kept sorted.
>
>> +static coroutine_fn int fleecing_co_preadv(BlockDriverState *bs,
>> +                                           uint64_t offset, uint64_t 
>> bytes,
>> + QEMUIOVector *qiov, int flags)
>> +{
>> +    int ret;
>> +    BlockJob *job = bs->file->bs->backing->bs->job;
>> +    CowRequest req;
>> +
>> +    backup_wait_for_overlapping_requests(job, offset, bytes);
>> +    backup_cow_request_begin(&req, job, offset, bytes);
>> +
>> +    ret = bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);
>> +
>> +    backup_cow_request_end(&req);
>> +
>> +    return ret;
>> +}
>
> So the idea here is that you force a serializing request to ensure 
> that there are no other writes to the area in the meantime.
>
>> +
>> +static coroutine_fn int fleecing_co_pwritev(BlockDriverState *bs,
>> +                                            uint64_t offset, 
>> uint64_t bytes,
>> + QEMUIOVector *qiov, int flags)
>> +{
>> +    return -EINVAL;
>
> and you force this to be a read-only interface. (Does the block layer 
> actually require us to provide a pwritev callback, or can we leave it 
> NULL instead?)
>
>> +BlockDriver bdrv_fleecing_filter = {
>> +    .format_name = "fleecing-filter",
>> +    .protocol_name = "fleecing-filter",
>> +    .instance_size = 0,
>> +
>> +    .bdrv_open = fleecing_open,
>> +    .bdrv_close = fleecing_close,
>> +
>> +    .bdrv_getlength = fleecing_getlength,
>> +    .bdrv_co_preadv = fleecing_co_preadv,
>> +    .bdrv_co_pwritev = fleecing_co_pwritev,
>> +
>> +    .is_filter = true,
>> +    .bdrv_recurse_is_first_non_filter = 
>> fleecing_recurse_is_first_non_filter,
>> +    .bdrv_child_perm        = bdrv_filter_default_perms,
>
> No .bdrv_co_block_status callback?  That probably hurts querying for 
> sparse regions.
>

hm, worth add.. and it possibly needs synchronization with backup too.

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/3] block/fleecing-filter: new filter driver for fleecing
  2018-06-29 17:30   ` John Snow
  2018-06-29 17:40     ` Eric Blake
@ 2018-07-02 11:57     ` Vladimir Sementsov-Ogievskiy
  2018-07-03 11:22       ` Kevin Wolf
  1 sibling, 1 reply; 28+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-02 11:57 UTC (permalink / raw)
  To: John Snow, qemu-devel, qemu-block
  Cc: eblake, armbru, kwolf, mreitz, famz, den

29.06.2018 20:30, John Snow wrote:
>
> On 06/29/2018 11:15 AM, Vladimir Sementsov-Ogievskiy wrote:
>> We need to synchronize backup job with reading from fleecing image
>> like it was done in block/replication.c.
>>
>> Otherwise, the following situation is theoretically possible:
>>
>> 1. client start reading
>> 2. client understand, that there is no corresponding cluster in
>>     fleecing image
> I don't think the client refocuses the read, but QEMU does. (the client
> has no idea if it is reading from the fleecing node or the backing file.)
>
> ... but understood:
>
>> 3. client is going to read from backing file (i.e. active image)
>> 4. guest writes to active image
> My question here is if QEMU will allow reads and writes to interleave in
> general. If the client is reading from the backing file, the active
> image, will QEMU allow a write to modify that data before we're done
> getting that data?

If I understand correctly: yes. Physical drives allow intersecting
operations too, and there is no guarantee on the result. We do have
serializing requests, but in general only unaligned requests are marked
serializing.

>
>> 5. this write is stopped by backup(sync=none) and cluster is copied to
>>     fleecing image
>> 6. guest write continues...
>> 7. and client reads _new_ (or partly new) date from active image
>>
> I can't answer this for myself one way or the other right now, but I
> imagine you observed a failure in practice in your test setups, which
> motivated this patch?

No. I just found this code in QEMU: block/replication already uses the
same scheme with backup(sync=none), and it uses the same synchronization
as in this patch. I've discussed it on the list a bit:
http://lists.nongnu.org/archive/html/qemu-devel/2017-08/msg01807.html
https://lists.nongnu.org/archive/html/qemu-devel/2016-05/msg03025.html

>
> A test or some proof would help justification for this patch. It would
> also help prove that it solves what it claims to!

Agreed, a reproducer is better than any theoretical reflection. I'm
thinking about it, but I don't see a simple solution...

>
>> So, this fleecing-filter should be above fleecing image, the whole
>> picture of fleecing looks like this:
>>
>>      +-------+           +------------+
>>      |       |           |            |
>>      | guest |           | NBD client +<------+
>>      |       |           |            |       |
>>      ++-----++           +------------+       |only read
>>       |     ^                                 |
>>       | IO  |                                 |
>>       v     |                           +-----+------+
>>      ++-----+---------+                 |            |
>>      |                |                 |  internal  |
>>      |  active image  +----+            | NBD server |
>>      |                |    |            |            |
>>      +-+--------------+    |backup      +-+----------+
>>        ^                   |sync=none     ^
>>        |backing            |              |only read
>>        |                   |              |
>>      +-+--------------+    |       +------+----------+
>>      |                |    |       |                 |
>>      | fleecing image +<---+       | fleecing filter |
>>      |                |            |                 |
>>      +--------+-------+            +-----+-----------+
>>               ^                          |
>>               |                          |
>>               +--------------------------+
>>                         file
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   qapi/block-core.json    |  6 ++--
>>   block/fleecing-filter.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++++
>>   block/Makefile.objs     |  1 +
>>   3 files changed, 85 insertions(+), 2 deletions(-)
>>   create mode 100644 block/fleecing-filter.c
>>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index 577ce5e999..43872c3d79 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -2542,7 +2542,8 @@
>>               'host_device', 'http', 'https', 'iscsi', 'luks', 'nbd', 'nfs',
>>               'null-aio', 'null-co', 'nvme', 'parallels', 'qcow', 'qcow2', 'qed',
>>               'quorum', 'raw', 'rbd', 'replication', 'sheepdog', 'ssh',
>> -            'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat', 'vxhs' ] }
>> +            'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat', 'vxhs',
>> +            'fleecing-filter' ] }
>>   
>>   ##
>>   # @BlockdevOptionsFile:
>> @@ -3594,7 +3595,8 @@
>>         'vmdk':       'BlockdevOptionsGenericCOWFormat',
>>         'vpc':        'BlockdevOptionsGenericFormat',
>>         'vvfat':      'BlockdevOptionsVVFAT',
>> -      'vxhs':       'BlockdevOptionsVxHS'
>> +      'vxhs':       'BlockdevOptionsVxHS',
>> +      'fleecing-filter': 'BlockdevOptionsGenericFormat'
>>     } }
>>   
>>   ##
>> diff --git a/block/fleecing-filter.c b/block/fleecing-filter.c
>> new file mode 100644
>> index 0000000000..b501887c10
>> --- /dev/null
>> +++ b/block/fleecing-filter.c
>> @@ -0,0 +1,80 @@
>> +#include "qemu/osdep.h"
>> +#include "qemu-common.h"
>> +#include "block/blockjob.h"
>> +#include "block/block_int.h"
>> +#include "block/block_backup.h"
>> +
>> +static int64_t fleecing_getlength(BlockDriverState *bs)
>> +{
>> +    return bdrv_getlength(bs->file->bs);
>> +}
>> +
>> +static coroutine_fn int fleecing_co_preadv(BlockDriverState *bs,
>> +                                           uint64_t offset, uint64_t bytes,
>> +                                           QEMUIOVector *qiov, int flags)
>> +{
>> +    int ret;
>> +    BlockJob *job = bs->file->bs->backing->bs->job;
>> +    CowRequest req;
>> +
>> +    backup_wait_for_overlapping_requests(job, offset, bytes);
>> +    backup_cow_request_begin(&req, job, offset, bytes);
>> +
>> +    ret = bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);
>> +
>> +    backup_cow_request_end(&req);
>> +
>> +    return ret;
>> +}
>> +
>> +static coroutine_fn int fleecing_co_pwritev(BlockDriverState *bs,
>> +                                            uint64_t offset, uint64_t bytes,
>> +                                            QEMUIOVector *qiov, int flags)
>> +{
>> +    return -EINVAL;
>> +}
>> +
>> +static bool fleecing_recurse_is_first_non_filter(BlockDriverState *bs,
>> +                                                 BlockDriverState *candidate)
>> +{
>> +    return bdrv_recurse_is_first_non_filter(bs->file->bs, candidate);
>> +}
>> +
>> +static int fleecing_open(BlockDriverState *bs, QDict *options,
>> +                         int flags, Error **errp)
>> +{
>> +    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file, false,
>> +                               errp);
>> +
>> +    return bs->file ? 0 : -EINVAL;
>> +}
>> +
>> +static void fleecing_close(BlockDriverState *bs)
>> +{
>> +    /* Do nothing, we have to add .bdrv_close, because bdrv_close() don't check
>> +     * it, just call. */
>> +}
>> +
>> +BlockDriver bdrv_fleecing_filter = {
>> +    .format_name = "fleecing-filter",
>> +    .protocol_name = "fleecing-filter",
>> +    .instance_size = 0,
>> +
>> +    .bdrv_open = fleecing_open,
>> +    .bdrv_close = fleecing_close,
>> +
>> +    .bdrv_getlength = fleecing_getlength,
>> +    .bdrv_co_preadv = fleecing_co_preadv,
>> +    .bdrv_co_pwritev = fleecing_co_pwritev,
>> +
>> +    .is_filter = true,
>> +    .bdrv_recurse_is_first_non_filter = fleecing_recurse_is_first_non_filter,
>> +    .bdrv_child_perm        = bdrv_filter_default_perms,
>> +};
>> +
>> +static void bdrv_fleecing_init(void)
>> +{
>> +    bdrv_register(&bdrv_fleecing_filter);
>> +}
>> +
>> +block_init(bdrv_fleecing_init);
>> diff --git a/block/Makefile.objs b/block/Makefile.objs
>> index 899bfb5e2c..aa0a6dd971 100644
>> --- a/block/Makefile.objs
>> +++ b/block/Makefile.objs
>> @@ -27,6 +27,7 @@ block-obj-y += write-threshold.o
>>   block-obj-y += backup.o
>>   block-obj-$(CONFIG_REPLICATION) += replication.o
>>   block-obj-y += throttle.o copy-on-read.o
>> +block-obj-y += fleecing-filter.o
>>   
>>   block-obj-y += crypto.o
>>   
>>


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/3] block/fleecing-filter: new filter driver for fleecing
  2018-06-29 17:40     ` Eric Blake
@ 2018-07-02 12:09       ` Vladimir Sementsov-Ogievskiy
  2018-07-03 11:15         ` Kevin Wolf
  0 siblings, 1 reply; 28+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-02 12:09 UTC (permalink / raw)
  To: Eric Blake, John Snow, qemu-devel, qemu-block
  Cc: armbru, kwolf, mreitz, famz, den

29.06.2018 20:40, Eric Blake wrote:
> On 06/29/2018 12:30 PM, John Snow wrote:
>>
>>
>> On 06/29/2018 11:15 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> We need to synchronize backup job with reading from fleecing image
>>> like it was done in block/replication.c.
>>>
>>> Otherwise, the following situation is theoretically possible:
>>>
>>> 1. client start reading
>>> 2. client understand, that there is no corresponding cluster in
>>>     fleecing image
>>
>> I don't think the client refocuses the read, but QEMU does. (the client
>> has no idea if it is reading from the fleecing node or the backing 
>> file.)
>>
>> ... but understood:
>>
>>> 3. client is going to read from backing file (i.e. active image)
>>> 4. guest writes to active image
>>
>> My question here is if QEMU will allow reads and writes to interleave in
>> general. If the client is reading from the backing file, the active
>> image, will QEMU allow a write to modify that data before we're done
>> getting that data?
>>
>>> 5. this write is stopped by backup(sync=none) and cluster is copied to
>>>     fleecing image
>>> 6. guest write continues...
>>> 7. and client reads _new_ (or partly new) date from active image
>>>
>>
>> I can't answer this for myself one way or the other right now, but I
>> imagine you observed a failure in practice in your test setups, which
>> motivated this patch?
>>
>> A test or some proof would help justification for this patch. It would
>> also help prove that it solves what it claims to!
>
> In fact, do we really need a new filter, or do we just need to make 
> the "sync":"none" blockdev-backup job take the appropriate 
> synchronization locks?
>

How? We'd need an additional mechanism like serializing requests, or a
way to reuse the existing serializing-request machinery. Using the backup
job's internal synchronization looks simpler, and it is already used in
block replication.
On the other hand, I think the best approach would be to refuse
backup(sync=none) entirely and implement the whole external-backup
functionality in the special filter driver, but for now I'd prefer to
start with the already existing stuff.

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/3] qemu-iotests: Image fleecing test case 222
  2018-06-29 21:04     ` John Snow
  2018-07-02  6:45       ` Fam Zheng
@ 2018-07-02 12:58       ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 28+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-02 12:58 UTC (permalink / raw)
  To: John Snow, Eric Blake, qemu-devel, qemu-block
  Cc: kwolf, famz, armbru, mreitz, den

30.06.2018 00:04, John Snow wrote:
>
> On 06/29/2018 01:58 PM, Eric Blake wrote:
>> On 06/29/2018 10:15 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> From: Fam Zheng <famz@redhat.com>
>>>
>>> This tests the workflow of creating a lightweight point-in-time snapshot
>>> with blockdev-backup command, and exporting it with built-in NBD server.
>>>
>>> It's tested that any post-snapshot writing to the original device
>>> doesn't change data seen in NBD target.
>>>
>>> Signed-off-by: Fam Zheng <famz@redhat.com>
>>> [vsementsov: add -f iotests.imgfmt to qemu_io, fix target_img to be
>>> always qcow2, add -r to qemu_io for nbd, add fleecing-filter layer,
>>> wrap long lines]
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> For my own records: Fam's most recent post was:
>> https://lists.gnu.org/archive/html/qemu-devel/2014-05/msg03895.html
>> [Qemu-devel] [PATCH v20 15/15] qemu-iotests: Image fleecing test case 089
>>
>> Wow - has it really been FOUR YEARS since we first proposed this?
>>
> Yup...
>
>> I'm trying to figure out why that original post was abandoned; it looks
>> like:
>> https://lists.gnu.org/archive/html/qemu-devel/2014-05/msg04333.html
>>
>> split that v20 series into stuff that was ready (which made it in back
>> then), and stuff that was still undergoing comments (which then stalled
>> and got lost in the shuffle until now).
>>
>> My next question: how does this compare to John's recent patch?  And how
>> much of Fam's original work did John borrow (that self.patterns array
>> looks pretty common, now that I'm looking at multiple mails - although
>> John's version lacks any commit message body):
>> https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg08594.html
>>
> Mine was indeed inspired by Fam's version, but I opted to rewrite it
> instead of continuing with the python unit tests approach. This is why I
> lost the commit message, existing copyright text, etc.
>
> I can re-add him as author credit to my version, or at least mention in
> the commit message that it was based on a proposed test that he wrote.
>
> I'm sorry if that was a faux pas.
>
>> And given the iterative improvements that have gone into John's version
>> (such as checking that the fleece image reads overwritten zeroes just as
>> correctly as overwritten data), we definitely need to merge the two
>> approaches into one patch.
>>
> And before we do that, getting to the bottom of patch 2/3 in this series
> is in order.
>
>>> +++ b/tests/qemu-iotests/222
>>> @@ -0,0 +1,112 @@
>>> +#!/usr/bin/env python
>>> +#
>>> +# Tests for image fleecing (point in time snapshot export to NBD)
>>> +#
>>> +# Copyright (C) 2014 Red Hat, Inc.
>>> +#
>> Interesting copyright line; it matches Fam's original post (and the fact
>> that you kept him as author); whereas John's version had the humorous:
>>
>> +# Copyright (C) 2018 Red Hat, Inc.
>> +# John helped, too.
>>
>>> +# Based on 055.
>> Is this information still useful? John dropped it.
>>
> My version started with a blank test which was based on an unmerged
> test, so it didn't have anything useful to reference as a "source." If
> we keep the python unit test style here, keeping this line is fine.
>
>>> +
>>> +test_img = os.path.join(iotests.test_dir, 'test.img')
>>> +target_img = os.path.join(iotests.test_dir, 'target.img')
>>> +nbd_sock = os.path.join(iotests.test_dir, 'nbd.sock')
>>> +
>>> +class TestImageFleecing(iotests.QMPTestCase):
>>> +    image_len = 64 * 1024 * 1024 # MB
>>> +
>>> +    def setUp(self):
>>> +        # Write data to the image so we can compare later
>>> +        qemu_img('create', '-f', iotests.imgfmt, test_img,
>>> +                 str(TestImageFleecing.image_len))
>> Difference in style between nested methods of a class, vs. directly
>> using 'with iotests.FilePath(...) as ..., iotests.VM() as vm:'
>>
> Because this version will use the cleanup infrastructure as part of the
> unit tests, unlike the python scripting approach which needs to manage
> its own cleanup.
>
>>> +        self.patterns = [
>>> +                ("0x5d", "0", "64k"),
>>> +                ("0xd5", "1M", "64k"),
>>> +                ("0xdc", "32M", "64k"),
>>> +                ("0xdc", "67043328", "64k")]
>> John spelled it:
>>   ("0xcd", "0x3ff0000", "64k")  # 64M - 64K
>>
>> The 0xdc vs. 0xcd doesn't matter all that much, but having distinct
>> patterns in distinct ranges is kind of nice if we have to later inspect
>> a file.
>>
> I felt like the hex aided clarity to show which cluster we were
> targeting. I'm not very good at reading raw byte values.
>
>>> +
>>> +        for p in self.patterns:
>>> +            qemu_io('-f', iotests.imgfmt, '-c', 'write -P%s %s %s' % p,
>>> +                    test_img)
>>> +
>>> +        qemu_img('create', '-f', 'qcow2', target_img,
>>> +                 str(TestImageFleecing.image_len))
>>> +
>>> +        self.vm = iotests.VM().add_drive(test_img)
>>> +        self.vm.launch()
>>> +
>>> +        self.overwrite_patterns = [
>>> +                ("0xa0", "0", "64k"),
>>> +                ("0x0a", "1M", "64k"),
>>> +                ("0x55", "32M", "64k"),
>>> +                ("0x56", "67043328", "64k")]
>>> +
>>> +        self.nbd_uri = "nbd+unix:///drive1?socket=%s" % nbd_sock
>>> +
>>> +    def tearDown(self):
>>> +        self.vm.shutdown()
>>> +        os.remove(test_img)
>>> +        os.remove(target_img)
>> I'm not seeing anything that stops the NBD server; John's version added
>> that at my insistence.
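For illustration, a minimal sketch of a tearDown that stops the NBD server before shutting down (hypothetical; the stub class below stands in for iotests.VM, while nbd-server-stop is the relevant QMP command):

```python
# Tear down in the reverse order of setup: stop exporting over NBD
# before the VM goes away.  StubVM only records the calls made.
class StubVM:
    def __init__(self):
        self.calls = []

    def qmp(self, cmd, **kwargs):
        self.calls.append(cmd)
        return {'return': {}}

    def shutdown(self):
        self.calls.append('shutdown')

def tear_down(vm):
    vm.qmp('nbd-server-stop')   # stop the NBD server first
    vm.shutdown()

vm = StubVM()
tear_down(vm)
assert vm.calls == ['nbd-server-stop', 'shutdown']
```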
>>
>>> +
>>> +    def verify_patterns(self):
>>> +        for p in self.patterns:
>>> +            self.assertEqual(
>>> +                    -1,
>>> +                    qemu_io(self.nbd_uri, '-r', '-c', 'read -P%s %s %s' % p)
>>> +                        .find("verification failed"),
>>> +                    "Failed to verify pattern: %s %s %s" % p)
>>> +
>>> +    def test_image_fleecing(self):
>>> +        result = self.vm.qmp("blockdev-add", **{
>>> +            "driver": "fleecing-filter",
>>> +            "node-name": "drive1",
>>> +            "file": {
>>> +                "driver": "qcow2",
>>> +                "file": {
>>> +                    "driver": "file",
>>> +                    "filename": target_img,
>>> +                    },
>>> +                "backing": "drive0",
>>> +            }
>>> +            })
>>> +        self.assert_qmp(result, 'return', {})
>>> +
>>> +        result = self.vm.qmp(
>>> +                "nbd-server-start",
>>> +                **{"addr": { "type": "unix", "data": { "path": nbd_sock } } })
>>> +        self.assert_qmp(result, 'return', {})
>>> +        result = self.vm.qmp("blockdev-backup", device="drive0",
>>> +                             target="drive1", sync="none")
>>> +        self.assert_qmp(result, 'return', {})
>>> +        result = self.vm.qmp("nbd-server-add", device="drive1")
>>> +        self.assert_qmp(result, 'return', {})
>>> +
>>> +        self.verify_patterns()
>>> +
>>> +        for p in self.overwrite_patterns:
>>> +            self.vm.hmp_qemu_io("drive0", "write -P%s %s %s" % p)
>>> +
>>> +        self.verify_patterns()
>>> +
>>> +        self.cancel_and_wait(resume=True)
>>> +        self.assert_no_active_block_jobs()
>>> +
>>> +if __name__ == '__main__':
>>> +    iotests.main(supported_fmts=['raw', 'qcow2'])
>>> diff --git a/tests/qemu-iotests/222.out b/tests/qemu-iotests/222.out
>>> new file mode 100644
>>> index 0000000000..ae1213e6f8
>>> --- /dev/null
>>> +++ b/tests/qemu-iotests/222.out
>>> @@ -0,0 +1,5 @@
>>> +.
>>> +----------------------------------------------------------------------
>>> +Ran 1 tests
>>> +
>> I also much prefer the output style that resulted in John's version.
>>
> Yes, I think the "python script" style is superior because you can get
> debug printfs from a test build of QEMU into the diff output, instead of
> having to edit the iotests infrastructure to stop it from deleting the
> logs so you can read them.

The infrastructure for python unittest-based tests definitely should be
improved.

>
>>> +OK
>>> diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
>>> index eea75819d2..8019a9f721 100644
>>> --- a/tests/qemu-iotests/group
>>> +++ b/tests/qemu-iotests/group
>>> @@ -220,3 +220,4 @@
>>>    218 rw auto quick
>>>    219 rw auto
>>>    221 rw auto quick
>>> +222 rw auto quick
>>>
> I would prefer to use "my" version as a base, but we can modify it to
> use the fleecing filter if that winds up being a requirement. Your patch 1/3
> is better, though.

Ok, no objections

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/3] block/fleecing-filter: new filter driver for fleecing
  2018-07-02 12:09       ` Vladimir Sementsov-Ogievskiy
@ 2018-07-03 11:15         ` Kevin Wolf
  2018-07-03 11:52           ` Vladimir Sementsov-Ogievskiy
                             ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Kevin Wolf @ 2018-07-03 11:15 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: Eric Blake, John Snow, qemu-devel, qemu-block, armbru, mreitz, famz, den

On 02.07.2018 at 14:09, Vladimir Sementsov-Ogievskiy wrote:
> 29.06.2018 20:40, Eric Blake wrote:
> > On 06/29/2018 12:30 PM, John Snow wrote:
> > > 
> > > 
> > > On 06/29/2018 11:15 AM, Vladimir Sementsov-Ogievskiy wrote:
> > > > We need to synchronize backup job with reading from fleecing image
> > > > like it was done in block/replication.c.
> > > > 
> > > > Otherwise, the following situation is theoretically possible:
> > > > 
> > > > 1. client start reading
> > > > 2. client understand, that there is no corresponding cluster in
> > > >     fleecing image
> > > 
> > > I don't think the client refocuses the read, but QEMU does. (the client
> > > has no idea if it is reading from the fleecing node or the backing
> > > file.)
> > > 
> > > ... but understood:
> > > 
> > > > 3. client is going to read from backing file (i.e. active image)
> > > > 4. guest writes to active image
> > > 
> > > My question here is if QEMU will allow reads and writes to interleave in
> > > general. If the client is reading from the backing file, the active
> > > image, will QEMU allow a write to modify that data before we're done
> > > getting that data?
> > > 
> > > > 5. this write is stopped by backup(sync=none) and cluster is copied to
> > > >     fleecing image
> > > > 6. guest write continues...
> > > > 7. and client reads _new_ (or partly new) date from active image
> > > > 
> > > 
> > > I can't answer this for myself one way or the other right now, but I
> > > imagine you observed a failure in practice in your test setups, which
> > > motivated this patch?
> > > 
> > > A test or some proof would help justification for this patch. It would
> > > also help prove that it solves what it claims to!
> > 
> > In fact, do we really need a new filter, or do we just need to make the
> > "sync":"none" blockdev-backup job take the appropriate synchronization
> > locks?
> > 
> 
> How? We'd need an additional mechanism like serializing requests, or a
> way to reuse the existing one. Using backup's internal synchronization
> looks simpler, and it is already used in block replication.

But it is also just an ugly hack that fixes one special case and leaves
everything else broken. replication is usually not a good example for
anything. It always gives me bad surprises when I have to look at it.

We'll have to figure out where to fix this problem (or what it really
is, once you look more than just at fleecing), but I think requiring the
user to add a filter driver to work around missing serialisation in
other code, and corrupting their image if they forget to, is not a
reasonable solution.

I see at least two things wrong in this context:

* The permissions don't seem to match reality. The NBD server
  unconditionally shares PERM_WRITE, which is wrong in this case. The
  client wants to see a point-in-time snapshot that never changes. This
  should become an option so that it can be properly reflected in the
  permissions used.

* Once we have proper permissions, the fleecing setup breaks down
  because the guest needs PERM_WRITE on the backing file, but the
  fleecing overlay allows that only if the NBD client allows it (which
  it doesn't for fleecing).

  Now we can implement an exception right into backup that installs a
  backup filter driver between source and target if the source is the
  backing file of the target. The filter driver would be similar to the
  commit filter driver in that it simply promises !PERM_WRITE to its
  parents, but allows PERM_WRITE on the source because it has installed
  the before_write_notifier that guarantees this condition.

  All writes to the target that are made by the backup job in this setup
  (including before_write_notifier writes) need to be marked as
  serialising so that any concurrent reads are completed first.

And if we decide to add a target filter to backup, we should probably at
the same time use a filter driver for intercepting source writes instead
of using before_write_notifier.
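The permission conflict described above can be sketched with a toy model (this is not QEMU's real permission API; the checker and the names are illustrative only):

```python
# Toy model: each user of a node states which permissions it needs and
# which it is willing to share with other users of the same node.  Two
# users conflict when one needs a permission the other refuses to share.
def conflict(users):
    for a in users:
        for b in users:
            if a is not b and a['need'] & ~b['share']:
                return True
    return False

WRITE = 1

# Fleecing without a filter: the guest needs WRITE on the active image,
# but a snapshot-exporting NBD client must not share WRITE through the
# fleecing overlay's backing link -> conflict.
guest = {'need': WRITE, 'share': 0}
overlay_backing_link = {'need': 0, 'share': 0}  # propagates "no writes"
assert conflict([guest, overlay_backing_link])

# With the proposed backup filter in between: the filter promises
# !WRITE to its parents but may itself share WRITE on the source,
# because its write notifier guarantees copy-before-write.
filter_link = {'need': 0, 'share': WRITE}
assert not conflict([guest, filter_link])
```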

Max, I think you intended to make both source and target children of the
same block job node (or at least for mirror). But wouldn't that create
loops in a setup like this? I think we may need two filters that are
only connected through the block job, but not with block graph edges.

Kevin


* Re: [Qemu-devel] [PATCH v2 2/3] block/fleecing-filter: new filter driver for fleecing
  2018-07-02 11:57     ` Vladimir Sementsov-Ogievskiy
@ 2018-07-03 11:22       ` Kevin Wolf
  0 siblings, 0 replies; 28+ messages in thread
From: Kevin Wolf @ 2018-07-03 11:22 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: John Snow, qemu-devel, qemu-block, eblake, armbru, mreitz, famz, den

On 02.07.2018 at 13:57, Vladimir Sementsov-Ogievskiy wrote:
> 29.06.2018 20:30, John Snow wrote:
> > 
> > On 06/29/2018 11:15 AM, Vladimir Sementsov-Ogievskiy wrote:
> > > We need to synchronize backup job with reading from fleecing image
> > > like it was done in block/replication.c.
> > > 
> > > Otherwise, the following situation is theoretically possible:
> > > 
> > > 1. client start reading
> > > 2. client understand, that there is no corresponding cluster in
> > >     fleecing image
> > I don't think the client refocuses the read, but QEMU does. (the client
> > has no idea if it is reading from the fleecing node or the backing file.)
> > 
> > ... but understood:
> > 
> > > 3. client is going to read from backing file (i.e. active image)
> > > 4. guest writes to active image
> > My question here is if QEMU will allow reads and writes to interleave in
> > general. If the client is reading from the backing file, the active
> > image, will QEMU allow a write to modify that data before we're done
> > getting that data?
> 
> If I understand correctly: yes. Physical drives allow intersecting
> operations too, and there is no guarantee on the result. We have serializing
> requests, but in general only unaligned requests are marked serializing.

Copy on read (in the context of image streaming) is the feature which
actually introduced it. It has some similarities with the backup job, so
the idea of using it for backup, too, isn't exactly revolutionary.
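The interleaving in question can be modelled with a toy sketch (plain Python, not QEMU internals; the cluster layout is made up):

```python
# Model of the race the series describes: a client read that falls
# through to the active image can interleave with a guest write plus
# copy-on-write, so the client sees partly-new data.
active = bytearray(b'old-old!')          # active image, one "cluster"
fleecing = {}                            # fleecing image: cluster -> data

def client_read():
    # Cluster not in the fleecing image -> read the active image,
    # but yield halfway through to model request interleaving.
    if 0 not in fleecing:
        first = bytes(active[:4])
        yield                             # a guest write slips in here
        second = bytes(active[4:])
        yield first + second

def guest_write(data):
    if 0 not in fleecing:                 # backup(sync=none) copy-on-write
        fleecing[0] = bytes(active)
    active[:] = data

r = client_read()
next(r)                                   # client read has started
guest_write(b'new-new!')                  # guest overwrites the cluster
result = next(r)
assert result == b'old-new!'              # torn read: partly old, partly new
assert fleecing[0] == b'old-old!'         # the copy itself was correct
```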

Kevin


* Re: [Qemu-devel] [PATCH v2 2/3] block/fleecing-filter: new filter driver for fleecing
  2018-07-03 11:15         ` Kevin Wolf
@ 2018-07-03 11:52           ` Vladimir Sementsov-Ogievskiy
  2018-07-03 16:11           ` Vladimir Sementsov-Ogievskiy
  2018-07-04 14:07           ` Max Reitz
  2 siblings, 0 replies; 28+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-03 11:52 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Eric Blake, John Snow, qemu-devel, qemu-block, armbru, mreitz, famz, den

03.07.2018 14:15, Kevin Wolf wrote:
> On 02.07.2018 at 14:09, Vladimir Sementsov-Ogievskiy wrote:
>> 29.06.2018 20:40, Eric Blake wrote:
>>> On 06/29/2018 12:30 PM, John Snow wrote:
>>>>
>>>> On 06/29/2018 11:15 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>>> We need to synchronize backup job with reading from fleecing image
>>>>> like it was done in block/replication.c.
>>>>>
>>>>> Otherwise, the following situation is theoretically possible:
>>>>>
>>>>> 1. client start reading
>>>>> 2. client understand, that there is no corresponding cluster in
>>>>>      fleecing image
>>>> I don't think the client refocuses the read, but QEMU does. (the client
>>>> has no idea if it is reading from the fleecing node or the backing
>>>> file.)
>>>>
>>>> ... but understood:
>>>>
>>>>> 3. client is going to read from backing file (i.e. active image)
>>>>> 4. guest writes to active image
>>>> My question here is if QEMU will allow reads and writes to interleave in
>>>> general. If the client is reading from the backing file, the active
>>>> image, will QEMU allow a write to modify that data before we're done
>>>> getting that data?
>>>>
>>>>> 5. this write is stopped by backup(sync=none) and cluster is copied to
>>>>>      fleecing image
>>>>> 6. guest write continues...
>>>>> 7. and client reads _new_ (or partly new) date from active image
>>>>>
>>>> I can't answer this for myself one way or the other right now, but I
>>>> imagine you observed a failure in practice in your test setups, which
>>>> motivated this patch?
>>>>
>>>> A test or some proof would help justification for this patch. It would
>>>> also help prove that it solves what it claims to!
>>> In fact, do we really need a new filter, or do we just need to make the
>>> "sync":"none" blockdev-backup job take the appropriate synchronization
>>> locks?
>>>
>> How? We'd need an additional mechanism like serializing requests, or a
>> way to reuse the existing one. Using backup's internal synchronization
>> looks simpler, and it is already used in block replication.
> But it is also just an ugly hack

agree.

>   that fixes one special case and leaves
> everything else broken. replication is usually not a good example for
> anything. It always gives me bad surprises when I have to look at it.
>
> We'll have to figure out where to fix this problem (or what it really
> is, once you look more than just at fleecing), but I think requiring the
> user to add a filter driver to work around missing serialisation in
> other code, and corrupting their image if they forget to, is not a
> reasonable solution.
>
> I see at least two things wrong in this context:
>
> * The permissions don't seem to match reality. The NBD server
>    unconditionally shares PERM_WRITE, which is wrong in this case. The
>    client wants to see a point-in-time snapshot that never changes. This
>    should become an option so that it can be properly reflected in the
>    permissions used.
>
> * Once we have proper permissions, the fleecing setup breaks down
>    because the guest needs PERM_WRITE on the backing file, but the
>    fleecing overlay allows that only if the NBD client allows it (which
>    it doesn't for fleecing).
>
>    Now we can implement an exception right into backup that installs a
>    backup filter driver between source and target if the source is the
>    backing file of the target. The filter driver would be similar to the
>    commit filter driver in that it simply promises !PERM_WRITE to its
>    parents, but allows PERM_WRITE on the source because it has installed
>    the before_write_notifier that guarantees this condition.
>
>    All writes to the target that are made by the backup job in this setup
>    (including before_write_notifier writes) need to be marked as
>    serialising so that any concurrent reads are completed first.
>
> And if we decide to add a target filter to backup, we should probably at
> the same time use a filter driver for intercepting source writes instead
> of using before_write_notifier.

And this is the point where we could drop the backup job from the fleecing
scheme entirely, as backup(sync=none) is effectively just such a special
filter driver.

>
> Max, I think you intended to make both source and target children of the
> same block job node (or at least for mirror). But wouldn't that create
> loops in a setup like this? I think we may need two filters that are
> only connected through the block job, but not with block graph edges.
>
> Kevin

-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [PATCH v2 2/3] block/fleecing-filter: new filter driver for fleecing
  2018-07-03 11:15         ` Kevin Wolf
  2018-07-03 11:52           ` Vladimir Sementsov-Ogievskiy
@ 2018-07-03 16:11           ` Vladimir Sementsov-Ogievskiy
  2018-07-03 18:02             ` Kevin Wolf
  2018-07-04 14:07           ` Max Reitz
  2 siblings, 1 reply; 28+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-03 16:11 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Eric Blake, John Snow, qemu-devel, qemu-block, armbru, mreitz, famz, den

03.07.2018 14:15, Kevin Wolf wrote:
> On 02.07.2018 at 14:09, Vladimir Sementsov-Ogievskiy wrote:
>> 29.06.2018 20:40, Eric Blake wrote:
>>> On 06/29/2018 12:30 PM, John Snow wrote:
>>>>
>>>> On 06/29/2018 11:15 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>>> We need to synchronize backup job with reading from fleecing image
>>>>> like it was done in block/replication.c.
>>>>>
>>>>> Otherwise, the following situation is theoretically possible:
>>>>>
>>>>> 1. client start reading
>>>>> 2. client understand, that there is no corresponding cluster in
>>>>>      fleecing image
>>>> I don't think the client refocuses the read, but QEMU does. (the client
>>>> has no idea if it is reading from the fleecing node or the backing
>>>> file.)
>>>>
>>>> ... but understood:
>>>>
>>>>> 3. client is going to read from backing file (i.e. active image)
>>>>> 4. guest writes to active image
>>>> My question here is if QEMU will allow reads and writes to interleave in
>>>> general. If the client is reading from the backing file, the active
>>>> image, will QEMU allow a write to modify that data before we're done
>>>> getting that data?
>>>>
>>>>> 5. this write is stopped by backup(sync=none) and cluster is copied to
>>>>>      fleecing image
>>>>> 6. guest write continues...
>>>>> 7. and client reads _new_ (or partly new) date from active image
>>>>>
>>>> I can't answer this for myself one way or the other right now, but I
>>>> imagine you observed a failure in practice in your test setups, which
>>>> motivated this patch?
>>>>
>>>> A test or some proof would help justification for this patch. It would
>>>> also help prove that it solves what it claims to!
>>> In fact, do we really need a new filter, or do we just need to make the
>>> "sync":"none" blockdev-backup job take the appropriate synchronization
>>> locks?
>>>
>> How? We'd need an additional mechanism like serializing requests, or a
>> way to reuse the existing one. Using backup's internal synchronization
>> looks simpler, and it is already used in block replication.
> But it is also just an ugly hack that fixes one special case and leaves
> everything else broken. replication is usually not a good example for
> anything. It always gives me bad surprises when I have to look at it.
>
> We'll have to figure out where to fix this problem (or what it really
> is, once you look more than just at fleecing), but I think requiring the
> user to add a filter driver to work around missing serialisation in
> other code, and corrupting their image if they forget to, is not a
> reasonable solution.
>
> I see at least two things wrong in this context:
>
> * The permissions don't seem to match reality. The NBD server
>    unconditionally shares PERM_WRITE, which is wrong in this case. The
>    client wants to see a point-in-time snapshot that never changes. This
>    should become an option so that it can be properly reflected in the
>    permissions used.
>
> * Once we have proper permissions, the fleecing setup breaks down
>    because the guest needs PERM_WRITE on the backing file, but the
>    fleecing overlay allows that only if the NBD client allows it (which
>    it doesn't for fleecing).
>
>    Now we can implement an exception right into backup that installs a
>    backup filter driver between source and target if the source is the
>    backing file of the target. The filter driver would be similar to the
>    commit filter driver in that it simply promises !PERM_WRITE to its
>    parents, but allows PERM_WRITE on the source because it has installed
>    the before_write_notifier that guarantees this condition.
>
>    All writes to the target that are made by the backup job in this setup
>    (including before_write_notifier writes) need to be marked as
>    serialising so that any concurrent reads are completed first.
>
> And if we decide to add a target filter to backup, we should probably at
> the same time use a filter driver for intercepting source writes instead
> of using before_write_notifier.

Hmm, is it possible to do all the stuff in one super filter driver,
which we insert into the tree like this:

top blk        fleecing qcow2
      +           +
      |           |backing
      v     <-----+
    super filter
      +
      |file
      v
    active image


And the super filter would do the following:

1. copy-on-write: before forwarding a write to its file child, it does a
serializing write to the fleecing qcow2
2. fake .bdrv_child_perm for the fleecing qcow2, like in block commit

and no block job is needed.

>
> Max, I think you intended to make both source and target children of the
> same block job node (or at least for mirror). But wouldn't that create
> loops in a setup like this? I think we may need two filters that are
> only connected through the block job, but not with block graph edges.
>
> Kevin


-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [PATCH v2 2/3] block/fleecing-filter: new filter driver for fleecing
  2018-07-03 16:11           ` Vladimir Sementsov-Ogievskiy
@ 2018-07-03 18:02             ` Kevin Wolf
  0 siblings, 0 replies; 28+ messages in thread
From: Kevin Wolf @ 2018-07-03 18:02 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: Eric Blake, John Snow, qemu-devel, qemu-block, armbru, mreitz, famz, den

On 03.07.2018 at 18:11, Vladimir Sementsov-Ogievskiy wrote:
> 03.07.2018 14:15, Kevin Wolf wrote:
> > We'll have to figure out where to fix this problem (or what it really
> > is, once you look more than just at fleecing), but I think requiring the
> > user to add a filter driver to work around missing serialisation in
> > other code, and corrupting their image if they forget to, is not a
> > reasonable solution.
> > 
> > I see at least two things wrong in this context:
> > 
> > * The permissions don't seem to match reality. The NBD server
> >    unconditionally shares PERM_WRITE, which is wrong in this case. The
> >    client wants to see a point-in-time snapshot that never changes. This
> >    should become an option so that it can be properly reflected in the
> >    permissions used.
> > 
> > * Once we have proper permissions, the fleecing setup breaks down
> >    because the guest needs PERM_WRITE on the backing file, but the
> >    fleecing overlay allows that only if the NBD client allows it (which
> >    it doesn't for fleecing).
> > 
> >    Now we can implement an exception right into backup that installs a
> >    backup filter driver between source and target if the source is the
> >    backing file of the target. The filter driver would be similar to the
> >    commit filter driver in that it simply promises !PERM_WRITE to its
> >    parents, but allows PERM_WRITE on the source because it has installed
> >    the before_write_notifier that guarantees this condition.
> > 
> >    All writes to the target that are made by the backup job in this setup
> >    (including before_write_notifier writes) need to be marked as
> >    serialising so that any concurrent reads are completed first.
> > 
> > And if we decide to add a target filter to backup, we should probably at
> > the same time use a filter driver for intercepting source writes instead
> > of using before_write_notifier.
> 
> Hmm, is it possible to do all the stuff in one super filter driver, which we
> insert into the tree like this:
> 
> top blk        fleecing qcow2
>      +           +
>      |           |backing
>      v     <-----+
>    super filter
>      +
>      |file
>      v
>    active image
> 
> 
> And the super filter would do the following:
> 
> 1. copy-on-write: before forwarding a write to its file child, it does a
> serializing write to the fleecing qcow2

This is where it breaks down. The filter driver in your graph doesn't
know fleecing.qcow2, so it can't write to it. Attaching fleecing.qcow2
as an additional child to the super filter doesn't work either because
you would create a loop then.

I think we need two separate nodes (and probably it's better to have
them managed by a block job so that both together can be checked to
result in a consistent setup).
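The loop can be seen in a small graph sketch (node names follow the diagram above; the cycle check is a plain DFS, not QEMU code):

```python
# Block graph as parent -> children edges, matching Vladimir's diagram.
edges = {
    'top-blk':        ['super-filter'],
    'fleecing-qcow2': ['super-filter'],   # backing edge
    'super-filter':   ['active-image'],
    'active-image':   [],
}

def has_cycle(edges):
    # Depth-first search over the directed graph.
    def visit(node, stack):
        if node in stack:
            return True
        return any(visit(c, stack | {node}) for c in edges[node])
    return any(visit(n, set()) for n in edges)

assert not has_cycle(edges)   # the proposed graph itself is acyclic

# ...but letting the filter write to fleecing.qcow2 would need an edge
# super-filter -> fleecing-qcow2, which closes a loop:
edges['super-filter'].append('fleecing-qcow2')
assert has_cycle(edges)
```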

> 2. fake .bdrv_child_perm for the fleecing qcow2, like in block commit
> 
> and no block job is needed.

Kevin


* Re: [Qemu-devel] [PATCH v2 2/3] block/fleecing-filter: new filter driver for fleecing
  2018-07-03 11:15         ` Kevin Wolf
  2018-07-03 11:52           ` Vladimir Sementsov-Ogievskiy
  2018-07-03 16:11           ` Vladimir Sementsov-Ogievskiy
@ 2018-07-04 14:07           ` Max Reitz
  2 siblings, 0 replies; 28+ messages in thread
From: Max Reitz @ 2018-07-04 14:07 UTC (permalink / raw)
  To: Kevin Wolf, Vladimir Sementsov-Ogievskiy
  Cc: Eric Blake, John Snow, qemu-devel, qemu-block, armbru, famz, den


On 2018-07-03 13:15, Kevin Wolf wrote:
> On 02.07.2018 at 14:09, Vladimir Sementsov-Ogievskiy wrote:
>> 29.06.2018 20:40, Eric Blake wrote:
>>> On 06/29/2018 12:30 PM, John Snow wrote:
>>>>
>>>>
>>>> On 06/29/2018 11:15 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>>> We need to synchronize backup job with reading from fleecing image
>>>>> like it was done in block/replication.c.
>>>>>
>>>>> Otherwise, the following situation is theoretically possible:
>>>>>
>>>>> 1. client start reading
>>>>> 2. client understand, that there is no corresponding cluster in
>>>>>     fleecing image
>>>>
>>>> I don't think the client refocuses the read, but QEMU does. (the client
>>>> has no idea if it is reading from the fleecing node or the backing
>>>> file.)
>>>>
>>>> ... but understood:
>>>>
>>>>> 3. client is going to read from backing file (i.e. active image)
>>>>> 4. guest writes to active image
>>>>
>>>> My question here is if QEMU will allow reads and writes to interleave in
>>>> general. If the client is reading from the backing file, the active
>>>> image, will QEMU allow a write to modify that data before we're done
>>>> getting that data?
>>>>
>>>>> 5. this write is stopped by backup(sync=none) and cluster is copied to
>>>>>     fleecing image
>>>>> 6. guest write continues...
>>>>> 7. and client reads _new_ (or partly new) date from active image
>>>>>
>>>>
>>>> I can't answer this for myself one way or the other right now, but I
>>>> imagine you observed a failure in practice in your test setups, which
>>>> motivated this patch?
>>>>
>>>> A test or some proof would help justification for this patch. It would
>>>> also help prove that it solves what it claims to!
>>>
>>> In fact, do we really need a new filter, or do we just need to make the
>>> "sync":"none" blockdev-backup job take the appropriate synchronization
>>> locks?
>>>
>>
>> How? We'd need an additional mechanism like serializing requests, or a
>> way to reuse the existing one. Using backup's internal synchronization
>> looks simpler, and it is already used in block replication.
> 
> But it is also just an ugly hack that fixes one special case and leaves
> everything else broken. replication is usually not a good example for
> anything. It always gives me bad surprises when I have to look at it.
> 
> We'll have to figure out where to fix this problem (or what it really
> is, once you look more than just at fleecing), but I think requiring the
> user to add a filter driver to work around missing serialisation in
> other code, and corrupting their image if they forget to, is not a
> reasonable solution.
> 
> I see at least two things wrong in this context:
> 
> * The permissions don't seem to match reality. The NBD server
>   unconditionally shares PERM_WRITE, which is wrong in this case. The
>   client wants to see a point-in-time snapshot that never changes. This
>   should become an option so that it can be properly reflected in the
>   permissions used.
> 
> * Once we have proper permissions, the fleecing setup breaks down
>   because the guest needs PERM_WRITE on the backing file, but the
>   fleecing overlay allows that only if the NBD client allows it (which
>   it doesn't for fleecing).
> 
>   Now we can implement an exception right into backup that installs a
>   backup filter driver between source and target if the source is the
>   backing file of the target. The filter driver would be similar to the
>   commit filter driver in that it simply promises !PERM_WRITE to its
>   parents, but allows PERM_WRITE on the source because it has installed
>   the before_write_notifier that guarantees this condition.

I don't quite understand what you mean by "between source and target",
because *the* backup filter driver would need to be on top of both.
Unless you continue to use before-write notifiers, but I don't see why
you would, when a driver that sits on top of the source would intercept
all requests anyway.

And if we want to unify mirror and backup at any time, I think we have
to get away from before-write anyway.

>   All writes to the target that are made by the backup job in this setup
>   (including before_write_notifier writes) need to be marked as
>   serialising so that any concurrent reads are completed first.
> 
> And if we decide to add a target filter to backup, we should probably at
> the same time use a filter driver for intercepting source writes instead
> of using before_write_notifier.
> 
> Max, I think you intended to make both source and target children of the
> same block job node (or at least for mirror). But wouldn't that create
> loops in a setup like this? I think we may need two filters that are
> only connected through the block job, but not with block graph edges.

A node can only provide one data view to all of its parents.  It cannot
distinguish based on the parent.  Therefore, the main backup driver on
top of source and target would necessarily present the current disk
(changing) state, just like the mirror filter does.

Fleecing wants another view, though.  It wants an unchanging view.  It
therefore seems clear to me that we'd need two separate nodes: one
between guest and source (which performs the backup by intercepting
write requests), and one between the fleecing user (e.g. NBD) and the
target.  The former presents the current state, the latter presents the
"unchanging" point-in-time snapshot.

Sure, we can get away without the former (we do so today), but I don't
see why we would want to.  blockdev-copy (as an extension of
blockdev-mirror) would have such a node anyway, and I can see no reason
why we wouldn't attach both source and target to it.
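[The two views described above can be sketched as a toy model in plain
Python — not QEMU code, just the copy-before-write semantics: one node
intercepts guest writes and presents the current state, the other reads
the target first and presents the unchanging snapshot.]

```python
# Toy copy-before-write fleecing model: "source" is the live image the
# guest writes to, "target" is a sparse overlay holding the original
# contents of any cluster the guest has since overwritten.
class Fleecing:
    def __init__(self, source):
        self.source = source   # live image, seen by the guest
        self.target = {}       # sparse overlay: cluster index -> old data

    # Guest-facing node: intercepts writes, presents the current state.
    def write(self, idx, data):
        if idx not in self.target:          # copy-before-write: save the
            self.target[idx] = self.source[idx]  # old data exactly once
        self.source[idx] = data

    def read_current(self, idx):
        return self.source[idx]

    # Fleecing-user-facing node (e.g. NBD): unchanging snapshot view.
    # Reads overwritten clusters from the target, the rest from source.
    def read_snapshot(self, idx):
        return self.target.get(idx, self.source[idx])

disk = Fleecing(source=["a", "b", "c"])
disk.write(1, "B")                    # guest overwrites cluster 1
assert disk.read_current(1) == "B"    # live view sees the new data
assert disk.read_snapshot(1) == "b"   # snapshot view still sees old data
assert disk.read_snapshot(0) == "a"   # untouched clusters come from source
```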

Max




Thread overview: 28+ messages
2018-06-29 15:15 [Qemu-devel] [PATCH v2 0/3] image fleecing Vladimir Sementsov-Ogievskiy
2018-06-29 15:15 ` [Qemu-devel] [PATCH v2 1/3] blockdev-backup: enable non-root nodes for backup source Vladimir Sementsov-Ogievskiy
2018-06-29 17:13   ` Eric Blake
2018-06-29 17:31   ` John Snow
2018-06-29 15:15 ` [Qemu-devel] [PATCH v2 2/3] block/fleecing-filter: new filter driver for fleecing Vladimir Sementsov-Ogievskiy
2018-06-29 17:24   ` Eric Blake
2018-07-02  6:35     ` Fam Zheng
2018-07-02 11:27       ` Vladimir Sementsov-Ogievskiy
2018-07-02 11:47     ` Vladimir Sementsov-Ogievskiy
2018-06-29 17:30   ` John Snow
2018-06-29 17:40     ` Eric Blake
2018-07-02 12:09       ` Vladimir Sementsov-Ogievskiy
2018-07-03 11:15         ` Kevin Wolf
2018-07-03 11:52           ` Vladimir Sementsov-Ogievskiy
2018-07-03 16:11           ` Vladimir Sementsov-Ogievskiy
2018-07-03 18:02             ` Kevin Wolf
2018-07-04 14:07           ` Max Reitz
2018-07-02 11:57     ` Vladimir Sementsov-Ogievskiy
2018-07-03 11:22       ` Kevin Wolf
2018-06-29 15:15 ` [Qemu-devel] [PATCH v2 3/3] qemu-iotests: Image fleecing test case 222 Vladimir Sementsov-Ogievskiy
2018-06-29 15:31   ` Vladimir Sementsov-Ogievskiy
2018-06-29 17:58   ` Eric Blake
2018-06-29 21:04     ` John Snow
2018-07-02  6:45       ` Fam Zheng
2018-07-02 12:58       ` Vladimir Sementsov-Ogievskiy
2018-06-29 16:38 ` [Qemu-devel] [PATCH v2 0/3] image fleecing John Snow
2018-06-29 17:36   ` Vladimir Sementsov-Ogievskiy
2018-06-29 17:52     ` Vladimir Sementsov-Ogievskiy
