All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v4 0/4] fix image fleecing
@ 2018-07-06 18:30 Vladimir Sementsov-Ogievskiy
  2018-07-06 18:30 ` [Qemu-devel] [PATCH v4 1/4] block/io: fix copy_range Vladimir Sementsov-Ogievskiy
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-06 18:30 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: pl, pbonzini, ronniesahlberg, famz, stefanha, mreitz, kwolf,
	jcody, jsnow, den, vsementsov

Hi all.

This fixes image fleecing scheme for 3.0, details are in 04 patch.
01 is a significant fix too.

v4:
01: fix copy_range architecture here too
02: rebase on 01
03: rebase

v3:
02: fix typo in assert, to fix build

v2:
01,02: new patches

03: - improve comment
    - fix assert in bdrv_aligned_pwritev
    - add asserts to not use the flag on read requests
    - support copy_range

04: - expand "detected cases" range 
    - s/fleecing/serialize_target_writes
    - support backup_cow_with_offload
    - drop restriction on compressed writes


v1 cover:

It's a continuation of discussion under
"[PATCH v2 2/3] block/fleecing-filter: new filter driver for fleecing" [1].

Here is my try to implement Kevin's idea, that all backup writes (for
fleecing case) should be serialized. (However, I've skipped for now
fixing related permissions).

Looks like these patches may replace patch [1], to make fleecing scheme
safe. But I'm not sure, a look by Kevin is necessary.

A test is still needed, to prove that this patch is necessary and that it
works..

Vladimir Sementsov-Ogievskiy (4):
  block/io: fix copy_range
  block: split flags in copy_range
  block: add BDRV_REQ_SERIALISING flag
  block/backup: fix fleecing scheme: use serialized writes

 include/block/block.h          |  31 ++++++++-
 include/block/block_int.h      |  14 ++--
 include/sysemu/block-backend.h |   3 +-
 block/backup.c                 |  20 ++++--
 block/block-backend.c          |   5 +-
 block/file-posix.c             |  21 +++---
 block/io.c                     | 147 ++++++++++++++++++++++++++++-------------
 block/iscsi.c                  |   9 ++-
 block/qcow2.c                  |  20 +++---
 block/raw-format.c             |  24 ++++---
 qemu-img.c                     |   2 +-
 11 files changed, 207 insertions(+), 89 deletions(-)

-- 
2.11.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PATCH v4 1/4] block/io: fix copy_range
  2018-07-06 18:30 [Qemu-devel] [PATCH v4 0/4] fix image fleecing Vladimir Sementsov-Ogievskiy
@ 2018-07-06 18:30 ` Vladimir Sementsov-Ogievskiy
  2018-07-09  1:15   ` Fam Zheng
  2018-07-06 18:30 ` [Qemu-devel] [PATCH v4 2/4] block: split flags in copy_range Vladimir Sementsov-Ogievskiy
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-06 18:30 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: pl, pbonzini, ronniesahlberg, famz, stefanha, mreitz, kwolf,
	jcody, jsnow, den, vsementsov

Here two things are fixed:

1. Architecture

On each recursion step, we go to the child of src or dst, only for one
of them. So, it's wrong to create tracked requests for both on each
step. It leads to tracked requests duplication.

2. Wait for serializing requests on write path independently of
   BDRV_REQ_NO_SERIALISING

Before commit 9ded4a01149 "backup: Use copy offloading",
BDRV_REQ_NO_SERIALISING was used for only one case: read in
copy-on-write operation during backup. Also, the flag was handled only
on read path (in bdrv_co_preadv and bdrv_aligned_preadv).

After 9ded4a01149, flag is used for not waiting serializing operations
on backup target (in same case of copy-on-write operation). This
behavior change is unsubstantiated and potentially dangerous, let's
drop it and add additional asserts and documentation.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/block.h |  13 +++++++
 block/io.c            | 103 +++++++++++++++++++++++++++++++-------------------
 2 files changed, 78 insertions(+), 38 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index e5c7759a0c..a06a4d27de 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -50,6 +50,19 @@ typedef enum {
      * opened with BDRV_O_UNMAP.
      */
     BDRV_REQ_MAY_UNMAP          = 0x4,
+
+    /* The BDRV_REQ_NO_SERIALISING means that we don't want to
+     * wait_serialising_requests(), when reading.
+     *
+     * This flag is used for backup copy on write operation, when we need to
+     * read old data before write (write notifier triggered). It is ok, due to
+     * we already waited for serializing requests in initiative write (see
+     * bdrv_aligned_pwritev), and it is necessary for the case when initiative
+     * write is serializing itself (we'll dead lock waiting it).
+     *
+     * The described case is the only usage for the flag for now, so, it is
+     * supported only for read operation and restricted for write.
+     */
     BDRV_REQ_NO_SERIALISING     = 0x8,
     BDRV_REQ_FUA                = 0x10,
     BDRV_REQ_WRITE_COMPRESSED   = 0x20,
diff --git a/block/io.c b/block/io.c
index 1a2272fad3..621b21c455 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1572,6 +1572,8 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
     max_transfer = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX),
                                    align);
 
+    /* BDRV_REQ_NO_SERIALISING is only for read operation */
+    assert(!(flags & BDRV_REQ_NO_SERIALISING));
     waited = wait_serialising_requests(req);
     assert(!waited || !req->serialising);
     assert(req->overlap_offset <= offset);
@@ -2888,15 +2890,19 @@ void bdrv_unregister_buf(BlockDriverState *bs, void *host)
     }
 }
 
-static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src,
-                                                    uint64_t src_offset,
-                                                    BdrvChild *dst,
-                                                    uint64_t dst_offset,
-                                                    uint64_t bytes,
-                                                    BdrvRequestFlags flags,
-                                                    bool recurse_src)
+/* Common part of bdrv_co_copy_range_from and bdrv_co_copy_range_to.
+ *
+ * Return -errno on failure,
+ *        0 if successfully handled by bdrv_co_pwrite_zeroes
+ *        1 to continue copy_range operation
+ */
+static int coroutine_fn bdrv_co_copy_range_check(BdrvChild *src,
+                                                 uint64_t src_offset,
+                                                 BdrvChild *dst,
+                                                 uint64_t dst_offset,
+                                                 uint64_t bytes,
+                                                 BdrvRequestFlags flags)
 {
-    BdrvTrackedRequest src_req, dst_req;
     int ret;
 
     if (!dst || !dst->bs) {
@@ -2923,33 +2929,8 @@ static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src,
         || src->bs->encrypted || dst->bs->encrypted) {
         return -ENOTSUP;
     }
-    bdrv_inc_in_flight(src->bs);
-    bdrv_inc_in_flight(dst->bs);
-    tracked_request_begin(&src_req, src->bs, src_offset,
-                          bytes, BDRV_TRACKED_READ);
-    tracked_request_begin(&dst_req, dst->bs, dst_offset,
-                          bytes, BDRV_TRACKED_WRITE);
 
-    if (!(flags & BDRV_REQ_NO_SERIALISING)) {
-        wait_serialising_requests(&src_req);
-        wait_serialising_requests(&dst_req);
-    }
-    if (recurse_src) {
-        ret = src->bs->drv->bdrv_co_copy_range_from(src->bs,
-                                                    src, src_offset,
-                                                    dst, dst_offset,
-                                                    bytes, flags);
-    } else {
-        ret = dst->bs->drv->bdrv_co_copy_range_to(dst->bs,
-                                                  src, src_offset,
-                                                  dst, dst_offset,
-                                                  bytes, flags);
-    }
-    tracked_request_end(&src_req);
-    tracked_request_end(&dst_req);
-    bdrv_dec_in_flight(src->bs);
-    bdrv_dec_in_flight(dst->bs);
-    return ret;
+    return 1;
 }
 
 /* Copy range from @src to @dst.
@@ -2960,8 +2941,31 @@ int coroutine_fn bdrv_co_copy_range_from(BdrvChild *src, uint64_t src_offset,
                                          BdrvChild *dst, uint64_t dst_offset,
                                          uint64_t bytes, BdrvRequestFlags flags)
 {
-    return bdrv_co_copy_range_internal(src, src_offset, dst, dst_offset,
-                                       bytes, flags, true);
+    BdrvTrackedRequest req;
+    int ret;
+
+    ret = bdrv_co_copy_range_check(src, src_offset, dst, dst_offset, bytes,
+                                   flags);
+    if (ret <= 0) {
+        return ret;
+    }
+
+    bdrv_inc_in_flight(src->bs);
+    tracked_request_begin(&req, src->bs, src_offset, bytes, BDRV_TRACKED_READ);
+
+    if (!(flags & BDRV_REQ_NO_SERIALISING)) {
+        wait_serialising_requests(&req);
+    }
+
+    ret = src->bs->drv->bdrv_co_copy_range_from(src->bs,
+                                                src, src_offset,
+                                                dst, dst_offset,
+                                                bytes, flags);
+
+    tracked_request_end(&req);
+    bdrv_dec_in_flight(src->bs);
+
+    return ret;
 }
 
 /* Copy range from @src to @dst.
@@ -2972,8 +2976,31 @@ int coroutine_fn bdrv_co_copy_range_to(BdrvChild *src, uint64_t src_offset,
                                        BdrvChild *dst, uint64_t dst_offset,
                                        uint64_t bytes, BdrvRequestFlags flags)
 {
-    return bdrv_co_copy_range_internal(src, src_offset, dst, dst_offset,
-                                       bytes, flags, false);
+    BdrvTrackedRequest req;
+    int ret;
+
+    ret = bdrv_co_copy_range_check(src, src_offset, dst, dst_offset, bytes,
+                                   flags);
+    if (ret <= 0) {
+        return ret;
+    }
+
+    bdrv_inc_in_flight(dst->bs);
+    tracked_request_begin(&req, dst->bs, dst_offset, bytes, BDRV_TRACKED_WRITE);
+
+    /* BDRV_REQ_NO_SERIALISING is only for read operation, so we ignore it in
+     * flags. */
+    wait_serialising_requests(&req);
+
+    ret = dst->bs->drv->bdrv_co_copy_range_to(dst->bs,
+                                              src, src_offset,
+                                              dst, dst_offset,
+                                              bytes, flags);
+
+    tracked_request_end(&req);
+    bdrv_dec_in_flight(dst->bs);
+
+    return ret;
 }
 
 int coroutine_fn bdrv_co_copy_range(BdrvChild *src, uint64_t src_offset,
-- 
2.11.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PATCH v4 2/4] block: split flags in copy_range
  2018-07-06 18:30 [Qemu-devel] [PATCH v4 0/4] fix image fleecing Vladimir Sementsov-Ogievskiy
  2018-07-06 18:30 ` [Qemu-devel] [PATCH v4 1/4] block/io: fix copy_range Vladimir Sementsov-Ogievskiy
@ 2018-07-06 18:30 ` Vladimir Sementsov-Ogievskiy
  2018-07-06 18:30 ` [Qemu-devel] [PATCH v4 3/4] block: add BDRV_REQ_SERIALISING flag Vladimir Sementsov-Ogievskiy
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-06 18:30 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: pl, pbonzini, ronniesahlberg, famz, stefanha, mreitz, kwolf,
	jcody, jsnow, den, vsementsov

Pass read flags and write flags separately. This is needed to handle
coming BDRV_REQ_NO_SERIALISING clearly in following patches.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/block.h          |  3 ++-
 include/block/block_int.h      | 14 ++++++++++----
 include/sysemu/block-backend.h |  3 ++-
 block/backup.c                 |  2 +-
 block/block-backend.c          |  5 +++--
 block/file-posix.c             | 21 +++++++++++++--------
 block/io.c                     | 34 ++++++++++++++++++++--------------
 block/iscsi.c                  |  9 ++++++---
 block/qcow2.c                  | 20 +++++++++++---------
 block/raw-format.c             | 24 ++++++++++++++++--------
 qemu-img.c                     |  2 +-
 11 files changed, 85 insertions(+), 52 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index a06a4d27de..478ebc6c6c 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -685,5 +685,6 @@ void bdrv_unregister_buf(BlockDriverState *bs, void *host);
  **/
 int coroutine_fn bdrv_co_copy_range(BdrvChild *src, uint64_t src_offset,
                                     BdrvChild *dst, uint64_t dst_offset,
-                                    uint64_t bytes, BdrvRequestFlags flags);
+                                    uint64_t bytes, BdrvRequestFlags read_flags,
+                                    BdrvRequestFlags write_flags);
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index af71b414be..119b6e4ea5 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -218,7 +218,8 @@ struct BlockDriver {
                                                 BdrvChild *dst,
                                                 uint64_t dst_offset,
                                                 uint64_t bytes,
-                                                BdrvRequestFlags flags);
+                                                BdrvRequestFlags read_flags,
+                                                BdrvRequestFlags write_flags);
 
     /* Map [offset, offset + nbytes) range onto a child of bs to copy data to,
      * and invoke bdrv_co_copy_range_to(child, src, ...), or perform the copy
@@ -234,7 +235,8 @@ struct BlockDriver {
                                               BdrvChild *dst,
                                               uint64_t dst_offset,
                                               uint64_t bytes,
-                                              BdrvRequestFlags flags);
+                                              BdrvRequestFlags read_flags,
+                                              BdrvRequestFlags write_flags);
 
     /*
      * Building block for bdrv_block_status[_above] and
@@ -1153,10 +1155,14 @@ void blockdev_close_all_bdrv_states(void);
 
 int coroutine_fn bdrv_co_copy_range_from(BdrvChild *src, uint64_t src_offset,
                                          BdrvChild *dst, uint64_t dst_offset,
-                                         uint64_t bytes, BdrvRequestFlags flags);
+                                         uint64_t bytes,
+                                         BdrvRequestFlags read_flags,
+                                         BdrvRequestFlags write_flags);
 int coroutine_fn bdrv_co_copy_range_to(BdrvChild *src, uint64_t src_offset,
                                        BdrvChild *dst, uint64_t dst_offset,
-                                       uint64_t bytes, BdrvRequestFlags flags);
+                                       uint64_t bytes,
+                                       BdrvRequestFlags read_flags,
+                                       BdrvRequestFlags write_flags);
 
 int refresh_total_sectors(BlockDriverState *bs, int64_t hint);
 
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 8d03d493c2..830d873f24 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -234,6 +234,7 @@ void blk_unregister_buf(BlockBackend *blk, void *host);
 
 int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, int64_t off_in,
                                    BlockBackend *blk_out, int64_t off_out,
-                                   int bytes, BdrvRequestFlags flags);
+                                   int bytes, BdrvRequestFlags read_flags,
+                                   BdrvRequestFlags write_flags);
 
 #endif
diff --git a/block/backup.c b/block/backup.c
index 81895ddbe2..f3e4e814b6 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -163,7 +163,7 @@ static int coroutine_fn backup_cow_with_offload(BackupBlockJob *job,
     hbitmap_reset(job->copy_bitmap, start / job->cluster_size,
                   nr_clusters);
     ret = blk_co_copy_range(blk, start, job->target, start, nbytes,
-                            is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0);
+                            is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0, 0);
     if (ret < 0) {
         trace_backup_do_cow_copy_range_fail(job, start, ret);
         hbitmap_set(job->copy_bitmap, start / job->cluster_size,
diff --git a/block/block-backend.c b/block/block-backend.c
index 6b75bca317..ac8c3e0b1c 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2218,7 +2218,8 @@ void blk_unregister_buf(BlockBackend *blk, void *host)
 
 int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, int64_t off_in,
                                    BlockBackend *blk_out, int64_t off_out,
-                                   int bytes, BdrvRequestFlags flags)
+                                   int bytes, BdrvRequestFlags read_flags,
+                                   BdrvRequestFlags write_flags)
 {
     int r;
     r = blk_check_byte_request(blk_in, off_in, bytes);
@@ -2231,5 +2232,5 @@ int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, int64_t off_in,
     }
     return bdrv_co_copy_range(blk_in->root, off_in,
                               blk_out->root, off_out,
-                              bytes, flags);
+                              bytes, read_flags, write_flags);
 }
diff --git a/block/file-posix.c b/block/file-posix.c
index 829ee538d8..2f753f0856 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2578,18 +2578,23 @@ static void raw_abort_perm_update(BlockDriverState *bs)
     raw_handle_perm_lock(bs, RAW_PL_ABORT, 0, 0, NULL);
 }
 
-static int coroutine_fn raw_co_copy_range_from(BlockDriverState *bs,
-                                               BdrvChild *src, uint64_t src_offset,
-                                               BdrvChild *dst, uint64_t dst_offset,
-                                               uint64_t bytes, BdrvRequestFlags flags)
+static int coroutine_fn raw_co_copy_range_from(
+        BlockDriverState *bs, BdrvChild *src, uint64_t src_offset,
+        BdrvChild *dst, uint64_t dst_offset, uint64_t bytes,
+        BdrvRequestFlags read_flags, BdrvRequestFlags write_flags)
 {
-    return bdrv_co_copy_range_to(src, src_offset, dst, dst_offset, bytes, flags);
+    return bdrv_co_copy_range_to(src, src_offset, dst, dst_offset, bytes,
+                                 read_flags, write_flags);
 }
 
 static int coroutine_fn raw_co_copy_range_to(BlockDriverState *bs,
-                                             BdrvChild *src, uint64_t src_offset,
-                                             BdrvChild *dst, uint64_t dst_offset,
-                                             uint64_t bytes, BdrvRequestFlags flags)
+                                             BdrvChild *src,
+                                             uint64_t src_offset,
+                                             BdrvChild *dst,
+                                             uint64_t dst_offset,
+                                             uint64_t bytes,
+                                             BdrvRequestFlags read_flags,
+                                             BdrvRequestFlags write_flags)
 {
     BDRVRawState *s = bs->opaque;
     BDRVRawState *src_s;
diff --git a/block/io.c b/block/io.c
index 621b21c455..ba5a136d11 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2901,7 +2901,8 @@ static int coroutine_fn bdrv_co_copy_range_check(BdrvChild *src,
                                                  BdrvChild *dst,
                                                  uint64_t dst_offset,
                                                  uint64_t bytes,
-                                                 BdrvRequestFlags flags)
+                                                 BdrvRequestFlags read_flags,
+                                                 BdrvRequestFlags write_flags)
 {
     int ret;
 
@@ -2912,8 +2913,8 @@ static int coroutine_fn bdrv_co_copy_range_check(BdrvChild *src,
     if (ret) {
         return ret;
     }
-    if (flags & BDRV_REQ_ZERO_WRITE) {
-        return bdrv_co_pwrite_zeroes(dst, dst_offset, bytes, flags);
+    if (write_flags & BDRV_REQ_ZERO_WRITE) {
+        return bdrv_co_pwrite_zeroes(dst, dst_offset, bytes, write_flags);
     }
 
     if (!src || !src->bs) {
@@ -2939,13 +2940,15 @@ static int coroutine_fn bdrv_co_copy_range_check(BdrvChild *src,
  * semantics. */
 int coroutine_fn bdrv_co_copy_range_from(BdrvChild *src, uint64_t src_offset,
                                          BdrvChild *dst, uint64_t dst_offset,
-                                         uint64_t bytes, BdrvRequestFlags flags)
+                                         uint64_t bytes,
+                                         BdrvRequestFlags read_flags,
+                                         BdrvRequestFlags write_flags)
 {
     BdrvTrackedRequest req;
     int ret;
 
     ret = bdrv_co_copy_range_check(src, src_offset, dst, dst_offset, bytes,
-                                   flags);
+                                   read_flags, write_flags);
     if (ret <= 0) {
         return ret;
     }
@@ -2953,14 +2956,14 @@ int coroutine_fn bdrv_co_copy_range_from(BdrvChild *src, uint64_t src_offset,
     bdrv_inc_in_flight(src->bs);
     tracked_request_begin(&req, src->bs, src_offset, bytes, BDRV_TRACKED_READ);
 
-    if (!(flags & BDRV_REQ_NO_SERIALISING)) {
+    if (!(read_flags & BDRV_REQ_NO_SERIALISING)) {
         wait_serialising_requests(&req);
     }
 
     ret = src->bs->drv->bdrv_co_copy_range_from(src->bs,
                                                 src, src_offset,
                                                 dst, dst_offset,
-                                                bytes, flags);
+                                                bytes, read_flags, write_flags);
 
     tracked_request_end(&req);
     bdrv_dec_in_flight(src->bs);
@@ -2974,13 +2977,15 @@ int coroutine_fn bdrv_co_copy_range_from(BdrvChild *src, uint64_t src_offset,
  * semantics. */
 int coroutine_fn bdrv_co_copy_range_to(BdrvChild *src, uint64_t src_offset,
                                        BdrvChild *dst, uint64_t dst_offset,
-                                       uint64_t bytes, BdrvRequestFlags flags)
+                                       uint64_t bytes,
+                                       BdrvRequestFlags read_flags,
+                                       BdrvRequestFlags write_flags)
 {
     BdrvTrackedRequest req;
     int ret;
 
     ret = bdrv_co_copy_range_check(src, src_offset, dst, dst_offset, bytes,
-                                   flags);
+                                   read_flags, write_flags);
     if (ret <= 0) {
         return ret;
     }
@@ -2988,14 +2993,14 @@ int coroutine_fn bdrv_co_copy_range_to(BdrvChild *src, uint64_t src_offset,
     bdrv_inc_in_flight(dst->bs);
     tracked_request_begin(&req, dst->bs, dst_offset, bytes, BDRV_TRACKED_WRITE);
 
-    /* BDRV_REQ_NO_SERIALISING is only for read operation, so we ignore it in
-     * flags. */
+    /* BDRV_REQ_NO_SERIALISING is only for read operation */
+    assert(!(write_flags & BDRV_REQ_NO_SERIALISING));
     wait_serialising_requests(&req);
 
     ret = dst->bs->drv->bdrv_co_copy_range_to(dst->bs,
                                               src, src_offset,
                                               dst, dst_offset,
-                                              bytes, flags);
+                                              bytes, read_flags, write_flags);
 
     tracked_request_end(&req);
     bdrv_dec_in_flight(dst->bs);
@@ -3005,11 +3010,12 @@ int coroutine_fn bdrv_co_copy_range_to(BdrvChild *src, uint64_t src_offset,
 
 int coroutine_fn bdrv_co_copy_range(BdrvChild *src, uint64_t src_offset,
                                     BdrvChild *dst, uint64_t dst_offset,
-                                    uint64_t bytes, BdrvRequestFlags flags)
+                                    uint64_t bytes, BdrvRequestFlags read_flags,
+                                    BdrvRequestFlags write_flags)
 {
     return bdrv_co_copy_range_from(src, src_offset,
                                    dst, dst_offset,
-                                   bytes, flags);
+                                   bytes, read_flags, write_flags);
 }
 
 static void bdrv_parent_cb_resize(BlockDriverState *bs)
diff --git a/block/iscsi.c b/block/iscsi.c
index ead2bd5aa7..38b917a1e5 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -2193,9 +2193,11 @@ static int coroutine_fn iscsi_co_copy_range_from(BlockDriverState *bs,
                                                  BdrvChild *dst,
                                                  uint64_t dst_offset,
                                                  uint64_t bytes,
-                                                 BdrvRequestFlags flags)
+                                                 BdrvRequestFlags read_flags,
+                                                 BdrvRequestFlags write_flags)
 {
-    return bdrv_co_copy_range_to(src, src_offset, dst, dst_offset, bytes, flags);
+    return bdrv_co_copy_range_to(src, src_offset, dst, dst_offset, bytes,
+                                 read_flags, write_flags);
 }
 
 static struct scsi_task *iscsi_xcopy_task(int param_len)
@@ -2332,7 +2334,8 @@ static int coroutine_fn iscsi_co_copy_range_to(BlockDriverState *bs,
                                                BdrvChild *dst,
                                                uint64_t dst_offset,
                                                uint64_t bytes,
-                                               BdrvRequestFlags flags)
+                                               BdrvRequestFlags read_flags,
+                                               BdrvRequestFlags write_flags)
 {
     IscsiLun *dst_lun = dst->bs->opaque;
     IscsiLun *src_lun;
diff --git a/block/qcow2.c b/block/qcow2.c
index 2f9e58e0c4..6855356fb9 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3246,13 +3246,14 @@ static int coroutine_fn
 qcow2_co_copy_range_from(BlockDriverState *bs,
                          BdrvChild *src, uint64_t src_offset,
                          BdrvChild *dst, uint64_t dst_offset,
-                         uint64_t bytes, BdrvRequestFlags flags)
+                         uint64_t bytes, BdrvRequestFlags read_flags,
+                         BdrvRequestFlags write_flags)
 {
     BDRVQcow2State *s = bs->opaque;
     int ret;
     unsigned int cur_bytes; /* number of bytes in current iteration */
     BdrvChild *child = NULL;
-    BdrvRequestFlags cur_flags;
+    BdrvRequestFlags cur_write_flags;
 
     assert(!bs->encrypted);
     qemu_co_mutex_lock(&s->lock);
@@ -3261,7 +3262,7 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
         uint64_t copy_offset = 0;
         /* prepare next request */
         cur_bytes = MIN(bytes, INT_MAX);
-        cur_flags = flags;
+        cur_write_flags = write_flags;
 
         ret = qcow2_get_cluster_offset(bs, src_offset, &cur_bytes, &copy_offset);
         if (ret < 0) {
@@ -3273,20 +3274,20 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
             if (bs->backing && bs->backing->bs) {
                 int64_t backing_length = bdrv_getlength(bs->backing->bs);
                 if (src_offset >= backing_length) {
-                    cur_flags |= BDRV_REQ_ZERO_WRITE;
+                    cur_write_flags |= BDRV_REQ_ZERO_WRITE;
                 } else {
                     child = bs->backing;
                     cur_bytes = MIN(cur_bytes, backing_length - src_offset);
                     copy_offset = src_offset;
                 }
             } else {
-                cur_flags |= BDRV_REQ_ZERO_WRITE;
+                cur_write_flags |= BDRV_REQ_ZERO_WRITE;
             }
             break;
 
         case QCOW2_CLUSTER_ZERO_PLAIN:
         case QCOW2_CLUSTER_ZERO_ALLOC:
-            cur_flags |= BDRV_REQ_ZERO_WRITE;
+            cur_write_flags |= BDRV_REQ_ZERO_WRITE;
             break;
 
         case QCOW2_CLUSTER_COMPRESSED:
@@ -3310,7 +3311,7 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
         ret = bdrv_co_copy_range_from(child,
                                       copy_offset,
                                       dst, dst_offset,
-                                      cur_bytes, cur_flags);
+                                      cur_bytes, read_flags, cur_write_flags);
         qemu_co_mutex_lock(&s->lock);
         if (ret < 0) {
             goto out;
@@ -3331,7 +3332,8 @@ static int coroutine_fn
 qcow2_co_copy_range_to(BlockDriverState *bs,
                        BdrvChild *src, uint64_t src_offset,
                        BdrvChild *dst, uint64_t dst_offset,
-                       uint64_t bytes, BdrvRequestFlags flags)
+                       uint64_t bytes, BdrvRequestFlags read_flags,
+                       BdrvRequestFlags write_flags)
 {
     BDRVQcow2State *s = bs->opaque;
     int offset_in_cluster;
@@ -3375,7 +3377,7 @@ qcow2_co_copy_range_to(BlockDriverState *bs,
         ret = bdrv_co_copy_range_to(src, src_offset,
                                     bs->file,
                                     cluster_offset + offset_in_cluster,
-                                    cur_bytes, flags);
+                                    cur_bytes, read_flags, write_flags);
         qemu_co_mutex_lock(&s->lock);
         if (ret < 0) {
             goto fail;
diff --git a/block/raw-format.c b/block/raw-format.c
index b78da564d4..a3591985f6 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -498,9 +498,13 @@ static int raw_probe_geometry(BlockDriverState *bs, HDGeometry *geo)
 }
 
 static int coroutine_fn raw_co_copy_range_from(BlockDriverState *bs,
-                                               BdrvChild *src, uint64_t src_offset,
-                                               BdrvChild *dst, uint64_t dst_offset,
-                                               uint64_t bytes, BdrvRequestFlags flags)
+                                               BdrvChild *src,
+                                               uint64_t src_offset,
+                                               BdrvChild *dst,
+                                               uint64_t dst_offset,
+                                               uint64_t bytes,
+                                               BdrvRequestFlags read_flags,
+                                               BdrvRequestFlags write_flags)
 {
     int ret;
 
@@ -509,13 +513,17 @@ static int coroutine_fn raw_co_copy_range_from(BlockDriverState *bs,
         return ret;
     }
     return bdrv_co_copy_range_from(bs->file, src_offset, dst, dst_offset,
-                                   bytes, flags);
+                                   bytes, read_flags, write_flags);
 }
 
 static int coroutine_fn raw_co_copy_range_to(BlockDriverState *bs,
-                                             BdrvChild *src, uint64_t src_offset,
-                                             BdrvChild *dst, uint64_t dst_offset,
-                                             uint64_t bytes, BdrvRequestFlags flags)
+                                             BdrvChild *src,
+                                             uint64_t src_offset,
+                                             BdrvChild *dst,
+                                             uint64_t dst_offset,
+                                             uint64_t bytes,
+                                             BdrvRequestFlags read_flags,
+                                             BdrvRequestFlags write_flags)
 {
     int ret;
 
@@ -524,7 +532,7 @@ static int coroutine_fn raw_co_copy_range_to(BlockDriverState *bs,
         return ret;
     }
     return bdrv_co_copy_range_to(src, src_offset, bs->file, dst_offset, bytes,
-                                 flags);
+                                 read_flags, write_flags);
 }
 
 BlockDriver bdrv_raw = {
diff --git a/qemu-img.c b/qemu-img.c
index e1a506f7f6..e417c872db 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1783,7 +1783,7 @@ static int coroutine_fn convert_co_copy_range(ImgConvertState *s, int64_t sector
 
         ret = blk_co_copy_range(blk, offset, s->target,
                                 sector_num << BDRV_SECTOR_BITS,
-                                n << BDRV_SECTOR_BITS, 0);
+                                n << BDRV_SECTOR_BITS, 0, 0);
         if (ret < 0) {
             return ret;
         }
-- 
2.11.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PATCH v4 3/4] block: add BDRV_REQ_SERIALISING flag
  2018-07-06 18:30 [Qemu-devel] [PATCH v4 0/4] fix image fleecing Vladimir Sementsov-Ogievskiy
  2018-07-06 18:30 ` [Qemu-devel] [PATCH v4 1/4] block/io: fix copy_range Vladimir Sementsov-Ogievskiy
  2018-07-06 18:30 ` [Qemu-devel] [PATCH v4 2/4] block: split flags in copy_range Vladimir Sementsov-Ogievskiy
@ 2018-07-06 18:30 ` Vladimir Sementsov-Ogievskiy
  2018-07-06 18:30 ` [Qemu-devel] [PATCH v4 4/4] block/backup: fix fleecing scheme: use serialized writes Vladimir Sementsov-Ogievskiy
  2018-07-06 21:55 ` [Qemu-devel] [PATCH v4 0/4] fix image fleecing Eric Blake
  4 siblings, 0 replies; 11+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-06 18:30 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: pl, pbonzini, ronniesahlberg, famz, stefanha, mreitz, kwolf,
	jcody, jsnow, den, vsementsov

Serialized writes should be used in copy-on-write of backup(sync=none)
for image fleecing scheme.

We need to change an assert in bdrv_aligned_pwritev, added in
28de2dcd88de. The assert may fail now, because call to
wait_serialising_requests here may become first call to it for this
request with serializing flag set. It occurs if the request is aligned
(otherwise, we should already set serializing flag before calling
bdrv_aligned_pwritev and correspondingly waited for all intersecting
requests). However, for aligned requests, we should not care about
outdating of previously read data, as there no such data. Therefore,
let's just update an assert to not care about aligned requests.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/block.h | 15 ++++++++++++++-
 block/io.c            | 26 +++++++++++++++++++++++++-
 2 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 478ebc6c6c..fded1b7657 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -71,8 +71,21 @@ typedef enum {
      * content. */
     BDRV_REQ_WRITE_UNCHANGED    = 0x40,
 
+    /* BDRV_REQ_SERIALISING forces request serializing. Only for writes. Used
+     * to serialize writes to target in backup process, when source is in
+     * backing chain of target (image fleecing scheme is example) to avoid a
+     * possibility for a client, reading from target during backup to read
+     * updated data from source in case of unhappy race of client-read and
+     * backup-cow-write.
+     *
+     * Note, that BDRV_REQ_SERIALISING is _not_ opposite in meaning to
+     * BDRV_REQ_NO_SERIALISING. May be, better name for the latter is
+     * _DO_NOT_WAIT_FOR_SERIALISING, but it is too long.
+     */
+    BDRV_REQ_SERIALISING        = 0x80,
+
     /* Mask of valid flags */
-    BDRV_REQ_MASK               = 0x7f,
+    BDRV_REQ_MASK               = 0xff,
 } BdrvRequestFlags;
 
 typedef struct BlockSizes {
diff --git a/block/io.c b/block/io.c
index ba5a136d11..ecb35881d0 100644
--- a/block/io.c
+++ b/block/io.c
@@ -623,6 +623,16 @@ static void mark_request_serialising(BdrvTrackedRequest *req, uint64_t align)
     req->overlap_bytes = MAX(req->overlap_bytes, overlap_bytes);
 }
 
+static bool is_request_serialising_and_aligned(BdrvTrackedRequest *req)
+{
+    /* if request is serialising, overlap_offset and overlap_bytes are set, so
+     * we can check is request aligned. Otherwise don't care and return false
+     */
+
+    return req->serialising && (req->offset == req->overlap_offset) &&
+           (req->bytes == req->overlap_bytes);
+}
+
 /**
  * Round a region to cluster boundaries
  */
@@ -1291,6 +1301,9 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child,
         mark_request_serialising(req, bdrv_get_cluster_size(bs));
     }
 
+    /* BDRV_REQ_SERIALISING is only for write operation */
+    assert(!(flags & BDRV_REQ_SERIALISING));
+
     if (!(flags & BDRV_REQ_NO_SERIALISING)) {
         wait_serialising_requests(req);
     }
@@ -1574,8 +1587,14 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
 
     /* BDRV_REQ_NO_SERIALISING is only for read operation */
     assert(!(flags & BDRV_REQ_NO_SERIALISING));
+
+    if (flags & BDRV_REQ_SERIALISING) {
+        mark_request_serialising(req, bdrv_get_cluster_size(bs));
+    }
+
     waited = wait_serialising_requests(req);
-    assert(!waited || !req->serialising);
+    assert(!waited || !req->serialising ||
+           is_request_serialising_and_aligned(req));
     assert(req->overlap_offset <= offset);
     assert(offset + bytes <= req->overlap_offset + req->overlap_bytes);
     if (flags & BDRV_REQ_WRITE_UNCHANGED) {
@@ -2956,6 +2975,8 @@ int coroutine_fn bdrv_co_copy_range_from(BdrvChild *src, uint64_t src_offset,
     bdrv_inc_in_flight(src->bs);
     tracked_request_begin(&req, src->bs, src_offset, bytes, BDRV_TRACKED_READ);
 
+    /* BDRV_REQ_SERIALISING is only for write operation */
+    assert(!(read_flags & BDRV_REQ_SERIALISING));
     if (!(read_flags & BDRV_REQ_NO_SERIALISING)) {
         wait_serialising_requests(&req);
     }
@@ -2995,6 +3016,9 @@ int coroutine_fn bdrv_co_copy_range_to(BdrvChild *src, uint64_t src_offset,
 
     /* BDRV_REQ_NO_SERIALISING is only for read operation */
     assert(!(write_flags & BDRV_REQ_NO_SERIALISING));
+    if (write_flags & BDRV_REQ_SERIALISING) {
+        mark_request_serialising(&req, bdrv_get_cluster_size(dst->bs));
+    }
     wait_serialising_requests(&req);
 
     ret = dst->bs->drv->bdrv_co_copy_range_to(dst->bs,
-- 
2.11.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PATCH v4 4/4] block/backup: fix fleecing scheme: use serialized writes
  2018-07-06 18:30 [Qemu-devel] [PATCH v4 0/4] fix image fleecing Vladimir Sementsov-Ogievskiy
                   ` (2 preceding siblings ...)
  2018-07-06 18:30 ` [Qemu-devel] [PATCH v4 3/4] block: add BDRV_REQ_SERIALISING flag Vladimir Sementsov-Ogievskiy
@ 2018-07-06 18:30 ` Vladimir Sementsov-Ogievskiy
  2018-07-06 21:55 ` [Qemu-devel] [PATCH v4 0/4] fix image fleecing Eric Blake
  4 siblings, 0 replies; 11+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-06 18:30 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: pl, pbonzini, ronniesahlberg, famz, stefanha, mreitz, kwolf,
	jcody, jsnow, den, vsementsov

Fleecing scheme works as follows: we want a kind of temporary snapshot
of active drive A. We create temporary image B, with B->backing = A.
Then we start backup(sync=none) from A to B. From this point, B reads
as point-in-time snapshot of A (A continues to be active drive,
accepting guest IO).

This scheme needs some additional synchronization between reads from B
and backup COW operations, otherwise, the following situation is
theoretically possible:

(assume B is qcow2, client is NBD client, reading from B)

1. client starts reading and take qcow2 mutex in qcow2_co_preadv, and
   goes up to l2 table loading (assume cache miss)

2) guest write => backup COW => qcow2 write =>
   try to take qcow2 mutex => waiting

3. l2 table loaded, we see that cluster is UNALLOCATED, go to
   "case QCOW2_CLUSTER_UNALLOCATED" and unlock mutex before
   bdrv_co_preadv(bs->backing, ...)

4) aha, mutex unlocked, backup COW continues, and we finally finish
   guest write and change cluster in our active disk A

5. actually, do bdrv_co_preadv(bs->backing, ...) and read
   _new updated_ data.

To avoid this, let's make backup writes serializing, to not intersect
with reads from B.

Note: we expand range of handled cases from (sync=none and
B->backing = A) to just (A in backing chain of B), to finally allow
safe reading from B during backup for all cases when A in backing chain
of B, i.e. B formally looks like point-in-time snapshot of A.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/backup.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index f3e4e814b6..319fc922e8 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -47,6 +47,8 @@ typedef struct BackupBlockJob {
     HBitmap *copy_bitmap;
     bool use_copy_range;
     int64_t copy_range_size;
+
+    bool serialize_target_writes;
 } BackupBlockJob;
 
 static const BlockJobDriver backup_job_driver;
@@ -102,6 +104,8 @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
     QEMUIOVector qiov;
     BlockBackend *blk = job->common.blk;
     int nbytes;
+    int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
+    int write_flags = job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0;
 
     hbitmap_reset(job->copy_bitmap, start / job->cluster_size, 1);
     nbytes = MIN(job->cluster_size, job->len - start);
@@ -112,8 +116,7 @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
     iov.iov_len = nbytes;
     qemu_iovec_init_external(&qiov, &iov, 1);
 
-    ret = blk_co_preadv(blk, start, qiov.size, &qiov,
-                        is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0);
+    ret = blk_co_preadv(blk, start, qiov.size, &qiov, read_flags);
     if (ret < 0) {
         trace_backup_do_cow_read_fail(job, start, ret);
         if (error_is_read) {
@@ -124,11 +127,11 @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
 
     if (qemu_iovec_is_zero(&qiov)) {
         ret = blk_co_pwrite_zeroes(job->target, start,
-                                   qiov.size, BDRV_REQ_MAY_UNMAP);
+                                   qiov.size, write_flags | BDRV_REQ_MAY_UNMAP);
     } else {
         ret = blk_co_pwritev(job->target, start,
-                             qiov.size, &qiov,
-                             job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0);
+                             qiov.size, &qiov, write_flags |
+                             (job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0));
     }
     if (ret < 0) {
         trace_backup_do_cow_write_fail(job, start, ret);
@@ -156,6 +159,8 @@ static int coroutine_fn backup_cow_with_offload(BackupBlockJob *job,
     int nr_clusters;
     BlockBackend *blk = job->common.blk;
     int nbytes;
+    int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
+    int write_flags = job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0;
 
     assert(QEMU_IS_ALIGNED(job->copy_range_size, job->cluster_size));
     nbytes = MIN(job->copy_range_size, end - start);
@@ -163,7 +168,7 @@ static int coroutine_fn backup_cow_with_offload(BackupBlockJob *job,
     hbitmap_reset(job->copy_bitmap, start / job->cluster_size,
                   nr_clusters);
     ret = blk_co_copy_range(blk, start, job->target, start, nbytes,
-                            is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0, 0);
+                            read_flags, write_flags);
     if (ret < 0) {
         trace_backup_do_cow_copy_range_fail(job, start, ret);
         hbitmap_set(job->copy_bitmap, start / job->cluster_size,
@@ -701,6 +706,9 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
                        sync_bitmap : NULL;
     job->compress = compress;
 
+    /* Detect image-fleecing (and similar) schemes */
+    job->serialize_target_writes = bdrv_chain_contains(target, bs);
+
     /* If there is no backing file on the target, we cannot rely on COW if our
      * backup cluster size is smaller than the target cluster size. Even for
      * targets with a backing file, try to avoid COW if possible. */
-- 
2.11.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH v4 0/4] fix image fleecing
  2018-07-06 18:30 [Qemu-devel] [PATCH v4 0/4] fix image fleecing Vladimir Sementsov-Ogievskiy
                   ` (3 preceding siblings ...)
  2018-07-06 18:30 ` [Qemu-devel] [PATCH v4 4/4] block/backup: fix fleecing scheme: use serialized writes Vladimir Sementsov-Ogievskiy
@ 2018-07-06 21:55 ` Eric Blake
  4 siblings, 0 replies; 11+ messages in thread
From: Eric Blake @ 2018-07-06 21:55 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: kwolf, famz, ronniesahlberg, jcody, pl, mreitz, stefanha, den,
	pbonzini, jsnow

On 07/06/2018 01:30 PM, Vladimir Sementsov-Ogievskiy wrote:
> Hi all.
> 
> This fixes image fleecing scheme for 3.0, details are in 04 patch.
> 01 is a significant fix too.
> 
> v4:
> 01: fix copy_range architecture here too
> 02: rebase on 01
> 03: rebase

Hmm - I commented on v3 before seeing you'd already sent v4; my comments 
on grammar suggestions are still applicable.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH v4 1/4] block/io: fix copy_range
  2018-07-06 18:30 ` [Qemu-devel] [PATCH v4 1/4] block/io: fix copy_range Vladimir Sementsov-Ogievskiy
@ 2018-07-09  1:15   ` Fam Zheng
  2018-07-09  9:43     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 11+ messages in thread
From: Fam Zheng @ 2018-07-09  1:15 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel, qemu-block, pl, pbonzini, ronniesahlberg, stefanha,
	mreitz, kwolf, jcody, jsnow, den

On Fri, 07/06 21:30, Vladimir Sementsov-Ogievskiy wrote:
> Here two things are fixed:
> 
> 1. Architecture
> 
> On each recursion step, we go to the child of src or dst, only for one
> of them. So, it's wrong to create tracked requests for both on each
> step. It leads to tracked requests duplication.
> 
> 2. Wait for serializing requests on write path independently of
>    BDRV_REQ_NO_SERIALISING
> 
> Before commit 9ded4a01149 "backup: Use copy offloading",
> BDRV_REQ_NO_SERIALISING was used for only one case: read in
> copy-on-write operation during backup. Also, the flag was handled only
> on read path (in bdrv_co_preadv and bdrv_aligned_preadv).
> 
> After 9ded4a01149, flag is used for not waiting serializing operations
> on backup target (in same case of copy-on-write operation). This
> behavior change is unsubstantiated and potentially dangerous, let's
> drop it and add additional asserts and documentation.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  include/block/block.h |  13 +++++++
>  block/io.c            | 103 +++++++++++++++++++++++++++++++-------------------
>  2 files changed, 78 insertions(+), 38 deletions(-)
> 
> diff --git a/include/block/block.h b/include/block/block.h
> index e5c7759a0c..a06a4d27de 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -50,6 +50,19 @@ typedef enum {
>       * opened with BDRV_O_UNMAP.
>       */
>      BDRV_REQ_MAY_UNMAP          = 0x4,
> +
> +    /* The BDRV_REQ_NO_SERIALISING means that we don't want to
> +     * wait_serialising_requests(), when reading.
> +     *
> +     * This flag is used for backup copy on write operation, when we need to
> +     * read old data before write (write notifier triggered). It is ok, due to
> +     * we already waited for serializing requests in initiative write (see
> +     * bdrv_aligned_pwritev), and it is necessary for the case when initiative
> +     * write is serializing itself (we'll dead lock waiting it).
> +     *
> +     * The described case is the only usage for the flag for now, so, it is
> +     * supported only for read operation and restricted for write.
> +     */
>      BDRV_REQ_NO_SERIALISING     = 0x8,
>      BDRV_REQ_FUA                = 0x10,
>      BDRV_REQ_WRITE_COMPRESSED   = 0x20,
> diff --git a/block/io.c b/block/io.c
> index 1a2272fad3..621b21c455 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1572,6 +1572,8 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
>      max_transfer = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX),
>                                     align);
>  
> +    /* BDRV_REQ_NO_SERIALISING is only for read operation */
> +    assert(!(flags & BDRV_REQ_NO_SERIALISING));
>      waited = wait_serialising_requests(req);
>      assert(!waited || !req->serialising);
>      assert(req->overlap_offset <= offset);
> @@ -2888,15 +2890,19 @@ void bdrv_unregister_buf(BlockDriverState *bs, void *host)
>      }
>  }
>  
> -static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src,
> -                                                    uint64_t src_offset,
> -                                                    BdrvChild *dst,
> -                                                    uint64_t dst_offset,
> -                                                    uint64_t bytes,
> -                                                    BdrvRequestFlags flags,
> -                                                    bool recurse_src)
> +/* Common part of bdrv_co_copy_range_from and bdrv_co_copy_range_to.
> + *
> + * Return -errno on failure,
> + *        0 if successfully handled by bdrv_co_pwrite_zeroes
> + *        1 to continue copy_range operation
> + */
> +static int coroutine_fn bdrv_co_copy_range_check(BdrvChild *src,
> +                                                 uint64_t src_offset,
> +                                                 BdrvChild *dst,
> +                                                 uint64_t dst_offset,
> +                                                 uint64_t bytes,
> +                                                 BdrvRequestFlags flags)
>  {
> -    BdrvTrackedRequest src_req, dst_req;
>      int ret;
>  
>      if (!dst || !dst->bs) {
> @@ -2923,33 +2929,8 @@ static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src,
>          || src->bs->encrypted || dst->bs->encrypted) {
>          return -ENOTSUP;
>      }
> -    bdrv_inc_in_flight(src->bs);
> -    bdrv_inc_in_flight(dst->bs);
> -    tracked_request_begin(&src_req, src->bs, src_offset,
> -                          bytes, BDRV_TRACKED_READ);
> -    tracked_request_begin(&dst_req, dst->bs, dst_offset,
> -                          bytes, BDRV_TRACKED_WRITE);
>  
> -    if (!(flags & BDRV_REQ_NO_SERIALISING)) {
> -        wait_serialising_requests(&src_req);
> -        wait_serialising_requests(&dst_req);
> -    }
> -    if (recurse_src) {
> -        ret = src->bs->drv->bdrv_co_copy_range_from(src->bs,
> -                                                    src, src_offset,
> -                                                    dst, dst_offset,
> -                                                    bytes, flags);
> -    } else {
> -        ret = dst->bs->drv->bdrv_co_copy_range_to(dst->bs,
> -                                                  src, src_offset,
> -                                                  dst, dst_offset,
> -                                                  bytes, flags);
> -    }
> -    tracked_request_end(&src_req);
> -    tracked_request_end(&dst_req);
> -    bdrv_dec_in_flight(src->bs);
> -    bdrv_dec_in_flight(dst->bs);
> -    return ret;
> +    return 1;
>  }
>  
>  /* Copy range from @src to @dst.
> @@ -2960,8 +2941,31 @@ int coroutine_fn bdrv_co_copy_range_from(BdrvChild *src, uint64_t src_offset,
>                                           BdrvChild *dst, uint64_t dst_offset,
>                                           uint64_t bytes, BdrvRequestFlags flags)
>  {
> -    return bdrv_co_copy_range_internal(src, src_offset, dst, dst_offset,
> -                                       bytes, flags, true);
> +    BdrvTrackedRequest req;
> +    int ret;
> +
> +    ret = bdrv_co_copy_range_check(src, src_offset, dst, dst_offset, bytes,
> +                                   flags);

I don't like a function called _check to already do I/O here. Instead, I think
this is cleaner:

---


diff --git a/block/io.c b/block/io.c
index 1a2272fad3..694a94dfae 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2923,32 +2923,34 @@ static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src,
         || src->bs->encrypted || dst->bs->encrypted) {
         return -ENOTSUP;
     }
-    bdrv_inc_in_flight(src->bs);
-    bdrv_inc_in_flight(dst->bs);
-    tracked_request_begin(&src_req, src->bs, src_offset,
-                          bytes, BDRV_TRACKED_READ);
-    tracked_request_begin(&dst_req, dst->bs, dst_offset,
-                          bytes, BDRV_TRACKED_WRITE);
 
-    if (!(flags & BDRV_REQ_NO_SERIALISING)) {
-        wait_serialising_requests(&src_req);
-        wait_serialising_requests(&dst_req);
-    }
     if (recurse_src) {
+        bdrv_inc_in_flight(src->bs);
+        tracked_request_begin(&src_req, src->bs, src_offset,
+                              bytes, BDRV_TRACKED_READ);
+        if (!(flags & BDRV_REQ_NO_SERIALISING)) {
+            wait_serialising_requests(&src_req);
+        }
         ret = src->bs->drv->bdrv_co_copy_range_from(src->bs,
                                                     src, src_offset,
                                                     dst, dst_offset,
                                                     bytes, flags);
+        tracked_request_end(&src_req);
+        bdrv_dec_in_flight(src->bs);
     } else {
+        bdrv_inc_in_flight(dst->bs);
+        tracked_request_begin(&dst_req, dst->bs, dst_offset,
+                              bytes, BDRV_TRACKED_WRITE);
+        /* BDRV_REQ_NO_SERIALISING is only for read operation, so we ignore it
+         * in flags. */
+        wait_serialising_requests(&dst_req);
         ret = dst->bs->drv->bdrv_co_copy_range_to(dst->bs,
                                                   src, src_offset,
                                                   dst, dst_offset,
                                                   bytes, flags);
+        tracked_request_end(&dst_req);
+        bdrv_dec_in_flight(dst->bs);
     }
-    tracked_request_end(&src_req);
-    tracked_request_end(&dst_req);
-    bdrv_dec_in_flight(src->bs);
-    bdrv_dec_in_flight(dst->bs);
     return ret;
 }
 

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH v4 1/4] block/io: fix copy_range
  2018-07-09  1:15   ` Fam Zheng
@ 2018-07-09  9:43     ` Vladimir Sementsov-Ogievskiy
  2018-07-09 13:17       ` Fam Zheng
  0 siblings, 1 reply; 11+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-09  9:43 UTC (permalink / raw)
  To: Fam Zheng
  Cc: qemu-devel, qemu-block, pl, pbonzini, ronniesahlberg, stefanha,
	mreitz, kwolf, jcody, jsnow, den

09.07.2018 04:15, Fam Zheng wrote:
> On Fri, 07/06 21:30, Vladimir Sementsov-Ogievskiy wrote:
>> Here two things are fixed:
>>
>> 1. Architecture
>>
>> On each recursion step, we go to the child of src or dst, only for one
>> of them. So, it's wrong to create tracked requests for both on each
>> step. It leads to tracked requests duplication.
>>
>> 2. Wait for serializing requests on write path independently of
>>     BDRV_REQ_NO_SERIALISING
>>
>> Before commit 9ded4a01149 "backup: Use copy offloading",
>> BDRV_REQ_NO_SERIALISING was used for only one case: read in
>> copy-on-write operation during backup. Also, the flag was handled only
>> on read path (in bdrv_co_preadv and bdrv_aligned_preadv).
>>
>> After 9ded4a01149, flag is used for not waiting serializing operations
>> on backup target (in same case of copy-on-write operation). This
>> behavior change is unsubstantiated and potentially dangerous, let's
>> drop it and add additional asserts and documentation.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   include/block/block.h |  13 +++++++
>>   block/io.c            | 103 +++++++++++++++++++++++++++++++-------------------
>>   2 files changed, 78 insertions(+), 38 deletions(-)
>>
>> diff --git a/include/block/block.h b/include/block/block.h
>> index e5c7759a0c..a06a4d27de 100644
>> --- a/include/block/block.h
>> +++ b/include/block/block.h
>> @@ -50,6 +50,19 @@ typedef enum {
>>        * opened with BDRV_O_UNMAP.
>>        */
>>       BDRV_REQ_MAY_UNMAP          = 0x4,
>> +
>> +    /* The BDRV_REQ_NO_SERIALISING means that we don't want to
>> +     * wait_serialising_requests(), when reading.
>> +     *
>> +     * This flag is used for backup copy on write operation, when we need to
>> +     * read old data before write (write notifier triggered). It is ok, due to
>> +     * we already waited for serializing requests in initiative write (see
>> +     * bdrv_aligned_pwritev), and it is necessary for the case when initiative
>> +     * write is serializing itself (we'll dead lock waiting it).
>> +     *
>> +     * The described case is the only usage for the flag for now, so, it is
>> +     * supported only for read operation and restricted for write.
>> +     */
>>       BDRV_REQ_NO_SERIALISING     = 0x8,
>>       BDRV_REQ_FUA                = 0x10,
>>       BDRV_REQ_WRITE_COMPRESSED   = 0x20,
>> diff --git a/block/io.c b/block/io.c
>> index 1a2272fad3..621b21c455 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -1572,6 +1572,8 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
>>       max_transfer = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX),
>>                                      align);
>>   
>> +    /* BDRV_REQ_NO_SERIALISING is only for read operation */
>> +    assert(!(flags & BDRV_REQ_NO_SERIALISING));
>>       waited = wait_serialising_requests(req);
>>       assert(!waited || !req->serialising);
>>       assert(req->overlap_offset <= offset);
>> @@ -2888,15 +2890,19 @@ void bdrv_unregister_buf(BlockDriverState *bs, void *host)
>>       }
>>   }
>>   
>> -static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src,
>> -                                                    uint64_t src_offset,
>> -                                                    BdrvChild *dst,
>> -                                                    uint64_t dst_offset,
>> -                                                    uint64_t bytes,
>> -                                                    BdrvRequestFlags flags,
>> -                                                    bool recurse_src)
>> +/* Common part of bdrv_co_copy_range_from and bdrv_co_copy_range_to.
>> + *
>> + * Return -errno on failure,
>> + *        0 if successfully handled by bdrv_co_pwrite_zeroes
>> + *        1 to continue copy_range operation
>> + */
>> +static int coroutine_fn bdrv_co_copy_range_check(BdrvChild *src,
>> +                                                 uint64_t src_offset,
>> +                                                 BdrvChild *dst,
>> +                                                 uint64_t dst_offset,
>> +                                                 uint64_t bytes,
>> +                                                 BdrvRequestFlags flags)
>>   {
>> -    BdrvTrackedRequest src_req, dst_req;
>>       int ret;
>>   
>>       if (!dst || !dst->bs) {
>> @@ -2923,33 +2929,8 @@ static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src,
>>           || src->bs->encrypted || dst->bs->encrypted) {
>>           return -ENOTSUP;
>>       }
>> -    bdrv_inc_in_flight(src->bs);
>> -    bdrv_inc_in_flight(dst->bs);
>> -    tracked_request_begin(&src_req, src->bs, src_offset,
>> -                          bytes, BDRV_TRACKED_READ);
>> -    tracked_request_begin(&dst_req, dst->bs, dst_offset,
>> -                          bytes, BDRV_TRACKED_WRITE);
>>   
>> -    if (!(flags & BDRV_REQ_NO_SERIALISING)) {
>> -        wait_serialising_requests(&src_req);
>> -        wait_serialising_requests(&dst_req);
>> -    }
>> -    if (recurse_src) {
>> -        ret = src->bs->drv->bdrv_co_copy_range_from(src->bs,
>> -                                                    src, src_offset,
>> -                                                    dst, dst_offset,
>> -                                                    bytes, flags);
>> -    } else {
>> -        ret = dst->bs->drv->bdrv_co_copy_range_to(dst->bs,
>> -                                                  src, src_offset,
>> -                                                  dst, dst_offset,
>> -                                                  bytes, flags);
>> -    }
>> -    tracked_request_end(&src_req);
>> -    tracked_request_end(&dst_req);
>> -    bdrv_dec_in_flight(src->bs);
>> -    bdrv_dec_in_flight(dst->bs);
>> -    return ret;
>> +    return 1;
>>   }
>>   
>>   /* Copy range from @src to @dst.
>> @@ -2960,8 +2941,31 @@ int coroutine_fn bdrv_co_copy_range_from(BdrvChild *src, uint64_t src_offset,
>>                                            BdrvChild *dst, uint64_t dst_offset,
>>                                            uint64_t bytes, BdrvRequestFlags flags)
>>   {
>> -    return bdrv_co_copy_range_internal(src, src_offset, dst, dst_offset,
>> -                                       bytes, flags, true);
>> +    BdrvTrackedRequest req;
>> +    int ret;
>> +
>> +    ret = bdrv_co_copy_range_check(src, src_offset, dst, dst_offset, bytes,
>> +                                   flags);
> I don't like a function called _check to already do I/O here. Instead, I think
> this is cleaner:
>
> ---
>
>
> diff --git a/block/io.c b/block/io.c
> index 1a2272fad3..694a94dfae 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2923,32 +2923,34 @@ static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src,
>           || src->bs->encrypted || dst->bs->encrypted) {
>           return -ENOTSUP;
>       }
> -    bdrv_inc_in_flight(src->bs);
> -    bdrv_inc_in_flight(dst->bs);
> -    tracked_request_begin(&src_req, src->bs, src_offset,
> -                          bytes, BDRV_TRACKED_READ);
> -    tracked_request_begin(&dst_req, dst->bs, dst_offset,
> -                          bytes, BDRV_TRACKED_WRITE);
>   
> -    if (!(flags & BDRV_REQ_NO_SERIALISING)) {
> -        wait_serialising_requests(&src_req);
> -        wait_serialising_requests(&dst_req);
> -    }
>       if (recurse_src) {
> +        bdrv_inc_in_flight(src->bs);
> +        tracked_request_begin(&src_req, src->bs, src_offset,
> +                              bytes, BDRV_TRACKED_READ);
> +        if (!(flags & BDRV_REQ_NO_SERIALISING)) {
> +            wait_serialising_requests(&src_req);
> +        }
>           ret = src->bs->drv->bdrv_co_copy_range_from(src->bs,
>                                                       src, src_offset,
>                                                       dst, dst_offset,
>                                                       bytes, flags);
> +        tracked_request_end(&src_req);
> +        bdrv_dec_in_flight(src->bs);
>       } else {
> +        bdrv_inc_in_flight(dst->bs);
> +        tracked_request_begin(&dst_req, dst->bs, dst_offset,
> +                              bytes, BDRV_TRACKED_WRITE);
> +        /* BDRV_REQ_NO_SERIALISING is only for read operation, so we ignore it
> +         * in flags. */
> +        wait_serialising_requests(&dst_req);
>           ret = dst->bs->drv->bdrv_co_copy_range_to(dst->bs,
>                                                     src, src_offset,
>                                                     dst, dst_offset,
>                                                     bytes, flags);
> +        tracked_request_end(&dst_req);
> +        bdrv_dec_in_flight(dst->bs);
>       }
> -    tracked_request_end(&src_req);
> -    tracked_request_end(&dst_req);
> -    bdrv_dec_in_flight(src->bs);
> -    bdrv_dec_in_flight(dst->bs);
>       return ret;
>   }
>   

A matter of taste, I think. I decided, that such way only stresses that 
these functions have more different than similar content and went 
another one.

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH v4 1/4] block/io: fix copy_range
  2018-07-09  9:43     ` Vladimir Sementsov-Ogievskiy
@ 2018-07-09 13:17       ` Fam Zheng
  2018-07-09 14:38         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 11+ messages in thread
From: Fam Zheng @ 2018-07-09 13:17 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel, qemu-block, pl, pbonzini, ronniesahlberg, stefanha,
	mreitz, kwolf, jcody, jsnow, den

On Mon, 07/09 12:43, Vladimir Sementsov-Ogievskiy wrote:
> 09.07.2018 04:15, Fam Zheng wrote:
> > On Fri, 07/06 21:30, Vladimir Sementsov-Ogievskiy wrote:
> > > Here two things are fixed:
> > > 
> > > 1. Architecture
> > > 
> > > On each recursion step, we go to the child of src or dst, only for one
> > > of them. So, it's wrong to create tracked requests for both on each
> > > step. It leads to tracked requests duplication.
> > > 
> > > 2. Wait for serializing requests on write path independently of
> > >     BDRV_REQ_NO_SERIALISING
> > > 
> > > Before commit 9ded4a01149 "backup: Use copy offloading",
> > > BDRV_REQ_NO_SERIALISING was used for only one case: read in
> > > copy-on-write operation during backup. Also, the flag was handled only
> > > on read path (in bdrv_co_preadv and bdrv_aligned_preadv).
> > > 
> > > After 9ded4a01149, flag is used for not waiting serializing operations
> > > on backup target (in same case of copy-on-write operation). This
> > > behavior change is unsubstantiated and potentially dangerous, let's
> > > drop it and add additional asserts and documentation.
> > > 
> > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> > > ---
> > >   include/block/block.h |  13 +++++++
> > >   block/io.c            | 103 +++++++++++++++++++++++++++++++-------------------
> > >   2 files changed, 78 insertions(+), 38 deletions(-)
> > > 
> > > diff --git a/include/block/block.h b/include/block/block.h
> > > index e5c7759a0c..a06a4d27de 100644
> > > --- a/include/block/block.h
> > > +++ b/include/block/block.h
> > > @@ -50,6 +50,19 @@ typedef enum {
> > >        * opened with BDRV_O_UNMAP.
> > >        */
> > >       BDRV_REQ_MAY_UNMAP          = 0x4,
> > > +
> > > +    /* The BDRV_REQ_NO_SERIALISING means that we don't want to
> > > +     * wait_serialising_requests(), when reading.
> > > +     *
> > > +     * This flag is used for backup copy on write operation, when we need to
> > > +     * read old data before write (write notifier triggered). It is ok, due to
> > > +     * we already waited for serializing requests in initiative write (see
> > > +     * bdrv_aligned_pwritev), and it is necessary for the case when initiative
> > > +     * write is serializing itself (we'll dead lock waiting it).
> > > +     *
> > > +     * The described case is the only usage for the flag for now, so, it is
> > > +     * supported only for read operation and restricted for write.
> > > +     */
> > >       BDRV_REQ_NO_SERIALISING     = 0x8,
> > >       BDRV_REQ_FUA                = 0x10,
> > >       BDRV_REQ_WRITE_COMPRESSED   = 0x20,
> > > diff --git a/block/io.c b/block/io.c
> > > index 1a2272fad3..621b21c455 100644
> > > --- a/block/io.c
> > > +++ b/block/io.c
> > > @@ -1572,6 +1572,8 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
> > >       max_transfer = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX),
> > >                                      align);
> > > +    /* BDRV_REQ_NO_SERIALISING is only for read operation */
> > > +    assert(!(flags & BDRV_REQ_NO_SERIALISING));
> > >       waited = wait_serialising_requests(req);
> > >       assert(!waited || !req->serialising);
> > >       assert(req->overlap_offset <= offset);
> > > @@ -2888,15 +2890,19 @@ void bdrv_unregister_buf(BlockDriverState *bs, void *host)
> > >       }
> > >   }
> > > -static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src,
> > > -                                                    uint64_t src_offset,
> > > -                                                    BdrvChild *dst,
> > > -                                                    uint64_t dst_offset,
> > > -                                                    uint64_t bytes,
> > > -                                                    BdrvRequestFlags flags,
> > > -                                                    bool recurse_src)
> > > +/* Common part of bdrv_co_copy_range_from and bdrv_co_copy_range_to.
> > > + *
> > > + * Return -errno on failure,
> > > + *        0 if successfully handled by bdrv_co_pwrite_zeroes
> > > + *        1 to continue copy_range operation
> > > + */
> > > +static int coroutine_fn bdrv_co_copy_range_check(BdrvChild *src,
> > > +                                                 uint64_t src_offset,
> > > +                                                 BdrvChild *dst,
> > > +                                                 uint64_t dst_offset,
> > > +                                                 uint64_t bytes,
> > > +                                                 BdrvRequestFlags flags)
> > >   {
> > > -    BdrvTrackedRequest src_req, dst_req;
> > >       int ret;
> > >       if (!dst || !dst->bs) {
> > > @@ -2923,33 +2929,8 @@ static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src,
> > >           || src->bs->encrypted || dst->bs->encrypted) {
> > >           return -ENOTSUP;
> > >       }
> > > -    bdrv_inc_in_flight(src->bs);
> > > -    bdrv_inc_in_flight(dst->bs);
> > > -    tracked_request_begin(&src_req, src->bs, src_offset,
> > > -                          bytes, BDRV_TRACKED_READ);
> > > -    tracked_request_begin(&dst_req, dst->bs, dst_offset,
> > > -                          bytes, BDRV_TRACKED_WRITE);
> > > -    if (!(flags & BDRV_REQ_NO_SERIALISING)) {
> > > -        wait_serialising_requests(&src_req);
> > > -        wait_serialising_requests(&dst_req);
> > > -    }
> > > -    if (recurse_src) {
> > > -        ret = src->bs->drv->bdrv_co_copy_range_from(src->bs,
> > > -                                                    src, src_offset,
> > > -                                                    dst, dst_offset,
> > > -                                                    bytes, flags);
> > > -    } else {
> > > -        ret = dst->bs->drv->bdrv_co_copy_range_to(dst->bs,
> > > -                                                  src, src_offset,
> > > -                                                  dst, dst_offset,
> > > -                                                  bytes, flags);
> > > -    }
> > > -    tracked_request_end(&src_req);
> > > -    tracked_request_end(&dst_req);
> > > -    bdrv_dec_in_flight(src->bs);
> > > -    bdrv_dec_in_flight(dst->bs);
> > > -    return ret;
> > > +    return 1;
> > >   }
> > >   /* Copy range from @src to @dst.
> > > @@ -2960,8 +2941,31 @@ int coroutine_fn bdrv_co_copy_range_from(BdrvChild *src, uint64_t src_offset,
> > >                                            BdrvChild *dst, uint64_t dst_offset,
> > >                                            uint64_t bytes, BdrvRequestFlags flags)
> > >   {
> > > -    return bdrv_co_copy_range_internal(src, src_offset, dst, dst_offset,
> > > -                                       bytes, flags, true);
> > > +    BdrvTrackedRequest req;
> > > +    int ret;
> > > +
> > > +    ret = bdrv_co_copy_range_check(src, src_offset, dst, dst_offset, bytes,
> > > +                                   flags);
> > I don't like a function called _check to already do I/O here. Instead, I think
> > this is cleaner:
> > 
> > ---
> > 
> > 
> > diff --git a/block/io.c b/block/io.c
> > index 1a2272fad3..694a94dfae 100644
> > --- a/block/io.c
> > +++ b/block/io.c
> > @@ -2923,32 +2923,34 @@ static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src,
> >           || src->bs->encrypted || dst->bs->encrypted) {
> >           return -ENOTSUP;
> >       }
> > -    bdrv_inc_in_flight(src->bs);
> > -    bdrv_inc_in_flight(dst->bs);
> > -    tracked_request_begin(&src_req, src->bs, src_offset,
> > -                          bytes, BDRV_TRACKED_READ);
> > -    tracked_request_begin(&dst_req, dst->bs, dst_offset,
> > -                          bytes, BDRV_TRACKED_WRITE);
> > -    if (!(flags & BDRV_REQ_NO_SERIALISING)) {
> > -        wait_serialising_requests(&src_req);
> > -        wait_serialising_requests(&dst_req);
> > -    }
> >       if (recurse_src) {
> > +        bdrv_inc_in_flight(src->bs);
> > +        tracked_request_begin(&src_req, src->bs, src_offset,
> > +                              bytes, BDRV_TRACKED_READ);
> > +        if (!(flags & BDRV_REQ_NO_SERIALISING)) {
> > +            wait_serialising_requests(&src_req);
> > +        }
> >           ret = src->bs->drv->bdrv_co_copy_range_from(src->bs,
> >                                                       src, src_offset,
> >                                                       dst, dst_offset,
> >                                                       bytes, flags);
> > +        tracked_request_end(&src_req);
> > +        bdrv_dec_in_flight(src->bs);
> >       } else {
> > +        bdrv_inc_in_flight(dst->bs);
> > +        tracked_request_begin(&dst_req, dst->bs, dst_offset,
> > +                              bytes, BDRV_TRACKED_WRITE);
> > +        /* BDRV_REQ_NO_SERIALISING is only for read operation, so we ignore it
> > +         * in flags. */
> > +        wait_serialising_requests(&dst_req);
> >           ret = dst->bs->drv->bdrv_co_copy_range_to(dst->bs,
> >                                                     src, src_offset,
> >                                                     dst, dst_offset,
> >                                                     bytes, flags);
> > +        tracked_request_end(&dst_req);
> > +        bdrv_dec_in_flight(dst->bs);
> >       }
> > -    tracked_request_end(&src_req);
> > -    tracked_request_end(&dst_req);
> > -    bdrv_dec_in_flight(src->bs);
> > -    bdrv_dec_in_flight(dst->bs);
> >       return ret;
> >   }
> 
> A matter of taste, I think. I decided, that such way only stresses that
> these functions have more different than similar content and went another
> one.

But then you have to use a specialized return value to designate "handled with
write zeroes", which makes the code harder to read.

Fam

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH v4 1/4] block/io: fix copy_range
  2018-07-09 13:17       ` Fam Zheng
@ 2018-07-09 14:38         ` Vladimir Sementsov-Ogievskiy
  2018-07-09 15:21           ` Fam Zheng
  0 siblings, 1 reply; 11+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-09 14:38 UTC (permalink / raw)
  To: Fam Zheng
  Cc: qemu-devel, qemu-block, pl, pbonzini, ronniesahlberg, stefanha,
	mreitz, kwolf, jcody, jsnow, den

09.07.2018 16:17, Fam Zheng wrote:
> On Mon, 07/09 12:43, Vladimir Sementsov-Ogievskiy wrote:
>> 09.07.2018 04:15, Fam Zheng wrote:
>>> On Fri, 07/06 21:30, Vladimir Sementsov-Ogievskiy wrote:
>>>> Here two things are fixed:
>>>>
>>>> 1. Architecture
>>>>
>>>> On each recursion step, we go to the child of src or dst, only for one
>>>> of them. So, it's wrong to create tracked requests for both on each
>>>> step. It leads to tracked requests duplication.
>>>>
>>>> 2. Wait for serializing requests on write path independently of
>>>>      BDRV_REQ_NO_SERIALISING
>>>>
>>>> Before commit 9ded4a01149 "backup: Use copy offloading",
>>>> BDRV_REQ_NO_SERIALISING was used for only one case: read in
>>>> copy-on-write operation during backup. Also, the flag was handled only
>>>> on read path (in bdrv_co_preadv and bdrv_aligned_preadv).
>>>>
>>>> After 9ded4a01149, flag is used for not waiting serializing operations
>>>> on backup target (in same case of copy-on-write operation). This
>>>> behavior change is unsubstantiated and potentially dangerous, let's
>>>> drop it and add additional asserts and documentation.
>>>>
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>> ---
>>>>    include/block/block.h |  13 +++++++
>>>>    block/io.c            | 103 +++++++++++++++++++++++++++++++-------------------
>>>>    2 files changed, 78 insertions(+), 38 deletions(-)
>>>>
>>>> diff --git a/include/block/block.h b/include/block/block.h
>>>> index e5c7759a0c..a06a4d27de 100644
>>>> --- a/include/block/block.h
>>>> +++ b/include/block/block.h
>>>> @@ -50,6 +50,19 @@ typedef enum {
>>>>         * opened with BDRV_O_UNMAP.
>>>>         */
>>>>        BDRV_REQ_MAY_UNMAP          = 0x4,
>>>> +
>>>> +    /* The BDRV_REQ_NO_SERIALISING means that we don't want to
>>>> +     * wait_serialising_requests(), when reading.
>>>> +     *
>>>> +     * This flag is used for backup copy on write operation, when we need to
>>>> +     * read old data before write (write notifier triggered). It is ok, due to
>>>> +     * we already waited for serializing requests in initiative write (see
>>>> +     * bdrv_aligned_pwritev), and it is necessary for the case when initiative
>>>> +     * write is serializing itself (we'll dead lock waiting it).
>>>> +     *
>>>> +     * The described case is the only usage for the flag for now, so, it is
>>>> +     * supported only for read operation and restricted for write.
>>>> +     */
>>>>        BDRV_REQ_NO_SERIALISING     = 0x8,
>>>>        BDRV_REQ_FUA                = 0x10,
>>>>        BDRV_REQ_WRITE_COMPRESSED   = 0x20,
>>>> diff --git a/block/io.c b/block/io.c
>>>> index 1a2272fad3..621b21c455 100644
>>>> --- a/block/io.c
>>>> +++ b/block/io.c
>>>> @@ -1572,6 +1572,8 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
>>>>        max_transfer = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX),
>>>>                                       align);
>>>> +    /* BDRV_REQ_NO_SERIALISING is only for read operation */
>>>> +    assert(!(flags & BDRV_REQ_NO_SERIALISING));
>>>>        waited = wait_serialising_requests(req);
>>>>        assert(!waited || !req->serialising);
>>>>        assert(req->overlap_offset <= offset);
>>>> @@ -2888,15 +2890,19 @@ void bdrv_unregister_buf(BlockDriverState *bs, void *host)
>>>>        }
>>>>    }
>>>> -static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src,
>>>> -                                                    uint64_t src_offset,
>>>> -                                                    BdrvChild *dst,
>>>> -                                                    uint64_t dst_offset,
>>>> -                                                    uint64_t bytes,
>>>> -                                                    BdrvRequestFlags flags,
>>>> -                                                    bool recurse_src)
>>>> +/* Common part of bdrv_co_copy_range_from and bdrv_co_copy_range_to.
>>>> + *
>>>> + * Return -errno on failure,
>>>> + *        0 if successfully handled by bdrv_co_pwrite_zeroes
>>>> + *        1 to continue copy_range operation
>>>> + */
>>>> +static int coroutine_fn bdrv_co_copy_range_check(BdrvChild *src,
>>>> +                                                 uint64_t src_offset,
>>>> +                                                 BdrvChild *dst,
>>>> +                                                 uint64_t dst_offset,
>>>> +                                                 uint64_t bytes,
>>>> +                                                 BdrvRequestFlags flags)
>>>>    {
>>>> -    BdrvTrackedRequest src_req, dst_req;
>>>>        int ret;
>>>>        if (!dst || !dst->bs) {
>>>> @@ -2923,33 +2929,8 @@ static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src,
>>>>            || src->bs->encrypted || dst->bs->encrypted) {
>>>>            return -ENOTSUP;
>>>>        }
>>>> -    bdrv_inc_in_flight(src->bs);
>>>> -    bdrv_inc_in_flight(dst->bs);
>>>> -    tracked_request_begin(&src_req, src->bs, src_offset,
>>>> -                          bytes, BDRV_TRACKED_READ);
>>>> -    tracked_request_begin(&dst_req, dst->bs, dst_offset,
>>>> -                          bytes, BDRV_TRACKED_WRITE);
>>>> -    if (!(flags & BDRV_REQ_NO_SERIALISING)) {
>>>> -        wait_serialising_requests(&src_req);
>>>> -        wait_serialising_requests(&dst_req);
>>>> -    }
>>>> -    if (recurse_src) {
>>>> -        ret = src->bs->drv->bdrv_co_copy_range_from(src->bs,
>>>> -                                                    src, src_offset,
>>>> -                                                    dst, dst_offset,
>>>> -                                                    bytes, flags);
>>>> -    } else {
>>>> -        ret = dst->bs->drv->bdrv_co_copy_range_to(dst->bs,
>>>> -                                                  src, src_offset,
>>>> -                                                  dst, dst_offset,
>>>> -                                                  bytes, flags);
>>>> -    }
>>>> -    tracked_request_end(&src_req);
>>>> -    tracked_request_end(&dst_req);
>>>> -    bdrv_dec_in_flight(src->bs);
>>>> -    bdrv_dec_in_flight(dst->bs);
>>>> -    return ret;
>>>> +    return 1;
>>>>    }
>>>>    /* Copy range from @src to @dst.
>>>> @@ -2960,8 +2941,31 @@ int coroutine_fn bdrv_co_copy_range_from(BdrvChild *src, uint64_t src_offset,
>>>>                                             BdrvChild *dst, uint64_t dst_offset,
>>>>                                             uint64_t bytes, BdrvRequestFlags flags)
>>>>    {
>>>> -    return bdrv_co_copy_range_internal(src, src_offset, dst, dst_offset,
>>>> -                                       bytes, flags, true);
>>>> +    BdrvTrackedRequest req;
>>>> +    int ret;
>>>> +
>>>> +    ret = bdrv_co_copy_range_check(src, src_offset, dst, dst_offset, bytes,
>>>> +                                   flags);
>>> I don't like a function called _check to already do I/O here. Instead, I think
>>> this is cleaner:
>>>
>>> ---
>>>
>>>
>>> diff --git a/block/io.c b/block/io.c
>>> index 1a2272fad3..694a94dfae 100644
>>> --- a/block/io.c
>>> +++ b/block/io.c
>>> @@ -2923,32 +2923,34 @@ static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src,
>>>            || src->bs->encrypted || dst->bs->encrypted) {
>>>            return -ENOTSUP;
>>>        }
>>> -    bdrv_inc_in_flight(src->bs);
>>> -    bdrv_inc_in_flight(dst->bs);
>>> -    tracked_request_begin(&src_req, src->bs, src_offset,
>>> -                          bytes, BDRV_TRACKED_READ);
>>> -    tracked_request_begin(&dst_req, dst->bs, dst_offset,
>>> -                          bytes, BDRV_TRACKED_WRITE);
>>> -    if (!(flags & BDRV_REQ_NO_SERIALISING)) {
>>> -        wait_serialising_requests(&src_req);
>>> -        wait_serialising_requests(&dst_req);
>>> -    }
>>>        if (recurse_src) {
>>> +        bdrv_inc_in_flight(src->bs);
>>> +        tracked_request_begin(&src_req, src->bs, src_offset,
>>> +                              bytes, BDRV_TRACKED_READ);
>>> +        if (!(flags & BDRV_REQ_NO_SERIALISING)) {
>>> +            wait_serialising_requests(&src_req);
>>> +        }
>>>            ret = src->bs->drv->bdrv_co_copy_range_from(src->bs,
>>>                                                        src, src_offset,
>>>                                                        dst, dst_offset,
>>>                                                        bytes, flags);
>>> +        tracked_request_end(&src_req);
>>> +        bdrv_dec_in_flight(src->bs);
>>>        } else {
>>> +        bdrv_inc_in_flight(dst->bs);
>>> +        tracked_request_begin(&dst_req, dst->bs, dst_offset,
>>> +                              bytes, BDRV_TRACKED_WRITE);
>>> +        /* BDRV_REQ_NO_SERIALISING is only for read operation, so we ignore it
>>> +         * in flags. */
>>> +        wait_serialising_requests(&dst_req);
>>>            ret = dst->bs->drv->bdrv_co_copy_range_to(dst->bs,
>>>                                                      src, src_offset,
>>>                                                      dst, dst_offset,
>>>                                                      bytes, flags);
>>> +        tracked_request_end(&dst_req);
>>> +        bdrv_dec_in_flight(dst->bs);
>>>        }
>>> -    tracked_request_end(&src_req);
>>> -    tracked_request_end(&dst_req);
>>> -    bdrv_dec_in_flight(src->bs);
>>> -    bdrv_dec_in_flight(dst->bs);
>>>        return ret;
>>>    }
>> A matter of taste, I think. I decided, that such way only stresses that
>> these functions have more different than similar content and went another
>> one.
> But then you have to use a specialized return value to designate "handled with
> write zeroes", which makes the code harder to read.
>
> Fam

Hmm, didn't care about this, it's normal return semantics for a lot of 
functions in qemu nbd code, I'm used to it.
Oops, missed that it's your code and you are its maintainer) Will 
resend, if you are not comfortable with such semantics. I assume, you 
agree with the fix itself..

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH v4 1/4] block/io: fix copy_range
  2018-07-09 14:38         ` Vladimir Sementsov-Ogievskiy
@ 2018-07-09 15:21           ` Fam Zheng
  0 siblings, 0 replies; 11+ messages in thread
From: Fam Zheng @ 2018-07-09 15:21 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel, qemu-block, pl, pbonzini, ronniesahlberg, stefanha,
	mreitz, kwolf, jcody, jsnow, den

On Mon, 07/09 17:38, Vladimir Sementsov-Ogievskiy wrote:
> 09.07.2018 16:17, Fam Zheng wrote:
> > On Mon, 07/09 12:43, Vladimir Sementsov-Ogievskiy wrote:
> > > 09.07.2018 04:15, Fam Zheng wrote:
> > > > On Fri, 07/06 21:30, Vladimir Sementsov-Ogievskiy wrote:
> > > > > Here two things are fixed:
> > > > > 
> > > > > 1. Architecture
> > > > > 
> > > > > On each recursion step, we go to the child of src or dst, only for one
> > > > > of them. So, it's wrong to create tracked requests for both on each
> > > > > step. It leads to tracked requests duplication.
> > > > > 
> > > > > 2. Wait for serializing requests on write path independently of
> > > > >      BDRV_REQ_NO_SERIALISING
> > > > > 
> > > > > Before commit 9ded4a01149 "backup: Use copy offloading",
> > > > > BDRV_REQ_NO_SERIALISING was used for only one case: read in
> > > > > copy-on-write operation during backup. Also, the flag was handled only
> > > > > on read path (in bdrv_co_preadv and bdrv_aligned_preadv).
> > > > > 
> > > > > After 9ded4a01149, flag is used for not waiting serializing operations
> > > > > on backup target (in same case of copy-on-write operation). This
> > > > > behavior change is unsubstantiated and potentially dangerous, let's
> > > > > drop it and add additional asserts and documentation.
> > > > > 
> > > > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> > > > > ---
> > > > >    include/block/block.h |  13 +++++++
> > > > >    block/io.c            | 103 +++++++++++++++++++++++++++++++-------------------
> > > > >    2 files changed, 78 insertions(+), 38 deletions(-)
> > > > > 
> > > > > diff --git a/include/block/block.h b/include/block/block.h
> > > > > index e5c7759a0c..a06a4d27de 100644
> > > > > --- a/include/block/block.h
> > > > > +++ b/include/block/block.h
> > > > > @@ -50,6 +50,19 @@ typedef enum {
> > > > >         * opened with BDRV_O_UNMAP.
> > > > >         */
> > > > >        BDRV_REQ_MAY_UNMAP          = 0x4,
> > > > > +
> > > > > +    /* The BDRV_REQ_NO_SERIALISING means that we don't want to
> > > > > +     * wait_serialising_requests(), when reading.
> > > > > +     *
> > > > > +     * This flag is used for backup copy on write operation, when we need to
> > > > > +     * read old data before write (write notifier triggered). It is ok, due to
> > > > > +     * we already waited for serializing requests in initiative write (see
> > > > > +     * bdrv_aligned_pwritev), and it is necessary for the case when initiative
> > > > > +     * write is serializing itself (we'll dead lock waiting it).
> > > > > +     *
> > > > > +     * The described case is the only usage for the flag for now, so, it is
> > > > > +     * supported only for read operation and restricted for write.
> > > > > +     */
> > > > >        BDRV_REQ_NO_SERIALISING     = 0x8,
> > > > >        BDRV_REQ_FUA                = 0x10,
> > > > >        BDRV_REQ_WRITE_COMPRESSED   = 0x20,
> > > > > diff --git a/block/io.c b/block/io.c
> > > > > index 1a2272fad3..621b21c455 100644
> > > > > --- a/block/io.c
> > > > > +++ b/block/io.c
> > > > > @@ -1572,6 +1572,8 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
> > > > >        max_transfer = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX),
> > > > >                                       align);
> > > > > +    /* BDRV_REQ_NO_SERIALISING is only for read operation */
> > > > > +    assert(!(flags & BDRV_REQ_NO_SERIALISING));
> > > > >        waited = wait_serialising_requests(req);
> > > > >        assert(!waited || !req->serialising);
> > > > >        assert(req->overlap_offset <= offset);
> > > > > @@ -2888,15 +2890,19 @@ void bdrv_unregister_buf(BlockDriverState *bs, void *host)
> > > > >        }
> > > > >    }
> > > > > -static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src,
> > > > > -                                                    uint64_t src_offset,
> > > > > -                                                    BdrvChild *dst,
> > > > > -                                                    uint64_t dst_offset,
> > > > > -                                                    uint64_t bytes,
> > > > > -                                                    BdrvRequestFlags flags,
> > > > > -                                                    bool recurse_src)
> > > > > +/* Common part of bdrv_co_copy_range_from and bdrv_co_copy_range_to.
> > > > > + *
> > > > > + * Return -errno on failure,
> > > > > + *        0 if successfully handled by bdrv_co_pwrite_zeroes
> > > > > + *        1 to continue copy_range operation
> > > > > + */
> > > > > +static int coroutine_fn bdrv_co_copy_range_check(BdrvChild *src,
> > > > > +                                                 uint64_t src_offset,
> > > > > +                                                 BdrvChild *dst,
> > > > > +                                                 uint64_t dst_offset,
> > > > > +                                                 uint64_t bytes,
> > > > > +                                                 BdrvRequestFlags flags)
> > > > >    {
> > > > > -    BdrvTrackedRequest src_req, dst_req;
> > > > >        int ret;
> > > > >        if (!dst || !dst->bs) {
> > > > > @@ -2923,33 +2929,8 @@ static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src,
> > > > >            || src->bs->encrypted || dst->bs->encrypted) {
> > > > >            return -ENOTSUP;
> > > > >        }
> > > > > -    bdrv_inc_in_flight(src->bs);
> > > > > -    bdrv_inc_in_flight(dst->bs);
> > > > > -    tracked_request_begin(&src_req, src->bs, src_offset,
> > > > > -                          bytes, BDRV_TRACKED_READ);
> > > > > -    tracked_request_begin(&dst_req, dst->bs, dst_offset,
> > > > > -                          bytes, BDRV_TRACKED_WRITE);
> > > > > -    if (!(flags & BDRV_REQ_NO_SERIALISING)) {
> > > > > -        wait_serialising_requests(&src_req);
> > > > > -        wait_serialising_requests(&dst_req);
> > > > > -    }
> > > > > -    if (recurse_src) {
> > > > > -        ret = src->bs->drv->bdrv_co_copy_range_from(src->bs,
> > > > > -                                                    src, src_offset,
> > > > > -                                                    dst, dst_offset,
> > > > > -                                                    bytes, flags);
> > > > > -    } else {
> > > > > -        ret = dst->bs->drv->bdrv_co_copy_range_to(dst->bs,
> > > > > -                                                  src, src_offset,
> > > > > -                                                  dst, dst_offset,
> > > > > -                                                  bytes, flags);
> > > > > -    }
> > > > > -    tracked_request_end(&src_req);
> > > > > -    tracked_request_end(&dst_req);
> > > > > -    bdrv_dec_in_flight(src->bs);
> > > > > -    bdrv_dec_in_flight(dst->bs);
> > > > > -    return ret;
> > > > > +    return 1;
> > > > >    }
> > > > >    /* Copy range from @src to @dst.
> > > > > @@ -2960,8 +2941,31 @@ int coroutine_fn bdrv_co_copy_range_from(BdrvChild *src, uint64_t src_offset,
> > > > >                                             BdrvChild *dst, uint64_t dst_offset,
> > > > >                                             uint64_t bytes, BdrvRequestFlags flags)
> > > > >    {
> > > > > -    return bdrv_co_copy_range_internal(src, src_offset, dst, dst_offset,
> > > > > -                                       bytes, flags, true);
> > > > > +    BdrvTrackedRequest req;
> > > > > +    int ret;
> > > > > +
> > > > > +    ret = bdrv_co_copy_range_check(src, src_offset, dst, dst_offset, bytes,
> > > > > +                                   flags);
> > > > I don't like a function called _check to already do I/O here. Instead, I think
> > > > this is cleaner:
> > > > 
> > > > ---
> > > > 
> > > > 
> > > > diff --git a/block/io.c b/block/io.c
> > > > index 1a2272fad3..694a94dfae 100644
> > > > --- a/block/io.c
> > > > +++ b/block/io.c
> > > > @@ -2923,32 +2923,34 @@ static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src,
> > > >            || src->bs->encrypted || dst->bs->encrypted) {
> > > >            return -ENOTSUP;
> > > >        }
> > > > -    bdrv_inc_in_flight(src->bs);
> > > > -    bdrv_inc_in_flight(dst->bs);
> > > > -    tracked_request_begin(&src_req, src->bs, src_offset,
> > > > -                          bytes, BDRV_TRACKED_READ);
> > > > -    tracked_request_begin(&dst_req, dst->bs, dst_offset,
> > > > -                          bytes, BDRV_TRACKED_WRITE);
> > > > -    if (!(flags & BDRV_REQ_NO_SERIALISING)) {
> > > > -        wait_serialising_requests(&src_req);
> > > > -        wait_serialising_requests(&dst_req);
> > > > -    }
> > > >        if (recurse_src) {
> > > > +        bdrv_inc_in_flight(src->bs);
> > > > +        tracked_request_begin(&src_req, src->bs, src_offset,
> > > > +                              bytes, BDRV_TRACKED_READ);
> > > > +        if (!(flags & BDRV_REQ_NO_SERIALISING)) {
> > > > +            wait_serialising_requests(&src_req);
> > > > +        }
> > > >            ret = src->bs->drv->bdrv_co_copy_range_from(src->bs,
> > > >                                                        src, src_offset,
> > > >                                                        dst, dst_offset,
> > > >                                                        bytes, flags);
> > > > +        tracked_request_end(&src_req);
> > > > +        bdrv_dec_in_flight(src->bs);
> > > >        } else {
> > > > +        bdrv_inc_in_flight(dst->bs);
> > > > +        tracked_request_begin(&dst_req, dst->bs, dst_offset,
> > > > +                              bytes, BDRV_TRACKED_WRITE);
> > > > +        /* BDRV_REQ_NO_SERIALISING is only for read operation, so we ignore it
> > > > +         * in flags. */
> > > > +        wait_serialising_requests(&dst_req);
> > > >            ret = dst->bs->drv->bdrv_co_copy_range_to(dst->bs,
> > > >                                                      src, src_offset,
> > > >                                                      dst, dst_offset,
> > > >                                                      bytes, flags);
> > > > +        tracked_request_end(&dst_req);
> > > > +        bdrv_dec_in_flight(dst->bs);
> > > >        }
> > > > -    tracked_request_end(&src_req);
> > > > -    tracked_request_end(&dst_req);
> > > > -    bdrv_dec_in_flight(src->bs);
> > > > -    bdrv_dec_in_flight(dst->bs);
> > > >        return ret;
> > > >    }
> > > A matter of taste, I think. I decided, that such way only stresses that
> > > these functions have more different than similar content and went another
> > > one.
> > But then you have to use a specialized return value to designate "handled with
> > write zeroes", which makes the code harder to read.
> > 
> > Fam
> 
> Hmm, didn't care about this, it's normal return semantics for a lot of
> functions in qemu nbd code, I'm used to it.
> Oops, missed that it's your code and you are its maintainer) Will resend, if
> you are not comfortable with such semantics. I assume, you agree with the
> fix itself..

Yes, the fix is good. Actually I'll have to add some new code on top of your fix
after QEMU 3.0.  I planned to call it bdrv_co_copy_range_check(), but it will do
completely different things than this patch: it will do a recursion to see if
all drivers are happy with the parameters, with no side effect (e.g. no qcow2
cluster allocation). That's one reason why I prefer we don't split the "zero
write" code and the copy offloading code to multiple functions now; besides, my
version of bdrv_co_copy_range_check() will have to be called outside of the
actual I/O recursion. My impression is that even though one of them can change
the name, having two checking helpers around is still confusing.

Also, v3 of "block: Fix dst reading after tail copy offloading" series is pending
on this series as well. I appreciate if you resend. :)

Fam

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2018-07-09 15:22 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-06 18:30 [Qemu-devel] [PATCH v4 0/4] fix image fleecing Vladimir Sementsov-Ogievskiy
2018-07-06 18:30 ` [Qemu-devel] [PATCH v4 1/4] block/io: fix copy_range Vladimir Sementsov-Ogievskiy
2018-07-09  1:15   ` Fam Zheng
2018-07-09  9:43     ` Vladimir Sementsov-Ogievskiy
2018-07-09 13:17       ` Fam Zheng
2018-07-09 14:38         ` Vladimir Sementsov-Ogievskiy
2018-07-09 15:21           ` Fam Zheng
2018-07-06 18:30 ` [Qemu-devel] [PATCH v4 2/4] block: split flags in copy_range Vladimir Sementsov-Ogievskiy
2018-07-06 18:30 ` [Qemu-devel] [PATCH v4 3/4] block: add BDRV_REQ_SERIALISING flag Vladimir Sementsov-Ogievskiy
2018-07-06 18:30 ` [Qemu-devel] [PATCH v4 4/4] block/backup: fix fleecing scheme: use serialized writes Vladimir Sementsov-Ogievskiy
2018-07-06 21:55 ` [Qemu-devel] [PATCH v4 0/4] fix image fleecing Eric Blake

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.