All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v3 0/4] fix image fleecing
@ 2018-07-05  7:46 Vladimir Sementsov-Ogievskiy
  2018-07-05  7:46 ` [Qemu-devel] [PATCH v3 1/4] block: disallow BDRV_REQ_NO_SERIALISING for write Vladimir Sementsov-Ogievskiy
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-05  7:46 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: pl, pbonzini, ronniesahlberg, famz, stefanha, mreitz, kwolf,
	jcody, jsnow, den, vsementsov

Hi all.

This fixes image fleecing scheme for 3.0, details are in 04 patch.

v3:
02: fix typo in assert, to fix build

v2:
01,02: new patches

03: - improve comment
    - fix assert in bdrv_aligned_pwritev
    - add asserts to not use the flag on read requests
    - support copy_range

04: - expand "detected cases" range 
    - s/fleecing/serialize_target_writes
    - support backup_cow_with_offload
    - drop restriction on compressed writes


v1 cover:

It's a continuation of discussion under
"[PATCH v2 2/3] block/fleecing-filter: new filter driver for fleecing" [1].

Here is my try to implement Kevin's idea, that all backup writes (for
fleecing case) should be serialized. (However, I've skipped for now
fixing related permissions).

Looks like these patches may replace patch [1], to make fleecing scheme
safe. But I'm not sure, a look by Kevin is necessary.

A test is still needed, to prove that this patch is necessary and that it
works..

Vladimir Sementsov-Ogievskiy (4):
  block: disallow BDRV_REQ_NO_SERIALISING for write
  block: split flags in copy_range
  block: add BDRV_REQ_SERIALISING flag
  block/backup: fix fleecing scheme: use serialized writes

 include/block/block.h          | 31 ++++++++++++++++--
 include/block/block_int.h      | 14 +++++---
 include/sysemu/block-backend.h |  3 +-
 block/backup.c                 | 20 ++++++++----
 block/block-backend.c          |  5 +--
 block/file-posix.c             | 21 +++++++-----
 block/io.c                     | 74 ++++++++++++++++++++++++++++++------------
 block/iscsi.c                  |  9 +++--
 block/qcow2.c                  | 20 +++++++-----
 block/raw-format.c             | 24 +++++++++-----
 qemu-img.c                     |  2 +-
 11 files changed, 159 insertions(+), 64 deletions(-)

-- 
2.11.1

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH v3 1/4] block: disallow BDRV_REQ_NO_SERIALISING for write
  2018-07-05  7:46 [Qemu-devel] [PATCH v3 0/4] fix image fleecing Vladimir Sementsov-Ogievskiy
@ 2018-07-05  7:46 ` Vladimir Sementsov-Ogievskiy
  2018-07-06 21:32   ` Eric Blake
  2018-07-05  7:46 ` [Qemu-devel] [PATCH v3 2/4] block: split flags in copy_range Vladimir Sementsov-Ogievskiy
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-05  7:46 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: pl, pbonzini, ronniesahlberg, famz, stefanha, mreitz, kwolf,
	jcody, jsnow, den, vsementsov

Before commit 9ded4a01149 "backup: Use copy offloading",
BDRV_REQ_NO_SERIALISING was used for only one case: read in
copy-on-write operation during backup. Also, the flag was handled only
on read path (in bdrv_co_preadv and bdrv_aligned_preadv).

After 9ded4a01149, flag is used for not waiting serializing operations
on backup target (in same case of copy-on-write operation). This
behavior change is unsubstantiated and potentially dangerous, let's
drop it.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/block.h | 13 +++++++++++++
 block/io.c            |  7 ++++++-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/include/block/block.h b/include/block/block.h
index e5c7759a0c..a06a4d27de 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -50,6 +50,19 @@ typedef enum {
      * opened with BDRV_O_UNMAP.
      */
     BDRV_REQ_MAY_UNMAP          = 0x4,
+
+    /* The BDRV_REQ_NO_SERIALISING means that we don't want to
+     * wait_serialising_requests(), when reading.
+     *
+     * This flag is used for backup copy on write operation, when we need to
+     * read old data before write (write notifier triggered). It is ok, due to
+     * we already waited for serializing requests in initiative write (see
+     * bdrv_aligned_pwritev), and it is necessary for the case when initiative
+     * write is serializing itself (we'll dead lock waiting it).
+     *
+     * The described case is the only usage for the flag for now, so, it is
+     * supported only for read operation and restricted for write.
+     */
     BDRV_REQ_NO_SERIALISING     = 0x8,
     BDRV_REQ_FUA                = 0x10,
     BDRV_REQ_WRITE_COMPRESSED   = 0x20,
diff --git a/block/io.c b/block/io.c
index 1a2272fad3..51bef9fc50 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1572,6 +1572,8 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
     max_transfer = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX),
                                    align);
 
+    /* BDRV_REQ_NO_SERIALISING is only for read operation */
+    assert(!(flags & BDRV_REQ_NO_SERIALISING));
     waited = wait_serialising_requests(req);
     assert(!waited || !req->serialising);
     assert(req->overlap_offset <= offset);
@@ -2931,9 +2933,12 @@ static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src,
                           bytes, BDRV_TRACKED_WRITE);
 
     if (!(flags & BDRV_REQ_NO_SERIALISING)) {
+        /* BDRV_REQ_NO_SERIALISING is only for read */
         wait_serialising_requests(&src_req);
-        wait_serialising_requests(&dst_req);
     }
+
+    wait_serialising_requests(&dst_req);
+
     if (recurse_src) {
         ret = src->bs->drv->bdrv_co_copy_range_from(src->bs,
                                                     src, src_offset,
-- 
2.11.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH v3 2/4] block: split flags in copy_range
  2018-07-05  7:46 [Qemu-devel] [PATCH v3 0/4] fix image fleecing Vladimir Sementsov-Ogievskiy
  2018-07-05  7:46 ` [Qemu-devel] [PATCH v3 1/4] block: disallow BDRV_REQ_NO_SERIALISING for write Vladimir Sementsov-Ogievskiy
@ 2018-07-05  7:46 ` Vladimir Sementsov-Ogievskiy
  2018-07-05  7:46 ` [Qemu-devel] [PATCH v3 3/4] block: add BDRV_REQ_SERIALISING flag Vladimir Sementsov-Ogievskiy
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-05  7:46 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: pl, pbonzini, ronniesahlberg, famz, stefanha, mreitz, kwolf,
	jcody, jsnow, den, vsementsov

Pass read flags and write flags separately. This is needed to handle
coming BDRV_REQ_NO_SERIALISING clearly in following patches.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/block.h          |  3 ++-
 include/block/block_int.h      | 14 ++++++++++----
 include/sysemu/block-backend.h |  3 ++-
 block/backup.c                 |  2 +-
 block/block-backend.c          |  5 +++--
 block/file-posix.c             | 21 +++++++++++++--------
 block/io.c                     | 43 +++++++++++++++++++++++-------------------
 block/iscsi.c                  |  9 ++++++---
 block/qcow2.c                  | 20 +++++++++++---------
 block/raw-format.c             | 24 +++++++++++++++--------
 qemu-img.c                     |  2 +-
 11 files changed, 89 insertions(+), 57 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index a06a4d27de..478ebc6c6c 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -685,5 +685,6 @@ void bdrv_unregister_buf(BlockDriverState *bs, void *host);
  **/
 int coroutine_fn bdrv_co_copy_range(BdrvChild *src, uint64_t src_offset,
                                     BdrvChild *dst, uint64_t dst_offset,
-                                    uint64_t bytes, BdrvRequestFlags flags);
+                                    uint64_t bytes, BdrvRequestFlags read_flags,
+                                    BdrvRequestFlags write_flags);
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index af71b414be..119b6e4ea5 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -218,7 +218,8 @@ struct BlockDriver {
                                                 BdrvChild *dst,
                                                 uint64_t dst_offset,
                                                 uint64_t bytes,
-                                                BdrvRequestFlags flags);
+                                                BdrvRequestFlags read_flags,
+                                                BdrvRequestFlags write_flags);
 
     /* Map [offset, offset + nbytes) range onto a child of bs to copy data to,
      * and invoke bdrv_co_copy_range_to(child, src, ...), or perform the copy
@@ -234,7 +235,8 @@ struct BlockDriver {
                                               BdrvChild *dst,
                                               uint64_t dst_offset,
                                               uint64_t bytes,
-                                              BdrvRequestFlags flags);
+                                              BdrvRequestFlags read_flags,
+                                              BdrvRequestFlags write_flags);
 
     /*
      * Building block for bdrv_block_status[_above] and
@@ -1153,10 +1155,14 @@ void blockdev_close_all_bdrv_states(void);
 
 int coroutine_fn bdrv_co_copy_range_from(BdrvChild *src, uint64_t src_offset,
                                          BdrvChild *dst, uint64_t dst_offset,
-                                         uint64_t bytes, BdrvRequestFlags flags);
+                                         uint64_t bytes,
+                                         BdrvRequestFlags read_flags,
+                                         BdrvRequestFlags write_flags);
 int coroutine_fn bdrv_co_copy_range_to(BdrvChild *src, uint64_t src_offset,
                                        BdrvChild *dst, uint64_t dst_offset,
-                                       uint64_t bytes, BdrvRequestFlags flags);
+                                       uint64_t bytes,
+                                       BdrvRequestFlags read_flags,
+                                       BdrvRequestFlags write_flags);
 
 int refresh_total_sectors(BlockDriverState *bs, int64_t hint);
 
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 8d03d493c2..830d873f24 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -234,6 +234,7 @@ void blk_unregister_buf(BlockBackend *blk, void *host);
 
 int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, int64_t off_in,
                                    BlockBackend *blk_out, int64_t off_out,
-                                   int bytes, BdrvRequestFlags flags);
+                                   int bytes, BdrvRequestFlags read_flags,
+                                   BdrvRequestFlags write_flags);
 
 #endif
diff --git a/block/backup.c b/block/backup.c
index 81895ddbe2..f3e4e814b6 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -163,7 +163,7 @@ static int coroutine_fn backup_cow_with_offload(BackupBlockJob *job,
     hbitmap_reset(job->copy_bitmap, start / job->cluster_size,
                   nr_clusters);
     ret = blk_co_copy_range(blk, start, job->target, start, nbytes,
-                            is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0);
+                            is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0, 0);
     if (ret < 0) {
         trace_backup_do_cow_copy_range_fail(job, start, ret);
         hbitmap_set(job->copy_bitmap, start / job->cluster_size,
diff --git a/block/block-backend.c b/block/block-backend.c
index 6b75bca317..ac8c3e0b1c 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2218,7 +2218,8 @@ void blk_unregister_buf(BlockBackend *blk, void *host)
 
 int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, int64_t off_in,
                                    BlockBackend *blk_out, int64_t off_out,
-                                   int bytes, BdrvRequestFlags flags)
+                                   int bytes, BdrvRequestFlags read_flags,
+                                   BdrvRequestFlags write_flags)
 {
     int r;
     r = blk_check_byte_request(blk_in, off_in, bytes);
@@ -2231,5 +2232,5 @@ int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, int64_t off_in,
     }
     return bdrv_co_copy_range(blk_in->root, off_in,
                               blk_out->root, off_out,
-                              bytes, flags);
+                              bytes, read_flags, write_flags);
 }
diff --git a/block/file-posix.c b/block/file-posix.c
index 829ee538d8..2f753f0856 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2578,18 +2578,23 @@ static void raw_abort_perm_update(BlockDriverState *bs)
     raw_handle_perm_lock(bs, RAW_PL_ABORT, 0, 0, NULL);
 }
 
-static int coroutine_fn raw_co_copy_range_from(BlockDriverState *bs,
-                                               BdrvChild *src, uint64_t src_offset,
-                                               BdrvChild *dst, uint64_t dst_offset,
-                                               uint64_t bytes, BdrvRequestFlags flags)
+static int coroutine_fn raw_co_copy_range_from(
+        BlockDriverState *bs, BdrvChild *src, uint64_t src_offset,
+        BdrvChild *dst, uint64_t dst_offset, uint64_t bytes,
+        BdrvRequestFlags read_flags, BdrvRequestFlags write_flags)
 {
-    return bdrv_co_copy_range_to(src, src_offset, dst, dst_offset, bytes, flags);
+    return bdrv_co_copy_range_to(src, src_offset, dst, dst_offset, bytes,
+                                 read_flags, write_flags);
 }
 
 static int coroutine_fn raw_co_copy_range_to(BlockDriverState *bs,
-                                             BdrvChild *src, uint64_t src_offset,
-                                             BdrvChild *dst, uint64_t dst_offset,
-                                             uint64_t bytes, BdrvRequestFlags flags)
+                                             BdrvChild *src,
+                                             uint64_t src_offset,
+                                             BdrvChild *dst,
+                                             uint64_t dst_offset,
+                                             uint64_t bytes,
+                                             BdrvRequestFlags read_flags,
+                                             BdrvRequestFlags write_flags)
 {
     BDRVRawState *s = bs->opaque;
     BDRVRawState *src_s;
diff --git a/block/io.c b/block/io.c
index 51bef9fc50..b66764d6c4 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2890,13 +2890,10 @@ void bdrv_unregister_buf(BlockDriverState *bs, void *host)
     }
 }
 
-static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src,
-                                                    uint64_t src_offset,
-                                                    BdrvChild *dst,
-                                                    uint64_t dst_offset,
-                                                    uint64_t bytes,
-                                                    BdrvRequestFlags flags,
-                                                    bool recurse_src)
+static int coroutine_fn bdrv_co_copy_range_internal(
+        BdrvChild *src, uint64_t src_offset, BdrvChild *dst,
+        uint64_t dst_offset, uint64_t bytes, BdrvRequestFlags read_flags,
+        BdrvRequestFlags write_flags, bool recurse_src)
 {
     BdrvTrackedRequest src_req, dst_req;
     int ret;
@@ -2908,8 +2905,8 @@ static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src,
     if (ret) {
         return ret;
     }
-    if (flags & BDRV_REQ_ZERO_WRITE) {
-        return bdrv_co_pwrite_zeroes(dst, dst_offset, bytes, flags);
+    if (write_flags & BDRV_REQ_ZERO_WRITE) {
+        return bdrv_co_pwrite_zeroes(dst, dst_offset, bytes, write_flags);
     }
 
     if (!src || !src->bs) {
@@ -2932,23 +2929,26 @@ static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src,
     tracked_request_begin(&dst_req, dst->bs, dst_offset,
                           bytes, BDRV_TRACKED_WRITE);
 
-    if (!(flags & BDRV_REQ_NO_SERIALISING)) {
-        /* BDRV_REQ_NO_SERIALISING is only for read */
+    if (!(read_flags & BDRV_REQ_NO_SERIALISING)) {
         wait_serialising_requests(&src_req);
     }
 
+    /* BDRV_REQ_NO_SERIALISING is only for read */
+    assert(!(write_flags & BDRV_REQ_NO_SERIALISING));
     wait_serialising_requests(&dst_req);
 
     if (recurse_src) {
         ret = src->bs->drv->bdrv_co_copy_range_from(src->bs,
                                                     src, src_offset,
                                                     dst, dst_offset,
-                                                    bytes, flags);
+                                                    bytes, read_flags,
+                                                    write_flags);
     } else {
         ret = dst->bs->drv->bdrv_co_copy_range_to(dst->bs,
                                                   src, src_offset,
                                                   dst, dst_offset,
-                                                  bytes, flags);
+                                                  bytes, read_flags,
+                                                  write_flags);
     }
     tracked_request_end(&src_req);
     tracked_request_end(&dst_req);
@@ -2963,10 +2963,12 @@ static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src,
  * semantics. */
 int coroutine_fn bdrv_co_copy_range_from(BdrvChild *src, uint64_t src_offset,
                                          BdrvChild *dst, uint64_t dst_offset,
-                                         uint64_t bytes, BdrvRequestFlags flags)
+                                         uint64_t bytes,
+                                         BdrvRequestFlags read_flags,
+                                         BdrvRequestFlags write_flags)
 {
     return bdrv_co_copy_range_internal(src, src_offset, dst, dst_offset,
-                                       bytes, flags, true);
+                                       bytes, read_flags, write_flags, true);
 }
 
 /* Copy range from @src to @dst.
@@ -2975,19 +2977,22 @@ int coroutine_fn bdrv_co_copy_range_from(BdrvChild *src, uint64_t src_offset,
  * semantics. */
 int coroutine_fn bdrv_co_copy_range_to(BdrvChild *src, uint64_t src_offset,
                                        BdrvChild *dst, uint64_t dst_offset,
-                                       uint64_t bytes, BdrvRequestFlags flags)
+                                       uint64_t bytes,
+                                       BdrvRequestFlags read_flags,
+                                       BdrvRequestFlags write_flags)
 {
     return bdrv_co_copy_range_internal(src, src_offset, dst, dst_offset,
-                                       bytes, flags, false);
+                                       bytes, read_flags, write_flags, false);
 }
 
 int coroutine_fn bdrv_co_copy_range(BdrvChild *src, uint64_t src_offset,
                                     BdrvChild *dst, uint64_t dst_offset,
-                                    uint64_t bytes, BdrvRequestFlags flags)
+                                    uint64_t bytes, BdrvRequestFlags read_flags,
+                                    BdrvRequestFlags write_flags)
 {
     return bdrv_co_copy_range_from(src, src_offset,
                                    dst, dst_offset,
-                                   bytes, flags);
+                                   bytes, read_flags, write_flags);
 }
 
 static void bdrv_parent_cb_resize(BlockDriverState *bs)
diff --git a/block/iscsi.c b/block/iscsi.c
index ead2bd5aa7..38b917a1e5 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -2193,9 +2193,11 @@ static int coroutine_fn iscsi_co_copy_range_from(BlockDriverState *bs,
                                                  BdrvChild *dst,
                                                  uint64_t dst_offset,
                                                  uint64_t bytes,
-                                                 BdrvRequestFlags flags)
+                                                 BdrvRequestFlags read_flags,
+                                                 BdrvRequestFlags write_flags)
 {
-    return bdrv_co_copy_range_to(src, src_offset, dst, dst_offset, bytes, flags);
+    return bdrv_co_copy_range_to(src, src_offset, dst, dst_offset, bytes,
+                                 read_flags, write_flags);
 }
 
 static struct scsi_task *iscsi_xcopy_task(int param_len)
@@ -2332,7 +2334,8 @@ static int coroutine_fn iscsi_co_copy_range_to(BlockDriverState *bs,
                                                BdrvChild *dst,
                                                uint64_t dst_offset,
                                                uint64_t bytes,
-                                               BdrvRequestFlags flags)
+                                               BdrvRequestFlags read_flags,
+                                               BdrvRequestFlags write_flags)
 {
     IscsiLun *dst_lun = dst->bs->opaque;
     IscsiLun *src_lun;
diff --git a/block/qcow2.c b/block/qcow2.c
index 2f9e58e0c4..6855356fb9 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3246,13 +3246,14 @@ static int coroutine_fn
 qcow2_co_copy_range_from(BlockDriverState *bs,
                          BdrvChild *src, uint64_t src_offset,
                          BdrvChild *dst, uint64_t dst_offset,
-                         uint64_t bytes, BdrvRequestFlags flags)
+                         uint64_t bytes, BdrvRequestFlags read_flags,
+                         BdrvRequestFlags write_flags)
 {
     BDRVQcow2State *s = bs->opaque;
     int ret;
     unsigned int cur_bytes; /* number of bytes in current iteration */
     BdrvChild *child = NULL;
-    BdrvRequestFlags cur_flags;
+    BdrvRequestFlags cur_write_flags;
 
     assert(!bs->encrypted);
     qemu_co_mutex_lock(&s->lock);
@@ -3261,7 +3262,7 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
         uint64_t copy_offset = 0;
         /* prepare next request */
         cur_bytes = MIN(bytes, INT_MAX);
-        cur_flags = flags;
+        cur_write_flags = write_flags;
 
         ret = qcow2_get_cluster_offset(bs, src_offset, &cur_bytes, &copy_offset);
         if (ret < 0) {
@@ -3273,20 +3274,20 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
             if (bs->backing && bs->backing->bs) {
                 int64_t backing_length = bdrv_getlength(bs->backing->bs);
                 if (src_offset >= backing_length) {
-                    cur_flags |= BDRV_REQ_ZERO_WRITE;
+                    cur_write_flags |= BDRV_REQ_ZERO_WRITE;
                 } else {
                     child = bs->backing;
                     cur_bytes = MIN(cur_bytes, backing_length - src_offset);
                     copy_offset = src_offset;
                 }
             } else {
-                cur_flags |= BDRV_REQ_ZERO_WRITE;
+                cur_write_flags |= BDRV_REQ_ZERO_WRITE;
             }
             break;
 
         case QCOW2_CLUSTER_ZERO_PLAIN:
         case QCOW2_CLUSTER_ZERO_ALLOC:
-            cur_flags |= BDRV_REQ_ZERO_WRITE;
+            cur_write_flags |= BDRV_REQ_ZERO_WRITE;
             break;
 
         case QCOW2_CLUSTER_COMPRESSED:
@@ -3310,7 +3311,7 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
         ret = bdrv_co_copy_range_from(child,
                                       copy_offset,
                                       dst, dst_offset,
-                                      cur_bytes, cur_flags);
+                                      cur_bytes, read_flags, cur_write_flags);
         qemu_co_mutex_lock(&s->lock);
         if (ret < 0) {
             goto out;
@@ -3331,7 +3332,8 @@ static int coroutine_fn
 qcow2_co_copy_range_to(BlockDriverState *bs,
                        BdrvChild *src, uint64_t src_offset,
                        BdrvChild *dst, uint64_t dst_offset,
-                       uint64_t bytes, BdrvRequestFlags flags)
+                       uint64_t bytes, BdrvRequestFlags read_flags,
+                       BdrvRequestFlags write_flags)
 {
     BDRVQcow2State *s = bs->opaque;
     int offset_in_cluster;
@@ -3375,7 +3377,7 @@ qcow2_co_copy_range_to(BlockDriverState *bs,
         ret = bdrv_co_copy_range_to(src, src_offset,
                                     bs->file,
                                     cluster_offset + offset_in_cluster,
-                                    cur_bytes, flags);
+                                    cur_bytes, read_flags, write_flags);
         qemu_co_mutex_lock(&s->lock);
         if (ret < 0) {
             goto fail;
diff --git a/block/raw-format.c b/block/raw-format.c
index b78da564d4..a3591985f6 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -498,9 +498,13 @@ static int raw_probe_geometry(BlockDriverState *bs, HDGeometry *geo)
 }
 
 static int coroutine_fn raw_co_copy_range_from(BlockDriverState *bs,
-                                               BdrvChild *src, uint64_t src_offset,
-                                               BdrvChild *dst, uint64_t dst_offset,
-                                               uint64_t bytes, BdrvRequestFlags flags)
+                                               BdrvChild *src,
+                                               uint64_t src_offset,
+                                               BdrvChild *dst,
+                                               uint64_t dst_offset,
+                                               uint64_t bytes,
+                                               BdrvRequestFlags read_flags,
+                                               BdrvRequestFlags write_flags)
 {
     int ret;
 
@@ -509,13 +513,17 @@ static int coroutine_fn raw_co_copy_range_from(BlockDriverState *bs,
         return ret;
     }
     return bdrv_co_copy_range_from(bs->file, src_offset, dst, dst_offset,
-                                   bytes, flags);
+                                   bytes, read_flags, write_flags);
 }
 
 static int coroutine_fn raw_co_copy_range_to(BlockDriverState *bs,
-                                             BdrvChild *src, uint64_t src_offset,
-                                             BdrvChild *dst, uint64_t dst_offset,
-                                             uint64_t bytes, BdrvRequestFlags flags)
+                                             BdrvChild *src,
+                                             uint64_t src_offset,
+                                             BdrvChild *dst,
+                                             uint64_t dst_offset,
+                                             uint64_t bytes,
+                                             BdrvRequestFlags read_flags,
+                                             BdrvRequestFlags write_flags)
 {
     int ret;
 
@@ -524,7 +532,7 @@ static int coroutine_fn raw_co_copy_range_to(BlockDriverState *bs,
         return ret;
     }
     return bdrv_co_copy_range_to(src, src_offset, bs->file, dst_offset, bytes,
-                                 flags);
+                                 read_flags, write_flags);
 }
 
 BlockDriver bdrv_raw = {
diff --git a/qemu-img.c b/qemu-img.c
index e1a506f7f6..e417c872db 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1783,7 +1783,7 @@ static int coroutine_fn convert_co_copy_range(ImgConvertState *s, int64_t sector
 
         ret = blk_co_copy_range(blk, offset, s->target,
                                 sector_num << BDRV_SECTOR_BITS,
-                                n << BDRV_SECTOR_BITS, 0);
+                                n << BDRV_SECTOR_BITS, 0, 0);
         if (ret < 0) {
             return ret;
         }
-- 
2.11.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH v3 3/4] block: add BDRV_REQ_SERIALISING flag
  2018-07-05  7:46 [Qemu-devel] [PATCH v3 0/4] fix image fleecing Vladimir Sementsov-Ogievskiy
  2018-07-05  7:46 ` [Qemu-devel] [PATCH v3 1/4] block: disallow BDRV_REQ_NO_SERIALISING for write Vladimir Sementsov-Ogievskiy
  2018-07-05  7:46 ` [Qemu-devel] [PATCH v3 2/4] block: split flags in copy_range Vladimir Sementsov-Ogievskiy
@ 2018-07-05  7:46 ` Vladimir Sementsov-Ogievskiy
  2018-07-06 21:52   ` Eric Blake
  2018-07-05  7:46 ` [Qemu-devel] [PATCH v3 4/4] block/backup: fix fleecing scheme: use serialized writes Vladimir Sementsov-Ogievskiy
  2018-07-06  7:17 ` [Qemu-devel] [PATCH v3 0/4] fix image fleecing Fam Zheng
  4 siblings, 1 reply; 13+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-05  7:46 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: pl, pbonzini, ronniesahlberg, famz, stefanha, mreitz, kwolf,
	jcody, jsnow, den, vsementsov

Serialized writes should be used in copy-on-write of backup(sync=none)
for image fleecing scheme.

We need to change an assert in bdrv_aligned_pwritev, added in
28de2dcd88de. The assert may fail now, because call to
wait_serialising_requests here may become first call to it for this
request with serializing flag set. It occurs if the request is aligned
(otherwise, we should already set serializing flag before calling
bdrv_aligned_pwritev and correspondingly waited for all intersecting
requests). However, for aligned requests, we should not care about
outdating of previously read data, as there no such data. Therefore,
let's just update an assert to not care about aligned requests.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/block.h | 15 ++++++++++++++-
 block/io.c            | 26 +++++++++++++++++++++++++-
 2 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 478ebc6c6c..fded1b7657 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -71,8 +71,21 @@ typedef enum {
      * content. */
     BDRV_REQ_WRITE_UNCHANGED    = 0x40,
 
+    /* BDRV_REQ_SERIALISING forces request serializing. Only for writes. Used
+     * to serialize writes to target in backup process, when source is in
+     * backing chain of target (image fleecing scheme is example) to avoid a
+     * possibility for a client, reading from target during backup to read
+     * updated data from source in case of unhappy race of client-read and
+     * backup-cow-write.
+     *
+     * Note, that BDRV_REQ_SERIALISING is _not_ opposite in meaning to
+     * BDRV_REQ_NO_SERIALISING. May be, better name for the latter is
+     * _DO_NOT_WAIT_FOR_SERIALISING, but it is too long.
+     */
+    BDRV_REQ_SERIALISING        = 0x80,
+
     /* Mask of valid flags */
-    BDRV_REQ_MASK               = 0x7f,
+    BDRV_REQ_MASK               = 0xff,
 } BdrvRequestFlags;
 
 typedef struct BlockSizes {
diff --git a/block/io.c b/block/io.c
index b66764d6c4..ca71bde3a8 100644
--- a/block/io.c
+++ b/block/io.c
@@ -623,6 +623,16 @@ static void mark_request_serialising(BdrvTrackedRequest *req, uint64_t align)
     req->overlap_bytes = MAX(req->overlap_bytes, overlap_bytes);
 }
 
+static bool is_request_serialising_and_aligned(BdrvTrackedRequest *req)
+{
+    /* if request is serialising, overlap_offset and overlap_bytes are set, so
+     * we can check is request aligned. Otherwise don't care and return false
+     */
+
+    return req->serialising && (req->offset == req->overlap_offset) &&
+           (req->bytes == req->overlap_bytes);
+}
+
 /**
  * Round a region to cluster boundaries
  */
@@ -1291,6 +1301,9 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child,
         mark_request_serialising(req, bdrv_get_cluster_size(bs));
     }
 
+    /* BDRV_REQ_SERIALISING is only for write operation */
+    assert(!(flags & BDRV_REQ_SERIALISING));
+
     if (!(flags & BDRV_REQ_NO_SERIALISING)) {
         wait_serialising_requests(req);
     }
@@ -1574,8 +1587,14 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
 
     /* BDRV_REQ_NO_SERIALISING is only for read operation */
     assert(!(flags & BDRV_REQ_NO_SERIALISING));
+
+    if (flags & BDRV_REQ_SERIALISING) {
+        mark_request_serialising(req, bdrv_get_cluster_size(bs));
+    }
+
     waited = wait_serialising_requests(req);
-    assert(!waited || !req->serialising);
+    assert(!waited || !req->serialising ||
+           is_request_serialising_and_aligned(req));
     assert(req->overlap_offset <= offset);
     assert(offset + bytes <= req->overlap_offset + req->overlap_bytes);
     if (flags & BDRV_REQ_WRITE_UNCHANGED) {
@@ -2929,12 +2948,17 @@ static int coroutine_fn bdrv_co_copy_range_internal(
     tracked_request_begin(&dst_req, dst->bs, dst_offset,
                           bytes, BDRV_TRACKED_WRITE);
 
+    /* BDRV_REQ_SERIALISING is only for write operation */
+    assert(!(read_flags & BDRV_REQ_SERIALISING));
     if (!(read_flags & BDRV_REQ_NO_SERIALISING)) {
         wait_serialising_requests(&src_req);
     }
 
     /* BDRV_REQ_NO_SERIALISING is only for read */
     assert(!(write_flags & BDRV_REQ_NO_SERIALISING));
+    if (write_flags & BDRV_REQ_SERIALISING) {
+        mark_request_serialising(&dst_req, bdrv_get_cluster_size(dst->bs));
+    }
     wait_serialising_requests(&dst_req);
 
     if (recurse_src) {
-- 
2.11.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH v3 4/4] block/backup: fix fleecing scheme: use serialized writes
  2018-07-05  7:46 [Qemu-devel] [PATCH v3 0/4] fix image fleecing Vladimir Sementsov-Ogievskiy
                   ` (2 preceding siblings ...)
  2018-07-05  7:46 ` [Qemu-devel] [PATCH v3 3/4] block: add BDRV_REQ_SERIALISING flag Vladimir Sementsov-Ogievskiy
@ 2018-07-05  7:46 ` Vladimir Sementsov-Ogievskiy
  2018-07-06  7:17 ` [Qemu-devel] [PATCH v3 0/4] fix image fleecing Fam Zheng
  4 siblings, 0 replies; 13+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-05  7:46 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: pl, pbonzini, ronniesahlberg, famz, stefanha, mreitz, kwolf,
	jcody, jsnow, den, vsementsov

Fleecing scheme works as follows: we want a kind of temporary snapshot
of active drive A. We create temporary image B, with B->backing = A.
Then we start backup(sync=none) from A to B. From this point, B reads
as point-in-time snapshot of A (A continues to be active drive,
accepting guest IO).

This scheme needs some additional synchronization between reads from B
and backup COW operations, otherwise, the following situation is
theoretically possible:

(assume B is qcow2, client is NBD client, reading from B)

1. client starts reading and take qcow2 mutex in qcow2_co_preadv, and
   goes up to l2 table loading (assume cache miss)

2) guest write => backup COW => qcow2 write =>
   try to take qcow2 mutex => waiting

3. l2 table loaded, we see that cluster is UNALLOCATED, go to
   "case QCOW2_CLUSTER_UNALLOCATED" and unlock mutex before
   bdrv_co_preadv(bs->backing, ...)

4) aha, mutex unlocked, backup COW continues, and we finally finish
   guest write and change cluster in our active disk A

5. actually, do bdrv_co_preadv(bs->backing, ...) and read
   _new updated_ data.

To avoid this, let's make backup writes serializing, to not intersect
with reads from B.

Note: we expand range of handled cases from (sync=none and
B->backing = A) to just (A in backing chain of B), to finally allow
safe reading from B during backup for all cases when A in backing chain
of B, i.e. B formally looks like point-in-time snapshot of A.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/backup.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index f3e4e814b6..319fc922e8 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -47,6 +47,8 @@ typedef struct BackupBlockJob {
     HBitmap *copy_bitmap;
     bool use_copy_range;
     int64_t copy_range_size;
+
+    bool serialize_target_writes;
 } BackupBlockJob;
 
 static const BlockJobDriver backup_job_driver;
@@ -102,6 +104,8 @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
     QEMUIOVector qiov;
     BlockBackend *blk = job->common.blk;
     int nbytes;
+    int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
+    int write_flags = job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0;
 
     hbitmap_reset(job->copy_bitmap, start / job->cluster_size, 1);
     nbytes = MIN(job->cluster_size, job->len - start);
@@ -112,8 +116,7 @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
     iov.iov_len = nbytes;
     qemu_iovec_init_external(&qiov, &iov, 1);
 
-    ret = blk_co_preadv(blk, start, qiov.size, &qiov,
-                        is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0);
+    ret = blk_co_preadv(blk, start, qiov.size, &qiov, read_flags);
     if (ret < 0) {
         trace_backup_do_cow_read_fail(job, start, ret);
         if (error_is_read) {
@@ -124,11 +127,11 @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
 
     if (qemu_iovec_is_zero(&qiov)) {
         ret = blk_co_pwrite_zeroes(job->target, start,
-                                   qiov.size, BDRV_REQ_MAY_UNMAP);
+                                   qiov.size, write_flags | BDRV_REQ_MAY_UNMAP);
     } else {
         ret = blk_co_pwritev(job->target, start,
-                             qiov.size, &qiov,
-                             job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0);
+                             qiov.size, &qiov, write_flags |
+                             (job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0));
     }
     if (ret < 0) {
         trace_backup_do_cow_write_fail(job, start, ret);
@@ -156,6 +159,8 @@ static int coroutine_fn backup_cow_with_offload(BackupBlockJob *job,
     int nr_clusters;
     BlockBackend *blk = job->common.blk;
     int nbytes;
+    int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
+    int write_flags = job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0;
 
     assert(QEMU_IS_ALIGNED(job->copy_range_size, job->cluster_size));
     nbytes = MIN(job->copy_range_size, end - start);
@@ -163,7 +168,7 @@ static int coroutine_fn backup_cow_with_offload(BackupBlockJob *job,
     hbitmap_reset(job->copy_bitmap, start / job->cluster_size,
                   nr_clusters);
     ret = blk_co_copy_range(blk, start, job->target, start, nbytes,
-                            is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0, 0);
+                            read_flags, write_flags);
     if (ret < 0) {
         trace_backup_do_cow_copy_range_fail(job, start, ret);
         hbitmap_set(job->copy_bitmap, start / job->cluster_size,
@@ -701,6 +706,9 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
                        sync_bitmap : NULL;
     job->compress = compress;
 
+    /* Detect image-fleecing (and similar) schemes */
+    job->serialize_target_writes = bdrv_chain_contains(target, bs);
+
     /* If there is no backing file on the target, we cannot rely on COW if our
      * backup cluster size is smaller than the target cluster size. Even for
      * targets with a backing file, try to avoid COW if possible. */
-- 
2.11.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/4] fix image fleecing
  2018-07-05  7:46 [Qemu-devel] [PATCH v3 0/4] fix image fleecing Vladimir Sementsov-Ogievskiy
                   ` (3 preceding siblings ...)
  2018-07-05  7:46 ` [Qemu-devel] [PATCH v3 4/4] block/backup: fix fleecing scheme: use serialized writes Vladimir Sementsov-Ogievskiy
@ 2018-07-06  7:17 ` Fam Zheng
  2018-07-06 13:41   ` Vladimir Sementsov-Ogievskiy
  4 siblings, 1 reply; 13+ messages in thread
From: Fam Zheng @ 2018-07-06  7:17 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel, qemu-block, pl, pbonzini, ronniesahlberg, stefanha,
	mreitz, kwolf, jcody, jsnow, den

On Thu, 07/05 10:46, Vladimir Sementsov-Ogievskiy wrote:
> Hi all.
> 
> This fixes image fleecing scheme for 3.0, details are in 04 patch.

Looks like this breaks 'test-replication':

http://patchew.org/QEMU/20180705074638.770905-1-vsementsov@virtuozzo.com/

test-replication: /stor/work/qemu/block/io.c:725: wait_serialising_requests: Assertion `qemu_coroutine_self() != req->co' failed.
GTester: last random seed: R02Se49967625f19c5be9918a2503fb2fff5
test-replication: /stor/work/qemu/block/io.c:725: wait_serialising_requests: Assertion `qemu_coroutine_self() != req->co' failed.
  GTESTER tests/test-qht-par
GTester: last random seed: R02S8fad2bb2daddc51eb51623bb7c86cb67
test-replication: /stor/work/qemu/block/io.c:725: wait_serialising_requests: Assertion `qemu_coroutine_self() != req->co' failed.
GTester: last random seed: R02S024718e9b1e2b1facafe254c43e1430b

Fam

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/4] fix image fleecing
  2018-07-06  7:17 ` [Qemu-devel] [PATCH v3 0/4] fix image fleecing Fam Zheng
@ 2018-07-06 13:41   ` Vladimir Sementsov-Ogievskiy
  2018-07-06 14:34     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 13+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-06 13:41 UTC (permalink / raw)
  To: Fam Zheng
  Cc: qemu-devel, qemu-block, pl, pbonzini, ronniesahlberg, stefanha,
	mreitz, kwolf, jcody, jsnow, den

06.07.2018 10:17, Fam Zheng wrote:
> On Thu, 07/05 10:46, Vladimir Sementsov-Ogievskiy wrote:
>> Hi all.
>>
>> This fixes image fleecing scheme for 3.0, details are in 04 patch.
> Looks like this breaks 'test-replication':
>
> http://patchew.org/QEMU/20180705074638.770905-1-vsementsov@virtuozzo.com/
>
> test-replication: /stor/work/qemu/block/io.c:725: wait_serialising_requests: Assertion `qemu_coroutine_self() != req->co' failed.
> GTester: last random seed: R02Se49967625f19c5be9918a2503fb2fff5
> test-replication: /stor/work/qemu/block/io.c:725: wait_serialising_requests: Assertion `qemu_coroutine_self() != req->co' failed.
>    GTESTER tests/test-qht-par
> GTester: last random seed: R02S8fad2bb2daddc51eb51623bb7c86cb67
> test-replication: /stor/work/qemu/block/io.c:725: wait_serialising_requests: Assertion `qemu_coroutine_self() != req->co' failed.
> GTester: last random seed: R02S024718e9b1e2b1facafe254c43e1430b
>
> Fam

interesting.

in gdb I see, that both requests are identical (i.e. their 
BdrvTrackedRequeststructs, except list filed)writes (self and req)...

and it's actually the same request, duplicated in the ist, first it is 
created in bdrv_co_copy_range_internal,

and than again,
#0  tracked_request_begin (req=0x7fffe8f9e7d0, bs=0x555555a43040, 
offset=33554432, bytes=33554432, type=
     BDRV_TRACKED_WRITE) at block/io.c:598
#1  0x00005555555f76bc in bdrv_co_copy_range_internal 
(src=0x55555597ef80, src_offset=33882112,
     dst=0x5555559f1d30, dst_offset=33554432, bytes=33554432, 
read_flags=BDRV_REQ_NO_SERIALISING,
     write_flags=BDRV_REQ_SERIALISING, recurse_src=true) at block/io.c:2949
#2  0x00005555555f78e9 in bdrv_co_copy_range_from (src=0x55555597ef80, 
src_offset=33882112,
     dst=0x5555559f1d30, dst_offset=33554432, bytes=33554432, 
read_flags=BDRV_REQ_NO_SERIALISING,
     write_flags=BDRV_REQ_SERIALISING) at block/io.c:2995
#3  0x00005555555ae931 in qcow2_co_copy_range_from (bs=0x555555977ff0, 
src=0x5555559f14d0,
     src_offset=33554432, dst=0x5555559f1d30, dst_offset=33554432, 
bytes=33554432,
     read_flags=BDRV_REQ_NO_SERIALISING, 
write_flags=BDRV_REQ_SERIALISING) at block/qcow2.c:3311
#4  0x00005555555f77cf in bdrv_co_copy_range_internal 
(src=0x5555559f14d0, src_offset=33554432,
     dst=0x5555559f1d30, dst_offset=33554432, bytes=33554432, 
read_flags=BDRV_REQ_NO_SERIALISING,
     write_flags=BDRV_REQ_SERIALISING, recurse_src=true) at block/io.c:2966


so, it's actually bug in copy_range architecture.

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/4] fix image fleecing
  2018-07-06 13:41   ` Vladimir Sementsov-Ogievskiy
@ 2018-07-06 14:34     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 13+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-06 14:34 UTC (permalink / raw)
  To: Fam Zheng
  Cc: qemu-devel, qemu-block, pl, pbonzini, ronniesahlberg, stefanha,
	mreitz, kwolf, jcody, jsnow, den

06.07.2018 16:41, Vladimir Sementsov-Ogievskiy wrote:
> 06.07.2018 10:17, Fam Zheng wrote:
>> On Thu, 07/05 10:46, Vladimir Sementsov-Ogievskiy wrote:
>>> Hi all.
>>>
>>> This fixes image fleecing scheme for 3.0, details are in 04 patch.
>> Looks like this breaks 'test-replication':
>>
>> http://patchew.org/QEMU/20180705074638.770905-1-vsementsov@virtuozzo.com/ 
>>
>>
>> test-replication: /stor/work/qemu/block/io.c:725: 
>> wait_serialising_requests: Assertion `qemu_coroutine_self() != 
>> req->co' failed.
>> GTester: last random seed: R02Se49967625f19c5be9918a2503fb2fff5
>> test-replication: /stor/work/qemu/block/io.c:725: 
>> wait_serialising_requests: Assertion `qemu_coroutine_self() != 
>> req->co' failed.
>>    GTESTER tests/test-qht-par
>> GTester: last random seed: R02S8fad2bb2daddc51eb51623bb7c86cb67
>> test-replication: /stor/work/qemu/block/io.c:725: 
>> wait_serialising_requests: Assertion `qemu_coroutine_self() != 
>> req->co' failed.
>> GTester: last random seed: R02S024718e9b1e2b1facafe254c43e1430b
>>
>> Fam
>
> interesting.
>
> in gdb I see, that both requests are identical (i.e. their 
> BdrvTrackedRequeststructs, except list filed)writes (self and req)...
>
> and it's actually the same request, duplicated in the ist, first it is 
> created in bdrv_co_copy_range_internal,
>
> and than again,
> #0  tracked_request_begin (req=0x7fffe8f9e7d0, bs=0x555555a43040, 
> offset=33554432, bytes=33554432, type=
>     BDRV_TRACKED_WRITE) at block/io.c:598
> #1  0x00005555555f76bc in bdrv_co_copy_range_internal 
> (src=0x55555597ef80, src_offset=33882112,
>     dst=0x5555559f1d30, dst_offset=33554432, bytes=33554432, 
> read_flags=BDRV_REQ_NO_SERIALISING,
>     write_flags=BDRV_REQ_SERIALISING, recurse_src=true) at 
> block/io.c:2949
> #2  0x00005555555f78e9 in bdrv_co_copy_range_from (src=0x55555597ef80, 
> src_offset=33882112,
>     dst=0x5555559f1d30, dst_offset=33554432, bytes=33554432, 
> read_flags=BDRV_REQ_NO_SERIALISING,
>     write_flags=BDRV_REQ_SERIALISING) at block/io.c:2995
> #3  0x00005555555ae931 in qcow2_co_copy_range_from (bs=0x555555977ff0, 
> src=0x5555559f14d0,
>     src_offset=33554432, dst=0x5555559f1d30, dst_offset=33554432, 
> bytes=33554432,
>     read_flags=BDRV_REQ_NO_SERIALISING, 
> write_flags=BDRV_REQ_SERIALISING) at block/qcow2.c:3311
> #4  0x00005555555f77cf in bdrv_co_copy_range_internal 
> (src=0x5555559f14d0, src_offset=33554432,
>     dst=0x5555559f1d30, dst_offset=33554432, bytes=33554432, 
> read_flags=BDRV_REQ_NO_SERIALISING,
>     write_flags=BDRV_REQ_SERIALISING, recurse_src=true) at 
> block/io.c:2966
>
>
> so, it's actually bug in copy_range architecture.
>

will fix with the next resend..

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/4] block: disallow BDRV_REQ_NO_SERIALISING for write
  2018-07-05  7:46 ` [Qemu-devel] [PATCH v3 1/4] block: disallow BDRV_REQ_NO_SERIALISING for write Vladimir Sementsov-Ogievskiy
@ 2018-07-06 21:32   ` Eric Blake
  2018-07-06 21:46     ` Eric Blake
  2018-07-09 15:56     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 2 replies; 13+ messages in thread
From: Eric Blake @ 2018-07-06 21:32 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: kwolf, famz, ronniesahlberg, jcody, pl, mreitz, stefanha, den,
	pbonzini, jsnow

On 07/05/2018 02:46 AM, Vladimir Sementsov-Ogievskiy wrote:
> Before commit 9ded4a01149 "backup: Use copy offloading",
> BDRV_REQ_NO_SERIALISING was used for only one case: read in
> copy-on-write operation during backup. Also, the flag was handled only
> on read path (in bdrv_co_preadv and bdrv_aligned_preadv).
> 
> After 9ded4a01149, flag is used for not waiting serializing operations
> on backup target (in same case of copy-on-write operation). This
> behavior change is unsubstantiated and potentially dangerous, let's
> drop it.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   include/block/block.h | 13 +++++++++++++
>   block/io.c            |  7 ++++++-
>   2 files changed, 19 insertions(+), 1 deletion(-)
> 

Commenting only on the grammar:

> diff --git a/include/block/block.h b/include/block/block.h
> index e5c7759a0c..a06a4d27de 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -50,6 +50,19 @@ typedef enum {
>        * opened with BDRV_O_UNMAP.
>        */
>       BDRV_REQ_MAY_UNMAP          = 0x4,
> +
> +    /* The BDRV_REQ_NO_SERIALISING means that we don't want to
> +     * wait_serialising_requests(), when reading.

Either:

/* BDRV_REQ_NO_SERALISING means that...

or

/* The BDRV_REQ_NO_SERIALISING flag means that...

s/want to/want/

> +     *
> +     * This flag is used for backup copy on write operation, when we need to
> +     * read old data before write (write notifier triggered). It is ok, due to
> +     * we already waited for serializing requests in initiative write (see
> +     * bdrv_aligned_pwritev), and it is necessary for the case when initiative
> +     * write is serializing itself (we'll dead lock waiting it).

It is okay since we already waited for other serializing requests in the 
initiating write (see bdrv_aligned_pwritev), and it is necessary since 
the initiating write is already serializing (without the flag, the read 
would deadlock waiting for the write to complete).

> +     *
> +     * The described case is the only usage for the flag for now, so, it is
> +     * supported only for read operation and restricted for write.

This last sentence is rather wordy; I'm fine with just:

The flag is only valid during read operations.

> +     */
>       BDRV_REQ_NO_SERIALISING     = 0x8,
>       BDRV_REQ_FUA                = 0x10,
>       BDRV_REQ_WRITE_COMPRESSED   = 0x20,

We're inconsistent on which flags we document; it might be nice to have 
a comment for each of them.  But not necessarily this patch's problem.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/4] block: disallow BDRV_REQ_NO_SERIALISING for write
  2018-07-06 21:32   ` Eric Blake
@ 2018-07-06 21:46     ` Eric Blake
  2018-07-09 16:08       ` Vladimir Sementsov-Ogievskiy
  2018-07-09 15:56     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 13+ messages in thread
From: Eric Blake @ 2018-07-06 21:46 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: kwolf, famz, stefanha, jcody, pl, mreitz, ronniesahlberg,
	pbonzini, den, jsnow

On 07/06/2018 04:32 PM, Eric Blake wrote:
> On 07/05/2018 02:46 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Before commit 9ded4a01149 "backup: Use copy offloading",
>> BDRV_REQ_NO_SERIALISING was used for only one case: read in
>> copy-on-write operation during backup. Also, the flag was handled only
>> on read path (in bdrv_co_preadv and bdrv_aligned_preadv).
>>
>> After 9ded4a01149, flag is used for not waiting serializing operations
>> on backup target (in same case of copy-on-write operation). This
>> behavior change is unsubstantiated and potentially dangerous, let's
>> drop it.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   include/block/block.h | 13 +++++++++++++
>>   block/io.c            |  7 ++++++-
>>   2 files changed, 19 insertions(+), 1 deletion(-)
>>
> 
> Commenting only on the grammar:
> 
>> diff --git a/include/block/block.h b/include/block/block.h
>> index e5c7759a0c..a06a4d27de 100644
>> --- a/include/block/block.h
>> +++ b/include/block/block.h
>> @@ -50,6 +50,19 @@ typedef enum {
>>        * opened with BDRV_O_UNMAP.
>>        */
>>       BDRV_REQ_MAY_UNMAP          = 0x4,
>> +
>> +    /* The BDRV_REQ_NO_SERIALISING means that we don't want to
>> +     * wait_serialising_requests(), when reading.
> 
> Either:
> 
> /* BDRV_REQ_NO_SERALISING means that...
> 
> or
> 
> /* The BDRV_REQ_NO_SERIALISING flag means that...
> 
> s/want to/want/

Or, after reading patch 3/4,

BDRV_REQ_NO_SERAILISING bypasses request serialisation during reads.

(where the counterpart starts:

BDRV_REQ_SERIALISING forces request serialisation during writes.
)

> 
>> +     *
>> +     * This flag is used for backup copy on write operation, when we 
>> need to
>> +     * read old data before write (write notifier triggered). It is 
>> ok, due to
>> +     * we already waited for serializing requests in initiative write 
>> (see
>> +     * bdrv_aligned_pwritev), and it is necessary for the case when 
>> initiative
>> +     * write is serializing itself (we'll dead lock waiting it).
> 
> It is okay since we already waited for other serializing requests in the 
> initiating write (see bdrv_aligned_pwritev), and it is necessary since 
> the initiating write is already serializing (without the flag, the read 
> would deadlock waiting for the write to complete).
> 
>> +     *
>> +     * The described case is the only usage for the flag for now, so, 
>> it is
>> +     * supported only for read operation and restricted for write.
> 
> This last sentence is rather wordy; I'm fine with just:
> 
> The flag is only valid during read operations.

Or even just drop this last paragraph, since the first sentence of the 
comment already stated it was only for reads.

> 
>> +     */
>>       BDRV_REQ_NO_SERIALISING     = 0x8,
>>       BDRV_REQ_FUA                = 0x10,
>>       BDRV_REQ_WRITE_COMPRESSED   = 0x20,
> 
> We're inconsistent on which flags we document; it might be nice to have 
> a comment for each of them.  But not necessarily this patch's problem.
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v3 3/4] block: add BDRV_REQ_SERIALISING flag
  2018-07-05  7:46 ` [Qemu-devel] [PATCH v3 3/4] block: add BDRV_REQ_SERIALISING flag Vladimir Sementsov-Ogievskiy
@ 2018-07-06 21:52   ` Eric Blake
  0 siblings, 0 replies; 13+ messages in thread
From: Eric Blake @ 2018-07-06 21:52 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: kwolf, famz, ronniesahlberg, jcody, pl, mreitz, stefanha, den,
	pbonzini, jsnow

On 07/05/2018 02:46 AM, Vladimir Sementsov-Ogievskiy wrote:
> Serialized writes should be used in copy-on-write of backup(sync=none)
> for image fleecing scheme.
> 
> We need to change an assert in bdrv_aligned_pwritev, added in
> 28de2dcd88de. The assert may fail now, because call to
> wait_serialising_requests here may become first call to it for this
> request with serializing flag set. It occurs if the request is aligned
> (otherwise, we should already set serializing flag before calling
> bdrv_aligned_pwritev and correspondingly waited for all intersecting
> requests). However, for aligned requests, we should not care about
> outdating of previously read data, as there no such data. Therefore,
> let's just update an assert to not care about aligned requests.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   include/block/block.h | 15 ++++++++++++++-
>   block/io.c            | 26 +++++++++++++++++++++++++-
>   2 files changed, 39 insertions(+), 2 deletions(-)
> 

Again just grammar suggestions:

> diff --git a/include/block/block.h b/include/block/block.h
> index 478ebc6c6c..fded1b7657 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -71,8 +71,21 @@ typedef enum {
>        * content. */
>       BDRV_REQ_WRITE_UNCHANGED    = 0x40,
>   
> +    /* BDRV_REQ_SERIALISING forces request serializing. Only for writes. 

After comparison with patch 1/4, I'd suggest:

BDRV_REQ_SERIALISING forces request serialisation for writes.

>  Used
> +     * to serialize writes to target in backup process, when source is in
> +     * backing chain of target (image fleecing scheme is example) to avoid a
> +     * possibility for a client, reading from target during backup to read
> +     * updated data from source in case of unhappy race of client-read and
> +     * backup-cow-write.

It is used to ensure that writes to the backing file of a backup process 
target cannot race with a read of the backup target that defers to the 
backing file.


> +     *
> +     * Note, that BDRV_REQ_SERIALISING is _not_ opposite in meaning to
> +     * BDRV_REQ_NO_SERIALISING. May be, better name for the latter is
> +     * _DO_NOT_WAIT_FOR_SERIALISING, but it is too long.

The longer rationale on naming might fit better in the commit comment; 
I'd probably drop this paragraph from the code, But if you keep it, the 
second sentence reads better as:

A more descriptive name for the latter might be 
_DO_NOT_WAIT_FOR_SERIALISING, except that is too long.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/4] block: disallow BDRV_REQ_NO_SERIALISING for write
  2018-07-06 21:32   ` Eric Blake
  2018-07-06 21:46     ` Eric Blake
@ 2018-07-09 15:56     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 13+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-09 15:56 UTC (permalink / raw)
  To: Eric Blake, qemu-devel, qemu-block
  Cc: kwolf, famz, ronniesahlberg, jcody, pl, mreitz, stefanha, den,
	pbonzini, jsnow

07.07.2018 00:32, Eric Blake wrote:
> On 07/05/2018 02:46 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Before commit 9ded4a01149 "backup: Use copy offloading",
>> BDRV_REQ_NO_SERIALISING was used for only one case: read in
>> copy-on-write operation during backup. Also, the flag was handled only
>> on read path (in bdrv_co_preadv and bdrv_aligned_preadv).
>>
>> After 9ded4a01149, flag is used for not waiting serializing operations
>> on backup target (in same case of copy-on-write operation). This
>> behavior change is unsubstantiated and potentially dangerous, let's
>> drop it.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   include/block/block.h | 13 +++++++++++++
>>   block/io.c            |  7 ++++++-
>>   2 files changed, 19 insertions(+), 1 deletion(-)
>>
>
> Commenting only on the grammar:
>
>> diff --git a/include/block/block.h b/include/block/block.h
>> index e5c7759a0c..a06a4d27de 100644
>> --- a/include/block/block.h
>> +++ b/include/block/block.h
>> @@ -50,6 +50,19 @@ typedef enum {
>>        * opened with BDRV_O_UNMAP.
>>        */
>>       BDRV_REQ_MAY_UNMAP          = 0x4,
>> +
>> +    /* The BDRV_REQ_NO_SERIALISING means that we don't want to
>> +     * wait_serialising_requests(), when reading.
>
> Either:
>
> /* BDRV_REQ_NO_SERALISING means that...
>
> or
>
> /* The BDRV_REQ_NO_SERIALISING flag means that...
>
> s/want to/want/
>
>> +     *
>> +     * This flag is used for backup copy on write operation, when we 
>> need to
>> +     * read old data before write (write notifier triggered). It is 
>> ok, due to
>> +     * we already waited for serializing requests in initiative 
>> write (see
>> +     * bdrv_aligned_pwritev), and it is necessary for the case when 
>> initiative
>> +     * write is serializing itself (we'll dead lock waiting it).
>
> It is okay since we already waited for other serializing requests in 
> the initiating write (see bdrv_aligned_pwritev), and it is necessary 
> since the initiating write is already serializing (without the flag, 
> the read would deadlock waiting for the write to complete).

it is necessary only for the case ..., not always, isn't it?

I'll use
It is okay since we already waited for other serializing requests in the 
initiating write (see bdrv_aligned_pwritev), and it is necessary if the 
initiating write is already serializing (without the flag, the read 
would deadlock waiting for the serialising write to complete).
>
>> +     *
>> +     * The described case is the only usage for the flag for now, 
>> so, it is
>> +     * supported only for read operation and restricted for write.
>
> This last sentence is rather wordy; I'm fine with just:
>
> The flag is only valid during read operations.

ok

>
>> +     */
>>       BDRV_REQ_NO_SERIALISING     = 0x8,
>>       BDRV_REQ_FUA                = 0x10,
>>       BDRV_REQ_WRITE_COMPRESSED   = 0x20,
>
> We're inconsistent on which flags we document; it might be nice to 
> have a comment for each of them.  But not necessarily this patch's 
> problem.
>


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/4] block: disallow BDRV_REQ_NO_SERIALISING for write
  2018-07-06 21:46     ` Eric Blake
@ 2018-07-09 16:08       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 13+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-09 16:08 UTC (permalink / raw)
  To: Eric Blake, qemu-devel, qemu-block
  Cc: kwolf, famz, stefanha, jcody, pl, mreitz, ronniesahlberg,
	pbonzini, den, jsnow

07.07.2018 00:46, Eric Blake wrote:
> On 07/06/2018 04:32 PM, Eric Blake wrote:
>> On 07/05/2018 02:46 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Before commit 9ded4a01149 "backup: Use copy offloading",
>>> BDRV_REQ_NO_SERIALISING was used for only one case: read in
>>> copy-on-write operation during backup. Also, the flag was handled only
>>> on read path (in bdrv_co_preadv and bdrv_aligned_preadv).
>>>
>>> After 9ded4a01149, flag is used for not waiting serializing operations
>>> on backup target (in same case of copy-on-write operation). This
>>> behavior change is unsubstantiated and potentially dangerous, let's
>>> drop it.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   include/block/block.h | 13 +++++++++++++
>>>   block/io.c            |  7 ++++++-
>>>   2 files changed, 19 insertions(+), 1 deletion(-)
>>>
>>
>> Commenting only on the grammar:
>>
>>> diff --git a/include/block/block.h b/include/block/block.h
>>> index e5c7759a0c..a06a4d27de 100644
>>> --- a/include/block/block.h
>>> +++ b/include/block/block.h
>>> @@ -50,6 +50,19 @@ typedef enum {
>>>        * opened with BDRV_O_UNMAP.
>>>        */
>>>       BDRV_REQ_MAY_UNMAP          = 0x4,
>>> +
>>> +    /* The BDRV_REQ_NO_SERIALISING means that we don't want to
>>> +     * wait_serialising_requests(), when reading.
>>
>> Either:
>>
>> /* BDRV_REQ_NO_SERALISING means that...
>>
>> or
>>
>> /* The BDRV_REQ_NO_SERIALISING flag means that...
>>
>> s/want to/want/
>
> Or, after reading patch 3/4,
>
> BDRV_REQ_NO_SERAILISING bypasses request serialisation during reads.
>
> (where the counterpart starts:
>
> BDRV_REQ_SERIALISING forces request serialisation during writes.
> )

hmm. "bypasses request serialisation during reads" - it's not quite 
right, I think.
NO_SERIALISING means only that we will not wait for other requests. 
However, our request may be serialising itself, so other requests will 
wait for it (which, actually don't really make sense, because they will 
have to wait for original write operation too, which starts before read 
and finishes after it). So, it's funny, but currently, all 
"NO_SERIALISING" requests are strictly serialising in fact (except that 
they intersect with original writes (but in fact, are done before them).

>
>>
>>> +     *
>>> +     * This flag is used for backup copy on write operation, when 
>>> we need to
>>> +     * read old data before write (write notifier triggered). It is 
>>> ok, due to
>>> +     * we already waited for serializing requests in initiative 
>>> write (see
>>> +     * bdrv_aligned_pwritev), and it is necessary for the case when 
>>> initiative
>>> +     * write is serializing itself (we'll dead lock waiting it).
>>
>> It is okay since we already waited for other serializing requests in 
>> the initiating write (see bdrv_aligned_pwritev), and it is necessary 
>> since the initiating write is already serializing (without the flag, 
>> the read would deadlock waiting for the write to complete).
>>
>>> +     *
>>> +     * The described case is the only usage for the flag for now, 
>>> so, it is
>>> +     * supported only for read operation and restricted for write.
>>
>> This last sentence is rather wordy; I'm fine with just:
>>
>> The flag is only valid during read operations.
>
> Or even just drop this last paragraph, since the first sentence of the 
> comment already stated it was only for reads.
>
>>
>>> +     */
>>>       BDRV_REQ_NO_SERIALISING     = 0x8,
>>>       BDRV_REQ_FUA                = 0x10,
>>>       BDRV_REQ_WRITE_COMPRESSED   = 0x20,
>>
>> We're inconsistent on which flags we document; it might be nice to 
>> have a comment for each of them.  But not necessarily this patch's 
>> problem.
>>
>


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2018-07-09 16:09 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-05  7:46 [Qemu-devel] [PATCH v3 0/4] fix image fleecing Vladimir Sementsov-Ogievskiy
2018-07-05  7:46 ` [Qemu-devel] [PATCH v3 1/4] block: disallow BDRV_REQ_NO_SERIALISING for write Vladimir Sementsov-Ogievskiy
2018-07-06 21:32   ` Eric Blake
2018-07-06 21:46     ` Eric Blake
2018-07-09 16:08       ` Vladimir Sementsov-Ogievskiy
2018-07-09 15:56     ` Vladimir Sementsov-Ogievskiy
2018-07-05  7:46 ` [Qemu-devel] [PATCH v3 2/4] block: split flags in copy_range Vladimir Sementsov-Ogievskiy
2018-07-05  7:46 ` [Qemu-devel] [PATCH v3 3/4] block: add BDRV_REQ_SERIALISING flag Vladimir Sementsov-Ogievskiy
2018-07-06 21:52   ` Eric Blake
2018-07-05  7:46 ` [Qemu-devel] [PATCH v3 4/4] block/backup: fix fleecing scheme: use serialized writes Vladimir Sementsov-Ogievskiy
2018-07-06  7:17 ` [Qemu-devel] [PATCH v3 0/4] fix image fleecing Fam Zheng
2018-07-06 13:41   ` Vladimir Sementsov-Ogievskiy
2018-07-06 14:34     ` Vladimir Sementsov-Ogievskiy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.