All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [RFC PATCH 0/8] qemu-img convert with copy offloading
@ 2018-03-29 11:09 Fam Zheng
  2018-03-29 11:09 ` [Qemu-devel] [RFC PATCH 1/8] block: Introduce bdrv_co_map_range API Fam Zheng
                   ` (9 more replies)
  0 siblings, 10 replies; 20+ messages in thread
From: Fam Zheng @ 2018-03-29 11:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, Fam Zheng, Kevin Wolf, Max Reitz, pbonzini,
	Stefan Hajnoczi, eblake

[Posting a preview RFC for the general idea discussion and internal API review.
Libiscsi support is being worked on in the meantime.]

This series introduces block layer API for copy offloading and makes use of it
in qemu-img convert.

For now we implemented the operation in local file protocol with
copy_file_range(2).  Besides that it's possible to add similar to iscsi, nfs
and potentially more.

As far as its usage goes, in addition to qemu-img convert, we can emulate
offloading in scsi-disk (EXTENDED COPY), and do similar to drive-mirror.

The new bdrv_co_map_range can also be an alternative way to implement format
drivers in the future, once we make block/io.c use it in preadv/pwritev paths.

Fam Zheng (8):
  block: Introduce bdrv_co_map_range API
  qcow2: Implement bdrv_co_map_range
  block: Introduce bdrv_co_copy_range
  file-posix: Implement bdrv_co_copy_range
  file-posix: Implement bdrv_co_map_range
  raw: Implement raw_co_map_range
  block-backend: Add blk_co_copy_range
  qemu-img: Convert with copy offloading

 block/block-backend.c          |   8 ++
 block/file-posix.c             |  92 +++++++++++++++++++-
 block/io.c                     | 192 +++++++++++++++++++++++++++++++++++++++++
 block/qcow2.c                  | 103 +++++++++++++++-------
 block/raw-format.c             |   9 ++
 include/block/block.h          |  12 ++-
 include/block/block_int.h      |  35 ++++++++
 include/block/raw-aio.h        |  10 ++-
 include/sysemu/block-backend.h |   4 +
 qemu-img.c                     |  45 +++++++++-
 10 files changed, 472 insertions(+), 38 deletions(-)

-- 
2.14.3

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] [RFC PATCH 1/8] block: Introduce bdrv_co_map_range API
  2018-03-29 11:09 [Qemu-devel] [RFC PATCH 0/8] qemu-img convert with copy offloading Fam Zheng
@ 2018-03-29 11:09 ` Fam Zheng
  2018-04-04 12:57   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2018-04-04 13:01   ` Stefan Hajnoczi
  2018-03-29 11:09 ` [Qemu-devel] [RFC PATCH 2/8] qcow2: Implement bdrv_co_map_range Fam Zheng
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 20+ messages in thread
From: Fam Zheng @ 2018-03-29 11:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, Fam Zheng, Kevin Wolf, Max Reitz, pbonzini,
	Stefan Hajnoczi, eblake

A little similar to bdrv_co_block_status but with the capability to do
allocation depending on the BDRV_REQ_ALLOCATE flag, this API is the
building block needed by copy offloading to work for format drivers.

The idea is the format drivers return "where" to copy from/to in its
underlying "file", then the block layer will drill down until it reaches
a protocol layer that can do offloaded copying.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block/io.c                | 44 ++++++++++++++++++++++++++++++++++++++++++++
 include/block/block.h     |  8 +++++++-
 include/block/block_int.h | 21 +++++++++++++++++++++
 3 files changed, 72 insertions(+), 1 deletion(-)

diff --git a/block/io.c b/block/io.c
index bd9a19a9c4..1b4cfcacb1 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2826,3 +2826,47 @@ void bdrv_unregister_buf(BlockDriverState *bs, void *host)
         bdrv_unregister_buf(child->bs, host);
     }
 }
+
+int coroutine_fn bdrv_co_map_range(BdrvChild *child, int64_t offset,
+                                   int64_t bytes, int64_t *pnum, int64_t *map,
+                                   BlockDriverState **file,
+                                   int flags)
+{
+    BlockDriverState *bs = child->bs;
+
+    if (bs->encrypted) {
+        return -ENOTSUP;
+    }
+    while (bs && bs->drv && bs->drv->bdrv_co_map_range) {
+        int64_t cur_pnum, cur_map;
+        BlockDriverState *cur_file = NULL;
+        int r = bs->drv->bdrv_co_map_range(bs, offset, bytes,
+                                           &cur_pnum, &cur_map, &cur_file,
+                                           flags);
+        if (r < 0) {
+            return r;
+        }
+        offset = cur_map;
+        bytes = cur_pnum;
+        if (pnum) {
+            *pnum = cur_pnum;
+        }
+        if (map) {
+            *map = cur_map;
+        }
+        if (!(r & BDRV_BLOCK_OFFSET_VALID)) {
+            return -ENOTSUP;
+        }
+        if (file) {
+            *file = cur_file;
+        }
+        if (!cur_file) {
+            return -ENOTSUP;
+        }
+        if (cur_file == bs) {
+            return r;
+        }
+        bs = cur_file;
+    }
+    return -ENOTSUP;
+}
diff --git a/include/block/block.h b/include/block/block.h
index cdec3639a3..21d3e9db32 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -53,9 +53,10 @@ typedef enum {
     BDRV_REQ_NO_SERIALISING     = 0x8,
     BDRV_REQ_FUA                = 0x10,
     BDRV_REQ_WRITE_COMPRESSED   = 0x20,
+    BDRV_REQ_ALLOCATE           = 0x40,
 
     /* Mask of valid flags */
-    BDRV_REQ_MASK               = 0x3f,
+    BDRV_REQ_MASK               = 0x7f,
 } BdrvRequestFlags;
 
 typedef struct BlockSizes {
@@ -604,4 +605,9 @@ bool bdrv_can_store_new_dirty_bitmap(BlockDriverState *bs, const char *name,
  */
 void bdrv_register_buf(BlockDriverState *bs, void *host, size_t size);
 void bdrv_unregister_buf(BlockDriverState *bs, void *host);
+
+int bdrv_co_map_range(BdrvChild *child, int64_t offset,
+                      int64_t bytes, int64_t *pnum, int64_t *map,
+                      BlockDriverState **file,
+                      int flags);
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index c4dd1d4bb8..3f3f6f3efd 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -475,6 +475,27 @@ struct BlockDriver {
      */
     void (*bdrv_register_buf)(BlockDriverState *bs, void *host, size_t size);
     void (*bdrv_unregister_buf)(BlockDriverState *bs, void *host);
+
+    /**
+     * Get or create offset mappings for a virtual range.
+     *
+     * @offset: the offset of the beginning of the range.
+     * @bytes: the maximum bytes of the range to map.
+     * @pnum: at return, this will point to the number of bytes in the returned
+     * mapping status
+     * @map: at return, this will point to the offset which the range maps to
+     * @file: at return, this will point to the file where the data is mapped.
+     * If it is set to bs, it means the mapping is terminal and the result can
+     * be used for read/write and copy_range; if it is NULL, it means the range
+     * doesn't map to a file, or no I/O to any file is necessary; otherwise, it
+     * means the query should be passed down to an underlying format/protocol.
+     * @flags: flags for the request. BDRV_REQ_ALLOCATE means the range
+     * should be allocated if necessary.
+     */
+    int (*bdrv_co_map_range)(BlockDriverState *bs, int64_t offset,
+                             int64_t bytes, int64_t *pnum, int64_t *map,
+                             BlockDriverState **file,
+                             int flags);
     QLIST_ENTRY(BlockDriver) list;
 };
 
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [RFC PATCH 2/8] qcow2: Implement bdrv_co_map_range
  2018-03-29 11:09 [Qemu-devel] [RFC PATCH 0/8] qemu-img convert with copy offloading Fam Zheng
  2018-03-29 11:09 ` [Qemu-devel] [RFC PATCH 1/8] block: Introduce bdrv_co_map_range API Fam Zheng
@ 2018-03-29 11:09 ` Fam Zheng
  2018-03-29 11:09 ` [Qemu-devel] [RFC PATCH 3/8] block: Introduce bdrv_co_copy_range Fam Zheng
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Fam Zheng @ 2018-03-29 11:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, Fam Zheng, Kevin Wolf, Max Reitz, pbonzini,
	Stefan Hajnoczi, eblake

Extract the common code from qcow2_co_block_status, then use it to
implement qcow2_co_map_range. The major difference is the allocation
logic, enabled by BDRV_REQ_ALLOCATE.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block/qcow2.c | 103 +++++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 73 insertions(+), 30 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 486f3e83b7..86bbeb69db 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1755,6 +1755,73 @@ static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
     return status;
 }
 
+static int qcow2_handle_l2meta(BlockDriverState *bs, QCowL2Meta *l2meta)
+{
+    int ret = 0;
+
+    while (l2meta != NULL) {
+        QCowL2Meta *next;
+
+        if (!ret) {
+            ret = qcow2_alloc_cluster_link_l2(bs, l2meta);
+        }
+
+        /* Take the request off the list of running requests */
+        if (l2meta->nb_clusters != 0) {
+            QLIST_REMOVE(l2meta, next_in_flight);
+        }
+
+        qemu_co_queue_restart_all(&l2meta->dependent_requests);
+
+        next = l2meta->next;
+        g_free(l2meta);
+        l2meta = next;
+    }
+    return ret;
+}
+
+static int coroutine_fn qcow2_co_map_range(BlockDriverState *bs, int64_t offset,
+                                           int64_t bytes, int64_t *pnum, int64_t *map,
+                                           BlockDriverState **file,
+                                           int flags)
+{
+    BDRVQcow2State *s = bs->opaque;
+    int ret;
+    uint64_t cluster_offset;
+    unsigned int cur_bytes = MIN(bytes, INT_MAX);
+    int offset_in_cluster;
+    QCowL2Meta *l2meta = NULL;
+
+    if (!(flags & BDRV_REQ_ALLOCATE)) {
+        return qcow2_co_block_status(bs, true, offset, bytes, pnum, map, file);
+    }
+    qemu_co_mutex_lock(&s->lock);
+    ret = qcow2_alloc_cluster_offset(bs, offset, &cur_bytes,
+                                     &cluster_offset, &l2meta);
+    if (ret < 0) {
+        goto out;
+    }
+
+    offset_in_cluster = offset_into_cluster(s, offset);
+    ret = qcow2_pre_write_overlap_check(bs, 0,
+                                        cluster_offset + offset_in_cluster,
+                                        cur_bytes);
+    if (ret < 0) {
+        goto out;
+    }
+    ret = qcow2_handle_l2meta(bs, l2meta);
+    if (ret < 0) {
+        goto out;
+    }
+    *pnum = cur_bytes;
+    *map = cluster_offset + offset_in_cluster;
+    *file = bs->file->bs;
+    ret = BDRV_BLOCK_OFFSET_VALID | BDRV_BLOCK_DATA;
+out:
+    qemu_co_mutex_unlock(&s->lock);
+    return ret;
+}
+
 static coroutine_fn int qcow2_co_preadv(BlockDriverState *bs, uint64_t offset,
                                         uint64_t bytes, QEMUIOVector *qiov,
                                         int flags)
@@ -2041,24 +2108,10 @@ static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
             }
         }
 
-        while (l2meta != NULL) {
-            QCowL2Meta *next;
-
-            ret = qcow2_alloc_cluster_link_l2(bs, l2meta);
-            if (ret < 0) {
-                goto fail;
-            }
-
-            /* Take the request off the list of running requests */
-            if (l2meta->nb_clusters != 0) {
-                QLIST_REMOVE(l2meta, next_in_flight);
-            }
-
-            qemu_co_queue_restart_all(&l2meta->dependent_requests);
-
-            next = l2meta->next;
-            g_free(l2meta);
-            l2meta = next;
+        ret = qcow2_handle_l2meta(bs, l2meta);
+        l2meta = NULL;
+        if (ret) {
+            goto fail;
         }
 
         bytes -= cur_bytes;
@@ -2069,18 +2122,7 @@ static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
     ret = 0;
 
 fail:
-    while (l2meta != NULL) {
-        QCowL2Meta *next;
-
-        if (l2meta->nb_clusters != 0) {
-            QLIST_REMOVE(l2meta, next_in_flight);
-        }
-        qemu_co_queue_restart_all(&l2meta->dependent_requests);
-
-        next = l2meta->next;
-        g_free(l2meta);
-        l2meta = next;
-    }
+    qcow2_handle_l2meta(bs, l2meta);
 
     qemu_co_mutex_unlock(&s->lock);
 
@@ -4508,6 +4550,7 @@ BlockDriver bdrv_qcow2 = {
     .bdrv_co_create       = qcow2_co_create,
     .bdrv_has_zero_init = bdrv_has_zero_init_1,
     .bdrv_co_block_status = qcow2_co_block_status,
+    .bdrv_co_map_range    = qcow2_co_map_range,
 
     .bdrv_co_preadv         = qcow2_co_preadv,
     .bdrv_co_pwritev        = qcow2_co_pwritev,
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [RFC PATCH 3/8] block: Introduce bdrv_co_copy_range
  2018-03-29 11:09 [Qemu-devel] [RFC PATCH 0/8] qemu-img convert with copy offloading Fam Zheng
  2018-03-29 11:09 ` [Qemu-devel] [RFC PATCH 1/8] block: Introduce bdrv_co_map_range API Fam Zheng
  2018-03-29 11:09 ` [Qemu-devel] [RFC PATCH 2/8] qcow2: Implement bdrv_co_map_range Fam Zheng
@ 2018-03-29 11:09 ` Fam Zheng
  2018-03-29 11:09 ` [Qemu-devel] [RFC PATCH 4/8] file-posix: Implement bdrv_co_copy_range Fam Zheng
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Fam Zheng @ 2018-03-29 11:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, Fam Zheng, Kevin Wolf, Max Reitz, pbonzini,
	Stefan Hajnoczi, eblake

This tries to do offloaded data copying in the same procotol (but
possibly a different BDS), or fall back to a bounce buffer if not
supported.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block/io.c                | 148 ++++++++++++++++++++++++++++++++++++++++++++++
 include/block/block.h     |   4 ++
 include/block/block_int.h |  14 +++++
 3 files changed, 166 insertions(+)

diff --git a/block/io.c b/block/io.c
index 1b4cfcacb1..64cd3a5170 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2870,3 +2870,151 @@ int coroutine_fn bdrv_co_map_range(BdrvChild *child, int64_t offset,
     }
     return -ENOTSUP;
 }
+
+typedef struct {
+    BdrvChild *child;
+    int64_t offset;
+    int64_t remaining;
+    BlockDriverState *file;
+    int64_t map;
+    int64_t pnum;
+    int status;
+} BdrvCopyRangeState;
+
+static inline void bdrv_range_advance(BdrvCopyRangeState *s, int bytes)
+{
+    s->offset += bytes;
+    s->remaining -= bytes;
+    assert(s->remaining >= 0);
+    s->map = 0;
+    s->pnum = MAX(0, s->pnum - bytes);
+    s->file = NULL;
+}
+
+static int coroutine_fn
+bdrv_co_copy_range_iteration(BdrvCopyRangeState *in, BdrvCopyRangeState *out)
+{
+    int ret;
+    int bytes = MIN(in->remaining, out->remaining);
+    QEMUIOVector qiov;
+    BlockDriverState *in_bs = in->child->bs;
+    BlockDriverState *out_bs = out->child->bs;
+    uint8_t *buf;
+
+    if (!in->pnum) {
+        in->status = bdrv_co_map_range(in->child, in->offset, in->remaining,
+                                       &in->pnum, &in->map, &in->file, 0);
+    }
+    if (in->status < 0 || !in->file ||
+        in->offset + bytes > bdrv_getlength(in->file)) {
+        goto fallback;
+    }
+
+    assert(in->pnum > 0);
+    bytes = MIN(bytes, in->pnum);
+    if (in->status & BDRV_BLOCK_ZERO) {
+        ret = bdrv_co_pwrite_zeroes(out->child, out->offset, in->pnum, 0);
+        if (ret) {
+            return ret;
+        }
+        bdrv_range_advance(in, bytes);
+        bdrv_range_advance(out, bytes);
+        return 0;
+    }
+
+    if (!(in->status & BDRV_BLOCK_OFFSET_VALID)) {
+        goto fallback;
+    }
+
+    out->status = bdrv_co_map_range(out->child, out->offset, out->remaining,
+                                    &out->pnum, &out->map, &out->file,
+                                    BDRV_REQ_ALLOCATE);
+    if (out->status < 0 || !out->file ||
+        !(out->status & BDRV_BLOCK_OFFSET_VALID)) {
+        goto fallback;
+    }
+
+    bytes = MIN(bytes, out->pnum);
+
+    if (in->file->drv->bdrv_co_copy_range) {
+        if (!in->file->drv->bdrv_co_copy_range(in->file, in->map,
+                                               out->file, out->map,
+                                               bytes)) {
+            bdrv_range_advance(in, bytes);
+            bdrv_range_advance(out, bytes);
+            return 0;
+        }
+    }
+    /* At this point we could maximize bytes again */
+    bytes = MIN(in->remaining, out->remaining);
+
+fallback:
+    buf = qemu_try_memalign(MAX(bdrv_opt_mem_align(in_bs),
+                                bdrv_opt_mem_align(out_bs)),
+                            bytes);
+    if (!buf) {
+        return -ENOMEM;
+    }
+
+    qemu_iovec_init(&qiov, 1);
+    qemu_iovec_add(&qiov, buf, bytes);
+
+    ret = bdrv_co_preadv(in->child, in->offset, bytes, &qiov, 0);
+    if (ret) {
+        goto out;
+    }
+    ret = bdrv_co_pwritev(out->child, out->offset, bytes, &qiov, 0);
+    if (!ret) {
+        bdrv_range_advance(in, bytes);
+        bdrv_range_advance(out, bytes);
+    }
+
+out:
+    qemu_vfree(buf);
+    qemu_iovec_destroy(&qiov);
+    return ret;
+}
+
+int coroutine_fn bdrv_co_copy_range(BdrvChild *child_in, int64_t off_in,
+                                    BdrvChild *child_out, int64_t off_out,
+                                    int bytes)
+{
+    BdrvTrackedRequest in_req, out_req;
+    BlockDriverState *in_bs = child_in->bs;
+    BlockDriverState *out_bs = child_out->bs;
+    int ret = 0;
+    BdrvCopyRangeState in = (BdrvCopyRangeState) {
+        .child = child_in,
+        .offset = off_in,
+        .remaining = bytes,
+    };
+    BdrvCopyRangeState out = (BdrvCopyRangeState) {
+        .child = child_out,
+        .offset = off_out,
+        .remaining = bytes,
+    };
+
+    bdrv_inc_in_flight(in_bs);
+    bdrv_inc_in_flight(out_bs);
+    tracked_request_begin(&in_req, in_bs, off_in, bytes, BDRV_TRACKED_READ);
+    tracked_request_begin(&out_req, out_bs, off_out, bytes, BDRV_TRACKED_WRITE);
+
+    wait_serialising_requests(&in_req);
+    wait_serialising_requests(&out_req);
+
+    while (in.remaining > 0) {
+        ret = bdrv_co_copy_range_iteration(&in, &out);
+        assert(in.remaining == out.remaining);
+        if (ret < 0) {
+            goto out;
+        }
+    }
+    assert(in.remaining == 0);
+    assert(out.remaining == 0);
+out:
+    tracked_request_end(&in_req);
+    tracked_request_end(&out_req);
+    bdrv_dec_in_flight(in_bs);
+    bdrv_dec_in_flight(out_bs);
+    return ret;
+}
diff --git a/include/block/block.h b/include/block/block.h
index 21d3e9db32..7d57e53cfe 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -610,4 +610,8 @@ int bdrv_co_map_range(BdrvChild *child, int64_t offset,
                       int64_t bytes, int64_t *pnum, int64_t *map,
                       BlockDriverState **file,
                       int flags);
+int bdrv_co_copy_range(BdrvChild *child_in, int64_t off_in,
+                       BdrvChild *child_out, int64_t off_out,
+                       int bytes);
+
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 3f3f6f3efd..10a23b5754 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -496,6 +496,20 @@ struct BlockDriver {
                              int64_t bytes, int64_t *pnum, int64_t *map,
                              BlockDriverState **file,
                              int flags);
+    /**
+     * Do offloaded copying of data range. Must be implemented together with
+     * bdrv_co_map_range.
+     *
+     * @bs: where to copy data from
+     * @off_in: the offset to copy data from in @bs
+     * @out: where to write data to. This is guaranteed to be the same
+     *       BlockDriver as @bs
+     * @off_out: the offset to copy data to in @out
+     * @bytes: the number of bytes to copy
+     */
+    int (*bdrv_co_copy_range)(BlockDriverState *bs, int64_t off_in,
+                              BlockDriverState *out, int64_t off_out,
+                              int bytes);
     QLIST_ENTRY(BlockDriver) list;
 };
 
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [RFC PATCH 4/8] file-posix: Implement bdrv_co_copy_range
  2018-03-29 11:09 [Qemu-devel] [RFC PATCH 0/8] qemu-img convert with copy offloading Fam Zheng
                   ` (2 preceding siblings ...)
  2018-03-29 11:09 ` [Qemu-devel] [RFC PATCH 3/8] block: Introduce bdrv_co_copy_range Fam Zheng
@ 2018-03-29 11:09 ` Fam Zheng
  2018-04-04 13:20   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2018-03-29 11:09 ` [Qemu-devel] [RFC PATCH 5/8] file-posix: Implement bdrv_co_map_range Fam Zheng
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 20+ messages in thread
From: Fam Zheng @ 2018-03-29 11:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, Fam Zheng, Kevin Wolf, Max Reitz, pbonzini,
	Stefan Hajnoczi, eblake

With copy_file_range(2), we can implement the bdrv_co_copy_range
semantics.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block/file-posix.c      | 77 +++++++++++++++++++++++++++++++++++++++++++++++--
 include/block/raw-aio.h | 10 +++++--
 2 files changed, 82 insertions(+), 5 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index d7fb772c14..b13bc89423 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -100,6 +100,7 @@
 #ifdef CONFIG_XFS
 #include <xfs/xfs.h>
 #endif
+#include <sys/syscall.h>
 
 //#define DEBUG_BLOCK
 
@@ -185,6 +186,8 @@ typedef struct RawPosixAIOData {
 #define aio_ioctl_cmd   aio_nbytes /* for QEMU_AIO_IOCTL */
     off_t aio_offset;
     int aio_type;
+    int fd2;
+    off_t offset2;
 } RawPosixAIOData;
 
 #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
@@ -1421,6 +1424,41 @@ static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
     return -ENOTSUP;
 }
 
+#ifdef __NR_copy_file_range
+#define HAS_COPY_FILE_RANGE
+#endif
+
+#ifdef HAS_COPY_FILE_RANGE
+static off_t copy_file_range(int in_fd, off_t *in_off, int out_fd,
+                             off_t *out_off, size_t len, unsigned int flags)
+{
+    return syscall(__NR_copy_file_range, in_fd, in_off, out_fd,
+                   out_off, len, flags);
+}
+#endif
+
+static ssize_t handle_aiocb_copy_range(RawPosixAIOData *aiocb)
+{
+#ifndef HAS_COPY_FILE_RANGE
+    return -ENOTSUP;
+#else
+    uint64_t bytes = aiocb->aio_nbytes;
+    off_t in_off = aiocb->aio_offset;
+    off_t out_off = aiocb->offset2;
+
+    while (bytes) {
+        ssize_t ret = copy_file_range(aiocb->aio_fildes, &in_off,
+                                      aiocb->fd2, &out_off,
+                                      bytes, 0);
+        if (ret < 0) {
+            return -errno;
+        }
+        bytes -= ret;
+    }
+    return 0;
+#endif
+}
+
 static ssize_t handle_aiocb_discard(RawPosixAIOData *aiocb)
 {
     int ret = -EOPNOTSUPP;
@@ -1501,6 +1539,9 @@ static int aio_worker(void *arg)
     case QEMU_AIO_WRITE_ZEROES:
         ret = handle_aiocb_write_zeroes(aiocb);
         break;
+    case QEMU_AIO_COPY_RANGE:
+        ret = handle_aiocb_copy_range(aiocb);
+        break;
     default:
         fprintf(stderr, "invalid aio request (0x%x)\n", aiocb->aio_type);
         ret = -EINVAL;
@@ -1511,9 +1552,10 @@ static int aio_worker(void *arg)
     return ret;
 }
 
-static int paio_submit_co(BlockDriverState *bs, int fd,
-                          int64_t offset, QEMUIOVector *qiov,
-                          int bytes, int type)
+static int paio_submit_co_full(BlockDriverState *bs, int fd,
+                               int64_t offset, int fd2, int64_t offset2,
+                               QEMUIOVector *qiov,
+                               int bytes, int type)
 {
     RawPosixAIOData *acb = g_new(RawPosixAIOData, 1);
     ThreadPool *pool;
@@ -1521,6 +1563,8 @@ static int paio_submit_co(BlockDriverState *bs, int fd,
     acb->bs = bs;
     acb->aio_type = type;
     acb->aio_fildes = fd;
+    acb->fd2 = fd2;
+    acb->offset2 = offset2;
 
     acb->aio_nbytes = bytes;
     acb->aio_offset = offset;
@@ -1536,6 +1580,13 @@ static int paio_submit_co(BlockDriverState *bs, int fd,
     return thread_pool_submit_co(pool, aio_worker, acb);
 }
 
+static inline int paio_submit_co(BlockDriverState *bs, int fd,
+                                 int64_t offset, QEMUIOVector *qiov,
+                                 int bytes, int type)
+{
+    return paio_submit_co_full(bs, fd, offset, -1, 0, qiov, bytes, type);
+}
+
 static BlockAIOCB *paio_submit(BlockDriverState *bs, int fd,
         int64_t offset, QEMUIOVector *qiov, int bytes,
         BlockCompletionFunc *cb, void *opaque, int type)
@@ -1605,6 +1656,22 @@ static int coroutine_fn raw_co_pwritev(BlockDriverState *bs, uint64_t offset,
     return raw_co_prw(bs, offset, bytes, qiov, QEMU_AIO_WRITE);
 }
 
+static int raw_co_copy_range(BlockDriverState *bs, int64_t off_in,
+                             BlockDriverState *out, int64_t off_out,
+                             int bytes)
+{
+    BDRVRawState *s = bs->opaque;
+    BDRVRawState *out_s;
+
+    assert(out->drv->bdrv_co_copy_range == raw_co_copy_range);
+    out_s = out->opaque;
+    if (fd_open(bs) < 0 || fd_open(out) < 0) {
+        return -EIO;
+    }
+    return paio_submit_co_full(bs, s->fd, off_in, out_s->fd, off_out,
+                               NULL, bytes, QEMU_AIO_COPY_RANGE);
+}
+
 static void raw_aio_plug(BlockDriverState *bs)
 {
 #ifdef CONFIG_LINUX_AIO
@@ -2321,6 +2388,7 @@ BlockDriver bdrv_file = {
 
     .bdrv_co_preadv         = raw_co_preadv,
     .bdrv_co_pwritev        = raw_co_pwritev,
+    .bdrv_co_copy_range     = raw_co_copy_range,
     .bdrv_aio_flush = raw_aio_flush,
     .bdrv_aio_pdiscard = raw_aio_pdiscard,
     .bdrv_refresh_limits = raw_refresh_limits,
@@ -2798,6 +2866,7 @@ static BlockDriver bdrv_host_device = {
 
     .bdrv_co_preadv         = raw_co_preadv,
     .bdrv_co_pwritev        = raw_co_pwritev,
+    .bdrv_co_copy_range     = raw_co_copy_range,
     .bdrv_aio_flush	= raw_aio_flush,
     .bdrv_aio_pdiscard   = hdev_aio_pdiscard,
     .bdrv_refresh_limits = raw_refresh_limits,
@@ -2920,6 +2989,7 @@ static BlockDriver bdrv_host_cdrom = {
 
     .bdrv_co_preadv         = raw_co_preadv,
     .bdrv_co_pwritev        = raw_co_pwritev,
+    .bdrv_co_copy_range     = raw_co_copy_range,
     .bdrv_aio_flush	= raw_aio_flush,
     .bdrv_refresh_limits = raw_refresh_limits,
     .bdrv_io_plug = raw_aio_plug,
@@ -3050,6 +3120,7 @@ static BlockDriver bdrv_host_cdrom = {
 
     .bdrv_co_preadv         = raw_co_preadv,
     .bdrv_co_pwritev        = raw_co_pwritev,
+    .bdrv_co_copy_range     = raw_co_copy_range,
     .bdrv_aio_flush	= raw_aio_flush,
     .bdrv_refresh_limits = raw_refresh_limits,
     .bdrv_io_plug = raw_aio_plug,
diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
index a4cdbbf1b7..324053020b 100644
--- a/include/block/raw-aio.h
+++ b/include/block/raw-aio.h
@@ -25,9 +25,15 @@
 #define QEMU_AIO_FLUSH        0x0008
 #define QEMU_AIO_DISCARD      0x0010
 #define QEMU_AIO_WRITE_ZEROES 0x0020
+#define QEMU_AIO_COPY_RANGE   0x0040
 #define QEMU_AIO_TYPE_MASK \
-        (QEMU_AIO_READ|QEMU_AIO_WRITE|QEMU_AIO_IOCTL|QEMU_AIO_FLUSH| \
-         QEMU_AIO_DISCARD|QEMU_AIO_WRITE_ZEROES)
+        (QEMU_AIO_READ | \
+         QEMU_AIO_WRITE | \
+         QEMU_AIO_IOCTL | \
+         QEMU_AIO_FLUSH | \
+         QEMU_AIO_DISCARD | \
+         QEMU_AIO_WRITE_ZEROES | \
+         QEMU_AIO_COPY_RANGE)
 
 /* AIO flags */
 #define QEMU_AIO_MISALIGNED   0x1000
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [RFC PATCH 5/8] file-posix: Implement bdrv_co_map_range
  2018-03-29 11:09 [Qemu-devel] [RFC PATCH 0/8] qemu-img convert with copy offloading Fam Zheng
                   ` (3 preceding siblings ...)
  2018-03-29 11:09 ` [Qemu-devel] [RFC PATCH 4/8] file-posix: Implement bdrv_co_copy_range Fam Zheng
@ 2018-03-29 11:09 ` Fam Zheng
  2018-03-29 11:09 ` [Qemu-devel] [RFC PATCH 6/8] raw: Implement raw_co_map_range Fam Zheng
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Fam Zheng @ 2018-03-29 11:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, Fam Zheng, Kevin Wolf, Max Reitz, pbonzini,
	Stefan Hajnoczi, eblake

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block/file-posix.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index b13bc89423..4e615d937e 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1656,6 +1656,17 @@ static int coroutine_fn raw_co_pwritev(BlockDriverState *bs, uint64_t offset,
     return raw_co_prw(bs, offset, bytes, qiov, QEMU_AIO_WRITE);
 }
 
+static int raw_co_map_range(BlockDriverState *bs, int64_t offset,
+                         int64_t bytes, int64_t *pnum, int64_t *map,
+                         BlockDriverState **file,
+                         int flags)
+{
+    *file = bs;
+    *pnum = bytes;
+    *map = offset;
+    return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
+}
+
 static int raw_co_copy_range(BlockDriverState *bs, int64_t off_in,
                              BlockDriverState *out, int64_t off_out,
                              int bytes)
@@ -2389,6 +2400,7 @@ BlockDriver bdrv_file = {
     .bdrv_co_preadv         = raw_co_preadv,
     .bdrv_co_pwritev        = raw_co_pwritev,
     .bdrv_co_copy_range     = raw_co_copy_range,
+    .bdrv_co_map_range      = raw_co_map_range,
     .bdrv_aio_flush = raw_aio_flush,
     .bdrv_aio_pdiscard = raw_aio_pdiscard,
     .bdrv_refresh_limits = raw_refresh_limits,
@@ -2867,6 +2879,7 @@ static BlockDriver bdrv_host_device = {
     .bdrv_co_preadv         = raw_co_preadv,
     .bdrv_co_pwritev        = raw_co_pwritev,
     .bdrv_co_copy_range     = raw_co_copy_range,
+    .bdrv_co_map_range      = raw_co_map_range,
     .bdrv_aio_flush	= raw_aio_flush,
     .bdrv_aio_pdiscard   = hdev_aio_pdiscard,
     .bdrv_refresh_limits = raw_refresh_limits,
@@ -2990,6 +3003,7 @@ static BlockDriver bdrv_host_cdrom = {
     .bdrv_co_preadv         = raw_co_preadv,
     .bdrv_co_pwritev        = raw_co_pwritev,
     .bdrv_co_copy_range     = raw_co_copy_range,
+    .bdrv_co_map_range      = raw_co_map_range,
     .bdrv_aio_flush	= raw_aio_flush,
     .bdrv_refresh_limits = raw_refresh_limits,
     .bdrv_io_plug = raw_aio_plug,
@@ -3121,6 +3135,7 @@ static BlockDriver bdrv_host_cdrom = {
     .bdrv_co_preadv         = raw_co_preadv,
     .bdrv_co_pwritev        = raw_co_pwritev,
     .bdrv_co_copy_range     = raw_co_copy_range,
+    .bdrv_co_map_range      = raw_co_map_range,
     .bdrv_aio_flush	= raw_aio_flush,
     .bdrv_refresh_limits = raw_refresh_limits,
     .bdrv_io_plug = raw_aio_plug,
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [RFC PATCH 6/8] raw: Implement raw_co_map_range
  2018-03-29 11:09 [Qemu-devel] [RFC PATCH 0/8] qemu-img convert with copy offloading Fam Zheng
                   ` (4 preceding siblings ...)
  2018-03-29 11:09 ` [Qemu-devel] [RFC PATCH 5/8] file-posix: Implement bdrv_co_map_range Fam Zheng
@ 2018-03-29 11:09 ` Fam Zheng
  2018-03-29 11:09 ` [Qemu-devel] [RFC PATCH 7/8] block-backend: Add blk_co_copy_range Fam Zheng
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Fam Zheng @ 2018-03-29 11:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, Fam Zheng, Kevin Wolf, Max Reitz, pbonzini,
	Stefan Hajnoczi, eblake

Because raw can be seen as a "passthrough" format, its implementation of
bdrv_co_map_range is exactly the same as bdrv_co_block_status. The
BDRV_REQ_ALLOCATE flag can be ignored because no metadata update is
necessary.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block/raw-format.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/block/raw-format.c b/block/raw-format.c
index a378547c99..95e81af22d 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -263,6 +263,14 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
     return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
 }
 
+static int coroutine_fn raw_co_map_range(BlockDriverState *bs, int64_t offset,
+                                         int64_t bytes, int64_t *pnum, int64_t *map,
+                                         BlockDriverState **file,
+                                         int flags)
+{
+    return raw_co_block_status(bs, true, offset, bytes, pnum, map, file);
+}
+
 static int coroutine_fn raw_co_pwrite_zeroes(BlockDriverState *bs,
                                              int64_t offset, int bytes,
                                              BdrvRequestFlags flags)
@@ -498,6 +506,7 @@ BlockDriver bdrv_raw = {
     .bdrv_co_pwrite_zeroes = &raw_co_pwrite_zeroes,
     .bdrv_co_pdiscard     = &raw_co_pdiscard,
     .bdrv_co_block_status = &raw_co_block_status,
+    .bdrv_co_map_range    = &raw_co_map_range,
     .bdrv_truncate        = &raw_truncate,
     .bdrv_getlength       = &raw_getlength,
     .has_variable_length  = true,
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [RFC PATCH 7/8] block-backend: Add blk_co_copy_range
  2018-03-29 11:09 [Qemu-devel] [RFC PATCH 0/8] qemu-img convert with copy offloading Fam Zheng
                   ` (5 preceding siblings ...)
  2018-03-29 11:09 ` [Qemu-devel] [RFC PATCH 6/8] raw: Implement raw_co_map_range Fam Zheng
@ 2018-03-29 11:09 ` Fam Zheng
  2018-03-29 11:09 ` [Qemu-devel] [RFC PATCH 8/8] qemu-img: Convert with copy offloading Fam Zheng
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Fam Zheng @ 2018-03-29 11:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, Fam Zheng, Kevin Wolf, Max Reitz, pbonzini,
	Stefan Hajnoczi, eblake

It's a BlockBackend wrapper of bdrv_co_copy_range.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block/block-backend.c          | 8 ++++++++
 include/sysemu/block-backend.h | 4 ++++
 2 files changed, 12 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 681b240b12..54b0789282 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2217,3 +2217,11 @@ void blk_unregister_buf(BlockBackend *blk, void *host)
 {
     bdrv_unregister_buf(blk_bs(blk), host);
 }
+
+int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, int64_t off_in,
+                                   BlockBackend *blk_out, int64_t off_out,
+                                   int bytes)
+{
+    return bdrv_co_copy_range(blk_in->root, off_in, blk_out->root, off_out,
+                              bytes);
+}
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 92ab624fac..1fde14b461 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -232,4 +232,8 @@ void blk_set_force_allow_inactivate(BlockBackend *blk);
 void blk_register_buf(BlockBackend *blk, void *host, size_t size);
 void blk_unregister_buf(BlockBackend *blk, void *host);
 
+int blk_co_copy_range(BlockBackend *blk_in, int64_t off_in,
+                      BlockBackend *blk_out, int64_t off_out,
+                      int bytes);
+
 #endif
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [RFC PATCH 8/8] qemu-img: Convert with copy offloading
  2018-03-29 11:09 [Qemu-devel] [RFC PATCH 0/8] qemu-img convert with copy offloading Fam Zheng
                   ` (6 preceding siblings ...)
  2018-03-29 11:09 ` [Qemu-devel] [RFC PATCH 7/8] block-backend: Add blk_co_copy_range Fam Zheng
@ 2018-03-29 11:09 ` Fam Zheng
  2018-03-31  8:13 ` [Qemu-devel] [RFC PATCH 0/8] qemu-img convert " no-reply
  2018-04-04 13:23 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  9 siblings, 0 replies; 20+ messages in thread
From: Fam Zheng @ 2018-03-29 11:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, Fam Zheng, Kevin Wolf, Max Reitz, pbonzini,
	Stefan Hajnoczi, eblake

The new blk_co_copy_range interface offers a more efficient way in the
case of network based storage. Make use of it to allow faster convert
operation.

Since copy offloading cannot do zero detection ('-S') and compression
(-c), only try it when these options are not used.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 qemu-img.c | 45 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 43 insertions(+), 2 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 855fa52514..571b6e2e49 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1544,6 +1544,7 @@ typedef struct ImgConvertState {
     bool compressed;
     bool target_has_backing;
     bool wr_in_order;
+    bool copy_range;
     int min_sparse;
     size_t cluster_sectors;
     size_t buf_sectors;
@@ -1737,6 +1738,37 @@ static int coroutine_fn convert_co_write(ImgConvertState *s, int64_t sector_num,
     return 0;
 }
 
+static int coroutine_fn convert_co_copy_range(ImgConvertState *s, int64_t sector_num,
+                                              int nb_sectors)
+{
+    int n, ret;
+
+    while (nb_sectors > 0) {
+        BlockBackend *blk;
+        int src_cur;
+        int64_t bs_sectors, src_cur_offset;
+        int64_t offset;
+
+        convert_select_part(s, sector_num, &src_cur, &src_cur_offset);
+        offset = (sector_num - src_cur_offset) << BDRV_SECTOR_BITS;
+        blk = s->src[src_cur];
+        bs_sectors = s->src_sectors[src_cur];
+
+        n = MIN(nb_sectors, bs_sectors - (sector_num - src_cur_offset));
+
+        ret = blk_co_copy_range(blk, offset, s->target,
+                                sector_num << BDRV_SECTOR_BITS,
+                                n << BDRV_SECTOR_BITS);
+        if (ret < 0) {
+            return ret;
+        }
+
+        sector_num += n;
+        nb_sectors -= n;
+    }
+    return 0;
+}
+
 static void coroutine_fn convert_co_do_copy(void *opaque)
 {
     ImgConvertState *s = opaque;
@@ -1759,6 +1791,7 @@ static void coroutine_fn convert_co_do_copy(void *opaque)
         int n;
         int64_t sector_num;
         enum ImgConvertBlockStatus status;
+        bool copy_range;
 
         qemu_co_mutex_lock(&s->lock);
         if (s->ret != -EINPROGRESS || s->sector_num >= s->total_sectors) {
@@ -1777,6 +1810,7 @@ static void coroutine_fn convert_co_do_copy(void *opaque)
         if (!s->min_sparse && s->status == BLK_ZERO) {
             n = MIN(n, s->buf_sectors);
         }
+        copy_range = s->copy_range && s->status == BLK_DATA;
         /* increment global sector counter so that other coroutines can
          * already continue reading beyond this request */
         s->sector_num += n;
@@ -1788,7 +1822,7 @@ static void coroutine_fn convert_co_do_copy(void *opaque)
                                         s->allocated_sectors, 0);
         }
 
-        if (status == BLK_DATA) {
+        if (status == BLK_DATA && !copy_range) {
             ret = convert_co_read(s, sector_num, n, buf);
             if (ret < 0) {
                 error_report("error while reading sector %" PRId64
@@ -1810,7 +1844,11 @@ static void coroutine_fn convert_co_do_copy(void *opaque)
         }
 
         if (s->ret == -EINPROGRESS) {
-            ret = convert_co_write(s, sector_num, n, buf, status);
+            if (copy_range) {
+                ret = convert_co_copy_range(s, sector_num, n);
+            } else {
+                ret = convert_co_write(s, sector_num, n, buf, status);
+            }
             if (ret < 0) {
                 error_report("error while writing sector %" PRId64
                              ": %s", sector_num, strerror(-ret));
@@ -1933,6 +1971,7 @@ static int img_convert(int argc, char **argv)
     ImgConvertState s = (ImgConvertState) {
         /* Need at least 4k of zeros for sparse detection */
         .min_sparse         = 8,
+        .copy_range         = true,
         .buf_sectors        = IO_BUF_SIZE / BDRV_SECTOR_SIZE,
         .wr_in_order        = true,
         .num_coroutines     = 8,
@@ -1973,6 +2012,7 @@ static int img_convert(int argc, char **argv)
             break;
         case 'c':
             s.compressed = true;
+            s.copy_range = false;
             break;
         case 'o':
             if (!is_valid_option_list(optarg)) {
@@ -2014,6 +2054,7 @@ static int img_convert(int argc, char **argv)
             }
 
             s.min_sparse = sval / BDRV_SECTOR_SIZE;
+            s.copy_range = false;
             break;
         }
         case 'p':
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 0/8] qemu-img convert with copy offloading
  2018-03-29 11:09 [Qemu-devel] [RFC PATCH 0/8] qemu-img convert with copy offloading Fam Zheng
                   ` (7 preceding siblings ...)
  2018-03-29 11:09 ` [Qemu-devel] [RFC PATCH 8/8] qemu-img: Convert with copy offloading Fam Zheng
@ 2018-03-31  8:13 ` no-reply
  2018-04-04 13:23 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  9 siblings, 0 replies; 20+ messages in thread
From: no-reply @ 2018-03-31  8:13 UTC (permalink / raw)
  To: famz; +Cc: qemu-devel, kwolf, qemu-block, mreitz, stefanha, pbonzini

Hi,

This series failed docker-build@min-glib build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

Type: series
Message-id: 20180329110914.20888-1-famz@redhat.com
Subject: [Qemu-devel] [RFC PATCH 0/8] qemu-img convert with copy offloading

=== TEST SCRIPT BEGIN ===
#!/bin/bash
set -e
git submodule update --init dtc
# Let docker tests dump environment info
export SHOW_ENV=1
export J=8
time make docker-test-build@min-glib
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
6f3b33fe62 qemu-img: Convert with copy offloading
b435388660 block-backend: Add blk_co_copy_range
388616cd3c raw: Implement raw_co_map_range
a4e47507b4 file-posix: Implement bdrv_co_map_range
8a6e25ecab file-posix: Implement bdrv_co_copy_range
a3a1a163e2 block: Introduce bdrv_co_copy_range
b42ec41133 qcow2: Implement bdrv_co_map_range
1eadcd710a block: Introduce bdrv_co_map_range API

=== OUTPUT BEGIN ===
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into '/var/tmp/patchew-tester-tmp-mfpmh0ne/src/dtc'...
Submodule path 'dtc': checked out 'e54388015af1fb4bf04d0bca99caba1074d9cc42'
  BUILD   min-glib
make[1]: Entering directory '/var/tmp/patchew-tester-tmp-mfpmh0ne/src'
  GEN     /var/tmp/patchew-tester-tmp-mfpmh0ne/src/docker-src.2018-03-31-04.12.45.21136/qemu.tar
Cloning into '/var/tmp/patchew-tester-tmp-mfpmh0ne/src/docker-src.2018-03-31-04.12.45.21136/qemu.tar.vroot'...
done.
Checking out files:  11% (690/6066)   
Checking out files:  12% (728/6066)   
Checking out files:  12% (746/6066)   
Checking out files:  13% (789/6066)   
Checking out files:  14% (850/6066)   
Checking out files:  15% (910/6066)   
Checking out files:  16% (971/6066)   
Checking out files:  17% (1032/6066)   
Checking out files:  18% (1092/6066)   
Checking out files:  19% (1153/6066)   
Checking out files:  20% (1214/6066)   
Checking out files:  21% (1274/6066)   
Checking out files:  22% (1335/6066)   
Checking out files:  23% (1396/6066)   
Checking out files:  24% (1456/6066)   
Checking out files:  25% (1517/6066)   
Checking out files:  26% (1578/6066)   
Checking out files:  27% (1638/6066)   
Checking out files:  28% (1699/6066)   
Checking out files:  29% (1760/6066)   
Checking out files:  30% (1820/6066)   
Checking out files:  31% (1881/6066)   
Checking out files:  32% (1942/6066)   
Checking out files:  33% (2002/6066)   
Checking out files:  34% (2063/6066)   
Checking out files:  35% (2124/6066)   
Checking out files:  36% (2184/6066)   
Checking out files:  37% (2245/6066)   
Checking out files:  38% (2306/6066)   
Checking out files:  39% (2366/6066)   
Checking out files:  40% (2427/6066)   
Checking out files:  41% (2488/6066)   
Checking out files:  42% (2548/6066)   
Checking out files:  43% (2609/6066)   
Checking out files:  44% (2670/6066)   
Checking out files:  45% (2730/6066)   
Checking out files:  46% (2791/6066)   
Checking out files:  47% (2852/6066)   
Checking out files:  48% (2912/6066)   
Checking out files:  49% (2973/6066)   
Checking out files:  50% (3033/6066)   
Checking out files:  51% (3094/6066)   
Checking out files:  52% (3155/6066)   
Checking out files:  53% (3215/6066)   
Checking out files:  54% (3276/6066)   
Checking out files:  55% (3337/6066)   
Checking out files:  56% (3397/6066)   
Checking out files:  57% (3458/6066)   
Checking out files:  58% (3519/6066)   
Checking out files:  59% (3579/6066)   
Checking out files:  60% (3640/6066)   
Checking out files:  60% (3687/6066)   
Checking out files:  61% (3701/6066)   
Checking out files:  62% (3761/6066)   
Checking out files:  63% (3822/6066)   
Checking out files:  64% (3883/6066)   
Checking out files:  65% (3943/6066)   
Checking out files:  66% (4004/6066)   
Checking out files:  67% (4065/6066)   
Checking out files:  68% (4125/6066)   
Checking out files:  69% (4186/6066)   
Checking out files:  69% (4226/6066)   
Checking out files:  70% (4247/6066)   
Checking out files:  71% (4307/6066)   
Checking out files:  72% (4368/6066)   
Checking out files:  73% (4429/6066)   
Checking out files:  74% (4489/6066)   
Checking out files:  75% (4550/6066)   
Checking out files:  76% (4611/6066)   
Checking out files:  77% (4671/6066)   
Checking out files:  78% (4732/6066)   
Checking out files:  79% (4793/6066)   
Checking out files:  80% (4853/6066)   
Checking out files:  81% (4914/6066)   
Checking out files:  82% (4975/6066)   
Checking out files:  83% (5035/6066)   
Checking out files:  84% (5096/6066)   
Checking out files:  85% (5157/6066)   
Checking out files:  86% (5217/6066)   
Checking out files:  87% (5278/6066)   
Checking out files:  88% (5339/6066)   
Checking out files:  89% (5399/6066)   
Checking out files:  90% (5460/6066)   
Checking out files:  91% (5521/6066)   
Checking out files:  92% (5581/6066)   
Checking out files:  93% (5642/6066)   
Checking out files:  94% (5703/6066)   
Checking out files:  95% (5763/6066)   
Checking out files:  96% (5824/6066)   
Checking out files:  97% (5885/6066)   
Checking out files:  98% (5945/6066)   
Checking out files:  99% (6006/6066)   
Checking out files: 100% (6066/6066)   
Checking out files: 100% (6066/6066), done.
Your branch is up-to-date with 'origin/test'.
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into '/var/tmp/patchew-tester-tmp-mfpmh0ne/src/docker-src.2018-03-31-04.12.45.21136/qemu.tar.vroot/dtc'...
Submodule path 'dtc': checked out 'e54388015af1fb4bf04d0bca99caba1074d9cc42'
Submodule 'ui/keycodemapdb' (git://git.qemu.org/keycodemapdb.git) registered for path 'ui/keycodemapdb'
Cloning into '/var/tmp/patchew-tester-tmp-mfpmh0ne/src/docker-src.2018-03-31-04.12.45.21136/qemu.tar.vroot/ui/keycodemapdb'...
Submodule path 'ui/keycodemapdb': checked out '6b3d716e2b6472eb7189d3220552280ef3d832ce'
tar: /var/tmp/patchew-tester-tmp-mfpmh0ne/src/docker-src.2018-03-31-04.12.45.21136/qemu.tar: Wrote only 2048 of 10240 bytes
tar: Error is not recoverable: exiting now
failed to create tar file
  COPY    RUNNER
    RUN test-build in qemu:min-glib 
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
/var/tmp/qemu/run: line 32: prep_fail: command not found
Environment variables:
HOSTNAME=a38ba0cc2dc5
MAKEFLAGS= -j8
J=8
CCACHE_DIR=/var/tmp/ccache
EXTRA_CONFIGURE_OPTS=
V=
SHOW_ENV=1
PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
TARGET_LIST=
SHLVL=1
HOME=/root
TEST_DIR=/tmp/qemu-test
FEATURES= dtc
DEBUG=
_=/usr/bin/env

/var/tmp/qemu/run: line 52: cd: /tmp/qemu-test/src/tests/docker: No such file or directory
/var/tmp/qemu/run: line 57: /test-build: No such file or directory
/var/tmp/qemu/run: line 57: exec: /test-build: cannot execute: No such file or directory
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 407, in <module>
    sys.exit(main())
  File "./tests/docker/docker.py", line 404, in main
    return args.cmdobj.run(args, argv)
  File "./tests/docker/docker.py", line 261, in run
    return Docker().run(argv, args.keep, quiet=args.quiet)
  File "./tests/docker/docker.py", line 229, in run
    quiet=quiet)
  File "./tests/docker/docker.py", line 147, in _do_check
    return subprocess.check_call(self._command + cmd, **kwargs)
  File "/usr/lib64/python2.7/subprocess.py", line 186, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['docker', 'run', '--label', 'com.qemu.instance.uuid=5970da9034bb11e8a39d52540069c830', '-u', '0', '--security-opt', 'seccomp=unconfined', '--rm', '--net=none', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=8', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/root/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-mfpmh0ne/src/docker-src.2018-03-31-04.12.45.21136:/var/tmp/qemu:z,ro', 'qemu:min-glib', '/var/tmp/qemu/run', 'test-build']' returned non-zero exit status 126
make[1]: *** [tests/docker/Makefile.include:129: docker-run] Error 1
make[1]: Leaving directory '/var/tmp/patchew-tester-tmp-mfpmh0ne/src'
make: *** [tests/docker/Makefile.include:163: docker-run-test-build@min-glib] Error 2

real	0m38.034s
user	0m9.304s
sys	0m6.995s
=== OUTPUT END ===

Test command exited with code: 2


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [RFC PATCH 1/8] block: Introduce bdrv_co_map_range API
  2018-03-29 11:09 ` [Qemu-devel] [RFC PATCH 1/8] block: Introduce bdrv_co_map_range API Fam Zheng
@ 2018-04-04 12:57   ` Stefan Hajnoczi
  2018-04-04 13:01   ` Stefan Hajnoczi
  1 sibling, 0 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2018-04-04 12:57 UTC (permalink / raw)
  To: Fam Zheng
  Cc: qemu-devel, Kevin Wolf, qemu-block, Max Reitz, Stefan Hajnoczi, pbonzini

[-- Attachment #1: Type: text/plain, Size: 2248 bytes --]

On Thu, Mar 29, 2018 at 07:09:07PM +0800, Fam Zheng wrote:
> diff --git a/block/io.c b/block/io.c
> index bd9a19a9c4..1b4cfcacb1 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2826,3 +2826,47 @@ void bdrv_unregister_buf(BlockDriverState *bs, void *host)
>          bdrv_unregister_buf(child->bs, host);
>      }
>  }
> +
> +int coroutine_fn bdrv_co_map_range(BdrvChild *child, int64_t offset,
> +                                   int64_t bytes, int64_t *pnum, int64_t *map,

s/map/file_offset/ is clearer, since this function has self-expanatory
'offset' and 'file' arguments.  This is the "offset within the file" ->
file_offset.

> +    /**
> +     * Get or create offset mappings for a virtual range.
> +     *
> +     * @offset: the offset of the beginning of the range.

Adding the units and avoiding reusing "offset" in the description:

"byte position of the beginning of the range"

> +     * @bytes: the maximum bytes of the range to map.
> +     * @pnum: at return, this will point to the number of bytes in the returned
> +     * mapping status
> +     * @map: at return, this will point to the offset which the range maps to
> +     * @file: at return, this will point to the file where the data is mapped.
> +     * If it is set to bs, it means the mapping is terminal and the result can
> +     * be used for read/write and copy_range; if it is NULL, it means the range
> +     * doesn't map to a file, or no I/O to any file is necessary; otherwise, it
> +     * means the query should be passed down to an underlying format/protocol.

What does "the query should be passed down to an underlying
format/protocol" mean?

> +     * @flags: flags for the request. BDRV_REQ_ALLOCATE means the range
> +     * should be allocated if necessary.
> +     */

What are return values?

Please include a statement describing how long the mapping is considered
stable.  For example, a qcow2 copy-on-write could "move" the data
somewhere else.  Therefore it's not safe to store the value of "map" and
use it again later on.

Additionally, please explain what the caller is allowed to do with the
returned information.  Using qcow2 copy-on-write as an example again, it
is not okay to write to "map" since that bypasses qcow2's COW mechanism.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [RFC PATCH 1/8] block: Introduce bdrv_co_map_range API
  2018-03-29 11:09 ` [Qemu-devel] [RFC PATCH 1/8] block: Introduce bdrv_co_map_range API Fam Zheng
  2018-04-04 12:57   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
@ 2018-04-04 13:01   ` Stefan Hajnoczi
  1 sibling, 0 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2018-04-04 13:01 UTC (permalink / raw)
  To: Fam Zheng
  Cc: qemu-devel, Kevin Wolf, qemu-block, Max Reitz, Stefan Hajnoczi, pbonzini

[-- Attachment #1: Type: text/plain, Size: 413 bytes --]

On Thu, Mar 29, 2018 at 07:09:07PM +0800, Fam Zheng wrote:
> +int coroutine_fn bdrv_co_map_range(BdrvChild *child, int64_t offset,
> +                                   int64_t bytes, int64_t *pnum, int64_t *map,

I just noticed that .bdrv_co_block_status() also calls the argument
"map".  In that case, let's keep it "map" unless you want to rename
"map" to "file_offset" in .bdrv_co_block_status() too.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [RFC PATCH 4/8] file-posix: Implement bdrv_co_copy_range
  2018-03-29 11:09 ` [Qemu-devel] [RFC PATCH 4/8] file-posix: Implement bdrv_co_copy_range Fam Zheng
@ 2018-04-04 13:20   ` Stefan Hajnoczi
  2018-04-09  8:53     ` Fam Zheng
  0 siblings, 1 reply; 20+ messages in thread
From: Stefan Hajnoczi @ 2018-04-04 13:20 UTC (permalink / raw)
  To: Fam Zheng
  Cc: qemu-devel, Kevin Wolf, qemu-block, Max Reitz, Stefan Hajnoczi, pbonzini

[-- Attachment #1: Type: text/plain, Size: 618 bytes --]

On Thu, Mar 29, 2018 at 07:09:10PM +0800, Fam Zheng wrote:
> +static ssize_t handle_aiocb_copy_range(RawPosixAIOData *aiocb)
> +{
> +#ifndef HAS_COPY_FILE_RANGE
> +    return -ENOTSUP;
> +#else
> +    uint64_t bytes = aiocb->aio_nbytes;
> +    off_t in_off = aiocb->aio_offset;
> +    off_t out_off = aiocb->offset2;
> +
> +    while (bytes) {
> +        ssize_t ret = copy_file_range(aiocb->aio_fildes, &in_off,
> +                                      aiocb->fd2, &out_off,
> +                                      bytes, 0);
> +        if (ret < 0) {
> +            return -errno;
> +        }

EINTR should retry.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [RFC PATCH 0/8] qemu-img convert with copy offloading
  2018-03-29 11:09 [Qemu-devel] [RFC PATCH 0/8] qemu-img convert with copy offloading Fam Zheng
                   ` (8 preceding siblings ...)
  2018-03-31  8:13 ` [Qemu-devel] [RFC PATCH 0/8] qemu-img convert " no-reply
@ 2018-04-04 13:23 ` Stefan Hajnoczi
  2018-04-04 13:49   ` Fam Zheng
  9 siblings, 1 reply; 20+ messages in thread
From: Stefan Hajnoczi @ 2018-04-04 13:23 UTC (permalink / raw)
  To: Fam Zheng
  Cc: qemu-devel, Kevin Wolf, qemu-block, Max Reitz, Stefan Hajnoczi, pbonzini

[-- Attachment #1: Type: text/plain, Size: 1096 bytes --]

On Thu, Mar 29, 2018 at 07:09:06PM +0800, Fam Zheng wrote:
> [Posting a preview RFC for the general idea discussion and internal API review.
> Libiscsi support is being worked on in the meantime.]
> 
> This series introduces block layer API for copy offloading and makes use of it
> in qemu-img convert.
> 
> For now we implemented the operation in local file protocol with
> copy_file_range(2).  Besides that it's possible to add similar to iscsi, nfs
> and potentially more.
> 
> As far as its usage goes, in addition to qemu-img convert, we can emulate
> offloading in scsi-disk (EXTENDED COPY), and do similar to drive-mirror.
> 
> The new bdrv_co_map_range can also be an alternative way to implement format
> drivers in the future, once we make block/io.c use it in preadv/pwritev paths.

I posted concerns about the bdrv_co_map_range() interface.  It would be
safer to only have a copy_range() interface without exposing how data is
mapped outside the driver where race conditions can occur and the format
driver no longer has full control over file layout.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [RFC PATCH 0/8] qemu-img convert with copy offloading
  2018-04-04 13:23 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
@ 2018-04-04 13:49   ` Fam Zheng
  2018-04-04 15:26     ` Paolo Bonzini
  2018-04-05 12:55     ` Stefan Hajnoczi
  0 siblings, 2 replies; 20+ messages in thread
From: Fam Zheng @ 2018-04-04 13:49 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Kevin Wolf, qemu-block, Max Reitz, Stefan Hajnoczi, pbonzini

On Wed, 04/04 14:23, Stefan Hajnoczi wrote:
> On Thu, Mar 29, 2018 at 07:09:06PM +0800, Fam Zheng wrote:
> > [Posting a preview RFC for the general idea discussion and internal API review.
> > Libiscsi support is being worked on in the meantime.]
> > 
> > This series introduces block layer API for copy offloading and makes use of it
> > in qemu-img convert.
> > 
> > For now we implemented the operation in local file protocol with
> > copy_file_range(2).  Besides that it's possible to add similar to iscsi, nfs
> > and potentially more.
> > 
> > As far as its usage goes, in addition to qemu-img convert, we can emulate
> > offloading in scsi-disk (EXTENDED COPY), and do similar to drive-mirror.
> > 
> > The new bdrv_co_map_range can also be an alternative way to implement format
> > drivers in the future, once we make block/io.c use it in preadv/pwritev paths.
> 
> I posted concerns about the bdrv_co_map_range() interface.  It would be
> safer to only have a copy_range() interface without exposing how data is
> mapped outside the driver where race conditions can occur and the format
> driver no longer has full control over file layout.

It's a good point, but I couldn't think of a way to implement copy_range between
two format drivers: both of them need to recurse down to their bs->file and what
we eventually want is a copy_file_range() on two fds (or an iscsi equivalent):

    src[qcow2]           ->              dst[raw]
        |                                   |
        v                                   v
    src[file]            ->              dst[file]
        |                                   |
        v                                   v
       fd1               ->                 fd2
                    copy_file_range

Maybe we should add BlockDriver.bdrv_co_map_range_{prepare,commit,abort} and
call them from bdrv_co_copy_range(). This way the code path works pretty much
the same way to .bdrv_co_preadv/pwritev.

Fam

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [RFC PATCH 0/8] qemu-img convert with copy offloading
  2018-04-04 13:49   ` Fam Zheng
@ 2018-04-04 15:26     ` Paolo Bonzini
  2018-04-05 12:55     ` Stefan Hajnoczi
  1 sibling, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2018-04-04 15:26 UTC (permalink / raw)
  To: Fam Zheng, Stefan Hajnoczi
  Cc: qemu-devel, Kevin Wolf, qemu-block, Max Reitz, Stefan Hajnoczi

On 04/04/2018 15:49, Fam Zheng wrote:
>> I posted concerns about the bdrv_co_map_range() interface.  It would be
>> safer to only have a copy_range() interface without exposing how data is
>> mapped outside the driver where race conditions can occur and the format
>> driver no longer has full control over file layout.
> It's a good point, but I couldn't think of a way to implement copy_range between
> two format drivers: both of them need to recurse down to their bs->file and what
> we eventually want is a copy_file_range() on two fds (or an iscsi equivalent):
> 
>     src[qcow2]           ->              dst[raw]
>         |                                   |
>         v                                   v
>     src[file]            ->              dst[file]
>         |                                   |
>         v                                   v
>        fd1               ->                 fd2
>                     copy_file_range
> 
> Maybe we should add BlockDriver.bdrv_co_map_range_{prepare,commit,abort} and
> call them from bdrv_co_copy_range(). This way the code path works pretty much
> the same way to .bdrv_co_preadv/pwritev.

Or you could have bdrv_co_copy_range verify the mapping and return
-EAGAIN if it is not valid anymore?  (Then bdrv_co_map_range should fill
in a BlockDriverMapping struct, or something like that).

Paolo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [RFC PATCH 0/8] qemu-img convert with copy offloading
  2018-04-04 13:49   ` Fam Zheng
  2018-04-04 15:26     ` Paolo Bonzini
@ 2018-04-05 12:55     ` Stefan Hajnoczi
  2018-04-06 11:41       ` Paolo Bonzini
  1 sibling, 1 reply; 20+ messages in thread
From: Stefan Hajnoczi @ 2018-04-05 12:55 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Stefan Hajnoczi, qemu-devel, Kevin Wolf, qemu-block, Max Reitz, pbonzini

[-- Attachment #1: Type: text/plain, Size: 3965 bytes --]

On Wed, Apr 04, 2018 at 09:49:23PM +0800, Fam Zheng wrote:
> On Wed, 04/04 14:23, Stefan Hajnoczi wrote:
> > On Thu, Mar 29, 2018 at 07:09:06PM +0800, Fam Zheng wrote:
> > > [Posting a preview RFC for the general idea discussion and internal API review.
> > > Libiscsi support is being worked on in the meantime.]
> > > 
> > > This series introduces block layer API for copy offloading and makes use of it
> > > in qemu-img convert.
> > > 
> > > For now we implemented the operation in local file protocol with
> > > copy_file_range(2).  Besides that it's possible to add similar to iscsi, nfs
> > > and potentially more.
> > > 
> > > As far as its usage goes, in addition to qemu-img convert, we can emulate
> > > offloading in scsi-disk (EXTENDED COPY), and do similar to drive-mirror.
> > > 
> > > The new bdrv_co_map_range can also be an alternative way to implement format
> > > drivers in the future, once we make block/io.c use it in preadv/pwritev paths.
> > 
> > I posted concerns about the bdrv_co_map_range() interface.  It would be
> > safer to only have a copy_range() interface without exposing how data is
> > mapped outside the driver where race conditions can occur and the format
> > driver no longer has full control over file layout.
> 
> It's a good point, but I couldn't think of a way to implement copy_range between
> two format drivers: both of them need to recurse down to their bs->file and what
> we eventually want is a copy_file_range() on two fds (or an iscsi equivalent):
> 
>     src[qcow2]           ->              dst[raw]
>         |                                   |
>         v                                   v
>     src[file]            ->              dst[file]
>         |                                   |
>         v                                   v
>        fd1               ->                 fd2
>                     copy_file_range
> 
> Maybe we should add BlockDriver.bdrv_co_map_range_{prepare,commit,abort} and
> call them from bdrv_co_copy_range(). This way the code path works pretty much
> the same way to .bdrv_co_preadv/pwritev.

Regarding the recursion, there are two phases:
1. Finding the leaf BDS of the source (the left side of your diagram)
2. finding the leaf BDS of the destination (the right side)

Here is one way to represent this in BlockDriver:

  /* Map [offset, offset + nbytes) range onto a child of src_bs and
   * invoke bdrv_co_copy_file_range_src(child, ...), or invoke
   * bdrv_co_copy_file_range_dst() if the leaf child has been found.
   */
  .bdrv_co_copy_file_range_src(src_bs, dst_bs, offset, nbytes)

  /* Map [offset, offset + nbytes) range onto a child of dst_bs and
   * invoke bdrv_co_copy_file_range_src(src_bs, child, ...), or perform
   * the copy operation if the leaf child has been found.  Return
   * -ENOTSUP if src_bs and the leaf child have different BlockDrivers.
   */
  .bdrv_co_copy_file_range_dst(src_bs, dst_bs, offset, nbytes)

bdrv_copy_file_range() will invoke bdrv_co_copy_file_range_src() on
src[qcow2].  The qcow2 block driver will invoke
bdrv_co_copy_file_range_src() on src[file].  The file-posix driver will
invoke bdrv_co_copy_file_range_dst() on dst[raw].  The raw driver will
invoke bdrv_co_copy_file_range_dst() on dst[file], which sees that
src_bds (src[file]) is also file-posix and then goes ahead with
copy_file_range(2).

In the case where src[qcow2] is on file-posix but dst[raw] is on iSCSI,
the iSCSI .bdrv_co_copy_file_range_dst() call fails with -ENOTSUP and
the block layer can fall back to a traditional copy operation.

With this approach src[qcow2] could take a lock or keep track of a
serializing request struct so that other requests cannot interfere with
the operation, and it's done in a natural way since we remain in the
qcow2 function until the entire operation completes.  There's no need
for bookkeeping structs or callbacks.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [RFC PATCH 0/8] qemu-img convert with copy offloading
  2018-04-05 12:55     ` Stefan Hajnoczi
@ 2018-04-06 11:41       ` Paolo Bonzini
  2018-04-08  9:21         ` Fam Zheng
  0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2018-04-06 11:41 UTC (permalink / raw)
  To: Stefan Hajnoczi, Fam Zheng
  Cc: Stefan Hajnoczi, qemu-devel, Kevin Wolf, qemu-block, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 1143 bytes --]

On 05/04/2018 14:55, Stefan Hajnoczi wrote:
> bdrv_copy_file_range() will invoke bdrv_co_copy_file_range_src() on
> src[qcow2].  The qcow2 block driver will invoke
> bdrv_co_copy_file_range_src() on src[file].  The file-posix driver will
> invoke bdrv_co_copy_file_range_dst() on dst[raw].  The raw driver will
> invoke bdrv_co_copy_file_range_dst() on dst[file], which sees that
> src_bds (src[file]) is also file-posix and then goes ahead with
> copy_file_range(2).
> 
> In the case where src[qcow2] is on file-posix but dst[raw] is on iSCSI,
> the iSCSI .bdrv_co_copy_file_range_dst() call fails with -ENOTSUP and
> the block layer can fall back to a traditional copy operation.
> 
> With this approach src[qcow2] could take a lock or keep track of a
> serializing request struct so that other requests cannot interfere with
> the operation, and it's done in a natural way since we remain in the
> qcow2 function until the entire operation completes.  There's no need
> for bookkeeping structs or callbacks.

Could there be AB-BA deadlock if the guest attempts a concurrent copy
from A to B and from B to A?

Paolo


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [RFC PATCH 0/8] qemu-img convert with copy offloading
  2018-04-06 11:41       ` Paolo Bonzini
@ 2018-04-08  9:21         ` Fam Zheng
  0 siblings, 0 replies; 20+ messages in thread
From: Fam Zheng @ 2018-04-08  9:21 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Stefan Hajnoczi, Stefan Hajnoczi, qemu-devel, Kevin Wolf,
	qemu-block, Max Reitz

On Fri, 04/06 13:41, Paolo Bonzini wrote:
> On 05/04/2018 14:55, Stefan Hajnoczi wrote:
> > bdrv_copy_file_range() will invoke bdrv_co_copy_file_range_src() on
> > src[qcow2].  The qcow2 block driver will invoke
> > bdrv_co_copy_file_range_src() on src[file].  The file-posix driver will
> > invoke bdrv_co_copy_file_range_dst() on dst[raw].  The raw driver will
> > invoke bdrv_co_copy_file_range_dst() on dst[file], which sees that
> > src_bds (src[file]) is also file-posix and then goes ahead with
> > copy_file_range(2).
> > 
> > In the case where src[qcow2] is on file-posix but dst[raw] is on iSCSI,
> > the iSCSI .bdrv_co_copy_file_range_dst() call fails with -ENOTSUP and
> > the block layer can fall back to a traditional copy operation.
> > 
> > With this approach src[qcow2] could take a lock or keep track of a
> > serializing request struct so that other requests cannot interfere with
> > the operation, and it's done in a natural way since we remain in the
> > qcow2 function until the entire operation completes.  There's no need
> > for bookkeeping structs or callbacks.
> 
> Could there be AB-BA deadlock if the guest attempts a concurrent copy
> from A to B and from B to A?

I don't think bs_src need to hold its locks when calling into bs_dst for mapping
write ranges. So it should be safe.

Fam

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [RFC PATCH 4/8] file-posix: Implement bdrv_co_copy_range
  2018-04-04 13:20   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
@ 2018-04-09  8:53     ` Fam Zheng
  0 siblings, 0 replies; 20+ messages in thread
From: Fam Zheng @ 2018-04-09  8:53 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Kevin Wolf, qemu-block, Max Reitz, Stefan Hajnoczi, pbonzini

On Wed, 04/04 14:20, Stefan Hajnoczi wrote:
> On Thu, Mar 29, 2018 at 07:09:10PM +0800, Fam Zheng wrote:
> > +static ssize_t handle_aiocb_copy_range(RawPosixAIOData *aiocb)
> > +{
> > +#ifndef HAS_COPY_FILE_RANGE
> > +    return -ENOTSUP;
> > +#else
> > +    uint64_t bytes = aiocb->aio_nbytes;
> > +    off_t in_off = aiocb->aio_offset;
> > +    off_t out_off = aiocb->offset2;
> > +
> > +    while (bytes) {
> > +        ssize_t ret = copy_file_range(aiocb->aio_fildes, &in_off,
> > +                                      aiocb->fd2, &out_off,
> > +                                      bytes, 0);
> > +        if (ret < 0) {
> > +            return -errno;
> > +        }
> 
> EINTR should retry.

Will add (it is not listed in the manpage so I wasn't sure if it is necessary.)

Fam

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2018-04-09  8:53 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-29 11:09 [Qemu-devel] [RFC PATCH 0/8] qemu-img convert with copy offloading Fam Zheng
2018-03-29 11:09 ` [Qemu-devel] [RFC PATCH 1/8] block: Introduce bdrv_co_map_range API Fam Zheng
2018-04-04 12:57   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2018-04-04 13:01   ` Stefan Hajnoczi
2018-03-29 11:09 ` [Qemu-devel] [RFC PATCH 2/8] qcow2: Implement bdrv_co_map_range Fam Zheng
2018-03-29 11:09 ` [Qemu-devel] [RFC PATCH 3/8] block: Introduce bdrv_co_copy_range Fam Zheng
2018-03-29 11:09 ` [Qemu-devel] [RFC PATCH 4/8] file-posix: Implement bdrv_co_copy_range Fam Zheng
2018-04-04 13:20   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2018-04-09  8:53     ` Fam Zheng
2018-03-29 11:09 ` [Qemu-devel] [RFC PATCH 5/8] file-posix: Implement bdrv_co_map_range Fam Zheng
2018-03-29 11:09 ` [Qemu-devel] [RFC PATCH 6/8] raw: Implement raw_co_map_range Fam Zheng
2018-03-29 11:09 ` [Qemu-devel] [RFC PATCH 7/8] block-backend: Add blk_co_copy_range Fam Zheng
2018-03-29 11:09 ` [Qemu-devel] [RFC PATCH 8/8] qemu-img: Convert with copy offloading Fam Zheng
2018-03-31  8:13 ` [Qemu-devel] [RFC PATCH 0/8] qemu-img convert " no-reply
2018-04-04 13:23 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2018-04-04 13:49   ` Fam Zheng
2018-04-04 15:26     ` Paolo Bonzini
2018-04-05 12:55     ` Stefan Hajnoczi
2018-04-06 11:41       ` Paolo Bonzini
2018-04-08  9:21         ` Fam Zheng

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.