All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [RFC PATCH v2 0/7] qemu-img convert with copy offloading
@ 2018-04-18  3:04 Fam Zheng
  2018-04-18  3:04 ` [Qemu-devel] [RFC PATCH v2 1/7] block: Introduce API for " Fam Zheng
                   ` (6 more replies)
  0 siblings, 7 replies; 18+ messages in thread
From: Fam Zheng @ 2018-04-18  3:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Ronnie Sahlberg, qemu-block, Peter Lieven,
	Fam Zheng, Kevin Wolf, Max Reitz, Stefan Hajnoczi

v2: - Add iscsi EXTENDED COPY.
    - Design change: bdrv_co_copy_range_{from,to}. [Stefan]
    - Retry upon EINTR. [Stefan]
    - Drop the bounce buffer fallback. It is inefficient to attempt the
      offloaded copy over and over again if the error is returned from the host
      rather than checking our internal state. E.g.  trying copy_file_range
      between two filesystems results in EXDEV, in which case qemu-img.c should
      switch back to the copy path for the remaining sectors.

This series introduces block layer API for copy offloading and makes use of it
in qemu-img convert.

For now we implemented the operation in local file protocol with
copy_file_range(2).  Besides that it's possible to add similar to iscsi, nfs
and potentially more.

As far as its usage goes, in addition to qemu-img convert, we can emulate
offloading in scsi-disk (handle EXTENDED COPY command), and use the API in
block jobs too.

Fam Zheng (7):
  block: Introduce API for copy offloading
  raw: Implement copy offloading
  qcow2: Implement copy offloading
  file-posix: Implement bdrv_co_copy_range
  iscsi: Implement copy offloading
  block-backend: Add blk_co_copy_range
  qemu-img: Convert with copy offloading

 block/block-backend.c          |   9 ++
 block/file-posix.c             |  99 ++++++++++++++-
 block/io.c                     |  91 ++++++++++++++
 block/iscsi.c                  | 266 +++++++++++++++++++++++++++++++++++++++++
 block/qcow2.c                  | 224 +++++++++++++++++++++++++++++-----
 block/raw-format.c             |  20 ++++
 include/block/block.h          |   4 +
 include/block/block_int.h      |  30 +++++
 include/block/raw-aio.h        |  10 +-
 include/scsi/constants.h       |   3 +
 include/sysemu/block-backend.h |   4 +
 qemu-img.c                     |  50 +++++++-
 12 files changed, 773 insertions(+), 37 deletions(-)

-- 
2.14.3

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Qemu-devel] [RFC PATCH v2 1/7] block: Introduce API for copy offloading
  2018-04-18  3:04 [Qemu-devel] [RFC PATCH v2 0/7] qemu-img convert with copy offloading Fam Zheng
@ 2018-04-18  3:04 ` Fam Zheng
  2018-04-26 14:57   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2018-04-18  3:04 ` [Qemu-devel] [RFC PATCH v2 2/7] raw: Implement " Fam Zheng
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Fam Zheng @ 2018-04-18  3:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Ronnie Sahlberg, qemu-block, Peter Lieven,
	Fam Zheng, Kevin Wolf, Max Reitz, Stefan Hajnoczi

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block/io.c                | 91 +++++++++++++++++++++++++++++++++++++++++++++++
 include/block/block.h     |  4 +++
 include/block/block_int.h | 30 ++++++++++++++++
 3 files changed, 125 insertions(+)

diff --git a/block/io.c b/block/io.c
index bd9a19a9c4..d274e9525f 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2826,3 +2826,94 @@ void bdrv_unregister_buf(BlockDriverState *bs, void *host)
         bdrv_unregister_buf(child->bs, host);
     }
 }
+
+static int bdrv_co_copy_range_internal(BdrvChild *src,
+                                       uint64_t src_offset,
+                                       BdrvChild *dst,
+                                       uint64_t dst_offset,
+                                       uint64_t bytes, BdrvRequestFlags flags,
+                                       bool recurse_src)
+{
+    int ret;
+
+    if (!src || !dst || !src->bs || !dst->bs) {
+        return -ENOMEDIUM;
+    }
+    ret = bdrv_check_byte_request(src->bs, src_offset, bytes);
+    if (ret) {
+        return ret;
+    }
+
+    ret = bdrv_check_byte_request(dst->bs, dst_offset, bytes);
+    if (ret) {
+        return ret;
+    }
+    if (flags & BDRV_REQ_ZERO_WRITE) {
+        return bdrv_co_pwrite_zeroes(dst, dst_offset, bytes, flags);
+    }
+
+    if (!src->bs->drv->bdrv_co_copy_range_from
+        || !dst->bs->drv->bdrv_co_copy_range_to
+        || src->bs->encrypted || dst->bs->encrypted) {
+        return -ENOTSUP;
+    }
+    if (recurse_src) {
+        return src->bs->drv->bdrv_co_copy_range_from(src->bs,
+                                                     src, src_offset,
+                                                     dst, dst_offset,
+                                                     bytes, flags);
+    } else {
+        return dst->bs->drv->bdrv_co_copy_range_to(dst->bs,
+                                                   src, src_offset,
+                                                   dst, dst_offset,
+                                                   bytes, flags);
+    }
+}
+
+/* Copy range from @bs to @dst. */
+int bdrv_co_copy_range_from(BdrvChild *src, uint64_t src_offset,
+                            BdrvChild *dst, uint64_t dst_offset,
+                            uint64_t bytes, BdrvRequestFlags flags)
+{
+    return bdrv_co_copy_range_internal(src, src_offset, dst, dst_offset,
+                                       bytes, flags, true);
+}
+
+/* Copy range from @src to @bs. Should only be called by block drivers when @bs
+ * is the leaf. */
+int bdrv_co_copy_range_to(BdrvChild *src, uint64_t src_offset,
+                          BdrvChild *dst, uint64_t dst_offset,
+                          uint64_t bytes, BdrvRequestFlags flags)
+{
+    return bdrv_co_copy_range_internal(src, src_offset, dst, dst_offset,
+                                       bytes, flags, false);
+}
+
+int bdrv_co_copy_range(BdrvChild *src, uint64_t src_offset,
+                       BdrvChild *dst, uint64_t dst_offset,
+                       uint64_t bytes, BdrvRequestFlags flags)
+{
+    BdrvTrackedRequest src_req, dst_req;
+    BlockDriverState *src_bs = src->bs;
+    BlockDriverState *dst_bs = dst->bs;
+    int ret;
+
+    bdrv_inc_in_flight(src_bs);
+    bdrv_inc_in_flight(dst_bs);
+    tracked_request_begin(&src_req, src_bs, src_offset,
+                          bytes, BDRV_TRACKED_READ);
+    tracked_request_begin(&dst_req, dst_bs, dst_offset,
+                          bytes, BDRV_TRACKED_WRITE);
+
+    wait_serialising_requests(&src_req);
+    wait_serialising_requests(&dst_req);
+    ret = bdrv_co_copy_range_from(src, src_offset,
+                                  dst, dst_offset,
+                                  bytes, flags);
+
+    tracked_request_end(&src_req);
+    tracked_request_end(&dst_req);
+    bdrv_dec_in_flight(src_bs);
+    bdrv_dec_in_flight(dst_bs);
+    return ret;
+}
diff --git a/include/block/block.h b/include/block/block.h
index cdec3639a3..72ac011b2b 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -604,4 +604,8 @@ bool bdrv_can_store_new_dirty_bitmap(BlockDriverState *bs, const char *name,
  */
 void bdrv_register_buf(BlockDriverState *bs, void *host, size_t size);
 void bdrv_unregister_buf(BlockDriverState *bs, void *host);
+
+int bdrv_co_copy_range(BdrvChild *bs, uint64_t offset,
+                       BdrvChild *src, uint64_t src_offset,
+                       uint64_t bytes, BdrvRequestFlags flags);
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index c4dd1d4bb8..305c29b75e 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -206,6 +206,29 @@ struct BlockDriver {
     int coroutine_fn (*bdrv_co_pdiscard)(BlockDriverState *bs,
         int64_t offset, int bytes);
 
+    /* Map [offset, offset + nbytes) range onto a child of @bs to copy from,
+     * and invoke bdrv_co_copy_range_from(child, ...), or invoke
+     * bdrv_co_copy_range_to() if @bs is the leaf child to copy data from.
+     */
+    int coroutine_fn (*bdrv_co_copy_range_from)(BlockDriverState *bs,
+                                                BdrvChild *src,
+                                                uint64_t offset,
+                                                BdrvChild *dst,
+                                                uint64_t dst_offset,
+                                                uint64_t bytes, BdrvRequestFlags flags);
+
+    /* Map [offset, offset + nbytes) range onto a child of bs to copy data to,
+     * and invoke bdrv_co_copy_range_to(child, src, ...), or perform the copy
+     * operation if @bs is the leaf and @src has the same BlockDriver.  Return
+     * -ENOTSUP if @bs is the leaf but @src has a different BlockDriver.
+     */
+    int coroutine_fn (*bdrv_co_copy_range_to)(BlockDriverState *bs,
+                                              BdrvChild *src,
+                                              uint64_t src_offset,
+                                              BdrvChild *dst,
+                                              uint64_t dst_offset,
+                                              uint64_t bytes, BdrvRequestFlags flags);
+
     /*
      * Building block for bdrv_block_status[_above] and
      * bdrv_is_allocated[_above].  The driver should answer only
@@ -1090,4 +1113,11 @@ void bdrv_dec_in_flight(BlockDriverState *bs);
 
 void blockdev_close_all_bdrv_states(void);
 
+int bdrv_co_copy_range_from(BdrvChild *src, uint64_t src_offset,
+                            BdrvChild *dst, uint64_t dst_offset,
+                            uint64_t bytes, BdrvRequestFlags flags);
+int bdrv_co_copy_range_to(BdrvChild *src, uint64_t src_offset,
+                            BdrvChild *dst, uint64_t dst_offset,
+                            uint64_t bytes, BdrvRequestFlags flags);
+
 #endif /* BLOCK_INT_H */
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [RFC PATCH v2 2/7] raw: Implement copy offloading
  2018-04-18  3:04 [Qemu-devel] [RFC PATCH v2 0/7] qemu-img convert with copy offloading Fam Zheng
  2018-04-18  3:04 ` [Qemu-devel] [RFC PATCH v2 1/7] block: Introduce API for " Fam Zheng
@ 2018-04-18  3:04 ` Fam Zheng
  2018-04-26 14:57   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2018-04-18  3:04 ` [Qemu-devel] [RFC PATCH v2 3/7] qcow2: " Fam Zheng
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Fam Zheng @ 2018-04-18  3:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Ronnie Sahlberg, qemu-block, Peter Lieven,
	Fam Zheng, Kevin Wolf, Max Reitz, Stefan Hajnoczi

Just pass down to ->file.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block/raw-format.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/block/raw-format.c b/block/raw-format.c
index a378547c99..febddf00c0 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -482,6 +482,24 @@ static int raw_probe_geometry(BlockDriverState *bs, HDGeometry *geo)
     return bdrv_probe_geometry(bs->file->bs, geo);
 }
 
+static int coroutine_fn raw_co_copy_range_from(BlockDriverState *bs,
+                                               BdrvChild *src, uint64_t src_offset,
+                                               BdrvChild *dst, uint64_t dst_offset,
+                                               uint64_t bytes, BdrvRequestFlags flags)
+{
+    return bdrv_co_copy_range_from(bs->file, src_offset, dst, dst_offset,
+                                   bytes, flags);
+}
+
+static int coroutine_fn raw_co_copy_range_to(BlockDriverState *bs,
+                                             BdrvChild *src, uint64_t src_offset,
+                                             BdrvChild *dst, uint64_t dst_offset,
+                                             uint64_t bytes, BdrvRequestFlags flags)
+{
+    return bdrv_co_copy_range_to(src, src_offset, bs->file, dst_offset, bytes,
+                                 flags);
+}
+
 BlockDriver bdrv_raw = {
     .format_name          = "raw",
     .instance_size        = sizeof(BDRVRawState),
@@ -498,6 +516,8 @@ BlockDriver bdrv_raw = {
     .bdrv_co_pwrite_zeroes = &raw_co_pwrite_zeroes,
     .bdrv_co_pdiscard     = &raw_co_pdiscard,
     .bdrv_co_block_status = &raw_co_block_status,
+    .bdrv_co_copy_range_from = &raw_co_copy_range_from,
+    .bdrv_co_copy_range_to  = &raw_co_copy_range_to,
     .bdrv_truncate        = &raw_truncate,
     .bdrv_getlength       = &raw_getlength,
     .has_variable_length  = true,
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [RFC PATCH v2 3/7] qcow2: Implement copy offloading
  2018-04-18  3:04 [Qemu-devel] [RFC PATCH v2 0/7] qemu-img convert with copy offloading Fam Zheng
  2018-04-18  3:04 ` [Qemu-devel] [RFC PATCH v2 1/7] block: Introduce API for " Fam Zheng
  2018-04-18  3:04 ` [Qemu-devel] [RFC PATCH v2 2/7] raw: Implement " Fam Zheng
@ 2018-04-18  3:04 ` Fam Zheng
  2018-04-26 15:34   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2018-04-18  3:04 ` [Qemu-devel] [RFC PATCH v2 4/7] file-posix: Implement bdrv_co_copy_range Fam Zheng
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Fam Zheng @ 2018-04-18  3:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Ronnie Sahlberg, qemu-block, Peter Lieven,
	Fam Zheng, Kevin Wolf, Max Reitz, Stefan Hajnoczi

The two callbacks are implemented quite similarly to the read/write
functions: bdrv_co_copy_range_from maps for read and calls into bs->file
or bs->backing depending on the allocation status; bdrv_co_copy_range_to
maps for write and calls into bs->file.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block/qcow2.c | 224 ++++++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 194 insertions(+), 30 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 486f3e83b7..9a0046220d 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1755,6 +1755,31 @@ static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
     return status;
 }
 
+static int qcow2_handle_l2meta(BlockDriverState *bs, QCowL2Meta *l2meta)
+{
+    int ret = 0;
+
+    while (l2meta != NULL) {
+        QCowL2Meta *next;
+
+        if (!ret) {
+            ret = qcow2_alloc_cluster_link_l2(bs, l2meta);
+        }
+
+        /* Take the request off the list of running requests */
+        if (l2meta->nb_clusters != 0) {
+            QLIST_REMOVE(l2meta, next_in_flight);
+        }
+
+        qemu_co_queue_restart_all(&l2meta->dependent_requests);
+
+        next = l2meta->next;
+        g_free(l2meta);
+        l2meta = next;
+    }
+    return ret;
+}
+
 static coroutine_fn int qcow2_co_preadv(BlockDriverState *bs, uint64_t offset,
                                         uint64_t bytes, QEMUIOVector *qiov,
                                         int flags)
@@ -2041,24 +2066,10 @@ static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
             }
         }
 
-        while (l2meta != NULL) {
-            QCowL2Meta *next;
-
-            ret = qcow2_alloc_cluster_link_l2(bs, l2meta);
-            if (ret < 0) {
-                goto fail;
-            }
-
-            /* Take the request off the list of running requests */
-            if (l2meta->nb_clusters != 0) {
-                QLIST_REMOVE(l2meta, next_in_flight);
-            }
-
-            qemu_co_queue_restart_all(&l2meta->dependent_requests);
-
-            next = l2meta->next;
-            g_free(l2meta);
-            l2meta = next;
+        ret = qcow2_handle_l2meta(bs, l2meta);
+        l2meta = NULL;
+        if (ret) {
+            goto fail;
         }
 
         bytes -= cur_bytes;
@@ -2069,18 +2080,7 @@ static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
     ret = 0;
 
 fail:
-    while (l2meta != NULL) {
-        QCowL2Meta *next;
-
-        if (l2meta->nb_clusters != 0) {
-            QLIST_REMOVE(l2meta, next_in_flight);
-        }
-        qemu_co_queue_restart_all(&l2meta->dependent_requests);
-
-        next = l2meta->next;
-        g_free(l2meta);
-        l2meta = next;
-    }
+    qcow2_handle_l2meta(bs, l2meta);
 
     qemu_co_mutex_unlock(&s->lock);
 
@@ -3267,6 +3267,168 @@ static coroutine_fn int qcow2_co_pdiscard(BlockDriverState *bs,
     return ret;
 }
 
+static int qcow2_co_copy_range_from(BlockDriverState *bs,
+                                    BdrvChild *src, uint64_t src_offset,
+                                    BdrvChild *dst, uint64_t dst_offset,
+                                    uint64_t bytes, BdrvRequestFlags flags)
+{
+    BDRVQcow2State *s = bs->opaque;
+    int offset_in_cluster;
+    int ret;
+    unsigned int cur_bytes; /* number of bytes in current iteration */
+    uint64_t cluster_offset = 0;
+    BdrvChild *child = NULL;
+
+    assert(!bs->encrypted);
+    qemu_co_mutex_lock(&s->lock);
+
+    while (bytes != 0) {
+
+        /* prepare next request */
+        cur_bytes = MIN(bytes, INT_MAX);
+
+        ret = qcow2_get_cluster_offset(bs, src_offset, &cur_bytes, &cluster_offset);
+        if (ret < 0) {
+            goto out;
+        }
+
+        offset_in_cluster = offset_into_cluster(s, src_offset);
+
+        switch (ret) {
+        case QCOW2_CLUSTER_UNALLOCATED:
+            if (bs->backing) {
+                child = bs->backing;
+            } else {
+                flags |= BDRV_REQ_ZERO_WRITE;
+            }
+            break;
+
+        case QCOW2_CLUSTER_ZERO_PLAIN:
+        case QCOW2_CLUSTER_ZERO_ALLOC:
+            flags |= BDRV_REQ_ZERO_WRITE;
+            break;
+
+        case QCOW2_CLUSTER_COMPRESSED:
+            ret = -ENOTSUP;
+            goto out;
+            break;
+
+        case QCOW2_CLUSTER_NORMAL:
+            child = bs->file;
+            if ((cluster_offset & 511) != 0) {
+                ret = -EIO;
+                goto out;
+            }
+            break;
+
+        default:
+            abort();
+        }
+        qemu_co_mutex_unlock(&s->lock);
+        ret = bdrv_co_copy_range_from(child,
+                                      cluster_offset + offset_in_cluster,
+                                      dst, dst_offset,
+                                      cur_bytes, flags);
+        qemu_co_mutex_lock(&s->lock);
+        if (ret < 0) {
+            goto out;
+        }
+
+        bytes -= cur_bytes;
+        src_offset += cur_bytes;
+    }
+    ret = 0;
+
+out:
+    qemu_co_mutex_unlock(&s->lock);
+    return ret;
+}
+
+static int qcow2_co_copy_range_to(BlockDriverState *bs,
+                                  BdrvChild *src, uint64_t src_offset,
+                                  BdrvChild *dst, uint64_t dst_offset,
+                                  uint64_t bytes, BdrvRequestFlags flags)
+{
+    BDRVQcow2State *s = bs->opaque;
+    int offset_in_cluster;
+    int ret;
+    unsigned int cur_bytes; /* number of sectors in current iteration */
+    uint64_t cluster_offset;
+    uint8_t *cluster_data = NULL;
+    QCowL2Meta *l2meta = NULL;
+
+    assert(!bs->encrypted);
+    s->cluster_cache_offset = -1; /* disable compressed cache */
+
+    qemu_co_mutex_lock(&s->lock);
+
+    while (bytes != 0) {
+
+        l2meta = NULL;
+
+        offset_in_cluster = offset_into_cluster(s, dst_offset);
+        cur_bytes = MIN(bytes, INT_MAX);
+
+        ret = qcow2_alloc_cluster_offset(bs, dst_offset, &cur_bytes,
+                                         &cluster_offset, &l2meta);
+        if (ret < 0) {
+            goto fail;
+        }
+
+        assert((cluster_offset & 511) == 0);
+
+        ret = qcow2_pre_write_overlap_check(bs, 0,
+                cluster_offset + offset_in_cluster, cur_bytes);
+        if (ret < 0) {
+            goto fail;
+        }
+
+        qemu_co_mutex_unlock(&s->lock);
+        ret = bdrv_co_copy_range_to(src, src_offset,
+                                    bs->file,
+                                    cluster_offset + offset_in_cluster,
+                                    cur_bytes, flags);
+        qemu_co_mutex_lock(&s->lock);
+        if (ret < 0) {
+            goto fail;
+        }
+
+        while (l2meta != NULL) {
+            QCowL2Meta *next;
+
+            ret = qcow2_alloc_cluster_link_l2(bs, l2meta);
+            if (ret < 0) {
+                goto fail;
+            }
+
+            /* Take the request off the list of running requests */
+            if (l2meta->nb_clusters != 0) {
+                QLIST_REMOVE(l2meta, next_in_flight);
+            }
+
+            qemu_co_queue_restart_all(&l2meta->dependent_requests);
+
+            next = l2meta->next;
+            g_free(l2meta);
+            l2meta = next;
+        }
+
+        bytes -= cur_bytes;
+        dst_offset += cur_bytes;
+    }
+    ret = 0;
+
+fail:
+    qcow2_handle_l2meta(bs, l2meta);
+
+    qemu_co_mutex_unlock(&s->lock);
+
+    qemu_vfree(cluster_data);
+    trace_qcow2_writev_done_req(qemu_coroutine_self(), ret);
+
+    return ret;
+}
+
 static int qcow2_truncate(BlockDriverState *bs, int64_t offset,
                           PreallocMode prealloc, Error **errp)
 {
@@ -4515,6 +4677,8 @@ BlockDriver bdrv_qcow2 = {
 
     .bdrv_co_pwrite_zeroes  = qcow2_co_pwrite_zeroes,
     .bdrv_co_pdiscard       = qcow2_co_pdiscard,
+    .bdrv_co_copy_range_from = qcow2_co_copy_range_from,
+    .bdrv_co_copy_range_to  = qcow2_co_copy_range_to,
     .bdrv_truncate          = qcow2_truncate,
     .bdrv_co_pwritev_compressed = qcow2_co_pwritev_compressed,
     .bdrv_make_empty        = qcow2_make_empty,
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [RFC PATCH v2 4/7] file-posix: Implement bdrv_co_copy_range
  2018-04-18  3:04 [Qemu-devel] [RFC PATCH v2 0/7] qemu-img convert with copy offloading Fam Zheng
                   ` (2 preceding siblings ...)
  2018-04-18  3:04 ` [Qemu-devel] [RFC PATCH v2 3/7] qcow2: " Fam Zheng
@ 2018-04-18  3:04 ` Fam Zheng
  2018-05-03  9:25   ` Stefan Hajnoczi
  2018-04-18  3:04 ` [Qemu-devel] [RFC PATCH v2 5/7] iscsi: Implement copy offloading Fam Zheng
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Fam Zheng @ 2018-04-18  3:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Ronnie Sahlberg, qemu-block, Peter Lieven,
	Fam Zheng, Kevin Wolf, Max Reitz, Stefan Hajnoczi

With copy_file_range(2), we can implement the bdrv_co_copy_range
semantics.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block/file-posix.c      | 99 +++++++++++++++++++++++++++++++++++++++++++++++--
 include/block/raw-aio.h | 10 ++++-
 2 files changed, 104 insertions(+), 5 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 3794c0007a..45ad543481 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -100,6 +100,7 @@
 #ifdef CONFIG_XFS
 #include <xfs/xfs.h>
 #endif
+#include <sys/syscall.h>
 
 //#define DEBUG_BLOCK
 
@@ -185,6 +186,8 @@ typedef struct RawPosixAIOData {
 #define aio_ioctl_cmd   aio_nbytes /* for QEMU_AIO_IOCTL */
     off_t aio_offset;
     int aio_type;
+    int fd2;
+    off_t offset2;
 } RawPosixAIOData;
 
 #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
@@ -1421,6 +1424,48 @@ static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
     return -ENOTSUP;
 }
 
+#ifdef __NR_copy_file_range
+#define HAS_COPY_FILE_RANGE
+#endif
+
+#ifdef HAS_COPY_FILE_RANGE
+static off_t copy_file_range(int in_fd, off_t *in_off, int out_fd,
+                             off_t *out_off, size_t len, unsigned int flags)
+{
+    return syscall(__NR_copy_file_range, in_fd, in_off, out_fd,
+                   out_off, len, flags);
+}
+#endif
+
+static ssize_t handle_aiocb_copy_range(RawPosixAIOData *aiocb)
+{
+#ifndef HAS_COPY_FILE_RANGE
+    return -ENOTSUP;
+#else
+    uint64_t bytes = aiocb->aio_nbytes;
+    off_t in_off = aiocb->aio_offset;
+    off_t out_off = aiocb->offset2;
+
+    while (bytes) {
+        ssize_t ret = copy_file_range(aiocb->aio_fildes, &in_off,
+                                      aiocb->fd2, &out_off,
+                                      bytes, 0);
+        if (ret == -EINTR) {
+            continue;
+        }
+        if (ret < 0) {
+            return -errno;
+        }
+        if (!ret) {
+            /* No progress (e.g. when beyond EOF), fall back to buffer I/O. */
+            return -ENOTSUP;
+        }
+        bytes -= ret;
+    }
+    return 0;
+#endif
+}
+
 static ssize_t handle_aiocb_discard(RawPosixAIOData *aiocb)
 {
     int ret = -EOPNOTSUPP;
@@ -1501,6 +1546,9 @@ static int aio_worker(void *arg)
     case QEMU_AIO_WRITE_ZEROES:
         ret = handle_aiocb_write_zeroes(aiocb);
         break;
+    case QEMU_AIO_COPY_RANGE:
+        ret = handle_aiocb_copy_range(aiocb);
+        break;
     default:
         fprintf(stderr, "invalid aio request (0x%x)\n", aiocb->aio_type);
         ret = -EINVAL;
@@ -1511,9 +1559,10 @@ static int aio_worker(void *arg)
     return ret;
 }
 
-static int paio_submit_co(BlockDriverState *bs, int fd,
-                          int64_t offset, QEMUIOVector *qiov,
-                          int bytes, int type)
+static int paio_submit_co_full(BlockDriverState *bs, int fd,
+                               int64_t offset, int fd2, int64_t offset2,
+                               QEMUIOVector *qiov,
+                               int bytes, int type)
 {
     RawPosixAIOData *acb = g_new(RawPosixAIOData, 1);
     ThreadPool *pool;
@@ -1521,6 +1570,8 @@ static int paio_submit_co(BlockDriverState *bs, int fd,
     acb->bs = bs;
     acb->aio_type = type;
     acb->aio_fildes = fd;
+    acb->fd2 = fd2;
+    acb->offset2 = offset2;
 
     acb->aio_nbytes = bytes;
     acb->aio_offset = offset;
@@ -1536,6 +1587,13 @@ static int paio_submit_co(BlockDriverState *bs, int fd,
     return thread_pool_submit_co(pool, aio_worker, acb);
 }
 
+static inline int paio_submit_co(BlockDriverState *bs, int fd,
+                                 int64_t offset, QEMUIOVector *qiov,
+                                 int bytes, int type)
+{
+    return paio_submit_co_full(bs, fd, offset, -1, 0, qiov, bytes, type);
+}
+
 static BlockAIOCB *paio_submit(BlockDriverState *bs, int fd,
         int64_t offset, QEMUIOVector *qiov, int bytes,
         BlockCompletionFunc *cb, void *opaque, int type)
@@ -2312,6 +2370,35 @@ static void raw_abort_perm_update(BlockDriverState *bs)
     raw_handle_perm_lock(bs, RAW_PL_ABORT, 0, 0, NULL);
 }
 
+static int coroutine_fn raw_co_copy_range_from(BlockDriverState *bs,
+                                  BdrvChild *src, uint64_t src_offset,
+                                  BdrvChild *dst, uint64_t dst_offset,
+                                  uint64_t bytes, BdrvRequestFlags flags)
+{
+    return bdrv_co_copy_range_to(src, src_offset, dst, dst_offset, bytes, flags);
+}
+
+static int coroutine_fn raw_co_copy_range_to(BlockDriverState *bs,
+                                             BdrvChild *src, uint64_t src_offset,
+                                             BdrvChild *dst, uint64_t dst_offset,
+                                             uint64_t bytes, BdrvRequestFlags flags)
+{
+    BDRVRawState *s = bs->opaque;
+    BDRVRawState *src_s;
+
+    assert(dst->bs == bs);
+    if (src->bs->drv->bdrv_co_copy_range_to != raw_co_copy_range_to) {
+        return -ENOTSUP;
+    }
+
+    src_s = src->opaque;
+    if (fd_open(bs) < 0 || fd_open(bs) < 0) {
+        return -EIO;
+    }
+    return paio_submit_co_full(bs, src_s->fd, src_offset, s->fd, dst_offset,
+                               NULL, bytes, QEMU_AIO_COPY_RANGE);
+}
+
 BlockDriver bdrv_file = {
     .format_name = "file",
     .protocol_name = "file",
@@ -2334,6 +2421,8 @@ BlockDriver bdrv_file = {
     .bdrv_co_pwritev        = raw_co_pwritev,
     .bdrv_aio_flush = raw_aio_flush,
     .bdrv_aio_pdiscard = raw_aio_pdiscard,
+    .bdrv_co_copy_range_from = raw_co_copy_range_from,
+    .bdrv_co_copy_range_to  = raw_co_copy_range_to,
     .bdrv_refresh_limits = raw_refresh_limits,
     .bdrv_io_plug = raw_aio_plug,
     .bdrv_io_unplug = raw_aio_unplug,
@@ -2811,6 +2900,10 @@ static BlockDriver bdrv_host_device = {
     .bdrv_co_pwritev        = raw_co_pwritev,
     .bdrv_aio_flush	= raw_aio_flush,
     .bdrv_aio_pdiscard   = hdev_aio_pdiscard,
+    .bdrv_co_copy_range_from = raw_co_copy_range_from,
+    .bdrv_co_copy_range_to  = raw_co_copy_range_to,
+    .bdrv_co_copy_range_from = raw_co_copy_range_from,
+    .bdrv_co_copy_range_to  = raw_co_copy_range_to,
     .bdrv_refresh_limits = raw_refresh_limits,
     .bdrv_io_plug = raw_aio_plug,
     .bdrv_io_unplug = raw_aio_unplug,
diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
index a4cdbbf1b7..324053020b 100644
--- a/include/block/raw-aio.h
+++ b/include/block/raw-aio.h
@@ -25,9 +25,15 @@
 #define QEMU_AIO_FLUSH        0x0008
 #define QEMU_AIO_DISCARD      0x0010
 #define QEMU_AIO_WRITE_ZEROES 0x0020
+#define QEMU_AIO_COPY_RANGE   0x0040
 #define QEMU_AIO_TYPE_MASK \
-        (QEMU_AIO_READ|QEMU_AIO_WRITE|QEMU_AIO_IOCTL|QEMU_AIO_FLUSH| \
-         QEMU_AIO_DISCARD|QEMU_AIO_WRITE_ZEROES)
+        (QEMU_AIO_READ | \
+         QEMU_AIO_WRITE | \
+         QEMU_AIO_IOCTL | \
+         QEMU_AIO_FLUSH | \
+         QEMU_AIO_DISCARD | \
+         QEMU_AIO_WRITE_ZEROES | \
+         QEMU_AIO_COPY_RANGE)
 
 /* AIO flags */
 #define QEMU_AIO_MISALIGNED   0x1000
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [RFC PATCH v2 5/7] iscsi: Implement copy offloading
  2018-04-18  3:04 [Qemu-devel] [RFC PATCH v2 0/7] qemu-img convert with copy offloading Fam Zheng
                   ` (3 preceding siblings ...)
  2018-04-18  3:04 ` [Qemu-devel] [RFC PATCH v2 4/7] file-posix: Implement bdrv_co_copy_range Fam Zheng
@ 2018-04-18  3:04 ` Fam Zheng
  2018-05-03 10:27   ` Stefan Hajnoczi
  2018-04-18  3:04 ` [Qemu-devel] [RFC PATCH v2 6/7] block-backend: Add blk_co_copy_range Fam Zheng
  2018-04-18  3:04 ` [Qemu-devel] [RFC PATCH v2 7/7] qemu-img: Convert with copy offloading Fam Zheng
  6 siblings, 1 reply; 18+ messages in thread
From: Fam Zheng @ 2018-04-18  3:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Ronnie Sahlberg, qemu-block, Peter Lieven,
	Fam Zheng, Kevin Wolf, Max Reitz, Stefan Hajnoczi

Issue EXTENDED COPY (LID1) command to implement the copy_range API.

The parameter data construction code is ported from libiscsi's
iscsi-dd.c.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block/iscsi.c            | 266 +++++++++++++++++++++++++++++++++++++++++++++++
 include/scsi/constants.h |   3 +
 2 files changed, 269 insertions(+)

diff --git a/block/iscsi.c b/block/iscsi.c
index f5aecfc883..7d17e03ad3 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -68,6 +68,7 @@ typedef struct IscsiLun {
     QemuMutex mutex;
     struct scsi_inquiry_logical_block_provisioning lbp;
     struct scsi_inquiry_block_limits bl;
+    struct scsi_inquiry_device_designator *dd;
     unsigned char *zeroblock;
     /* The allocmap tracks which clusters (pages) on the iSCSI target are
      * allocated and which are not. In case a target returns zeros for
@@ -1740,6 +1741,29 @@ static QemuOptsList runtime_opts = {
     },
 };
 
+static void iscsi_save_designator(IscsiLun *lun,
+                                  struct scsi_inquiry_device_identification *inq_di)
+{
+    struct scsi_inquiry_device_designator *desig, *copy = NULL;
+
+    for (desig = inq_di->designators; desig; desig = desig->next) {
+        if (desig->association ||
+            desig->designator_type > SCSI_DESIGNATOR_TYPE_NAA) {
+            continue;
+        }
+        /* NAA works better than T10 vendor ID based designator. */
+        if (!copy || copy->designator_type < desig->designator_type) {
+            copy = desig;
+        }
+    }
+    if (copy) {
+        lun->dd = g_new(struct scsi_inquiry_device_designator, 1);
+        *lun->dd = *copy;
+        lun->dd->designator = g_malloc(copy->designator_length);
+        memcpy(lun->dd->designator, copy->designator, copy->designator_length);
+    }
+}
+
 static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
                       Error **errp)
 {
@@ -1922,6 +1946,7 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
         struct scsi_task *inq_task;
         struct scsi_inquiry_logical_block_provisioning *inq_lbp;
         struct scsi_inquiry_block_limits *inq_bl;
+        struct scsi_inquiry_device_identification *inq_di;
         switch (inq_vpd->pages[i]) {
         case SCSI_INQUIRY_PAGECODE_LOGICAL_BLOCK_PROVISIONING:
             inq_task = iscsi_do_inquiry(iscsilun->iscsi, iscsilun->lun, 1,
@@ -1947,6 +1972,17 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
                    sizeof(struct scsi_inquiry_block_limits));
             scsi_free_scsi_task(inq_task);
             break;
+        case SCSI_INQUIRY_PAGECODE_DEVICE_IDENTIFICATION:
+            inq_task = iscsi_do_inquiry(iscsilun->iscsi, iscsilun->lun, 1,
+                                    SCSI_INQUIRY_PAGECODE_DEVICE_IDENTIFICATION,
+                                    (void **) &inq_di, errp);
+            if (inq_task == NULL) {
+                ret = -EINVAL;
+                goto out;
+            }
+            iscsi_save_designator(iscsilun, inq_di);
+            scsi_free_scsi_task(inq_task);
+            break;
         default:
             break;
         }
@@ -2003,6 +2039,8 @@ static void iscsi_close(BlockDriverState *bs)
         iscsi_logout_sync(iscsi);
     }
     iscsi_destroy_context(iscsi);
+    g_free(iscsilun->dd->designator);
+    g_free(iscsilun->dd);
     g_free(iscsilun->zeroblock);
     iscsi_allocmap_free(iscsilun);
     qemu_mutex_destroy(&iscsilun->mutex);
@@ -2184,6 +2222,230 @@ static void coroutine_fn iscsi_co_invalidate_cache(BlockDriverState *bs,
     iscsi_allocmap_invalidate(iscsilun);
 }
 
+static int coroutine_fn iscsi_co_copy_range_from(BlockDriverState *bs,
+                                                 BdrvChild *src,
+                                                 uint64_t src_offset,
+                                                 BdrvChild *dst,
+                                                 uint64_t dst_offset,
+                                                 uint64_t bytes,
+                                                 BdrvRequestFlags flags)
+{
+    return bdrv_co_copy_range_to(src, src_offset, dst, dst_offset, bytes, flags);
+}
+
+static struct scsi_task *iscsi_xcopy_task(int param_len)
+{
+    struct scsi_task *task;
+
+    task = g_new0(struct scsi_task, 1);
+
+    task->cdb[0]     = EXTENDED_COPY;
+    task->cdb[10]    = (param_len >> 24) & 0xFF;
+    task->cdb[11]    = (param_len >> 16) & 0xFF;
+    task->cdb[12]    = (param_len >> 8) & 0xFF;
+    task->cdb[13]    = param_len & 0xFF;
+    task->cdb_size   = 16;
+    task->xfer_dir   = SCSI_XFER_WRITE;
+    task->expxferlen = param_len;
+
+    return task;
+}
+
+static int iscsi_populate_target_desc(unsigned char *desc, IscsiLun *lun)
+{
+    struct scsi_inquiry_device_designator *dd = lun->dd;
+
+    memset(desc, 0, 32);
+    desc[0] = IDENT_DESCR_TGT_DESCR;
+    desc[4] = dd->code_set;
+    desc[5] = (dd->designator_type & 0xF)
+        | ((dd->association & 3) << 4);
+    desc[7] = dd->designator_length;
+    memcpy(desc + 8, dd->designator, dd->designator_length);
+
+    desc[28] = 0;
+    desc[29] = (lun->block_size >> 16) & 0xFF;
+    desc[30] = (lun->block_size >> 8) & 0xFF;
+    desc[31] = lun->block_size & 0xFF;
+
+    return 32;
+}
+
+static int iscsi_xcopy_desc_hdr(uint8_t *hdr, int dc, int cat, int src_index,
+                                int dst_index)
+{
+    int desc_len = 28;
+
+    hdr[0] = 0x02; /* BLK_TO_BLK_SEG_DESCR */
+    hdr[1] = ((dc << 1) | cat) & 0xFF;
+    hdr[2] = (desc_len >> 8) & 0xFF;
+    /* don't account for the first 4 bytes in descriptor header*/
+    hdr[3] = (desc_len - 4 /* SEG_DESC_SRC_INDEX_OFFSET */) & 0xFF;
+    hdr[4] = (src_index >> 8) & 0xFF;
+    hdr[5] = src_index & 0xFF;
+    hdr[6] = (dst_index >> 8) & 0xFF;
+    hdr[7] = dst_index & 0xFF;
+
+    return desc_len;
+}
+
+static int iscsi_xcopy_populate_desc(uint8_t *desc, int dc, int cat,
+                                     int src_index, int dst_index, int num_blks,
+                                     uint64_t src_lba, uint64_t dst_lba)
+{
+    int desc_len = iscsi_xcopy_desc_hdr(desc, dc, cat,
+                                        src_index, dst_index);
+
+    desc[10] = (num_blks >> 8) & 0xFF;
+    desc[11] = num_blks & 0xFF;
+    desc[12] = (src_lba >> 56) & 0xFF;
+    desc[13] = (src_lba >> 48) & 0xFF;
+    desc[14] = (src_lba >> 40) & 0xFF;
+    desc[15] = (src_lba >> 32) & 0xFF;
+    desc[16] = (src_lba >> 24) & 0xFF;
+    desc[17] = (src_lba >> 16) & 0xFF;
+    desc[18] = (src_lba >> 8) & 0xFF;
+    desc[19] = src_lba & 0xFF;
+    desc[20] = (dst_lba >> 56) & 0xFF;
+    desc[21] = (dst_lba >> 48) & 0xFF;
+    desc[22] = (dst_lba >> 40) & 0xFF;
+    desc[23] = (dst_lba >> 32) & 0xFF;
+    desc[24] = (dst_lba >> 24) & 0xFF;
+    desc[25] = (dst_lba >> 16) & 0xFF;
+    desc[26] = (dst_lba >> 8) & 0xFF;
+    desc[27] = dst_lba & 0xFF;
+
+    return desc_len;
+}
+
+static void iscsi_xcopy_populate_header(unsigned char *buf, int list_id, int str,
+                                        int list_id_usage, int prio,
+                                        int tgt_desc_len,
+                                        int seg_desc_len, int inline_data_len)
+{
+    buf[0] = list_id;
+    buf[1] = ((str & 1) << 5) | ((list_id_usage & 3) << 3) | (prio & 7);
+    buf[2] = (tgt_desc_len >> 8) & 0xFF;
+    buf[3] = tgt_desc_len & 0xFF;
+    buf[8] = (seg_desc_len >> 24) & 0xFF;
+    buf[9] = (seg_desc_len >> 16) & 0xFF;
+    buf[10] = (seg_desc_len >> 8) & 0xFF;
+    buf[11] = seg_desc_len & 0xFF;
+    buf[12] = (inline_data_len >> 24) & 0xFF;
+    buf[13] = (inline_data_len >> 16) & 0xFF;
+    buf[14] = (inline_data_len >> 8) & 0xFF;
+    buf[15] = inline_data_len & 0xFF;
+}
+
+static void iscsi_xcopy_data(struct iscsi_data *data,
+                             IscsiLun *src, int64_t src_lba,
+                             IscsiLun *dst, int64_t dst_lba,
+                             int num_blocks)
+{
+    uint8_t *buf;
+    int offset;
+    int tgt_desc_len, seg_desc_len;
+
+    data->size = XCOPY_DESC_OFFSET +
+                 32 * 2 +    /* IDENT_DESCR_TGT_DESCR */
+                 28;         /* BLK_TO_BLK_SEG_DESCR */
+    data->data = g_malloc0(data->size);
+    buf = data->data;
+
+    /* Initialise CSCD list with one src + one dst descriptor */
+    offset = XCOPY_DESC_OFFSET;
+    offset += iscsi_populate_target_desc(buf + offset, src);
+    offset += iscsi_populate_target_desc(buf + offset, dst);
+    tgt_desc_len = offset - XCOPY_DESC_OFFSET;
+
+    /* Initialise one segment descriptor */
+    seg_desc_len = iscsi_xcopy_populate_desc(buf + offset, 0, 0,
+                                             0, 1, num_blocks,
+                                             src_lba, dst_lba);
+    offset += seg_desc_len;
+
+    /* Initialise the parameter list header */
+    iscsi_xcopy_populate_header(buf, 1, 0, 2 /* LIST_ID_USAGE_DISCARD */,
+                                0, tgt_desc_len, seg_desc_len, 0);
+}
+
+static int coroutine_fn iscsi_co_copy_range_to(BlockDriverState *bs,
+                                               BdrvChild *src,
+                                               uint64_t src_offset,
+                                               BdrvChild *dst,
+                                               uint64_t dst_offset,
+                                               uint64_t bytes,
+                                               BdrvRequestFlags flags)
+{
+    IscsiLun *dst_lun = dst->bs->opaque;
+    IscsiLun *src_lun;
+    struct IscsiTask iscsi_task;
+    struct iscsi_data data;
+    int r = 0;
+    int block_size;
+
+    if (src->bs->drv->bdrv_co_copy_range_to != iscsi_co_copy_range_to) {
+        return -ENOTSUP;
+    }
+    src_lun = src->bs->opaque;
+
+    if (!src_lun->dd || !dst_lun->dd) {
+        return -ENOTSUP;
+    }
+    if (!is_byte_request_lun_aligned(dst_offset, bytes, dst_lun)) {
+        return -ENOTSUP;
+    }
+    if (!is_byte_request_lun_aligned(src_offset, bytes, src_lun)) {
+        return -ENOTSUP;
+    }
+    if (dst_lun->block_size != src_lun->block_size ||
+        !dst_lun->block_size) {
+        return -ENOTSUP;
+    }
+
+    block_size = dst_lun->block_size;
+    iscsi_xcopy_data(&data,
+                     src_lun, src_offset / block_size,
+                     dst_lun, dst_offset / block_size,
+                     bytes / block_size);
+
+    iscsi_co_init_iscsitask(dst_lun, &iscsi_task);
+
+    qemu_mutex_lock(&dst_lun->mutex);
+    iscsi_task.task = iscsi_xcopy_task(data.size);
+retry:
+    if (iscsi_scsi_command_async(dst_lun->iscsi, dst_lun->lun,
+                                 iscsi_task.task, iscsi_co_generic_cb,
+                                 &data,
+                                 &iscsi_task) != 0) {
+        r = -EIO;
+        goto out_unlock;
+    }
+
+    while (!iscsi_task.complete) {
+        iscsi_set_events(dst_lun);
+        qemu_mutex_unlock(&dst_lun->mutex);
+        qemu_coroutine_yield();
+        qemu_mutex_lock(&dst_lun->mutex);
+    }
+
+    if (iscsi_task.do_retry) {
+        iscsi_task.complete = 0;
+        goto retry;
+    }
+
+    if (iscsi_task.status != SCSI_STATUS_GOOD) {
+        r = iscsi_task.err_code;
+        goto out_unlock;
+    }
+
+out_unlock:
+    g_free(iscsi_task.task);
+    qemu_mutex_unlock(&dst_lun->mutex);
+    g_free(iscsi_task.err_str);
+    return r;
+}
+
 static QemuOptsList iscsi_create_opts = {
     .name = "iscsi-create-opts",
     .head = QTAILQ_HEAD_INITIALIZER(iscsi_create_opts.head),
@@ -2218,6 +2480,8 @@ static BlockDriver bdrv_iscsi = {
 
     .bdrv_co_block_status  = iscsi_co_block_status,
     .bdrv_co_pdiscard      = iscsi_co_pdiscard,
+    .bdrv_co_copy_range_from = iscsi_co_copy_range_from,
+    .bdrv_co_copy_range_to  = iscsi_co_copy_range_to,
     .bdrv_co_pwrite_zeroes = iscsi_co_pwrite_zeroes,
     .bdrv_co_readv         = iscsi_co_readv,
     .bdrv_co_writev_flags  = iscsi_co_writev_flags,
@@ -2253,6 +2517,8 @@ static BlockDriver bdrv_iser = {
 
     .bdrv_co_block_status  = iscsi_co_block_status,
     .bdrv_co_pdiscard      = iscsi_co_pdiscard,
+    .bdrv_co_copy_range_from = iscsi_co_copy_range_from,
+    .bdrv_co_copy_range_to  = iscsi_co_copy_range_to,
     .bdrv_co_pwrite_zeroes = iscsi_co_pwrite_zeroes,
     .bdrv_co_readv         = iscsi_co_readv,
     .bdrv_co_writev_flags  = iscsi_co_writev_flags,
diff --git a/include/scsi/constants.h b/include/scsi/constants.h
index a141dd71f8..a4a393ff1f 100644
--- a/include/scsi/constants.h
+++ b/include/scsi/constants.h
@@ -311,4 +311,7 @@
 #define MMC_PROFILE_HDDVD_RW_DL         0x005A
 #define MMC_PROFILE_INVALID             0xFFFF
 
+#define XCOPY_DESC_OFFSET 16
+#define IDENT_DESCR_TGT_DESCR 0xE4
+
 #endif
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [RFC PATCH v2 6/7] block-backend: Add blk_co_copy_range
  2018-04-18  3:04 [Qemu-devel] [RFC PATCH v2 0/7] qemu-img convert with copy offloading Fam Zheng
                   ` (4 preceding siblings ...)
  2018-04-18  3:04 ` [Qemu-devel] [RFC PATCH v2 5/7] iscsi: Implement copy offloading Fam Zheng
@ 2018-04-18  3:04 ` Fam Zheng
  2018-05-03 10:28   ` Stefan Hajnoczi
  2018-04-18  3:04 ` [Qemu-devel] [RFC PATCH v2 7/7] qemu-img: Convert with copy offloading Fam Zheng
  6 siblings, 1 reply; 18+ messages in thread
From: Fam Zheng @ 2018-04-18  3:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Ronnie Sahlberg, qemu-block, Peter Lieven,
	Fam Zheng, Kevin Wolf, Max Reitz, Stefan Hajnoczi

It's a BlockBackend wrapper of the BDS interface.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block/block-backend.c          | 9 +++++++++
 include/sysemu/block-backend.h | 4 ++++
 2 files changed, 13 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 681b240b12..2a984b1864 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2217,3 +2217,12 @@ void blk_unregister_buf(BlockBackend *blk, void *host)
 {
     bdrv_unregister_buf(blk_bs(blk), host);
 }
+
+int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, int64_t off_in,
+                                   BlockBackend *blk_out, int64_t off_out,
+                                   int bytes, BdrvRequestFlags flags)
+{
+    return bdrv_co_copy_range(blk_in->root, off_in,
+                              blk_out->root, off_out,
+                              bytes, flags);
+}
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 92ab624fac..6f043b6b51 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -232,4 +232,8 @@ void blk_set_force_allow_inactivate(BlockBackend *blk);
 void blk_register_buf(BlockBackend *blk, void *host, size_t size);
 void blk_unregister_buf(BlockBackend *blk, void *host);
 
+int blk_co_copy_range(BlockBackend *blk_in, int64_t off_in,
+                      BlockBackend *blk_out, int64_t off_out,
+                      int bytes, BdrvRequestFlags flags);
+
 #endif
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [RFC PATCH v2 7/7] qemu-img: Convert with copy offloading
  2018-04-18  3:04 [Qemu-devel] [RFC PATCH v2 0/7] qemu-img convert with copy offloading Fam Zheng
                   ` (5 preceding siblings ...)
  2018-04-18  3:04 ` [Qemu-devel] [RFC PATCH v2 6/7] block-backend: Add blk_co_copy_range Fam Zheng
@ 2018-04-18  3:04 ` Fam Zheng
  2018-05-03 10:34   ` Stefan Hajnoczi
  6 siblings, 1 reply; 18+ messages in thread
From: Fam Zheng @ 2018-04-18  3:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Ronnie Sahlberg, qemu-block, Peter Lieven,
	Fam Zheng, Kevin Wolf, Max Reitz, Stefan Hajnoczi

The new blk_co_copy_range interface offers a more efficient way in the
case of network based storage. Make use of it to allow faster convert
operation.

Since copy offloading cannot do zero detection ('-S') and compression
(-c), only try it when these options are not used.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 qemu-img.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 48 insertions(+), 2 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 855fa52514..268b749592 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1544,6 +1544,7 @@ typedef struct ImgConvertState {
     bool compressed;
     bool target_has_backing;
     bool wr_in_order;
+    bool copy_range;
     int min_sparse;
     size_t cluster_sectors;
     size_t buf_sectors;
@@ -1737,6 +1738,37 @@ static int coroutine_fn convert_co_write(ImgConvertState *s, int64_t sector_num,
     return 0;
 }
 
+static int coroutine_fn convert_co_copy_range(ImgConvertState *s, int64_t sector_num,
+                                              int nb_sectors)
+{
+    int n, ret;
+
+    while (nb_sectors > 0) {
+        BlockBackend *blk;
+        int src_cur;
+        int64_t bs_sectors, src_cur_offset;
+        int64_t offset;
+
+        convert_select_part(s, sector_num, &src_cur, &src_cur_offset);
+        offset = (sector_num - src_cur_offset) << BDRV_SECTOR_BITS;
+        blk = s->src[src_cur];
+        bs_sectors = s->src_sectors[src_cur];
+
+        n = MIN(nb_sectors, bs_sectors - (sector_num - src_cur_offset));
+
+        ret = blk_co_copy_range(blk, offset, s->target,
+                                sector_num << BDRV_SECTOR_BITS,
+                                n << BDRV_SECTOR_BITS, 0);
+        if (ret < 0) {
+            return ret;
+        }
+
+        sector_num += n;
+        nb_sectors -= n;
+    }
+    return 0;
+}
+
 static void coroutine_fn convert_co_do_copy(void *opaque)
 {
     ImgConvertState *s = opaque;
@@ -1759,6 +1791,7 @@ static void coroutine_fn convert_co_do_copy(void *opaque)
         int n;
         int64_t sector_num;
         enum ImgConvertBlockStatus status;
+        bool copy_range;
 
         qemu_co_mutex_lock(&s->lock);
         if (s->ret != -EINPROGRESS || s->sector_num >= s->total_sectors) {
@@ -1788,7 +1821,9 @@ static void coroutine_fn convert_co_do_copy(void *opaque)
                                         s->allocated_sectors, 0);
         }
 
-        if (status == BLK_DATA) {
+retry:
+        copy_range = s->copy_range && s->status == BLK_DATA;
+        if (status == BLK_DATA && !copy_range) {
             ret = convert_co_read(s, sector_num, n, buf);
             if (ret < 0) {
                 error_report("error while reading sector %" PRId64
@@ -1810,7 +1845,15 @@ static void coroutine_fn convert_co_do_copy(void *opaque)
         }
 
         if (s->ret == -EINPROGRESS) {
-            ret = convert_co_write(s, sector_num, n, buf, status);
+            if (copy_range) {
+                ret = convert_co_copy_range(s, sector_num, n);
+                if (ret) {
+                    s->copy_range = false;
+                    goto retry;
+                }
+            } else {
+                ret = convert_co_write(s, sector_num, n, buf, status);
+            }
             if (ret < 0) {
                 error_report("error while writing sector %" PRId64
                              ": %s", sector_num, strerror(-ret));
@@ -1933,6 +1976,7 @@ static int img_convert(int argc, char **argv)
     ImgConvertState s = (ImgConvertState) {
         /* Need at least 4k of zeros for sparse detection */
         .min_sparse         = 8,
+        .copy_range         = true,
         .buf_sectors        = IO_BUF_SIZE / BDRV_SECTOR_SIZE,
         .wr_in_order        = true,
         .num_coroutines     = 8,
@@ -1973,6 +2017,7 @@ static int img_convert(int argc, char **argv)
             break;
         case 'c':
             s.compressed = true;
+            s.copy_range = false;
             break;
         case 'o':
             if (!is_valid_option_list(optarg)) {
@@ -2014,6 +2059,7 @@ static int img_convert(int argc, char **argv)
             }
 
             s.min_sparse = sval / BDRV_SECTOR_SIZE;
+            s.copy_range = false;
             break;
         }
         case 'p':
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [RFC PATCH v2 1/7] block: Introduce API for copy offloading
  2018-04-18  3:04 ` [Qemu-devel] [RFC PATCH v2 1/7] block: Introduce API for " Fam Zheng
@ 2018-04-26 14:57   ` Stefan Hajnoczi
  2018-04-27  5:52     ` Fam Zheng
  0 siblings, 1 reply; 18+ messages in thread
From: Stefan Hajnoczi @ 2018-04-26 14:57 UTC (permalink / raw)
  To: Fam Zheng
  Cc: qemu-devel, Kevin Wolf, Stefan Hajnoczi, qemu-block,
	Peter Lieven, Max Reitz, Ronnie Sahlberg, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 2061 bytes --]

On Wed, Apr 18, 2018 at 11:04:18AM +0800, Fam Zheng wrote:
> diff --git a/block/io.c b/block/io.c
> index bd9a19a9c4..d274e9525f 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2826,3 +2826,94 @@ void bdrv_unregister_buf(BlockDriverState *bs, void *host)
>          bdrv_unregister_buf(child->bs, host);
>      }
>  }
> +
> +static int bdrv_co_copy_range_internal(BdrvChild *src,

Please remember to use coroutine_fn for coroutines!  This applies to the
other functions in this patch too.

> +int bdrv_co_copy_range(BdrvChild *src, uint64_t src_offset,
> +                       BdrvChild *dst, uint64_t dst_offset,
> +                       uint64_t bytes, BdrvRequestFlags flags)
> +{
> +    BdrvTrackedRequest src_req, dst_req;
> +    BlockDriverState *src_bs = src->bs;
> +    BlockDriverState *dst_bs = dst->bs;
> +    int ret;
> +
> +    bdrv_inc_in_flight(src_bs);
> +    bdrv_inc_in_flight(dst_bs);
> +    tracked_request_begin(&src_req, src_bs, src_offset,
> +                          bytes, BDRV_TRACKED_READ);
> +    tracked_request_begin(&dst_req, dst_bs, dst_offset,
> +                          bytes, BDRV_TRACKED_WRITE);

Tracked requests and in-flight counters are only updated on root nodes.
This is not how read/write works.  Does drain work on an internal or
leaf node with multiple parents?

> diff --git a/include/block/block.h b/include/block/block.h
> index cdec3639a3..72ac011b2b 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -604,4 +604,8 @@ bool bdrv_can_store_new_dirty_bitmap(BlockDriverState *bs, const char *name,
>   */
>  void bdrv_register_buf(BlockDriverState *bs, void *host, size_t size);
>  void bdrv_unregister_buf(BlockDriverState *bs, void *host);
> +
> +int bdrv_co_copy_range(BdrvChild *bs, uint64_t offset,
> +                       BdrvChild *src, uint64_t src_offset,
> +                       uint64_t bytes, BdrvRequestFlags flags);

Please document this new block.h API.

These arguments are in the wrong order!  The first BdrvChild is the
source and the second is the destination.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [RFC PATCH v2 2/7] raw: Implement copy offloading
  2018-04-18  3:04 ` [Qemu-devel] [RFC PATCH v2 2/7] raw: Implement " Fam Zheng
@ 2018-04-26 14:57   ` Stefan Hajnoczi
  0 siblings, 0 replies; 18+ messages in thread
From: Stefan Hajnoczi @ 2018-04-26 14:57 UTC (permalink / raw)
  To: Fam Zheng
  Cc: qemu-devel, Kevin Wolf, Stefan Hajnoczi, qemu-block,
	Peter Lieven, Max Reitz, Ronnie Sahlberg, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 286 bytes --]

On Wed, Apr 18, 2018 at 11:04:19AM +0800, Fam Zheng wrote:
> Just pass down to ->file.
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  block/raw-format.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [RFC PATCH v2 3/7] qcow2: Implement copy offloading
  2018-04-18  3:04 ` [Qemu-devel] [RFC PATCH v2 3/7] qcow2: " Fam Zheng
@ 2018-04-26 15:34   ` Stefan Hajnoczi
  2018-04-27  6:28     ` Fam Zheng
  0 siblings, 1 reply; 18+ messages in thread
From: Stefan Hajnoczi @ 2018-04-26 15:34 UTC (permalink / raw)
  To: Fam Zheng
  Cc: qemu-devel, Kevin Wolf, Stefan Hajnoczi, qemu-block,
	Peter Lieven, Max Reitz, Ronnie Sahlberg, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 512 bytes --]

On Wed, Apr 18, 2018 at 11:04:20AM +0800, Fam Zheng wrote:
> +static int qcow2_co_copy_range_to(BlockDriverState *bs,
> +                                  BdrvChild *src, uint64_t src_offset,
> +                                  BdrvChild *dst, uint64_t dst_offset,
> +                                  uint64_t bytes, BdrvRequestFlags flags)
> +{

In theory this could increment the refcount if src == dst.  It's not
necessary to go down to leaf nodes if the sharing can be handled at the
qcow2 L2 table level.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [RFC PATCH v2 1/7] block: Introduce API for copy offloading
  2018-04-26 14:57   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
@ 2018-04-27  5:52     ` Fam Zheng
  0 siblings, 0 replies; 18+ messages in thread
From: Fam Zheng @ 2018-04-27  5:52 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Kevin Wolf, Stefan Hajnoczi, qemu-block,
	Peter Lieven, Max Reitz, Ronnie Sahlberg, Paolo Bonzini

On Thu, 04/26 15:57, Stefan Hajnoczi wrote:
> On Wed, Apr 18, 2018 at 11:04:18AM +0800, Fam Zheng wrote:
> > diff --git a/block/io.c b/block/io.c
> > index bd9a19a9c4..d274e9525f 100644
> > --- a/block/io.c
> > +++ b/block/io.c
> > @@ -2826,3 +2826,94 @@ void bdrv_unregister_buf(BlockDriverState *bs, void *host)
> >          bdrv_unregister_buf(child->bs, host);
> >      }
> >  }
> > +
> > +static int bdrv_co_copy_range_internal(BdrvChild *src,
> 
> Please remember to use coroutine_fn for coroutines!  This applies to the
> other functions in this patch too.

OK!

> 
> > +int bdrv_co_copy_range(BdrvChild *src, uint64_t src_offset,
> > +                       BdrvChild *dst, uint64_t dst_offset,
> > +                       uint64_t bytes, BdrvRequestFlags flags)
> > +{
> > +    BdrvTrackedRequest src_req, dst_req;
> > +    BlockDriverState *src_bs = src->bs;
> > +    BlockDriverState *dst_bs = dst->bs;
> > +    int ret;
> > +
> > +    bdrv_inc_in_flight(src_bs);
> > +    bdrv_inc_in_flight(dst_bs);
> > +    tracked_request_begin(&src_req, src_bs, src_offset,
> > +                          bytes, BDRV_TRACKED_READ);
> > +    tracked_request_begin(&dst_req, dst_bs, dst_offset,
> > +                          bytes, BDRV_TRACKED_WRITE);
> 
> Tracked requests and in-flight counters are only updated on root nodes.
> This is not how read/write works.  Does drain work on an internal or
> leaf node with multiple parents?

Telling from how bdrv_do_drained_begin is implemented now (recursive both to
parents and children), I think it should work. But you are right this is
inconsistent with read/write code, it seems I could move this it to
bdrv_co_copy_range_internal.

> 
> > diff --git a/include/block/block.h b/include/block/block.h
> > index cdec3639a3..72ac011b2b 100644
> > --- a/include/block/block.h
> > +++ b/include/block/block.h
> > @@ -604,4 +604,8 @@ bool bdrv_can_store_new_dirty_bitmap(BlockDriverState *bs, const char *name,
> >   */
> >  void bdrv_register_buf(BlockDriverState *bs, void *host, size_t size);
> >  void bdrv_unregister_buf(BlockDriverState *bs, void *host);
> > +
> > +int bdrv_co_copy_range(BdrvChild *bs, uint64_t offset,
> > +                       BdrvChild *src, uint64_t src_offset,
> > +                       uint64_t bytes, BdrvRequestFlags flags);
> 
> Please document this new block.h API.

OK.

> 
> These arguments are in the wrong order!  The first BdrvChild is the
> source and the second is the destination.

Right, will fix.

Fam

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [RFC PATCH v2 3/7] qcow2: Implement copy offloading
  2018-04-26 15:34   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
@ 2018-04-27  6:28     ` Fam Zheng
  0 siblings, 0 replies; 18+ messages in thread
From: Fam Zheng @ 2018-04-27  6:28 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Kevin Wolf, Stefan Hajnoczi, qemu-block,
	Peter Lieven, Max Reitz, Ronnie Sahlberg, Paolo Bonzini

On Thu, 04/26 16:34, Stefan Hajnoczi wrote:
> On Wed, Apr 18, 2018 at 11:04:20AM +0800, Fam Zheng wrote:
> > +static int qcow2_co_copy_range_to(BlockDriverState *bs,
> > +                                  BdrvChild *src, uint64_t src_offset,
> > +                                  BdrvChild *dst, uint64_t dst_offset,
> > +                                  uint64_t bytes, BdrvRequestFlags flags)
> > +{
> 
> In theory this could increment the refcount if src == dst.  It's not
> necessary to go down to leaf nodes if the sharing can be handled at the
> qcow2 L2 table level.

Yes, this is a good optimization to have.

Fam

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 4/7] file-posix: Implement bdrv_co_copy_range
  2018-04-18  3:04 ` [Qemu-devel] [RFC PATCH v2 4/7] file-posix: Implement bdrv_co_copy_range Fam Zheng
@ 2018-05-03  9:25   ` Stefan Hajnoczi
  0 siblings, 0 replies; 18+ messages in thread
From: Stefan Hajnoczi @ 2018-05-03  9:25 UTC (permalink / raw)
  To: Fam Zheng
  Cc: qemu-devel, Paolo Bonzini, Ronnie Sahlberg, qemu-block,
	Peter Lieven, Kevin Wolf, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 1651 bytes --]

On Wed, Apr 18, 2018 at 11:04:21AM +0800, Fam Zheng wrote:
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 3794c0007a..45ad543481 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -100,6 +100,7 @@
>  #ifdef CONFIG_XFS
>  #include <xfs/xfs.h>
>  #endif
> +#include <sys/syscall.h>

Is sys/syscall.h available on all systems?  I wasn't able to find it in
the POSIX specs.

>  
>  //#define DEBUG_BLOCK
>  
> @@ -185,6 +186,8 @@ typedef struct RawPosixAIOData {
>  #define aio_ioctl_cmd   aio_nbytes /* for QEMU_AIO_IOCTL */
>      off_t aio_offset;
>      int aio_type;
> +    int fd2;
> +    off_t offset2;

Is there a reason for abandoning the aio_* field naming convention?

>  } RawPosixAIOData;
>  
>  #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
> @@ -1421,6 +1424,48 @@ static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
>      return -ENOTSUP;
>  }
>  
> +#ifdef __NR_copy_file_range
> +#define HAS_COPY_FILE_RANGE
> +#endif
> +
> +#ifdef HAS_COPY_FILE_RANGE
> +static off_t copy_file_range(int in_fd, off_t *in_off, int out_fd,
> +                             off_t *out_off, size_t len, unsigned int flags)
> +{
> +    return syscall(__NR_copy_file_range, in_fd, in_off, out_fd,
> +                   out_off, len, flags);
> +}
> +#endif

Further #ifdefs can be avoided by providing an implementation here:

#else
static off_t copy_file_range(int in_fd, off_t *in_off, int out_fd,
                             off_t *out_off, size_t len, unsigned int flags)
{
    errno = ENOSYS;
    return -1;
}
#endif /* !HAS_COPY_FILE_RANGE */

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 5/7] iscsi: Implement copy offloading
  2018-04-18  3:04 ` [Qemu-devel] [RFC PATCH v2 5/7] iscsi: Implement copy offloading Fam Zheng
@ 2018-05-03 10:27   ` Stefan Hajnoczi
  2018-05-09  6:30     ` Fam Zheng
  0 siblings, 1 reply; 18+ messages in thread
From: Stefan Hajnoczi @ 2018-05-03 10:27 UTC (permalink / raw)
  To: Fam Zheng
  Cc: qemu-devel, Paolo Bonzini, Ronnie Sahlberg, qemu-block,
	Peter Lieven, Kevin Wolf, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 5715 bytes --]

On Wed, Apr 18, 2018 at 11:04:22AM +0800, Fam Zheng wrote:
> +static void iscsi_save_designator(IscsiLun *lun,
> +                                  struct scsi_inquiry_device_identification *inq_di)
> +{
> +    struct scsi_inquiry_device_designator *desig, *copy = NULL;
> +
> +    for (desig = inq_di->designators; desig; desig = desig->next) {
> +        if (desig->association ||
> +            desig->designator_type > SCSI_DESIGNATOR_TYPE_NAA) {
> +            continue;
> +        }
> +        /* NAA works better than T10 vendor ID based designator. */
> +        if (!copy || copy->designator_type < desig->designator_type) {
> +            copy = desig;
> +        }
> +    }
> +    if (copy) {
> +        lun->dd = g_new(struct scsi_inquiry_device_designator, 1);
> +        *lun->dd = *copy;

Just in case:

  lun->dd->next = NULL;

> +        lun->dd->designator = g_malloc(copy->designator_length);
> +        memcpy(lun->dd->designator, copy->designator, copy->designator_length);
> +    }
> +}
> +
>  static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
>                        Error **errp)
>  {
> @@ -1922,6 +1946,7 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
>          struct scsi_task *inq_task;
>          struct scsi_inquiry_logical_block_provisioning *inq_lbp;
>          struct scsi_inquiry_block_limits *inq_bl;
> +        struct scsi_inquiry_device_identification *inq_di;
>          switch (inq_vpd->pages[i]) {
>          case SCSI_INQUIRY_PAGECODE_LOGICAL_BLOCK_PROVISIONING:
>              inq_task = iscsi_do_inquiry(iscsilun->iscsi, iscsilun->lun, 1,
> @@ -1947,6 +1972,17 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
>                     sizeof(struct scsi_inquiry_block_limits));
>              scsi_free_scsi_task(inq_task);
>              break;
> +        case SCSI_INQUIRY_PAGECODE_DEVICE_IDENTIFICATION:

The device designator part should probably be a separate patch.

> +            inq_task = iscsi_do_inquiry(iscsilun->iscsi, iscsilun->lun, 1,
> +                                    SCSI_INQUIRY_PAGECODE_DEVICE_IDENTIFICATION,
> +                                    (void **) &inq_di, errp);
> +            if (inq_task == NULL) {
> +                ret = -EINVAL;
> +                goto out;
> +            }
> +            iscsi_save_designator(iscsilun, inq_di);
> +            scsi_free_scsi_task(inq_task);
> +            break;
>          default:
>              break;
>          }
> @@ -2003,6 +2039,8 @@ static void iscsi_close(BlockDriverState *bs)
>          iscsi_logout_sync(iscsi);
>      }
>      iscsi_destroy_context(iscsi);
> +    g_free(iscsilun->dd->designator);

Can iscsilun->dd be NULL?

> +static void iscsi_xcopy_data(struct iscsi_data *data,
> +                             IscsiLun *src, int64_t src_lba,
> +                             IscsiLun *dst, int64_t dst_lba,
> +                             int num_blocks)

It's not obvious that int is the appropriate type for num_blocks.  The
caller had a uint64_t value.  Also, IscsiLun->num_blocks is uint64_t.

> +{
> +    uint8_t *buf;
> +    int offset;
> +    int tgt_desc_len, seg_desc_len;
> +
> +    data->size = XCOPY_DESC_OFFSET +

struct iscsi_data->size is size_t.  Why does this function and the
populate function use int for memory offsets/lengths?

> +                 32 * 2 +    /* IDENT_DESCR_TGT_DESCR */
> +                 28;         /* BLK_TO_BLK_SEG_DESCR */
> +    data->data = g_malloc0(data->size);
> +    buf = data->data;
> +
> +    /* Initialise CSCD list with one src + one dst descriptor */
> +    offset = XCOPY_DESC_OFFSET;
> +    offset += iscsi_populate_target_desc(buf + offset, src);
> +    offset += iscsi_populate_target_desc(buf + offset, dst);
> +    tgt_desc_len = offset - XCOPY_DESC_OFFSET;
> +
> +    /* Initialise one segment descriptor */
> +    seg_desc_len = iscsi_xcopy_populate_desc(buf + offset, 0, 0,
> +                                             0, 1, num_blocks,
> +                                             src_lba, dst_lba);
> +    offset += seg_desc_len;
> +
> +    /* Initialise the parameter list header */
> +    iscsi_xcopy_populate_header(buf, 1, 0, 2 /* LIST_ID_USAGE_DISCARD */,
> +                                0, tgt_desc_len, seg_desc_len, 0);

offset, tgt_desc_len, and seg_desc_len are not necessary:

1. We already hardcoded exact size for g_malloc0(), so why are we
   calculating the size again dynamically?

2. The populate functions assume the caller already knows the size
   since there is no buffer length argument to prevent overflows.

It's cleaner to use hardcoded offsets:

  iscsi_xcopy_populate_header(buf, 1, 0, 2 /* LIST_ID_USAGE_DISCARD */,
                              0, 2 * XCOPY_TGT_DESC_LEN,
			      XCOPY_SEG_DESC_LEN, 0);
  iscsi_populate_target_desc(&buf[XCOPY_SRC_OFFSET], src);
  iscsi_populate_target_desc(&buf[XCOPY_DST_OFFSET], dst);
  iscsi_xcopy_populate_desc(&buf[XCOPY_SEG_OFFSET], 0, 0, 0, 1,
                            num_blocks, src_lba, dst_lba);

Then the populate functions don't need to return sizes and we don't have
to pretend that offsets are calculated dynamically.

> +    while (!iscsi_task.complete) {
> +        iscsi_set_events(dst_lun);
> +        qemu_mutex_unlock(&dst_lun->mutex);
> +        qemu_coroutine_yield();
> +        qemu_mutex_lock(&dst_lun->mutex);
> +    }

This code is duplicated several times already.  Please create a helper
function as a separate patch before adding xcopy support:

  /* Yield until @iTask has completed */
  static void coroutine_fn iscsi_co_wait_for_task(IscsiTask *iTask,
                                                  IscsiLun *iscsilun);

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 6/7] block-backend: Add blk_co_copy_range
  2018-04-18  3:04 ` [Qemu-devel] [RFC PATCH v2 6/7] block-backend: Add blk_co_copy_range Fam Zheng
@ 2018-05-03 10:28   ` Stefan Hajnoczi
  0 siblings, 0 replies; 18+ messages in thread
From: Stefan Hajnoczi @ 2018-05-03 10:28 UTC (permalink / raw)
  To: Fam Zheng
  Cc: qemu-devel, Paolo Bonzini, Ronnie Sahlberg, qemu-block,
	Peter Lieven, Kevin Wolf, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 1067 bytes --]

On Wed, Apr 18, 2018 at 11:04:23AM +0800, Fam Zheng wrote:
> It's a BlockBackend wrapper of the BDS interface.
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  block/block-backend.c          | 9 +++++++++
>  include/sysemu/block-backend.h | 4 ++++
>  2 files changed, 13 insertions(+)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 681b240b12..2a984b1864 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -2217,3 +2217,12 @@ void blk_unregister_buf(BlockBackend *blk, void *host)
>  {
>      bdrv_unregister_buf(blk_bs(blk), host);
>  }
> +
> +int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, int64_t off_in,
> +                                   BlockBackend *blk_out, int64_t off_out,
> +                                   int bytes, BdrvRequestFlags flags)
> +{
> +    return bdrv_co_copy_range(blk_in->root, off_in,
> +                              blk_out->root, off_out,
> +                              bytes, flags);
> +}

What happens if either ->root field is NULL?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 7/7] qemu-img: Convert with copy offloading
  2018-04-18  3:04 ` [Qemu-devel] [RFC PATCH v2 7/7] qemu-img: Convert with copy offloading Fam Zheng
@ 2018-05-03 10:34   ` Stefan Hajnoczi
  0 siblings, 0 replies; 18+ messages in thread
From: Stefan Hajnoczi @ 2018-05-03 10:34 UTC (permalink / raw)
  To: Fam Zheng
  Cc: qemu-devel, Paolo Bonzini, Ronnie Sahlberg, qemu-block,
	Peter Lieven, Kevin Wolf, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 587 bytes --]

On Wed, Apr 18, 2018 at 11:04:24AM +0800, Fam Zheng wrote:
> The new blk_co_copy_range interface offers a more efficient way in the
> case of network based storage. Make use of it to allow faster convert
> operation.
> 
> Since copy offloading cannot do zero detection ('-S') and compression
> (-c), only try it when these options are not used.
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  qemu-img.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 48 insertions(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 5/7] iscsi: Implement copy offloading
  2018-05-03 10:27   ` Stefan Hajnoczi
@ 2018-05-09  6:30     ` Fam Zheng
  0 siblings, 0 replies; 18+ messages in thread
From: Fam Zheng @ 2018-05-09  6:30 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Paolo Bonzini, Ronnie Sahlberg, qemu-block,
	Peter Lieven, Kevin Wolf, Max Reitz

On Thu, 05/03 11:27, Stefan Hajnoczi wrote:
> On Wed, Apr 18, 2018 at 11:04:22AM +0800, Fam Zheng wrote:
> > +static void iscsi_save_designator(IscsiLun *lun,
> > +                                  struct scsi_inquiry_device_identification *inq_di)
> > +{
> > +    struct scsi_inquiry_device_designator *desig, *copy = NULL;
> > +
> > +    for (desig = inq_di->designators; desig; desig = desig->next) {
> > +        if (desig->association ||
> > +            desig->designator_type > SCSI_DESIGNATOR_TYPE_NAA) {
> > +            continue;
> > +        }
> > +        /* NAA works better than T10 vendor ID based designator. */
> > +        if (!copy || copy->designator_type < desig->designator_type) {
> > +            copy = desig;
> > +        }
> > +    }
> > +    if (copy) {
> > +        lun->dd = g_new(struct scsi_inquiry_device_designator, 1);
> > +        *lun->dd = *copy;
> 
> Just in case:
> 
>   lun->dd->next = NULL;

Sure.

> 
> > +        lun->dd->designator = g_malloc(copy->designator_length);
> > +        memcpy(lun->dd->designator, copy->designator, copy->designator_length);
> > +    }
> > +}
> > +
> >  static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
> >                        Error **errp)
> >  {
> > @@ -1922,6 +1946,7 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
> >          struct scsi_task *inq_task;
> >          struct scsi_inquiry_logical_block_provisioning *inq_lbp;
> >          struct scsi_inquiry_block_limits *inq_bl;
> > +        struct scsi_inquiry_device_identification *inq_di;
> >          switch (inq_vpd->pages[i]) {
> >          case SCSI_INQUIRY_PAGECODE_LOGICAL_BLOCK_PROVISIONING:
> >              inq_task = iscsi_do_inquiry(iscsilun->iscsi, iscsilun->lun, 1,
> > @@ -1947,6 +1972,17 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
> >                     sizeof(struct scsi_inquiry_block_limits));
> >              scsi_free_scsi_task(inq_task);
> >              break;
> > +        case SCSI_INQUIRY_PAGECODE_DEVICE_IDENTIFICATION:
> 
> The device designator part should probably be a separate patch.

Good idea, I'll do the split.

> 
> > +            inq_task = iscsi_do_inquiry(iscsilun->iscsi, iscsilun->lun, 1,
> > +                                    SCSI_INQUIRY_PAGECODE_DEVICE_IDENTIFICATION,
> > +                                    (void **) &inq_di, errp);
> > +            if (inq_task == NULL) {
> > +                ret = -EINVAL;
> > +                goto out;
> > +            }
> > +            iscsi_save_designator(iscsilun, inq_di);
> > +            scsi_free_scsi_task(inq_task);
> > +            break;
> >          default:
> >              break;
> >          }
> > @@ -2003,6 +2039,8 @@ static void iscsi_close(BlockDriverState *bs)
> >          iscsi_logout_sync(iscsi);
> >      }
> >      iscsi_destroy_context(iscsi);
> > +    g_free(iscsilun->dd->designator);
> 
> Can iscsilun->dd be NULL?

Yes, if the INQUIRY response isn't typical. Will add an if here.

> 
> > +static void iscsi_xcopy_data(struct iscsi_data *data,
> > +                             IscsiLun *src, int64_t src_lba,
> > +                             IscsiLun *dst, int64_t dst_lba,
> > +                             int num_blocks)
> 
> It's not obvious that int is the appropriate type for num_blocks.  The
> caller had a uint64_t value.  Also, IscsiLun->num_blocks is uint64_t.

This is limited by the field size in the command data (2 bytes) so int is big
enough, although unsigned is better. I can change the type, add a comment and an
assert.

> 
> > +{
> > +    uint8_t *buf;
> > +    int offset;
> > +    int tgt_desc_len, seg_desc_len;
> > +
> > +    data->size = XCOPY_DESC_OFFSET +
> 
> struct iscsi_data->size is size_t.  Why does this function and the
> populate function use int for memory offsets/lengths?
> 
> > +                 32 * 2 +    /* IDENT_DESCR_TGT_DESCR */
> > +                 28;         /* BLK_TO_BLK_SEG_DESCR */
> > +    data->data = g_malloc0(data->size);
> > +    buf = data->data;
> > +
> > +    /* Initialise CSCD list with one src + one dst descriptor */
> > +    offset = XCOPY_DESC_OFFSET;
> > +    offset += iscsi_populate_target_desc(buf + offset, src);
> > +    offset += iscsi_populate_target_desc(buf + offset, dst);
> > +    tgt_desc_len = offset - XCOPY_DESC_OFFSET;
> > +
> > +    /* Initialise one segment descriptor */
> > +    seg_desc_len = iscsi_xcopy_populate_desc(buf + offset, 0, 0,
> > +                                             0, 1, num_blocks,
> > +                                             src_lba, dst_lba);
> > +    offset += seg_desc_len;
> > +
> > +    /* Initialise the parameter list header */
> > +    iscsi_xcopy_populate_header(buf, 1, 0, 2 /* LIST_ID_USAGE_DISCARD */,
> > +                                0, tgt_desc_len, seg_desc_len, 0);
> 
> offset, tgt_desc_len, and seg_desc_len are not necessary:
> 
> 1. We already hardcoded exact size for g_malloc0(), so why are we
>    calculating the size again dynamically?
> 
> 2. The populate functions assume the caller already knows the size
>    since there is no buffer length argument to prevent overflows.
> 
> It's cleaner to use hardcoded offsets:
> 
>   iscsi_xcopy_populate_header(buf, 1, 0, 2 /* LIST_ID_USAGE_DISCARD */,
>                               0, 2 * XCOPY_TGT_DESC_LEN,
> 			      XCOPY_SEG_DESC_LEN, 0);
>   iscsi_populate_target_desc(&buf[XCOPY_SRC_OFFSET], src);
>   iscsi_populate_target_desc(&buf[XCOPY_DST_OFFSET], dst);
>   iscsi_xcopy_populate_desc(&buf[XCOPY_SEG_OFFSET], 0, 0, 0, 1,
>                             num_blocks, src_lba, dst_lba);
> 
> Then the populate functions don't need to return sizes and we don't have
> to pretend that offsets are calculated dynamically.

The data types and offset calculation issues are all copyied from libiscsi
example, I'll clean up as you suggested. Thanks.

> 
> > +    while (!iscsi_task.complete) {
> > +        iscsi_set_events(dst_lun);
> > +        qemu_mutex_unlock(&dst_lun->mutex);
> > +        qemu_coroutine_yield();
> > +        qemu_mutex_lock(&dst_lun->mutex);
> > +    }
> 
> This code is duplicated several times already.  Please create a helper
> function as a separate patch before adding xcopy support:
> 
>   /* Yield until @iTask has completed */
>   static void coroutine_fn iscsi_co_wait_for_task(IscsiTask *iTask,
>                                                   IscsiLun *iscsilun);

Yes, will do.

Fam

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2018-05-09  6:31 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-18  3:04 [Qemu-devel] [RFC PATCH v2 0/7] qemu-img convert with copy offloading Fam Zheng
2018-04-18  3:04 ` [Qemu-devel] [RFC PATCH v2 1/7] block: Introduce API for " Fam Zheng
2018-04-26 14:57   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2018-04-27  5:52     ` Fam Zheng
2018-04-18  3:04 ` [Qemu-devel] [RFC PATCH v2 2/7] raw: Implement " Fam Zheng
2018-04-26 14:57   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2018-04-18  3:04 ` [Qemu-devel] [RFC PATCH v2 3/7] qcow2: " Fam Zheng
2018-04-26 15:34   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2018-04-27  6:28     ` Fam Zheng
2018-04-18  3:04 ` [Qemu-devel] [RFC PATCH v2 4/7] file-posix: Implement bdrv_co_copy_range Fam Zheng
2018-05-03  9:25   ` Stefan Hajnoczi
2018-04-18  3:04 ` [Qemu-devel] [RFC PATCH v2 5/7] iscsi: Implement copy offloading Fam Zheng
2018-05-03 10:27   ` Stefan Hajnoczi
2018-05-09  6:30     ` Fam Zheng
2018-04-18  3:04 ` [Qemu-devel] [RFC PATCH v2 6/7] block-backend: Add blk_co_copy_range Fam Zheng
2018-05-03 10:28   ` Stefan Hajnoczi
2018-04-18  3:04 ` [Qemu-devel] [RFC PATCH v2 7/7] qemu-img: Convert with copy offloading Fam Zheng
2018-05-03 10:34   ` Stefan Hajnoczi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.