All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW
@ 2017-05-23 11:22 Alberto Garcia
  2017-05-23 11:22 ` [Qemu-devel] [PATCH 1/7] qcow2: Remove unused Error in do_perform_cow() Alberto Garcia
                   ` (8 more replies)
  0 siblings, 9 replies; 32+ messages in thread
From: Alberto Garcia @ 2017-05-23 11:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, Kevin Wolf, Max Reitz, Eric Blake, Stefan Hajnoczi,
	Denis V . Lunev, Alberto Garcia

Hi all,

here's a patch series that rewrites the copy-on-write code in the
qcow2 driver to reduce the number of I/O operations.

The situation is that when a guest sends a write request and QEMU
needs to allocate new cluster(s) in a qcow2 file, the unwritten
regions of the new cluster(s) need to be filled with the existing data
(e.g. from the backing image) or with zeroes.

The whole process can require up to 5 I/O operations:

1) Write the data from the actual write request.
2) Read the existing data located before the guest data.
3) Write that data to the new clusters.
4) Read the existing data located after the guest data.
5) Write that data to the new clusters.

This series reduces that to only two operations:

1) Read the existing data from the original clusters
2) Write the updated data (=original + guest request) to the new clusters

Step (1) implies that there's data that will be read but will be
immediately discarded (because it's overwritten by the guest
request). I haven't really detected any big performance problems
because of that, but I decided to be conservative and my code includes
a simple heuristic that keeps the old behavior if the amount of data
to be discarded is higher than 16KB.

I've been testing this series in several scenarios, with different
cluster sizes (32K, 64K, 1MB) and request sizes (from 4 up to 512KB),
and both with an SSD and a rotating HDD. The results vary depending on
the case, with an average increase of 60% in the number of IOPS in the
HDD case, and 15% in the SSD case. In some cases there are really no
big differences and the results are similar before and after this
patch.

Further work for the future includes detecting when the data that
needs to be written consists on zeroes (i.e. allocating a new cluster
with no backing image) and optimizing that case, but let's start with
this.

Regards,

Berto

Alberto Garcia (7):
  qcow2: Remove unused Error in do_perform_cow()
  qcow2: Use unsigned int for both members of Qcow2COWRegion
  qcow2: Make perform_cow() call do_perform_cow() twice
  qcow2: Split do_perform_cow() into _read(), _encrypt() and _write()
  qcow2: Allow reading both COW regions with only one request
  qcow2: Pass a QEMUIOVector to do_perform_cow_{read,write}()
  qcow2: Merge the writing of the COW regions with the guest data

 block/qcow2-cluster.c | 188 +++++++++++++++++++++++++++++++++++++-------------
 block/qcow2.c         |  58 +++++++++++++---
 block/qcow2.h         |  11 ++-
 3 files changed, 197 insertions(+), 60 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH 1/7] qcow2: Remove unused Error in do_perform_cow()
  2017-05-23 11:22 [Qemu-devel] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW Alberto Garcia
@ 2017-05-23 11:22 ` Alberto Garcia
  2017-05-23 20:21   ` Eric Blake
  2017-05-23 11:22 ` [Qemu-devel] [PATCH 2/7] qcow2: Use unsigned int for both members of Qcow2COWRegion Alberto Garcia
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 32+ messages in thread
From: Alberto Garcia @ 2017-05-23 11:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, Kevin Wolf, Max Reitz, Eric Blake, Stefan Hajnoczi,
	Denis V . Lunev, Alberto Garcia

qcow2_encrypt_sectors() does not need an Error parameter, and we're
not checking its value anyway, so we can safely remove it.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 347d94b0d2..3dc170e421 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -440,16 +440,14 @@ static int coroutine_fn do_perform_cow(BlockDriverState *bs,
     }
 
     if (bs->encrypted) {
-        Error *err = NULL;
         int64_t sector = (src_cluster_offset + offset_in_cluster)
                          >> BDRV_SECTOR_BITS;
         assert(s->cipher);
         assert((offset_in_cluster & ~BDRV_SECTOR_MASK) == 0);
         assert((bytes & ~BDRV_SECTOR_MASK) == 0);
         if (qcow2_encrypt_sectors(s, sector, iov.iov_base, iov.iov_base,
-                                  bytes >> BDRV_SECTOR_BITS, true, &err) < 0) {
+                                  bytes >> BDRV_SECTOR_BITS, true, NULL) < 0) {
             ret = -EIO;
-            error_free(err);
             goto out;
         }
     }
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH 2/7] qcow2: Use unsigned int for both members of Qcow2COWRegion
  2017-05-23 11:22 [Qemu-devel] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW Alberto Garcia
  2017-05-23 11:22 ` [Qemu-devel] [PATCH 1/7] qcow2: Remove unused Error in do_perform_cow() Alberto Garcia
@ 2017-05-23 11:22 ` Alberto Garcia
  2017-05-23 11:22 ` [Qemu-devel] [PATCH 3/7] qcow2: Make perform_cow() call do_perform_cow() twice Alberto Garcia
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 32+ messages in thread
From: Alberto Garcia @ 2017-05-23 11:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, Kevin Wolf, Max Reitz, Eric Blake, Stefan Hajnoczi,
	Denis V . Lunev, Alberto Garcia

Qcow2COWRegion has two attributes:

- The offset of the COW region from the start of the first cluster
  touched by the I/O request. Since it's always going to be positive
  and the maximum request size is at most INT_MAX, we can use a
  regular unsigned int to store this offset.

- The size of the COW region in bytes. This is guaranteed to be >= 0,
  so we should use an unsigned type instead.

In x86_64 this reduces the size of Qcow2COWRegion from 16 to 8 bytes.
It will also help keep some assertions simpler now that we know that
there are no negative numbers.

The prototype of do_perform_cow() is also updated to reflect these
changes.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 4 ++--
 block/qcow2.h         | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 3dc170e421..6757875ec9 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -406,8 +406,8 @@ int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t sector_num,
 static int coroutine_fn do_perform_cow(BlockDriverState *bs,
                                        uint64_t src_cluster_offset,
                                        uint64_t cluster_offset,
-                                       int offset_in_cluster,
-                                       int bytes)
+                                       unsigned offset_in_cluster,
+                                       unsigned bytes)
 {
     BDRVQcow2State *s = bs->opaque;
     QEMUIOVector qiov;
diff --git a/block/qcow2.h b/block/qcow2.h
index 1801dc30dc..c26ee0a33d 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -301,10 +301,10 @@ typedef struct Qcow2COWRegion {
      * Offset of the COW region in bytes from the start of the first cluster
      * touched by the request.
      */
-    uint64_t    offset;
+    unsigned    offset;
 
     /** Number of bytes to copy */
-    int         nb_bytes;
+    unsigned    nb_bytes;
 } Qcow2COWRegion;
 
 /**
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH 3/7] qcow2: Make perform_cow() call do_perform_cow() twice
  2017-05-23 11:22 [Qemu-devel] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW Alberto Garcia
  2017-05-23 11:22 ` [Qemu-devel] [PATCH 1/7] qcow2: Remove unused Error in do_perform_cow() Alberto Garcia
  2017-05-23 11:22 ` [Qemu-devel] [PATCH 2/7] qcow2: Use unsigned int for both members of Qcow2COWRegion Alberto Garcia
@ 2017-05-23 11:22 ` Alberto Garcia
  2017-05-26  8:11   ` Kevin Wolf
  2017-05-23 11:22 ` [Qemu-devel] [PATCH 4/7] qcow2: Split do_perform_cow() into _read(), _encrypt() and _write() Alberto Garcia
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 32+ messages in thread
From: Alberto Garcia @ 2017-05-23 11:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, Kevin Wolf, Max Reitz, Eric Blake, Stefan Hajnoczi,
	Denis V . Lunev, Alberto Garcia

Instead of calling perform_cow() twice with a different COW region
each time, call it just once and make perform_cow() handle both
regions.

This patch simply moves code around. The next one will do the actual
reordering of the COW operations.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 38 +++++++++++++++++++++++---------------
 1 file changed, 23 insertions(+), 15 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 6757875ec9..59a0ffba1a 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -414,6 +414,10 @@ static int coroutine_fn do_perform_cow(BlockDriverState *bs,
     struct iovec iov;
     int ret;
 
+    if (bytes == 0) {
+        return 0;
+    }
+
     iov.iov_len = bytes;
     iov.iov_base = qemu_try_blockalign(bs, iov.iov_len);
     if (iov.iov_base == NULL) {
@@ -751,31 +755,40 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
     return cluster_offset;
 }
 
-static int perform_cow(BlockDriverState *bs, QCowL2Meta *m, Qcow2COWRegion *r)
+static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
 {
     BDRVQcow2State *s = bs->opaque;
+    Qcow2COWRegion *start = &m->cow_start;
+    Qcow2COWRegion *end = &m->cow_end;
     int ret;
 
-    if (r->nb_bytes == 0) {
+    if (start->nb_bytes == 0 && end->nb_bytes == 0) {
         return 0;
     }
 
     qemu_co_mutex_unlock(&s->lock);
-    ret = do_perform_cow(bs, m->offset, m->alloc_offset, r->offset, r->nb_bytes);
+    ret = do_perform_cow(bs, m->offset, m->alloc_offset,
+                         start->offset, start->nb_bytes);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    ret = do_perform_cow(bs, m->offset, m->alloc_offset,
+                         end->offset, end->nb_bytes);
+
+fail:
     qemu_co_mutex_lock(&s->lock);
 
-    if (ret < 0) {
-        return ret;
-    }
-
     /*
      * Before we update the L2 table to actually point to the new cluster, we
      * need to be sure that the refcounts have been increased and COW was
      * handled.
      */
-    qcow2_cache_depends_on_flush(s->l2_table_cache);
+    if (ret == 0) {
+        qcow2_cache_depends_on_flush(s->l2_table_cache);
+    }
 
-    return 0;
+    return ret;
 }
 
 int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
@@ -795,12 +808,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
     }
 
     /* copy content of unmodified sectors */
-    ret = perform_cow(bs, m, &m->cow_start);
-    if (ret < 0) {
-        goto err;
-    }
-
-    ret = perform_cow(bs, m, &m->cow_end);
+    ret = perform_cow(bs, m);
     if (ret < 0) {
         goto err;
     }
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH 4/7] qcow2: Split do_perform_cow() into _read(), _encrypt() and _write()
  2017-05-23 11:22 [Qemu-devel] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW Alberto Garcia
                   ` (2 preceding siblings ...)
  2017-05-23 11:22 ` [Qemu-devel] [PATCH 3/7] qcow2: Make perform_cow() call do_perform_cow() twice Alberto Garcia
@ 2017-05-23 11:22 ` Alberto Garcia
  2017-05-23 11:23 ` [Qemu-devel] [PATCH 5/7] qcow2: Allow reading both COW regions with only one request Alberto Garcia
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 32+ messages in thread
From: Alberto Garcia @ 2017-05-23 11:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, Kevin Wolf, Max Reitz, Eric Blake, Stefan Hajnoczi,
	Denis V . Lunev, Alberto Garcia

This patch splits do_perform_cow() into three separate functions to
read, encrypt and write the COW regions.

perform_cow() can now read both regions first, then encrypt them and
finally write them to disk. The memory allocation is also done in
this function now, using one single buffer large enough to hold both
regions.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 114 +++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 84 insertions(+), 30 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 59a0ffba1a..643c48088b 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -403,34 +403,26 @@ int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t sector_num,
     return 0;
 }
 
-static int coroutine_fn do_perform_cow(BlockDriverState *bs,
-                                       uint64_t src_cluster_offset,
-                                       uint64_t cluster_offset,
-                                       unsigned offset_in_cluster,
-                                       unsigned bytes)
+static int coroutine_fn do_perform_cow_read(BlockDriverState *bs,
+                                            uint64_t src_cluster_offset,
+                                            unsigned offset_in_cluster,
+                                            uint8_t *buffer,
+                                            unsigned bytes)
 {
-    BDRVQcow2State *s = bs->opaque;
     QEMUIOVector qiov;
-    struct iovec iov;
+    struct iovec iov = { .iov_base = buffer, .iov_len = bytes };
     int ret;
 
     if (bytes == 0) {
         return 0;
     }
 
-    iov.iov_len = bytes;
-    iov.iov_base = qemu_try_blockalign(bs, iov.iov_len);
-    if (iov.iov_base == NULL) {
-        return -ENOMEM;
-    }
-
     qemu_iovec_init_external(&qiov, &iov, 1);
 
     BLKDBG_EVENT(bs->file, BLKDBG_COW_READ);
 
     if (!bs->drv) {
-        ret = -ENOMEDIUM;
-        goto out;
+        return -ENOMEDIUM;
     }
 
     /* Call .bdrv_co_readv() directly instead of using the public block-layer
@@ -440,39 +432,63 @@ static int coroutine_fn do_perform_cow(BlockDriverState *bs,
     ret = bs->drv->bdrv_co_preadv(bs, src_cluster_offset + offset_in_cluster,
                                   bytes, &qiov, 0);
     if (ret < 0) {
-        goto out;
+        return ret;
     }
 
-    if (bs->encrypted) {
+    return 0;
+}
+
+static bool coroutine_fn do_perform_cow_encrypt(BlockDriverState *bs,
+                                                uint64_t src_cluster_offset,
+                                                unsigned offset_in_cluster,
+                                                uint8_t *buffer,
+                                                unsigned bytes)
+{
+    if (bytes && bs->encrypted) {
+        BDRVQcow2State *s = bs->opaque;
         int64_t sector = (src_cluster_offset + offset_in_cluster)
                          >> BDRV_SECTOR_BITS;
         assert(s->cipher);
         assert((offset_in_cluster & ~BDRV_SECTOR_MASK) == 0);
         assert((bytes & ~BDRV_SECTOR_MASK) == 0);
-        if (qcow2_encrypt_sectors(s, sector, iov.iov_base, iov.iov_base,
+        if (qcow2_encrypt_sectors(s, sector, buffer, buffer,
                                   bytes >> BDRV_SECTOR_BITS, true, NULL) < 0) {
-            ret = -EIO;
-            goto out;
+            return false;
         }
     }
+    return true;
+}
+
+static int coroutine_fn do_perform_cow_write(BlockDriverState *bs,
+                                             uint64_t cluster_offset,
+                                             unsigned offset_in_cluster,
+                                             uint8_t *buffer,
+                                             unsigned bytes)
+{
+    QEMUIOVector qiov;
+    struct iovec iov = { .iov_base = buffer, .iov_len = bytes };
+    int ret;
+
+    if (bytes == 0) {
+        return 0;
+    }
+
+    qemu_iovec_init_external(&qiov, &iov, 1);
 
     ret = qcow2_pre_write_overlap_check(bs, 0,
             cluster_offset + offset_in_cluster, bytes);
     if (ret < 0) {
-        goto out;
+        return ret;
     }
 
     BLKDBG_EVENT(bs->file, BLKDBG_COW_WRITE);
     ret = bdrv_co_pwritev(bs->file, cluster_offset + offset_in_cluster,
                           bytes, &qiov, 0);
     if (ret < 0) {
-        goto out;
+        return ret;
     }
 
-    ret = 0;
-out:
-    qemu_vfree(iov.iov_base);
-    return ret;
+    return 0;
 }
 
 
@@ -760,22 +776,59 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
     BDRVQcow2State *s = bs->opaque;
     Qcow2COWRegion *start = &m->cow_start;
     Qcow2COWRegion *end = &m->cow_end;
+    unsigned buffer_size = start->nb_bytes + end->nb_bytes;
+    uint8_t *start_buffer, *end_buffer;
     int ret;
 
+    assert(start->nb_bytes <= UINT_MAX - end->nb_bytes);
+
     if (start->nb_bytes == 0 && end->nb_bytes == 0) {
         return 0;
     }
 
+    /* Reserve a buffer large enough to store the data from both the
+     * start and end COW regions */
+    start_buffer = qemu_try_blockalign(bs, buffer_size);
+    if (start_buffer == NULL) {
+        return -ENOMEM;
+    }
+    /* The part of the buffer where the end region is located */
+    end_buffer = start_buffer + start->nb_bytes;
+
     qemu_co_mutex_unlock(&s->lock);
-    ret = do_perform_cow(bs, m->offset, m->alloc_offset,
-                         start->offset, start->nb_bytes);
+    /* First we read the existing data from both COW regions */
+    ret = do_perform_cow_read(bs, m->offset, start->offset,
+                              start_buffer, start->nb_bytes);
     if (ret < 0) {
         goto fail;
     }
 
-    ret = do_perform_cow(bs, m->offset, m->alloc_offset,
-                         end->offset, end->nb_bytes);
+    ret = do_perform_cow_read(bs, m->offset, end->offset,
+                              end_buffer, end->nb_bytes);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    /* Encrypt the data if necessary before writing it */
+    if (bs->encrypted) {
+        if (!do_perform_cow_encrypt(bs, m->offset, start->offset,
+                                    start_buffer, start->nb_bytes) ||
+            !do_perform_cow_encrypt(bs, m->offset, end->offset,
+                                    end_buffer, end->nb_bytes)) {
+            ret = -EIO;
+            goto fail;
+        }
+    }
+
+    /* And now we can write everything */
+    ret = do_perform_cow_write(bs, m->alloc_offset, start->offset,
+                               start_buffer, start->nb_bytes);
+    if (ret < 0) {
+        goto fail;
+    }
 
+    ret = do_perform_cow_write(bs, m->alloc_offset, end->offset,
+                               end_buffer, end->nb_bytes);
 fail:
     qemu_co_mutex_lock(&s->lock);
 
@@ -788,6 +841,7 @@ fail:
         qcow2_cache_depends_on_flush(s->l2_table_cache);
     }
 
+    qemu_vfree(start_buffer);
     return ret;
 }
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH 5/7] qcow2: Allow reading both COW regions with only one request
  2017-05-23 11:22 [Qemu-devel] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW Alberto Garcia
                   ` (3 preceding siblings ...)
  2017-05-23 11:22 ` [Qemu-devel] [PATCH 4/7] qcow2: Split do_perform_cow() into _read(), _encrypt() and _write() Alberto Garcia
@ 2017-05-23 11:23 ` Alberto Garcia
  2017-05-23 11:23 ` [Qemu-devel] [PATCH 6/7] qcow2: Pass a QEMUIOVector to do_perform_cow_{read, write}() Alberto Garcia
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 32+ messages in thread
From: Alberto Garcia @ 2017-05-23 11:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, Kevin Wolf, Max Reitz, Eric Blake, Stefan Hajnoczi,
	Denis V . Lunev, Alberto Garcia

Reading both COW regions requires two separate requests, but it's
perfectly possible to merge them and perform only one. This generally
improves performance, particularly on rotating disk drives. The
downside is that the data in the middle region is read but discarded.

This patch takes a conservative approach and only merges reads when
the size of the middle region is <= 16KB.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 41 ++++++++++++++++++++++++++++++-----------
 1 file changed, 30 insertions(+), 11 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 643c48088b..67a761731d 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -777,34 +777,53 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
     Qcow2COWRegion *start = &m->cow_start;
     Qcow2COWRegion *end = &m->cow_end;
     unsigned buffer_size = start->nb_bytes + end->nb_bytes;
+    unsigned data_bytes = end->offset - (start->offset + start->nb_bytes);
+    bool merge_reads = false;
     uint8_t *start_buffer, *end_buffer;
     int ret;
 
     assert(start->nb_bytes <= UINT_MAX - end->nb_bytes);
+    assert(buffer_size <= UINT_MAX - data_bytes);
+    assert(start->offset + start->nb_bytes <= end->offset);
 
     if (start->nb_bytes == 0 && end->nb_bytes == 0) {
         return 0;
     }
 
-    /* Reserve a buffer large enough to store the data from both the
-     * start and end COW regions */
+    /* If we have to read both the start and end COW regions and the
+     * middle region is not too large then perform just one read
+     * operation */
+    if (start->nb_bytes && end->nb_bytes && data_bytes <= 16384) {
+        merge_reads = true;
+        buffer_size += data_bytes;
+    }
+
+    /* Reserve a buffer large enough to store all the data that we're
+     * going to read */
     start_buffer = qemu_try_blockalign(bs, buffer_size);
     if (start_buffer == NULL) {
         return -ENOMEM;
     }
     /* The part of the buffer where the end region is located */
-    end_buffer = start_buffer + start->nb_bytes;
+    end_buffer = start_buffer + buffer_size - end->nb_bytes;
 
     qemu_co_mutex_unlock(&s->lock);
-    /* First we read the existing data from both COW regions */
-    ret = do_perform_cow_read(bs, m->offset, start->offset,
-                              start_buffer, start->nb_bytes);
-    if (ret < 0) {
-        goto fail;
-    }
+    /* First we read the existing data from both COW regions. We
+     * either read the whole region in one go, or the start and end
+     * regions separately. */
+    if (merge_reads) {
+        ret = do_perform_cow_read(bs, m->offset, start->offset,
+                                  start_buffer, buffer_size);
+    } else {
+        ret = do_perform_cow_read(bs, m->offset, start->offset,
+                                  start_buffer, start->nb_bytes);
+        if (ret < 0) {
+            goto fail;
+        }
 
-    ret = do_perform_cow_read(bs, m->offset, end->offset,
-                              end_buffer, end->nb_bytes);
+        ret = do_perform_cow_read(bs, m->offset, end->offset,
+                                  end_buffer, end->nb_bytes);
+    }
     if (ret < 0) {
         goto fail;
     }
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH 6/7] qcow2: Pass a QEMUIOVector to do_perform_cow_{read, write}()
  2017-05-23 11:22 [Qemu-devel] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW Alberto Garcia
                   ` (4 preceding siblings ...)
  2017-05-23 11:23 ` [Qemu-devel] [PATCH 5/7] qcow2: Allow reading both COW regions with only one request Alberto Garcia
@ 2017-05-23 11:23 ` Alberto Garcia
  2017-05-23 11:23 ` [Qemu-devel] [PATCH 7/7] qcow2: Merge the writing of the COW regions with the guest data Alberto Garcia
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 32+ messages in thread
From: Alberto Garcia @ 2017-05-23 11:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, Kevin Wolf, Max Reitz, Eric Blake, Stefan Hajnoczi,
	Denis V . Lunev, Alberto Garcia

Instead of passing a single buffer pointer to do_perform_cow_write(),
pass a QEMUIOVector. This will allow us to merge the write requests
for the COW regions and the actual data into a single one.

Although do_perform_cow_read() does not strictly need to change its
API, we're doing it here as well for consistency.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 51 ++++++++++++++++++++++++---------------------------
 1 file changed, 24 insertions(+), 27 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 67a761731d..ae737adf88 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -406,19 +406,14 @@ int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t sector_num,
 static int coroutine_fn do_perform_cow_read(BlockDriverState *bs,
                                             uint64_t src_cluster_offset,
                                             unsigned offset_in_cluster,
-                                            uint8_t *buffer,
-                                            unsigned bytes)
+                                            QEMUIOVector *qiov)
 {
-    QEMUIOVector qiov;
-    struct iovec iov = { .iov_base = buffer, .iov_len = bytes };
     int ret;
 
-    if (bytes == 0) {
+    if (qiov->size == 0) {
         return 0;
     }
 
-    qemu_iovec_init_external(&qiov, &iov, 1);
-
     BLKDBG_EVENT(bs->file, BLKDBG_COW_READ);
 
     if (!bs->drv) {
@@ -430,7 +425,7 @@ static int coroutine_fn do_perform_cow_read(BlockDriverState *bs,
      * which can lead to deadlock when block layer copy-on-read is enabled.
      */
     ret = bs->drv->bdrv_co_preadv(bs, src_cluster_offset + offset_in_cluster,
-                                  bytes, &qiov, 0);
+                                  qiov->size, qiov, 0);
     if (ret < 0) {
         return ret;
     }
@@ -462,28 +457,23 @@ static bool coroutine_fn do_perform_cow_encrypt(BlockDriverState *bs,
 static int coroutine_fn do_perform_cow_write(BlockDriverState *bs,
                                              uint64_t cluster_offset,
                                              unsigned offset_in_cluster,
-                                             uint8_t *buffer,
-                                             unsigned bytes)
+                                             QEMUIOVector *qiov)
 {
-    QEMUIOVector qiov;
-    struct iovec iov = { .iov_base = buffer, .iov_len = bytes };
     int ret;
 
-    if (bytes == 0) {
+    if (qiov->size == 0) {
         return 0;
     }
 
-    qemu_iovec_init_external(&qiov, &iov, 1);
-
     ret = qcow2_pre_write_overlap_check(bs, 0,
-            cluster_offset + offset_in_cluster, bytes);
+            cluster_offset + offset_in_cluster, qiov->size);
     if (ret < 0) {
         return ret;
     }
 
     BLKDBG_EVENT(bs->file, BLKDBG_COW_WRITE);
     ret = bdrv_co_pwritev(bs->file, cluster_offset + offset_in_cluster,
-                          bytes, &qiov, 0);
+                          qiov->size, qiov, 0);
     if (ret < 0) {
         return ret;
     }
@@ -780,6 +770,7 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
     unsigned data_bytes = end->offset - (start->offset + start->nb_bytes);
     bool merge_reads = false;
     uint8_t *start_buffer, *end_buffer;
+    QEMUIOVector qiov;
     int ret;
 
     assert(start->nb_bytes <= UINT_MAX - end->nb_bytes);
@@ -807,22 +798,25 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
     /* The part of the buffer where the end region is located */
     end_buffer = start_buffer + buffer_size - end->nb_bytes;
 
+    qemu_iovec_init(&qiov, 3);
+
     qemu_co_mutex_unlock(&s->lock);
     /* First we read the existing data from both COW regions. We
      * either read the whole region in one go, or the start and end
      * regions separately. */
     if (merge_reads) {
-        ret = do_perform_cow_read(bs, m->offset, start->offset,
-                                  start_buffer, buffer_size);
+        qemu_iovec_add(&qiov, start_buffer, buffer_size);
+        ret = do_perform_cow_read(bs, m->offset, start->offset, &qiov);
     } else {
-        ret = do_perform_cow_read(bs, m->offset, start->offset,
-                                  start_buffer, start->nb_bytes);
+        qemu_iovec_add(&qiov, start_buffer, start->nb_bytes);
+        ret = do_perform_cow_read(bs, m->offset, start->offset, &qiov);
         if (ret < 0) {
             goto fail;
         }
 
-        ret = do_perform_cow_read(bs, m->offset, end->offset,
-                                  end_buffer, end->nb_bytes);
+        qemu_iovec_reset(&qiov);
+        qemu_iovec_add(&qiov, end_buffer, end->nb_bytes);
+        ret = do_perform_cow_read(bs, m->offset, end->offset, &qiov);
     }
     if (ret < 0) {
         goto fail;
@@ -840,14 +834,16 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
     }
 
     /* And now we can write everything */
-    ret = do_perform_cow_write(bs, m->alloc_offset, start->offset,
-                               start_buffer, start->nb_bytes);
+    qemu_iovec_reset(&qiov);
+    qemu_iovec_add(&qiov, start_buffer, start->nb_bytes);
+    ret = do_perform_cow_write(bs, m->alloc_offset, start->offset, &qiov);
     if (ret < 0) {
         goto fail;
     }
 
-    ret = do_perform_cow_write(bs, m->alloc_offset, end->offset,
-                               end_buffer, end->nb_bytes);
+    qemu_iovec_reset(&qiov);
+    qemu_iovec_add(&qiov, end_buffer, end->nb_bytes);
+    ret = do_perform_cow_write(bs, m->alloc_offset, end->offset, &qiov);
 fail:
     qemu_co_mutex_lock(&s->lock);
 
@@ -861,6 +857,7 @@ fail:
     }
 
     qemu_vfree(start_buffer);
+    qemu_iovec_destroy(&qiov);
     return ret;
 }
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH 7/7] qcow2: Merge the writing of the COW regions with the guest data
  2017-05-23 11:22 [Qemu-devel] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW Alberto Garcia
                   ` (5 preceding siblings ...)
  2017-05-23 11:23 ` [Qemu-devel] [PATCH 6/7] qcow2: Pass a QEMUIOVector to do_perform_cow_{read, write}() Alberto Garcia
@ 2017-05-23 11:23 ` Alberto Garcia
       [not found]   ` <5925B107.1060404@virtuozzo.com>
  2017-05-26 14:09   ` Alberto Garcia
  2017-05-23 14:36 ` [Qemu-devel] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW Eric Blake
  2017-06-07 11:44 ` Alberto Garcia
  8 siblings, 2 replies; 32+ messages in thread
From: Alberto Garcia @ 2017-05-23 11:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, Kevin Wolf, Max Reitz, Eric Blake, Stefan Hajnoczi,
	Denis V . Lunev, Alberto Garcia

If the guest tries to write data that results on the allocation of a
new cluster, instead of writing the guest data first and then the data
from the COW regions, write everything together using one single I/O
operation.

This can improve the write performance by 25% or more, depending on
several factors such as the media type, the cluster size and the I/O
request size.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 34 ++++++++++++++++++++++--------
 block/qcow2.c         | 58 ++++++++++++++++++++++++++++++++++++++++++---------
 block/qcow2.h         |  7 +++++++
 3 files changed, 80 insertions(+), 19 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index ae737adf88..e926dad6bb 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -776,6 +776,7 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
     assert(start->nb_bytes <= UINT_MAX - end->nb_bytes);
     assert(buffer_size <= UINT_MAX - data_bytes);
     assert(start->offset + start->nb_bytes <= end->offset);
+    assert(!m->data_qiov || m->data_qiov->size == data_bytes);
 
     if (start->nb_bytes == 0 && end->nb_bytes == 0) {
         return 0;
@@ -833,17 +834,32 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
         }
     }
 
-    /* And now we can write everything */
-    qemu_iovec_reset(&qiov);
-    qemu_iovec_add(&qiov, start_buffer, start->nb_bytes);
-    ret = do_perform_cow_write(bs, m->alloc_offset, start->offset, &qiov);
-    if (ret < 0) {
-        goto fail;
+    /* And now we can write everything. If we have the guest data we
+     * can write everything in one single operation */
+    if (m->data_qiov) {
+        qemu_iovec_reset(&qiov);
+        qemu_iovec_add(&qiov, start_buffer, start->nb_bytes);
+        qemu_iovec_concat(&qiov, m->data_qiov, 0, data_bytes);
+        qemu_iovec_add(&qiov, end_buffer, end->nb_bytes);
+        /* NOTE: we have a write_aio blkdebug event here followed by
+         * a cow_write one in do_perform_cow_write(), but there's only
+         * one single I/O operation */
+        BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
+        ret = do_perform_cow_write(bs, m->alloc_offset, start->offset, &qiov);
+    } else {
+        /* If there's no guest data then write both COW regions separately */
+        qemu_iovec_reset(&qiov);
+        qemu_iovec_add(&qiov, start_buffer, start->nb_bytes);
+        ret = do_perform_cow_write(bs, m->alloc_offset, start->offset, &qiov);
+        if (ret < 0) {
+            goto fail;
+        }
+
+        qemu_iovec_reset(&qiov);
+        qemu_iovec_add(&qiov, end_buffer, end->nb_bytes);
+        ret = do_perform_cow_write(bs, m->alloc_offset, end->offset, &qiov);
     }
 
-    qemu_iovec_reset(&qiov);
-    qemu_iovec_add(&qiov, end_buffer, end->nb_bytes);
-    ret = do_perform_cow_write(bs, m->alloc_offset, end->offset, &qiov);
 fail:
     qemu_co_mutex_lock(&s->lock);
 
diff --git a/block/qcow2.c b/block/qcow2.c
index a8d61f0981..71023ab773 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1575,6 +1575,38 @@ fail:
     return ret;
 }
 
+/* Check if it's possible to merge a write request with the writing of
+ * the data from the COW regions */
+static bool can_merge_cow(uint64_t offset, unsigned bytes,
+                          QEMUIOVector *hd_qiov, QCowL2Meta *l2meta)
+{
+    QCowL2Meta *m;
+
+    for (m = l2meta; m != NULL; m = m->next) {
+        /* If both COW regions are empty then there's nothing to merge */
+        if (m->cow_start.nb_bytes == 0 && m->cow_end.nb_bytes == 0) {
+            continue;
+        }
+
+        /* The data (middle) region must be immediately after the
+         * start region */
+        if (l2meta_cow_start(m) + m->cow_start.nb_bytes != offset) {
+            continue;
+        }
+
+        /* The end region must be immediately after the data (middle)
+         * region */
+        if (m->offset + m->cow_end.offset != offset + bytes) {
+            continue;
+        }
+
+        m->data_qiov = hd_qiov;
+        return true;
+    }
+
+    return false;
+}
+
 static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
                                          uint64_t bytes, QEMUIOVector *qiov,
                                          int flags)
@@ -1657,16 +1689,22 @@ static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
             goto fail;
         }
 
-        qemu_co_mutex_unlock(&s->lock);
-        BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
-        trace_qcow2_writev_data(qemu_coroutine_self(),
-                                cluster_offset + offset_in_cluster);
-        ret = bdrv_co_pwritev(bs->file,
-                              cluster_offset + offset_in_cluster,
-                              cur_bytes, &hd_qiov, 0);
-        qemu_co_mutex_lock(&s->lock);
-        if (ret < 0) {
-            goto fail;
+        /* If we need to do COW, check if it's possible to merge the
+         * writing of the guest data together with that of the COW regions.
+         * If it's not possible (or not necessary) then write the
+         * guest data now. */
+        if (!can_merge_cow(offset, cur_bytes, &hd_qiov, l2meta)) {
+            qemu_co_mutex_unlock(&s->lock);
+            BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
+            trace_qcow2_writev_data(qemu_coroutine_self(),
+                                    cluster_offset + offset_in_cluster);
+            ret = bdrv_co_pwritev(bs->file,
+                                  cluster_offset + offset_in_cluster,
+                                  cur_bytes, &hd_qiov, 0);
+            qemu_co_mutex_lock(&s->lock);
+            if (ret < 0) {
+                goto fail;
+            }
         }
 
         while (l2meta != NULL) {
diff --git a/block/qcow2.h b/block/qcow2.h
index c26ee0a33d..87b15eb4aa 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -343,6 +343,13 @@ typedef struct QCowL2Meta
      */
     Qcow2COWRegion cow_end;
 
+    /**
+     * The I/O vector with the data from the actual guest write request.
+     * If non-NULL, this is meant to be merged together with the data
+     * from @cow_start and @cow_end into one single write operation.
+     */
+    QEMUIOVector *data_qiov;
+
     /** Pointer to next L2Meta of the same write request */
     struct QCowL2Meta *next;
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW
  2017-05-23 11:22 [Qemu-devel] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW Alberto Garcia
                   ` (6 preceding siblings ...)
  2017-05-23 11:23 ` [Qemu-devel] [PATCH 7/7] qcow2: Merge the writing of the COW regions with the guest data Alberto Garcia
@ 2017-05-23 14:36 ` Eric Blake
  2017-05-24 14:20   ` Alberto Garcia
  2017-06-07 11:44 ` Alberto Garcia
  8 siblings, 1 reply; 32+ messages in thread
From: Eric Blake @ 2017-05-23 14:36 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: qemu-block, Kevin Wolf, Max Reitz, Stefan Hajnoczi,
	Denis V . Lunev, Anton Nefedov

[-- Attachment #1: Type: text/plain, Size: 525 bytes --]

On 05/23/2017 06:22 AM, Alberto Garcia wrote:
> Hi all,
> 
> here's a patch series that rewrites the copy-on-write code in the
> qcow2 driver to reduce the number of I/O operations.

And it competes with Denis and Anton's patches:
https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg04547.html

What plan of attack should we take on merging the best parts of these
two series?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 1/7] qcow2: Remove unused Error in do_perform_cow()
  2017-05-23 11:22 ` [Qemu-devel] [PATCH 1/7] qcow2: Remove unused Error in do_perform_cow() Alberto Garcia
@ 2017-05-23 20:21   ` Eric Blake
  2017-05-24  9:48     ` Alberto Garcia
  0 siblings, 1 reply; 32+ messages in thread
From: Eric Blake @ 2017-05-23 20:21 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: qemu-block, Kevin Wolf, Max Reitz, Stefan Hajnoczi, Denis V . Lunev

[-- Attachment #1: Type: text/plain, Size: 1153 bytes --]

On 05/23/2017 06:22 AM, Alberto Garcia wrote:
> qcow2_encrypt_sectors() does not need an Error parameter, and we're
> not checking its value anyway, so we can safely remove it.

Misleading. You are NOT removing the Error parameter from
qcow2_encrypt_sectors(), but rather are explicitly ignoring any errors
by passing NULL.

I'd update the commit message to something like:

We are relying on the return value of qcow2_encrypt_sectors() to flag
problems, but have no way to report that error to the end user.  Since
we are just throwing away the error, we can pass NULL instead for
simpler code.

A more robust solution would figure out how to pass the original error
(rather than a new message related to our -EIO return) back to the
caller, but that is more invasive.

> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cluster.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 

With a better commit message,
Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 1/7] qcow2: Remove unused Error in do_perform_cow()
  2017-05-23 20:21   ` Eric Blake
@ 2017-05-24  9:48     ` Alberto Garcia
  0 siblings, 0 replies; 32+ messages in thread
From: Alberto Garcia @ 2017-05-24  9:48 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: qemu-block, Kevin Wolf, Max Reitz, Stefan Hajnoczi, Denis V . Lunev

On Tue 23 May 2017 10:21:49 PM CEST, Eric Blake wrote:

>> qcow2_encrypt_sectors() does not need an Error parameter, and we're
>> not checking its value anyway, so we can safely remove it.
>
> Misleading. You are NOT removing the Error parameter from
> qcow2_encrypt_sectors(), but rather are explicitly ignoring any errors
> by passing NULL.

Ok, I'll update the comment in the next revision with something like
what you suggest.

Berto

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW
  2017-05-23 14:36 ` [Qemu-devel] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW Eric Blake
@ 2017-05-24 14:20   ` Alberto Garcia
  2017-05-24 16:09     ` Anton Nefedov
  0 siblings, 1 reply; 32+ messages in thread
From: Alberto Garcia @ 2017-05-24 14:20 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: qemu-block, Kevin Wolf, Max Reitz, Stefan Hajnoczi,
	Denis V . Lunev, Anton Nefedov

On Tue 23 May 2017 04:36:52 PM CEST, Eric Blake wrote:

>> here's a patch series that rewrites the copy-on-write code in the
>> qcow2 driver to reduce the number of I/O operations.
>
> And it competes with Denis and Anton's patches:
> https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg04547.html
>
> What plan of attack should we take on merging the best parts of these
> two series?

I took a look at that series and unless I overlooked something important
it seems that there's actually not that much overlap. Denis and Anton's
series deals with cluster preallocation and mine reduces the number of
I/O operations when there's a COW scenario. Most of my modifications are
in the peform_cow() function, but they barely touch that one.

I think we can review both separately, either of us can rebase our
series on top of the other, I don't expect big changes or conflicts.

Berto

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW
  2017-05-24 14:20   ` Alberto Garcia
@ 2017-05-24 16:09     ` Anton Nefedov
  2017-05-24 16:20       ` Alberto Garcia
  2017-05-26 10:17       ` [Qemu-devel] " Kevin Wolf
  0 siblings, 2 replies; 32+ messages in thread
From: Anton Nefedov @ 2017-05-24 16:09 UTC (permalink / raw)
  To: Alberto Garcia, Eric Blake, qemu-devel
  Cc: qemu-block, Kevin Wolf, Max Reitz, Stefan Hajnoczi, Denis V . Lunev


On 05/24/2017 05:20 PM, Alberto Garcia wrote:
> On Tue 23 May 2017 04:36:52 PM CEST, Eric Blake wrote:
>
>>> here's a patch series that rewrites the copy-on-write code in the
>>> qcow2 driver to reduce the number of I/O operations.
>>
>> And it competes with Denis and Anton's patches:
>> https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg04547.html
>>
>> What plan of attack should we take on merging the best parts of these
>> two series?
>
> I took a look at that series and unless I overlooked something important
> it seems that there's actually not that much overlap. Denis and Anton's
> series deals with cluster preallocation and mine reduces the number of
> I/O operations when there's a COW scenario. Most of my modifications are
> in the peform_cow() function, but they barely touch that one.
>
> I think we can review both separately, either of us can rebase our
> series on top of the other, I don't expect big changes or conflicts.
>
> Berto
>

I agree; as mentioned we have similar patches and they don't conflict much.
We noticed a performance regression on HDD though, for the presumably 
optimized case (random 4k write over a large backed image); so the 
patches were put on hold.
SSD looked fine though.

/Anton

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW
  2017-05-24 16:09     ` Anton Nefedov
@ 2017-05-24 16:20       ` Alberto Garcia
  2017-05-24 16:26         ` Anton Nefedov
  2017-05-26 10:17       ` [Qemu-devel] " Kevin Wolf
  1 sibling, 1 reply; 32+ messages in thread
From: Alberto Garcia @ 2017-05-24 16:20 UTC (permalink / raw)
  To: Anton Nefedov, Eric Blake, qemu-devel
  Cc: qemu-block, Kevin Wolf, Max Reitz, Stefan Hajnoczi, Denis V . Lunev

On Wed 24 May 2017 06:09:42 PM CEST, Anton Nefedov wrote:

> I agree; as mentioned we have similar patches and they don't conflict
> much.  We noticed a performance regression on HDD though, for the
> presumably optimized case (random 4k write over a large backed image);
> so the patches were put on hold.

Interesting, I think that scenario was noticeably faster in my
tests. What cluster size(s) and image size(s) were you using?

Berto

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW
  2017-05-24 16:20       ` Alberto Garcia
@ 2017-05-24 16:26         ` Anton Nefedov
  2017-05-25 11:48           ` Alberto Garcia
  0 siblings, 1 reply; 32+ messages in thread
From: Anton Nefedov @ 2017-05-24 16:26 UTC (permalink / raw)
  To: Alberto Garcia, Eric Blake, qemu-devel
  Cc: qemu-block, Kevin Wolf, Max Reitz, Stefan Hajnoczi, Denis V . Lunev



On 05/24/2017 07:20 PM, Alberto Garcia wrote:
> On Wed 24 May 2017 06:09:42 PM CEST, Anton Nefedov wrote:
> 
>> I agree; as mentioned we have similar patches and they don't conflict
>> much.  We noticed a performance regression on HDD though, for the
>> presumably optimized case (random 4k write over a large backed image);
>> so the patches were put on hold.
> 
> Interesting, I think that scenario was noticeably faster in my
> tests. What cluster size(s) and image size(s) were you using?
> 
64k cluster, 2g image, write 32m in portions of 4k at random offsets

/Anton

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 7/7] qcow2: Merge the writing of the COW regions with the guest data
       [not found]   ` <5925B107.1060404@virtuozzo.com>
@ 2017-05-24 16:43     ` Anton Nefedov
  2017-05-24 19:05       ` Alberto Garcia
  0 siblings, 1 reply; 32+ messages in thread
From: Anton Nefedov @ 2017-05-24 16:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, berto, qemu-block, mreitz, stefanha, den

> If the guest tries to write data that results on the allocation of a
> new cluster, instead of writing the guest data first and then the data
> from the COW regions, write everything together using one single I/O
> operation.
>
> This can improve the write performance by 25% or more, depending on
> several factors such as the media type, the cluster size and the I/O
> request size.
>
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cluster.c | 34 ++++++++++++++++++++++--------
>  block/qcow2.c         | 58
> ++++++++++++++++++++++++++++++++++++++++++---------
>  block/qcow2.h         |  7 +++++++
>  3 files changed, 80 insertions(+), 19 deletions(-)
>
> -    /* And now we can write everything */
> -    qemu_iovec_reset(&qiov);
> -    qemu_iovec_add(&qiov, start_buffer, start->nb_bytes);
> -    ret = do_perform_cow_write(bs, m->alloc_offset, start->offset, &qiov);
> -    if (ret < 0) {
> -        goto fail;
> +    /* And now we can write everything. If we have the guest data we
> +     * can write everything in one single operation */
> +    if (m->data_qiov) {
> +        qemu_iovec_reset(&qiov);
> +        qemu_iovec_add(&qiov, start_buffer, start->nb_bytes);
> +        qemu_iovec_concat(&qiov, m->data_qiov, 0, data_bytes);
> +        qemu_iovec_add(&qiov, end_buffer, end->nb_bytes);

Can it be a problem if (m->data_qiov->niov == IOV_MAX)?
We had to add merge-iovecs code for the case (maybe there's better
solution?)

Also, will this work if allocation is split into several l2metas?
e.g. one has cow_start.nb_bytes and another has cow_end.nb_bytes

/Anton

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 7/7] qcow2: Merge the writing of the COW regions with the guest data
  2017-05-24 16:43     ` Anton Nefedov
@ 2017-05-24 19:05       ` Alberto Garcia
  2017-05-26 10:12         ` Kevin Wolf
  0 siblings, 1 reply; 32+ messages in thread
From: Alberto Garcia @ 2017-05-24 19:05 UTC (permalink / raw)
  To: Anton Nefedov, qemu-devel; +Cc: kwolf, qemu-block, mreitz, stefanha, den

On Wed 24 May 2017 06:43:31 PM CEST, Anton Nefedov wrote:
>> +    if (m->data_qiov) {
>> +        qemu_iovec_reset(&qiov);
>> +        qemu_iovec_add(&qiov, start_buffer, start->nb_bytes);
>> +        qemu_iovec_concat(&qiov, m->data_qiov, 0, data_bytes);
>> +        qemu_iovec_add(&qiov, end_buffer, end->nb_bytes);
>
> Can it be a problem if (m->data_qiov->niov == IOV_MAX)?
> We had to add merge-iovecs code for the case (maybe there's better
> solution?)

You're right, good catch! I'll add a check for that. To be honest I
don't think that's likely to happen in practice, so if it does we can
simply fall back to the old behavior (separate writes).

> Also, will this work if allocation is split into several l2metas?
> e.g. one has cow_start.nb_bytes and another has cow_end.nb_bytes

The guest request will be merged with the first l2meta that has to copy
at least one of the two regions. It doesn't matter if the other one has
nb_bytes == 0.

Berto

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW
  2017-05-24 16:26         ` Anton Nefedov
@ 2017-05-25 11:48           ` Alberto Garcia
  2017-05-25 14:35             ` [Qemu-devel] [Qemu-block] " Alberto Garcia
  0 siblings, 1 reply; 32+ messages in thread
From: Alberto Garcia @ 2017-05-25 11:48 UTC (permalink / raw)
  To: Anton Nefedov, Eric Blake, qemu-devel
  Cc: qemu-block, Kevin Wolf, Max Reitz, Stefan Hajnoczi, Denis V . Lunev

On Wed 24 May 2017 06:26:23 PM CEST, Anton Nefedov wrote:
>>> I agree; as mentioned we have similar patches and they don't
>>> conflict much.  We noticed a performance regression on HDD though,
>>> for the presumably optimized case (random 4k write over a large
>>> backed image); so the patches were put on hold.
>> 
>> Interesting, I think that scenario was noticeably faster in my
>> tests. What cluster size(s) and image size(s) were you using?
>> 
> 64k cluster, 2g image, write 32m in portions of 4k at random offsets

I just tried that and the optimized case performs better (as expected),
almost twice as fast in fact:

write: io=32892KB, bw=162944B/s, iops=39, runt=206705msec
write: io=32892KB, bw=309256B/s, iops=75, runt=108911msec

I'll try in a different machine.

Berto

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW
  2017-05-25 11:48           ` Alberto Garcia
@ 2017-05-25 14:35             ` Alberto Garcia
  0 siblings, 0 replies; 32+ messages in thread
From: Alberto Garcia @ 2017-05-25 14:35 UTC (permalink / raw)
  To: Anton Nefedov, Eric Blake, qemu-devel
  Cc: Kevin Wolf, Denis V . Lunev, Stefan Hajnoczi, qemu-block, Max Reitz

On Thu 25 May 2017 01:48:39 PM CEST, Alberto Garcia wrote:
>>>> I agree; as mentioned we have similar patches and they don't
>>>> conflict much.  We noticed a performance regression on HDD though,
>>>> for the presumably optimized case (random 4k write over a large
>>>> backed image); so the patches were put on hold.
>>> 
>>> Interesting, I think that scenario was noticeably faster in my
>>> tests. What cluster size(s) and image size(s) were you using?
>>> 
>> 64k cluster, 2g image, write 32m in portions of 4k at random offsets
>
> I just tried that and the optimized case performs better (as
> expected), almost twice as fast in fact:
>
> write: io=32892KB, bw=162944B/s, iops=39, runt=206705msec
> write: io=32892KB, bw=309256B/s, iops=75, runt=108911msec
>
> I'll try in a different machine.

I made more tests of that same scenario with a different HDD:

write: io=32892KB, bw=588588B/s, iops=143, runt= 57224msec
write: io=32892KB, bw=779951B/s, iops=190, runt= 43184msec

And here are the results without a backing file:

write: io=32892KB, bw=1510.2KB/s, iops=377, runt= 21781msec
write: io=32892KB, bw=5417.1KB/s, iops=1354, runt=  6071msec

Berto

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 3/7] qcow2: Make perform_cow() call do_perform_cow() twice
  2017-05-23 11:22 ` [Qemu-devel] [PATCH 3/7] qcow2: Make perform_cow() call do_perform_cow() twice Alberto Garcia
@ 2017-05-26  8:11   ` Kevin Wolf
  2017-05-26  9:10     ` Alberto Garcia
  0 siblings, 1 reply; 32+ messages in thread
From: Kevin Wolf @ 2017-05-26  8:11 UTC (permalink / raw)
  To: Alberto Garcia
  Cc: qemu-devel, qemu-block, Max Reitz, Eric Blake, Stefan Hajnoczi,
	Denis V . Lunev

Am 23.05.2017 um 13:22 hat Alberto Garcia geschrieben:
> Instead of calling perform_cow() twice with a different COW region
> each time, call it just once and make perform_cow() handle both
> regions.
> 
> This patch simply moves code around. The next one will do the actual
> reordering of the COW operations.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cluster.c | 38 +++++++++++++++++++++++---------------
>  1 file changed, 23 insertions(+), 15 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 6757875ec9..59a0ffba1a 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -414,6 +414,10 @@ static int coroutine_fn do_perform_cow(BlockDriverState *bs,
>      struct iovec iov;
>      int ret;
>  
> +    if (bytes == 0) {
> +        return 0;
> +    }
> +
>      iov.iov_len = bytes;
>      iov.iov_base = qemu_try_blockalign(bs, iov.iov_len);
>      if (iov.iov_base == NULL) {
> @@ -751,31 +755,40 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
>      return cluster_offset;
>  }
>  
> -static int perform_cow(BlockDriverState *bs, QCowL2Meta *m, Qcow2COWRegion *r)
> +static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
>  {
>      BDRVQcow2State *s = bs->opaque;
> +    Qcow2COWRegion *start = &m->cow_start;
> +    Qcow2COWRegion *end = &m->cow_end;
>      int ret;
>  
> -    if (r->nb_bytes == 0) {
> +    if (start->nb_bytes == 0 && end->nb_bytes == 0) {
>          return 0;
>      }

With this change, it can now happen that we call do_perform_cow() with
bytes == 0. Maybe it would be better for do_perform_cow() to check bytes
first so that it doesn't bdrv_co_preadv/pwritev() for 0 bytes?

Kevin

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 3/7] qcow2: Make perform_cow() call do_perform_cow() twice
  2017-05-26  8:11   ` Kevin Wolf
@ 2017-05-26  9:10     ` Alberto Garcia
  2017-05-26 10:08       ` Kevin Wolf
  0 siblings, 1 reply; 32+ messages in thread
From: Alberto Garcia @ 2017-05-26  9:10 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-devel, qemu-block, Max Reitz, Eric Blake, Stefan Hajnoczi,
	Denis V . Lunev

On Fri 26 May 2017 10:11:29 AM CEST, Kevin Wolf <kwolf@redhat.com> wrote:
>> --- a/block/qcow2-cluster.c
>> +++ b/block/qcow2-cluster.c
>> @@ -414,6 +414,10 @@ static int coroutine_fn do_perform_cow(BlockDriverState *bs,
>>      struct iovec iov;
>>      int ret;
>>  
>> +    if (bytes == 0) {
>> +        return 0;
>> +    }
>> +

   [...]

>> +static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
>>  {
>>      BDRVQcow2State *s = bs->opaque;
>> +    Qcow2COWRegion *start = &m->cow_start;
>> +    Qcow2COWRegion *end = &m->cow_end;
>>      int ret;
>>  
>> -    if (r->nb_bytes == 0) {
>> +    if (start->nb_bytes == 0 && end->nb_bytes == 0) {
>>          return 0;
>>      }
>
> With this change, it can now happen that we call do_perform_cow() with
> bytes == 0.

Yes, but see the change I made to do_perform_cow() in the same patch
(quoted above).

Berto

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 3/7] qcow2: Make perform_cow() call do_perform_cow() twice
  2017-05-26  9:10     ` Alberto Garcia
@ 2017-05-26 10:08       ` Kevin Wolf
  0 siblings, 0 replies; 32+ messages in thread
From: Kevin Wolf @ 2017-05-26 10:08 UTC (permalink / raw)
  To: Alberto Garcia
  Cc: qemu-devel, qemu-block, Max Reitz, Eric Blake, Stefan Hajnoczi,
	Denis V . Lunev

Am 26.05.2017 um 11:10 hat Alberto Garcia geschrieben:
> On Fri 26 May 2017 10:11:29 AM CEST, Kevin Wolf <kwolf@redhat.com> wrote:
> >> --- a/block/qcow2-cluster.c
> >> +++ b/block/qcow2-cluster.c
> >> @@ -414,6 +414,10 @@ static int coroutine_fn do_perform_cow(BlockDriverState *bs,
> >>      struct iovec iov;
> >>      int ret;
> >>  
> >> +    if (bytes == 0) {
> >> +        return 0;
> >> +    }
> >> +
> 
>    [...]
> 
> >> +static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
> >>  {
> >>      BDRVQcow2State *s = bs->opaque;
> >> +    Qcow2COWRegion *start = &m->cow_start;
> >> +    Qcow2COWRegion *end = &m->cow_end;
> >>      int ret;
> >>  
> >> -    if (r->nb_bytes == 0) {
> >> +    if (start->nb_bytes == 0 && end->nb_bytes == 0) {
> >>          return 0;
> >>      }
> >
> > With this change, it can now happen that we call do_perform_cow() with
> > bytes == 0.
> 
> Yes, but see the change I made to do_perform_cow() in the same patch
> (quoted above).

Wait... How did you manage to hack my email account and insert this
retroactively? :-)

Sorry for the noise then, I must have been looking at the source code of
the wrong commit.

Kevin

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 7/7] qcow2: Merge the writing of the COW regions with the guest data
  2017-05-24 19:05       ` Alberto Garcia
@ 2017-05-26 10:12         ` Kevin Wolf
  0 siblings, 0 replies; 32+ messages in thread
From: Kevin Wolf @ 2017-05-26 10:12 UTC (permalink / raw)
  To: Alberto Garcia
  Cc: Anton Nefedov, qemu-devel, qemu-block, mreitz, stefanha, den

Am 24.05.2017 um 21:05 hat Alberto Garcia geschrieben:
> On Wed 24 May 2017 06:43:31 PM CEST, Anton Nefedov wrote:
> >> +    if (m->data_qiov) {
> >> +        qemu_iovec_reset(&qiov);
> >> +        qemu_iovec_add(&qiov, start_buffer, start->nb_bytes);
> >> +        qemu_iovec_concat(&qiov, m->data_qiov, 0, data_bytes);
> >> +        qemu_iovec_add(&qiov, end_buffer, end->nb_bytes);
> >
> > Can it be a problem if (m->data_qiov->niov == IOV_MAX)?
> > We had to add merge-iovecs code for the case (maybe there's better
> > solution?)
> 
> You're right, good catch! I'll add a check for that. To be honest I
> don't think that's likely to happen in practice, so if it does we can
> simply fall back to the old behavior (separate writes).
> 
> > Also, will this work if allocation is split into several l2metas?
> > e.g. one has cow_start.nb_bytes and another has cow_end.nb_bytes
> 
> The guest request will be merged with the first l2meta that has to copy
> at least one of the two regions. It doesn't matter if the other one has
> nb_bytes == 0.

I think we can improve this later so that we merge everything together
(multiple l2metas and guest data) into a single request, but what this
seris implements is still a very good first step, so we shouldn't let
this stop us from taking the good rather than waiting for the perfect.

Kevin

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW
  2017-05-24 16:09     ` Anton Nefedov
  2017-05-24 16:20       ` Alberto Garcia
@ 2017-05-26 10:17       ` Kevin Wolf
  2017-05-26 12:47         ` Anton Nefedov
  1 sibling, 1 reply; 32+ messages in thread
From: Kevin Wolf @ 2017-05-26 10:17 UTC (permalink / raw)
  To: Anton Nefedov
  Cc: Alberto Garcia, Eric Blake, qemu-devel, qemu-block, Max Reitz,
	Stefan Hajnoczi, Denis V . Lunev

Am 24.05.2017 um 18:09 hat Anton Nefedov geschrieben:
> I agree; as mentioned we have similar patches and they don't conflict much.
> We noticed a performance regression on HDD though, for the
> presumably optimized case (random 4k write over a large backed
> image); so the patches were put on hold.

You're talking about your own patches that should do the same thing,
right? Can you re-do the same test with Berto's patches? Maybe there was
just an implementation glitch in yours.

This approach should very obviously result in a performance improvement,
and the patches are relatively simple, so I'm very much inclined to
merge this as soon as possible.

Kevin

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW
  2017-05-26 10:17       ` [Qemu-devel] " Kevin Wolf
@ 2017-05-26 12:47         ` Anton Nefedov
  2017-05-26 13:08           ` Alberto Garcia
  0 siblings, 1 reply; 32+ messages in thread
From: Anton Nefedov @ 2017-05-26 12:47 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Alberto Garcia, Eric Blake, qemu-devel, qemu-block, Max Reitz,
	Stefan Hajnoczi, Denis V . Lunev


On 05/26/2017 01:17 PM, Kevin Wolf wrote:
> Am 24.05.2017 um 18:09 hat Anton Nefedov geschrieben:
>> I agree; as mentioned we have similar patches and they don't conflict much.
>> We noticed a performance regression on HDD though, for the
>> presumably optimized case (random 4k write over a large backed
>> image); so the patches were put on hold.
> 
> You're talking about your own patches that should do the same thing,
> right? Can you re-do the same test with Berto's patches? Maybe there was
> just an implementation glitch in yours.
> 
> This approach should very obviously result in a performance improvement,
> and the patches are relatively simple, so I'm very much inclined to
> merge this as soon as possible.
> 
> Kevin
> 


Tried the another machine; about 10% improvement here

[root@dpclient centos-7.3-x86_64]# qemu-img info ./harddisk2.hdd
image: ./harddisk2.hdd
file format: qcow2
virtual size: 2.0G (2147483648 bytes)
disk size: 260K
cluster_size: 65536
backing file: harddisk2.hdd.base (actual path: ./harddisk2.hdd.base)
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false
[root@dpclient centos-7.3-x86_64]# qemu-img info ./harddisk2.hdd.base
image: ./harddisk2.hdd.base
file format: qcow2
virtual size: 2.0G (2147483648 bytes)
disk size: 2.0G
cluster_size: 65536
Format specific information:
compat: 1.1
lazy refcounts: true
refcount bits: 16
corrupt: false
[root@dpclient centos-7.3-x86_64]# filefrag ./harddisk2.hdd.base
./harddisk2.hdd.base: 3 extents found


[root@localhost ~]# fio --name=randwrite --blocksize=4k
--filename=/dev/sdb --rw=randwrite --direct=1 --ioengine=libaio
--size=2g --io_size=32m

/* master */
write: io=32768KB, bw=372785B/s, iops=91, runt= 90010msec
/* Berto's patches */
write: io=32768KB, bw=404304B/s, iops=98, runt= 82993msec


/Anton

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW
  2017-05-26 12:47         ` Anton Nefedov
@ 2017-05-26 13:08           ` Alberto Garcia
  2017-05-26 13:32             ` Anton Nefedov
  0 siblings, 1 reply; 32+ messages in thread
From: Alberto Garcia @ 2017-05-26 13:08 UTC (permalink / raw)
  To: Anton Nefedov, Kevin Wolf
  Cc: Eric Blake, qemu-devel, qemu-block, Max Reitz, Stefan Hajnoczi,
	Denis V . Lunev

On Fri 26 May 2017 02:47:55 PM CEST, Anton Nefedov wrote:
> Tried the another machine; about 10% improvement here
  [...]
> [root@localhost ~]# fio --name=randwrite --blocksize=4k
> --filename=/dev/sdb --rw=randwrite --direct=1 --ioengine=libaio
> --size=2g --io_size=32m

In my tests I sometimes detected slight performance decreases in that
HDD scenario but using 'write' instead of 'randwrite' (and --runtime=60
instead of --io_size).

Can you try and see how that works for you?

Berto

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW
  2017-05-26 13:08           ` Alberto Garcia
@ 2017-05-26 13:32             ` Anton Nefedov
  2017-05-26 13:38               ` Alberto Garcia
  0 siblings, 1 reply; 32+ messages in thread
From: Anton Nefedov @ 2017-05-26 13:32 UTC (permalink / raw)
  To: Alberto Garcia, Kevin Wolf
  Cc: Eric Blake, qemu-devel, qemu-block, Max Reitz, Stefan Hajnoczi,
	Denis V . Lunev

On 05/26/2017 04:08 PM, Alberto Garcia wrote:
> On Fri 26 May 2017 02:47:55 PM CEST, Anton Nefedov wrote:
>> Tried the another machine; about 10% improvement here
>    [...]
>> [root@localhost ~]# fio --name=randwrite --blocksize=4k
>> --filename=/dev/sdb --rw=randwrite --direct=1 --ioengine=libaio
>> --size=2g --io_size=32m
> 
> In my tests I sometimes detected slight performance decreases in that
> HDD scenario but using 'write' instead of 'randwrite' (and --runtime=60
> instead of --io_size).
> 
> Can you try and see how that works for you?
> 
> Berto
> 

For me it keeps giving pretty much the same result before and after

  write: io=512736KB, bw=8545.4KB/s, iops=2136, runt= 60004msec

/Anton

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW
  2017-05-26 13:32             ` Anton Nefedov
@ 2017-05-26 13:38               ` Alberto Garcia
  0 siblings, 0 replies; 32+ messages in thread
From: Alberto Garcia @ 2017-05-26 13:38 UTC (permalink / raw)
  To: Anton Nefedov, Kevin Wolf
  Cc: Eric Blake, qemu-devel, qemu-block, Max Reitz, Stefan Hajnoczi,
	Denis V . Lunev

On Fri 26 May 2017 03:32:49 PM CEST, Anton Nefedov wrote:
>>> [root@localhost ~]# fio --name=randwrite --blocksize=4k
>>> --filename=/dev/sdb --rw=randwrite --direct=1 --ioengine=libaio
>>> --size=2g --io_size=32m
>> 
>> In my tests I sometimes detected slight performance decreases in that
>> HDD scenario but using 'write' instead of 'randwrite' (and --runtime=60
>> instead of --io_size).
>> 
>> Can you try and see how that works for you?
>
> For me it keeps giving pretty much the same result before and after
>
>   write: io=512736KB, bw=8545.4KB/s, iops=2136, runt= 60004msec

Ok, that's the expected behavior. Good!

Berto

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 7/7] qcow2: Merge the writing of the COW regions with the guest data
  2017-05-23 11:23 ` [Qemu-devel] [PATCH 7/7] qcow2: Merge the writing of the COW regions with the guest data Alberto Garcia
       [not found]   ` <5925B107.1060404@virtuozzo.com>
@ 2017-05-26 14:09   ` Alberto Garcia
  1 sibling, 0 replies; 32+ messages in thread
From: Alberto Garcia @ 2017-05-26 14:09 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, Kevin Wolf, Max Reitz, Eric Blake, Stefan Hajnoczi,
	Denis V . Lunev

On Tue 23 May 2017 01:23:02 PM CEST, Alberto Garcia wrote:

You can still review this now if you want, but here's a couple of minor
things I'll correct in the next revision:

> +    if (m->data_qiov) {
> +        qemu_iovec_reset(&qiov);
> +        qemu_iovec_add(&qiov, start_buffer, start->nb_bytes);
> +        qemu_iovec_concat(&qiov, m->data_qiov, 0, data_bytes);
> +        qemu_iovec_add(&qiov, end_buffer, end->nb_bytes);

I'll check start->nb_bytes and end->nb_bytes for zero before calling
qemu_iovec_add() (no practical difference, but I find it a bit cleaner).

> +/* Check if it's possible to merge a write request with the writing of
> + * the data from the COW regions */
> +static bool can_merge_cow(uint64_t offset, unsigned bytes,
> +                          QEMUIOVector *hd_qiov, QCowL2Meta *l2meta)
> +{
> +    QCowL2Meta *m;
> +
> +    for (m = l2meta; m != NULL; m = m->next) {
> +        /* If both COW regions are empty then there's nothing to merge */
> +        if (m->cow_start.nb_bytes == 0 && m->cow_end.nb_bytes == 0) {
> +            continue;
> +        }
> +
> +        /* The data (middle) region must be immediately after the
> +         * start region */
> +        if (l2meta_cow_start(m) + m->cow_start.nb_bytes != offset) {
> +            continue;
> +        }

I realized that I'm using guest offsets in this function. Since this
checks if we can _write_ everything together I think I should be using
host offsets instead.

I also don't think this makes any difference in practice in this case,
but I'll update it.

Berto

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW
  2017-05-23 11:22 [Qemu-devel] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW Alberto Garcia
                   ` (7 preceding siblings ...)
  2017-05-23 14:36 ` [Qemu-devel] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW Eric Blake
@ 2017-06-07 11:44 ` Alberto Garcia
  2017-06-07 11:59   ` Kevin Wolf
  8 siblings, 1 reply; 32+ messages in thread
From: Alberto Garcia @ 2017-06-07 11:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, Kevin Wolf, Max Reitz, Eric Blake, Stefan Hajnoczi,
	Denis V . Lunev

ping

On Tue, May 23, 2017 at 01:22:55PM +0200, Alberto Garcia wrote:
> Hi all,
> 
> here's a patch series that rewrites the copy-on-write code in the
> qcow2 driver to reduce the number of I/O operations.
> 
> The situation is that when a guest sends a write request and QEMU
> needs to allocate new cluster(s) in a qcow2 file, the unwritten
> regions of the new cluster(s) need to be filled with the existing data
> (e.g. from the backing image) or with zeroes.
> 
> The whole process can require up to 5 I/O operations:
> 
> 1) Write the data from the actual write request.
> 2) Read the existing data located before the guest data.
> 3) Write that data to the new clusters.
> 4) Read the existing data located after the guest data.
> 5) Write that data to the new clusters.
> 
> This series reduces that to only two operations:
> 
> 1) Read the existing data from the original clusters
> 2) Write the updated data (=original + guest request) to the new clusters
> 
> Step (1) implies that there's data that will be read but will be
> immediately discarded (because it's overwritten by the guest
> request). I haven't really detected any big performance problems
> because of that, but I decided to be conservative and my code includes
> a simple heuristic that keeps the old behavior if the amount of data
> to be discarded is higher than 16KB.
> 
> I've been testing this series in several scenarios, with different
> cluster sizes (32K, 64K, 1MB) and request sizes (from 4 up to 512KB),
> and both with an SSD and a rotating HDD. The results vary depending on
> the case, with an average increase of 60% in the number of IOPS in the
> HDD case, and 15% in the SSD case. In some cases there are really no
> big differences and the results are similar before and after this
> patch.
> 
> Further work for the future includes detecting when the data that
> needs to be written consists on zeroes (i.e. allocating a new cluster
> with no backing image) and optimizing that case, but let's start with
> this.
> 
> Regards,
> 
> Berto
> 
> Alberto Garcia (7):
>   qcow2: Remove unused Error in do_perform_cow()
>   qcow2: Use unsigned int for both members of Qcow2COWRegion
>   qcow2: Make perform_cow() call do_perform_cow() twice
>   qcow2: Split do_perform_cow() into _read(), _encrypt() and _write()
>   qcow2: Allow reading both COW regions with only one request
>   qcow2: Pass a QEMUIOVector to do_perform_cow_{read,write}()
>   qcow2: Merge the writing of the COW regions with the guest data
> 
>  block/qcow2-cluster.c | 188 +++++++++++++++++++++++++++++++++++++-------------
>  block/qcow2.c         |  58 +++++++++++++---
>  block/qcow2.h         |  11 ++-
>  3 files changed, 197 insertions(+), 60 deletions(-)
> 
> -- 
> 2.11.0

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW
  2017-06-07 11:44 ` Alberto Garcia
@ 2017-06-07 11:59   ` Kevin Wolf
  2017-06-07 12:13     ` Alberto Garcia
  0 siblings, 1 reply; 32+ messages in thread
From: Kevin Wolf @ 2017-06-07 11:59 UTC (permalink / raw)
  To: Alberto Garcia
  Cc: qemu-devel, qemu-block, Max Reitz, Eric Blake, Stefan Hajnoczi,
	Denis V . Lunev

Am 07.06.2017 um 13:44 hat Alberto Garcia geschrieben:
> ping

You wanted to address two or three things in the next version, so I
assumed that this version shouldn't be merged.

Kevin

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW
  2017-06-07 11:59   ` Kevin Wolf
@ 2017-06-07 12:13     ` Alberto Garcia
  0 siblings, 0 replies; 32+ messages in thread
From: Alberto Garcia @ 2017-06-07 12:13 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-devel, qemu-block, Max Reitz, Eric Blake, Stefan Hajnoczi,
	Denis V . Lunev

On Wed 07 Jun 2017 01:59:58 PM CEST, Kevin Wolf wrote:
> Am 07.06.2017 um 13:44 hat Alberto Garcia geschrieben:
>> ping
>
> You wanted to address two or three things in the next version, so I
> assumed that this version shouldn't be merged.

Right, I had a couple of minor changes, but the core of the series is
going to remain the same and can already be reviewed.

But of course I can just send the second version with my changes, I'll
do it then.

Berto

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2017-06-07 12:13 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-23 11:22 [Qemu-devel] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW Alberto Garcia
2017-05-23 11:22 ` [Qemu-devel] [PATCH 1/7] qcow2: Remove unused Error in do_perform_cow() Alberto Garcia
2017-05-23 20:21   ` Eric Blake
2017-05-24  9:48     ` Alberto Garcia
2017-05-23 11:22 ` [Qemu-devel] [PATCH 2/7] qcow2: Use unsigned int for both members of Qcow2COWRegion Alberto Garcia
2017-05-23 11:22 ` [Qemu-devel] [PATCH 3/7] qcow2: Make perform_cow() call do_perform_cow() twice Alberto Garcia
2017-05-26  8:11   ` Kevin Wolf
2017-05-26  9:10     ` Alberto Garcia
2017-05-26 10:08       ` Kevin Wolf
2017-05-23 11:22 ` [Qemu-devel] [PATCH 4/7] qcow2: Split do_perform_cow() into _read(), _encrypt() and _write() Alberto Garcia
2017-05-23 11:23 ` [Qemu-devel] [PATCH 5/7] qcow2: Allow reading both COW regions with only one request Alberto Garcia
2017-05-23 11:23 ` [Qemu-devel] [PATCH 6/7] qcow2: Pass a QEMUIOVector to do_perform_cow_{read, write}() Alberto Garcia
2017-05-23 11:23 ` [Qemu-devel] [PATCH 7/7] qcow2: Merge the writing of the COW regions with the guest data Alberto Garcia
     [not found]   ` <5925B107.1060404@virtuozzo.com>
2017-05-24 16:43     ` Anton Nefedov
2017-05-24 19:05       ` Alberto Garcia
2017-05-26 10:12         ` Kevin Wolf
2017-05-26 14:09   ` Alberto Garcia
2017-05-23 14:36 ` [Qemu-devel] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW Eric Blake
2017-05-24 14:20   ` Alberto Garcia
2017-05-24 16:09     ` Anton Nefedov
2017-05-24 16:20       ` Alberto Garcia
2017-05-24 16:26         ` Anton Nefedov
2017-05-25 11:48           ` Alberto Garcia
2017-05-25 14:35             ` [Qemu-devel] [Qemu-block] " Alberto Garcia
2017-05-26 10:17       ` [Qemu-devel] " Kevin Wolf
2017-05-26 12:47         ` Anton Nefedov
2017-05-26 13:08           ` Alberto Garcia
2017-05-26 13:32             ` Anton Nefedov
2017-05-26 13:38               ` Alberto Garcia
2017-06-07 11:44 ` Alberto Garcia
2017-06-07 11:59   ` Kevin Wolf
2017-06-07 12:13     ` Alberto Garcia

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.