qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH v3 0/7] backup improvements
@ 2019-08-10 19:31 Vladimir Sementsov-Ogievskiy
  2019-08-10 19:31 ` [Qemu-devel] [PATCH v3 1/7] block/backup: deal with zero detection Vladimir Sementsov-Ogievskiy
                   ` (6 more replies)
  0 siblings, 7 replies; 24+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-10 19:31 UTC (permalink / raw)
  To: qemu-block
  Cc: fam, kwolf, vsementsov, armbru, qemu-devel, mreitz, stefanha, den, jsnow

Hi all!

There are some fixes and refactorings I need on my way to resending
my backup-top series. It's obvious now that I need to share the copying
code between backup and backup-top, as the backup copying code becomes
smarter and more complicated. So the goal of this series is to make the
copying code more shareable.

Based-on: https://github.com/jnsnow/qemu bitmaps 

v3:
03: Ha, fixed a real bug: we shouldn't touch src before handling write-zero,
    as src may be NULL. So, the align and max_transfer calculation is moved
    together with the max_transfer == 0 check. The comment is dropped, as
    there is no good place for it now.
04: add Max's r-b:
06-07: rewrite to keep bounce_buffer sharing between iterations and to
       limit allocation [Eric]

v2 (by Max's comments):

02: Add Max's r-b
03: - split out backup changes to 03
    - handle common max_transfer == 0 case
04: split out from 02
06: fix allocation size
07: - rebase on 06 changes
    - add Max's r-b

Two patches have been dropped or postponed to the next step.

Vladimir Sementsov-Ogievskiy (7):
  block/backup: deal with zero detection
  block/backup: refactor write_flags
  block/io: handle alignment and max_transfer for copy_range
  block/backup: drop handling of max_transfer for copy_range
  block/backup: fix backup_cow_with_offload for last cluster
  block/backup: teach backup_cow_with_bounce_buffer to copy more at once
  block/backup: merge duplicated logic into backup_do_cow

 block/backup.c | 154 ++++++++++++++++++++++---------------------------
 block/io.c     |  44 +++++++++++---
 blockdev.c     |   8 +--
 3 files changed, 110 insertions(+), 96 deletions(-)

-- 
2.18.0




* [Qemu-devel] [PATCH v3 1/7] block/backup: deal with zero detection
  2019-08-10 19:31 [Qemu-devel] [PATCH v3 0/7] backup improvements Vladimir Sementsov-Ogievskiy
@ 2019-08-10 19:31 ` Vladimir Sementsov-Ogievskiy
  2019-08-10 19:31 ` [Qemu-devel] [PATCH v3 2/7] block/backup: refactor write_flags Vladimir Sementsov-Ogievskiy
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 24+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-10 19:31 UTC (permalink / raw)
  To: qemu-block
  Cc: fam, kwolf, vsementsov, armbru, qemu-devel, mreitz, stefanha, den, jsnow

We have a detect_zeroes option, so at least for blockdev-backup the user
should set it if zero detection is needed. For drive-backup, leave
detection enabled by default, but do it through the existing option
instead of open-coding it.
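
For reference, drive-backup's target is now opened with detect-zeroes=unmap
(plus discard=unmap), and the generic write path performs the zero handling
that backup used to open-code. A condensed sketch of that block-layer logic
(cf. bdrv_aligned_pwritev() in block/io.c; simplified, details elided):

    /* If the payload is all zeroes and detect-zeroes is enabled, turn
     * the write into an efficient write-zeroes, optionally unmapping. */
    if (bs->detect_zeroes != BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF &&
        qemu_iovec_is_zero(qiov)) {
        flags |= BDRV_REQ_ZERO_WRITE;
        if (bs->detect_zeroes == BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP) {
            flags |= BDRV_REQ_MAY_UNMAP;
        }
    }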

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/backup.c | 15 ++++++---------
 blockdev.c     |  8 ++++----
 2 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index adc4d44244..d815436455 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -113,7 +113,10 @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
     BlockBackend *blk = job->common.blk;
     int nbytes;
     int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
-    int write_flags = job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0;
+    int write_flags =
+            (job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0) |
+            (job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0);
+
 
     assert(QEMU_IS_ALIGNED(start, job->cluster_size));
     bdrv_reset_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
@@ -131,14 +134,8 @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
         goto fail;
     }
 
-    if (buffer_is_zero(*bounce_buffer, nbytes)) {
-        ret = blk_co_pwrite_zeroes(job->target, start,
-                                   nbytes, write_flags | BDRV_REQ_MAY_UNMAP);
-    } else {
-        ret = blk_co_pwrite(job->target, start,
-                            nbytes, *bounce_buffer, write_flags |
-                            (job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0));
-    }
+    ret = blk_co_pwrite(job->target, start, nbytes, *bounce_buffer,
+                        write_flags);
     if (ret < 0) {
         trace_backup_do_cow_write_fail(job, start, ret);
         if (error_is_read) {
diff --git a/blockdev.c b/blockdev.c
index 29c6c6044a..2d7e7be538 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3613,7 +3613,7 @@ static BlockJob *do_drive_backup(DriveBackup *backup, JobTxn *txn,
     BlockDriverState *source = NULL;
     BlockJob *job = NULL;
     AioContext *aio_context;
-    QDict *options = NULL;
+    QDict *options;
     Error *local_err = NULL;
     int flags;
     int64_t size;
@@ -3686,10 +3686,10 @@ static BlockJob *do_drive_backup(DriveBackup *backup, JobTxn *txn,
         goto out;
     }
 
+    options = qdict_new();
+    qdict_put_str(options, "discard", "unmap");
+    qdict_put_str(options, "detect-zeroes", "unmap");
     if (backup->format) {
-        if (!options) {
-            options = qdict_new();
-        }
         qdict_put_str(options, "driver", backup->format);
     }
 
-- 
2.18.0




* [Qemu-devel] [PATCH v3 2/7] block/backup: refactor write_flags
  2019-08-10 19:31 [Qemu-devel] [PATCH v3 0/7] backup improvements Vladimir Sementsov-Ogievskiy
  2019-08-10 19:31 ` [Qemu-devel] [PATCH v3 1/7] block/backup: deal with zero detection Vladimir Sementsov-Ogievskiy
@ 2019-08-10 19:31 ` Vladimir Sementsov-Ogievskiy
  2019-08-10 19:31 ` [Qemu-devel] [PATCH v3 3/7] block/io: handle alignment and max_transfer for copy_range Vladimir Sementsov-Ogievskiy
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 24+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-10 19:31 UTC (permalink / raw)
  To: qemu-block
  Cc: fam, kwolf, vsementsov, armbru, qemu-devel, mreitz, stefanha, den, jsnow

The write flags are constant, so let's store them in BackupBlockJob
instead of recalculating them. This also makes two boolean fields
unused, so drop them.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/backup.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index d815436455..c6a3b2b7bb 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -50,14 +50,13 @@ typedef struct BackupBlockJob {
     uint64_t len;
     uint64_t bytes_read;
     int64_t cluster_size;
-    bool compress;
     NotifierWithReturn before_write;
     QLIST_HEAD(, CowRequest) inflight_reqs;
 
     bool use_copy_range;
     int64_t copy_range_size;
 
-    bool serialize_target_writes;
+    BdrvRequestFlags write_flags;
     bool initializing_bitmap;
 } BackupBlockJob;
 
@@ -113,10 +112,6 @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
     BlockBackend *blk = job->common.blk;
     int nbytes;
     int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
-    int write_flags =
-            (job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0) |
-            (job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0);
-
 
     assert(QEMU_IS_ALIGNED(start, job->cluster_size));
     bdrv_reset_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
@@ -135,7 +130,7 @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
     }
 
     ret = blk_co_pwrite(job->target, start, nbytes, *bounce_buffer,
-                        write_flags);
+                        job->write_flags);
     if (ret < 0) {
         trace_backup_do_cow_write_fail(job, start, ret);
         if (error_is_read) {
@@ -163,7 +158,6 @@ static int coroutine_fn backup_cow_with_offload(BackupBlockJob *job,
     BlockBackend *blk = job->common.blk;
     int nbytes;
     int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
-    int write_flags = job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0;
 
     assert(QEMU_IS_ALIGNED(job->copy_range_size, job->cluster_size));
     assert(QEMU_IS_ALIGNED(start, job->cluster_size));
@@ -172,7 +166,7 @@ static int coroutine_fn backup_cow_with_offload(BackupBlockJob *job,
     bdrv_reset_dirty_bitmap(job->copy_bitmap, start,
                             job->cluster_size * nr_clusters);
     ret = blk_co_copy_range(blk, start, job->target, start, nbytes,
-                            read_flags, write_flags);
+                            read_flags, job->write_flags);
     if (ret < 0) {
         trace_backup_do_cow_copy_range_fail(job, start, ret);
         bdrv_set_dirty_bitmap(job->copy_bitmap, start,
@@ -748,10 +742,16 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     job->sync_mode = sync_mode;
     job->sync_bitmap = sync_bitmap;
     job->bitmap_mode = bitmap_mode;
-    job->compress = compress;
 
-    /* Detect image-fleecing (and similar) schemes */
-    job->serialize_target_writes = bdrv_chain_contains(target, bs);
+    /*
+     * Set write flags:
+     *  1. Detect image-fleecing (and similar) schemes
+     *  2. Handle compression
+     */
+    job->write_flags =
+            (bdrv_chain_contains(target, bs) ? BDRV_REQ_SERIALISING : 0) |
+            (compress ? BDRV_REQ_WRITE_COMPRESSED : 0);
+
     job->cluster_size = cluster_size;
     job->copy_bitmap = copy_bitmap;
     copy_bitmap = NULL;
-- 
2.18.0




* [Qemu-devel] [PATCH v3 3/7] block/io: handle alignment and max_transfer for copy_range
  2019-08-10 19:31 [Qemu-devel] [PATCH v3 0/7] backup improvements Vladimir Sementsov-Ogievskiy
  2019-08-10 19:31 ` [Qemu-devel] [PATCH v3 1/7] block/backup: deal with zero detection Vladimir Sementsov-Ogievskiy
  2019-08-10 19:31 ` [Qemu-devel] [PATCH v3 2/7] block/backup: refactor write_flags Vladimir Sementsov-Ogievskiy
@ 2019-08-10 19:31 ` Vladimir Sementsov-Ogievskiy
  2019-08-12 14:48   ` Max Reitz
  2019-08-10 19:31 ` [Qemu-devel] [PATCH v3 4/7] block/backup: drop handling of " Vladimir Sementsov-Ogievskiy
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 24+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-10 19:31 UTC (permalink / raw)
  To: qemu-block
  Cc: fam, kwolf, vsementsov, armbru, qemu-devel, mreitz, stefanha, den, jsnow

copy_range ignores these limitations (alignment and max_transfer); let's
improve it.
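
For intuition, a self-contained sketch of the limit computation this patch
adds (macros inlined here; in QEMU they come from the utility headers). A
max_transfer of 0 means "unlimited"; the result is capped at INT_MAX and
rounded down to the common request alignment:

    #include <stdint.h>
    #include <stdio.h>
    #include <limits.h>

    #define MIN(a, b) ((a) < (b) ? (a) : (b))
    #define MAX(a, b) ((a) > (b) ? (a) : (b))
    #define MIN_NON_ZERO(a, b) ((a) == 0 ? (b) : ((b) == 0 ? (a) : MIN(a, b)))
    #define QEMU_ALIGN_DOWN(n, m) ((n) / (m) * (m))

    static uint32_t copy_range_max_transfer(uint32_t src_max, uint32_t dst_max,
                                            uint32_t src_align,
                                            uint32_t dst_align)
    {
        uint32_t align = MAX(src_align, dst_align);

        return QEMU_ALIGN_DOWN(MIN_NON_ZERO(MIN_NON_ZERO(src_max, dst_max),
                                            INT_MAX), align);
    }

    int main(void)
    {
        /* src unlimited, dst limited to 65537 bytes, alignments 512/4096:
         * the common limit is 65536 (65537 rounded down to 4096). */
        printf("%u\n", copy_range_max_transfer(0, 65537, 512, 4096));
        return 0;
    }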

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/io.c | 44 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 36 insertions(+), 8 deletions(-)

diff --git a/block/io.c b/block/io.c
index 06305c6ea6..45b1e1f76e 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3005,11 +3005,13 @@ static int coroutine_fn bdrv_co_copy_range_internal(
 {
     BdrvTrackedRequest req;
     int ret;
+    uint32_t align, max_transfer;
 
     /* TODO We can support BDRV_REQ_NO_FALLBACK here */
     assert(!(read_flags & BDRV_REQ_NO_FALLBACK));
     assert(!(write_flags & BDRV_REQ_NO_FALLBACK));
 
+
     if (!dst || !dst->bs) {
         return -ENOMEDIUM;
     }
@@ -3029,9 +3031,19 @@ static int coroutine_fn bdrv_co_copy_range_internal(
         return ret;
     }
 
+    align = MAX(src->bs->bl.request_alignment, dst->bs->bl.request_alignment);
+    max_transfer =
+            QEMU_ALIGN_DOWN(MIN_NON_ZERO(MIN_NON_ZERO(src->bs->bl.max_transfer,
+                                                      dst->bs->bl.max_transfer),
+                                         INT_MAX), align);
+
     if (!src->bs->drv->bdrv_co_copy_range_from
         || !dst->bs->drv->bdrv_co_copy_range_to
-        || src->bs->encrypted || dst->bs->encrypted) {
+        || src->bs->encrypted || dst->bs->encrypted ||
+        (max_transfer == 0 && bytes > 0) ||
+        !QEMU_IS_ALIGNED(src_offset, src->bs->bl.request_alignment) ||
+        !QEMU_IS_ALIGNED(dst_offset, dst->bs->bl.request_alignment) ||
+        !QEMU_IS_ALIGNED(bytes, align)) {
         return -ENOTSUP;
     }
 
@@ -3046,11 +3058,22 @@ static int coroutine_fn bdrv_co_copy_range_internal(
             wait_serialising_requests(&req);
         }
 
-        ret = src->bs->drv->bdrv_co_copy_range_from(src->bs,
-                                                    src, src_offset,
-                                                    dst, dst_offset,
-                                                    bytes,
-                                                    read_flags, write_flags);
+        while (bytes) {
+            int num = MIN(bytes, max_transfer);
+
+            ret = src->bs->drv->bdrv_co_copy_range_from(src->bs,
+                                                        src, src_offset,
+                                                        dst, dst_offset,
+                                                        num,
+                                                        read_flags,
+                                                        write_flags);
+            if (ret < 0) {
+                break;
+            }
+            bytes -= num;
+            src_offset += num;
+            dst_offset += num;
+        }
 
         tracked_request_end(&req);
         bdrv_dec_in_flight(src->bs);
@@ -3060,12 +3083,17 @@ static int coroutine_fn bdrv_co_copy_range_internal(
                               BDRV_TRACKED_WRITE);
         ret = bdrv_co_write_req_prepare(dst, dst_offset, bytes, &req,
                                         write_flags);
-        if (!ret) {
+        while (!ret && bytes) {
+            int num = MIN(bytes, max_transfer);
+
             ret = dst->bs->drv->bdrv_co_copy_range_to(dst->bs,
                                                       src, src_offset,
                                                       dst, dst_offset,
-                                                      bytes,
+                                                      num,
                                                       read_flags, write_flags);
+            bytes -= num;
+            src_offset += num;
+            dst_offset += num;
         }
         bdrv_co_write_req_finish(dst, dst_offset, bytes, &req, ret);
         tracked_request_end(&req);
-- 
2.18.0




* [Qemu-devel] [PATCH v3 4/7] block/backup: drop handling of max_transfer for copy_range
  2019-08-10 19:31 [Qemu-devel] [PATCH v3 0/7] backup improvements Vladimir Sementsov-Ogievskiy
                   ` (2 preceding siblings ...)
  2019-08-10 19:31 ` [Qemu-devel] [PATCH v3 3/7] block/io: handle alignment and max_transfer for copy_range Vladimir Sementsov-Ogievskiy
@ 2019-08-10 19:31 ` Vladimir Sementsov-Ogievskiy
  2019-08-10 19:31 ` [Qemu-devel] [PATCH v3 5/7] block/backup: fix backup_cow_with_offload for last cluster Vladimir Sementsov-Ogievskiy
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 24+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-10 19:31 UTC (permalink / raw)
  To: qemu-block
  Cc: fam, kwolf, vsementsov, armbru, qemu-devel, mreitz, stefanha, den, jsnow

Since the previous commit, copy_range supports max_transfer, so we don't
need to handle it by hand.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/backup.c | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index c6a3b2b7bb..228ba9423c 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -54,7 +54,6 @@ typedef struct BackupBlockJob {
     QLIST_HEAD(, CowRequest) inflight_reqs;
 
     bool use_copy_range;
-    int64_t copy_range_size;
 
     BdrvRequestFlags write_flags;
     bool initializing_bitmap;
@@ -156,12 +155,11 @@ static int coroutine_fn backup_cow_with_offload(BackupBlockJob *job,
     int ret;
     int nr_clusters;
     BlockBackend *blk = job->common.blk;
-    int nbytes;
+    int nbytes = end - start;
     int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
 
-    assert(QEMU_IS_ALIGNED(job->copy_range_size, job->cluster_size));
+    assert(end - start < INT_MAX);
     assert(QEMU_IS_ALIGNED(start, job->cluster_size));
-    nbytes = MIN(job->copy_range_size, end - start);
     nr_clusters = DIV_ROUND_UP(nbytes, job->cluster_size);
     bdrv_reset_dirty_bitmap(job->copy_bitmap, start,
                             job->cluster_size * nr_clusters);
@@ -756,11 +754,6 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     job->copy_bitmap = copy_bitmap;
     copy_bitmap = NULL;
     job->use_copy_range = !compress; /* compression isn't supported for it */
-    job->copy_range_size = MIN_NON_ZERO(blk_get_max_transfer(job->common.blk),
-                                        blk_get_max_transfer(job->target));
-    job->copy_range_size = MAX(job->cluster_size,
-                               QEMU_ALIGN_UP(job->copy_range_size,
-                                             job->cluster_size));
 
     /* Required permissions are already taken with target's blk_new() */
     block_job_add_bdrv(&job->common, "target", target, 0, BLK_PERM_ALL,
-- 
2.18.0




* [Qemu-devel] [PATCH v3 5/7] block/backup: fix backup_cow_with_offload for last cluster
  2019-08-10 19:31 [Qemu-devel] [PATCH v3 0/7] backup improvements Vladimir Sementsov-Ogievskiy
                   ` (3 preceding siblings ...)
  2019-08-10 19:31 ` [Qemu-devel] [PATCH v3 4/7] block/backup: drop handling of " Vladimir Sementsov-Ogievskiy
@ 2019-08-10 19:31 ` Vladimir Sementsov-Ogievskiy
  2019-08-10 19:31 ` [Qemu-devel] [PATCH v3 6/7] block/backup: teach backup_cow_with_bounce_buffer to copy more at once Vladimir Sementsov-Ogievskiy
  2019-08-10 19:31 ` [Qemu-devel] [PATCH v3 7/7] block/backup: merge duplicated logic into backup_do_cow Vladimir Sementsov-Ogievskiy
  6 siblings, 0 replies; 24+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-10 19:31 UTC (permalink / raw)
  To: qemu-block
  Cc: fam, kwolf, vsementsov, armbru, qemu-devel, mreitz, stefanha, den, jsnow

We shouldn't try to copy bytes beyond EOF. Fix it.
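
A worked example of the clamp (self-contained, with hypothetical sizes):
with a 130 KiB disk and 64 KiB clusters, the last cluster-aligned range is
[128 KiB, 192 KiB), but only 2 KiB of data remain before EOF:

    #include <assert.h>
    #include <stdint.h>

    #define MIN(a, b) ((a) < (b) ? (a) : (b))

    int main(void)
    {
        int64_t start = 128 << 10;   /* last cluster-aligned offset */
        int64_t end   = 192 << 10;   /* end of the cluster range    */
        int64_t len   = 130 << 10;   /* disk size, i.e. EOF         */

        int nbytes = MIN(end - start, len - start);
        assert(nbytes == (2 << 10)); /* copy 2 KiB, not 64 KiB      */
        return 0;
    }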

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/backup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/backup.c b/block/backup.c
index 228ba9423c..d482d93458 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -155,7 +155,7 @@ static int coroutine_fn backup_cow_with_offload(BackupBlockJob *job,
     int ret;
     int nr_clusters;
     BlockBackend *blk = job->common.blk;
-    int nbytes = end - start;
+    int nbytes = MIN(end - start, job->len - start);
     int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
 
     assert(end - start < INT_MAX);
-- 
2.18.0




* [Qemu-devel] [PATCH v3 6/7] block/backup: teach backup_cow_with_bounce_buffer to copy more at once
  2019-08-10 19:31 [Qemu-devel] [PATCH v3 0/7] backup improvements Vladimir Sementsov-Ogievskiy
                   ` (4 preceding siblings ...)
  2019-08-10 19:31 ` [Qemu-devel] [PATCH v3 5/7] block/backup: fix backup_cow_with_offload for last cluster Vladimir Sementsov-Ogievskiy
@ 2019-08-10 19:31 ` Vladimir Sementsov-Ogievskiy
  2019-08-12 15:10   ` Max Reitz
  2019-08-10 19:31 ` [Qemu-devel] [PATCH v3 7/7] block/backup: merge duplicated logic into backup_do_cow Vladimir Sementsov-Ogievskiy
  6 siblings, 1 reply; 24+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-10 19:31 UTC (permalink / raw)
  To: qemu-block
  Cc: fam, kwolf, vsementsov, armbru, qemu-devel, mreitz, stefanha, den, jsnow

backup_cow_with_offload can transfer more than one cluster. Let
backup_cow_with_bounce_buffer behave similarly. It reduces the number
of IO requests, since there is no need to copy cluster by cluster.

The logic around bounce_buffer allocation has changed: we can't just
allocate a one-cluster-sized buffer to share across all iterations, and
we also can't allocate a buffer of the full request length, as it may be
too large. So BACKUP_MAX_BOUNCE_BUFFER is introduced. Finally, the
allocation logic is to allocate a buffer sufficient to handle all
remaining iterations at the point where we first need the buffer.

Bonus: get rid of pointer-to-pointer.
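
To illustrate the chunking with a hypothetical request size: a 200 MiB
copy-before-write request is now served from a single 64 MiB bounce buffer
in chunks of 64 + 64 + 64 + 8 MiB (standalone sketch):

    #include <stdint.h>
    #include <stdio.h>

    #define MIN(a, b) ((a) < (b) ? (a) : (b))
    #define BACKUP_MAX_BOUNCE_BUFFER (64 * 1024 * 1024)

    int main(void)
    {
        int64_t remaining = 200ll * 1024 * 1024; /* hypothetical request */

        while (remaining) {
            /* each iteration reads/writes at most one buffer's worth */
            int chunk = MIN(BACKUP_MAX_BOUNCE_BUFFER, remaining);

            printf("copy %d MiB chunk\n", chunk / (1024 * 1024));
            remaining -= chunk;
        }
        return 0;
    }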

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/backup.c | 65 +++++++++++++++++++++++++++++++-------------------
 1 file changed, 41 insertions(+), 24 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index d482d93458..65f7212c85 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -27,6 +27,7 @@
 #include "qemu/error-report.h"
 
 #define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
+#define BACKUP_MAX_BOUNCE_BUFFER (64 * 1024 * 1024)
 
 typedef struct CowRequest {
     int64_t start_byte;
@@ -98,44 +99,55 @@ static void cow_request_end(CowRequest *req)
     qemu_co_queue_restart_all(&req->wait_queue);
 }
 
-/* Copy range to target with a bounce buffer and return the bytes copied. If
- * error occurred, return a negative error number */
+/*
+ * Copy range to target with a bounce buffer and return the bytes copied. If
+ * error occurred, return a negative error number
+ *
+ * @bounce_buffer is assumed to enough to store
+ * MIN(BACKUP_MAX_BOUNCE_BUFFER, @end - @start) bytes
+ */
 static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
                                                       int64_t start,
                                                       int64_t end,
                                                       bool is_write_notifier,
                                                       bool *error_is_read,
-                                                      void **bounce_buffer)
+                                                      void *bounce_buffer)
 {
     int ret;
     BlockBackend *blk = job->common.blk;
-    int nbytes;
+    int nbytes, remaining_bytes;
     int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
 
     assert(QEMU_IS_ALIGNED(start, job->cluster_size));
-    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
-    nbytes = MIN(job->cluster_size, job->len - start);
-    if (!*bounce_buffer) {
-        *bounce_buffer = blk_blockalign(blk, job->cluster_size);
-    }
+    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, end - start);
+    nbytes = MIN(end - start, job->len - start);
 
-    ret = blk_co_pread(blk, start, nbytes, *bounce_buffer, read_flags);
-    if (ret < 0) {
-        trace_backup_do_cow_read_fail(job, start, ret);
-        if (error_is_read) {
-            *error_is_read = true;
+
+    remaining_bytes = nbytes;
+    while (remaining_bytes) {
+        int chunk = MIN(BACKUP_MAX_BOUNCE_BUFFER, remaining_bytes);
+
+        ret = blk_co_pread(blk, start, chunk, bounce_buffer, read_flags);
+        if (ret < 0) {
+            trace_backup_do_cow_read_fail(job, start, ret);
+            if (error_is_read) {
+                *error_is_read = true;
+            }
+            goto fail;
         }
-        goto fail;
-    }
 
-    ret = blk_co_pwrite(job->target, start, nbytes, *bounce_buffer,
-                        job->write_flags);
-    if (ret < 0) {
-        trace_backup_do_cow_write_fail(job, start, ret);
-        if (error_is_read) {
-            *error_is_read = false;
+        ret = blk_co_pwrite(job->target, start, chunk, bounce_buffer,
+                            job->write_flags);
+        if (ret < 0) {
+            trace_backup_do_cow_write_fail(job, start, ret);
+            if (error_is_read) {
+                *error_is_read = false;
+            }
+            goto fail;
         }
-        goto fail;
+
+        start += chunk;
+        remaining_bytes -= chunk;
     }
 
     return nbytes;
@@ -301,9 +313,14 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
             }
         }
         if (!job->use_copy_range) {
+            if (!bounce_buffer) {
+                size_t len = MIN(BACKUP_MAX_BOUNCE_BUFFER,
+                                 MAX(dirty_end - start, end - dirty_end));
+                bounce_buffer = blk_try_blockalign(job->common.blk, len);
+            }
             ret = backup_cow_with_bounce_buffer(job, start, dirty_end,
                                                 is_write_notifier,
-                                                error_is_read, &bounce_buffer);
+                                                error_is_read, bounce_buffer);
         }
         if (ret < 0) {
             break;
-- 
2.18.0




* [Qemu-devel] [PATCH v3 7/7] block/backup: merge duplicated logic into backup_do_cow
  2019-08-10 19:31 [Qemu-devel] [PATCH v3 0/7] backup improvements Vladimir Sementsov-Ogievskiy
                   ` (5 preceding siblings ...)
  2019-08-10 19:31 ` [Qemu-devel] [PATCH v3 6/7] block/backup: teach backup_cow_with_bounce_buffer to copy more at once Vladimir Sementsov-Ogievskiy
@ 2019-08-10 19:31 ` Vladimir Sementsov-Ogievskiy
  6 siblings, 0 replies; 24+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-10 19:31 UTC (permalink / raw)
  To: qemu-block
  Cc: fam, kwolf, vsementsov, armbru, qemu-devel, mreitz, stefanha, den, jsnow

backup_cow_with_offload and backup_cow_with_bounce_buffer contain a lot
of duplicated logic. Move it into backup_do_cow.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/backup.c | 97 ++++++++++++++++++++------------------------------
 1 file changed, 38 insertions(+), 59 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 65f7212c85..0ac31c2760 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -104,87 +104,61 @@ static void cow_request_end(CowRequest *req)
  * error occurred, return a negative error number
  *
  * @bounce_buffer is assumed to enough to store
- * MIN(BACKUP_MAX_BOUNCE_BUFFER, @end - @start) bytes
+ * MIN(BACKUP_MAX_BOUNCE_BUFFER, @bytes) bytes
  */
-static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
-                                                      int64_t start,
-                                                      int64_t end,
-                                                      bool is_write_notifier,
-                                                      bool *error_is_read,
-                                                      void *bounce_buffer)
+static int coroutine_fn backup_cow_with_bounce_buffer(
+        BackupBlockJob *job, int64_t offset, int64_t bytes,
+        BdrvRequestFlags read_flags, bool *error_is_read, void *bounce_buffer)
 {
-    int ret;
     BlockBackend *blk = job->common.blk;
-    int nbytes, remaining_bytes;
-    int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
-
-    assert(QEMU_IS_ALIGNED(start, job->cluster_size));
-    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, end - start);
-    nbytes = MIN(end - start, job->len - start);
-
 
-    remaining_bytes = nbytes;
-    while (remaining_bytes) {
-        int chunk = MIN(BACKUP_MAX_BOUNCE_BUFFER, remaining_bytes);
+    while (bytes) {
+        int ret;
+        int chunk = MIN(BACKUP_MAX_BOUNCE_BUFFER, bytes);
 
-        ret = blk_co_pread(blk, start, chunk, bounce_buffer, read_flags);
+        ret = blk_co_pread(blk, offset, chunk, bounce_buffer, read_flags);
         if (ret < 0) {
-            trace_backup_do_cow_read_fail(job, start, ret);
+            trace_backup_do_cow_read_fail(job, offset, ret);
             if (error_is_read) {
                 *error_is_read = true;
             }
-            goto fail;
+            return ret;
         }
 
-        ret = blk_co_pwrite(job->target, start, chunk, bounce_buffer,
+        ret = blk_co_pwrite(job->target, offset, chunk, bounce_buffer,
                             job->write_flags);
         if (ret < 0) {
-            trace_backup_do_cow_write_fail(job, start, ret);
+            trace_backup_do_cow_write_fail(job, offset, ret);
             if (error_is_read) {
                 *error_is_read = false;
             }
-            goto fail;
+            return ret;
         }
 
-        start += chunk;
-        remaining_bytes -= chunk;
+        offset += chunk;
+        bytes -= chunk;
     }
 
-    return nbytes;
-fail:
-    bdrv_set_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
-    return ret;
-
+    return 0;
 }
 
 /* Copy range to target and return the bytes copied. If error occurred, return a
  * negative error number. */
 static int coroutine_fn backup_cow_with_offload(BackupBlockJob *job,
-                                                int64_t start,
-                                                int64_t end,
-                                                bool is_write_notifier)
+                                                int64_t offset,
+                                                int64_t bytes,
+                                                BdrvRequestFlags read_flags)
 {
     int ret;
-    int nr_clusters;
     BlockBackend *blk = job->common.blk;
-    int nbytes = MIN(end - start, job->len - start);
-    int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
-
-    assert(end - start < INT_MAX);
-    assert(QEMU_IS_ALIGNED(start, job->cluster_size));
-    nr_clusters = DIV_ROUND_UP(nbytes, job->cluster_size);
-    bdrv_reset_dirty_bitmap(job->copy_bitmap, start,
-                            job->cluster_size * nr_clusters);
-    ret = blk_co_copy_range(blk, start, job->target, start, nbytes,
+
+    ret = blk_co_copy_range(blk, offset, job->target, offset, bytes,
                             read_flags, job->write_flags);
     if (ret < 0) {
-        trace_backup_do_cow_copy_range_fail(job, start, ret);
-        bdrv_set_dirty_bitmap(job->copy_bitmap, start,
-                              job->cluster_size * nr_clusters);
-        return ret;
+        trace_backup_do_cow_copy_range_fail(job, offset, ret);
     }
 
-    return nbytes;
+    return ret;
 }
 
 /*
@@ -268,6 +242,8 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
     int64_t start, end; /* bytes */
     void *bounce_buffer = NULL;
     int64_t skip_bytes;
+    BdrvRequestFlags read_flags =
+            is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
 
     qemu_co_rwlock_rdlock(&job->flush_rwlock);
 
@@ -281,6 +257,7 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
 
     while (start < end) {
         int64_t dirty_end;
+        int64_t cur_bytes;
 
         if (!bdrv_dirty_bitmap_get(job->copy_bitmap, start)) {
             trace_backup_do_cow_skip(job, start);
@@ -304,10 +281,11 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
         }
 
         trace_backup_do_cow_process(job, start);
+        cur_bytes = MIN(dirty_end - start, job->len - start);
+        bdrv_reset_dirty_bitmap(job->copy_bitmap, start, dirty_end - start);
 
         if (job->use_copy_range) {
-            ret = backup_cow_with_offload(job, start, dirty_end,
-                                          is_write_notifier);
+            ret = backup_cow_with_offload(job, start, cur_bytes, read_flags);
             if (ret < 0) {
                 job->use_copy_range = false;
             }
@@ -315,24 +293,25 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
         if (!job->use_copy_range) {
             if (!bounce_buffer) {
                 size_t len = MIN(BACKUP_MAX_BOUNCE_BUFFER,
-                                 MAX(dirty_end - start, end - dirty_end));
+                                 MAX(cur_bytes,
+                                     end - start - cur_bytes));
                 bounce_buffer = blk_try_blockalign(job->common.blk, len);
             }
-            ret = backup_cow_with_bounce_buffer(job, start, dirty_end,
-                                                is_write_notifier,
-                                                error_is_read, bounce_buffer);
+            ret = backup_cow_with_bounce_buffer(job, start, cur_bytes,
+                                                read_flags, error_is_read,
+                                                bounce_buffer);
         }
         if (ret < 0) {
+            bdrv_set_dirty_bitmap(job->copy_bitmap, start, dirty_end - start);
             break;
         }
 
         /* Publish progress, guest I/O counts as progress too.  Note that the
          * offset field is an opaque progress value, it is not a disk offset.
          */
-        start += ret;
-        job->bytes_read += ret;
-        job_progress_update(&job->common.job, ret);
-        ret = 0;
+        start += cur_bytes;
+        job->bytes_read += cur_bytes;
+        job_progress_update(&job->common.job, cur_bytes);
     }
 
     if (bounce_buffer) {
-- 
2.18.0




* Re: [Qemu-devel] [PATCH v3 3/7] block/io: handle alignment and max_transfer for copy_range
  2019-08-10 19:31 ` [Qemu-devel] [PATCH v3 3/7] block/io: handle alignment and max_transfer for copy_range Vladimir Sementsov-Ogievskiy
@ 2019-08-12 14:48   ` Max Reitz
  2019-08-20 15:18     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 24+ messages in thread
From: Max Reitz @ 2019-08-12 14:48 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, qemu-devel, armbru, stefanha, den, jsnow



On 10.08.19 21:31, Vladimir Sementsov-Ogievskiy wrote:
> copy_range ignores these limitations (alignment and max_transfer); let's
> improve it.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/io.c | 44 ++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 36 insertions(+), 8 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>




* Re: [Qemu-devel] [PATCH v3 6/7] block/backup: teach backup_cow_with_bounce_buffer to copy more at once
  2019-08-10 19:31 ` [Qemu-devel] [PATCH v3 6/7] block/backup: teach backup_cow_with_bounce_buffer to copy more at once Vladimir Sementsov-Ogievskiy
@ 2019-08-12 15:10   ` Max Reitz
  2019-08-12 15:47     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 24+ messages in thread
From: Max Reitz @ 2019-08-12 15:10 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, qemu-devel, armbru, stefanha, den, jsnow



On 10.08.19 21:31, Vladimir Sementsov-Ogievskiy wrote:
> backup_cow_with_offload can transfer more than one cluster. Let
> backup_cow_with_bounce_buffer behave similarly. It reduces the number
> of IO requests, since there is no need to copy cluster by cluster.
> 
> The logic around bounce_buffer allocation has changed: we can't just
> allocate a one-cluster-sized buffer to share across all iterations, and
> we also can't allocate a buffer of the full request length, as it may be
> too large. So BACKUP_MAX_BOUNCE_BUFFER is introduced. Finally, the
> allocation logic is to allocate a buffer sufficient to handle all
> remaining iterations at the point where we first need the buffer.
> 
> Bonus: get rid of pointer-to-pointer.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/backup.c | 65 +++++++++++++++++++++++++++++++-------------------
>  1 file changed, 41 insertions(+), 24 deletions(-)
> 
> diff --git a/block/backup.c b/block/backup.c
> index d482d93458..65f7212c85 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -27,6 +27,7 @@
>  #include "qemu/error-report.h"
>  
>  #define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
> +#define BACKUP_MAX_BOUNCE_BUFFER (64 * 1024 * 1024)
>  
>  typedef struct CowRequest {
>      int64_t start_byte;
> @@ -98,44 +99,55 @@ static void cow_request_end(CowRequest *req)
>      qemu_co_queue_restart_all(&req->wait_queue);
>  }
>  
> -/* Copy range to target with a bounce buffer and return the bytes copied. If
> - * error occurred, return a negative error number */
> +/*
> + * Copy range to target with a bounce buffer and return the bytes copied. If
> + * error occurred, return a negative error number
> + *
> + * @bounce_buffer is assumed to enough to store

s/to/to be/

> + * MIN(BACKUP_MAX_BOUNCE_BUFFER, @end - @start) bytes
> + */
>  static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
>                                                        int64_t start,
>                                                        int64_t end,
>                                                        bool is_write_notifier,
>                                                        bool *error_is_read,
> -                                                      void **bounce_buffer)
> +                                                      void *bounce_buffer)
>  {
>      int ret;
>      BlockBackend *blk = job->common.blk;
> -    int nbytes;
> +    int nbytes, remaining_bytes;
>      int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
>  
>      assert(QEMU_IS_ALIGNED(start, job->cluster_size));
> -    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
> -    nbytes = MIN(job->cluster_size, job->len - start);
> -    if (!*bounce_buffer) {
> -        *bounce_buffer = blk_blockalign(blk, job->cluster_size);
> -    }
> +    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, end - start);
> +    nbytes = MIN(end - start, job->len - start);
>  
> -    ret = blk_co_pread(blk, start, nbytes, *bounce_buffer, read_flags);
> -    if (ret < 0) {
> -        trace_backup_do_cow_read_fail(job, start, ret);
> -        if (error_is_read) {
> -            *error_is_read = true;
> +
> +    remaining_bytes = nbytes;
> +    while (remaining_bytes) {
> +        int chunk = MIN(BACKUP_MAX_BOUNCE_BUFFER, remaining_bytes);
> +
> +        ret = blk_co_pread(blk, start, chunk, bounce_buffer, read_flags);
> +        if (ret < 0) {
> +            trace_backup_do_cow_read_fail(job, start, ret);
> +            if (error_is_read) {
> +                *error_is_read = true;
> +            }
> +            goto fail;
>          }
> -        goto fail;
> -    }
>  
> -    ret = blk_co_pwrite(job->target, start, nbytes, *bounce_buffer,
> -                        job->write_flags);
> -    if (ret < 0) {
> -        trace_backup_do_cow_write_fail(job, start, ret);
> -        if (error_is_read) {
> -            *error_is_read = false;
> +        ret = blk_co_pwrite(job->target, start, chunk, bounce_buffer,
> +                            job->write_flags);
> +        if (ret < 0) {
> +            trace_backup_do_cow_write_fail(job, start, ret);
> +            if (error_is_read) {
> +                *error_is_read = false;
> +            }
> +            goto fail;
>          }
> -        goto fail;
> +
> +        start += chunk;
> +        remaining_bytes -= chunk;
>      }
>  
>      return nbytes;
> @@ -301,9 +313,14 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>              }
>          }
>          if (!job->use_copy_range) {
> +            if (!bounce_buffer) {
> +                size_t len = MIN(BACKUP_MAX_BOUNCE_BUFFER,
> +                                 MAX(dirty_end - start, end - dirty_end));
> +                bounce_buffer = blk_try_blockalign(job->common.blk, len);
> +            }

If you use _try_, you should probably also check whether it succeeded.

Anyway, I wonder whether it'd be better to just allocate this buffer
once per job (the first time we get here, probably) to be of size
BACKUP_MAX_BOUNCE_BUFFER and put it into BackupBlockJob.  (And maybe add
a buf-size parameter similar to what the mirror jobs have.)

Max

>              ret = backup_cow_with_bounce_buffer(job, start, dirty_end,
>                                                  is_write_notifier,
> -                                                error_is_read, &bounce_buffer);
> +                                                error_is_read, bounce_buffer);
>          }
>          if (ret < 0) {
>              break;
> 





* Re: [Qemu-devel] [PATCH v3 6/7] block/backup: teach backup_cow_with_bounce_buffer to copy more at once
  2019-08-12 15:10   ` Max Reitz
@ 2019-08-12 15:47     ` Vladimir Sementsov-Ogievskiy
  2019-08-12 16:11       ` Max Reitz
  0 siblings, 1 reply; 24+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-12 15:47 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: fam, kwolf, Denis Lunev, qemu-devel, armbru, stefanha, jsnow

12.08.2019 18:10, Max Reitz wrote:
> On 10.08.19 21:31, Vladimir Sementsov-Ogievskiy wrote:
>> backup_cow_with_offload can transfer more than one cluster. Let
>> backup_cow_with_bounce_buffer behave similarly. It reduces the number
>> of IO requests, since there is no need to copy cluster by cluster.
>>
>> The logic around bounce_buffer allocation has changed: we can't just
>> allocate a one-cluster-sized buffer to share across all iterations, and
>> we also can't allocate a buffer of the full request length, as it may be
>> too large. So BACKUP_MAX_BOUNCE_BUFFER is introduced. Finally, the
>> allocation logic is to allocate a buffer sufficient to handle all
>> remaining iterations at the point where we first need the buffer.
>>
>> Bonus: get rid of pointer-to-pointer.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block/backup.c | 65 +++++++++++++++++++++++++++++++-------------------
>>   1 file changed, 41 insertions(+), 24 deletions(-)
>>
>> diff --git a/block/backup.c b/block/backup.c
>> index d482d93458..65f7212c85 100644
>> --- a/block/backup.c
>> +++ b/block/backup.c
>> @@ -27,6 +27,7 @@
>>   #include "qemu/error-report.h"
>>   
>>   #define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
>> +#define BACKUP_MAX_BOUNCE_BUFFER (64 * 1024 * 1024)
>>   
>>   typedef struct CowRequest {
>>       int64_t start_byte;
>> @@ -98,44 +99,55 @@ static void cow_request_end(CowRequest *req)
>>       qemu_co_queue_restart_all(&req->wait_queue);
>>   }
>>   
>> -/* Copy range to target with a bounce buffer and return the bytes copied. If
>> - * error occurred, return a negative error number */
>> +/*
>> + * Copy range to target with a bounce buffer and return the bytes copied. If
>> + * error occurred, return a negative error number
>> + *
>> + * @bounce_buffer is assumed to enough to store
> 
> s/to/to be/
> 
>> + * MIN(BACKUP_MAX_BOUNCE_BUFFER, @end - @start) bytes
>> + */
>>   static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
>>                                                         int64_t start,
>>                                                         int64_t end,
>>                                                         bool is_write_notifier,
>>                                                         bool *error_is_read,
>> -                                                      void **bounce_buffer)
>> +                                                      void *bounce_buffer)
>>   {
>>       int ret;
>>       BlockBackend *blk = job->common.blk;
>> -    int nbytes;
>> +    int nbytes, remaining_bytes;
>>       int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
>>   
>>       assert(QEMU_IS_ALIGNED(start, job->cluster_size));
>> -    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
>> -    nbytes = MIN(job->cluster_size, job->len - start);
>> -    if (!*bounce_buffer) {
>> -        *bounce_buffer = blk_blockalign(blk, job->cluster_size);
>> -    }
>> +    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, end - start);
>> +    nbytes = MIN(end - start, job->len - start);
>>   
>> -    ret = blk_co_pread(blk, start, nbytes, *bounce_buffer, read_flags);
>> -    if (ret < 0) {
>> -        trace_backup_do_cow_read_fail(job, start, ret);
>> -        if (error_is_read) {
>> -            *error_is_read = true;
>> +
>> +    remaining_bytes = nbytes;
>> +    while (remaining_bytes) {
>> +        int chunk = MIN(BACKUP_MAX_BOUNCE_BUFFER, remaining_bytes);
>> +
>> +        ret = blk_co_pread(blk, start, chunk, bounce_buffer, read_flags);
>> +        if (ret < 0) {
>> +            trace_backup_do_cow_read_fail(job, start, ret);
>> +            if (error_is_read) {
>> +                *error_is_read = true;
>> +            }
>> +            goto fail;
>>           }
>> -        goto fail;
>> -    }
>>   
>> -    ret = blk_co_pwrite(job->target, start, nbytes, *bounce_buffer,
>> -                        job->write_flags);
>> -    if (ret < 0) {
>> -        trace_backup_do_cow_write_fail(job, start, ret);
>> -        if (error_is_read) {
>> -            *error_is_read = false;
>> +        ret = blk_co_pwrite(job->target, start, chunk, bounce_buffer,
>> +                            job->write_flags);
>> +        if (ret < 0) {
>> +            trace_backup_do_cow_write_fail(job, start, ret);
>> +            if (error_is_read) {
>> +                *error_is_read = false;
>> +            }
>> +            goto fail;
>>           }
>> -        goto fail;
>> +
>> +        start += chunk;
>> +        remaining_bytes -= chunk;
>>       }
>>   
>>       return nbytes;
>> @@ -301,9 +313,14 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>>               }
>>           }
>>           if (!job->use_copy_range) {
>> +            if (!bounce_buffer) {
>> +                size_t len = MIN(BACKUP_MAX_BOUNCE_BUFFER,
>> +                                 MAX(dirty_end - start, end - dirty_end));
>> +                bounce_buffer = blk_try_blockalign(job->common.blk, len);
>> +            }
> 
> If you use _try_, you should probably also check whether it succeeded.

Oops, you are right, of course.

> 
> Anyway, I wonder whether it'd be better to just allocate this buffer
> once per job (the first time we get here, probably) to be of size
> BACKUP_MAX_BOUNCE_BUFFER and put it into BackupBlockJob.  (And maybe add
> a buf-size parameter similar to what the mirror jobs have.)
> 

Once per job will not work, as we may have several guest writes in parallel and therefore
several parallel copy-before-write operations. Or, if you mean writing an allocator based
on a once-allocated buffer like in mirror, I really dislike this idea, as we already have
an allocator: memalign/malloc/free, and it works well; no reason to invent a new one and
hardcode it into the block layer (see my answer to Eric on v2 of this patch for more info).

Or, if you mean only the copy requests generated by backup_loop, yes, we may keep only one
buffer for them, but:
1. it is not how it works now, so my patch is not a degradation in this case
2. I'm going to parallelize the backup loop too, like my series "qcow2: async handling of
   fragmented io", which will need several allocated buffers anyway.

> 
>>               ret = backup_cow_with_bounce_buffer(job, start, dirty_end,
>>                                                   is_write_notifier,
>> -                                                error_is_read, &bounce_buffer);
>> +                                                error_is_read, bounce_buffer);
>>           }
>>           if (ret < 0) {
>>               break;
>>
> 
> 


-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [PATCH v3 6/7] block/backup: teach backup_cow_with_bounce_buffer to copy more at once
  2019-08-12 15:47     ` Vladimir Sementsov-Ogievskiy
@ 2019-08-12 16:11       ` Max Reitz
  2019-08-12 16:37         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 24+ messages in thread
From: Max Reitz @ 2019-08-12 16:11 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, Denis Lunev, qemu-devel, armbru, stefanha, jsnow



On 12.08.19 17:47, Vladimir Sementsov-Ogievskiy wrote:
> 12.08.2019 18:10, Max Reitz wrote:
>> On 10.08.19 21:31, Vladimir Sementsov-Ogievskiy wrote:
>>> backup_cow_with_offload can transfer more than one cluster. Let
>>> backup_cow_with_bounce_buffer behave similarly. It reduces the number
>>> of IO requests, since there is no need to copy cluster by cluster.
>>>
>>> The logic around bounce_buffer allocation has changed: we can't just
>>> allocate a one-cluster-sized buffer to share across all iterations, and
>>> we also can't allocate a buffer of the full request length, as it may be
>>> too large. So BACKUP_MAX_BOUNCE_BUFFER is introduced. Finally, the
>>> allocation logic is to allocate a buffer sufficient to handle all
>>> remaining iterations at the point where we first need the buffer.
>>>
>>> Bonus: get rid of pointer-to-pointer.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   block/backup.c | 65 +++++++++++++++++++++++++++++++-------------------
>>>   1 file changed, 41 insertions(+), 24 deletions(-)
>>>
>>> diff --git a/block/backup.c b/block/backup.c
>>> index d482d93458..65f7212c85 100644
>>> --- a/block/backup.c
>>> +++ b/block/backup.c
>>> @@ -27,6 +27,7 @@
>>>   #include "qemu/error-report.h"
>>>   
>>>   #define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
>>> +#define BACKUP_MAX_BOUNCE_BUFFER (64 * 1024 * 1024)
>>>   
>>>   typedef struct CowRequest {
>>>       int64_t start_byte;
>>> @@ -98,44 +99,55 @@ static void cow_request_end(CowRequest *req)
>>>       qemu_co_queue_restart_all(&req->wait_queue);
>>>   }
>>>   
>>> -/* Copy range to target with a bounce buffer and return the bytes copied. If
>>> - * error occurred, return a negative error number */
>>> +/*
>>> + * Copy range to target with a bounce buffer and return the bytes copied. If
>>> + * error occurred, return a negative error number
>>> + *
>>> + * @bounce_buffer is assumed to enough to store
>>
>> s/to/to be/
>>
>>> + * MIN(BACKUP_MAX_BOUNCE_BUFFER, @end - @start) bytes
>>> + */
>>>   static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
>>>                                                         int64_t start,
>>>                                                         int64_t end,
>>>                                                         bool is_write_notifier,
>>>                                                         bool *error_is_read,
>>> -                                                      void **bounce_buffer)
>>> +                                                      void *bounce_buffer)
>>>   {
>>>       int ret;
>>>       BlockBackend *blk = job->common.blk;
>>> -    int nbytes;
>>> +    int nbytes, remaining_bytes;
>>>       int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
>>>   
>>>       assert(QEMU_IS_ALIGNED(start, job->cluster_size));
>>> -    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
>>> -    nbytes = MIN(job->cluster_size, job->len - start);
>>> -    if (!*bounce_buffer) {
>>> -        *bounce_buffer = blk_blockalign(blk, job->cluster_size);
>>> -    }
>>> +    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, end - start);
>>> +    nbytes = MIN(end - start, job->len - start);
>>>   
>>> -    ret = blk_co_pread(blk, start, nbytes, *bounce_buffer, read_flags);
>>> -    if (ret < 0) {
>>> -        trace_backup_do_cow_read_fail(job, start, ret);
>>> -        if (error_is_read) {
>>> -            *error_is_read = true;
>>> +
>>> +    remaining_bytes = nbytes;
>>> +    while (remaining_bytes) {
>>> +        int chunk = MIN(BACKUP_MAX_BOUNCE_BUFFER, remaining_bytes);
>>> +
>>> +        ret = blk_co_pread(blk, start, chunk, bounce_buffer, read_flags);
>>> +        if (ret < 0) {
>>> +            trace_backup_do_cow_read_fail(job, start, ret);
>>> +            if (error_is_read) {
>>> +                *error_is_read = true;
>>> +            }
>>> +            goto fail;
>>>           }
>>> -        goto fail;
>>> -    }
>>>   
>>> -    ret = blk_co_pwrite(job->target, start, nbytes, *bounce_buffer,
>>> -                        job->write_flags);
>>> -    if (ret < 0) {
>>> -        trace_backup_do_cow_write_fail(job, start, ret);
>>> -        if (error_is_read) {
>>> -            *error_is_read = false;
>>> +        ret = blk_co_pwrite(job->target, start, chunk, bounce_buffer,
>>> +                            job->write_flags);
>>> +        if (ret < 0) {
>>> +            trace_backup_do_cow_write_fail(job, start, ret);
>>> +            if (error_is_read) {
>>> +                *error_is_read = false;
>>> +            }
>>> +            goto fail;
>>>           }
>>> -        goto fail;
>>> +
>>> +        start += chunk;
>>> +        remaining_bytes -= chunk;
>>>       }
>>>   
>>>       return nbytes;
>>> @@ -301,9 +313,14 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>>>               }
>>>           }
>>>           if (!job->use_copy_range) {
>>> +            if (!bounce_buffer) {
>>> +                size_t len = MIN(BACKUP_MAX_BOUNCE_BUFFER,
>>> +                                 MAX(dirty_end - start, end - dirty_end));
>>> +                bounce_buffer = blk_try_blockalign(job->common.blk, len);
>>> +            }
>>
>> If you use _try_, you should probably also check whether it succeeded.
> 
> Oops, you are right, of course.
> 
>>
>> Anyway, I wonder whether it'd be better to just allocate this buffer
>> once per job (the first time we get here, probably) to be of size
>> BACKUP_MAX_BOUNCE_BUFFER and put it into BackupBlockJob.  (And maybe add
>> a buf-size parameter similar to what the mirror jobs have.)
>>
> 
> Once per job will not work, as we may have several guest writes in parallel and therefore
> several parallel copy-before-write operations.

Hm.  I'm not quite happy with that because if the guest just issues many
large discards in parallel, this means that qemu will allocate a large
amount of memory.

It would be nice if there was a simple way to keep track of the total
memory usage and let requests yield if they would exceed it.
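
(For illustration only: a rough, untested sketch of such a tracker using
QEMU's coroutine primitives; nothing like this is part of the series:)

    /* Hypothetical shared byte budget: a request acquires bytes before
     * allocating its bounce buffer and releases them when done; when
     * the budget is exhausted, it yields on a CoQueue until a release. */
    typedef struct MemBudget {
        uint64_t free_bytes;
        CoQueue wait_queue;     /* woken on every release */
    } MemBudget;

    static void coroutine_fn mem_budget_acquire(MemBudget *mb, uint64_t bytes)
    {
        while (mb->free_bytes < bytes) {
            qemu_co_queue_wait(&mb->wait_queue, NULL);
        }
        mb->free_bytes -= bytes;
    }

    static void mem_budget_release(MemBudget *mb, uint64_t bytes)
    {
        mb->free_bytes += bytes;
        qemu_co_queue_restart_all(&mb->wait_queue);
    }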

> Or, if you mean writing an allocator based
> on a once-allocated buffer like in mirror, I really dislike this idea, as we already have
> an allocator: memalign/malloc/free, and it works well; no reason to invent a new one and
> hardcode it into the block layer (see my answer to Eric on v2 of this patch for more info).

Well, at least it'd be something we can delay until blockdev-copy
arrives(TM).

Max

> Or, if you mean only the copy requests generated by backup_loop, yes, we may keep only one
> buffer for them, but:
> 1. it is not how it works now, so my patch is not a degradation in this case
> 2. I'm going to parallelize the backup loop too, like my series "qcow2: async handling of
>    fragmented io", which will need several allocated buffers anyway.
> 
>>
>>>               ret = backup_cow_with_bounce_buffer(job, start, dirty_end,
>>>                                                   is_write_notifier,
>>> -                                                error_is_read, &bounce_buffer);
>>> +                                                error_is_read, bounce_buffer);
>>>           }
>>>           if (ret < 0) {
>>>               break;
>>>
>>
>>
> 
> 





* Re: [Qemu-devel] [PATCH v3 6/7] block/backup: teach backup_cow_with_bounce_buffer to copy more at once
  2019-08-12 16:11       ` Max Reitz
@ 2019-08-12 16:37         ` Vladimir Sementsov-Ogievskiy
  2019-08-13 14:14           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 24+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-12 16:37 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: fam, kwolf, Denis Lunev, qemu-devel, armbru, stefanha, jsnow

12.08.2019 19:11, Max Reitz wrote:
> On 12.08.19 17:47, Vladimir Sementsov-Ogievskiy wrote:
>> 12.08.2019 18:10, Max Reitz wrote:
>>> On 10.08.19 21:31, Vladimir Sementsov-Ogievskiy wrote:
>>>> backup_cow_with_offload can transfer more than one cluster. Let
>>>> backup_cow_with_bounce_buffer behave similarly. It reduces the number
>>>> of IO requests, since there is no need to copy cluster by cluster.
>>>>
>>>> Logic around bounce_buffer allocation changed: we can't just allocate
>>>> a one-cluster-sized buffer to share for all iterations. We also can't
>>>> allocate a buffer of full-request length, as it may be too large, so
>>>> BACKUP_MAX_BOUNCE_BUFFER is introduced. And finally, allocation logic
>>>> is to allocate a buffer sufficient to handle all remaining iterations
>>>> at the point where we need the buffer for the first time.
>>>>
>>>> Bonus: get rid of pointer-to-pointer.
>>>>
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>> ---
>>>>    block/backup.c | 65 +++++++++++++++++++++++++++++++-------------------
>>>>    1 file changed, 41 insertions(+), 24 deletions(-)
>>>>
>>>> diff --git a/block/backup.c b/block/backup.c
>>>> index d482d93458..65f7212c85 100644
>>>> --- a/block/backup.c
>>>> +++ b/block/backup.c
>>>> @@ -27,6 +27,7 @@
>>>>    #include "qemu/error-report.h"
>>>>    
>>>>    #define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
>>>> +#define BACKUP_MAX_BOUNCE_BUFFER (64 * 1024 * 1024)
>>>>    
>>>>    typedef struct CowRequest {
>>>>        int64_t start_byte;
>>>> @@ -98,44 +99,55 @@ static void cow_request_end(CowRequest *req)
>>>>        qemu_co_queue_restart_all(&req->wait_queue);
>>>>    }
>>>>    
>>>> -/* Copy range to target with a bounce buffer and return the bytes copied. If
>>>> - * error occurred, return a negative error number */
>>>> +/*
>>>> + * Copy range to target with a bounce buffer and return the bytes copied. If
>>>> + * error occurred, return a negative error number
>>>> + *
>>>> + * @bounce_buffer is assumed to enough to store
>>>
>>> s/to/to be/
>>>
>>>> + * MIN(BACKUP_MAX_BOUNCE_BUFFER, @end - @start) bytes
>>>> + */
>>>>    static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
>>>>                                                          int64_t start,
>>>>                                                          int64_t end,
>>>>                                                          bool is_write_notifier,
>>>>                                                          bool *error_is_read,
>>>> -                                                      void **bounce_buffer)
>>>> +                                                      void *bounce_buffer)
>>>>    {
>>>>        int ret;
>>>>        BlockBackend *blk = job->common.blk;
>>>> -    int nbytes;
>>>> +    int nbytes, remaining_bytes;
>>>>        int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
>>>>    
>>>>        assert(QEMU_IS_ALIGNED(start, job->cluster_size));
>>>> -    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
>>>> -    nbytes = MIN(job->cluster_size, job->len - start);
>>>> -    if (!*bounce_buffer) {
>>>> -        *bounce_buffer = blk_blockalign(blk, job->cluster_size);
>>>> -    }
>>>> +    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, end - start);
>>>> +    nbytes = MIN(end - start, job->len - start);
>>>>    
>>>> -    ret = blk_co_pread(blk, start, nbytes, *bounce_buffer, read_flags);
>>>> -    if (ret < 0) {
>>>> -        trace_backup_do_cow_read_fail(job, start, ret);
>>>> -        if (error_is_read) {
>>>> -            *error_is_read = true;
>>>> +
>>>> +    remaining_bytes = nbytes;
>>>> +    while (remaining_bytes) {
>>>> +        int chunk = MIN(BACKUP_MAX_BOUNCE_BUFFER, remaining_bytes);
>>>> +
>>>> +        ret = blk_co_pread(blk, start, chunk, bounce_buffer, read_flags);
>>>> +        if (ret < 0) {
>>>> +            trace_backup_do_cow_read_fail(job, start, ret);
>>>> +            if (error_is_read) {
>>>> +                *error_is_read = true;
>>>> +            }
>>>> +            goto fail;
>>>>            }
>>>> -        goto fail;
>>>> -    }
>>>>    
>>>> -    ret = blk_co_pwrite(job->target, start, nbytes, *bounce_buffer,
>>>> -                        job->write_flags);
>>>> -    if (ret < 0) {
>>>> -        trace_backup_do_cow_write_fail(job, start, ret);
>>>> -        if (error_is_read) {
>>>> -            *error_is_read = false;
>>>> +        ret = blk_co_pwrite(job->target, start, chunk, bounce_buffer,
>>>> +                            job->write_flags);
>>>> +        if (ret < 0) {
>>>> +            trace_backup_do_cow_write_fail(job, start, ret);
>>>> +            if (error_is_read) {
>>>> +                *error_is_read = false;
>>>> +            }
>>>> +            goto fail;
>>>>            }
>>>> -        goto fail;
>>>> +
>>>> +        start += chunk;
>>>> +        remaining_bytes -= chunk;
>>>>        }
>>>>    
>>>>        return nbytes;
>>>> @@ -301,9 +313,14 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>>>>                }
>>>>            }
>>>>            if (!job->use_copy_range) {
>>>> +            if (!bounce_buffer) {
>>>> +                size_t len = MIN(BACKUP_MAX_BOUNCE_BUFFER,
>>>> +                                 MAX(dirty_end - start, end - dirty_end));
>>>> +                bounce_buffer = blk_try_blockalign(job->common.blk, len);
>>>> +            }
>>>
>>> If you use _try_, you should probably also check whether it succeeded.
>>
>> Oops, you are right, of course.
>>
>>>
>>> Anyway, I wonder whether it’d be better to just allocate this buffer
>>> once per job (the first time we get here, probably) to be of size
>>> BACKUP_MAX_BOUNCE_BUFFER and put it into BackupBlockJob.  (And maybe add
>>> a buf-size parameter similar to what the mirror jobs have.)
>>>
>>
>> Once per job will not work, as we may have several guest writes in parallel and therefore
>> several parallel copy-before-write operations.
> 
> Hm.  I’m not quite happy with that because if the guest just issues many
> large discards in parallel, this means that qemu will allocate a large
> amount of memory.
> 
> It would be nice if there was a simple way to keep track of the total
> memory usage and let requests yield if they would exceed it.

Agree, it should be fixed anyway.

> 
>> Or if you mean writing an allocator based
>> on a once-allocated buffer like in mirror, I really dislike this idea, as we already have
>> an allocator: memalign/malloc/free, and it works well; no reason to invent a new one and
>> hardcode it into the block layer (look at my answer to Eric on v2 of this patch for more info).
> 
> Well, at least it’d be something we can delay until blockdev-copy
> arrives(TM).
> 
> Max
> 
>> Or, if you mean only backup_loop-generated copy requests, yes, we may keep only one buffer for them,
>> but:
>> 1. it is not how it works now, so my patch is not a regression in this case
>> 2. I'm going to parallelize the backup loop too, like my series "qcow2: async handling of fragmented io",
>>      which will need several allocated buffers anyway.
>>
>>>
>>>>                ret = backup_cow_with_bounce_buffer(job, start, dirty_end,
>>>>                                                    is_write_notifier,
>>>> -                                                error_is_read, &bounce_buffer);
>>>> +                                                error_is_read, bounce_buffer);
>>>>            }
>>>>            if (ret < 0) {
>>>>                break;
>>>>
>>>
>>>
>>
>>
> 
> 


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v3 6/7] block/backup: teach backup_cow_with_bounce_buffer to copy more at once
  2019-08-12 16:37         ` Vladimir Sementsov-Ogievskiy
@ 2019-08-13 14:14           ` Vladimir Sementsov-Ogievskiy
  2019-08-13 14:23             ` Max Reitz
  0 siblings, 1 reply; 24+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-13 14:14 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: fam, kwolf, Denis Lunev, qemu-devel, armbru, stefanha, jsnow

12.08.2019 19:37, Vladimir Sementsov-Ogievskiy wrote:
> 12.08.2019 19:11, Max Reitz wrote:
>> On 12.08.19 17:47, Vladimir Sementsov-Ogievskiy wrote:
>>> 12.08.2019 18:10, Max Reitz wrote:
>>>> On 10.08.19 21:31, Vladimir Sementsov-Ogievskiy wrote:
>>>>> backup_cow_with_offload can transfer more than one cluster. Let
>>>>> backup_cow_with_bounce_buffer behave similarly. It reduces the number
>>>>> of IO requests, since there is no need to copy cluster by cluster.
>>>>>
>>>>> Logic around bounce_buffer allocation changed: we can't just allocate
>>>>> a one-cluster-sized buffer to share for all iterations. We also can't
>>>>> allocate a buffer of full-request length, as it may be too large, so
>>>>> BACKUP_MAX_BOUNCE_BUFFER is introduced. And finally, allocation logic
>>>>> is to allocate a buffer sufficient to handle all remaining iterations
>>>>> at the point where we need the buffer for the first time.
>>>>>
>>>>> Bonus: get rid of pointer-to-pointer.
>>>>>
>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>> ---
>>>>>    block/backup.c | 65 +++++++++++++++++++++++++++++++-------------------
>>>>>    1 file changed, 41 insertions(+), 24 deletions(-)
>>>>>
>>>>> diff --git a/block/backup.c b/block/backup.c
>>>>> index d482d93458..65f7212c85 100644
>>>>> --- a/block/backup.c
>>>>> +++ b/block/backup.c
>>>>> @@ -27,6 +27,7 @@
>>>>>    #include "qemu/error-report.h"
>>>>>    #define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
>>>>> +#define BACKUP_MAX_BOUNCE_BUFFER (64 * 1024 * 1024)
>>>>>    typedef struct CowRequest {
>>>>>        int64_t start_byte;
>>>>> @@ -98,44 +99,55 @@ static void cow_request_end(CowRequest *req)
>>>>>        qemu_co_queue_restart_all(&req->wait_queue);
>>>>>    }
>>>>> -/* Copy range to target with a bounce buffer and return the bytes copied. If
>>>>> - * error occurred, return a negative error number */
>>>>> +/*
>>>>> + * Copy range to target with a bounce buffer and return the bytes copied. If
>>>>> + * error occurred, return a negative error number
>>>>> + *
>>>>> + * @bounce_buffer is assumed to enough to store
>>>>
>>>> s/to/to be/
>>>>
>>>>> + * MIN(BACKUP_MAX_BOUNCE_BUFFER, @end - @start) bytes
>>>>> + */
>>>>>    static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
>>>>>                                                          int64_t start,
>>>>>                                                          int64_t end,
>>>>>                                                          bool is_write_notifier,
>>>>>                                                          bool *error_is_read,
>>>>> -                                                      void **bounce_buffer)
>>>>> +                                                      void *bounce_buffer)
>>>>>    {
>>>>>        int ret;
>>>>>        BlockBackend *blk = job->common.blk;
>>>>> -    int nbytes;
>>>>> +    int nbytes, remaining_bytes;
>>>>>        int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
>>>>>        assert(QEMU_IS_ALIGNED(start, job->cluster_size));
>>>>> -    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
>>>>> -    nbytes = MIN(job->cluster_size, job->len - start);
>>>>> -    if (!*bounce_buffer) {
>>>>> -        *bounce_buffer = blk_blockalign(blk, job->cluster_size);
>>>>> -    }
>>>>> +    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, end - start);
>>>>> +    nbytes = MIN(end - start, job->len - start);
>>>>> -    ret = blk_co_pread(blk, start, nbytes, *bounce_buffer, read_flags);
>>>>> -    if (ret < 0) {
>>>>> -        trace_backup_do_cow_read_fail(job, start, ret);
>>>>> -        if (error_is_read) {
>>>>> -            *error_is_read = true;
>>>>> +
>>>>> +    remaining_bytes = nbytes;
>>>>> +    while (remaining_bytes) {
>>>>> +        int chunk = MIN(BACKUP_MAX_BOUNCE_BUFFER, remaining_bytes);
>>>>> +
>>>>> +        ret = blk_co_pread(blk, start, chunk, bounce_buffer, read_flags);
>>>>> +        if (ret < 0) {
>>>>> +            trace_backup_do_cow_read_fail(job, start, ret);
>>>>> +            if (error_is_read) {
>>>>> +                *error_is_read = true;
>>>>> +            }
>>>>> +            goto fail;
>>>>>            }
>>>>> -        goto fail;
>>>>> -    }
>>>>> -    ret = blk_co_pwrite(job->target, start, nbytes, *bounce_buffer,
>>>>> -                        job->write_flags);
>>>>> -    if (ret < 0) {
>>>>> -        trace_backup_do_cow_write_fail(job, start, ret);
>>>>> -        if (error_is_read) {
>>>>> -            *error_is_read = false;
>>>>> +        ret = blk_co_pwrite(job->target, start, chunk, bounce_buffer,
>>>>> +                            job->write_flags);
>>>>> +        if (ret < 0) {
>>>>> +            trace_backup_do_cow_write_fail(job, start, ret);
>>>>> +            if (error_is_read) {
>>>>> +                *error_is_read = false;
>>>>> +            }
>>>>> +            goto fail;
>>>>>            }
>>>>> -        goto fail;
>>>>> +
>>>>> +        start += chunk;
>>>>> +        remaining_bytes -= chunk;
>>>>>        }
>>>>>        return nbytes;
>>>>> @@ -301,9 +313,14 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>>>>>                }
>>>>>            }
>>>>>            if (!job->use_copy_range) {
>>>>> +            if (!bounce_buffer) {
>>>>> +                size_t len = MIN(BACKUP_MAX_BOUNCE_BUFFER,
>>>>> +                                 MAX(dirty_end - start, end - dirty_end));
>>>>> +                bounce_buffer = blk_try_blockalign(job->common.blk, len);
>>>>> +            }
>>>>
>>>> If you use _try_, you should probably also check whether it succeeded.
>>>
>>> Oops, you are right, of course.
>>>
>>>>
>>>> Anyway, I wonder whether it’d be better to just allocate this buffer
>>>> once per job (the first time we get here, probably) to be of size
>>>> BACKUP_MAX_BOUNCE_BUFFER and put it into BackupBlockJob.  (And maybe add
>>>> a buf-size parameter similar to what the mirror jobs have.)
>>>>
>>>
>>> Once per job will not work, as we may have several guest writes in parallel and therefore
>>> several parallel copy-before-write operations.
>>
>> Hm.  I’m not quite happy with that because if the guest just issues many
>> large discards in parallel, this means that qemu will allocate a large
>> amount of memory.
>>
>> It would be nice if there was a simple way to keep track of the total
>> memory usage and let requests yield if they would exceed it.
> 
> Agree, it should be fixed anyway.
> 


But still..

Synchronous mirror allocates full-request buffers on guest write. Is it correct?

If we assume that it is correct to double the memory usage of guest operations, then for backup
the problem is only in write_zero and discard, where guest-assumed memory usage should be zero.
And if we should distinguish writes from write_zeroes and discard, it's better to postpone this
improvement until after the backup-top filter is merged.
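
For illustration, the distinction could look like this (just a sketch; the
triggered_by_payload_write flag and the backup_alloc_mem() helper are invented
names, the latter standing for whatever budget-gated allocator we end up with):

    if (triggered_by_payload_write) {
        /* The guest already holds len bytes for this request, so the
         * bounce buffer at most doubles memory that is already committed. */
        bounce_buffer = blk_try_blockalign(job->common.blk, len);
    } else {
        /* write-zeroes/discard: the guest holds nothing for this request,
         * so gate the allocation through the memory budget. */
        bounce_buffer = backup_alloc_mem(job, len);
    }
    if (!bounce_buffer) {
        return -ENOMEM;
    }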


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v3 6/7] block/backup: teach backup_cow_with_bounce_buffer to copy more at once
  2019-08-13 14:14           ` Vladimir Sementsov-Ogievskiy
@ 2019-08-13 14:23             ` Max Reitz
  2019-08-13 14:39               ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 24+ messages in thread
From: Max Reitz @ 2019-08-13 14:23 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, Denis Lunev, qemu-devel, armbru, stefanha, jsnow


[-- Attachment #1.1: Type: text/plain, Size: 8600 bytes --]

On 13.08.19 16:14, Vladimir Sementsov-Ogievskiy wrote:
> 12.08.2019 19:37, Vladimir Sementsov-Ogievskiy wrote:
>> 12.08.2019 19:11, Max Reitz wrote:
>>> On 12.08.19 17:47, Vladimir Sementsov-Ogievskiy wrote:
>>>> 12.08.2019 18:10, Max Reitz wrote:
>>>>> On 10.08.19 21:31, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> backup_cow_with_offload can transfer more than one cluster. Let
>>>>>> backup_cow_with_bounce_buffer behave similarly. It reduces the number
>>>>>> of IO requests, since there is no need to copy cluster by cluster.
>>>>>>
>>>>>> Logic around bounce_buffer allocation changed: we can't just allocate
>>>>>> a one-cluster-sized buffer to share for all iterations. We also can't
>>>>>> allocate a buffer of full-request length, as it may be too large, so
>>>>>> BACKUP_MAX_BOUNCE_BUFFER is introduced. And finally, allocation logic
>>>>>> is to allocate a buffer sufficient to handle all remaining iterations
>>>>>> at the point where we need the buffer for the first time.
>>>>>>
>>>>>> Bonus: get rid of pointer-to-pointer.
>>>>>>
>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>>> ---
>>>>>>    block/backup.c | 65 +++++++++++++++++++++++++++++++-------------------
>>>>>>    1 file changed, 41 insertions(+), 24 deletions(-)
>>>>>>
>>>>>> diff --git a/block/backup.c b/block/backup.c
>>>>>> index d482d93458..65f7212c85 100644
>>>>>> --- a/block/backup.c
>>>>>> +++ b/block/backup.c
>>>>>> @@ -27,6 +27,7 @@
>>>>>>    #include "qemu/error-report.h"
>>>>>>    #define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
>>>>>> +#define BACKUP_MAX_BOUNCE_BUFFER (64 * 1024 * 1024)
>>>>>>    typedef struct CowRequest {
>>>>>>        int64_t start_byte;
>>>>>> @@ -98,44 +99,55 @@ static void cow_request_end(CowRequest *req)
>>>>>>        qemu_co_queue_restart_all(&req->wait_queue);
>>>>>>    }
>>>>>> -/* Copy range to target with a bounce buffer and return the bytes copied. If
>>>>>> - * error occurred, return a negative error number */
>>>>>> +/*
>>>>>> + * Copy range to target with a bounce buffer and return the bytes copied. If
>>>>>> + * error occurred, return a negative error number
>>>>>> + *
>>>>>> + * @bounce_buffer is assumed to enough to store
>>>>>
>>>>> s/to/to be/
>>>>>
>>>>>> + * MIN(BACKUP_MAX_BOUNCE_BUFFER, @end - @start) bytes
>>>>>> + */
>>>>>>    static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
>>>>>>                                                          int64_t start,
>>>>>>                                                          int64_t end,
>>>>>>                                                          bool is_write_notifier,
>>>>>>                                                          bool *error_is_read,
>>>>>> -                                                      void **bounce_buffer)
>>>>>> +                                                      void *bounce_buffer)
>>>>>>    {
>>>>>>        int ret;
>>>>>>        BlockBackend *blk = job->common.blk;
>>>>>> -    int nbytes;
>>>>>> +    int nbytes, remaining_bytes;
>>>>>>        int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
>>>>>>        assert(QEMU_IS_ALIGNED(start, job->cluster_size));
>>>>>> -    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
>>>>>> -    nbytes = MIN(job->cluster_size, job->len - start);
>>>>>> -    if (!*bounce_buffer) {
>>>>>> -        *bounce_buffer = blk_blockalign(blk, job->cluster_size);
>>>>>> -    }
>>>>>> +    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, end - start);
>>>>>> +    nbytes = MIN(end - start, job->len - start);
>>>>>> -    ret = blk_co_pread(blk, start, nbytes, *bounce_buffer, read_flags);
>>>>>> -    if (ret < 0) {
>>>>>> -        trace_backup_do_cow_read_fail(job, start, ret);
>>>>>> -        if (error_is_read) {
>>>>>> -            *error_is_read = true;
>>>>>> +
>>>>>> +    remaining_bytes = nbytes;
>>>>>> +    while (remaining_bytes) {
>>>>>> +        int chunk = MIN(BACKUP_MAX_BOUNCE_BUFFER, remaining_bytes);
>>>>>> +
>>>>>> +        ret = blk_co_pread(blk, start, chunk, bounce_buffer, read_flags);
>>>>>> +        if (ret < 0) {
>>>>>> +            trace_backup_do_cow_read_fail(job, start, ret);
>>>>>> +            if (error_is_read) {
>>>>>> +                *error_is_read = true;
>>>>>> +            }
>>>>>> +            goto fail;
>>>>>>            }
>>>>>> -        goto fail;
>>>>>> -    }
>>>>>> -    ret = blk_co_pwrite(job->target, start, nbytes, *bounce_buffer,
>>>>>> -                        job->write_flags);
>>>>>> -    if (ret < 0) {
>>>>>> -        trace_backup_do_cow_write_fail(job, start, ret);
>>>>>> -        if (error_is_read) {
>>>>>> -            *error_is_read = false;
>>>>>> +        ret = blk_co_pwrite(job->target, start, chunk, bounce_buffer,
>>>>>> +                            job->write_flags);
>>>>>> +        if (ret < 0) {
>>>>>> +            trace_backup_do_cow_write_fail(job, start, ret);
>>>>>> +            if (error_is_read) {
>>>>>> +                *error_is_read = false;
>>>>>> +            }
>>>>>> +            goto fail;
>>>>>>            }
>>>>>> -        goto fail;
>>>>>> +
>>>>>> +        start += chunk;
>>>>>> +        remaining_bytes -= chunk;
>>>>>>        }
>>>>>>        return nbytes;
>>>>>> @@ -301,9 +313,14 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>>>>>>                }
>>>>>>            }
>>>>>>            if (!job->use_copy_range) {
>>>>>> +            if (!bounce_buffer) {
>>>>>> +                size_t len = MIN(BACKUP_MAX_BOUNCE_BUFFER,
>>>>>> +                                 MAX(dirty_end - start, end - dirty_end));
>>>>>> +                bounce_buffer = blk_try_blockalign(job->common.blk, len);
>>>>>> +            }
>>>>>
>>>>> If you use _try_, you should probably also check whether it succeeded.
>>>>
>>>> Oops, you are right, of course.
>>>>
>>>>>
>>>>> Anyway, I wonder whether it’d be better to just allocate this buffer
>>>>> once per job (the first time we get here, probably) to be of size
>>>>> BACKUP_MAX_BOUNCE_BUFFER and put it into BackupBlockJob.  (And maybe add
>>>>> a buf-size parameter similar to what the mirror jobs have.)
>>>>>
>>>>
>>>> Once per job will not work, as we may have several guest writes in parallel and therefore
>>>> several parallel copy-before-write operations.
>>>
>>> Hm.  I’m not quite happy with that because if the guest just issues many
>>> large discards in parallel, this means that qemu will allocate a large
>>> amount of memory.
>>>
>>> It would be nice if there was a simple way to keep track of the total
>>> memory usage and let requests yield if they would exceed it.
>>
>> Agree, it should be fixed anyway.
>>
> 
> 
> But still..
> 
> Synchronous mirror allocates full-request buffers on guest write. Is it correct?
> 
> If we assume that it is correct to double the memory usage of guest operations, then for backup
> the problem is only in write_zero and discard, where guest-assumed memory usage should be zero.

Well, but that is the problem.  I didn’t say anything in v2, because I
only thought of normal writes and I found it fine to double the memory
usage there (a guest won’t issue huge write requests in parallel).  But
discard/write-zeroes are a different matter.

> And if we should distinguish writes from write_zeroes and discard, it's better to postpone this
> improvement until after the backup-top filter is merged.

But do you need to distinguish it?  Why not just keep track of memory
usage and put the current I/O coroutine to sleep in a CoQueue or
something, and wake that up at the end of backup_do_cow()?
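
Roughly like this (a sketch only; mem_in_use, mem_wait_queue and the limit are
the same hypothetical fields as in the earlier sketch):

    /* at the top of backup_do_cow(), before allocating the bounce buffer */
    while (job->mem_in_use + len > BACKUP_TOTAL_MEM_LIMIT) {
        qemu_co_queue_wait(&job->mem_wait_queue, NULL);
    }
    job->mem_in_use += len;

    /* ... do the copy ... */

    /* at the end of backup_do_cow(): release the budget and wake waiters */
    job->mem_in_use -= len;
    qemu_co_queue_restart_all(&job->mem_wait_queue);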

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v3 6/7] block/backup: teach backup_cow_with_bounce_buffer to copy more at once
  2019-08-13 14:23             ` Max Reitz
@ 2019-08-13 14:39               ` Vladimir Sementsov-Ogievskiy
  2019-08-13 14:57                 ` Max Reitz
  0 siblings, 1 reply; 24+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-13 14:39 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: fam, kwolf, Denis Lunev, qemu-devel, armbru, stefanha, jsnow

13.08.2019 17:23, Max Reitz wrote:
> On 13.08.19 16:14, Vladimir Sementsov-Ogievskiy wrote:
>> 12.08.2019 19:37, Vladimir Sementsov-Ogievskiy wrote:
>>> 12.08.2019 19:11, Max Reitz wrote:
>>>> On 12.08.19 17:47, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 12.08.2019 18:10, Max Reitz wrote:
>>>>>> On 10.08.19 21:31, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>> backup_cow_with_offload can transfer more than one cluster. Let
>>>>>>> backup_cow_with_bounce_buffer behave similarly. It reduces the number
>>>>>>> of IO requests, since there is no need to copy cluster by cluster.
>>>>>>>
>>>>>>> Logic around bounce_buffer allocation changed: we can't just allocate
>>>>>>> a one-cluster-sized buffer to share for all iterations. We also can't
>>>>>>> allocate a buffer of full-request length, as it may be too large, so
>>>>>>> BACKUP_MAX_BOUNCE_BUFFER is introduced. And finally, allocation logic
>>>>>>> is to allocate a buffer sufficient to handle all remaining iterations
>>>>>>> at the point where we need the buffer for the first time.
>>>>>>>
>>>>>>> Bonus: get rid of pointer-to-pointer.
>>>>>>>
>>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>>>> ---
>>>>>>>     block/backup.c | 65 +++++++++++++++++++++++++++++++-------------------
>>>>>>>     1 file changed, 41 insertions(+), 24 deletions(-)
>>>>>>>
>>>>>>> diff --git a/block/backup.c b/block/backup.c
>>>>>>> index d482d93458..65f7212c85 100644
>>>>>>> --- a/block/backup.c
>>>>>>> +++ b/block/backup.c
>>>>>>> @@ -27,6 +27,7 @@
>>>>>>>     #include "qemu/error-report.h"
>>>>>>>     #define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
>>>>>>> +#define BACKUP_MAX_BOUNCE_BUFFER (64 * 1024 * 1024)
>>>>>>>     typedef struct CowRequest {
>>>>>>>         int64_t start_byte;
>>>>>>> @@ -98,44 +99,55 @@ static void cow_request_end(CowRequest *req)
>>>>>>>         qemu_co_queue_restart_all(&req->wait_queue);
>>>>>>>     }
>>>>>>> -/* Copy range to target with a bounce buffer and return the bytes copied. If
>>>>>>> - * error occurred, return a negative error number */
>>>>>>> +/*
>>>>>>> + * Copy range to target with a bounce buffer and return the bytes copied. If
>>>>>>> + * error occurred, return a negative error number
>>>>>>> + *
>>>>>>> + * @bounce_buffer is assumed to enough to store
>>>>>>
>>>>>> s/to/to be/
>>>>>>
>>>>>>> + * MIN(BACKUP_MAX_BOUNCE_BUFFER, @end - @start) bytes
>>>>>>> + */
>>>>>>>     static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
>>>>>>>                                                           int64_t start,
>>>>>>>                                                           int64_t end,
>>>>>>>                                                           bool is_write_notifier,
>>>>>>>                                                           bool *error_is_read,
>>>>>>> -                                                      void **bounce_buffer)
>>>>>>> +                                                      void *bounce_buffer)
>>>>>>>     {
>>>>>>>         int ret;
>>>>>>>         BlockBackend *blk = job->common.blk;
>>>>>>> -    int nbytes;
>>>>>>> +    int nbytes, remaining_bytes;
>>>>>>>         int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
>>>>>>>         assert(QEMU_IS_ALIGNED(start, job->cluster_size));
>>>>>>> -    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
>>>>>>> -    nbytes = MIN(job->cluster_size, job->len - start);
>>>>>>> -    if (!*bounce_buffer) {
>>>>>>> -        *bounce_buffer = blk_blockalign(blk, job->cluster_size);
>>>>>>> -    }
>>>>>>> +    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, end - start);
>>>>>>> +    nbytes = MIN(end - start, job->len - start);
>>>>>>> -    ret = blk_co_pread(blk, start, nbytes, *bounce_buffer, read_flags);
>>>>>>> -    if (ret < 0) {
>>>>>>> -        trace_backup_do_cow_read_fail(job, start, ret);
>>>>>>> -        if (error_is_read) {
>>>>>>> -            *error_is_read = true;
>>>>>>> +
>>>>>>> +    remaining_bytes = nbytes;
>>>>>>> +    while (remaining_bytes) {
>>>>>>> +        int chunk = MIN(BACKUP_MAX_BOUNCE_BUFFER, remaining_bytes);
>>>>>>> +
>>>>>>> +        ret = blk_co_pread(blk, start, chunk, bounce_buffer, read_flags);
>>>>>>> +        if (ret < 0) {
>>>>>>> +            trace_backup_do_cow_read_fail(job, start, ret);
>>>>>>> +            if (error_is_read) {
>>>>>>> +                *error_is_read = true;
>>>>>>> +            }
>>>>>>> +            goto fail;
>>>>>>>             }
>>>>>>> -        goto fail;
>>>>>>> -    }
>>>>>>> -    ret = blk_co_pwrite(job->target, start, nbytes, *bounce_buffer,
>>>>>>> -                        job->write_flags);
>>>>>>> -    if (ret < 0) {
>>>>>>> -        trace_backup_do_cow_write_fail(job, start, ret);
>>>>>>> -        if (error_is_read) {
>>>>>>> -            *error_is_read = false;
>>>>>>> +        ret = blk_co_pwrite(job->target, start, chunk, bounce_buffer,
>>>>>>> +                            job->write_flags);
>>>>>>> +        if (ret < 0) {
>>>>>>> +            trace_backup_do_cow_write_fail(job, start, ret);
>>>>>>> +            if (error_is_read) {
>>>>>>> +                *error_is_read = false;
>>>>>>> +            }
>>>>>>> +            goto fail;
>>>>>>>             }
>>>>>>> -        goto fail;
>>>>>>> +
>>>>>>> +        start += chunk;
>>>>>>> +        remaining_bytes -= chunk;
>>>>>>>         }
>>>>>>>         return nbytes;
>>>>>>> @@ -301,9 +313,14 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>>>>>>>                 }
>>>>>>>             }
>>>>>>>             if (!job->use_copy_range) {
>>>>>>> +            if (!bounce_buffer) {
>>>>>>> +                size_t len = MIN(BACKUP_MAX_BOUNCE_BUFFER,
>>>>>>> +                                 MAX(dirty_end - start, end - dirty_end));
>>>>>>> +                bounce_buffer = blk_try_blockalign(job->common.blk, len);
>>>>>>> +            }
>>>>>>
>>>>>> If you use _try_, you should probably also check whether it succeeded.
>>>>>
>>>>> Oops, you are right, of course.
>>>>>
>>>>>>
>>>>>> Anyway, I wonder whether it’d be better to just allocate this buffer
>>>>>> once per job (the first time we get here, probably) to be of size
>>>>>> BACKUP_MAX_BOUNCE_BUFFER and put it into BackupBlockJob.  (And maybe add
>>>>>> a buf-size parameter similar to what the mirror jobs have.)
>>>>>>
>>>>>
>>>>> Once per job will not work, as we may have several guest writes in parallel and therefore
>>>>> several parallel copy-before-write operations.
>>>>
>>>> Hm.  I’m not quite happy with that because if the guest just issues many
>>>> large discards in parallel, this means that qemu will allocate a large
>>>> amount of memory.
>>>>
>>>> It would be nice if there was a simple way to keep track of the total
>>>> memory usage and let requests yield if they would exceed it.
>>>
>>> Agree, it should be fixed anyway.
>>>
>>
>>
>> But still..
>>
>> Synchronous mirror allocates full-request buffers on guest write. Is it correct?
>>
>> If we assume that it is correct to double the memory usage of guest operations, then for backup
>> the problem is only in write_zero and discard, where guest-assumed memory usage should be zero.
> 
> Well, but that is the problem.  I didn’t say anything in v2, because I
> only thought of normal writes and I found it fine to double the memory
> usage there (a guest won’t issue huge write requests in parallel).  But
> discard/write-zeroes are a different matter.
> 
>> And if we should distinguish writes from write_zeroes and discard, it's better to postpone this
>> improvement until after the backup-top filter is merged.
> 
> But do you need to distinguish it?  Why not just keep track of memory
> usage and put the current I/O coroutine to sleep in a CoQueue or
> something, and wake that up at the end of backup_do_cow()?
> 

1. Because if we _can_ allow doubling of memory, it's more effective not to restrict allocations on
guest writes. It just seems to be the more effective technique.

2. Anyway, I'd allow some always-available size to allocate - let it be one cluster, which corresponds
to the current behavior and prevents guest I/O hangs in the worst case. So I mean: if we have enough
memory, allow an individual CBW operation to allocate the whole buffer, and if we have no extra memory,
allow it to allocate one cluster.
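
As a sketch of that fallback (again with the hypothetical budget fields; one
cluster stays the guaranteed minimum, so a CBW request can always proceed):

    size_t len = MIN(BACKUP_MAX_BOUNCE_BUFFER,
                     MAX(dirty_end - start, end - dirty_end));

    if (job->mem_in_use + len > BACKUP_TOTAL_MEM_LIMIT) {
        /* Over budget: fall back to the always-available minimum, which
         * matches the pre-series one-cluster-at-a-time behavior. */
        len = job->cluster_size;
    }
    job->mem_in_use += len;
    bounce_buffer = blk_try_blockalign(job->common.blk, len);
    if (!bounce_buffer) {
        job->mem_in_use -= len;
        ret = -ENOMEM;
    }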

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v3 6/7] block/backup: teach backup_cow_with_bounce_buffer to copy more at once
  2019-08-13 14:39               ` Vladimir Sementsov-Ogievskiy
@ 2019-08-13 14:57                 ` Max Reitz
  2019-08-13 15:00                   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 24+ messages in thread
From: Max Reitz @ 2019-08-13 14:57 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, Denis Lunev, qemu-devel, armbru, stefanha, jsnow


[-- Attachment #1.1: Type: text/plain, Size: 10024 bytes --]

On 13.08.19 16:39, Vladimir Sementsov-Ogievskiy wrote:
> 13.08.2019 17:23, Max Reitz wrote:
>> On 13.08.19 16:14, Vladimir Sementsov-Ogievskiy wrote:
>>> 12.08.2019 19:37, Vladimir Sementsov-Ogievskiy wrote:
>>>> 12.08.2019 19:11, Max Reitz wrote:
>>>>> On 12.08.19 17:47, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> 12.08.2019 18:10, Max Reitz wrote:
>>>>>>> On 10.08.19 21:31, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>>> backup_cow_with_offload can transfer more than one cluster. Let
>>>>>>>> backup_cow_with_bounce_buffer behave similarly. It reduces the number
>>>>>>>> of IO requests, since there is no need to copy cluster by cluster.
>>>>>>>>
>>>>>>>> Logic around bounce_buffer allocation changed: we can't just allocate
>>>>>>>> a one-cluster-sized buffer to share for all iterations. We also can't
>>>>>>>> allocate a buffer of full-request length, as it may be too large, so
>>>>>>>> BACKUP_MAX_BOUNCE_BUFFER is introduced. And finally, allocation logic
>>>>>>>> is to allocate a buffer sufficient to handle all remaining iterations
>>>>>>>> at the point where we need the buffer for the first time.
>>>>>>>>
>>>>>>>> Bonus: get rid of pointer-to-pointer.
>>>>>>>>
>>>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>>>>> ---
>>>>>>>>     block/backup.c | 65 +++++++++++++++++++++++++++++++-------------------
>>>>>>>>     1 file changed, 41 insertions(+), 24 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/block/backup.c b/block/backup.c
>>>>>>>> index d482d93458..65f7212c85 100644
>>>>>>>> --- a/block/backup.c
>>>>>>>> +++ b/block/backup.c
>>>>>>>> @@ -27,6 +27,7 @@
>>>>>>>>     #include "qemu/error-report.h"
>>>>>>>>     #define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
>>>>>>>> +#define BACKUP_MAX_BOUNCE_BUFFER (64 * 1024 * 1024)
>>>>>>>>     typedef struct CowRequest {
>>>>>>>>         int64_t start_byte;
>>>>>>>> @@ -98,44 +99,55 @@ static void cow_request_end(CowRequest *req)
>>>>>>>>         qemu_co_queue_restart_all(&req->wait_queue);
>>>>>>>>     }
>>>>>>>> -/* Copy range to target with a bounce buffer and return the bytes copied. If
>>>>>>>> - * error occurred, return a negative error number */
>>>>>>>> +/*
>>>>>>>> + * Copy range to target with a bounce buffer and return the bytes copied. If
>>>>>>>> + * error occurred, return a negative error number
>>>>>>>> + *
>>>>>>>> + * @bounce_buffer is assumed to enough to store
>>>>>>>
>>>>>>> s/to/to be/
>>>>>>>
>>>>>>>> + * MIN(BACKUP_MAX_BOUNCE_BUFFER, @end - @start) bytes
>>>>>>>> + */
>>>>>>>>     static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
>>>>>>>>                                                           int64_t start,
>>>>>>>>                                                           int64_t end,
>>>>>>>>                                                           bool is_write_notifier,
>>>>>>>>                                                           bool *error_is_read,
>>>>>>>> -                                                      void **bounce_buffer)
>>>>>>>> +                                                      void *bounce_buffer)
>>>>>>>>     {
>>>>>>>>         int ret;
>>>>>>>>         BlockBackend *blk = job->common.blk;
>>>>>>>> -    int nbytes;
>>>>>>>> +    int nbytes, remaining_bytes;
>>>>>>>>         int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
>>>>>>>>         assert(QEMU_IS_ALIGNED(start, job->cluster_size));
>>>>>>>> -    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
>>>>>>>> -    nbytes = MIN(job->cluster_size, job->len - start);
>>>>>>>> -    if (!*bounce_buffer) {
>>>>>>>> -        *bounce_buffer = blk_blockalign(blk, job->cluster_size);
>>>>>>>> -    }
>>>>>>>> +    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, end - start);
>>>>>>>> +    nbytes = MIN(end - start, job->len - start);
>>>>>>>> -    ret = blk_co_pread(blk, start, nbytes, *bounce_buffer, read_flags);
>>>>>>>> -    if (ret < 0) {
>>>>>>>> -        trace_backup_do_cow_read_fail(job, start, ret);
>>>>>>>> -        if (error_is_read) {
>>>>>>>> -            *error_is_read = true;
>>>>>>>> +
>>>>>>>> +    remaining_bytes = nbytes;
>>>>>>>> +    while (remaining_bytes) {
>>>>>>>> +        int chunk = MIN(BACKUP_MAX_BOUNCE_BUFFER, remaining_bytes);
>>>>>>>> +
>>>>>>>> +        ret = blk_co_pread(blk, start, chunk, bounce_buffer, read_flags);
>>>>>>>> +        if (ret < 0) {
>>>>>>>> +            trace_backup_do_cow_read_fail(job, start, ret);
>>>>>>>> +            if (error_is_read) {
>>>>>>>> +                *error_is_read = true;
>>>>>>>> +            }
>>>>>>>> +            goto fail;
>>>>>>>>             }
>>>>>>>> -        goto fail;
>>>>>>>> -    }
>>>>>>>> -    ret = blk_co_pwrite(job->target, start, nbytes, *bounce_buffer,
>>>>>>>> -                        job->write_flags);
>>>>>>>> -    if (ret < 0) {
>>>>>>>> -        trace_backup_do_cow_write_fail(job, start, ret);
>>>>>>>> -        if (error_is_read) {
>>>>>>>> -            *error_is_read = false;
>>>>>>>> +        ret = blk_co_pwrite(job->target, start, chunk, bounce_buffer,
>>>>>>>> +                            job->write_flags);
>>>>>>>> +        if (ret < 0) {
>>>>>>>> +            trace_backup_do_cow_write_fail(job, start, ret);
>>>>>>>> +            if (error_is_read) {
>>>>>>>> +                *error_is_read = false;
>>>>>>>> +            }
>>>>>>>> +            goto fail;
>>>>>>>>             }
>>>>>>>> -        goto fail;
>>>>>>>> +
>>>>>>>> +        start += chunk;
>>>>>>>> +        remaining_bytes -= chunk;
>>>>>>>>         }
>>>>>>>>         return nbytes;
>>>>>>>> @@ -301,9 +313,14 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>>>>>>>>                 }
>>>>>>>>             }
>>>>>>>>             if (!job->use_copy_range) {
>>>>>>>> +            if (!bounce_buffer) {
>>>>>>>> +                size_t len = MIN(BACKUP_MAX_BOUNCE_BUFFER,
>>>>>>>> +                                 MAX(dirty_end - start, end - dirty_end));
>>>>>>>> +                bounce_buffer = blk_try_blockalign(job->common.blk, len);
>>>>>>>> +            }
>>>>>>>
>>>>>>> If you use _try_, you should probably also check whether it succeeded.
>>>>>>
>>>>>> Oops, you are right, of course.
>>>>>>
>>>>>>>
>>>>>>> Anyway, I wonder whether it’d be better to just allocate this buffer
>>>>>>> once per job (the first time we get here, probably) to be of size
>>>>>>> BACKUP_MAX_BOUNCE_BUFFER and put it into BackupBlockJob.  (And maybe add
>>>>>>> a buf-size parameter similar to what the mirror jobs have.)
>>>>>>>
>>>>>>
>>>>>> Once per job will not work, as we may have several guest writes in parallel and therefore
>>>>>> several parallel copy-before-write operations.
>>>>>
>>>>> Hm.  I’m not quite happy with that because if the guest just issues many
>>>>> large discards in parallel, this means that qemu will allocate a large
>>>>> amount of memory.
>>>>>
>>>>> It would be nice if there was a simple way to keep track of the total
>>>>> memory usage and let requests yield if they would exceed it.
>>>>
>>>> Agree, it should be fixed anyway.
>>>>
>>>
>>>
>>> But still..
>>>
>>> Synchronous mirror allocates full-request buffers on guest write. Is it correct?
>>>
>>> If we assume that it is correct to double the memory usage of guest operations, then for backup
>>> the problem is only in write_zero and discard, where guest-assumed memory usage should be zero.
>>
>> Well, but that is the problem.  I didn’t say anything in v2, because I
>> only thought of normal writes and I found it fine to double the memory
>> usage there (a guest won’t issue huge write requests in parallel).  But
>> discard/write-zeroes are a different matter.
>>
>>> And if we should distinguish writes from write_zeroes and discard, it's better to postpone this
>>> improvement until after the backup-top filter is merged.
>>
>> But do you need to distinguish it?  Why not just keep track of memory
>> usage and put the current I/O coroutine to sleep in a CoQueue or
>> something, and wake that up at the end of backup_do_cow()?
>>
> 
> 1. Because if we _can_ allow doubling of memory, it's more effective not to restrict allocations on
> guest writes. It just seems to be the more effective technique.

But the problem with backup and zero writes/discards is that the memory
is not doubled.  The request doesn’t need any memory, but the CBW
operation does, and maybe lots of it.

So the guest may issue many zero writes/discards in parallel and thus
exhaust memory on the host.
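
(For instance, with BACKUP_MAX_BOUNCE_BUFFER at 64 MiB, sixteen large discards
in flight could pin sixteen separate bounce buffers, i.e. 16 * 64 MiB = 1 GiB
of host memory, even though the guest committed no memory at all for those
requests.)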

> 2. Anyway, I'd allow some always-available size to allocate - let it be one cluster, which corresponds
> to the current behavior and prevents guest I/O hangs in the worst case.

The guest would only hang if we have to copy more than e.g. 64 MB at
a time.  At which point I think it’s not unreasonable to sequentialize
requests.

Max

> So I mean, if we have enough memory allow
> individual CBW operation to allocate the whole buffer, and if we have no extra memory allow to allocate one
> cluster.
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v3 6/7] block/backup: teach backup_cow_with_bounce_buffer to copy more at once
  2019-08-13 14:57                 ` Max Reitz
@ 2019-08-13 15:00                   ` Vladimir Sementsov-Ogievskiy
  2019-08-13 15:02                     ` Max Reitz
  0 siblings, 1 reply; 24+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-13 15:00 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: fam, kwolf, Denis Lunev, qemu-devel, armbru, stefanha, jsnow

13.08.2019 17:57, Max Reitz wrote:
> On 13.08.19 16:39, Vladimir Sementsov-Ogievskiy wrote:
>> 13.08.2019 17:23, Max Reitz wrote:
>>> On 13.08.19 16:14, Vladimir Sementsov-Ogievskiy wrote:
>>>> 12.08.2019 19:37, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 12.08.2019 19:11, Max Reitz wrote:
>>>>>> On 12.08.19 17:47, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>> 12.08.2019 18:10, Max Reitz wrote:
>>>>>>>> On 10.08.19 21:31, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>>>> backup_cow_with_offload can transfer more than one cluster. Let
>>>>>>>>> backup_cow_with_bounce_buffer behave similarly. It reduces the number
>>>>>>>>> of IO requests, since there is no need to copy cluster by cluster.
>>>>>>>>>
>>>>>>>>> Logic around bounce_buffer allocation changed: we can't just allocate
>>>>>>>>> a one-cluster-sized buffer to share for all iterations. We also can't
>>>>>>>>> allocate a buffer of full-request length, as it may be too large, so
>>>>>>>>> BACKUP_MAX_BOUNCE_BUFFER is introduced. And finally, allocation logic
>>>>>>>>> is to allocate a buffer sufficient to handle all remaining iterations
>>>>>>>>> at the point where we need the buffer for the first time.
>>>>>>>>>
>>>>>>>>> Bonus: get rid of pointer-to-pointer.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>>>>>> ---
>>>>>>>>>      block/backup.c | 65 +++++++++++++++++++++++++++++++-------------------
>>>>>>>>>      1 file changed, 41 insertions(+), 24 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/block/backup.c b/block/backup.c
>>>>>>>>> index d482d93458..65f7212c85 100644
>>>>>>>>> --- a/block/backup.c
>>>>>>>>> +++ b/block/backup.c
>>>>>>>>> @@ -27,6 +27,7 @@
>>>>>>>>>      #include "qemu/error-report.h"
>>>>>>>>>      #define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
>>>>>>>>> +#define BACKUP_MAX_BOUNCE_BUFFER (64 * 1024 * 1024)
>>>>>>>>>      typedef struct CowRequest {
>>>>>>>>>          int64_t start_byte;
>>>>>>>>> @@ -98,44 +99,55 @@ static void cow_request_end(CowRequest *req)
>>>>>>>>>          qemu_co_queue_restart_all(&req->wait_queue);
>>>>>>>>>      }
>>>>>>>>> -/* Copy range to target with a bounce buffer and return the bytes copied. If
>>>>>>>>> - * error occurred, return a negative error number */
>>>>>>>>> +/*
>>>>>>>>> + * Copy range to target with a bounce buffer and return the bytes copied. If
>>>>>>>>> + * error occurred, return a negative error number
>>>>>>>>> + *
>>>>>>>>> + * @bounce_buffer is assumed to enough to store
>>>>>>>>
>>>>>>>> s/to/to be/
>>>>>>>>
>>>>>>>>> + * MIN(BACKUP_MAX_BOUNCE_BUFFER, @end - @start) bytes
>>>>>>>>> + */
>>>>>>>>>      static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
>>>>>>>>>                                                            int64_t start,
>>>>>>>>>                                                            int64_t end,
>>>>>>>>>                                                            bool is_write_notifier,
>>>>>>>>>                                                            bool *error_is_read,
>>>>>>>>> -                                                      void **bounce_buffer)
>>>>>>>>> +                                                      void *bounce_buffer)
>>>>>>>>>      {
>>>>>>>>>          int ret;
>>>>>>>>>          BlockBackend *blk = job->common.blk;
>>>>>>>>> -    int nbytes;
>>>>>>>>> +    int nbytes, remaining_bytes;
>>>>>>>>>          int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
>>>>>>>>>          assert(QEMU_IS_ALIGNED(start, job->cluster_size));
>>>>>>>>> -    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
>>>>>>>>> -    nbytes = MIN(job->cluster_size, job->len - start);
>>>>>>>>> -    if (!*bounce_buffer) {
>>>>>>>>> -        *bounce_buffer = blk_blockalign(blk, job->cluster_size);
>>>>>>>>> -    }
>>>>>>>>> +    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, end - start);
>>>>>>>>> +    nbytes = MIN(end - start, job->len - start);
>>>>>>>>> -    ret = blk_co_pread(blk, start, nbytes, *bounce_buffer, read_flags);
>>>>>>>>> -    if (ret < 0) {
>>>>>>>>> -        trace_backup_do_cow_read_fail(job, start, ret);
>>>>>>>>> -        if (error_is_read) {
>>>>>>>>> -            *error_is_read = true;
>>>>>>>>> +
>>>>>>>>> +    remaining_bytes = nbytes;
>>>>>>>>> +    while (remaining_bytes) {
>>>>>>>>> +        int chunk = MIN(BACKUP_MAX_BOUNCE_BUFFER, remaining_bytes);
>>>>>>>>> +
>>>>>>>>> +        ret = blk_co_pread(blk, start, chunk, bounce_buffer, read_flags);
>>>>>>>>> +        if (ret < 0) {
>>>>>>>>> +            trace_backup_do_cow_read_fail(job, start, ret);
>>>>>>>>> +            if (error_is_read) {
>>>>>>>>> +                *error_is_read = true;
>>>>>>>>> +            }
>>>>>>>>> +            goto fail;
>>>>>>>>>              }
>>>>>>>>> -        goto fail;
>>>>>>>>> -    }
>>>>>>>>> -    ret = blk_co_pwrite(job->target, start, nbytes, *bounce_buffer,
>>>>>>>>> -                        job->write_flags);
>>>>>>>>> -    if (ret < 0) {
>>>>>>>>> -        trace_backup_do_cow_write_fail(job, start, ret);
>>>>>>>>> -        if (error_is_read) {
>>>>>>>>> -            *error_is_read = false;
>>>>>>>>> +        ret = blk_co_pwrite(job->target, start, chunk, bounce_buffer,
>>>>>>>>> +                            job->write_flags);
>>>>>>>>> +        if (ret < 0) {
>>>>>>>>> +            trace_backup_do_cow_write_fail(job, start, ret);
>>>>>>>>> +            if (error_is_read) {
>>>>>>>>> +                *error_is_read = false;
>>>>>>>>> +            }
>>>>>>>>> +            goto fail;
>>>>>>>>>              }
>>>>>>>>> -        goto fail;
>>>>>>>>> +
>>>>>>>>> +        start += chunk;
>>>>>>>>> +        remaining_bytes -= chunk;
>>>>>>>>>          }
>>>>>>>>>          return nbytes;
>>>>>>>>> @@ -301,9 +313,14 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>>>>>>>>>                  }
>>>>>>>>>              }
>>>>>>>>>              if (!job->use_copy_range) {
>>>>>>>>> +            if (!bounce_buffer) {
>>>>>>>>> +                size_t len = MIN(BACKUP_MAX_BOUNCE_BUFFER,
>>>>>>>>> +                                 MAX(dirty_end - start, end - dirty_end));
>>>>>>>>> +                bounce_buffer = blk_try_blockalign(job->common.blk, len);
>>>>>>>>> +            }
>>>>>>>>
>>>>>>>> If you use _try_, you should probably also check whether it succeeded.
>>>>>>>
>>>>>>> Oops, you are right, of course.
>>>>>>>
>>>>>>>>
>>>>>>>> Anyway, I wonder whether it’d be better to just allocate this buffer
>>>>>>>> once per job (the first time we get here, probably) to be of size
>>>>>>>> BACKUP_MAX_BOUNCE_BUFFER and put it into BackupBlockJob.  (And maybe add
>>>>>>>> a buf-size parameter similar to what the mirror jobs have.)
>>>>>>>>
>>>>>>>
>>>>>>> Once per job will not work, as we may have several guest writes in parallel and therefore
>>>>>>> several parallel copy-before-write operations.
>>>>>>
>>>>>> Hm.  I’m not quite happy with that because if the guest just issues many
>>>>>> large discards in parallel, this means that qemu will allocate a large
>>>>>> amount of memory.
>>>>>>
>>>>>> It would be nice if there was a simple way to keep track of the total
>>>>>> memory usage and let requests yield if they would exceed it.
>>>>>
>>>>> Agree, it should be fixed anyway.
>>>>>
>>>>
>>>>
>>>> But still..
>>>>
>>>> Synchronous mirror allocates full-request buffers on guest write. Is it correct?
>>>>
>>>> If we assume that it is correct to double the memory usage of guest operations, then for backup
>>>> the problem is only in write_zero and discard, where guest-assumed memory usage should be zero.
>>>
>>> Well, but that is the problem.  I didn’t say anything in v2, because I
>>> only thought of normal writes and I found it fine to double the memory
>>> usage there (a guest won’t issue huge write requests in parallel).  But
>>> discard/write-zeroes are a different matter.
>>>
>>>> And if we should distinguish writes from write_zeroes and discard, it's better to postpone this
>>>> improvement until after the backup-top filter is merged.
>>>
>>> But do you need to distinguish it?  Why not just keep track of memory
>>> usage and put the current I/O coroutine to sleep in a CoQueue or
>>> something, and wake that up at the end of backup_do_cow()?
>>>
>>
>> 1. Because if we _can_ allow doubling of memory, it's more effective not to restrict allocations on
>> guest writes. It just seems to be the more effective technique.
> 
> But the problem with backup and zero writes/discards is that the memory
> is not doubled.  The request doesn’t need any memory, but the CBW
> operation does, and maybe lots of it.
> 
> So the guest may issue many zero writes/discards in parallel and thus
> exhaust memory on the host.

So this is the reason to separate writes from write-zeroes/discards: at least writes will then be happy. And I
think writes are a more frequent request than write-zeroes/discards.

> 
>> 2. Anyway, I'd allow some always-available size to allocate - let it be one cluster, which corresponds
>> to the current behavior and prevents guest I/O hangs in the worst case.
> 
> The guest would only hang if we have to copy more than e.g. 64 MB at
> a time.  At which point I think it’s not unreasonable to sequentialize
> requests.
> 



-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v3 6/7] block/backup: teach backup_cow_with_bounce_buffer to copy more at once
  2019-08-13 15:00                   ` Vladimir Sementsov-Ogievskiy
@ 2019-08-13 15:02                     ` Max Reitz
  2019-08-13 15:32                       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 24+ messages in thread
From: Max Reitz @ 2019-08-13 15:02 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, Denis Lunev, qemu-devel, armbru, stefanha, jsnow


[-- Attachment #1.1: Type: text/plain, Size: 10689 bytes --]

On 13.08.19 17:00, Vladimir Sementsov-Ogievskiy wrote:
> 13.08.2019 17:57, Max Reitz wrote:
>> On 13.08.19 16:39, Vladimir Sementsov-Ogievskiy wrote:
>>> 13.08.2019 17:23, Max Reitz wrote:
>>>> On 13.08.19 16:14, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 12.08.2019 19:37, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> 12.08.2019 19:11, Max Reitz wrote:
>>>>>>> On 12.08.19 17:47, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>>> 12.08.2019 18:10, Max Reitz wrote:
>>>>>>>>> On 10.08.19 21:31, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>>>>> backup_cow_with_offload can transfer more than one cluster. Let
>>>>>>>>>> backup_cow_with_bounce_buffer behave similarly. It reduces the number
>>>>>>>>>> of IO requests, since there is no need to copy cluster by cluster.
>>>>>>>>>>
>>>>>>>>>> Logic around bounce_buffer allocation changed: we can't just allocate
>>>>>>>>>> a one-cluster-sized buffer to share for all iterations. We also can't
>>>>>>>>>> allocate a buffer of full-request length, as it may be too large, so
>>>>>>>>>> BACKUP_MAX_BOUNCE_BUFFER is introduced. And finally, allocation logic
>>>>>>>>>> is to allocate a buffer sufficient to handle all remaining iterations
>>>>>>>>>> at the point where we need the buffer for the first time.
>>>>>>>>>>
>>>>>>>>>> Bonus: get rid of pointer-to-pointer.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>>>>>>> ---
>>>>>>>>>>      block/backup.c | 65 +++++++++++++++++++++++++++++++-------------------
>>>>>>>>>>      1 file changed, 41 insertions(+), 24 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/block/backup.c b/block/backup.c
>>>>>>>>>> index d482d93458..65f7212c85 100644
>>>>>>>>>> --- a/block/backup.c
>>>>>>>>>> +++ b/block/backup.c
>>>>>>>>>> @@ -27,6 +27,7 @@
>>>>>>>>>>      #include "qemu/error-report.h"
>>>>>>>>>>      #define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
>>>>>>>>>> +#define BACKUP_MAX_BOUNCE_BUFFER (64 * 1024 * 1024)
>>>>>>>>>>      typedef struct CowRequest {
>>>>>>>>>>          int64_t start_byte;
>>>>>>>>>> @@ -98,44 +99,55 @@ static void cow_request_end(CowRequest *req)
>>>>>>>>>>          qemu_co_queue_restart_all(&req->wait_queue);
>>>>>>>>>>      }
>>>>>>>>>> -/* Copy range to target with a bounce buffer and return the bytes copied. If
>>>>>>>>>> - * error occurred, return a negative error number */
>>>>>>>>>> +/*
>>>>>>>>>> + * Copy range to target with a bounce buffer and return the bytes copied. If
>>>>>>>>>> + * error occurred, return a negative error number
>>>>>>>>>> + *
>>>>>>>>>> + * @bounce_buffer is assumed to enough to store
>>>>>>>>>
>>>>>>>>> s/to/to be/
>>>>>>>>>
>>>>>>>>>> + * MIN(BACKUP_MAX_BOUNCE_BUFFER, @end - @start) bytes
>>>>>>>>>> + */
>>>>>>>>>>      static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
>>>>>>>>>>                                                            int64_t start,
>>>>>>>>>>                                                            int64_t end,
>>>>>>>>>>                                                            bool is_write_notifier,
>>>>>>>>>>                                                            bool *error_is_read,
>>>>>>>>>> -                                                      void **bounce_buffer)
>>>>>>>>>> +                                                      void *bounce_buffer)
>>>>>>>>>>      {
>>>>>>>>>>          int ret;
>>>>>>>>>>          BlockBackend *blk = job->common.blk;
>>>>>>>>>> -    int nbytes;
>>>>>>>>>> +    int nbytes, remaining_bytes;
>>>>>>>>>>          int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
>>>>>>>>>>          assert(QEMU_IS_ALIGNED(start, job->cluster_size));
>>>>>>>>>> -    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
>>>>>>>>>> -    nbytes = MIN(job->cluster_size, job->len - start);
>>>>>>>>>> -    if (!*bounce_buffer) {
>>>>>>>>>> -        *bounce_buffer = blk_blockalign(blk, job->cluster_size);
>>>>>>>>>> -    }
>>>>>>>>>> +    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, end - start);
>>>>>>>>>> +    nbytes = MIN(end - start, job->len - start);
>>>>>>>>>> -    ret = blk_co_pread(blk, start, nbytes, *bounce_buffer, read_flags);
>>>>>>>>>> -    if (ret < 0) {
>>>>>>>>>> -        trace_backup_do_cow_read_fail(job, start, ret);
>>>>>>>>>> -        if (error_is_read) {
>>>>>>>>>> -            *error_is_read = true;
>>>>>>>>>> +
>>>>>>>>>> +    remaining_bytes = nbytes;
>>>>>>>>>> +    while (remaining_bytes) {
>>>>>>>>>> +        int chunk = MIN(BACKUP_MAX_BOUNCE_BUFFER, remaining_bytes);
>>>>>>>>>> +
>>>>>>>>>> +        ret = blk_co_pread(blk, start, chunk, bounce_buffer, read_flags);
>>>>>>>>>> +        if (ret < 0) {
>>>>>>>>>> +            trace_backup_do_cow_read_fail(job, start, ret);
>>>>>>>>>> +            if (error_is_read) {
>>>>>>>>>> +                *error_is_read = true;
>>>>>>>>>> +            }
>>>>>>>>>> +            goto fail;
>>>>>>>>>>              }
>>>>>>>>>> -        goto fail;
>>>>>>>>>> -    }
>>>>>>>>>> -    ret = blk_co_pwrite(job->target, start, nbytes, *bounce_buffer,
>>>>>>>>>> -                        job->write_flags);
>>>>>>>>>> -    if (ret < 0) {
>>>>>>>>>> -        trace_backup_do_cow_write_fail(job, start, ret);
>>>>>>>>>> -        if (error_is_read) {
>>>>>>>>>> -            *error_is_read = false;
>>>>>>>>>> +        ret = blk_co_pwrite(job->target, start, chunk, bounce_buffer,
>>>>>>>>>> +                            job->write_flags);
>>>>>>>>>> +        if (ret < 0) {
>>>>>>>>>> +            trace_backup_do_cow_write_fail(job, start, ret);
>>>>>>>>>> +            if (error_is_read) {
>>>>>>>>>> +                *error_is_read = false;
>>>>>>>>>> +            }
>>>>>>>>>> +            goto fail;
>>>>>>>>>>              }
>>>>>>>>>> -        goto fail;
>>>>>>>>>> +
>>>>>>>>>> +        start += chunk;
>>>>>>>>>> +        remaining_bytes -= chunk;
>>>>>>>>>>          }
>>>>>>>>>>          return nbytes;
>>>>>>>>>> @@ -301,9 +313,14 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>>>>>>>>>>                  }
>>>>>>>>>>              }
>>>>>>>>>>              if (!job->use_copy_range) {
>>>>>>>>>> +            if (!bounce_buffer) {
>>>>>>>>>> +                size_t len = MIN(BACKUP_MAX_BOUNCE_BUFFER,
>>>>>>>>>> +                                 MAX(dirty_end - start, end - dirty_end));
>>>>>>>>>> +                bounce_buffer = blk_try_blockalign(job->common.blk, len);
>>>>>>>>>> +            }
>>>>>>>>>
>>>>>>>>> If you use _try_, you should probably also check whether it succeeded.
>>>>>>>>
>>>>>>>> Oops, you are right, of course.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Anyway, I wonder whether it’d be better to just allocate this buffer
>>>>>>>>> once per job (the first time we get here, probably) to be of size
>>>>>>>>> BACKUP_MAX_BOUNCE_BUFFER and put it into BackupBlockJob.  (And maybe add
>>>>>>>>> a buf-size parameter similar to what the mirror jobs have.)
>>>>>>>>>
>>>>>>>>
>>>>>>>> Once per job will not work, as we may have several guest writes in parallel and therefore
>>>>>>>> several parallel copy-before-write operations.
>>>>>>>
>>>>>>> Hm.  I’m not quite happy with that because if the guest just issues many
>>>>>>> large discards in parallel, this means that qemu will allocate a large
>>>>>>> amount of memory.
>>>>>>>
>>>>>>> It would be nice if there was a simple way to keep track of the total
>>>>>>> memory usage and let requests yield if they would exceed it.
>>>>>>
>>>>>> Agree, it should be fixed anyway.
>>>>>>
>>>>>
>>>>>
>>>>> But still..
>>>>>
>>>>> Synchronous mirror allocates full-request buffers on guest write. Is that correct?
>>>>>
>>>>> If we assume that it is correct to double the memory usage of guest operations, then for backup
>>>>> the problem is only in write_zero and discard, where the guest-assumed memory usage should be zero.
>>>>
>>>> Well, but that is the problem.  I didn’t say anything in v2, because I
>>>> only thought of normal writes and I found it fine to double the memory
>>>> usage there (a guest won’t issue huge write requests in parallel).  But
>>>> discard/write-zeroes are a different matter.
>>>>
>>>>> And if we should distinguish writes from write_zeroes and discard, it's better to postpone this
>>>>> improvement until after the backup-top filter is merged.
>>>>
>>>> But do you need to distinguish it?  Why not just keep track of memory
>>>> usage and put the current I/O coroutine to sleep in a CoQueue or
>>>> something, and wake that up at the end of backup_do_cow()?
>>>>
>>>
>>> 1. Because if we _can_ allow doubling of memory, it's more effective to not restrict allocations on
>>> guest writes. It just seems to be a more effective technique.
>>
>> But the problem with backup and zero writes/discards is that the memory
>> is not doubled.  The request doesn’t need any memory, but the CBW
>> operation does, and maybe lots of it.
>>
>> So the guest may issue many zero writes/discards in parallel and thus
>> exhaust memory on the host.
> 
> So this is the reason to separate writes from write-zeroes/discards. So at least writes will be happy. And I
> think that writes are a more frequent request than write-zeroes/discards.

But that makes it complicated for no practical gain whatsoever.

>>
>>> 2. Anyway, I'd allow some always-available allocation size - let it be one cluster, which corresponds
>>> to the current behavior and prevents a guest I/O hang in the worst case.
>>
>> The guest would only hang if we have to copy more than e.g. 64 MB at
>> a time.  At which point I think it’s not unreasonable to sequentialize
>> requests.

Because of this.  How is it bad to start sequentializing writes when the
data exceeds 64 MB?

Max



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v3 6/7] block/backup: teach backup_cow_with_bounce_buffer to copy more at once
  2019-08-13 15:02                     ` Max Reitz
@ 2019-08-13 15:32                       ` Vladimir Sementsov-Ogievskiy
  2019-08-13 16:30                         ` Max Reitz
  0 siblings, 1 reply; 24+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-13 15:32 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: fam, kwolf, Denis Lunev, qemu-devel, armbru, stefanha, jsnow

13.08.2019 18:02, Max Reitz wrote:
> On 13.08.19 17:00, Vladimir Sementsov-Ogievskiy wrote:
>> 13.08.2019 17:57, Max Reitz wrote:
>>> On 13.08.19 16:39, Vladimir Sementsov-Ogievskiy wrote:
>>>> 13.08.2019 17:23, Max Reitz wrote:
>>>>> On 13.08.19 16:14, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> 12.08.2019 19:37, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>> 12.08.2019 19:11, Max Reitz wrote:
>>>>>>>> On 12.08.19 17:47, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>>>> 12.08.2019 18:10, Max Reitz wrote:
>>>>>>>>>> On 10.08.19 21:31, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>>>>>> backup_cow_with_offload can transfer more than one cluster. Let
>>>>>>>>>>> backup_cow_with_bounce_buffer behave similarly. It reduces the number
>>>>>>>>>>> of IO requests, since there is no need to copy cluster by cluster.
>>>>>>>>>>>
>>>>>>>>>>> The logic around bounce_buffer allocation has changed: we can't just
>>>>>>>>>>> allocate a one-cluster-sized buffer to share across all iterations. Nor
>>>>>>>>>>> can we allocate a buffer of full-request length, as it may be too large,
>>>>>>>>>>> so BACKUP_MAX_BOUNCE_BUFFER is introduced. And finally, the allocation
>>>>>>>>>>> logic is to allocate a buffer sufficient to handle all remaining
>>>>>>>>>>> iterations at the point where we need the buffer for the first time.
>>>>>>>>>>>
>>>>>>>>>>> Bonus: get rid of pointer-to-pointer.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>>>>>>>> ---
>>>>>>>>>>>       block/backup.c | 65 +++++++++++++++++++++++++++++++-------------------
>>>>>>>>>>>       1 file changed, 41 insertions(+), 24 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/block/backup.c b/block/backup.c
>>>>>>>>>>> index d482d93458..65f7212c85 100644
>>>>>>>>>>> --- a/block/backup.c
>>>>>>>>>>> +++ b/block/backup.c
>>>>>>>>>>> @@ -27,6 +27,7 @@
>>>>>>>>>>>       #include "qemu/error-report.h"
>>>>>>>>>>>       #define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
>>>>>>>>>>> +#define BACKUP_MAX_BOUNCE_BUFFER (64 * 1024 * 1024)
>>>>>>>>>>>       typedef struct CowRequest {
>>>>>>>>>>>           int64_t start_byte;
>>>>>>>>>>> @@ -98,44 +99,55 @@ static void cow_request_end(CowRequest *req)
>>>>>>>>>>>           qemu_co_queue_restart_all(&req->wait_queue);
>>>>>>>>>>>       }
>>>>>>>>>>> -/* Copy range to target with a bounce buffer and return the bytes copied. If
>>>>>>>>>>> - * error occurred, return a negative error number */
>>>>>>>>>>> +/*
>>>>>>>>>>> + * Copy range to target with a bounce buffer and return the bytes copied. If
>>>>>>>>>>> + * error occurred, return a negative error number
>>>>>>>>>>> + *
>>>>>>>>>>> + * @bounce_buffer is assumed to enough to store
>>>>>>>>>>
>>>>>>>>>> s/to/to be/
>>>>>>>>>>
>>>>>>>>>>> + * MIN(BACKUP_MAX_BOUNCE_BUFFER, @end - @start) bytes
>>>>>>>>>>> + */
>>>>>>>>>>>       static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
>>>>>>>>>>>                                                             int64_t start,
>>>>>>>>>>>                                                             int64_t end,
>>>>>>>>>>>                                                             bool is_write_notifier,
>>>>>>>>>>>                                                             bool *error_is_read,
>>>>>>>>>>> -                                                      void **bounce_buffer)
>>>>>>>>>>> +                                                      void *bounce_buffer)
>>>>>>>>>>>       {
>>>>>>>>>>>           int ret;
>>>>>>>>>>>           BlockBackend *blk = job->common.blk;
>>>>>>>>>>> -    int nbytes;
>>>>>>>>>>> +    int nbytes, remaining_bytes;
>>>>>>>>>>>           int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
>>>>>>>>>>>           assert(QEMU_IS_ALIGNED(start, job->cluster_size));
>>>>>>>>>>> -    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
>>>>>>>>>>> -    nbytes = MIN(job->cluster_size, job->len - start);
>>>>>>>>>>> -    if (!*bounce_buffer) {
>>>>>>>>>>> -        *bounce_buffer = blk_blockalign(blk, job->cluster_size);
>>>>>>>>>>> -    }
>>>>>>>>>>> +    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, end - start);
>>>>>>>>>>> +    nbytes = MIN(end - start, job->len - start);
>>>>>>>>>>> -    ret = blk_co_pread(blk, start, nbytes, *bounce_buffer, read_flags);
>>>>>>>>>>> -    if (ret < 0) {
>>>>>>>>>>> -        trace_backup_do_cow_read_fail(job, start, ret);
>>>>>>>>>>> -        if (error_is_read) {
>>>>>>>>>>> -            *error_is_read = true;
>>>>>>>>>>> +
>>>>>>>>>>> +    remaining_bytes = nbytes;
>>>>>>>>>>> +    while (remaining_bytes) {
>>>>>>>>>>> +        int chunk = MIN(BACKUP_MAX_BOUNCE_BUFFER, remaining_bytes);
>>>>>>>>>>> +
>>>>>>>>>>> +        ret = blk_co_pread(blk, start, chunk, bounce_buffer, read_flags);
>>>>>>>>>>> +        if (ret < 0) {
>>>>>>>>>>> +            trace_backup_do_cow_read_fail(job, start, ret);
>>>>>>>>>>> +            if (error_is_read) {
>>>>>>>>>>> +                *error_is_read = true;
>>>>>>>>>>> +            }
>>>>>>>>>>> +            goto fail;
>>>>>>>>>>>               }
>>>>>>>>>>> -        goto fail;
>>>>>>>>>>> -    }
>>>>>>>>>>> -    ret = blk_co_pwrite(job->target, start, nbytes, *bounce_buffer,
>>>>>>>>>>> -                        job->write_flags);
>>>>>>>>>>> -    if (ret < 0) {
>>>>>>>>>>> -        trace_backup_do_cow_write_fail(job, start, ret);
>>>>>>>>>>> -        if (error_is_read) {
>>>>>>>>>>> -            *error_is_read = false;
>>>>>>>>>>> +        ret = blk_co_pwrite(job->target, start, chunk, bounce_buffer,
>>>>>>>>>>> +                            job->write_flags);
>>>>>>>>>>> +        if (ret < 0) {
>>>>>>>>>>> +            trace_backup_do_cow_write_fail(job, start, ret);
>>>>>>>>>>> +            if (error_is_read) {
>>>>>>>>>>> +                *error_is_read = false;
>>>>>>>>>>> +            }
>>>>>>>>>>> +            goto fail;
>>>>>>>>>>>               }
>>>>>>>>>>> -        goto fail;
>>>>>>>>>>> +
>>>>>>>>>>> +        start += chunk;
>>>>>>>>>>> +        remaining_bytes -= chunk;
>>>>>>>>>>>           }
>>>>>>>>>>>           return nbytes;
>>>>>>>>>>> @@ -301,9 +313,14 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>>>>>>>>>>>                   }
>>>>>>>>>>>               }
>>>>>>>>>>>               if (!job->use_copy_range) {
>>>>>>>>>>> +            if (!bounce_buffer) {
>>>>>>>>>>> +                size_t len = MIN(BACKUP_MAX_BOUNCE_BUFFER,
>>>>>>>>>>> +                                 MAX(dirty_end - start, end - dirty_end));
>>>>>>>>>>> +                bounce_buffer = blk_try_blockalign(job->common.blk, len);
>>>>>>>>>>> +            }
>>>>>>>>>>
>>>>>>>>>> If you use _try_, you should probably also check whether it succeeded.
>>>>>>>>>
>>>>>>>>> Oops, you are right, of course.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Anyway, I wonder whether it’d be better to just allocate this buffer
>>>>>>>>>> once per job (the first time we get here, probably) to be of size
>>>>>>>>>> BACKUP_MAX_BOUNCE_BUFFER and put it into BackupBlockJob.  (And maybe add
>>>>>>>>>> a buf-size parameter similar to what the mirror jobs have.)
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Once per job will not work, as we may have several guest writes in parallel and therefore
>>>>>>>>> several parallel copy-before-write operations.
>>>>>>>>
>>>>>>>> Hm.  I’m not quite happy with that because if the guest just issues many
>>>>>>>> large discards in parallel, this means that qemu will allocate a large
>>>>>>>> amount of memory.
>>>>>>>>
>>>>>>>> It would be nice if there was a simple way to keep track of the total
>>>>>>>> memory usage and let requests yield if they would exceed it.
>>>>>>>
>>>>>>> Agree, it should be fixed anyway.
>>>>>>>
>>>>>>
>>>>>>
>>>>>> But still..
>>>>>>
>>>>>> Synchronous mirror allocates full-request buffers on guest write. Is that correct?
>>>>>>
>>>>>> If we assume that it is correct to double the memory usage of guest operations, then for backup
>>>>>> the problem is only in write_zero and discard, where the guest-assumed memory usage should be zero.
>>>>>
>>>>> Well, but that is the problem.  I didn’t say anything in v2, because I
>>>>> only thought of normal writes and I found it fine to double the memory
>>>>> usage there (a guest won’t issue huge write requests in parallel).  But
>>>>> discard/write-zeroes are a different matter.
>>>>>
>>>>>> And if we should distinguish writes from write_zeroes and discard, it's better to postpone this
>>>>>> improvement until after the backup-top filter is merged.
>>>>>
>>>>> But do you need to distinguish it?  Why not just keep track of memory
>>>>> usage and put the current I/O coroutine to sleep in a CoQueue or
>>>>> something, and wake that up at the end of backup_do_cow()?
>>>>>
>>>>
>>>> 1. Because if we _can_ allow doubling of memory, it's more effective to not restrict allocations on
>>>> guest writes. It just seems to be a more effective technique.
>>>
>>> But the problem with backup and zero writes/discards is that the memory
>>> is not doubled.  The request doesn’t need any memory, but the CBW
>>> operation does, and maybe lots of it.
>>>
>>> So the guest may issue many zero writes/discards in parallel and thus
>>> exhaust memory on the host.
>>
>> So this is the reason to separate writes from write-zeroes/discards. So at least writes will be happy. And I
>> think that writes are a more frequent request than write-zeroes/discards.
> 
> But that makes it complicated for no practical gain whatsoever.
> 
>>>
>>>> 2. Anyway, I'd allow some always-available allocation size - let it be one cluster, which corresponds
>>>> to the current behavior and prevents a guest I/O hang in the worst case.
>>>
>>> The guest would only hang if we have to copy more than e.g. 64 MB at
>>> a time.  At which point I think it’s not unreasonable to sequentialize
>>> requests.
> 
> Because of this.  How is it bad to start sequentializing writes when the
> data exceeds 64 MB?
> 

So you want a total memory limit of 64 MB? (with a possible parameter like in mirror)

And the allocation algorithm to copy count bytes:

if free_mem >= count: allocate count bytes
else if free_mem >= cluster: allocate a cluster and copy in a loop
else: wait in a co-queue until some memory is available, then retry

Is that OK for you?
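
In code, a minimal sketch of that algorithm could look like the following
(the mem_limit, mem_used and free_mem_queue fields on BackupBlockJob are
hypothetical here, introduced only for illustration; qemu_co_queue_wait()
and qemu_co_queue_restart_all() are the existing CoQueue primitives):

static int64_t coroutine_fn backup_reserve_mem(BackupBlockJob *job,
                                               int64_t count)
{
    while (true) {
        int64_t free_mem = job->mem_limit - job->mem_used;

        if (free_mem >= count) {
            /* enough memory to handle the whole request at once */
            job->mem_used += count;
            return count;
        }
        if (free_mem >= job->cluster_size) {
            /* fall back to a one-cluster buffer and copy in a loop */
            job->mem_used += job->cluster_size;
            return job->cluster_size;
        }
        /* no memory available: sleep until some CBW operation finishes */
        qemu_co_queue_wait(&job->free_mem_queue, NULL);
    }
}

static void backup_release_mem(BackupBlockJob *job, int64_t size)
{
    /* at the end of backup_do_cow(): return the memory, wake waiters */
    job->mem_used -= size;
    qemu_co_queue_restart_all(&job->free_mem_queue);
}

The caller would then blk_blockalign() a buffer of the returned size, and
the release at the end of backup_do_cow() wakes any coroutines waiting in
the queue so they can retry.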


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v3 6/7] block/backup: teach backup_cow_with_bounce_buffer to copy more at once
  2019-08-13 15:32                       ` Vladimir Sementsov-Ogievskiy
@ 2019-08-13 16:30                         ` Max Reitz
  2019-08-13 16:45                           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 24+ messages in thread
From: Max Reitz @ 2019-08-13 16:30 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, Denis Lunev, qemu-devel, armbru, stefanha, jsnow


On 13.08.19 17:32, Vladimir Sementsov-Ogievskiy wrote:
> 13.08.2019 18:02, Max Reitz wrote:
>> On 13.08.19 17:00, Vladimir Sementsov-Ogievskiy wrote:
>>> 13.08.2019 17:57, Max Reitz wrote:
>>>> On 13.08.19 16:39, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 13.08.2019 17:23, Max Reitz wrote:
>>>>>> On 13.08.19 16:14, Vladimir Sementsov-Ogievskiy wrote:

[...]

>>>>>>> But still..
>>>>>>>
>>>>>>> Synchronous mirror allocates full-request buffers on guest write. Is that correct?
>>>>>>>
>>>>>>> If we assume that it is correct to double the memory usage of guest operations, then for backup
>>>>>>> the problem is only in write_zero and discard, where the guest-assumed memory usage should be zero.
>>>>>>
>>>>>> Well, but that is the problem.  I didn’t say anything in v2, because I
>>>>>> only thought of normal writes and I found it fine to double the memory
>>>>>> usage there (a guest won’t issue huge write requests in parallel).  But
>>>>>> discard/write-zeroes are a different matter.
>>>>>>
>>>>>>> And if we should distinguish writes from write_zeroes and discard, it's better to postpone this
>>>>>>> improvement until after the backup-top filter is merged.
>>>>>>
>>>>>> But do you need to distinguish it?  Why not just keep track of memory
>>>>>> usage and put the current I/O coroutine to sleep in a CoQueue or
>>>>>> something, and wake that up at the end of backup_do_cow()?
>>>>>>
>>>>>
>>>>> 1. Because if we _can_ allow doubling of memory, it's more effective to not restrict allocations on
>>>>> guest writes. It just seems to be a more effective technique.
>>>>
>>>> But the problem with backup and zero writes/discards is that the memory
>>>> is not doubled.  The request doesn’t need any memory, but the CBW
>>>> operation does, and maybe lots of it.
>>>>
>>>> So the guest may issue many zero writes/discards in parallel and thus
>>>> exhaust memory on the host.
>>>
>>> So this is the reason to separate writes from write-zeroes/discards. So at least writes will be happy. And I
>>> think that writes are a more frequent request than write-zeroes/discards.
>>
>> But that makes it complicated for no practical gain whatsoever.
>>
>>>>
>>>>> 2. Anyway, I'd allow some always-available allocation size - let it be one cluster, which corresponds
>>>>> to the current behavior and prevents a guest I/O hang in the worst case.
>>>>
>>>> The guest would only hang if we have to copy more than e.g. 64 MB at
>>>> a time.  At which point I think it’s not unreasonable to sequentialize
>>>> requests.
>>
>> Because of this.  How is it bad to start sequentializing writes when the
>> data exceeds 64 MB?
>>
> 
> So you want a total memory limit of 64 MB? (with a possible parameter like in mirror)
> 
> And the allocation algorithm to copy count bytes:
> 
> if free_mem >= count: allocate count bytes
> else if free_mem >= cluster: allocate a cluster and copy in a loop
> else: wait in a co-queue until some memory is available, then retry
> 
> Is that OK for you?

Sounds good to me, although I don’t know whether the second branch is
necessary.  As I’ve said, the total limit is just insurance against a
guest that does some crazy stuff.

Max



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v3 6/7] block/backup: teach backup_cow_with_bounce_buffer to copy more at once
  2019-08-13 16:30                         ` Max Reitz
@ 2019-08-13 16:45                           ` Vladimir Sementsov-Ogievskiy
  2019-08-13 16:51                             ` Max Reitz
  0 siblings, 1 reply; 24+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-13 16:45 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: fam, kwolf, Denis Lunev, qemu-devel, armbru, stefanha, jsnow

13.08.2019 19:30, Max Reitz wrote:
> On 13.08.19 17:32, Vladimir Sementsov-Ogievskiy wrote:
>> 13.08.2019 18:02, Max Reitz wrote:
>>> On 13.08.19 17:00, Vladimir Sementsov-Ogievskiy wrote:
>>>> 13.08.2019 17:57, Max Reitz wrote:
>>>>> On 13.08.19 16:39, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> 13.08.2019 17:23, Max Reitz wrote:
>>>>>>> On 13.08.19 16:14, Vladimir Sementsov-Ogievskiy wrote:
> 
> [...]
> 
>>>>>>>> But still..
>>>>>>>>
>>>>>>>> Synchronous mirror allocates full-request buffers on guest write. Is that correct?
>>>>>>>>
>>>>>>>> If we assume that it is correct to double the memory usage of guest operations, then for backup
>>>>>>>> the problem is only in write_zero and discard, where the guest-assumed memory usage should be zero.
>>>>>>>
>>>>>>> Well, but that is the problem.  I didn’t say anything in v2, because I
>>>>>>> only thought of normal writes and I found it fine to double the memory
>>>>>>> usage there (a guest won’t issue huge write requests in parallel).  But
>>>>>>> discard/write-zeroes are a different matter.
>>>>>>>
>>>>>>>> And if we should distinguish writes from write_zeroes and discard, it's better to postpone this
>>>>>>>> improvement until after the backup-top filter is merged.
>>>>>>>
>>>>>>> But do you need to distinguish it?  Why not just keep track of memory
>>>>>>> usage and put the current I/O coroutine to sleep in a CoQueue or
>>>>>>> something, and wake that up at the end of backup_do_cow()?
>>>>>>>
>>>>>>
>>>>>> 1. Because if we _can_ allow doubling of memory, it's more effective to not restrict allocations on
>>>>>> guest writes. It just seems to be a more effective technique.
>>>>>
>>>>> But the problem with backup and zero writes/discards is that the memory
>>>>> is not doubled.  The request doesn’t need any memory, but the CBW
>>>>> operation does, and maybe lots of it.
>>>>>
>>>>> So the guest may issue many zero writes/discards in parallel and thus
>>>>> exhaust memory on the host.
>>>>
>>>> So this is the reason to separate writes from write-zeroes/discards. So at least writes will be happy. And I
>>>> think that writes are a more frequent request than write-zeroes/discards.
>>>
>>> But that makes it complicated for no practical gain whatsoever.
>>>
>>>>>
>>>>>> 2. Anyway, I'd allow some always-available allocation size - let it be one cluster, which corresponds
>>>>>> to the current behavior and prevents a guest I/O hang in the worst case.
>>>>>
>>>>> The guest would only hang if we have to copy more than e.g. 64 MB at
>>>>> a time.  At which point I think it’s not unreasonable to sequentialize
>>>>> requests.
>>>
>>> Because of this.  How is it bad to start sequentializing writes when the
>>> data exceeds 64 MB?
>>>
>>
>> So you want a total memory limit of 64 MB? (with a possible parameter like in mirror)
>>
>> And the allocation algorithm to copy count bytes:
>>
>> if free_mem >= count: allocate count bytes
>> else if free_mem >= cluster: allocate a cluster and copy in a loop
>> else: wait in a co-queue until some memory is available, then retry
>>
>> Is that OK for you?
> 
> Sounds good to me, although I don’t know whether the second branch is
> necessary.  As I’ve said, the total limit is just insurance against a
> guest that does some crazy stuff.
> 

I'm afraid that if there were one big request, it might wait forever while smaller
requests eat most of the available memory. So it would be an unfair queue: smaller
requests would get higher priority in the low-memory case. With [2] it becomes fairer.



-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v3 6/7] block/backup: teach backup_cow_with_bounce_buffer to copy more at once
  2019-08-13 16:45                           ` Vladimir Sementsov-Ogievskiy
@ 2019-08-13 16:51                             ` Max Reitz
  0 siblings, 0 replies; 24+ messages in thread
From: Max Reitz @ 2019-08-13 16:51 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, Denis Lunev, qemu-devel, armbru, stefanha, jsnow


On 13.08.19 18:45, Vladimir Sementsov-Ogievskiy wrote:
> 13.08.2019 19:30, Max Reitz wrote:
>> On 13.08.19 17:32, Vladimir Sementsov-Ogievskiy wrote:
>>> 13.08.2019 18:02, Max Reitz wrote:
>>>> On 13.08.19 17:00, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 13.08.2019 17:57, Max Reitz wrote:
>>>>>> On 13.08.19 16:39, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>> 13.08.2019 17:23, Max Reitz wrote:
>>>>>>>> On 13.08.19 16:14, Vladimir Sementsov-Ogievskiy wrote:
>>
>> [...]
>>
>>>>>>>>> But still..
>>>>>>>>>
>>>>>>>>> Synchronous mirror allocates full-request buffers on guest write. Is that correct?
>>>>>>>>>
>>>>>>>>> If we assume that it is correct to double the memory usage of guest operations, then for backup
>>>>>>>>> the problem is only in write_zero and discard, where the guest-assumed memory usage should be zero.
>>>>>>>>
>>>>>>>> Well, but that is the problem.  I didn’t say anything in v2, because I
>>>>>>>> only thought of normal writes and I found it fine to double the memory
>>>>>>>> usage there (a guest won’t issue huge write requests in parallel).  But
>>>>>>>> discard/write-zeroes are a different matter.
>>>>>>>>
>>>>>>>>> And if we should distinguish writes from write_zeroes and discard, it's better to postpone this
>>>>>>>>> improvement until after the backup-top filter is merged.
>>>>>>>>
>>>>>>>> But do you need to distinguish it?  Why not just keep track of memory
>>>>>>>> usage and put the current I/O coroutine to sleep in a CoQueue or
>>>>>>>> something, and wake that up at the end of backup_do_cow()?
>>>>>>>>
>>>>>>>
>>>>>>> 1. Because if we _can_ allow doubling of memory, it's more effective to not restrict allocations on
>>>>>>> guest writes. It just seems to be a more effective technique.
>>>>>>
>>>>>> But the problem with backup and zero writes/discards is that the memory
>>>>>> is not doubled.  The request doesn’t need any memory, but the CBW
>>>>>> operation does, and maybe lots of it.
>>>>>>
>>>>>> So the guest may issue many zero writes/discards in parallel and thus
>>>>>> exhaust memory on the host.
>>>>>
>>>>> So this is the reason to separate writes from write-zeroes/discards. So at least writes will be happy. And I
>>>>> think that writes are a more frequent request than write-zeroes/discards.
>>>>
>>>> But that makes it complicated for no practical gain whatsoever.
>>>>
>>>>>>
>>>>>>> 2. Anyway, I'd allow some always-available allocation size - let it be one cluster, which corresponds
>>>>>>> to the current behavior and prevents a guest I/O hang in the worst case.
>>>>>>
>>>>>> The guest would only hang if we have to copy more than e.g. 64 MB at
>>>>>> a time.  At which point I think it’s not unreasonable to sequentialize
>>>>>> requests.
>>>>
>>>> Because of this.  How is it bad to start sequentializing writes when the
>>>> data exceeds 64 MB?
>>>>
>>>
>>> So you want a total memory limit of 64 MB? (with a possible parameter like in mirror)
>>>
>>> And the allocation algorithm to copy count bytes:
>>>
>>> if free_mem >= count: allocate count bytes
>>> else if free_mem >= cluster: allocate a cluster and copy in a loop
>>> else: wait in a co-queue until some memory is available, then retry
>>>
>>> Is that OK for you?
>>
>> Sounds good to me, although I don’t know whether the second branch is
>> necessary.  As I’ve said, the total limit is just insurance against a
>> guest that does some crazy stuff.
>>
> 
> I'm afraid that if there were one big request, it might wait forever while smaller
> requests eat most of the available memory. So it would be an unfair queue: smaller
> requests would get higher priority in the low-memory case. With [2] it becomes fairer.

OK.  Sounds reasonable.

Max



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v3 3/7] block/io: handle alignment and max_transfer for copy_range
  2019-08-12 14:48   ` Max Reitz
@ 2019-08-20 15:18     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 24+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-20 15:18 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: fam, kwolf, Denis Lunev, qemu-devel, armbru, stefanha, jsnow

12.08.2019 17:48, Max Reitz wrote:
> On 10.08.19 21:31, Vladimir Sementsov-Ogievskiy wrote:
>> copy_range ignores these limitations, let's improve it.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block/io.c | 44 ++++++++++++++++++++++++++++++++++++--------
>>   1 file changed, 36 insertions(+), 8 deletions(-)
> 
> Reviewed-by: Max Reitz <mreitz@redhat.com>
> 

Hmm. Now I think that the next patch is questionable, and I still don't see the right
way to organize limiting of request length and memory allocation in conjunction with
async requests in backup.

So I'll send the next version of the "improvements" without it (there are already
enough simpler patches).

And this patch becomes something separate. Do you think we need it anyway? If yes,
please queue it separately. It may be better to return ENOTSUP on too-big requests
too, to keep the code simpler and let callers optimize their copying loops
themselves.
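
For illustration, such a caller-side fallback could look roughly like this
(a sketch only: backup_copy_chunk is a made-up name, and here only -ENOTSUP
triggers the fallback, whereas the current code disables copy_range on any
failure):

static int coroutine_fn backup_copy_chunk(BackupBlockJob *job,
                                          int64_t start, int64_t end,
                                          bool is_write_notifier,
                                          bool *error_is_read,
                                          void **bounce_buffer)
{
    if (job->use_copy_range) {
        int ret = backup_cow_with_offload(job, start, end, is_write_notifier);
        if (ret != -ENOTSUP) {
            return ret; /* copied successfully, or a real I/O error */
        }
        /* request too big or unsupported: do the copying loop ourselves */
        job->use_copy_range = false;
    }
    return backup_cow_with_bounce_buffer(job, start, end, is_write_notifier,
                                         error_is_read, bounce_buffer);
}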

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 24+ messages in thread

Thread overview: 24+ messages
2019-08-10 19:31 [Qemu-devel] [PATCH v3 0/7] backup improvements Vladimir Sementsov-Ogievskiy
2019-08-10 19:31 ` [Qemu-devel] [PATCH v3 1/7] block/backup: deal with zero detection Vladimir Sementsov-Ogievskiy
2019-08-10 19:31 ` [Qemu-devel] [PATCH v3 2/7] block/backup: refactor write_flags Vladimir Sementsov-Ogievskiy
2019-08-10 19:31 ` [Qemu-devel] [PATCH v3 3/7] block/io: handle alignment and max_transfer for copy_range Vladimir Sementsov-Ogievskiy
2019-08-12 14:48   ` Max Reitz
2019-08-20 15:18     ` Vladimir Sementsov-Ogievskiy
2019-08-10 19:31 ` [Qemu-devel] [PATCH v3 4/7] block/backup: drop handling of " Vladimir Sementsov-Ogievskiy
2019-08-10 19:31 ` [Qemu-devel] [PATCH v3 5/7] block/backup: fix backup_cow_with_offload for last cluster Vladimir Sementsov-Ogievskiy
2019-08-10 19:31 ` [Qemu-devel] [PATCH v3 6/7] block/backup: teach backup_cow_with_bounce_buffer to copy more at once Vladimir Sementsov-Ogievskiy
2019-08-12 15:10   ` Max Reitz
2019-08-12 15:47     ` Vladimir Sementsov-Ogievskiy
2019-08-12 16:11       ` Max Reitz
2019-08-12 16:37         ` Vladimir Sementsov-Ogievskiy
2019-08-13 14:14           ` Vladimir Sementsov-Ogievskiy
2019-08-13 14:23             ` Max Reitz
2019-08-13 14:39               ` Vladimir Sementsov-Ogievskiy
2019-08-13 14:57                 ` Max Reitz
2019-08-13 15:00                   ` Vladimir Sementsov-Ogievskiy
2019-08-13 15:02                     ` Max Reitz
2019-08-13 15:32                       ` Vladimir Sementsov-Ogievskiy
2019-08-13 16:30                         ` Max Reitz
2019-08-13 16:45                           ` Vladimir Sementsov-Ogievskiy
2019-08-13 16:51                             ` Max Reitz
2019-08-10 19:31 ` [Qemu-devel] [PATCH v3 7/7] block/backup: merge duplicated logic into backup_do_cow Vladimir Sementsov-Ogievskiy
