* [Qemu-devel] [PATCH] mirror: Improve zero-write and discard with fragmented image
@ 2015-11-06 10:22 Fam Zheng
  2015-11-06 18:36 ` [Qemu-devel] [Qemu-block] " Max Reitz
  2015-11-09 16:04 ` [Qemu-devel] " Kevin Wolf
  0 siblings, 2 replies; 10+ messages in thread
From: Fam Zheng @ 2015-11-06 10:22 UTC
  To: qemu-devel; +Cc: Kevin Wolf, pbonzini, Jeff Cody, qemu-block

The "pnum < nb_sectors" condition for deciding whether to actually copy
data is unnecessarily strict, and the qiov initialization is
unnecessary too, for both the bdrv_aio_write_zeroes and bdrv_aio_discard
branches.

Reorganize mirror_iteration flow so that we:

    1) Find the contiguous zero/discarded sectors with
    bdrv_get_block_status_above() before deciding what to do. We query
    s->buf_size sized blocks at a time.

    2) If the sectors in question are zeroed/discarded and aligned to
    target cluster, issue zero write or discard accordingly. It's done
    in mirror_do_zero_or_discard, where we don't add buffer to qiov.

    3) Otherwise, do the same loop as before in mirror_do_read.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block/mirror.c | 161 +++++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 127 insertions(+), 34 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index b1252a1..ac796b4 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -157,28 +157,21 @@ static void mirror_read_complete(void *opaque, int ret)
                     mirror_write_complete, op);
 }
 
-static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
+static uint64_t mirror_do_read(MirrorBlockJob *s)
 {
     BlockDriverState *source = s->common.bs;
-    int nb_sectors, sectors_per_chunk, nb_chunks;
-    int64_t end, sector_num, next_chunk, next_sector, hbitmap_next_sector;
+    int sectors_per_chunk, nb_sectors, nb_chunks;
+    int64_t end, next_chunk, next_sector, hbitmap_next_sector, sector_num;
     uint64_t delay_ns = 0;
     MirrorOp *op;
-    int pnum;
-    int64_t ret;
 
-    s->sector_num = hbitmap_iter_next(&s->hbi);
-    if (s->sector_num < 0) {
-        bdrv_dirty_iter_init(s->dirty_bitmap, &s->hbi);
-        s->sector_num = hbitmap_iter_next(&s->hbi);
-        trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap));
-        assert(s->sector_num >= 0);
-    }
+    sector_num = s->sector_num;
+    sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
+    end = s->bdev_length / BDRV_SECTOR_SIZE;
 
+    next_sector = sector_num;
+    next_chunk = sector_num / sectors_per_chunk;
     hbitmap_next_sector = s->sector_num;
-    sector_num = s->sector_num;
-    sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
-    end = s->bdev_length / BDRV_SECTOR_SIZE;
 
     /* Extend the QEMUIOVector to include all adjacent blocks that will
      * be copied in this operation.
@@ -198,14 +191,6 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
     next_sector = sector_num;
     next_chunk = sector_num / sectors_per_chunk;
 
-    /* Wait for I/O to this cluster (from a previous iteration) to be done.  */
-    while (test_bit(next_chunk, s->in_flight_bitmap)) {
-        trace_mirror_yield_in_flight(s, sector_num, s->in_flight);
-        s->waiting_for_io = true;
-        qemu_coroutine_yield();
-        s->waiting_for_io = false;
-    }
-
     do {
         int added_sectors, added_chunks;
 
@@ -301,24 +286,132 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
     s->sectors_in_flight += nb_sectors;
     trace_mirror_one_iteration(s, sector_num, nb_sectors);
 
-    ret = bdrv_get_block_status_above(source, NULL, sector_num,
-                                      nb_sectors, &pnum);
-    if (ret < 0 || pnum < nb_sectors ||
-            (ret & BDRV_BLOCK_DATA && !(ret & BDRV_BLOCK_ZERO))) {
-        bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors,
-                       mirror_read_complete, op);
-    } else if (ret & BDRV_BLOCK_ZERO) {
+    bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors,
+                   mirror_read_complete, op);
+    return delay_ns;
+}
+
+static uint64_t mirror_do_zero_or_discard(MirrorBlockJob *s,
+                                          int64_t sector_num,
+                                          int nb_sectors,
+                                          bool is_discard)
+{
+    int sectors_per_chunk, nb_chunks;
+    int64_t next_chunk, next_sector, hbitmap_next_sector;
+    uint64_t delay_ns = 0;
+    MirrorOp *op;
+
+    sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
+    assert(nb_sectors >= sectors_per_chunk);
+    next_chunk = sector_num / sectors_per_chunk;
+    nb_chunks = DIV_ROUND_UP(nb_sectors, sectors_per_chunk);
+    bitmap_set(s->in_flight_bitmap, next_chunk, nb_chunks);
+    delay_ns = ratelimit_calculate_delay(&s->limit, nb_sectors);
+
+    /* Allocate a MirrorOp that is used as an AIO callback. The qiov is zeroed
+     * out so the freeing in iteration is a nop. */
+    op = g_new0(MirrorOp, 1);
+    op->s = s;
+    op->sector_num = sector_num;
+    op->nb_sectors = nb_sectors;
+
+    /* Advance the HBitmapIter in parallel, so that we do not examine
+     * the same sector twice.
+     */
+    hbitmap_next_sector = sector_num;
+    next_sector = sector_num + nb_sectors;
+    while (next_sector > hbitmap_next_sector) {
+        hbitmap_next_sector = hbitmap_iter_next(&s->hbi);
+        if (hbitmap_next_sector < 0) {
+            break;
+        }
+    }
+
+    bdrv_reset_dirty_bitmap(s->dirty_bitmap, sector_num, nb_sectors);
+    s->in_flight++;
+    s->sectors_in_flight += nb_sectors;
+    if (is_discard) {
+        bdrv_aio_discard(s->target, sector_num, op->nb_sectors,
+                         mirror_write_complete, op);
+    } else {
         bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sectors,
                               s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
                               mirror_write_complete, op);
-    } else {
-        assert(!(ret & BDRV_BLOCK_DATA));
-        bdrv_aio_discard(s->target, sector_num, op->nb_sectors,
-                         mirror_write_complete, op);
     }
+
     return delay_ns;
 }
 
+static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
+{
+    BlockDriverState *source = s->common.bs;
+    int sectors_per_chunk;
+    int64_t sector_num, next_chunk;
+    int ret;
+    int contiguous_sectors = s->buf_size >> BDRV_SECTOR_BITS;
+    enum MirrorMethod {
+        MIRROR_METHOD_COPY,
+        MIRROR_METHOD_ZERO,
+        MIRROR_METHOD_DISCARD
+    } mirror_method;
+
+    s->sector_num = hbitmap_iter_next(&s->hbi);
+    if (s->sector_num < 0) {
+        bdrv_dirty_iter_init(s->dirty_bitmap, &s->hbi);
+        s->sector_num = hbitmap_iter_next(&s->hbi);
+        trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap));
+        assert(s->sector_num >= 0);
+    }
+
+    sector_num = s->sector_num;
+    sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
+    next_chunk = sector_num / sectors_per_chunk;
+
+    /* Wait for I/O to this cluster (from a previous iteration) to be done.  */
+    while (test_bit(next_chunk, s->in_flight_bitmap)) {
+        trace_mirror_yield_in_flight(s, sector_num, s->in_flight);
+        s->waiting_for_io = true;
+        qemu_coroutine_yield();
+        s->waiting_for_io = false;
+    }
+
+    mirror_method = MIRROR_METHOD_COPY;
+    ret = bdrv_get_block_status_above(source, NULL, sector_num,
+                                      contiguous_sectors,
+                                      &contiguous_sectors);
+
+    contiguous_sectors -= contiguous_sectors % sectors_per_chunk;
+    if (ret < 0 || contiguous_sectors < sectors_per_chunk) {
+        contiguous_sectors = sectors_per_chunk;
+    } else if (!(ret & BDRV_BLOCK_DATA)) {
+        int64_t target_sector_num;
+        int target_nb_sectors;
+        bdrv_round_to_clusters(s->target, sector_num, contiguous_sectors,
+                               &target_sector_num, &target_nb_sectors);
+        if (target_sector_num == sector_num &&
+            target_nb_sectors == contiguous_sectors) {
+            mirror_method = ret & BDRV_BLOCK_ZERO ?
+                                MIRROR_METHOD_ZERO :
+                                MIRROR_METHOD_DISCARD;
+        }
+    }
+
+    switch (mirror_method) {
+    case MIRROR_METHOD_COPY:
+        return mirror_do_read(s);
+    case MIRROR_METHOD_ZERO:
+        return mirror_do_zero_or_discard(s, sector_num,
+                                         contiguous_sectors,
+                                         false);
+    case MIRROR_METHOD_DISCARD:
+        return mirror_do_zero_or_discard(s, sector_num,
+                                         contiguous_sectors,
+                                         false);
+    default:
+        abort();
+    }
+}
+
 static void mirror_free_init(MirrorBlockJob *s)
 {
     int granularity = s->granularity;
-- 
2.4.3


* Re: [Qemu-devel] [Qemu-block] [PATCH] mirror: Improve zero-write and discard with fragmented image
  2015-11-06 10:22 [Qemu-devel] [PATCH] mirror: Improve zero-write and discard with fragmented image Fam Zheng
@ 2015-11-06 18:36 ` Max Reitz
  2015-11-09  2:12   ` Fam Zheng
  2015-11-09 16:04 ` [Qemu-devel] " Kevin Wolf
  1 sibling, 1 reply; 10+ messages in thread
From: Max Reitz @ 2015-11-06 18:36 UTC
  To: Fam Zheng, qemu-devel; +Cc: Kevin Wolf, pbonzini, qemu-block


On 06.11.2015 11:22, Fam Zheng wrote:
> The "pnum < nb_sectors" condition for deciding whether to actually copy
> data is unnecessarily strict, and the qiov initialization is
> unnecessary too, for both the bdrv_aio_write_zeroes and bdrv_aio_discard
> branches.
> 
> Reorganize mirror_iteration flow so that we:
> 
>     1) Find the contiguous zero/discarded sectors with
>     bdrv_get_block_status_above() before deciding what to do. We query
>     s->buf_size sized blocks at a time.
> 
>     2) If the sectors in question are zeroed/discarded and aligned to
>     target cluster, issue zero write or discard accordingly. It's done
>     in mirror_do_zero_or_discard, where we don't add buffer to qiov.
> 
>     3) Otherwise, do the same loop as before in mirror_do_read.
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  block/mirror.c | 161 +++++++++++++++++++++++++++++++++++++++++++++------------
>  1 file changed, 127 insertions(+), 34 deletions(-)

Looks good overall, some comments below.

Max

> diff --git a/block/mirror.c b/block/mirror.c
> index b1252a1..ac796b4 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -157,28 +157,21 @@ static void mirror_read_complete(void *opaque, int ret)
>                      mirror_write_complete, op);
>  }
>  
> -static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
> +static uint64_t mirror_do_read(MirrorBlockJob *s)
>  {
>      BlockDriverState *source = s->common.bs;
> -    int nb_sectors, sectors_per_chunk, nb_chunks;
> -    int64_t end, sector_num, next_chunk, next_sector, hbitmap_next_sector;
> +    int sectors_per_chunk, nb_sectors, nb_chunks;
> +    int64_t end, next_chunk, next_sector, hbitmap_next_sector, sector_num;
>      uint64_t delay_ns = 0;
>      MirrorOp *op;
> -    int pnum;
> -    int64_t ret;
>  
> -    s->sector_num = hbitmap_iter_next(&s->hbi);
> -    if (s->sector_num < 0) {
> -        bdrv_dirty_iter_init(s->dirty_bitmap, &s->hbi);
> -        s->sector_num = hbitmap_iter_next(&s->hbi);
> -        trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap));
> -        assert(s->sector_num >= 0);
> -    }
> +    sector_num = s->sector_num;
> +    sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
> +    end = s->bdev_length / BDRV_SECTOR_SIZE;
>  
> +    next_sector = sector_num;
> +    next_chunk = sector_num / sectors_per_chunk;

@next_sector and @next_chunk set here...

>      hbitmap_next_sector = s->sector_num;
> -    sector_num = s->sector_num;
> -    sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
> -    end = s->bdev_length / BDRV_SECTOR_SIZE;
>  
>      /* Extend the QEMUIOVector to include all adjacent blocks that will
>       * be copied in this operation.
> @@ -198,14 +191,6 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
>      next_sector = sector_num;
>      next_chunk = sector_num / sectors_per_chunk;

...and here already. So the above seems superfluous, considering that
they are not read in between.

(If you keep hbitmap_next_sector = s->sector_num; above the sector_num =
... block, that may reduce conflicts further)

>  
> -    /* Wait for I/O to this cluster (from a previous iteration) to be done.  */
> -    while (test_bit(next_chunk, s->in_flight_bitmap)) {
> -        trace_mirror_yield_in_flight(s, sector_num, s->in_flight);
> -        s->waiting_for_io = true;
> -        qemu_coroutine_yield();
> -        s->waiting_for_io = false;
> -    }
> -
>      do {
>          int added_sectors, added_chunks;
>  
> @@ -301,24 +286,132 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
>      s->sectors_in_flight += nb_sectors;
>      trace_mirror_one_iteration(s, sector_num, nb_sectors);
>  
> -    ret = bdrv_get_block_status_above(source, NULL, sector_num,
> -                                      nb_sectors, &pnum);
> -    if (ret < 0 || pnum < nb_sectors ||
> -            (ret & BDRV_BLOCK_DATA && !(ret & BDRV_BLOCK_ZERO))) {
> -        bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors,
> -                       mirror_read_complete, op);
> -    } else if (ret & BDRV_BLOCK_ZERO) {
> +    bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors,
> +                   mirror_read_complete, op);
> +    return delay_ns;
> +}
> +
> +static uint64_t mirror_do_zero_or_discard(MirrorBlockJob *s,
> +                                          int64_t sector_num,
> +                                          int nb_sectors,
> +                                          bool is_discard)
> +{
> +    int sectors_per_chunk, nb_chunks;
> +    int64_t next_chunk, next_sector, hbitmap_next_sector;
> +    uint64_t delay_ns = 0;
> +    MirrorOp *op;
> +
> +    sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
> +    assert(nb_sectors >= sectors_per_chunk);
> +    next_chunk = sector_num / sectors_per_chunk;
> +    nb_chunks = DIV_ROUND_UP(nb_sectors, sectors_per_chunk);
> +    bitmap_set(s->in_flight_bitmap, next_chunk, nb_chunks);
> +    delay_ns = ratelimit_calculate_delay(&s->limit, nb_sectors);
> +
> +    /* Allocate a MirrorOp that is used as an AIO callback. The qiov is zeroed
> +     * out so the freeing in iteration is a nop. */
> +    op = g_new0(MirrorOp, 1);
> +    op->s = s;
> +    op->sector_num = sector_num;
> +    op->nb_sectors = nb_sectors;
> +
> +    /* Advance the HBitmapIter in parallel, so that we do not examine
> +     * the same sector twice.
> +     */
> +    hbitmap_next_sector = sector_num;
> +    next_sector = sector_num + nb_sectors;
> +    while (next_sector > hbitmap_next_sector) {
> +        hbitmap_next_sector = hbitmap_iter_next(&s->hbi);
> +        if (hbitmap_next_sector < 0) {
> +            break;
> +        }
> +    }
> +
> +    bdrv_reset_dirty_bitmap(s->dirty_bitmap, sector_num, nb_sectors);
> +    s->in_flight++;
> +    s->sectors_in_flight += nb_sectors;
> +    if (is_discard) {
> +        bdrv_aio_discard(s->target, sector_num, op->nb_sectors,
> +                         mirror_write_complete, op);
> +    } else {
>          bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sectors,
>                                s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
>                                mirror_write_complete, op);
> -    } else {
> -        assert(!(ret & BDRV_BLOCK_DATA));
> -        bdrv_aio_discard(s->target, sector_num, op->nb_sectors,
> -                         mirror_write_complete, op);
>      }
> +
>      return delay_ns;
>  }
>  
> +static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
> +{
> +    BlockDriverState *source = s->common.bs;
> +    int sectors_per_chunk;
> +    int64_t sector_num, next_chunk;
> +    int ret;
> +    int contiguous_sectors = s->buf_size >> BDRV_SECTOR_BITS;
> +    enum MirrorMethod {
> +        MIRROR_METHOD_COPY,
> +        MIRROR_METHOD_ZERO,
> +        MIRROR_METHOD_DISCARD
> +    } mirror_method;
> +
> +    s->sector_num = hbitmap_iter_next(&s->hbi);
> +    if (s->sector_num < 0) {
> +        bdrv_dirty_iter_init(s->dirty_bitmap, &s->hbi);
> +        s->sector_num = hbitmap_iter_next(&s->hbi);
> +        trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap));
> +        assert(s->sector_num >= 0);
> +    }
> +
> +    sector_num = s->sector_num;
> +    sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
> +    next_chunk = sector_num / sectors_per_chunk;
> +
> +    /* Wait for I/O to this cluster (from a previous iteration) to be done.  */
> +    while (test_bit(next_chunk, s->in_flight_bitmap)) {
> +        trace_mirror_yield_in_flight(s, sector_num, s->in_flight);
> +        s->waiting_for_io = true;
> +        qemu_coroutine_yield();
> +        s->waiting_for_io = false;
> +    }
> +
> +    mirror_method = MIRROR_METHOD_COPY;
> +    ret = bdrv_get_block_status_above(source, NULL, sector_num,
> +                                      contiguous_sectors,
> +                                      &contiguous_sectors);

We did have NULL as the base before, but is that correct? Shouldn't we
use backing_bs(source) or s->base if @sync is none or top?

Examining this is fun; the short answer is probably "NULL is wrong, but
it's the best we can do here".

(1) with NULL here:

$ ./qemu-img create -f qcow2 base.qcow2 64M
$ ./qemu-io -c 'write -P 42 0 64k' base.qcow2
$ ./qemu-img create -f qcow2 -o compat=0.10 -b base.qcow2 top.qcow2
# compat=0.10 is important because otherwise discard will create a
# zero cluster instead of truly discarding
$ (echo "{'execute':'qmp_capabilities'}
	 {'execute':'drive-mirror','arguments':
             {'device':'drive0','target':'out.qcow2',
              'format':'qcow2','sync':'top'}}
         {'execute':'human-monitor-command','arguments':
             {'command-line':'qemu-io drive0 \"discard 0 64k\"'}}
         {'execute':'block-job-complete','arguments':
             {'device':'drive0'}}
         {'execute':'quit'}") | \
  x86_64-softmmu/qemu-system-x86_64 -qmp stdio \
      -drive if=none,file=top.qcow2,id=drive0,discard=unmap
{"QMP": {"version": {"qemu": {"micro": 50, "minor": 4, "major": 2},
"package": ""}, "capabilities": []}}
{"return": {}}
Formatting 'out.qcow2', fmt=qcow2 size=67108864 backing_file=base.qcow2
backing_fmt=qcow2 encryption=off cluster_size=65536 lazy_refcounts=off
refcount_bits=16
{"return": {}}
discard 65536/65536 bytes at offset 0
64 KiB, 1 ops; 0.0000 sec (3.815 GiB/sec and 62500.0000 ops/sec)
{"return": ""}
{"timestamp": {"seconds": 1446830775, "microseconds": 198114}, "event":
"BLOCK_JOB_READY", "data": {"device": "drive0", "len": 0, "offset": 0,
"speed": 0, "type": "mirror"}}
{"return": {}}
{"return": {}}
{"timestamp": {"seconds": 1446830775, "microseconds": 198371}, "event":
"SHUTDOWN"}
{"timestamp": {"seconds": 1446830775, "microseconds": 205506}, "event":
"BLOCK_JOB_COMPLETED", "data": {"device": "drive0", "len": 65536,
"offset": 65536, "speed": 0, "type": "mirror"}}

(Note that the discard must have been executed before BLOCK_JOB_READY;
it's a bit racy.)

Now we check:

$ ./qemu-io -c 'read -P 0 0 64k' out.qcow2
Pattern verification failed at offset 0, 65536 bytes
read 65536/65536 bytes at offset 0
64 KiB, 1 ops; 0.0000 sec (856.164 MiB/sec and 13698.6301 ops/sec)
$ ./qemu-io -c 'read -P 42 0 64k' out.qcow2
read 65536/65536 bytes at offset 0
64 KiB, 1 ops; 0.0000 sec (868.056 MiB/sec and 13888.8889 ops/sec)

Well, that seems reasonable, considering that out.qcow2 has base.qcow2
as the backing file, but:

$ ./qemu-img map out.qcow2
Offset          Length          Mapped to       File
0               0x10000         0x50000         out.qcow2

Hm... Well, at least out.qcow2 and top.qcow2 return the same values when
read, but we don't need to copy that from the backing file. Just leaving
the cluster unallocated would have been enough.


(2) So with s->base instead of NULL we get:

$ ./qemu-io -c 'read -P 0 0 64k' out.qcow2
read 65536/65536 bytes at offset 0
64 KiB, 1 ops; 0.0000 sec (856.164 MiB/sec and 13698.6301 ops/sec)
$ ./qemu-io -c 'read -P 42 0 64k' out.qcow2
Pattern verification failed at offset 0, 65536 bytes
read 65536/65536 bytes at offset 0
64 KiB, 1 ops; 0.0000 sec (892.857 MiB/sec and 14285.7143 ops/sec)

And:

$ ./qemu-img map out.qcow2
Offset          Length          Mapped to       File

So the map output looks right, but now we don't read the backing file
anymore. That's because for qcow2 v3, qemu writes zero clusters when
asked to discard (that's why I had to use compat=0.10 for top.qcow2).

Using a compat=0.10 image with @mode = existing fixes that. But well...


So the issue is: We don't have a reliable function to punch a hole into
an overlay file. discard could do that (and does so for qcow2 v2), but
we considered returning zeros the better option if possible because
people might find that more reasonable.

So while it would be best to copy holes in the overlay file created by
discards to the mirrored file, there is no way for us to do that.
Therefore, the best we can do is copy from the backing file.


So, OK, NULL it is.

> +
> +    contiguous_sectors -= contiguous_sectors % sectors_per_chunk;
> +    if (ret < 0 || contiguous_sectors < sectors_per_chunk) {
> +        contiguous_sectors = sectors_per_chunk;
> +    } else if (!(ret & BDRV_BLOCK_DATA)) {
> +        int64_t target_sector_num;
> +        int target_nb_sectors;
> +        bdrv_round_to_clusters(s->target, sector_num, contiguous_sectors,
> +                               &target_sector_num, &target_nb_sectors);
> +        if (target_sector_num == sector_num &&
> +            target_nb_sectors == contiguous_sectors) {
> +            mirror_method = ret & BDRV_BLOCK_ZERO ?
> +                                MIRROR_METHOD_ZERO :
> +                                MIRROR_METHOD_DISCARD;
> +        }
> +    }
> +
> +    switch (mirror_method) {
> +    case MIRROR_METHOD_COPY:
> +        return mirror_do_read(s);
> +    case MIRROR_METHOD_ZERO:
> +        return mirror_do_zero_or_discard(s, sector_num,
> +                                         contiguous_sectors,
> +                                         false);
> +    case MIRROR_METHOD_DISCARD:
> +        return mirror_do_zero_or_discard(s, sector_num,
> +                                         contiguous_sectors,
> +                                         false);

s/false/true/?

> +    default:
> +        abort();
> +    }
> +}
> +
>  static void mirror_free_init(MirrorBlockJob *s)
>  {
>      int granularity = s->granularity;
> 





* Re: [Qemu-devel] [Qemu-block] [PATCH] mirror: Improve zero-write and discard with fragmented image
  2015-11-06 18:36 ` [Qemu-devel] [Qemu-block] " Max Reitz
@ 2015-11-09  2:12   ` Fam Zheng
  0 siblings, 0 replies; 10+ messages in thread
From: Fam Zheng @ 2015-11-09  2:12 UTC
  To: Max Reitz; +Cc: Kevin Wolf, pbonzini, qemu-devel, qemu-block

On Fri, 11/06 19:36, Max Reitz wrote:
> > +    next_sector = sector_num;
> > +    next_chunk = sector_num / sectors_per_chunk;
> 
> @next_sector and @next_chunk set here...
> 
> >      hbitmap_next_sector = s->sector_num;
> > -    sector_num = s->sector_num;
> > -    sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
> > -    end = s->bdev_length / BDRV_SECTOR_SIZE;
> >  
> >      /* Extend the QEMUIOVector to include all adjacent blocks that will
> >       * be copied in this operation.
> > @@ -198,14 +191,6 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
> >      next_sector = sector_num;
> >      next_chunk = sector_num / sectors_per_chunk;
> 
> ...and here already. So the above seems superfluous, considering that
> they are not read in between.
> 
> (If you keep hbitmap_next_sector = s->sector_num; above the sector_num =
> ... block, that may reduce conflicts further)

Indeed, thanks for noticing this.

> > +    case MIRROR_METHOD_DISCARD:
> > +        return mirror_do_zero_or_discard(s, sector_num,
> > +                                         contiguous_sectors,
> > +                                         false);
> 
> s/false/true/?

Yes, thanks.

Fam


* Re: [Qemu-devel] [PATCH] mirror: Improve zero-write and discard with fragmented image
  2015-11-06 10:22 [Qemu-devel] [PATCH] mirror: Improve zero-write and discard with fragmented image Fam Zheng
  2015-11-06 18:36 ` [Qemu-devel] [Qemu-block] " Max Reitz
@ 2015-11-09 16:04 ` Kevin Wolf
  2015-11-09 16:18   ` Paolo Bonzini
  1 sibling, 1 reply; 10+ messages in thread
From: Kevin Wolf @ 2015-11-09 16:04 UTC
  To: Fam Zheng; +Cc: pbonzini, Jeff Cody, qemu-devel, qemu-block

On 06.11.2015 11:22, Fam Zheng wrote:
> The "pnum < nb_sectors" condition for deciding whether to actually copy
> data is unnecessarily strict, and the qiov initialization is
> unnecessary too, for both the bdrv_aio_write_zeroes and bdrv_aio_discard
> branches.
> 
> Reorganize mirror_iteration flow so that we:
> 
>     1) Find the contiguous zero/discarded sectors with
>     bdrv_get_block_status_above() before deciding what to do. We query
>     s->buf_size sized blocks at a time.
> 
>     2) If the sectors in question are zeroed/discarded and aligned to
>     target cluster, issue zero write or discard accordingly. It's done
>     in mirror_do_zero_or_discard, where we don't add buffer to qiov.
> 
>     3) Otherwise, do the same loop as before in mirror_do_read.
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>

I'm not sure where in the patch to comment on this, so I'll just do it
here right in the beginning.

I'm concerned that we need to be more careful about races in this patch,
in particular regarding the bitmaps. I think the conditions for the two
bitmaps are:

* Dirty bitmap: We must clear the bit after finding the next piece of
  data to be mirrored, but before we yield after getting information
  that we use for the decision which kind of operation we need.

  In other words, we need to clear the dirty bitmap bit before calling
  bdrv_get_block_status_above(), because that's both the function that
  retrieves information about the next chunk and also a function that
  can yield.

  If after this point the data is written to, we need to mirror it
  again.

* In-flight bitmap: We need to make sure that we never mirror the same
  data twice at the same time, as older data could then overwrite newer
  data.

  Strictly speaking, it looks to me as if this meant we can delay
  setting the bit until before we issue an AIO operation. It might be
  more obviously correct to set it at the same time as the dirty bitmap
  is cleared.
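
Roughly, the ordering I have in mind, taking the "more obviously
correct" option for the in-flight bits (an untested sketch that only
uses helpers the patch already uses):

    /* Clear the dirty bits before the first yield point, so that a
     * guest write between here and the actual copy re-dirties the
     * sectors and they are mirrored again later. */
    bdrv_reset_dirty_bitmap(s->dirty_bitmap, sector_num, nb_sectors);

    /* Claim the chunks before any request is issued, so that no
     * concurrent iteration can start a second, possibly older, copy
     * of them. */
    bitmap_set(s->in_flight_bitmap, next_chunk, nb_chunks);

    /* Only now query the block status; this can yield. */
    ret = bdrv_get_block_status_above(source, NULL, sector_num,
                                      nb_sectors, &pnum);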

Otherwise, this looks good to me.

Kevin


* Re: [Qemu-devel] [PATCH] mirror: Improve zero-write and discard with fragmented image
  2015-11-09 16:04 ` [Qemu-devel] " Kevin Wolf
@ 2015-11-09 16:18   ` Paolo Bonzini
  2015-11-09 16:29     ` Kevin Wolf
  0 siblings, 1 reply; 10+ messages in thread
From: Paolo Bonzini @ 2015-11-09 16:18 UTC
  To: Kevin Wolf, Fam Zheng; +Cc: Jeff Cody, qemu-devel, qemu-block



On 09/11/2015 17:04, Kevin Wolf wrote:
> On 06.11.2015 11:22, Fam Zheng wrote:
>> The "pnum < nb_sectors" condition for deciding whether to actually copy
>> data is unnecessarily strict, and the qiov initialization is
>> unnecessary too, for both the bdrv_aio_write_zeroes and bdrv_aio_discard
>> branches.
>>
>> Reorganize mirror_iteration flow so that we:
>>
>>     1) Find the contiguous zero/discarded sectors with
>>     bdrv_get_block_status_above() before deciding what to do. We query
>>     s->buf_size sized blocks at a time.
>>
>>     2) If the sectors in question are zeroed/discarded and aligned to
>>     target cluster, issue zero write or discard accordingly. It's done
>>     in mirror_do_zero_or_discard, where we don't add buffer to qiov.
>>
>>     3) Otherwise, do the same loop as before in mirror_do_read.
>>
>> Signed-off-by: Fam Zheng <famz@redhat.com>
> 
> I'm not sure where in the patch to comment on this, so I'll just do it
> here right in the beginning.
> 
> I'm concerned that we need to be more careful about races in this patch,
> in particular regarding the bitmaps. I think the conditions for the two
> bitmaps are:
> 
> * Dirty bitmap: We must clear the bit after finding the next piece of
>   data to be mirrored, but before we yield after getting information
>   that we use for the decision which kind of operation we need.
> 
>   In other words, we need to clear the dirty bitmap bit before calling
>   bdrv_get_block_status_above(), because that's both the function that
>   retrieves information about the next chunk and also a function that
>   can yield.
> 
>   If after this point the data is written to, we need to mirror it
>   again.

With Fam's patch, that's not trivial for two reasons:

1) bdrv_get_block_status_above() can return a smaller amount than what
is asked.

2) the "read and write" case can handle s->granularity sectors per
iteration (many of them can be coalesced, but still that's how the
iteration works).

The simplest solution is to perform the query with s->granularity size
rather than s->buf_size.
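
I.e., roughly (sketch):

    ret = bdrv_get_block_status_above(source, NULL, sector_num,
                                      sectors_per_chunk, &pnum);

so that a single query never covers more than the one chunk whose dirty
bit is about to be cleared.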

Paolo

> * In-flight bitmap: We need to make sure that we never mirror the same
>   data twice at the same time, as older data could then overwrite newer
>   data.
> 
>   Strictly speaking, it looks to me as if this meant we can delay
>   setting the bit until before we issue an AIO operation. It might be
>   more obviously correct to set it at the same time as the dirty bitmap
>   is cleared.


* Re: [Qemu-devel] [PATCH] mirror: Improve zero-write and discard with fragmented image
  2015-11-09 16:18   ` Paolo Bonzini
@ 2015-11-09 16:29     ` Kevin Wolf
  2015-11-10  6:14       ` Fam Zheng
  0 siblings, 1 reply; 10+ messages in thread
From: Kevin Wolf @ 2015-11-09 16:29 UTC
  To: Paolo Bonzini; +Cc: Jeff Cody, Fam Zheng, qemu-devel, qemu-block

On 09.11.2015 17:18, Paolo Bonzini wrote:
> 
> 
> On 09/11/2015 17:04, Kevin Wolf wrote:
> > On 06.11.2015 11:22, Fam Zheng wrote:
> >> The "pnum < nb_sectors" condition for deciding whether to actually copy
> >> data is unnecessarily strict, and the qiov initialization is
> >> unnecessary too, for both the bdrv_aio_write_zeroes and bdrv_aio_discard
> >> branches.
> >>
> >> Reorganize mirror_iteration flow so that we:
> >>
> >>     1) Find the contiguous zero/discarded sectors with
> >>     bdrv_get_block_status_above() before deciding what to do. We query
> >>     s->buf_size sized blocks at a time.
> >>
> >>     2) If the sectors in question are zeroed/discarded and aligned to
> >>     target cluster, issue zero write or discard accordingly. It's done
> >>     in mirror_do_zero_or_discard, where we don't add buffer to qiov.
> >>
> >>     3) Otherwise, do the same loop as before in mirror_do_read.
> >>
> >> Signed-off-by: Fam Zheng <famz@redhat.com>
> > 
> > I'm not sure where in the patch to comment on this, so I'll just do it
> > here right in the beginning.
> > 
> > I'm concerned that we need to be more careful about races in this patch,
> > in particular regarding the bitmaps. I think the conditions for the two
> > bitmaps are:
> > 
> > * Dirty bitmap: We must clear the bit after finding the next piece of
> >   data to be mirrored, but before we yield after getting information
> >   that we use for the decision which kind of operation we need.
> > 
> >   In other words, we need to clear the dirty bitmap bit before calling
> >   bdrv_get_block_status_above(), because that's both the function that
> >   retrieves information about the next chunk and also a function that
> >   can yield.
> > 
> >   If after this point the data is written to, we need to mirror it
> >   again.
> 
> With Fam's patch, that's not trivial for two reasons:
> 
> 1) bdrv_get_block_status_above() can return a smaller amount than what
> is asked.
> 
> 2) the "read and write" case can handle s->granularity sectors per
> iteration (many of them can be coalesced, but still that's how the
> iteration works).
> 
> The simplest solution is to perform the query with s->granularity size
> rather than s->buf_size.

Then we end up with many small operations; that's not what we want.

Why can't we mark up to s->buf_size dirty clusters as clean first, then
query the status, and mark all of those that we can't handle dirty
again?
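
Untested sketch of what I mean (nb_sectors being the size of the dirty
extent, capped at s->buf_size):

    bdrv_reset_dirty_bitmap(s->dirty_bitmap, sector_num, nb_sectors);

    ret = bdrv_get_block_status_above(source, NULL, sector_num,
                                      nb_sectors, &pnum);
    if (ret < 0) {
        pnum = 0;
    }
    if (pnum < nb_sectors) {
        /* Re-dirty whatever the query didn't cover or we can't handle
         * as one operation; a later iteration picks it up again. */
        bdrv_set_dirty_bitmap(s->dirty_bitmap, sector_num + pnum,
                              nb_sectors - pnum);
        nb_sectors = pnum;
    }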

Kevin

> > * In-flight bitmap: We need to make sure that we never mirror the same
> >   data twice at the same time, as older data could then overwrite newer
> >   data.
> > 
> >   Strictly speaking, it looks to me as if this meant we can delay
> >   setting the bit until before we issue an AIO operation. It might be
> >   more obviously correct to set it at the same time as the dirty bitmap
> >   is cleared.
> 
> 


* Re: [Qemu-devel] [PATCH] mirror: Improve zero-write and discard with fragmented image
  2015-11-09 16:29     ` Kevin Wolf
@ 2015-11-10  6:14       ` Fam Zheng
  2015-11-10  9:01         ` Paolo Bonzini
  0 siblings, 1 reply; 10+ messages in thread
From: Fam Zheng @ 2015-11-10  6:14 UTC
  To: Kevin Wolf; +Cc: Paolo Bonzini, Jeff Cody, qemu-devel, qemu-block

On Mon, 11/09 17:29, Kevin Wolf wrote:
> On 09.11.2015 17:18, Paolo Bonzini wrote:
> > 
> > 
> > On 09/11/2015 17:04, Kevin Wolf wrote:
> > > On 06.11.2015 11:22, Fam Zheng wrote:
> > >> The "pnum < nb_sectors" condition for deciding whether to actually copy
> > >> data is unnecessarily strict, and the qiov initialization is
> > >> unnecessary too, for both the bdrv_aio_write_zeroes and bdrv_aio_discard
> > >> branches.
> > >>
> > >> Reorganize mirror_iteration flow so that we:
> > >>
> > >>     1) Find the contiguous zero/discarded sectors with
> > >>     bdrv_get_block_status_above() before deciding what to do. We query
> > >>     s->buf_size sized blocks at a time.
> > >>
> > >>     2) If the sectors in question are zeroed/discarded and aligned to
> > >>     target cluster, issue zero write or discard accordingly. It's done
> > >>     in mirror_do_zero_or_discard, where we don't add buffer to qiov.
> > >>
> > >>     3) Otherwise, do the same loop as before in mirror_do_read.
> > >>
> > >> Signed-off-by: Fam Zheng <famz@redhat.com>
> > > 
> > > I'm not sure where in the patch to comment on this, so I'll just do it
> > > here right in the beginning.
> > > 
> > > I'm concerned that we need to be more careful about races in this patch,
> > > in particular regarding the bitmaps. I think the conditions for the two
> > > bitmaps are:
> > > 
> > > * Dirty bitmap: We must clear the bit after finding the next piece of
> > >   data to be mirrored, but before we yield after getting information
> > >   that we use for the decision which kind of operation we need.
> > > 
> > >   In other words, we need to clear the dirty bitmap bit before calling
> > >   bdrv_get_block_status_above(), because that's both the function that
> > >   retrieves information about the next chunk and also a function that
> > >   can yield.
> > > 
> > >   If after this point the data is written to, we need to mirror it
> > >   again.
> > 
> > With Fam's patch, that's not trivial for two reasons:
> > 
> > 1) bdrv_get_block_status_above() can return a smaller amount than what
> > is asked.
> > 
> > 2) the "read and write" case can handle s->granularity sectors per
> > iteration (many of them can be coalesced, but still that's how the
> > iteration works).
> > 
> > The simplest solution is to perform the query with s->granularity size
> > rather than s->buf_size.
> 
> Then we end up with many small operations; that's not what we want.
> 
> Why can't we mark up to s->buf_size dirty clusters as clean first, then
> query the status, and mark all of those that we can't handle dirty
> again?

Then we may end up marking more clusters as dirty than we should.

Because all bdrv_set_dirty() and bdrv_set_dirty_bitmap() callers run in
coroutine context, we can introduce a CoMutex to let the bitmap reader
block bdrv_set_dirty() and bdrv_set_dirty_bitmap().
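
Untested sketch (dirty_bitmap_lock would be a new CoMutex field;
the writer side is bdrv_set_dirty() and friends, the reader side is
mirror_iteration()):

    /* writer, e.g. in bdrv_set_dirty() */
    qemu_co_mutex_lock(&bs->dirty_bitmap_lock);
    hbitmap_set(bitmap->bitmap, cur_sector, nb_sectors);
    qemu_co_mutex_unlock(&bs->dirty_bitmap_lock);

    /* reader, in mirror_iteration(): hold the lock across examining
     * the bitmap and clearing the bits, so no set can interleave */
    qemu_co_mutex_lock(&bs->dirty_bitmap_lock);
    sector_num = hbitmap_iter_next(&s->hbi);
    bdrv_reset_dirty_bitmap(s->dirty_bitmap, sector_num, nb_sectors);
    qemu_co_mutex_unlock(&bs->dirty_bitmap_lock);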

Fam

> 
> Kevin
> 
> > > * In-flight bitmap: We need to make sure that we never mirror the same
> > >   data twice at the same time, as older data could then overwrite newer
> > >   data.
> > > 
> > >   Strictly speaking, it looks to me as if this meant we can delay
> > >   setting the bit until before we issue an AIO operation. It might be
> > >   more obviously correct to set it at the same time as the dirty bitmap
> > >   is cleared.
> > 
> > 


* Re: [Qemu-devel] [PATCH] mirror: Improve zero-write and discard with fragmented image
  2015-11-10  6:14       ` Fam Zheng
@ 2015-11-10  9:01         ` Paolo Bonzini
  2015-11-10 10:12           ` Kevin Wolf
  0 siblings, 1 reply; 10+ messages in thread
From: Paolo Bonzini @ 2015-11-10  9:01 UTC
  To: Fam Zheng, Kevin Wolf; +Cc: qemu-devel, qemu-block



On 10/11/2015 07:14, Fam Zheng wrote:
> On Mon, 11/09 17:29, Kevin Wolf wrote:
>> On 09.11.2015 17:18, Paolo Bonzini wrote:
>>>
>>>
>>> On 09/11/2015 17:04, Kevin Wolf wrote:
>>>> On 06.11.2015 11:22, Fam Zheng wrote:
>>>>> The "pnum < nb_sectors" condition for deciding whether to actually copy
>>>>> data is unnecessarily strict, and the qiov initialization is
>>>>> unnecessary too, for both the bdrv_aio_write_zeroes and bdrv_aio_discard
>>>>> branches.
>>>>>
>>>>> Reorganize mirror_iteration flow so that we:
>>>>>
>>>>>     1) Find the contiguous zero/discarded sectors with
>>>>>     bdrv_get_block_status_above() before deciding what to do. We query
>>>>>     s->buf_size sized blocks at a time.
>>>>>
>>>>>     2) If the sectors in question are zeroed/discarded and aligned to
>>>>>     target cluster, issue zero write or discard accordingly. It's done
>>>>>     in mirror_do_zero_or_discard, where we don't add buffer to qiov.
>>>>>
>>>>>     3) Otherwise, do the same loop as before in mirror_do_read.
>>>>>
>>>>> Signed-off-by: Fam Zheng <famz@redhat.com>
>>>>
>>>> I'm not sure where in the patch to comment on this, so I'll just do it
>>>> here right in the beginning.
>>>>
>>>> I'm concerned that we need to be more careful about races in this patch,
>>>> in particular regarding the bitmaps. I think the conditions for the two
>>>> bitmaps are:
>>>>
>>>> * Dirty bitmap: We must clear the bit after finding the next piece of
>>>>   data to be mirrored, but before we yield after getting information
>>>>   that we use for the decision which kind of operation we need.
>>>>
>>>>   In other words, we need to clear the dirty bitmap bit before calling
>>>>   bdrv_get_block_status_above(), because that's both the function that
>>>>   retrieves information about the next chunk and also a function that
>>>>   can yield.
>>>>
>>>>   If after this point the data is written to, we need to mirror it
>>>>   again.
>>>
>>> With Fam's patch, that's not trivial for two reasons:
>>>
>>> 1) bdrv_get_block_status_above() can return a smaller amount than what
>>> is asked.
>>>
>>> 2) the "read and write" case can handle s->granularity sectors per
>>> iteration (many of them can be coalesced, but still that's how the
>>> iteration works).
>>>
>>> The simplest solution is to perform the query with s->granularity size
>>> rather than s->buf_size.
>>
>> Then we end up with many small operations; that's not what we want.
>>
>> Why can't we mark up to s->buf_size dirty clusters as clean first, then
>> query the status, and mark all of those that we can't handle dirty
>> again?
> 
> Then we may end up marking more clusters as dirty than we should.

You're both right.

> Because all bdrv_set_dirty() and bdrv_set_dirty_bitmap() callers run in
> coroutine context, we can introduce a CoMutex to let the bitmap reader
> block bdrv_set_dirty() and bdrv_set_dirty_bitmap().

I think this is not necessary.

I think the following is safe:

1) before calling bdrv_get_block_status_above(), find out how many
consecutive bits in the dirty bitmap are 1

2) zero all those bits in the dirty bitmap

3) call bdrv_get_block_status_above() with a size equivalent to the
number of dirty bits

4) if bdrv_get_block_status_above() only returns a partial result, loop
step (3) until all the dirty bits are processed
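
Untested sketch of steps 1-4 (dirty_extent() is a made-up helper for
"number of consecutive dirty sectors starting at sector_num"):

    nb_sectors = dirty_extent(s, sector_num);             /* step 1 */
    bdrv_reset_dirty_bitmap(s->dirty_bitmap, sector_num,
                            nb_sectors);                  /* step 2 */
    while (nb_sectors > 0) {
        ret = bdrv_get_block_status_above(source, NULL, sector_num,
                                          nb_sectors, &pnum); /* step 3 */
        if (ret < 0) {
            pnum = MIN(nb_sectors, sectors_per_chunk);    /* plain copy */
        }
        /* issue copy/zero/discard for pnum sectors at sector_num */
        sector_num += pnum;
        nb_sectors -= pnum;                               /* step 4 */
    }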

For full mirroring, this strategy will probably make the first
incremental iteration more expensive.

Paolo


* Re: [Qemu-devel] [PATCH] mirror: Improve zero-write and discard with fragmented image
  2015-11-10  9:01         ` Paolo Bonzini
@ 2015-11-10 10:12           ` Kevin Wolf
  2015-11-10 10:30             ` Paolo Bonzini
  0 siblings, 1 reply; 10+ messages in thread
From: Kevin Wolf @ 2015-11-10 10:12 UTC
  To: Paolo Bonzini; +Cc: Fam Zheng, qemu-devel, qemu-block

On 10.11.2015 10:01, Paolo Bonzini wrote:
> 
> 
> On 10/11/2015 07:14, Fam Zheng wrote:
> > On Mon, 11/09 17:29, Kevin Wolf wrote:
> >> On 09.11.2015 17:18, Paolo Bonzini wrote:
> >>>
> >>>
> >>> On 09/11/2015 17:04, Kevin Wolf wrote:
> >>>> On 06.11.2015 11:22, Fam Zheng wrote:
> >>>>> The "pnum < nb_sectors" condition for deciding whether to actually copy
> >>>>> data is unnecessarily strict, and the qiov initialization is
> >>>>> unnecessary too, for both the bdrv_aio_write_zeroes and bdrv_aio_discard
> >>>>> branches.
> >>>>>
> >>>>> Reorganize mirror_iteration flow so that we:
> >>>>>
> >>>>>     1) Find the contiguous zero/discarded sectors with
> >>>>>     bdrv_get_block_status_above() before deciding what to do. We query
> >>>>>     s->buf_size sized blocks at a time.
> >>>>>
> >>>>>     2) If the sectors in question are zeroed/discarded and aligned to
> >>>>>     target cluster, issue zero write or discard accordingly. It's done
> >>>>>     in mirror_do_zero_or_discard, where we don't add buffer to qiov.
> >>>>>
> >>>>>     3) Otherwise, do the same loop as before in mirror_do_read.
> >>>>>
> >>>>> Signed-off-by: Fam Zheng <famz@redhat.com>
> >>>>
> >>>> I'm not sure where in the patch to comment on this, so I'll just do it
> >>>> here right in the beginning.
> >>>>
> >>>> I'm concerned that we need to be more careful about races in this patch,
> >>>> in particular regarding the bitmaps. I think the conditions for the two
> >>>> bitmaps are:
> >>>>
> >>>> * Dirty bitmap: We must clear the bit after finding the next piece of
> >>>>   data to be mirrored, but before we yield after getting information
> >>>>   that we use for the decision which kind of operation we need.
> >>>>
> >>>>   In other words, we need to clear the dirty bitmap bit before calling
> >>>>   bdrv_get_block_status_above(), because that's both the function that
> >>>>   retrieves information about the next chunk and also a function that
> >>>>   can yield.
> >>>>
> >>>>   If after this point the data is written to, we need to mirror it
> >>>>   again.
> >>>
> >>> With Fam's patch, that's not trivial for two reasons:
> >>>
> >>> 1) bdrv_get_block_status_above() can return a smaller amount than what
> >>> is asked.
> >>>
> >>> 2) the "read and write" case can handle s->granularity sectors per
> >>> iteration (many of them can be coalesced, but still that's how the
> >>> iteration works).
> >>>
> >>> The simplest solution is to perform the query with s->granularity size
> >>> rather than s->buf_size.
> >>
> >> Then we end up with many small operations; that's not what we want.
> >>
> >> Why can't we mark up to s->buf_size dirty clusters as clean first, then
> >> query the status, and mark all of those that we can't handle dirty
> >> again?
> > 
> > Then we may end up marking more clusters as dirty than we should.
> 
> You're both right.
> 
> > Because all bdrv_set_dirty() and bdrv_set_dirty_bitmap() callers run in
> > coroutine context, we can introduce a CoMutex to let the bitmap reader
> > block bdrv_set_dirty() and bdrv_set_dirty_bitmap().
> 
> I think this is not necessary.
> 
> I think the following is safe:
> 
> 1) before calling bdrv_get_block_status_above(), find out how many
> consecutive bits in the dirty bitmap are 1
> 
> 2) zero all those bits in the dirty bitmap
> 
> 3) call bdrv_get_block_status_above() with a size equivalent to the
> number of dirty bits
> 
> 4) if bdrv_get_block_status_above() only returns a partial result, loop
> step (3) until all the dirty bits are processed

Right, you can always implement one iteration with more than one I/O
request. And maybe that would be the time to start a coroutine for the
requests already in the mirror code instead of complicating the AIO
state machine and letting block.c start coroutines.

> For full mirroring, this strategy will probably make the first
> incremental iteration more expensive.

You mean because we issue smaller, interleaved write and write_zeroes
requests now instead of only large writes? That's probably right, but
getting the right result should be more important than speed. :-)

Kevin


* Re: [Qemu-devel] [PATCH] mirror: Improve zero-write and discard with fragmented image
  2015-11-10 10:12           ` Kevin Wolf
@ 2015-11-10 10:30             ` Paolo Bonzini
  0 siblings, 0 replies; 10+ messages in thread
From: Paolo Bonzini @ 2015-11-10 10:30 UTC
  To: Kevin Wolf; +Cc: Fam Zheng, qemu-devel, qemu-block



On 10/11/2015 11:12, Kevin Wolf wrote:
> > For full mirroring, this strategy will probably make the first
> > incremental iteration more expensive.
>
> You mean because we issue smaller, interleaved write and write_zeroes
> requests now instead of only large writes? That's probably right, but
> getting the right result should be more important than speed. :-)

No, because you might end up clearing the whole dirty bitmap before
issuing the first bdrv_get_block_status_above().  Blocks are actually
read much later; if someone sets the dirty bitmap in between, you will
re-write those blocks unnecessarily during the first incremental
iteration.  It's not specific to the first iteration; it's just more
likely there.

However, it may be enough to clamp the number of dirty bitmap bits that
you process in one go (e.g. to 100 MB or so).
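
I.e., in step (1) of the scheme above, something like (sketch;
MAX_DIRTY_BATCH is a made-up constant, e.g. 100 << 20 bytes):

    nb_sectors = MIN(nb_sectors, MAX_DIRTY_BATCH >> BDRV_SECTOR_BITS);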

Paolo

